
A private repo for our own Python packages

Functions are very common in programming. They are useful for repeated operations and for readability. But what if we have to add the same function to many of our programs?

Imagine we have many projects on our hands, and some of them need the same function. We might end up copying that function into each one. That creates a redundancy problem, which means we have to maintain the function in as many places as there are copies.


Let’s start with a basic one

Let’s say we have two files, “adder.py” and “main.py”.

We have a function that sums all integers in a list, here in “adder.py”.
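The original shows the file as a screenshot; here is a minimal sketch of what “adder.py” could contain (the function name sum_all is my assumption):

```python
# adder.py - a hypothetical version of the function described in the post
def sum_all(numbers):
    """Return the sum of all integers in the given list."""
    total = 0
    for n in numbers:
        total += n
    return total
```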

We can import it into a main program “main.py” like this.
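Again as a sketch, assuming the names above, “main.py” might be:

```python
# main.py - imports the function from the local adder module
from adder import sum_all

print(sum_all([1, 2, 3, 4, 5]))
```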

Run it, and the output should be like the following.

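The original shows the run as a screenshot; with the sketch files above, it would look like:

```bash
$ python3 main.py
15
```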

But how can we share this adder function with the team?


Introducing…

Google Artifact Registry

Google Artifact Registry is a Google service for storing artifacts: Docker images, Python packages, NodeJS packages, and much more (see the full list here).

We will now use it to store our package. Here is the list of our missions today.

  1. Build a package and upload it to a Google Artifact Registry repository
  2. Prepare the settings to access the repo
  3. Install the package
  4. Test that we can import the package successfully

Let’s go!

1. Build and upload

1.1. Prepare a repository in Google Artifact Registry

  • Make sure the Artifact Registry API is enabled; otherwise, enable it.


  • Create a repo. Feel free to use the web console, but this time we use the gcloud command.
gcloud artifacts repositories create {REPO-NAME} --repository-format=python --location={LOCATION}


  • Verify that the repo is ready.


1.2. Prepare a package

Install the packaging libraries: build to build the package, and twine to upload it.
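If they are not installed yet, something like this works:

```bash
python3 -m pip install --upgrade build twine
```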

Set up the files for packaging.

  • “LICENSE”
  • “README.md”
  • “pyproject.toml” (a minimal sketch follows this list)
  • “src/” files.
    There should be a folder with the same name as the project name on line #6 of “pyproject.toml”, to avoid naming mistakes.
  • “test” files.
    These can be empty at this moment.
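The original shows these files as screenshots. Here is a minimal “pyproject.toml” sketch, assuming the setuptools backend and the package name my_adder (seen later in the installed folder); the version and description are illustrative, and the project name lands on line #6:

```toml
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "my_adder"
version = "0.0.1"
description = "A shared adder function for the team"
```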

1.3. Build the package

python3 -m build


As a result, we should see the folder “dist” in the same directory as “src”.


1.4. Upload to Google Artifact Registry

Now it’s time to upload our package to the repo on Google Artifact Registry.

twine upload --repository-url https://{LOCATION}-python.pkg.dev/{PROJECT-ID}/{REPO-NAME}/ dist/*


1.5. Verify the package

  • Check in the web UI


  • List packages

    gcloud artifacts packages list --repository={REPO-NAME} --location={LOCATION}


  • List package versions

    gcloud artifacts versions list --package={PACKAGE-NAME} --repository={REPO-NAME} --location={LOCATION}


2. Access the repo

Now we have our first package in the Google Artifact Registry repo. So what should we do next to access and grab it?

We need 3 things:

  1. “.pypirc”
  2. “pip.conf”
  3. “requirements.txt” with our index URLs

2.1. Print settings from the repo

Run the command

gcloud artifacts print-settings python \
--project={PROJECT-ID} \
--repository={REPO-NAME} \
--location={LOCATION}

And we should get similar output: the snippets to insert into “.pypirc” and “pip.conf”.

2.2. Copy a part of output to “.pypirc”

The “.pypirc” would be like this.
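Based on what gcloud artifacts print-settings python prints, the “.pypirc” snippet should look roughly like this:

```ini
[distutils]
index-servers =
    {REPO-NAME}

[{REPO-NAME}]
repository: https://{LOCATION}-python.pkg.dev/{PROJECT-ID}/{REPO-NAME}/
```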

2.3. Copy another part to “pip.conf”

Like this one.
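Again taken from the print-settings output, roughly:

```ini
[global]
extra-index-url = https://{LOCATION}-python.pkg.dev/{PROJECT-ID}/{REPO-NAME}/simple/
```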

2.4. Add the package name to “requirements.txt”

-i is short for the flag --index-url. We need it to tell pip to look for the package at that URL as well.
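So “requirements.txt” would look something like this (the package name my_adder is assumed):

```text
-i https://{LOCATION}-python.pkg.dev/{PROJECT-ID}/{REPO-NAME}/simple/
my_adder
```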

2.5. Final structure
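The original shows the structure as a screenshot; a plausible layout of the consuming project, given the files above, is:

```text
.
├── .pypirc
├── pip.conf
├── requirements.txt
└── main.py
```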

3. Install the packages

At this step, we install the package we developed, using the command below.

pip install -r requirements.txt

Now we finally have the package in our environment. Verify with the command below.

pip list | grep {PACKAGE-NAME}


When we look inside the folders of our virtualenv, we find our files there, under the “my_adder” package folder.

4. Test the package

The last step is to ensure we can import the package properly and successfully. Now we can import from the installed package’s folder name like this.
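A sketch of the updated “main.py”, assuming the package folder my_adder and the module and function names from the earlier sketch:

```python
# main.py - now importing from the installed package instead of a local file
from my_adder.adder import sum_all

print(sum_all([1, 2, 3, 4, 5]))
```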

And run it with confidence.


YEAH!! WE DID IT!!


Integrate with a Docker image

Let’s move to the next topic. A Docker image is a fundamental tool for development, so let’s use this package inside an image as follows.

1. Prepare structure

Let’s say we have files in the structure sketched below. Don’t forget “.pypirc”, “pip.conf”, and “requirements.txt”.
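A plausible sketch, assuming the file names used so far:

```text
.
├── Dockerfile
├── .pypirc
├── pip.conf
├── requirements.txt
└── main.py
```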

2. Understand “OAuth 2.0 token” from GCP

When we work with a Docker image, we need to know that we can’t directly access GCP APIs, unlike when running a gcloud command on our laptop. This means our one big question is: how can we authenticate to access the Google Artifact Registry repo?

The answer is to authenticate through an “OAuth 2.0 token”.

In brief, an “OAuth 2.0 token” is a long string used to authenticate to a system, in this case Google Cloud Platform. Follow the link below to read more.

3. Apply OAuth 2.0 token

We will generate an OAuth 2.0 token and add it to “requirements.txt” so pip is authorized to access the repo and download the package.

This is what “requirements.txt” in OAuth 2.0 token version looks like.
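A sketch, assuming the oauth2accesstoken username that Artifact Registry accepts for basic auth with an access token (the package name my_adder is also assumed):

```text
-i https://oauth2accesstoken:ya29.abc123@{LOCATION}-python.pkg.dev/{PROJECT-ID}/{REPO-NAME}/simple/
my_adder
```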

For the token part, that ya29.abc123, we generate it with the command below.

gcloud auth print-access-token

Learn more about this command here.

One thing to remember: storing credentials in Git is bad practice.

So what should we do? We will create the OAuth 2.0 token version of “requirements.txt” from the raw version inside the image, and delete that tokenized version as soon as the installation is completed.


4. Define Dockerfile

As mentioned above, now we can create a Dockerfile (a sketch follows this list).

  • Get the token as a parameter via ARG TOKEN at line #4.
  • Normally “requirements.txt” has -i as https://{LOCATION}..., so we need to substitute the URL with awk (I tried sed first but got many errors).
  • Once the substitution is completed, save the result into another requirements file, named “tokenized_requirements.txt”.
  • pip install from “tokenized_requirements.txt”.
  • Delete “tokenized_requirements.txt” so we don’t leak the credentials.
  • Put CMD at the end to run the command when a container is started from the image.
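A minimal sketch of such a Dockerfile; the base image, file names, and the exact awk program are my assumptions, not the author’s original file (note ARG TOKEN on line #4, matching the list above):

```dockerfile
FROM python:3.10-slim

WORKDIR /app
ARG TOKEN

COPY requirements.txt main.py ./

# Inject the token into the index URL, install, then delete the tokenized
# file in the same RUN layer so the credential never persists in the image.
RUN awk -v token="$TOKEN" \
      '{ gsub(/https:\/\//, "https://oauth2accesstoken:" token "@"); print }' \
      requirements.txt > tokenized_requirements.txt \
    && pip install --no-cache-dir -r tokenized_requirements.txt \
    && rm tokenized_requirements.txt

CMD ["python", "main.py"]
```

Doing the install and the delete in the same RUN keeps the token out of the intermediate image layers.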

5. Build an image and test run

Now build an image with this command

docker build \
--no-cache \
--progress=plain \
--build-arg TOKEN=$(gcloud auth print-access-token) \
-t entry-point:latest .

  • --no-cache means building this image without any cache from previous builds.
  • --progress=plain means printing out the build progress in plain format.
  • The variable TOKEN can be passed in via the flag --build-arg.
  • Name it “entry-point” with the flag -t.


Once the image is there, we can run it to see the result.

docker run -it --name testpy entry-point

And yes, it’s correct.



Bottom-line diagram

I drew a diagram to summarize the whole process above.



Repo

All materials in this blog are also available in the GitHub repo.


Bonus track

If you use Google Cloud Composer, you can set it up to install the package from Google Artifact Registry by following this link.

