How I ended up debugging Google Cloud’s CLI to get my Kubernetes application to run

Amir Bilu
Nov 9, 2022

Here’s the story of how I ended up debugging the gcloud CLI.

I was working on one of our GKE clusters and attached a NetworkPolicy to a newly created pod. NetworkPolicy is a Kubernetes resource that lets you control traffic flow at the IP address or port level. The pod ran an application that required access to a GCS bucket. That’s it! Other than that, there shouldn’t have been any other egress traffic.

To make sure the pod could only access GCS, I set up a Private Service Connect endpoint on IP address 10.10.40.1 and allowed egress traffic to it.

Here’s how the NetworkPolicy appeared:

This policy allows:

  • Egress traffic to the Private Service Connect endpoint’s IP address
  • Egress traffic to kube-dns for DNS queries
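A policy along these lines could be sketched as follows. This is only an illustration, not the exact manifest: the policy name, namespace, pod labels, kube-dns selector labels, and port 443 are assumptions, and only the endpoint IP comes from the setup described above. It builds the manifest as a Python dict and prints it as JSON, which kubectl accepts.

```python
import json

# All names below (policy name, namespace, labels, port 443) are illustrative
# assumptions; only the endpoint IP 10.10.40.1 comes from the article.
PSC_ENDPOINT_IP = "10.10.40.1"


def build_network_policy(app_label: str = "my-app") -> dict:
    """Return a NetworkPolicy manifest that only allows egress to the
    Private Service Connect endpoint and to kube-dns."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "gcs-only-egress", "namespace": "default"},
        "spec": {
            "podSelector": {"matchLabels": {"app": app_label}},
            "policyTypes": ["Egress"],
            "egress": [
                {
                    # Rule 1: HTTPS to the Private Service Connect endpoint.
                    "to": [{"ipBlock": {"cidr": f"{PSC_ENDPOINT_IP}/32"}}],
                    "ports": [{"protocol": "TCP", "port": 443}],
                },
                {
                    # Rule 2: DNS queries to kube-dns in kube-system.
                    "to": [
                        {
                            "namespaceSelector": {
                                "matchLabels": {
                                    "kubernetes.io/metadata.name": "kube-system"
                                }
                            },
                            "podSelector": {"matchLabels": {"k8s-app": "kube-dns"}},
                        }
                    ],
                    "ports": [
                        {"protocol": "UDP", "port": 53},
                        {"protocol": "TCP", "port": 53},
                    ],
                },
            ],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, e.g. `python policy.py | kubectl apply -f -`.
    print(json.dumps(build_network_policy(), indent=2))
```

Note that nothing in this sketch allows traffic to any other address, which matters later in the story.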

Everything seemed to be set up correctly, but my app kept failing to connect to GCS. To make debugging easier, I installed gcloud and gsutil on the pod (I had to disable the NetworkPolicy first to get an internet connection) and tried to list the items in the GCS bucket:

gsutil ls gs://my-bucket

Unsurprisingly, that didn’t work either. This was the error I got:

ServiceException: 401 Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket.

That was interesting. Why did gsutil say I was an anonymous caller? The pod should have had a service account attached to it, inherited from the node.

So I ran:

gcloud auth list

Indeed, the list was empty:

No credentialed accounts

I figured the pod had to fetch its service account credentials from somewhere, and that the NetworkPolicy was probably blocking that request. I was already a few hours into the process and about to give up for the day, but then it occurred to me that the answer was probably in the CLI’s source code!

gsutil is an open source project written in Python: https://github.com/GoogleCloudPlatform/gsutil.

While my Python skills aren’t that great (to say the least), I noticed that gsutil uses google-auth-library-python.

Navigating through the project, I found the following method under google/auth/compute_engine/_metadata.py:

(https://github.com/googleapis/google-auth-library-python/blob/d15092ff8b66b3039641d482a0debafde4ba0077/google/auth/compute_engine/_metadata.py#L208)

Great! So gsutil pulls the service account info from the GKE metadata server.

At the top of the same file, I noticed the domain of the metadata server:

metadata.google.internal

According to the GKE documentation (https://cloud.google.com/kubernetes-engine/docs/concepts/workload-identity#metadata_server), this resolves to 169.254.169.254:80.
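To check from inside the pod whether the metadata server is reachable at all, a small script like the one below can help. It is a minimal sketch using only the standard library, not what gsutil itself runs: the function names are made up for illustration, and it assumes the standard metadata path for the default service account's email plus the `Metadata-Flavor: Google` header that the metadata server requires.

```python
import urllib.request

# Standard Compute Engine / GKE metadata path for the default service account.
METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/email"
)


def build_metadata_request(url: str = METADATA_URL) -> urllib.request.Request:
    """Build the request; the metadata server rejects calls without this header."""
    return urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})


def fetch_service_account_email(timeout: float = 3.0) -> str:
    """Return the email of the pod's default service account.

    Raises urllib.error.URLError or a timeout error when the request cannot
    reach the metadata server, e.g. because egress to it is blocked.
    """
    with urllib.request.urlopen(build_metadata_request(), timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()
```

Calling `fetch_service_account_email()` in a pod whose egress to the metadata server is blocked should fail with a connection error or timeout rather than return an email address.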

To allow the pod access to the metadata server, I added an egress rule to my NetworkPolicy:
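The extra rule can be sketched like this. It is an illustration under assumptions, not the exact rule from my cluster: the helper names and the stripped-down starting policy are hypothetical, and the checker only understands plain `ipBlock` peers with numeric TCP ports. It adds an egress rule for 169.254.169.254 on TCP port 80 and checks that the metadata server was not covered before and is covered afterwards.

```python
import copy
import ipaddress

METADATA_SERVER_IP = "169.254.169.254"
METADATA_SERVER_PORT = 80

# Hypothetical starting policy with only the Private Service Connect rule.
base_policy = {
    "spec": {
        "egress": [
            {
                "to": [{"ipBlock": {"cidr": "10.10.40.1/32"}}],
                "ports": [{"protocol": "TCP", "port": 443}],
            }
        ]
    }
}


def add_metadata_egress_rule(policy: dict) -> dict:
    """Return a copy of the policy with an extra egress rule for the metadata server."""
    updated = copy.deepcopy(policy)
    updated["spec"].setdefault("egress", []).append(
        {
            "to": [{"ipBlock": {"cidr": f"{METADATA_SERVER_IP}/32"}}],
            "ports": [{"protocol": "TCP", "port": METADATA_SERVER_PORT}],
        }
    )
    return updated


def ip_block_allows(policy: dict, ip: str, port: int) -> bool:
    """Check whether any ipBlock egress rule covers the given IP and TCP port.

    Simplified: ignores `except` lists, named ports, port ranges,
    selector-based peers, and rules without a `to` section.
    """
    address = ipaddress.ip_address(ip)
    for rule in policy["spec"].get("egress", []):
        ports = rule.get("ports")
        # A rule without `ports` matches every port.
        port_ok = not ports or any(
            p.get("port") == port and p.get("protocol", "TCP") == "TCP"
            for p in ports
        )
        for peer in rule.get("to", []):
            cidr = peer.get("ipBlock", {}).get("cidr")
            if cidr and port_ok and address in ipaddress.ip_network(cidr):
                return True
    return False


updated_policy = add_metadata_egress_rule(base_policy)
print(ip_block_allows(base_policy, METADATA_SERVER_IP, METADATA_SERVER_PORT))
print(ip_block_allows(updated_policy, METADATA_SERVER_IP, METADATA_SERVER_PORT))
```

The first check reflects the original situation: with only the Private Service Connect rule in place, requests to the metadata server have no matching egress rule.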

Then I ran:

gcloud auth list

Voila! It managed to find the service account:

Credentialed Accounts
ACTIVE  ACCOUNT
*       My-user@my-project.iam.gserviceaccount.com

gsutil ls gs://my-bucket worked as well.

And my app was finally able to connect to GCS.

This took me a few hours of debugging, and since I couldn’t find anything online about it, I thought I’d share it with others in the hope that it might help!

Amir Bilu

Senior Software Engineer @Tabnine, vim lover, handstand expert, and using arch btw