How I ended up debugging Google Cloud’s CLI to get my Kubernetes application to run
Here’s the story of how I debugged the gcloud CLI.
I was working on one of our GKE clusters and attached a NetworkPolicy to a newly created pod. NetworkPolicy is a Kubernetes resource that lets you control traffic flow at the IP address or port level. The pod ran an application that needed access to a GCS bucket, and that’s it: there shouldn’t have been any other egress traffic.
To make sure the pod could only access GCS, I set up a Private Service Connect endpoint on IP address 10.10.40.1 and allowed egress traffic to it.
The NetworkPolicy I applied allows:
- Egress traffic to the Private Service Connect endpoint’s IP address
- Egress traffic to kube-dns for DNS queries
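A policy along those lines might look roughly like the sketch below, written as a Python dict and printed as JSON (kubectl accepts JSON manifests as well as YAML). The policy name, the app: my-app pod label and TCP port 443 for the Private Service Connect endpoint are assumptions for illustration. The kube-dns rule assumes the usual k8s-app: kube-dns pod label and the kubernetes.io/metadata.name label that recent Kubernetes versions add to namespaces automatically.

```python
import json

# Hypothetical values: the policy name and pod label are placeholders.
POLICY_NAME = "gcs-only-egress"
POD_LABELS = {"app": "my-app"}
PSC_ENDPOINT_IP = "10.10.40.1"  # the Private Service Connect endpoint


def build_network_policy() -> dict:
    """Return a NetworkPolicy manifest that only allows egress to the
    Private Service Connect endpoint and to kube-dns."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": POLICY_NAME},
        "spec": {
            "podSelector": {"matchLabels": POD_LABELS},
            "policyTypes": ["Egress"],
            "egress": [
                # GCS over HTTPS via the PSC endpoint (port 443 is an assumption).
                {
                    "to": [{"ipBlock": {"cidr": f"{PSC_ENDPOINT_IP}/32"}}],
                    "ports": [{"protocol": "TCP", "port": 443}],
                },
                # DNS lookups against kube-dns pods in the kube-system namespace.
                # Both selectors in one entry means "kube-dns pods AND in kube-system".
                {
                    "to": [
                        {
                            "namespaceSelector": {
                                "matchLabels": {"kubernetes.io/metadata.name": "kube-system"}
                            },
                            "podSelector": {"matchLabels": {"k8s-app": "kube-dns"}},
                        }
                    ],
                    "ports": [
                        {"protocol": "UDP", "port": 53},
                        {"protocol": "TCP", "port": 53},
                    ],
                },
            ],
        },
    }


if __name__ == "__main__":
    # Example: python policy.py | kubectl apply -f -
    print(json.dumps(build_network_policy(), indent=2))
```

Note that nothing in this policy lets the pod talk to the metadata server, which turns out to matter later.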
Everything seemed to be set up correctly, but my app kept failing to connect to GCS. To make debugging easier, I installed gcloud and gsutil on the pod (I had to disable the NetworkPolicy first to get an internet connection) and tried to list the objects in the GCS bucket:
gsutil ls gs://my-bucket
Unsurprisingly, that didn’t work either. This was the error I got:
ServiceException: 401 Anonymous caller does not have storage.objects.list access to the Google Cloud Storage bucket.
That was interesting. Why did gsutil say I was an anonymous caller? The pod should have had a service account attached to it, inherited from the node.
So I ran:
gcloud auth list
Indeed, the list was empty:
No credentialed accounts
I figured the pod probably fetched its service account from somewhere (but where?) and the NetworkPolicy was blocking that request. I was already a few hours in and about to give up for the day, but then I realized the answer was probably in the gcloud source code!
gsutil is an open source project written in Python: https://github.com/GoogleCloudPlatform/gsutil.
While my Python skills aren’t that great (to say the least), I noticed that gsutil uses google-auth-library-python.
Navigating through that project, I found a method in google/auth/compute_engine/_metadata.py that requests the service account information.
Great! So gsutil pulls the service account info from the GKE metadata server.
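To make the idea concrete, here is a simplified, standard-library-only sketch of the kind of request involved. It is my own approximation, not google-auth’s code: it assumes the default service account, the documented computeMetadata/v1 path with ?recursive=true, and the Metadata-Flavor: Google header that the metadata server requires. The function names are hypothetical.

```python
import json
import urllib.request

# Default metadata server hostname; on GCE/GKE it resolves to 169.254.169.254.
METADATA_ROOT = "http://metadata.google.internal/computeMetadata/v1/"


def build_service_account_request(service_account: str = "default") -> urllib.request.Request:
    """Build a GET request for a service account's info on the metadata server."""
    url = f"{METADATA_ROOT}instance/service-accounts/{service_account}/?recursive=true"
    # The metadata server rejects requests that lack this header.
    return urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})


def fetch_service_account_info(service_account: str = "default", timeout: float = 3.0) -> dict:
    """Fetch the service account info (email, scopes, ...) as a dict.

    Only works from inside GCE/GKE, and only if egress to the metadata
    server is allowed; otherwise it raises an OSError (e.g. a timeout).
    """
    request = build_service_account_request(service_account)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Run from inside the pod, fetch_service_account_info().get("email") should print the service account’s email when the metadata server is reachable, and fail with a timeout or connection error when a NetworkPolicy blocks it, which is roughly the situation gsutil was in.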
At the top of the same file, I noticed the hostname of the metadata server:
metadata.google.internal
According to the GKE documentation (https://cloud.google.com/kubernetes-engine/docs/concepts/workload-identity#metadata_server), it resolves to 169.254.169.254, with the server listening on port 80.
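A quick way to check this from inside the pod is to test DNS resolution and TCP reachability separately, so a DNS problem can be told apart from blocked traffic. The sketch below uses only the standard library; the helper names are my own, and the assumption that traffic dropped by a NetworkPolicy shows up as a timeout (rather than an immediate refusal) depends on how the policy is enforced in your cluster.

```python
import socket

# Values from the GKE documentation linked above.
METADATA_HOST = "metadata.google.internal"
METADATA_IP = "169.254.169.254"
METADATA_PORT = 80


def resolve(host: str) -> list:
    """Return the sorted IPv4 addresses the hostname resolves to (empty if none)."""
    try:
        infos = socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def can_reach(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False
```

Inside the pod, resolve(METADATA_HOST) tells you whether DNS works, and can_reach(METADATA_IP, METADATA_PORT) tells you whether the NetworkPolicy lets the connection through.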
To allow the pod to access the metadata server, I added an egress rule to my NetworkPolicy for 169.254.169.254 on port 80.
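The extra rule boils down to one more egress entry: TCP port 80 to 169.254.169.254/32. Below it is written as a Python dict, together with a hypothetical helper that appends it to an existing NetworkPolicy manifest (loaded, for example, from JSON). Depending on your GKE version and whether Workload Identity is enabled, the documentation may list a different address or port to allow, so it is worth checking for your cluster.

```python
import copy

# Egress rule for the metadata server at 169.254.169.254:80.
METADATA_EGRESS_RULE = {
    "to": [{"ipBlock": {"cidr": "169.254.169.254/32"}}],
    "ports": [{"protocol": "TCP", "port": 80}],
}


def allow_metadata_server(policy: dict) -> dict:
    """Return a copy of a NetworkPolicy manifest with the metadata-server
    egress rule appended; the input manifest is left untouched."""
    updated = copy.deepcopy(policy)
    egress = updated.setdefault("spec", {}).setdefault("egress", [])
    # Avoid adding the same rule twice if the helper is run repeatedly.
    if METADATA_EGRESS_RULE not in egress:
        egress.append(copy.deepcopy(METADATA_EGRESS_RULE))
    return updated
```

Copying instead of mutating keeps the original manifest available for diffing before you kubectl apply the updated one.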
Then I ran:
gcloud auth list
Voila! It managed to find the service account:
Credentialed Accounts
ACTIVE  ACCOUNT
*       my-user@my-project.iam.gserviceaccount.com
Running gsutil ls gs://my-bucket worked as well.
And my app was finally able to connect to GCS.
This took me a few hours of debugging, and since I couldn’t find anything online about it, I thought I’d share it with others in the hope that it might help!