Kubernetes networking is one of those topics that seems simple on the surface but hides a lot of nuance. Every pod gets its own IP address, every service gets a stable DNS name, and traffic just... flows. Until it doesn't.
## The Flat Network Model
Kubernetes mandates that all pods can communicate with each other without NAT. This is the "flat network" model and it's one of the most important design decisions in the entire system. It means that your application code doesn't need to care about the underlying network topology.
In practice, this is implemented by CNI (Container Network Interface) plugins like Calico, Cilium, or Flannel. Each takes a different approach:
- Calico uses BGP to distribute routes across nodes
- Cilium leverages eBPF for high-performance, kernel-level packet processing
- Flannel creates a simple overlay network using VXLAN
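Whatever plugin is in play, the contract from the application's point of view is the same: every pod has a routable IP, and reaching another pod is a plain TCP/UDP connection to that IP. A minimal sketch of a pod you could exec into to verify this (the name, namespace, and image are placeholders, not from any real cluster):

```yaml
# A throwaway pod for poking at the flat network. Once it's running,
# `kubectl exec ping-source -n team-a -- ping <other-pod-ip>` should
# succeed for any pod IP in the cluster -- no NAT, no port mapping.
apiVersion: v1
kind: Pod
metadata:
  name: ping-source        # hypothetical name
  namespace: team-a        # hypothetical namespace
spec:
  containers:
    - name: shell
      image: busybox:1.36
      command: ["sleep", "infinity"]   # keep the pod alive for exec
```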
## Service Discovery and DNS
Every Kubernetes service gets a DNS entry in the form <service>.<namespace>.svc.cluster.local. CoreDNS handles the resolution, and it's surprisingly fast — most queries resolve in under a millisecond.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-api
  namespace: production
spec:
  selector:
    app: my-api
  ports:
    - port: 80
      targetPort: 8080
```
This service is reachable at my-api.production.svc.cluster.local from anywhere in the cluster.
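To see that name in action, a client pod in any other namespace can call it directly; a sketch, with the pod name and image as assumptions:

```yaml
# A one-shot client pod that hits the service through its cluster DNS name.
apiVersion: v1
kind: Pod
metadata:
  name: dns-test           # hypothetical client pod
  namespace: default       # deliberately a different namespace
spec:
  containers:
    - name: curl
      image: curlimages/curl:8.5.0
      # The fully qualified name is needed from outside `production`;
      # inside that namespace, plain `my-api` would also resolve.
      command: ["curl", "-s", "http://my-api.production.svc.cluster.local"]
  restartPolicy: Never
```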
## Ingress Controllers
Getting traffic into the cluster is where ingress controllers come in. NGINX Ingress Controller is the most common, but Traefik and Envoy-based solutions like Contour are gaining ground.
The key insight is that an ingress controller is just a reverse proxy that watches the Kubernetes API for Ingress resources and reconfigures itself accordingly. It's a beautiful pattern — declarative infrastructure at its finest.
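An Ingress resource is exactly the kind of object the controller watches for. A minimal sketch routing the service from earlier (the hostname is a placeholder, and `ingressClassName: nginx` assumes the NGINX Ingress Controller is installed):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-api
  namespace: production
spec:
  ingressClassName: nginx        # assumes the NGINX Ingress Controller
  rules:
    - host: api.example.com      # hypothetical hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-api     # the Service defined above
                port:
                  number: 80
```

When this object is created, the controller notices it via its API watch and regenerates its proxy configuration; nothing ever has to restart the proxy by hand.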
## Service Mesh: Do You Need One?
Service meshes like Istio and Linkerd add a sidecar proxy to every pod, giving you mTLS, traffic splitting, retries, and observability without changing application code. The trade-off is operational complexity and per-pod resource overhead.
My rule of thumb: if you have fewer than 20 services, you probably don't need a service mesh. If you have more than 50, you probably do. In between, it depends on your team's maturity and your observability requirements.
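As a taste of what a mesh buys you, here is a sketch of traffic splitting with an Istio VirtualService; the subset names are assumptions and would have to be defined in a matching DestinationRule:

```yaml
# Send 90% of in-cluster traffic for my-api to subset v1, 10% to v2.
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: my-api
  namespace: production
spec:
  hosts:
    - my-api
  http:
    - route:
        - destination:
            host: my-api
            subset: v1          # hypothetical subset, defined in a
          weight: 90            # matching DestinationRule
        - destination:
            host: my-api
            subset: v2
          weight: 10
```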
## What I've Learned
After running Kubernetes in production for three years, the networking layer is where most subtle bugs hide. Misconfigured network policies, DNS caching issues, and conntrack table exhaustion have all bitten me at least once. The best defense is understanding the fundamentals deeply — once you know how packets actually flow through the system, debugging becomes much more tractable.
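On the network-policy front, the classic foot-gun is applying a default-deny policy without the corresponding allow rules. A sketch of the pair, reusing the service from earlier (the `frontend` label is a hypothetical client workload):

```yaml
# First, deny all ingress to every pod in the namespace...
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}              # empty selector matches every pod
  policyTypes:
    - Ingress
---
# ...then explicitly allow the frontend to reach the API pods.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: my-api
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend    # hypothetical client label
      ports:
        - port: 8080           # the container port behind the Service
```

Note that policies select pods and ports, not Services: the allow rule targets port 8080 (the `targetPort`), not the Service's port 80. That mismatch is exactly the kind of subtle bug this layer hides.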