Choosing a CNI (Container Network Interface, necessary for pod-to-pod communication) for production is not always easy, and replacing a CNI after the fact is not an easy task either. Weave and Calico are two of the most popular CNIs out there, and I have been lucky to run both of them in production. In this post I’ll attempt to provide an unbiased review of both CNI implementations.
Installing Weave is pretty straightforward, from the official docs:
Installing Calico is more involved since it requires an etcd v2 cluster to function. You have to decide whether to use the same cluster you use for Kubernetes or to set up an entirely separate etcd cluster for Calico. Once you download the Calico config, you can modify the etcd_endpoints as described in the official docs, then kubectl apply calico_config to install Calico.
Weave operates at layer 2 of the OSI model and uses the VXLAN protocol to overlay a layer 2 network on an existing network. Weave uses the Open vSwitch kernel module to program the kernel’s FDB (Forwarding Database). Weave only requires port 6783 (TCP and UDP) and port 6784 (UDP) to be open on the nodes that participate in the overlay network. Pods on the overlay network can then transparently communicate with each other as if they were all plugged into the same switch.
VXLAN is a relatively new protocol; the VXLAN RFC (RFC 7348) was published in August 2014.
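To make the overlay idea concrete, here is a minimal sketch of the VXLAN framing described in RFC 7348: an 8-byte header carrying a 24-bit network identifier (VNI) is placed in front of the original layer 2 frame, and the result travels between nodes as a UDP payload. This is not Weave's code; the function names and the VNI value used in the usage note are made up for illustration.

```python
import struct
from typing import Tuple

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag defined in RFC 7348


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VXLAN Network Identifier."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) followed by 24 reserved bits.
    # Word 2: VNI (24 bits) followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def encapsulate(inner_ethernet_frame: bytes, vni: int) -> bytes:
    """Prefix a layer 2 frame with a VXLAN header (this becomes the UDP payload)."""
    return vxlan_header(vni) + inner_ethernet_frame


def decapsulate(udp_payload: bytes) -> Tuple[int, bytes]:
    """Split a UDP payload back into (vni, inner layer 2 frame)."""
    flags_word, vni_word = struct.unpack("!II", udp_payload[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return vni_word >> 8, udp_payload[8:]
```

Calling decapsulate(encapsulate(frame, 42)) returns the VNI and the untouched inner frame, which is why pods see what looks like a plain layer 2 network even though the packet crossed a routed network between nodes.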
Calico operates at layer 3 of the OSI model. It uses BGP to share route information among the nodes participating in the network, with each node acting as a gateway to the pods running on it. Calico requires port 179 (TCP) to be open. Calico makes each node proxy_arp for the pods running on it and installs a default gateway, 169.254.1.1, on each pod.
Inspecting the neighbors known to the pod reveals just one neighbor as below:
That neighbor is the other end of the veth pair on the node, as shown below:
BGP has been around for a while; the first RFC for BGP-4 (RFC 1654) was published in July 1994, and earlier versions of the protocol date back to 1989. BGP powers a significant portion of the internet.
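The layer 3 model described above boils down to ordinary route lookups on each node: a specific route per local pod, plus a route per remote address block pointing at the node that owns it. The sketch below shows longest-prefix matching over such a table. Every address, interface name, and next hop in it is hypothetical and only serves to illustrate the lookup; it is not output from a real Calico node.

```python
import ipaddress

# Hypothetical routing table on one node: (destination prefix, target).
ROUTES = [
    ("10.2.0.5/32", "dev cali-pod-a"),   # local pod, reached over its veth
    ("10.2.0.6/32", "dev cali-pod-b"),   # another local pod
    ("10.2.1.0/26", "via 172.16.0.11"),  # block on another node, learned via BGP
    ("10.2.2.0/26", "via 172.16.0.12"),
    ("0.0.0.0/0", "via 172.16.0.1"),     # the node's own default route
]


def lookup(destination: str) -> str:
    """Return the target of the most specific route that contains destination."""
    addr = ipaddress.ip_address(destination)
    candidates = [
        (ipaddress.ip_network(prefix), target)
        for prefix, target in ROUTES
        if addr in ipaddress.ip_network(prefix)
    ]
    # Longest-prefix match: the route with the largest prefix length wins.
    _, target = max(candidates, key=lambda item: item[0].prefixlen)
    return target
```

With this table, lookup("10.2.1.17") resolves to the remote node at 172.16.0.11, while lookup("10.2.0.5") stays on the local veth.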
IPAM (IP Address Management)
Weave uses a CRDT to distribute IP addresses fairly among nodes. It is well documented, so I will not rehash it here.
Calico assigns a /26 IP address block to each node and stores the assignment in etcd, where the IP blocks are bound to the nodename. When using an ASG (Auto Scaling group), one needs to run calicoctl delete node <nodeName> when an instance is being deleted, or come up with tricks to ensure the nodename is consistent across recreation; otherwise the entire pod CIDR can easily be depleted. Weave had a similar problem, but it has since been fixed.
UPDATE: I previously wrote that assigning a /26 could lead to IP address fragmentation, where some nodes still have free IP addresses but other nodes completely run out of IP address blocks. Calico does not fragment IP address allocation: when the entire CIDR is exhausted and there is no /26 left to assign, it will attempt to steal /32 addresses from the /26 blocks that have free IP address slots.
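The depletion problem is easy to see with a toy model. The sketch below is not Calico's IPAM; BlockAllocator, its methods, and the 10.2.0.0/24 pod CIDR in the usage note are invented for illustration. It only models one /26 block per node name and leaves out the /32 stealing behaviour mentioned in the update.

```python
import ipaddress


class BlockAllocator:
    """Toy model: hand out one /26 block per node name from a pod CIDR."""

    def __init__(self, pod_cidr: str, block_prefix: int = 26):
        network = ipaddress.ip_network(pod_cidr)
        self.free = list(network.subnets(new_prefix=block_prefix))
        self.by_node = {}  # node name -> assigned block

    def assign(self, node_name: str):
        """Return the block bound to node_name, allocating one if needed."""
        if node_name in self.by_node:  # a stable name gets its old block back
            return self.by_node[node_name]
        if not self.free:
            raise RuntimeError("pod CIDR exhausted: no free block")
        block = self.free.pop(0)
        self.by_node[node_name] = block
        return block

    def delete_node(self, node_name: str):
        """Models the cleanup step: return the node's block to the pool."""
        self.free.append(self.by_node.pop(node_name))
```

A 10.2.0.0/24 pod CIDR holds only four /26 blocks of 64 addresses each. If an ASG replaces instances and every replacement comes up under a new name, the fifth assign call fails even though only a handful of nodes are alive, unless delete_node was called for the instances that went away.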
With Weave, as long as the nodes participating in the cluster can reach each other on port 6783 (UDP and TCP) and port 6784 (UDP), the overlay will just work.
Calico works at layer 3 and sets up each node as the gateway to the pods running on it. In standard networking, a host requires layer 2 connectivity to the gateway it configures for any route. Because of this limitation, BGP alone will not work when nodes sit in different layer 2 segments; however, Calico supports the IPIP protocol for traffic crossing layer 2 boundaries. For IPIP to work, you need to allow IPIP traffic (IP protocol 4) through your firewall.
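A quick way to reason about when a tunnel is needed is to ask whether the peer node can be used as a direct next hop. The sketch below assumes that "same IP subnet" means "same layer 2 segment", which holds in many setups but not all; the function name and the addresses in the usage note are illustrative and not part of Calico.

```python
import ipaddress


def needs_tunnel(local_interface: str, peer_ip: str) -> bool:
    """True when peer_ip is outside the local subnet.

    local_interface is the node's own address with prefix length,
    e.g. "172.16.0.10/24". A peer outside that network cannot be a
    directly connected next hop, so plain routes to it will not work.
    """
    local_network = ipaddress.ip_interface(local_interface).network
    return ipaddress.ip_address(peer_ip) not in local_network
```

For a node at 172.16.0.10/24, a peer at 172.16.0.11 can be a direct next hop, while a peer at 172.16.1.11 is on the other side of a router and would need IPIP.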
With Weave, debugging packet traversal with tools like traceroute does not show the nodes the packet passed through before reaching the final pod, since the pod assumes direct connectivity. To introspect the information programmed in the kernel with Open vSwitch you will have to use tools like odp. Debugging an overlay network requires a different thought process from what most Linux/network administrators are used to, since you have to be cognizant of the fact that layer 2 becomes layer 3 and layer 3 becomes layer 2 when a packet goes from a pod on one node to a pod on another node. Weave programs iptables only when there are changes in the cluster, so you can modify the rules for debugging purposes and be confident they will stay the same as long as nothing changes in the cluster with respect to the node while you are debugging.
With Calico, debugging packet traversal with tools like traceroute should show the node the packet passed through before reaching the destination pod. The installed routes can be inspected with the iproute2 utility, e.g. ip ro sh proto bird table all. Calico continuously keeps the state of iptables synchronized with its assumed internal state (similar to kube-proxy), which can be frustrating when you are debugging. It also installs its rules first, so you can’t easily log information about the iptables chains a packet traverses in order to debug iptables-related problems.
Weave supports IPsec encryption out of the box while Calico does not support any form of encryption.
They both support Network Policy. Calico supports both ingress and egress rules and is the pioneer here; Weave only supported ingress at the time of writing and has an open issue on egress.
By default, Calico and Weave both form a full mesh, so both begin to degrade once the cluster reaches a certain size (about 100 nodes). Both have ways of reducing the full mesh problem: Weave via multi-hop switching and Calico via route reflection.
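The arithmetic behind the full mesh problem is simple enough to sketch. The functions below count peering sessions for a BGP-style mesh and for a layout with route reflectors, assuming every client peers with every reflector and the reflectors are meshed among themselves. That topology is an assumption made for illustration; real deployments can differ.

```python
def full_mesh_sessions(nodes: int) -> int:
    """Sessions when every node peers with every other node."""
    return nodes * (nodes - 1) // 2


def route_reflector_sessions(nodes: int, reflectors: int) -> int:
    """Sessions when clients peer only with reflectors.

    Assumes each non-reflector node peers with every reflector and the
    reflectors form a full mesh among themselves.
    """
    clients = nodes - reflectors
    return clients * reflectors + full_mesh_sessions(reflectors)
```

At 100 nodes a full mesh needs 4950 sessions, with each node holding 99 of them, while two route reflectors bring the total down to 197 under the assumptions above.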