
【Kubernetes】 Calico Network Plugin Detailed Explanation


Calico is a pure Layer 3 data center networking solution that integrates seamlessly with IaaS cloud platforms such as OpenStack, providing controlled IP communication between VMs, containers, and bare-metal machines. Why "pure Layer 3"? Because every packet is delivered to the right host and container via routes that the BGP protocol synchronizes across all machines or data centers.
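
For example, on a node where Calico runs its BGP daemon (BIRD), the routes it has synchronized from peers are tagged with proto bird. A quick way to list them (a hedged illustration; the exact routes depend on your cluster):

$ ip route show proto bird    # routes installed by BIRD/BGP on this node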

Simply put, Calico creates a veth pair for each workload: one end stays on the host and the other end lands in the container's network namespace. A few routes are then installed in both the container and the host to complete the interconnection.
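
As a minimal sketch of that wiring (illustrative names only; the full two-host version is walked through in section 2, Simulated Networking):

$ ip link add veth0 type veth peer name eth0   # create the veth pair
$ ip netns add pod-ns                          # stand-in for the container's netns
$ ip link set eth0 netns pod-ns                # one end on the host, one end in the netns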

1. Unveiling the Calico Network Model

Next, let's walk through a concrete example to understand how Calico's network communication works.

We'll create a busybox Pod to examine Calico's network model.

The YAML manifest for the busybox Pod is as follows:

apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  nodeSelector:
    kubernetes.io/hostname: node1
  containers:
  - name: busybox
    image: busybox:1.28.4
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
    resources:
      limits:
        cpu: 100m
        memory: 100Mi
      requests:
        cpu: 100m
        memory: 100Mi
  restartPolicy: Always

# Create the Pod
kubectl apply -f busybox.yaml

Locate the node in the k8s cluster where the Pod is running, enter the Pod, and check the busybox container's IP address.

[root@master ~]# kubectl get pods -o wide
NAME                             READY   STATUS    RESTARTS   AGE     IP                NODE     NOMINATED NODE   READINESS GATES
busybox                          1/1     Running   0          2m26s   192.168.166.132   node1    <none>           <none>
# Discover that the pod is running on Node1
# SSH into Node1
# Use kubectl exec -it busybox -- sh (or docker exec on the node)
# Check IP information
$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if37: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 8980 qdisc noqueue 
    link/ether fe:5f:a9:07:20:05 brd ff:ff:ff:ff:ff:ff
    inet 192.168.166.132/32 brd 192.168.166.132 scope global eth0
       valid_lft forever preferred_lft forever

Notice that the container is assigned a /32 host address, meaning the container is treated as a single-point local network rather than part of a shared Layer 2 segment. Let's look at the container's default route.

$ ip route
default via 169.254.1.1 dev eth0 
169.254.1.1 dev eth0 scope link

Now here’s the problem. The routing table tells us that 169.254.1.1 is the container’s default gateway, but we can’t find any network card that corresponds to this IP address. What’s going on?

Don't panic. Recall what happens when a packet's destination is not local: the kernel consults the routing table, finds the gateway, and first sends an ARP request for the gateway's MAC address. It then sets the destination MAC of the outgoing frame to the gateway's MAC, but the gateway's IP address never appears in the packet header itself. In other words, nobody cares whether the gateway IP is "real", as long as something answers the ARP request with a MAC address.

With this in mind, we can continue by using the ip neigh command to check the local ARP cache.

$ ip neigh
169.254.1.1 dev eth0 lladdr ee:ee:ee:ee:ee:ee ref 1 used 0/0/0 probes 4 REACHABLE

This MAC address, ee:ee:ee:ee:ee:ee, is hard-coded by Calico, and something is evidently answering ARP requests with it. But how exactly is that implemented?

Let's first recall the normal case: the kernel broadcasts an ARP request asking who on the Layer 2 network owns 169.254.1.1, and the device that owns the IP replies with its MAC address. But here the situation is awkward. Neither the container nor the host owns this IP address, and the host-side device cali12d4a061371 carries the seemingly useless MAC address ee:ee:ee:ee:ee:ee. By rights, the container and the host network should not be able to communicate at all. So how does Calico pull it off?

I'll cut to the chase: Calico leverages the network interface's proxy ARP feature. Proxy ARP is a variant of the ARP protocol: when an ARP request targets an address outside the local segment, the gateway device receiving the request replies with its own MAC address. For example:

Suppose a computer sends an ARP request for the MAC address of server 8.8.8.8. The router (gateway) receiving the request determines that 8.8.8.8 is not on the local segment (i.e. the target crosses a segment boundary), so it returns its own interface MAC address, call it MAC254, to the PC. When the computer subsequently talks to the server, it simply encapsulates MAC254 as the destination MAC.
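
You can observe the same trick from inside the Pod. Assuming the image's busybox build includes the arping applet (an assumption; not every build does), ARP for the phantom gateway and look at who answers:

$ arping -I eth0 169.254.1.1
# The reply should carry ee:ee:ee:ee:ee:ee -- the host-side cali* device answering via proxy ARP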

Now we know that Calico essentially leverages proxy ARP to tell a “white lie”. Next, let’s confirm this by checking the host’s network card information and routing information.

$ ip addr
...
37: cali12d4a061371@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8980 qdisc noqueue state UP group default 
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link 
       valid_lft forever preferred_lft forever
...

$ ip route
...
192.168.166.132 dev cali12d4a061371 scope link
...

Check if proxy ARP is enabled:

$ cat /proc/sys/net/ipv4/conf/cali12d4a061371/proxy_arp
1

If you are still not convinced, you can capture packets with tcpdump to verify.

$ tcpdump -i cali12d4a061371 -e -nn
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on cali12d4a061371, link-type EN10MB (Ethernet), capture size 262144 bytes

14:27:13.565539 ee:ee:ee:ee:ee:ee > 0a:58:ac:1c:ce:12, ethertype IPv4 (0x0800), length 4191: 10.96.0.1.443 > 172.17.8.2.36180: Flags [P.], seq 403862039:403866164, ack 2023703985, win 990, options [nop,nop,TS val 331780572 ecr 603755526], length 4125
14:27:13.565613 0a:58:ac:1c:ce:12 > ee:ee:ee:ee:ee:ee, ethertype IPv4 (0x0800), length 66: 172.17.8.2.36180 > 10.96.0.1.443: Flags [.], ack 4125, win 2465, options [nop,nop,TS val 603758497 ecr 331780572], length 0

Summary:

  1. Calico cleverly steers all of a workload's traffic to the special gateway address 169.254.1.1, which leads into the host's calixxx device, effectively converting all Layer 2 traffic into routed Layer 3 traffic.

  2. Enabling proxy ARP on the host supplies the ARP responses while suppressing ARP broadcast storms and preventing ARP table inflation.

2. Simulated Networking

Since we have already worked through Calico's networking principles, we can verify them with a manual simulation.
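
The topology, reconstructed from the commands that follow, looks like this:

+---------------------------------+      +---------------------------------+
| Host0   ens192: 192.168.1.32    |      | Host1   ens192: 192.168.1.16    |
|  ns0: eth0 10.20.1.2/24         |      |  ns1: eth0 10.20.1.3/24         |
|   |                             |      |   |                             |
|  veth0 (proxy_arp = 1)          |      |  veth0 (proxy_arp = 1)          |
+---------------------------------+      +---------------------------------+
              \__________ 192.168.1.0/24 underlay __________/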

First, execute the following commands on Host0:

$ ip link add veth0 type veth peer name eth0        # create a veth pair
$ ip netns add ns0                                  # ns0 plays the container's netns
$ ip link set eth0 netns ns0                        # move one end into ns0
$ ip netns exec ns0 ip a add 10.20.1.2/24 dev eth0
$ ip netns exec ns0 ip link set eth0 up
$ ip netns exec ns0 ip route add 169.254.1.1 dev eth0 scope link
$ ip netns exec ns0 ip route add default via 169.254.1.1 dev eth0
$ ip link set veth0 up
$ ip route add 10.20.1.2 dev veth0 scope link       # host route to the local "container"
$ ip route add 10.20.1.3 via 192.168.1.16 dev ens192   # remote "container" lives on Host1
$ echo 1 > /proc/sys/net/ipv4/conf/veth0/proxy_arp  # answer ARP for 169.254.1.1

Then execute the following commands on Host1:

$ ip link add veth0 type veth peer name eth0        # create a veth pair
$ ip netns add ns1                                  # ns1 plays the container's netns
$ ip link set eth0 netns ns1                        # move one end into ns1
$ ip netns exec ns1 ip a add 10.20.1.3/24 dev eth0
$ ip netns exec ns1 ip link set eth0 up
$ ip netns exec ns1 ip route add 169.254.1.1 dev eth0 scope link
$ ip netns exec ns1 ip route add default via 169.254.1.1 dev eth0
$ ip link set veth0 up
$ ip route add 10.20.1.3 dev veth0 scope link       # host route to the local "container"
$ ip route add 10.20.1.2 via 192.168.1.32 dev ens192   # remote "container" lives on Host0
$ echo 1 > /proc/sys/net/ipv4/conf/veth0/proxy_arp  # answer ARP for 169.254.1.1
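
One prerequisite the two listings leave implicit: both hosts must have IP forwarding enabled, or the kernel will not route packets between veth0 and ens192. If the ping below fails, check this first (Kubernetes nodes usually have it on already):

$ sysctl -w net.ipv4.ip_forward=1   # enable IPv4 forwarding on both hosts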

Network connectivity test:

# On Host0
$ ip netns exec ns0 ping 10.20.1.3
PING 10.20.1.3 (10.20.1.3) 56(84) bytes of data.
64 bytes from 10.20.1.3: icmp_seq=1 ttl=62 time=0.303 ms
64 bytes from 10.20.1.3: icmp_seq=2 ttl=62 time=0.334 ms

The experiment was successful!

The specific forwarding process is as follows:

  1. All packets leaving the ns0 namespace are routed toward the virtual gateway address 169.254.1.1, which triggers an ARP request for that address.
  2. Host0's veth0 receives the ARP request and, because proxy ARP is enabled on it, replies directly with its own MAC address.
  3. ns0 then sends the IP packet, destined for ns1's address 10.20.1.3, toward that MAC.
  4. Since 169.254.1.1 is only a stand-in address, Host0 performs ordinary Layer 3 forwarding: it matches the local route 10.20.1.3 via 192.168.1.16 dev ens192 and sends the packet on to Host1 (the lookup can be confirmed directly, as shown after this list). If BGP were configured, this route would carry proto bird.
  5. When Host1 receives the packet for 10.20.1.3, it matches its local route 10.20.1.3 dev veth0 scope link and forwards the packet out veth0, reaching ns1.
  6. The return path is symmetric.
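
To confirm the route lookup in step 4 without sending a packet, you can ask the kernel directly on Host0 (illustrative output, matching the addresses above):

$ ip route get 10.20.1.3
10.20.1.3 via 192.168.1.16 dev ens192 src 192.168.1.32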

This experiment gives us a clear picture of Calico's data forwarding. Each namespace gets a special default route, and the veth's proxy ARP feature converts everything the namespace sends into Layer 3 routed traffic, which the host's routing table then forwards. The same mechanism covers both same-host and cross-host forwarding.

