Tuesday, September 21, 2021

[Service Mesh] Linkerd2 Features: Key Points

A summary of the official documentation at https://linkerd.io/2.10/features/ follows:

HTTP, HTTP/2, and gRPC Proxying

Linkerd can proxy all TCP connections, and will automatically enable advanced features (including metrics, load balancing, retries, and more) for HTTP, HTTP/2, and gRPC connections.

TCP Proxying and Protocol Detection

Linkerd is capable of proxying all TCP traffic, including TLS connections, WebSockets, and HTTP tunneling.

In most cases, Linkerd can do this without configuration. To do this, Linkerd performs protocol detection to determine whether traffic is HTTP or HTTP/2 (including gRPC). If Linkerd detects that a connection is HTTP or HTTP/2, Linkerd will automatically provide HTTP-level metrics and routing.

If Linkerd cannot determine that a connection is using HTTP or HTTP/2, Linkerd will proxy the connection as a plain TCP connection, applying mTLS and providing byte-level metrics as usual.

Note: Client-initiated HTTPS will be treated as TCP, not as HTTP, as Linkerd will not be able to observe the HTTP transactions on the connection.
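As a rough illustration of the idea (not Linkerd's actual implementation), protocol detection can be pictured as peeking at the first bytes the client sends. The helper name `detect_protocol` and the list of methods below are assumptions made for this sketch; the HTTP/2 client preface is the fixed string defined by the HTTP/2 specification.

```python
# Toy protocol detection: classify a connection from the first bytes the
# client sends. Illustrative sketch only, not Linkerd's real logic.

# Every HTTP/2 connection starts with this fixed client preface.
HTTP2_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"

# Common HTTP/1.x methods; an HTTP/1.x request line starts with one of them.
HTTP1_METHODS = (b"GET ", b"POST ", b"PUT ", b"DELETE ", b"HEAD ",
                 b"OPTIONS ", b"PATCH ", b"CONNECT ", b"TRACE ")


def detect_protocol(first_bytes: bytes) -> str:
    """Return 'http/2', 'http/1', or 'tcp' (hypothetical helper)."""
    if first_bytes.startswith(HTTP2_PREFACE):
        return "http/2"
    if first_bytes.startswith(HTTP1_METHODS):
        return "http/1"
    # Anything else (TLS handshakes, binary protocols, ...) would be
    # proxied as an opaque TCP stream.
    return "tcp"


if __name__ == "__main__":
    print(detect_protocol(b"GET /index.html HTTP/1.1\r\n"))
    print(detect_protocol(HTTP2_PREFACE))
    print(detect_protocol(b"\x16\x03\x01\x02\x00"))  # start of a TLS handshake
```

The last call shows why client-initiated HTTPS ends up as TCP: the proxy only sees TLS bytes, never an HTTP request line.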

Configuring protocol detection

(At ITRI, we currently obtain TCP information from the Layer 4 network stack in the kernel.)

In some cases, Linkerd’s protocol detection cannot function because it is not provided with enough client data. This can result in a 10-second delay in creating the connection as the protocol detection code waits for more data. This situation is often encountered when using “server-speaks-first” protocols, or protocols where the server sends data before the client does, and can be avoided by supplying Linkerd with some additional configuration.

There are two basic mechanisms for configuring protocol detection: opaque ports and skip ports. Marking a port as opaque instructs Linkerd to proxy the connection as a TCP stream and not to attempt protocol detection. Marking a port as skip bypasses the proxy entirely.

By default, Linkerd automatically marks some ports as opaque, including the default ports for SMTP, MySQL, PostgreSQL, and Memcache. Services that speak those protocols, use the default ports, and are inside the cluster do not need further configuration.

The following table summarizes some common server-speaks-first protocols and the configuration necessary to handle them. The “on-cluster config” column refers to the configuration when the destination is on the same cluster; the “off-cluster config” to when the destination is external to the cluster.

some common server-speaks-first protocols

  • No configuration is required if the standard port is used. If a non-standard port is used, you must mark the port as opaque.
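A minimal sketch of what marking ports could look like in practice, assuming the Linkerd 2.10 annotations `config.linkerd.io/opaque-ports` and `config.linkerd.io/skip-outbound-ports`. The helper name and the port numbers are hypothetical and only serve the example.

```python
# Build the annotations that mark ports as opaque or skipped.
# The helper and the example ports are hypothetical.

def protocol_detection_annotations(opaque_ports=(), skip_outbound_ports=()):
    """Return a dict of annotations for a pod template or namespace."""
    annotations = {}
    if opaque_ports:
        # Proxy these ports as plain TCP, without protocol detection.
        annotations["config.linkerd.io/opaque-ports"] = ",".join(
            str(p) for p in opaque_ports)
    if skip_outbound_ports:
        # Bypass the proxy entirely for outbound traffic to these ports.
        annotations["config.linkerd.io/skip-outbound-ports"] = ",".join(
            str(p) for p in skip_outbound_ports)
    return annotations


if __name__ == "__main__":
    # e.g. a MySQL server listening on a non-standard port 4406
    print(protocol_detection_annotations(opaque_ports=[4406]))
```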

Retries and Timeouts

Automatic retries are one of the most powerful and useful mechanisms a service mesh has for gracefully handling partial or transient application failures.

Timeouts work hand in hand with retries. Once requests are retried a certain number of times, it becomes important to limit the total amount of time a client waits before giving up entirely. Imagine a number of retries forcing a client to wait for 10 seconds.
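The interplay between retries and a total time budget can be sketched as follows. This is a generic client-side illustration under assumed names (`call_with_retries`, `RequestFailed`), not how Linkerd is configured; Linkerd itself sets retries and timeouts per route through Service Profiles.

```python
import time


class RequestFailed(Exception):
    """Raised when no attempt succeeded within the retry/time budget."""


def call_with_retries(func, max_retries=3, timeout_s=1.0):
    """Call func until it succeeds, retries run out, or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    last_error = None
    for _ in range(1 + max_retries):  # first attempt plus retries
        if time.monotonic() >= deadline:
            break  # the total time budget is exhausted, stop retrying
        try:
            return func()
        except Exception as exc:  # treat any exception as a failed attempt
            last_error = exc
    raise RequestFailed("no successful response within budget") from last_error


if __name__ == "__main__":
    attempts = []

    def flaky():
        # Fails twice, then succeeds (simulates a transient failure).
        attempts.append(1)
        if len(attempts) < 3:
            raise ConnectionError("transient failure")
        return "ok"

    print(call_with_retries(flaky), "after", len(attempts), "attempts")
```

Without the deadline check, slow failing attempts could keep the client waiting far longer than any single request timeout suggests.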

Automatic mTLS

By default, Linkerd automatically enables mutual Transport Layer Security (mTLS) for most TCP traffic between meshed pods, by establishing and authenticating secure, private TLS connections between Linkerd proxies. This means that Linkerd can add authenticated, encrypted communication to your application with very little work on your part.

Telemetry and Monitoring

One of Linkerd’s most powerful features is its extensive set of tooling around observability: the measuring and reporting of observed behavior in meshed applications.

To gain access to Linkerd’s observability features you only need to install the Viz extension:

linkerd viz install | kubectl apply -f -

Linkerd’s telemetry and monitoring features function automatically, without requiring any work on the part of the developer. These features include:

  • Recording of top-line (“golden”) metrics (request volume, success rate, and latency distributions) for HTTP, HTTP/2, and gRPC traffic.
  • Recording of TCP-level metrics (bytes in/out, etc) for other TCP traffic. (At ITRI, we record both TCP and UDP Tx/Rx bytes.)
  • Reporting metrics per service, per caller/callee pair, or per route/path (with Service Profiles).
  • Generating topology graphs that display the runtime relationship between services.
  • Live, on-demand request sampling.

This data can be consumed in several ways:

Golden metrics

Success Rate

This is the percentage of successful requests during a time window (1 minute by default).

In the output of the command linkerd viz routes -o wide, this metric is split into EFFECTIVE_SUCCESS and ACTUAL_SUCCESS. For routes configured with retries, the former calculates the percentage of success after retries (as perceived by the client-side), and the latter before retries (which can expose potential problems with the service).

Traffic (Requests Per Second)

This gives an overview of how much demand is placed on the service/route. As with success rates, linkerd viz routes -o wide splits this metric into EFFECTIVE_RPS and ACTUAL_RPS, corresponding to rates after and before retries respectively.

Latencies (At ITRI, we define latency as the client → server → client round-trip time. We also measure the service's response time.)

Times taken to service requests per service/route are split into 50th, 95th and 99th percentiles. Lower percentiles give you an overview of the average performance of the system, while tail percentiles help catch outlier behavior.
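To make the golden metrics concrete, here is a small sketch that computes success rates and latency percentiles from made-up samples. The function names and the sample numbers are assumptions for illustration, and the nearest-rank percentile used here is only one of several common definitions.

```python
import math


def success_rate(outcomes):
    """Fraction of successful entries in a list of booleans."""
    return sum(outcomes) / len(outcomes)


def percentile(latencies_ms, p):
    """Nearest-rank percentile: smallest sample with at least p% at or below it."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical window: 4 client requests, one of which failed once and
# then succeeded on retry, giving 5 attempts in total.
attempts = [True, True, False, True, True]   # every attempt (before retries)
requests = [True, True, True, True]          # as seen by the client (after retries)

# Hypothetical latency samples in milliseconds.
latencies = [12, 15, 11, 14, 13, 250, 16, 12, 13, 900]

if __name__ == "__main__":
    print("ACTUAL_SUCCESS:   ", success_rate(attempts))
    print("EFFECTIVE_SUCCESS:", success_rate(requests))
    for p in (50, 95, 99):
        print(f"p{p}: {percentile(latencies, p)} ms")
```

In this sample the median stays low while the tail percentiles are dominated by the two slow requests, which is the outlier behavior the tail percentiles are meant to expose.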

Load Balancing

For HTTP, HTTP/2, and gRPC connections, Linkerd automatically load balances requests across all destination endpoints without any configuration required. (For TCP connections, Linkerd will balance connections.)

Linkerd uses an algorithm called EWMA, or exponentially weighted moving average, to automatically send requests to the fastest endpoints. This load balancing can improve end-to-end latencies.
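A simplified sketch of the EWMA idea: keep an exponentially weighted moving average of observed latency per endpoint and prefer the endpoint with the lowest average. The class name, the smoothing factor `alpha`, and the endpoint names are assumptions for this example; Linkerd's real balancer is more elaborate than this.

```python
class EwmaBalancer:
    """Toy latency-aware balancer based on an exponentially weighted moving average."""

    def __init__(self, endpoints, alpha=0.3):
        self.alpha = alpha  # weight of the newest sample (0 < alpha <= 1)
        self.latency = {ep: None for ep in endpoints}

    def observe(self, endpoint, latency_ms):
        """Fold a new latency sample into the endpoint's moving average."""
        prev = self.latency[endpoint]
        if prev is None:
            self.latency[endpoint] = float(latency_ms)
        else:
            self.latency[endpoint] = (
                self.alpha * latency_ms + (1 - self.alpha) * prev)

    def pick(self):
        """Return an untried endpoint if any, else the one with the lowest EWMA."""
        for ep, lat in self.latency.items():
            if lat is None:
                return ep
        return min(self.latency, key=self.latency.get)


if __name__ == "__main__":
    lb = EwmaBalancer(["10.0.0.1", "10.0.0.2"], alpha=0.5)
    lb.observe("10.0.0.1", 100)
    lb.observe("10.0.0.2", 10)
    print(lb.pick())            # the faster endpoint so far
    lb.observe("10.0.0.2", 1000)
    print(lb.pick())            # the slow sample shifts traffic away
```

Because recent samples carry more weight, an endpoint that suddenly slows down loses traffic quickly, and regains it as its average recovers.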

Service discovery

For destinations that are not in Kubernetes, Linkerd will balance across endpoints provided by DNS.

For destinations that are in Kubernetes, Linkerd will look up the IP address in the Kubernetes API. If the IP address corresponds to a Service, Linkerd will load balance across the endpoints of that Service and apply any policy from that Service’s Service Profile. On the other hand, if the IP address corresponds to a Pod, Linkerd will not perform any load balancing or apply any Service Profiles.

Load balancing gRPC

Linkerd’s load balancing is particularly useful for gRPC (or HTTP/2) services in Kubernetes, for which Kubernetes’s default load balancing is not effective.

Automatic Proxy Injection

Linkerd automatically adds the data plane proxy to pods when the linkerd.io/inject: enabled annotation is present on a namespace or any workloads, such as deployments or pods. This is known as “proxy injection”. (Many more details are in “Adding Your Services to Linkerd”.)

Details

Proxy injection is implemented as a Kubernetes admission webhook. This means that the proxies are added to pods within the Kubernetes cluster itself, regardless of whether the pods are created by kubectl, a CI/CD system, or any other system.

For each pod, two containers are injected:

  1. linkerd-init, a Kubernetes Init Container that configures iptables to automatically forward all incoming and outgoing TCP traffic through the proxy. (Note that this container is not present if the Linkerd CNI Plugin has been enabled.)
  2. linkerd-proxy, the Linkerd data plane proxy itself.

Note that simply adding the annotation to a resource with pre-existing pods will not automatically inject those pods. You will need to update the pods (e.g. with kubectl rollout restart etc.) for them to be injected. This is because Kubernetes does not call the webhook until it needs to update the underlying resources.
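As a sketch of where the annotation goes on a workload, the snippet below adds `linkerd.io/inject: enabled` to the pod template of a Deployment-like manifest held as a Python dict. The helper name and the sample Deployment are hypothetical.

```python
import copy
import json

INJECT_ANNOTATION = "linkerd.io/inject"


def enable_injection(workload: dict) -> dict:
    """Return a copy of the manifest with injection enabled on its pod template."""
    patched = copy.deepcopy(workload)
    template_meta = patched["spec"]["template"].setdefault("metadata", {})
    template_meta.setdefault("annotations", {})[INJECT_ANNOTATION] = "enabled"
    return patched


# Hypothetical minimal Deployment used only for this example.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "helloworld"},
    "spec": {
        "selector": {"matchLabels": {"app": "helloworld"}},
        "template": {
            "metadata": {"labels": {"app": "helloworld"}},
            "spec": {"containers": [{"name": "app", "image": "ubuntu:18.04"}]},
        },
    },
}

if __name__ == "__main__":
    print(json.dumps(enable_injection(deployment), indent=2))
```

Applying the patched manifest changes the pod template, so Kubernetes rolls out new pods and the webhook gets a chance to inject them.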

CNI Plugin

Linkerd installs can be configured to run a CNI plugin that rewrites each pod’s iptables rules automatically. Rewriting iptables is required for routing network traffic through the pod’s linkerd-proxy container. When the CNI plugin is enabled, individual pods no longer need to include an init container that requires the NET_ADMIN capability to perform rewriting. This can be useful in clusters where that capability is restricted by cluster administrators.

Distributed Tracing

(Our approach at ITRI does not require code changes. We provide the downstream of related trajectories.)

Linkerd can be configured to emit trace spans from the proxies, allowing you to see exactly what time requests and responses spend inside.

Unlike most of the features of Linkerd, distributed tracing requires both code changes and configuration. (You can read up on Distributed tracing in the service mesh: four myths for why this is.)

Furthermore, Linkerd provides many of the features that are often associated with distributed tracing, without requiring configuration or application changes, including:

  • Live service topology and dependency graphs
  • Aggregated service health, latencies, and request volumes
  • Aggregated path / route health, latencies, and request volumes

For example, Linkerd can display a live topology of all incoming and outgoing dependencies for a service, without requiring distributed tracing or any other such application modification:

The Linkerd dashboard showing an automatically generated topology graph

https://linkerd.io/images/books/webapp-detail.png



Monday, September 20, 2021

Some Docker run arguments mapping to Kubernetes YAML


For instance: 

docker run -ti --rm -v /lib/modules:/lib/modules --net=host --pid=host --privileged ubuntu:18.04 bash

Mapping Table:
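One way to read the mapping for the command above is as a Pod manifest. The sketch below builds it as a Python dict and prints it as JSON (which kubectl also accepts). The pod, container, and volume names are hypothetical; the field names are standard Pod spec fields.

```python
import json


def build_pod():
    """Pod manifest roughly equivalent to the docker run command above."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "ubuntu-debug"},        # hypothetical name
        "spec": {
            "hostNetwork": True,                      # --net=host
            "hostPID": True,                          # --pid=host
            # --rm has no direct Pod equivalent; "Never" only keeps the
            # container from being restarted after bash exits.
            "restartPolicy": "Never",
            "containers": [{
                "name": "ubuntu",
                "image": "ubuntu:18.04",
                "command": ["bash"],
                "stdin": True,                        # -i
                "tty": True,                          # -t
                "securityContext": {"privileged": True},   # --privileged
                "volumeMounts": [{                    # -v (container side)
                    "name": "lib-modules",
                    "mountPath": "/lib/modules",
                }],
            }],
            "volumes": [{                             # -v (host side)
                "name": "lib-modules",
                "hostPath": {"path": "/lib/modules"},
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_pod(), indent=2))
```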

SR-IOV Setting and Network device plugin for Multus CNI

 The main content is from here:

https://github.com/intel/sriov-network-device-plugin


Enable IOMMU

#Enable IOMMU
sudo su
vi /etc/default/grub

GRUB_DEFAULT=0
GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=0
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="maybe-ubiquity intel_iommu=on iommu=pt"
GRUB_CMDLINE_LINUX=""

Update grub and verify it
sudo update-grub
cd /boot/grub
vi grub.cfg

Enable SR-IOV PF card interfaces with DHCP

sudo vi /etc/netplan/00-installer-config.yaml

network:
  ethernets:
    enp4s0:
      addresses:
      - 140.96.27.152/24
      gateway4: 140.96.27.254
      nameservers:
        addresses:
        - 140.96.254.99
        - 140.96.254.100
        search:
        - itri.ds
    enp5s0f0:
      dhcp4: yes
    enp5s0f1:
      dhcp4: yes

Install Golang for compiling SR-IOV's driver

The Golang version currently used in this environment is v1.14.
cd /tmp
wget https://dl.google.com/go/go1.14.1.linux-amd64.tar.gz
tar xvf go1.14.1.linux-amd64.tar.gz
sudo mv go /usr/local

We can add the following content to ~/.profile or ~/.bashrc:
#Golang path
export PATH=$PATH:/usr/local/go/bin
export GOPATH=$HOME/go
export PATH=$PATH:$GOPATH/bin

Then reload the shell session:
source ~/.profile

Creating SR-IOV Virtual Functions

enable_vf.sh
#!/bin/bash
# -> enable VF
#
#
echo ""
SRIOV_CARD_PF1=enp5s0f0
#SRIOV_CARD_PF2=enp5s0f1
echo "PF status:"
echo "---"
sudo ethtool ${SRIOV_CARD_PF1}
echo ""
#sudo ethtool ${SRIOV_CARD_PF2}
echo "---"
echo ""
echo "VF status:"
echo "---"
if [ "$1" = "0" ]; then
    # disable VF
    sudo sh -c "echo 0 > /sys/class/net/${SRIOV_CARD_PF1}/device/sriov_numvfs"
    #sudo sh -c "echo 0 > /sys/class/net/${SRIOV_CARD_PF2}/device/sriov_numvfs"
else
    # enable VF
    sudo sh -c "echo 6 > /sys/class/net/${SRIOV_CARD_PF1}/device/sriov_numvfs"
    #sudo sh -c "echo 4 > /sys/class/net/${SRIOV_CARD_PF2}/device/sriov_numvfs"
fi
sudo lshw -class network -businfo
echo ""
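The same sysfs write can be sketched in Python with a bounds check against sriov_totalvfs, the file in which the PF reports how many VFs it supports. The function name and the `sysfs_root` parameter (handy for trying it against a scratch directory) are assumptions of this sketch; on a real host it needs root privileges.

```python
from pathlib import Path


def set_num_vfs(pf, num_vfs, sysfs_root="/sys/class/net"):
    """Write num_vfs to the PF's sriov_numvfs after checking the supported maximum."""
    device = Path(sysfs_root) / pf / "device"
    # sriov_totalvfs holds the maximum number of VFs the PF supports.
    total = int((device / "sriov_totalvfs").read_text().strip())
    if not 0 <= num_vfs <= total:
        raise ValueError(f"{pf} supports 0..{total} VFs, got {num_vfs}")
    (device / "sriov_numvfs").write_text(f"{num_vfs}\n")
    return num_vfs
```

For example, `set_num_vfs("enp5s0f0", 6)` corresponds to the enable branch of the script above, and `set_num_vfs("enp5s0f0", 0)` to the disable branch.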

Build SR-IOV CNI

Compile SR-IOV-CNI (supported from release 2.0+):
$ git clone https://github.com/intel/sriov-cni.git
$ cd sriov-cni
$ make -j8
$ sudo cp build/sriov /opt/cni/bin

Build and run SR-IOV network device plugin

If you want to build the docker image locally then follow the following steps:
Clone the sriov-network-device-plugin
$ git clone https://github.com/intel/sriov-network-device-plugin.git
$ cd sriov-network-device-plugin

Build docker image binary using make
$ make image

Install one compatible CNI meta plugin

Install Multus
$ git clone https://github.com/intel/multus-cni.git && cd multus-cni
$ cat ./images/multus-daemonset.yml | kubectl apply -f -
$ kubectl get pods --all-namespaces | grep -i multus

Network Object CRDs
Multus uses Custom Resource Definitions (CRDs) for defining additional network attachments. These network attachment CRDs follow the standards defined by the K8s Network Plumbing Working Group (NPWG). Please refer to the Multus documentation for more information.

Deploy the Device Plugin

The images directory contains an example Dockerfile and sample specs, along with build scripts, to deploy the SR-IOV device plugin as a daemonset. Please see README.md for more information about the Docker images.
# Create ConfigMap
$ kubectl create -f deployments/configMap.yaml

# Create sriov-device-plugin-daemonset ( When rebooted, we need to delete and apply this yaml again )
$ kubectl create -f deployments/k8s-v1.16/sriovdp-daemonset.yaml

Create the SR-IOV Network CRD

$ cd sriov-network-device-plugin
$ kubectl create -f deployments/sriov-crd.yaml

On a successful run, the allocatable resource list for the node should be updated with the resources discovered by the plugin, as shown below. Note that the resource name is prefixed with the value of -resource-prefix, e.g. "intel.com/intel_sriov_netdevice".
$ kubectl get node node1 -o json | jq '.status.allocatable'
{
  "cpu": "8",
  "ephemeral-storage": "885084245370",
  "hugepages-1Gi": "0",
  "hugepages-2Mi": "0",
  "intel.com/intel_sriov_dpdk": "0",
  "intel.com/intel_sriov_netdevice": "6",
  "intel.com/mlnx_sriov_rdma": "0",
  "memory": "16203124Ki",
  "pods": "110"
}
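Once the resource shows up as allocatable, a pod can ask for a VF by requesting that resource and naming the network attachment in its annotations. The sketch below builds such a manifest as a Python dict; the pod name, container name, and image are hypothetical, while the resource name and the sriov-net1 attachment are the ones used in these notes.

```python
import json


def build_sriov_pod(name="testpod1", network="sriov-net1",
                    resource="intel.com/intel_sriov_netdevice"):
    """Pod manifest that requests one SR-IOV VF and attaches it via Multus."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            # Multus reads this annotation to attach the extra network.
            "annotations": {"k8s.v1.cni.cncf.io/networks": network},
        },
        "spec": {
            "containers": [{
                "name": "appcntr1",                      # hypothetical
                "image": "ubuntu:18.04",                 # hypothetical
                "command": ["sleep", "infinity"],
                "resources": {
                    # Extended resources need requests equal to limits.
                    "requests": {resource: "1"},
                    "limits": {resource: "1"},
                },
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_sriov_pod(), indent=2))
```

Each such pod consumes one unit of the allocatable count, so with 6 VFs at most 6 of these pods can be scheduled on the node.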

Check SRIOV Network in details

$ kubectl get network-attachment-definitions.k8s.cni.cncf.io --all-namespaces
NAMESPACE NAME AGE
default sriov-net1 14h

$ kubectl describe network-attachment-definitions.k8s.cni.cncf.io sriov-net1
Name:         sriov-net1
Namespace:    default
Labels:       <none>
Annotations:  k8s.v1.cni.cncf.io/resourceName: intel.com/intel_sriov_netdevice
API Version:  k8s.cni.cncf.io/v1
Kind:         NetworkAttachmentDefinition
Metadata:
  Creation Timestamp:  2020-09-09T09:58:36Z
  Generation:          1
  Managed Fields:
    API Version:  k8s.cni.cncf.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:k8s.v1.cni.cncf.io/resourceName:
      f:spec:
        .:
        f:config:
    Manager:         kubectl-create
    Operation:       Update
    Time:            2020-09-09T09:58:36Z
  Resource Version:  29832
  Self Link:         /apis/k8s.cni.cncf.io/v1/namespaces/default/network-attachment-definitions/sriov-net1
  UID:               b1dd915b-9d70-4a33-9aee-66afdf7551c2
Spec:
  Config:  { "type": "sriov", "cniVersion": "0.3.1", "name": "sriov-network", "ipam": { "type": "host-local", "subnet": "10.56.217.0/24", "routes": [{ "dst": "0.0.0.0/0" }], "gateway": "10.56.217.1" } }
Events:    <none>

# Troubleshooting
$ kubectl describe pods <pod name>
$ kubectl get pod <pod name> -o yaml

Python Script to get SR-IOV interface's IP address

#!/usr/bin/python
import json

from kubernetes import client, config


def main():
    # Load credentials from the local kubeconfig.
    config.load_kube_config()
    v1 = client.CoreV1Api()
    print("Listing pods with their SR-IOV/macvlan IPs:")
    ret = v1.list_namespaced_pod('default')
    for i in ret.items:
        # Pods without extra network attachments lack this annotation.
        annot = i.metadata.annotations or {}
        sriov_network_status = annot.get('k8s.v1.cni.cncf.io/networks-status')
        if sriov_network_status is None:
            continue
        d = json.loads(sriov_network_status)
        for item in d:
            if ('sriov' in item['name']) or ('macvlan' in item['name']):
                print(i.metadata.name, item.get('ips', []))


if __name__ == '__main__':
    main()

helloworld-python ['10.56.217.27']
helloworld-python-access-pod ['10.56.217.29']
testpod1 ['10.56.217.28']
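The annotation parsing can also be tried without a cluster. The sketch below applies the same filter to a hand-written sample of the networks-status annotation; the sample JSON (including the default network entry `cbr0` and its address) is made up for illustration and only mirrors the fields the script reads.

```python
import json


def secondary_ips(network_status_json):
    """Return the IPs of SR-IOV/macvlan attachments from a networks-status value."""
    ips = []
    for item in json.loads(network_status_json):
        name = item.get("name", "")
        if "sriov" in name or "macvlan" in name:
            ips.extend(item.get("ips", []))
    return ips


# Hypothetical annotation value: one default network plus one SR-IOV attachment.
SAMPLE_STATUS = json.dumps([
    {"name": "cbr0", "ips": ["10.244.0.12"], "default": True},
    {"name": "sriov-network", "interface": "net1", "ips": ["10.56.217.27"]},
])

if __name__ == "__main__":
    print(secondary_ips(SAMPLE_STATUS))
```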


Set Label "isext=true" on a Pod

$ kubectl label pods testpod1 isext=true

Python Script to get pod label of "isext"

#!/usr/bin/python
from kubernetes import client, config


def get_label_isext(my_metadata):
    # metadata.labels is None for pods without labels.
    labels = my_metadata.labels or {}
    return labels.get('isext') == 'true'


def main():
    config.load_kube_config()
    v1 = client.CoreV1Api()
    ret = v1.list_namespaced_pod('')
    for i in ret.items:
        is_ext = get_label_isext(i.metadata)
        print("pod:", i.metadata.name, "isExt:", is_ext)


if __name__ == '__main__':
    main()

pod: helloworld-python isExt: False
pod: helloworld-python-access-pod isExt: False
pod: nfd-master-tk2g9 isExt: False
pod: nfd-worker-wdlmw isExt: False
pod: testpod1 isExt: True

Installing DPDK and Pktgen on Ubuntu 18.04

Compile and Install DPDK
sudo apt -y install vim git wget curl python3 python3-pip
sudo apt install build-essential libnuma-dev libpcap-dev linux-headers-`uname -r`

wget --no-check-certificate https://fast.dpdk.org/rel/dpdk-19.11.tar.xz
git clone git://dpdk.org/apps/pktgen-dpdk --depth=1

#build DPDK
export RTE_SDK=/home/dpdk1/git/dpdk
export RTE_TARGET=build
cd dpdk
make config T=x86_64-native-linux-gcc
make -j `nproc` T=x86_64-native-linux-gcc
make install


Compile Pktgen
sudo apt install -y lua5.3 liblua5.3-dev
make -j `nproc` T=x86_64-native-linux-gcc

Configure huge pages in memory
echo 1024 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
mkdir /mnt/huge
mount -t hugetlbfs nodev /mnt/huge
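As a quick sanity check on how much memory the setting above reserves, the arithmetic is simply the number of pages times the page size. The helper name is an assumption of this sketch.

```python
def hugepage_bytes(nr_hugepages, page_size_kb=2048):
    """Total memory reserved by nr_hugepages pages of page_size_kb kB each."""
    return nr_hugepages * page_size_kb * 1024


if __name__ == "__main__":
    total = hugepage_bytes(1024)          # 1024 pages of 2048 kB
    print(total, "bytes =", total / 2**30, "GiB")
```

With 1024 pages of 2048 kB this comes to 2 GiB on NUMA node 0, which has to be available as free contiguous memory when the value is written.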


Thursday, September 16, 2021

BCC Programming Notes

Installing BCC from Source Code
For Bionic (18.04 LTS)

Install related libraries and packages
$ sudo apt-get -y install bison build-essential cmake flex git libedit-dev \
libllvm6.0 llvm-6.0-dev libclang-6.0-dev python zlib1g-dev libelf-dev \
iperf3 luajit libluajit-5.1-dev netperf linux-headers-$(uname -r)

Compile BCC source code and install
$ git clone --recursive https://github.com/iovisor/bcc.git
$ git submodule update --recursive
# Or $ git pull --recurse-submodules

$ mkdir bcc/build; cd bcc/build
$ cmake .. -DCMAKE_INSTALL_PREFIX=/usr -DPYTHON_CMD="python python3.8"
$ make -j `nproc`
$ sudo make install 


Quick Start Guide

A Docker container is provided for users to try out bcc.

From your host shell:

docker run -it --rm \
  --privileged \
  -v /lib/modules:/lib/modules:ro \
  -v /usr/src:/usr/src:ro \
  -v /etc/localtime:/etc/localtime:ro \
  --workdir /usr/share/bcc/tools \
  zlim/bcc

Notes on Frequently Used Linux Commands

Use rsync (under Cygwin) on Windows 10 to sync the Linux folder ~/SourceCode to D:\SourceCode:

rsync -r liudanny@<your IP address>:/home/liudanny/SourceCode /cygdrive/d/

List all users with their UID
awk -F: '{printf "%s:%s\n",$1,$3}' /etc/passwd

kubectl command completion in ~/.bashrc
# enable programmable completion features (you don't need to enable
# this, if it's already enabled in /etc/bash.bashrc and /etc/profile
# sources /etc/bash.bashrc).
if ! shopt -oq posix; then
  if [ -f /usr/share/bash-completion/bash_completion ]; then
    . /usr/share/bash-completion/bash_completion
  elif [ -f /etc/bash_completion ]; then
    . /etc/bash_completion
  fi
fi

source <(kubectl completion bash)

Netfilter Research Notes

A netfilter-like kernel module to get the source and destination addresses

USDT Python Tracing Example

References for USDT tracepoints in Python:

https://github.com/paulross/dtrace-py
https://www.collabora.com/news-and-blog/blog/2019/05/14/an-ebpf-overview-part-5-tracing-user-processes/
https://github.com/iovisor/bcc/pull/698

Install build tools and python prerequisites
sudo apt install systemtap-sdt-dev

sudo apt install build-essential libssl-dev zlib1g-dev libncurses5-dev libncursesw5-dev libreadline-dev libsqlite3-dev libgdbm-dev libdb5.3-dev libbz2-dev libexpat1-dev liblzma-dev tk-dev libffi-dev

Download and extract python
wget https://www.python.org/ftp/python/3.7.0/Python-3.7.0.tar.xz
tar xf Python-3.7.0.tar.xz
cd Python-3.7.0
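# Alternatively, download and extract the .tgz archive with curl instead of the .tar.xz steps above
curl -o Python-3.7.0.tgz https://www.python.org/ftp/python/3.7.0/Python-3.7.0.tgz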
tar -xzf Python-3.7.0.tgz
cd Python-3.7.0