Common etcd issues
Enhanced backup scheme

etcd data encryption
- https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      # The first provider in the list is used for writes; identity stores data unencrypted.
      - identity: {}
      - aesgcm:
          keys:
            - name: key1
              secret: c2VjcmV0IGlzIHNlY3VyZQ==
            - name: key2
              secret: dGhpcyBpcyBwYXNzd29yZA==
      - aescbc:
          keys:
            - name: key1
              secret: c2VjcmV0IGlzIHNlY3VyZQ==
      - secretbox:
          keys:
            - name: key1
              secret: YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXoxMjM0NTY=
      - kms:
          name: myKmsPlugin
          endpoint: unix:///tmp/socketfile.sock
          cachesize: 100
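Each `secret` value in the configuration above is a base64-encoded random key. A minimal sketch of producing one follows; it assumes a 32-byte key, which is the length the Kubernetes documentation uses when generating a key for aescbc and secretbox. The function name `generate_encryption_key` is hypothetical and only for illustration.

```python
import base64
import os


def generate_encryption_key(num_bytes: int = 32) -> str:
    """Return num_bytes of OS randomness, base64-encoded for a `secret:` field.

    The 32-byte default is an assumption; pass another length if the chosen
    provider calls for it.
    """
    return base64.b64encode(os.urandom(num_bytes)).decode("ascii")


if __name__ == "__main__":
    # Paste the printed value into the `secret:` field of a key entry.
    print(generate_encryption_key())
```

The key should be generated on a trusted machine and kept out of version control, since anyone holding it can decrypt the stored data.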
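Data separation in Kubernetes
- In large clusters, a high volume of events puts pressure on etcd
- Specify the etcd server clusters in the API server startup script; the example below routes events to a separate etcd instance
/usr/local/bin/kube-apiserver --etcd-servers=https://localhost:4001 --etcd-cafile=/etc/ssl/kubernetes/ca.crt --storage-backend=etcd3 --etcd-servers-overrides=/events#https://localhost:4002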
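The value of `--etcd-servers-overrides` is a comma-separated list of `group/resource#servers` entries, where servers are separated by semicolons. The sketch below parses such a value and picks the etcd endpoints for a resource, to illustrate how events end up on a different etcd than everything else. The helper names are hypothetical, and this is not the API server's own parsing code.

```python
from typing import Dict, List


def parse_etcd_servers_overrides(value: str) -> Dict[str, List[str]]:
    """Parse 'group/resource#server1;server2,group2/resource2#server3'."""
    overrides: Dict[str, List[str]] = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # The part before '#' names the resource, the part after lists servers.
        resource, sep, servers = entry.partition("#")
        if not sep or not servers:
            raise ValueError(f"invalid override entry: {entry!r}")
        overrides[resource] = [s for s in servers.split(";") if s]
    return overrides


def etcd_servers_for(resource: str, default_servers: List[str],
                     overrides: Dict[str, List[str]]) -> List[str]:
    """Return the override servers for a resource, else the default servers."""
    return overrides.get(resource, default_servers)


if __name__ == "__main__":
    overrides = parse_etcd_servers_overrides("/events#https://localhost:4002")
    print(etcd_servers_for("/events", ["https://localhost:4001"], overrides))
    print(etcd_servers_for("/pods", ["https://localhost:4001"], overrides))
```

With the flags from the command line above, events map to `https://localhost:4002` while every other resource stays on `https://localhost:4001`.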
Querying the APIServer
List all Pods in a namespace
GET /api/v1/namespaces/test/pods
200 OK
Content-Type: application/json
{
  "kind": "PodList",
  "apiVersion": "v1",
  "metadata": {"resourceVersion": "10245"},
  "items": [...]
}
Starting from resourceVersion 10245, watch for all object changes
GET /api/v1/namespaces/test/pods?watch=1&resourceVersion=10245
200 OK
Transfer-Encoding: chunked
Content-Type: application/json
{
  "type": "ADDED",
  "object": {"kind": "Pod", "apiVersion": "v1", "metadata": {"resourceVersion": "10596", ...}, ...}
}
{
  "type": "MODIFIED",
  "object": {"kind": "Pod", "apiVersion": "v1", "metadata": {"resourceVersion": "11020", ...}, ...}
}
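A client consuming the watch response typically reads one event at a time and remembers the latest resourceVersion so that it can resume the watch after a disconnect. The sketch below shows that bookkeeping on a canned stream. It assumes each event arrives as one JSON document per line, and the sample data and function names are hypothetical.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_watch_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event type, object) for each non-empty JSON line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        yield event["type"], event["object"]


def last_resource_version(lines: Iterable[str], start_version: str) -> str:
    """Return the resourceVersion to resume from after consuming the stream."""
    version = start_version
    for _event_type, obj in iter_watch_events(lines):
        version = obj["metadata"]["resourceVersion"]
    return version


# Canned events mirroring the watch response shown above.
SAMPLE_STREAM = [
    '{"type": "ADDED", "object": {"kind": "Pod", "apiVersion": "v1", '
    '"metadata": {"resourceVersion": "10596"}}}',
    '{"type": "MODIFIED", "object": {"kind": "Pod", "apiVersion": "v1", '
    '"metadata": {"resourceVersion": "11020"}}}',
]

if __name__ == "__main__":
    print(last_resource_version(SAMPLE_STREAM, "10245"))
```

The resourceVersion is kept as an opaque string and only passed back to the server, never compared numerically by the client.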
Paginated queries
GET /api/v1/pods?limit=500
---
200 OK
Content-Type: application/json
{
  "kind": "PodList",
  "apiVersion": "v1",
  "metadata": {
    "resourceVersion": "10245",
    "continue": "ENCODED_CONTINUE_TOKEN"
  },
  "items": [...] // returns pods 1-500
}
GET /api/v1/pods?limit=500&continue=ENCODED_CONTINUE_TOKEN
---
200 OK
Content-Type: application/json
{
  "kind": "PodList",
  "apiVersion": "v1",
  "metadata": {
    "resourceVersion": "10245",
    "continue": "ENCODED_CONTINUE_TOKEN_2"
  },
  "items": [...] // returns pods 501-1000
}
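The pagination exchange above amounts to a loop: request a page, collect its items, and repeat with the returned continue token until the server stops returning one. A minimal sketch of that loop follows. `make_fake_fetch` is a hypothetical in-memory stand-in for the HTTP call, and its token format (a plain offset) is an assumption made only so the example runs; real tokens are opaque.

```python
from typing import Callable, List, Optional


def list_all(fetch_page: Callable[[int, Optional[str]], dict],
             limit: int = 500) -> List[dict]:
    """Collect items across pages until no continue token is returned."""
    items: List[dict] = []
    token: Optional[str] = None
    while True:
        page = fetch_page(limit, token)
        items.extend(page.get("items", []))
        token = page.get("metadata", {}).get("continue")
        if not token:
            return items


def make_fake_fetch(total: int) -> Callable[[int, Optional[str]], dict]:
    """Hypothetical stand-in for GET /api/v1/pods?limit=...&continue=..."""
    def fetch(limit: int, token: Optional[str]) -> dict:
        start = int(token) if token else 0
        end = min(start + limit, total)
        metadata = {"resourceVersion": "10245"}
        if end < total:
            # More items remain, so hand back a token for the next page.
            metadata["continue"] = str(end)
        return {
            "kind": "PodList",
            "apiVersion": "v1",
            "metadata": metadata,
            "items": [{"name": f"pod-{i}"} for i in range(start + 1, end + 1)],
        }
    return fetch


if __name__ == "__main__":
    pods = list_all(make_fake_fetch(1200), limit=500)
    print(len(pods))
```

Every page in the fake reports the same resourceVersion, matching the responses above, where all pages of one paginated list share a consistent snapshot.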
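ResourceVersion
- resourceVersion of a single object
  - The resourceVersion at which the object was last modified
- resourceVersion of a List object
  - The resourceVersion at the time the list response was generated
- List behavior
  - When objects are listed without a resourceVersion, the request asks for the Most Recent data; it bypasses the APIServer cache and goes directly to etcd
  - When the APIServer queries objects filtered by Label, the filtering happens on the APIServer side, so the APIServer has to issue a full query to etcd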
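To make the List behavior concrete, the sketch below builds list request paths with and without a resourceVersion. It relies on the documented Kubernetes convention that `resourceVersion=0` lets the APIServer answer from its cache, whereas omitting the parameter asks for the most recent data. The function name `build_pod_list_path` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def build_pod_list_path(namespace: str,
                        label_selector: Optional[str] = None,
                        resource_version: Optional[str] = None) -> str:
    """Build a Pod list path; omit resource_version to request most recent data."""
    params = {}
    if label_selector:
        params["labelSelector"] = label_selector
    if resource_version is not None:
        params["resourceVersion"] = resource_version
    path = f"/api/v1/namespaces/{namespace}/pods"
    return f"{path}?{urlencode(params)}" if params else path


if __name__ == "__main__":
    # No resourceVersion: most recent data, read through to etcd.
    print(build_pod_list_path("test"))
    # resourceVersion=0: the APIServer may serve the list from its cache.
    print(build_pod_list_path("test", label_selector="app=web",
                              resource_version="0"))
```

Clients that can tolerate slightly stale results may prefer the second form, since it keeps large or label-filtered lists off etcd.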

Pitfalls encountered
Frequent leader election
etcd cluster split
etcd not responding
Blocked link between etcd and the apiserver
Disk usage surge
A minority of etcd members down

Network partition among master nodes
Case: a network partition occurs
Group#1: master-1, master-2
Group#2: master-3, master-4, master-5
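Which side of the partition keeps working follows from the Raft quorum rule: a group can elect a leader and accept writes only if it holds a majority of the cluster members. The sketch below applies that rule to the case above, assuming each master node hosts exactly one etcd member; the function names are hypothetical.

```python
from typing import Dict, List


def quorum(cluster_size: int) -> int:
    """Smallest number of members that forms a majority."""
    return cluster_size // 2 + 1


def partition_status(cluster_size: int,
                     groups: Dict[str, List[str]]) -> Dict[str, bool]:
    """Map each group name to whether it still holds a quorum."""
    needed = quorum(cluster_size)
    return {name: len(members) >= needed for name, members in groups.items()}


if __name__ == "__main__":
    groups = {
        "Group#1": ["master-1", "master-2"],
        "Group#2": ["master-3", "master-4", "master-5"],
    }
    print(partition_status(5, groups))
```

For a five-member cluster the quorum is three, so Group#2 can continue to serve writes while Group#1 cannot until the partition heals.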

Exercise 5.2
Create a highly available etcd cluster in a Kubernetes cluster
References
B-tree and B+ tree: https://segmentfault.com/a/1190000020416577
etcd workflow analysis: https://www.jianshu.com/p/2614fdb5d1c3