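Introduction
This collector gathers Kubernetes cluster metrics and reports them to DataFlux. Collection is handled by two input plugins:
kubernetes
: collects data from the kubelet on each node, i.e. node-level performance metrics
kube_inventory
: collects data from the cluster api-server, i.e. cluster-level and pod-level metrics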
Prerequisites
- DataKit is installed (see the DataKit installation documentation)
- The collector may report either of the following errors:
  - /run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
    If this error appears, run the following two commands:
    mkdir -p /run/secrets/kubernetes.io/serviceaccount
    touch /run/secrets/kubernetes.io/serviceaccount/token
  - error making HTTP request to http://<k8s-host>/stats/summary: dial tcp <k8s-host>:10255: connect: connection refused
    If this error appears, adjust the k8s configuration as follows:
    - Edit the /var/lib/kubelet/config.yaml file on every node and add the readOnlyPort parameter: readOnlyPort: 10255
    - Restart the kubelet service: systemctl restart kubelet.service
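The two workarounds above can also be scripted so that they are safe to run more than once. The following is a minimal sketch, not part of DataKit: the function names `ensure_token_file` and `ensure_read_only_port` are hypothetical, and the default paths are the ones quoted in the error message and steps above.

```shell
#!/bin/sh
# Sketch only: repeatable versions of the two workarounds described above.

# Create an empty service-account token file if it is missing.
# $1: directory that should hold the token file (optional).
ensure_token_file() {
    dir="${1:-/run/secrets/kubernetes.io/serviceaccount}"
    mkdir -p "$dir" || return 1
    [ -e "$dir/token" ] || touch "$dir/token"
}

# Append "readOnlyPort: 10255" to a kubelet config file unless the key exists.
# $1: path to the kubelet config file (optional).
ensure_read_only_port() {
    file="${1:-/var/lib/kubelet/config.yaml}"
    if [ ! -f "$file" ]; then
        echo "kubelet config not found: $file" >&2
        return 1
    fi
    if grep -q '^readOnlyPort:' "$file"; then
        echo "readOnlyPort is already present in $file; check its value"
        return 0
    fi
    # Start a fresh line first if the file does not end with a newline.
    if [ -n "$(tail -c 1 "$file")" ]; then
        echo >> "$file"
    fi
    echo 'readOnlyPort: 10255' >> "$file"
}
```

Calling `ensure_token_file` and `ensure_read_only_port` without arguments uses the default paths. The kubelet still has to be restarted afterwards, as described in the last step above.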
kube_inventory collector configuration
Go to the conf.d/k8s directory under the DataKit installation directory, copy kube_inventory.conf.sample, and name the copy kube_inventory.conf. For example:
[[inputs.kube_inventory]]
## URL for the Kubernetes API
url = "https://127.0.0.1"
## Namespace to use. Set to "" to use all namespaces.
# namespace = "default"
## Use bearer token for authorization. ('bearer_token' takes priority)
## If both of these are empty, we'll use the default serviceaccount:
## at: /run/secrets/kubernetes.io/serviceaccount/token
# bearer_token = "/path/to/bearer/token"
## OR
# bearer_token_string = "abc_123"
## Set response_timeout (default 5 seconds)
# response_timeout = "5s"
## Optional Resources to exclude from gathering
## Leave blank to try to gather everything available.
## Values can be - "daemonsets", "deployments", "endpoints", "ingress", "nodes",
## "persistentvolumes", "persistentvolumeclaims", "pods", "services", "statefulsets"
# resource_exclude = [ "deployments", "nodes", "statefulsets" ]
## Optional Resources to include when gathering
## Overrides resource_exclude if both set.
# resource_include = [ "deployments", "nodes", "statefulsets" ]
## selectors to include and exclude as tags. Globs accepted.
## Note that an empty array for both will include all selectors as tags
## selector_exclude overrides selector_include if both set.
selector_include = []
selector_exclude = ["*"]
## Optional TLS Config
# tls_ca = "/path/to/cafile"
# tls_cert = "/path/to/certfile"
# tls_key = "/path/to/keyfile"
## Use TLS but skip chain & host verification
# insecure_skip_verify = false
## Uncomment to remove deprecated metrics.
# fielddrop = ["terminated_reason"]
After the configuration is complete, restart DataKit for the changes to take effect.
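The copy step can be wrapped in a small helper that never overwrites a configuration that already exists. This is a sketch under stated assumptions: `enable_collector` is a hypothetical name, and the conf.d/k8s path must be passed in because the installation directory differs between hosts.

```shell
#!/bin/sh
# Sketch only: copy <name>.conf.sample to <name>.conf inside conf.d/k8s,
# leaving an existing <name>.conf untouched.
# $1: path to the conf.d/k8s directory
# $2: collector name, e.g. kube_inventory
enable_collector() {
    conf_dir="$1"
    name="$2"
    sample="$conf_dir/$name.conf.sample"
    target="$conf_dir/$name.conf"
    if [ ! -f "$sample" ]; then
        echo "sample not found: $sample" >&2
        return 1
    fi
    if [ -f "$target" ]; then
        echo "kept existing $target"
    else
        cp "$sample" "$target" && echo "created $target"
    fi
}
```

For example, `enable_collector <datakit-install-dir>/conf.d/k8s kube_inventory` creates kube_inventory.conf on the first run and leaves it alone on later runs. Editing the file and restarting DataKit remain manual steps.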
Kubernetes Permissions
If RBAC authorization is used, create a cluster role that can list persistentvolumes and nodes. Then create an aggregated ClusterRole, which will eventually be bound to a user or group.
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: influx:cluster:viewer
  labels:
    rbac.authorization.k8s.io/aggregate-view-telegraf: "true"
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes", "nodes"]
    verbs: ["get", "list"]
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: influx:telegraf
aggregationRule:
  clusterRoleSelectors:
    - matchLabels:
        rbac.authorization.k8s.io/aggregate-view-telegraf: "true"
    - matchLabels:
        rbac.authorization.k8s.io/aggregate-to-view: "true"
rules: [] # Rules are automatically filled in by the controller manager.
Bind the newly created aggregated ClusterRole using the following manifest, updating the subjects as needed.
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: influx:telegraf:viewer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: influx:telegraf
subjects:
  - kind: ServiceAccount
    name: telegraf
    namespace: default
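Once the manifests are saved to a file, they can be applied and the resulting permissions checked with `kubectl auth can-i`, impersonating the bound ServiceAccount. The sketch below assumes `kubectl` is configured for the target cluster. The function name `apply_and_check_rbac` and the manifest file name are hypothetical; the ServiceAccount name and namespace default to the values in the binding above.

```shell
#!/bin/sh
# Sketch only: apply the RBAC manifests, then ask the API server whether the
# bound ServiceAccount may list the two cluster-scoped resources named above.
# $1: file holding the ClusterRole and ClusterRoleBinding manifests
# $2: ServiceAccount name (optional, defaults to telegraf)
# $3: ServiceAccount namespace (optional, defaults to default)
apply_and_check_rbac() {
    manifest="$1"
    sa="${2:-telegraf}"
    ns="${3:-default}"
    status=0
    kubectl apply -f "$manifest" || return 1
    for resource in nodes persistentvolumes; do
        printf '%s: ' "$resource"
        # Prints "yes" or "no"; a "no" also yields a non-zero exit status.
        kubectl auth can-i list "$resource" \
            --as="system:serviceaccount:$ns:$sa" || status=1
    done
    return "$status"
}
```

Because the rules of an aggregated ClusterRole are filled in by the controller manager, a check run immediately after `kubectl apply` may still answer "no"; repeating it shortly afterwards is usually enough.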
Measurement kubernetes_daemonset
| Tag | Description |
| daemonset_name | |
| namespace | |

| Metric | Description | Type |
| generation | | |
| current_number_scheduled | | |
| desired_number_scheduled | | |
| number_available | | |
| number_misscheduled | | |
| number_ready | | |
| number_unavailable | | |
| updated_number_scheduled | | |
Measurement kubernetes_deployment
| Tag | Description |
| deployment_name | |
| namespace | |

| Metric | Description | Type |
| replicas_available | | |
| replicas_unavailable | | |
| created | | |
Measurement kubernetes_endpoints
| Tag | Description |
| endpoint_name | |
| namespace | |
| hostname | |
| node_name | |
| port_name | |
| port_protocol | |
| kind | |

| Metric | Description | Type |
| created | | |
| generation | | |
| ready | | |
| port | | |
Measurement kubernetes_ingress
| Tag | Description |
| ingress_name | |
| namespace | |
| hostname | |
| ip | |
| backend_service_name | |
| path | |
| host | |

| Metric | Description | Type |
| created | | |
| generation | | |
| backend_service_port | | |
| tls | | |
Measurement kubernetes_node
| Metric | Description | Type |
| capacity_cpu_cores | | |
| capacity_memory_bytes | | |
| capacity_pods | | |
| allocatable_cpu_cores | | |
| allocatable_memory_bytes | | |
| allocatable_pods | | |
Measurement kubernetes_persistentvolume
| Tag | Description |
| pv_name | |
| phase | |
| storageclass | |
Measurement kubernetes_persistentvolumeclaim
| Tag | Description |
| pvc_name | |
| namespace | |
| phase | |
| storageclass | |
Measurement kubernetes_pod_container
| Tag | Description |
| container_name | |
| namespace | |
| node_name | |
| pod_name | |

| Metric | Description | Type |
| restarts_total | | |
| state | | |
| terminated_reason | | |
| resource_requests_cpu_units | | |
| resource_requests_memory_bytes | | |
| resource_limits_cpu_units | | |
| resource_limits_memory_bytes | | |
Measurement kubernetes_service
| Tag | Description |
| service_name | |
| namespace | |
| port_name | |
| port_protocol | |
| external_name | |
| cluster_ip | |

| Metric | Description | Type |
| created | | |
| generation | | |
| port | | |
| target_port | | |
Measurement kubernetes_statefulset
| Tag | Description |
| statefulset_name | |
| namespace | |

| Metric | Description | Type |
| created | | |
| generation | | |
| replicas | | |
| replicas_current | | |
| replicas_ready | | |
| replicas_updated | | |
| spec_replicas | | |
kubernetes collector configuration
Go to the conf.d/k8s directory under the DataKit installation directory, copy kubernetes.conf.sample, and name the copy kubernetes.conf. For example:
[[inputs.kubernetes]]
## URL for the kubelet
url = "http://127.0.0.1:10255"
## Use bearer token for authorization. ('bearer_token' takes priority)
## If both of these are empty, we'll use the default serviceaccount:
## at: /run/secrets/kubernetes.io/serviceaccount/token
# bearer_token = "/path/to/bearer/token"
## OR
# bearer_token_string = "abc_123"
## Pod labels to be added as tags. An empty array for both include and
## exclude will include all labels.
# label_include = []
# label_exclude = ["*"]
## Set response_timeout (default 5 seconds)
# response_timeout = "5s"
## Optional TLS Config
# tls_ca = "/path/to/cafile"
# tls_cert = "/path/to/certfile"
# tls_key = "/path/to/keyfile"
## Use TLS but skip chain & host verification
# insecure_skip_verify = false
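Before enabling this collector, it can help to confirm that the address in the url setting actually answers on the /stats/summary path that appears in the connection error described under Prerequisites. The following is a rough sketch that assumes `curl` is installed and that the read-only port needs no authentication; `summary_url` and `check_kubelet` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: probe the kubelet summary endpoint used by the collector.

# Build the summary URL from a kubelet base URL (a trailing slash is dropped).
summary_url() {
    printf '%s/stats/summary\n' "${1%/}"
}

# Request the summary endpoint and report whether it answered successfully.
# $1: kubelet base URL (optional, defaults to the url value in the sample).
check_kubelet() {
    url=$(summary_url "${1:-http://127.0.0.1:10255}")
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
        echo "OK $url"
    else
        echo "FAILED $url"
        return 1
    fi
}
```

A `FAILED` result for http://127.0.0.1:10255 typically points back to the readOnlyPort step in the prerequisites.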
Measurement kubernetes_node
| Metric | Description | Type |
| cpu_usage_nanocores | | |
| cpu_usage_core_nanoseconds | | |
| memory_available_bytes | | |
| memory_usage_bytes | | |
| memory_working_set_bytes | | |
| memory_rss_bytes | | |
| memory_page_faults | | |
| memory_major_page_faults | | |
| network_rx_bytes | | |
| network_rx_errors | | |
| network_tx_bytes | | |
| network_tx_errors | | |
| fs_available_bytes | | |
| fs_capacity_bytes | | |
| fs_used_bytes | | |
| runtime_image_fs_available_bytes | | |
| runtime_image_fs_capacity_bytes | | |
| runtime_image_fs_used_bytes | | |
Measurement kubernetes_pod_container
| Tag | Description |
| container_name | |
| namespace | |
| node_name | |
| pod_name | |

| Metric | Description | Type |
| cpu_usage_nanocores | | |
| cpu_usage_core_nanoseconds | | |
| memory_usage_bytes | | |
| memory_working_set_bytes | | |
| memory_rss_bytes | | |
| memory_page_faults | | |
| memory_major_page_faults | | |
| rootfs_available_bytes | | |
| rootfs_capacity_bytes | | |
| rootfs_used_bytes | | |
| logsfs_available_bytes | | |
| logsfs_capacity_bytes | | |
| logsfs_used_bytes | | |
Measurement kubernetes_pod_volume
| Tag | Description |
| volume_name | |
| namespace | |
| node_name | |
| pod_name | |

| Metric | Description | Type |
| available_bytes | | |
| capacity_bytes | | |
| used_bytes | | |
Measurement kubernetes_pod_network
| Tag | Description |
| namespace | |
| node_name | |
| pod_name | |

| Metric | Description | Type |
| rx_bytes | | |
| rx_errors | | |
| tx_bytes | | |
| tx_errors | | |
Sample collector data
kubernetes_pod_container,container_name=deis-controller,namespace=deis,node_name=ip-10-0-0-0.ec2.internal,pod_name=deis-controller-3058870187-xazsr cpu_usage_core_nanoseconds=2432835i,cpu_usage_nanocores=0i,logsfs_available_bytes=121128271872i,logsfs_capacity_bytes=153567944704i,logsfs_used_bytes=20787200i,memory_major_page_faults=0i,memory_page_faults=175i,memory_rss_bytes=0i,memory_usage_bytes=0i,memory_working_set_bytes=0i,rootfs_available_bytes=121128271872i,rootfs_capacity_bytes=153567944704i,rootfs_used_bytes=1110016i 1476477530000000000
kubernetes_pod_network,namespace=deis,node_name=ip-10-0-0-0.ec2.internal,pod_name=deis-controller-3058870187-xazsr rx_bytes=120671099i,rx_errors=0i,tx_bytes=102451983i,tx_errors=0i 1476477530000000000
kubernetes_pod_volume,volume_name=default-token-f7wts,namespace=default,node_name=ip-172-17-0-1.internal,pod_name=storage-7 available_bytes=8415240192i,capacity_bytes=8415252480i,used_bytes=12288i 1546910783000000000
kubernetes_configmap,configmap_name=envoy-config,namespace=default,resource_version=56593031 created=1544103867000000000i 1547597616000000000
kubernetes_daemonset,daemonset_name=telegraf,selector_select1=s1,namespace=logging number_unavailable=0i,desired_number_scheduled=11i,number_available=11i,number_misscheduled=8i,number_ready=11i,updated_number_scheduled=11i,created=1527758699000000000i,generation=16i,current_number_scheduled=11i 1547597616000000000
kubernetes_deployment,deployment_name=deployd,selector_select1=s1,namespace=default replicas_unavailable=0i,created=1544103082000000000i,replicas_available=1i 1547597616000000000
kubernetes_node,node_name=ip-172-17-0-2.internal allocatable_pods=110i,capacity_memory_bytes=128837533696,capacity_pods=110i,capacity_cpu_cores=16i,allocatable_cpu_cores=16i,allocatable_memory_bytes=128732676096 1547597616000000000
kubernetes_persistentvolume,phase=Released,pv_name=pvc-aaaaaaaa-bbbb-cccc-1111-222222222222,storageclass=ebs-1-retain phase_type=3i 1547597616000000000
kubernetes_persistentvolumeclaim,namespace=default,phase=Bound,pvc_name=data-etcd-0,selector_select1=s1,storageclass=ebs-1-retain phase_type=0i 1547597615000000000
kubernetes_pod,namespace=default,node_name=ip-172-17-0-2.internal,pod_name=tick1 last_transition_time=1547578322000000000i,ready="false" 1547597616000000000
kubernetes_service,cluster_ip=172.29.61.80,namespace=redis-cache-0001,port_name=redis,port_protocol=TCP,selector_app=myapp,selector_io.kompose.service=redis,selector_role=slave,service_name=redis-slave created=1588690034000000000i,generation=0i,port=6379i,target_port=0i 1547597616000000000
kubernetes_pod_container,container_name=telegraf,namespace=default,node_name=ip-172-17-0-2.internal,node_selector_node-role.kubernetes.io/compute=true,pod_name=tick1,state=running,readiness=ready resource_requests_cpu_units=0.1,resource_limits_memory_bytes=524288000,resource_limits_cpu_units=0.5,restarts_total=0i,state_code=0i,state_reason="",resource_requests_memory_bytes=524288000 1547597616000000000
kubernetes_statefulset,namespace=default,selector_select1=s1,statefulset_name=etcd replicas_updated=3i,spec_replicas=3i,observed_generation=1i,created=1544101669000000000i,generation=1i,replicas=3i,replicas_current=3i,replicas_ready=3i 1547597616000000000
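Each sample line above is in line protocol: a measurement name, comma-separated tags, a space, comma-separated fields, another space, and a nanosecond timestamp. As a quick way to see which measurements a captured sample contains, the sketch below counts points per measurement. `count_measurements` is a hypothetical helper, and it assumes measurement names contain no escaped commas or spaces, which holds for the samples shown here.

```shell
#!/bin/sh
# Sketch only: count line-protocol points per measurement, reading stdin.
# The measurement name is taken as everything before the first comma or space.
count_measurements() {
    awk -F'[, ]' 'NF { count[$1]++ } END { for (m in count) print m, count[m] }' | sort
}
```

Saving the sample lines to a file and running `count_measurements < sample.txt` (the file name is only an example) prints one line per measurement with its number of points.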