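Kubernetes Monitoring Metrics Collection

Introduction

Collects Kubernetes cluster metrics and reports them to DataFlux. Kubernetes cluster metrics are collected mainly through two input plugins:

  • kubernetes: collects data from the kubelet in the cluster, that is, node-level performance metrics
  • kube_inventory: collects data from the api-server in the cluster, that is, cluster- and pod-level performance metrics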

Prerequisites

  • DataKit is installed (see the DataKit installation documentation).
  • The collectors may report the following errors:
    • /run/secrets/kubernetes.io/serviceaccount/token: no such file or directory. To fix it, run the following two commands:
      • mkdir -p /run/secrets/kubernetes.io/serviceaccount
      • touch /run/secrets/kubernetes.io/serviceaccount/token
    • error making HTTP request to http://<k8s-host>:10255/stats/summary: dial tcp <k8s-host>:10255: connect: connection refused. To fix it, adjust the k8s configuration as follows:
      • Edit /var/lib/kubelet/config.yaml on every node and add the readOnlyPort parameter: readOnlyPort: 10255
      • Restart the kubelet service: systemctl restart kubelet.service
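
The edit to /var/lib/kubelet/config.yaml can also be scripted when many nodes are involved. The following is a minimal sketch, not an official tool: the helper name ensure_read_only_port is hypothetical, and it assumes readOnlyPort is written as a plain top-level key in the file. It works only on the text passed in, so reading the file, writing it back, and restarting kubelet are left to the operator.

```python
import re


def ensure_read_only_port(config_text: str, port: int = 10255) -> str:
    """Return kubelet config text with a top-level readOnlyPort set to `port`.

    Hypothetical helper: assumes readOnlyPort appears (if at all) as a plain
    top-level "readOnlyPort: <value>" line.
    """
    line = f"readOnlyPort: {port}"
    pattern = re.compile(r"^readOnlyPort:.*$", re.MULTILINE)
    if pattern.search(config_text):
        # Overwrite an existing setting, for example "readOnlyPort: 0".
        return pattern.sub(line, config_text)
    # Otherwise append the setting as a new top-level key.
    if config_text and not config_text.endswith("\n"):
        config_text += "\n"
    return config_text + line + "\n"


# Example with an in-memory config instead of the real file.
sample = "apiVersion: kubelet.config.k8s.io/v1beta1\nkind: KubeletConfiguration\n"
print(ensure_read_only_port(sample), end="")
```

After writing the result back on each node, the kubelet service still has to be restarted as described above.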

kube_inventory collector configuration

Go to the conf.d/k8s directory under the DataKit installation directory, copy kube_inventory.conf.sample, and name the copy kube_inventory.conf. An example follows:

[[inputs.kube_inventory]]
  ## URL for the Kubernetes API
  url = "https://127.0.0.1"

  ## Namespace to use. Set to "" to use all namespaces.
  # namespace = "default"

  ## Use bearer token for authorization. ('bearer_token' takes priority)
  ## If both of these are empty, we'll use the default serviceaccount:
  ## at: /run/secrets/kubernetes.io/serviceaccount/token
  # bearer_token = "/path/to/bearer/token"
  ## OR
  # bearer_token_string = "abc_123"

  ## Set response_timeout (default 5 seconds)
  # response_timeout = "5s"

  ## Optional Resources to exclude from gathering
  ## Leave this blank to try to gather everything available.
  ## Values can be - "daemonsets", "deployments", "endpoints", "ingress", "nodes",
  ## "persistentvolumes", "persistentvolumeclaims", "pods", "services", "statefulsets"
  # resource_exclude = [ "deployments", "nodes", "statefulsets" ]

  ## Optional Resources to include when gathering
  ## Overrides resource_exclude if both set.
  # resource_include = [ "deployments", "nodes", "statefulsets" ]

  ## selectors to include and exclude as tags.  Globs accepted.
  ## Note that an empty array for both will include all selectors as tags
  ## selector_exclude overrides selector_include if both set.
  selector_include = []
  selector_exclude = ["*"]

  ## Optional TLS Config
  # tls_ca = "/path/to/cafile"
  # tls_cert = "/path/to/certfile"
  # tls_key = "/path/to/keyfile"
  ## Use TLS but skip chain & host verification
  # insecure_skip_verify = false

  ## Uncomment to remove deprecated metrics.
  # fielddrop = ["terminated_reason"]

After configuring, restart DataKit for the changes to take effect.
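
The comments on selector_include and selector_exclude describe a precedence rule that is easy to misread. The sketch below restates that rule in Python purely as an illustration. It is not the collector's actual implementation: the function name selector_is_tagged is hypothetical, and Python's fnmatch is assumed to be close enough to the collector's glob matching for simple patterns such as "*".

```python
from fnmatch import fnmatchcase


def selector_is_tagged(name, include, exclude):
    """Decide whether a selector key would be added as a tag.

    Illustrative restatement of the documented rule:
    - both lists empty: every selector is included;
    - selector_exclude wins over selector_include when both match.
    """
    if not include and not exclude:
        return True
    if any(fnmatchcase(name, pattern) for pattern in exclude):
        return False
    if include:
        return any(fnmatchcase(name, pattern) for pattern in include)
    return True


# With the sample values (selector_include = [], selector_exclude = ["*"])
# no selector is added as a tag.
print(selector_is_tagged("app", [], ["*"]))
# With both arrays empty, every selector is added as a tag.
print(selector_is_tagged("app", [], []))
```

Under this reading, the sample configuration above drops all selector tags. Emptying selector_exclude is what allows selectors through.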

Kubernetes Permissions

If RBAC authorization is in use, create a cluster role that can list persistentvolumes and nodes. Then create an aggregated ClusterRole, which will eventually be bound to a user or group.

---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: influx:cluster:viewer
  labels:
    rbac.authorization.k8s.io/aggregate-view-telegraf: "true"
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes", "nodes"]
    verbs: ["get", "list"]

---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: influx:telegraf
aggregationRule:
  clusterRoleSelectors:
    - matchLabels:
        rbac.authorization.k8s.io/aggregate-view-telegraf: "true"
    - matchLabels:
        rbac.authorization.k8s.io/aggregate-to-view: "true"
rules: [] # Rules are automatically filled in by the controller manager.

Bind the newly created aggregated ClusterRole using the following configuration file, updating the subjects as needed.

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: influx:telegraf:viewer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: influx:telegraf
subjects:
  - kind: ServiceAccount
    name: telegraf
    namespace: default

Measurement kubernetes_daemonset

Tag Description
daemonset_name
namespace
Metric Description Type
generation
current_number_scheduled
desired_number_scheduled
number_available
number_misscheduled
number_ready
number_unavailable
updated_number_scheduled

Measurement kubernetes_deployment

Tag Description
deployment_name
namespace
Metric Description Type
replicas_available
replicas_unavailable
created

Measurement kubernetes_endpoints

Tag Description
endpoint_name
namespace
hostname
node_name
port_name
port_protocol
kind
Metric Description Type
created
generation
ready
port

Measurement kubernetes_ingress

Tag Description
ingress_name
namespace
hostname
ip
backend_service_name
path
host
Metric Description Type
created
generation
backend_service_port
tls

Measurement kubernetes_node

Tag Description
node_name
Metric Description Type
capacity_cpu_cores
capacity_memory_bytes
capacity_pods
allocatable_cpu_cores
allocatable_memory_bytes
allocatable_pods

Measurement kubernetes_persistentvolume

Tag Description
pv_name
phase
storageclass
Metric Description Type
phase_type

Measurement kubernetes_persistentvolumeclaim

Tag Description
pvc_name
namespace
phase
storageclass
Metric Description Type
phase_type

Measurement kubernetes_pod_container

Tag Description
container_name
namespace
node_name
pod_name
Metric Description Type
restarts_total
state
terminated_reason
resource_requests_cpu_units
resource_requests_memory_bytes
resource_limits_cpu_units
resource_limits_memory_bytes

Measurement kubernetes_service

Tag Description
service_name
namespace
port_name
port_protocol
external_name
cluster_ip
Metric Description Type
created
generation
port
target_port

Measurement kubernetes_statefulset

Tag Description
statefulset_name
namespace
Metric Description Type
created
generation
replicas
replicas_current
replicas_ready
replicas_updated
spec_replicas

kubernetes collector configuration

Go to the conf.d/k8s directory under the DataKit installation directory, copy kubernetes.conf.sample, and name the copy kubernetes.conf. An example follows:

[[inputs.kubernetes]]
  ## URL for the kubelet
  url = "http://127.0.0.1:10255"

  ## Use bearer token for authorization. ('bearer_token' takes priority)
  ## If both of these are empty, we'll use the default serviceaccount:
  ## at: /run/secrets/kubernetes.io/serviceaccount/token
  # bearer_token = "/path/to/bearer/token"
  ## OR
  # bearer_token_string = "abc_123"

  ## Pod labels to be added as tags.  An empty array for both include and
  ## exclude will include all labels.
  # label_include = []
  # label_exclude = ["*"]

  ## Set response_timeout (default 5 seconds)
  # response_timeout = "5s"

  ## Optional TLS Config
  # tls_ca = "/path/to/cafile"
  # tls_cert = "/path/to/certfile"
  # tls_key = "/path/to/keyfile"
  ## Use TLS but skip chain & host verification
  # insecure_skip_verify = false

Measurement kubernetes_node

Tag Description
node_name
Metric Description Type
cpu_usage_nanocores
cpu_usage_core_nanoseconds
memory_available_bytes
memory_usage_bytes
memory_working_set_bytes
memory_rss_bytes
memory_page_faults
memory_major_page_faults
network_rx_bytes
network_rx_errors
network_tx_bytes
network_tx_errors
fs_available_bytes
fs_capacity_bytes
fs_used_bytes
runtime_image_fs_available_bytes
runtime_image_fs_capacity_bytes
runtime_image_fs_used_bytes
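
The node metrics listed above come from the kubelet's /stats/summary endpoint. The sketch below shows one plausible way the JSON document maps onto these field names. It is an illustration under assumptions, not the collector's code: the names FIELD_PATHS, fetch_summary, and node_fields are hypothetical, and the JSON keys (for example usageNanoCores and imageFs) are assumed to follow the kubelet Summary API. fetch_summary needs a reachable read-only port and is not called here.

```python
import json
import urllib.request

# Assumed mapping from metric name to the key path under the "node" object.
FIELD_PATHS = {
    "cpu_usage_nanocores": ("cpu", "usageNanoCores"),
    "cpu_usage_core_nanoseconds": ("cpu", "usageCoreNanoSeconds"),
    "memory_available_bytes": ("memory", "availableBytes"),
    "memory_usage_bytes": ("memory", "usageBytes"),
    "memory_working_set_bytes": ("memory", "workingSetBytes"),
    "memory_rss_bytes": ("memory", "rssBytes"),
    "memory_page_faults": ("memory", "pageFaults"),
    "memory_major_page_faults": ("memory", "majorPageFaults"),
    "network_rx_bytes": ("network", "rxBytes"),
    "network_rx_errors": ("network", "rxErrors"),
    "network_tx_bytes": ("network", "txBytes"),
    "network_tx_errors": ("network", "txErrors"),
    "fs_available_bytes": ("fs", "availableBytes"),
    "fs_capacity_bytes": ("fs", "capacityBytes"),
    "fs_used_bytes": ("fs", "usedBytes"),
    "runtime_image_fs_available_bytes": ("runtime", "imageFs", "availableBytes"),
    "runtime_image_fs_capacity_bytes": ("runtime", "imageFs", "capacityBytes"),
    "runtime_image_fs_used_bytes": ("runtime", "imageFs", "usedBytes"),
}


def fetch_summary(base_url: str = "http://127.0.0.1:10255") -> dict:
    """Fetch the summary document from the kubelet read-only port."""
    with urllib.request.urlopen(base_url + "/stats/summary", timeout=5) as resp:
        return json.load(resp)


def node_fields(summary: dict) -> dict:
    """Extract the node-level metrics that are present in a summary document."""
    node = summary.get("node", {})
    fields = {}
    for name, path in FIELD_PATHS.items():
        value = node
        for key in path:
            # Walk down the nested objects; stop at the first missing key.
            value = value.get(key) if isinstance(value, dict) else None
        if value is not None:
            fields[name] = value
    return fields


# Example with a hand-written, partial summary document.
sample = {
    "node": {
        "nodeName": "node-1",
        "cpu": {"usageNanoCores": 5, "usageCoreNanoSeconds": 7},
        "memory": {"workingSetBytes": 1024},
    }
}
print(node_fields(sample))
```

Missing sections are skipped rather than reported as zero, so a partial response produces a partial set of fields.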

Measurement kubernetes_pod_container

Tag Description
container_name
namespace
node_name
pod_name
Metric Description Type
cpu_usage_nanocores
cpu_usage_core_nanoseconds
memory_usage_bytes
memory_working_set_bytes
memory_rss_bytes
memory_page_faults
memory_major_page_faults
rootfs_available_bytes
rootfs_capacity_bytes
rootfs_used_bytes
logsfs_available_bytes
logsfs_capacity_bytes
logsfs_used_bytes

Measurement kubernetes_pod_volume

Tag Description
volume_name
namespace
node_name
pod_name
Metric Description Type
available_bytes
capacity_bytes
used_bytes

Measurement kubernetes_pod_network

Tag Description
namespace
node_name
pod_name
Metric Description Type
rx_bytes
rx_errors
tx_bytes
tx_errors

Collector sample data

kubernetes_pod_container,container_name=deis-controller,namespace=deis,node_name=ip-10-0-0-0.ec2.internal,pod_name=deis-controller-3058870187-xazsr cpu_usage_core_nanoseconds=2432835i,cpu_usage_nanocores=0i,logsfs_available_bytes=121128271872i,logsfs_capacity_bytes=153567944704i,logsfs_used_bytes=20787200i,memory_major_page_faults=0i,memory_page_faults=175i,memory_rss_bytes=0i,memory_usage_bytes=0i,memory_working_set_bytes=0i,rootfs_available_bytes=121128271872i,rootfs_capacity_bytes=153567944704i,rootfs_used_bytes=1110016i 1476477530000000000
kubernetes_pod_network,namespace=deis,node_name=ip-10-0-0-0.ec2.internal,pod_name=deis-controller-3058870187-xazsr rx_bytes=120671099i,rx_errors=0i,tx_bytes=102451983i,tx_errors=0i 1476477530000000000
kubernetes_pod_volume,volume_name=default-token-f7wts,namespace=default,node_name=ip-172-17-0-1.internal,pod_name=storage-7 available_bytes=8415240192i,capacity_bytes=8415252480i,used_bytes=12288i 1546910783000000000
kubernetes_configmap,configmap_name=envoy-config,namespace=default,resource_version=56593031 created=1544103867000000000i 1547597616000000000
kubernetes_daemonset,daemonset_name=telegraf,selector_select1=s1,namespace=logging number_unavailable=0i,desired_number_scheduled=11i,number_available=11i,number_misscheduled=8i,number_ready=11i,updated_number_scheduled=11i,created=1527758699000000000i,generation=16i,current_number_scheduled=11i 1547597616000000000
kubernetes_deployment,deployment_name=deployd,selector_select1=s1,namespace=default replicas_unavailable=0i,created=1544103082000000000i,replicas_available=1i 1547597616000000000
kubernetes_node,node_name=ip-172-17-0-2.internal allocatable_pods=110i,capacity_memory_bytes=128837533696,capacity_pods=110i,capacity_cpu_cores=16i,allocatable_cpu_cores=16i,allocatable_memory_bytes=128732676096 1547597616000000000
kubernetes_persistentvolume,phase=Released,pv_name=pvc-aaaaaaaa-bbbb-cccc-1111-222222222222,storageclass=ebs-1-retain phase_type=3i 1547597616000000000
kubernetes_persistentvolumeclaim,namespace=default,phase=Bound,pvc_name=data-etcd-0,selector_select1=s1,storageclass=ebs-1-retain phase_type=0i 1547597615000000000
kubernetes_pod,namespace=default,node_name=ip-172-17-0-2.internal,pod_name=tick1 last_transition_time=1547578322000000000i,ready="false" 1547597616000000000
kubernetes_service,cluster_ip=172.29.61.80,namespace=redis-cache-0001,port_name=redis,port_protocol=TCP,selector_app=myapp,selector_io.kompose.service=redis,selector_role=slave,service_name=redis-slave created=1588690034000000000i,generation=0i,port=6379i,target_port=0i 1547597616000000000
kubernetes_pod_container,container_name=telegraf,namespace=default,node_name=ip-172-17-0-2.internal,node_selector_node-role.kubernetes.io/compute=true,pod_name=tick1,state=running,readiness=ready resource_requests_cpu_units=0.1,resource_limits_memory_bytes=524288000,resource_limits_cpu_units=0.5,restarts_total=0i,state_code=0i,state_reason="",resource_requests_memory_bytes=524288000 1547597616000000000
kubernetes_statefulset,namespace=default,selector_select1=s1,statefulset_name=etcd replicas_updated=3i,spec_replicas=3i,observed_generation=1i,created=1544101669000000000i,generation=1i,replicas=3i,replicas_current=3i,replicas_ready=3i 1547597616000000000
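
Each sample line above follows the line protocol layout: a measurement name with comma-separated tags, then the fields, then a timestamp. The sketch below splits such a line into its parts, which can help when checking collector output by hand. It is deliberately simplified and the function names are hypothetical. It assumes no escaped spaces or commas and no boolean field values, which holds for the samples shown here but not for line protocol in general.

```python
def parse_value(raw: str):
    """Convert a raw field value: quoted string, integer ("...i"), or float."""
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        return raw[1:-1]
    if raw.endswith("i"):
        return int(raw[:-1])
    return float(raw)


def parse_line(line: str) -> dict:
    """Split one sample line into measurement, tags, fields, and timestamp."""
    head, field_part, timestamp = line.strip().split(" ")
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = {}
    for pair in field_part.split(","):
        key, raw = pair.split("=", 1)
        fields[key] = parse_value(raw)
    return {
        "measurement": measurement,
        "tags": tags,
        "fields": fields,
        "timestamp": int(timestamp),
    }


# Example using the kubernetes_pod_volume sample line from above.
point = parse_line(
    "kubernetes_pod_volume,volume_name=default-token-f7wts,namespace=default,"
    "node_name=ip-172-17-0-1.internal,pod_name=storage-7 "
    "available_bytes=8415240192i,capacity_bytes=8415252480i,used_bytes=12288i "
    "1546910783000000000"
)
print(point["measurement"], point["tags"]["pod_name"], point["fields"]["used_bytes"])
```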