Configuring Native Prometheus to Collect Monitoring Data

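Kube-OVN exposes a rich set of monitoring metrics for checking OVN/OVS health and the connectivity of the container network and the host network. Kube-OVN also ships ServiceMonitor resources, which Prometheus can use to discover these metrics dynamically.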

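In some environments only the Prometheus Server is installed, without any other components. In that case you can modify the Prometheus configuration so that it dynamically discovers monitoring data from the cluster.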

Prometheus Configuration

The configuration below is based on the Prometheus service discovery documentation.

Permission Configuration

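When Prometheus is deployed inside the cluster, it needs to access cluster resources through the Kubernetes apiserver in order to discover and query workload monitoring data.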

Use the following YAML to grant Prometheus the permissions it needs:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  - networking.k8s.io   # Ingress is served from this group on Kubernetes 1.22 and later
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default

Prometheus Configuration File

Prometheus reads its configuration from the file prometheus.yml at startup. You can store the content of this file in a ConfigMap and mount it into the Pod.

Use the following YAML to create the ConfigMap used by Prometheus:

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: default
data:
  prometheus.yml: |-
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
    - job_name: 'prometheus'
      static_configs:
      - targets: ['localhost:9090']

    - job_name: 'kubernetes-nodes'
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node

    - job_name: 'kubernetes-service'
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: service

    - job_name: 'kubernetes-endpoints'
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: endpoints

    - job_name: 'kubernetes-ingress'
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: ingress

    - job_name: 'kubernetes-pods'
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: pod

Prometheus discovers the Kubernetes resources to monitor by role. For the full set of options, see kubernetes_sd_config in the official documentation.

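In a Kubernetes cluster, the roles that Prometheus supports for discovering monitoring targets include node, service, pod, endpoints, and ingress. The ConfigMap above contains an example discovery configuration for each of these roles; keep only the ones you need.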
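
As a minimal sketch of choosing a single role, the job below uses only the endpoints role and additionally limits discovery to one namespace through the `namespaces` field of `kubernetes_sd_configs`. The job name and the choice of kube-system are assumptions made for illustration; they are not part of the original configuration.

```yaml
# Hypothetical job placed under scrape_configs:
# discover endpoints only in the kube-system namespace.
- job_name: 'kube-system-endpoints'   # illustrative name
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - kube-system                   # assumed namespace
```

Restricting the namespace at discovery time reduces the number of objects Prometheus has to watch, whereas relabel rules filter targets only after they have been discovered.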

Prometheus Deployment

Use the following YAML to deploy the Prometheus Server:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: prometheus
  name: prometheus
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      containers:
      - image: docker.io/prom/prometheus:latest
        imagePullPolicy: IfNotPresent
        name: prometheus
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: "/etc/prometheus"
          name: prometheus-config
      volumes:
      - name: prometheus-config
        configMap:
          name: prometheus-config

After Prometheus is deployed, use the following YAML to create the Prometheus Service:

kind: Service
apiVersion: v1
metadata:
  name: prometheus
  namespace: default
  labels:
    name: prometheus
spec:
  ports:
    - name: test
      protocol: TCP
      port: 9090
      targetPort: 9090
  type: NodePort
  selector:
    app: prometheus
  sessionAffinity: None

Once Prometheus is exposed through a NodePort, it can be accessed through the cluster nodes.
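
If a predictable port is preferred over a randomly allocated one, the Service can pin it with the `nodePort` field. The fragment below is a sketch only: the value 30090 is an assumption and must be free and within the cluster's NodePort range (30000-32767 by default).

```yaml
# Fragment of the Service spec with a fixed node port (assumed value).
spec:
  type: NodePort
  ports:
    - name: test
      protocol: TCP
      port: 9090
      targetPort: 9090
      nodePort: 30090   # assumption; pick any free port in the NodePort range
```

With this setting Prometheus would be reachable at port 30090 on a node's IP address.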

Verifying Prometheus Monitoring Data

Check the Prometheus-related resources in the environment:

# kubectl get svc 
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
kubernetes   ClusterIP   10.4.0.1       <none>        443/TCP          8d
prometheus   NodePort    10.4.102.222   <none>        9090:32611/TCP   8d
# kubectl get pod -o wide
NAME                          READY   STATUS    RESTARTS   AGE    IP          NODE              NOMINATED NODE   READINESS GATES
prometheus-7544b6b84d-v9m8s   1/1     Running   0          3d5h   10.3.0.7    192.168.137.219   <none>           <none>
# kubectl get endpoints -o wide
NAME         ENDPOINTS                                                        AGE
kubernetes   192.168.136.228:6443,192.168.136.232:6443,192.168.137.219:6443   8d
prometheus   10.3.0.7:9090                                                    8d

Access Prometheus through the NodePort and open Status/Service Discovery to view the dynamically discovered data. The page shows that all Services in the cluster are currently discovered.

Configuring Discovery for Specific Resources

The ConfigMap above has no filter conditions, so it discovers every resource. If you only need some of the resources for a given role, add filter conditions.

Taking Service as an example, modify the ConfigMap so that only the Services of interest are monitored:

    - job_name: 'kubernetes-service'
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: service
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: "keep"
        regex: "true"
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        target_label: kubernetes_service_name
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: "(.+)"

The default metrics path for a Service is /metrics. If a Service exposes its metrics on a different path, add the annotation prometheus.io/path to the Service to specify the scrape path.
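
A sketch of a Service that overrides the scrape path through this annotation is shown below. The Service name, selector, port, and the path /custom/metrics are hypothetical and chosen only to illustrate where the annotation goes.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-exporter                          # hypothetical Service
  namespace: default
  annotations:
    prometheus.io/scrape: "true"             # matched by the "keep" relabel rule
    prometheus.io/path: "/custom/metrics"    # assumed path; copied into __metrics_path__
spec:
  selector:
    app: my-exporter                         # hypothetical label
  ports:
  - name: metrics
    port: 8080
    targetPort: 8080
```

Prometheus exposes the annotation as the label `__meta_kubernetes_service_annotation_prometheus_io_path`, which the relabel rule with `target_label: __metrics_path__` copies into the scrape path.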

Apply the YAML above to update the ConfigMap, then recreate the Prometheus Pod so that the configuration takes effect.

List the Services in the kube-system Namespace:

# kubectl get svc -n kube-system
NAME                  TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                  AGE
kube-dns              ClusterIP   10.4.0.10      <none>        53/UDP,53/TCP,9153/TCP   13d
kube-ovn-cni          ClusterIP   10.4.228.60    <none>        10665/TCP                13d
kube-ovn-controller   ClusterIP   10.4.172.213   <none>        10660/TCP                13d
kube-ovn-monitor      ClusterIP   10.4.242.9     <none>        10661/TCP                13d
kube-ovn-pinger       ClusterIP   10.4.122.52    <none>        8080/TCP                 13d
ovn-nb                ClusterIP   10.4.80.213    <none>        6641/TCP                 13d
ovn-northd            ClusterIP   10.4.126.234   <none>        6643/TCP                 13d
ovn-sb                ClusterIP   10.4.216.249   <none>        6642/TCP                 13d

Add the annotation prometheus.io/scrape="true" to the Services:

# kubectl annotate svc -n kube-system kube-ovn-cni  prometheus.io/scrape=true
service/kube-ovn-cni annotated
# kubectl annotate svc -n kube-system kube-ovn-controller  prometheus.io/scrape=true
service/kube-ovn-controller annotated
# kubectl annotate svc -n kube-system kube-ovn-monitor  prometheus.io/scrape=true
service/kube-ovn-monitor annotated
# kubectl annotate svc -n kube-system kube-ovn-pinger  prometheus.io/scrape=true
service/kube-ovn-pinger annotated

Check the Service after the change:

# kubectl get svc -o yaml -n kube-system kube-ovn-controller
apiVersion: v1
kind: Service
metadata:
  annotations:
    helm.sh/chart-version: v3.10.0-alpha.55
    helm.sh/original-name: kube-ovn-controller
    ovn.kubernetes.io/vpc: ovn-cluster
    prometheus.io/scrape: "true"                        # the annotation that was added
  labels:
    app: kube-ovn-controller
  name: kube-ovn-controller
  namespace: kube-system
spec:
  clusterIP: 10.4.172.213
  clusterIPs:
  - 10.4.172.213
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: metrics
    port: 10660
    protocol: TCP
    targetPort: 10660
  selector:
    app: kube-ovn-controller
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

Open the Status/Targets page in Prometheus. Only the Services that carry the annotation pass the filter.

For more information about filtering with relabel rules, see Prometheus-Relabel.
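
As a further sketch of relabel-based filtering, the rule below keeps a target only when its Service is in a given namespace and also carries the scrape annotation. Prometheus joins the values of multiple `source_labels` with `;` by default and matches the fully anchored regex against the joined string. The namespace kube-system is an assumption for illustration.

```yaml
relabel_configs:
# Keep only Services in kube-system (assumed) that are annotated
# with prometheus.io/scrape="true"; everything else is dropped.
- source_labels:
  - __meta_kubernetes_namespace
  - __meta_kubernetes_service_annotation_prometheus_io_scrape
  action: keep
  regex: kube-system;true
```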
