Automatically Monitoring OpenStack VMs by Auto-Registering node-exporter with Consul + Prometheus
February 14, 2013
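1. The problem
At work, the VMs in our OpenStack cluster need their basic performance metrics monitored. Manually adding node_exporter every time a VM boots and then editing prometheus.yml is a nightmare for lazy programmers like us, so I set out to design a monitoring setup based on Prometheus + Consul.
2. The solution
1. Bake node_exporter into the image so that it is always deployed automatically.
2. Write a small program that registers node_exporter with Consul automatically, and bake it into the image alongside node_exporter.
3. Configure Prometheus to discover node_exporter.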
3. Deploying the Consul cluster
3.1 Cluster plan
OS | Hostname | IP |
---|---|---|
CentOS 7.7 | compute-7-1 | 172.16.100.71 |
CentOS 7.7 | compute-7-2 | 172.16.100.72 |
CentOS 7.7 | compute-7-3 | 172.16.100.73 |
3.1 Download and install Consul yourself
Consul v1.7.2
3.1.1 Generate the master token
$ curl \
    --request PUT \
    http://172.16.100.71:8500/v1/acl/bootstrap
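The bootstrap call returns a JSON document, and the token has to be copied out of it into the configuration files. As a minimal sketch, assuming the response is a JSON object that carries the token in a `SecretID` field (with the legacy `ID` field as a fallback), the extraction could look like this. The function name and the sample body are illustrative, not part of Consul.

```python
import json


def extract_bootstrap_token(response_body: str) -> str:
    """Return the token from an ACL bootstrap response body."""
    data = json.loads(response_body)
    # Assumption: the token is in "SecretID"; fall back to the legacy "ID" field.
    token = data.get("SecretID") or data.get("ID")
    if not token:
        raise ValueError("no token found in bootstrap response")
    return token


# Sample body shaped like a bootstrap response, reusing the token from this article.
sample = json.dumps({
    "ID": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c",
    "SecretID": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c",
})
print(extract_bootstrap_token(sample))
```

Bootstrapping only succeeds once per cluster, so the returned token should be stored somewhere safe right away.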
3.1.2 Configure the master token you obtained
compute-7-1:
{
  "bootstrap_expect": 1,
  "datacenter": "sibat_consul",
  "primary_datacenter": "sibat_consul",
  "data_dir": "/data/consul",
  "start_join": ["172.16.100.72", "172.16.100.73"],
  "retry_join": ["172.16.100.72", "172.16.100.73"],
  "connect": { "enabled": true },
  "server": true,
  "client_addr": "0.0.0.0",
  "ui": true,
  "node_name": "compute-7-1",
  "bind_addr": "172.16.100.71",
  "advertise_addr": "172.16.100.71",
  "enable_script_checks": false,
  "enable_local_script_checks": true,
  "log_file": "/var/log",
  "log_rotate_bytes": 300000000,
  "log_rotate_duration": "360h",
  "log_level": "info",
  "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
  "acl": {
    "enabled": true,
    "default_policy": "deny",
    "enable_token_persistence": true,
    "tokens": {
      "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c"
    }
  }
}
compute-7-2:
{
  "datacenter": "sibat_consul",
  "primary_datacenter": "sibat_consul",
  "data_dir": "/data/consul",
  "connect": { "enabled": true },
  "server": true,
  "client_addr": "0.0.0.0",
  "ui": true,
  "node_name": "compute-7-2",
  "bind_addr": "172.16.100.72",
  "advertise_addr": "172.16.100.72",
  "enable_script_checks": false,
  "enable_local_script_checks": true,
  "log_file": "/var/log",
  "log_rotate_bytes": 300000000,
  "log_rotate_duration": "360h",
  "log_level": "info",
  "acl_datacenter": "sibat_consul",
  "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
  "acl": {
    "enabled": true,
    "default_policy": "deny",
    "enable_token_persistence": true,
    "tokens": {
      "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c"
    }
  }
}
compute-7-3:
{
  "datacenter": "sibat_consul",
  "primary_datacenter": "sibat_consul",
  "data_dir": "/data/consul",
  "connect": { "enabled": true },
  "server": true,
  "client_addr": "0.0.0.0",
  "ui": true,
  "node_name": "compute-7-3",
  "bind_addr": "172.16.100.73",
  "advertise_addr": "172.16.100.73",
  "enable_script_checks": false,
  "enable_local_script_checks": true,
  "log_file": "/var/log",
  "log_rotate_bytes": 300000000,
  "log_rotate_duration": "360h",
  "log_level": "info",
  "acl_datacenter": "sibat_consul",
  "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
  "acl": {
    "enabled": true,
    "default_policy": "deny",
    "enable_token_persistence": true,
    "tokens": {
      "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c"
    }
  }
}
Start Consul on all three nodes.
3.1.3 Run on all three nodes
$ sudo useradd consul
$ sudo vim /usr/lib/systemd/system/consul.service
[Unit]
Description=Consul agent
Documentation=https://www.consul.io/docs

[Service]
User=consul
Group=consul
ExecStart=/usr/bin/consul agent -config-file /etc/consul.d/consul_config.json
KillMode=process
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
$ sudo systemctl daemon-reload
3.1.4 Run on compute-7-2 and compute-7-3
$ sudo systemctl restart consul && sudo systemctl enable consul
3.1.5 Run on compute-7-1
$ sudo systemctl restart consul && sudo systemctl enable consul
After startup, the server logs show permission-related errors. According to the official documentation, this happens because no agent token has been configured, so we also need to create an agent token:
$ curl \
    --request PUT \
    --header "X-Consul-Token: 8dc1eb67-1f5f-4e10-ad9d-5e58b047647c" \
    --data \
'{
  "Name": "Agent Token",
  "Type": "client",
  "Rules": "node \"\" { policy = \"write\" } service \"\" { policy = \"read\" }"
}' http://172.16.100.71:8500/v1/acl/create
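The hand-escaped quotes inside the `Rules` string are easy to get wrong. As a small sketch, the same request body can be produced by letting a JSON encoder do the escaping; the helper name `build_agent_token_payload` is made up for illustration, while the field names and rule text are the ones from the curl call above.

```python
import json

# The ACL rules from the curl call, written without manual escaping.
AGENT_RULES = 'node "" { policy = "write" } service "" { policy = "read" }'


def build_agent_token_payload(name="Agent Token", rules=AGENT_RULES):
    """Return the JSON request body for creating the agent token."""
    return json.dumps({"Name": name, "Type": "client", "Rules": rules})


payload = build_agent_token_payload()
print(payload)
```

The printed string can be passed to `curl --data` as-is, or sent by any HTTP client together with the `X-Consul-Token` header.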
3.1.6 Configure the agent token you obtained
compute-7-1:
{
  "bootstrap_expect": 1,
  "datacenter": "sibat_consul",
  "primary_datacenter": "sibat_consul",
  "data_dir": "/data/consul",
  "start_join": ["172.16.100.72", "172.16.100.73"],
  "retry_join": ["172.16.100.72", "172.16.100.73"],
  "connect": { "enabled": true },
  "server": true,
  "client_addr": "0.0.0.0",
  "ui": true,
  "node_name": "compute-7-1",
  "bind_addr": "172.16.100.71",
  "advertise_addr": "172.16.100.71",
  "enable_script_checks": false,
  "enable_local_script_checks": true,
  "log_file": "/var/log",
  "log_rotate_bytes": 300000000,
  "log_rotate_duration": "360h",
  "log_level": "info",
  "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
  "acl": {
    "enabled": true,
    "default_policy": "deny",
    "enable_token_persistence": true,
    "tokens": {
      "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c",
      "agent": "883efc94-0c59-c46f-67cf-4644ac4adad2"
    }
  }
}
compute-7-2:
{
  "datacenter": "sibat_consul",
  "primary_datacenter": "sibat_consul",
  "data_dir": "/data/consul",
  "connect": { "enabled": true },
  "server": true,
  "client_addr": "0.0.0.0",
  "ui": true,
  "node_name": "compute-7-2",
  "bind_addr": "172.16.100.72",
  "advertise_addr": "172.16.100.72",
  "enable_script_checks": false,
  "enable_local_script_checks": true,
  "log_file": "/var/log",
  "log_rotate_bytes": 300000000,
  "log_rotate_duration": "360h",
  "log_level": "info",
  "acl_datacenter": "sibat_consul",
  "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
  "acl": {
    "enabled": true,
    "default_policy": "deny",
    "enable_token_persistence": true,
    "tokens": {
      "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c",
      "agent": "883efc94-0c59-c46f-67cf-4644ac4adad2"
    }
  }
}
compute-7-3:
{
  "datacenter": "sibat_consul",
  "primary_datacenter": "sibat_consul",
  "data_dir": "/data/consul",
  "connect": { "enabled": true },
  "server": true,
  "client_addr": "0.0.0.0",
  "ui": true,
  "node_name": "compute-7-3",
  "bind_addr": "172.16.100.73",
  "advertise_addr": "172.16.100.73",
  "enable_script_checks": false,
  "enable_local_script_checks": true,
  "log_file": "/var/log",
  "log_rotate_bytes": 300000000,
  "log_rotate_duration": "360h",
  "log_level": "info",
  "acl_datacenter": "sibat_consul",
  "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
  "acl": {
    "enabled": true,
    "default_policy": "deny",
    "enable_token_persistence": true,
    "tokens": {
      "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c",
      "agent": "883efc94-0c59-c46f-67cf-4644ac4adad2"
    }
  }
}
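The three files differ only in the node name, the bind and advertise addresses, and the join settings on compute-7-1, so copying them by hand invites drift. A possible sketch that generates them from the cluster plan follows. It is trimmed to the node-specific fields plus the ACL block, and `CLUSTER` and `build_config` are illustrative names rather than anything Consul provides.

```python
import json

# Hostname -> IP, taken from the cluster plan table.
CLUSTER = {
    "compute-7-1": "172.16.100.71",
    "compute-7-2": "172.16.100.72",
    "compute-7-3": "172.16.100.73",
}


def build_config(node_name, encrypt_key, master_token, agent_token,
                 bootstrap_node="compute-7-1"):
    """Build a trimmed Consul server config for one node of the plan."""
    ip = CLUSTER[node_name]
    config = {
        "datacenter": "sibat_consul",
        "primary_datacenter": "sibat_consul",
        "data_dir": "/data/consul",
        "server": True,
        "node_name": node_name,
        "bind_addr": ip,
        "advertise_addr": ip,
        "encrypt": encrypt_key,
        "acl": {
            "enabled": True,
            "default_policy": "deny",
            "enable_token_persistence": True,
            "tokens": {"master": master_token, "agent": agent_token},
        },
    }
    if node_name == bootstrap_node:
        # Only the bootstrap node lists its peers, as in the configs above.
        peers = [addr for name, addr in CLUSTER.items() if name != node_name]
        config.update(bootstrap_expect=1, start_join=peers, retry_join=peers)
    return config


cfg = build_config(
    "compute-7-1",
    encrypt_key="gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
    master_token="8dc1eb67-1f5f-4e10-ad9d-5e58b047647c",
    agent_token="883efc94-0c59-c46f-67cf-4644ac4adad2",
)
print(json.dumps(cfg, indent=2))
```

Writing the output of `json.dumps` for each hostname to /etc/consul.d/consul_config.json on the matching node would keep the shared settings in one place.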
3.1.7 Run on compute-7-2 and compute-7-3
$ sudo systemctl restart consul && sudo systemctl enable consul
3.1.8 Run on compute-7-1
$ sudo systemctl restart consul && sudo systemctl enable consul
Once the cluster has stabilized, the UI can be reached at http://172.16.100.71:8500
4. Integrating Prometheus
$ sudo vim /etc/prometheus/prometheus.yml
...
  - job_name: 'OpenStack-vms'
    consul_sd_configs:
      - server: "172.16.100.71:8500"
        token: '8dc1eb67-1f5f-4e10-ad9d-5e58b047647c'
        services: []
      - server: "172.16.100.72:8500"
        token: '8dc1eb67-1f5f-4e10-ad9d-5e58b047647c'
        services: []
      - server: "172.16.100.73:8500"
        token: '8dc1eb67-1f5f-4e10-ad9d-5e58b047647c'
        services: []
    relabel_configs:
      - source_labels: [__meta_consul_tags]
        regex: ".*OpenStack-vms.*"
        replacement: OpenStack-vms
        action: keep
        target_label: env
      - regex: __meta_consul_service_metadata_(.+)
        action: labelmap
...
$ sudo systemctl restart prometheus
After the restart, the job_name configured above shows up in the Prometheus UI.
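To make the two relabel rules in the job above more concrete, here is a rough simulation of what they do to one discovered target. It is a simplified model, not Prometheus code: Python's `re.fullmatch` stands in for Prometheus's fully anchored regex matching, and the sample target (its tags and the `hostname` metadata key) is hypothetical. Prometheus joins a service's tags with commas and also puts a comma at each end.

```python
import re


def keep(labels, source_labels, regex, separator=";"):
    """Return True if the target survives a `keep` rule."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    return re.fullmatch(regex, value) is not None


def labelmap(labels, regex):
    """Copy each label whose name matches regex to the name captured in group 1."""
    mapped = dict(labels)
    for name, value in labels.items():
        match = re.fullmatch(regex, name)
        if match:
            mapped[match.group(1)] = value
    return mapped


# Hypothetical target as Consul service discovery might present it.
target = {
    "__meta_consul_tags": ",OpenStack-vms,node-exporter,",
    "__meta_consul_service_metadata_hostname": "vm-01",
}

if keep(target, ["__meta_consul_tags"], ".*OpenStack-vms.*"):
    print(labelmap(target, "__meta_consul_service_metadata_(.+)"))
```

In this model a service is only scraped if one of its tags contains `OpenStack-vms`, and every piece of service metadata becomes a plain label on the resulting time series.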
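5. Automatic VM registration
The problem: none of the stock components offers a particularly nice way to register automatically. At first I registered with curl from a shell script hooked into rc.local, but some VMs still ended up unregistered. I also found that Consul does not behave as a strongly consistent registry here: the same service ID sometimes ends up registered on different nodes at the same time.
So I wrote a small program in Go that registers node_exporter automatically, and used systemd to start it at boot so that registration happens without manual steps. It uses its own algorithm to avoid duplicate registrations and to spread registrations evenly.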
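The article does not describe the program's algorithm, so the following is only one possible way to get the same two properties, not the author's implementation. The sketch derives a stable service ID from the VM's address and maps that ID to a single Consul server by hashing. The same VM then always registers through the same server, and different VMs are spread roughly evenly. All names and the sample address 10.0.0.15 are illustrative; the payload fields follow Consul's agent service registration format.

```python
import hashlib
import json

SERVERS = ["172.16.100.71:8500", "172.16.100.72:8500", "172.16.100.73:8500"]


def pick_server(service_id, servers):
    """Map a service ID to one server deterministically (hash modulo)."""
    digest = hashlib.sha256(service_id.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def build_registration(address, port=9100, tag="node-exporter"):
    """Build a registration body for one node_exporter instance."""
    return {
        "ID": f"node-exporter-{address}-{port}",  # stable ID per VM
        "Name": "node-exporter",
        "Tags": [tag],
        "Address": address,
        "Port": port,
        "Check": {
            "TCP": f"{address}:{port}",
            "Interval": "5s",
            "Timeout": "5s",
        },
    }


registration = build_registration("10.0.0.15")
server = pick_server(registration["ID"], SERVERS)
print(server)
print(json.dumps(registration, indent=2))
```

The body would then be sent with a PUT request to `/v1/agent/service/register` on the chosen server, along with the `X-Consul-Token` header.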
$ wget https://github.com/FrankenFuncc/consul-registy-service/releases/download/202006161758/consulR.zip
$ unzip consulR.zip
$ wget https://github.com/prometheus/node_exporter/releases/download/v1.0.0/node_exporter-1.0.0.linux-amd64.tar.gz
$ tar -zxvf node_exporter-1.0.0.linux-amd64.tar.gz -C /usr/local/
$ mv /usr/local/node_exporter-1.0.0.linux-amd64 /usr/local/node_exporter
Install node_exporter and enable it at boot
$ vim
[Unit]
Description=node_exporter: the monitoring system
Documentation=http://prometheus.io/docs/

[Service]
ExecStart=/usr/local/node_exporter/node_exporter
Restart=always
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target
$ systemctl daemon-reload && systemctl start node_exporter && systemctl enable node_exporter
Install the Consul registration program and enable it at boot
$ vim /etc/consul/consul.yaml
System:
  ServiceName: consul-registy-service
  ListenAddress: 0.0.0.0
  Port: 9984
  # This IP and port are used to look up the IP address of the outbound network interface
  FindAddress: 8.8.8.8:80
Logs:
  LogFilePath: /data/consul/consul.log
  LogLevel: info
Consul:
  Address: 172.16.100.71:8500,172.16.100.72:8500,172.16.100.73:8500
  Token: 8dc1eb67-1f5f-4e10-ad9d-5e58b047647c
  CheckTimeout: 5s
  CheckInterval: 5s
  CheckDeregisterCriticalServiceAfter: true
  CheckDeregisterCriticalServiceAfterTime: 5s
Service:
  Tag: node-exporter
  # If Address is empty, the outbound interface IP is looked up via FindAddress by default
  Address:
  Port: 9100
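The `FindAddress` setting suggests a common trick for finding the IP of the outbound interface: connect a UDP socket to a remote address and read back the local address the kernel chose. The sketch below shows that trick in general terms. How the author's program actually implements it is not shown in the article, and `find_outbound_ip` is an illustrative name.

```python
import socket


def find_outbound_ip(find_address):
    """Return the local IP the kernel would use to reach find_address (host:port)."""
    host, port = find_address.rsplit(":", 1)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Connecting a UDP socket sends no packets; it only selects a route.
        sock.connect((host, int(port)))
        return sock.getsockname()[0]


# Loopback is used here so the example runs without network access;
# on a VM, "8.8.8.8:80" would yield the address of the outbound interface.
print(find_outbound_ip("127.0.0.1:80"))
```

Because no traffic is sent, the target only needs to be routable, not reachable, which makes this suitable for a boot-time service.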
$ vim /usr/lib/systemd/system/consul.service
[Unit]
Description=Consul
After=network-online.target

[Service]
User=nobody
ExecStart=/usr/local/consul --confpath=/etc/consul/consul.yaml
Restart=on-failure
RestartSec=1

[Install]
WantedBy=multi-user.target
$ systemctl daemon-reload && systemctl start consul && systemctl enable consul
Once an image has been created from this VM, every instance launched from that image is discovered by Prometheus automatically.