- If you can’t measure it, you can’t manage it
- If you can’t measure it, you can’t prove it
- If you can’t measure it, you can’t improve it
A development approach driven by metrics to continuously improve software products and services:
- Build metrics into design, coding and DevOps from the beginning.
- Validate that goals are achieved, then optimize, enhance, and make decisions based on metrics.
- Execute PDCA (Plan, Do, Check, Act) according to metrics.
- Usage: Measures how frequently and in what ways users interact with the system, indicating feature engagement and activity levels.
- Saturation: Reflects how close the system is to its maximum capacity, such as CPU, memory, or network usage.
- Error: Tracks the occurrence of failures or issues within the system, providing insights into stability and reliability.
- Delay: Measures the time it takes for the system to respond to requests, reflecting its performance and responsiveness.
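As a minimal sketch of how the four measures above might be captured in application code, the hypothetical `UsedRecorder` below keeps everything in memory. The class name, the `capacity` parameter, and the choice to define saturation as in-flight calls divided by capacity are assumptions made for illustration; they do not come from this repository.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class UsedRecorder:
    """Hypothetical in-memory recorder for Usage, Saturation, Error, Delay."""

    def __init__(self, capacity):
        self.capacity = capacity          # assumed maximum of concurrent calls
        self.in_flight = 0                # calls currently being processed
        self.usage = defaultdict(int)     # number of calls per feature
        self.errors = defaultdict(int)    # number of failed calls per feature
        self.delays = defaultdict(list)   # duration in seconds of each call

    @contextmanager
    def track(self, feature):
        """Record one call to `feature`, including failures and duration."""
        self.usage[feature] += 1
        self.in_flight += 1
        started = time.perf_counter()
        try:
            yield
        except Exception:
            self.errors[feature] += 1
            raise
        finally:
            self.delays[feature].append(time.perf_counter() - started)
            self.in_flight -= 1

    def saturation(self):
        """Fraction of the assumed capacity that is currently in use."""
        return self.in_flight / self.capacity

    def error_rate(self, feature):
        """Failed calls divided by total calls for `feature`."""
        calls = self.usage[feature]
        return self.errors[feature] / calls if calls else 0.0


recorder = UsedRecorder(capacity=4)

with recorder.track("login"):
    pass  # a successful call

try:
    with recorder.track("login"):
        raise ValueError("simulated failure")
except ValueError:
    pass  # the failure was counted before being re-raised
```

In a real service the recorded values would be shipped to a metrics backend rather than kept in process memory.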
- Gauge: The instantaneous value of something.
- Counter: An incrementing and decrementing value.
- Meter: The average rate of events over a period of time.
- Histogram: The statistical distribution of values in a stream of data.
- Timer: A histogram of durations and a meter of calls.
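The five metric types above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation used by any particular metrics library: the class and method names are hypothetical, the meter reports only a mean rate since creation, and the histogram keeps every value in memory and uses a nearest-rank percentile.

```python
import math
import time


class Gauge:
    """Instantaneous value, read on demand from a callable."""

    def __init__(self, read):
        self._read = read

    def value(self):
        return self._read()


class Counter:
    """A value that can be incremented and decremented."""

    def __init__(self):
        self.count = 0

    def inc(self, n=1):
        self.count += n

    def dec(self, n=1):
        self.count -= n


class Meter:
    """Mean rate of events per second since the meter was created."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._start = clock()
        self.count = 0

    def mark(self, n=1):
        self.count += n

    def mean_rate(self):
        elapsed = self._clock() - self._start
        return self.count / elapsed if elapsed > 0 else 0.0


class Histogram:
    """Distribution of recorded values (all values kept, for simplicity)."""

    def __init__(self):
        self.values = []

    def update(self, value):
        self.values.append(value)

    def percentile(self, p):
        """Nearest-rank percentile for p in (0, 100]; None if empty."""
        if not self.values:
            return None
        ordered = sorted(self.values)
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]


class Timer:
    """A histogram of durations plus a meter of calls."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.histogram = Histogram()
        self.meter = Meter(clock)

    def time(self, func, *args, **kwargs):
        """Call func, recording its duration and marking one event."""
        started = self._clock()
        try:
            return func(*args, **kwargs)
        finally:
            self.histogram.update(self._clock() - started)
            self.meter.mark()
```

The injectable `clock` argument is a design choice that makes the rate and duration calculations testable without sleeping.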
- Collected Usage
- ELK Usage - Elasticsearch, Logstash/Filebeat, Kibana
- TIGK Usage - Telegraf, InfluxDB, Grafana, Kafka
- Hadoop Usage
- Prometheus Usage: Prometheus + Grafana
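- How to simulate poor network conditions: rate limiting, packet loss, latency, and jitter
- Analyzing network packet captures is more efficient with Python
- WebRTC metrics and statistics: what exactly went wrong
- Analyzing WebRTC internal metrics files
- A metrics-driven tuning example for a C++ program: reading Dickens's A Tale of Two Cities to find performance bottlenecks
- Metrics for cache usage in microservices
- Building and monitoring a Redis cluster - script
- How to tune JVM parameters
- Using Redis to record the Application Performance Index (APDEX) of microservices
- Monitoring and optimizing thread pools
- Running out of memory is not scary; being caught unprepared is the embarrassing part
- Microservice log analysis with ELKK
- System metrics monitoring with collectd + InfluxDB + Grafana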
"The Way of Microservices: Metrics-Driven Development" -- Walter Fan, Jian Fu
Account Service based on Flask in Python
Alertor that checks metrics and triggers alerts through the Elasticsearch API.
Potato service based on Spring Boot
- Consul snapshot:
- Web Page:
- Web API:
- Data Analysis scripts
- Performance testing scripts
Commonly used Dockerfiles
Installation and setup guidelines for ELKK, TIG, etc.
The DevOps scripts
Take Ubuntu 16 as an example:
apt install docker.io
apt install docker-compose
apt install python3
apt install python3-pip
pip3 install virtualenv
virtualenv -p python3 venv
source venv/bin/activate
pip install fabric3
apt install openjdk-8-jdk
apt install maven
Make sure the dependencies above are installed. You can then start and debug the services one by one, beginning with consul and influxdb:
cd potato
docker-compose start consul influxdb
The fabric file (potato/fabfile.py) contains the build and deployment steps, so you can run it first and then try each step yourself.
- python3
- fabric3
- jdk8
- maven3
cd potato
fab redeploy
- Check the running status
docker-compose ps
Name               Command                          State   Ports
consul             docker-entrypoint.sh agent ...   Up      8300/tcp, 8301/tcp, 8301/udp, 8302/tcp, 8302/udp, >8400/tcp, >8500/tcp, >8600/tcp, >8600/udp
influxdb           /entrypoint.sh influxd           Up      >8083/tcp, >8086/tcp
local-mysql        docker-entrypoint.sh --ini ...   Up      >3306/tcp, 33060/tcp
potato-scheduler   java -jar /opt/potato-sche ...   Up      >9002/tcp
potato-server      java -jar /opt/potato-app.jar    Up      >9003/tcp
potato-web         java -jar /opt/potato-web.jar    Up      >9005/tcp
potato-zipkin      /busybox/sh run.sh               Up      9410/tcp, >9411/tcp
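If you prefer to check the status table programmatically, the following Python sketch scans the text printed by `docker-compose ps` and reports which expected services are not in the `Up` state. The function name `services_not_up` is hypothetical, the service names are taken from the listing above, and the parsing assumes the service name is the first field on a line and that the word `Up` appears only in the State column.

```python
import re

# Service names as they appear in the listing above.
EXPECTED = (
    "consul",
    "influxdb",
    "local-mysql",
    "potato-scheduler",
    "potato-server",
    "potato-web",
    "potato-zipkin",
)


def services_not_up(ps_output, expected=EXPECTED):
    """Return the expected services that are missing or not 'Up'.

    ps_output is the text printed by `docker-compose ps`.
    """
    up = set()
    for line in ps_output.splitlines():
        fields = line.split()
        # The first field is the service name; 'Up' marks a running container.
        if fields and fields[0] in expected and re.search(r"\bUp\b", line):
            up.add(fields[0])
    return [name for name in expected if name not in up]


sample = (
    "potato-web java -jar /opt/potato-web.jar Up 9005/tcp\n"
    "consul docker-entrypoint.sh agent ... Exit 1\n"
)
print(services_not_up(sample, expected=("consul", "potato-web", "influxdb")))
```

To run it against a live deployment, pass in the captured output of `docker-compose ps`, for example via `subprocess.run(["docker-compose", "ps"], capture_output=True, text=True).stdout`.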
- Open the web portal of potato application
open http://localhost:9005