256 integrate health check and monitoring for the server e5 #305
base: main
Ngha-Boris commented Feb 3, 2025
Check this @Ngha-Boris
…6-integrate-health-check-and-monitoring-for-the-server-e5
docs/monitoring-and-health-checks.md
Outdated
```
#
### Metrics Endpoint
- **Route:** ```/metrics```
- **Description:** Exposes Prometheus-compatible metrics, which are scraped by Prometheus for monitoring.
- **Response:** Raw Prometheus metrics data (e.g., CPU, memory usage).
#
### Prometheus Metrics

Prometheus collects the following metrics:
- **CPU Usage:** Tracks CPU time spent in different modes (idle, system, user). Alert if usage exceeds 80% for 2 minutes.
- **Memory Usage:** Monitors available memory. Alert if usage exceeds 85% for 2 minutes.
- **Disk Usage:** Monitors disk space. Alert if disk space is below 20%.
- **API Response Time:** Measures API response times. Alert if the 95th percentile response time exceeds 1 second.
- **HTTP Errors:** Tracks failed HTTP requests. Alert if failure rate exceeds 5% over 5 minutes.
#
### Alerting

Prometheus triggers alerts based on defined conditions:

**Important Alerts**
- **InstanceDown:** Triggered if the server is unreachable for 30 seconds.
- **HighCPUUsage:** Triggered if CPU usage is over 80% for 2 minutes.
- **HighMemoryUsage:** Triggered if memory usage exceeds 85% for 2 minutes.
- **SlowAPIResponse:** Triggered if API response times are too slow (95th percentile > 1s).
- **DiskSpaceLow:** Triggered if disk space is below 20%.
Alerts are routed to Alertmanager, which handles notification via any application (e.g., Slack, Discord), as configured in ```alertmanager.yml```.
#
### Docker Setup
The monitoring stack is managed using Docker Compose. It includes:
- **Prometheus:** Collects metrics.
- **Grafana:** Visualizes metrics.
- **Alertmanager:** Sends notifications based on alerts.
- **Node Exporter:** Exposes system metrics (CPU, memory, disk usage).
### Build and run
```bash
docker compose up --build
```
Can you double-check the markdown here? When it is all red, it generally means the syntax is not correct. Maybe the lone hash signs on otherwise empty lines are not necessary...
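For context on what the documented `/metrics` route is expected to return, here is a minimal, std-only Rust sketch of the Prometheus text exposition format (a `# HELP` line, a `# TYPE` line, then `name value`). The `Gauge` struct, the `render` function, and the metric name `app_memory_used_ratio` are hypothetical illustrations, not code from this PR, which may well rely on a metrics crate instead.

```rust
use std::fmt::Write;

/// One gauge sample. The names used here are illustrative, not taken from the PR.
struct Gauge<'a> {
    name: &'a str,
    help: &'a str,
    value: f64,
}

/// Render gauges in the Prometheus text exposition format:
/// a `# HELP` line, a `# TYPE` line, then `name value`.
fn render(gauges: &[Gauge]) -> String {
    let mut out = String::new();
    for g in gauges {
        // Writing into a String cannot fail, so the results are ignored.
        let _ = writeln!(out, "# HELP {} {}", g.name, g.help);
        let _ = writeln!(out, "# TYPE {} gauge", g.name);
        let _ = writeln!(out, "{} {}", g.name, g.value);
    }
    out
}

fn main() {
    // Hypothetical metric; a real handler would read live values.
    let body = render(&[Gauge {
        name: "app_memory_used_ratio",
        help: "Fraction of memory in use.",
        value: 0.42,
    }]);
    print!("{}", body);
}
```

A handler for `/metrics` would return a string like this as the response body; Prometheus then scrapes it on its configured interval.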
// src/main.rs
use axum::routing::get;
What's the purpose of adding this comment?
That was a mistake; I removed it this morning and will push soon.
mod health;
mod metrics;
I'd prefer you simply import these modules rather than creating two hierarchies:
use didcomm_mediator::{app, health, metrics};
Okay Sir
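The "two hierarchies" point in this thread can be sketched in a single file. Declaring `mod health;` in `src/main.rs` compiles the module a second time as part of the binary, whereas `use` reuses the copy the library crate already owns. In the sketch below an inline module stands in for the `didcomm_mediator` library crate so the example is self-contained; the function names and bodies are hypothetical, and `app` is omitted for brevity.

```rust
// Stand-in for the library crate. In the real project this would be
// src/lib.rs declaring `pub mod health;` and `pub mod metrics;`.
mod didcomm_mediator {
    pub mod health {
        // Hypothetical body, for illustration only.
        pub fn status() -> &'static str {
            "OK"
        }
    }

    pub mod metrics {
        // Hypothetical body, for illustration only.
        pub fn render() -> String {
            String::from("# no metrics registered\n")
        }
    }
}

// The binary imports the modules the library already defines,
// instead of redeclaring them with `mod health;` and `mod metrics;`.
use didcomm_mediator::{health, metrics};

fn main() {
    println!("{}", health::status());
    print!("{}", metrics::render());
}
```

With a single hierarchy, types defined in `health` and `metrics` are the same types everywhere, which avoids confusing mismatches between the binary's copy and the library's copy.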
docs/monitoring-and-health-checks.md
Outdated
- **Response:**
```json
{
"Status": "OK",
}
I found that what GitHub trips on is the indentation level at this point, and possibly the trailing comma too.
- **Response:**
```json
{
"Status": "OK",
}

- **Response:**
```json
{
"Status": "OK"
}
I just fixed the issue, but I am still working on updating the document.
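The trailing comma discussed in this thread is invalid in strict JSON. As a rough illustration, a small std-only Rust check for that specific mistake could look like the following. The helper `has_trailing_comma` is hypothetical and is not a JSON validator: it only looks for a comma followed by a closing brace or bracket outside string literals.

```rust
/// Returns true if a `,` is followed (ignoring whitespace) by `}` or `]`
/// outside of any string literal, which strict JSON forbids.
fn has_trailing_comma(json: &str) -> bool {
    let mut in_string = false;
    let mut escaped = false;
    let mut pending_comma = false;
    for c in json.chars() {
        if in_string {
            // Skip string contents, honoring backslash escapes.
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        match c {
            '"' => {
                in_string = true;
                pending_comma = false;
            }
            ',' => pending_comma = true,
            '}' | ']' if pending_comma => return true,
            c if c.is_whitespace() => {}
            _ => pending_comma = false,
        }
    }
    false
}

fn main() {
    let before = r#"{ "Status": "OK", }"#;
    let after = r#"{ "Status": "OK" }"#;
    println!("{}", has_trailing_comma(before)); // prints "true"
    println!("{}", has_trailing_comma(after)); // prints "false"
}
```

Running sample responses from the docs through a check like this, or through a real JSON parser, catches the problem before the Markdown is rendered.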