A fully automated network monitoring system that detects network failures (packet loss, high latency, link failures), alerts you via Slack & Email, and performs auto-remediation using Python, Netmiko, and Shell scripts. It also provides a REST API and Prometheus-Grafana integration for real-time monitoring and dashboards.

shibam120302/Automated_Network_Monitoring_System

🚀 Network Monitoring System

📌 Features

  • 📡 Pings and monitors multiple network nodes automatically
  • Detects:
    • Packet loss
    • Latency issues
    • Link down or host unreachable
  • 🚨 Alerts via:
    • Slack API
    • Email (SMTP)
  • 🧠 Auto-remediation:
    • Restarting network services
    • Sending commands via Netmiko
  • 📊 Real-time metrics:
    • Prometheus metrics exporter
    • Grafana dashboards
  • 🌐 REST API for live status of devices
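
The three failure conditions listed above could be derived from the summary that the Linux `ping` command prints. The sketch below is only an illustration: the thresholds, the regular expressions, and the `classify` helper are assumptions and are not taken from the project's `monitor.py`.

```python
import re

# Hypothetical thresholds; the README does not state the values the project uses.
MAX_LOSS_PERCENT = 20.0
MAX_AVG_LATENCY_MS = 150.0

# Matches e.g. "0% packet loss" or "33.3% packet loss" in the ping summary.
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")
# Matches the average in "rtt min/avg/max/mdev = 10.1/12.3/15.0/1.2 ms".
RTT_RE = re.compile(r"= [\d.]+/([\d.]+)/")


def classify(ping_output: str) -> str:
    """Return 'unreachable', 'packet_loss', 'high_latency', or 'ok'."""
    loss = LOSS_RE.search(ping_output)
    rtt = RTT_RE.search(ping_output)
    # No summary, total loss, or no round-trip statistics: treat as unreachable.
    if loss is None or rtt is None or float(loss.group(1)) >= 100.0:
        return "unreachable"
    if float(loss.group(1)) > MAX_LOSS_PERCENT:
        return "packet_loss"
    if float(rtt.group(1)) > MAX_AVG_LATENCY_MS:
        return "high_latency"
    return "ok"
```

The function only inspects text, so it can be exercised with captured `ping` output before it is wired to a real subprocess call.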

🧾 Project File Structure

Auto-Network-Monitoring/
β”œβ”€β”€ monitor.py                # Main monitoring script
β”œβ”€β”€ api.py                    # Flask API to check device status
β”œβ”€β”€ slack_alerts.py           # Slack webhook alert handler
β”œβ”€β”€ email_alerts.py           # Email alerting script
β”œβ”€β”€ remediation.sh            # Shell script to restart network services
β”œβ”€β”€ requirements.txt          # Python dependencies
β”œβ”€β”€ prometheus.yml            # Prometheus config
β”œβ”€β”€ .gitignore                # Git ignored files
β”œβ”€β”€ README.md                 # Project documentation
└── exporters/
    └── metrics_exporter.py   # Prometheus metrics exporter

🖥️ System Workflow

[Figure: system workflow flowchart]

How Each File Works

monitor.py

This script is the core monitoring engine. It performs the following actions:

  • Loops over a predefined list of network devices every 60 seconds.
  • Pings each device to check its reachability.
  • If a device is detected as down:
    • Sends out alerts via configured channels (Slack, Email).
    • Logs the failure event for record-keeping.
    • Attempts to perform automated remediation actions using Netmiko (for network devices) or a fallback shell script.
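
The loop described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names, the injectable `is_up` and `on_down` callbacks, and the Linux-specific `ping` flags are all assumptions.

```python
import logging
import subprocess
import time
from typing import Callable, Iterable, List

log = logging.getLogger("monitor")


def ping(host: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo with the Linux `ping` CLI; True if a reply arrived."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def run_cycle(
    hosts: Iterable[str],
    is_up: Callable[[str], bool] = ping,
    on_down: Callable[[str], None] = lambda host: None,
) -> List[str]:
    """Check every host once; log and hand each unreachable host to on_down."""
    down = []
    for host in hosts:
        if not is_up(host):
            log.warning("%s is down", host)
            on_down(host)  # alerting and remediation would hook in here
            down.append(host)
    return down


def monitor_forever(hosts: List[str], interval_s: int = 60) -> None:
    """Repeat the check every interval_s seconds (60 in the description above)."""
    while True:
        run_cycle(hosts)
        time.sleep(interval_s)
```

Passing the reachability check and the failure handler in as callables keeps one cycle testable without sending real pings or alerts.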

api.py

This file sets up a Flask web server that exposes an API endpoint for checking the status of any monitored device.

Example request: to check the status of the device with IP 192.168.1.1, send a GET request to /status/192.168.1.1.

GET /status/192.168.1.1

Example response (JSON):

{
  "ip": "192.168.1.1",
  "status": "up"
}
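
A route with this shape could look roughly like the sketch below. The in-memory `DEVICE_STATUS` dictionary, the `status_payload` helper, the `create_app` factory, and the `"unknown"` fallback value are assumptions made for illustration; how `api.py` actually obtains device state is not described here.

```python
from typing import Dict

# Hypothetical in-memory state; the real api.py may ping on demand instead.
DEVICE_STATUS: Dict[str, str] = {"192.168.1.1": "up"}


def status_payload(ip: str) -> Dict[str, str]:
    """Build the response body; 'unknown' is an assumed value for unmonitored IPs."""
    return {"ip": ip, "status": DEVICE_STATUS.get(ip, "unknown")}


def create_app():
    """Create the Flask app; Flask is imported here so the helper above works without it."""
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/status/<ip>", methods=["GET"])
    def status(ip: str):
        return jsonify(status_payload(ip))

    return app
```

Calling `create_app().run(port=5000)` would start Flask's development server, after which `GET /status/192.168.1.1` returns the JSON body shown above.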

slack_alerts.py

This script is responsible for sending alert notifications to a specified Slack channel. It uses a Slack webhook URL to post messages.
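
Slack incoming webhooks accept a JSON body with a `text` field, so a handler of this kind can be written with the standard library alone. The sketch below is an assumption about how such a script might look; the placeholder URL and the function names are hypothetical.

```python
import json
import urllib.request

# Placeholder only; replace with the webhook URL issued by your Slack workspace.
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/REPLACE/WITH/YOURS"


def build_request(message: str, webhook_url: str = SLACK_WEBHOOK_URL) -> urllib.request.Request:
    """Prepare the POST request that carries the alert text as JSON."""
    body = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slack_alert(message: str, webhook_url: str = SLACK_WEBHOOK_URL, timeout_s: int = 10) -> bool:
    """Post the alert; True when Slack answers with HTTP 200."""
    with urllib.request.urlopen(build_request(message, webhook_url), timeout=timeout_s) as resp:
        return resp.status == 200
```

Keeping request construction separate from sending makes it possible to inspect the payload without contacting Slack.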

email_alerts.py

This script sends alert notifications by email through Gmail's SMTP server. You will need to customize this file with your Gmail address and an app password (rather than your main account password).
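
A minimal sketch of such a script, using only the standard library, is shown below. The function names are hypothetical, the message body reuses the wording of the Slack sample alert as an assumption, and port 587 with STARTTLS is one of the options Gmail supports rather than a confirmed detail of `email_alerts.py`.

```python
import smtplib
from email.message import EmailMessage

GMAIL_SMTP_HOST = "smtp.gmail.com"
GMAIL_SMTP_PORT = 587  # STARTTLS submission port


def build_alert_email(sender: str, recipient: str, device_ip: str) -> EmailMessage:
    """Compose the alert; the subject matches the sample alert in this README."""
    msg = EmailMessage()
    msg["Subject"] = "Network Alert"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"ALERT: {device_ip} is down or has high latency.")
    return msg


def send_alert_email(msg: EmailMessage, username: str, app_password: str) -> None:
    """Send the message through Gmail using an app password."""
    with smtplib.SMTP(GMAIL_SMTP_HOST, GMAIL_SMTP_PORT, timeout=10) as server:
        server.starttls()  # upgrade the connection to TLS before logging in
        server.login(username, app_password)
        server.send_message(msg)
```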

remediation.sh

This is a shell script designed as a fallback remediation method. It attempts to restart network services like NetworkManager using systemctl. This script is executed if Netmiko-based remediation is not applicable or fails.

exporters/metrics_exporter.py

This script exposes custom metrics in a format that Prometheus can scrape.

It runs a small HTTP server on port 8000. It tracks the reachability status of each device: 0 indicates the device is OK (up), and 1 indicates the device is down.
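
To show what Prometheus receives when it scrapes such an exporter, here is a standard-library sketch that renders the text exposition format by hand. It is a stand-in, not the project's implementation (which may well use the `prometheus_client` package); the metric name and `device_ip` label follow the PromQL example in this README, while `DEVICE_DOWN`, `render_metrics`, and `serve` are hypothetical names.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict

# Hypothetical state table: 0 = device up, 1 = device down.
DEVICE_DOWN: Dict[str, int] = {"192.168.1.1": 0, "10.0.0.5": 1}


def render_metrics(states: Dict[str, int]) -> str:
    """Render one gauge sample per device in the Prometheus text format."""
    lines = [
        "# HELP network_packet_loss Device reachability (0 = up, 1 = down).",
        "# TYPE network_packet_loss gauge",
    ]
    for ip, down in sorted(states.items()):
        lines.append(f'network_packet_loss{{device_ip="{ip}"}} {down}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        """Answer every GET with the current metrics."""
        body = render_metrics(DEVICE_DOWN).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Serve metrics until interrupted; port 8000 matches the description above."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Running `serve()` and opening http://localhost:8000 in a browser would show the same lines that Prometheus scrapes.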

⚙️ Setup & Deployment

🧱 Install dependencies: Open your terminal and run the following command to install all necessary Python libraries listed in the requirements.txt file.

pip install -r requirements.txt

🛠 Configure your devices in monitor.py: Edit the monitor.py file to define the list of devices you want to monitor. Each device is a dictionary with its host (IP address or hostname), username, password, and device_type (e.g., cisco_ios, juniper_junos).

devices = [
    {"host": "192.168.1.1", "username": "admin", "password": "admin", "device_type": "cisco_ios"},
    {"host": "10.0.0.5", "username": "user", "password": "password123", "device_type": "linux"},
    # Add more devices here
]
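
If you would rather not keep passwords in the source file, one possible variation is to read them from environment variables. This is a sketch only: `load_devices` and the `NETMON_*` variable names are hypothetical and are not part of the project.

```python
import os
from typing import Dict, List, Mapping


def load_devices(env: Mapping[str, str] = os.environ) -> List[Dict[str, str]]:
    """Build the device list, taking credentials from the environment."""
    return [
        {
            "host": "192.168.1.1",
            # Hypothetical variable names; the username falls back to "admin".
            "username": env.get("NETMON_CISCO_USER", "admin"),
            # A missing password raises KeyError instead of using a default.
            "password": env["NETMON_CISCO_PASSWORD"],
            "device_type": "cisco_ios",
        },
    ]
```

With `NETMON_CISCO_PASSWORD` exported in the shell that starts the monitor, `devices = load_devices()` yields dictionaries with the same keys as the list above.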

Set up your credentials:

email_alerts.py: Open this file and enter your Gmail address and an app password. (Using an app password is more secure than using your main Google account password.)

slack_alerts.py: Open this file and replace the placeholder Slack webhook URL with your actual webhook URL.

🔄 Run Monitoring Script: Start the main monitoring process by executing:

python monitor.py

🌐 Run API server (Optional): If you want to use the API to check device statuses, run the Flask server in a separate terminal:

python api.py

📈 Prometheus + Grafana Integration

This setup allows you to visualize the network monitoring data.

Prometheus Setup

Install Prometheus: Download and install Prometheus from the official website if you haven't already.

Configure Prometheus: Replace the contents of your prometheus.yml configuration file with the following. This tells Prometheus to scrape metrics from the metrics_exporter.py script.

scrape_configs:
  - job_name: 'network_monitor'
    static_configs:
      - targets: ['localhost:8000'] # Assumes metrics_exporter.py is running on the same machine

Start Prometheus: Navigate to your Prometheus directory in the terminal and start it with the new configuration:

./prometheus --config.file=prometheus.yml

📊 Grafana Setup

Connect Grafana to Prometheus

  1. Install Grafana if you haven't already.
  2. Open Grafana in your web browser: http://localhost:3000
  3. Add Prometheus as a data source, pointing it to your Prometheus server: http://localhost:9090

Create a New Dashboard

  1. In Grafana, create a new dashboard.

  2. Add a panel and use a PromQL query to display the status.
    Example to track packet loss (where 1 means down) for a specific device:

    network_packet_loss{device_ip="192.168.1.1"}
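
The same query can also be run outside Grafana through the Prometheus HTTP API, which is handy for checking that the metric is being scraped. The sketch below assumes Prometheus is listening on http://localhost:9090 as in the data source step; `build_query_url` and `device_is_down` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # assumed local Prometheus server


def build_query_url(promql: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build an instant-query URL for the /api/v1/query endpoint."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def device_is_down(device_ip: str) -> bool:
    """True when the latest sample for the device equals 1 (down)."""
    url = build_query_url(f'network_packet_loss{{device_ip="{device_ip}"}}')
    with urllib.request.urlopen(url, timeout=10) as resp:
        payload = json.load(resp)
    results = payload["data"]["result"]
    # Each instant-vector entry carries "value": [timestamp, "<sample as string>"].
    return bool(results) and float(results[0]["value"][1]) == 1.0
```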
    

Auto-Remediation Logic

The system attempts to automatically fix issues when a device goes down.

[Figure: auto-remediation flowchart]

Netmiko (Primary Method, monitor.py)

For devices that support SSH access and are compatible with Netmiko (e.g., Cisco IOS, Juniper Junos), monitor.py can send commands like reload or other custom-defined commands to attempt recovery.

Shell Script (Fallback Method)

If Netmiko is not suitable or fails, the remediation.sh script is executed:

bash remediation.sh
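
The primary-then-fallback order can be sketched as below. The function names and return values are assumptions made for illustration. `ConnectHandler`, `send_command`, and `disconnect` are real Netmiko calls, but the command to send is left to the caller because the right recovery command depends on the device.

```python
import subprocess
from typing import Callable, Dict


def remediate_with_netmiko(device: Dict[str, str], command: str) -> bool:
    """Run one command over SSH via Netmiko; False on any failure."""
    try:
        from netmiko import ConnectHandler  # third-party; a missing package counts as failure

        conn = ConnectHandler(**device)
        try:
            conn.send_command(command)
        finally:
            conn.disconnect()
        return True
    except Exception:
        return False


def remediate_with_shell(script: str = "remediation.sh") -> bool:
    """Run the fallback shell script; True when it exits with status 0."""
    return subprocess.run(["bash", script]).returncode == 0


def remediate(
    device: Dict[str, str],
    primary: Callable[[Dict[str, str]], bool],
    fallback: Callable[[], bool],
) -> str:
    """Try the primary method first, then the fallback; report which one worked."""
    if primary(device):
        return "netmiko"
    if fallback():
        return "shell"
    return "failed"
```

A caller would typically pass something like `primary=lambda d: remediate_with_netmiko(d, "<recovery command>")` and `fallback=remediate_with_shell`, where the command string is a placeholder to fill in per device type.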

🧪 Sample Alert Output

Slack Message

You will receive a message in your configured Slack channel similar to:

ALERT: 192.168.1.2 is down or has high latency.

Email Subject

Emails will have a subject line like:

Network Alert

The email body will contain more details about the downed device.

🌐 REST API

The api.py script provides a simple REST API to query device status.

| Route | Method | Description |
|---|---|---|
| /status/<ip> | GET | Returns the current status (up or down) of the device. |

🧠 Future Improvements

Potential enhancements for this project include:

  • SNMP Trap Integration: Receive and process SNMP traps from network devices for more proactive monitoring.
  • Web Dashboard: Develop a dedicated web dashboard (beyond Grafana) for a live view of device statuses and logs.
  • Store Logs to Database: Implement logging to a persistent database (e.g., PostgreSQL, MySQL, InfluxDB) for better querying and historical analysis.
  • Kubernetes & Docker Compose Deployment: Create configurations for easier deployment and scaling using containerization technologies.

🙌 Contributing

Contributions are welcome!

  • For minor changes, feel free to submit a pull request.
  • For major changes or new features, please open an issue first to discuss the proposed changes.

📄 License

This project is licensed under the MIT License. See the LICENSE file for more details.

👨‍💻 Author

Shibam Nath
GitHub: shibam120302
