Python-based tools to parse ssh access logs, do robust GeoIP lookups, and visualize the results using streamlit

nicholasRenninger/ssh-analysis

Simple SSH Analysis

I wanted a way to quickly analyze my SSH access logs, as I saw quite a bit of traffic. Inspired by this (with this underlying GH repo) and this, I made some more tools to use in a notebook and turned them into a streamlit app.

About

Logs are read into a Protobuf object (SSHLogs) made up of individual SSHLog entries. Each entry's IP address is looked up in a GeoIP DB, and the returned metadata is stored on that entry as IPLookupData. The SSHLogs object is then transformed into a dataframe for further analysis.
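To make the parsing step concrete, here is a minimal sketch that uses a plain dataclass in place of the Protobuf messages. The `LogEntry` class, the regular expression, and the sample lines are illustrative assumptions, not the project's actual schema or data; only failed-password lines in the usual OpenSSH wording are handled.

```python
import re
from dataclasses import asdict, dataclass
from typing import Iterable, Iterator, Optional

# Matches OpenSSH failure lines such as:
#   "Failed password for invalid user admin from 203.0.113.7 port 52144 ssh2"
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) "
    r"from (?P<ip>\S+) port (?P<port>\d+)"
)


@dataclass
class LogEntry:
    """Hypothetical stand-in for one SSHLog record."""
    user: str
    ip: str
    port: int


def parse_line(line: str) -> Optional[LogEntry]:
    """Return a LogEntry for a failed-password line, or None for anything else."""
    match = FAILED_LOGIN.search(line)
    if match is None:
        return None
    return LogEntry(match["user"], match["ip"], int(match["port"]))


def parse_lines(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Yield one LogEntry per matching line, skipping unrelated log lines."""
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            yield entry


# Made-up sample lines using documentation-only IP ranges.
sample = [
    "Mar  3 10:15:01 host sshd[1234]: Failed password for invalid user admin from 203.0.113.7 port 52144 ssh2",
    "Mar  3 10:15:09 host sshd[1234]: Failed password for root from 198.51.100.23 port 40022 ssh2",
    "Mar  3 10:16:00 host CRON[99]: pam_unix(cron:session): session opened for user root",
]
rows = [asdict(entry) for entry in parse_lines(sample)]
```

A list of dicts like `rows` can be handed directly to `pandas.DataFrame(rows)` to get the dataframe form used for analysis.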

Metadata associated with an IP address can be queried from online geo-coding services or from geolite2 DBs. This project uses an older, self-contained geolite2 package. You could easily modify this example to use a more accurate geoIP DB by modifying this function that maps an IP address to an IPLookupData object.
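One way to keep the GeoIP source swappable is to treat the lookup as a function that receives a backend. The sketch below is an assumption about how that could look, not the project's code: `GeoInfo` is a hypothetical stand-in for IPLookupData (the real fields may differ), and `stub_backend` returns made-up values purely for illustration.

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GeoInfo:
    """Hypothetical stand-in for IPLookupData; the real fields may differ."""
    ip: str
    country: Optional[str] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None


# A backend is any callable that maps an IP string to a GeoInfo (or None).
Backend = Callable[[str], Optional[GeoInfo]]


def lookup_ip(ip: str, backend: Backend) -> Optional[GeoInfo]:
    """Validate the address, skip non-routable ones, then ask the backend."""
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        return None  # not an IP address at all
    if not address.is_global:
        return None  # private, loopback, and reserved ranges have no geo data
    return backend(ip)


def stub_backend(ip: str) -> Optional[GeoInfo]:
    """Made-up lookup table standing in for a real GeoIP database."""
    table = {"8.8.8.8": GeoInfo("8.8.8.8", "US", 37.4, -122.1)}
    return table.get(ip)
```

Switching to a different GeoIP database then only means writing another function with the same signature as `stub_backend` and passing it to `lookup_ip`.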

Since most attacks came from the US, you may want to better understand where in the US they originate. Getting FIPS codes from lat/lon requires the FCC Census API, which can take a very long time to process thousands of lookup requests. Thus, you may want to cache the results; this has already been done for the included datasets (data/*df_us.csv).
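If you need to run the FIPS lookups on your own data, a small disk-backed cache avoids repeating slow requests. The sketch below is one possible approach, not how the included CSVs were produced: `FipsCache` and the `Fetcher` signature are hypothetical names, and the function that actually calls the FCC API is left for you to supply.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Optional

# A fetcher takes (lat, lon) and returns a FIPS code string, or None on failure.
# In real use it would wrap the FCC Census API call (not shown here).
Fetcher = Callable[[float, float], Optional[str]]


class FipsCache:
    """Disk-backed cache so each coordinate is only looked up once."""

    def __init__(self, path: Path, fetch: Fetcher, precision: int = 4) -> None:
        self.path = path
        self.fetch = fetch
        self.precision = precision
        self.data: Dict[str, str] = {}
        if path.exists():
            self.data = json.loads(path.read_text())

    def _key(self, lat: float, lon: float) -> str:
        # Round so that near-identical coordinates share one cache entry.
        return f"{round(lat, self.precision)},{round(lon, self.precision)}"

    def get(self, lat: float, lon: float) -> Optional[str]:
        key = self._key(lat, lon)
        if key in self.data:
            return self.data[key]
        fips = self.fetch(lat, lon)
        if fips is not None:  # do not cache failures, so they can be retried
            self.data[key] = fips
            self.path.write_text(json.dumps(self.data))
        return fips
```

Because many log entries share the same IP address and therefore the same coordinates, even this simple cache can cut the number of API calls substantially. Rewriting the JSON file on every new entry is simple but not fast; for very large runs you may prefer to save periodically instead.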

Install

I use poetry for dependency management and installation. You can install it easily by following these instructions.

  • If you plan to use VS Code to run the notebooks, you will need to set the venv location to the project directory so that VS Code can find the venv.
  • If you plan to use Jupyter to run the notebooks, follow this guide if you want to be able to run the pyDeck visualizations.

Once you have poetry installed and on your path, install via:

git clone  ~/git/ssh_analysis
cd ~/git/ssh_analysis
# ONLY IF USING VS Code - changing where venvs are stored for VSCode
poetry config virtualenvs.in-project true
poetry install

If you wish to re-compile the python protobuf definitions, you will also need to install the protobuf compiler (protoc).

Prepping data

Before you can run the analysis, you need to prep your SSH log data. You can do this via:

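# List the current and rotated auth logs
ls -l /var/log/auth.*
# Decompress the rotated logs in place
sudo gzip -d /var/log/auth.log.*.gz
# Concatenate all of them into a single file
sudo cat /var/log/auth.* > ~/Desktop/my_auth.log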

You should now have all of your /var/log/auth.log files concatenated into one file, ~/Desktop/my_auth.log. Next, move this file into the data directory of ssh_analysis. Assuming you cloned ssh_analysis to ~/git:

mv ~/Desktop/my_auth.log ~/git/ssh_analysis/data/my_auth.log
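If you would rather not decompress the rotated logs in place, the same concatenation can be done from Python. This is an alternative sketch, not part of the project; `combine_logs` is a hypothetical helper, and reading /var/log/auth.* will typically still require elevated permissions.

```python
import glob
import gzip
import shutil
from pathlib import Path


def combine_logs(pattern: str, output: Path) -> int:
    """Concatenate every file matching `pattern` into `output`.

    Files ending in .gz are decompressed on the fly, so the originals are
    left untouched. Returns the number of files read.
    """
    paths = sorted(glob.glob(pattern))
    with open(output, "wb") as out:
        for path in paths:
            # Pick the right opener based on the file extension.
            opener = gzip.open if path.endswith(".gz") else open
            with opener(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return len(paths)
```

For example, `combine_logs("/var/log/auth.*", Path("data/my_auth.log"))` would write the combined log straight into the data directory. Make sure the output path does not itself match the pattern, or the output file would be read as one of its own inputs.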

Running the streamlit app

The package powers a streamlit app which can be easily run via:

streamlit run streamlit_app/SSH_Analysis_Home.py

Running analysis

To run the example, all you need to do is open the notebook ./ssh_analysis/auth_log_analysis.ipynb and run it with the python kernel you installed earlier. This can be done easily with Jupyter Lab or VS Code.

cd ~/git/ssh_analysis
poetry shell
jupyter lab
