Develop a machine learning model capable of detecting specific applications, such as Facebook, YouTube, and Instagram, from IP flow statistics

malhomaid/ip-flow-analysis

IP Flow Analysis

Project Overview

This project analyzes IP network traffic flows to predict the application-layer protocol of each flow, that is, the specific application behind it, such as Facebook, YouTube, or Instagram.

The dataset can be found here. It contains 87 features. Each instance holds the information of one IP flow generated by a network device, including source and destination IP addresses, ports, interarrival times, and the layer 7 protocol (application) used on the flow, which is the class we want to predict.
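As a minimal sketch of how such a file separates into model inputs and the class to predict, the snippet below reads a few CSV rows with Python's standard `csv` module and splits each row into features and label. The column names (`Source.IP`, `Source.Port`, `Destination.IP`, `Destination.Port`, `Flow.Duration`, `ProtocolName`) and the sample rows are assumptions for illustration; check them against the header of the downloaded file.

```python
import csv
import io

# Hypothetical sample with an assumed header; the real file has 87 columns.
SAMPLE_CSV = """Source.IP,Source.Port,Destination.IP,Destination.Port,Flow.Duration,ProtocolName
10.0.0.5,52422,172.217.3.14,443,45523,GOOGLE
10.0.0.5,52430,31.13.71.36,443,1200,FACEBOOK
10.0.0.8,40112,172.217.3.14,443,980,YOUTUBE
"""

TARGET_COLUMN = "ProtocolName"  # assumed name of the layer 7 label column


def split_features_and_target(csv_file, target_column=TARGET_COLUMN):
    """Return (features, labels): a list of feature dicts and a parallel list of labels."""
    features, labels = [], []
    for row in csv.DictReader(csv_file):
        labels.append(row.pop(target_column))  # take the label out of the row
        features.append(row)                   # what remains are the inputs
    return features, labels


features, labels = split_features_and_target(io.StringIO(SAMPLE_CSV))
print(len(features), "flows; classes:", sorted(set(labels)))
```

With the real file, the same function could be called on `open("Dataset-Unicauca-Version2-87Atts.csv", newline="")`, provided the label column is named as assumed here.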

For more details, see the project blog post.

Motivation

Most network traffic classification datasets aim only at identifying the type of application an IP flow carries (WWW, DNS, FTP, P2P, Telnet, etc.). This dataset goes a step further: it makes it possible to build machine learning models capable of detecting specific applications, such as Facebook, YouTube, and Instagram, from IP flow statistics (currently 75 applications).

Libraries used

  • keras 2.2.4+
  • scikit-learn (sklearn) 0.21.2+
  • numpy 1.16.4+
  • seaborn 0.9.0+
  • pandas 0.25.0+
  • matplotlib 3.1.0+

Files

  • The /docs folder contains the project blog document and images
  • ip-flow-analysis.ipynb is the notebook where the analysis happens
  • model.h5 is the deep learning model, which can be generated from the notebook
  • Dataset-Unicauca-Version2-87Atts.csv is the dataset, which should be downloaded from here

Analysis Summary

The conclusion of our analysis is that we can identify the specific application behind an IP flow with 66% accuracy. For more details, see the project blog post.
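For reference, accuracy here means the fraction of flows whose predicted application matches the true one. The sketch below uses made-up labels (not results from the notebook) and also computes the accuracy of always predicting the most frequent class, which is a useful baseline to compare against when classes may be unevenly represented.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty sample")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def majority_baseline(y_true):
    """Accuracy obtained by always predicting the most frequent label."""
    most_common_count = Counter(y_true).most_common(1)[0][1]
    return most_common_count / len(y_true)


# Made-up labels for illustration only.
y_true = ["GOOGLE", "GOOGLE", "FACEBOOK", "YOUTUBE", "GOOGLE", "FACEBOOK"]
y_pred = ["GOOGLE", "FACEBOOK", "FACEBOOK", "YOUTUBE", "GOOGLE", "GOOGLE"]

print(f"model accuracy:    {accuracy(y_true, y_pred):.2f}")
print(f"majority baseline: {majority_baseline(y_true):.2f}")
```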

Future Improvement

The model could be improved by:

  • using some of the features that were dropped
  • extracting new features (for example: is the flow incoming or outgoing? is the port privileged?)
  • aggregating flows by connection
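
The last two ideas can be sketched in plain Python. This is an illustration under stated assumptions, not code from the notebook: the `Flow` record and its field names are hypothetical, "privileged" is taken to mean a port below 1024, direction is guessed from whether the source address is private according to Python's `ipaddress` module, and a "connection" is taken to be the unordered pair of endpoints.

```python
import ipaddress
from collections import defaultdict
from typing import NamedTuple


class Flow(NamedTuple):
    """Hypothetical flow record; field names are illustrative."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    duration: int  # arbitrary time unit


def is_privileged(port: int) -> bool:
    """Ports 0-1023 are the well-known (privileged) range."""
    return 0 <= port < 1024


def is_outgoing(flow: Flow) -> bool:
    """Assume a flow is outgoing when it goes from a private to a public address."""
    src = ipaddress.ip_address(flow.src_ip)
    dst = ipaddress.ip_address(flow.dst_ip)
    return src.is_private and not dst.is_private


def connection_key(flow: Flow):
    """Direction-independent key: the sorted pair of (ip, port) endpoints."""
    return tuple(sorted([(flow.src_ip, flow.src_port), (flow.dst_ip, flow.dst_port)]))


def aggregate_by_connection(flows):
    """Sum flow count and duration for flows that share the same endpoints."""
    totals = defaultdict(lambda: {"flows": 0, "duration": 0})
    for flow in flows:
        entry = totals[connection_key(flow)]
        entry["flows"] += 1
        entry["duration"] += flow.duration
    return dict(totals)


flows = [
    Flow("192.168.1.10", 52422, "8.8.8.8", 443, 300),
    Flow("8.8.8.8", 443, "192.168.1.10", 52422, 200),  # reverse direction
    Flow("192.168.1.10", 52430, "1.1.1.1", 53, 50),
]

for flow in flows:
    print(flow.dst_port, is_privileged(flow.dst_port), is_outgoing(flow))
print(aggregate_by_connection(flows))
```

Sorting the two endpoints makes both directions of the same exchange fall under one key, so the first two sample flows are merged into a single connection. Whether this definition of a connection suits the dataset is a design choice to validate against the data.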

Acknowledgements

I would like to thank Juan Sebastián Rojas and Universidad Del Cauca for providing this dataset.
