
A library that scans for and reports unique TCP connections and possible port scanning activity on a server (Linux, macOS, and Windows). Includes a Dockerfile with multi-stage builds, a dedicated user, permissions, etc.

tcpmetrics

Problem Description

New Connection Detection:

A new connection is defined as a unique (srcIP:srcPort -> dstIP:dstPort) tuple; if any one of these four fields varies, it counts as a new connection.

Port Scan Detection:

A port scan is defined as variation in the local IPv4 address:port fields of the /proc/net/tcp file, where the local IPv4 address belongs to the set ('0.0.0.0', '127.0.0.1', '10.0.2.15') and three or more distinct values appear in the local IPv4 port field.

Solution Design

The file /proc/net/tcp may be overwritten by the kernel while it is being read, so it is best not to slurp the entire file at once but to read it line by line instead; reads of this file are atomic per line (source). Therefore, the file is opened in read-only mode and parsed line by line using bufio.

A file parser package was written to read the file and export its contents as a 2D array of strings: each row is a 1D array of strings, and each element is a whitespace-separated field from that row.
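A minimal sketch of this parser follows; the package and function names here are invented for illustration, not the library's actual API, and the header row of /proc/net/tcp is simply skipped.

    package parser

    import (
        "bufio"
        "os"
        "strings"
    )

    // ParseProcNetTCP opens the file read-only and tokenizes it one line at
    // a time, so the file is never slurped whole while the kernel may be
    // rewriting it.
    func ParseProcNetTCP(path string) ([][]string, error) {
        f, err := os.Open(path) // os.Open opens read-only
        if err != nil {
            return nil, err
        }
        defer f.Close()

        var rows [][]string
        s := bufio.NewScanner(f)
        s.Scan() // skip the header row
        for s.Scan() {
            rows = append(rows, strings.Fields(s.Text()))
        }
        return rows, s.Err()
    }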

A connection scanner object is used to detect new connections. This object depends on the tokens generated by the file parser. The connection scanner combs through the tokens, parses the ip:port pairs into human-readable form, and stores them in a set. The set is represented by a hashmap with a string key and bool value, where the key has the format "srcIP:srcPort -> dstIP:dstPort".
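A sketch of this step, assuming the standard /proc/net/tcp column layout (column 1 holds local_address, column 2 holds rem_address, both as little-endian hex); hexToAddr and Connections are illustrative names, not the library's exact API:

    package scanner

    import (
        "fmt"
        "strconv"
        "strings"
    )

    // hexToAddr converts a /proc/net/tcp address such as "0100007F:0050"
    // into "127.0.0.1:80". The four IP bytes are stored little-endian, so
    // they are printed in reverse order.
    func hexToAddr(s string) (string, error) {
        parts := strings.Split(s, ":")
        if len(parts) != 2 {
            return "", fmt.Errorf("malformed address %q", s)
        }
        ip, err := strconv.ParseUint(parts[0], 16, 32)
        if err != nil {
            return "", err
        }
        port, err := strconv.ParseUint(parts[1], 16, 16)
        if err != nil {
            return "", err
        }
        return fmt.Sprintf("%d.%d.%d.%d:%d",
            byte(ip), byte(ip>>8), byte(ip>>16), byte(ip>>24), port), nil
    }

    // Connections builds the set of unique "srcIP:srcPort -> dstIP:dstPort"
    // strings from the tokenized rows produced by the file parser.
    func Connections(rows [][]string) map[string]bool {
        set := make(map[string]bool)
        for _, row := range rows {
            if len(row) < 3 {
                continue // skip malformed rows
            }
            src, err1 := hexToAddr(row[1])
            dst, err2 := hexToAddr(row[2])
            if err1 != nil || err2 != nil {
                continue
            }
            set[src+" -> "+dst] = true
        }
        return set
    }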

The connection scanner object also contains a function to detect port scans. The port scanner accepts a list of tokens and extracts unique (srcIP -> dstIP) key fields. The srcIP is first tested for membership in the set ('0.0.0.0', '127.0.0.1', '10.0.2.15'). Then the source ports are deduplicated using a hashmap keyed by the connection string, whose values are sets of strings containing the source port numbers. If a set contains three or more elements, a port scan is reported. The output of this function is a hashmap of the form map[srcIP -> dstIP] = csv(unique source port numbers).
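A hedged sketch of this check: the Conn type, the function name, and the hard-coded monitored set mirror the prose above but are assumptions about the implementation, not its exact code.

    package scanner

    import (
        "sort"
        "strings"
    )

    // Conn is one parsed connection; the fields are illustrative.
    type Conn struct {
        SrcIP, SrcPort, DstIP string
    }

    var monitored = map[string]bool{
        "0.0.0.0":   true,
        "127.0.0.1": true,
        "10.0.2.15": true,
    }

    // DetectPortScans returns, for every "srcIP -> dstIP" pair whose srcIP
    // is in the monitored set and which was seen with three or more unique
    // source ports, the CSV of those ports.
    func DetectPortScans(conns []Conn) map[string]string {
        ports := make(map[string]map[string]bool)
        for _, c := range conns {
            if !monitored[c.SrcIP] {
                continue
            }
            key := c.SrcIP + " -> " + c.DstIP
            if ports[key] == nil {
                ports[key] = make(map[string]bool)
            }
            ports[key][c.SrcPort] = true
        }
        scans := make(map[string]string)
        for key, set := range ports {
            if len(set) < 3 {
                continue
            }
            var list []string
            for p := range set {
                list = append(list, p)
            }
            sort.Strings(list)
            scans[key] = strings.Join(list, ",")
        }
        return scans
    }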

A queue object that stores an aggregate of the parsed tokens for a certain period of time is used to keep state. The queue length is limited to 10; with one scan every 10 seconds, this gives a 100s time window within which connections are considered unique. The queue aggregates tokens and maintains state for both the new connection detection mechanism and the port scan detection.
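The queue can be as simple as a capped slice of snapshots; the names below (Snapshot, Window, Push) are illustrative rather than the library's actual types:

    package scanner

    // Snapshot holds the tokens parsed from one read of /proc/net/tcp.
    type Snapshot [][]string

    // Window keeps at most maxLen snapshots; with one scan every 10 seconds
    // and maxLen = 10, the retained state spans a ~100s window.
    type Window struct {
        maxLen    int
        snapshots []Snapshot
    }

    // Push appends a snapshot, evicting the oldest once the cap is reached.
    func (w *Window) Push(s Snapshot) {
        w.snapshots = append(w.snapshots, s)
        if len(w.snapshots) > w.maxLen {
            w.snapshots = w.snapshots[1:]
        }
    }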

Testcase Specifications and Details

File Parser

  • Permutation 1: parse the example file mentioned in the doc
  • Permutation 2: feed in a negative example and check that it fails appropriately

ConnectionScanner

  • Permutation 1: input is a 2D string array, a tokenized version of the file; output is a connection map (see the test sketch after this list)
  • Permutation 2: input is a 2D tokenized version of a file containing a port scan; the scanner should detect the port scan
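One way Permutation 1 could be written as a table-driven Go test, reusing the Connections sketch from above (again, an assumed API rather than the library's actual test code):

    package scanner

    import "testing"

    func TestConnections(t *testing.T) {
        cases := []struct {
            name string
            rows [][]string
            want int // expected number of unique connections
        }{
            {
                name: "single connection",
                rows: [][]string{
                    {"0:", "0100007F:0050", "0100007F:A2C8", "01"},
                },
                want: 1,
            },
            {
                name: "duplicate rows collapse into one entry",
                rows: [][]string{
                    {"0:", "0100007F:0050", "0100007F:A2C8", "01"},
                    {"1:", "0100007F:0050", "0100007F:A2C8", "01"},
                },
                want: 1,
            },
        }
        for _, c := range cases {
            t.Run(c.name, func(t *testing.T) {
                if got := len(Connections(c.rows)); got != c.want {
                    t.Errorf("got %d connections, want %d", got, c.want)
                }
            })
        }
    }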

Questions

Is it possible for this program to miss a connection?

Yes. If connections or port scans occur at a resolution finer than the 10-second scan interval, the program will miss them. It is important to understand that time resolution and time windows matter: a continuous monitoring solution is never truly continuous, since it is still scheduled by the kernel to run in specific slices of CPU time, alongside the OS itself.

If you weren't following these requirements, how would you solve the problem of logging every new connection?

There are a few options for solving this problem; the best solution I found is to use Suricata.

a) Suricata

I read about this tool and found that it can log flow information as well, so we would not need to resort to another tool for deduplication. The flow keywords for writing rules that capture only flow information are documented here: Flow Keywords. I haven't used this tool, but it seems to be the best solution to the question.

Other possible solutions:

b) Snort

Although I haven't used it myself, I have seen people use the network intrusion detection tool Snort to monitor the network and log new connections according to specified alerting rules. The config could include a rule like log tcp any any..., configured to log only headers; the resulting log would then need to be deduplicated.

./snort -dv -l ./log -h 0.0.0.0/0 -c snort.conf

Then this data could be logged to a database.

c) Configuring a router to send Netflow data

I researched this topic during undergrad: I configured a Cisco router running in GNS3 to send flow data over UDP to a machine, parsed the packets with Perl automation running on a Linux box (a tap), and logged the results on the local filesystem. A suitable database could be used to do this at scale, with many taps running across a large network.

Why did you choose Golang to write the build automation?

Golang has extensive support and tooling around the language itself, in addition to being clear and concise. It is the appropriate tool for the job here: it is neither a very low-level language like C nor as high-level as Python. Golang is an excellent fit for this sort of application.

Is there anything else you would test if you had more time?

A database or in-memory database may be needed when running this on a server that receives a large number of connections, since the data accumulated over six cycles of this program may get quite large.

The port scan detection function uses the tuple (srcIP, dstIP) as a unique key and checks for multiple source port hits; it does not account for possible source IP randomization or spoofing.

Testing was not done rigorously; it would need to include negative cases, improper values, and identical srcPort/dstPort combinations (invalid combinations and logical errors).

What is the most important tool, script, or technique you have for solving problems in production? Explain why this tool/script/technique is the most important.

tshark, netstat, and tcpdump are all tools that operate at the level of socket connections and can be used for debugging.

  • tshark: continuously monitors a specified network interface on the host, with flags and filtering parameters that can be tuned to capture exactly the traffic of interest

  • netstat: reads the /proc/net/* protocol files and displays currently established connections, listening ports, and many other details across all network interfaces on the host

  • tcpdump: continually monitors a specified network interface and prints TCP/IP packet details; useful for observing the TCP/IP packets sent by different applications on a host
