
wentacc/aws-vpc-flow-log-analyzer


Flow Log Tag Mapper

A Python program that analyzes AWS VPC Flow Logs and maps network traffic to service tags based on destination port and protocol combinations.

Overview

This program processes AWS VPC Flow Logs (version 2) and maps the traffic flows to predefined service tags using a lookup table. It generates statistics on tag occurrences and port/protocol combinations.

Core Logic

The program consists of three main components:

  1. Lookup Table Loading (load_lookup_table)

    • Loads port/protocol to tag mappings from a CSV file
    • Creates a dictionary with (port, protocol) tuples as keys and tags as values
    • Handles case-insensitive protocol matching
    • Supports multiple port/protocol combinations mapping to the same tag
  2. Flow Log Parsing (parse_flow_log)

    • Processes AWS VPC Flow Log entries line by line
    • Extracts destination port (field 7) and protocol (field 8), counting fields from 1
    • Maps protocol numbers to names (e.g., 6 → tcp, 17 → udp, 1 → icmp)
    • Matches flows against lookup table to assign tags
    • Maintains counters for tags and port/protocol combinations
  3. Results Writing (write_results)

    • Outputs statistics in CSV format
    • Sorts tag counts by frequency (descending) then name
    • Sorts port/protocol combinations by port then protocol
    • Appends "Untagged" count at the end of tag statistics
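The lookup-and-count logic shared by these components can be sketched in a few lines. This is a minimal illustration, not code from `flow_tag_mapper.py`. It assumes the lookup table is already a dictionary keyed by `(port, lowercase protocol)` and that tags are tallied in `collections.Counter` objects. The helper name `tag_for` is hypothetical. The mappings follow the sample described in the Important Note at the end of this README, and their protocols are assumed to be TCP.

```python
from collections import Counter
from typing import Dict, Tuple

# Illustrative lookup table: (dstport, lowercase protocol) -> tag.
# Several keys can share one tag, as with the three "email" ports.
# Assumption: all of these sample mappings use TCP.
LOOKUP: Dict[Tuple[int, str], str] = {
    (23, "tcp"): "sv_P1",
    (25, "tcp"): "sv_P1",
    (443, "tcp"): "sv_P2",
    (110, "tcp"): "email",
    (143, "tcp"): "email",
    (993, "tcp"): "email",
}


def tag_for(port: int, protocol: str, lookup: Dict[Tuple[int, str], str]) -> str:
    """Return the tag for a port/protocol pair, or "Untagged" if none matches."""
    # Lowercasing the protocol makes TCP, tcp and TcP hit the same key.
    return lookup.get((port, protocol.lower()), "Untagged")


# One counter per output section.
tag_counts = Counter()
combo_counts = Counter()
for port, protocol in [(443, "tcp"), (993, "TCP"), (8080, "tcp")]:
    tag_counts[tag_for(port, protocol, LOOKUP)] += 1
    combo_counts[(port, protocol.lower())] += 1
```

Because `dict.get` takes a default value, unmatched flows fall into "Untagged" without a separate membership check.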

Requirements

  • Python 3.6 or higher
  • Input files must be ASCII text files
  • Supports flow log files up to 10MB
  • Handles up to 10,000 lookup table mappings
  • Case-insensitive protocol matching

Command Syntax

python flow_tag_mapper.py <flow_log_file> <lookup_file> <output_file>

Example

python flow_tag_mapper.py flow.log lookup.csv output.csv

Input Files

  1. Flow Log File (flow.log)

    • AWS VPC Flow Logs version 2 format
    • Each line must contain at least 14 fields
    • Fields are space-separated
  2. Lookup Table (lookup.csv)

    • CSV format with header: dstport,protocol,tag
    • Protocol names are case-insensitive
    • Multiple entries can map to the same tag
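A minimal sketch of how one version 2 record could be reduced to the (destination port, protocol) pair that the tool looks up. The function name `parse_flow_line`, the check that field 1 is `2`, and the choice to keep unknown protocol numbers as strings (so they end up Untagged) are assumptions, not confirmed details of the tool. The sample record is illustrative only.

```python
from typing import Optional, Tuple

# IANA protocol numbers for the protocols the tool supports.
PROTOCOL_NAMES = {1: "icmp", 6: "tcp", 17: "udp"}


def parse_flow_line(line: str) -> Optional[Tuple[int, str]]:
    """Return (dstport, protocol name) for one version 2 record, or None.

    None signals a blank, short, or non-numeric line that the caller should skip.
    """
    fields = line.split()  # fields are space-separated
    if len(fields) < 14 or fields[0] != "2":
        return None
    try:
        # dstport is field 7 and protocol is field 8 (1-based), i.e. indices 6 and 7.
        dstport = int(fields[6])
        protocol_number = int(fields[7])
    except ValueError:
        # NODATA/SKIPDATA records carry "-" in these positions.
        return None
    # Assumption: unknown protocol numbers are kept as strings and will match no tag.
    return dstport, PROTOCOL_NAMES.get(protocol_number, str(protocol_number))


sample = ("2 123456789012 eni-0a1b2c3d 10.0.1.201 198.51.100.2 "
          "443 49153 6 25 20000 1620140761 1620140821 ACCEPT OK")
print(parse_flow_line(sample))  # (49153, 'tcp')
```

Returning `None` rather than raising keeps the main loop simple: malformed lines are skipped and processing continues.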

Output Format

The program generates a CSV file with two sections:

  1. Tag Counts

    • Lists each tag and its occurrence count
    • Sorted by count (descending) then tag name
    • "Untagged" count appears last
  2. Port/Protocol Combination Counts

    • Lists each unique port/protocol combination
    • Includes occurrence count for each combination
    • Sorted by port then protocol
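The sort rules above map directly onto `sorted` keys. The following is a sketch of a `write_results`-style function, assuming both sections are held in `Counter` objects. The parameter names, the `Port/Protocol Combination Counts:` title, the `Port,Protocol,Count` header, and the blank separator row are assumptions about the output layout, not the tool's confirmed format.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def write_results(out: TextIO, tag_counts: Counter, combo_counts: Counter) -> None:
    """Write tag counts and port/protocol counts as CSV to an open text stream."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Tag Counts:"])
    writer.writerow(["Tag", "Count"])
    # Count descending, then tag name ascending; "Untagged" is written last.
    tagged = [(tag, n) for tag, n in tag_counts.items() if tag != "Untagged"]
    for tag, n in sorted(tagged, key=lambda item: (-item[1], item[0])):
        writer.writerow([tag, n])
    writer.writerow(["Untagged", tag_counts.get("Untagged", 0)])
    writer.writerow([])  # blank row between sections
    writer.writerow(["Port/Protocol Combination Counts:"])
    writer.writerow(["Port", "Protocol", "Count"])
    # Keys are (port, protocol) tuples, so plain tuple ordering sorts by port, then protocol.
    for (port, protocol), n in sorted(combo_counts.items()):
        writer.writerow([port, protocol, n])


# Render to an in-memory buffer instead of a file.
demo = io.StringIO()
write_results(
    demo,
    Counter({"sv_P1": 2, "Untagged": 8, "email": 3, "sv_P2": 1}),
    Counter({(443, "tcp"): 1, (25, "tcp"): 1, (23, "tcp"): 1}),
)
print(demo.getvalue())
```

When writing to a real file, open it with `newline=""`, as the `csv` module documentation recommends, so the writer controls line endings.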

Testing

The program has been tested with:

  • Flow logs up to 10MB (1.4 million lines)
  • Lookup tables with 10,000 mappings
  • Various protocol cases (TCP/tcp/TcP)
  • Empty lines and invalid entries in lookup table
  • Missing fields in flow log entries
  • Invalid port numbers
  • Multiple port/protocol combinations mapping to same tag

Performance

  • Memory efficient: Streams flow log file line by line
  • Fast lookups: average O(1) per record, using a dictionary keyed by (port, protocol)
  • Memory usage stays under 10MB for 10MB input files
  • Processing time < 30 seconds for 1.4 million lines
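Memory stays flat because iterating over a file object yields one line at a time. Only the two counters grow, and they grow with the number of distinct tags and port/protocol pairs rather than with the number of lines. The sketch below assumes that approach. `count_flows` is a hypothetical, simplified helper: it has no version check and no error reporting.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Protocol number (as text) -> name; anything else is kept as the raw number.
PROTOCOLS = {"1": "icmp", "6": "tcp", "17": "udp"}


def count_flows(
    lines: Iterable[str], lookup: Dict[Tuple[int, str], str]
) -> Tuple[Counter, Counter]:
    """Tally tags and (port, protocol) pairs while reading lines one at a time."""
    tag_counts: Counter = Counter()
    combo_counts: Counter = Counter()
    for line in lines:  # an open file yields lines lazily; nothing is buffered here
        fields = line.split()
        if len(fields) < 14 or not fields[6].isdecimal():
            continue  # blank, truncated, or NODATA/SKIPDATA record
        key = (int(fields[6]), PROTOCOLS.get(fields[7], fields[7]))
        tag_counts[lookup.get(key, "Untagged")] += 1  # average O(1) dict lookup
        combo_counts[key] += 1
    return tag_counts, combo_counts
```

With a real file, the call would be `with open("flow.log", encoding="ascii") as f: tags, combos = count_flows(f, lookup)`. Peak memory then depends on the number of distinct keys rather than on the number of lines.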

Error Handling

  • Skips invalid lookup table entries
  • Handles missing fields in flow log entries
  • Reports parsing errors with line numbers
  • Continues processing after encountering errors
  • Validates port numbers and protocol mappings
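The following sketch shows one way to skip invalid lookup rows while reporting them with line numbers. It accepts any iterable of lines (an open `lookup.csv` or a list), takes the line number from the `csv` reader's `line_num` attribute, and prints warnings to stderr. The message wording, the 0 to 65535 port check, and the choice to let a later duplicate key overwrite an earlier one are assumptions, not the tool's confirmed behavior.

```python
import csv
import sys
from typing import Dict, Iterable, Tuple


def load_lookup_table(lines: Iterable[str]) -> Dict[Tuple[int, str], str]:
    """Build {(dstport, lowercase protocol): tag}, skipping and reporting bad rows."""
    lookup: Dict[Tuple[int, str], str] = {}
    reader = csv.reader(lines)
    next(reader, None)  # skip the dstport,protocol,tag header
    for row in reader:
        line_no = reader.line_num  # source lines consumed so far
        if not any(cell.strip() for cell in row):
            continue  # tolerate empty lines silently
        if len(row) != 3:
            print(f"lookup line {line_no}: expected 3 columns, got {len(row)}", file=sys.stderr)
            continue
        port_text, protocol, tag = (cell.strip() for cell in row)
        try:
            port = int(port_text)
        except ValueError:
            port = -1  # forces the range check below to reject it
        if not 0 <= port <= 65535:
            print(f"lookup line {line_no}: invalid port {port_text!r}", file=sys.stderr)
            continue
        if not protocol or not tag:
            print(f"lookup line {line_no}: missing protocol or tag", file=sys.stderr)
            continue
        # Lowercase the protocol so TCP, tcp and TcP share one key.
        lookup[(port, protocol.lower())] = tag
    return lookup
```

Passing an open file works the same way, e.g. `load_lookup_table(open("lookup.csv", newline="", encoding="ascii"))`.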

Limitations

  • Only supports AWS VPC Flow Logs version 2 format
  • Limited to TCP, UDP, and ICMP protocols
  • Requires valid CSV format for lookup table
  • All input files must be ASCII encoded

Future Improvements

  • Support for custom flow log formats
  • Additional protocol support
  • Compressed file handling
  • Parallel processing for large files
  • Real-time monitoring mode

⚠️ Important Note

The sample output file provided in the email does not match the sample flow logs. Specifically:

  1. The sample output shows sv_P4 with a count of 1, but the sample flow logs do not contain any traffic matching this tag:

    • sv_P4 is mapped to port 22/tcp in the lookup table
    • The sample flow logs do not contain any entries with destination port 22
  2. The correct tag counts for the given sample flow logs are:

    Tag Counts:
    Tag,Count
    email,3        # From ports 110,993,143
    sv_P1,2        # From ports 23,25
    sv_P2,1        # From port 443
    Untagged,8     # All other ports
    
