AutoSketch is a sketch-oriented compiler for query-driven network telemetry. It automatically compiles high-level data-stream operators into sketch instances that can be readily deployed with low resource usage and limited accuracy loss. This work has been accepted to NSDI'24.
The major contributions of AutoSketch are as follows:
- Combine the strengths of both sketch-based telemetry algorithms and query-driven network telemetry
- Extend the capacity of conventional telemetry languages to perceive and control accuracy intent
- Reduce the burden on users to select, configure, and implement Sketch algorithms
- A framework capable of integrating many novel Sketch optimization techniques (e.g., SketchLib [NSDI’22], FlyMon [SIGCOMM’22], Sketchovsky [NSDI’23], BitSense [SIGCOMM’23], OmniWindow [SIGCOMM’23])
We require the following dependencies to run AutoSketch programs.
- Software Dependencies:

  pip3 install ply
  pip3 install jinja2
  sudo apt install libboost-all-dev -y
  sudo apt install libjsoncpp-dev -y
  sudo apt install libpcap-dev -y
  # spdlog
  git clone https://github.com/gabime/spdlog.git
  cd spdlog && mkdir build && cd build
  cmake .. && make -j && sudo make install
- Switch SDE: Tofino SDE 9.13.1 is needed to compile the P4 code generated by AutoSketch. (Hint: older versions of the SDE should also work, but we have not fully verified this in other environments.)
- Trace data: We provide an archive with the pre-processed CAIDA trace files for running the experiments. Due to its large size, please download it from PKU Drive and extract it into the ${AutoSketch_dir}/data/ directory.
AutoSketch requires traffic data for benchmark-based searching, which identifies the sketch configuration with the lowest resource overhead that still meets the user's accuracy intent. The trace files we provide are already preprocessed; to use other trace files, preprocess them with the following steps.
$ cd trace; mkdir build; cd build
$ cmake ..; make
$ ./preprocess ${AutoSketch_dir}/data/ equinix-nyc.dirB.20180419-130000.UTC.anon.pcap search_trace.bin
$ ./preprocess ${AutoSketch_dir}/data/ equinix-nyc.dirB.20180419-130100.UTC.anon.pcap verify_trace.bin
- One command to generate the backend P4 program
$ python compiler.py -i examples/newconn.py -p4o output/newconn/newconn.p4 -p4s [-p4v]
The -i parameter specifies the source file of the input query code. The -p4o parameter specifies the path and filename of the generated P4 code. The -p4s parameter enables benchmark-based searching during compilation to find the configuration parameters. The -p4v parameter additionally verifies the searched configuration parameters.
- AutoSketch also supports step-by-step compilation to facilitate debugging:
- Generate the benchmark-based searching program
$ python compiler.py -i examples/newconn.py -s output/newconn
The -i parameter specifies the source file of the input query code. The -s parameter specifies the directory in which the profiling program and related configuration files are generated.
- Run the benchmark-based searching
$ cd output/newconn
$ ls
autosketch-newconn.cpp  conf.json  Makefile
$ make
$ ./autosketch-newconn ./conf.json --search ./app-conf.json
The --search parameter writes the searched configuration to the specified file.
- Verify the searched configuration
$ ./autosketch-newconn ./conf.json --verify ./app-conf.json
The --verify parameter verifies the configuration in the specified file.
- Generate the P4 program based on the searched configuration
$ python compiler.py -i examples/newconn.py -p4o output/newconn/newconn.p4 -p4c output/newconn/app-conf.json
The -p4c parameter specifies an existing configuration file to use when generating the P4 program.
The input program consists of several modules, each of which is either a User-Defined Function (UDF) or a task definition. Here is an example.
def remap_key(tcp.flag):
    if tcp.flag == SYNACK:
        nkey = ipv4.src_addr
    else:
        nkey = ipv4.dst_addr

def sf(tcp.flag, tcp.seq, tcp.ack):  # cnt nextseq
    if tcp.flag == SYNACK:
        nextseq = tcp.seq + 1
        cnt += 1
    elif nextseq == tcp.ack:
        cnt -= 1
syn_flood = PacketStream()
    .filter(left_value="ipv4.protocol", op="eq", right_value="IP_PROTOCOLS_TCP")
    .filter(left_value="tcp.flags", op="eq", right_value="TCP_FLAG_ACK")
    .groupby(func_name="remap_key", index=[], args=["tcp.flags"], registers=[], out=["nkey"])
    .groupby(func_name="sf", index=["nkey"], args=["tcp.flags", "tcp.seq", "tcp.ack"], registers=["nextseq", "cnt"], out=["cnt"])
    .filter(left_value="cnt", op="gt", right_value="Thrd_SYN_FLOOD")
    .distinct(distinct_keys=["nkey"])
Description of the UDF format
The format for User-Defined Functions (UDFs) is inspired by Python's function definition syntax, with some differences.
def func_name(args):  # persist_state
    statements
- args specifies the arguments passed in, usually header fields.
- persist_state defines variables that must be saved globally across multiple operations, separated by spaces (think of it as defining a global table). This annotation can be omitted only if the function does not use global state. For more details, refer to the description of the groupby operation in the operators section. An illustrative UDF sketch follows this list.
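For instance, a minimal hypothetical UDF (not taken from the repository; the field name ipv4.total_len is assumed) that accumulates a per-key byte count in a single persistent variable could look like:

def byte_count(ipv4.total_len):  # total_bytes
    total_bytes += ipv4.total_len

Here ipv4.total_len is a header field passed in as an argument, while total_bytes is the persistent state declared after the # annotation and updated across packets.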
Description of the telemetry application format
name = PacketStream()
    .operators()
Here, name is the name of the task, PacketStream is a fixed identifier, and operators are the chained operators that follow it.
Description of AutoSketch operators
Here are the conventions for each operator; an illustrative usage sketch follows the list.
- filter: The parameters are (left_value, op, right_value), fundamentally acting as a conditional expression.
- map: The parameters are (map_keys, new_import), where map_keys selects which key-value pairs from the original set continue to be processed, and new_import introduces new key-value pairs, formatted as {"key": "value"}.
- reduce: The parameters are (reduce_keys, result), where reduce_keys indicates which key(s) to use as the reference for the reduce operation, and result stores the outcome of the reduce.
- zip: The parameters are (stream_name, left_key, right_key), with stream_name indicating which stream's results to merge in, and left_key and right_key indicating which keys to use as the basis for merging from the current stream and the incoming stream, respectively.
- distinct: The parameter is (distinct_keys), indicating which keys to use as the basis for deduplication.
- groupby: The parameters are (func_name, index, args, registers, out), where func_name corresponds to the function name of the UDF described above, index indicates which keys to use as the basis for building the lookup table, args corresponds to the arguments passed in, registers are the registers the function needs to persist, and out defines the results that are output as key-value pairs.
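The syn_flood example above exercises only filter, groupby, and distinct. The following is a purely illustrative sketch of how the remaining operators might be combined, based on the parameter conventions listed above; the stream name scan_detect, the pkt counter, and the Thrd_SCAN threshold are hypothetical, and the named-argument style for map, reduce, distinct, and zip is assumed by analogy with the filter and groupby calls in the example.

# Hypothetical illustration only; names and threshold are not from the repository
scan_detect = PacketStream()
    .filter(left_value="ipv4.protocol", op="eq", right_value="IP_PROTOCOLS_TCP")
    .map(map_keys=["ipv4.src_addr"], new_import={"pkt": "1"})
    .reduce(reduce_keys=["ipv4.src_addr"], result="pkt")
    .filter(left_value="pkt", op="gt", right_value="Thrd_SCAN")
    .distinct(distinct_keys=["ipv4.src_addr"])
    .zip(stream_name="syn_flood", left_key="ipv4.src_addr", right_key="nkey")

In this sketch, map keeps ipv4.src_addr and introduces a per-packet counter pkt, reduce aggregates pkt per source address, the second filter applies the hypothetical threshold, distinct deduplicates by source address, and zip joins the result with the syn_flood stream using nkey as the key on the incoming side.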