AutoSketch is a sketch-oriented compiler for query-driven network telemetry. It automatically compiles high-level data-stream operators into sketch instances that can be readily deployed with low resource usage and limited accuracy loss. This work has been accepted to NSDI'24.
The major contributions of AutoSketch are as follows:
- Combine the strengths of both sketch-based telemetry algorithms and query-driven network telemetry
- Extend the capacity of conventional telemetry languages to perceive and control accuracy intent
- Reduce the burden on users to select, configure, and implement Sketch algorithms
- A framework capable of integrating many novel Sketch optimization techniques (e.g., SketchLib [NSDI’22], FlyMon [SIGCOMM’22], Sketchovsky [NSDI’23], BitSense [SIGCOMM’23], OmniWindow [SIGCOMM’23])
We require the following dependencies to run AutoSketch programs.
- Software Dependencies:

  pip3 install ply
  pip3 install jinja2
  sudo apt install libboost-all-dev -y
  sudo apt install libjsoncpp-dev -y
  sudo apt install libpcap-dev -y
  # spdlog
  git clone https://github.com/gabime/spdlog.git
  cd spdlog && mkdir build && cd build
  cmake .. && make -j && sudo make install
- Switch SDE: Tofino SDE 9.13.1 is needed to compile the P4 code generated by AutoSketch. (Hint: older versions of the SDE should also work, but we have not fully verified this in other environments.)
- Trace data: We provide an archive with the pre-processed CAIDA trace files for running the experiments. Due to its large size, please download it from PKU Drive and extract it into the ${AutoSketch_dir}/data/ directory.
AutoSketch requires traffic data for benchmark-based searching, which identifies the sketch configuration with the lowest resource overhead that still meets the user's accuracy intent. The trace files we provide are already preprocessed; to use other trace files, preprocess them with the following steps.
$ cd trace; mkdir build; cd build
$ cmake ..; make
$ ./preprocess ${AutoSketch_dir}/data/ equinix-nyc.dirB.20180419-130000.UTC.anon.pcap search_trace.bin
$ ./preprocess ${AutoSketch_dir}/data/ equinix-nyc.dirB.20180419-130100.UTC.anon.pcap verify_trace.bin
- One command to generate the backend P4 program
$ python compiler.py -i examples/newconn.py -p4o output/newconn/newconn.p4 -p4s [-p4v]
The -i parameter specifies the source file of the input query code. The -p4o parameter specifies the path and filename of the generated P4 code. The -p4s parameter enables benchmark-based searching during compilation to find the configuration parameters. The -p4v parameter additionally verifies the searched configuration parameters.
- AutoSketch also supports step-by-step compilation to facilitate debugging:
- Generate the benchmark-based searching program
$ python compiler.py -i examples/newconn.py -s output/newconn
The -i parameter specifies the source file of the input query code. The -s parameter specifies the directory in which the profiling program and related configuration files are generated.
- Run the benchmark-based searching
$ cd output/newconn
$ ls
autosketch-newconn.cpp  conf.json  Makefile
$ make
$ ./autosketch-newconn ./conf.json --search ./app-conf.json
The --search parameter writes the searched configuration to the specified file.
- Verify the searched configuration
$ ./autosketch-newconn ./conf.json --verify ./app-conf.json
The --verify parameter verifies the configuration in the specified file.
- Generate the P4 program based on the searched configuration
$ python compiler.py -i examples/newconn.py -p4o output/newconn/newconn.p4 -p4c output/newconn/app-conf.json
The -p4c parameter specifies an existing configuration file to use when generating the P4 program.
The input program consists of several modules, each of which is either a User-Defined Function (UDF) or a task definition. Here is an example.
def remap_key(tcp.flag):
    if tcp.flag == SYNACK:
        nkey = ipv4.src_addr
    else:
        nkey = ipv4.dst_addr

def sf(tcp.flag, tcp.seq, tcp.ack):  # cnt nextseq
    if tcp.flag == SYNACK:
        nextseq = tcp.seq + 1
        cnt += 1
    elif nextseq == tcp.ack:
        cnt -= 1
syn_flood = PacketStream()
    .filter(left_value="ipv4.protocol", op="eq", right_value="IP_PROTOCOLS_TCP")
    .filter(left_value="tcp.flags", op="eq", right_value="TCP_FLAG_ACK")
    .groupby(func_name="remap_key", index=[], args=["tcp.flags"], registers=[], out=["nkey"])
    .groupby(func_name="sf", index=["nkey"], args=["tcp.flags", "tcp.seq", "tcp.ack"], registers=["nextseq", "cnt"], out=["cnt"])
    .filter(left_value="cnt", op="gt", right_value="Thrd_SYN_FLOOD")
    .distinct(distinct_keys=["nkey"])
Description of the UDF format
The format for User-Defined Functions (UDFs) is inspired by Python's function definition syntax, with some differences.
def func_name(args):  # persist_state
    statements
- args specifies the arguments passed in, usually header fields.
- persist_state defines variables that must be saved globally across multiple operations, separated by spaces (think of it as defining a global table). This annotation can be omitted only if the function does not use global state. For more details, refer to the description of the groupby operation in the operators section. An illustrative UDF sketch follows this list.
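For instance, a minimal hypothetical UDF (not taken from the repository; the field name ipv4.total_len is assumed) that accumulates a per-key byte count in a single persistent variable could look like:

def byte_count(ipv4.total_len):  # total_bytes
    total_bytes += ipv4.total_len

Here ipv4.total_len is a header field passed in as an argument, while total_bytes is the persistent state declared after the # annotation and updated across packets.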
Description of the telemetry application format
name = PacketStream()
    .operators()
Here, name is the name of the task, PacketStream is a fixed identifier, and operators are the chained operators that follow it.
Description of AutoSketch operators
Here are the conventions for each operator; an illustrative usage sketch follows the list.
- filter: The parameters are (left_value, op, right_value), fundamentally acting as a conditional expression.
- map: The parameters are (map_keys, new_import), where map_keys selects which key-value pairs from the original set continue to be processed, and new_import introduces new key-value pairs, formatted as {"key": "value"}.
- reduce: The parameters are (reduce_keys, result), where reduce_keys indicates which key(s) to use as the reference for the reduce operation, and result stores the outcome of the reduce.
- zip: The parameters are (stream_name, left_key, right_key), with stream_name indicating which stream's results to merge in, and left_key and right_key indicating which keys to use as the basis for merging from the current stream and the incoming stream, respectively.
- distinct: The parameter is (distinct_keys), indicating which keys to use as the basis for deduplication.
- groupby: The parameters are (func_name, index, args, registers, out), where func_name corresponds to the function name of the UDF described above, index indicates which keys to use as the basis for building the lookup table, args corresponds to the arguments passed in, registers are the registers the function needs to persist, and out defines the results that are output as key-value pairs.
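The syn_flood example above exercises only filter, groupby, and distinct. The following is a purely illustrative sketch of how the remaining operators might be combined, based on the parameter conventions listed above; the stream name scan_detect, the pkt counter, and the Thrd_SCAN threshold are hypothetical, and the named-argument style for map, reduce, distinct, and zip is assumed by analogy with the filter and groupby calls in the example.

# Hypothetical illustration only; names and threshold are not from the repository
scan_detect = PacketStream()
    .filter(left_value="ipv4.protocol", op="eq", right_value="IP_PROTOCOLS_TCP")
    .map(map_keys=["ipv4.src_addr"], new_import={"pkt": "1"})
    .reduce(reduce_keys=["ipv4.src_addr"], result="pkt")
    .filter(left_value="pkt", op="gt", right_value="Thrd_SCAN")
    .distinct(distinct_keys=["ipv4.src_addr"])
    .zip(stream_name="syn_flood", left_key="ipv4.src_addr", right_key="nkey")

In this sketch, map keeps ipv4.src_addr and introduces a per-packet counter pkt, reduce aggregates pkt per source address, the second filter applies the hypothetical threshold, distinct deduplicates by source address, and zip joins the result with the syn_flood stream using nkey as the key on the incoming side.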