COLOR

COLOR is a framework for improving traditional cardinality estimators in graph databases by applying graph colorings to produce a compact summary of the data graph. It is implemented in Julia and provides tools to build graphs, summarize them, and run estimations.

Installation

To set up a local copy of the project, complete the following steps:

Download the repository.

$ git clone

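Install the necessary Julia packages. Start Julia from the repository root so that the project's Project.toml is picked up:

$ julia
julia> using Pkg;
julia> Pkg.activate(".");
julia> Pkg.instantiate();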

Download the query graphs, data graphs, and true cardinalities from G-Care and In-Memory Subgraph Matching. They are also available as zipped files.

API

Generally, the steps of a cardinality estimation will be:

Create a QueryGraph
Create a DataGraph
Create a ColorSummary of the DataGraph
Perform a cardinality estimation using the QueryGraph and the ColorSummary
Find the exact cardinality using the DataGraph and QueryGraph to evaluate the estimation
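
The five steps above can be chained into a single helper. The following is a minimal sketch, assuming COLOR's source has already been loaded so that the functions named in this README are in scope. The name estimate_and_verify is hypothetical and not part of COLOR, and every call relies on default parameters.

```julia
# Hypothetical convenience wrapper (not part of COLOR).
# Assumes load_query, load_dataset, generate_color_summary,
# get_cardinality_bounds, and get_exact_size are already in scope.
function estimate_and_verify(data_path::AbstractString, query_path::AbstractString)
    query = load_query(query_path)                       # 1. create a QueryGraph
    data = load_dataset(data_path)                       # 2. create a DataGraph
    summary = generate_color_summary(data)               # 3. summarize the DataGraph
    estimate = get_cardinality_bounds(query, summary)    # 4. estimate the cardinality
    exact = get_exact_size(query, data)                  # 5. compute the exact cardinality
    return (estimate = estimate, exact = exact)
end
```

A call such as estimate_and_verify("data.txt", "query.txt") would return both values as a named tuple, so the estimate can be compared against the exact result directly.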

Graph Creation

We define PropertyGraphs, which contain information about edge and vertex labels. There are two types of PropertyGraph: QueryGraphs and DataGraphs.

To create a QueryGraph, first construct an initial graph with default wildcard (-1) labels from either a number of vertices or an existing DiGraph, then update the labels after construction.

# initialize a QueryGraph using the number of vertices...
q = QueryGraph(3)
# ... or initialize with an existing DiGraph
q = QueryGraph(DiGraph(3))
# now update the labels using functions from PropertyGraph.jl
add_labeled_edge!(q, (1, 2), 2)
update_data_labels!(q, 1, 3)
update_node_labels!(q, 2, [3,4])

DataGraphs are created the same way, with default wildcard (-1) labels. The main difference is that a DataGraph's vertex data labels are not arbitrary and cannot be changed. Instead, each data label is equal to vertex_id - 1.

# initialize a DataGraph using the number of vertices...
d = DataGraph(3)
# ... or initialize with an existing DiGraph
d = DataGraph(DiGraph(3))
# now update the labels using functions from PropertyGraph.jl
add_labeled_edge!(d, (1, 2), 2)
update_node_labels!(d, 2, [3,4])

Functions that load graph files in the G-Care or In-Memory Subgraph Matching formats are also included in src/Datasets.jl:

# convert a .txt file (following G-Care format) into a PropertyGraph
d1 = load_dataset("data.txt")
q1 = load_query("query.txt")
# convert a .graph file (following Subgraph-Matching format) into a PropertyGraph
d2 = load_dataset("data.graph", subgraph_matching_data=true)
q2 = load_query("query.graph", subgraph_matching_data=true)

Summary Building

A lifted ColorSummary describing the overall DataGraph must be built before performing cardinality estimation for a given QueryGraph. To build one, use the generate_color_summary function from the ColorSummary.jl file:

d = load_dataset("data.txt")
summary = generate_color_summary(d)

Optional parameters for summary building, such as the type of coloring to use or the amount of sampling to allow, can be changed. See the code documentation for more information.

Cardinality Estimation

After obtaining a lifted ColorSummary, a cardinality estimate can be computed for any QueryGraph using the get_cardinality_bounds function from the QuasiStableCardinalityEstimator.jl file:

d = load_dataset("data.txt")
q = load_query("query.txt")
summary = generate_color_summary(d)
estimate = get_cardinality_bounds(q, summary)

The resulting estimate will be a single Float64 value.

Optional parameters for the estimation, such as the sampling strategy or how to handle cycles, can be changed. See the code documentation for more information.
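
Since the summary is built from the DataGraph alone (generate_color_summary takes no query), a single summary can be reused across several queries. The following is a minimal sketch of that pattern, assuming COLOR's functions are already in scope and using default parameters throughout. The name estimate_all is hypothetical and not part of COLOR.

```julia
# Hypothetical batch helper (not part of COLOR).
# Assumes load_dataset, load_query, generate_color_summary, and
# get_cardinality_bounds are already in scope.
function estimate_all(data_path::AbstractString, query_paths)
    data = load_dataset(data_path)
    summary = generate_color_summary(data)   # built once, reused for every query
    # One estimate per query file, in the same order as query_paths.
    return [get_cardinality_bounds(load_query(p), summary) for p in query_paths]
end
```

Building the summary outside the loop keeps the per-query work limited to loading the query and running the estimator.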

Exact Cardinality

To calculate the accuracy of an estimation, we compare the result to the exact cardinality of the QueryGraph in the DataGraph. To do this, a get_exact_size function is included in the ExactSizeCalculator.jl file:

d = load_dataset("data.txt")
q = load_query("query.txt")
exact_cardinality = get_exact_size(q, d)

The resulting cardinality will be a single Float64 value.

Optional parameters for the calculation, such as the timeout, can be changed. See the code documentation for more information.
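
Once an estimate and an exact cardinality are both available, they can be reduced to a single accuracy number. One metric commonly used for cardinality estimators is the q-error, the factor by which an estimate is off in either direction. The sketch below is an illustration only: neither the q_error helper nor the choice of metric is taken from COLOR's code.

```julia
# Hypothetical helper (not part of COLOR).
# q-error = max(estimate / exact, exact / estimate), always >= 1.
function q_error(estimate::Real, exact::Real)
    # Clamp both values to at least 1 so an empty result cannot cause
    # a division by zero (an assumption made for this sketch).
    e = max(float(estimate), 1.0)
    t = max(float(exact), 1.0)
    return max(e / t, t / e)
end

q_error(50.0, 200.0)   # underestimate by a factor of 4, returns 4.0
q_error(200.0, 50.0)   # overestimate by a factor of 4, returns 4.0
```

A value of 1.0 means the estimate matched the exact cardinality, and over- and underestimates of the same factor score identically.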

Alternatively, the load_true_cardinality function from src/Datasets.jl reads a precomputed cardinality from a given cardinality file (a file whose only content is a single cardinality value):

exact_cardinality = load_true_cardinality("cardinality.txt")
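
To make the expected file layout concrete, the following sketch writes and reads back a file that holds a single value. It is a stand-in written with plain Julia, not COLOR's implementation of load_true_cardinality. The name read_single_cardinality and the choice to parse the value as a Float64 are assumptions made for illustration.

```julia
# Hypothetical stand-in (not COLOR's load_true_cardinality).
# Reads a file whose only content is one number and returns it as a Float64.
function read_single_cardinality(path::AbstractString)
    text = strip(read(path, String))   # drop the trailing newline and spaces
    return parse(Float64, text)
end

# Write a one-value cardinality file to a temporary location, then read it back.
path, io = mktemp()
write(io, "1234\n")
close(io)
value = read_single_cardinality(path)   # 1234.0
rm(path)
```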

Scripts

A variety of Julia scripts are provided in the Experiments/Scripts folder. Each script performs an experiment and then saves figures presenting the results to the Experiments/Results/Figures folder.

For example, to find how using different maximum cycle sizes in the summary affects the cardinality estimation, the max_cycle_size.jl script can be called from the main directory:

$ julia Experiments/Scripts/max_cycle_size.jl

The bash script run_submitted_experiments.sh in the Experiments folder runs all the experiments described in the submitted paper and then saves the corresponding figures. Figures that appear in the paper are saved under a file name matching their figure number (e.g. the figure presenting the effect of inference sampling on relative error is named fig_10.png). The experiments included in this bash script are:

degree_variance_exps.jl (figure 2)
comparison_exps.jl (figures 3, 4, 5, 6, 7, 8)
coloring_strategies.jl (figure 9)
construction_scaling.jl (figure 10)
proportion_updated.jl (figure 11)
max_cycle_size.jl (figure 12)
query_path_width_build.jl (figure 13)
max_inference_paths.jl (figure 14)

The script can be called from the main directory:

$ Experiments/run_submitted_experiments.sh
