Parquet demo project for the Workshop in the Course DIS. Benchmarks Parquet versus ORC, JSON and CSV

silvanheller/parquet-demo

parquet-demo

Parquet demo project for the Workshop in the Course DIS

What's this?

This project benchmarks the read and write performance of Parquet against other file formats (ORC, JSON and CSV). The goal is to show both cases where Parquet is superior.
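To illustrate the kind of measurement involved, here is a minimal Scala sketch that times a repeated write and read of a file and reports the median. It is not the code in main/EvaluationRunner: the object `TimingSketch`, the method `medianMillis` and the toy CSV data are hypothetical, and only the Java standard library is used, so no Parquet or ORC writer is involved.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical timing harness for illustration; not the project's EvaluationRunner.
object TimingSketch {

  /** Runs `action` `repetitions` times and returns the median wall-clock time in milliseconds. */
  def medianMillis(repetitions: Int)(action: () => Unit): Double = {
    require(repetitions > 0, "repetitions must be positive")
    // Collect raw durations in nanoseconds, sorted so the middle element is the median.
    val samples: IndexedSeq[Long] = (1 to repetitions).map { _ =>
      val start = System.nanoTime()
      action()
      System.nanoTime() - start
    }.sorted
    val mid = samples.length / 2
    val medianNanos =
      if (samples.length % 2 == 1) samples(mid).toDouble
      else (samples(mid - 1) + samples(mid)) / 2.0
    medianNanos / 1e6
  }

  def main(args: Array[String]): Unit = {
    // Toy data set: 10,000 rows of (id, value) rendered as CSV lines.
    val rows = (0 until 10000).map(i => s"$i,${i * 0.5}")
    val file: Path = Files.createTempFile("timing-sketch", ".csv")
    val writeMs = medianMillis(5) { () =>
      Files.write(file, rows.mkString("\n").getBytes(StandardCharsets.UTF_8))
    }
    val readMs = medianMillis(5) { () =>
      Files.readAllLines(file, StandardCharsets.UTF_8)
    }
    println(f"CSV write: $writeMs%.2f ms, read: $readMs%.2f ms")
    Files.delete(file)
  }
}
```

The median rather than the mean damps outliers such as garbage-collection pauses. For serious JVM measurements, warm-up iterations (or a harness such as JMH) also matter, because the first runs include JIT compilation.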

Runtime Dependencies

  • JVM

Development Dependencies

  • Install IntelliJ (which comes bundled with sbt)

For Plotting:

  • Install R
  • Install RStudio
  • Install ggplot2 with install.packages("ggplot2")
  • Install RStudioAPI with install.packages("rstudioapi")

Configuration / Running the Application

Set your desired benchmark values in main/EvaluationRunner.

Press 'Run' in IntelliJ. That's it.
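The README does not list which values EvaluationRunner exposes. The sketch below assumes typical ones (row counts, repetitions, formats); `ConfigSketch`, `BenchmarkSettings` and its fields are hypothetical names, meant only to show how a few settings expand into one benchmark run per format and data size.

```scala
// Hypothetical benchmark settings; the real values in main/EvaluationRunner may differ.
object ConfigSketch {
  sealed trait Format
  case object Parquet extends Format
  case object Orc extends Format
  case object Json extends Format
  case object Csv extends Format

  final case class BenchmarkSettings(rowCounts: Seq[Int], repetitions: Int, formats: Seq[Format]) {
    require(rowCounts.nonEmpty && rowCounts.forall(_ > 0), "row counts must be positive")
    require(repetitions > 0, "repetitions must be positive")
    require(formats.nonEmpty, "at least one format is needed")

    /** Every (format, rowCount) combination a runner would execute. */
    def runs: Seq[(Format, Int)] =
      for {
        f <- formats
        n <- rowCounts
      } yield (f, n)
  }

  def main(args: Array[String]): Unit = {
    val settings = BenchmarkSettings(
      rowCounts = Seq(1000, 100000),
      repetitions = 5,
      formats = Seq(Parquet, Orc, Json, Csv)
    )
    settings.runs.foreach { case (f, n) =>
      println(s"$f with $n rows, ${settings.repetitions} repetitions")
    }
  }
}
```

Keeping the settings in one validated value makes it harder to start a long benchmark with, say, zero repetitions.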

The generated data is stored in the workshop/ folder, and the benchmark results are written to the results/ folder.
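A minimal sketch of how one run's measurements could be saved as a timestamped TSV file under results/. The column layout (`format`, `operation`, `rows`, `millis`), the timestamp pattern and all names here are assumptions, not the project's actual output format; check an existing results_*.tsv before relying on it.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Hypothetical result writer; column names and timestamp pattern are assumptions.
object ResultsSketch {
  final case class Measurement(format: String, operation: String, rows: Int, millis: Double)

  private val stamp = DateTimeFormatter.ofPattern("yyyy-MM-dd_HH-mm-ss")

  /** Renders measurements as TSV with a header row and a trailing newline. */
  def toTsv(ms: Seq[Measurement]): String = {
    val header = "format\toperation\trows\tmillis"
    val lines = ms.map(m => s"${m.format}\t${m.operation}\t${m.rows}\t${m.millis}")
    (header +: lines).mkString("", "\n", "\n")
  }

  /** Writes results/results_<timestamp>.tsv under `baseDir` and returns its path. */
  def write(baseDir: Path, ms: Seq[Measurement], now: LocalDateTime = LocalDateTime.now()): Path = {
    val dir = baseDir.resolve("results")
    Files.createDirectories(dir)
    val file = dir.resolve(s"results_${now.format(stamp)}.tsv")
    Files.write(file, toTsv(ms).getBytes(StandardCharsets.UTF_8))
  }

  def main(args: Array[String]): Unit = {
    // Write into a temporary directory so the demo does not touch the real results/ folder.
    val base = Files.createTempDirectory("results-sketch")
    val out = write(base, Seq(
      Measurement("parquet", "write", 1000, 12.5),
      Measurement("csv", "write", 1000, 20.0)
    ))
    println(s"Wrote $out")
  }
}
```

A zero-padded, most-significant-first timestamp keeps the filenames sortable by name, which is convenient when picking the latest file for plotting.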

To create your plots, change the filename in plots/plot.r to the timestamp of your results/results_*.tsv file, then run the script.
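Instead of looking up the newest timestamp by hand, a small helper can list results/ and print the latest results_*.tsv name to paste into plots/plot.r. This hypothetical sketch assumes the timestamps in the filenames sort lexicographically in chronological order (true for zero-padded year-month-day-time patterns); if they do not, comparing `File.lastModified` would be the safer choice.

```scala
import java.io.File

// Hypothetical helper: prints the newest results_*.tsv name so it can be pasted into plots/plot.r.
object LatestResultsSketch {
  // Uppercase name so it is used as an extractor in patterns; Regex matches the whole string.
  private val ResultsFile = """results_(.+)\.tsv""".r

  /** Picks the matching file whose name sorts last; assumes timestamps sort lexicographically. */
  def latest(names: Seq[String]): Option[String] =
    names.collect { case n @ ResultsFile(_) => n }.sorted.lastOption

  def main(args: Array[String]): Unit = {
    val dir = new File(if (args.nonEmpty) args(0) else "results")
    // File.list() returns null if the directory does not exist.
    val names = Option(dir.list()).map(_.toSeq).getOrElse(Seq.empty)
    latest(names) match {
      case Some(n) => println(n)
      case None    => println(s"No results_*.tsv files in ${dir.getPath}")
    }
  }
}
```

Run it with the results directory as the optional first argument; without one it looks in results/ relative to the working directory.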
