Update README
chrispypatt committed Apr 23, 2019
1 parent 4a62156 commit a024ce2
Showing 1 changed file with 18 additions and 21 deletions.
README.md: 39 changes (18 additions, 21 deletions)
@@ -1,31 +1,28 @@
# RAPIDS-Groupby
Repository for the EE 5351 Applied Parallel Programming final project. This project implements the RAPIDS Groupby function in CUDA.
Repository for the EE 5351 Applied Parallel Programming final project on sort-based Groupby, now being updated for the EE 5355 Algorithmic Techniques for Scalable Many-core Computing final project on hash-based Groupby. This project implements the RAPIDS Groupby function in CUDA.

Team members: Aaron Nightingale, Christopher Patterson, Jersin Nguetio, Menglu Liang, Tonglin Chen
EE 5351 team members: Aaron Nightingale, Christopher Patterson, Jersin Nguetio, Menglu Liang, Tonglin Chen

The program has been tested with nvcc 8.0 on a GTX 1080 GPU.

To build the program, run 'make' in the root folder of the repository.

Command-line usage:

./groupby

will use the default settings: 100,000 data entries, 2 key columns, 3 value columns, and 4 distinct keys per column

./groupby <num_rows>

will use the default column settings and treat the argument as the number of data entries

./groupby <num_rows> <key_cols> <val_cols>

will use num_rows as the number of data entries, key_cols as the number of key columns, and val_cols as the number of value columns,
while keeping 4 distinct keys per column

./groupby <num_rows> <key_cols> <val_cols> <distinct_keys_per_col>

will use all four parameters to populate the data

Command-line usage for sort-based GroupBy (a sketch of the approach is shown after the commands):
```
make
./groupby # Data Entries: 100k, key_columns: 2, row_columns: 3, unique keys per column: 4
./groupby <num_rows> # Data Entries: num_rows, key_columns: 2, row_columns: 3, unique keys per column: 4
./groupby <num_rows> <key_cols> <val_cols> # Data Entries: num_rows, key_columns: key_cols, row_columns: val_cols, unique keys per column: 4
./groupby <num_rows> <key_cols> <val_cols> <distinct_keys_per_col> # Data Entries: num_rows, key_columns: key_cols, row_columns: val_cols, unique keys per column: distinct_keys_per_col
```
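For background, the sketch below shows the general idea behind a sort-based GroupBy: sort the rows by key so that equal keys become contiguous, then run a segmented reduction. It uses Thrust on a single key column and a single value column with a SUM aggregate, and is only an illustration under those simplifying assumptions, not the multi-column kernels used in this repository.
```
// Minimal sketch of a sort-based GroupBy (single key column, single value
// column, SUM aggregate) using Thrust. Illustration only -- the kernels in
// this repository handle multiple key/value columns and more aggregates.
#include <cstdio>
#include <vector>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>

int main() {
    std::vector<int>   h_keys{2, 0, 1, 0, 2, 1};
    std::vector<float> h_vals{1.f, 2.f, 3.f, 4.f, 5.f, 6.f};
    thrust::device_vector<int>   keys(h_keys.begin(), h_keys.end());
    thrust::device_vector<float> vals(h_vals.begin(), h_vals.end());

    // Step 1: sort rows by key so that rows of the same group become contiguous.
    thrust::sort_by_key(keys.begin(), keys.end(), vals.begin());

    // Step 2: segmented reduction -- one output row per distinct key.
    thrust::device_vector<int>   out_keys(keys.size());
    thrust::device_vector<float> out_sums(vals.size());
    auto ends = thrust::reduce_by_key(keys.begin(), keys.end(), vals.begin(),
                                      out_keys.begin(), out_sums.begin());
    int num_groups = static_cast<int>(ends.first - out_keys.begin());

    for (int i = 0; i < num_groups; ++i)
        std::printf("key %d -> sum %.1f\n", (int)out_keys[i], (float)out_sums[i]);
    return 0;  // expected output: key 0 -> 6.0, key 1 -> 9.0, key 2 -> 6.0
}
```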
Command-line usage for hash-based GroupBy (a sketch of the approach is shown after the commands):
```
make groupby_hash
./groupby_hash # Data Entries: 100k, key_columns: 2, row_columns: 3, unique keys per column: 4
./groupby_hash <num_rows> # Data Entries: num_rows, key_columns: 2, row_columns: 3, unique keys per column: 4
./groupby_hash <num_rows> <key_cols> <val_cols> # Data Entries: num_rows, key_columns: key_cols, row_columns: val_cols, unique keys per column: 4
./groupby_hash <num_rows> <key_cols> <val_cols> <distinct_keys_per_col> # Data Entries: num_rows, key_columns: key_cols, row_columns: val_cols, unique keys per column: distinct_keys_per_col
```
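For background, the sketch below shows the core idea behind a hash-based GroupBy: each thread hashes its row's key, claims a slot in a device hash table with atomicCAS (probing linearly on collisions), and accumulates its value with atomicAdd. It uses a single key column, a SUM aggregate, and made-up table sizes and names; it is not the actual code behind groupby_hash.
```
// Minimal sketch of a hash-based GroupBy (single key column, SUM aggregate).
// The table size, hash function, and names here are assumptions for the
// illustration only.
#include <cstdio>
#include <cuda_runtime.h>

constexpr int TABLE_SIZE = 1024;   // assumed capacity; must exceed the number of groups
constexpr int EMPTY_KEY  = -1;     // sentinel for an unused slot

__global__ void groupby_hash_sum(const int* keys, const float* vals, int n,
                                 int* table_keys, float* table_sums) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    int key = keys[i];
    unsigned slot = static_cast<unsigned>(key) % TABLE_SIZE;  // trivial hash for the sketch
    while (true) {
        int prev = atomicCAS(&table_keys[slot], EMPTY_KEY, key);
        if (prev == EMPTY_KEY || prev == key) {   // slot claimed, or it already holds our key
            atomicAdd(&table_sums[slot], vals[i]);
            return;
        }
        slot = (slot + 1) % TABLE_SIZE;           // collision with a different key: probe on
    }
}

int main() {
    const int n = 6;
    int   h_keys[n] = {2, 0, 1, 0, 2, 1};
    float h_vals[n] = {1, 2, 3, 4, 5, 6};

    int *d_keys, *d_tkeys; float *d_vals, *d_sums;
    cudaMalloc(&d_keys,  n * sizeof(int));
    cudaMalloc(&d_vals,  n * sizeof(float));
    cudaMalloc(&d_tkeys, TABLE_SIZE * sizeof(int));
    cudaMalloc(&d_sums,  TABLE_SIZE * sizeof(float));
    cudaMemcpy(d_keys, h_keys, n * sizeof(int),   cudaMemcpyHostToDevice);
    cudaMemcpy(d_vals, h_vals, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_tkeys, 0xFF, TABLE_SIZE * sizeof(int));    // every byte 0xFF -> -1 == EMPTY_KEY
    cudaMemset(d_sums,  0,    TABLE_SIZE * sizeof(float));

    groupby_hash_sum<<<(n + 255) / 256, 256>>>(d_keys, d_vals, n, d_tkeys, d_sums);

    int h_tkeys[TABLE_SIZE]; float h_sums[TABLE_SIZE];
    cudaMemcpy(h_tkeys, d_tkeys, sizeof(h_tkeys), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_sums,  d_sums,  sizeof(h_sums),  cudaMemcpyDeviceToHost);
    for (int s = 0; s < TABLE_SIZE; ++s)
        if (h_tkeys[s] != EMPTY_KEY)
            std::printf("key %d -> sum %.1f\n", h_tkeys[s], h_sums[s]);
    return 0;  // expected: key 0 -> 6.0, key 1 -> 9.0, key 2 -> 6.0
}
```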
Note: if each key column has m distinct keys, n key columns will generate m^n distinct key combinations (for example, the default of 2 key columns with 4 distinct keys each yields 4^2 = 16 groups).

The program will populate random data, compute the GroupBy on both the CPU and the GPU, and then validate the results.
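As an illustration of what that validation step can look like, the sketch below builds a CPU reference result (SUM per key) with a std::map and compares it to a GPU result within a small tolerance. The function name, containers, and tolerance are assumptions for the example, not the repository's actual validation code.
```
// Minimal sketch of a CPU-side check: build a reference SUM-per-key result on
// the host and compare it to a GPU result within a tolerance. Names and
// containers are illustrative only.
#include <cmath>
#include <cstdio>
#include <map>
#include <vector>

bool validate_sums(const std::vector<int>& keys, const std::vector<float>& vals,
                   const std::map<int, float>& gpu_result, float tol = 1e-3f) {
    std::map<int, float> cpu_result;                 // reference GroupBy on the CPU
    for (size_t i = 0; i < keys.size(); ++i)
        cpu_result[keys[i]] += vals[i];

    if (cpu_result.size() != gpu_result.size()) return false;
    for (const auto& kv : cpu_result) {              // every group must match within tol
        auto it = gpu_result.find(kv.first);
        if (it == gpu_result.end() || std::fabs(it->second - kv.second) > tol)
            return false;
    }
    return true;
}

int main() {
    std::vector<int>   keys{2, 0, 1, 0, 2, 1};
    std::vector<float> vals{1, 2, 3, 4, 5, 6};
    std::map<int, float> gpu{{0, 6.f}, {1, 9.f}, {2, 6.f}};  // pretend GPU output
    std::printf("validation %s\n", validate_sums(keys, vals, gpu) ? "PASSED" : "FAILED");
    return 0;
}
```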