Commit c1bb64f

reduce data sampling rate from 10% to 3%
1 parent d8b71a1 commit c1bb64f

2 files changed: 3 additions, 3 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -20,6 +20,6 @@ $ cd data
 $ snakemake -s download.smk -j1
 ```

-*NOTE: Only 10% of the entire OAS sequences were downloaded for now due to space and computational cost.*
+*NOTE: Only 3% of the entire OAS sequences were downloaded for now due to space and computational cost.*

 *TODO: Hint for total # of sequences, total size*

data/download.smk

Lines changed: 2 additions & 2 deletions
@@ -2,8 +2,8 @@ import pandas as pd

 manifest = pd.read_csv('manifest_230324.csv')

-# Randomly sample 10% of the dataset.
-manifest = manifest.sample(frac=0.1, random_state=42)
+# Randomly sample 3% of the dataset.
+manifest = manifest.sample(frac=0.03, random_state=42)

 f2type = {r.filename:r.seq_type for r in manifest.to_records()}
 f2study = {r.filename:r.study for r in manifest.to_records()}
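
To see what the `frac` change in `data/download.smk` does in practice, here is a minimal sketch that uses a synthetic manifest. The real `manifest_230324.csv` is not shown in this commit, so the row count, file names, and column values below are assumptions made up for illustration; only the `sample(frac=..., random_state=42)` call mirrors the diff. The helper `sample_manifest` is hypothetical and not part of the repository.

```python
import pandas as pd

# Hypothetical stand-in for manifest_230324.csv: 1,000 rows with the
# columns the Snakefile reads (filename, seq_type, study).
manifest = pd.DataFrame({
    "filename": [f"unit_{i:04d}.csv.gz" for i in range(1000)],
    "seq_type": ["paired" if i % 5 == 0 else "unpaired" for i in range(1000)],
    "study": [f"study_{i % 20}" for i in range(1000)],
})


def sample_manifest(df, frac, seed=42):
    """Return a reproducible random subset of the manifest rows (hypothetical helper)."""
    return df.sample(frac=frac, random_state=seed)


old = sample_manifest(manifest, 0.1)     # sampling rate before this commit
new = sample_manifest(manifest, 0.03)    # sampling rate after this commit
again = sample_manifest(manifest, 0.03)  # same seed, so the same rows are drawn

print(len(old), len(new))
print(new.index.equals(again.index))

# Same filename -> seq_type lookup as in the Snakefile, built with zip
# instead of iterating over to_records().
f2type = dict(zip(new["filename"], new["seq_type"]))
```

Because `random_state` is fixed, rerunning the workflow selects the same files each time, so the download set changes only when `frac`, the seed, or the manifest itself changes.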
