This repo holds the command-line application that handles the informatics for a high-throughput 16S sequencing run.
To begin using this tool, please clone it with:
user@machine:~$ git clone ssh://[email protected]:7999/~jdmccauley/htp_16s.git
Please install the following before trying to run this software, since they are required:
Note that Python 3.11 can (and probably should) be installed and managed with pyenv.
Build before running, testing, or developing.
user@machine:~$ cd htp_16s
user@machine:htp_16s$ poetry install
user@machine:htp_16s$ poetry build
user@machine:htp_16s$ pipx install dist/htp_16s-0.1.0-py3-none-any.whl
user@machine:~$ htp_16s <mode> <input dir path> <optional output dir path>
Where mode is currently either index or pool. Note that A01 is reserved by default, but this can be overridden with the option --nostandard.
For index mode, place a .csv file with plate_map somewhere in its name in an input directory. The file must have three columns, PLATE_NAME, WELL_LOCATION and SAMPLE_NAME, like the following:
| PLATE_NAME | WELL_LOCATION | SAMPLE_NAME |
|---|---|---|
| my_plate_1 | A03 | my_sample |
| my_plate_1 | A05 | your_sample |
Then run with
user@machine:~$ htp_16s index <input dir path> <optional output dir path>
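If you would rather generate the plate map for index mode programmatically than by hand, the sketch below shows one possible way to do it with the Python standard library. The column names and the plate_map naming rule come from this README; the function name `write_plate_map` and its arguments are hypothetical and are not part of htp_16s.

```python
import csv
from pathlib import Path

# Column names taken from this README; everything else is illustrative.
PLATE_MAP_COLUMNS = ["PLATE_NAME", "WELL_LOCATION", "SAMPLE_NAME"]


def write_plate_map(path, rows):
    """Write (plate_name, well_location, sample_name) tuples to a plate map CSV.

    Hypothetical helper, not part of htp_16s.
    """
    path = Path(path)
    # The README requires "plate_map" to appear somewhere in the file name.
    if "plate_map" not in path.name:
        raise ValueError("file name must contain 'plate_map'")
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(PLATE_MAP_COLUMNS)
        writer.writerows(rows)
    return path
```

For example, `write_plate_map("my_input_dir/my_plate_1_plate_map.csv", [("my_plate_1", "A03", "my_sample"), ("my_plate_1", "A05", "your_sample")])` would produce the table shown above for index mode.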
For pool mode, put the plate_map csv and all End RFU csvs in an input directory. Then run with
user@machine:~$ htp_16s pool <input dir path> <optional output dir path>
When running multiple plates, there are two methods for index mode and one method for pool mode.
Note that the plate_maps in both cases must have UNIQUE plate names per plate.
For index mode, you can either provide one plate_map file with multiple UNIQUE values in the PLATE_NAME column, or you can provide one plate_map file for each plate in a subdirectory in your input directory (where each plate file still has UNIQUE PLATE_NAME values), like so:
my_input_dir/
├── plate_1
│   └── plate_1_plate_map.csv
└── plate_2
    └── plate_2_plate_map.csv
and your plate_maps should have their respective plate names in the PLATE_NAME column:
plate_1_plate_map.csv:
| PLATE_NAME | WELL_LOCATION | SAMPLE_NAME |
|---|---|---|
| my_plate_1 | A03 | my_sample |
| my_plate_1 | A05 | your_sample |
plate_2_plate_map.csv:
| PLATE_NAME | WELL_LOCATION | SAMPLE_NAME |
|---|---|---|
| my_plate_2 | A03 | my_sample |
| my_plate_2 | A05 | your_sample |
You'll then get a single MiSeq SampleSheet and a single file of primer transfer instructions (covering all plates).
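Because every plate needs a UNIQUE PLATE_NAME, it can help to check the input directory before running. The following is a minimal sketch, not htp_16s's own validation: it assumes that a plate name appearing in more than one plate_map file counts as a collision, and the function name `find_shared_plate_names` is hypothetical.

```python
import csv
from collections import defaultdict
from pathlib import Path


def find_shared_plate_names(input_dir):
    """Return {plate_name: [files]} for names used in more than one plate_map file.

    Hypothetical pre-flight check, not part of htp_16s.
    """
    root = Path(input_dir)
    files_by_plate = defaultdict(set)
    # Look in the input directory and all of its subdirectories.
    for csv_path in sorted(root.rglob("*plate_map*.csv")):
        # utf-8-sig also tolerates the byte-order mark some spreadsheet tools add.
        with csv_path.open(newline="", encoding="utf-8-sig") as handle:
            for row in csv.DictReader(handle):
                plate_name = row.get("PLATE_NAME")
                if plate_name:
                    files_by_plate[plate_name].add(csv_path.relative_to(root).as_posix())
    return {
        name: sorted(files)
        for name, files in files_by_plate.items()
        if len(files) > 1
    }
```

An empty dictionary means no plate name is shared between files. Any entries list the files that reuse the same name.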
For pool mode, you must organize your input directory so that each plate has its own subdirectory containing its plate map and RFU files, like so:
my_input_dir/
├── plate_1
│   ├── End Point Results 1.csv
│   ├── End Point Results 2.csv
│   ├── End Point Results 3.csv
│   ├── End Point Results 4.csv
│   └── plate_1_plate_map.csv
└── plate_2
    ├── End Point Results 1.csv
    ├── End Point Results 2.csv
    ├── End Point Results 3.csv
    ├── End Point Results 4.csv
    └── plate_2_plate_map.csv
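The layout above can be sanity-checked before running pool mode. The sketch below is only an illustration under stated assumptions: that each plate subdirectory holds exactly one plate_map csv, and that every other csv in it is an End RFU file. The function name `check_pool_layout` is hypothetical, and htp_16s may apply different rules.

```python
from pathlib import Path


def check_pool_layout(input_dir):
    """Return a list of problems found in a multi-plate pool input directory.

    Hypothetical pre-flight check, not part of htp_16s.
    """
    problems = []
    plate_dirs = sorted(p for p in Path(input_dir).iterdir() if p.is_dir())
    if not plate_dirs:
        problems.append("no plate subdirectories found")
    for plate_dir in plate_dirs:
        csvs = [p for p in plate_dir.iterdir() if p.suffix.lower() == ".csv"]
        plate_maps = [p for p in csvs if "plate_map" in p.name]
        # Assumption: any csv that is not a plate map is an End RFU export.
        rfu_files = [p for p in csvs if "plate_map" not in p.name]
        if len(plate_maps) != 1:
            problems.append(
                f"{plate_dir.name}: expected 1 plate_map csv, found {len(plate_maps)}"
            )
        if not rfu_files:
            problems.append(f"{plate_dir.name}: no End RFU csv files found")
    return problems
```

An empty list means the directory matches the layout shown above. Otherwise, each string names the subdirectory and what appears to be missing.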
After running pool, you'll get an output directory whose subdirectories are named after your input subdirectories, each containing one set of Biomek instructions, like so:
20230725_yv_m6cmq_pool_htp_16s
├── plate_1
│   └── plate_1_pooling_biomek_instructions.csv
└── plate_2
    └── plate_2_pooling_biomek_instructions.csv
If you get stuck, view the help message with htp_16s --help.
Otherwise, reach out to Josh at [email protected].
Run the tests with:
user@machine:htp_16s$ poetry run pytest
And test the main script with:
user@machine:htp_16s$ poetry run htp_16s <args>