A parallel DICOM crawler for extracting specific metadata (such as patient ID, image laterality, etc.) and writing the results to a CSV file
A CLI tool for crawling DICOM and Crystal-Eye files and extracting metadata to a CSV file
Usage: open-sight [OPTIONS] <FOLDER_PATHS>...
Arguments:
<FOLDER_PATHS>...
Options:
-c, --csv-out <CSV_OUT> [default: open_sight_results.csv]
-n, --num-jobs <NUM_JOBS> [default: 1]
-o, --overwrite
-b, --batch-size <BATCH_SIZE> [default: 50]
-h, --help Print help
-V, --version Print version
Copy DICOM files and Crystal-Eye files based on patient IDs
Usage: copy_src [OPTIONS] <PATIENT_ID_FILE> <OUTPUT_DIRECTORY>
Arguments:
<PATIENT_ID_FILE> File containing patient IDs
<OUTPUT_DIRECTORY> Directory to store copied files
Options:
-o, --overwrite Whether to overwrite existing files
-d, --database <DATABASE> Database file to use [default: open_sight.duckdb]
-h, --help Print help
Run in a terminal:
duckdb open_sight.duckdb
Then run these commands in the duckdb terminal, assuming all.csv was the file created by open-sight:
CREATE TABLE open_sight (
patient_id VARCHAR,
patient_name VARCHAR,
laterality VARCHAR,
sex VARCHAR,
dob DATE,
scan_date DATE,
modality VARCHAR,
manufacturer VARCHAR,
series_description VARCHAR,
modified TIMESTAMP,
file_size BIGINT,
file_path VARCHAR PRIMARY KEY
);
CREATE UNIQUE INDEX idx_file_path ON open_sight ("file_path");
INSERT INTO open_sight
SELECT DISTINCT *
FROM read_csv_auto('all.csv') AS csv
WHERE NOT EXISTS (
SELECT 1
FROM open_sight
WHERE open_sight.file_path = csv.file_path
);
To update an existing database with a new CSV, run only:
INSERT INTO open_sight
SELECT DISTINCT *
FROM read_csv_auto('all.csv') AS csv
WHERE NOT EXISTS (
SELECT 1
FROM open_sight
WHERE open_sight.file_path = csv.file_path
);
-- To get the new totals
select count(*) from open_sight;
-- Some basic table analysis
SELECT * FROM information_schema.tables WHERE table_schema = 'main';
SELECT * FROM duckdb_indexes();
SELECT * FROM duckdb_constraints();
SELECT * FROM duckdb_tables();
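The incremental update can also be scripted rather than typed into the duckdb terminal. The following is a minimal sketch, not part of open-sight: `build_import_sql` is a hypothetical helper that prints the same `INSERT ... WHERE NOT EXISTS` statement for a given CSV file name, so it can be piped into the duckdb CLI. It assumes the table already exists and that the file name contains no single quotes.

```shell
# Hypothetical helper (not shipped with open-sight): print the
# incremental-import SQL for the CSV file given as the first argument.
build_import_sql() {
    csv_file="$1"
    cat <<EOF
INSERT INTO open_sight
SELECT DISTINCT *
FROM read_csv_auto('${csv_file}') AS csv
WHERE NOT EXISTS (
    SELECT 1
    FROM open_sight
    WHERE open_sight.file_path = csv.file_path
);
SELECT count(*) FROM open_sight;
EOF
}

# Usage (requires the duckdb CLI on PATH):
# build_import_sql all.csv | duckdb open_sight.duckdb
```

Generating the SQL as text keeps the statement identical to the one shown above and makes it easy to inspect before it touches the database.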
Crawling DICOM files (or proprietary files, if crystal-eye is present) and saving the results to a CSV file
- _input_folder_: a folder containing DICOM files in any folder structure, including subfolders.
- _csv_file_: the CSV file where the results will be saved; if it already contains results from a previous run, files that were already parsed are skipped.
open-sight _input_folder_/* -c _csv_file_ 2>&1 | tee output.log
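When reusing a CSV from a previous run, it can be useful to check how many rows a new crawl added. The sketch below is only an illustration: `csv_row_count` is a hypothetical helper, and it assumes the CSV has a single header row, one record per line, and a trailing newline.

```shell
# Hypothetical helper: count the data rows of a CSV file,
# skipping the header line.
csv_row_count() {
    tail -n +2 "$1" | wc -l | tr -d ' '
}

# Usage around a crawl (results.csv is a placeholder name):
# before=$(csv_row_count results.csv)
# open-sight _input_folder_/* -c results.csv
# after=$(csv_row_count results.csv)
# echo "rows added: $((after - before))"
```

The resulting count can be compared with `select count(*) from open_sight;` after importing the CSV into the database.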
- patient_ids.txt: a plain-text file containing one patient ID per row.
- _output_folder_: the folder the files will be copied to.
copy_src patient_ids.txt /_output_folder_ -d open_sight.duckdb
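A patient ID file can be derived from a CSV produced by open-sight. The following is a rough sketch under stated assumptions: `extract_patient_ids` is a hypothetical helper, the CSV is assumed to have a header row with a `patient_id` column, and fields are assumed to contain no embedded commas (a simple comma split does not handle quoted fields).

```shell
# Hypothetical helper: print the unique, non-empty values of the
# patient_id column of a CSV file, one per line, sorted.
extract_patient_ids() {
    awk -F',' '
        NR == 1 {
            # Locate the patient_id column by its header name.
            for (i = 1; i <= NF; i++) {
                h = $i
                gsub(/"/, "", h)
                if (h == "patient_id") col = i
            }
            if (!col) {
                print "patient_id column not found" > "/dev/stderr"
                exit 1
            }
            next
        }
        $col != "" { print $col }
    ' "$1" | sort -u
}

# Usage:
# extract_patient_ids all.csv > patient_ids.txt
# copy_src patient_ids.txt /_output_folder_ -d open_sight.duckdb
```

Looking the column up by name rather than by position keeps the sketch independent of the column order in the CSV.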
Bump the version number by running cargo v [part], where [part] is major, minor, or patch, depending on which part of the version number you want to bump.
cargo install cargo-v
# commit
cargo v patch -y #
# push
cargo build --release -j 10
git push origin --tags
- 0.3.4
  - Sanitised version: removed all unnecessary files and information to safeguard privacy
- 0.3.3
  - Renamed to copy_src and copy_src_csv
- 0.3.2
  - Updated copy_dcms to use the updated database format
- 0.3.1
  - Fixed a bug so that DCM is checked first, before falling back to crystal-eye
- 0.3.0
  - Updated duckdb to v1.0.0
  - Ability to reuse the CSV to skip already processed files
- 0.2.1
  - Extended support to all extensions handled by crystal-eye: e2e, fda and sdb
- 0.2.0
  - Added E2E support via crystal-eye
- 0.1.6
  - Added copy_dcms to replace find_patid and copy_dcms_csv
- 0.1.5
  - Changed file_size to u64 type, representing bytes
- 0.1.4
  - Renamed the table headers to lowercase with underscores instead of spaces
- 0.1.3
  - Introduced find_patid
  - Refactored code to use helpers.rs
- 0.1.2
  - Reverted path::absolute, keeping the Windows file path style
- 0.1.1
  - Able to use glob
  - Retry routine for failed DCM during parsing
  - Using experimental path::absolute to properly render Windows full path strings