Add journal metadata table for recording impact factor.
Showing 17 changed files with 22,559 additions and 496 deletions.
@@ -1,34 +0,0 @@
## Requirements

- Python 3.11+
- json2parquet

```bash
# macOS
brew install domoritz/homebrew-tap/json2parquet

# Linux
cargo install json2parquet
```

- Required Python packages: `pip install click duckdb`

## Prepare additional data for each entity and relation

### Compound

Get additional data for each compound from [DrugBank](https://www.drugbank.ca/). You might need to request access to the DrugBank data. Once you have access, download the DrugBank XML file and save it to the `data/drugbank` directory. The commands below assume the file is named `drugbank_5.1_2024-01-03.xml`.

```bash
# Convert the DrugBank XML file to TSV
python3 data/drugbank.py tojson --input-file data/drugbank/drugbank_5.1_2024-01-03.xml --output-dir data/drugbank --format tsv && zip data/drugbank/drugbank_5.1_2024-01-03.tsv.zip data/drugbank/drugbank_5.1_2024-01-03.tsv

# Convert the DrugBank XML file to JSON
python3 data/drugbank.py tojson --input-file data/drugbank/drugbank_5.1_2024-01-03.xml --output-dir data/drugbank

# Convert the DrugBank XML file to JSON Lines, then convert the JSON Lines file to Parquet
python3 data/drugbank.py tojson --input-file data/drugbank/drugbank_5.1_2024-01-03.xml --output-dir data/drugbank --format linejson
json2parquet data/drugbank/drugbank_5.1_2024-01-03.jsonl data/drugbank/drugbank_5.1_2024-01-03.parquet
```

### Gene
@@ -0,0 +1,34 @@
## Requirements

- Python 3.11+
- json2parquet

```bash
# macOS
brew install domoritz/homebrew-tap/json2parquet

# Linux
cargo install json2parquet
```

- Required Python packages: `pip install click duckdb`

## Prepare additional data for each entity and relation

### Compound

Get additional data for each compound from [DrugBank](https://www.drugbank.ca/). You might need to request access to the DrugBank data. Once you have access, download the DrugBank XML file and save it to the `data/drugbank` directory. The commands below assume the file is named `drugbank_5.1_2024-01-03.xml`.

```bash
# Convert the DrugBank XML file to TSV
python3 data/drugbank.py tojson --input-file data/drugbank/drugbank_5.1_2024-01-03.xml --output-dir data/drugbank --format tsv && zip data/drugbank/drugbank_5.1_2024-01-03.tsv.zip data/drugbank/drugbank_5.1_2024-01-03.tsv

# Convert the DrugBank XML file to JSON
python3 data/drugbank.py tojson --input-file data/drugbank/drugbank_5.1_2024-01-03.xml --output-dir data/drugbank

# Convert the DrugBank XML file to JSON Lines, then convert the JSON Lines file to Parquet
python3 data/drugbank.py tojson --input-file data/drugbank/drugbank_5.1_2024-01-03.xml --output-dir data/drugbank --format linejson
json2parquet data/drugbank/drugbank_5.1_2024-01-03.jsonl data/drugbank/drugbank_5.1_2024-01-03.parquet
```

### Gene
File renamed without changes.
File renamed without changes.
@@ -0,0 +1,2 @@
-- Drop the journal metadata table when rolling back the migration.
DROP TABLE IF EXISTS biomedgps_journal_metadata;
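
Because the rollback uses `DROP TABLE IF EXISTS`, it should be safe to run more than once. The following is a minimal sketch of that property, not the project's migration tooling: it assumes an in-memory SQLite database (chosen only because it ships with Python) and a one-column stand-in for the real table. The helper names `table_exists` and `roundtrip` are hypothetical.

```python
import sqlite3

TABLE = "biomedgps_journal_metadata"

# Stand-in "up" statement with a single column (assumption: the real
# migration defines many more columns and targets a different database).
UP = f"CREATE TABLE IF NOT EXISTS {TABLE} (journal_name VARCHAR(255) NOT NULL)"
# The "down" statement, as in the rollback script above.
DOWN = f"DROP TABLE IF EXISTS {TABLE}"


def table_exists(conn, name):
    """Return True if a table with this name is present in the SQLite catalog."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


def roundtrip():
    """Apply the migration, roll it back twice, and record table presence after each step."""
    conn = sqlite3.connect(":memory:")
    states = []
    conn.execute(UP)
    states.append(table_exists(conn, TABLE))
    conn.execute(DOWN)
    states.append(table_exists(conn, TABLE))
    conn.execute(DOWN)  # a second rollback is a no-op rather than an error
    states.append(table_exists(conn, TABLE))
    conn.close()
    return states


if __name__ == "__main__":
    print(roundtrip())
```

Without `IF EXISTS`, the second `DROP TABLE` would raise an error, which is why the guard matters for a rollback that may be retried.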
@@ -0,0 +1,18 @@
-- The biomedgps_journal_metadata table stores per-journal metadata: names, ISSNs, impact factors, and JCR category ranking. Uniqueness is enforced once, by the named constraints below.

CREATE TABLE IF NOT EXISTS biomedgps_journal_metadata (
    journal_name VARCHAR(255) NOT NULL, -- The full name of the journal
    abbr_name VARCHAR(255) NOT NULL, -- The abbreviated name of the journal
    issn VARCHAR(32) NOT NULL, -- The print ISSN of the journal
    eissn VARCHAR(32) NOT NULL, -- The electronic ISSN of the journal
    impact_factor DECIMAL(6, 3), -- The impact factor of the journal
    impact_factor_5_year DECIMAL(6, 3), -- The 5-year impact factor of the journal
    category VARCHAR(32), -- The category of the journal, such as Medicine, Biology, etc.
    jcr_quartile VARCHAR(8), -- Journal Citation Reports (JCR) quartile, such as Q1, Q2, etc.
    rank INTEGER, -- The rank of the journal within its category
    total_num_of_journals INTEGER, -- The total number of journals in the category
    CONSTRAINT biomedgps_journal_metadata_journal_name_uniq_key UNIQUE (journal_name),
    CONSTRAINT biomedgps_journal_metadata_abbr_name_uniq_key UNIQUE (abbr_name),
    CONSTRAINT biomedgps_journal_metadata_issn_uniq_key UNIQUE (issn),
    CONSTRAINT biomedgps_journal_metadata_eissn_uniq_key UNIQUE (eissn)
);
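
To illustrate what the `NOT NULL` and `UNIQUE` constraints in this table accept and reject, here is a small sketch. It is an assumption-laden adaptation, not the project's code: the schema is loaded into an in-memory SQLite database rather than the migration's real target, `rank` is quoted defensively, and every journal name and ISSN below is made up for the example. The helpers `try_insert` and `demo` are hypothetical.

```python
import sqlite3

# SQLite adaptation of the migration above (assumption: SQLite is used only
# because it ships with Python; the column list mirrors the migration).
SCHEMA = """
CREATE TABLE IF NOT EXISTS biomedgps_journal_metadata (
    journal_name VARCHAR(255) NOT NULL,
    abbr_name VARCHAR(255) NOT NULL,
    issn VARCHAR(32) NOT NULL,
    eissn VARCHAR(32) NOT NULL,
    impact_factor DECIMAL(6, 3),
    impact_factor_5_year DECIMAL(6, 3),
    category VARCHAR(32),
    jcr_quartile VARCHAR(8),
    "rank" INTEGER,
    total_num_of_journals INTEGER,
    CONSTRAINT biomedgps_journal_metadata_journal_name_uniq_key UNIQUE (journal_name),
    CONSTRAINT biomedgps_journal_metadata_abbr_name_uniq_key UNIQUE (abbr_name),
    CONSTRAINT biomedgps_journal_metadata_issn_uniq_key UNIQUE (issn),
    CONSTRAINT biomedgps_journal_metadata_eissn_uniq_key UNIQUE (eissn)
);
"""

INSERT = """
INSERT INTO biomedgps_journal_metadata
    (journal_name, abbr_name, issn, eissn, impact_factor, jcr_quartile)
VALUES (?, ?, ?, ?, ?, ?)
"""


def try_insert(conn, row):
    """Insert one row; return True on success, False on a constraint violation."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(INSERT, row)
        return True
    except sqlite3.IntegrityError:
        return False


def demo():
    """Insert three made-up rows and report which ones the constraints accept."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = [
        # A complete row: accepted.
        ("Example Journal of Biomedicine", "Ex. J. Biomed.", "0000-0001", "0000-0002", 4.25, "Q1"),
        # Reuses the first row's print ISSN: rejected by the UNIQUE constraint.
        ("Another Example Journal", "Another Ex. J.", "0000-0001", "0000-0003", 2.0, "Q2"),
        # No electronic ISSN: rejected by NOT NULL on eissn.
        ("Third Example Journal", "Third Ex. J.", "0000-0004", None, None, None),
    ]
    results = [try_insert(conn, row) for row in rows]
    count = conn.execute("SELECT COUNT(*) FROM biomedgps_journal_metadata").fetchone()[0]
    conn.close()
    return results, count


if __name__ == "__main__":
    print(demo())
```

The third row shows a consequence worth keeping in mind when loading data: with `NOT NULL` on both `issn` and `eissn`, a journal that has only one of the two identifiers cannot be stored without a placeholder value.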