Updated May 2025.
- Fixes major error in the `vocab_rxnorm_*` and `high_confidence` tables.
- Adds a CLI for convenience when releasing (`build-zip --version v3.1.0`)
Summary
Because we prune the database for orphans, this update reduces the number of items in the database. However, we believe that this update is significantly more trustworthy than the previous one.
Type | v3.0.0 | v3.1.0 |
---|---|---|
Products | 51,460 | 42,268 |
Adverse effects | 7,134,660 | 5,472,613 |
Ingredients | 10,794 | 1,955 |
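The relative size of each reduction follows directly from the table. A minimal sketch of that arithmetic, using only the counts above (the variable and function names are illustrative, not part of the release):

```python
# Counts copied from the table above: (v3.0.0, v3.1.0).
counts = {
    "Products": (51_460, 42_268),
    "Adverse effects": (7_134_660, 5_472_613),
    "Ingredients": (10_794, 1_955),
}


def percent_reduction(before, after):
    """Relative decrease from `before` to `after`, as a percentage."""
    return 100 * (before - after) / before


reductions = {
    name: percent_reduction(old, new) for name, (old, new) in counts.items()
}
for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}% fewer")
```

Ingredients show by far the largest relative drop (about 82%), compared with about 18% for products and 23% for adverse effects.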
Explanation
In v3.0.0, products had wrong ingredients. OnSIDES v3.0.0 and onwards uses products as the main unit. Products have labels with side effects, and products get mapped to RxNorm. Secondarily, using relationships from RxNorm, products are mapped to their respective ingredients. These mappings are also used for the "high confidence" set.
Previously (v3.0.0), I did the product-to-ingredient mapping using the OMOP `CONCEPT` and `CONCEPT_RELATIONSHIP` tables, without conditioning on the types of edges or intermediate types, just on path length. This was wrong. In the new version (v3.1.0), we use the RxNorm default paths (see https://lhncbc.nlm.nih.gov/RxNav/applications/RxNavViews.html for details and `rxnorm_ingredients.py` for implementation). This adds complexity but fixes the mapping. As an example, to go from "branded dose form" to ingredient, the correct path is SBDF => SCDF => IN, or "branded dose form" -> "clinical dose form" -> "ingredient".
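To illustrate the difference between the two strategies, here is a minimal sketch on a toy graph. It is not the code in `rxnorm_ingredients.py`. The concept names, the edges, and both function names are hypothetical, and the way the length-only search goes wrong here (reaching an unrelated ingredient through a shared dose form node) is an assumed example of the failure mode, not a reconstruction of the v3.0.0 logic.

```python
from collections import deque

# Hypothetical toy concepts: id -> RxNorm term type (TTY).
CONCEPT_TTY = {
    "brand_tablet": "SBDF",
    "acetaminophen_tablet": "SCDF",
    "ibuprofen_tablet": "SCDF",
    "oral_tablet": "DF",
    "acetaminophen": "IN",
    "ibuprofen": "IN",
}

# Hypothetical undirected relationships between the toy concepts.
EDGES = [
    ("brand_tablet", "acetaminophen_tablet"),
    ("acetaminophen_tablet", "acetaminophen"),
    ("brand_tablet", "oral_tablet"),
    ("oral_tablet", "ibuprofen_tablet"),
    ("ibuprofen_tablet", "ibuprofen"),
]


def neighbors(concept_id):
    """Return every concept directly connected to `concept_id`."""
    out = set()
    for a, b in EDGES:
        if a == concept_id:
            out.add(b)
        elif b == concept_id:
            out.add(a)
    return out


def ingredients_by_hops(start, max_hops):
    """Length-only search: collect every IN within `max_hops` edges."""
    seen = {start}
    queue = deque([(start, 0)])
    found = set()
    while queue:
        node, depth = queue.popleft()
        if CONCEPT_TTY[node] == "IN":
            found.add(node)
        if depth < max_hops:
            for nxt in neighbors(node):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, depth + 1))
    return found


def ingredients_by_typed_path(start, tty_path):
    """Typed search: each step may only move to the next TTY in `tty_path`."""
    if CONCEPT_TTY[start] != tty_path[0]:
        return set()
    frontier = {start}
    for tty in tty_path[1:]:
        frontier = {
            nxt
            for node in frontier
            for nxt in neighbors(node)
            if CONCEPT_TTY[nxt] == tty
        }
    return frontier


print(sorted(ingredients_by_hops("brand_tablet", 3)))
# ['acetaminophen', 'ibuprofen']
print(sorted(ingredients_by_typed_path("brand_tablet", ("SBDF", "SCDF", "IN"))))
# ['acetaminophen']
```

In this toy graph the length-only search picks up ibuprofen through the shared dose form node, while the typed path SBDF => SCDF => IN returns only the product's own ingredient.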
Check
-- On average, how many ingredients do drug products have?
WITH n_ingredients_per_label AS (
    SELECT
        label_id,
        COUNT(DISTINCT ingredient_id) AS n_ingredients
    FROM
        product_label
        INNER JOIN product_to_rxnorm USING (label_id)
        INNER JOIN vocab_rxnorm_ingredient_to_product ON rxnorm_product_id = product_id
    GROUP BY
        label_id
)
SELECT
    AVG(n_ingredients)
FROM
    n_ingredients_per_label;
- In v3.0.0, this returns 1343 (implausibly high)
- In v3.1.0, this returns 1.13 (realistic)
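The check query can be tried out on a toy database before running it against a real release. The sketch below is an assumption-heavy illustration: the table definitions contain only the columns the query touches, the rows are invented, and SQLite is used only because it ships with Python, not because the release is tied to it.

```python
import sqlite3

# Same query as the check above, kept as a single string.
CHECK_QUERY = """
WITH n_ingredients_per_label AS (
    SELECT
        label_id,
        COUNT(DISTINCT ingredient_id) AS n_ingredients
    FROM
        product_label
        INNER JOIN product_to_rxnorm USING (label_id)
        INNER JOIN vocab_rxnorm_ingredient_to_product
            ON rxnorm_product_id = product_id
    GROUP BY
        label_id
)
SELECT AVG(n_ingredients) FROM n_ingredients_per_label;
"""


def average_ingredients_per_label(conn):
    """Run the check query and return the single average value."""
    return conn.execute(CHECK_QUERY).fetchone()[0]


# Hypothetical minimal schema and invented rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product_label (label_id INTEGER PRIMARY KEY);
    CREATE TABLE product_to_rxnorm (label_id INTEGER, rxnorm_product_id TEXT);
    CREATE TABLE vocab_rxnorm_ingredient_to_product (
        ingredient_id TEXT,
        product_id TEXT
    );
    INSERT INTO product_label VALUES (1), (2), (3);
    INSERT INTO product_to_rxnorm VALUES (1, 'p100'), (2, 'p200'), (3, 'p300');
    INSERT INTO vocab_rxnorm_ingredient_to_product VALUES
        ('i10', 'p100'), ('i20', 'p200'), ('i21', 'p200'), ('i30', 'p300');
    """
)

toy_average = average_ingredients_per_label(conn)
print(toy_average)
```

With two single-ingredient labels and one two-ingredient label, the toy average is 4/3. A value in the thousands, as in v3.0.0, would mean that each label is joined to far more ingredients than a real product contains.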