Given an input Excel file with DOI in column B and Title in column D, this script queries the OpenAlex API for each row and appends Article Processing Charge (APC) fields and Open Access status to a new workbook.
This was created to take an output file from InCites Benchmarking & Analytics, and add additional useful information in addition to all the current metrics from that database.
It's important to know how OpenAlex sources the APC paid and APC list price. See their technical documentation for definitions and sources:
https://docs.openalex.org/api-entities/works/work-object#apc_list
-
Looks up each row by DOI first, then title as a fallback.
-
Appends these columns to your sheet:
APC (preferred)— the paid APC if present, otherwise the listed APC (includes currency, ≈USD if provided, and provenance)APC (preferred) USD— numeric USD value for analysis/summing (blanked for duplicates and for non-allowed OA statuses)APC USD discounted— (optional; appears only if a discount parameter is provided) numeric USD value after applying the discount percentageDiscount (X%)— (optional) the dollar amount discounted for each row, whereXis the percentage you specifiedAPC (paid)— formatted string for the paid APCAPC (list price)— formatted string for the listed APCOpenAlex ID— canonical OpenAlex work idOpenAlex oa_status— Open Access status from OpenAlexLookup method—doiortitleNotes— e.g.,Duplicate,OA Status not Diamond/Gold/Hybrid
-
Duplicates by OpenAlex ID
The first occurrence counts; subsequent occurrences keep text APC fields andoa_statusbut haveAPC (preferred) USDblanked andNotes = "Duplicate". -
OA status gating for totals
Only rows withOpenAlex oa_statusin {diamond, gold, hybrid} are included in the numeric USD total.
If an APC is present butoa_statusis not one of those, the script keeps the text APC fields, blanksAPC (preferred) USD, and addsNotes = "OA Status not Diamond/Gold/Hybrid". -
Optional Discount parameter
If you supply a discount percentage (e.g.,--discount 10), the script will:- Add the
APC USD discountedandDiscount (10%)columns. - Populate them based on the discount value.
- Include totals for both APC USD discounted and Discount (10%) at the bottom.
- Leave these columns out entirely if no discount is provided.
- Add the
-
Progress output every 50 rows.
-
Suppresses the openpyxl "no default style" warning.
-
Lightweight retry/backoff for transient HTTP errors (429/5xx).
-
Optional polite API contact via
--mailto [email protected]. -
Optional request throttling via
--rate-limit(requests per second). -
Sets Column W and Column X widths to ~150px (≈ 21.5), and applies the same width to discount-related columns when present.
-
Freezes the top row of the output workbook.
- Preferred APC = the paid APC (
apc_paid) if available in OpenAlex, otherwise the listed APC (apc_list).
pip install -r requirements.txtMinimal (uses defaults from the PARAMETERS dict at the top of the script, or pass via CLI):
python openalex_apc_from_excel.pyTypical:
python openalex_apc_from_excel.py "Web of Science Documents (16).xlsx" --mailto [email protected]Specify sheet and output path:
python openalex_apc_from_excel.py input.xlsx --sheet savedrecs -o output_with_apc.xlsxRespectful rate limiting (2 req/s):
python openalex_apc_from_excel.py input.xlsx --mailto [email protected] --rate-limit 2Include a discount calculation (10%):
python openalex_apc_from_excel_discount.py input.xlsx --mailto [email protected] --discount 10input_xlsx(positional, optional): Path to input workbook (defaults fromPARAMETERSif omitted)-o, --output-xlsx: Output workbook path (defaults to<input>_with_apc.xlsx)--sheet: Worksheet name (defaults to first sheet)--header-row: Zero-based header row index (default0)--doi-col: Zero-based DOI column index (default1→ column B)--title-col: Zero-based Title column index (default3→ column D)--mailto: Email for OpenAlex (addsmailto=query param)--rate-limit: Requests per second (float). Defaults to ~0.2s between requests when omitted.--discount: (optional) Percentage discount to apply to APC values (e.g.,--discount 10)
- The script appends the new columns to the right of your existing sheet.
- It inserts
OpenAlex oa_statusat Excel column AB (0-based index 27). If your sheet has fewer than 28 columns, it will be appended at the end. - The workbook ends with three summary rows:
- Total row showing:
- Total
APC (preferred) USD - Total
APC USD discounted(if discount used) - Total
Discount (X%)(if discount used)
- Total
- Unable to pull APC information for X papers (excludes rows marked
DuplicateandOA Status not Diamond/Gold/Hybrid) - X entries were duplicates
- Total row showing:
- Column W, Column X, and any discount-related columns are set to ~150px (≈ 21.5 width units).
- The top row is frozen for easier navigation in Excel.
- If both DOI and Title are missing in a row, APC fields are blank.
- Currency conversions rely on OpenAlex fields. If neither
value_usdnor a native USD value is present in the API result, the USD column is left blank. - Use
--mailtoso OpenAlex can contact you if needed. - You can add your own DOIs to the spreadsheet before calling OpenAlex for any additional papers not on the spreadsheet. The script will remove duplicates.
- OpenPyXL UserWarning about default style: already suppressed by the script.
- HTTP 429 / 5xx: the script retries with backoff.
- Multiple sheets: use
--sheetor set a default; otherwise the first sheet is used. - Header row: default is
0. Use--header-rowto adjust if needed. - Duplicates included in “Unable to pull” count: make sure you’re using the updated script (duplicates and OA-status-excluded rows are now excluded from that count).
| Note value | Meaning |
|---|---|
| Duplicate | This record’s OpenAlex ID was already retrieved earlier in the file. The first instance contributes to totals; this one does not. |
| OA Status not Diamond/Gold/Hybrid | The record has an APC value but its oa_status is not diamond, gold, or hybrid — therefore excluded from the USD total. |
| (blank) | Either an APC was successfully included in the total or no APC data was available for this record. |
MIT — see LICENSE.