Skip to content

Integrate html archive into backup script #3266

@gabestein

Description

@gabestein

What?

  • Run html scraping when community export script is run (feat: community export endpoint + ui #3260)
  • Adds PDF and JATS downloading to archive script
    • During run, checks to see if PDF and JATS downloads exist for a given pub release/historyKey
    • If not, triggers and waits for regeneration
    • Once PDF and JATS are regenerated, downloads the JATS and PDF into the same directory as the Pub HTML
    • Once downloaded, script rewrites download buttons for PDF and JATS, and html header metadata (citation_pdf_url) to point to relative URL of new downloads
    • Script should handle PDF or JATS generation failure and continue to run
    • Script should report any PDF or JATS generation failures to the user in the UI after the script is run, including a link to the pub and which filetype generation failed
  • Generated archive should also include the JSON output produced by feat: community export endpoint + ui #3260

Why?

Initial estimate

Optional

Additional context/supporting docs

Implementation checklist

Out of Scope

Additional notes/context/gotchas for reviewers

Issue Readiness

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions