UNNA/conservancy

Conservancy

OVERVIEW

A utility for preserving websites on mirrors.UNNA.org. It is primarily a wrapper around wget, but it also performs additional verification tasks.

PREREQUISITES

USAGE

Running conserve <url> will slowly and recursively mirror the page, plus sibling & child pages (but not parent pages), using wget.
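The exact wget options that conserve passes are not listed here. As a rough sketch of what a slow, recursive, no-parent mirror could look like, the hypothetical helper below assembles a wget command line. The function name, the default log file name, and the two-second wait are assumptions made for illustration; the flags themselves are standard wget options.

```python
import shlex


def build_wget_command(url, log_file="wget.log", wait_seconds=2):
    """Return a wget argument list for a polite, recursive mirror of `url`.

    Hypothetical helper: the option set is an assumption, not conserve's
    documented behaviour.
    """
    return [
        "wget",
        "--mirror",                   # recursive, timestamped, unlimited depth
        "--no-parent",                # never ascend above the starting directory
        "--page-requisites",          # also fetch images, CSS, and scripts
        "--convert-links",            # rewrite links for offline browsing
        "--adjust-extension",         # save HTML/CSS with matching extensions
        f"--wait={wait_seconds}",     # pause between requests
        "--random-wait",              # vary the pause between requests
        f"--output-file={log_file}",  # write wget's log to a file
        url,
    ]


def as_shell_string(command):
    """Render the argument list as a copy-pasteable shell command."""
    return shlex.join(command)
```

Calling `as_shell_string(build_wget_command("https://example.org/docs/"))` gives a command that can be inspected before anything is downloaded. `--no-parent` is what keeps the crawl to sibling and child pages.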

The site will be archived into a directory named for the URL's hostname. A wget log file will also be generated.
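To predict where an archive will land, the hostname can be extracted from the URL. This is a minimal sketch using Python's standard library; `archive_dir_for` is a hypothetical name, and it assumes the directory is the bare lowercase hostname with no port.

```python
from urllib.parse import urlsplit


def archive_dir_for(url):
    """Return the hostname of `url`, assumed to be the archive directory name."""
    hostname = urlsplit(url).hostname  # lowercased, without port or credentials
    if not hostname:
        raise ValueError(f"no hostname found in URL: {url!r}")
    return hostname
```

For example, `archive_dir_for("https://www.example.org/docs/index.html")` returns `"www.example.org"`.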

Upon completion, it'll output the following to STDOUT:

  • Any missing files (linking to Internet Archive's Wayback Machine if the files exist there, plus listing any similarly named files that were downloaded)
  • Any files still containing links to the URL
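The two checks above can be approximated by hand when following up on a report. The sketch below is not conserve's implementation. It assumes the Internet Archive's public Wayback Availability endpoint for the lookup and a plain substring search for leftover links, and all function names are hypothetical.

```python
import json
from pathlib import Path
from urllib.parse import urlencode

# Internet Archive's Wayback Availability endpoint (assumed lookup method).
WAYBACK_API = "https://archive.org/wayback/available"


def wayback_query_url(missing_url):
    """Build the availability query URL for a file that failed to download."""
    return f"{WAYBACK_API}?{urlencode({'url': missing_url})}"


def closest_snapshot(api_response_text):
    """Return the closest snapshot URL from an API response, or None."""
    data = json.loads(api_response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def files_linking_to(archive_dir, original_url):
    """List files under `archive_dir` whose text still contains `original_url`."""
    hits = []
    for path in sorted(Path(archive_dir).rglob("*")):
        if path.is_file():
            # Undecodable bytes are ignored so binary files do not raise.
            text = path.read_text(encoding="utf-8", errors="ignore")
            if original_url in text:
                hits.append(path)
    return hits
```

Fetching `wayback_query_url(...)` with any HTTP client and passing the response body to `closest_snapshot` yields a snapshot link when one exists. `files_linking_to` returns the files that still need their links cleaned up.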

One can then manually try to find & replace missing files, clean up links, etc.

REFERENCE