This repository primarily contains the rake tasks used to create the data structures that Mïmis consumes.
The heart of the program is a Neo4j database with the Awards and Series information from the Internet Speculative Fiction Database.
The work of populating the graph and writing to IPFS is done by various rake tasks.
docker run --name argus -p7474:7474 -p7687:7687 -v $HOME/neo4j/data:/data -v $HOME/neo4j/logs:/logs --env NEO4J_AUTH=neo4j/neo4j2 neo4j:latest
docker run is used the first time only; subsequently use docker start argus
git clone https://github.com/dhappy/argus
cd argus
rake neo4j:migrate
alias isotime='date +%Y-%m-%d@%H:%M:%S%:z'
function rlog() { rake $1 | tee log/$1.$(isotime).log; }
screen
rlog isfdb:awards
⌘^a c
rlog isfdb:series
⌘^a c
rlog isfdb:covers
⌘^a d
screen -r # after many hours have passed, to see how much data has been integrated into the graph
rake export:awards # after everything is loaded
Neo4j has an interactive console you can access by visiting http://localhost:7474.
Assumes that a dump of the Internet Speculative Fiction Database is loaded into MySQL in the isfdb database.
Saves the award year, category and books into the graph. The format of the graph is:
(:Award)-[:IN]->(:Year)-[:FOR]->(:Category)-[:NOMINEE]->(:Book|:Movie)
- There is a result property on the Nominated relation that is either:
  - The number that they placed in the competition.
  - A text string like "Not on Ballot: Insufficient Nominations" describing a special situation.
  - NULL if the order is unspecified.
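The three cases for result can be told apart with a small helper. This is a minimal sketch, not code from the repository: the name classify_result and the returned symbols are hypothetical, and treating a blank string the same as NULL is an assumption.

```ruby
# Hypothetical helper: classify the raw `result` value of a nomination.
# Returns a [kind, value] pair, where kind is :placing, :special or :unspecified.
def classify_result(raw)
  text = raw.to_s.strip
  # Assumption: an empty string is treated like NULL (order unspecified).
  return [:unspecified, nil] if raw.nil? || text.empty?

  if text.match?(/\A\d+\z/)
    # A bare number is the place the work took in the competition.
    [:placing, text.to_i]
  else
    # Anything else is a note describing a special situation.
    [:special, text]
  end
end
```

Keeping the kind separate from the value lets a caller sort by placing without having to parse the special-situation strings.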
Saves the series nesting, contents and order into the graph. The format is:
(:Series)-[:CONTAINS*]->(:Series)-[:CONTAINS]->(:Book|:Movie)<-[:CREATED]-(:Creators)
- Creators represents all the creators for a work. Names are joined by an & sign because the uniqueness constraint doesn't work with arrays.
- There is a rank associated with Contains relations: MATCH (s:Series)-[c:CONTAINS]->(b:Book) RETURN s ORDER BY c.rank
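Joining creator names into one string might look like the following. This is only a sketch: creators_key is a hypothetical name, and the exact separator (an ampersand with a space on each side) is an assumption, since the text above only says that names are joined by an & sign.

```ruby
# Hypothetical helper: collapse a list of creator names into a single string
# so that a uniqueness constraint can apply to one scalar property.
def creators_key(names)
  names
    .map(&:strip)        # drop stray whitespace around each name
    .reject(&:empty?)    # ignore blank entries
    .join(' & ')         # assumption: separator is " & "
end

creators_key(['Terry Pratchett', 'Neil Gaiman'])
# => "Terry Pratchett & Neil Gaiman"
```

The sketch keeps the names in their given order, so the same creators listed in a different order would produce a different key.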
Saves each cover's ISBN and image URL into the graph. The format is:
(:Book)-[:PUBLICATION]->(:Version)-[:COVER]->(:Cover)
- The ISBN uniquely identifies a version.
Finds Content nodes with a URL but no IPFS id, then downloads the URL and inserts it into IPFS. This works in conjunction with isfdb:covers to collect the cover images referenced in the database.
For books without a -[:REPO]-> link, check the directory ../.../trainpacks/ for files matching the pattern *#{author}*#{title}* or *#{title}*#{author}*.
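The two patterns can be tested against a filename with Ruby's File.fnmatch. The sketch below is illustrative rather than the repository's implementation: candidate_patterns and matches_book? are hypothetical names, and matching case-insensitively is an assumption.

```ruby
# Build the two glob patterns described above for one author/title pair.
def candidate_patterns(author, title)
  ["*#{author}*#{title}*", "*#{title}*#{author}*"]
end

# True when the filename matches either pattern.
# Assumption: matching ignores case (File::FNM_CASEFOLD).
def matches_book?(filename, author, title)
  candidate_patterns(author, title).any? do |pattern|
    File.fnmatch(pattern, filename, File::FNM_CASEFOLD)
  end
end

matches_book?('Le Guin - The Dispossessed.epub', 'Le Guin', 'The Dispossessed')
# => true
```

Because the author and title are interpolated straight into a glob, characters such as ? or [ in a title would be read as wildcards; a real implementation would probably need to escape them.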
The page has an ⏩ Injest ⏭ button for each found file that will copy the given file to ../.../book/by/#{author}/#{title}/.
Zip and rar files are uncompressed. If there is a single (html|epub|rtf|mobi|lit) file it is renamed to index.#{ext}.
index.htm is renamed to index.html, which has an acceptably small chance of breaking a multipage document.
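The renaming rules above can be sketched as a function that plans renames without touching the disk. The names planned_renames and extension_of are hypothetical, and the reading of "a single file" as "exactly one file with a listed extension, regardless of other files present" is an assumption.

```ruby
# Extensions that count as a book document, taken from the list above.
DOCUMENT_EXTENSIONS = %w[html epub rtf mobi lit].freeze

# Lowercased extension without the leading dot, e.g. "Book.EPUB" -> "epub".
def extension_of(name)
  File.extname(name).delete_prefix('.').downcase
end

# Given the file names extracted into one directory, return a hash of
# { old_name => new_name } describing the renames to perform.
def planned_renames(filenames)
  renames = {}
  documents = filenames.select { |name| DOCUMENT_EXTENSIONS.include?(extension_of(name)) }

  # Assumption: "single" means exactly one file with a listed extension.
  if documents.size == 1
    source = documents.first
    target = "index.#{extension_of(source)}"
    renames[source] = target unless source == target
  end

  # index.htm always becomes index.html.
  renames['index.htm'] = 'index.html' if filenames.include?('index.htm')
  renames
end
```

Returning a plan rather than renaming in place makes the rules easy to check before any file is moved.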