Skip to content

Proof of concept for importing TEI xml files into Neo4j.

Notifications You must be signed in to change notification settings

sarmbruster/neo4j-tei-importer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

45 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

neo4j-tei-importer

Travis CI Status

neo4j-tei-importer imports data from XML files using the TEI schema to a Neo4j database.

Installation

This project uses gradle as build system. Since the gradle wrapper in included, there is no need to download anything beforehand.

To build the project just type:

./gradlew shadowJar

Copy (or symlink) the resulting file ./build/libs/neo4j-tei-importer-0.1-SNAPSHOT-all.jar to Neo4j’s plugins folder.

If you want to use the "old" REST endpoints in contrast to Cypher stored procedure, change configuration in $NEO4J/conf/neo4j.conf:

dbms.unmanaged_extension_classes=org.neo4j.extension.adwmainz.tei=/tei

After a restart using $NEO4J_HOME/bin/neo4j restart the extension is ready to use.

Usage

TEI

For an example of a imported XML TEI document refer to the following screenshot

sample document

import a TEI XML document via Cypher stored procedure

The preferred option for importing is calling a Cypher stored procedure. A URL to the XML document to be imported needs to be provided as parameter - could be either file, http, https or ftp:. The procedure returns the node representing the document in the graph.

Example:

CALL tei.import('file:///var/lib/neo4j/import/humboldt_soemmering01_1791.TEI-P5 (1).xml') yield node return node

upload local XML document via HTTP PUT

Using REST a XML TEI file can be imported by simple sending an HTTP PUT to REST endpoint http://host:7474/tei/import. E.g. via cURL:

curl [-u neo4j:<mypasswd>] -T src/test/resources/1667-09-23_Langius_a_Lubienietzki-mit-Regest-ohne-Kommentar.xml http://localhost:7474/tei/import

submit XML document via URL

Additionally it’s possible to omit the payload in the PUT request and specify the URL location of the document via url request parameter. Be aware to supply the url in urlencoded variant:

curl [-u neo4j:<mypasswd>] -X PUT http://localhost:7474/tei/import?url=http%3A%2F%2Fmyhost%2Fabc.xml

useful cypher queries on TEI data

get back the original text contained in the document

match (s:Source)-[:HAS_BODY]→(b), p=(b)-[:NEXT*]→(e) where not (e)-[:NEXT*]→() return apoc.text.join([x in nodes(p) where x:Word | x.text], " ") as text

windeck data

There’s a stored procedure to import windeck files:

call adwmainz.windeck("file:///var/lib/neo4j/import/windeck_graf.utf8", 12344)

Second parameter is the word offset number.

About

Proof of concept for importing TEI xml files into Neo4j.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published