Update README.md
andrewvc committed Nov 5, 2013
Commit 50e3d8f (1 parent: 8c5b197)
Showing 1 changed file, `README.md`, with 1 addition and 1 deletion.
```diff
@@ -5,7 +5,7 @@ Imports wikipedia data dump XML into elasticsearch.
 ## Usage
 
 * Download the pages-articles XML dump, find the link on [this page](http://en.wikipedia.org/wiki/Wikipedia:Database_download#XML_schema). You want pages-articles.xml.bz2. DO NOT UNCOMPRESS THE BZ2 FILE.
-* Download the [wikiparse JAR](http://andrewvc-misc.s3.amazonaws.com/wikiparse-0.2.0.jar)
+* From the releases page, download the [wikiparse JAR](https://github.com/andrewvc/wikiparse/releases)
 * Run the jar on the BZ2 file: `java -jar -Xmx1g wikiparse-0.1.0.jar --es http://localhost:9200 /var/lib/elasticsearch/enwiki-latest-pages-articles.xml.bz2`
 * The data will be indexed to an index named `en-wikipedia` (by default).
 This can be changed with `--index` parameter.
```
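The usage steps above can be wrapped in a small shell helper. This is a minimal sketch, not part of wikiparse: the function name `wikiparse_cmd` and the file name `wikiparse.jar` are hypothetical, and it assumes paths contain no spaces and that `--index` may appear before the dump path. It only uses the `--es` and `--index` flags named in the README, refuses a dump that is not still `.bz2`-compressed, and places the heap option before `-jar`, the conventional `java [options] -jar jarfile [args]` order. The README example names `wikiparse-0.1.0.jar`; substitute whichever JAR you downloaded from the releases page.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with wikiparse): prints the import command
# from the Usage steps so it can be inspected before running.
wikiparse_cmd() {
  jar=$1      # path to the downloaded wikiparse JAR
  es_url=$2   # Elasticsearch URL, e.g. http://localhost:9200
  dump=$3     # pages-articles dump, which must stay bz2-compressed
  index=$4    # optional index name; wikiparse defaults to en-wikipedia

  # The README says not to uncompress the dump, so reject anything else.
  case "$dump" in
    *.bz2) ;;
    *) echo "error: expected the compressed .bz2 dump, got: $dump" >&2; return 1 ;;
  esac

  # JVM options such as -Xmx1g go before -jar; wikiparse flags go after the JAR.
  if [ -n "$index" ]; then
    echo "java -Xmx1g -jar $jar --es $es_url --index $index $dump"
  else
    echo "java -Xmx1g -jar $jar --es $es_url $dump"
  fi
}
```

For example, `wikiparse_cmd wikiparse.jar http://localhost:9200 /var/lib/elasticsearch/enwiki-latest-pages-articles.xml.bz2 | sh` builds and runs the import into the default `en-wikipedia` index. Printing the command instead of executing it directly keeps a dry run trivial: drop the `| sh` to see what would be launched.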