A minimalist command-line utility to pipe documents from a file or I/O stream into an Elasticsearch cluster.
Have you ever had thousands of sample documents in a file, and you just want to load them all into an unsecured local Elasticsearch cluster?
```
espipe docs.ndjson http://localhost:9200/new_index
```

And you're done.
The goal of espipe is to provide the simplest way to bulk-load a dataset into Elasticsearch. It does not do any document transformation or enrichment, and only requires that the inputs be valid, deserializable JSON objects in a newline-delimited JSON (`.ndjson`) file or comma-separated value (`.csv`) file.
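For illustration, a valid `.ndjson` input is simply one complete JSON object per line. A quick way to produce a small sample file (the `id` and `msg` fields here are made-up sample data):

```shell
# One complete, self-contained JSON object per line; no enclosing
# array and no trailing commas between lines.
printf '%s\n' \
  '{"id": 1, "msg": "hello"}' \
  '{"id": 2, "msg": "world"}' > docs.ndjson

wc -l < docs.ndjson   # 2 documents, one per line
```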
It is multi-threaded and capable of fully saturating the CPU of the sending host. This could potentially overwhelm the target cluster, so use with caution on large data sets.
Documents are batched into _bulk requests of 5,000 documents and sent with the create action. It is not opinionated about whether the target is an alias, a regular index, or a data stream; just define your index templates and ingest pipelines in advance.
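For reference, an Elasticsearch `_bulk` request body with the `create` action is itself newline-delimited JSON: an action line followed by the document source, repeated per document. The exact body espipe assembles is internal, but a minimal sketch of the shape (with made-up sample documents) looks like this:

```shell
# Build a two-document _bulk body. Each document is preceded by a
# create action line; an empty create object lets Elasticsearch
# auto-generate the document _id.
cat > bulk_body.ndjson <<'EOF'
{"create":{}}
{"id": 1, "msg": "hello"}
{"create":{}}
{"id": 2, "msg": "world"}
EOF

# A real request would POST this body to <cluster>/<index>/_bulk
# with Content-Type: application/x-ndjson.
wc -l < bulk_body.ndjson   # 4 lines: 2 action lines + 2 documents
```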
- Make sure you have `cargo` installed from rust-lang.org
- Clone this repository to your local machine
- From the repository directory, run `cargo install --path .`
```
Usage: espipe [OPTIONS] <INPUT> <OUTPUT>

Arguments:
  <INPUT>   The input URI to read docs from
  <OUTPUT>  The output URI to send docs to

Options:
  -k, --insecure             Ignore certificate validation
  -a, --apikey <APIKEY>      Apikey to authenticate via http header
  -u, --username <USERNAME>  Username for authentication
  -p, --password <PASSWORD>  Password for authentication
  -q, --quiet                Quiet mode, don't print runtime summary
  -h, --help                 Print help
```

Both the `<INPUT>` and `<OUTPUT>` arguments are URI-formatted strings.
The input URI can be:

- A stream from `stdin`: `-`
- An unqualified file path: `file.ext`, `~/dir/file.ext`
- A fully-qualified `file://` scheme URI: `file:///Users/name/dir/file.ext`
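The `-` form follows the usual Unix convention of treating a lone dash as the standard stream. As a quick illustration of the convention, with `cat` standing in for espipe (the JSON content is made up):

```shell
# '-' means "read from stdin"; cat honors the same convention
# that espipe's input URI uses.
printf '{"id": 1}\n' | cat - > out.ndjson
cat out.ndjson   # the document passed straight through
```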
The output URI can be:

- A stream to `stdout`: `-`
- An unqualified file path: `file.ext`, `~/dir/file.ext`
- A fully-qualified `file://` scheme URI: `file:///Users/name/dir/file.ext`
- An `http://` or `https://` scheme URL to an Elasticsearch cluster, including index name: `http://example.com/index_name`
- A known host saved in the `~/.esdiag/hosts.yml` configuration file: `localhost:index_name`
When piping to an Elasticsearch output, the index name is required.
All authentication options only apply to an http(s) output.
You may create an `~/.esdiag/hosts.yml` configuration file, much like an `~/.ssh/config` file.
For example, here is a localhost definition with no authentication:
```yaml
localhost:
  auth: None
  url: http://localhost:9200/
```

This allows you to use `localhost` as a shorthand for `http://localhost:9200/`. Both commands are equivalent:

```
espipe docs.ndjson http://localhost:9200/new_index
espipe docs.ndjson localhost:new_index
```

An Elasticsearch Service (ESS) cluster with API key authentication:
```yaml
ess-cluster:
  auth: Apikey
  url: https://ess-cluster.es.us-west-2.aws.found.io/
  apikey: "fak34p1k3ydcbcc2c134c3eb3bf967bcf67q=="
```

Enabling you to use the shorthand:

```
espipe docs.ndjson https://ess-cluster.es.us-west-2.aws.found.io/new_index --apikey="fak34p1k3ydcbcc2c134c3eb3bf967bcf67q=="
espipe docs.ndjson ess-cluster:new_index
```

If you need detailed logs on what espipe is doing, you can set the `RUST_LOG` environment variable:
```
export RUST_LOG=debug
espipe docs.ndjson https://esdiag.es.us-west-2.aws.found.io/new_index --apikey="fak34p1k3ydcbcc2c134c3eb3bf967bcf67q=="
```

- Define a shell function that finds all `.ndjson` files recursively, calling `espipe` on each:

  ```shell
  function espipe-find() {
    find "$1" -name "*.ndjson" -print0 | while IFS= read -r -d '' file; do
      echo -n "$file > "
      espipe "$file" "$2"
    done
  }
  ```

- Call the `espipe-find` function with the directory and an output target index matching the `logs-*-*` datastream template:

  ```
  espipe-find elastic-agent-123abc http://localhost:9200/logs-agent-default
  ```
This ingests all documents into a new data stream called `logs-agent-default`, making the logs visible in Kibana's Logs Explorer.