osdump

osdump is a high-performance tool for extracting documents from OpenSearch indexes and saving them to files.

Features

As high-performance as a single-worker solution can be
OpenSearch queries are based on search_after
Uses fastjson for faster JSON parsing
Built-in support for compressing the output with Brotli
Has built-in sanity checks to ensure smooth operation
Comes with an example of a performance-optimized query

Installation

$ go install github.com/mikkolehtisalo/osdump@latest

Usage

The configuration options:

$ ~/go/bin/osdump -h
Usage of ./osdump:
  -base string
        opensearch base url (default "https://localhost:9200")
  -brotli
        compress using brotli
  -ca string
        CA certificate (default "ca.pem")
  -debug
        debug logging
  -file string
        target file for export (default "graylog_0.json")
  -index string
        opensearch index (default "graylog_0")
  -password string
        opensearch user (default "password")
  -quality int
        brotli quality setting (default 2)
  -query string
        query template file
  -size int
        search window size (default 1000)
  -user string
        opensearch user (default "graylog")

Example run:

$ ~/go/bin/osdump -user admin -password mysecretpassword -size 1000
2024/12/30 21:08:30 osdump.go:296: Starting to dump graylog_0
2024/12/30 21:08:30 osdump.go:300: Index graylog_0 has 272905 documents to dump
2024/12/30 21:09:53 osdump.go:320: Dumped 272905 records in 82 seconds, average speed 3314/second
2024/12/30 21:09:53 osdump.go:321: Finished dumping graylog_0

Requirements

Go 1.22+
Access to an OpenSearch instance

Limitations

Large dumps may require large amounts of disk space
Brotli compression is a CPU-heavy operation
Assumes OpenSearch security is configured (TLS enabled, and username/password required)
Single worker for querying OpenSearch, for now
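
The security assumption above (TLS with a custom CA, plus username and password) can be sketched in Go roughly as follows. This is only an illustration built on the standard library, not a copy of what osdump does internally. The function names newClient and newRequest are hypothetical; the file name, URL, and credentials are the defaults listed in the usage output.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"io"
	"net/http"
	"os"
)

// newClient returns an HTTP client that trusts only the given CA certificate (PEM).
func newClient(caPEM []byte) (*http.Client, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no CA certificate found in PEM data")
	}
	return &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{RootCAs: pool, MinVersion: tls.VersionTLS12},
		},
	}, nil
}

// newRequest prepares a POST request with HTTP basic authentication.
func newRequest(endpoint, user, password string, body io.Reader) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, endpoint, body)
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(user, password)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	caPEM, err := os.ReadFile("ca.pem") // default of the -ca option
	if err != nil {
		fmt.Println("cannot read CA certificate:", err)
		return
	}
	client, err := newClient(caPEM)
	if err != nil {
		fmt.Println("cannot build client:", err)
		return
	}
	req, err := newRequest("https://localhost:9200/graylog_0/_search", "graylog", "password", nil)
	if err != nil {
		fmt.Println("cannot build request:", err)
		return
	}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```

If your cluster runs without TLS or authentication, a client like this will not connect, which is the limitation described above.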

Performance notes

Smaller window sizes seem to perform worse. Start with 1000 (the default) and experiment with larger sizes up to 10 000 (the maximum OpenSearch allows by default).
search_after always requires a sort field. It should never be a fielddata type, because fielddata is loaded fully into memory and only then sorted. If your cluster has a lot of activity, the loaded data will be evicted from the caches quickly.
Fields of keyword type perform significantly better for sorting, so try to always use them for sorting. Depending on the architecture and load of your OpenSearch cluster, the performance difference may be anywhere between 2x and 1000x.
You should never query OpenSearch without a filter. A filter with match_all performs better than a query without a filter. Having a filter disables scoring and enables most of OpenSearch's caching features.
OpenSearch also has a request cache, which attempts to cache the results of a specific request. I forced it on with request_cache=true, but it will probably have no effect unless I implement retry logic at some stage.
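
To make the notes above concrete, here is a minimal Go sketch of a single search request that follows them: an explicit window size, a match_all wrapped in a bool filter, a sort on one field, an optional search_after value, and request_cache=true in the URL. The helper names searchURL and searchBody are hypothetical, and the sketch uses encoding/json from the standard library rather than whatever osdump uses internally. The field gl2_message_id is taken from the default query; whether it is a keyword field depends on your index mapping.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// searchURL builds the _search endpoint for an index and asks OpenSearch
// to use the request cache for this request.
func searchURL(base, index string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u = u.JoinPath(index, "_search")
	q := u.Query()
	q.Set("request_cache", "true")
	u.RawQuery = q.Encode()
	return u.String(), nil
}

// searchBody returns a JSON body that wraps match_all in a bool filter
// (so no scoring is done) and sorts ascending on sortField. When after is
// non-empty it is sent as the search_after value to resume from that point.
func searchBody(size int, sortField, after string) ([]byte, error) {
	body := map[string]any{
		"size": size,
		"query": map[string]any{
			"bool": map[string]any{
				"filter": map[string]any{"match_all": map[string]any{}},
			},
		},
		"sort": []map[string]string{{sortField: "asc"}},
	}
	if after != "" {
		body["search_after"] = []string{after}
	}
	return json.Marshal(body)
}

func main() {
	u, err := searchURL("https://localhost:9200", "graylog_0")
	if err != nil {
		fmt.Println("bad base URL:", err)
		return
	}
	b, err := searchBody(1000, "gl2_message_id", "")
	if err != nil {
		fmt.Println("cannot encode body:", err)
		return
	}
	fmt.Println(u)
	fmt.Println(string(b))
}
```

For the following pages, pass the sort value of the last hit from the previous response as the after argument.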

With the above in mind, I designed the default query as follows. You will probably have to change it (command-line option -query) for your needs:

{
	"size": {{.Size}},
	"query": {"bool": {"must": {"match_all": {}}}},{{if .After}}
	"search_after": ["{{.After}}"],{{end}}
	"sort": [
		{ "gl2_message_id": "asc" }  
	]
}
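
The placeholders in the query above ({{.Size}}, {{if .After}}) look like Go text/template syntax. Assuming that is how a query template file is processed, the sketch below shows how the template expands for the first page (no After value) and for a later page. The queryParams struct and the renderQuery function are hypothetical names; only the Size and After fields come from the template itself.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"text/template"
)

// queryTemplate mirrors the default query shown above.
const queryTemplate = `{
	"size": {{.Size}},
	"query": {"bool": {"must": {"match_all": {}}}},{{if .After}}
	"search_after": ["{{.After}}"],{{end}}
	"sort": [
		{ "gl2_message_id": "asc" }
	]
}`

// queryParams holds the values the template refers to.
type queryParams struct {
	Size  int
	After string // sort value of the last hit of the previous page, empty on the first page
}

// renderQuery expands the template and checks that the result is valid JSON.
func renderQuery(p queryParams) (string, error) {
	tmpl, err := template.New("query").Parse(queryTemplate)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, p); err != nil {
		return "", err
	}
	// text/template does no JSON escaping, so verify the output.
	if !json.Valid(buf.Bytes()) {
		return "", fmt.Errorf("rendered query is not valid JSON: %s", buf.String())
	}
	return buf.String(), nil
}

func main() {
	first, err := renderQuery(queryParams{Size: 1000})
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	// "example-sort-value" is a made-up placeholder for a real sort value.
	next, err := renderQuery(queryParams{Size: 1000, After: "example-sort-value"})
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Println(first)
	fmt.Println(next)
}
```

Because the After value is inserted as a quoted string, a custom template that sorts on a numeric field or on several fields would need a different search_after line.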

Contributing

This works for me. If you need more features or find a bug, please open a PR or an issue.

License

osdump is licensed under the MIT License.
