Skip to content
This repository has been archived by the owner on Sep 21, 2021. It is now read-only.

Latest commit

 

History

History
57 lines (44 loc) · 2.07 KB

50_Reindexing.asciidoc

File metadata and controls

57 lines (44 loc) · 2.07 KB

Reindexing Your Data

Although you can add new types to an index, or add new fields to a type, you can’t add new analyzers or make changes to existing fields. If you were to do so, the data that had already been indexed would be incorrect and your searches would no longer work as expected.

The simplest way to apply these changes to your existing data is to reindex: create a new index with the new settings and copy all of your documents from the old index to the new index.

One of the advantages of the _source field is that you already have the whole document available to you in Elasticsearch itself. You don’t have to rebuild your index from the database, which is usually much slower.

To reindex all of the documents from the old index efficiently, use scroll to retrieve batches of documents from the old index, and the bulk API to push them into the new index.

Beginning with Elasticsearch v2.3.0, a {ref}/docs-reindex.html[Reindex API] has been introduced. It enables you to reindex your documents without requiring any plugin nor external tool.

Reindexing in Batches

You can run multiple reindexing jobs at the same time, but you obviously don’t want their results to overlap. Instead, break a big reindex down into smaller jobs by filtering on a date or timestamp field:

GET /old_index/_search?scroll=1m
{
    "query": {
        "range": {
            "date": {
                "gte":  "2014-01-01",
                "lt":   "2014-02-01"
            }
        }
    },
    "sort": ["_doc"],
    "size":  1000
}

If you continue making changes to the old index, you will want to make sure that you include the newly added documents in your new index as well. This can be done by rerunning the reindex process, but again filtering on a date field to match only documents that have been added since the last reindex process started.