How to copy some ElasticSearch data to a new index

Elasticsearch Problem Overview


Let's say I have movie data in Elasticsearch, and I indexed each movie like this:

curl -XPUT "http://192.168.0.2:9200/movies/movie/1" -d'
{
    "title": "The Godfather",
    "director": "Francis Ford Coppola",
    "year": 1972
}'

I have a bunch of movies from different years. I want to copy all the movies from a particular year (say, 1972) to a new index called "70sMovies", but I couldn't see how to do that.

Elasticsearch Solutions


Solution 1 - Elasticsearch

Since Elasticsearch 2.3, you can use the built-in _reindex API.

For example:

POST /_reindex
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter"
  }
}

Or copy only a specific part by adding a filter/query to the source:

POST /_reindex
{
  "source": {
    "index": "twitter",
    "query": {
      "term": {
        "user": "kimchy"
      }
    }
  },
  "dest": {
    "index": "new_twitter"
  }
}

Read more: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
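Applied to the movies in the question, the filtered reindex could be sketched in Python as below. This is only a sketch under assumptions: the helper names `build_reindex_body` and `post_json` are made up for illustration, the host is the one from the question, and the destination is spelled "70smovies" because Elasticsearch index names must be lowercase.

```python
import json
import urllib.request


def build_reindex_body(source_index, dest_index, field, value):
    """Build a _reindex body that copies only documents where field == value."""
    return {
        "source": {
            "index": source_index,
            "query": {"term": {field: value}},
        },
        "dest": {"index": dest_index},
    }


def post_json(url, body):
    """POST a JSON body and return the decoded JSON response (hypothetical helper)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Copy only the 1972 movies; the lowercase index name is an assumption.
body = build_reindex_body("movies", "70smovies", "year", 1972)
print(json.dumps(body, indent=2))

# To run it against a cluster (host taken from the question):
# post_json("http://192.168.0.2:9200/_reindex", body)
```

The body is built separately from the HTTP call so you can inspect it before anything is written to the cluster.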

Solution 2 - Elasticsearch

The best approach would be to use the elasticsearch-dump tool: https://github.com/taskrabbit/elasticsearch-dump.

A real-world example I used, copying the mapping first and then the data:

elasticdump \
  --input=http://localhost:9700/.kibana \
  --output=http://localhost:9700/.kibana_read_only \
  --type=mapping
elasticdump \
  --input=http://localhost:9700/.kibana \
  --output=http://localhost:9700/.kibana_read_only \
  --type=data
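The example above copies a whole index. To copy only some documents, the elasticsearch-dump README describes a `--searchBody` option for partial extracts; the sketch below assembles such a command in Python. The index names, the helper `build_elasticdump_command`, and the assumption that your installed version supports `--searchBody` are all mine, so check `elasticdump --help` first.

```python
import json
import shlex


def build_elasticdump_command(input_url, output_url, dump_type, search_body=None):
    """Return the elasticdump argument list for one copy step."""
    command = [
        "elasticdump",
        f"--input={input_url}",
        f"--output={output_url}",
        f"--type={dump_type}",
    ]
    if search_body is not None:
        # Assumed flag: restricts the dump to documents matching this query.
        command.append(f"--searchBody={json.dumps(search_body)}")
    return command


# Hypothetical source and target indices on a local cluster.
source = "http://localhost:9200/movies"
target = "http://localhost:9200/70smovies"
only_1972 = {"query": {"term": {"year": 1972}}}

mapping_step = build_elasticdump_command(source, target, "mapping")
data_step = build_elasticdump_command(source, target, "data", search_body=only_1972)

# Print the shell-quoted commands; run them with subprocess.run(step, check=True).
for step in (mapping_step, data_step):
    print(shlex.join(step))
```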

Solution 3 - Elasticsearch

Check out knapsack: https://github.com/jprante/elasticsearch-knapsack

Once you have the plugin installed and working, you could export part of your index via query. For example:

curl -XPOST 'localhost:9200/test/test/_export' -d '{
"query" : {
    "match" : {
        "myfield" : "myvalue"
    }
},
"fields" : [ "_parent", "_source" ]
}'

This will create a tarball with only your query results, which you can then import into another index.

Solution 4 - Elasticsearch

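To reindex a specific type from the source index into a type in the destination index, the syntax is as follows (replace the term query with your own filter criteria). Note that mapping types were deprecated in Elasticsearch 7 and removed in 8, so leave out the "type" fields on recent versions:

POST /_reindex
{
  "source": {
    "index": "source_index",
    "type": "source_type",
    "query": {
      "term": {
        "year": 1972
      }
    }
  },
  "dest": {
    "index": "dest_index",
    "type": "dest_type"
  }
}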

Solution 5 - Elasticsearch

You can do it easily with elasticsearch-dump (https://github.com/taskrabbit/elasticsearch-dump) in three steps: copy the analyzer, then the mapping, then the data. The following example copies the index "thor" to "thor2":

elasticdump --input=http://localhost:9200/thor --output=http://localhost:9200/thor2 --type=analyzer

elasticdump --input=http://localhost:9200/thor --output=http://localhost:9200/thor2 --type=mapping

elasticdump --input=http://localhost:9200/thor --output=http://localhost:9200/thor2 --type=data

Solution 6 - Elasticsearch

The straightforward way to do this is to write code, with the API of your choice, that queries for "year": 1972 and then indexes the matching documents into a new index. You would use the Search API or the Scan and Scroll API to retrieve all the documents and then either index them one by one or use the Bulk API:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-search.html

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html
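As a rough illustration of the "index them with the Bulk API" step, the sketch below turns a page of search hits into the newline-delimited body that `POST /_bulk` expects (sent with `Content-Type: application/x-ndjson`). The sample hits, the target index name, and the helper `hits_to_bulk_ndjson` are assumptions for illustration; on older versions that still use mapping types you would also add `_type` to each action line.

```python
import json


def hits_to_bulk_ndjson(hits, dest_index):
    """Convert search hits into a Bulk API body: one action line plus one source line per hit."""
    lines = []
    for hit in hits:
        # Reuse the original _id so re-running the copy overwrites instead of duplicating.
        action = {"index": {"_index": dest_index, "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(hit["_source"]))
    if not lines:
        return ""
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical hits, shaped like the "hits.hits" array of a search response.
sample_hits = [
    {"_id": "1", "_source": {"title": "The Godfather", "director": "Francis Ford Coppola", "year": 1972}},
    {"_id": "7", "_source": {"title": "Cabaret", "director": "Bob Fosse", "year": 1972}},
]

payload = hits_to_bulk_ndjson(sample_hits, "70smovies")
print(payload, end="")
```

In a real script you would call this once per scroll page and send each payload before requesting the next page, so memory use stays bounded.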

If you don't want to do this via code and are looking for a more direct way, I suggest Elasticsearch Snapshot and Restore. Basically, you would take a snapshot of your existing index, restore it under a new index name, and then use the Delete By Query API to delete all documents with a year other than 1972.

> Snapshot And Restore
>
> The snapshot and restore module allows to create snapshots of individual indices or an entire cluster into a remote repository. At the time of the initial release only shared file system repository was supported, but now a range of backends are available via officially supported repository plugins.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-snapshots.html

> Delete By Query API
>
> The delete by query API allows to delete documents from one or more indices and one or more types based on a query. The query can either be provided using a simple query string as a parameter, or using the Query DSL defined within the request body.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-delete-by-query.html
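The cleanup step described above (delete everything whose year is not 1972) comes down to a `bool` query with a `must_not` clause. A minimal sketch of that request body follows, assuming the restored index is named "70smovies" and the field is `year` as in the question; `build_delete_others_body` is a hypothetical helper.

```python
import json


def build_delete_others_body(field, keep_value):
    """Build a delete-by-query body that matches every document where field != keep_value."""
    return {
        "query": {
            "bool": {
                "must_not": {"term": {field: keep_value}},
            }
        }
    }


# Keep only the 1972 movies in the restored copy.
delete_body = build_delete_others_body("year", 1972)
delete_path = "/70smovies/_delete_by_query"  # assumed name of the restored index

print("POST", delete_path)
print(json.dumps(delete_body, indent=2))
```

Note that `must_not` also matches documents that have no `year` field at all, so those would be deleted too.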

Solution 7 - Elasticsearch

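The _clone API was introduced in v7.4 and can help here (read the documentation for the relevant prerequisites and monitoring involved). Note that _clone copies the entire index, so to keep only some of the documents you would still have to delete the rest from the clone afterwards: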

POST /<index>/_clone/<target-index>

Or:

PUT /<index>/_clone/<target-index>
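One of the prerequisites is that the source index must be write-blocked before it can be cloned. The sequence of requests could be sketched as below; the index names and the helper `build_clone_requests` are assumptions for illustration, and the other prerequisites (such as cluster health) are not covered.

```python
import json


def build_clone_requests(source_index, target_index):
    """Return the (method, path, body) requests for cloning an index, in order."""
    return [
        # 1. Block writes on the source index, which cloning requires.
        ("PUT", f"/{source_index}/_settings", {"settings": {"index.blocks.write": True}}),
        # 2. Clone the write-blocked source into the target index.
        ("POST", f"/{source_index}/_clone/{target_index}", None),
    ]


# Hypothetical index names.
clone_requests = build_clone_requests("movies", "movies_copy")

for method, path, request_body in clone_requests:
    print(method, path)
    if request_body is not None:
        print(json.dumps(request_body, indent=2))
```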

Solution 8 - Elasticsearch

If the intent is to copy the data to an index with the same settings/mappings as the original index, you can use the clone API. Clone copies the whole index, so to keep only a portion of the data you would delete the unwanted documents afterwards. Something like below:

> POST /<index>/_clone/<target-index>

OR

> PUT /<index>/_clone/<target-index>

However, if the intent is to copy the data to a new index with different settings/mappings than the original index, you can use the reindex API instead. Something like below:

POST _reindex/
{
  "source": {
    "index": "source_index",
    "type": "source_type",
    "query": {
      // add filter criteria
    }
  },
  "dest": {
    "index": "dest_index",
    "type": "dest_type"
  }
}

Note: the reindex API does not copy settings or mappings from the source index, so create the target index with the settings and mappings you want before making the call.

For further reading on the difference between clone and reindex, refer to https://stackoverflow.com/questions/59198756/whats-the-difference-between-cloning-and-reindexing-an-index-in-elasticsearch/69268032#69268032

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | cybergoof | View Question on Stackoverflow |
| Solution 1 - Elasticsearch | Ludo - Off the record | View Answer on Stackoverflow |
| Solution 2 - Elasticsearch | MAQ | View Answer on Stackoverflow |
| Solution 3 - Elasticsearch | coffeeaddict | View Answer on Stackoverflow |
| Solution 4 - Elasticsearch | Ramesh Papaganti | View Answer on Stackoverflow |
| Solution 5 - Elasticsearch | jpereira | View Answer on Stackoverflow |
| Solution 6 - Elasticsearch | John Petrone | View Answer on Stackoverflow |
| Solution 7 - Elasticsearch | mork | View Answer on Stackoverflow |
| Solution 8 - Elasticsearch | YDF | View Answer on Stackoverflow |