Elasticsearch query to return all records


Database Problem Overview


I have a small database in Elasticsearch and for testing purposes would like to pull all records back. I am attempting to use a URL of the form...

http://localhost:9200/foo/_search?pretty=true&q={'matchAll':{''}}

Can someone give me the URL you would use to accomplish this, please?

Database Solutions


Solution 1 - Database

I think Lucene query syntax is supported, so:

http://localhost:9200/foo/_search?pretty=true&q=*:*

size defaults to 10, so you may also need &size=BIGNUMBER to get more than 10 items (where BIGNUMBER is a number you believe is bigger than your dataset).
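As a minimal sketch of the URL above, the following Python assembles the query string with the standard library. The helper name build_search_url and the size value of 1000 are illustrative assumptions, not part of the original answer.

```python
from urllib.parse import urlencode


def build_search_url(base, index, size):
    """Build a URI search that matches every document via the Lucene query *:*."""
    params = {"pretty": "true", "q": "*:*", "size": size}
    # Keep '*' and ':' literal so the URL reads like the hand-written one.
    return f"{base}/{index}/_search?" + urlencode(params, safe="*:")


url = build_search_url("http://localhost:9200", "foo", 1000)
print(url)
```

Building the query string from a dict avoids hand-escaping mistakes when more parameters are added later.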

However, the Elasticsearch documentation suggests using the scan search type for large result sets.

EG:

curl -XGET 'localhost:9200/foo/_search?search_type=scan&scroll=10m&size=50' -d '
{
    "query" : {
        "match_all" : {}
    }
}'

and then keep requesting as the documentation linked above suggests.

EDIT: scan Deprecated in 2.1.0.

scan does not provide any benefits over a regular scroll request sorted by _doc (see the Elastic docs; spotted by @christophe-roussy).

Solution 2 - Database

http://127.0.0.1:9200/foo/_search/?size=1000&pretty=1
                                   ^
  

Note the size param, which increases the hits displayed from the default (10) to 1000.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-from-size.html

Solution 3 - Database

Elasticsearch (ES) supports both GET and POST requests for getting data from an ES cluster index.

When we do a GET:

http://localhost:9200/[your index name]/_search?size=[no of records you want]&q=*:*

When we do a POST:

http://localhost:9200/[your_index_name]/_search
{
  "size": [your value],        // default 10
  "from": [your start index],  // default 0
  "query": {
    "match_all": {}
  }
}
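Since a hand-written body is easy to get wrong (a missing comma is enough to break it), here is a small sketch that builds the same POST body in Python and serializes it. The function name match_all_body is hypothetical; the defaults mirror the ones noted in the template above.

```python
import json


def match_all_body(size=10, from_=0):
    """Return a search body that matches every document.

    The defaults mirror Elasticsearch's own (size 10, from 0).
    """
    return {"size": size, "from": from_, "query": {"match_all": {}}}


body = json.dumps(match_all_body(size=100, from_=0), indent=2)
print(body)
```

The resulting string can be sent as the body of the POST request to the _search endpoint.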

I would suggest using a UI plugin with Elasticsearch, such as http://mobz.github.io/elasticsearch-head/. It will help you get a better feel for the indices you create and also test them.

Solution 4 - Database

> Note: This answer relates to an older version of Elasticsearch (0.90). Versions released since then have an updated syntax. Please refer to the other answers, which may be more accurate for the latest version.

The query below returns the NO_OF_RESULTS you would like to be returned.

curl -XGET 'localhost:9200/foo/_search?size=NO_OF_RESULTS' -d '
{
"query" : {
    "match_all" : {}
  }
}'

Now, the question here is that you want all the records to be returned. So naturally, before writing a query, you won't know the value of NO_OF_RESULTS.

How do we know how many records exist in your index? Simply run the query below:

curl -XGET 'localhost:9200/foo/_search'

This would give you a result that looks like the one below

{
  "hits" : {
    "total" :       2357,
    "hits" : [
      {
        ..................

The total in the result tells you how many records are available in your index. So, that's a nice way to know the value of NO_OF_RESULTS.
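The two-step idea (read total, then ask for that many hits) can be sketched as follows. This is an illustration under assumptions: search stands for any callable that performs the HTTP request and returns the parsed JSON, fake_search is an in-memory stand-in with made-up documents, and the first call uses size=0 only to read the total. From 7.0 on, hits.total is an object with a value field rather than a plain number, so the helper handles both shapes.

```python
def total_hits(response):
    """Read hits.total, which is a number before 7.0 and an object from 7.0 on."""
    total = response["hits"]["total"]
    return total["value"] if isinstance(total, dict) else total


def fetch_all(search):
    """Ask for the total first, then request that many hits in one call.

    `search` is any callable taking a size and returning the parsed response.
    """
    total = total_hits(search(size=0))
    return search(size=total)["hits"]["hits"] if total else []


# Stand-in for a real HTTP call, using three made-up documents.
SAMPLE_DOCS = [{"_id": str(i)} for i in range(3)]


def fake_search(size):
    return {"hits": {"total": len(SAMPLE_DOCS), "hits": SAMPLE_DOCS[:size]}}


print(len(fetch_all(fake_search)))
```

This only suits small indices, since a single request still has to carry every hit.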

The same search endpoint can target other sets of indices:

curl -XGET 'localhost:9200/_search'

Searches all types in all indices.

curl -XGET 'localhost:9200/foo/_search'

Searches all types in the foo index.

curl -XGET 'localhost:9200/foo1,foo2/_search'

Searches all types in the foo1 and foo2 indices.

curl -XGET 'localhost:9200/f*/_search'

Searches all types in any indices beginning with f.

curl -XGET 'localhost:9200/_all/type1,type2/_search'

Searches types type1 and type2 in all indices.
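The index-targeting patterns above follow one rule: join the index names with commas, or leave the index part out to search everything. A minimal sketch, with the helper name search_path being an assumption and types left out:

```python
def search_path(indices=None):
    """Return the _search path for no index, one index, or several.

    Index names may contain wildcards such as 'f*'.
    """
    if not indices:
        return "/_search"  # all indices
    return "/" + ",".join(indices) + "/_search"


for target in (None, ["foo"], ["foo1", "foo2"], ["f*"]):
    print("localhost:9200" + search_path(target))
```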

Solution 5 - Database

This is the best solution I found using python client

from elasticsearch import Elasticsearch

es = Elasticsearch('http://localhost:9200')  # assumes a local node

# Initialize the scroll.
# search_type='scan' and doc_type were removed in later Elasticsearch
# versions, so they are omitted here.
page = es.search(
    index='yourIndex',
    scroll='2m',
    size=1000,
    body={
        # Your query's body
    })
sid = page['_scroll_id']
hits = page['hits']['hits']

# Keep scrolling until a page comes back empty
while hits:
    print("scroll size: " + str(len(hits)))
    # Do something with the obtained page
    page = es.scroll(scroll_id=sid, scroll='2m')
    # Update the scroll ID
    sid = page['_scroll_id']
    hits = page['hits']['hits']

https://gist.github.com/drorata/146ce50807d16fd4a6aa

Using the Java client:

import static org.elasticsearch.index.query.QueryBuilders.*;

QueryBuilder qb = termQuery("multi", "test");

SearchResponse scrollResp = client.prepareSearch(test)
        .addSort(FieldSortBuilder.DOC_FIELD_NAME, SortOrder.ASC)
        .setScroll(new TimeValue(60000))
        .setQuery(qb)
        .setSize(100).execute().actionGet(); //100 hits per shard will be returned for each scroll
//Scroll until no hits are returned
do {
    for (SearchHit hit : scrollResp.getHits().getHits()) {
        //Handle the hit...
    }

    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(60000)).execute().actionGet();
} while(scrollResp.getHits().getHits().length != 0); // Zero hits mark the end of the scroll and the while loop.

https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-search-scrolling.html

Solution 6 - Database

Elasticsearch will get significantly slower if you just pass some big number as size. One way to get all documents is to use scroll and scroll IDs.

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html

In Elasticsearch v7.2, you do it like this:

POST /foo/_search?scroll=1m
{
    "size": 100,
    "query": {
        "match_all": {}
    }
}

The response contains a _scroll_id, which you pass to the scroll endpoint to get the next chunk of 100.

POST /_search/scroll 
{
    "scroll" : "1m", 
    "scroll_id" : "<YOUR SCROLL ID>" 
}
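The two requests above form a loop: start the scroll, then keep posting the latest _scroll_id until a page comes back empty. The sketch below shows only that control flow. first_page and next_page are assumed callables that would wrap the two HTTP requests; here they are replaced by an in-memory stand-in with five made-up documents so the loop can be run without a cluster.

```python
def scroll_all(first_page, next_page):
    """Yield every hit from a scroll, page by page.

    first_page() performs the initial search with ?scroll=...;
    next_page(scroll_id) posts to /_search/scroll. Both return parsed JSON.
    """
    page = first_page()
    while page["hits"]["hits"]:
        yield from page["hits"]["hits"]
        # Always use the most recent scroll ID; it may change between pages.
        page = next_page(page["_scroll_id"])


# In-memory stand-in for a cluster holding five made-up documents.
SCROLL_DOCS = [{"_id": str(i)} for i in range(5)]
PAGE_SIZE = 2


def fake_page(start):
    return {"_scroll_id": str(start + PAGE_SIZE),
            "hits": {"hits": SCROLL_DOCS[start:start + PAGE_SIZE]}}


hits = list(scroll_all(lambda: fake_page(0), lambda sid: fake_page(int(sid))))
print(len(hits))
```

Because scroll_all is a generator, the caller can process hits as they arrive instead of holding the whole result set in memory.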

Solution 7 - Database

If you want to pull many thousands of records, a few people gave the right answer: use 'scroll'. (Note: some people also suggested using "search_type=scan". This was deprecated and then removed in v5.0. You don't need it.)

Start with a 'search' query, but specifying a 'scroll' parameter (here I'm using a 1 minute timeout):

curl -XGET 'http://ip1:9200/myindex/_search?scroll=1m' -d '
{
    "query": {
            "match_all" : {}
    }
}
'

That includes your first 'batch' of hits. But we are not done here. The output of the above curl command would be something like this:

{"_scroll_id":"c2Nhbjs1OzUyNjE6NU4tU3BrWi1UWkNIWVNBZW43bXV3Zzs1Mzc3OkhUQ0g3VGllU2FhemJVNlM5d2t0alE7NTI2Mjo1Ti1TcGtaLVRaQ0hZU0FlbjdtdXdnOzUzNzg6SFRDSDdUaWVTYWF6YlU2Uzl3a3RqUTs1MjYzOjVOLVNwa1otVFpDSFlTQWVuN211d2c7MTt0b3RhbF9oaXRzOjIyNjAxMzU3Ow==","took":109,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":22601357,"max_score":0.0,"hits":[]}}

It's important to have _scroll_id handy as next you should run the following command:

    curl -XGET  'localhost:9200/_search/scroll'  -d'
    {
        "scroll" : "1m", 
        "scroll_id" : "c2Nhbjs2OzM0NDg1ODpzRlBLc0FXNlNyNm5JWUc1" 
    }
    '

However, passing the scroll_id around is not something designed to be done manually. Your best bet is to write code to do it, e.g. in Java:

private TransportClient client = null;
private Settings settings = ImmutableSettings.settingsBuilder()
        .put(CLUSTER_NAME, "cluster-test").build();
private SearchResponse scrollResp = null;

    this.client = new TransportClient(settings);
    this.client.addTransportAddress(new InetSocketTransportAddress("ip", port));

    QueryBuilder queryBuilder = QueryBuilders.matchAllQuery();
    scrollResp = client.prepareSearch(index).setSearchType(SearchType.SCAN)
                 .setScroll(new TimeValue(60000))    		                 
                 .setQuery(queryBuilder)
                 .setSize(100).execute().actionGet();

    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
    			.setScroll(new TimeValue(timeVal))
    			.execute()
    			.actionGet();

Now loop on the last command and use the SearchResponse to extract the data.

Solution 8 - Database

You can also use server:9200/_stats to get statistics about all your indices, such as the size and number of documents per index. That's very useful when deciding how to fetch everything.
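As a hedged illustration of reading that output, the sketch below pulls the per-index document count out of a parsed _stats response. The sample dict is a trimmed, made-up example shaped like the real response (indices, then primaries, then docs.count), and doc_counts is a hypothetical helper name.

```python
def doc_counts(stats):
    """Map each index name to the number of documents in its primary shards."""
    return {
        name: data["primaries"]["docs"]["count"]
        for name, data in stats.get("indices", {}).items()
    }


# Trimmed, made-up sample shaped like a _stats response.
sample = {
    "indices": {
        "foo": {"primaries": {"docs": {"count": 2357, "deleted": 0}}},
        "bar": {"primaries": {"docs": {"count": 12, "deleted": 3}}},
    }
}
print(doc_counts(sample))
```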

Solution 9 - Database

If it's a small dataset (e.g. 1K records), you can simply specify size:

curl localhost:9200/foo_index/_search?size=1000

The match all query isn't needed, as it's implicit.

If you have a medium-sized dataset, like 1M records, you may not have enough memory to load it, so you need a scroll.

A scroll is like a cursor in a DB. In Elasticsearch, it remembers where you left off and keeps the same view of the index (i.e. prevents the searcher from going away with a refresh, prevents segments from merging).

API-wise, you have to add a scroll parameter to the first request:

curl 'localhost:9200/foo_index/_search?size=100&scroll=1m&pretty'

You get back the first page and a scroll ID:

{
  "_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAADEWbmJlSmxjb2hSU0tMZk12aEx2c0EzUQ==",
  "took" : 0,
...

Remember that both the scroll ID you get back and the timeout are valid for the next page. A common mistake here is to specify a very large timeout (value of scroll), that would cover for processing the whole dataset (e.g. 1M records) instead of one page (e.g. 100 records).

To get the next page, fill in the last scroll ID and a timeout that should last until fetching the following page:

curl -XPOST -H 'Content-Type: application/json' 'localhost:9200/_search/scroll' -d '{
  "scroll": "1m",
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAADAWbmJlSmxjb2hSU0tMZk12aEx2c0EzUQ=="
}'

If you have a lot to export (e.g. 1B documents), you'll want to parallelise. This can be done via sliced scroll. Say you want to export on 10 threads. The first thread would issue a request like this:

curl -XPOST -H 'Content-Type: application/json' 'localhost:9200/test/_search?scroll=1m&size=100' -d '{
  "slice": {
    "id": 0, 
    "max": 10 
  }
}'

You get back the first page and a scroll ID, exactly like a normal scroll request. You'd consume it exactly like a regular scroll, except that you get 1/10th of the data.

Other threads would do the same, except that id would be 1, 2, 3...
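To make the pattern concrete, this sketch generates the request body each worker would send for a sliced scroll. Only the slice part is built; slice_bodies is a hypothetical helper, and actually sending each body (to the URL with scroll and size shown above) is left out.

```python
import json


def slice_bodies(max_slices):
    """Return one search body per worker, each selecting a distinct slice."""
    return [
        {"slice": {"id": slice_id, "max": max_slices}}
        for slice_id in range(max_slices)
    ]


bodies = slice_bodies(10)
print(json.dumps(bodies[0]))
print(len(bodies))
```

Each worker then runs an ordinary scroll loop on its own body, so no coordination is needed beyond agreeing on max.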

Solution 10 - Database

The best way to adjust the size is to add size=number as a query parameter in the URL:

curl -XGET "http://localhost:9200/logstash-*/_search?size=50&pretty"

Note: the maximum value allowed for size is 10000. For anything above ten thousand, you are expected to use the scroll API, which limits the impact on performance.

Solution 11 - Database

Simple! You can use the size and from parameters:

http://localhost:9200/[your index name]/_search?size=1000&from=0

Then increase from step by step (by size each time) until you have all of the data.
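A short sketch of that stepping, assuming the total number of hits is already known. page_offsets is a hypothetical helper, and the total of 2357 and the foo index are example values. It stops at 10,000 because from + size may not exceed the index.max_result_window setting, which defaults to that value.

```python
def page_offsets(total, page_size, max_window=10000):
    """Yield (from, size) pairs that cover `total` hits.

    Stops at max_window because from + size may not exceed
    index.max_result_window (10,000 by default).
    """
    limit = min(total, max_window)
    for start in range(0, limit, page_size):
        yield start, min(page_size, limit - start)


for start, size in page_offsets(total=2357, page_size=1000):
    print(f"http://localhost:9200/foo/_search?size={size}&from={start}")
```

For result sets beyond the window, a scroll is the usual alternative to from/size paging.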

Solution 12 - Database

You can use the _count API to get the value for the size parameter:

http://localhost:9200/foo/_count?q=<your query>

Returns {count:X, ...}. Extract value 'X' and then do the actual query:

http://localhost:9200/foo/_search?q=<your query>&size=X

Solution 13 - Database

From Kibana Dev Tools, it's:

GET my_index_name/_search
{
  "query": {
    "match_all": {}
  }
}

Solution 14 - Database

You actually don't need to pass a body to match_all, it can be done with a GET request to the following URL. This is the simplest form.

http://localhost:9200/foo/_search

Solution 15 - Database

http://localhost:9200/foo/_search/?size=1000&pretty=1

You will need to specify the size query parameter, as the default is 10.

Solution 16 - Database

The size param increases the hits displayed from the default (10) to 500.

http://localhost:9200/[indexName]/_search?pretty=true&size=500&q=*:*

Change from step by step to get all the data.

http://localhost:9200/[indexName]/_search?size=500&from=0

Solution 17 - Database

A simple solution using the python package elasticsearch-dsl:

from elasticsearch_dsl import Search
from elasticsearch_dsl import connections

connections.create_connection(hosts=['localhost'])

s = Search(index="foo")
response = s.scan()

count = 0
for hit in response:
    # print(hit.to_dict())  # be careful, it will printout every hit in your index
    count += 1

print(count)

See also https://elasticsearch-dsl.readthedocs.io/en/latest/api.html#elasticsearch_dsl.Search.scan .

Solution 18 - Database

Using the Kibana console, you can ask the index (address in the request below) to return only four of its fields by listing them in _source. You can also add size to indicate how many documents you want returned. As of ES 7.6 you should use _source rather than filter; it will respond faster.

GET /address/_search
{
  "_source": ["streetaddress", "city", "state", "postcode"],
  "size": 100,
  "query": {
    "match_all": {}
  }
}

Solution 19 - Database

For Elasticsearch 6.x

Request: GET /foo/_search?pretty=true

Response: hits.total gives the count of the docs.

    {
      "took": 1,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": 1001,
        "max_score": 1,
        "hits": [
          {

Solution 20 - Database

curl -X GET 'localhost:9200/foo/_search?q=*&pretty' 

Solution 21 - Database

By default Elasticsearch returns 10 records, so size should be provided explicitly.

Add size to the request to get the desired number of records.

http://{host}:9200/{index_name}/_search?pretty=true&size=(number of records)

Note: the page size cannot exceed the index.max_result_window index setting, which defaults to 10,000.

Solution 22 - Database

To return all records from all indices you can do:

curl -XGET 'http://35.195.120.21:9200/_all/_search?size=50&pretty'

Output:

  "took" : 866,
  "timed_out" : false,
  "_shards" : {
    "total" : 25,
    "successful" : 25,
    "failed" : 0
  },
  "hits" : {
    "total" : 512034694,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "grafana-dash",
      "_type" : "dashboard",
      "_id" : "test",
      "_score" : 1.0,
       ...

Solution 23 - Database

The maximum number of results Elasticsearch returns in one request is 10000, which you set through size:

curl -XGET 'localhost:9200/index/type/_search?scroll=1m' -d '
{
   "size":10000,
   "query" : {
   "match_all" : {}
    }
}'

After that, use the Scroll API to get the remaining results: take the _scroll_id value from the response and put it in scroll_id.

curl -XGET  'localhost:9200/_search/scroll'  -d'
{
   "scroll" : "1m", 
   "scroll_id" : "" 
}'

Solution 24 - Database

If someone is still looking for a way to retrieve all the data from Elasticsearch, as I was for some use cases, here is what I did. Here, all the data means all the indices and all the document types. I'm using Elasticsearch 6.3.

curl -X GET "localhost:9200/_search?pretty=true" -H 'Content-Type: application/json' -d'
{
    "query": {
        "match_all": {}
    }
}
'

Elasticsearch reference

Solution 25 - Database

The official documentation provides the answer to this question! You can find it here.

{
  "query": { "match_all": {} },
  "size": 1
}

You simply replace size (1) with the number of results you want to see!

Solution 26 - Database

curl -XGET '{{IP/localhost}}:9200/{{Index name}}/{{type}}/_search?scroll=10m&pretty' -d '{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      }
    }
  }
}'

Solution 27 - Database

None except @Akira Sendoh has answered how to actually get ALL docs. But even that solution crashes my ES 6.3 service without logs. The only thing that worked for me using the low-level elasticsearch-py library was through scan helper that uses scroll() api:

from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

es_obj = Elasticsearch("http://localhost:9200")  # assumes a local node

doc_generator = scan(
    es_obj,
    query={"query": {"match_all": {}}},
    index="my-index",
)

# use the generator to iterate; don't try to make a list or you will run out of RAM
for doc in doc_generator:
    pass  # use it somehow

However, the cleaner way nowadays seems to be through elasticsearch-dsl library, that offers more abstract, cleaner calls, e.g: http://elasticsearch-dsl.readthedocs.io/en/latest/search_dsl.html#hits

Solution 28 - Database

This is the query to accomplish what you want (I suggest using Kibana, as it helps to understand queries better):

GET my_index_name/my_type_name/_search
{
   "query": {
      "match_all": {}
   },
   "size": 20,
   "from": 3
}

To get all records you have to use the "match_all" query.

size is the number of records you want to fetch (a kind of limit). By default, ES will only return 10 records.

from is like skip: here it skips the first 3 records.

If you want to fetch exactly all the records, take the value of the "total" field from the result once you run this query from Kibana, and then use it as "size".

Solution 29 - Database

Using Elasticsearch 7.5.1

http://${HOST}:9200/${INDEX}/_search?pretty=true&q=*:*&scroll=10m&size=5000

You can also specify the page size with &size=${number}.

In case you don't know your index:

http://${HOST}:9200/_cat/indices?v

Solution 30 - Database

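Note that size=0 does not return all the documents. It returns no hits, only the total count in hits.total, which you can then use as the size of a second request. Example:

curl -XGET 'localhost:9200/index/type/_search' -d '
{
   "size": 0,
   "query" : {
      "match_all" : {}
   }
}'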

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | John Livermore | View Question on Stackoverflow
Solution 1 - Database | Steve Casey | View Answer on Stackoverflow
Solution 2 - Database | lfender6445 | View Answer on Stackoverflow
Solution 3 - Database | Prerak Diwan | View Answer on Stackoverflow
Solution 4 - Database | vjpandian | View Answer on Stackoverflow
Solution 5 - Database | HungUnicorn | View Answer on Stackoverflow
Solution 6 - Database | WoodyDRN | View Answer on Stackoverflow
Solution 7 - Database | Somum | View Answer on Stackoverflow
Solution 8 - Database | Oussama L. | View Answer on Stackoverflow
Solution 9 - Database | Radu Gheorghe | View Answer on Stackoverflow
Solution 10 - Database | akshay misra | View Answer on Stackoverflow
Solution 11 - Database | Aminah Nuraini | View Answer on Stackoverflow
Solution 12 - Database | Daniel | View Answer on Stackoverflow
Solution 13 - Database | belostoky | View Answer on Stackoverflow
Solution 14 - Database | Kraken | View Answer on Stackoverflow
Solution 15 - Database | Edwin O. | View Answer on Stackoverflow
Solution 16 - Database | Prasanna Jathan | View Answer on Stackoverflow
Solution 17 - Database | asmaier | View Answer on Stackoverflow
Solution 18 - Database | Gregory Neely | View Answer on Stackoverflow
Solution 19 - Database | Anurag | View Answer on Stackoverflow
Solution 20 - Database | Dhruv Sharma | View Answer on Stackoverflow
Solution 21 - Database | Satyendra Sharma | View Answer on Stackoverflow
Solution 22 - Database | exceltior | View Answer on Stackoverflow
Solution 23 - Database | RAHUL JAIN | View Answer on Stackoverflow
Solution 24 - Database | Santosh Kumar Arjunan | View Answer on Stackoverflow
Solution 25 - Database | christouandr7 | View Answer on Stackoverflow
Solution 26 - Database | aditya | View Answer on Stackoverflow
Solution 27 - Database | stelios | View Answer on Stackoverflow
Solution 28 - Database | niranjan_harpale | View Answer on Stackoverflow
Solution 29 - Database | Tiago Medici | View Answer on Stackoverflow
Solution 30 - Database | premkumar | View Answer on Stackoverflow