Elasticsearch Bulk Index JSON Data

Json, Elasticsearch

Json Problem Overview


I am trying to bulk index a JSON file into a new Elasticsearch index and am unable to do so. I have the following sample data inside the JSON file:

[{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"},{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"},{"Amount": "2107", "Quantity": "3", "Id": "974920111", "Client_Store_sk": "1109"},{"Amount": "2115", "Quantity": "2", "Id": "975463798", "Client_Store_sk": "1109"},{"Amount": "2116", "Quantity": "1", "Id": "975463827", "Client_Store_sk": "1109"},{"Amount": "648", "Quantity": "3", "Id": "975464139", "Client_Store_sk": "1109"},{"Amount": "2126", "Quantity": "2", "Id": "975464805", "Client_Store_sk": "1109"},{"Amount": "2133", "Quantity": "1", "Id": "975464061", "Client_Store_sk": "1109"},{"Amount": "1339", "Quantity": "4", "Id": "974919458", "Client_Store_sk": "1109"},{"Amount": "1196", "Quantity": "5", "Id": "974920538", "Client_Store_sk": "1109"},{"Amount": "1198", "Quantity": "4", "Id": "975463638", "Client_Store_sk": "1109"},{"Amount": "1345", "Quantity": "4", "Id": "974919522", "Client_Store_sk": "1109"},{"Amount": "1347", "Quantity": "2", "Id": "974919563", "Client_Store_sk": "1109"},{"Amount": "673", "Quantity": "2", "Id": "975464359", "Client_Store_sk": "1109"},{"Amount": "2153", "Quantity": "1", "Id": "975464511", "Client_Store_sk": "1109"},{"Amount": "3896", "Quantity": "4", "Id": "977289342", "Client_Store_sk": "1109"},{"Amount": "3897", "Quantity": "4", "Id": "974920602", "Client_Store_sk": "1109"}]

I am using

 curl -XPOST localhost:9200/index_local/my_doc_type/_bulk --data-binary --data @/home/data1.json 

When I try to use the standard bulk index API from Elasticsearch, I get this error:

 error: {"message":"ActionRequestValidationException[Validation Failed: 1: no requests added;]"}

Can anyone help with indexing this type of JSON?

Json Solutions


Solution 1 - Json

What you need to do is to read that JSON file and then build a bulk request with the format expected by the _bulk endpoint, i.e. one line for the command and one line for the document, separated by a newline character... rinse and repeat for each document:

curl -XPOST localhost:9200/your_index/_bulk -d '
{"index": {"_index": "your_index", "_type": "your_type", "_id": "975463711"}}
{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}
{"index": {"_index": "your_index", "_type": "your_type", "_id": "975463943"}}
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"}
... etc for all your documents
'

Just make sure to replace your_index and your_type with the actual index and type names you're using.
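
To check a request body against the format described above before sending it, a rough sanity check can be sketched in Python. The function name check_bulk_body is made up for this illustration and is not part of any Elasticsearch client. It only verifies the line layout (an action line, then a document line for index, create and update, and a trailing newline), not whether Elasticsearch would accept the documents themselves:

```python
import json

# Actions that must be followed by a document (source) line, and the one that is not.
ACTIONS_WITH_SOURCE = {"index", "create", "update"}
ACTIONS_WITHOUT_SOURCE = {"delete"}


def check_bulk_body(body):
    """Return a list of layout problems found in a _bulk request body (hypothetical helper)."""
    problems = []
    if not body.endswith("\n"):
        problems.append("body does not end with a newline")

    lines = body.splitlines()
    i = 0
    while i < len(lines):
        try:
            action = json.loads(lines[i])
        except json.JSONDecodeError:
            problems.append(f"line {i + 1}: not valid JSON")
            i += 1
            continue

        # An action line is a JSON object with exactly one key naming the operation.
        is_action = isinstance(action, dict) and len(action) == 1
        name = next(iter(action)) if is_action else None

        if name in ACTIONS_WITH_SOURCE:
            if i + 1 >= len(lines):
                problems.append(f"line {i + 1}: action '{name}' has no document line after it")
            i += 2  # skip the action line and its document line
        elif name in ACTIONS_WITHOUT_SOURCE:
            i += 1
        else:
            problems.append(f'line {i + 1}: expected an action line such as {{"index": {{}}}}')
            i += 1
    return problems
```

Run against a file that holds a single JSON array, like the one in the question, this reports that the first line is not an action line, which matches the point made above: the documents have to be split into action/document line pairs.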

UPDATE

Note that the action line can be shortened by removing _index and _type if those are specified in your URL. It is also possible to remove _id if you specify the path to your id field in your mapping (note that this feature will be deprecated in ES 2.0, though). At the very least, the action line can look like {"index":{}} for all documents, but it is always mandatory, because it specifies which kind of operation you want to perform (in this case, indexing the document).

UPDATE 2

curl -XPOST localhost:9200/index_local/my_doc_type/_bulk --data-binary @/home/data1.json

/home/data1.json should look like this:

{"index":{}}
{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}
{"index":{}}
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"}
{"index":{}}
{"Amount": "2107", "Quantity": "3", "Id": "974920111", "Client_Store_sk": "1109"}

UPDATE 3

You can refer to this answer to see how to generate the new json style file mentioned in UPDATE 2.
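
One possible way to produce such a file from a JSON array is sketched below in Python. This is an illustration only, not necessarily the approach taken in the linked answer. The function names to_bulk_ndjson and convert_file are invented for this example:

```python
import json


def to_bulk_ndjson(docs):
    """Build a _bulk body: an {"index": {}} action line before every document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))  # action line: index the next document
        lines.append(json.dumps(doc))            # source line: the document on a single line
    # Every line, including the last one, is terminated by a newline.
    return "".join(line + "\n" for line in lines)


def convert_file(src_path, dst_path):
    """Read a JSON array from src_path and write the bulk-formatted body to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        docs = json.load(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(to_bulk_ndjson(docs))
```

For example, convert_file("/home/data1.json", "/home/data1_bulk.json") would write the converted data to a second file (the output path is an assumption), which can then be passed to curl with --data-binary as in UPDATE 2.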

UPDATE 4

As of ES 7.x, the doc type is no longer necessary and should simply be _doc instead of my_doc_type. As of ES 8.x, the doc type will be removed completely. You can read more about this here.

Solution 2 - Json

As of today, 6.1.2 is the latest version of Elasticsearch, and the curl command that works for me on Windows (x64) is:

curl -s -XPOST localhost:9200/my_index/my_index_type/_bulk -H "Content-Type: application/x-ndjson" --data-binary @D:\data\mydata.json

The format of the data that should be present in mydata.json remains the same as shown in Val's answer (Solution 1).
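
For readers who would rather send the request from a script, the same call can be sketched with Python's standard library. The helper name build_bulk_request is hypothetical. The sketch only builds the request object, so the header and body can be inspected before anything is sent:

```python
import urllib.request


def build_bulk_request(bulk_url, ndjson_body):
    """Build (but do not send) a POST request for the _bulk endpoint at bulk_url."""
    return urllib.request.Request(
        bulk_url,
        data=ndjson_body.encode("utf-8"),  # send the body as-is, newlines included
        headers={"Content-Type": "application/x-ndjson"},  # same header as the curl command
        method="POST",
    )
```

Calling urllib.request.urlopen(build_bulk_request("http://localhost:9200/my_index/my_index_type/_bulk", body)) would then send it, assuming an Elasticsearch node is listening on localhost:9200 and body already holds the bulk-formatted text.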

Solution 3 - Json

A valid Elasticsearch bulk API request would be something like (ending with a newline):

POST http://localhost:9200/products_slo_development_temp_2/productModel/_bulk

{ "index":{ } } 
{"RequestedCountry":"slo","Id":1860,"Title":"Stol"} 
{ "index":{ } } 
{"RequestedCountry":"slo","Id":1860,"Title":"Miza"} 

Elasticsearch bulk api documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html

This is how I do it

I send an HTTP POST request, with the uri variable as the URL of the request and the elasticsearchJson variable as the request body, which is JSON formatted for the Elasticsearch bulk API:

var uri = @"/" + indexName + "/productModel/_bulk";
var json = JsonConvert.SerializeObject(sqlResult);
var elasticsearchJson = GetElasticsearchBulkJsonFromJson(json, "RequestedCountry");

Helper method for generating the JSON format required by the Elasticsearch bulk API:

public string GetElasticsearchBulkJsonFromJson(string jsonStringWithArrayOfObjects, string firstParameterNameOfObjectInJsonStringArrayOfObjects)
{
  return @"{ ""index"":{ } } 
" + jsonStringWithArrayOfObjects.Substring(1, jsonStringWithArrayOfObjects.Length - 2).Replace(@",{""" + firstParameterNameOfObjectInJsonStringArrayOfObjects + @"""", @" 
{ ""index"":{ } } 
{""" + firstParameterNameOfObjectInJsonStringArrayOfObjects + @"""") + @"
";
}

The first property/field in my JSON object is the RequestedCountry property, which is why I use it in this example.

productModel is my Elasticsearch document type. sqlResult is a C# generic list with products.

Solution 4 - Json

This answer is for Elasticsearch 7.x onwards, where _type is deprecated. As others have mentioned, you can read the file programmatically and construct a request body as described below. Also, I see that each of your JSON objects has an Id attribute, so you could set the document's internal id (_id) to be the same as this attribute. The updated _bulk API request would look like this:

HTTP Method: POST

URI: /<index_name>/_bulk

Request body (should end with a new line):

{"index":{"_id": "975463711"}}
{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}
{"index":{"_id": "975463943"}}
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"}
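
A minimal Python sketch of building this request body programmatically is shown below. The function name to_bulk_with_ids and its id_field parameter are made up for this example. It assumes every document carries the attribute used as the id:

```python
import json


def to_bulk_with_ids(docs, id_field="Id"):
    """Build a _bulk body whose action lines set _id from each document's id_field."""
    lines = []
    for doc in docs:
        action = {"index": {"_id": str(doc[id_field])}}  # raises KeyError if the field is missing
        lines.append(json.dumps(action))  # action line
        lines.append(json.dumps(doc))     # document line
    # The request body should end with a newline.
    return "".join(line + "\n" for line in lines)
```

The returned string can be sent as the body of POST /<index_name>/_bulk.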

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Amit P | View Question on Stackoverflow
Solution 1 - Json | Val | View Answer on Stackoverflow
Solution 2 - Json | Thomas | View Answer on Stackoverflow
Solution 3 - Json | Tadej | View Answer on Stackoverflow
Solution 4 - Json | Binita Bharati | View Answer on Stackoverflow