Problem Statement 1:
Use the Elasticsearch Scroll API. The Scroll API can retrieve large numbers of results (or even all results) from a single search request; it works much like a cursor on a traditional database.
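Pros | Cons |
We can retrieve large data sets | Not intended for real-time user requests |
We can slice the data by shard and consume the slices independently | Each scroll keeps a search context open on the cluster until it expires or is cleared, which costs resources and hurts performance under real-time load |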
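The shard-based slicing listed in the table is Elasticsearch's sliced scroll: each worker adds a `"slice": {"id": ..., "max": ...}` clause to its initial search body and then scrolls its own slice independently. According to the Elasticsearch documentation, the split is made first on shards and then locally on each shard with `slice(doc) = floorMod(hashCode(doc._id), max)`. The minimal sketch below models only that local assignment, to show that every document lands in exactly one slice; `String.hashCode()` stands in for Elasticsearch's internal hash, and `SliceSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: models how a sliced scroll partitions documents by _id.
class SliceSketch {

    // Stand-in for slice(doc) = floorMod(hashCode(doc._id), max).
    // Assumption: String.hashCode() replaces the hash Elasticsearch uses internally.
    static int sliceOf(String docId, int max) {
        return Math.floorMod(docId.hashCode(), max);
    }

    // Groups document ids into `max` disjoint slices.
    static List<List<String>> partition(List<String> docIds, int max) {
        List<List<String>> slices = new ArrayList<>();
        for (int i = 0; i < max; i++) {
            slices.add(new ArrayList<>());
        }
        for (String id : docIds) {
            slices.get(sliceOf(id, max)).add(id);
        }
        return slices;
    }

    // The fragment each worker adds to its own search body, next to "query" and "size".
    static String sliceClause(int id, int max) {
        return "\"slice\": {\"id\": " + id + ", \"max\": " + max + "}";
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            ids.add("doc-" + i);
        }
        List<List<String>> slices = partition(ids, 3);
        for (int s = 0; s < slices.size(); s++) {
            System.out.println(sliceClause(s, 3) + " -> " + slices.get(s));
        }
    }
}
```

With `max` equal to the number of workers, each worker can run the scroll loop from the Java example with its own slice id; the union of all slices is the full result set.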
Path: /{{IndexName}}/{{type}}/_search?scroll=1m
Request data:
{
  "size": 1000,
  "query": { /* the query required to fetch the data */ }
}
Returns → {"_scroll_id": "SCROLL ID", "hits": {"hits": ["data"]}}
After receiving the scroll ID, we need to send the following request repeatedly until all results have been retrieved. (scroll=1m keeps the search context alive for one minute; each follow-up request renews it.)
Path:
Request data:
{
  "scroll": "1m",
  "scroll_id": "Scroll id" // received in the previous response
}
Returns:
{
  "_scroll_id": "Scroll Id",
  "hits": {
    "total": 263,
    "max_score": 0.11207403,
    "hits": [
      {data} // result data from the scroll API
    ]
  }
}
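The request/response cycle above works like a cursor: the first search returns a `_scroll_id` with the first page, and each follow-up request to the `/_search/scroll` endpoint with that ID returns the next page until `hits.hits` comes back empty. Elasticsearch may return a different `_scroll_id` on later responses, so only the most recent one should be sent. The in-memory model below is a hypothetical sketch of that contract (`FakeScrollServer`, `Page`, `search`, `scroll`, `clear` and `fetchAll` are illustrative names, not the Elasticsearch client API); it snapshots the results per scroll ID but ignores the `1m` keep-alive expiry that a real cluster enforces.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory model of the scroll contract; not the Elasticsearch client API.
class FakeScrollServer {

    // One page of results plus the scroll id to send with the next request.
    static final class Page {
        final String scrollId;
        final List<String> hits;

        Page(String scrollId, List<String> hits) {
            this.scrollId = scrollId;
            this.hits = hits;
        }
    }

    private final List<String> index;
    private final int size;
    private final Map<String, List<String>> snapshots = new HashMap<>(); // scroll id -> frozen results
    private final Map<String, Integer> positions = new HashMap<>();      // scroll id -> next offset

    FakeScrollServer(List<String> index, int size) {
        this.index = index;
        this.size = size;
    }

    // Models /{{IndexName}}/{{type}}/_search?scroll=1m: first page plus a scroll id.
    Page search() {
        String id = UUID.randomUUID().toString();
        // Snapshot taken once, so later index changes do not affect this scroll.
        snapshots.put(id, new ArrayList<>(index));
        positions.put(id, 0);
        return scroll(id);
    }

    // Models a follow-up request carrying "scroll_id": returns the next page (possibly empty).
    Page scroll(String scrollId) {
        List<String> snapshot = snapshots.get(scrollId);
        if (snapshot == null) {
            throw new IllegalStateException("unknown or cleared scroll id");
        }
        int from = positions.get(scrollId);
        int to = Math.min(from + size, snapshot.size());
        positions.put(scrollId, to);
        return new Page(scrollId, new ArrayList<>(snapshot.subList(from, to)));
    }

    // Models clearing the scroll context once the client is done.
    void clear(String scrollId) {
        snapshots.remove(scrollId);
        positions.remove(scrollId);
    }

    // Client side: keep requesting with the latest scroll id until "hits" is empty.
    static List<String> fetchAll(FakeScrollServer server) {
        List<String> all = new ArrayList<>();
        Page page = server.search();
        while (!page.hits.isEmpty()) {
            all.addAll(page.hits);
            page = server.scroll(page.scrollId);
        }
        server.clear(page.scrollId);
        return all;
    }
}
```

With 263 documents and a page size of 100, `fetchAll` receives pages of 100, 100 and 63 hits, then an empty page that ends the loop, mirroring the `"total": 263` response above.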
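import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
// client: an existing org.elasticsearch.client.Client; indicesName: the index to read
QueryBuilder qb = QueryBuilders.matchAllQuery(); // placeholder: replace with the actual query
SearchResponse scrollResp = client.prepareSearch(indicesName)
        .setQuery(qb)
        .setScroll(new TimeValue(60000))
        .setSize(100).get(); // max of 100 hits will be returned for each scroll
// Scroll until no hits are returned
do {
    for (SearchHit hit : scrollResp.getHits().getHits()) {
        // Handle the hit...
    }
    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(60000)).execute().actionGet();
} while (scrollResp.getHits().getHits().length != 0);
// Release the scroll context instead of waiting for the keep-alive to expire
client.prepareClearScroll().addScrollId(scrollResp.getScrollId()).get();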
Approach 1:
We can't start the service instantly; instead, we can generate the batch metrics by running this service once a day. It should be an asynchronous process, and its process ID needs to be tracked. The process generates a file, uploads it to storage, and the link is shared with the user by email. Later requests for the same date range can reuse that file: for example, if a user requests stats for a course batch whose report has already been generated and whose validity period has not expired, we reuse that report instead of regenerating it.
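The reuse rule in Approach 1 can be sketched as a small service keyed by course batch and date range: if a report exists and is still within its validity period, it is returned as-is; otherwise a new asynchronous job starts, and in both cases the job is tracked under an ID. Everything here is hypothetical (`ReportService`, `Report`, the validity `Duration`, and the `generateAndUpload` function standing in for "generate the file, upload it, return the link"); an in-memory map stands in for real storage, and the daily schedule and email step are left out.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch of Approach 1: asynchronous report generation, tracked by job id,
// reusing an existing report while it is still within its validity period.
class ReportService {

    // A generated report: the shareable link and when it was generated.
    static final class Report {
        final String link;
        final Instant generatedAt;

        Report(String link, Instant generatedAt) {
            this.link = link;
            this.generatedAt = generatedAt;
        }
    }

    private final Map<String, Report> reports = new ConcurrentHashMap<>();                  // report key -> latest report
    private final Map<String, CompletableFuture<Report>> jobs = new ConcurrentHashMap<>(); // job id -> job
    private final Duration validity;
    private final Supplier<Instant> now;
    private final Function<String, String> generateAndUpload; // assumed: report key -> uploaded file link

    ReportService(Duration validity, Supplier<Instant> now, Function<String, String> generateAndUpload) {
        this.validity = validity;
        this.now = now;
        this.generateAndUpload = generateAndUpload;
    }

    // One report per course batch and date range.
    static String key(String courseBatchId, LocalDate from, LocalDate to) {
        return courseBatchId + "|" + from + "|" + to;
    }

    // Starts (or reuses) a report and returns the id under which the job is tracked.
    String request(String courseBatchId, LocalDate from, LocalDate to) {
        String key = key(courseBatchId, from, to);
        Report existing = reports.get(key);
        CompletableFuture<Report> job;
        if (existing != null && now.get().isBefore(existing.generatedAt.plus(validity))) {
            job = CompletableFuture.completedFuture(existing); // still valid: reuse, do not regenerate
        } else {
            job = CompletableFuture.supplyAsync(() -> {
                Report report = new Report(generateAndUpload.apply(key), now.get());
                reports.put(key, report);
                return report;
            });
        }
        String jobId = UUID.randomUUID().toString();
        jobs.put(jobId, job);
        return jobId;
    }

    // Looks up a tracked job, e.g. to email the link once it completes.
    CompletableFuture<Report> job(String jobId) {
        return jobs.get(jobId);
    }
}
```

A production version would also need to persist job state and deduplicate concurrent requests for the same key; this sketch does neither, so two simultaneous requests for an expired report may both regenerate it.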