Elasticsearch upgrade in Sunbird Platform

Problem Statement

Sunbird runs on ES 5.4.3, whereas the other platforms (LP/DP) run on ES 6.3. The Sunbird ES cluster therefore needs to be upgraded from 5.4.3 to 6.3.

Issue: SB-11085

Solution Approach

The upgrade can be done using either of the approaches below.

Full cluster restart

Steps

  1. Disable shard allocation
  2. Stop indexing and perform a synced flush
  3. Shut down all nodes
  4. Upgrade each node and point it to the existing data path (path.data)
  5. Start each upgraded node
  6. Re-enable the shard allocation that was disabled in the first step
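
Steps 1, 2 and 6 of the full cluster restart are plain cluster API calls (see reference 3). A minimal sketch with curl is given below. ES_URL and the function names are assumptions made for illustration, not part of the Sunbird setup.

```shell
# Assumption: ES_URL points at any node of the cluster being upgraded.
ES_URL="${ES_URL:-http://localhost:9200}"

# Body for the cluster settings API. Pass '"none"' to disable shard
# allocation (step 1) or 'null' to reset it to the default (step 6).
allocation_body() {
  printf '{"persistent":{"cluster.routing.allocation.enable":%s}}' "$1"
}

# Step 1: disable shard allocation before the nodes are shut down.
disable_allocation() {
  curl -s -X PUT "$ES_URL/_cluster/settings" \
    -H 'Content-Type: application/json' \
    -d "$(allocation_body '"none"')"
}

# Step 2: once indexing has been stopped, perform a synced flush.
synced_flush() {
  curl -s -X POST "$ES_URL/_flush/synced"
}

# Step 6: re-enable allocation after all upgraded nodes have joined.
enable_allocation() {
  curl -s -X PUT "$ES_URL/_cluster/settings" \
    -H 'Content-Type: application/json' \
    -d "$(allocation_body 'null')"
}
```

The functions are only defined here. We would call disable_allocation and synced_flush before shutting the nodes down, and enable_allocation after step 5.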

Backup and restore

Steps

  1. Register a repository in the existing ES 5.4.3
  2. Create a snapshot of the data from ES
  3. Register the same repo in the new ES 6.3
  4. Call the restore API in ES 6.3 for the snapshot created in step 2


Pros and cons

Approach             | Pros             | Cons
Full cluster restart | -                | downtime
Backup and restore   | minimal downtime | needs extra running nodes

Note

Snapshots are incremental, which means that once Sunbird is pointing to the new ES 6.3, we can repeat the snapshot and restore process to push any new data created in between.


Problem Statement

How to register a repository?

Solution

To register a repository, we first need to add path.repo to elasticsearch.yml as below

#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
path.repo: ["/home/elastic/repo"]

After restarting Elasticsearch with this config, we register the repository with the API call below

Request

PUT /_snapshot/{repoName}
{
  "type": "fs",
  "settings": {
        "location": "{location}",
        "compress": true
  }
}

Here {location} is the directory where the repository should be stored; its value must be present in path.repo.

Response

{
    "acknowledged": true
}


Example

curl -X PUT \
  http://localhost:9200/_snapshot/repo \
  -H 'Content-Type: application/json' \
  -d '{
  "type": "fs",
  "settings": {
        "location": "/home/elastic/repo",
        "compress": true
  }
}'

The created repository can be verified with the API call below

Request
GET /_snapshot/{repoName}

Response
{
  "{repoName}": {
    "type": "fs",
    "settings": {
        "location": "/home/elastic/repo",
        "compress": true
    }
  }
}


Problem Statement

How to create a snapshot in the repository?

Solution

Once the repository is registered, we can take a snapshot with the API below

Request
PUT /_snapshot/{repoName}/{snapshotName}

Response
{
    "accepted": true
}

Example

curl -X PUT \
  http://localhost:9200/_snapshot/repo/snapshot_1
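
The call above snapshots all open indices and returns immediately. The same API also accepts a request body that limits the snapshot to selected indices, and a wait_for_completion query parameter that blocks until the snapshot has finished. A minimal sketch follows. ES_URL and the function names are assumptions for illustration.

```shell
# Assumption: ES_URL points at the existing ES 5.4.3 cluster.
ES_URL="${ES_URL:-http://localhost:9200}"

# Request body; $1 is a comma-separated list of index names.
snapshot_body() {
  printf '{"indices":"%s","ignore_unavailable":true,"include_global_state":false}' "$1"
}

# $1 = repository name, $2 = snapshot name, $3 = comma-separated indices.
# wait_for_completion=true makes the call return only when the snapshot is done.
create_snapshot() {
  curl -s -X PUT "$ES_URL/_snapshot/$1/$2?wait_for_completion=true" \
    -H 'Content-Type: application/json' \
    -d "$(snapshot_body "$3")"
}
```

For example, create_snapshot repo snapshot_1 searchindex,sunbirdplugin,sunbirddataaudit would snapshot only the three Sunbird indices. With wait_for_completion=true the response carries the snapshot details instead of {"accepted": true}.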

This starts the snapshot process. Its status can be checked as below

Request
GET /_snapshot/{repoName}/{snapshotName}

Response
{
    "snapshots": [
        {
            "snapshot": "snapshot",
            "uuid": "Ht4gT_joQKKEoBF-Qcj5Vg",
            "version_id": 5010199,
            "version": "5.1.1",
            "indices": [
                "searchindex",
                "sunbirdplugin",
                "sunbirddataaudit",
                ".kibana"
            ],
            "state": "SUCCESS",
            "start_time": "2019-03-08T07:07:22.416Z",
            "start_time_in_millis": 1552028842416,
            "end_time": "2019-03-08T07:07:25.128Z",
            "end_time_in_millis": 1552028845128,
            "duration_in_millis": 2712,
            "failures": [],
            "shards": {
                "total": 16,
                "failed": 0,
                "successful": 16
            }
        }
    ]
}

Further details and considerations


Problem Statement

How to restore a snapshot?

Solution

A snapshot can be restored by calling the API below. The same repository must first be registered on the cluster where we want to restore it.

Request
POST /_snapshot/{repoName}/{snapshotName}/_restore

Response
{
    "accepted": true
}

Example

curl -X POST \
  http://localhost:9200/_snapshot/repo/snapshot/_restore
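
Without a request body, the restore call tries to restore every index in the snapshot. An index that already exists on the target cluster can only be restored while it is closed. The sketch below restores a chosen set of indices and closes an index first when needed. ES_URL, the function names, and the choice to skip the global cluster state are assumptions for illustration.

```shell
# Assumption: ES_URL points at the new ES 6.3 cluster.
ES_URL="${ES_URL:-http://localhost:9200}"

# Request body; $1 is a comma-separated list of indices to restore.
restore_body() {
  printf '{"indices":"%s","include_global_state":false}' "$1"
}

# Close an existing index ($1) so that it can be overwritten by the restore.
close_index() {
  curl -s -X POST "$ES_URL/$1/_close"
}

# $1 = repository name, $2 = snapshot name, $3 = comma-separated indices.
restore_snapshot() {
  curl -s -X POST "$ES_URL/_snapshot/$1/$2/_restore" \
    -H 'Content-Type: application/json' \
    -d "$(restore_body "$3")"
}
```

For example, restore_snapshot repo snapshot searchindex,sunbirdplugin,sunbirddataaudit restores the Sunbird indices and leaves .kibana untouched on the new cluster.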


Problem Statement

How to get the status of the snapshot or recovery process?

Solution

The API below returns the detailed status of a snapshot

Request
GET /_snapshot/{repoName}/{snapshotName}/_status

Response

{
    "snapshots": [
        {
            "snapshot": "snapshot",
            "repository": "repo",
            "uuid": "Ht4gT_joQKKEoBF-Qcj5Vg",
            "state": "SUCCESS",
            "shards_stats": {
                "initializing": 0,
                "started": 0,
                "finalizing": 0,
                "done": 16,
                "failed": 0,
                "total": 16
            },
            "stats": {
                "number_of_files": 187,
                "processed_files": 187,
                "total_size_in_bytes": 88761315,
                "processed_size_in_bytes": 88761315,
                "start_time_in_millis": 1552028842507,
                "time_in_millis": 2589
            },
            "indices": {
                ".kibana": {
                    "shards_stats": {
                        "initializing": 0,
                        "started": 0,
                        "finalizing": 0,
                        "done": 1,
                        "failed": 0,
                        "total": 1
                    },
                    "stats": {
                        "number_of_files": 7,
                        "processed_files": 7,
                        "total_size_in_bytes": 20398,
                        "processed_size_in_bytes": 20398,
                        "start_time_in_millis": 1552028842507,
                        "time_in_millis": 59
                    },
                    "shards": {
                        "0": {
                            "stage": "DONE",
                            "stats": {
                                "number_of_files": 7,
                                "processed_files": 7,
                                "total_size_in_bytes": 20398,
                                "processed_size_in_bytes": 20398,
                                "start_time_in_millis": 1552028842507,
                                "time_in_millis": 59
                            }
                        }
                    }
                },
                "sunbirdplugin": {
                    "shards_stats": {
                        "initializing": 0,
                        "started": 0,
                        "finalizing": 0,
                        "done": 5,
                        "failed": 0,
                        "total": 5
                    },
                    "stats": {
                        "number_of_files": 5,
                        "processed_files": 5,
                        "total_size_in_bytes": 805,
                        "processed_size_in_bytes": 805,
                        "start_time_in_millis": 1552028842508,
                        "time_in_millis": 1143
                    },
                    "shards": {
                        "0": {
                            "stage": "DONE",
                            "stats": {
                                "number_of_files": 1,
                                "processed_files": 1,
                                "total_size_in_bytes": 161,
                                "processed_size_in_bytes": 161,
                                "start_time_in_millis": 1552028843643,
                                "time_in_millis": 8
                            }
                        },
                        "1": {
                            "stage": "DONE",
                            "stats": {
                                "number_of_files": 1,
                                "processed_files": 1,
                                "total_size_in_bytes": 161,
                                "processed_size_in_bytes": 161,
                                "start_time_in_millis": 1552028843619,
                                "time_in_millis": 5
                            }
                        },
                        "2": {
                            "stage": "DONE",
                            "stats": {
                                "number_of_files": 1,
                                "processed_files": 1,
                                "total_size_in_bytes": 161,
                                "processed_size_in_bytes": 161,
                                "start_time_in_millis": 1552028843215,
                                "time_in_millis": 6
                            }
                        },
                        "3": {
                            "stage": "DONE",
                            "stats": {
                                "number_of_files": 1,
                                "processed_files": 1,
                                "total_size_in_bytes": 161,
                                "processed_size_in_bytes": 161,
                                "start_time_in_millis": 1552028842712,
                                "time_in_millis": 6
                            }
                        },
                        "4": {
                            "stage": "DONE",
                            "stats": {
                                "number_of_files": 1,
                                "processed_files": 1,
                                "total_size_in_bytes": 161,
                                "processed_size_in_bytes": 161,
                                "start_time_in_millis": 1552028842508,
                                "time_in_millis": 17
                            }
                        }
                    }
                },
                "sunbirddataaudit": {
                    "shards_stats": {
                        "initializing": 0,
                        "started": 0,
                        "finalizing": 0,
                        "done": 5,
                        "failed": 0,
                        "total": 5
                    },
                    "stats": {
                        "number_of_files": 11,
                        "processed_files": 11,
                        "total_size_in_bytes": 326568,
                        "processed_size_in_bytes": 326568,
                        "start_time_in_millis": 1552028842553,
                        "time_in_millis": 133
                    },
                    "shards": {
                        "0": {
                            "stage": "DONE",
                            "stats": {
                                "number_of_files": 4,
                                "processed_files": 4,
                                "total_size_in_bytes": 284725,
                                "processed_size_in_bytes": 284725,
                                "start_time_in_millis": 1552028842649,
                                "time_in_millis": 37
                            }
                        },
                        "1": {
                            "stage": "DONE",
                            "stats": {
                                "number_of_files": 1,
                                "processed_files": 1,
                                "total_size_in_bytes": 161,
                                "processed_size_in_bytes": 161,
                                "start_time_in_millis": 1552028842595,
                                "time_in_millis": 8
                            }
                        },
                        "2": {
                            "stage": "DONE",
                            "stats": {
                                "number_of_files": 1,
                                "processed_files": 1,
                                "total_size_in_bytes": 161,
                                "processed_size_in_bytes": 161,
                                "start_time_in_millis": 1552028842625,
                                "time_in_millis": 6
                            }
                        },
                        "3": {
                            "stage": "DONE",
                            "stats": {
                                "number_of_files": 1,
                                "processed_files": 1,
                                "total_size_in_bytes": 161,
                                "processed_size_in_bytes": 161,
                                "start_time_in_millis": 1552028842553,
                                "time_in_millis": 7
                            }
                        },
                        "4": {
                            "stage": "DONE",
                            "stats": {
                                "number_of_files": 4,
                                "processed_files": 4,
                                "total_size_in_bytes": 41360,
                                "processed_size_in_bytes": 41360,
                                "start_time_in_millis": 1552028842589,
                                "time_in_millis": 40
                            }
                        }
                    }
                },
                "searchindex": {
                    "shards_stats": {
                        "initializing": 0,
                        "started": 0,
                        "finalizing": 0,
                        "done": 5,
                        "failed": 0,
                        "total": 5
                    },
                    "stats": {
                        "number_of_files": 164,
                        "processed_files": 164,
                        "total_size_in_bytes": 88413544,
                        "processed_size_in_bytes": 88413544,
                        "start_time_in_millis": 1552028842654,
                        "time_in_millis": 2442
                    },
                    "shards": {
                        "0": {
                            "stage": "DONE",
                            "stats": {
                                "number_of_files": 34,
                                "processed_files": 34,
                                "total_size_in_bytes": 17551249,
                                "processed_size_in_bytes": 17551249,
                                "start_time_in_millis": 1552028843668,
                                "time_in_millis": 985
                            }
                        },
                        "1": {
                            "stage": "DONE",
                            "stats": {
                                "number_of_files": 35,
                                "processed_files": 35,
                                "total_size_in_bytes": 23112334,
                                "processed_size_in_bytes": 23112334,
                                "start_time_in_millis": 1552028843238,
                                "time_in_millis": 1096
                            }
                        },
                        "2": {
                            "stage": "DONE",
                            "stats": {
                                "number_of_files": 41,
                                "processed_files": 41,
                                "total_size_in_bytes": 18857283,
                                "processed_size_in_bytes": 18857283,
                                "start_time_in_millis": 1552028842792,
                                "time_in_millis": 812
                            }
                        },
                        "3": {
                            "stage": "DONE",
                            "stats": {
                                "number_of_files": 25,
                                "processed_files": 25,
                                "total_size_in_bytes": 11087982,
                                "processed_size_in_bytes": 11087982,
                                "start_time_in_millis": 1552028842654,
                                "time_in_millis": 543
                            }
                        },
                        "4": {
                            "stage": "DONE",
                            "stats": {
                                "number_of_files": 29,
                                "processed_files": 29,
                                "total_size_in_bytes": 17804696,
                                "processed_size_in_bytes": 17804696,
                                "start_time_in_millis": 1552028844359,
                                "time_in_millis": 737
                            }
                        }
                    }
                }
            }
        }
    ]
}
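
The snapshot status API above covers the backup side. On the restore side, the progress of shard recovery on the target cluster can be read from the index recovery API or the cat recovery API. A minimal sketch follows. ES_URL and the function names are assumptions for illustration.

```shell
# Assumption: ES_URL points at the cluster where the restore is running.
ES_URL="${ES_URL:-http://localhost:9200}"

# URL of the index recovery API for index $1, with human-readable sizes.
recovery_url() {
  printf '%s/%s/_recovery?human' "$ES_URL" "$1"
}

# Detailed recovery status (stage, files and bytes recovered) for one index.
index_recovery() {
  curl -s "$(recovery_url "$1")"
}

# Compact, one-line-per-shard view of all recoveries in the cluster.
cat_recovery() {
  curl -s "$ES_URL/_cat/recovery?v"
}
```

For example, index_recovery searchindex shows how far the restore of searchindex has progressed.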


Problem Statement

Cluster considerations for backup and restore

Solution

In a multi-node cluster, path.repo must be configured with the same value on all nodes, and that path must be on shared file storage that every node can access.
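
Whether every node can actually reach the shared location can be checked with the repository verification API, which asks the nodes of the cluster to verify access to the repository. A minimal sketch follows. ES_URL and the function names are assumptions for illustration.

```shell
# Assumption: ES_URL points at any node of the cluster.
ES_URL="${ES_URL:-http://localhost:9200}"

# URL of the verification API for repository $1.
verify_url() {
  printf '%s/_snapshot/%s/_verify' "$ES_URL" "$1"
}

# Returns the list of nodes on which the repository was verified.
verify_repo() {
  curl -s -X POST "$(verify_url "$1")"
}
```

Calling verify_repo repo after registering the repository should list the nodes that verified it; an error here usually points to a node where the shared path is missing or not writable.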

We can also use S3 for Elasticsearch backup and restore, through the repository-s3 plugin (reference 2).
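
With the repository-s3 plugin installed on every node, an S3 repository is registered through the same API as the fs repository, with type s3 and a bucket setting. A minimal sketch follows. The bucket name, ES_URL and the function names are hypothetical, and AWS credentials have to be configured separately as described in the plugin documentation.

```shell
# Assumption: ES_URL points at a cluster with the repository-s3 plugin installed.
ES_URL="${ES_URL:-http://localhost:9200}"

# Request body; $1 is the S3 bucket name (hypothetical in the usage below).
s3_repo_body() {
  printf '{"type":"s3","settings":{"bucket":"%s","compress":true}}' "$1"
}

# $1 = repository name, $2 = S3 bucket name.
register_s3_repo() {
  curl -s -X PUT "$ES_URL/_snapshot/$1" \
    -H 'Content-Type: application/json' \
    -d "$(s3_repo_body "$2")"
}
```

For example, register_s3_repo repo sunbird-es-backup registers a repository named repo backed by a bucket called sunbird-es-backup. The snapshot and restore calls stay the same as for the fs repository.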


References

  1. https://www.elastic.co/guide/en/elasticsearch/reference/5.5/modules-snapshots.html
  2. https://www.elastic.co/guide/en/elasticsearch/plugins/5.5/repository-s3.html
  3. https://www.elastic.co/guide/en/elasticsearch/reference/6.3/restart-upgrade.html