Sunbird Elasticsearch migration to multi index

Problem Statement

Multiple mapping types per index are deprecated in Elasticsearch 6.x, so a new index cannot be created with more than one type; an index with multiple types can only exist if it was migrated from an older version. This creates the following challenges:

New adopters cannot set up Sunbird in its current state.
The old static mapping update call doesn't work on a migrated index with multiple types.

Jira issue: SB-11532

Solution Approach

Solution approaches are documented in detail here. This document concentrates on the multi index approach, which is divided into three parts:

Creation of new indexes with the settings from the old indexes.
Migration of the data of each type in the old indexes into separate indexes with a single type.
Code changes to point to different indexes in different flows.

Problem Statement

How to create new indexes with the settings from the old indexes?

Solution Approach

To create a new index with the original settings, we first get the settings of the old index and then use those settings to create the new index.

Get the settings of an index

Request
GET /{indexName}/_settings

Response

{
  "searchindex" : {
    "settings" : {
      "index" : {
        "number_of_shards" : "5",
        "provided_name" : "searchindex",
        "creation_date" : "1540294977064",
        "analysis" : {
          "filter" : {
            "mynGram" : {
              "token_chars" : [
                "letter",
                "digit",
                "whitespace",
                "punctuation",
                "symbol"
              ],
              "min_gram" : "1",
              "type" : "ngram",
              "max_gram" : "20"
            }
          },
          "analyzer" : {
            "cs_index_analyzer" : {
              "filter" : [
                "lowercase",
                "mynGram"
              ],
              "type" : "custom",
              "tokenizer" : "standard"
            },
            "keylower" : {
              "filter" : "lowercase",
              "type" : "custom",
              "tokenizer" : "keyword"
            },
            "cs_search_analyzer" : {
              "filter" : [
                "lowercase",
                "standard"
              ],
              "type" : "custom",
              "tokenizer" : "standard"
            }
          }
        },
        "number_of_replicas" : "1",
        "uuid" : "HtjuANPTQH6Q3s4T9wTG3Q",
        "version" : {
          "created" : "5010199",
          "upgraded" : "6030099"
        }
      }
    }
  }
}

example 

curl -X GET http://11.2.3.58:9200/searchindex/_settings

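From this response we prepare the settings for the new indexes: copy number_of_shards, number_of_replicas and the analysis block (which contains the filter and analyzer definitions), and leave out index-specific fields such as uuid, provided_name, creation_date and version.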
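The preparation step above can be sketched as a small Python helper. This is only an illustration, not part of Sunbird: the function name prepare_settings and the constant INDEX_SPECIFIC_KEYS are hypothetical, and the list of keys to drop is an assumption based on the fields that appear in the sample response but not in the create request.

```python
import copy

# Keys that Elasticsearch generates for each index. They describe the old
# index only, so they are left out of the create request (assumed list,
# taken from the sample GET _settings response).
INDEX_SPECIFIC_KEYS = {"uuid", "provided_name", "creation_date", "version"}


def prepare_settings(settings_response, old_index_name):
    """Build a create-index body from a GET /{indexName}/_settings response."""
    old_index = settings_response[old_index_name]["settings"]["index"]
    new_index = {
        key: copy.deepcopy(value)  # deep copy so the response is not mutated
        for key, value in old_index.items()
        if key not in INDEX_SPECIFIC_KEYS
    }
    return {"settings": {"index": new_index}}
```

The returned dictionary has the same shape as the PUT /{indexName} body, with the analysis block carried over unchanged.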

Once the settings are prepared, we can create the new index with them.

Request

PUT /{indexName}

{
	"settings": {
		"index": {
			"number_of_shards": 5,
			"number_of_replicas": 1,
			"analysis": {
				"filter": {
					"mynGram": {
						"token_chars": [
							"letter",
							"digit",
							"whitespace",
							"punctuation",
							"symbol"
						],
						"min_gram": "1",
						"type": "ngram",
						"max_gram": "20"
					}
				},
				"analyzer": {
					"cs_index_analyzer": {
						"filter": [
							"lowercase",
							"mynGram"
						],
						"type": "custom",
						"tokenizer": "standard"
					},
					"keylower": {
						"filter": "lowercase",
						"type": "custom",
						"tokenizer": "keyword"
					},
					"cs_search_analyzer": {
						"filter": [
							"lowercase",
							"standard"
						],
						"type": "custom",
						"tokenizer": "standard"
					}
				}
			}
		}
	}
}

Response

{
    "acknowledged": true,
    "shards_acknowledged": true,
    "index": "content"
}

example

curl -X PUT \
  http://localhost:9200/content \
  -H 'Content-Type: application/json' \
  -H 'cache-control: no-cache' \
  -d '{
	"settings": {
		"index": {
			"number_of_shards": 5,
			"number_of_replicas": 1,
			"analysis": {
				"filter": {
					"mynGram": {
						"token_chars": [
							"letter",
							"digit",
							"whitespace",
							"punctuation",
							"symbol"
						],
						"min_gram": "1",
						"type": "ngram",
						"max_gram": "20"
					}
				},
				"analyzer": {
					"cs_index_analyzer": {
						"filter": [
							"lowercase",
							"mynGram"
						],
						"type": "custom",
						"tokenizer": "standard"
					},
					"keylower": {
						"filter": "lowercase",
						"type": "custom",
						"tokenizer": "keyword"
					},
					"cs_search_analyzer": {
						"filter": [
							"lowercase",
							"standard"
						],
						"type": "custom",
						"tokenizer": "standard"
					}
				}
			}
		}
	}
}'
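If the index creation is scripted rather than run by hand, the same PUT call could be built with the Python standard library. This is a minimal sketch under assumptions: the function names build_create_index_request and create_index are hypothetical, and the base URL is whatever host the Elasticsearch cluster is reachable on (http://localhost:9200 in the example above).

```python
import json
import urllib.request


def build_create_index_request(base_url, index_name, body):
    """Build (but do not send) the PUT /{indexName} request."""
    return urllib.request.Request(
        url="%s/%s" % (base_url.rstrip("/"), index_name),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def create_index(base_url, index_name, body):
    """Send the request and return the parsed JSON response."""
    request = build_create_index_request(base_url, index_name, body)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling create_index("http://localhost:9200", "content", body) with the settings body shown above would issue the same request as the curl example.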

Problem Statement

How to migrate data from old indexes with multiple types to new indexes with a single type?

Solution Approach

The old data can be migrated to the new indexes with either of the following:

The reindex API in Elasticsearch
The sync functionality in Sunbird

Pros and cons

Approach | Pros | Cons | Comments
reindex API | Can apply settings such as size and throttling; no involvement of the Sunbird application | - | -
sync flow | - | Needs to be modified to include support for all types | -

Problem Statement

How can we use reindex API to migrate data?

Solution Approach

A POST /_reindex call can be made with the old index and type as the source and the new index, with the single type _doc, as the destination.

reindex API

Request

POST /_reindex
{
  "source": {
    "index": "{oldIndexName}",
    "type": "{type}"
  },
  "dest": {
    "index": "{newIndexName}",
    "type" : "_doc"
  }
}

Response

{
    "took": 632,
    "timed_out": false,
    "total": 114,
    "updated": 0,
    "created": 114,
    "deleted": 0,
    "batches": 1,
    "version_conflicts": 0,
    "noops": 0,
    "retries": {
        "bulk": 0,
        "search": 0
    },
    "throttled_millis": 0,
    "requests_per_second": -1,
    "throttled_until_millis": 0,
    "failures": []
}


example

curl -X POST \
  http://localhost:9200/_reindex \
  -H 'Content-Type: application/json' \
  -H 'cache-control: no-cache' \
  -d '{
  "source": {
    "index": "searchindex",
    "type": "org"
  },
  "dest": {
    "index": "org",
    "type" : "_doc"
  }
}'
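After each reindex call, the response fields shown above can be used to confirm that every document was copied. The helper below is a sketch of one possible check, not an existing Sunbird utility; the name reindex_problems is hypothetical, and it assumes no script is used in the reindex request, so every source document is expected to be either created or updated.

```python
def reindex_problems(response):
    """Return a list of problems found in a POST /_reindex response.

    An empty list means the response looks like a complete copy.
    """
    problems = []
    if response.get("timed_out"):
        problems.append("request timed out")
    failures = response.get("failures") or []
    if failures:
        problems.append("%d failure(s) reported" % len(failures))
    conflicts = response.get("version_conflicts", 0)
    if conflicts:
        problems.append("%d version conflict(s)" % conflicts)
    # Without a script, each source document should be created or updated.
    written = response.get("created", 0) + response.get("updated", 0)
    total = response.get("total", 0)
    if written != total:
        problems.append("%d of %d documents written" % (written, total))
    return problems
```

For the sample response above (total 114, created 114, no failures) the function returns an empty list.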

Additional details

The reindex API needs to be called once for each of the following types:

user
org
usercourses
cbatch
content
badgeassociations
usernotes
userprofilevisibility
location
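The per-type calls listed above can be generated rather than written by hand. The sketch below only builds the request path and body for each type. It assumes that all of these types live in the old searchindex index and that each new index is named after its type, as in the org example; the function name build_reindex_request is hypothetical. The optional requests_per_second query parameter (throttling) and the size field inside source (scroll batch size) are the reindex settings mentioned in the pros and cons.

```python
from urllib.parse import urlencode

OLD_INDEX = "searchindex"  # assumption: every type below lives in this index
TYPES = [
    "user", "org", "usercourses", "cbatch", "content",
    "badgeassociations", "usernotes", "userprofilevisibility", "location",
]


def build_reindex_request(doc_type, old_index=OLD_INDEX,
                          requests_per_second=None, batch_size=None):
    """Return (path, body) for a POST /_reindex call for one type."""
    source = {"index": old_index, "type": doc_type}
    if batch_size is not None:
        source["size"] = batch_size  # documents fetched per scroll batch
    body = {
        "source": source,
        # Assumption: the new index is named after the old type.
        "dest": {"index": doc_type, "type": "_doc"},
    }
    path = "/_reindex"
    if requests_per_second is not None:
        path += "?" + urlencode({"requests_per_second": requests_per_second})
    return path, body


# One request per type, in the order listed above.
ALL_REQUESTS = [build_reindex_request(doc_type) for doc_type in TYPES]
```

Each (path, body) pair can then be sent with curl or any HTTP client, one type at a time, so that a failure in one type does not affect the others.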

Open Questions

The sunbirddataaudit index is used to log some of the request auditing in Elasticsearch. Is it still needed, and will it be supported in the new multi index setup? (AuditLogActions.java has details of which APIs are currently audited.)
The sunbirdplugin index is used based on the API call, with the type passed in the request. How to support it in the new multi index format needs discussion.
Currently the health check URL for Elasticsearch checks whether "searchindex" exists. Since there will be multiple indexes for different entities, how should the Elasticsearch health check be verified? Should it check only the user index, all indexes, or use some other approach?