Problem Statement
Multiple mapping types are deprecated in Elasticsearch 6.x, so a new index cannot be created with more than one type. A multi-type index can exist only if it was migrated from an older version. This creates the following challenges:
- New adopters cannot install Sunbird in its current state.
- The old static mapping update call doesn't work on a migrated index with multiple types.
- SB-11532
Solution Approach
Solution approaches are documented in detail here. This document concentrates on the multi-index approach, which is divided into the following parts:
- Creation of new indexes with the settings from the old indexes
- Migration of the data of each type in the old indexes into a separate single-type index
- Code changes to point to different indexes in different flows
Problem Statement
How do we create a new index with the settings from the old indexes?
Solution Approach
To create a new index with the original settings, we first get the settings of the old index and then use those settings to create the new index.
Get the settings of an index
Request
GET /{indexName}/_settings
Response
{
  "searchindex": {
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "searchindex",
        "creation_date": "1540294977064",
        "analysis": {
          "filter": {
            "mynGram": {
              "token_chars": ["letter", "digit", "whitespace", "punctuation", "symbol"],
              "min_gram": "1",
              "type": "ngram",
              "max_gram": "20"
            }
          },
          "analyzer": {
            "cs_index_analyzer": {
              "filter": ["lowercase", "mynGram"],
              "type": "custom",
              "tokenizer": "standard"
            },
            "keylower": {
              "filter": "lowercase",
              "type": "custom",
              "tokenizer": "keyword"
            },
            "cs_search_analyzer": {
              "filter": ["lowercase", "standard"],
              "type": "custom",
              "tokenizer": "standard"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "HtjuANPTQH6Q3s4T9wTG3Q",
        "version": {
          "created": "5010199",
          "upgraded": "6030099"
        }
      }
    }
  }
}
Example
curl -X GET http://11.2.3.58:9200/searchindex/_settings
Using the response, we prepare the settings for the new indexes: copy number_of_shards, number_of_replicas and the analysis block (filters and analyzers), and leave out index-specific fields such as uuid, provided_name, creation_date and version.
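As a rough illustration of this preparation step, the sketch below strips the index-specific fields from a GET _settings response and wraps the rest in a create-index body. The function name `prepare_new_settings` and the exact list of keys to drop are assumptions made for this example, not part of Sunbird; the sample input is a trimmed copy of the response shown earlier.

```python
import copy

# Keys that Elasticsearch generates per index. The exact list is an
# assumption based on the sample response in this document.
INDEX_SPECIFIC_KEYS = ("uuid", "provided_name", "creation_date", "version")


def prepare_new_settings(settings_response, old_index_name):
    """Build a create-index body from a GET /{index}/_settings response."""
    old_index_settings = settings_response[old_index_name]["settings"]["index"]
    # Work on a copy so the original response is left untouched.
    new_index_settings = copy.deepcopy(old_index_settings)
    for key in INDEX_SPECIFIC_KEYS:
        new_index_settings.pop(key, None)
    return {"settings": {"index": new_index_settings}}


# Trimmed-down copy of the sample response for "searchindex".
sample_response = {
    "searchindex": {
        "settings": {
            "index": {
                "number_of_shards": "5",
                "provided_name": "searchindex",
                "creation_date": "1540294977064",
                "analysis": {
                    "analyzer": {
                        "keylower": {
                            "filter": "lowercase",
                            "type": "custom",
                            "tokenizer": "keyword",
                        }
                    }
                },
                "number_of_replicas": "1",
                "uuid": "HtjuANPTQH6Q3s4T9wTG3Q",
                "version": {"created": "5010199", "upgraded": "6030099"},
            }
        }
    }
}

new_settings = prepare_new_settings(sample_response, "searchindex")
print(sorted(new_settings["settings"]["index"]))
```

The remaining keys (analysis, number_of_replicas, number_of_shards) are the ones that carry over to the new index.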
Once we have the settings prepared we can create index with the settings
Request
PUT /{indexName}
{
  "settings": {
    "index": {
      "number_of_shards": 5,
      "number_of_replicas": 1,
      "analysis": {
        "filter": {
          "mynGram": {
            "token_chars": ["letter", "digit", "whitespace", "punctuation", "symbol"],
            "min_gram": "1",
            "type": "ngram",
            "max_gram": "20"
          }
        },
        "analyzer": {
          "cs_index_analyzer": {
            "filter": ["lowercase", "mynGram"],
            "type": "custom",
            "tokenizer": "standard"
          },
          "keylower": {
            "filter": "lowercase",
            "type": "custom",
            "tokenizer": "keyword"
          },
          "cs_search_analyzer": {
            "filter": ["lowercase", "standard"],
            "type": "custom",
            "tokenizer": "standard"
          }
        }
      }
    }
  }
}
Response
{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "content"
}
Example
curl -X PUT \
  http://localhost:9200/content \
  -H 'Content-Type: application/json' \
  -H 'cache-control: no-cache' \
  -d '{
  "settings": {
    "index": {
      "number_of_shards": 5,
      "number_of_replicas": 1,
      "analysis": {
        "filter": {
          "mynGram": {
            "token_chars": ["letter", "digit", "whitespace", "punctuation", "symbol"],
            "min_gram": "1",
            "type": "ngram",
            "max_gram": "20"
          }
        },
        "analyzer": {
          "cs_index_analyzer": {
            "filter": ["lowercase", "mynGram"],
            "type": "custom",
            "tokenizer": "standard"
          },
          "keylower": {
            "filter": "lowercase",
            "type": "custom",
            "tokenizer": "keyword"
          },
          "cs_search_analyzer": {
            "filter": ["lowercase", "standard"],
            "type": "custom",
            "tokenizer": "standard"
          }
        }
      }
    }
  }
}'
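If the index creation is scripted rather than run through curl, the same PUT call could look roughly like the sketch below, which uses only the Python standard library. The helper names `build_create_index_request` and `create_index` are hypothetical, and the base URL is the localhost address from the example above. Only the request is built here; `create_index` would actually send it to a running cluster.

```python
import json
import urllib.request


def build_create_index_request(base_url, index_name, settings_body):
    """Build (but do not send) the PUT /{indexName} request."""
    return urllib.request.Request(
        url="{}/{}".format(base_url.rstrip("/"), index_name),
        data=json.dumps(settings_body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def create_index(base_url, index_name, settings_body):
    """Send the request and return the parsed JSON response."""
    request = build_create_index_request(base_url, index_name, settings_body)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Shortened settings body; the full body would also carry the analysis block.
settings_body = {
    "settings": {"index": {"number_of_shards": 5, "number_of_replicas": 1}}
}
request = build_create_index_request(
    "http://localhost:9200", "content", settings_body
)
print(request.get_method(), request.full_url)
```

A successful call to `create_index` should return a body with "acknowledged": true, as in the sample response.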
Problem Statement
How do we migrate the data of an old index with multiple types to new indexes with a single type each?
Solution Approach
The old data can be migrated to the new indexes with either of the following:
- The reindex API in Elasticsearch
- The sync functionality in Sunbird
Pros and cons
approach | pros | cons | comments |
---|---|---|---|
reindex API | Can apply settings such as batch size and throttling. No involvement of the Sunbird application. | | |
sync flow | | Needs to be modified to include support for all types. | |
Problem Statement
How can we use reindex API to migrate data?
Solution Approach
A POST /_reindex call can be made with the old index and type as the source and the new index as the destination.
Request
POST /_reindex
{
  "source": {
    "index": "{oldIndexName}",
    "type": "{type}"
  },
  "dest": {
    "index": "{newIndexName}",
    "type": "_doc"
  }
}
Response
{
  "took": 632,
  "timed_out": false,
  "total": 114,
  "updated": 0,
  "created": 114,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1,
  "throttled_until_millis": 0,
  "failures": []
}
Example
curl -X POST \
  http://localhost:9200/_reindex \
  -H 'Content-Type: application/json' \
  -H 'cache-control: no-cache' \
  -d '{
  "source": {
    "index": "searchindex",
    "type": "org"
  },
  "dest": {
    "index": "org",
    "type": "_doc"
  }
}'
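The batch size and throttling options mentioned earlier can be added to this call: in Elasticsearch 6.x the scroll batch size goes in `source.size`, and `requests_per_second` is passed as a URL parameter. The sketch below shows one way to assemble such a call and to check the response. The helper names and the values 500 and 200 are assumptions chosen only for illustration.

```python
from urllib.parse import urlencode


def build_reindex_call(old_index, doc_type, new_index,
                       batch_size=None, requests_per_second=None):
    """Return (path, body) for a POST /_reindex call."""
    source = {"index": old_index, "type": doc_type}
    if batch_size is not None:
        # Number of documents fetched per scroll batch.
        source["size"] = batch_size
    body = {"source": source, "dest": {"index": new_index, "type": "_doc"}}
    path = "/_reindex"
    if requests_per_second is not None:
        # Throttling is a URL parameter, not part of the JSON body.
        path += "?" + urlencode({"requests_per_second": requests_per_second})
    return path, body


def reindex_succeeded(response):
    """True if the reindex response reports no timeout and no failures."""
    return not response["timed_out"] and not response["failures"]


# Illustrative values only: 500 documents per batch, 200 requests per second.
path, body = build_reindex_call(
    "searchindex", "org", "org", batch_size=500, requests_per_second=200
)
print(path)
print(body)

# Relevant fields of the sample response shown above.
sample_response = {"timed_out": False, "total": 114, "created": 114, "failures": []}
print(reindex_succeeded(sample_response))
```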
The reindex API needs to be called once for each of the following types:
- user
- org
- usercourses
- cbatch
- content
- badgeassociations
- usernotes
- userprofilevisibility
- location
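The list above can be turned into one reindex request body per type, as in the minimal sketch below. It assumes that all of these types currently live in "searchindex" and that each new index is named after its type, as in the org example; both points should be confirmed before running a real migration. The function name `build_migration_plan` is hypothetical.

```python
# Types listed in this document.
TYPES_TO_MIGRATE = [
    "user",
    "org",
    "usercourses",
    "cbatch",
    "content",
    "badgeassociations",
    "usernotes",
    "userprofilevisibility",
    "location",
]


def build_migration_plan(old_index, types):
    """Return one POST /_reindex body per type.

    Assumption: the new index for each type is named after the type.
    """
    return [
        {
            "source": {"index": old_index, "type": doc_type},
            "dest": {"index": doc_type, "type": "_doc"},
        }
        for doc_type in types
    ]


# Assumption: every listed type is stored in "searchindex".
plan = build_migration_plan("searchindex", TYPES_TO_MIGRATE)
for body in plan:
    print(body["source"]["type"], "->", body["dest"]["index"])
```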
Open Questions
- The sunbirddataaudit index is used to log audit records for some requests in Elasticsearch. Is it still needed, and will it be supported in the new multi-index setup? (AuditLogActions.java lists the APIs that are currently audited.)
- The sunbirdplugin index is used based on the API call, and the type is passed in the request. We need to discuss how to support it in the new multi-index format.
- The health check URL for Elasticsearch currently checks whether "searchindex" exists. Since there will now be a separate index for each entity, how should the health check verify Elasticsearch? Should it check only the user index, all indexes, or something else?