Problem Statement

Multiple mapping types are deprecated in Elasticsearch 6.x, so a new index can no longer be created with more than one type; an index with multiple types can only exist if it was migrated from an older version. This creates the following challenges.

  1. New adopters cannot set up Sunbird in its current state.
  2. The old static mapping update call does not work on a migrated index with multiple types.

  Related issue: SB-11532

Solution Approach

Solution approaches are documented in detail here. This document concentrates on the multi-index approach, which is divided into three parts:

  1. Create new indexes with the settings from the old indexes.
  2. Migrate the data of each type in the old indexes into a separate single-type index.
  3. Change the code to point to the different indexes in the different flows.

Problem Statement

How do we create a new index with the settings from the old indexes?

Solution Approach

To create a new index with the original settings, we first get the settings of the old index and then use them to create the new index.

Get the settings of an index
Request
GET /{indexName}/_settings

Response

{
  "searchindex" : {
    "settings" : {
      "index" : {
        "number_of_shards" : "5",
        "provided_name" : "searchindex",
        "creation_date" : "1540294977064",
        "analysis" : {
          "filter" : {
            "mynGram" : {
              "token_chars" : [
                "letter",
                "digit",
                "whitespace",
                "punctuation",
                "symbol"
              ],
              "min_gram" : "1",
              "type" : "ngram",
              "max_gram" : "20"
            }
          },
          "analyzer" : {
            "cs_index_analyzer" : {
              "filter" : [
                "lowercase",
                "mynGram"
              ],
              "type" : "custom",
              "tokenizer" : "standard"
            },
            "keylower" : {
              "filter" : "lowercase",
              "type" : "custom",
              "tokenizer" : "keyword"
            },
            "cs_search_analyzer" : {
              "filter" : [
                "lowercase",
                "standard"
              ],
              "type" : "custom",
              "tokenizer" : "standard"
            }
          }
        },
        "number_of_replicas" : "1",
        "uuid" : "HtjuANPTQH6Q3s4T9wTG3Q",
        "version" : {
          "created" : "5010199",
          "upgraded" : "6030099"
        }
      }
    }
  }
}

Example

curl -X GET http://11.2.3.58:9200/searchindex/_settings

From this response we prepare the settings for the new indexes: copy the analysis block (its filter and analyzer definitions) along with number_of_shards and number_of_replicas, and drop index-specific fields such as uuid, provided_name, creation_date and version.
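The preparation step can be sketched in Python as below. This is a minimal illustration, not part of the Sunbird code base: the function name build_create_body and the INDEX_SPECIFIC_KEYS set are hypothetical, and the list of keys to drop is taken from the sample response shown earlier.

```python
import copy
import json

# Keys that Elasticsearch generates for each index. They appear in the
# GET /{indexName}/_settings response but must not be sent on index creation.
# (Assumed list, based on the sample response in this document.)
INDEX_SPECIFIC_KEYS = {"uuid", "provided_name", "creation_date", "version"}


def build_create_body(settings_response, old_index):
    """Turn a GET /{old_index}/_settings response into a PUT /{indexName} body."""
    index_settings = settings_response[old_index]["settings"]["index"]
    reusable = {
        key: copy.deepcopy(value)  # deep copy so the original response stays untouched
        for key, value in index_settings.items()
        if key not in INDEX_SPECIFIC_KEYS
    }
    return {"settings": {"index": reusable}}


if __name__ == "__main__":
    # Trimmed-down version of the sample response above.
    sample = {
        "searchindex": {
            "settings": {
                "index": {
                    "number_of_shards": "5",
                    "provided_name": "searchindex",
                    "creation_date": "1540294977064",
                    "analysis": {
                        "analyzer": {
                            "keylower": {
                                "filter": "lowercase",
                                "type": "custom",
                                "tokenizer": "keyword",
                            }
                        }
                    },
                    "number_of_replicas": "1",
                    "uuid": "HtjuANPTQH6Q3s4T9wTG3Q",
                    "version": {"created": "5010199", "upgraded": "6030099"},
                }
            }
        }
    }
    print(json.dumps(build_create_body(sample, "searchindex"), indent=2))
```

The returned dictionary can be serialized and sent as the body of the create-index request.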

Once the settings are prepared, we can create the index with them.

Request

PUT /{indexName}

{
	"settings": {
		"index": {
			"number_of_shards": 5,
			"number_of_replicas": 1,
			"analysis": {
				"filter": {
					"mynGram": {
						"token_chars": [
							"letter",
							"digit",
							"whitespace",
							"punctuation",
							"symbol"
						],
						"min_gram": "1",
						"type": "ngram",
						"max_gram": "20"
					}
				},
				"analyzer": {
					"cs_index_analyzer": {
						"filter": [
							"lowercase",
							"mynGram"
						],
						"type": "custom",
						"tokenizer": "standard"
					},
					"keylower": {
						"filter": "lowercase",
						"type": "custom",
						"tokenizer": "keyword"
					},
					"cs_search_analyzer": {
						"filter": [
							"lowercase",
							"standard"
						],
						"type": "custom",
						"tokenizer": "standard"
					}
				}
			}
		}
	}
}

Response

{
    "acknowledged": true,
    "shards_acknowledged": true,
    "index": "content"
}

Example

curl -X PUT \
  http://localhost:9200/content \
  -H 'Content-Type: application/json' \
  -H 'cache-control: no-cache' \
  -d '{
	"settings": {
		"index": {
			"number_of_shards": 5,
			"number_of_replicas": 1,
			"analysis": {
				"filter": {
					"mynGram": {
						"token_chars": [
							"letter",
							"digit",
							"whitespace",
							"punctuation",
							"symbol"
						],
						"min_gram": "1",
						"type": "ngram",
						"max_gram": "20"
					}
				},
				"analyzer": {
					"cs_index_analyzer": {
						"filter": [
							"lowercase",
							"mynGram"
						],
						"type": "custom",
						"tokenizer": "standard"
					},
					"keylower": {
						"filter": "lowercase",
						"type": "custom",
						"tokenizer": "keyword"
					},
					"cs_search_analyzer": {
						"filter": [
							"lowercase",
							"standard"
						],
						"type": "custom",
						"tokenizer": "standard"
					}
				}
			}
		}
	}
}'



Problem Statement

How do we migrate old index data with multiple types to new indexes with a single type each?

Solution Approach

The old data can be migrated to the new indexes with either:

  1. The reindex API in Elasticsearch
  2. The sync functionality in Sunbird

Pros and cons

reindex API

  • Pros: can apply settings like size, throttling, etc.
  • Pros: no involvement of the Sunbird application

sync flow

  • Cons: needs to be modified to include support for all types


Problem Statement

How can we use reindex API to migrate data?

Solution Approach

A POST /_reindex call can be made with the old index and type as the source and the new index as the destination.

reindex API
Request

POST /_reindex
{
  "source": {
    "index": "{oldIndexName}",
    "type": "{type}"
  },
  "dest": {
    "index": "{newIndexName}",
    "type" : "_doc"
  }
}

Response

{
    "took": 632,
    "timed_out": false,
    "total": 114,
    "updated": 0,
    "created": 114,
    "deleted": 0,
    "batches": 1,
    "version_conflicts": 0,
    "noops": 0,
    "retries": {
        "bulk": 0,
        "search": 0
    },
    "throttled_millis": 0,
    "requests_per_second": -1,
    "throttled_until_millis": 0,
    "failures": []
}


Example

curl -X POST \
  http://localhost:9200/_reindex \
  -H 'Content-Type: application/json' \
  -H 'cache-control: no-cache' \
  -d '{
  "source": {
    "index": "searchindex",
    "type": "org"
  },
  "dest": {
    "index": "org",
    "type" : "_doc"
  }
}'
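The size and throttling settings mentioned among the advantages of the reindex API could be applied as in the following sketch. It only builds the request path and body and does not call Elasticsearch. The helper name build_reindex_request and the default values are hypothetical; requests_per_second and wait_for_completion are query parameters of the reindex API, and source.size sets the scroll batch size.

```python
import json
from urllib.parse import urlencode


def build_reindex_request(old_index, doc_type, new_index,
                          batch_size=500, requests_per_second=1000,
                          wait_for_completion=False):
    """Return (path, body) for a throttled POST /_reindex call.

    The default numbers are placeholders, not tuned recommendations.
    """
    query = urlencode({
        # Throttle: maximum number of documents written per second.
        "requests_per_second": requests_per_second,
        # false makes Elasticsearch run the reindex as a background task.
        "wait_for_completion": str(wait_for_completion).lower(),
    })
    body = {
        # "size" inside "source" is the number of documents per scroll batch.
        "source": {"index": old_index, "type": doc_type, "size": batch_size},
        "dest": {"index": new_index, "type": "_doc"},
    }
    return "/_reindex?" + query, body


if __name__ == "__main__":
    path, body = build_reindex_request("searchindex", "org", "org")
    print("POST " + path)
    print(json.dumps(body, indent=2))
```

With wait_for_completion=false the response contains a task id instead of the summary shown above, and progress can then be followed through the tasks API.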

Additional details

The reindex API needs to be called once for each of the following types:

  • user
  • org
  • usercourses
  • cbatch
  • content
  • badgeassociations
  • usernotes
  • userprofilevisibility
  • location
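The per-type calls could be generated as in the sketch below. It assumes that every type listed above currently lives in the "searchindex" index and that each new index is named after its type, as in the org example; both are assumptions to verify against the actual deployment, and the names TYPES and plan_reindex are hypothetical.

```python
import json

# Types listed above, each migrated into its own single-type index.
TYPES = [
    "user", "org", "usercourses", "cbatch", "content",
    "badgeassociations", "usernotes", "userprofilevisibility", "location",
]


def plan_reindex(old_index="searchindex", types=TYPES):
    """Build one POST /_reindex body per type.

    Assumption: the new index name equals the type name.
    """
    return [
        {
            "source": {"index": old_index, "type": doc_type},
            "dest": {"index": doc_type, "type": "_doc"},
        }
        for doc_type in types
    ]


if __name__ == "__main__":
    for body in plan_reindex():
        print("POST /_reindex")
        print(json.dumps(body))
```

Each new index has to be created with the prepared settings before its reindex call runs; otherwise Elasticsearch creates the destination index with default settings.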


Open Questions

  1. The sunbirddataaudit index is used to log request audits in Elasticsearch. Is it still needed, and will it be supported in the new multi-index approach? (AuditLogActions.java lists which APIs are currently audited.)
  2. The sunbirdplugin index is used based on the API call, with the type passed in the request. We need to discuss how to support it in the new multi-index format.
  3. The Elasticsearch health check URL currently checks whether "searchindex" exists. Since we will have multiple indexes for different entities, how should the health check verify Elasticsearch? Do we check only the user index, all indexes, or something else?


