SB-20865: Migrate certificate data from Cassandra to Blog storage for already issue certificate

Overview

With cert-registry and related changes done with ticket/SB-20811, migration of existing certificate is required.

Steps for data migration

  1. Between dates, d1 and d2, get the list of certificates issued from ES 

  2. For each certificate get a record from Cassandra

  3. If row.jsonurl is empty || data.printURI is non-empty

  •  Upload the data into Cloud Store

  •  Update Cassandra with the following changes. Remove the data.printURI and append row.jsonURL with the uploaded artifact URL 

  • Sync ES so that the data JSON object is available for the portal to display course name, issued on details

Using the following query, get the data from ES and Note down the count before running the script

curl --location --request POST '{{host}}/api/certreg/v1/certs/search' \ --header 'Content-Type: application/json' \ --header 'Authorization: {{api-key}}' \ --data-raw '{ "request": { "_source": [ "identifier" ], "query": { "bool": { "must": [ { "bool": { "must_not": { "exists": { "field": "data" } } } }, { "bool": { "must_not": { "exists": { "field": "jsonUrl" } } } }, { "range": { "createdAt": { "gte": "2020-08-01", "lte": "2020-10-01" } } } ] } } } }'

Problem Statement: We cannot get all the records at one search call, in ES the limit is 10000. 

Solution:  Search using scroll in ES (Scroll | Elasticsearch Reference)

While a search request returns a single page of results, the scroll API can be used to retrieve large numbers of results (or even all results) from a single search request, in much the same way as you would use a cursor on a traditional database.

curl --location --request POST '11.2.3.58:9200/certs/_search?scroll=1m' \ --header 'Content-Type: application/json' \ --data-raw '{ "_source": [ "identifier", ], "size": 10000, "query": { "bool": { "must": [ { "bool": { "must_not": { "exists": { "field": "data" } } } }, { "bool": { "must_not": { "exists": { "field": "jsonUrl" } } } }, { "range": { "createdAt": { "gte": "2020-08-01", "lte": "2020-10-01" } } } ] } } }'

The result from the above request includes a _scroll_id, which should be passed to the scroll API in order to retrieve the next batch of results.

curl --location --request GET '11.2.3.58:9200/_search/scroll' \ --header 'Content-Type: application/json' \ --data-raw '{ "scroll": "1h", "scroll_id": "" }'

Prerequisite:

  • Connection to Cassandra and ES

  • require the following env variables to be set  (reference)

  • the values for env vars should be the same as cert-service env variables

  • Add only one IP for es_conn_info

Steps to run the script

Git Hub Repo

  1. mvn clean install

  2. cd target

  3. java -jar certificate-migration-1.0-SNAPSHOT-jar-with-dependencies.jar

Steps To Verify

  • Compare the count given post script.

  • Hit the following API: 

  • Get the count from the res, compare count after running and before running 

  • Check the logs to get the total records corrected, skipped 
    Total records: {} Total Records corrected: {} ,Certificates Successfully updated in cassandra: {} Successfully synced in ES : {}, Certificates Skipped : {} , certificates not found in cassandra : {}
    Certificates failed count : {}

  • At the end of the run, the program will generate the file named  update_cassandra_success_record.txt,update_es_success_record.txt  which has certId, so it is easy to identify the records which are successfully processed...

  • Login to Cassandra

  • select id,jsonUrl,updatedAt and data from from the table cert-registry(select id,jsonUrl,updatedat from sunbird.cert_registry; )

  • verify the column jsonUrl, updatedAt and data should be updated for the certId taken from file generated or logs. Data should not be containing printUri, updatedAt should be a recent date when jar starts executing.

  • And In ES, hit the following API by providing certId (any id, which got updated) in req, In the response data object should be present