...
In the Cassandra data store used for course progress monitoring, a few columns hold dates in string format. This results in date and timestamp mismatches when values are compared across timezones.
Thus, in order to maintain standards and to address the date comparison issue, these columns need to be migrated from the string datatype to timestamp.
This document details the approach taken for the migration.
...
Design
The table below lists the tables and columns to be migrated.
Keyspace.Table name | Existing columns (to be migrated) | New columns |
---|---|---|
sunbird_courses.course_batch | createddate | created_date |
sunbird_courses.user_enrolments | enrolleddate | enrolled_date |
sunbird_courses.user_content_consumption | lastaccesstime, lastcompletedtime, lastupdatedtime | last_access_time, last_completed_time, last_updated_time |
sunbird.page_management | createddate | created_date |
sunbird.page_section | createddate | created_date |
Option 1: Migration using temp tables
For the above-mentioned tables, the migration is done with a temp table and a Spark script, as sketched below:
1. Create a temp table and copy the existing data into it using the Spark script.
2. Drop the existing table and re-create it with the same name, using the timestamp datatype for the date columns.
3. Migrate the data from the temp table to the newly created table.
The details of the code changes and the steps to run the script are here.
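A minimal sketch of such a Spark script, for sunbird_courses.course_batch, is given below. It assumes the spark-cassandra-connector is on the classpath, that a temp table course_batch_temp with the existing (text) schema has already been created, and that the stored dates follow the yyyy-MM-dd HH:mm:ss:SSSZ pattern in UTC; the connection host, temp table name, and date format are assumptions, not confirmed values.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, to_timestamp}

object CourseBatchTempTableMigration {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("course-batch-temp-table-migration")
      .config("spark.cassandra.connection.host", "127.0.0.1") // assumption: adjust per environment
      .config("spark.sql.session.timeZone", "UTC")            // dates are assumed to be in UTC
      .getOrCreate()

    def read(table: String): DataFrame = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "sunbird_courses", "table" -> table))
      .load()

    def write(df: DataFrame, table: String): Unit = df.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "sunbird_courses", "table" -> table))
      .mode("append")
      .save()

    // Step 1: copy the existing rows (createddate still text) into the temp table.
    write(read("course_batch"), "course_batch_temp")

    // Step 2 happens outside Spark: drop course_batch and re-create it with
    // createddate declared as timestamp.

    // Step 3: read the temp table, parse the text date and load the rows into
    // the re-created course_batch table.
    val migrated = read("course_batch_temp")
      .withColumn("createddate",
        to_timestamp(col("createddate"), "yyyy-MM-dd HH:mm:ss:SSSZ")) // format is an assumption
    write(migrated, "course_batch")

    spark.stop()
  }
}
```

The same pattern applies to the other tables in the list; for user_content_consumption the three last_* columns would each be cast with an additional withColumn step.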
Option 2: Migration using new columns
1. Add new columns of type timestamp.
2. Migrate the data from the old text columns to the new columns (see the sketch after this list).
3. Drop the old text columns.
4. Update APIs, jobs, and data products to read from and write to the new columns.
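A minimal sketch of step 2 for sunbird_courses.user_enrolments is shown below, assuming the new column has already been added with `ALTER TABLE sunbird_courses.user_enrolments ADD enrolled_date timestamp;`. The primary-key columns (userid, courseid, batchid), the connection host, and the text date format are assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_timestamp}

object UserEnrolmentsNewColumnMigration {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("user-enrolments-new-column-migration")
      .config("spark.cassandra.connection.host", "127.0.0.1") // assumption: adjust per environment
      .config("spark.sql.session.timeZone", "UTC")            // dates are assumed to be in UTC
      .getOrCreate()

    val enrolments = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "sunbird_courses", "table" -> "user_enrolments"))
      .load()

    // Keep only the primary key plus the new column, so the write fills in
    // enrolled_date for each row without rewriting any other column.
    val updates = enrolments
      .withColumn("enrolled_date",
        to_timestamp(col("enrolleddate"), "yyyy-MM-dd HH:mm:ss:SSSZ")) // format is an assumption
      .select("userid", "courseid", "batchid", "enrolled_date")        // key columns are assumptions

    updates.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "sunbird_courses", "table" -> "user_enrolments"))
      .mode("append")
      .save()

    spark.stop()
  }
}
```

Because this approach writes in place and only touches the new column, it avoids the drop and re-create step of option 1, which is relevant for the large, multi-node tables flagged under Clarifications Required.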
Assumption
All dates are assumed to be in UTC; as per the Cassandra documentation, timestamps are preferably stored in UTC.
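Under that assumption, the Spark session time zone should be pinned to UTC before parsing, so that text dates without an explicit offset are not interpreted in the driver's local zone. A quick illustration, assuming a SparkSession named spark as in the sketches above (the date value is made up):

```scala
// With the session time zone set to UTC, a text date with no offset parses to the
// same instant on every node, regardless of the machine's local time zone.
spark.conf.set("spark.sql.session.timeZone", "UTC")
spark.sql("SELECT to_timestamp('2021-03-01 10:15:00', 'yyyy-MM-dd HH:mm:ss') AS ts").show(false)
```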
Clarifications Required
Since the user_enrolments and user_content_consumption tables hold large volumes of data distributed across nodes, is migrating them via a temp table a good approach?