Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

In cassandra data store used by course progress monitoring, few columns which are having date in string format. This results in date and timestamp mismatches when comparing with timezones.
Thus, in order to maintain standards and to address the date comparison issue, datatype migration of these columns from string to timestamp is required.
This document details about the approach taken for migration.

...

Design

In order to

Below table depicts the tables and columns to be migrated.

Keypsace.Table name

Columns to be migratedExisting columns

New columns

sunbird_courses.course_batch

createddate
startdate
enddate
enrollmentenddate
updateddate

created_date
start_date
end_date
enrollment_enddate
updated_date

sunbird_courses.user_enrolments

enrolleddate

enrolled_date

sunbird_courses.user_content_consumption

lastaccesstime

lastcompletedtime

lastupdatedtime

last_access_time

last_completed_time

last_updated_time

sunbird.page_management

createddate

created_date

sunbird.page_section

createddate
updateddate

Option 1: Migration using temp tables

  • For the above mentioned tables, a temp table is created and migration is done using a spark script

  • Delete the existing table, and re-create the table with proper datatype with same name.

  • Migrate the data from temp table to newly created table.

  • The details related to code changes and steps to run the script are here.

Option 2: Migration using new columns

  • Add new columns with type timestamp.

  • Migrate the data from older columns to new columns

  • Drop the older columns having text data.

  • Update APIs, jobs and data products to read data from and write data to new columns.

created_date
updated_date

Assumption

  • All the date timezones are in UTC, as per cassandra doc, timestamps are preferred with UTC.

Clarifications Required

  • Since user_enrolments and user_content_consumption tables have huge data and are distributes across nodes, is it good approach to migrate using a temp table?