Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 2 Next »

Overview

In cassandra data store used by course progress monitoring, few columns which are having date in string format. This results in date and timestamp mismatches when comparing with timezones.
Thus, in order to maintain standards and to address the date comparison issue, datatype migration of these columns from string to timestamp is required.
This document details about the approach taken for migration.

Migration Approach

Below table depicts the tables and columns to be migrated.

Keypsace.Table name

Columns to be migrated

sunbird_courses.course_batch

createddate
startdate
enddate
enrollmentenddate
updateddate

sunbird_courses.user_enrolments

enrolleddate

sunbird_courses.user_content_consumption

lastaccesstime

lastcompletedtime

lastupdatedtime

sunbird.page_management

createddate

sunbird.page_section

createddate
updateddate

Option 1: Migration using temp tables

  • For the above mentioned tables, a temp table is created and migration is done using a spark script

  • Delete the existing table, and re-create the table with proper datatype with same name.

  • Migrate the data from temp table to newly created table.

  • The details related to code changes and steps to run the script are here.

Option 2: Migration using new columns

  • Add new columns with type timestamp.

  • Migrate the data from older columns to new columns

  • Drop the older columns having text data.

  • Update APIs, jobs and data products to read data from and write data to new columns.

Assumption

  • All the date timezones are in UTC, as per cassandra doc, timestamps are preferred with UTC.

Clarifications Required

  • Since user_enrolments and user_content_consumption tables have huge data and are distributes across nodes, is it good approach to migrate using a temp table?

  • No labels