Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Introduction

This wiki explains current design of course batches, problems with it and proposed changes to the design and implementation to resolve the current issues and able to handle scale.

Background & Problem statement:

In current state changes stored in lms service are not reactive to any changes in the course content. This structure creates the issue when some of the contents changes and Sunbird uses locally stored copy of the content. For example, Once the user has completed the full course, if any unit is added or removed from the course, the changes are not dynamically reflected. There are use cases to enable this.

Jira Legacy
serverSystem JIRA
columnskey,summary,type,created,updated,due,assignee,reporter,priority,status,resolution
serverId2207a759-5bc8-39c5-9cd2-aa9ccc1f65dd
keySB-12089


Image Added


Key Design Problems:

1. Easy way to maintain course details for a batch.
2. Better handling of course progress computation.
3. Easy to fetch data for given course or batch or user.


Design:

Managing Course Details:

Batches are created for a course. Course details are most of the time constant but, when the creator publishes it the data will be updated. So, copying the course details and saving along with course batch and user_courses is not good.

Option 1: Maintain cache with ttl

Option 2: Always fetch it from source

Solution Approach

A complete solution can be divided into below points

  1. remove the course data stored into db (ex. leafNodeCounts, courseName, tocUrl etc.)
  2. Restructure the DB tables and corresponding ES
  3. migrate the old data according to new structure
  4. Code changes in accordance with the restructured data model
  5. Change the sync process of user progress update to an async process with the help of kafka messaging.


Current DB structure related to course batch progress

...

Code Block
COPY sunbird.user_courses (batchid,userid,courseid,contentid,active,addedby,datetime,enrolleddate,lastreadcontentid,lastreadcontentstatus,status,completedon) TO '../files/usercourses.csv' WITH HEADER = TRUE;

DROP TABLE sunbird.user_courses;

CREATE TABLE IF NOT EXISTS sunbird.user_courses_new(batchid text, userid text, courseId text, contentid text, active boolean, addedBy text, dateTime timestamp, enrolledDate text,lastReadContentId text,
lastReadContentStatus int,status int, completedon timestamp, contentconsumption map<text,frozen<map<text,text>>>, PRIMARY KEY (batchid,userid));

CREATE INDEX inx_ucs_userid ON sunbird.user_courses_new (userid);
CREATE INDEX inx_ucs_courseid ON sunbird.user_courses_new (courseid);
CREATE INDEX inx_ucs_status ON sunbird.user_courses_new (status);
CREATE INDEX inx_ucs_content_consumption ON sunbird.user_courses_new (contentconsumption);

COPY sunbird.user_courses (batchid,userid,courseid,contentid,active,addedby,datetime,enrolleddate,lastreadcontentid,lastreadcontentstatus,status,completedon) FROM '../files/usercourses.csv' WITH HEADER = TRUE;

...

  1. all the user progress and content consumption is in the same row eradicates ambiguity.
  2. Easier easier to get the data based on batchid as it's the partition key, or batchid and userid.
  3. A migration of data from content_consumption to user_courses will be needed. 

Cons

  1. the content/state/read is hampered as there is no way to output certain map component, hence the filtering of contents needs to be done in java, which is an overhead.
  2. Related to above point, secondary indexes and data centralization is anti-pattern in cassandra.

...

  1. content/state/read would be efficient with userid as partition key and batchid/courseid as cluster key
  2. no migration needed
  3. user_courses will hold sufficient info to show a user's current progress.


Notes

→ Since sunbird will no longer store course info, all the course info would be fetched at runtime from KP.

→ A user's current progress and updates will be calculated whenever the api call happens, hence would provide updated state

→ Since sunbird would not be holding course details (like leafnodecount), all the background processes as configured in analytics team would need to fetch latest course details and would need to calculate the progress based on stored progress.