This wiki explains current design of course batches, problems with it and proposed changes to the design and implementation to resolve the current issues and able to handle scale.
In current state changes stored in lms service are not reactive to any changes in the course content. This structure creates the issue when some of the contents changes and Sunbird uses locally stored copy of the content. For example, Once the user has completed the full course, if any unit is added or removed from the course, the changes are not dynamically reflected. There are use cases to enable this.
1. Easy way to maintain course details for a batch.
2. Better handling of course progress computation.
3. Easy to fetch data for given course or batch or user.
Batches are created for a course. Course details are most of the time constant but, when the creator publishes it the data will be updated. So, copying the course details and saving along with course batch and user_courses is not good.
Option 1: Maintain cache with ttl
Option 2: Always fetch it from source
A complete solution can be divided into below points
course_batch (ES → cbatch)
column | type | description | used | indexed | can be removed |
---|---|---|---|---|---|
id | text | batchid | PK | ||
countdecrementdate | text | ||||
countdecrementstatus | boolean | ||||
countincrementdate | text | ||||
countincrementstatus | boolean | ||||
coursecreator | text | Y | |||
courseid | text | Y | |||
createdby | text | ||||
createddate | text | ||||
createdfor | list<text> | ||||
description | text | ||||
enddate | text | ||||
enrollmenttype | text | Y | |||
hashtagid | text | ||||
mentors | list<text> | ||||
name | text | ||||
participant | map<text, boolean> | ||||
startdate | text | ||||
status | int | Y | |||
updateddate | text |
content_consumption
column | type | description | used | indexed | can be removed |
---|---|---|---|---|---|
id | text | hash of userid, batchid, courseid and contentid | PK | ||
batchid | text | ||||
completedcount | int | how many time the content was completed | |||
contentid | text | Y | |||
contentversion | text | ||||
courseid | text | Y | |||
datetime | timestamp | ||||
grade | text | N | Y | ||
lastaccesstime | text | ||||
lastcompletedtime | text | ||||
lastupdatedtime | text | ||||
progress | int | progress pushed | |||
result | text | N | Y | ||
score | text | N | Y | ||
status | int | 1 for progress 2 for completed | Y | ||
userid | text | Y | |||
viewcount | int | how many times user viewed the content |
user_courses (ES → usercourses)
column | type | description | used | indexed | can be removed |
---|---|---|---|---|---|
id | text | hash of userid, batchid and courseid | PK | ||
active | boolean | turns false when user unenroll | |||
addedby | text | ||||
batchid | text | Y | |||
completedon | timestamp | ||||
contentid | text | ||||
courseid | text | Y | |||
courselogourl | text | Y | |||
coursename | text | Y | Y | ||
datetime | timestamp | ||||
delta | text | N | Y | ||
description | text | Y | |||
enrolleddate | text | ||||
grade | text | N | Y | ||
lastreadcontentid | text | ||||
lastreadcontentstatus | int | ||||
leafnodescount | int | Y | |||
progress | int | count of content completed | |||
status | int | 1 for progress 2 for completed | Y | ||
tocurl | text | Y | |||
userid | text |
DB Changes approach 1
Remove unused column from user_courses and move the content_consumption as a map into the user_courses table.
Steps involved
COPY sunbird.user_courses (batchid,userid,courseid,contentid,active,addedby,datetime,enrolleddate,lastreadcontentid,lastreadcontentstatus,status,completedon) TO '../files/usercourses.csv' WITH HEADER = TRUE; DROP TABLE sunbird.user_courses; CREATE TABLE IF NOT EXISTS sunbird.user_courses(batchid text, userid text, courseId text, contentid text, active boolean, addedBy text, dateTime timestamp, enrolledDate text,lastReadContentId text, lastReadContentStatus int,status int, completedon timestamp, contentconsumption map<text,frozen<map<text,text>>>, PRIMARY KEY (batchid,userid)); CREATE INDEX inx_ucs_userid ON sunbird.user_courses (userid); CREATE INDEX inx_ucs_courseid ON sunbird.user_courses (courseid); CREATE INDEX inx_ucs_status ON sunbird.user_courses (status); CREATE INDEX inx_ucs_content_consumption ON sunbird.user_courses (contentconsumption); COPY sunbird.user_courses (batchid,userid,courseid,contentid,active,addedby,datetime,enrolleddate,lastreadcontentid,lastreadcontentstatus,status,completedon) FROM '../files/usercourses.csv' WITH HEADER = TRUE; |
2. Run a migration to copy from content_consumption to user_courses contentconsumption column as a map of map (can be a talend ETL job)
3. Code changes corresponsing to new DB structure
4. Progress can be calculated from contentconsumption column of user_courses
5. Corresponsing mapping changes in ES
Pros
Cons
DB Changes approach 2
This approach requires minimum db changes and no migration as we holds all the user_courses related info in user_courses as progress as map<text,int> which defines if a particular content is read or completed. Hence there is no reliance on content_consumption.
Any call to update the user progress will be forwarded to both the table user_courses and content_consumption. All the user course progress would be read from user_courses whereas content_consumption would be holding data as audit and for content/state/read call.
Currently we first put data into content_consumption and then based on it the entry is made into user_courses which is creating problems.
Pros
Notes
→ Since sunbird will no longer store course info, all the course info would be fetched at runtime from KP.
→ A user's current progress and updates will be calculated whenever the api call happens, hence would provide updated state
→ Since sunbird would not be holding course details (like leafnodecount), all the background processes as configured in analytics team would need to fetch latest course details and would need to calculate the progress based on stored progress.