Introduction

This wiki explains the current design and implementation of collection tracking and monitoring, the challenges we face at scale, and the proposed design to handle them.

Background & Problem Statement

The Sunbird platform supports collection tracking and monitoring. It uses the below APIs to capture the content tracking data, generate progress and score metrics, and provide the summary.

...

We have a single API (Content State Update) to capture all the tracking information, so it contains complex logic to identify whether the given input is a content progress update, an assessment submission, etc.
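For illustration, a minimal sketch of the kind of branching such a combined handler ends up with (the function names and payload fields here are hypothetical, not the actual Sunbird implementation):

Code Block
languagejs
// Hypothetical sketch: a single "Content State Update" endpoint has to
// inspect each payload entry to decide what it is being asked to do.
const submitAssessment = (c) => console.log('assessment submit', c.contentId);
const updateProgress = (c) => console.log('progress update', c.contentId);

function handleContentStateUpdate(request) {
  for (const content of request.contents || []) {
    if (Array.isArray(content.assessments) && content.assessments.length) {
      submitAssessment(content); // assessment submission path
    } else if (content.progress !== undefined) {
      updateProgress(content); // content progress path
    }
    // ...every new tracking concern adds another branch here
  }
}

handleContentStateUpdate({ contents: [{ contentId: 'do_1234', progress: 80 }] });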

...

...

  1. Single API to capture all the tracking data.

  2. Read after write of consumption data and basic summary.



Content Consumption Flow:

...

Assessment Consumption Flow:

...

In the end, all that the clients and report jobs need is the following map for every collection:

Code Block
languagejs
content_status = {
  "<content_id>": <status>
}

For example:

Code Block
languagejs
content_status = {
  "do_1234": 2, // Completed
  "do_1235": 2, // Completed
  "do_1236": 2, // Completed
  "do_1237": 1, // In Progress
  "do_1238": 0  // Not started
}

Key Design Problems:

  1. Single API to capture all the tracking data.

  2. Read after write of consumption data and basic summary.

  3. Data is written to and fetched from multiple tables, leading to consistency issues between the API and reporting jobs.

  4. The low-level tables (user_content_consumption & user_assessments) grow at an exponential rate when we start to track everything.

  5. Archiving old data is not possible as the APIs read data from the low-level tables.

Design

To handle the above design problems, we analyzed how similar products (like Netflix) track everything a user does (or views) at scale. Based on the analysis, we have broken the APIs down into more granular APIs, each with a single DB update, so that each API can be scaled independently.

Following are a few of the Cassandra scale issues with the various approaches we considered:

  1. Use a map datatype - We could have used a map datatype and updated the content status via the API. But this would result in multiple SSTables (one per addition) and tombstones (on updates & deletes). As this is the most used API and the table would hold billions of records, reads would slow down drastically and the entire cluster with them. We would have been forced to run compaction at regular intervals.

  2. Use a frozen map datatype - With a frozen map datatype we would have a way around the multiple-SSTable lookups and tombstones, but the API would not be able to append/add to the map; it would always have to replace the whole map. This would fail when there are two concurrent write requests for a user (which can happen if data is stored offline and synced to the server), and only one write would succeed.
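For illustration, the difference between the two rejected options looks roughly like this with the Node.js cassandra-driver (the table and column names are assumptions, not the actual schema):

Code Block
languagejs
// Sketch only: contrasts the two rejected Cassandra approaches.
// Table/column names are illustrative assumptions.
const cassandra = require('cassandra-driver');
const client = new cassandra.Client({
  contactPoints: ['127.0.0.1'],
  localDataCenter: 'datacenter1',
  keyspace: 'sunbird_courses'
});

async function demo(userId, courseId) {
  // Option 1: regular map<text,int>. Each "+" append writes a separate
  // cell, scattering the map across SSTables; updates/deletes create
  // tombstones that slow reads down as the table grows.
  await client.execute(
    'UPDATE user_enrolments SET content_status = content_status + ? ' +
    'WHERE userid = ? AND courseid = ?',
    [{ do_1234: 2 }, userId, courseId],
    { prepare: true }
  );

  // Option 2: frozen<map<text,int>>. The map is stored as one cell
  // (no tombstone/SSTable spread), but it can only be replaced whole,
  // so one of two concurrent writers silently loses its update.
  await client.execute(
    'UPDATE user_enrolments SET content_status = ? ' +
    'WHERE userid = ? AND courseid = ?',
    [{ do_1234: 2, do_1235: 1 }, userId, courseId],
    { prepare: true }
  );
}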

We have worked around the Cassandra scaling issues (reads spanning more than 2 SSTables, and tombstones) and the read-after-write scenario by placing a high-performance cache at the center. With this approach:

  1. The update APIs update the content status in the low-level table and update the content status map in redis (using hmset). Concurrent requests are not a problem.

  2. The read API reads the content status directly from redis. Meanwhile, the content_status is updated in a frozen map datatype field via the activity aggregator job.

  3. The job serializes requests by user and reads the status from the low-level table before computing the overall content_status. This ensures consistency between the API and reporting.
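A minimal sketch of this write/read split, assuming ioredis and hypothetical key and function names:

Code Block
languagejs
// Sketch of the cache-centred approach; key names are assumptions.
const Redis = require('ioredis');
const redis = new Redis();

const statusKey = (userId, collectionId) =>
  `content_status:${userId}:${collectionId}`;

// Update API: persist the raw row to the low-level table (elided here)
// and merge the new statuses into a redis hash. hmset on distinct hash
// fields is safe under concurrent requests — no lost updates.
async function updateContentStatus(userId, collectionId, statuses) {
  // ... write to user_content_consumption (low-level table) ...
  await redis.hmset(statusKey(userId, collectionId), statuses);
}

// Read API: serve the content_status map straight from redis. The
// frozen-map column in Cassandra is refreshed asynchronously by the
// activity aggregator job, so this read never touches the slow path.
async function readContentStatus(userId, collectionId) {
  return redis.hgetall(statusKey(userId, collectionId));
}

// e.g. updateContentStatus('user_1', 'do_9999', { do_1234: 2, do_1237: 1 })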

Content Consumption APIs

...

Assessment Consumption APIs

...

Viewer Service

The Viewer Service collects the “content view updates” and generates events to process and provide a summary to the users.

When a user starts viewing a content, a view entry is created. There are three stages when a user views a content: start, progress and end. Considering these three stages, we have 3 API endpoints to capture the information at each stage.

An event is generated when a content view ends. The summary computation jobs read these events to process and compute the overall summary of the collection.

The computed summary is available through an API interface to download and view.
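For illustration, a client would hit the three endpoints roughly as follows (the route paths and payload fields are assumptions, not the published viewer-service contract):

Code Block
languagejs
// Illustrative client calls for the three view stages; routes and
// payload fields are assumptions. Requires Node 18+ (global fetch).
const BASE = 'http://viewer-service/v1/view';
const post = (body) => ({
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(body)
});

async function trackView(userId, collectionId, contentId) {
  // 1. start: creates the view entry
  await fetch(`${BASE}/start`, post({ userId, collectionId, contentId }));
  // 2. progress: zero or more updates while the user is viewing
  await fetch(`${BASE}/update`, post({ userId, contentId, progress: 50 }));
  // 3. end: closes the view and emits the event that triggers the
  //    summary computation jobs
  await fetch(`${BASE}/end`, post({ userId, contentId, progress: 100 }));
}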

Summary Computation Jobs - Flink:

The Flink jobs read and compute the summary of a collection's consumption progress when a user's view ends. They also compute the score for the current view and the best score across all previous views.

Info
  • The event is just a trigger to initiate the computation of the collection progress. The job fetches the raw data from the DB to compute the overall progress.

  • When an assessment-type content (e.g. QuestionSet) view ends, the job expects the ASSESS events data to have been submitted to the assessment submit API for score metrics computation.

Once the view ends, the progress and score are updated asynchronously by the Flink jobs.
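Conceptually, the per-view computation reduces to something like the following (a simplified sketch of the aggregation, not the actual Flink job):

Code Block
languagejs
// Simplified sketch of what the summary job computes when a view ends.
// In reality the job re-reads the raw rows from the DB; here they are
// passed in directly.

// Progress: completed leaf contents / total leaf contents, as a percent.
function computeProgress(contentStatus, leafNodeCount) {
  const completed =
    Object.values(contentStatus).filter((s) => s === 2).length;
  return Math.round((completed / leafNodeCount) * 100);
}

// Score: the current attempt's score plus the best across all attempts.
function computeScore(attempts) {
  const current = attempts[attempts.length - 1].score;
  const best = attempts.reduce((max, a) => Math.max(max, a.score), 0);
  return { current, best };
}

console.log(computeProgress({ do_1234: 2, do_1235: 1 }, 4)); // 25
console.log(computeScore([{ score: 6 }, { score: 9 }, { score: 8 }]));
// { current: 8, best: 9 }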

Extended Enrolment Consumption:

  • Every new instance adopting the Sunbird platform has to select one of the 3 context modes; this allows the application to prevent a user from consuming the same content more than once, based on specific predefined rules

  • The mode for any instance is a one-to-one mapping (one mode per instance)

  • With the extended design, tracking and monitoring of user consumption can be done for any new context, like a program, an event, etc

...

Following are the different modes provided to a new instance:

Scenario 1 - Carry Forward Consumption

  • The content consumed is marked as complete irrespective of context

Write Request:

Code Block
languagejs
{
    "userid": "<<userid>>",
    "collectionid": "<<courseid>>",
    "contentid": "<<contentid>>"
}

Note: Progress will be captured directly under the context.

Read Request:

Code Block
languagejson
{
  "userId": "<<userid>>",
  "collectionId": "<<courseid>>",
  "contentId": "<<contentid>>"
}

Scenario 2 - Copy Forward Consumption

  • The content consumed is marked as complete, along with a new entry in the database for the context

Write Request:

Code Block
languagejs
{
    "userid": "<<userid>>",
    "collectionid": "<<courseid>>",
    "contextid": "<<programid>>",
    "contentid": "<<contentid>>"
}

Read Request:

Code Block
languagejson
{
  "userId": "<<userid>>",
  "collectionId": "<<courseid>>",
  "contentId": "<<contentid>>"
}

Scenario 3 - Strict Mode Consumption

  • The content is consumed as a new one every time

Write Request:

Code Block
languagejs
{
    "userid": "<<userid>>",
    "collectionid": "<<courseid>>",
    "contextid": "<<programid>>",
    "contentid": "<<contentid>>"
}
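A sketch of how the configured mode could drive the write path (the mode names follow the scenarios above; the storage and field names are hypothetical):

Code Block
languagejs
// Hypothetical dispatch on an instance's configured consumption mode.
// `store` stands in for the enrolment/consumption table.
function writeConsumption(mode, req, store) {
  const prior = store.find(
    (r) => r.userid === req.userid && r.contentid === req.contentid
  );
  if (mode === 'carry_forward' && prior && prior.status === 2) {
    // Completion is honoured irrespective of context: nothing to write.
    return prior;
  }
  if (mode === 'copy_forward' && prior && prior.status === 2) {
    // Completion is honoured, but a new row is added for this context.
    const copy = { ...req, status: 2 };
    store.push(copy);
    return copy;
  }
  // 'strict' (or no prior completion): consume as new every time.
  const fresh = { ...req, status: 0 };
  store.push(fresh);
  return fresh;
}

// e.g. writeConsumption('copy_forward',
//   { userid: 'u1', collectionid: 'c1', contextid: 'p1', contentid: 'do_1' },
//   existingRows);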

Viewer-Service - Content Consumption Scenarios:

...