Introduction

This wiki explains the current design and implementation of collection tracking and monitoring, the challenges we face at scale, and the proposed design to handle them.

Background & Problem Statement

The Sunbird platform supports collection tracking and monitoring. It uses the below APIs to capture the content tracking data, generate progress and score metrics, and provide the summary.

...

We have a single API (Content State Update) to capture all the tracking information, so it contains complex logic to identify whether the given input is a content progress update, an assessment submission, etc.
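For illustration, a minimal sketch of the kind of branching such a combined handler ends up with (the function names and payload fields here are hypothetical, not the actual Sunbird implementation):

Code Block
languagejs
// Hypothetical sketch: a single "Content State Update" endpoint has to
// inspect each payload entry to decide what it is being asked to do.
const submitAssessment = (c) => console.log('assessment submit', c.contentId);
const updateProgress = (c) => console.log('progress update', c.contentId);

function handleContentStateUpdate(request) {
  for (const content of request.contents || []) {
    if (Array.isArray(content.assessments) && content.assessments.length) {
      submitAssessment(content); // assessment submission path
    } else if (content.progress !== undefined) {
      updateProgress(content); // content progress path
    }
    // ...every new tracking concern adds another branch here
  }
}

handleContentStateUpdate({ contents: [{ contentId: 'do_1234', progress: 80 }] });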

...

...

  1. Single API to capture all the tracking data.

  2. Read after write of consumption data and basic summary.



Content Consumption Flow:

...

Assessment Consumption Flow:

...

In the end, all that the clients and report jobs need is the following map for every collection:

Code Block
languagejs
content_status = {
  "<content_id>": <status>
}

For example:

Code Block
languagejs
content_status = {
  "do_1234": 2, // Completed
  "do_1235": 2, // Completed
  "do_1236": 2, // Completed
  "do_1237": 1, // In Progress
  "do_1238": 0  // Not started
}

Key Design Problems:

  1. Single API to capture all the tracking data.

  2. Read after write of consumption data and basic summary.

  3. Data is written to and fetched from multiple tables, leading to consistency issues between the API and reporting jobs.

  4. The low-level tables (user_content_consumption & user_assessments) grow at an exponential rate when we start to track everything.

  5. Archiving old data is not possible as the APIs read data from the low-level tables.

Design

To handle the above design problems, we analyzed how similar products (like Netflix) track everything a user does (or views) at scale. Based on the analysis, we have broken the APIs down into more granular APIs, each with a single DB update, so that each API can be scaled independently.

Following are a few of the Cassandra scale issues with the various approaches we considered:

  1. Use a map datatype - We could have used a map datatype and updated the content status via the API. But this would result in multiple SSTables (one per addition) and tombstones (on updates & deletes). As this is the most used API and the table would hold billions of records, reads would slow down drastically and the entire cluster with them. We would have been forced to run compaction at regular intervals.

  2. Use a frozen map datatype - With a frozen map datatype we would have a way around the multiple-SSTable lookups and tombstones, but the API would not be able to append/add to the map; it would always have to replace the whole map. This would fail when there are two concurrent write requests for a user (which can happen if data is stored offline and synced to the server), and only one write would succeed.
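For illustration, the difference between the two rejected options looks roughly like this with the Node.js cassandra-driver (the table and column names are assumptions, not the actual schema):

Code Block
languagejs
// Sketch only: contrasts the two rejected Cassandra approaches.
// Table/column names are illustrative assumptions.
const cassandra = require('cassandra-driver');
const client = new cassandra.Client({
  contactPoints: ['127.0.0.1'],
  localDataCenter: 'datacenter1',
  keyspace: 'sunbird_courses'
});

async function demo(userId, courseId) {
  // Option 1: regular map<text,int>. Each "+" append writes a separate
  // cell, scattering the map across SSTables; updates/deletes create
  // tombstones that slow reads down as the table grows.
  await client.execute(
    'UPDATE user_enrolments SET content_status = content_status + ? ' +
    'WHERE userid = ? AND courseid = ?',
    [{ do_1234: 2 }, userId, courseId],
    { prepare: true }
  );

  // Option 2: frozen<map<text,int>>. The map is stored as one cell
  // (no tombstone/SSTable spread), but it can only be replaced whole,
  // so one of two concurrent writers silently loses its update.
  await client.execute(
    'UPDATE user_enrolments SET content_status = ? ' +
    'WHERE userid = ? AND courseid = ?',
    [{ do_1234: 2, do_1235: 1 }, userId, courseId],
    { prepare: true }
  );
}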

We have worked around the Cassandra scaling issues (reads spanning more than 2 SSTables, and tombstones) and the read-after-write scenario by placing a high-performance cache at the center. With this approach:

  1. The update APIs update the content status in the low-level table and update the content status map in redis (using hmset). Concurrent requests are not a problem.

  2. The read API reads the content status directly from redis. Meanwhile, the content_status is updated in a frozen map datatype field via the activity aggregator job.

  3. The job serializes requests by user and reads the status from the low-level table before computing the overall content_status. This ensures consistency between the API and reporting.
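A minimal sketch of this write/read split, assuming ioredis and hypothetical key and function names:

Code Block
languagejs
// Sketch of the cache-centred approach; key names are assumptions.
const Redis = require('ioredis');
const redis = new Redis();

const statusKey = (userId, collectionId) =>
  `content_status:${userId}:${collectionId}`;

// Update API: persist the raw row to the low-level table (elided here)
// and merge the new statuses into a redis hash. hmset on distinct hash
// fields is safe under concurrent requests — no lost updates.
async function updateContentStatus(userId, collectionId, statuses) {
  // ... write to user_content_consumption (low-level table) ...
  await redis.hmset(statusKey(userId, collectionId), statuses);
}

// Read API: serve the content_status map straight from redis. The
// frozen-map column in Cassandra is refreshed asynchronously by the
// activity aggregator job, so this read never touches the slow path.
async function readContentStatus(userId, collectionId) {
  return redis.hgetall(statusKey(userId, collectionId));
}

// e.g. updateContentStatus('user_1', 'do_9999', { do_1234: 2, do_1237: 1 })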

Content Consumption APIs

...

Assessment Consumption APIs

...

Viewer Service

The Viewer Service collects the “content view updates” and generates events to process and provide a summary to the users.

When a user starts viewing a content, a view entry is created. There are three stages when a user views a content: start, progress and end. Considering these three stages, we have 3 API endpoints to capture the information at each stage.

An event is generated when a content view ends. The summary computation jobs read these events to process and compute the overall summary of the collection.

The computed summary is available through an API interface to download and view.
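For illustration, a client would hit the three endpoints roughly as follows (the route paths and payload fields are assumptions, not the published viewer-service contract):

Code Block
languagejs
// Illustrative client calls for the three view stages; routes and
// payload fields are assumptions. Requires Node 18+ (global fetch).
const BASE = 'http://viewer-service/v1/view';
const post = (body) => ({
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(body)
});

async function trackView(userId, collectionId, contentId) {
  // 1. start: creates the view entry
  await fetch(`${BASE}/start`, post({ userId, collectionId, contentId }));
  // 2. progress: zero or more updates while the user is viewing
  await fetch(`${BASE}/update`, post({ userId, contentId, progress: 50 }));
  // 3. end: closes the view and emits the event that triggers the
  //    summary computation jobs
  await fetch(`${BASE}/end`, post({ userId, contentId, progress: 100 }));
}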

Summary Computation Jobs - Flink:

The Flink jobs read and compute the summary of a collection's consumption progress when a user's view ends. They also compute the score for the current view and the best score across all previous views.

Info
  • The event is just a trigger to initiate the computation of the collection progress. The job fetches the raw data from the DB to compute the overall progress.

  • When an assessment-type content (e.g. QuestionSet) view ends, the job expects the ASSESS events data to have been submitted to the assessment submit API for score metrics computation.

Once the view ends, the progress and score are updated asynchronously by the Flink jobs.
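Conceptually, the per-view computation reduces to something like the following (a simplified sketch of the aggregation, not the actual Flink job):

Code Block
languagejs
// Simplified sketch of what the summary job computes when a view ends.
// In reality the job re-reads the raw rows from the DB; here they are
// passed in directly.

// Progress: completed leaf contents / total leaf contents, as a percent.
function computeProgress(contentStatus, leafNodeCount) {
  const completed =
    Object.values(contentStatus).filter((s) => s === 2).length;
  return Math.round((completed / leafNodeCount) * 100);
}

// Score: the current attempt's score plus the best across all attempts.
function computeScore(attempts) {
  const current = attempts[attempts.length - 1].score;
  const best = attempts.reduce((max, a) => Math.max(max, a.score), 0);
  return { current, best };
}

console.log(computeProgress({ do_1234: 2, do_1235: 1 }, 4)); // 25
console.log(computeScore([{ score: 6 }, { score: 9 }, { score: 8 }]));
// { current: 8, best: 9 }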

Extended Enrolment Consumption:

  • Every new instance adopting the Sunbird platform has to select one of the 3 context modes; this allows the application to prevent a user from consuming the same content more than once, based on specific predefined rules

  • The mode for any instance is a one-to-one mapping (one mode per instance)

  • With the extended design, tracking and monitoring of user consumption can be done for any new context, like a program, an event, etc

...

Following are the different modes provided to a new instance:

Scenario 1 - Carry Forward Consumption

  • The content consumed is marked as complete irrespective of context

Write Request:

Code Block
languagejs
{
    "userid": "<<userid>>",
    "collectionid": "<<courseid>>",
    "contentid": "<<contentid>>"
}

Note: Progress will be captured directly under the context.

Read Request:

Code Block
languagejson
{
  "userId": "<<userid>>",
  "collectionId": "<<courseid>>",
  "contentId": "<<contentid>>"
}

Scenario 2 - Copy Forward Consumption

  • The content consumed is marked as complete, along with a new entry in the database for the context

Write Request:

Code Block
languagejs
{
    "userid": "<<userid>>",
    "collectionid": "<<courseid>>",
    "contextid": "<<programid>>",
    "contentid": "<<contentid>>"
}

Read Request:

Code Block
languagejson
{
  "userId": "<<userid>>",
  "collectionId": "<<courseid>>",
  "contentId": "<<contentid>>"
}

Scenario 3 - Strict Mode Consumption

  • The content is consumed as a new one every time

Write Request:

Code Block
languagejs
{
    "userid": "<<userid>>",
    "collectionid": "<<courseid>>",
    "contextid": "<<programid>>",
    "contentid": "<<contentid>>"
}
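A sketch of how the configured mode could drive the write path (the mode names follow the scenarios above; the storage and field names are hypothetical):

Code Block
languagejs
// Hypothetical dispatch on an instance's configured consumption mode.
// `store` stands in for the enrolment/consumption table.
function writeConsumption(mode, req, store) {
  const prior = store.find(
    (r) => r.userid === req.userid && r.contentid === req.contentid
  );
  if (mode === 'carry_forward' && prior && prior.status === 2) {
    // Completion is honoured irrespective of context: nothing to write.
    return prior;
  }
  if (mode === 'copy_forward' && prior && prior.status === 2) {
    // Completion is honoured, but a new row is added for this context.
    const copy = { ...req, status: 2 };
    store.push(copy);
    return copy;
  }
  // 'strict' (or no prior completion): consume as new every time.
  const fresh = { ...req, status: 0 };
  store.push(fresh);
  return fresh;
}

// e.g. writeConsumption('copy_forward',
//   { userid: 'u1', collectionid: 'c1', contextid: 'p1', contentid: 'do_1' },
//   existingRows);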

Viewer-Service - Content Consumption Scenarios:

...