
Introduction

This wiki explains the current design and implementation of collection tracking and monitoring, the challenges we face at scale, and the proposed design to handle them.

Background & Problem Statement

The Sunbird platform supports collection tracking and monitoring. It uses the APIs below to capture content tracking data, generate progress and score metrics, and provide a summary.

  1. Content State Update API - To capture content progress and submit assessment.

  2. Content State Read API - To read the individual content consumption and assessment attempts status.

  3. Enrolment List API - To access all the enrolment metrics of a given user.

The Content State Update API captures the content progress and assessment data. It generates events that the activity-aggregator and assessment-aggregator jobs use to compute the score and overall progress.

We have a single API (Content State Update) to capture all the tracking information. As a result, it carries complex logic to determine whether a given input is a content progress update, an assessment submission, or something else.

In the end, all that the clients and report jobs need is the following map for every collection:

content_status = {
  "<content_id>": <status>
}

For example: content_status = {
  "do_1234": 2, // Completed
  "do_1235": 2, // Completed
  "do_1236": 2, // Completed
  "do_1237": 1, // In Progress
  "do_1238": 0  // Not started
}
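
As a minimal sketch of how a client or report job might use this map, the Python below derives a completion percentage from content_status. The function name `completion_percent` and the `leaf_node_count` parameter are illustrative assumptions, not part of the platform; the status codes (0, 1, 2) are the ones shown above.

```python
from typing import Dict

# Status codes as used in the content_status example above.
NOT_STARTED, IN_PROGRESS, COMPLETED = 0, 1, 2


def completion_percent(content_status: Dict[str, int], leaf_node_count: int) -> float:
    """Return the share of leaf contents marked Completed, as a percentage.

    `leaf_node_count` (hypothetical parameter) is the number of trackable
    contents in the collection; contents missing from the map count as
    not started.
    """
    if leaf_node_count <= 0:
        return 0.0
    completed = sum(1 for status in content_status.values() if status == COMPLETED)
    return round(100.0 * completed / leaf_node_count, 2)


content_status = {
    "do_1234": 2,
    "do_1235": 2,
    "do_1236": 2,
    "do_1237": 1,
    "do_1238": 0,
}
print(completion_percent(content_status, leaf_node_count=5))  # 60.0
```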

Key Design Problems:

  1. Single API to capture all the tracking data.

  2. Read after write of consumption data and basic summary.

  3. Data is written to and fetched from multiple tables, leading to consistency issues between the API and the reporting jobs.

  4. The low-level tables (user_content_consumption & user_assessments) grow at an exponential rate when we start to track everything.

  5. Archiving old data is not possible because the APIs read data from the low-level tables.

Design

To address the above design problems, we analyzed how similar products (such as Netflix) track everything a user does or views, and do so at scale. Based on that analysis, we have broken the single API down into more granular APIs, each performing a single DB update, so that each API can be scaled independently.

The following are some of the Cassandra scaling issues with the approaches we considered:

  1. Use a map datatype - We could have used a map datatype and updated the content status via the API. But this would spread the data across multiple SSTables (one per addition) and create tombstones (on updates and deletes). As this is the most used API and the table would hold billions of records, reads would slow down drastically and, with them, the entire cluster. We would have been forced to run compaction at regular intervals.

  2. Use a frozen map datatype - A frozen map avoids multiple SSTable lookups and tombstones, but the API cannot append to the map; it must always replace the whole map. This fails when there are two concurrent write requests for a user (which can happen when data is stored offline and later synced to the server): only one of the writes would survive.

We have worked around the Cassandra scaling issues (reads from more than 2 SSTables, and tombstones) and the read-after-write scenario by placing a high-performance cache at the center. With this approach:

  1. The update APIs update the content status in the low-level table and update the content status map in Redis (using hmset). Concurrent requests are not a problem.

  2. The read API reads the content status directly from Redis. Meanwhile, the activity aggregator job updates content_status in a frozen map datatype field.

  3. The job serializes requests by user and reads the status from the low-level table before computing the overall content_status. This ensures consistency between the API and reporting.
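
The difference between field-level updates in a cache hash and a whole-map replace can be sketched as follows. This is an illustration only: `FakeHashStore` is an in-memory stand-in modelled on Redis hash semantics so that the example runs without a server, and the key layout in `status_key` is a hypothetical choice, not the platform's actual key format.

```python
from typing import Dict


class FakeHashStore:
    """In-memory stand-in for a Redis hash store (assumption, for illustration).

    hset() sets individual fields, like Redis HSET/HMSET; hgetall() returns
    the whole hash, like Redis HGETALL.
    """

    def __init__(self) -> None:
        self._hashes: Dict[str, Dict[str, str]] = {}

    def hset(self, key: str, mapping: Dict[str, str]) -> None:
        self._hashes.setdefault(key, {}).update(mapping)

    def hgetall(self, key: str) -> Dict[str, str]:
        return dict(self._hashes.get(key, {}))


def status_key(user_id: str, collection_id: str, batch_id: str) -> str:
    # Hypothetical key layout; the real key format is not specified here.
    return f"content_status:{user_id}:{collection_id}:{batch_id}"


cache = FakeHashStore()
key = status_key("user_1", "do_course", "batch_1")

# Two writers (for example, an online request and an offline sync) each
# update a different content. Field-level writes do not overwrite each other.
cache.hset(key, {"do_1234": "2"})
cache.hset(key, {"do_1237": "1"})
print(cache.hgetall(key))  # {'do_1234': '2', 'do_1237': '1'}

# With a whole-map replace (frozen map), both writers start from the same
# snapshot and the last write wins, dropping the other update.
snapshot: Dict[str, str] = {}
writer_a = {**snapshot, "do_1234": "2"}
writer_b = {**snapshot, "do_1237": "1"}
frozen_column = writer_a
frozen_column = writer_b
print(frozen_column)  # {'do_1237': '1'}
```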

Content Consumption APIs

Assessment Consumption APIs

Viewer Service

Viewer Service collects the “content view updates” and generates events that are processed to provide a summary to the users.

When a user starts viewing a content, a view entry is created. A content view has three stages: start, progress, and end. We have three API endpoints, one to capture the information for each stage.

An event is generated when a content view ends. The summary computation jobs read these events to compute the overall summary of the collection.

The computed summary is available through the API to view and download.

Summary Computation Jobs - Flink:

The Flink jobs read the events and compute the summary of a collection's consumption progress when the user's view ends. They also compute the score for the current view and the best score across all previous views.

  • The event is just a trigger to initiate the computation of the collection progress. The job fetches the raw data from the DB to compute the overall progress.

  • When the view of an assessment-type content (for example, QuestionSet) ends, the job expects the ASSESS event data to have been sent to the assessment submit API so that it can compute the score metrics.

Once the view ends, the progress and score are updated asynchronously by the Flink jobs.
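
The score step described above can be sketched in Python as picking the latest and the highest-scoring attempt for one assessment. This is not the Flink job itself; the `Attempt` shape and its field names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Attempt:
    # Hypothetical shape of one assessment attempt; field names are assumptions.
    attempt_id: str
    score: float
    max_score: float
    submitted_at: int  # epoch milliseconds


def current_and_best(attempts: List[Attempt]) -> Tuple[Optional[Attempt], Optional[Attempt]]:
    """Return (latest attempt, highest-scoring attempt) for one assessment."""
    if not attempts:
        return None, None
    current = max(attempts, key=lambda a: a.submitted_at)
    best = max(attempts, key=lambda a: a.score)
    return current, best


attempts = [
    Attempt("a1", 6.0, 10.0, 1624275377301),
    Attempt("a2", 9.0, 10.0, 1624275477301),
    Attempt("a3", 7.0, 10.0, 1624275577301),
]
current, best = current_and_best(attempts)
print(current.score, best.score)  # 7.0 9.0
```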

Extended Enrolment Consumption:

  • Every new instance adopting the Sunbird platform must select one of three context modes. The mode lets the application control, based on predefined rules, whether a user consumes the same content more than once.

  • Each instance maps to exactly one mode.

  • With the extended design, tracking and monitoring of user consumption can be done for any new context, such as a program or an event.

The following modes are available to a new instance:

Scenario 1: Carry Forward Consumption

  • The content consumed is marked as complete irrespective of context.

Write Request:

{
    "userid": "<<userid>>",
    "collectionid": "<<courseid>>",
    "contentid": "<<contentid>>"
}

Note: Progress will be captured directly under the context.

Read Request:

{
  "userId": "<<userid>>",
  "collectionId": "<<courseid>>",
  "contentId": "<<contentid>>"
}

Scenario 2: Copy Forward Consumption

  • The content consumed is marked as complete, and a new entry is added to the database for the context.

Write Request:

{
    "userid": "<<userid>>",
    "collectionid": "<<courseid>>",
    "contextid": "<<programid>>",
    "contentid": "<<contentid>>"
}

Read Request:

{
  "userId": "<<userid>>",
  "collectionId": "<<courseid>>",
  "contentId": "<<contentid>>"
}

Scenario 3: Strict Mode Consumption

  • The content is consumed as new every time.

Write Request:

{
    "userid": "<<userid>>",
    "collectionid": "<<courseid>>",
    "contextid": "<<programid>>",
    "contentid": "<<contentid>>"
}
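
One possible reading of the three modes is sketched below in Python. It is an interpretation for illustration, not the platform's implementation: the `Mode` enum, the `is_complete` function, and the in-memory `Store` keyed by (userid, collectionid, contextid, contentid) are all assumptions.

```python
from enum import Enum
from typing import Dict, Tuple

COMPLETED = 2

# (userid, collectionid, contextid, contentid) -> status; hypothetical store.
Store = Dict[Tuple[str, str, str, str], int]


class Mode(Enum):
    CARRY_FORWARD = 1
    COPY_FORWARD = 2
    STRICT = 3


def is_complete(store: Store, mode: Mode, userid: str, collectionid: str,
                contextid: str, contentid: str) -> bool:
    """Decide whether a content counts as complete in the given context."""
    key = (userid, collectionid, contextid, contentid)
    if store.get(key) == COMPLETED:
        return True
    if mode is Mode.STRICT:
        # Strict mode: only consumption in this exact context counts.
        return False
    # Carry/copy forward: completion in any other context counts.
    done_elsewhere = any(
        status == COMPLETED and k[0] == userid and k[3] == contentid
        for k, status in store.items()
    )
    if done_elsewhere and mode is Mode.COPY_FORWARD:
        # Copy forward additionally records a new entry for this context.
        store[key] = COMPLETED
    return done_elsewhere


store: Store = {("u1", "course_A", "batch_1", "do_1"): COMPLETED}
print(is_complete(store, Mode.STRICT, "u1", "course_A", "program_1", "do_1"))         # False
print(is_complete(store, Mode.CARRY_FORWARD, "u1", "course_A", "program_1", "do_1"))  # True
print(len(store))  # 1
print(is_complete(store, Mode.COPY_FORWARD, "u1", "course_A", "program_1", "do_1"))   # True
print(len(store))  # 2
```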

Viewer-Service - Content Consumption Scenarios:

The user can consume a content by searching for it on the platform (organically) or via a collection when the user is enrolled in a course.

With Viewer-Service, we will also support tracking individual content consumption. The details below explain how the data is stored for a content consumption in different scenarios.

The table below covers scenarios for current and future use cases. For each, it defines the database read/write logic to save or fetch the required data from the user_content_consumption table.

Table - user_content_consumption

PRIMARY KEY (userid, collectionid, contextid, contentid) [userid, courseid, batchid, contentid]

Key terms used in the table below:

  • Carry forward content consumption - Content consumed in any context (any collection, batch, or individual content consumption) is considered when computing the progress or completion percentage.

Scenario 1: User consuming individual content. [New]

Write Request:

{
    "userid": "<<userid>>",
    "collectionid": "<<contentid>>",
    "contextid": "<<contentid>>",
    "contentid": "<<contentid>>"
}

Read Request (Condition):

{
    "userid": "<<userid>>",
    "contentid": "<<contentid>>"
}

Read Query:

WHERE userid='<<userid>>' and collectionid='<<contentid>>' and contentid='<<contentid>>' and contextid='<<contentid>>'

Scenario 2: User consuming a content within a collection. [Existing]

  • Don’t carry forward content consumption (content consumed in other contexts is not considered when marking as complete).

Write Request:

{
    "userid": "<<userid>>",
    "collectionid": "<<courseid>>",
    "contextid": "<<batchid>>",
    "contentid": "<<contentid>>"
}

Read Query:

WHERE userid='<<userid>>' and collectionid='<<courseid>>' and contextid='<<batchid>>' and contentid='<<contentid>>'

Scenario 3: User consuming a content within a collection. [New]

  • Carry forward content consumption

Write Request:

{
    "userid": "<<userid>>",
    "collectionid": "<<courseid>>",
    "contextid": "<<courseid>>",
    "contentid": "<<contentid>>"
}

Note: When it is consumed within this context.

Read Query:

WHERE userid='<<userid>>' and collectionid='<<courseid>>'
OR
WHERE userid='<<userid>>' and collectionid='<<courseid>>' and contextid='<<courseid>>' and contentid='<<contentid>>'

Scenario 4: User consuming a content within a course that is part of a program

  • Don't carry forward content consumption

Write Request:

{
    "userid": "<<userid>>",
    "collectionid": "<<courseid>>",
    "contextid": "<<batchid>>",
    "contentid": "<<contentid>>"
}

Read Query:

WHERE userid='<<userid>>' and collectionid='<<courseid>>' and contextid in (select batchId from program where collectionid='<<courseid>>')

Scenario 5: User consuming a content within a course that is part of a program

  • Carry forward content consumption

Write Request:

{
    "userid": "<<userid>>",
    "collectionid": "<<courseid>>",
    "contextid": "<<courseid>>",
    "contentid": "<<contentid>>"
}

Read Query:

WHERE userid='<<userid>>' and collectionid='<<courseid>>' and contextid='<<courseid>>'
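
The write requests in the scenarios above differ only in how the collectionid and contextid slots of the primary key are filled. A minimal Python sketch of that mapping follows; `ConsumptionKey` and `consumption_key` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple, Optional


class ConsumptionKey(NamedTuple):
    userid: str
    collectionid: str
    contextid: str
    contentid: str


def consumption_key(userid: str, contentid: str,
                    courseid: Optional[str] = None,
                    batchid: Optional[str] = None,
                    carry_forward: bool = False) -> ConsumptionKey:
    """Build the user_content_consumption key for a write request."""
    if courseid is None:
        # Scenario 1: individual content; the content id fills every slot.
        return ConsumptionKey(userid, contentid, contentid, contentid)
    if carry_forward:
        # Scenarios 3 and 5: the course id doubles as the context id.
        return ConsumptionKey(userid, courseid, courseid, contentid)
    if batchid is None:
        raise ValueError("batchid is required when consumption is not carried forward")
    # Scenarios 2 and 4: the batch id is the context.
    return ConsumptionKey(userid, courseid, batchid, contentid)


print(consumption_key("u1", "do_1"))
print(consumption_key("u1", "do_1", courseid="do_course", batchid="batch_1"))
print(consumption_key("u1", "do_1", courseid="do_course", carry_forward=True))
```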

Content View Lifecycle:

When the user views the content in the context of a collection and batch for the first time, its start, progress update, and end triggers are processed. Revisits (2nd to nth view) of the content are ignored and do not update the DB.

Shall we enable a forced ‘view end’ to handle the collection progress update sync issues?

  • View Start API should insert the row only if the row does not exist.

  • View Update and End APIs should update the row only if the row exists.
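
The insert-if-absent and update-if-present rules can be sketched with an in-memory store, as below. This is an illustration under stated assumptions: `ViewStore`, its row layout, and the choice to mark a row completed on end are hypothetical. In CQL, the same conditions could be expressed with INSERT ... IF NOT EXISTS and UPDATE ... IF EXISTS.

```python
from typing import Dict, Tuple

Key = Tuple[str, str, str, str]  # (userid, collectionid, contextid, contentid)
IN_PROGRESS, COMPLETED = 1, 2


class ViewStore:
    """In-memory stand-in for the consumption table (assumption)."""

    def __init__(self) -> None:
        self.rows: Dict[Key, Dict[str, int]] = {}

    def start(self, key: Key) -> bool:
        # Insert only if the row does not exist; revisits are ignored.
        if key in self.rows:
            return False
        self.rows[key] = {"status": IN_PROGRESS, "timespent": 0}
        return True

    def update(self, key: Key, timespent: int) -> bool:
        # Update only if the row exists and the first view has not ended.
        row = self.rows.get(key)
        if row is None or row["status"] == COMPLETED:
            return False
        row["timespent"] += timespent
        return True

    def end(self, key: Key) -> bool:
        # End only an existing, still-open view.
        row = self.rows.get(key)
        if row is None or row["status"] == COMPLETED:
            return False
        row["status"] = COMPLETED
        return True


store = ViewStore()
key = ("u1", "do_course", "batch_1", "do_1234")
print(store.update(key, 10))  # False: no row yet
print(store.start(key))       # True: first view
print(store.start(key))       # False: row already exists
print(store.update(key, 10))  # True
print(store.end(key))         # True
print(store.update(key, 5))   # False: revisit after end is ignored
```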

Handling collection and batch dependencies:

For view start, update, and end, collectionId (courseId) and batchId are optional. This makes it possible to track progress for content that is not part of a course.
This is handled in two ways:

  • If collectionId and batchId are part of the request, both the individual content progress and the overall collection progress are captured and computed.

  • If only userId and contentId are provided, the progress is captured only for that content.

Handling Collection Data types in DB:

  • With normal (non-frozen) collection types, the map values get distributed across multiple SSTables with each append, which might lead to read latency issues.

  • To handle this scenario, we will use frozen collection types, which help avoid tombstones and multi-SSTable reads.

Current vs New (Viewer-Service) APIs:

We need to continue supporting the current APIs until they are deprecated and deleted. So both sets of APIs must work side by side, with backward compatibility.

Enhance Current APIs to read summary from aggregate table.

Enhance the APIs below to read the progress and score metrics from the user_activity_agg table.

  1. Enrolment List API.

  2. Content State Read API.

One time Data migration:

The content status and score metrics data should be migrated into the user_activity_agg table from the user_enrolments and assessment_agg tables for all existing enrolment records.

API Spec

Content View Start

 POST - /v1/view/start

Request:

{
    "request": {
        "userId": "{{userId}}",
        "collectionId" : "{{collectionId}}",
        "contextId": "{{batchId}}",
        "contentId": "{{contentId}}"
    }
}

Response:

{
    "id": "api.view.start",
    "ver": "v1",
    "ts": "2021-06-23 05:37:40:575+0000",
    "params": {
        "resmsgid": null,
        "msgid": "5e763bc2-b072-440d-916e-da787881b1b9",
        "err": null,
        "status": "success",
        "errmsg": null
    },
    "responseCode": "OK",
    "result": {
        "{{contentId}}": "Progress started"
    }
}

Content View Update

 POST - /v1/view/update

Request:

{
    "request": {
        "userId": "{{userId}}",
        "collectionId" : "{{collectionId}}",
        "contextId": "{{batchId}}",
        "contentId": "{{contentId}}",
        "progressDetails": {}, // Progress details specific for each mimetype
        "timespent": 10 // Timespent in seconds
    }
}

Response:

200 OK:
{
    "id": "api.view.update",
    "ver": "v1",
    "ts": "2021-06-23 05:37:40:575+0000",
    "params": {
        "resmsgid": null,
        "msgid": "5e763bc2-b072-440d-916e-da787881b1b9",
        "err": null,
        "status": "success",
        "errmsg": null
    },
    "responseCode": "OK",
    "result": {
        "{{contentId}}": "SUCCESS"
    }
}

4XX or 5XX Error:
{
    "id": "api.view.update",
    "ver": "v1",
    "ts": "2021-06-23 05:37:40:575+0000",
    "params": {
        "resmsgid": null,
        "msgid": "5e763bc2-b072-440d-916e-da787881b1b9",
        "err": "ERR_Error_Code",
        "status": "failed",
        "errmsg": "ERR_error_msg"
    },
    "responseCode": "BAD_REQUEST", // or "SERVER_ERROR"
    "result": {
    }
}

Content Submit Assess

 POST - /v1/assessment/submit

Request:

{
    "request": {
        "userId": "{{userId}}",
        "collectionId" : "{{collectionId}}",
        "contextId": "{{batchId}}",
        "contentId": "{{contentId}}",
        "assessments": [{
            {{assess_event}} //Mandatory for self-assess contents
        }]
    }
}

Response:

{
    "id": "api.view.assess",
    "ver": "v1",
    "ts": "2021-06-23 05:37:40:575+0000",
    "params": {
        "resmsgid": null,
        "msgid": "5e763bc2-b072-440d-916e-da787881b1b9",
        "err": null,
        "status": "success",
        "errmsg": null
    },
    "responseCode": "OK",
    "result": {
        "{{contentId}}": "SUCCESS"
    }
}

Content View End

 POST - /v1/view/end

Request:

{
    "request": {
        "userId": "{{userId}}",
        "collectionId" : "{{collectionId}}",
        "contextId": "{{batchId}}",
        "contentId": "{{contentId}}"
    }
}

Response:

{
    "id": "api.view.end",
    "ver": "v1",
    "ts": "2021-06-23 05:37:40:575+0000",
    "params": {
        "resmsgid": null,
        "msgid": "5e763bc2-b072-440d-916e-da787881b1b9",
        "err": null,
        "status": "success",
        "errmsg": null
    },
    "responseCode": "OK",
    "result": {
        "{{contentId}}": "Progress ended"
    }
}
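
The start, update, and end requests above share the same envelope, so a client could build them with one helper. The Python below is a minimal sketch: `view_request` is a hypothetical helper, and no HTTP call is made because the service host is not specified here. collectionId and contextId are omitted when not supplied, matching the optional fields described earlier.

```python
import json
from typing import Any, Dict, Optional


def view_request(user_id: str, content_id: str,
                 collection_id: Optional[str] = None,
                 batch_id: Optional[str] = None,
                 **extra: Any) -> Dict[str, Any]:
    """Build the request body for /v1/view/start, /update, or /end."""
    body: Dict[str, Any] = {"userId": user_id, "contentId": content_id}
    if collection_id is not None:
        body["collectionId"] = collection_id
    if batch_id is not None:
        body["contextId"] = batch_id
    # Endpoint-specific fields, e.g. progressDetails and timespent for update.
    body.update(extra)
    return {"request": body}


start = view_request("u1", "do_1234", "do_course", "batch_1")
update = view_request("u1", "do_1234", "do_course", "batch_1",
                      progressDetails={}, timespent=10)
standalone = view_request("u1", "do_1234")
print(json.dumps(update, indent=4))
```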

Content View Read

 POST - /v1/view/read

Request:

{
    "request": {
        "userId": "{{userId}}",
        "contentId": ["do_123", "do_1234"],
        "collectionId" : "{{collectionId}}", //optional
        "contextId": "{{batchId}}"   // optional
  
    }
}

Response:

{
    "id": "api.view.read",
    "ver": "v1",
    "ts": "2021-06-23 05:37:40:575+0000",
    "params": {
        "resmsgid": null,
        "msgid": "5e763bc2-b072-440d-916e-da787881b1b9",
        "err": null,
        "status": "success",
        "errmsg": null
    },
    "responseCode": "OK",
    "result": {
        "userId": "{{userId}}",
        "collectionId": "{{collectionId}}",
        "contextId": "{{batchId}}",
        "contents": [{
            "identifier": "{{contentId}}",
            "progress": 45,
            "score": {{best_score}},
            "max_score": {{max_score}}
        }]
    }
}
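
A client consuming this response would typically check the envelope and then index the contents by identifier. The sketch below assumes the response has already been parsed into a Python dict; `progress_by_content` and the sample values are illustrative, not part of the API.

```python
from typing import Any, Dict


def progress_by_content(response: Dict[str, Any]) -> Dict[str, int]:
    """Map each content identifier to its progress from a view read response."""
    params = response.get("params", {})
    if response.get("responseCode") != "OK" or params.get("status") != "success":
        raise RuntimeError(f"view read failed: {params.get('err')} {params.get('errmsg')}")
    contents = response.get("result", {}).get("contents", [])
    return {c["identifier"]: c["progress"] for c in contents}


# Sample payload with made-up values, shaped like the response above.
sample = {
    "id": "api.view.read",
    "params": {"status": "success", "err": None, "errmsg": None},
    "responseCode": "OK",
    "result": {
        "userId": "u1",
        "contents": [
            {"identifier": "do_123", "progress": 45, "score": 4, "max_score": 5},
            {"identifier": "do_1234", "progress": 100, "score": 5, "max_score": 5},
        ],
    },
}
print(progress_by_content(sample))  # {'do_123': 45, 'do_1234': 100}
```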

Content Assessment Read

 POST - /v1/assessment/read

Request:

{
    "request": {
        "userId": "{{userId}}",
        "contentId": ["do_123", "do_1234"],
        "collectionId" : "{{collectionId}}", //optional
        "contextId": "{{batchId}}"   // optional
   }
}

Response:

{
    "id": "api.assessment.read",
    "ver": "v1",
    "ts": "2021-06-23 05:37:40:575+0000",
    "params": {
        "resmsgid": null,
        "msgid": "5e763bc2-b072-440d-916e-da787881b1b9",
        "err": null,
        "status": "success",
        "errmsg": null
    },
    "responseCode": "OK",
    "result": {
        "userId": "{{userId}}",
        "collectionId": "{{collectionId}}",
        "contextId": "{{batchId}}",
        "contents": [{
            "score": {{best_score}},
            "max_score": {{max_score}}
        }]
    }
}

Viewer Summary - All enrolments

 GET - /v1/summary/list/:userId

Response:

{
  "id": "api.summary.list",
  "ver": "v1",
  "ts": "2021-06-23 05:59:54:984+0000",
  "params": {
    "resmsgid": null,
    "msgid": "95e4942d-cbe8-477d-aebd-ad8e6de4bfc8",
    "err": null,
    "status": "success",
    "errmsg": null
  },
  "responseCode": "OK",
  "result": {
    "summary": [
      {
        "userId": "{{userId}}",
        "collectionId": "{{collectionId}}",
        "batchId": "{{batchId}}",
        "enrolledDate": 1624275377301,
        "active": true,
        "contentStatus": {
          "{{contentId}}": {{status}}
        },
        "assessmentStatus": {
          "assessmentId": {
            "score": {{best_score}},
            "max_score": {{max_score}}
          }
        },
        "collection": {
          "identifier": "{{collectionId}}",
          "name": "{{collectionName}}",
          "logo": "{{logo Url}}",
          "leafNodesCount": {{leafNodeCount}},
          "description": "{{description}}"
        },
        "issuedCertificates": [{
          "name": "{{certName}}",
          "id": "certificateId",
          "token": "{{certToken}}",
          "lastIssuedOn": "{{lastIssuedOn}}"
        }],
        "completedOn": {{completion_date}},
        "progress": {{progress}},
        "status": {{status}}
      }
    ]
  }
}

Viewer Summary - Specific enrolment

 POST - /v1/summary/read

Request:

{
    "request": {
        "userId": "{{userId}}",
        "collectionId" : "{{collectionId}}",
        "contextId": "{{batchId}}"
    }
}

Response:

{
  "id": "api.summary.read",
  "ver": "v1",
  "ts": "2021-06-23 05:59:54:984+0000",
  "params": {
    "resmsgid": null,
    "msgid": "95e4942d-cbe8-477d-aebd-ad8e6de4bfc8",
    "err": null,
    "status": "success",
    "errmsg": null
  },
  "responseCode": "OK",
  "result": {
        "userId": "{{userId}}",
        "collectionId": "{{collectionId}}",
        "contextId": "{{batchId}}",
        "enrolledDate": 1624275377301,
        "active": true,
        "contentStatus": {
          "{{contentId}}": {{status}}
        },
        "assessmentStatus": {
          "assessmentId": {
            "score": {{best_score}},
            "max_score": {{max_score}}
          }
        },
        "collection": {
          "identifier": "{{collectionId}}",
          "name": "{{collectionName}}",
          "logo": "{{logo Url}}",
          "leafNodesCount": {{leafNodeCount}},
          "description": "{{description}}"
        },
        "issuedCertificates": [{
          "name": "{{certName}}",
          "id": "certificateId",
          "token": "{{certToken}}",
          "lastIssuedOn": "{{lastIssuedOn}}"
        }],
        "completedOn": {{completion_date}},
        "progress": {{progress}},
        "status": {{status}}
  }
}

Viewer Summary Delete

 DELETE - /v1/summary/delete/:userId?all - To Delete all enrolments

Response:

{
    "id": "api.summary.delete",
    "ver": "v1",
    "ts": "2021-06-23 05:37:40:575+0000",
    "params": {
        "resmsgid": null,
        "msgid": "5e763bc2-b072-440d-916e-da787881b1b9",
        "err": null,
        "status": "success",
        "errmsg": null
    },
    "responseCode": "OK",
    "result": {}
}

 DELETE - /v1/summary/delete/:userId - To delete specific enrolments

Request:

{
    "request": {
        "userId": "{{userId}}",
        "collectionId" : "{{collectionId}}",
        "batchId": "{{batchId}}"
    }
}

Response:

{
    "id": "api.summary.delete",
    "ver": "v1",
    "ts": "2021-06-23 05:37:40:575+0000",
    "params": {
        "resmsgid": null,
        "msgid": "5e763bc2-b072-440d-916e-da787881b1b9",
        "err": null,
        "status": "success",
        "errmsg": null
    },
    "responseCode": "OK",
    "result": {}
}

Viewer Summary Download

 GET - /v1/summary/download/:userId?format=csv

Response:

{
    "id": "api.summary.download",
    "ver": "v1",
    "ts": "2021-06-23 05:37:40:575+0000",
    "params": {
        "resmsgid": null,
        "msgid": "5e763bc2-b072-440d-916e-da787881b1b9",
        "err": null,
        "status": "success",
        "errmsg": null
    },
    "responseCode": "OK",
    "result": {
      "url": "{{userId}}_viewer_summary.csv"
    }
}

 GET - /v1/summary/download/:userId

Response:

{
    "id": "api.summary.download",
    "ver": "v1",
    "ts": "2021-06-23 05:37:40:575+0000",
    "params": {
        "resmsgid": null,
        "msgid": "5e763bc2-b072-440d-916e-da787881b1b9",
        "err": null,
        "status": "success",
        "errmsg": null
    },
    "responseCode": "OK",
    "result": {
      "url": "{{userId}}_viewer_summary.json"
    }
}

Conclusion:

<TODO>
