...

  1. Cassandra

    1. Pros

      1. Supports fast key-based queries, e.g. a query on course_id and user_id

      2. Only one DB needs to be scaled to scale the entire courses infrastructure

    2. Cons

      1. Limited query capability. Performance is guaranteed only for queries on the partition key.

      2. Filtering by user properties or course properties must be done in the API's memory after fetching the data from the DB.

      3. Data joins must be done in memory.

  2. Druid

    1. Pros

      1. Faster and easier to scale.

      2. Supports joins from version 0.18 onwards

      3. Can query on any dimension

    2. Cons

      1. Append-only DB. The Samza/Flink job must take care of idempotency.

      2. Can query only by date field (as segments are created by date). Supporting the courses reporting needs would require a custom data source design, which can become extremely complex.
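
The in-memory filtering and joining listed under the Cassandra cons can be sketched as follows. This is a minimal illustration, assuming the aggregate rows have already been fetched by partition key. The helper `join_and_filter`, the user property `state`, and the sample data are hypothetical and not part of the design.

```python
from typing import Dict, List


def join_and_filter(
    agg_rows: List[dict],
    users_by_id: Dict[str, dict],
    state: str,
) -> List[dict]:
    """Join aggregate rows with user profiles in memory, then filter on a user property."""
    joined = []
    for row in agg_rows:
        user = users_by_id.get(row["user_id"])
        if user is None:
            continue  # No profile found: skip the row (inner-join semantics)
        if user.get("state") != state:
            continue  # Filter on a property that is not part of the partition key
        joined.append({**row, "user": user})
    return joined


# Hypothetical rows, as if fetched from the DB by course_id
agg_rows = [
    {"course_id": "do_123", "user_id": "u1", "progress": 80},
    {"course_id": "do_123", "user_id": "u2", "progress": 40},
    {"course_id": "do_123", "user_id": "u3", "progress": 10},
]
# Hypothetical user profiles fetched separately
users_by_id = {
    "u1": {"name": "A", "state": "KA"},
    "u2": {"name": "B", "state": "TN"},
}

result = join_and_filter(agg_rows, users_by_id, state="KA")
```

Every row for the partition is held in API memory before the filter is applied, so the cost of this approach grows with the number of users per course.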

Edge Caching

...

...

Group Activity Aggregates

APIs

Get group aggregates

POST - /data/group/agg

Request

{
  "request": {
    "groupId": String, // Required. The group id
    "activities": ?[{ // Optional. List of activity ids
      "id": String, // Required. The activity id
      "type": String // Required. The activity type
    }],
    "fields": ?Array[String] // Optional. The list of fields to send in the response
  }
}

Response

{
  "result": {
    "groupId": "",
    "activity": [{
      "id": "do_12312312",
      "type": "Course",
      ... // Other activity metadata, e.g. course name, end date, status
      "agg": [{
        "metric": String, // Metric ID
        "value": Number, // Metric value
        "lastUpdatedOn": Timestamp // When the metric was last updated
      }]
    }]
  }
}
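
As a rough sketch of how a client might use this endpoint, the snippet below builds a request body and indexes a response by activity. The helper names, the sample values, and the use of epoch milliseconds for `lastUpdatedOn` are assumptions; the spec above does not define a timestamp format or a client library.

```python
import json
from typing import Dict, List, Optional


def build_group_agg_request(
    group_id: str,
    activities: Optional[List[Dict[str, str]]] = None,
    fields: Optional[List[str]] = None,
) -> dict:
    """Build the request body; optional keys are omitted when not supplied."""
    if not group_id:
        raise ValueError("groupId is required")
    request: dict = {"groupId": group_id}
    if activities is not None:
        for activity in activities:
            # Both id and type are required for every listed activity
            if not activity.get("id") or not activity.get("type"):
                raise ValueError("each activity needs 'id' and 'type'")
        request["activities"] = activities
    if fields is not None:
        request["fields"] = fields
    return {"request": request}


def metrics_by_activity(response: dict) -> Dict[str, Dict[str, float]]:
    """Index the response as {activity id: {metric id: value}}."""
    return {
        activity["id"]: {m["metric"]: m["value"] for m in activity.get("agg", [])}
        for activity in response["result"]["activity"]
    }


body = build_group_agg_request(
    "group_1", activities=[{"id": "do_12312312", "type": "Course"}]
)
print(json.dumps(body, indent=2))

# Hypothetical response shaped like the spec above (timestamps assumed epoch millis)
sample_response = {
    "result": {
        "groupId": "group_1",
        "activity": [{
            "id": "do_12312312",
            "type": "Course",
            "agg": [
                {"metric": "enrolledCount", "value": 25, "lastUpdatedOn": 1590000000000},
                {"metric": "completedCount", "value": 10, "lastUpdatedOn": 1590000000000},
            ],
        }],
    }
}
metrics = metrics_by_activity(sample_response)
```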

Get activity aggregates

POST - /data/group/activity/agg

Request

{
  "request": {
    "groupId": String, // Required. The group id
    "activityId": String, // Required. The activity id within the group
    "activityType": String, // Required. The activity type within the group
    "fields": ?Array[String] // Optional. The list of fields to send in the response
  }
}

Response

{
  "result": {
    "groupId": "",
    "activity": {
      "id": "do_12312312",
      "type": "Course",
      "metadata": {},
      "agg": [{
        "metric": String, // Metric ID
        "value": Number, // Metric value
        "lastUpdatedOn": Timestamp // When the metric was last updated
      }],
      "members": [{
        "id": String, // User ID
        ... // User profile attributes required for display
        "agg": [{
          "metric": String, // Metric ID. For ex: progress, completed, timespent etc
          "value": Number,
          "lastUpdatedOn": Timestamp
        }]
      }]
    }
  }
}
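
One way a consumer might read the per-member aggregates from this response is sketched below. The helper `member_metric` and the sample values are hypothetical. The metric ID `progress` is taken from the example IDs in the response comments.

```python
from typing import List, Tuple


def member_metric(response: dict, metric: str) -> List[Tuple[str, float]]:
    """Return (user id, value) pairs for one metric, highest value first."""
    activity = response["result"]["activity"]
    rows = []
    for member in activity.get("members", []):
        for entry in member.get("agg", []):
            if entry["metric"] == metric:
                rows.append((member["id"], entry["value"]))
                break  # At most one entry per metric is expected per member
    # Members without the requested metric are left out of the result
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical response shaped like the spec above
sample_response = {
    "result": {
        "groupId": "group_1",
        "activity": {
            "id": "do_12312312",
            "type": "Course",
            "metadata": {},
            "agg": [{"metric": "enrolledCount", "value": 3, "lastUpdatedOn": 1590000000000}],
            "members": [
                {"id": "u1", "agg": [{"metric": "progress", "value": 60, "lastUpdatedOn": 1590000000000}]},
                {"id": "u2", "agg": [{"metric": "progress", "value": 100, "lastUpdatedOn": 1590000000000}]},
                {"id": "u3", "agg": []},
            ],
        },
    }
}

ranking = member_metric(sample_response, "progress")
```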

Schema

Assumptions:

  • Group, activity and user tables exist and a mapping table exists for the group-activity-user relation

activity_user_agg

| Column | Type | Description |
| --- | --- | --- |
| activity_type | String | Type of the activity - Course, CourseUnit, Quiz etc |
| activity_id | String | Id of the activity - course_id, content_id etc |
| user_id | String | User Id |
| context_id | String | Context in which the activity happened. Combination of type:value. For ex: CourseBatch → cb:do_123121 |
| agg | Map<String, Number> | Aggregate metrics for the user and activity combination |
| agg_last_updated | Map<String, Timestamp> | When each aggregate metric was last updated |

Partition Key - (activity_type, activity_id, group_id, user_id)
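
To illustrate how the `agg` and `agg_last_updated` maps could work together, here is a minimal in-memory sketch of one activity_user_agg row. The last-write-wins rule, the `set_metric` method, and the use of epoch milliseconds for timestamps are assumptions made for illustration; the design above does not prescribe them.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ActivityUserAgg:
    """In-memory stand-in for one activity_user_agg row."""
    activity_type: str
    activity_id: str
    user_id: str
    context_id: str
    agg: Dict[str, float] = field(default_factory=dict)
    agg_last_updated: Dict[str, int] = field(default_factory=dict)  # epoch millis (assumed)

    def set_metric(self, metric: str, value: float, updated_at: int) -> bool:
        """Apply an update only if it is newer than the stored one.

        Returns True if the update was applied. Replayed or out-of-order
        events are ignored, which keeps reprocessing idempotent.
        """
        last = self.agg_last_updated.get(metric)
        if last is not None and updated_at <= last:
            return False
        self.agg[metric] = value
        self.agg_last_updated[metric] = updated_at
        return True


row = ActivityUserAgg("Course", "do_12312312", "u1", "cb:do_123121")
first = row.set_metric("progress", 40, updated_at=1000)
replayed = row.set_metric("progress", 40, updated_at=1000)  # same event again
stale = row.set_metric("progress", 20, updated_at=900)      # older event
newer = row.set_metric("progress", 60, updated_at=2000)
```

Keeping a timestamp per metric, rather than one per row, lets each metric be updated independently by different jobs.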

activity_agg

| Column | Type | Description |
| --- | --- | --- |
| activity_type | String | Type of the activity - Course, CourseUnit, Quiz etc |
| activity_id | String | Id of the activity - course_id, content_id etc |
| context_id | String | Context in which the activity happened. Combination of type:value. For ex: CourseBatch → cb:do_123121 |
| agg | Map<String, Number> | Aggregate metrics for the activity |
| agg_last_updated | Map<String, Timestamp> | When each aggregate metric was last updated |

Partition Key - (activity_type, activity_id, context_id)

Course Metrics

  1. completedCount

  2. completionPercentage

  3. enrolledCount

  4. <TBD>
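
The course metrics listed above could be derived from the per-user rows roughly as follows. This is a sketch under stated assumptions: each user row is taken to carry a `completed` metric that is 1 when the user has finished the course, every row counts as one enrollment, and the percentage is completed over enrolled. None of these definitions are confirmed by the design, and `course_metrics` is a hypothetical helper.

```python
from typing import Dict, List


def course_metrics(user_aggs: List[Dict[str, float]]) -> Dict[str, float]:
    """Roll per-user agg maps up into course-level metrics."""
    enrolled = len(user_aggs)  # Assumption: one row per enrolled user
    completed = sum(1 for agg in user_aggs if agg.get("completed", 0) >= 1)
    # Guard against division by zero when nobody is enrolled
    percentage = round(100.0 * completed / enrolled, 2) if enrolled else 0.0
    return {
        "enrolledCount": enrolled,
        "completedCount": completed,
        "completionPercentage": percentage,
    }


# Hypothetical agg maps from four activity_user_agg rows of one course
user_aggs = [
    {"progress": 100, "completed": 1},
    {"progress": 40, "completed": 0},
    {"progress": 10},
    {"progress": 75, "completed": 0},
]
summary = course_metrics(user_aggs)
```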

Course User Metrics

Open Questions

  1. Do we need the activity_agg table, which is a next-level aggregation over the activity_user_agg table?

  2. Will any activity happen only in a group context? For example, a course can be taken outside a group but aggregated within the group, whereas a quiz could be conducted within a group context across multiple groups.