...
Cassandra
Pros
Supports fast key-based queries, for example a query on course_id and user_id.
Only one DB needs to be scaled to scale the entire courses infrastructure.
Cons
Limited query capability: performance is guaranteed only when querying by the partition key.
Filtering by either user properties or course properties must be done in the API's memory after fetching the data from the DB.
Data joins must also be done in memory.
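To illustrate the in-memory filtering and joining described above, the sketch below models a Cassandra table partitioned by course_id as a nested Python dict. It assumes user_id is a clustering column. The layout, the separate `users` lookup and field names such as `state` and `completed` are hypothetical and chosen only for the example. A real API would read the partition through a Cassandra driver.

```python
from typing import Dict, List

# Hypothetical layout: course_id -> {user_id -> enrolment row}, mimicking a
# Cassandra table partitioned by course_id with user_id as a clustering column.
Enrolments = Dict[str, Dict[str, dict]]
# Hypothetical user profiles held in a separate table or service.
Users = Dict[str, dict]


def fetch_by_course(enrolments: Enrolments, course_id: str) -> Dict[str, dict]:
    """Stand-in for a partition-key read: cheap because it hits one partition."""
    return enrolments.get(course_id, {})


def completed_users_in_state(
    enrolments: Enrolments, users: Users, course_id: str, state: str
) -> List[dict]:
    """Fetch by partition key, then join and filter in the API's memory."""
    result = []
    for user_id, row in fetch_by_course(enrolments, course_id).items():
        profile = users.get(user_id)  # in-memory join with user data
        if profile is None:
            continue
        # Both the user-property and course-property filters run here,
        # not in the database.
        if profile["state"] == state and row["completed"]:
            result.append({"user_id": user_id, "name": profile["name"], **row})
    return result


if __name__ == "__main__":
    enrolments = {
        "course_1": {
            "u1": {"completed": True},
            "u2": {"completed": False},
            "u3": {"completed": True},
        }
    }
    users = {
        "u1": {"name": "Asha", "state": "KA"},
        "u2": {"name": "Ravi", "state": "KA"},
        "u3": {"name": "Meera", "state": "TN"},
    }
    print(completed_users_in_state(enrolments, users, "course_1", "KA"))
```

With this pattern, every row in the partition is pulled into the API before any filter is applied. The cost therefore grows with partition size, not with result size.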
Druid
Pros
Faster and easier to scale.
Supports joins from version 0.18 onwards
Can query on any dimension
Cons
Append-only DB, so the Samza/Flink job needs to take care of idempotency.
Can query only by the date field (as segments are created by date). Supporting the courses reporting needs requires a custom data source design, which can become extremely complex.
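Because Druid is append-only, a retried or replayed event would be ingested twice unless the streaming job drops it first. The sketch below shows one common approach: de-duplicating on a unique event ID before emitting to the sink. The `mid` field name, the `IdempotentEmitter` class and the in-memory set are assumptions for illustration. In a real Samza or Flink job, the seen-ID set would live in the framework's fault-tolerant state and would usually be bounded, for example with a TTL.

```python
from typing import Callable, Iterable, List, Set


class IdempotentEmitter:
    """Drops events whose ID was already emitted, so retries do not duplicate rows."""

    def __init__(self, sink: Callable[[dict], None]) -> None:
        self._sink = sink
        # Hypothetical stand-in for the job's durable keyed state.
        self._seen_ids: Set[str] = set()

    def process(self, event: dict) -> bool:
        """Emit the event once; return False if it was a duplicate."""
        event_id = event["mid"]  # assumed unique message ID on every event
        if event_id in self._seen_ids:
            return False
        self._sink(event)
        # Record the ID only after a successful emit, so a failed write
        # is not mistaken for a duplicate when it is retried.
        self._seen_ids.add(event_id)
        return True


def replay(events: Iterable[dict]) -> List[dict]:
    """Feed events (including retries) through the emitter and collect the output."""
    appended: List[dict] = []
    emitter = IdempotentEmitter(appended.append)
    for event in events:
        emitter.process(event)
    return appended


if __name__ == "__main__":
    batch = [
        {"mid": "e1", "user_id": "u1", "course_id": "c1"},
        {"mid": "e2", "user_id": "u2", "course_id": "c1"},
        {"mid": "e1", "user_id": "u1", "course_id": "c1"},  # retried delivery
    ]
    print(len(replay(batch)))  # 2
```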
...
Edge Caching
...
Group Activity Aggregates
APIs
Get group aggregates
Request
Response
Get activity aggregates
Request
Response
Schema
Assumptions:
Group, activity and user tables exist, and a mapping table exists for the group-activity-user relation.
activity_user_agg
Column | Type | Description |
---|---|---|
activity_type | String | Type of the activity, e.g. Course, CourseUnit, Quiz |
activity_id | String | ID of the activity, e.g. course_id, content_id |
user_id | String | User ID |
context_id | String | Context in which the activity happened, encoded as type:value. For example, a CourseBatch context → cb:do_123121 |
agg | Map<String, Number> | Aggregate metrics for the user and activity combination |
agg_last_updated | Map<String, Timestamp> | When each aggregate metric was last updated |
Partition Key - (activity_type, activity_id, context_id, user_id)
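The paired `agg` and `agg_last_updated` maps allow each metric to be updated independently of the others. The sketch below shows how a writer might merge a new metric value, assuming last-write-wins per metric based on the event timestamp. It also splits a `context_id` into its type and value. The merge rule, the `ActivityUserAgg` class and the `parse_context` helper are hypothetical and are not defined by the schema above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Tuple


def parse_context(context_id: str) -> Tuple[str, str]:
    """Split a type:value context_id such as 'cb:do_123121' into its two parts."""
    ctx_type, sep, value = context_id.partition(":")
    if not sep or not ctx_type or not value:
        raise ValueError(f"context_id must look like type:value, got {context_id!r}")
    return ctx_type, value


@dataclass
class ActivityUserAgg:
    """One activity_user_agg row; agg maps metric name -> value."""

    activity_type: str
    activity_id: str
    user_id: str
    context_id: str
    agg: Dict[str, float] = field(default_factory=dict)
    agg_last_updated: Dict[str, datetime] = field(default_factory=dict)

    def apply(self, metric: str, value: float, event_time: datetime) -> bool:
        """Last-write-wins per metric; stale updates (and ties) keep the stored value."""
        last = self.agg_last_updated.get(metric)
        if last is not None and event_time <= last:
            return False
        self.agg[metric] = value
        self.agg_last_updated[metric] = event_time
        return True


if __name__ == "__main__":
    row = ActivityUserAgg("Course", "do_1", "u1", "cb:do_123121")
    newer = datetime(2020, 5, 1, tzinfo=timezone.utc)
    older = datetime(2020, 4, 1, tzinfo=timezone.utc)
    print(row.apply("completedCount", 1, newer))  # True
    print(row.apply("completedCount", 0, older))  # False: stale update ignored
    print(row.agg, parse_context(row.context_id))
```

Tracking a timestamp per metric, rather than per row, means a late event for one metric cannot overwrite a fresher value of another.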
activity_agg
Column | Type | Description |
---|---|---|
activity_type | String | Type of the activity, e.g. Course, CourseUnit, Quiz |
activity_id | String | ID of the activity, e.g. course_id, content_id |
context_id | String | Context in which the activity happened, encoded as type:value. For example, a CourseBatch context → cb:do_123121 |
agg | Map<String, Number> | Aggregate metrics for the activity |
agg_last_updated | Map<String, Timestamp> | When each aggregate metric was last updated |
Partition Key - (activity_type, activity_id, context_id)
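If activity_agg is kept, it could be derived from activity_user_agg by grouping rows on (activity_type, activity_id, context_id) and combining their metrics. The sketch below assumes every metric is additive, such as a count like completedCount. A ratio such as completionPercentage cannot be summed and would have to be recomputed from the summed counts. Rows are plain dicts and `roll_up` is a hypothetical helper, used for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

AggKey = Tuple[str, str, str]  # (activity_type, activity_id, context_id)


def roll_up(user_rows: Iterable[dict]) -> Dict[AggKey, Dict[str, float]]:
    """Sum each user's additive metrics into one aggregate per activity and context."""
    totals: Dict[AggKey, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for row in user_rows:
        key = (row["activity_type"], row["activity_id"], row["context_id"])
        for metric, value in row["agg"].items():
            totals[key][metric] += value
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {key: dict(metrics) for key, metrics in totals.items()}


if __name__ == "__main__":
    rows = [
        {"activity_type": "Course", "activity_id": "do_1", "user_id": "u1",
         "context_id": "cb:do_123121", "agg": {"completedCount": 1}},
        {"activity_type": "Course", "activity_id": "do_1", "user_id": "u2",
         "context_id": "cb:do_123121", "agg": {"completedCount": 0}},
    ]
    print(roll_up(rows))
```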
Course Metrics
completedCount
completionPercentage
enrolledCount
<TBD>
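The listed course metrics are related: completionPercentage can be derived from the two counts. The sketch below assumes that completionPercentage means completedCount as a percentage of enrolledCount, rounded to two decimals. It also assumes that a course with no enrolments reports 0. The formula, the rounding, the zero-enrolment rule and the `course_metrics` helper are all assumptions.

```python
from typing import Dict, Union


def course_metrics(enrolled_count: int, completed_count: int) -> Dict[str, Union[int, float]]:
    """Derive completionPercentage from the enrolment and completion counts."""
    if completed_count < 0 or completed_count > enrolled_count:
        raise ValueError("completedCount must be between 0 and enrolledCount")
    # Guard against division by zero for a course with no enrolments.
    percentage = 100.0 * completed_count / enrolled_count if enrolled_count else 0.0
    return {
        "enrolledCount": enrolled_count,
        "completedCount": completed_count,
        "completionPercentage": round(percentage, 2),
    }


if __name__ == "__main__":
    print(course_metrics(40, 10)["completionPercentage"])  # 25.0
```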
Course User Metrics
Open Questions
Do we need the activity_agg table, which is a next-level aggregation over the activity_user_agg table?
Will some activities happen only in a group context? For example, a course can be taken outside a group but aggregated within a group, whereas a quiz could be conducted within a group context across multiple groups.