Introduction:
This document describes the generation of the batch-wise assessment report within a course. The report should be generated with the fields below. For more detail, refer to the PRD.
Solution -1: Assessment Samza job and Data product
Diagram:
As per the above diagram, the Router job routes all ASSESS events into the Assessment Samza job, which computes and updates the RDBMS table. The assessment data product reads and joins the data from multiple tables, generates the assessment report per batch, and uploads it to Azure cloud storage.
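The join performed by the data product can be sketched as follows. This is a minimal illustration only; the row shapes and field names (`assessment_rows`, `batch_rows`, `score`) are assumptions, not the actual RDBMS schema.

```python
from collections import defaultdict

def build_batch_report(assessment_rows, batch_rows):
    """Join per-user assessment rows with batch metadata and
    group the result per batch (illustrative field names)."""
    batches = {b["batch_id"]: b for b in batch_rows}
    report = defaultdict(list)
    for row in assessment_rows:
        batch = batches.get(row["batch_id"])
        if batch is None:
            continue  # skip rows whose batch metadata is missing
        report[row["batch_id"]].append({
            "user_id": row["user_id"],
            "course_id": batch["course_id"],
            "score": row["score"],
        })
    return dict(report)
```

In the real data product this join would run over the RDBMS tables and the result would be serialized and uploaded per batch.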
Disadvantages:
- When the pipeline has a huge lag, the report will be wrong (it will be generated from stale data).
Solution - 2: API and Data product
As per the above diagram, the end user syncs the assess events through an API, which updates the database with computed values. The assessment data product reads data from the database, generates the reports per batch, and uploads them to Azure cloud storage.
API:
METHOD: POST
URI: /data/v3/course/assessment/telemetry
BODY:
{
  "id": "api.course.batch.telemetry",
  "ver": "1.0",
  "ets": 1566884714550,            // Timestamp in ms
  "data": [{
    "batch_id": "",                // ? already exists at event level in `cdata`
    "course_id": "",               // ? already exists at event level in `cdata`
    "user_id": "",                 // ? already exists at event level in `cdata`
    "attempt_id": "",              // Question attempt identifier
    "worksheet_id": "",            // ? already exists at event level in `cdata`
    "events": [{}]                 // ONLY ASSESS events
  }, ...]
}
RESPONSE:
{
  "id": "api.course.batch.telemetry",
  "ver": "1.0",
  "ets": 1566974765224,
  "params": { "err": "" },
  "responseCode": "SUCCESS"
}
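A minimal client-side sketch of building and validating the request body for this endpoint. The helper name and the validation rules shown are assumptions for illustration; only the field names come from the spec above.

```python
import time

# Fields the API body requires per data item (from the spec above)
REQUIRED_FIELDS = ("batch_id", "course_id", "user_id",
                   "attempt_id", "worksheet_id", "events")

def build_telemetry_payload(data):
    """Build the POST body for /data/v3/course/assessment/telemetry.
    Raises ValueError if a data item is missing a required field or
    contains non-ASSESS events (only ASSESS events are accepted)."""
    for item in data:
        missing = [f for f in REQUIRED_FIELDS if f not in item]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        if any(e.get("eid") != "ASSESS" for e in item["events"]):
            raise ValueError("only ASSESS events are accepted")
    return {
        "id": "api.course.batch.telemetry",
        "ver": "1.0",
        "ets": int(time.time() * 1000),  # timestamp in ms
        "data": data,
    }
```

The resulting dict would be JSON-serialized and POSTed with any HTTP client.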
Disadvantages:
- Need to validate and de-dup the events.
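De-duplication at ingestion time could be keyed on the event `mid` (message id); a minimal sketch, assuming each telemetry envelope carries a `mid` field:

```python
def dedup_events(events, seen=None):
    """Drop events whose `mid` was already ingested.
    `seen` would be backed by a persistent store in practice."""
    seen = set() if seen is None else seen
    unique = []
    for event in events:
        mid = event.get("mid")
        if mid is None or mid in seen:
            continue  # invalid (no mid) or duplicate event
        seen.add(mid)
        unique.append(event)
    return unique
```

In a real pipeline `seen` would need to survive restarts (e.g. a keyed store or a database table), otherwise replayed batches would be re-counted.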
Table Schema:
Question object:
CREATE TYPE question (
  id text,
  max_score int,
  score int,
  type text,
  title text,
  pass text,
  description text,
  duration text
);

CREATE TABLE assessment_profile (
  batch_id text,
  user_id text,
  course_id text,
  worksheet_id text,
  worksheet_name text,
  attempt_id text,
  updated_on timestamp,
  created_on timestamp,
  question list<frozen<question>>,
  PRIMARY KEY (worksheet_id, attempt_id, user_id, course_id, batch_id)
);
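Given the `question` list in the schema above, the attempt-level score can be derived by folding over the per-question scores. A sketch using plain dicts as a stand-in for the UDT; the `percentage` field is an illustrative addition, not part of the schema:

```python
def overall_score(questions):
    """Aggregate per-question scores into attempt-level totals."""
    total = sum(q["score"] for q in questions)
    max_total = sum(q["max_score"] for q in questions)
    pct = round(100.0 * total / max_total, 2) if max_total else 0.0
    return {"score": total, "max_score": max_total, "percentage": pct}
```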
Challenges:
- How to capture the attempts? i.e., the number of times a particular user attempted a particular question.
- How to capture the batch-id and course-id?
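One possible answer to the attempts question is to count ASSESS events per (user, question) pair; a minimal sketch, assuming each event exposes `user_id` and a stable `question_id` (which requires the question-id fix noted in the conclusion below):

```python
from collections import Counter

def count_attempts(assess_events):
    """Number of times each user attempted each question."""
    attempts = Counter()
    for event in assess_events:
        attempts[(event["user_id"], event["question_id"])] += 1
    return attempts
```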
Conclusion:
Analytics team:
1. Analytics team will store the attempts as a blob in the database; all the event data related to the questions will be stored in that blob.
2. Analytics team will implement an API to ingest assessment-related data. The API will take course_id, batch_id, and a batch of ASSESS events.
3. The API will route the events to a separate Kafka topic. A new Samza job will process these events and load the summarized data into the database. Each record will correspond to an attempt_id for a worksheet. The record will also contain the overall score for the attempt_id.
4. Postgres might not scale for the number of ASSESS events every day. Cassandra will be used to store the summarized data.
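The summarization in point 3 can be sketched as grouping ASSESS events per attempt_id and accumulating the overall score. This is an illustrative reduction only; the event field names (`score`, `max_score`) are assumptions, and the real Samza job would also persist the raw events as the blob described in point 1.

```python
from collections import defaultdict

def summarize_attempts(assess_events):
    """Produce one summarized record per attempt_id with the overall score."""
    records = defaultdict(lambda: {"total_score": 0,
                                   "total_max_score": 0,
                                   "events": 0})
    for event in assess_events:
        rec = records[event["attempt_id"]]
        rec["total_score"] += event["score"]
        rec["total_max_score"] += event["max_score"]
        rec["events"] += 1
    return dict(records)
```

Each resulting record maps onto one `assessment_profile` row keyed by the attempt_id.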
Portal/Mobile (estimates for the following tasks have not been accounted for):
1. Currently, the question id is auto-generated every time the worksheet is played. The content player needs to fix this and use the do_id for the questions.
2. Mobile and portal will have to send the attempt id in cdata both in the case of practice questions and exams. Currently, a new attempt id can be generated every time the worksheet is played. However, in future the exams will need to have the same attempt id passed until the assessment is submitted.
3. Mobile/Portal should figure out a way to call the assessment score computation API only for assessment worksheets.