Problem Statement
Related Jira Task -
- Bulk Content Upload is to be supported in Sunbird, with 3 operation modes
- An API is to be made available to check the real-time status of a bulk content upload process
- An API is to be made available to list the statuses of the processes initiated by a user
operation-mode | workflow |
---|---|
upload | create-upload content |
publish | create-upload-publish content |
link | create-upload-publish content and link it to textbook |
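The table above can be captured as a small lookup, sketched here in Python. The enum, the step names and the `parse_operation_mode` helper are illustrative assumptions and not part of the Sunbird codebase; only the three mode values come from this design.

```python
from enum import Enum


class OperationMode(Enum):
    UPLOAD = "upload"
    PUBLISH = "publish"
    LINK = "link"


# Workflow steps per mode, following the table above (step names are hypothetical).
WORKFLOW_STEPS = {
    OperationMode.UPLOAD: ("create", "upload"),
    OperationMode.PUBLISH: ("create", "upload", "publish"),
    OperationMode.LINK: ("create", "upload", "publish", "link_to_textbook"),
}


def parse_operation_mode(header_value: str) -> OperationMode:
    """Parse the operation-mode request header, ignoring case and whitespace."""
    try:
        return OperationMode(header_value.strip().lower())
    except ValueError:
        raise ValueError(f"Unsupported operation-mode: {header_value!r}") from None
```

Each mode is a strict extension of the previous one, so the asynchronous job can presumably stop after the last step listed for the requested mode.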
Design
1. Validations
The following file-related validations are to be done:
- Validate the format of the file
- Validate that the file is readable
- Validate that the file has data
The following data-related validations are to be done:
- Check that the file conforms to the bulk content upload template (the template should be configurable)
- The number of rows in the file should be less than the maximum number of rows allowed (configurable)
- Check for duplicates within the file. The key is Taxonomy (BGMS) + ContentName
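The validations above could be sketched as follows. This is a minimal illustration, not the actual implementation. The column names, the `MAX_ROWS` value and the `EMPTY_FILE` and `DUPLICATE_ROW` codes are assumptions. The other error codes are the ones listed in the API specification. The sketch also treats the row limit as inclusive, which is an interpretation of the requirement.

```python
import csv
import io
from typing import List, Optional, Tuple

# Hypothetical values: the design says the template and row limit are configurable.
TEMPLATE_COLUMNS = ["Name", "Board", "Grade", "Medium", "Subject"]
KEY_COLUMNS = ["Board", "Grade", "Medium", "Subject", "Name"]  # BGMS + content name
MAX_ROWS = 3


def validate_csv(filename: str, raw: bytes) -> Tuple[Optional[str], List[dict]]:
    """Return (error_code, rows); error_code is None when the file is valid."""
    # File format: only CSV is accepted.
    if not filename.lower().endswith(".csv"):
        return "INVALID_FILE_FORMAT", []
    # Readability: the bytes must decode as text.
    try:
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        return "CORRUPT_FILE", []
    # The file must contain data.
    if not text.strip():
        return "EMPTY_FILE", []  # hypothetical code, not in the API spec
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    # Template conformance: every configured column must be present.
    if any(col not in header for col in TEMPLATE_COLUMNS):
        return "INVALID_FILE_TEMPLATE", []
    rows = list(reader)
    if not rows:
        return "EMPTY_FILE", []
    # Row limit (treated as inclusive here, which is an assumption).
    if len(rows) > MAX_ROWS:
        return "MAX_ROW_COUNT_EXCEEDED", []
    # Duplicate check on the normalized key.
    seen = set()
    for row in rows:
        key = tuple((row[col] or "").strip().lower() for col in KEY_COLUMNS)
        if key in seen:
            return "DUPLICATE_ROW", []  # hypothetical code, not in the API spec
        seen.add(key)
    return None, rows
```

Running the cheap file-level checks before parsing any rows lets a request be rejected without reading the whole file.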
2. Synchronous Processing
1. Upload the CSV file to blob storage
2. Make an entry in bulk_upload_process table
column | data to insert | remarks |
---|---|---|
id | auto-generated unique id | processId |
createdby | uploader id | |
createdon | current timestamp | |
data | blobstore url of CSV file | |
failureresult | failedCount to be updated here | |
lastupdatedon | last updated timestamp to be updated here on each update | |
objecttype | content | |
organisationid | tenant id | |
processendtime | endTime - current timestamp to be inserted here while moving this process to completed state | |
processstarttime | startTime - current timestamp to be inserted here while moving this process to processing state | |
retrycount | 0 | Not used |
status | queued | status - possible values - queued, processing, completed |
storagedetails | report - blobstore url of result file | |
successresult | successCount to be updated here | |
taskcount | number of records in file | totalCount |
uploadedby | uploader id | |
uploadeddate | current timestamp | |
3. Make entries into bulk_upload_process_task table (One record per content)
column | data to insert | remarks |
---|---|---|
processid | processid | id from master table |
sequenceid | auto-generated sequence id | |
createdon | current timestamp | |
data | data in JSON format | |
failureresult | JSON data + failed message | |
iterationid | 0 | Not used |
lastupdatedon | last updated timestamp to be updated here on each update | |
status | possible values - queued, success, failed | |
successresult | JSON data + success message | |
4. For LINK operation-mode, get draft hierarchy of Textbooks mentioned in CSV and cache the dialCode-TextBookUnitDoId mapping in Redis
5. Push events to Kafka. For LINK operation-mode, use the Textbook Id as the partition key. For the other operation modes, use the hashed value generated during the duplicate check as the partition key.
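The partition key selection in step 5 might look like the sketch below. The field names (`textbookId`, `board`, and so on) and the choice of SHA-256 are assumptions made for illustration; the design only states that a hash is produced during the duplicate check.

```python
import hashlib


def duplicate_check_hash(row: dict) -> str:
    """Hash of Taxonomy (BGMS) + ContentName, the duplicate-check key."""
    fields = ("board", "grade", "medium", "subject", "name")  # hypothetical field names
    normalized = "|".join(str(row[f]).strip().lower() for f in fields)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def partition_key(operation_mode: str, row: dict) -> str:
    """Pick the Kafka partition key for one content row."""
    if operation_mode == "link":
        # All rows of one textbook share a key and therefore a partition.
        return row["textbookId"]  # hypothetical field name
    return duplicate_check_hash(row)
```

Kafka preserves ordering only within a partition. Keying LINK events by textbook therefore presumably keeps updates to one textbook's draft hierarchy from being processed concurrently by different consumers.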
3. Asynchronous Processing - Samza
- Validation of mandatory fields
- Validate the DIAL code (first against the Redis cache; if it is not present there, get the draft hierarchy from Cassandra and validate against it)
- Validate the file size and file format in Google Drive
- Validate the Taxonomy by creating the content - Hit the REST API
- Download the AppIcon from Google Drive
- Create an asset with downloaded image
- Update content with AppIcon image URL
- Download content file from Google Drive
- Upload content - Hit the REST API
- Publish the content - Hit the Java API
- Get draft hierarchy of the TextBook from Cassandra
- Get the metadata of the published content
- Update the draft hierarchy of the TextBook in Cassandra
- Update the status back in the LMS Cassandra bulk_upload_process_task table
- Retire the Content in case of any exception in the flow
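The control flow of the steps above, in particular the "retire on any exception" rule, can be sketched as a small step runner. This is an illustrative outline and not the Samza job itself. The `run_task` function, the `contentId` context key and the shape of the result dictionary are assumptions, loosely modeled on the `successresult` and `failureresult` columns.

```python
from typing import Callable, Iterable, Tuple

Step = Tuple[str, Callable[[dict], None]]


def run_task(data: dict, steps: Iterable[Step], retire: Callable[[str], None]) -> dict:
    """Run the steps in order; retire the content if any step fails."""
    ctx = {"data": data}  # steps share state (e.g. the created content id) via ctx
    for name, step in steps:
        try:
            step(ctx)
        except Exception as exc:
            # Retire only if the content was already created by an earlier step.
            if "contentId" in ctx:
                retire(ctx["contentId"])
            return {
                "status": "failed",
                "failureresult": {**data, "message": f"{name}: {exc}"},
            }
    return {"status": "success", "successresult": {**data, "message": "success"}}
```

The returned dictionary is what would be written back to the task row, so a failed row carries both the original data and the reason for the failure.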
4. Scheduler
- A scheduler is to run at periodic intervals to consolidate the results from the bulk_upload_process_task table and update the master table (bulk_upload_process) with success_count, failed_count, process_end_time, result_file_url and status
- When a process is marked as completed, the result file has to be generated and uploaded to the blob store, and its URL updated in the bulk_upload_process table
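The consolidation performed by the scheduler could be sketched as below. The rule used to derive the master status from the task statuses is an assumption, since the design does not spell it out. In this sketch a process is `completed` only once no task is still `queued`.

```python
from collections import Counter
from typing import Iterable


def consolidate(task_statuses: Iterable[str]) -> dict:
    """Summarize task rows of one process into the master-table fields."""
    counts = Counter(task_statuses)
    success, failed, queued = counts["success"], counts["failed"], counts["queued"]
    if success + failed == 0:
        status = "queued"        # nothing has been picked up yet
    elif queued > 0:
        status = "processing"    # some tasks done, some still waiting
    else:
        status = "completed"     # every task has a final status
    return {
        "status": status,
        "successCount": success,
        "failedCount": failed,
        "totalCount": success + failed + queued,
    }
```

When the derived status becomes `completed`, the scheduler would additionally set the end time and generate the result file, as described above.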
5. Status Check API
- Data from bulk_upload_process table to be served based on the processId
6. Status List API
- The userId of the user should be deduced from the Keycloak access token passed in the header.
- The statuses of all uploads done by the user have to be served from the bulk_upload_process table
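A rough sketch of the two steps above follows. It assumes the Keycloak access token is a JWT whose `sub` claim holds the user id, and it deliberately skips signature verification, which a real service must perform before trusting any claim. The row field names mirror the bulk_upload_process columns; the function names are hypothetical.

```python
import base64
import json
from typing import Iterable


def user_id_from_token(token: str) -> str:
    """Read the `sub` claim from a JWT payload (no signature verification here)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("Not a JWT: expected header.payload.signature")
    payload = parts[1]
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped base64 padding
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return claims["sub"]  # assumption: the user id is carried in `sub`


def list_uploads(processes: Iterable[dict], user_id: str) -> dict:
    """Build the `result` object of the status list response for one user."""
    uploads = [
        {"processId": p["id"], "uploadedDate": p["uploadeddate"], "status": p["status"]}
        for p in processes
        if p["uploadedby"] == user_id
    ]
    return {"userId": user_id, "uploads": uploads}
```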
...
API Specifications
Bulk Content Upload API - POST - /v1/textbook/content/bulk/upload
Request Headers
Content-Type | multipart/form-data |
Authorization | Bearer {{api-key}} |
x-authenticated-user-token | {{keycloak-token}} |
x-channel-id | {{channel-identifier}} |
x-framework-id | {{framework-identifier}} |
x-hashtag-id | {{tenant-id}} |
operation-mode | upload/publish/link |
Request Body
Code Block |
---|
content: [contentUploadFile.csv] |
Response : Success Response - OK (200)
Code Block |
---|
{
"id": "api.textbook.content.bulk.upload",
"ver": "v1",
"ts": "2019-07-26 11:28:42:315+0000",
"params": {
"resmsgid": null,
"msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67",
"err": null,
"status": "success",
"errmsg": null
},
"responseCode": "OK",
"result": {
"processId": "012813442982903808142"
}
} |
Response : Failure Response - BAD REQUEST (400) - Corrupt File
Code Block |
---|
{
"id": "api.textbook.content.bulk.upload",
"ver": "v1",
"ts": "2019-07-26 11:28:42:315+0000",
"params": {
"resmsgid": null,
"msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67",
"err": "CORRUPT_FILE",
"status": "CORRUPT_FILE",
"errmsg": "Bulk content upload failed due to corrupt file"
},
"responseCode": "CLIENT_ERROR",
"result": { }
} |
Response : Failure Response - BAD REQUEST (400) - Invalid File Format (Only CSV files are supported)
Code Block |
---|
{
"id": "api.textbook.content.bulk.upload",
"ver": "v1",
"ts": "2019-07-26 11:28:42:315+0000",
"params": {
"resmsgid": null,
"msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67",
"err": "INVALID_FILE_FORMAT",
"status": "INVALID_FILE_FORMAT",
"errmsg": "Bulk content upload failed due to invalid file format"
},
"responseCode": "CLIENT_ERROR",
"result": { }
} |
Response : Failure Response - BAD REQUEST (400) - Invalid File Template (Columns Missing)
Code Block |
---|
{
"id": "api.textbook.content.bulk.upload",
"ver": "v1",
"ts": "2019-07-26 11:28:42:315+0000",
"params": {
"resmsgid": null,
"msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67",
"err": "INVALID_FILE_TEMPLATE",
"status": "INVALID_FILE_TEMPLATE",
"errmsg": "Bulk content upload failed due to invalid file template"
},
"responseCode": "CLIENT_ERROR",
"result": { }
} |
Response : Failure Response - BAD REQUEST (400) - Too many rows
Code Block |
---|
{
"id": "api.textbook.content.bulk.upload",
"ver": "v1",
"ts": "2019-07-26 11:28:42:315+0000",
"params": {
"resmsgid": null,
"msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67",
"err": "MAX_ROW_COUNT_EXCEEDED",
"status": "MAX_ROW_COUNT_EXCEEDED",
"errmsg": "Max row count allowed is <config>"
},
"responseCode": "CLIENT_ERROR",
"result": { }
} |
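All of the responses above share one envelope, which could be produced by a helper such as the one sketched here. The helper names are hypothetical. The field layout and the timestamp format (milliseconds after a colon, then the UTC offset) are taken from the sample responses. As in the samples, `params.status` is `success` on success and the error code on failure.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def format_ts(now: Optional[datetime] = None) -> str:
    """Format a timestamp like 2019-07-26 11:28:42:315+0000."""
    now = now or datetime.now(timezone.utc)
    millis = f"{now.microsecond // 1000:03d}"
    return now.strftime("%Y-%m-%d %H:%M:%S:") + millis + now.strftime("%z")


def envelope(api_id: str, result: Optional[dict] = None, err: Optional[str] = None,
             errmsg: Optional[str] = None, response_code: str = "OK") -> dict:
    """Build the common response envelope used by the sample responses."""
    return {
        "id": api_id,
        "ver": "v1",
        "ts": format_ts(),
        "params": {
            "resmsgid": None,
            "msgid": str(uuid.uuid4()),
            "err": err,
            "status": "success" if err is None else err,
            "errmsg": errmsg,
        },
        "responseCode": response_code,
        "result": result or {},
    }
```

For example, `envelope("api.textbook.content.bulk.upload", err="CORRUPT_FILE", errmsg="Bulk content upload failed due to corrupt file", response_code="CLIENT_ERROR")` yields the body of the corrupt-file response, apart from the generated `ts` and `msgid`.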
...
Bulk Content Upload Status Check API - GET - /v1/textbook/content/bulk/upload/status/:processId
Request Headers
Accept | application/json |
Authorization | Bearer {{api-key}} |
x-authenticated-user-token | {{keycloak-token}} |
Response : Success Response - OK (200) - In Queue
Code Block |
---|
{
"id": "api.textbook.content.bulk.upload.status",
"ver": "v1",
"ts": "2019-07-26 11:28:42:315+0000",
"params": {
"resmsgid": null,
"msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67",
"err": null,
"status": "success",
"errmsg": null
},
"responseCode": "OK",
"result": {
"processId": "012813442982903808142",
"status": "Queued",
"totalCount": 500
}
} |
Response : Success Response - OK (200) - In Progress
Code Block |
---|
{
"id": "api.textbook.content.bulk.upload.status",
"ver": "v1",
"ts": "2019-07-26 11:28:42:315+0000",
"params": {
"resmsgid": null,
"msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67",
"err": null,
"status": "success",
"errmsg": null
},
"responseCode": "OK",
"result": {
"processId": "012813442982903808142",
"status": "Processing",
"totalCount": 500,
"successCount": 100,
"failedCount": 10,
"startTime": "2019-07-26 11:28:42:315+0000"
}
} |
Response : Success Response - OK (200) - Completed
Code Block |
---|
{
"id": "api.textbook.content.bulk.upload.status",
"ver": "v1",
"ts": "2019-07-26 11:28:42:315+0000",
"params": {
"resmsgid": null,
"msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67",
"err": null,
"status": "success",
"errmsg": null
},
"responseCode": "OK",
"result": {
"processId": "012813442982903808142",
"status": "Completed",
"totalCount": 500,
"successCount": 450,
"failedCount": 50,
"startTime": "2019-07-26 11:28:42:315+0000",
"endTime": "2019-07-26 12:28:42:315+0000",
"report": "signedDownloadUrl"
}
} |
Response : Failure Response - RESOURCE NOT FOUND (404) - ProcessId not found
Code Block |
---|
{
"id": "api.textbook.content.bulk.upload.status",
"ver": "v1",
"ts": "2019-07-26 11:28:42:315+0000",
"params": {
"resmsgid": null,
"msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67",
"err": "PROCESS_NOT_FOUND",
"status": "PROCESS_NOT_FOUND",
"errmsg": "Process Id xxx is not found in the system"
},
"responseCode": "RESOURCE_NOT_FOUND",
"result": { }
} |
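A client polling this API could derive a progress figure from the `result` object of the success responses above. The sketch below is a hypothetical client-side helper. It relies only on the fields shown in the samples and treats `successCount` and `failedCount` as zero when they are absent, as in the queued response.

```python
def progress(result: dict) -> float:
    """Fraction of rows that have reached a final state (success or failed)."""
    total = result.get("totalCount", 0)
    done = result.get("successCount", 0) + result.get("failedCount", 0)
    return 0.0 if total == 0 else done / total


def is_finished(result: dict) -> bool:
    """True once the process has reached the Completed status."""
    return result.get("status", "").lower() == "completed"
```

With the in-progress sample above (500 rows, 100 succeeded, 10 failed), `progress` gives 110 / 500. Polling can stop once `is_finished` returns true, at which point the `report` URL is available.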
...
Bulk Content Upload Status List API - GET - /v1/textbook/content/bulk/upload/status/list
Request Headers
Accept | application/json |
Authorization | Bearer {{api-key}} |
x-authenticated-user-token | {{keycloak-token}} |
Response : Success Response - OK (200)
Code Block |
---|
{
"id": "api.textbook.content.bulk.upload.status.list",
"ver": "v1",
"ts": "2019-07-26 11:28:42:315+0000",
"params": {
"resmsgid": null,
"msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67",
"err": null,
"status": "success",
"errmsg": null
},
"responseCode": "OK",
"result": {
"userId": "f61826aa-8e5d-4356-b8ad-edda92460750",
"uploads": [
{
"processId": "012813442982903808142",
"uploadedDate": "2018-12-12 14:25:27:466+0530",
"status": "Completed"
},
{
"processId": "012813442982903808143",
"uploadedDate": "2018-12-14 12:01:36:807+0530",
"status": "Processing"
}
]
}
} |