Bulk Upload - Content
Problem Statement
Related Jira Task - SB-12164
- Bulk Content Upload is to be supported in Sunbird, with 3 operation modes
- API to be made available to check the real-time status of the bulk content upload process
- API to be made available to list the statuses of processes initiated by a user
operation-mode | workflow |
---|---|
upload | create-upload content |
publish | create-upload-publish content |
link | create-upload-publish content and link it to the textbook |
Design
1. Validations
File-related validations:
- Validate the format of the file
- Validate that the file is readable
- Validate that the file has data
Data-related validations:
- Check that the file conforms to the bulk content upload template (the template should be configurable)
- The number of rows in the file must not exceed the configured maximum number of rows allowed
- Check for duplicate rows within the file. The key is Taxonomy (BGMS: Board, Grade, Medium, Subject) + ContentName
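The validation rules above could be applied in a single pass over the uploaded file. The following is a minimal Python sketch, assuming a UTF-8 CSV, hypothetical column names for the BGMS taxonomy and the content name, and a hypothetical limit of 300 rows. The error codes CORRUPT_FILE, INVALID_FILE_FORMAT, INVALID_FILE_TEMPLATE and MAX_ROW_COUNT_EXCEEDED come from the API specification; EMPTY_FILE and DUPLICATE_ROW are hypothetical, because the specification defines no code for those cases. SHA-256 is an arbitrary choice for the duplicate-check hash.

```python
import csv
import hashlib
import io

# Hypothetical template and limit; both are configurable in the design.
TAXONOMY_COLUMNS = ["Board", "Grade", "Medium", "Subject"]
REQUIRED_COLUMNS = TAXONOMY_COLUMNS + ["Content Name"]
MAX_ROWS = 300


class BulkUploadError(Exception):
    """Carries an error code such as INVALID_FILE_TEMPLATE."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def dedup_hash(row):
    """Hash of Taxonomy (BGMS) + ContentName, used for the duplicate check."""
    raw = "|".join((row.get(c) or "").strip() for c in REQUIRED_COLUMNS)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def validate_csv(filename, payload):
    """Apply the file and data validations; return (hash, row) pairs."""
    if not filename.lower().endswith(".csv"):
        raise BulkUploadError("INVALID_FILE_FORMAT", "Only CSV files are supported")
    try:
        reader = csv.DictReader(io.StringIO(payload.decode("utf-8")))
        header = reader.fieldnames or []
        rows = list(reader)
    except (UnicodeDecodeError, csv.Error) as exc:
        raise BulkUploadError("CORRUPT_FILE", "File is not readable") from exc
    if not header or not rows:
        raise BulkUploadError("EMPTY_FILE", "File has no data")  # hypothetical code
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise BulkUploadError("INVALID_FILE_TEMPLATE", f"Missing columns: {missing}")
    if len(rows) > MAX_ROWS:
        raise BulkUploadError("MAX_ROW_COUNT_EXCEEDED", f"Max row count allowed is {MAX_ROWS}")
    seen, result = {}, []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        key = dedup_hash(row)
        if key in seen:
            # hypothetical code: the API specification has none for duplicates
            raise BulkUploadError("DUPLICATE_ROW", f"Line {line_no} duplicates line {seen[key]}")
        seen[key] = line_no
        result.append((key, row))
    return result
```

Returning the hash together with each row lets the synchronous step reuse it as the Kafka partition key for the non-LINK operation modes.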
2. Synchronous Processing
1. Upload the CSV file to blob storage
2. Make an entry in the bulk_upload_process table
column | data to insert | remarks |
---|---|---|
id | auto-generated unique id | processId |
createdby | uploader id | |
createdon | current timestamp | |
data | blobstore URL of the CSV file | |
failureresult | failed count, updated later | failedCount |
lastupdatedon | current timestamp, refreshed on every update | |
objecttype | content | |
organisationid | tenant id | |
processendtime | current timestamp, set when the process moves to the completed state | endTime |
processstarttime | current timestamp, set when the process moves to the processing state | startTime |
retrycount | 0 | not used |
status | queued | possible values: queued, processing, completed |
storagedetails | blobstore URL of the result file, set on completion | report |
successresult | success count, updated later | successCount |
taskcount | number of records in the file | totalCount |
uploadedby | uploader id | |
uploadeddate | current timestamp | |
3. Make entries in the bulk_upload_process_task table (one record per content)
column | data to insert | remarks |
---|---|---|
processid | processid | id from master table |
sequenceid | auto-generated sequence id | |
createdon | current timestamp | |
data | row data in JSON format | |
failureresult | JSON data + failure message | |
iterationid | 0 | not used |
lastupdatedon | current timestamp, refreshed on every update | |
status | queued | possible values: queued, success, failed |
successresult | JSON data + success message | |
4. For the LINK operation mode, fetch the draft hierarchy of each textbook referenced in the CSV and cache the dialCode-to-TextBookUnitDoId mapping in Redis.
5. Push events to Kafka. For the LINK operation mode, use the textbook id as the partition key; for the other operation modes, use the hash value generated during the duplicate check.
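A minimal sketch of the data prepared in steps 2 to 5, assuming the rows have already passed validation. The `Textbook Id` column name, the uuid-based process id and the hierarchy node fields (`identifier`, `contentType`, `dialcodes`, `children`) are assumptions for illustration; the Cassandra writes, the Redis cache and the Kafka producer are left out. With redis-py, the DIAL code map could be cached with a single `hset(name, mapping=...)` call under a key of your choosing.

```python
import json
import uuid
from datetime import datetime, timezone

TEXTBOOK_ID_COLUMN = "Textbook Id"  # hypothetical CSV column name


def new_process_row(uploader_id, tenant_id, csv_url, row_count):
    """Initial bulk_upload_process row (step 2)."""
    now = datetime.now(timezone.utc)
    return {
        "id": uuid.uuid4().hex,  # illustrative only; the sample processIds are numeric
        "createdby": uploader_id, "createdon": now, "data": csv_url,
        "failureresult": None, "successresult": None,  # counts, filled in by the scheduler
        "lastupdatedon": now, "objecttype": "content", "organisationid": tenant_id,
        "processstarttime": None, "processendtime": None,
        "retrycount": 0, "status": "queued", "storagedetails": None,
        "taskcount": row_count, "uploadedby": uploader_id, "uploadeddate": now,
    }


def new_task_rows(process_id, rows):
    """One bulk_upload_process_task row per content (step 3)."""
    now = datetime.now(timezone.utc)
    return [
        {"processid": process_id, "sequenceid": seq, "createdon": now,
         "data": json.dumps(row), "iterationid": 0, "lastupdatedon": now,
         "status": "queued", "successresult": None, "failureresult": None}
        for seq, row in enumerate(rows, start=1)
    ]


def dialcode_unit_map(node, mapping=None):
    """Map each DIAL code to its TextBookUnit id by walking a draft hierarchy (step 4)."""
    if mapping is None:
        mapping = {}
    if node.get("contentType") == "TextBookUnit":
        for code in node.get("dialcodes") or []:
            mapping[code] = node["identifier"]
    for child in node.get("children") or []:
        dialcode_unit_map(child, mapping)
    return mapping


def partition_key(mode, row, dedup_hash):
    """Kafka partition key (step 5): textbook id for LINK, duplicate-check hash otherwise."""
    if mode == "link":
        return row[TEXTBOOK_ID_COLUMN]
    return dedup_hash
```

Keying LINK events by textbook id sends every row for the same textbook to the same partition, so those rows are consumed in order. Presumably this avoids concurrent updates to one draft hierarchy during asynchronous processing, although the design does not state the reason.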
3. Asynchronous Processing - Samza
- Validate the mandatory fields
- Validate the DIAL code: check the Redis cache first; if it is not present, fetch the draft hierarchy from Cassandra and validate against it
- Validate the size and format of the file in Google Drive
- Validate the taxonomy by creating the content (REST API call)
- Download the AppIcon from Google Drive
- Create an asset with the downloaded image
- Update the content with the AppIcon image URL
- Download the content file from Google Drive
- Upload the content (REST API call)
- Publish the content (Java API call, publish and link modes only)
- Get the draft hierarchy of the textbook from Cassandra (link mode only)
- Get the metadata of the published content (link mode only)
- Update the draft hierarchy of the textbook in Cassandra (link mode only)
- Write the status back to the bulk_upload_process_task table in LMS Cassandra
- Retire the content if any step in the flow throws an exception
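The per-event flow of the Samza job, including the retire-on-failure rule, can be sketched as follows. This is Python rather than Samza code, and `svc` is a hypothetical facade: every method name on it, the event fields `operationMode`, `row` and `textbookId`, and the row keys `appIcon` and `fileUrl` are assumptions. Restricting DIAL code validation to LINK mode is also an assumption.

```python
def process_event(event, svc):
    """Process one content row; retire the content if any later step fails.

    `svc` is a hypothetical facade over Redis, Cassandra, Google Drive and the
    content REST/Java APIs. Every method name on it is illustrative only.
    """
    mode, row = event["operationMode"], event["row"]
    content_id = None
    try:
        svc.validate_mandatory_fields(row)
        if mode == "link":  # assumption: DIAL codes only matter when linking
            svc.validate_dialcode(row)  # Redis first, then draft hierarchy from Cassandra
        svc.validate_drive_file(row)  # size and format of the file in Google Drive
        content_id = svc.create_content(row)  # the taxonomy is validated here
        icon = svc.download_from_drive(row["appIcon"])
        icon_url = svc.create_asset(icon)
        svc.update_content(content_id, {"appIcon": icon_url})
        svc.upload_content(content_id, svc.download_from_drive(row["fileUrl"]))
        if mode in ("publish", "link"):
            svc.publish_content(content_id)
        if mode == "link":
            textbook_id = event["textbookId"]
            hierarchy = svc.get_draft_hierarchy(textbook_id)
            metadata = svc.get_content_metadata(content_id)
            svc.update_draft_hierarchy(textbook_id, hierarchy, row, metadata)
        svc.update_task_status(event, "success", content_id)
    except Exception as exc:
        if content_id is not None:  # only created content needs retiring
            svc.retire_content(content_id)
        svc.update_task_status(event, "failed", str(exc))
```

Content is retired only after it has been created. A failure in the earlier validations simply marks the task as failed. If `retire_content` itself raises, this sketch skips the status update, and a real job would need to handle that case.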
4. Scheduler
- A scheduler runs at periodic intervals to consolidate the results from the bulk_upload_process_task table and update the master table (bulk_upload_process) with the success count (successresult), failed count (failureresult), process end time (processendtime), result file URL (storagedetails) and status
- When a process is marked as completed, the result file must be generated and uploaded to the blobstore, and its URL written back to the bulk_upload_process table
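A minimal sketch of one scheduler pass over a single process. It assumes the task rows have already been read and deserialized into dicts with `data`, `status` and an optional `message` key. `upload_report` is a hypothetical callable that stores the CSV in the blobstore and returns its URL. The report layout (original columns plus `Status` and `Message`) is also an assumption.

```python
import csv
import io
from datetime import datetime, timezone


def build_report(tasks):
    """Result CSV: original row data plus Status and Message columns."""
    rows = [dict(t["data"], Status=t["status"], Message=t.get("message", "")) for t in tasks]
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))  # ordered union
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def consolidate(process, tasks, upload_report):
    """Update counts; when every task is done, complete the process and return the report."""
    if process["status"] != "processing":
        return None
    now = datetime.now(timezone.utc)
    process["successresult"] = sum(t["status"] == "success" for t in tasks)
    process["failureresult"] = sum(t["status"] == "failed" for t in tasks)
    process["lastupdatedon"] = now
    if process["successresult"] + process["failureresult"] < process["taskcount"]:
        return None  # some tasks are still queued
    report = build_report(tasks)
    process["storagedetails"] = upload_report(report)  # result file URL
    process["processendtime"] = now
    process["status"] = "completed"
    return report
```

Completion is detected by comparing the finished task count with taskcount, so a process stays in the processing state until every row has either succeeded or failed.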
5. Status Check API
- Serve the data from the bulk_upload_process table for the given processId
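A sketch of how a bulk_upload_process row could be mapped to the `result` object shown in the status check samples. The field mapping follows the table and the sample responses. `sign_url` is a hypothetical callable that turns the stored blob URL into a signed download URL, `records` stands in for the table, and the response envelope is trimmed (no `id`, `ver`, `ts` or `msgid`).

```python
def status_result(record, sign_url):
    """Build the `result` object of the status check API from a process row."""
    status = record["status"]
    result = {
        "processId": record["id"],
        "status": status.capitalize(),  # queued -> Queued, as in the samples
        "totalCount": record["taskcount"],
    }
    if status in ("processing", "completed"):
        result["successCount"] = record.get("successresult") or 0
        result["failedCount"] = record.get("failureresult") or 0
        result["startTime"] = record.get("processstarttime")
    if status == "completed":
        result["endTime"] = record.get("processendtime")
        result["report"] = sign_url(record["storagedetails"])
    return result


def status_response(process_id, records, sign_url):
    """Return (HTTP status, simplified body) for one processId lookup."""
    record = records.get(process_id)
    if record is None:
        return 404, {
            "responseCode": "RESOURCE_NOT_FOUND",
            "params": {"err": "PROCESS_NOT_FOUND", "status": "PROCESS_NOT_FOUND",
                       "errmsg": f"Process Id {process_id} is not found in the system"},
            "result": {},
        }
    return 200, {"responseCode": "OK", "result": status_result(record, sign_url)}
```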
6. Status List API
- Deduce the userId from the Keycloak access token passed in the x-authenticated-user-token header.
- Serve the statuses of all uploads made by the user from the bulk_upload_process table.
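A minimal sketch of the status list logic. It assumes that the user id is the `sub` claim of the Keycloak access token, although the actual claim and its format in a given deployment may differ. The sketch only decodes the token payload with the standard library. A real service must verify the token signature before trusting any claim.

```python
import base64
import json


def user_id_from_token(token):
    """Return the `sub` claim of a JWT.

    WARNING: this only decodes the payload. A real service must first verify
    the signature against the Keycloak realm's public key.
    """
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # JWTs strip base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return claims["sub"]


def status_list(token, records):
    """Build the status list `result` for the caller's own uploads."""
    user_id = user_id_from_token(token)
    uploads = [
        {"processId": r["id"], "uploadedDate": r["uploadeddate"], "status": r["status"].capitalize()}
        for r in records
        if r["uploadedby"] == user_id
    ]
    return {"userId": user_id, "uploads": uploads}
```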
API Specifications
Bulk Content Upload API - POST - /v1/textbook/content/bulk/upload
Request Headers
Content-Type | multipart/form-data |
Authorization | Bearer {{api-key}} |
x-authenticated-user-token | {{keycloak-token}} |
x-channel-id | {{channel-identifier}} |
x-framework-id | {{framework-identifier}} |
x-hashtag-id | {{tenant-id}} |
operation-mode | upload/publish/link |
Request Body
content: [contentUploadFile.csv]
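For illustration, a client could build this multipart request with only the Python standard library. The base URL and the function's parameter names are hypothetical. The path, the header names and the `content` form field follow the specification above. Sending the request would be a call to `urllib.request.urlopen(req)`.

```python
import urllib.request
import uuid

OPERATION_MODES = ("upload", "publish", "link")


def build_upload_request(base_url, csv_name, csv_bytes, mode, *, api_key,
                         user_token, channel_id, framework_id, tenant_id):
    """Build (but do not send) a POST to the bulk content upload API."""
    if mode not in OPERATION_MODES:
        raise ValueError(f"operation-mode must be one of {OPERATION_MODES}")
    boundary = uuid.uuid4().hex
    # multipart/form-data body with a single file part named "content"
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="content"; filename="{csv_name}"\r\n'
        "Content-Type: text/csv\r\n\r\n"
    ).encode("utf-8") + csv_bytes + f"\r\n--{boundary}--\r\n".encode("utf-8")
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Authorization": f"Bearer {api_key}",
        "x-authenticated-user-token": user_token,
        "x-channel-id": channel_id,
        "x-framework-id": framework_id,
        "x-hashtag-id": tenant_id,
        "operation-mode": mode,
    }
    url = f"{base_url}/v1/textbook/content/bulk/upload"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

`urllib.request.Request` re-capitalizes header names (for example `Operation-mode`). This is harmless because HTTP header names are case-insensitive.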
Response : Success Response - OK (200)
{ "id": "api.textbook.content.bulk.upload", "ver": "v1", "ts": "2019-07-26 11:28:42:315+0000", "params": { "resmsgid": null, "msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67", "err": null, "status": "success", "errmsg": null }, "responseCode": "OK", "result": { "processId": "012813442982903808142" } }
Response : Failure Response - BAD REQUEST (400) - Corrupt File
{ "id": "api.textbook.content.bulk.upload", "ver": "v1", "ts": "2019-07-26 11:28:42:315+0000", "params": { "resmsgid": null, "msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67", "err": "CORRUPT_FILE", "status": "CORRUPT_FILE", "errmsg": "Bulk content upload failed due to corrupt file" }, "responseCode": "CLIENT_ERROR", "result": { } }
Response : Failure Response - BAD REQUEST (400) - Invalid File Format (only CSV files are supported)
{ "id": "api.textbook.content.bulk.upload", "ver": "v1", "ts": "2019-07-26 11:28:42:315+0000", "params": { "resmsgid": null, "msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67", "err": "INVALID_FILE_FORMAT", "status": "INVALID_FILE_FORMAT", "errmsg": "Bulk content upload failed due to invalid file format" }, "responseCode": "CLIENT_ERROR", "result": { } }
Response : Failure Response - BAD REQUEST (400) - Invalid File Template (Columns Missing)
{ "id": "api.textbook.content.bulk.upload", "ver": "v1", "ts": "2019-07-26 11:28:42:315+0000", "params": { "resmsgid": null, "msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67", "err": "INVALID_FILE_TEMPLATE", "status": "INVALID_FILE_TEMPLATE", "errmsg": "Bulk content upload failed due to invalid file template" }, "responseCode": "CLIENT_ERROR", "result": { } }
Response : Failure Response - BAD REQUEST (400) - Too many rows
{ "id": "api.textbook.content.bulk.upload", "ver": "v1", "ts": "2019-07-26 11:28:42:315+0000", "params": { "resmsgid": null, "msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67", "err": "MAX_ROW_COUNT_EXCEEDED", "status": "MAX_ROW_COUNT_EXCEEDED", "errmsg": "Max row count allowed is <config>" }, "responseCode": "CLIENT_ERROR", "result": { } }
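All of the failure responses share one envelope, so a server could build them from a single table of codes. This is a sketch. The codes and messages are taken from the samples, while generating `msgid` locally (instead of echoing the request's message id) is an assumption. `api_timestamp` reproduces the sample `ts` format, which places a colon before the milliseconds.

```python
import uuid
from datetime import datetime, timezone

# Error codes and messages as they appear in the sample responses.
UPLOAD_ERRORS = {
    "CORRUPT_FILE": "Bulk content upload failed due to corrupt file",
    "INVALID_FILE_FORMAT": "Bulk content upload failed due to invalid file format",
    "INVALID_FILE_TEMPLATE": "Bulk content upload failed due to invalid file template",
    "MAX_ROW_COUNT_EXCEEDED": "Max row count allowed is {max_rows}",
}


def api_timestamp(now=None):
    """Format like the samples, e.g. 2019-07-26 11:28:42:315+0000."""
    now = now or datetime.now(timezone.utc)
    millis = now.microsecond // 1000
    return now.strftime("%Y-%m-%d %H:%M:%S:") + f"{millis:03d}" + now.strftime("%z")


def error_response(code, max_rows=None):
    """Return (HTTP status, body) for a rejected bulk upload request."""
    return 400, {
        "id": "api.textbook.content.bulk.upload",
        "ver": "v1",
        "ts": api_timestamp(),
        "params": {
            "resmsgid": None,
            "msgid": str(uuid.uuid4()),  # assumption: generated here, not echoed
            "err": code,
            "status": code,
            "errmsg": UPLOAD_ERRORS[code].format(max_rows=max_rows),
        },
        "responseCode": "CLIENT_ERROR",
        "result": {},
    }
```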
Bulk Content Upload Status Check API - GET - /v1/textbook/content/bulk/upload/status/:processId
Request Headers
Accept | application/json |
Authorization | Bearer {{api-key}} |
x-authenticated-user-token | {{keycloak-token}} |
Response : Success Response - OK (200) - In Queue
{ "id": "api.textbook.content.bulk.upload.status", "ver": "v1", "ts": "2019-07-26 11:28:42:315+0000", "params": { "resmsgid": null, "msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67", "err": null, "status": "success", "errmsg": null }, "responseCode": "OK", "result": { "processId": "012813442982903808142", "status": "Queued", "totalCount": 500 } }
Response : Success Response - OK (200) - In Progress
{ "id": "api.textbook.content.bulk.upload.status", "ver": "v1", "ts": "2019-07-26 11:28:42:315+0000", "params": { "resmsgid": null, "msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67", "err": null, "status": "success", "errmsg": null }, "responseCode": "OK", "result": { "processId": "012813442982903808142", "status": "Processing", "totalCount": 500, "successCount": 100, "failedCount": 10, "startTime": "2019-07-26 11:28:42:315+0000" } }
Response : Success Response - OK (200) - Completed
{ "id": "api.textbook.content.bulk.upload.status", "ver": "v1", "ts": "2019-07-26 11:28:42:315+0000", "params": { "resmsgid": null, "msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67", "err": null, "status": "success", "errmsg": null }, "responseCode": "OK", "result": { "processId": "012813442982903808142", "status": "Completed", "totalCount": 500, "successCount": 450, "failedCount": 50, "startTime": "2019-07-26 11:28:42:315+0000", "endTime": "2019-07-26 12:28:42:315+0000", "report": "signedDownloadUrl" } }
Response : Failure Response - RESOURCE NOT FOUND (404) - ProcessId not found
{ "id": "api.textbook.content.bulk.upload.status", "ver": "v1", "ts": "2019-07-26 11:28:42:315+0000", "params": { "resmsgid": null, "msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67", "err": "PROCESS_NOT_FOUND", "status": "PROCESS_NOT_FOUND", "errmsg": "Process Id xxx is not found in the system" }, "responseCode": "RESOURCE_NOT_FOUND", "result": { } }
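Because processing is asynchronous, a client would typically poll this endpoint. A minimal sketch, assuming a hypothetical `fetch_status` callable that performs the GET and returns the HTTP status and the parsed body. The 30-second interval and one-hour timeout are arbitrary defaults. `sleep` and `clock` are injectable so the loop can be tested without waiting.

```python
import time


def wait_for_completion(fetch_status, process_id, interval_s=30.0, timeout_s=3600.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll the status check API until the process reports Completed."""
    deadline = clock() + timeout_s
    while True:
        code, body = fetch_status(process_id)
        if code == 404:
            raise LookupError(body["params"]["errmsg"])  # PROCESS_NOT_FOUND
        if code != 200:
            raise RuntimeError(f"unexpected HTTP status {code}")
        result = body["result"]
        if result["status"] == "Completed":
            return result  # includes the signed report URL
        if clock() >= deadline:
            raise TimeoutError(f"process {process_id} still {result['status']}")
        sleep(interval_s)
```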
Bulk Content Upload Status List API - GET - /v1/textbook/content/bulk/upload/status/list
Request Headers
Accept | application/json |
Authorization | Bearer {{api-key}} |
x-authenticated-user-token | {{keycloak-token}} |
Response : Success Response - OK (200)
{ "id": "api.textbook.content.bulk.upload.status.list", "ver": "v1", "ts": "2019-07-26 11:28:42:315+0000", "params": { "resmsgid": null, "msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67", "err": null, "status": "success", "errmsg": null }, "responseCode": "OK", "result": { "userId": "f61826aa-8e5d-4356-b8ad-edda92460750", "uploads": [ { "processId": "012813442982903808142", "uploadedDate": "2018-12-12 14:25:27:466+0530", "status": "Completed" }, { "processId": "012813442982903808143", "uploadedDate": "2018-12-14 12:01:36:807+0530", "status": "Processing" } ] } }
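The timestamps in these responses (for example `2018-12-12 14:25:27:466+0530`) are not ISO 8601, because the milliseconds follow a colon. A client that wants to sort uploads could parse them with `datetime.strptime`, as in this sketch. Newest-first ordering is a client-side choice and not part of the API.

```python
from datetime import datetime

# %f accepts the 3-digit millisecond field; %z accepts offsets such as +0530.
API_TS_FORMAT = "%Y-%m-%d %H:%M:%S:%f%z"


def parse_api_ts(value):
    """Parse a timestamp in the API's 'YYYY-MM-DD HH:MM:SS:mmm+hhmm' format."""
    return datetime.strptime(value, API_TS_FORMAT)


def newest_first(uploads):
    """Sort status list entries by uploadedDate, most recent first."""
    return sorted(uploads, key=lambda u: parse_api_ts(u["uploadedDate"]), reverse=True)
```

The parsed values are timezone-aware, so entries with different offsets (`+0530` here, `+0000` in `ts`) compare correctly.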