Bulk Upload - Content
Problem Statement
Related Jira Task - SB-12164: Bulk Upload of Content through CSVIn Progress
Bulk Content Upload is to be supported in Sunbird, with 3 operation modes
API to be made available to check the real-time status of the bulk content upload process
API to be made available to list the statuses of processes initiated by an user
operation-mode | workflow |
|---|---|
upload | create-upload content |
publish | create-upload-publish content |
link | create-upload-publish content and link it to textbook |
Design
1. Validations
File related validations to be done are,
Validate the format of the file
Validate whether the file is readable
Validate whether the file has data
Data related validations to be done are,
Check whether the file is conforming to the bulk content upload template(The template should be configurable)
Number of rows in file should be less than Max rows allowed(configuration)
Duplicity check within the file. Key is Taxonomy(BGMS)+ContentName
2. Synchronous Processing
Upload the CSV file to blob storage
2. Make an entry in bulk_upload_process table
column | data to insert | remarks |
|---|---|---|
id | auto-generated unique id | processId |
createdby | uploader id | |
createdon | current timestamp | |
data | blobstore url of CSV file | |
failureresult | failedCount to be updated here | |
lastupdatedon | last updated timestamp to be updated here on each update | |
objecttype | content | |
organisationid | tenant id | |
processendtime | endTime - current timestamp to be inserted here while moving this process to completed state | |
processstarttime | startTime - current timestamp to be inserted here while moving this process to processing state | |
retrycount | 0 | Not used |
status | queued | status - possible values - queued, processing, completed |
storagedetails | report - blobstore url of result file | |
successresult | successCount to be updated here | |
taskcount | number of records in file | totalCount |
uploadedby | uploader id | |
uploadeddate | current timestamp |
3. Make entries into bulk_upload_process_task table (One record per content)
column | data to insert | remarks |
|---|---|---|
processid | processid | id from master table |
sequenceid | auto-generated sequence id | |
createdon | current timestamp | |
data | data in JSON format | |
failureresult | JSON data + failed message | |
iterationid | 0 | Not used |
lastupdatedon | last updated timestamp to be updated here on each update | |
status | possible values - queued, success, failed | |
successresult | JSON data + success message |
4. For LINK operation-mode, get draft hierarchy of Textbooks mentioned in CSV and cache the dialCode-TextBookUnitDoId mapping in Redis
5. Push events to Kafka with Textbook Id as partition key for LINK operation-mode. Use hashed-value generated during duplicity check as partition key for other operation modes.
3. Asynchronous Processing - Samza
Validation of mandatory fields
Validate DIAL code (first against redis-cache, if not present, get draft hierarchy from cassandra and validate)
Validate the file size and file format in Google Drive
Validate the Taxonomy by creating the content - Hit the REST API
Download the AppIcon from Google Drive
Create an asset with downloaded image
Update content with AppIcon image URL
Download content file from Google Drive
Upload content - Hit the REST API
Publish the content - Hit the Java API
Get draft hierarchy of the TextBook from Cassandra
Get the metadata of the published content
Update the draft hierarchy of the TextBook in Cassandra
Update the status back to LMS Cassandra- bulk_upload_process_task table
Retire the Content in case of any exception in the flow
4. Scheduler
Scheduler to run in periodic intervals to consolidate the result from bulk_upload_process_task table and update the master table(bulk_upload_process) with success_count, failed_count, process_end_time, result_file_url and status
While a process is being marked as completed, the result file has to be generated, uploaded to blobstore and URL updated back to bulk_upload_process table
5. Status Check API
Data from bulk_upload_process table to be served based on the processId
6. Status List API
The userId of the user should be deduced from keycloak access token passed in the header.
Statuses of all uploads done by the user has to be served from the bulk_upload_process tables
API Specifications
Bulk Content Upload API - POST - /v1/textbook/content/bulk/upload
Request Headers
Content-Type | multipart/form-data |
Authorization | Bearer {{api-key}} |
x-authenticated-user-token | {{keycloak-token}} |
x-channel-id | {{channel-identifier}} |
x-framework-id | {{framework-identifier}} |
x-hashtag-id | {{tenant-id}} |
operation-mode | upload/publish/link |
Request Body
content: [contentUploadFile.csv]Response : Success Response - OK (200)
{
"id": "api.textbook.content.bulk.upload",
"ver": "v1",
"ts": "2019-07-26 11:28:42:315+0000",
"params": {
"resmsgid": null,
"msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67",
"err": null,
"status": "success",
"errmsg": null
},
"responseCode": "OK",
"result": {
"processId": "012813442982903808142"
}
}Response : Failure Response - BAD REQUEST (400) - Corrupt File
{
"id": "api.textbook.content.bulk.upload",
"ver": "v1",
"ts": "2019-07-26 11:28:42:315+0000",
"params": {
"resmsgid": null,
"msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67",
"err": "CORRUPT_FILE",
"status": "CORRUPT_FILE",
"errmsg": "Bulk content upload failed due to corrupt file"
},
"responseCode": "CLIENT_ERROR",
"result": { }
}Response : Failure Response - BAD REQUEST (400) - Invalid File Format(Only CSV files are supported)
{
"id": "api.textbook.content.bulk.upload",
"ver": "v1",
"ts": "2019-07-26 11:28:42:315+0000",
"params": {
"resmsgid": null,
"msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67",
"err": "INVALID_FILE_FORMAT",
"status": "INVALID_FILE_FORMAT",
"errmsg": "Bulk content upload failed due to invalid file format"
},
"responseCode": "CLIENT_ERROR",
"result": { }
}Response : Failure Response - BAD REQUEST (400) - Invalid File Template (Columns Missing)
{
"id": "api.textbook.content.bulk.upload",
"ver": "v1",
"ts": "2019-07-26 11:28:42:315+0000",
"params": {
"resmsgid": null,
"msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67",
"err": "INVALID_FILE_TEMPLATE",
"status": "INVALID_FILE_TEMPLATE",
"errmsg": "Bulk content upload failed due to invalid file template"
},
"responseCode": "CLIENT_ERROR",
"result": { }
}Response : Failure Response - BAD REQUEST (400) - Too many rows
{
"id": "api.textbook.content.bulk.upload",
"ver": "v1",
"ts": "2019-07-26 11:28:42:315+0000",
"params": {
"resmsgid": null,
"msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67",
"err": "MAX_ROW_COUNT_EXCEEDED",
"status": "MAX_ROW_COUNT_EXCEEDED",
"errmsg": "Max row count allowed is <config>"
},
"responseCode": "CLIENT_ERROR",
"result": { }
}Bulk Content Upload Status Check API - GET - /v1/textbook/content/bulk/upload/status/:processId
Request Headers
Accept | application/json |
Authorization | Bearer {{api-key}} |
x-authenticated-user-token | {{keycloak-token}} |
Response : Success Response - OK (200) - In Queue
{
"id": "api.textbook.content.bulk.upload.status",
"ver": "v1",
"ts": "2019-07-26 11:28:42:315+0000",
"params": {
"resmsgid": null,
"msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67",
"err": null,
"status": "success",
"errmsg": null
},
"responseCode": "OK",
"result": {
"processId": "012813442982903808142",
"status": "Queued",
"totalCount": 500
}
}Response : Success Response - OK (200) - In Progress
{
"id": "api.textbook.content.bulk.upload.status",
"ver": "v1",
"ts": "2019-07-26 11:28:42:315+0000",
"params": {
"resmsgid": null,
"msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67",
"err": null,
"status": "success",
"errmsg": null
},
"responseCode": "OK",
"result": {
"processId": "012813442982903808142",
"status": "Processing",
"totalCount": 500,
"successCount": 100,
"failedCount": 10,
"startTime": "2019-07-26 11:28:42:315+0000"
}
}Response : Success Response - OK (200) - Completed
{
"id": "api.textbook.content.bulk.upload.status",
"ver": "v1",
"ts": "2019-07-26 11:28:42:315+0000",
"params": {
"resmsgid": null,
"msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67",
"err": null,
"status": "success",
"errmsg": null
},
"responseCode": "OK",
"result": {
"processId": "012813442982903808142",
"status": "Completed",
"totalCount": 500,
"successCount": 450,
"failedCount": 50,
"startTime": "2019-07-26 11:28:42:315+0000"
"endTime": "2019-07-26 12:28:42:315+0000",
"report": "signedDownloadUrl"
}
}Response : Failure Response - RESOURSE NOT FOUND (404) - ProcessId not found
{
"id": "api.textbook.content.bulk.upload.status",
"ver": "v1",
"ts": "2019-07-26 11:28:42:315+0000",
"params": {
"resmsgid": null,
"msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67",
"err": "PROCESS_NOT_FOUND",
"status": "PROCESS_NOT_FOUND",
"errmsg": "Process Id xxx is not found in the system"
},
"responseCode": "RESOURCE_NOT_FOUND",
"result": { }
}Bulk Content Upload Status List API - GET - /v1/textbook/content/bulk/upload/status/list
Request Headers
Accept | application/json |
Authorization | Bearer {{api-key}} |
x-authenticated-user-token | {{keycloak-token}} |
Response : Success Response - OK (200)
{
"id": "api.textbook.content.bulk.upload.status.list",
"ver": "v1",
"ts": "2019-07-26 11:28:42:315+0000",
"params": {
"resmsgid": null,
"msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67",
"err": null,
"status": "success",
"errmsg": null
},
"responseCode": "OK",
"result": {
"userId": "f61826aa-8e5d-4356-b8ad-edda92460750",
"uploads": [
{
"processId": "012813442982903808142",
"uploadedDate": "2018-12-12 14:25:27:466+0530",
"status": "Completed"
},
{
"processId": "012813442982903808143",
"uploadedDate": "2018-12-14 12:01:36:807+0530",
"status": "Processing"
}
]
}
}