Bulk Upload - Content

Problem Statement

Related Jira task: SB-12164

  1. Bulk content upload is to be supported in Sunbird, with 3 operation modes
  2. An API is to be made available to check the real-time status of a bulk content upload process
  3. An API is to be made available to list the statuses of processes initiated by a user
Operation mode | Workflow
upload         | create and upload content
publish        | create, upload and publish content
link           | create, upload and publish content, then link it to the textbook


Design

1. Validations

The following file-related validations are performed:

  1. Validate the format of the file
  2. Validate whether the file is readable
  3. Validate whether the file has data

The following data-related validations are performed:

  1. Check that the file conforms to the bulk content upload template (the template should be configurable)
  2. The number of rows in the file must not exceed the configured maximum row count
  3. Duplicate check within the file. The key is Taxonomy (BGMS: Board, Grade, Medium, Subject) + ContentName
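The checks above can be sketched as a single Python function. This is an illustrative sketch, not the Sunbird implementation: the template columns (`board`, `gradeLevel`, `medium`, `subject`, `name`, `fileUrl`), the default row limit of 300 and the `EMPTY_FILE` and `DUPLICATE_ROWS` codes are hypothetical, while the other error codes are the ones returned by the upload API. The SHA-256 digest of the normalised BGMS + name key is one way to produce the hashed value that is later used as the Kafka partition key.

```python
import csv
import hashlib
import io

# Hypothetical template: column names are placeholders for the configurable template.
REQUIRED_COLUMNS = ["board", "gradeLevel", "medium", "subject", "name", "fileUrl"]
# Duplicate key = taxonomy (BGMS) + content name.
DUPLICATE_KEY = ["board", "gradeLevel", "medium", "subject", "name"]
MAX_ROWS = 300  # hypothetical default for the configurable row limit


class ValidationError(Exception):
    """Carries an API error code such as INVALID_FILE_TEMPLATE."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def validate_csv(filename, raw_bytes, max_rows=MAX_ROWS):
    """Run the file and data checks; return (rows, row_hashes) if all pass."""
    # File check 1: format. Only .csv uploads are accepted.
    if not filename.lower().endswith(".csv"):
        raise ValidationError("INVALID_FILE_FORMAT", "Only CSV files are supported")
    # File check 2: readability. Undecodable bytes are treated as a corrupt file.
    try:
        text = raw_bytes.decode("utf-8-sig")
    except UnicodeDecodeError:
        raise ValidationError("CORRUPT_FILE", "File could not be decoded as UTF-8") from None
    # File check 3: the file must have content ("EMPTY_FILE" is a hypothetical code).
    if not text.strip():
        raise ValidationError("EMPTY_FILE", "File has no data")

    reader = csv.DictReader(io.StringIO(text))
    # Data check 1: every template column must be present in the header.
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValidationError("INVALID_FILE_TEMPLATE", f"Missing columns: {missing}")
    rows = list(reader)
    if not rows:
        raise ValidationError("EMPTY_FILE", "File has a header but no data rows")
    # Data check 2: row limit.
    if len(rows) > max_rows:
        raise ValidationError("MAX_ROW_COUNT_EXCEEDED", f"Max row count allowed is {max_rows}")
    # Data check 3: duplicates, compared on a normalised BGMS + name key.
    hashes, first_seen = [], {}
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        key = "|".join((row.get(c) or "").strip().lower() for c in DUPLICATE_KEY)
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        if digest in first_seen:
            raise ValidationError(
                "DUPLICATE_ROWS",  # hypothetical code
                f"Lines {first_seen[digest]} and {line_no} have the same taxonomy and name",
            )
        first_seen[digest] = line_no
        hashes.append(digest)
    return rows, hashes
```

Normalising case and surrounding whitespace before hashing means rows that differ only in capitalisation are treated as duplicates; whether the real check does this is not stated in the design.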


2. Synchronous Processing

  1. Upload the CSV file to blob storage.

  2. Make an entry in the bulk_upload_process table:

Column           | Data to insert                | Remarks
id               | auto-generated unique id      | processId
createdby        | uploader id                   | -
createdon        | current timestamp             | -
data             | blobstore URL of the CSV file | -
failureresult    | -                             | failedCount to be updated here
lastupdatedon    | -                             | last updated timestamp, updated on each update
objecttype       | content                       | -
organisationid   | tenant id                     | -
processendtime   | -                             | endTime: current timestamp, set when the process moves to the completed state
processstarttime | -                             | startTime: current timestamp, set when the process moves to the processing state
retrycount       | 0                             | not used
status           | queued                        | possible values: queued, processing, completed
storagedetails   | -                             | report: blobstore URL of the result file
successresult    | -                             | successCount to be updated here
taskcount        | number of records in the file | totalCount
uploadedby       | uploader id                   | -
uploadeddate     | current timestamp             | -

  3. Make entries in the bulk_upload_process_task table (one record per content):

Column        | Data to insert            | Remarks
processid     | processId                 | id from the master table (bulk_upload_process)
sequenceid    | auto-generated sequence id | -
createdon     | current timestamp         | -
data          | row data in JSON format   | -
failureresult | -                         | JSON data + failure message
iterationid   | 0                         | not used
lastupdatedon | -                         | last updated timestamp, updated on each update
status        | queued                    | possible values: queued, success, failed
successresult | -                         | JSON data + success message


  4. For the link operation mode, fetch the draft hierarchy of each textbook mentioned in the CSV and cache the dialCode-TextBookUnitDoId mapping in Redis.

  5. Push events to Kafka. In the link operation mode, use the textbook ID as the partition key; in the other operation modes, use the hash value generated during the duplicate check as the partition key.
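Steps 2 to 5 can be sketched in Python as pure functions that build the values to be written. This is a minimal illustration, not Sunbird's implementation: the record keys mirror the table columns above, while `uuid4` as the id generator, the `identifier`/`dialcodes`/`children` hierarchy fields and the `textbookId` CSV column are assumptions. Writing to Cassandra, caching in Redis and the Kafka producer itself are left out. Keying link-mode events by textbook presumably keeps all updates to one textbook's draft hierarchy on a single partition, so they are applied one at a time instead of racing each other.

```python
import json
import uuid
from datetime import datetime, timezone


def build_process_record(uploader_id, tenant_id, csv_url, row_count):
    """Row for bulk_upload_process (step 2); fields left out are filled in later."""
    now = datetime.now(timezone.utc)
    return {
        "id": uuid.uuid4().hex,  # stand-in id generator; returned to the caller as processId
        "createdby": uploader_id,
        "createdon": now,
        "data": csv_url,
        "objecttype": "content",
        "organisationid": tenant_id,
        "retrycount": 0,
        "status": "queued",
        "taskcount": row_count,
        "uploadedby": uploader_id,
        "uploadeddate": now,
    }


def build_task_records(process_id, rows):
    """One bulk_upload_process_task row per CSV row (step 3)."""
    now = datetime.now(timezone.utc)
    return [
        {"processid": process_id, "sequenceid": seq, "createdon": now,
         "data": json.dumps(row), "iterationid": 0, "status": "queued"}
        for seq, row in enumerate(rows, start=1)
    ]


def dialcode_unit_map(hierarchy):
    """Map each DIAL code in a draft textbook hierarchy to the identifier of the node carrying it (step 4)."""
    mapping, stack = {}, [hierarchy]
    while stack:
        node = stack.pop()
        for code in node.get("dialcodes") or []:
            mapping[code] = node["identifier"]
        stack.extend(node.get("children") or [])
    return mapping


def partition_key(mode, row, row_hash):
    """Kafka partition key (step 5): textbook id in link mode, duplicate-check hash otherwise."""
    if mode == "link":
        return row["textbookId"]  # hypothetical CSV column
    return row_hash


def build_events(process_id, mode, rows, hashes):
    """(key, value) string pairs ready to hand to a Kafka producer."""
    return [
        (partition_key(mode, row, h),
         json.dumps({"processId": process_id, "sequenceId": seq, "operationMode": mode, "row": row}))
        for seq, (row, h) in enumerate(zip(rows, hashes), start=1)
    ]
```

With the kafka-python client, each pair could then be sent as `producer.send(topic, key=key.encode(), value=value.encode())`.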


3. Asynchronous Processing - Samza

  1. Validate the mandatory fields
  2. Validate the DIAL code (first against the Redis cache; if it is not present there, fetch the draft hierarchy from Cassandra and validate against it)
  3. Validate the size and format of the file in Google Drive
  4. Validate the taxonomy by creating the content (REST API call)
  5. Download the AppIcon from Google Drive
  6. Create an asset with the downloaded image
  7. Update the content with the AppIcon image URL
  8. Download the content file from Google Drive
  9. Upload the content (REST API call)
  10. Publish the content (Java API call)
  11. Get the draft hierarchy of the textbook from Cassandra
  12. Get the metadata of the published content
  13. Update the draft hierarchy of the textbook in Cassandra
  14. Update the status in the LMS Cassandra bulk_upload_process_task table
  15. Retire the content if an exception occurs anywhere in the flow
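The step list reads naturally as a pipeline with a single failure path. The sketch below assumes each step is an injected handler (all step names are hypothetical) and that upload mode stops after step 9, publish mode after step 10, and link mode adds the DIAL code check and the hierarchy update (steps 2 and 11 to 13); the design does not spell out this mapping. On any exception, content that was already created is retired and the task row is marked failed.

```python
# Hypothetical handler names, one per numbered step above. Which steps each
# operation mode runs is an assumption inferred from the mode descriptions.
UPLOAD_STEPS = [
    "validate_mandatory_fields",  # 1
    "validate_drive_file",        # 3
    "create_content",             # 4 (also validates taxonomy)
    "download_app_icon",          # 5
    "create_asset",               # 6
    "update_app_icon",            # 7
    "download_content_file",      # 8
    "upload_content",             # 9
]
PUBLISH_STEPS = UPLOAD_STEPS + ["publish_content"]  # 10
LINK_STEPS = (
    UPLOAD_STEPS[:1] + ["validate_dial_code"] + UPLOAD_STEPS[1:]  # 2 runs right after 1
    + ["publish_content", "get_textbook_hierarchy",               # 10, 11
       "get_content_metadata", "update_textbook_hierarchy"]       # 12, 13
)
STEPS_BY_MODE = {"upload": UPLOAD_STEPS, "publish": PUBLISH_STEPS, "link": LINK_STEPS}


def process_event(mode, event, handlers, retire, update_task_status):
    """Run one CSV row through its steps; retire the content on any failure (step 15)."""
    ctx = {"event": event}
    try:
        for step in STEPS_BY_MODE[mode]:
            handlers[step](ctx)  # handlers may add keys, e.g. ctx["contentId"]
    except Exception as exc:
        if "contentId" in ctx:  # only content that was actually created can be retired
            retire(ctx["contentId"])
        update_task_status(event, "failed", str(exc))  # step 14
        return "failed"
    update_task_status(event, "success", ctx.get("contentId"))  # step 14
    return "success"
```

Passing the handlers, `retire` and `update_task_status` in as callables keeps the ordering and failure policy testable without Google Drive, Cassandra or the content APIs.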


4. Scheduler

  1. A scheduler runs at periodic intervals, consolidates the results from the bulk_upload_process_task table and updates the master table (bulk_upload_process) with success_count, failed_count, process_end_time, result_file_url and status.
  2. When a process is marked as completed, the result file is generated and uploaded to the blobstore, and its URL is written back to the bulk_upload_process table.
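One scheduler pass for a single process might look like the sketch below. It uses the column names from the bulk_upload_process table; success_count, failed_count and process_end_time in the text above appear to correspond to successresult, failureresult and processendtime. Moving a queued process to processing on the first finished task is an assumption, the report columns are hypothetical, and uploading the returned CSV to the blobstore and storing its URL in storagedetails is left to the caller.

```python
import csv
import io
from datetime import datetime, timezone


def consolidate(process, tasks):
    """Fold task rows into their bulk_upload_process row.

    Returns (updated_row, report_csv); report_csv is None until every task has finished.
    """
    now = datetime.now(timezone.utc)
    success = sum(1 for t in tasks if t["status"] == "success")
    failed = sum(1 for t in tasks if t["status"] == "failed")
    row = dict(process, successresult=success, failureresult=failed, lastupdatedon=now)

    # Assumption: the first run that sees a finished task moves the process to processing.
    if row["status"] == "queued" and success + failed > 0:
        row.update(status="processing", processstarttime=now)
    if success + failed < row["taskcount"]:
        return row, None

    # Every task finished: build the result file and close the process.
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["sequenceId", "status", "message"])
    for t in sorted(tasks, key=lambda t: t["sequenceid"]):
        message = t.get("successresult") if t["status"] == "success" else t.get("failureresult")
        writer.writerow([t["sequenceid"], t["status"], message or ""])
    row.update(status="completed", processendtime=now)
    return row, buf.getvalue()
```

Recomputing the counts from the task rows on every pass, rather than incrementing them, keeps the job idempotent if a run is repeated.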


5. Status Check API

  1. Data from the bulk_upload_process table is served based on the processId.
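Because the response fields grow with the process status (see the Queued, Processing and Completed examples in the API specification), the lookup can be sketched as a small mapping function. `lookup` stands in for the Cassandra read and `sign` for whatever produces the signed download URL; both are assumptions, naive timestamps are treated as UTC, and the response envelope is trimmed to responseCode, params and result.

```python
from datetime import datetime, timezone


def fmt(dt: datetime) -> str:
    """Timestamp in the API's format, e.g. 2019-07-26 11:28:42:315+0000."""
    if dt.tzinfo is None:  # assume naive datetimes are UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.strftime("%Y-%m-%d %H:%M:%S:") + f"{dt.microsecond // 1000:03d}" + dt.strftime("%z")


def status_result(row, sign):
    """Build the 'result' object; fields are added as the process advances."""
    status = row["status"]
    result = {"processId": row["id"], "status": status.capitalize(), "totalCount": row["taskcount"]}
    if status in ("processing", "completed"):
        result.update(successCount=row.get("successresult", 0),
                      failedCount=row.get("failureresult", 0),
                      startTime=fmt(row["processstarttime"]))
    if status == "completed":
        result.update(endTime=fmt(row["processendtime"]),
                      report=sign(row["storagedetails"]))  # signed download URL
    return result


def status_response(lookup, process_id, sign=lambda url: url):
    """Return (http_status, body) for GET .../status/:processId."""
    row = lookup(process_id)
    if row is None:
        return 404, {
            "responseCode": "RESOURCE_NOT_FOUND",
            "params": {"err": "PROCESS_NOT_FOUND", "status": "PROCESS_NOT_FOUND",
                       "errmsg": f"Process Id {process_id} is not found in the system"},
            "result": {},
        }
    return 200, {"responseCode": "OK",
                 "params": {"err": None, "status": "success", "errmsg": None},
                 "result": status_result(row, sign)}
```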


6. Status List API

  1. The userId is derived from the Keycloak access token passed in the x-authenticated-user-token header.
  2. The statuses of all uploads done by that user are served from the bulk_upload_process table.
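A sketch of the list endpoint under two loud assumptions: the user id is the last colon-separated segment of the token's `sub` claim, and ownership is matched on the `createdby` column. The token signature is not verified here, which a real service must do before trusting any claim. Uploads are ordered oldest first to match the example response.

```python
import base64
import json
from datetime import datetime, timezone


def user_id_from_token(token):
    """Read the user id from a Keycloak JWT's 'sub' claim.

    The signature is NOT checked here; a real service must verify the token first.
    """
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # JWTs strip base64 padding
    sub = json.loads(base64.urlsafe_b64decode(payload))["sub"]
    # Assumption: sub may look like "f:<provider-id>:<userId>"; keep the last segment.
    return sub.split(":")[-1]


def fmt(dt: datetime) -> str:
    """Timestamp in the API's format; naive datetimes are treated as UTC."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.strftime("%Y-%m-%d %H:%M:%S:") + f"{dt.microsecond // 1000:03d}" + dt.strftime("%z")


def list_uploads(token, rows):
    """Build the 'result' object of the status list API from bulk_upload_process rows."""
    user_id = user_id_from_token(token)
    mine = sorted((r for r in rows if r["createdby"] == user_id), key=lambda r: r["uploadeddate"])
    return {
        "userId": user_id,
        "uploads": [
            {"processId": r["id"], "uploadedDate": fmt(r["uploadeddate"]),
             "status": r["status"].capitalize()}
            for r in mine
        ],
    }
```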



API Specifications

Bulk Content Upload API - POST - /v1/textbook/content/bulk/upload

Request Headers

Header                     | Value
Content-Type               | multipart/form-data
Authorization              | Bearer {{api-key}}
x-authenticated-user-token | {{keycloak-token}}
x-channel-id               | {{channel-identifier}}
x-framework-id             | {{framework-identifier}}
x-hashtag-id               | {{tenant-id}}
operation-mode             | upload / publish / link


Request Body

content: [contentUploadFile.csv]
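For reference, a client could assemble this request with only the Python standard library, as sketched below. The base URL is a placeholder and the keyword arguments simply fill the header placeholders; the request is built but not sent.

```python
import urllib.request
import uuid

VALID_MODES = ("upload", "publish", "link")


def build_upload_request(base_url, csv_name, csv_bytes, mode, *, api_key, user_token,
                         channel_id, framework_id, tenant_id):
    """Return an unsent POST request for /v1/textbook/content/bulk/upload."""
    if mode not in VALID_MODES:
        raise ValueError(f"operation-mode must be one of {VALID_MODES}, got {mode!r}")
    boundary = uuid.uuid4().hex
    # Single multipart part named "content", carrying the CSV file.
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="content"; filename="{csv_name}"\r\n'
        "Content-Type: text/csv\r\n\r\n"
    ).encode("utf-8") + csv_bytes + f"\r\n--{boundary}--\r\n".encode("utf-8")
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Authorization": f"Bearer {api_key}",
        "x-authenticated-user-token": user_token,
        "x-channel-id": channel_id,
        "x-framework-id": framework_id,
        "x-hashtag-id": tenant_id,
        "operation-mode": mode,
    }
    url = base_url.rstrip("/") + "/v1/textbook/content/bulk/upload"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Passing the result to `urllib.request.urlopen` sends it; for 4xx responses `urlopen` raises `urllib.error.HTTPError`, whose `read()` method returns the JSON error body.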


Response : Success Response - OK (200)

{
    "id": "api.textbook.content.bulk.upload",
    "ver": "v1",
    "ts": "2019-07-26 11:28:42:315+0000",
    "params": {
        "resmsgid": null,
        "msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67",
        "err": null,
        "status": "success",
        "errmsg": null
    },
    "responseCode": "OK",
    "result": {
        "processId": "012813442982903808142"
    }
}


Response : Failure Response - BAD REQUEST (400) - Corrupt File

{
    "id": "api.textbook.content.bulk.upload",
    "ver": "v1",
    "ts": "2019-07-26 11:28:42:315+0000",
    "params": {
        "resmsgid": null,
        "msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67",
        "err": "CORRUPT_FILE",
        "status": "CORRUPT_FILE",
        "errmsg": "Bulk content upload failed due to corrupt file"
    },
    "responseCode": "CLIENT_ERROR",
    "result": { }
}


Response : Failure Response - BAD REQUEST (400) - Invalid File Format (only CSV files are supported)

{
    "id": "api.textbook.content.bulk.upload",
    "ver": "v1",
    "ts": "2019-07-26 11:28:42:315+0000",
    "params": {
        "resmsgid": null,
        "msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67",
        "err": "INVALID_FILE_FORMAT",
        "status": "INVALID_FILE_FORMAT",
        "errmsg": "Bulk content upload failed due to invalid file format"
    },
    "responseCode": "CLIENT_ERROR",
    "result": { }
}


Response : Failure Response - BAD REQUEST (400) - Invalid File Template (Columns Missing)

{
    "id": "api.textbook.content.bulk.upload",
    "ver": "v1",
    "ts": "2019-07-26 11:28:42:315+0000",
    "params": {
        "resmsgid": null,
        "msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67",
        "err": "INVALID_FILE_TEMPLATE",
        "status": "INVALID_FILE_TEMPLATE",
        "errmsg": "Bulk content upload failed due to invalid file template"
    },
    "responseCode": "CLIENT_ERROR",
    "result": { }
}


Response : Failure Response - BAD REQUEST (400) - Too many rows

{
    "id": "api.textbook.content.bulk.upload",
    "ver": "v1",
    "ts": "2019-07-26 11:28:42:315+0000",
    "params": {
        "resmsgid": null,
        "msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67",
        "err": "MAX_ROW_COUNT_EXCEEDED",
        "status": "MAX_ROW_COUNT_EXCEEDED",
        "errmsg": "Max row count allowed is <config>"
    },
    "responseCode": "CLIENT_ERROR",
    "result": { }
}
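All of these responses share one envelope, so a client can reduce them to either a processId or an error carrying the `err` code. A minimal sketch; `BulkUploadError` and `parse_upload_response` are hypothetical names.

```python
import json


class BulkUploadError(Exception):
    """Raised for non-OK responses; err holds codes such as CORRUPT_FILE or MAX_ROW_COUNT_EXCEEDED."""

    def __init__(self, http_status, err, errmsg):
        super().__init__(f"{http_status} {err}: {errmsg}")
        self.http_status = http_status
        self.err = err
        self.errmsg = errmsg


def parse_upload_response(http_status, body_text):
    """Return the processId from a successful upload response, else raise BulkUploadError."""
    body = json.loads(body_text)
    if http_status == 200 and body.get("responseCode") == "OK":
        return body["result"]["processId"]
    params = body.get("params") or {}
    raise BulkUploadError(http_status, params.get("err"), params.get("errmsg"))
```

All four 400 errors describe a problem with the file itself, so retrying the same upload unchanged will not help; the caller should surface `errmsg` to the user.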




Bulk Content Upload Status Check API - GET - /v1/textbook/content/bulk/upload/status/:processId

Request Headers

Header                     | Value
Accept                     | application/json
Authorization              | Bearer {{api-key}}
x-authenticated-user-token | {{keycloak-token}}

Response : Success Response - OK (200) - In Queue

{
    "id": "api.textbook.content.bulk.upload.status",
    "ver": "v1",
    "ts": "2019-07-26 11:28:42:315+0000",
    "params": {
        "resmsgid": null,
        "msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67",
        "err": null,
        "status": "success",
        "errmsg": null
    },
    "responseCode": "OK",
    "result": {
        "processId": "012813442982903808142",
		"status": "Queued",
		"totalCount": 500
    }
}

Response : Success Response - OK (200) - In Progress

{
    "id": "api.textbook.content.bulk.upload.status",
    "ver": "v1",
    "ts": "2019-07-26 11:28:42:315+0000",
    "params": {
        "resmsgid": null,
        "msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67",
        "err": null,
        "status": "success",
        "errmsg": null
    },
    "responseCode": "OK",
    "result": {
        "processId": "012813442982903808142",
		"status": "Processing",
		"totalCount": 500,
		"successCount": 100,
		"failedCount": 10,
		"startTime": "2019-07-26 11:28:42:315+0000"
    }
}


Response : Success Response - OK (200) - Completed

{
    "id": "api.textbook.content.bulk.upload.status",
    "ver": "v1",
    "ts": "2019-07-26 11:28:42:315+0000",
    "params": {
        "resmsgid": null,
        "msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67",
        "err": null,
        "status": "success",
        "errmsg": null
    },
    "responseCode": "OK",
    "result": {
        "processId": "012813442982903808142",
        "status": "Completed",
        "totalCount": 500,
        "successCount": 450,
        "failedCount": 50,
        "startTime": "2019-07-26 11:28:42:315+0000",
        "endTime": "2019-07-26 12:28:42:315+0000",
        "report": "signedDownloadUrl"
    }
}


Response : Failure Response - RESOURCE NOT FOUND (404) - ProcessId not found

{
    "id": "api.textbook.content.bulk.upload.status",
    "ver": "v1",
    "ts": "2019-07-26 11:28:42:315+0000",
    "params": {
        "resmsgid": null,
        "msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67",
        "err": "PROCESS_NOT_FOUND",
        "status": "PROCESS_NOT_FOUND",
        "errmsg": "Process Id xxx is not found in the system"
    },
    "responseCode": "RESOURCE_NOT_FOUND",
    "result": { }
}
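Since processing is asynchronous, a caller typically polls the status check API until the status is Completed. The sketch below takes the HTTP call as an injected `fetch_status` function (hypothetical) and makes the clock and sleep injectable for testing; the 30-second interval and one-hour timeout are arbitrary defaults, not values from this design.

```python
import time


def wait_for_completion(fetch_status, process_id, interval_s=30.0, timeout_s=3600.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll the status check API until the process is Completed or the timeout passes.

    fetch_status(process_id) must return the 'result' object of a 200 response.
    """
    deadline = clock() + timeout_s
    while True:
        result = fetch_status(process_id)
        if result["status"] == "Completed":
            return result  # carries successCount, failedCount and the report URL
        if clock() >= deadline:
            raise TimeoutError(f"process {process_id} still {result['status']} after {timeout_s}s")
        sleep(interval_s)
```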


Bulk Content Upload Status List API - GET - /v1/textbook/content/bulk/upload/status/list

Request Headers

Header                     | Value
Accept                     | application/json
Authorization              | Bearer {{api-key}}
x-authenticated-user-token | {{keycloak-token}}

Response : Success Response - OK (200)

{
    "id": "api.textbook.content.bulk.upload.status.list",
    "ver": "v1",
    "ts": "2019-07-26 11:28:42:315+0000",
    "params": {
        "resmsgid": null,
        "msgid": "cf5b2e8e-70cf-401c-af29-980bc3151c67",
        "err": null,
        "status": "success",
        "errmsg": null
    },
    "responseCode": "OK",
    "result": {
		"userId": "f61826aa-8e5d-4356-b8ad-edda92460750",
		"uploads": [
			{
				"processId": "012813442982903808142",
				"uploadedDate": "2018-12-12 14:25:27:466+0530",
				"status": "Completed"
			},
			{
				"processId": "012813442982903808143",
				"uploadedDate": "2018-12-14 12:01:36:807+0530",
				"status": "Processing"
			}
		]
    }
}