Upload ToC from CSV

Overview:

Provide a way for a Textbook Creator to upload a CSV file in a pre-defined format, from which the system generates the ToC.


Problem Statement:

Creating a Textbook ToC (Table of Contents) through the portal is time-consuming and requires training and continuous internet connectivity. ToCs have a tree-like structure: one chapter can have multiple sections, and those sections can have multiple subsections. To simplify the process, the user should be able to upload a CSV file with the input data so that the ToC is created in one go after validation.

Sample CSV data.

There should also be a way to update the attributes of sections. To help the user do that, the system should allow the ToC content to be downloaded in CSV format.


Proposed Solution:

To facilitate ToC creation through CSV, an API needs to be exposed that accepts a CSV file in the request.

Once the file has been uploaded, the system will perform the validations below:


validation | description | type | configurable
Textbook Creator | Based on the authorization, the API should validate whether the calling user has the "Textbook Creator" role | User | N/A
File exists and type | Whether the API has been invoked with a file that has a valid extension | File | N/A
File Size | Whether the uploaded file size exceeds the set limit | File | Yes
File Headers | Whether the file contains the required headers | Column | Yes (mandatory and allowed headers would be stored in the db as a system setting)
File contains data | Whether the file contains data other than the header | Data | N/A
Duplicate data | Whether there is a duplicate row (approach described in detail below) | Data | N/A
Textbook Name | Textbook Name from the file should be validated against the data fetched by the get hierarchy call with the content Id | Data | N/A
First Level Unit limit | There can be a maximum of 30 first level units (i.e. Chapters) | Data | Yes
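
The data-level checks in the list above (required headers, data beyond the header, first level unit limit) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name TocCsvChecks, the header names "Textbook Name" and "Level 1 Textbook Unit", and the error messages are hypothetical and not taken from the sample CSV; in the design the header list and the limit come from the system setting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper sketching three of the data-level checks listed above. */
final class TocCsvChecks {

    // Assumed header names; the design stores the real list in a system setting.
    static final String TEXTBOOK_NAME_HEADER = "Textbook Name";
    static final String FIRST_LEVEL_HEADER = "Level 1 Textbook Unit";
    static final List<String> MANDATORY_HEADERS =
            Arrays.asList(TEXTBOOK_NAME_HEADER, FIRST_LEVEL_HEADER);

    // Limit taken from the validation list; configurable in the design.
    static final int MAX_FIRST_LEVEL_UNITS = 30;

    /**
     * rows.get(0) is the header row, the remaining rows are data.
     * Returns validation error messages; an empty list means the file passed.
     */
    static List<String> validate(List<String[]> rows) {
        List<String> errors = new ArrayList<>();
        if (rows.isEmpty()) {
            errors.add("File is empty");
            return errors;
        }

        // File Headers: every mandatory header must be present.
        List<String> headers = new ArrayList<>();
        for (String header : rows.get(0)) {
            headers.add(header.trim());
        }
        for (String required : MANDATORY_HEADERS) {
            if (!headers.contains(required)) {
                errors.add("Missing mandatory header: " + required);
            }
        }

        // File contains data: at least one row besides the header.
        if (rows.size() == 1) {
            errors.add("File contains no data other than the header");
        }
        if (!errors.isEmpty()) {
            return errors;
        }

        // First Level Unit limit: count distinct first level units.
        int unitColumn = headers.indexOf(FIRST_LEVEL_HEADER);
        Set<String> firstLevelUnits = new LinkedHashSet<>();
        for (String[] row : rows.subList(1, rows.size())) {
            if (unitColumn < row.length && !row[unitColumn].trim().isEmpty()) {
                firstLevelUnits.add(row[unitColumn].trim());
            }
        }
        if (firstLevelUnits.size() > MAX_FIRST_LEVEL_UNITS) {
            errors.add("More than " + MAX_FIRST_LEVEL_UNITS + " first level units");
        }
        return errors;
    }
}
```

Returning a list of messages rather than failing on the first problem is a design choice of this sketch; it would let the API report all validation errors for a file in a single 400 response.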


Upload ToC API

POST v1/toc/:contentId    (where contentId is textbook id)

 Header 

{
Content-Type : "multipart/form-data",
ts:"",
X-msgid:"",
Authorization:"",
x-authenticated-user-token:""
}

Request

data : [file]

Response : 200 OK

{
  "id": "api.toc.create",
  "ver": "v1",
  "ts": "2018-11-15 13:52:50:990+0530",
  "params": {
    "resmsgid": null,
    "msgid": null,
    "err": null,
    "status": "success",
    "errmsg": null
  },
  "responseCode": "OK",
  "result": {
    "response": "SUCCESS"
  }
}

ERROR 400 Bad Request

{
  "id": "api.toc.create",
  "ver": "v1",
  "ts": "2018-11-15 13:52:50:990+0530",
  "params": {
    "resmsgid": null,
    "msgid": null,
    "err": "UPLOAD_TOC_FAILED",
    "status": "UPLOAD_TOC_FAILED",
    "errmsg": "Table of Content could not be uploaded due to validation errors"
  },
  "responseCode": "CLIENT_ERROR",
  "result": {}
}

ERROR 404 Not Found

{
  "id": "api.toc.create",
  "ver": "v1",
  "ts": "2018-11-15 13:52:50:990+0530",
  "params": {
    "resmsgid": null,
    "msgid": null,
    "err": "RESOURCE_NOT_FOUND",
    "status": "RESOURCE_NOT_FOUND",
    "errmsg": "Resource not found"
  },
  "responseCode": "CLIENT_ERROR",
  "result": {}
}


Update ToC Attributes

PATCH     v1/toc/:contentId    (where contentId is textbook id)

 Header 

{
Content-Type : "multipart/form-data",
ts:"",
X-msgid:"",
Authorization:"",
x-authenticated-user-token:""
}

Request

data : [file]

Response

{
  "id": "api.toc",
  "ver": "v1",
  "ts": "2018-11-15 13:52:50:990+0530",
  "params": {
    "resmsgid": null,
    "msgid": null,
    "err": null,
    "status": "success",
    "errmsg": null
  },
  "responseCode": "OK",
  "result": {
    "response": "SUCCESS"
  }
}

ERROR 400 Bad Request

{
  "id": "api.toc.update",
  "ver": "v1",
  "ts": "2018-11-15 13:52:50:990+0530",
  "params": {
    "resmsgid": null,
    "msgid": null,
    "err": "UPLOAD_TOC_FAILED",
    "status": "UPLOAD_TOC_FAILED",
    "errmsg": "Table of Content could not be uploaded due to validation errors"
  },
  "responseCode": "CLIENT_ERROR",
  "result": {}
}

ERROR 404 Not Found

{
  "id": "api.toc.update",
  "ver": "v1",
  "ts": "2018-11-15 13:52:50:990+0530",
  "params": {
    "resmsgid": null,
    "msgid": null,
    "err": "RESOURCE_NOT_FOUND",
    "status": "RESOURCE_NOT_FOUND",
    "errmsg": "Resource not found"
  },
  "responseCode": "CLIENT_ERROR",
  "result": {}
}


Download ToC as CSV

GET     v1/toc/:contentId    (where contentId is textbook id)

 Header 

{
Content-Type : "multipart/form-data",
ts:"",
X-msgid:"",
Authorization:"",
x-authenticated-user-token:""
}


Response

{contentId}.csv, downloaded as a file. It will contain the uploaded CSV data plus the attributes.

ERROR 404 Not Found

{
  "id": "api.toc.update",
  "ver": "v1",
  "ts": "2018-11-15 13:52:50:990+0530",
  "params": {
    "resmsgid": null,
    "msgid": null,
    "err": "RESOURCE_NOT_FOUND",
    "status": "RESOURCE_NOT_FOUND",
    "errmsg": "Resource not found"
  },
  "responseCode": "CLIENT_ERROR",
  "result": {}
}


Approach for duplicate row validation

1st Approach:

In this approach, we take each row, trim each cell, concatenate the cell values into one string and create a hash of that string (e.g. MD5). The hash is added to a hash set after verifying that it is not already present.

If the hash of a row already exists in the set, we throw a duplicate row error.
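
A minimal sketch of the first approach, using only the JDK. The class and method names are hypothetical. One deliberate deviation from plain concatenation: a separator character is appended after each cell, an assumption added here so that rows such as ("ab", "c") and ("a", "bc") do not produce the same string.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: duplicate row detection via an MD5 hash per row. */
final class DuplicateRowHashCheck {

    /** Returns the 0-based index of the first duplicate row, or -1 if there is none. */
    static int firstDuplicate(List<String[]> rows) {
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < rows.size(); i++) {
            // Set.add returns false when the hash is already present.
            if (!seen.add(rowHash(rows.get(i)))) {
                return i;
            }
        }
        return -1;
    }

    /** Trims each cell, joins the cells with a separator and returns the MD5 hex digest. */
    static String rowHash(String[] cells) {
        StringBuilder joined = new StringBuilder();
        for (String cell : cells) {
            // Unit separator (assumed not to occur in cell data) keeps cell boundaries distinct.
            joined.append(cell == null ? "" : cell.trim()).append('\u001F');
        }
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(joined.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform, so this is not expected.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Returning the row index rather than throwing immediately is a choice of this sketch; the caller can turn it into the duplicate row error together with the offending row number.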

2nd Approach:

This approach is similar in concept to the previous one, but here duplicates are detected while building the tree for the ToC. We can store the ToC info of the Textbook as Map<String,Map<String,Set<String>>>. If we see a collision at any point while building this structure, we throw a duplicate row error.
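
The second approach might look like the sketch below. It assumes exactly three levels per row (chapter, section, subsection), which matches the Map<String,Map<String,Set<String>>> shape; the class name TocTreeBuilder and the exception type are hypothetical. LinkedHashMap and LinkedHashSet are used here because they keep insertion order, i.e. the row order of the CSV.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical builder: creates the ToC tree and detects duplicate rows on the way. */
final class TocTreeBuilder {

    // chapter -> section -> subsections, all in insertion (CSV row) order.
    private final Map<String, Map<String, Set<String>>> toc = new LinkedHashMap<>();

    /** Adds one row; throws if the same chapter/section/subsection path was already added. */
    void addRow(String chapter, String section, String subsection) {
        Set<String> subsections = toc
                .computeIfAbsent(chapter.trim(), key -> new LinkedHashMap<>())
                .computeIfAbsent(section.trim(), key -> new LinkedHashSet<>());
        // Set.add returns false on a collision, which means the row is a duplicate.
        if (!subsections.add(subsection.trim())) {
            throw new IllegalArgumentException(
                    "Duplicate row: " + chapter + " / " + section + " / " + subsection);
        }
    }

    /** Returns the tree built so far. */
    Map<String, Map<String, Set<String>>> tree() {
        return toc;
    }
}
```

Because the duplicate check falls out of building the structure, no separate pass over the rows is needed, and the resulting map can be walked directly when preparing the hierarchy request.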

3rd Approach:

We can use the Guava Bloom filter library to find duplicates.
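
To illustrate why a Bloom filter is probabilistic, here is a toy filter built on java.util.BitSet. It is a hand-rolled stand-in for illustration only and does not use or reproduce Guava's BloomFilter; the class name, the double-hashing scheme based on String.hashCode and the parameters are all assumptions of this sketch.

```java
import java.util.BitSet;

/** Toy Bloom filter for illustration; not Guava's implementation. */
final class ToyBloomFilter {

    private final BitSet bits;
    private final int size;
    private final int hashCount;

    ToyBloomFilter(int size, int hashCount) {
        if (size <= 0 || hashCount <= 0) {
            throw new IllegalArgumentException("size and hashCount must be positive");
        }
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /**
     * Records the value. Returns true if at least one bit was newly set,
     * which means the value was definitely not seen before.
     */
    boolean put(String value) {
        boolean changed = false;
        for (int i = 0; i < hashCount; i++) {
            int index = index(value, i);
            if (!bits.get(index)) {
                bits.set(index);
                changed = true;
            }
        }
        return changed;
    }

    /** False means definitely not seen; true means possibly seen (may be a false positive). */
    boolean mightContain(String value) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(value, i))) {
                return false;
            }
        }
        return true;
    }

    // Derives the i-th bit position from two hashes of the value (double hashing).
    private int index(String value, int i) {
        int h1 = value.hashCode();
        int h2 = (h1 >>> 16) | 1;
        return Math.floorMod(h1 + i * h2, size);
    }
}
```

If put returns false for a row, the row is only possibly a duplicate, so a duplicate row error based on the filter alone could reject a valid file. An exact check (for example the hash set from the first approach) would be needed to confirm such hits.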

Comparison of above approaches

Approach | pros | cons
1st | simple to implement | concatenation
2nd | no extra check | would need treemap and treeset to maintain order
3rd | memory friendly for large dataset | probabilistic with chance of false positive


After the validations succeed, the data needs to be processed into the required format to make a POST call to the update hierarchy API so that the ToC is created.