Upload ToC from CSV

Overview:

Provide a way for a Textbook Creator to upload a CSV file in a pre-defined format, from which the system generates the ToC.



Problem Statement:

The process of creating a Textbook ToC (Table of Contents) through the portal is time-consuming, requires training, and needs continuous internet connectivity. ToCs are tree-like structures: a chapter can have multiple sections, and each section can have multiple subsections. To simplify the process, users need a way to upload a CSV file containing the input data so that the complete ToC can be created in one go after validation.

Sample CSV data.

There should also be a way to update the attributes of sections. To help users do that, it should be possible to download the ToC content in CSV format.



Proposed Solution:

To facilitate ToC creation through CSV, an API needs to be exposed that accepts a CSV file in the request.

Once the file has been uploaded, the system performs the following validations:



Validation | Description | Type | Configurable
---------- | ----------- | ---- | ------------
Textbook Creator | Based on the authorization, the API validates whether the calling user has the "Textbook Creator" role | User | N/A
File exists and type | Whether the API has been invoked with a file and whether the file has a valid extension | File | N/A
File Size | Whether the uploaded file size exceeds the configured limit | File | Yes
File Headers | Whether the file contains the required headers | Column | Yes (mandatory and allowed headers are stored in the DB as a system setting)
File contains data | Whether the file contains data rows besides the header | Data | N/A
Duplicate data | Whether the file contains duplicate rows (approach described in detail below) | Data | N/A
Textbook Name | The textbook name in the file is validated against the data fetched by the get hierarchy call for the content ID | Data | N/A
First Level Unit limit | There can be at most 30 first-level units (i.e. chapters) | Data | Yes
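The file-level checks in the table can be run in one pass before any hierarchy work starts. The sketch below is a minimal illustration, assuming the CSV has already been parsed into a header list and data rows; the class name, the error codes (INVALID_FILE_TYPE, FILE_SIZE_EXCEEDED and so on) and the "Chapter" column name are hypothetical. The role check, duplicate check and textbook-name check are left out because they depend on the auth service, the duplicate approach chosen below, and the get hierarchy call.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class TocCsvValidator {
    // Hypothetical configuration values; the design keeps these as configurable system settings.
    private final long maxFileSizeBytes;
    private final Set<String> mandatoryHeaders;
    private final String firstLevelHeader;   // e.g. "Chapter" (hypothetical column name)
    private final int maxFirstLevelUnits;    // 30 in the design

    public TocCsvValidator(long maxFileSizeBytes, Set<String> mandatoryHeaders,
                           String firstLevelHeader, int maxFirstLevelUnits) {
        this.maxFileSizeBytes = maxFileSizeBytes;
        this.mandatoryHeaders = mandatoryHeaders;
        this.firstLevelHeader = firstLevelHeader;
        this.maxFirstLevelUnits = maxFirstLevelUnits;
    }

    /** Runs the file-level checks; an empty result means the file passed them. */
    public List<String> validate(String fileName, long fileSizeBytes,
                                 List<String> headers, List<List<String>> rows) {
        List<String> errors = new ArrayList<>();
        // File exists and type: without a .csv file nothing else can be checked.
        if (fileName == null || !fileName.toLowerCase(Locale.ROOT).endsWith(".csv")) {
            errors.add("INVALID_FILE_TYPE");
            return errors;
        }
        // File size against the configured limit.
        if (fileSizeBytes > maxFileSizeBytes) {
            errors.add("FILE_SIZE_EXCEEDED");
        }
        // File headers: every mandatory header must be present (compared after trimming).
        List<String> trimmedHeaders = new ArrayList<>();
        for (String h : headers) {
            trimmedHeaders.add(h.trim());
        }
        for (String required : mandatoryHeaders) {
            if (!trimmedHeaders.contains(required)) {
                errors.add("MISSING_HEADER: " + required);
            }
        }
        // File contains data other than the header row.
        if (rows.isEmpty()) {
            errors.add("NO_DATA");
            return errors;
        }
        // First-level unit limit: count distinct non-empty chapter names.
        int column = trimmedHeaders.indexOf(firstLevelHeader);
        if (column >= 0) {
            Set<String> chapters = new HashSet<>();
            for (List<String> row : rows) {
                if (column < row.size() && !row.get(column).trim().isEmpty()) {
                    chapters.add(row.get(column).trim());
                }
            }
            if (chapters.size() > maxFirstLevelUnits) {
                errors.add("FIRST_LEVEL_UNIT_LIMIT_EXCEEDED");
            }
        }
        return errors;
    }
}
```

Returning a list of errors rather than failing on the first one lets the 400 response report every problem in the file at once.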



Upload ToC API

POST v1/toc/:contentId (where contentId is the textbook ID)

Header

{
Content-Type: "multipart/form-data",
ts: "",
X-msgid: "",
Authorization: "",
x-authenticated-user-token: ""
}

Request

data : [file]

Response: 200 OK

{
  "id": "api.toc.create",
  "ver": "v1",
  "ts": "2018-11-15 13:52:50:990+0530",
  "params": {
    "resmsgid": null,
    "msgid": null,
    "err": null,
    "status": "success",
    "errmsg": null
  },
  "responseCode": "OK",
  "result": {
    "response": "SUCCESS"
  }
}

ERROR 400 Bad Request

{
  "id": "api.toc.create",
  "ver": "v1",
  "ts": "2018-11-15 13:52:50:990+0530",
  "params": {
    "resmsgid": null,
    "msgid": null,
    "err": "UPLOAD_TOC_FAILED",
    "status": "UPLOAD_TOC_FAILED",
    "errmsg": "Table of Content could not be uploaded due to validation errors"
  },
  "responseCode": "CLIENT_ERROR",
  "result": {}
}

ERROR 404 Not Found

{
  "id": "api.toc.create",
  "ver": "v1",
  "ts": "2018-11-15 13:52:50:990+0530",
  "params": {
    "resmsgid": null,
    "msgid": null,
    "err": "RESOURCE_NOT_FOUND",
    "status": "RESOURCE_NOT_FOUND",
    "errmsg": "Resource not found"
  },
  "responseCode": "CLIENT_ERROR",
  "result": {}
}
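A client call to the upload endpoint can be sketched with the JDK's java.net.http client (Java 11+). This is a minimal sketch, not the portal's actual client: the base URL, file name and header values are placeholders supplied by the caller, the ISO-8601 value used for ts is an assumption, and the CSV is sent in a form field named data as in the request description. Note that a real multipart/form-data Content-Type header must also carry a boundary parameter.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.UUID;

public class TocUploadRequest {
    /**
     * Builds, but does not send, a multipart/form-data POST to {baseUrl}/v1/toc/{contentId}.
     * The CSV goes in a form field named "data"; baseUrl and header values come from the caller.
     */
    public static HttpRequest build(String baseUrl, String contentId, String fileName, byte[] csv,
                                    String authorization, String userToken) {
        // Random boundary that is very unlikely to appear inside the CSV content.
        String boundary = "----toc-" + UUID.randomUUID();

        ByteArrayOutputStream body = new ByteArrayOutputStream();
        String partHeader = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"data\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: text/csv\r\n\r\n";
        body.writeBytes(partHeader.getBytes(StandardCharsets.UTF_8));
        body.writeBytes(csv);
        body.writeBytes(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));

        return HttpRequest.newBuilder(URI.create(baseUrl + "/v1/toc/" + contentId))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .header("ts", Instant.now().toString())            // timestamp format is an assumption
                .header("X-msgid", UUID.randomUUID().toString())   // per-request message id
                .header("Authorization", authorization)
                .header("x-authenticated-user-token", userToken)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body.toByteArray()))
                .build();
    }
}
```

The request can then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). The same body works for the PATCH endpoint by replacing .POST(...) with .method("PATCH", ...).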


Update ToC Attributes

PATCH v1/toc/:contentId (where contentId is the textbook ID)

Header

{
Content-Type: "multipart/form-data",
ts: "",
X-msgid: "",
Authorization: "",
x-authenticated-user-token: ""
}

Request

data : [file]

Response: 200 OK

{
  "id": "api.toc.update",
  "ver": "v1",
  "ts": "2018-11-15 13:52:50:990+0530",
  "params": {
    "resmsgid": null,
    "msgid": null,
    "err": null,
    "status": "success",
    "errmsg": null
  },
  "responseCode": "OK",
  "result": {
    "response": "SUCCESS"
  }
}

ERROR 400 Bad Request

{
  "id": "api.toc.update",
  "ver": "v1",
  "ts": "2018-11-15 13:52:50:990+0530",
  "params": {
    "resmsgid": null,
    "msgid": null,
    "err": "UPLOAD_TOC_FAILED",
    "status": "UPLOAD_TOC_FAILED",
    "errmsg": "Table of Content could not be uploaded due to validation errors"
  },
  "responseCode": "CLIENT_ERROR",
  "result": {}
}

ERROR 404 Not Found

{
  "id": "api.toc.update",
  "ver": "v1",
  "ts": "2018-11-15 13:52:50:990+0530",
  "params": {
    "resmsgid": null,
    "msgid": null,
    "err": "RESOURCE_NOT_FOUND",
    "status": "RESOURCE_NOT_FOUND",
    "errmsg": "Resource not found"
  },
  "responseCode": "CLIENT_ERROR",
  "result": {}
}


Download ToC as CSV

GET v1/toc/:contentId (where contentId is the textbook ID)

Header

{
Content-Type: "multipart/form-data",
ts: "",
X-msgid: "",
Authorization: "",
x-authenticated-user-token: ""
}


Response

{contentId}.csv is downloaded as a file; it contains the uploaded CSV data plus the attributes.

ERROR 404 Not Found

{
  "id": "api.toc.update",
  "ver": "v1",
  "ts": "2018-11-15 13:52:50:990+0530",
  "params": {
    "resmsgid": null,
    "msgid": null,
    "err": "RESOURCE_NOT_FOUND",
    "status": "RESOURCE_NOT_FOUND",
    "errmsg": "Resource not found"
  },
  "responseCode": "CLIENT_ERROR",
  "result": {}
}
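Producing the downloadable file amounts to flattening the stored ToC tree back into CSV rows. A minimal sketch, assuming the chapter -> section -> subsections shape mentioned in the 2nd duplicate-row approach; the Chapter/Section/Subsection column names are hypothetical and the extra attribute columns are omitted. Cells are quoted following RFC 4180, so section names containing commas or quotes survive a round trip through upload.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

public class TocCsvWriter {
    /** Quotes a cell (RFC 4180) when it contains a comma, a double quote or a line break. */
    public static String escape(String cell) {
        if (cell.contains(",") || cell.contains("\"") || cell.contains("\n") || cell.contains("\r")) {
            return "\"" + cell.replace("\"", "\"\"") + "\"";
        }
        return cell;
    }

    /** Joins escaped cells into one CSV line (without the line terminator). */
    public static String row(String... cells) {
        StringJoiner joiner = new StringJoiner(",");
        for (String cell : cells) {
            joiner.add(escape(cell));
        }
        return joiner.toString();
    }

    /** Flattens chapter -> section -> subsections into CSV, one row per deepest node. */
    public static String toCsv(Map<String, Map<String, Set<String>>> toc) {
        StringBuilder out = new StringBuilder(row("Chapter", "Section", "Subsection")).append("\r\n");
        for (Map.Entry<String, Map<String, Set<String>>> chapter : toc.entrySet()) {
            if (chapter.getValue().isEmpty()) {
                // Chapter without sections: emit a chapter-only row.
                out.append(row(chapter.getKey(), "", "")).append("\r\n");
                continue;
            }
            for (Map.Entry<String, Set<String>> section : chapter.getValue().entrySet()) {
                if (section.getValue().isEmpty()) {
                    // Section without subsections: emit a section-level row.
                    out.append(row(chapter.getKey(), section.getKey(), "")).append("\r\n");
                    continue;
                }
                for (String subsection : section.getValue()) {
                    out.append(row(chapter.getKey(), section.getKey(), subsection)).append("\r\n");
                }
            }
        }
        return out.toString();
    }
}
```

The row order of the output follows the iteration order of the maps passed in, so order-preserving maps keep the download in the same order as the original upload.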


Approach for duplicate row validation

1st Approach:

In this approach, we take each row, trim each cell, concatenate the cell values, and compute a hash of the resulting string (e.g. MD5). We then check whether the hash already exists in a HashSet and, if it does not, add it.

If the hash of any row already exists, we throw a duplicate row error.
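A minimal sketch of the 1st approach, using only the JDK (MessageDigest for MD5, HashSet for the seen hashes). The class name is hypothetical. One deliberate addition: a separator character is placed between cells before hashing, because plain concatenation makes rows such as ("ab", "c") and ("a", "bc") produce the same string.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateRowChecker {
    // Separator between cells so that ("ab", "c") and ("a", "bc") do not concatenate to the same string.
    private static final String DELIMITER = "\u001F";

    private final Set<String> seenHashes = new HashSet<>();
    private final MessageDigest md5;

    public DuplicateRowChecker() {
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Returns true if a row with the same trimmed cell values has already been checked. */
    public boolean isDuplicate(List<String> row) {
        StringBuilder key = new StringBuilder();
        for (String cell : row) {
            key.append(cell == null ? "" : cell.trim()).append(DELIMITER);
        }
        byte[] digest = md5.digest(key.toString().getBytes(StandardCharsets.UTF_8));
        // Set.add returns false when the hash is already present, i.e. the row is a duplicate.
        return !seenHashes.add(Base64.getEncoder().encodeToString(digest));
    }
}
```

MessageDigest is not thread-safe, so this sketch assumes one checker instance per uploaded file.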

2nd Approach:

This approach is similar in concept to the previous one, but here duplicates are detected while building the tree for the ToC. The ToC of a textbook can be stored as Map<String, Map<String, Set<String>>>. If a collision occurs at any point while building this structure, we throw a duplicate row error.
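A minimal sketch of the 2nd approach, assuming each row carries chapter, section and subsection cells. The class name and the convention of storing an empty section or subsection as "" (so that a row ending at chapter or section level is still a distinct entry) are assumptions. LinkedHashMap and LinkedHashSet are used here to keep the CSV row order.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class TocTreeBuilder {
    // chapter -> section -> subsections; Linked* collections keep CSV row order.
    private final Map<String, Map<String, Set<String>>> toc = new LinkedHashMap<>();

    /**
     * Adds one CSV row. Empty section/subsection cells are stored as "" so a row that
     * stops at chapter or section level is still a distinct entry.
     * Throws IllegalArgumentException when the same path has already been added.
     */
    public void addRow(int rowNumber, String chapter, String section, String subsection) {
        String c = normalize(chapter);
        String s = normalize(section);
        String ss = normalize(subsection);
        boolean added = toc.computeIfAbsent(c, k -> new LinkedHashMap<>())
                .computeIfAbsent(s, k -> new LinkedHashSet<>())
                .add(ss);
        if (!added) {
            // Set.add returned false: this chapter/section/subsection path is a duplicate row.
            throw new IllegalArgumentException("Duplicate row at line " + rowNumber);
        }
    }

    private static String normalize(String cell) {
        return cell == null ? "" : cell.trim();
    }

    public Map<String, Map<String, Set<String>>> toc() {
        return toc;
    }
}
```

The duplicate check costs nothing extra here: it falls out of the Set.add call that builds the tree anyway, and the resulting map can be reused for the hierarchy update.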

3rd Approach:

We can use the Guava BloomFilter library to find duplicates.
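The idea behind the 3rd approach is sketched below. To keep the example dependency-free it uses a small hand-rolled Bloom filter (BitSet plus double hashing over two different 32-bit hashes) instead of Guava; with Guava the equivalent calls would be BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), expectedRows, fpp), mightContain and put. The class name and the cell separator are assumptions. A hit only means "possibly a duplicate", which is the false-positive trade-off noted in the comparison.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.List;

public class RowBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    /** Sizes the filter with the standard formulas m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2. */
    public RowBloomFilter(int expectedRows, double falsePositiveRate) {
        double ln2 = Math.log(2);
        this.numBits = (int) Math.ceil(-expectedRows * Math.log(falsePositiveRate) / (ln2 * ln2));
        this.numHashes = Math.max(1, (int) Math.round((double) numBits / expectedRows * ln2));
        this.bits = new BitSet(numBits);
    }

    /**
     * Records the row and returns true if it MIGHT have been seen before.
     * False positives are possible; false negatives are not.
     */
    public boolean putAndCheck(List<String> row) {
        StringBuilder key = new StringBuilder();
        for (String cell : row) {
            key.append(cell == null ? "" : cell.trim()).append('\u001F');
        }
        String k = key.toString();
        // Double hashing: derive numHashes probe positions from two different 32-bit hashes.
        int h1 = k.hashCode();
        int h2 = fnv1a(k.getBytes(StandardCharsets.UTF_8));
        boolean seenBefore = true;
        for (int i = 0; i < numHashes; i++) {
            int index = Math.floorMod(h1 + i * h2, numBits);
            if (!bits.get(index)) {
                seenBefore = false;
                bits.set(index);
            }
        }
        return seenBefore;
    }

    /** 32-bit FNV-1a hash. */
    private static int fnv1a(byte[] data) {
        int hash = 0x811C9DC5;
        for (byte b : data) {
            hash ^= (b & 0xFF);
            hash *= 0x01000193;
        }
        return hash;
    }
}
```

If false positives are unacceptable, a row flagged by the filter could be confirmed with an exact check before the upload is rejected, at the cost of some of the memory savings.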

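Comparison of the above approaches

Approach | Pros | Cons
-------- | ---- | ----
1st | Simple to implement | Requires concatenating every row's cells; cells should be joined with a delimiter, otherwise different rows such as "ab","c" and "a","bc" yield the same string
2nd | No extra check; duplicates are detected while the tree is built | Needs ordered collections to maintain order (TreeMap and TreeSet sort their entries; LinkedHashMap and LinkedHashSet keep the CSV row order)
3rd | Memory-friendly for large datasets | Probabilistic, with a chance of false positives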





After the validations succeed, the data needs to be processed into the required format to make a POST call to update the hierarchy, which creates the ToC.