
Users can use the bulk upload service for various artifacts such as User, Organization, and Location. Currently, a user can access the results of an upload through the upload/status API, which returns all the uploaded records in JSON format. This can be difficult for the end user to consume. Hence it is ideal to upload these results to a cloud service, so that the user can download them from the BulkUpload status service.

Solution

Outcome

As part of story SB-8067, we are going to show all bulk upload processes to the user, and allow them to filter and retrieve all the bulk upload requests they have made through the Sunbird portal.

As part of this result, we will add one more column, which will provide the download link for the bulk upload results. The user can download the results as a CSV file from here.

It will also hold additional data, such as organisationId for Org bulk upload.

One column will be added to the current table, bulk_upload_process, which will store the download link.

The link should only be visible once the process is marked as completed.

Approach 1: Upload immediately

When the bulk upload process finishes in the background, we will trigger a call to upload the results to the cloud service.

Pros:

1. Results will be readily available to the client as soon as the upload process is completed.

Cons:

1. If the user never downloads the results, we will end up consuming cloud storage unnecessarily.

Approach 2: Upload on user action

When the bulk upload process finishes, we will just mark the status as completed, but the results will not be uploaded yet.

Now, when the user clicks for the first time, we will check in our database whether a URL has already been generated. If not, we will upload the results and generate the URL at that point.

On subsequent clicks, there are two scenarios (see the sketch after this list):

  1. The link might have expired: in this case we will regenerate and return a new URL.
  2. The link is still valid: we will simply return the stored URL, so no extra call is required.
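A minimal sketch of this on-demand flow in Java (all class and method names here are hypothetical, for illustration only):

// Hypothetical collaborators; not the actual Sunbird classes.
interface UploadRepository {
    String findStoredUrl(String processId);          // reads bulk_upload_process
    void saveUrl(String processId, String url);
}

interface CloudStore {
    String uploadAndSign(String processId);          // uploads the CSV, returns a signed URL
    String resign(String processId);                 // re-signs the URL for the existing blob
    boolean isExpired(String url);                   // checks the signed URL's expiry timestamp
}

class DownloadLinkService {
    private final UploadRepository repo;
    private final CloudStore store;

    DownloadLinkService(UploadRepository repo, CloudStore store) {
        this.repo = repo;
        this.store = store;
    }

    // Returns a valid download URL, uploading or re-signing only when needed.
    String getDownloadUrl(String processId) {
        String url = repo.findStoredUrl(processId);
        if (url == null) {                           // first click: upload now
            url = store.uploadAndSign(processId);
            repo.saveUrl(processId, url);
        } else if (store.isExpired(url)) {           // scenario 1: link expired, regenerate
            url = store.resign(processId);
            repo.saveUrl(processId, url);
        }                                            // scenario 2: still valid, no extra call
        return url;
    }
}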

Pros:

  • Cloud storage is consumed only when the user actually requests a download.

Cons:

  • The first download will be slower, as the upload is invoked only after the user initiates the action.

Recommended link expiration: 1 day

Applicability

The solution is applicable to all types of bulk upload results, i.e. Org, User, Location, etc.

User Outcome

The user will be able to download the results of a bulk upload process from the manage upload page (where the user can filter all the uploads they have done).

In the tabular results that track the status of all upload processes, there will be one additional column, Download Result, which will allow the user to download the results of an upload in CSV format.

This link will only be enabled/visible if the status of the bulk upload is marked as completed.

Using this URL, the user can download the results.

The naming convention for the downloaded file will be "<upload_type>_<process_Id>.csv", e.g. org_01201202.csv.

The CSV file will have user-friendly labels (as per the system configuration).


Approach 1: Upload & Store Generated URL (Link does not expire)

As soon as the bulk upload is finished, we will upload the file to cloud storage, and the generated URL will be stored in the bulk_upload_process table.

Once the upload is done, we will set the status of the upload to Completed.

Pros:

  1. Minimal information is stored in our database, and the uploaded data can be accessed directly from the stored URL.
  2. Simple to implement.

Cons:

  1. As this is a public URL without any expiry, anyone with access to the link can download this sensitive data.
  2. We will have to parse the stored URL in reverse to generate a new signed URL.
  3. We cannot switch between different providers, as we are storing provider-specific, hard-coded data.
  4. If the container URL changes, the stored link will break and we will not be able to handle it.

Approach 2: Upload & Store Data required to generate URL on the fly

As soon as the bulk upload is finished, we will upload the file to cloud storage, but we will not store the signed URL that is generated.

Instead, we will store enough data to generate the signed URL on the fly (see the sketch after the pros/cons below).


Pros:

  1. We can secure the URL and minimize the security risk of leaking sensitive data.
  2. We can use/switch between multiple providers, as we are storing only the minimal data needed to generate the URL.

Cons:

  1. We will have to store the data used for generating the links, as well as the provider type.
  2. The link will expire within a limited time-frame (as per configuration).
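As an illustration of generating the signed URL on the fly from the stored data, here is a sketch using the legacy Azure Storage Java SDK (com.microsoft.azure.storage, v8). The SDK choice and the blob naming are assumptions; the actual Sunbird cloud-storage utility may differ:

import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.blob.CloudBlobClient;
import com.microsoft.azure.storage.blob.CloudBlockBlob;
import com.microsoft.azure.storage.blob.SharedAccessBlobPermissions;
import com.microsoft.azure.storage.blob.SharedAccessBlobPolicy;
import java.util.Calendar;
import java.util.EnumSet;

public class SignedUrlGenerator {

    // Generates a read-only SAS URL from the stored container/objectId pair.
    public static String generate(String container, String objectId) throws Exception {
        // AZURE_ACCOUNT_NAME / AZURE_ACCOUNT_KEY as described in the algorithm section.
        String conn = String.format(
            "DefaultEndpointsProtocol=https;AccountName=%s;AccountKey=%s",
            System.getenv("AZURE_ACCOUNT_NAME"), System.getenv("AZURE_ACCOUNT_KEY"));
        CloudBlobClient client = CloudStorageAccount.parse(conn).createCloudBlobClient();
        // Blob name assumed to be "<objectId>.csv"; adjust to the real naming scheme.
        CloudBlockBlob blob = client.getContainerReference(container)
                                    .getBlockBlobReference(objectId + ".csv");
        SharedAccessBlobPolicy policy = new SharedAccessBlobPolicy();
        policy.setPermissions(EnumSet.of(SharedAccessBlobPermissions.READ));
        Calendar expiry = Calendar.getInstance();
        expiry.add(Calendar.DATE, 1);                // 1-day expiry, per the recommendation above
        policy.setSharedAccessExpiryTime(expiry.getTime());
        return blob.getUri() + "?" + blob.generateSharedAccessSignature(policy, null);
    }
}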


Technical Design


DB Changes

We will add one column, cloud_storage_data, to the Cassandra table bulk_upload_process, which will store the data captured at upload time. It will be JSON data in the following format:

{ "storageType" : "azure", "container": "User", "objectId": "101001100"}

The above data will be stored in encrypted form in the database, so the stored values cannot be deciphered by reading the table directly.
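For illustration, a minimal AES-GCM sketch of encrypting this JSON before it is written to Cassandra. The key source (an environment variable) and the class name are assumptions; the actual Sunbird encryption utility may work differently:

import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;

public class CloudStorageDataCodec {

    // Hypothetical key source; in practice the key would come from secure configuration.
    private static final byte[] KEY =
        Base64.getDecoder().decode(System.getenv("SUNBIRD_ENCRYPTION_KEY"));

    // Encrypts the cloud_storage_data JSON and returns a Base64 string for storage.
    public static String encrypt(String json) throws Exception {
        byte[] iv = new byte[12];                    // fresh random IV per record
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(KEY, "AES"),
                    new GCMParameterSpec(128, iv));
        byte[] cipherText = cipher.doFinal(json.getBytes(StandardCharsets.UTF_8));
        // Prepend the IV so decryption can recover it.
        byte[] out = new byte[iv.length + cipherText.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(cipherText, 0, out, iv.length, cipherText.length);
        return Base64.getEncoder().encodeToString(out);
    }
}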

Algorithm for Approach 1

  • Before marking the status of the bulk upload as completed, we will check the type; if it is Org, User, or Location, we will continue with the following process.

       Note: Currently only User/Org/Location will use this functionality, hence we will be enabling it only for these three types.

  • Create a temporary CSV file named <type>_<processid>.csv (see the CSV-building sketch after this list).
  • Read the configuration for user-friendly column names from system settings.
  • Read the first row from the results (success/failure) and map the columns to user-friendly names to write the header row.
  • Add one more column, Result, to mark each record as Success or Failure.
  • Append the result rows from the success results first, marking the last column as Success in each case.
  • Then append the result rows from the failure results, marking the last column as Failure in each case.
  • Invoke the utility that uploads the file to cloud storage based on storageType, container, and objectId, with a pre-configured timeout.
  • Environment variables are expected for connecting to the cloud storage. Currently only Azure will be supported, which requires two environment variables: AZURE_ACCOUNT_NAME and AZURE_ACCOUNT_KEY.
  • In our case the values will currently be:
    • storageType: "azure",
    • container: "org" or "user",
    • objectId: "user_101010" or "org_101110"
  • Once successful, store the generated URL in our DB column.
  • Delete the temporary file.
  • Mark the status of the upload as Completed.
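For illustration, a minimal sketch of the CSV-building step in Java. The class and parameter names are hypothetical, and a real implementation would also need to handle CSV escaping of commas and quotes:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class BulkUploadResultCsv {

    // Builds the results CSV (uploaded later as <type>_<processId>.csv).
    // friendlyLabels maps internal column keys to user-friendly names (insertion-ordered).
    public static Path write(String type, String processId,
                             Map<String, String> friendlyLabels,
                             List<Map<String, String>> successRows,
                             List<Map<String, String>> failureRows) throws IOException {
        Path csv = Files.createTempFile(type + "_" + processId, ".csv");
        List<String> lines = new ArrayList<>();
        // Header row: user-friendly labels plus the extra Result column.
        lines.add(String.join(",", friendlyLabels.values()) + ",Result");
        for (Map<String, String> row : successRows) {
            lines.add(toLine(row, friendlyLabels) + ",Success");
        }
        for (Map<String, String> row : failureRows) {
            lines.add(toLine(row, friendlyLabels) + ",Failure");
        }
        return Files.write(csv, lines);              // temp file is deleted after upload
    }

    private static String toLine(Map<String, String> row, Map<String, String> labels) {
        List<String> cells = new ArrayList<>();
        for (String key : labels.keySet()) {
            cells.add(row.getOrDefault(key, ""));
        }
        return String.join(",", cells);
    }
}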

Algorithm for Approach 2

  • Only the step after the upload changes; the rest is the same as Approach 1:
    • Once successful, encrypt the data used for uploading in the above step, as described in the DB Changes section, and store it in the cloud_storage_data column.

Public API exposed

GET upload/statusDownloadLink/processId

Output:

{
  "id": "api.upload.status",
  "ver": "v1",
  "ts": "2018-10-29 17:03:51:282+0530",
  "params": {
    "resmsgid": null,
    "msgid": "1e44fd1f-acad-4f37-a3c5-69928eec98ca",
    "err": null,
    "status": "success",
    "errmsg": null
  },
  "responseCode": "OK",
  "result": {
    "response": "signedUrl"
  }
}
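For illustration, a minimal Java client call to this API (the host, auth header, and processId are hypothetical):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class DownloadLinkClient {
    public static void main(String[] args) throws Exception {
        // Hypothetical host and processId, for illustration only.
        URL url = new URL("https://sunbird-host.example/api/upload/statusDownloadLink/01201202");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        // An auth token is assumed to be required, as for other Sunbird APIs.
        conn.setRequestProperty("Authorization", "Bearer <token>");
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);            // prints the JSON envelope shown above
            }
        }
    }
}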