Manage Learn - Add New CSP Storage - Implementation Changes & Testing
Context
This document details how to enable a new cloud service provider (CSP) for the Manage Learn capabilities, which are part of Sunbird Ed.
Manage Learn capabilities currently support Azure, AWS, OCI, and GCP.
ML Core Service interacts with cloud storage for upload and download operations. All other services (Survey, Projects, and Reports), as well as the mobile apps and portals, use the following ML Core Service APIs for their needs:
Get a Signed URL (To upload assets to the cloud)
Get Downloadable URL
In MongoDB, where all project and observation transactions are stored, only the relative path of the evidence assets uploaded by users is captured. Bucket and CSP details are provided to the ML Core service via config.
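Because only the relative path is persisted, the same stored record can resolve to a different provider simply by changing config. A minimal sketch of that idea follows; the `StorageConfig` type, the `object_location` helper, the sample path, and the `provider://bucket/path` notation are hypothetical illustrations, not part of ml-core-service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageConfig:
    """Hypothetical stand-in for the CSP and bucket details supplied via config."""
    provider: str
    bucket: str


def object_location(config: StorageConfig, relative_path: str) -> str:
    """Combine config with the stored relative path (illustrative notation only)."""
    return f"{config.provider}://{config.bucket}/{relative_path.lstrip('/')}"


# The stored record never changes; only the config does.
stored_path = "survey/evidence/photo.jpg"  # made-up example path
print(object_location(StorageConfig("azure", "ml-evidences"), stored_path))
print(object_location(StorageConfig("digitalocean", "ml-evidences"), stored_path))
```

The design benefit is that migrating to a new CSP does not require rewriting existing database records, only moving the objects and updating config.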
To add support for another cloud storage provider (e.g. DigitalOcean), follow the steps below:
ml-core-service:
Git Repo: https://github.com/project-sunbird/ml-core-service/tree/release-5.1.0
Latest branch: release-5.1.0
Step 1 - Introduce the necessary environment configuration based on the env sample (refer to https://github.com/project-sunbird/ml-core-service/blob/release-5.1.0/.env.sample).
Step 2 - Define the environment key for the bucket name used by the preSignedUrls and getDownloadableUrl functions in https://github.com/project-sunbird/ml-core-service/blob/release-5.1.0/module/cloud-services/files/helper.js
Step 3 - Modify the functions in https://github.com/project-sunbird/ml-core-service/blob/release-5.1.0/module/files/helper.js to enable the new cloud provider.
Step 4 - Add support for the new cloud provider in the module at https://github.com/project-sunbird/ml-core-service/tree/release-5.1.0/module/cloud-services
Note - Add the new library to package.json and keep the same function signatures in the new module files.
Step 5 - Override the value of the variable below in the private devops repo (file path: ansible/inventory/<env_name>/Core/common.yml) for the new storage account:
ml_cloud_config
Step 6 - After the configuration change, deploy the service.
Step 7 - Test the two APIs listed in the Context section.
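When testing the two APIs, one quick sanity check is that each returned URL is a well-formed HTTPS URL whose path ends with the relative file path that was requested. The sketch below shows one way to do that check; the helper name and the sample URL are hypothetical, and nothing is assumed here about the actual request or response shape of the APIs.

```python
from urllib.parse import unquote, urlparse


def looks_like_url_for(url: str, relative_path: str) -> bool:
    """Return True if url is an HTTPS URL whose path ends with relative_path."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.netloc:
        return False
    # Signed URLs may percent-encode the object key, so decode before comparing.
    return unquote(parsed.path).endswith("/" + relative_path.lstrip("/"))


# Made-up URL in the general shape of a signed URL (query string carries the signature).
sample = "https://example-bucket.example.com/survey/evidence/photo.jpg?signature=abc"
print(looks_like_url_for(sample, "survey/evidence/photo.jpg"))
```

A check like this only confirms the URL shape; uploading to the signed URL and downloading from the downloadable URL is still needed to confirm the new provider works end to end.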
Adding New Cloud Libraries for data-pipeline
This document details the integration points for adding a new CSP to the ml-analytics platform.
ml-analytics release-5.1.0, latest as of December 2022
A few points to note:
ml-analytics-service has integrated code for Azure, AWS, Oracle, and GCP.
ml-analytics-service uses cloud storage to hold pre-processed data before it is ingested into Druid datasources.
To add support for another cloud storage provider in the ml-analytics components, follow the steps below:
Git repository:
Latest Branch: release-5.1.0
Changes to the cloud_storage folder:
Step 1 - Add a new blank file named after the cloud service provider (e.g. ms_azure.py, oracle.py). Make sure the file is created inside the cloud_storage folder, at the same level as the existing files such as ms_azure.py and gcp.py.
Step 2 - Import the relevant libraries needed to interact with the new cloud provider.
Step 3 - Define a Python class.
Step 4 - In the initialization method (__init__), construct and initialize the variables that identify:
Cloud account ID
Cloud account key
Cloud account storage container/blob
Any other variables that need to be initialized to interact with the cloud storage (e.g. token, type)
Step 5 - Create a method on the class named upload_files that takes three arguments:
bucketPath - Accepts the cloud account storage container/blob
localPath - Accepts the local path where the file is generated
fileName - Accepts the name of the file
Step 6 - Inside upload_files, call the upload function from the library.
An example of the code is given below:
### ----- Step 2 ----- ###
import os
import boto3

### ---- Step 3 ---- ###
class Oracle:
    '''
    Class to initiate and upload data in Oracle
    '''
    ### ---- Step 4 ---- ###
    def __init__(self, regionName, accessKey, secretAccessKey, endpoint_url, bucketName):
        # Oracle storage is reached here through an S3-compatible boto3 client
        self.oracle = boto3.client(
            service_name = 's3',
            region_name = regionName,
            aws_access_key_id = accessKey,
            aws_secret_access_key = secretAccessKey,
            endpoint_url = endpoint_url
        )
        self.bucket = bucketName

    ### ---- Step 5 ---- ###
    def upload_files(self, bucketPath, localPath, fileName):
        ### ---- Step 6 ---- ###
        # Open the local file in binary mode and upload it under bucketPath
        with open(os.path.join(localPath, fileName), "rb") as file:
            self.oracle.upload_fileobj(file, self.bucket, f"{bucketPath}/{fileName}")
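Following the same pattern as the Oracle example, a class for a new provider could look like the sketch below. It uses DigitalOcean Spaces as the example and assumes access through its S3-compatible endpoint via boto3; the class name, the constructor parameters, and the optional `client` argument (added only so a stub can be injected for testing) are hypothetical and not part of ml-analytics-service.

```python
import os


class DigitalOcean:
    """Hypothetical provider class with the same shape as the Oracle example."""

    def __init__(self, regionName, accessKey, secretAccessKey, bucketName, client=None):
        if client is None:
            # Imported lazily so a stub client can be injected without boto3 installed.
            import boto3
            client = boto3.client(
                service_name="s3",
                region_name=regionName,
                aws_access_key_id=accessKey,
                aws_secret_access_key=secretAccessKey,
                # Assumption: Spaces is addressed through a per-region S3-compatible endpoint.
                endpoint_url=f"https://{regionName}.digitaloceanspaces.com",
            )
        self.client = client
        self.bucket = bucketName

    def upload_files(self, bucketPath, localPath, fileName):
        # Same three arguments as the other provider classes.
        with open(os.path.join(localPath, fileName), "rb") as file:
            self.client.upload_fileobj(file, self.bucket, f"{bucketPath}/{fileName}")
```

Keeping the `upload_files(bucketPath, localPath, fileName)` signature identical across provider classes is what lets the calling code switch providers without other changes.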
Changes to the cloud.py file
Import the library you created into the cloud.py file.
Inside the MultiCloud class, look for the upload_to_cloud method.
Add an elif statement that refers to the newly created cloud library:
Initialize the cloud library by passing in the necessary parameters.
Call the upload_files method and pass in the values below:
<<name_of_service>>_service.upload_files(bucketPath = blob_Path, localPath = local_Path, fileName = file_Name)
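The elif branch described above might look like the following sketch. The real MultiCloud class has its own setup and existing provider branches; here the provider name, the `settings` dictionary, the constructor signature, and the `DigitalOcean` stub are hypothetical stand-ins so the example runs on its own.

```python
class DigitalOcean:
    """Stub standing in for the new library added under cloud_storage (hypothetical)."""

    def __init__(self, bucketName):
        self.bucket = bucketName
        self.uploaded = []

    def upload_files(self, bucketPath, localPath, fileName):
        # A real implementation would call the provider SDK here.
        self.uploaded.append((bucketPath, localPath, fileName))


class MultiCloud:
    """Simplified sketch; the actual class in cloud.py has more branches and setup."""

    def __init__(self, provider, settings):
        self.provider = provider
        self.settings = settings

    def upload_to_cloud(self, blob_Path, local_Path, file_Name):
        if self.provider == "some_existing_provider":
            raise NotImplementedError("existing branches omitted from this sketch")
        elif self.provider == "digitalocean":
            # Initialize the new cloud library with the parameters it needs.
            digitalocean_service = DigitalOcean(bucketName=self.settings["bucket_name"])
            digitalocean_service.upload_files(
                bucketPath=blob_Path, localPath=local_Path, fileName=file_Name
            )
            # Returned only so the sketch can be inspected; not required by the pattern.
            return digitalocean_service
        else:
            raise ValueError(f"Unsupported cloud provider: {self.provider}")
```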
Changes to the config.sample file
For the config.sample file, append a section for the new cloud provider's configuration variables.
After adding these changes, this repository needs to be updated with the relevant values of the added configuration variables.
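As an illustration, a new section in an INI-style config could be read with Python's configparser as sketched below. This assumes config.sample follows INI syntax; the section name, the keys, and the placeholder values are hypothetical and should be replaced with whatever the new cloud library's __init__ expects.

```python
from configparser import ConfigParser

# Hypothetical section for a new provider; placeholders stand in for real values.
SAMPLE_SECTION = """
[DIGITALOCEAN]
region_name = <region>
access_key = <access-key>
secret_access_key = <secret-access-key>
bucket_name = <bucket-name>
"""

config = ConfigParser()
config.read_string(SAMPLE_SECTION)

# Collect the section into a plain dict that can be passed to the provider class.
params = dict(config["DIGITALOCEAN"])
print(sorted(params))
```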