Context
This document details how to enable a new cloud service provider (CSP) for the Manage Learn capabilities, which are part of Sunbird Ed.
...
Step 1 - Introduce the necessary environment configuration based on the env sample (refer to https://github.com/project-sunbird/ml-core-service/blob/release-5.1.0/.env.sample).
...
Step 7 - Test the two APIs mentioned in the Context section.
...
Adding New Cloud Libraries for data-pipeline
This document details the integration points for adding a new CSP to the ml-analytics platform.
Info: ml-analytics release-5.1.0 is the latest release as of December 2022.
A few points to note:
- ml-analytics-service has integrated code for Azure, AWS, Oracle and GCP.
- ml-analytics-service uses cloud storage to hold pre-processed data before it is ingested into Druid datasources.

To add support for any other cloud storage provider in the ml-analytics components, follow the steps below:
Git repository:
...
Latest Branch: release-5.1.0
Changes to the cloud_storage folder:
1. Add a new blank file named after the cloud service provider (e.g. ms_azure.py, oracle.py). Make sure the file is created inside the cloud_storage folder, at the same level as the existing files such as ms_azure.py and gcp.py.
2. Import the relevant libraries needed to interact with the new cloud provider.
3. Define a Python class.
4. In the initialization method (__init__), construct and initialize the variables that identify:
   - Cloud account ID
   - Cloud account key
   - Cloud account storage container/blob
   - Any other variable that needs to be initialized to interact with the cloud storage (e.g. token, type)
5. Create a method on the class named upload_files that takes three (3) arguments:
   - bucketPath: the cloud account storage container/blob path
   - localPath: the local path where the file is generated
   - fileName: the name of the file
6. Inside upload_files, call the upload function from the library.
An example of the code is given below:

```python
### ---- Step 2 ---- ###
import os
import boto3

### ---- Step 3 ---- ###
class Oracle:
    '''
    Class to initiate and upload data in Oracle
    '''
    ### ---- Step 4 ---- ###
    def __init__(self, regionName, accessKey, secretAccessKey, endpoint_url, bucketName):
        # boto3 S3 client pointed at the provider's endpoint_url
        self.oracle = boto3.client(
            service_name = 's3',
            region_name = regionName,
            aws_access_key_id = accessKey,
            aws_secret_access_key = secretAccessKey,
            endpoint_url = endpoint_url
        )
        self.bucket = bucketName

    ### ---- Step 5 ---- ###
    def upload_files(self, bucketPath, localPath, fileName):
        ### ---- Step 6 ---- ###
        # Open the local file in binary mode and upload it under bucketPath/fileName
        with open(os.path.join(localPath, fileName), "rb") as file:
            self.oracle.upload_fileobj(file, self.bucket, f"{bucketPath}/{fileName}")
```
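To illustrate the six steps without depending on a real cloud account, the sketch below defines a hypothetical provider, LocalFolder, that "uploads" by copying files into a directory. The class name, the rootDir parameter, the demo_upload helper and the use of shutil are assumptions made for illustration and are not part of ml-analytics-service; only the upload_files(bucketPath, localPath, fileName) signature comes from the steps above.

```python
import os
import shutil
import tempfile


class LocalFolder:
    '''
    Hypothetical stand-in provider: copies files into a local directory
    instead of calling a real cloud SDK.
    '''

    def __init__(self, rootDir):
        # rootDir plays the role of the storage container/bucket (assumption)
        self.root = rootDir

    def upload_files(self, bucketPath, localPath, fileName):
        # Same three arguments as the other provider classes
        target_dir = os.path.join(self.root, bucketPath)
        os.makedirs(target_dir, exist_ok=True)
        target = os.path.join(target_dir, fileName)
        shutil.copyfile(os.path.join(localPath, fileName), target)
        return target


def demo_upload():
    # Create a sample file, "upload" it, and return the uploaded content
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        with open(os.path.join(src, "report.json"), "w") as f:
            f.write('{"rows": 3}')
        service = LocalFolder(rootDir=dst)
        uploaded = service.upload_files(
            bucketPath="projects/2022", localPath=src, fileName="report.json"
        )
        with open(uploaded) as f:
            return f.read()
```

A stand-in like this may be useful for exercising the calling code locally before real credentials for the new provider are available.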
Changes to the cloud.py file
1. Import the library you created into the cloud.py file.
2. Inside the MultiCloud class, look for the upload_to_cloud method.
3. Add an elif statement that refers to the newly created cloud library:
   - Initialize the cloud library by passing in the necessary parameters.
   - Call the upload_files method and pass in the values below:

```python
<<name_of_service>>_service.upload_files(bucketPath = blob_Path, localPath = local_Path, fileName = file_Name)
```
An example of the code is shown below:

```python
### ---- Step 1 ---- ###
from oracle import Oracle
...
### ---- Step 2 & 3 ---- ###
# This branch goes inside the upload_to_cloud method of the MultiCloud class
elif elements == "ORACLE":
    oracle_service = Oracle(
        regionName = config.get("ORACLE", "region_name"),
        accessKey = config.get("ORACLE", "access_key"),
        secretAccessKey = config.get("ORACLE", "secret_access_key"),
        endpoint_url = config.get("ORACLE", "endpoint_url"),
        bucketName = config.get("ORACLE", "bucket_name")
    )
    oracle_service.upload_files(bucketPath = blob_Path, localPath = local_Path, fileName = file_Name)
```
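The elif chain can be pictured with the self-contained sketch below. It is not the actual MultiCloud implementation: the MultiCloudSketch and FakeProvider classes, the providers list and the return value are hypothetical, and only the names upload_to_cloud, elements, blob_Path, local_Path and file_Name are taken from the snippets in this section.

```python
class FakeProvider:
    '''Hypothetical provider that records calls instead of uploading.'''

    def __init__(self, name):
        self.name = name
        self.calls = []

    def upload_files(self, bucketPath, localPath, fileName):
        self.calls.append((bucketPath, localPath, fileName))


class MultiCloudSketch:
    '''Illustrative dispatcher; the real MultiCloud class may differ.'''

    def __init__(self, providers):
        # providers: names of the enabled clouds, e.g. ["ORACLE"] (assumption)
        self.providers = providers

    def upload_to_cloud(self, blob_Path, local_Path, file_Name):
        used = []
        for elements in self.providers:
            if elements == "AZURE":
                service = FakeProvider("AZURE")
            elif elements == "ORACLE":
                # A new provider gets its own elif branch like this one
                service = FakeProvider("ORACLE")
            else:
                raise ValueError(f"Unsupported cloud provider: {elements}")
            service.upload_files(
                bucketPath=blob_Path, localPath=local_Path, fileName=file_Name
            )
            used.append(service)
        return used
```

Raising in the final else branch is a design choice of this sketch: it makes a provider name that has no matching elif visible immediately instead of silently skipping the upload.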
Changes to the config.sample file
In the config.sample file, append a section for the new cloud provider.

An example is provided below:

```ini
[ORACLE]
endpoint_url = {{ ml_ORACLE_endpoint_url }}
access_key = {{ ml_ORACLE_access_key }}
secret_access_key = {{ ml_ORACLE_secret_access_key }}
region_name = {{ ml_ORACLE_region_name }}
bucket_name = {{ ml_ORACLE_bucket_name }}
```
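Once the template placeholders are rendered, the section can be read with config.get, as in the cloud.py snippet. The sketch below assumes that config is a standard-library configparser.ConfigParser (the source does not state this) and uses made-up values in place of the real ones; load_oracle_settings and REQUIRED_KEYS are hypothetical helper names.

```python
import configparser

# Made-up values standing in for the rendered {{ ml_ORACLE_* }} placeholders
RENDERED_CONFIG = """
[ORACLE]
endpoint_url = https://objectstorage.example.invalid
access_key = EXAMPLE_ACCESS_KEY
secret_access_key = EXAMPLE_SECRET
region_name = example-region-1
bucket_name = example-bucket
"""

REQUIRED_KEYS = [
    "endpoint_url",
    "access_key",
    "secret_access_key",
    "region_name",
    "bucket_name",
]


def load_oracle_settings(text):
    config = configparser.ConfigParser()
    config.read_string(text)
    # Fail early if a key that the Oracle class needs is missing
    missing = [k for k in REQUIRED_KEYS if not config.has_option("ORACLE", k)]
    if missing:
        raise KeyError(f"Missing ORACLE settings: {missing}")
    return {k: config.get("ORACLE", k) for k in REQUIRED_KEYS}


settings = load_oracle_settings(RENDERED_CONFIG)
```

Checking for missing keys up front is optional, but it may surface an incomplete configuration before the first upload is attempted.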
After adding these changes, this repository needs to be updated with the relevant values for the added configuration variables.