Context
This document details how to enable any new CSP provider for the Manage Learn capabilities, which are part of Sunbird Ed.
...
Manage Learn capabilities currently support Azure, AWS, OCI, and GCP
ML Core Service interacts with cloud storage for upload/download operations, and all other services (Survey, Projects, and Reports), as well as the mobile app and portal, use these APIs for their needs:
Get a Signed URL (To upload assets to the cloud)
Get Downloadable URL
In MongoDB, where all project and observation transactions are stored, only the relative path of the evidence assets uploaded by users is captured. Bucket and CSP details are provided to the ML Core service via config.
To add support for any other cloud storage provider (e.g. DigitalOcean), the steps below need to be followed:
ml-core-service:
Git Repos:
...
https://github.com/project-sunbird/ml-core-service/tree/release-5.1.0
Latest branch: release-5.21.0
...
The service needs only a configuration change: it maintains the relative path in the database on write operations and returns the absolute path for cloud-related metadata on read operations.
e.g:
...
Step 1 - Introduce the necessary environment configuration based on the env sample below (refer to https://
...
In the above example, the base URL of the storage account and the bucket name are replaced with the string value "CONTENT_STORAGE_BASE_PATH", configured in the cloudstorage_relative_path_prefix_content variable.
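This substitution can be sketched in a few lines of Python; the base URL and placeholder values here are illustrative, not the actual service configuration:

```python
# Illustrative sketch of the relative/absolute path substitution described
# above. CLOUD_BASE_PATH and the placeholder value are hypothetical.
CLOUD_BASE_PATH = "https://storage.example.com/my-bucket"
PLACEHOLDER = "CONTENT_STORAGE_BASE_PATH"

def to_relative(absolute_url: str) -> str:
    # Write path: strip the CSP-specific base URL before storing in the DB.
    return absolute_url.replace(CLOUD_BASE_PATH, PLACEHOLDER)

def to_absolute(relative_path: str) -> str:
    # Read path: expand the placeholder back using the configured base URL.
    return relative_path.replace(PLACEHOLDER, CLOUD_BASE_PATH)

stored = to_relative("https://storage.example.com/my-bucket/evidence/photo.png")
print(stored)  # CONTENT_STORAGE_BASE_PATH/evidence/photo.png
print(to_absolute(stored))
```

Because only the placeholder is stored, switching CSPs later is a configuration change rather than a data migration.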
...
Override the values of the variables below in the private devops repo (file path: ansible/inventory/<env_name>/Core/common.yml) for the new storage account:
cloud_storage_content_bucketname
cloudstorage_replace_absolute_path
cloudstorage_relative_path_prefix
cloudstorage_base_path
valid_cloudstorage_base_urls
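A hypothetical override block in ansible/inventory/<env_name>/Core/common.yml might look like the following; the bucket name and URLs are placeholders, not real values:

```yaml
# Hypothetical values for a new storage account (placeholders only)
cloud_storage_content_bucketname: "my-new-bucket"
cloudstorage_replace_absolute_path: true
cloudstorage_relative_path_prefix: "CONTENT_STORAGE_BASE_PATH"
cloudstorage_base_path: "https://objectstorage.example.com"
valid_cloudstorage_base_urls: '["https://objectstorage.example.com"]'
```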
Configuration File Reference:
https://github.com/project-sunbird/sunbird-devops/blob/b61a35fad0362ea7eb0bb688ff0bc12ffc811571/ansible/roles/stack-sunbird/templates/content-service_application.conf#L484
After the configuration change, deploy the service.
Test the Content Create & Read APIs with some metadata having a cloud path (e.g. appIcon).
...
Knowledge-Platform repo:
StorageService.scala → integration point with the org.sunbird.cloud.storage library. This Sunbird cloud storage SDK has to support the new CSP provider.
Sunbird cloud-store-sdk version upgrade location:
https://github.com/project-sunbird/knowledge-platform/blob/release-5.2.0/platform-modules/mimetype-manager/pom.xml
https://github.com/project-sunbird/knowledge-platform/blob/release-5.2.0/platform-modules/mimetype-manager/pom.xml
```xml
<dependency>
  <groupId>org.sunbird</groupId>
  <artifactId>cloud-store-sdk</artifactId>
  <version>1.4.3</version>
</dependency>
```
flink jobs:
Git Repo:
https://github.com/project-sunbird/ml-core-service/blob/release-5.1.0/.env.sample
Step 2 - Define the environment key for the bucket name used by the preSignedUrls and getDownloadableUrl functions here - https://github.com/project-sunbird/ml-core-service/blob/release-5.1.0/module/cloud-services/files/helper.js
...
Both jobs use the cloud-storage-sdk for cloud storage operations, so the SDK first needs a code change to support the new cloud storage provider (e.g. OCI).
Step 3 - Modify the functions in https://github.com/project-sunbird/ml-core-service/blob/release-5.1.0/module/files/helper.js to enable the new cloud provider.
Step 4 - Add support for the new cloud provider in the module here - https://github.com/project-sunbird/ml-core-service/tree/release-5.1.0/module/cloud-services
Note - Include the new library via package.json and keep the same function signatures in the new module files added.
Step 5 - Override the values of the variables below in the private devops repo (file path: ansible/inventory/<env_name>/Core/common.yml) for the new storage account:
ml_cloud_
...
Configuration File Reference:
https://github.com/project-sunbird/sunbird-learning-platform/blob/59a59270b5419153b902b3d68165a8b5539f872e/kubernetes/helm_charts/datapipeline_jobs/values.j2#L731
cloudstorage_replace_absolute_path
cloudstorage_relative_path_prefix
cloudstorage_base_path
valid_cloudstorage_base_urls
Configuration File Reference:
https://github.com/project-sunbird/sunbird-learning-platform/blob/59a59270b5419153b902b3d68165a8b5539f872e/kubernetes/helm_charts/datapipeline_jobs/values.j2#L736
Build & deploy both jobs.
Test the Content/Collection publish workflow:
The Content/Collection should be published successfully.
Metadata having a cloud storage file reference should be accessible.
e.g. a downloadUrl pointing to a file should be downloadable.
Media service integration (video-streaming)
...
config
Step 6 - After the configuration change, deploy the service.
Step 7 - Test the two APIs mentioned in the Context section.
...
Adding New Cloud Libraries for data-pipeline
This document details the integration points for any new CSP provider with the ml-analytics platform.
Info: ml-analytics release-5.1.0 is the latest as of Dec 2022.
A few points to note:
- ml-analytics-service has integrated code for Azure, AWS, Oracle, and GCP.
- ml-analytics-service uses cloud storage to push pre-processed data before it is ingested into Druid datasources.
To add support for any other cloud storage provider in the ml-analytics components, the steps below need to be followed:
Git repository:
...
Latest Branch: release-5.1.0
Changes to the cloud_storage folder:
- Add a new file named after the cloud service provider (e.g. oracle.py). Make sure the file is created inside the cloud_storage folder, at the same hierarchy as the other files such as ms_azure.py and gcp.py.
- Import the relevant libraries needed to interact with the new cloud provider.
- Define a Python class.
- Under the initialization method (__init__), construct and initialize the variables that identify:
  - the cloud account ID
  - the cloud account key
  - the cloud account storage container/blob
  - any other variable needed to interact with the cloud storage (e.g. token, type)
- Create a class method named upload_files that takes three (3) arguments:
  - bucketPath - accepts the cloud account storage container/blob
  - localPath - accepts the local path where the file is generated
  - fileName - accepts the name of the file
- Then call the upload function from the library.
An example of the code is given below:
```python
### ----- Step 2 ----- ###
import os
import boto3

### ---- Step 3 ---- ###
class Oracle:
    '''
    Class to initiate and upload data to Oracle
    '''
    ### ---- Step 4 ---- ###
    def __init__(self, regionName, accessKey, secretAccessKey, endpoint_url, bucketName):
        # Oracle Object Storage exposes an S3-compatible API, so boto3 is used
        self.oracle = boto3.client(
            service_name = 's3',
            region_name = regionName,
            aws_access_key_id = accessKey,
            aws_secret_access_key = secretAccessKey,
            endpoint_url = endpoint_url
        )
        self.bucket = bucketName

    ### ---- Step 5 ---- ###
    def upload_files(self, bucketPath, localPath, fileName):
        ### ---- Step 6 ---- ###
        with open(f"{localPath}/{fileName}", "rb") as file:
            self.oracle.upload_fileobj(file, self.bucket, f"{bucketPath}/{fileName}")
```
Changes to the cloud.py file:
- Import the library you created into the cloud.py file.
- Inside the MultiCloud class, look for the upload_to_cloud method.
- Add an elif statement that refers to the newly created cloud library:
  - Initialize the cloud library by passing in the necessary parameters.
  - Call the upload_files method and pass in the values below:

```python
<<name_of_service>>_service.upload_files(bucketPath = blob_Path, localPath = local_Path, fileName = file_Name)
```
An example of the code is shown below:
```python
### ---- Step 1 ---- ###
from oracle import Oracle
...

### ---- Step 2 & 3 ---- ###
elif elements == "ORACLE":
    oracle_service = Oracle(
        regionName = config.get("ORACLE", "region_name"),
        accessKey = config.get("ORACLE", "access_key"),
        secretAccessKey = config.get("ORACLE", "secret_access_key"),
        endpoint_url = config.get("ORACLE", "endpoint_url"),
        bucketName = config.get("ORACLE", "bucket_name")
    )
    oracle_service.upload_files(bucketPath = blob_Path, localPath = local_Path, fileName = file_Name)
```
Changes to the config.sample file:
For the config.sample file, append a new section to the config file.
An example is provided below:
```ini
[ORACLE]
endpoint_url = {{ ml_ORACLE_endpoint_url }}
access_key = {{ ml_ORACLE_access_key }}
secret_access_key = {{ ml_ORACLE_secret_access_key }}
region_name = {{ ml_ORACLE_region_name }}
bucket_name = {{ ml_ORACLE_bucket_name }}
```
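As a quick sanity check, the appended section can be read back with Python's configparser, mirroring the config.get(...) lookups shown in the cloud.py example; the values below are placeholders standing in for the ansible-templated ones:

```python
import configparser

# Minimal sketch: parse an [ORACLE] section like the one appended to
# config.sample. Placeholder values stand in for the templated ones.
sample = """
[ORACLE]
endpoint_url = https://objectstorage.example.com
access_key = my-access-key
secret_access_key = my-secret-key
region_name = ap-mumbai-1
bucket_name = ml-analytics-bucket
"""

config = configparser.ConfigParser()
config.read_string(sample)

# The same lookups cloud.py performs when initializing the provider class.
print(config.get("ORACLE", "region_name"))   # ap-mumbai-1
print(config.get("ORACLE", "bucket_name"))   # ml-analytics-bucket
```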
After adding these changes, this repository needs to be updated with the relevant values for the added configuration variables.