Context
This document details how to enable any new CSP provider for the Manage Learn capabilities, which are part of Sunbird Ed.
...
Manage Learn capabilities currently support Azure, AWS, OCI, and GCP
ML Core Service interacts with cloud storage for upload/download operations, and all other services (Survey, Projects, and Reports), as well as the mobile app and portal, use these APIs for their needs:
Get a Signed URL (To upload assets to the cloud)
Get Downloadable URL
In MongoDB, where all project and observation transactions are stored, only the relative path of the evidence assets uploaded by users is captured. Bucket and CSP details are provided to the ML Core service via config.
To add support for any other cloud storage provider (e.g. DigitalOcean), the steps below need to be followed:
ml-core-service:
Git Repos:
...
https://github.com/project-sunbird/ml-core-service/tree/release-5.1.0
Latest branch: release-5.21.0
...
The service needs only a configuration change: it maintains the relative path in the database on write operations and returns the absolute path for cloud-related metadata on read operations.
e.g:
...
Step 1 - Introduce the necessary environment configuration based on the env sample below (refer to https://
...
In the above example, the base URL of the storage account and the bucket name are replaced with the string value "CONTENT_STORAGE_BASE_PATH", configured in the cloudstorage_relative_path_prefix_content variable.
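This substitution can be sketched in a few lines of Python; the base URL and placeholder values here are illustrative, not the actual service configuration:

```python
# Illustrative sketch of the relative/absolute path substitution described
# above. CLOUD_BASE_PATH and the placeholder value are hypothetical.
CLOUD_BASE_PATH = "https://storage.example.com/my-bucket"
PLACEHOLDER = "CONTENT_STORAGE_BASE_PATH"

def to_relative(absolute_url: str) -> str:
    # Write path: strip the CSP-specific base URL before storing in the DB.
    return absolute_url.replace(CLOUD_BASE_PATH, PLACEHOLDER)

def to_absolute(relative_path: str) -> str:
    # Read path: expand the placeholder back using the configured base URL.
    return relative_path.replace(PLACEHOLDER, CLOUD_BASE_PATH)

stored = to_relative("https://storage.example.com/my-bucket/evidence/photo.png")
print(stored)  # CONTENT_STORAGE_BASE_PATH/evidence/photo.png
print(to_absolute(stored))
```

Because only the placeholder is stored, switching CSPs later is a configuration change rather than a data migration.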
...
Override the values of the variables below in the private devops repo (file path: ansible/inventory/<env_name>/Core/common.yml) for the new storage account:
cloud_storage_content_bucketname
cloudstorage_replace_absolute_path
cloudstorage_relative_path_prefix
cloudstorage_base_path
valid_cloudstorage_base_urls
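A hypothetical override block in ansible/inventory/<env_name>/Core/common.yml might look like the following; the bucket name and URLs are placeholders, not real values:

```yaml
# Hypothetical values for a new storage account (placeholders only)
cloud_storage_content_bucketname: "my-new-bucket"
cloudstorage_replace_absolute_path: true
cloudstorage_relative_path_prefix: "CONTENT_STORAGE_BASE_PATH"
cloudstorage_base_path: "https://objectstorage.example.com"
valid_cloudstorage_base_urls: '["https://objectstorage.example.com"]'
```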
Configuration File Reference:
https://github.com/project-sunbird/sunbird-devops/blob/b61a35fad0362ea7eb0bb688ff0bc12ffc811571/ansible/roles/stack-sunbird/templates/content-service_application.conf#L484
After the configuration change, deploy the service.
Test the Content Create & Read APIs with some metadata having a cloud path (e.g. appIcon).
...
Knowledge-Platform repo:
StorageService.scala → integration point with the org.sunbird.cloud.storage library. This Sunbird cloud storage SDK has to support the new CSP provider.
Sunbird cloud-store-sdk version upgrade location:
https://github.com/project-sunbird/knowledge-platform/blob/release-5.2.0/platform-modules/mimetype-manager/pom.xml
https://github.com/project-sunbird/knowledge-platform/blob/release-5.2.0/platform-modules/mimetype-manager/pom.xml
```xml
<dependency>
  <groupId>org.sunbird</groupId>
  <artifactId>cloud-store-sdk</artifactId>
  <version>1.4.3</version>
</dependency>
```
flink jobs:
Git Repo:
https://github.com/project-sunbird/ml-core-service/blob/release-5.1.0/.env.sample
Step 2 - Define the environment key for the bucket name used by the preSignedUrls and getDownloadableUrl functions here - https://github.com/project-sunbird/ml-core-service/blob/release-5.1.0/module/cloud-services/files/helper.js
...
Both jobs use the cloud-storage-sdk for cloud storage operations, so the SDK first needs a code change to support the new cloud storage provider (e.g. OCI).
Step 3 - Modify the functions in https://github.com/project-sunbird/ml-core-service/blob/release-5.1.0/module/files/helper.js to enable the new cloud provider.
Step 4 - Add support for the new cloud provider in the module here - https://github.com/project-sunbird/ml-core-service/tree/release-5.1.0/module/cloud-services
Note - Include the new library via package.json and keep the same function signatures in the new module files added.
Step 5 - Override the values of the variables below in the private devops repo (file path: ansible/inventory/<env_name>/Core/common.yml) for the new storage account:
ml_cloud_
...
Configuration File Reference:
https://github.com/project-sunbird/sunbird-learning-platform/blob/59a59270b5419153b902b3d68165a8b5539f872e/kubernetes/helm_charts/datapipeline_jobs/values.j2#L731
cloudstorage_replace_absolute_path
cloudstorage_relative_path_prefix
cloudstorage_base_path
valid_cloudstorage_base_urls
Configuration File Reference:
https://github.com/project-sunbird/sunbird-learning-platform/blob/59a59270b5419153b902b3d68165a8b5539f872e/kubernetes/helm_charts/datapipeline_jobs/values.j2#L736
Build & deploy both jobs.
Test the Content/Collection publish workflow:
The Content/Collection should be published successfully.
Metadata having a cloud storage file reference should be accessible.
e.g. a downloadUrl pointing to a file should be downloadable.
Media service integration (video-streaming)
...
config
Step 6 - After the configuration change, deploy the service.
Step 7 - Test the two APIs mentioned in the Context section.
...
Adding New Cloud Libraries for data-pipeline
This document details the integration points for any new CSP provider with the ml-analytics platform.
Info: ml-analytics release-5.1.0 is the latest as of Dec 2022.
A few points to note:
- ml-analytics-service has integrated code for Azure, AWS, Oracle, and GCP.
- ml-analytics-service uses cloud storage to push pre-processed data before it is ingested into Druid datasources.
To add support for any other cloud storage provider in the ml-analytics components, the steps below need to be followed:
Git repository:
...
Latest Branch: release-5.1.0
Changes to the cloud_storage folder:
- Add a new file named after the cloud service provider (e.g. oracle.py). Make sure the file is created inside the cloud_storage folder, at the same hierarchy as the other files such as ms_azure.py and gcp.py.
- Import the relevant libraries needed to interact with the new cloud provider.
- Define a Python class.
- Under the initialization method (__init__), construct and initialize the variables that identify:
  - the cloud account ID
  - the cloud account key
  - the cloud account storage container/blob
  - any other variable needed to interact with the cloud storage (e.g. token, type)
- Create a class method named upload_files that takes three (3) arguments:
  - bucketPath - accepts the cloud account storage container/blob
  - localPath - accepts the local path where the file is generated
  - fileName - accepts the name of the file
- Then call the upload function from the library.
An example of the code is given below:
```python
### ----- Step 2 ----- ###
import os
import boto3

### ---- Step 3 ---- ###
class Oracle:
    '''
    Class to initiate and upload data to Oracle
    '''
    ### ---- Step 4 ---- ###
    def __init__(self, regionName, accessKey, secretAccessKey, endpoint_url, bucketName):
        # Oracle Object Storage exposes an S3-compatible API, so boto3 is used
        self.oracle = boto3.client(
            service_name = 's3',
            region_name = regionName,
            aws_access_key_id = accessKey,
            aws_secret_access_key = secretAccessKey,
            endpoint_url = endpoint_url
        )
        self.bucket = bucketName

    ### ---- Step 5 ---- ###
    def upload_files(self, bucketPath, localPath, fileName):
        ### ---- Step 6 ---- ###
        with open(f"{localPath}/{fileName}", "rb") as file:
            self.oracle.upload_fileobj(file, self.bucket, f"{bucketPath}/{fileName}")
```
Changes to the cloud.py file:
- Import the library you created into the cloud.py file.
- Inside the MultiCloud class, look for the upload_to_cloud method.
- Add an elif statement that refers to the newly created cloud library:
  - Initialize the cloud library by passing in the necessary parameters.
  - Call the upload_files method and pass in the values below:

```python
<<name_of_service>>_service.upload_files(bucketPath = blob_Path, localPath = local_Path, fileName = file_Name)
```
An example of the code is shown below:
```python
### ---- Step 1 ---- ###
from oracle import Oracle
...

### ---- Step 2 & 3 ---- ###
elif elements == "ORACLE":
    oracle_service = Oracle(
        regionName = config.get("ORACLE", "region_name"),
        accessKey = config.get("ORACLE", "access_key"),
        secretAccessKey = config.get("ORACLE", "secret_access_key"),
        endpoint_url = config.get("ORACLE", "endpoint_url"),
        bucketName = config.get("ORACLE", "bucket_name")
    )
    oracle_service.upload_files(bucketPath = blob_Path, localPath = local_Path, fileName = file_Name)
```
Changes to the config.sample file:
For the config.sample file, append a new section to the config file.
An example is provided below:
```ini
[ORACLE]
endpoint_url = {{ ml_ORACLE_endpoint_url }}
access_key = {{ ml_ORACLE_access_key }}
secret_access_key = {{ ml_ORACLE_secret_access_key }}
region_name = {{ ml_ORACLE_region_name }}
bucket_name = {{ ml_ORACLE_bucket_name }}
```
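As a quick sanity check, the appended section can be read back with Python's configparser, mirroring the config.get(...) lookups shown in the cloud.py example; the values below are placeholders standing in for the ansible-templated ones:

```python
import configparser

# Minimal sketch: parse an [ORACLE] section like the one appended to
# config.sample. Placeholder values stand in for the templated ones.
sample = """
[ORACLE]
endpoint_url = https://objectstorage.example.com
access_key = my-access-key
secret_access_key = my-secret-key
region_name = ap-mumbai-1
bucket_name = ml-analytics-bucket
"""

config = configparser.ConfigParser()
config.read_string(sample)

# The same lookups cloud.py performs when initializing the provider class.
print(config.get("ORACLE", "region_name"))   # ap-mumbai-1
print(config.get("ORACLE", "bucket_name"))   # ml-analytics-bucket
```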
After adding these changes, this repository needs to be updated with the relevant values for the added configuration variables.