Knowlg - Add New CSP Storage - Implementation Changes & Testing

Background

This document details the integration points for any new CSP provider with the Sunbird-Knowledge platform. (Latest as of December 2022. Sunbird-Knowlg release-5.2.0)

  • In the above diagram, Knowlg components marked in yellow colour shows its connections with cloud storage and marked in orange colour shows it’s connections with cloud media service (used to generate video streaming).

  • Knowlg has verified with Azure, AWS and GCP(GCP MediaService integration is pending).

  • Knowlg has below flink jobs which interact with cloud storage for upload/download operation:

    • asset-enrichment

    • content-publish

    • qrcode-image-generator

    • live-node-publisher

  • knowlg-api-service transforms cloud related metadata (e.g: downloadUrl, appIcon, etc) but doesn't interact with cloud storage.

    • e.g: the service convert cloud specific path (absolute path) to cloud neutral path (relative path) and vice versa.

  • In order to add support for any other cloud storage (e.g: OCI) under Knowlg components, below steps need to be followed:

knowlg-api-service:

Git Repos:

https://github.com/project-sunbird/knowledge-platform

https://github.com/project-sunbird/sunbird-learning-platform

Latest branch: release-5.2.0

 

  • The Service need only configuration change to maintain relative path in database while write operation and return the absolute path for cloud related metadata while read operation.

Config Changes:

  • Override value for below variables under private devops repo (file path: ansible/inventory/<env_name>/Core/common.yml) for new storage account:

cloud_storage_content_bucketname
cloudstorage_replace_absolute_path
cloudstorage_relative_path_prefix
cloudstorage_base_path
valid_cloudstorage_base_urls

Configuration File Reference:
https://github.com/project-sunbird/sunbird-devops/blob/b61a35fad0362ea7eb0bb688ff0bc12ffc811571/ansible/roles/stack-sunbird/templates/content-service_application.conf#L484

Deployment:

  • After Configuration Change, Deploy the content and taxonomy service.

Testing:

  • Test Content Create & Read API with some metadata having cloud path (e.g: appIcon)


The key integration points present at the below files

Knowledge-Platform repo:

StorageService.scala → Integration points of collecting to org.sunbird.cloud.storage library. This SunbirdCloudStage SDK has to support for new CSP providers.

Sunbird cloud-storage-sdk version upgrade location. Knowlg is using the 1.4.3 version.

<dependency> <groupId>org.sunbird</groupId> <artifactId>cloud-store-sdk</artifactId> <version>1.4.3</version> </dependency>

Git Repo:

Code Changes:

  • Flink jobs uses cloud-storage-sdk for cloud storage operations. So first the sdk need a code change to have support for new cloud storage provider (e.g: OCI).

  • Sunbird cloud-storage-sdk is a utility which provides methods for cloud upload/download operations.

  • Link to cloud-storage-sdk git repo:

  • Once the cloud-storage-sdk new version is published in maven central, we need to update the version in jobs-core module

  • Knowlg flink jobs are using the cloud-storage-sdk version 1.4.3

Configuration Changes:

  • Override value for below variables under private devops repo (file path: ansible/inventory/<env_name>/Knowledge-Platform/common.yml) for new storage account:

Variable Name

Description

Example Value

Variable Name

Description

Example Value

cloud_public_storage_accountname

unique key to identify the storage account

sunbirddevbbpublic

cloud_public_storage_endpoint

endpoint of a cloud storage account.

““

cloud_public_storage_secret

secret key to access the storage account.

NA

cloud_service_provider

storage provider name

azure

cloud_storage_content_bucketname

bucket/container name where data will be stored or read from.

sunbird-content-dev

cloud_storage_dial_bucketname

bucket/container name where dial data will be stored or read from.

dial

cloudstorage_base_path

this variable will have either CNAME or base url of cloud storage account

"https://sunbirddevbbpublic.blob.core.windows.net"

cloudstorage_relative_path_prefix_content

this variable will have string value which will replace cloud storage base path & bucket name

"CONTENT_STORAGE_BASE_PATH"

cloudstorage_relative_path_prefix_dial

this variable will have string value which will replace cloud storage base path & bucket name

"DIAL_STORAGE_BASE_PATH"

cloudstorage_replace_absolute_path

This flag is used to enable or disable the cloud-agnostic data feature.

false

cloudstorage.metadata.list

this variable will have list of metadata which has cloud storage url

'["appIcon", "artifactUrl", "posterImage", "previewUrl", "thumbnail", "assetsMap", "certTemplate", "itemSetPreviewUrl", "grayScaleAppIcon", "sourceURL", "variants", "downloadUrl", "streamingUrl", "toc_url", "data", "question", "solutions", "editorState", "media", "pdfUrl", "transcripts"]'

valid_cloudstorage_base_urls

This variable will have a list of urls that should be replaced by the CONTENT_STORAGE_BASE_PATH string if the cloud-agnostic feature is enabled.

["https://sunbirddevbbpublic.blob.core.windows.net"]

  • Configuration File Reference:

Deployment:

  • Build & Deploy both Job

Testing:

  • Test Content/Collection Publish Workflow.

    • Content/Collection should be published successfully.

    • metadata having cloud storage file reference should be accessible.

      • e.g: downloadUrl pointing to a file, should be downloadable.

Media service integration (video-streaming)

Git Repo:

  • To generate video streaming we have video-streaming-generator flink job.

  • Flink job uses the interface IMediaService and corresponding cloud media service implementation.

  • Currently this job is having the implementation for azure media service and AWS elemental media convert. Similarly other service provide can be also implemented.

  • To implement new service provider, implement the IMediaService for the respective service provider then override the submitJob, getJob and getStreamingPaths

  • Add the respective config in here.