[Design Brainstorm] - MVC - Content Reuse
Introduction
As a Sourcing Org Contributor, when I use Add from Library, I will see the most relevant content, where relevance is defined by the match between My Textbook and the Textbooks in the Library.
As a Sourcing Org Contributor, I will get two options against each chapter (unit) in the textbook.
(New) Add from Library
Add New
(New) As a Sourcing Org Contributor, I will be able to Add from Library by Exploring and by viewing Suggestions.
Add from Library will offer two modes: “Explore” (or “Library”) and “Suggestions” (or “Recommended”).
(New) As a Sourcing Org Contributor, I will be able to Explore a pre-filtered set of textbooks, chapter (topic) wise.
(New) As a Sourcing Org Contributor, I will be able to Preview selected content quickly.
(New) As a Sourcing Org Contributor, I will Add Selected Content to any chapter (unit) in My Textbook.
(New) As a Sourcing Org Contributor, I will be able to view Suggested content, Preview, and Add to the Selected chapter (unit) in My Textbook.
Problem Statement
Explore: When a textbook TOC is uploaded, for each chapter/topic, show similar chapters/topics in other textbooks and the content linked to those topics
Based on the filters
Search for chapters/topics in other textbooks using the textbook nodes as the query
Suggest: When a textbook TOC is uploaded, for each chapter/topic, show 5 MVC Content items that can be linked to that chapter/topic
Search for enriched MVC Content using the textbook nodes as the query
Search for enriched MVC Content embeddings using the embedding of the textbook node query
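The two Suggest search strategies above could be sketched as Elasticsearch queries against the `mvc-content` index described later in this document. This is a minimal illustration, assuming the field names from that index structure; the exact fields searched and the query embedding are assumptions:

```python
def keyword_query(node_text):
    """Keyword search: match the textbook node text against MVC content metadata.
    The field list is illustrative, taken from the mvc-content index structure."""
    return {
        "query": {
            "multi_match": {
                "query": node_text,
                "fields": ["name", "description", "ml_keywords", "ml_level1Concepts"],
            }
        },
        "size": 5,  # show 5 MVC Content items per chapter/topic
    }

def vector_query(query_vector):
    """Semantic search: cosine similarity between the textbook-node embedding
    and the stored ml_content_text_vector (Elasticsearch 7.4 script_score)."""
    return {
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": "cosineSimilarity(params.query_vector, 'ml_content_text_vector') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        },
        "size": 5,
    }
```

The `+ 1.0` keeps the script score non-negative, which Elasticsearch 7.x requires for `script_score`.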
MVC Content
Create a standard excel format for MVC Content
State
Type
Board [Important]
Grade [Important]
Subject [Important]
Medium [Important]
Textbook Name [Important]
Chapter No.
Chapter Name (Level 1) [Important]
Chapter Concept Name (Level 1) [Important]
Topic Name (Level 2) [Important]
Topic Concept Name (Level 2) [Important]
Sub Topic Name (Level 3) [Important]
Sub Topic Concept Name (Level 3) [Important]
Source [Important]
Content URL [Important]
The existing excel data needs to be preprocessed so that the standard structure can be achieved.
Please note that one Content URL can have multiple entries.
If a Content URL appears in multiple rows, those rows must be merged in the excel; there should be only one row per Content URL.
In many cases the Content URL is not valid, so those entries will be ignored.
Field values mentioned in the excel will be considered final; the rest of the data will be read from the Diksha Content Read API.
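The preprocessing rules above (one row per Content URL, invalid URLs dropped) could be sketched as follows. This is a stdlib-only illustration; the column name "Content URL" matches the standard format above, and the merge policy (first valid row wins, later duplicates fill blanks) is an assumption:

```python
from urllib.parse import urlparse

def preprocess(rows):
    """Apply the preprocessing rules: drop rows whose Content URL is not a
    valid http(s) URL, and keep exactly one row per Content URL."""
    merged = {}
    for row in rows:
        url = (row.get("Content URL") or "").strip()
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue  # ignore entries with an invalid Content URL
        if url not in merged:
            merged[url] = dict(row)  # first valid entry for this URL
        else:
            # merge duplicates: fill any fields the kept row left blank
            for key, value in row.items():
                if not merged[url].get(key):
                    merged[url][key] = value
    return list(merged.values())
```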
Design
Using Elasticsearch 7.4 to make use of vector indexing for semantic search.
Using strict mapping for better performance.
Implementation Flow
Develop a script that reads the data from Google Sheets and merges it into one CSV.
Develop a new mvc-content-create API which accepts the fields and values defined in the excel. It should handle both input formats:
Excel (xls)
JSON
Check whether the Content URL is valid.
Based on the Content URL, extract the Content ID and other properties of the content using Diksha's Content Read API.
Board, Grade, Subject and Medium will be picked from the excel, not from the Diksha content, and passed into the Auto Create event.
Trigger the Auto Create job with some extra parameters in the event JSON:
textbookname
level1name
level1concept
level2name
level2concept
level3name
level3concept
label (MVC)
source [Diksha, iDream, ToonMasti etc...]
sourceurl
The Auto Create job internally calls the Content Create API of Vidyadaan with the appropriate request.
The Auto Create job then internally triggers the Publish Pipeline of Vidyadaan.
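The flow above can be sketched as a small event builder: excel-supplied Board/Grade/Subject/Medium win over the Diksha content metadata, and the extra parameters are carried into the Auto Create event. The shape of the Content Read API result and the exact event envelope are assumptions; the field names come from the lists in this document:

```python
def build_auto_create_event(row, content_read_result):
    """Combine excel fields (authoritative for board/grade/subject/medium, per
    the design) with properties from the Diksha Content Read API into an
    Auto Create event. The response shape here is an assumption."""
    return {
        # excel values win over Diksha content metadata
        "board": row["Board"],
        "gradeLevel": row["Grade"],
        "subject": row["Subject"],
        "medium": row["Medium"],
        # extra parameters carried into the Auto Create job
        "textbookname": row["Textbook Name"],
        "level1name": row.get("Chapter Name (Level 1)"),
        "level1concept": row.get("Chapter Concept Name (Level 1)"),
        "level2name": row.get("Topic Name (Level 2)"),
        "level2concept": row.get("Topic Concept Name (Level 2)"),
        "level3name": row.get("Sub Topic Name (Level 3)"),
        "level3concept": row.get("Sub Topic Concept Name (Level 3)"),
        "label": "MVC",
        "source": row["Source"],
        "sourceurl": row["Content URL"],
        # remaining properties come from the Content Read API response
        "identifier": content_read_result["identifier"],
        "mimeType": content_read_result.get("mimeType"),
    }
```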
Content Create API of Vidyadaan Changes
Change the content definition in Neo4j to handle the additional columns mentioned below:
textbookname
level1name
level1concept
level2name
level2concept
level3name
level3concept
label (MVC)
source [Diksha, iDream, ToonMasti etc...]
sourceurl
The Content Create API will insert this data into Neo4j.
Publish Pipeline Changes
The Publish Pipeline will insert the data, including the additional columns above, into the Vidyadaan content ES.
If the label parameter exists and its value is MVC, the pipeline triggers the mvc-processor pipeline.
The values mentioned below need to be inserted into the MVC ES and Cassandra:
textbookname
level1name
level1concept
level2name
level2concept
level3name
level3concept
label (MVC)
source [Diksha, iDream, ToonMasti etc...]
sourceurl
Elastic Search Index Structure:
Name: mvc-content
Property | Data Type | Tokenization | Group | Description |
---|---|---|---|---|
name | Text | Yes | core - metadata | |
description | Text | Yes | | |
mimeType | Text | No | | |
contentType | Text | No | | |
resourceType | Text | No | | |
artifactUrl | Text | No | | |
streamingUrl | Text | No | | |
previewUrl | Text | No | | |
downloadUrl | Text | No | | |
framework | Text | No | | |
board | Text | Yes | | |
medium | Text | Yes | | |
subject | Text | Yes | | |
gradeLevel | Text | Yes | | |
keywords | Text | Yes | | |
source | Text | Yes | source - metadata | URI of the content. This is the public URI to access the source of the MVC. |
ml_level1Concepts | Text | Yes | ml - metadata | |
ml_level2Concepts | Text | Yes | | |
ml_level3Concepts | Text | Yes | | |
ml_contentText | Text | Yes | | Text extracted from the pdf, video or ecml content |
ml_keywords | Text | Yes | | Keywords identified from ml_contentText |
ml_content_text_vector | Dense vector | No | | Vector representation of ml_contentText and description using a pretrained ML model |
label | Text | Yes | | Tags that represent the content, e.g. MVC |
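The table above, together with the strict-mapping note in the Design section, implies a mapping roughly like the following. This is a sketch, not the final mapping: the choice of `keyword` for non-tokenized fields and the `dims` value (768, tied to whichever pretrained model is used) are assumptions:

```python
# Hypothetical mapping for the "mvc-content" index (Elasticsearch 7.4).
# Tokenized fields from the table become "text", non-tokenized become "keyword".
MVC_CONTENT_MAPPING = {
    "mappings": {
        "dynamic": "strict",  # strict mapping, per the Design section
        "properties": {
            # core metadata
            "name": {"type": "text"},
            "description": {"type": "text"},
            "mimeType": {"type": "keyword"},
            "contentType": {"type": "keyword"},
            "resourceType": {"type": "keyword"},
            "artifactUrl": {"type": "keyword"},
            "streamingUrl": {"type": "keyword"},
            "previewUrl": {"type": "keyword"},
            "downloadUrl": {"type": "keyword"},
            "framework": {"type": "keyword"},
            "board": {"type": "text"},
            "medium": {"type": "text"},
            "subject": {"type": "text"},
            "gradeLevel": {"type": "text"},
            "keywords": {"type": "text"},
            # source metadata
            "source": {"type": "text"},
            # ml metadata
            "ml_level1Concepts": {"type": "text"},
            "ml_level2Concepts": {"type": "text"},
            "ml_level3Concepts": {"type": "text"},
            "ml_contentText": {"type": "text"},
            "ml_keywords": {"type": "text"},
            "ml_content_text_vector": {"type": "dense_vector", "dims": 768},
            "label": {"type": "text"},
        },
    }
}
```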
Content-service
We can reuse the existing Content Search API code for MVC Search via one of two approaches.
Create a new route for MVC reuse in the existing API.
Pros
All the existing utilities and dependencies can be reused.
Manageability is easier, since both Search APIs are part of one project.
Cons
It could impact performance, though the impact is likely very small.
Deployment of Diksha will impact the Vidyadaan application as well.
The ES version has to be the same for both Diksha and Vidyadaan.
Create a new API for MVC reuse
Pros
No impact on existing Diksha Search Service
No deployment dependency, as the two are separate services.
The latest ES version can be used for Vidyadaan.
Cons
Maintainability would be an issue at both the code and DB level.
API Spec:
Request:
HTTP Verb: POST
Header Parameters
Content-Type: “application/json“
Authorization: “Bearer <auth-token>“
Request Parameters:
mode: soft/hard
filters
softConstraints
vector - search
{
"request": {
"mode": "explore",
"filters": {
"medium": [
"Telugu"
],
"gradeLevel": [
"Class 4",
"Class 5",
"Class 6"
],
"status": [
"Live"
],
"textbookName": [
"Science"
],
"level1Name": [
"Sorting Materials Into Groups"
],
"level1Concept": [
"Materials"
],
"level2Name": [
"Objects Around Us"
],
"level2Concept": [
"Various Objects"
]
}
}
}
Response
{
"id": "ekstep.mvc-composite-search.search",
"ver": "1.0",
"ts": "2020-05-21T22:23:43Z",
"params": {
"resmsgid": "c1658c85-e0a1-41ed-bd9a-72df223f505d",
"msgid": null,
"err": null,
"status": "successful",
"errmsg": null
},
"responseCode": "OK",
"result": {
"count": 3,
"content": [
{
"organisation": [
"Vidya2"
],
"channel": "sunbird",
"framework": "NCF",
"board": "State(Tamil Nadu)",
"subject": "English",
"medium": [
"Telugu"
],
"gradeLevel": [
"Class 4",
"Class 5",
"Class 6"
],
"name": "15_April_ETB",
"description": "Enter description for TextBook",
"language": [
"English"
],
"appId": "dev.dock.portal",
"contentEncoding": "gzip",
"identifier": "do_113025640118272000173",
"node_id": 5244,
"nodeType": "DATA_NODE",
"mimeType": "application/vnd.ekstep.content-collection",
"resourceType": "Book",
"contentType": [
"TextBook"
],
"objectType": "Content",
"textbookName": [
"Science"
],
"level1Name": [
"Sorting Materials Into Groups"
],
"level1Concept": [
"Materials"
],
"level2Name": [
"Objects Around Us"
],
"level2Concept": [
"Various Objects"
]
}
]
}
}
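The request parameters above (mode, filters, softConstraints) could be translated into an Elasticsearch bool query along these lines: in hard mode every filter must match as a filter clause, while soft constraints become boosted should-clauses that rank rather than exclude results. This is an illustrative sketch; the soft-constraint input shape (values plus boost per field) is an assumption:

```python
def build_search_query(filters, soft_constraints=None, mode="hard"):
    """Translate the MVC search request into an Elasticsearch bool query.
    filters: {field: [values]} applied as exact filter clauses.
    soft_constraints: {field: ([values], boost)} applied as boosted
    should-clauses when mode is "soft"."""
    query = {
        "bool": {
            "filter": [{"terms": {field: values}} for field, values in filters.items()]
        }
    }
    if mode == "soft" and soft_constraints:
        query["bool"]["should"] = [
            {"terms": {field: values, "boost": boost}}
            for field, (values, boost) in soft_constraints.items()
        ]
    return {"query": query}
```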
ML Workbench API:
Request:
POST /daggit/submit
{
"request":{
"input":{
"APP_HOME": "/daggit_home/content_reuse",
"content":[{
"subject": "Science",
"downloadUrl": "https://ntpproductionall.blob.core.windows.net/ntp-content-production/ecar_files/do_312533255910883328118977/muulyvaan-yogdaankrtaa-10th-vijnyaan_1532071348607_do_312533255910883328118977_2.0.ecar",
"language": ["English"],
"mimeType": "application/vnd.ekstep.ecml-archive",
"objectType": "Content",
"gradeLevel": ["Class 10"],
"artifactUrl": "https://ntpproductionall.blob.core.windows.net/ntp-content-production/content/do_312533255910883328118977/artifact/1529938853455_do_312533255910883328118977.zip",
"contentType": "Resource",
"identifier": "do_312533255910883328118977",
"graph_id": "domain",
"nodeType": "DATA_NODE",
"node_id": 575061},
{...},...]
},
"job":"diksha_content_keyword_tagging"
}
}
Response:
Text Vectorisation API:
Request:
Response:
action = getContentVec
MVC Content Create API
Request:
To create MVC content using JSON
To create MVC content using Excel
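For the JSON variant, a request body might look like the following. This is a hypothetical example: the "request"/"content" envelope mirrors the other request bodies in this document but is an assumption, while the field names come from the standard excel format and the Auto Create parameter list above:

```python
# Hypothetical request body for the JSON variant of mvc-content-create.
# Values are illustrative samples drawn from elsewhere in this document.
sample_request = {
    "request": {
        "content": {
            "board": "State (Tamil Nadu)",
            "gradeLevel": ["Class 6"],
            "subject": "Science",
            "medium": ["English"],
            "textbookname": "Science",
            "level1name": "Sorting Materials Into Groups",
            "level1concept": "Materials",
            "level2name": "Objects Around Us",
            "level2concept": "Various Objects",
            "label": "MVC",
            "source": "Diksha",
            "sourceurl": "https://diksha.gov.in/play/content/do_312533255910883328118977",
        }
    }
}
```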
Response:
MVC Processor Samza Job - Event JSON
Stage 1: This job is triggered from the Publish Pipeline if the label is “MVC“.
Stage 2: This event is triggered by the ML Keyword Extraction API.
Stage 3: This event is triggered by the ML Vectorization API.
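The three stages could be dispatched by a small router inside the job. A minimal sketch, assuming the event JSON carries "stage" and "label" fields (both assumptions, since the event schema is not finalised here):

```python
def route_event(event):
    """Route an mvc-processor event to a stage handler.
    The "stage" and "label" field names are assumptions about the event JSON."""
    stage = event.get("stage", 1)
    if stage == 1:
        # Publish Pipeline trigger: only process content labelled MVC
        if event.get("label") != "MVC":
            return "skipped"
        return "index_metadata"   # write metadata to MVC ES and Cassandra
    if stage == 2:
        return "store_keywords"   # keywords from the ML Keyword Extraction API
    if stage == 3:
        return "store_vector"     # embedding from the ML Vectorization API
    raise ValueError(f"unknown stage: {stage}")
```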
MVC Cassandra Table Modification