[Design Brainstorm] - MVC - Content Reuse

Introduction

As a Sourcing Org Contributor, when I try to Add from Library, I will get the most relevant content. Relevance is determined by how closely My Textbook matches the textbooks in the Library.

  1. As a Sourcing Org Contributor, I will get two options against each chapter (unit) in the textbook.

    1. (New) Add from Library

    2. Add New

  2. (New) As a Sourcing Org Contributor, I will be able to Add from Library by Exploring and by viewing Suggestions.

  3. Add from Library will have two ways: “Explore” (or “Library”) and “Suggestions” (or “Recommended”).

  4. (New) As a Sourcing Org Contributor, I will be able to Explore a pre-filtered set of textbooks, chapter (topic) wise.

  5. (New) As a Sourcing Org Contributor, I will be able to Preview selected content quickly.

  6. (New) As a Sourcing Org Contributor, I will Add Selected Content to any chapter (unit) in My Textbook.

  7. (New) As a Sourcing Org Contributor, I will be able to view Suggested content, Preview, and Add to the Selected chapter (unit) in My Textbook.

Problem Statement

  1. Explore: When a textbook TOC is uploaded, for each chapter/topic, show similar chapters/topics in other textbooks and the content linked to those topics.

    1. Based on the filter

    2. Search for chapters/topics in other textbooks, using the textbook nodes as the query

  2. Suggest: When a textbook TOC is uploaded, for each chapter/topic, show 5 MVC Content items that can be linked to that chapter/topic.

    1. Search for enriched MVC Content, using the textbook nodes as the query

    2. Search the enriched MVC Content embeddings, using the embedding of the textbook node query
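
The embedding-based Suggest search could be expressed as an Elasticsearch `script_score` query. The sketch below is one possible shape, not a settled design. It assumes the query embedding is already computed, that the vector is stored in the `ml_content_text_vector` field, and that filter fields are analysed text (hence `match_phrase`). The function name `build_suggest_query` is hypothetical. The `doc['...']` form of `cosineSimilarity` is the syntax used by Elasticsearch 7.4.

```python
from typing import Any, Dict, List


def build_suggest_query(
    query_vector: List[float],
    filters: Dict[str, List[str]],
    size: int = 5,
) -> Dict[str, Any]:
    """Build a script_score query that ranks MVC content by cosine similarity."""
    # One clause per filter field; any of the listed values may match.
    filter_clauses = [
        {
            "bool": {
                "should": [{"match_phrase": {field: value}} for value in values],
                "minimum_should_match": 1,
            }
        }
        for field, values in filters.items()
    ]
    return {
        "size": size,  # the problem statement asks for 5 suggestions
        "query": {
            "script_score": {
                "query": {"bool": {"filter": filter_clauses}},
                "script": {
                    # Adding 1.0 keeps the score non-negative, which
                    # Elasticsearch requires for script_score.
                    "source": (
                        "cosineSimilarity(params.query_vector, "
                        "doc['ml_content_text_vector']) + 1.0"
                    ),
                    "params": {"query_vector": query_vector},
                },
            }
        },
    }
```

The returned dictionary is the body of a `_search` request against the `mvc-content` index. Filtering before scoring keeps the similarity computation limited to content that already matches board, medium, grade and similar constraints.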

MVC Content 

  1. Create a standard Excel format for MVC Content with the following columns:

    1. State

    2. Type   

    3. Board [Important]

    4. Grade [Important] 

    5. Subject [Important] 

    6. Medium [Important] 

    7. Textbook Name [Important]

    8. Chapter No. 

    9. Chapter Name (Level 1) [Important]

    10. Chapter Concept Name (Level 1) [Important]

    11. Topic Name (Level 2) [Important]

    12. Topic Concept Name (Level 2) [Important]

    13. Sub Topic Name (Level 3) [Important]

    14. Sub Topic Concept Name (Level 3) [Important]

    15. Source [Important]

    16. Content URL [Important]

  2. Existing Excel data needs to be preprocessed to fit this standard structure.

  3. Note that one Content URL can appear in multiple rows.

  4. If a Content URL appears in multiple rows, those rows need to be merged in the Excel. There should be only one row per Content URL.

  5. Many entries have an invalid Content URL; those entries will be ignored.

  6. Column values given in the Excel are treated as final; the rest of the data will be read from the Diksha Content Read API.
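
The preprocessing rules above (one row per Content URL, invalid URLs skipped) can be sketched as follows. This is a minimal illustration. The merge policy (collecting distinct values per column into a list) and the purely syntactic URL check are assumptions, since the page does not specify either, and the function names are hypothetical.

```python
from typing import Dict, List
from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """Syntactic check only: an http(s) scheme and a host must be present."""
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def merge_rows_by_url(rows: List[Dict[str, str]]) -> List[Dict[str, List[str]]]:
    """Collapse rows sharing a Content URL into one row of distinct values."""
    merged: Dict[str, Dict[str, List[str]]] = {}
    for row in rows:
        url = row.get("Content URL", "").strip()
        if not is_valid_url(url):
            continue  # entries with an invalid Content URL are ignored
        target = merged.setdefault(url, {})
        for column, value in row.items():
            value = value.strip()
            values = target.setdefault(column, [])
            # Keep each distinct non-empty value once, in first-seen order.
            if value and value not in values:
                values.append(value)
    return list(merged.values())
```

A real implementation would probably also verify the URL against the Content Read API, because a well-formed URL can still point to missing content.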

Design

 

  • Using Elasticsearch 7.4 to make use of vector indexing for semantic search.

  • Using strict mapping for better performance.

Implementation Flow

  1. Develop a script which will read the data from Google Sheets and merge it into one CSV.

  2. mvc-content-create API should handle

    1. Excel (xls)

    2. JSON

  3. Develop a new mvc-content-create API which will accept the fields and values defined in the Excel.

    1. Check whether the Content URL is valid.

    2. Based on the Content URL, extract the Content ID and other properties of the content using the Diksha Content Read API.

    3. Board, Grade, Subject and Medium will be taken from the Excel, not from the Diksha content, and passed to the Auto Create event.

    4. Trigger the Auto Create Job with some extra parameters in the event JSON:

      1. textbookname

      2. level1name

      3. level1concept

      4. level2name

      5. level2concept

      6. level3name

      7. level3concept

      8. label (MVC)

      9. source [Diksha, iDream, ToonMasti etc...]

      10. sourceurl

    5. The Auto Create Job internally calls the Content Create API of Vidyadaan and passes the appropriate request.

    6. The Auto Create Job internally triggers the Publish Pipeline of Vidyadaan.

  4. Content Create API of Vidyadaan Changes

    1. Change the Neo4J content definition to handle the following additional columns:

      1. textbookname

      2. level1name

      3. level1concept

      4. level2name

      5. level2concept

      6. level3name

      7. level3concept

      8. label (MVC)

      9. source [Diksha, iDream, ToonMasti etc...]

      10. sourceurl

    2. The Content Create API will insert this data into Neo4J.

  5. Publish Pipeline Changes

    1. The Publish Pipeline will insert data into the Vidyadaan content ES with the above additional columns.

    2. If the label parameter exists and its value is MVC, the Publish Pipeline triggers the mvc-processor pipeline.

    3. The following values need to be inserted into the MVC ES and Cassandra:

      1. textbookname

      2. level1name

      3. level1concept

      4. level2name

      5. level2concept

      6. level3name

      7. level3concept

      8. label (MVC)

      9. source [Diksha, iDream, ToonMasti etc...]

      10. sourceurl
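
The extra event parameters listed in the flow above could be derived from one row of the standard Excel roughly as shown below. This is a sketch under assumptions. The column-to-parameter mapping is inferred from the matching names, using the Content URL as `sourceurl` is a guess, and the `do_<digits>` identifier pattern is inferred from the sample identifiers on this page. All function names are hypothetical.

```python
import re
from typing import Dict, Optional

# Assumed mapping from Excel column headers to event parameter names.
COLUMN_TO_PARAM = {
    "Textbook Name": "textbookname",
    "Chapter Name (Level 1)": "level1name",
    "Chapter Concept Name (Level 1)": "level1concept",
    "Topic Name (Level 2)": "level2name",
    "Topic Concept Name (Level 2)": "level2concept",
    "Sub Topic Name (Level 3)": "level3name",
    "Sub Topic Concept Name (Level 3)": "level3concept",
    "Source": "source",
    "Content URL": "sourceurl",  # assumption: sourceurl carries the Content URL
}

# Assumed identifier shape, e.g. do_312533255910883328118977.
CONTENT_ID_PATTERN = re.compile(r"do_\d+")


def extract_content_id(content_url: str) -> Optional[str]:
    """Return the first do_<digits> identifier in the URL, or None."""
    match = CONTENT_ID_PATTERN.search(content_url)
    return match.group(0) if match else None


def build_extra_params(row: Dict[str, str]) -> Dict[str, str]:
    """Build the additional Auto Create event parameters from an Excel row."""
    params = {
        param: row.get(column, "").strip()
        for column, param in COLUMN_TO_PARAM.items()
    }
    params["label"] = "MVC"
    return params


def should_trigger_mvc_processor(metadata: Dict[str, str]) -> bool:
    """Publish Pipeline rule: trigger only when label exists and equals MVC."""
    return metadata.get("label") == "MVC"
```

The same parameter dictionary is what the Content Create API would persist to Neo4J and what the Publish Pipeline would forward to the MVC ES and Cassandra.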

Elastic Search Index Structure:

Name: mvc-content

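| Property | Data Type | Tokenization | Group | Description |
| --- | --- | --- | --- | --- |
| name | Text | Yes | core - metadata | |
| description | Text | Yes | core - metadata | |
| mimeType | Text | No | core - metadata | |
| contentType | Text | No | core - metadata | |
| resourceType | Text | No | core - metadata | |
| artifactUrl | Text | No | core - metadata | |
| streamingUrl | Text | No | core - metadata | |
| previewUrl | Text | No | core - metadata | |
| downloadUrl | Text | No | core - metadata | |
| framework | Text | No | core - metadata | |
| board | Text | Yes | core - metadata | |
| medium | Text | Yes | core - metadata | |
| subject | Text | Yes | core - metadata | |
| gradeLevel | Text | Yes | core - metadata | |
| keywords | Text | Yes | core - metadata | |
| source | Text | Yes | source - metadata | URI of the content. This is the public URI to access the source of the MVC. |
| ml_level1Concepts | Text | Yes | ml - metadata | |
| ml_level2Concepts | Text | Yes | ml - metadata | |
| ml_level3Concepts | Text | Yes | ml - metadata | |
| ml_contentText | Text | Yes | ml - metadata | Text extracted from the PDF, video or ECML content |
| ml_keywords | Text | Yes | | Keywords identified from ml_contentText |
| ml_content_text_vector | Dense vector | No | | Vector representation of ml_contentText and description, produced by a pretrained ML model |
| label | Text | Yes | | Tags that represent the content, e.g. MVC |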
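
For illustration, the index structure above could translate into an Elasticsearch 7.x strict mapping along these lines. Several points are assumptions: properties with Tokenization "Yes" are mapped to `text` and those with "No" to `keyword`, and the vector dimension is left as a parameter because the page does not state it. The helper name `build_mvc_content_mapping` is hypothetical.

```python
from typing import Any, Dict

# Properties marked Tokenization = Yes in the index structure.
ANALYSED_FIELDS = [
    "name", "description", "board", "medium", "subject", "gradeLevel",
    "keywords", "source", "ml_level1Concepts", "ml_level2Concepts",
    "ml_level3Concepts", "ml_contentText", "ml_keywords", "label",
]

# Properties marked Tokenization = No (excluding the vector field).
EXACT_FIELDS = [
    "mimeType", "contentType", "resourceType", "artifactUrl",
    "streamingUrl", "previewUrl", "downloadUrl", "framework",
]


def build_mvc_content_mapping(vector_dims: int) -> Dict[str, Any]:
    """Build an index-creation body for mvc-content with a strict mapping."""
    properties: Dict[str, Any] = {}
    for field in ANALYSED_FIELDS:
        properties[field] = {"type": "text"}
    for field in EXACT_FIELDS:
        properties[field] = {"type": "keyword"}
    # dense_vector needs its dimension declared up front.
    properties["ml_content_text_vector"] = {
        "type": "dense_vector",
        "dims": vector_dims,
    }
    # "strict" rejects documents that contain fields not declared here.
    return {"mappings": {"dynamic": "strict", "properties": properties}}
```

The result can be sent as the body of `PUT /mvc-content`. The value passed as `vector_dims` must match the output size of whichever model the Text Vectorisation API uses.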

 

Content-service

We can reuse the existing code of the Content Search API for MVC Search, using one of the following two approaches.

  1. Create a new route of MVC reuse in the existing API.

    1. Pros

      1. All the existing utilities and dependencies can be reused.

      2. Easier to manage, since both Search APIs are part of one project.

    2. Cons

      1. It could impact performance, though the impact is expected to be very small.

      2. Deployment of Diksha will impact Vidyadaan application as well.

      3. ES version has to be same for both Diksha and Vidyadaan.

  2. Create a new API for MVC reuse

    1. Pros

      1. No impact on existing Diksha Search Service

      2. No dependency on Deployment now, as both are separate services.

      3. Latest ES version can be used for Vidyadaan

    2. Cons

      1. Maintainability would be an issue both at Code and DB level.

 

API Spec:

  1. Request:

    1. HTTP Verb: POST

    2. URL: https://dock.sunbirded.org/api/mvc/v3/search

    3. Header Parameters

      1. Content-Type: “application/json”

      2. Authorization: “Bearer <auth-token>”

    4. Request Parameters:

      1. mode: soft/hard

      2. filters

      3. softConstraints

      4. vector - search

{
  "request": {
    "mode": "explore",
    "filters": {
      "medium": ["Telegu"],
      "gradeLevel": ["Class 4", "Class 5", "Class 6"],
      "status": ["Live"],
      "textbookName": ["Science"],
      "level1Name": ["Sorting Materials Into Groups"],
      "level1Concept": ["Materials"],
      "level2Name": ["Objects Around Us"],
      "level2Concept": ["Various Objects"]
    }
  }
}

  2. Response

{
  "id": "ekstep.mvc-composite-search.search",
  "ver": "1.0",
  "ts": "2020-05-21T22:23:43ZZ",
  "params": {
    "resmsgid": "c1658c85-e0a1-41ed-bd9a-72df223f505d",
    "msgid": null,
    "err": null,
    "status": "successful",
    "errmsg": null
  },
  "responseCode": "OK",
  "result": {
    "count": 3,
    "content": [
      {
        "organisation": ["Vidya2"],
        "channel": "sunbird",
        "framework": "NCF",
        "board": "State(Tamil Nadu)",
        "subject": "English",
        "medium": ["Telegu"],
        "gradeLevel": ["Class 4", "Class 5", "Class 6"],
        "name": "15_April_ETB",
        "description": "Enter description for TextBook",
        "language": ["English"],
        "appId": "dev.dock.portal",
        "contentEncoding": "gzip",
        "identifier": "do_113025640118272000173",
        "node_id": 5244,
        "nodeType": "DATA_NODE",
        "mimeType": "application/vnd.ekstep.content-collection",
        "resourceType": "Book",
        "contentType": ["TextBook"],
        "objectType": "Content",
        "textbookName": ["Science"],
        "level1Name": ["Sorting Materials Into Groups"],
        "level1Concept": ["Materials"],
        "level2Name": ["Objects Around Us"],
        "level2Concept": ["Various Objects"]
      }
    ]
  }
}
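
A client call matching this spec might be assembled as in the sketch below. It only builds the request object and does not send it. The helper name `build_search_request` and the token value are hypothetical, and only the `mode` and `filters` request parameters are covered, because the page does not show the shape of `softConstraints` or the vector search parameter.

```python
import json
import urllib.request
from typing import Dict, List

SEARCH_URL = "https://dock.sunbirded.org/api/mvc/v3/search"


def build_search_request(
    mode: str,
    filters: Dict[str, List[str]],
    auth_token: str,
) -> urllib.request.Request:
    """Assemble the POST request for the MVC search API without sending it."""
    body = {"request": {"mode": mode, "filters": filters}}
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {auth_token}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call; the response body is JSON in the shape shown above, with matches under `result.content`.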

 


ML Workbench API:

Request:

POST /daggit/submit

{
  "request": {
    "input": {
      "APP_HOME": "/daggit_home/content_reuse",
      "content": [
        {
          "subject": "Science",
          "downloadUrl": "https://ntpproductionall.blob.core.windows.net/ntp-content-production/ecar_files/do_312533255910883328118977/muulyvaan-yogdaankrtaa-10th-vijnyaan_1532071348607_do_312533255910883328118977_2.0.ecar",
          "language": ["English"],
          "mimeType": "application/vnd.ekstep.ecml-archive",
          "objectType": "Content",
          "gradeLevel": ["Class 10"],
          "artifactUrl": "https://ntpproductionall.blob.core.windows.net/ntp-content-production/content/do_312533255910883328118977/artifact/1529938853455_do_312533255910883328118977.zip",
          "contentType": "Resource",
          "identifier": "do_312533255910883328118977",
          "graph_id": "domain",
          "nodeType": "DATA_NODE",
          "node_id": 575061
        },
        {...},...
      ]
    },
    "job": "diksha_content_keyword_tagging"
  }
}
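
The request body above has a fixed envelope around a list of content metadata, so a caller could build it with a small helper like the one below. This is only a sketch. The function name `build_daggit_request` is hypothetical, and the default `APP_HOME` and `job` values are taken from the sample request rather than from any documented contract.

```python
from typing import Any, Dict, List


def build_daggit_request(
    content: List[Dict[str, Any]],
    app_home: str = "/daggit_home/content_reuse",
    job: str = "diksha_content_keyword_tagging",
) -> Dict[str, Any]:
    """Wrap content metadata in the envelope used by POST /daggit/submit."""
    return {
        "request": {
            "input": {"APP_HOME": app_home, "content": content},
            "job": job,
        }
    }
```

Each entry in `content` is expected to be the metadata of one content item, as returned by the Content Read API.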

Response:

 


Text Vectorisation API:

Request:

Response:

action = getContentVec

 

 

MVC Content Create API

Request:

  • To create MVC content using JSON

  • To create MVC content using Excel

 

Response:

MVC Processor Samza Job - Event JSON

Stage 1: This job is triggered from the Publish Pipeline if the label is “MVC”.

Stage 2: The ML Keyword Extraction API triggers this event.

Stage 3: The ML Vectorization API triggers this event.

MVC Cassandra Table Modification