Background
Problem: Questions are fundamental to learning -- both for assessing a learner's progress and for generating curiosity. Boards like CBSE have repositories of questions, which are so far accessible only to affiliated schools and a select few, in a closed system. How can we open up these assets to all, and provide value-added services on top?
Energised Question Bank (EQB) envisions a repository of questions and related data, making them available to different applications and offering value-added services to Teachers, Parents, and Learners, among other stakeholders in the educational ecosystem.
It amounts to treating the Question as a first-class citizen in the platform, and developing services around it.
Sample Use Case: In a Textbook, for each chapter, create a packet containing one-mark and five-mark questions that have appeared in previous exams, and link it to the chapter. This will allow learners to prepare for exams in a targeted fashion - a digital twin of the guide books available in the market.
A Few Definitions and Notions:
- Question: A question is an assessment item given to a learner to assess a learning objective. In simple terms, a question is what is given to a student for the purpose of assessing the student's proficiency in a concept. Example: Explain the benefits of ground water recharging.
- Question Set: A question set is a collection of questions having a certain common characteristic. That common characteristic is defined in terms of the question metadata. Example: the set of all questions worth one mark.
- Question Paper: It is an ordered collection (array) of Question Sets, where additional conditions/restrictions are placed on the individual member Question Sets. For example: give ten one-mark questions, followed by five five-mark questions, all of which must have appeared in exams.
- Answer: A workout, in sufficient detail, that satisfies what is asked. In the typical nomenclature, it is what a student provides in response to a question. It could be just the correct option in an MCQ, a single word or phrase in an FTB, or a 150-word essay in a creative writing question.
- Marking Scheme: A Marking Scheme is a set of hints or key points, along with a grading scheme, that an evaluator looks for when grading assessments. Primarily, a Marking Scheme is meant for a teacher (evaluator), not a student.
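The relationships among these notions (a paper is an ordered collection of sets; a set is defined by a shared metadata characteristic) can be sketched as plain data classes. This is a minimal illustration only - the field names are assumptions, not the actual QML schema:

```python
from dataclasses import dataclass, field

@dataclass
class Question:
    # Minimal metadata for illustration; real QML records carry many more dimensions.
    text: str
    marks: int
    subject: str
    answer: str = ""
    marking_scheme: str = ""

@dataclass
class QuestionSet:
    # A set is defined by a common characteristic over question metadata.
    criterion: dict
    questions: list = field(default_factory=list)

@dataclass
class QuestionPaper:
    # An ordered collection of Question Sets, each carrying its own restriction.
    sections: list = field(default_factory=list)

q = Question(text="Explain the benefits of ground water recharging.",
             marks=1, subject="Science")
one_mark = QuestionSet(criterion={"marks": 1}, questions=[q])
paper = QuestionPaper(sections=[one_mark])
```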
Ingestion Flow:
- Extraction Phase - Convert Unstructured Data to Structured Data: Questions, Answers, Marking Schemes, and Metadata could be available in different forms such as PDFs, Word documents, scanned PDFs, and possibly even databases. Ingest all such (unstructured) data, extract the relevant data, and write it against a suitable QML-compliant schema. Extraction as per the schema may be human generated or algorithm derived. Source data is now ingested into the platform, and is ready for auto-curation (validation).
- Auto Curation (Pre-Validation) - Generate Quality Metadata for the Extracted Data. Some of the extracted data may require validation by an SME. However, validating all extracted data might be too cumbersome for a human in the loop. In this phase, additional machine-derived quality metadata is used to prioritize the effort required by the human in the loop.
- Curation (Validation) - Validate Extracted Data. An expert or a designated curator can validate the extracted data and change its status to one of three states: accepted (ready for publishing), modified (and ready for publishing), or rejected (not suitable for publishing). At the end of this phase, the extracted data is curated, and the publishable data is available for downstream consumption.
- Ingestion - Ingest the Curated Data. Once data is in published form, write it to a DB like Cassandra.
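The four phases above can be sketched as a simple pipeline over a record. This is a hypothetical sketch; the function names, the length-based confidence heuristic, and the in-memory store are all illustrative assumptions:

```python
def extract(raw):
    # Phase 1: map unstructured source text onto a schema-compliant record.
    return {"text": raw.strip(), "status": "extracted"}

def pre_validate(record):
    # Phase 2: attach machine-derived quality metadata so that the
    # human in the loop can prioritize review effort.
    record["confidence"] = 0.4 if len(record["text"]) < 10 else 0.9
    record["status"] = "pre_validated"
    return record

def curate(record, decision):
    # Phase 3: an SME accepts, modifies, or rejects the record.
    assert decision in ("accepted", "modified", "rejected")
    record["status"] = decision
    return record

def ingest(record, store):
    # Phase 4: only publishable records reach the downstream store.
    if record["status"] in ("accepted", "modified"):
        store.append(record)
    return store

store = []
rec = curate(pre_validate(extract("  What is photosynthesis?  ")), "accepted")
ingest(rec, store)
```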
Question Set Creation Flow:
- Specify the Intent of the question set: The purpose of the question set is concretized by filling in the query fields. In effect, an Intent is nothing but a query against the question bank using agreed-upon metadata. Several predefined purpose templates will be provided; a user can select an existing one.
- Edit the Intent: A concretized query can be edited (UI component)
- Submit the intent: Fire the query
- Edit the responses: Accept or Reject the items in the result set at item level or set level
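The flow above treats an Intent as a query over question metadata. A minimal sketch of "submit the intent" follows, assuming an in-memory bank and purely equality-based matching; the metadata fields shown are illustrative:

```python
# A tiny illustrative question bank; field names are assumptions.
QUESTIONS = [
    {"id": 1, "marks": 1, "grade": 10, "subject": "Science", "source": "exam"},
    {"id": 2, "marks": 5, "grade": 10, "subject": "Science", "source": "textbook"},
    {"id": 3, "marks": 1, "grade": 9,  "subject": "Maths",   "source": "exam"},
]

def fire_intent(intent, bank):
    # Return every question whose metadata satisfies all intent fields.
    return [q for q in bank if all(q.get(k) == v for k, v in intent.items())]

# Intent: "one-mark questions for grade 10 that appeared in exams"
intent = {"marks": 1, "grade": 10, "source": "exam"}
result = fire_intent(intent, QUESTIONS)  # user then accepts/rejects items
```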
Question Paper Creation Flow:
- Same as the question set creation flow, except that additional intent fields are required. For example: give 10% weight to Chapter One, and those questions should go to Section A of the Exam Set.
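A paper-level Intent layers per-section constraints (placement, counts, weights) over the individual set intents. The blueprint structure below is an assumption sketched for illustration, not a confirmed schema:

```python
# Illustrative Question Paper blueprint: each section carries its own
# count and set-level intent; "weights" hints at chapter coverage.
blueprint = {
    "sections": [
        {"name": "Section A", "count": 10, "intent": {"marks": 1, "source": "exam"}},
        {"name": "Section B", "count": 5,  "intent": {"marks": 5, "source": "exam"}},
    ],
    "weights": {"Chapter 1": 0.10},  # e.g. 10% of items drawn from Chapter 1
}

# Total marks implied by the blueprint: 10 x 1 + 5 x 5
total_marks = sum(s["count"] * s["intent"]["marks"] for s in blueprint["sections"])
```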
Assigning Question Sets to Textbook Flow:
Pre-requisites: Assume that the Textbook Framework and Question Metadata are aligned at the Taxonomy level. Different entry points can be enabled to create Textbooks that are linked with Question Sets.
Two data points are required to create an Energised Question Bank: 1) Textbook Spine 2) Question Set Logic. Once these two pieces are available, a Textbook can be created with links to Question Sets.
- Automated Textbook Creation:
- Provide the Textbook Spine as a CSV, or create these spines for every Framework available in the platform. The DIKSHA Implementation team can create Textbook Spines via APIs or via the Portal UI
- Provide a list of available Pre-baked Question Set Intent Blueprints.
- For every entry in the Spine and the preselected list of Blueprints, fire the query, get the question sets, create the resources, and link them
- Once an energised question bank is created, a user can edit it.
- EQB Creation via eVolve:
- Current eVolve UI for creating Textbooks and linking Teaching Content can be leveraged (Rayulu you can add more specifics here)
- Semi-Automated: The DIKSHA Implementation Team can create Textbooks via APIs or the portal UI. From the backend, an EQB can be auto-created and made available for editing. The eVolve UI can be enhanced to edit the Intent configuration and to select/deselect the items in the result set.
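The automated creation step above (for every spine entry and every blueprint, fire the query and link the resulting set) can be sketched as a nested loop. The spine entries, blueprints, and in-memory bank below are all hypothetical:

```python
# Sketch of automated Textbook creation under assumed data shapes.
spine = ["Chapter 1", "Chapter 2"]           # Textbook Spine (e.g. from a CSV)
blueprints = [{"marks": 1}, {"marks": 5}]    # pre-baked Intent Blueprints
bank = [
    {"chapter": "Chapter 1", "marks": 1},
    {"chapter": "Chapter 1", "marks": 5},
    {"chapter": "Chapter 2", "marks": 1},
]

textbook = {}
for chapter in spine:
    textbook[chapter] = []
    for bp in blueprints:
        # Fire the blueprint query scoped to this spine entry.
        hits = [q for q in bank
                if q["chapter"] == chapter
                and all(q.get(k) == v for k, v in bp.items())]
        if hits:
            # Create the Question Set resource and link it to the chapter.
            textbook[chapter].append({"intent": bp, "questions": hits})
```

Once created this way, the resulting EQB remains editable by a user, as noted above.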
Schema and Architecture Details:
Overall System Level View
CureIt: Ingestion Pipeline
CureIt : Behavior
- Schema is defined for the source type (dimensions are specified)
- Information is extracted against the "meaning" of the dimension. Example: "difficulty" is a dimension of the question. It needs to be extracted if not available in a consumable form
- By default, all dimensions are in draft state.
- During the pre-validation phase, a set of rules (specified by a human or learnt by the system) is applied, which helps validate the extraction step. For example:
- If a dimension is filled by a machine, a derived confidence score can be used to prioritise it for a human in the loop
- We suspect that the image presented in the question is not legible or has crossed the boundaries, but there is no easy way to fix it automatically. Present the image to the human in the loop and ask whether it is presentable or not
- We (the system) may think that a Marking Scheme is suitable for presentation as an Answer. Let the human in the loop validate it.
- The human in the loop prioritizes validation tasks based on "interest". A Maths teacher can select only Maths questions and provide only Answers.
- Different kinds of reviewers can focus on specific tasks (of the validation). Not everybody needs to do everything
- Eventually, every record moves from the draft state to either the published or the rejected state (or remains in draft)
- Every individual dimension goes through the same states, except that unlike at the record level, an individual dimension can end only in the published or rejected state
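The record- and dimension-level state rules above amount to a small state machine. A minimal sketch, with transition tables as assumptions derived from the two bullets:

```python
# Records may remain in draft or move to published/rejected;
# dimensions must end in published or rejected (no lingering draft).
RECORD_TRANSITIONS = {"draft": {"draft", "published", "rejected"}}
DIMENSION_TRANSITIONS = {"draft": {"published", "rejected"}}

def move(state, target, transitions):
    # Allow a transition only if the table permits it.
    if target not in transitions.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {target}")
    return target

record_state = move("draft", "published", RECORD_TRANSITIONS)
```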
CureIt: EQB Specifics
- Question Metadata
- from CBSE hard disk: subject, medium, grade, difficulty, blooms level, question type, marks, marking scheme, answer
- Sunbird Taxonomy map: topic, subtopic
- additional:
- Question Data
- urls: question image, marking scheme image, bundle pdf, png
- Validation Meta Data (high is better, low requires attention)
- needs cropping
- needs answer
- marking scheme can be graduated to answer
- ...
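Putting the three groups above together, an extracted record might look like the following. The field names approximate the lists in this section and the numeric validation scores are invented for illustration (per the convention above, high is better and low requires attention):

```python
# Illustrative extracted record; not a confirmed schema.
record = {
    "metadata": {
        "subject": "Science", "medium": "English", "grade": 10,
        "difficulty": "medium", "blooms_level": "understand",
        "question_type": "SA", "marks": 5,
        "topic": "Natural Resources", "subtopic": "Water",  # via Sunbird Taxonomy map
    },
    "data": {
        "question_image_url": "https://example.org/q.png",        # placeholder URL
        "marking_scheme_image_url": "https://example.org/ms.png", # placeholder URL
    },
    "validation": {  # high is better, low requires attention
        "needs_cropping": 0.9,
        "needs_answer": 0.2,
        "scheme_to_answer": 0.7,
    },
}

# Surface the dimensions that need human attention (score below 0.5).
attention = [k for k, v in record["validation"].items() if v < 0.5]
```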
SetIt: Ingestion Pipeline
SetIt: Behavior
- Taxonomy terms, such as Medium, Grade, Subject, Topic, and Subtopic, are chosen
- The purpose of the question set is specified in terms of the attributes of the Question, such as Marks, Blooms level, Difficulty, etc.
- A Query is formed with the above and fired against the DB. Results are returned to the user
SetIt: Interface
- Taxonomy Terms can be specified via multiple entry points
- DIKSHA implementation team can create Textbook Spine
- via CSV
- Question Set Purpose is exposed via a JSON configuration. It can be provided in multiple ways
- Several Blueprints are provided with pre-filled values.
- The JSON data can be seen as-is via a custom UI in the new Portal
- A user can edit the JSON data via the UI in case he/she wants to customise it
- The Results of the query can be edited by the User. This allows full customisation
- Bulk Accept
- Reject individual items
- Add individual items (so that the query can be modified, new results are fetched, and interesting items are added to the collection)
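A sketch of the JSON-configured Purpose and the three result-editing operations above; the blueprint structure, field names, and in-memory result items are assumptions:

```python
import json

# Hypothetical Question Set Purpose blueprint as JSON configuration.
blueprint_json = """
{
  "purpose": "exam-prep-one-mark",
  "filters": {"grade": 10, "subject": "Science", "marks": 1}
}
"""
blueprint = json.loads(blueprint_json)

results = [{"id": 1, "accepted": False}, {"id": 2, "accepted": False}]

def bulk_accept(items):
    # Accept the whole result set in one action.
    for it in items:
        it["accepted"] = True
    return items

def reject(items, item_id):
    # Drop a single item from the result set.
    return [it for it in items if it["id"] != item_id]

def add(items, new_item):
    # e.g. after modifying the query and fetching fresh results.
    return items + [new_item]

edited = add(reject(bulk_accept(results), 2), {"id": 3, "accepted": True})
```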
TBD
- telemetry event structures (during curation)
- to better the auto curation process
- telemetry event structures (during set acceptance/modification)
- to better query fulfilment
- intent specification schema
- similar to Plug-n-Play analytics JSON-ified filtering criteria
05th June Scope
- Preparatory Question Sets and Exam Question Sets will be made available for:
- Grades 9 and 10
- Six subjects (Maths, Science, Social Science, Hindi, English, and Sanskrit)
- Coverage and quality will vary depending on the curation effort required and the quality of the data
- Auto Curation (Pre-Validation) metrics will be developed to reduce the effort of the Human in the Loop. This is seen as a general ML infrastructure capability (in particular, a Reinforcement Learning environment)
- A few rules will be developed to identify quality tags (Manual effort)
- A few algorithms will be developed to identify quality tags
- Taxonomy terms extracted from the CBSE Question Bank will be aligned with NCERT terms (Manual effort)
- Sample Question Set Intents will be created (Manual effort)
- Auto Textbook Creation: Given Textbook Spine and list of Blueprints, Textbook creation will be automated
- Ability to modify the Intent, and select the result set will be developed
- Support for passive consumption of Question-Answer pairs. There will not be any evaluation of answers or interactions on the questions. They will be treated like normal resource types (not assessment resources)
Beyond June Scope
- 16 core subjects, k1-12 grades
- QML implementation, support additional interaction types
- Support for Exemplary Answers (actual Answers written by students, and taken as OCR images)
- Support for ingesting previous exam paper questions
Design Direction(?)
Treat Textbook as a Map.
- Every learning service is then actually a location-based service.
- In a Map, we can ask for
- Find restaurants nearby
- Find the shortest route from A to B
- Find a shopping mall nearby
- If we treat Textbook as a Map
- Find teaching material for this location
- Find practicing material for this location
- Find previous exam questions for this location
- Tell me how others are doing here
- Tell me what others are facing difficulty with here
- How can I get from here (a concept) to there (the next chapter)?
- It unifies the design across all verticals in DIKSHA. It is like building an Uber, an Ola, or a Swiggy on top of Google Maps.
- It lessens the burden on non-expert users of providing exact location details (Taxonomy terms)
- In order to ask for services, a user does not need to provide an exact address (like house number, street name, nearest landmark, pin code)
- He simply drops a pin
- A user simply selects a Kindle-like Textbook from the library (by supplying just four fields – board, medium, grade, subject)
- The user need not bother about choosing a Chapter, Topic, or anything of that sort.
- A user implicitly provides them while browsing the Textbook. The current location in the Kindle-like book gives away the exact details
- At this time, users use Taxonomy terms as pointers. But in the platform, we interpret them both as pointers and as what they mean. With the Textbook-as-a-Map metaphor, we don't even have to ask for the pointers; all the complexity of figuring out the meaning of the location can be handled by the backend
- Providing resources from one medium in another medium is like transforming Cartesian coordinates to polar coordinates. The user always remains in their own zone.