cQube | Design Document (Nov 2022)

 

 

 



cQube Vision

 

Executive Summary

  • cQube is a domain-agnostic, packaged solution for enabling scheme monitoring for public programs

  • cQube offers packaging, deployment ease, and infra optimization, compared to off-the-shelf tech

  • Core proposition is its ability to provide out-of-the-box usability leveraging domain-specific configurations

  • The configurations layer will equip cQube with pre-defined indicators, insights, and suggested actions (nudges)

  • cQube for education (Ed) will accelerate adoption of VSK as envisioned under MoE’s program

  • The Ed configurations, along with programmatic SOPs will constitute a VSK Playbook as an additional public good

 

cQube Vision and Value Prop

cQube is envisioned as a ready-to-use/pre-packaged, configurable, and extendable DPG solution to enable observability and action towards effective policy implementation in education and other sectors, involving various stakeholders across govt, society, and private sector.

 

There are several challenges that government program owners face while leveraging technology for monitoring public programs

  • Program administrators face challenges in defining actionable insights and relevant indicators for tracking

  • Data ingestion, processing & visualization are available only as disaggregated products, not packaged into a solution

  • Governments are unable to hire expert technical staff (architects, engineers) to smartly leverage a combination of available products together for end-to-end monitoring

  • Governments have budget and digital infrastructure (server) limitations

 

cQube offers two core differential value propositions for monitoring programs

  • A domain-agnostic product packaging of ingestion, processing and visualization layer in a manner that is:

    • Easy to deploy with minimal engineering staff

    • Lite deployment (minimal infrastructure requirements optimized to program needs)

    • Built for scale (to be scale tested with ~500k users)

    • Leverages reference schemas and spec-compliant API based implementation

    • Brownfield ready with modular components interoperable with existing systems

  • Domain-specific configuration layer allowing out-of-the-box contextualization of indicators, insights, role based nudges and actions

  • We aim to start with cQube Ed; configurations for other sectors can be made available going forward

 

cQube architecture

The current architecture of cQube has certain challenges which affect the adopter experience

  • Installation & CSV-based ingestion processes affecting adoption by state deployers

  • Fixed set of chart types disallowing state admins to configure the charts that they wish to see

  • Minimal interoperability due to non-exposure of data that gets stored as part of cQube

 

cQube architecture has been revised to enable flexibility in configuration and adoption by states

  • Spec-based architecture has been created for cQube Ed v5.0 with API-first design consideration

  • There are 5 blocks: Ingestion > Processing > Storage > Visualisation > Insight & Action Adapter

  • All the blocks can be used individually and as an end-to-end solution by the state adopters

 

cQube Ed (for India)

VSK was initiated by MoE as a program to improve monitoring of schemes / programs in education

  • Objective to effectively collect, monitor, correlate and analyze data to take timely decisions

  • Envisioned as state-level systems to monitor schemes / programs and build accountability in field-staff

  • Overall goal to drive a big leap in learning outcomes through meaningful data based actions

 

Today, system actors face challenges in monitoring delivery of schemes / programs in education in India

National Body (e.g., NCERT)

  • Is not able to collate data easily from states to monitor programs nationally

    E.g., NCERT wants to analyze NISHTHA compliance across states, but data is incomplete

Department Head

  • Does not know how all programs are impacting the key department goals

    E.g., DG cannot gauge whether mentoring compliance is improving NIPUN competencies

Program Owner (e.g., MDM incharge)

  • Unable to monitor effective delivery

    E.g., State MDM incharge doesn’t know which districts are behind in ration procurement

  • Unable to measure intended outcomes

    E.g., State MDM incharge cannot gauge impact of MDM program on student attendance

Admin Officer (e.g., Block Edu Officer)

  • Is not aware of key indicators (metrics) to be monitored and improved

    E.g., Is called to a district review, but does not know on which KPIs he will get questioned

  • Does not know who are the accountable actors that can improve key indicators

    E.g., Only 40% teachers in the block completed DIKSHA training, but how to improve?

School Head / Principal

  • Unable to determine the course of action if any indicator for the school goes off

    E.g., Principal isn’t aware which subjects are lowering the Class 10th Board pass rates

 

cQube layered with education-specific configurations (cQube Ed) can accelerate state VSK adoption

  • A VSK Playbook will define common indicators, roles, insights and suggested actions relevant across states

  • The Playbook will serve as an independent public good, enabling VSK compliance independent of tech.

  • cQube Ed will provide an end-to-end solution to enable VSK and come with:

VSK Specs

Playbook Reference Schema APIs

  • Schemas to capture data, insights, actions

  • NDEAR compliant APIs for enabling interoperability

cQube Ed Software

 

Data Ingestion and Processing

  • Ingestion of input data from existing systems (e.g., DIKSHA, SARAL, ODK) or new systems

  • Ingestion of data from new data sources through configurable schemas

  • Processing ingested data into aggregated indicators

Data Output (out of box plugins for Visualization, Insights, Nudges)

  • Mobile app with personalized role based dashboards and nudges

  • Out-of-box visualizations with dashboarding tools

  • Precomputed insights downloadable weekly as reports for key role actors

  • Ability to add new programs to generate insights and visualizations

Admin Capabilities

  • Role mapping to indicators for visibility and accountability

  • Weekly report templates and dashboard configurations

  • Configure and automate in-app nudges (as a plugin)

 

cQube Ed Configuration Layer

The cQube Ed configurations layer will allow cQube to be VSK ready for states to begin ‘using’

  • The configurations layer will allow:

    • List of pre-configured programs and indicators to monitor national and state level programs

    • Mapping of users and roles (e.g., school principal, block officer) to indicators for visibility

    • Generation of insights (e.g., benchmarks) on top of indicators

    • Mapping suggested actions (e.g., nudging rules) for users against generated insights
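The four mappings listed above (indicators, roles, insights, suggested actions) can be pictured as one configuration object. The following is only a minimal sketch under stated assumptions: the structure, the key names, the benchmark value of 75 and the helper `nudges_for` are all hypothetical and are not taken from the cQube Ed specs.

```python
# Hypothetical configuration: every key, name and value here is illustrative.
CONFIG = {
    # Pre-configured indicators and the program they belong to.
    "indicators": {
        "student_attendance_pct": {"program": "Attendance"},
    },
    # Which roles can see which indicators.
    "roles": {
        "block_officer": ["student_attendance_pct"],
        "school_principal": ["student_attendance_pct"],
    },
    # Insights generated on top of indicators (here: a simple benchmark).
    "insights": {
        "student_attendance_pct": {"benchmark": 75.0},
    },
    # Suggested actions (nudging rules) mapped to an indicator and a role.
    "nudges": [
        {
            "indicator": "student_attendance_pct",
            "role": "block_officer",
            "message": "Review low-attendance schools in your block.",
        },
    ],
}


def nudges_for(role, indicator, value, config=CONFIG):
    """Return nudge messages for a role when an indicator is below its benchmark."""
    if indicator not in config["roles"].get(role, []):
        return []  # the role has no visibility of this indicator
    if value >= config["insights"][indicator]["benchmark"]:
        return []  # at or above the benchmark: nothing to suggest
    return [
        nudge["message"]
        for nudge in config["nudges"]
        if nudge["indicator"] == indicator and nudge["role"] == role
    ]
```

In this sketch a block officer whose attendance indicator is at 60% would receive the single configured message, while a value of 80% produces no nudge.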

 

The cQube Ed configurations layer will leverage VSK Playbook for out-of-box configurations

  • VSK Playbook will serve as a public good for enabling monitoring of ed focused schemes / programs

  • VSK playbook to be created by Samagra (Q3) in discussion with states, housed within MoE

  • The Playbook can be evolved and contributed to by state entities, other ed ecosystem partners and MoE

 

  • cQube Ed has multiple deployments, further enhancement needed to be VSK Playbook ready

  • Today, cQube is being leveraged by MoE (NVSK) and Jharkhand, with more states expected

  • Overall, goal is to enable all states to have VSK playbook implemented before the next academic year

  • Certain enhancements are needed to make it VSK playbook ready

 

  • There are some functionalities & use cases that cQube Ed (out-of-box) will not support

  • cQube Ed will continue storing only aggregated data; user-level indicators & nudges will not be possible

E.g., viewing admin officer wise school monitoring visit compliance in a given month

  • For predefined indicators, cQube Ed will not ingest data in any schema other than the one specified

  • cQube Ed will allow limited operations to be performed on the datasets exposed (SQL layer)

 

  • States will be able to adopt cQube Ed based on the evolution of their own brownfield systems

  • States that have existing data input, processing and visualization solutions

    • Can configure their existing systems based on VSK playbook

    • Or can connect their data input to cQube and get processing, visualization, nudges out of the box

    • Or can connect their data input and visualization to cQube and get processing, nudges out of the box

  • States that don’t have existing data input, processing and visualization solutions

    • Can setup cQube through a SI and get data processing, visualization and nudges out of the box

    • Can leverage NDEAR reference data input tools - SARAL, Shiksha

 

  • The high level architecture for cQube Ed has been finalized

  • Spec-based architecture has been created for cQube Ed v5.0 with API-first design consideration

  • There are 5 blocks: Ingestion > Processing > Storage > Visualisation > Insight & Action Adapter

  • All the blocks can be used individually and as an end-to-end solution by the state adopters

 

Problem Statement

Governments usually collect a large amount of administrative data (especially in the education context), mostly for routine reporting and compliance purposes. Many countries have recently begun finding more advanced and useful ways of leveraging these data sets to allocate resources better, measure success, and improve government-administered programs' efficiency. However, a lot of state governments in the country still operate on anecdotal rather than data-backed decision making due to some key challenges:

  1. Data Quality: Data sets are useful when they are accurate and reliable. When data points are inaccurate, missing, or poorly defined, the information is less useful. In most of the state governments, the data collected is isolated and unstructured.

  2. Data Accessibility: Outdated technology can make it difficult for governments to extract their data in a usable format. And when contractors control computer systems, states may not even own their data or lack the technical expertise to use it.

  3. Data Usage: Data, if available, should be used and leveraged to improve performance. Even if some states have structured and accurate data, there are either no dashboards to analyze the data or the dashboards are very difficult to interpret. If some information can be interpreted from the available data, there is always a gap between information and action by the user.

  4. Data Sharing: Data sets are most valuable when shared across programs or agencies and combined with other data sets to get new insights. However, many agencies are not competent to maintain interoperable datasets. They also face uncertainty about how to comply with privacy laws or share data while keeping it secure, limiting the insights gleaned from the data.

  5. Expert Staffing: To use data effectively, governments need staff members with the technical skills to manage and analyze data. Such staff are in short supply, which leads to issues related to data usage.

These challenges hinder state governments in implementing effective review and monitoring processes, which in turn affects the efficiency of government-administered programs.

 

cQube Ed

cQube is envisioned as a ready-to-use/pre-packaged, configurable, and extendable solution to enable observability and action towards effective policy implementation in education and other sectors, involving various stakeholders across govt, society, and private sectors

 

cQube Ed is a pre-packaged solution for education with 120+ pre-defined actionable indicators & insights like attendance, assessment etc. It acts as an accelerator for states in fast tracking their VSK journeys.

 

cQube can be extended to other domains as well, similar to education through a domain specific configuration layer.

 

Design Principles:

cQube is based on the following design principles:

  1. Solution: cQube is neither a tool nor a platform. It is a ready-to-use/pre-packaged, configurable, and extendable solution to enable observability and action towards effective policy implementation in education and other sectors, involving various stakeholders across govt, society, and private sectors

  2. Education-specific: A pre-packaged solution of cQube for education, cQube Ed, comes with a set of predefined actionable indicators and insights which are specific to education. E.g., the metrics could be related to attendance, enrolment, assessments, etc. The schema for data ingestion will also be defined for edu-specific indicators.

  3. Based in Indian Context: cQube is based in the Indian context, implying that the jurisdictions and hierarchies will be defined accordingly. E.g., the hierarchy for cQube Ed will be as follows: State > District > Block > Cluster > School > Class.
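The jurisdiction hierarchy named in principle 3 can be sketched as an ordered list of levels. This is an illustration only; the names `LEVELS`, `parent_level` and `levels_below` are hypothetical and do not come from the cQube Ed codebase.

```python
from typing import List, Optional

# Jurisdiction levels for cQube Ed, ordered from broadest to narrowest,
# exactly as listed in the design principle above.
LEVELS = ["State", "District", "Block", "Cluster", "School", "Class"]


def parent_level(level: str) -> Optional[str]:
    """Return the level directly above `level`, or None for the top level."""
    index = LEVELS.index(level)
    return LEVELS[index - 1] if index > 0 else None


def levels_below(level: str) -> List[str]:
    """Return every level an officer at `level` can drill down into."""
    return LEVELS[LEVELS.index(level) + 1:]
```

For example, `levels_below("Cluster")` yields the School and Class levels, which is the drill-down path a cluster officer follows in the use cases described later.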

 

Use Cases

The major use cases envisaged to be unlocked through cQube Ed are as follows:

 

Decentralized Observability:

Situation: The State Project Director and other field officers (District / Block / Cluster Officers, HM / Principal / Teacher) wish to review the indicators for their jurisdiction (blocks / clusters, schools, classes) in order to improve the overall performance.

Present (with no implementation of cQube Ed): The State Project Director and other field officers go to different dashboards to review different indicators. One dashboard tells the performance on teacher mentoring, another tells the performance on student attendance and so on. The officers conducting the review open each dashboard (if they are able to find all) and review one indicator per dashboard.

Even when they go to a particular dashboard, the officers always need to select the district / block / cluster whose performance needs to be seen for that particular indicator.

Future (with implementation of cQube Ed): At a monthly state review meeting in Uttar Pradesh’s education department, the State Project Director opens a single dashboard on his screen. While reviewing all indicators on the same dashboard, he spots that Varanasi is a bottom performer in terms of student assessment performance in recently-conducted SA1 exams. He tells the district officer during the meeting to review the blocks under her immediately with respect to this indicator.

The District officer of Varanasi quickly opens the dashboard for Varanasi on her phone during the meeting itself and looks at the indicator showing student performance in SA1. She quickly identifies the blocks that need to take corrective action from the chart shown, shares this information on the WhatsApp group with all the block officers and calls for an in-person review meeting the next day to discuss the next steps for improvement on this indicator. As suggested on the VSK dashboard, she also sends a nudge through the VSK dashboard to all the block officers whose blocks need improvement.

As the respective block officers receive the nudge on their devices from the VSK dashboard, they identify the clusters within their block on the VSK dashboard where this indicator has been low and follow the same process as the district officer with all the cluster officers. The cluster officers then review the schools under them and speak to the HMs / Principals regarding the specific issue. The HMs / Principals then discuss the issue regarding low student performance in SA1 with specific teachers for their classes.

As a result, key learning outcomes that need to be addressed in each classroom by the teachers are identified and become a focus in the teaching-learning process for the next few months in order to achieve improvement in student assessment performance in the next set of exams.

Solution: cQube Ed will allow these field education officers to view & share personalized dashboards and insights to conduct review and monitoring effectively and take data-backed decisions. cQube Ed will also enable these officers to send & receive nudges if there is a dip in performance with specific actionables.

 

Configurability:

Situation: A new program on tablet-based learning (e-Adhigam) has been launched in the state of Haryana. The state admin (head / nodal officer of VSK in the state education department) has been asked by the SPD to ensure that the following metric is being reviewed across the state at all levels (district / block / cluster etc.) on VSK: ‘Number of students using tablets daily’.

Present (with no implementation of cQube Ed): If no data is being collected, the state admin asks for data in a specific format from all districts (usually sent as CSVs) which is then collated and interpreted.

Assuming that the required data for this indicator is being collected in a structured manner, the State Admin would give the requirements to the State Technical Team (NIC, if no state tech team exists) to add this indicator in the following manner:

  • Indicator: Total Number of unique students who are using tablets daily

  • Type of Chart to be created: Table

  • Dashboard where chart should be included: New Dashboard to be created ‘e-Adhigam’

  • Dashboard should contain filters like Year & Month

Post multiple calls to understand the requirement, the state tech team would be able to create this chart in around 2-3 weeks (if all goes well), post which the reviews will be initiated on this indicator.

Future (with implementation of cQube Ed): As soon as there is a requirement for a new indicator, the state admin speaks to the state tech team to add this data point to the VSK. The tech team will then modify the adapter within a few hours to add this new data point as an event to the ingestion block of cQube Ed.

Once the dataset has been connected to VSK, the state admin can define the chart type, colors and positioning of the chart on the dashboard through the admin console.

Solution: cQube Ed will allow the state admins to add an indicator or an insight within a day through an intuitive and easy-to-use admin console.
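The e-Adhigam requirement listed in this use case (indicator, chart type, dashboard, filters) can be captured as a small structured request before it reaches the admin console. The sketch below is hypothetical: `IndicatorRequest`, `ALLOWED_CHART_TYPES` and `validate` are illustrative names, and the set of allowed chart types is an assumption rather than the list cQube Ed supports.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: an illustrative set of chart types, not the cQube Ed list.
ALLOWED_CHART_TYPES = {"table", "bar", "line", "map"}


@dataclass
class IndicatorRequest:
    """One new-indicator requirement as given by the state admin."""
    name: str
    chart_type: str
    dashboard: str
    filters: List[str] = field(default_factory=list)


def validate(request: IndicatorRequest) -> List[str]:
    """Return a list of problems; an empty list means the request is usable."""
    problems = []
    if not request.name.strip():
        problems.append("indicator name is empty")
    if request.chart_type.lower() not in ALLOWED_CHART_TYPES:
        problems.append("unsupported chart type: " + request.chart_type)
    if not request.dashboard.strip():
        problems.append("dashboard name is empty")
    return problems


# The requirement from the e-Adhigam example above.
e_adhigam = IndicatorRequest(
    name="Total number of unique students who are using tablets daily",
    chart_type="Table",
    dashboard="e-Adhigam",
    filters=["Year", "Month"],
)
```

Writing the requirement down in one structured place is what removes the "multiple calls to understand the requirement" step described in the present-day flow.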

1. Terminologies

 

  1. Adopter - Leadership / Decision Makers who plan to leverage, or already leverage, cQube Ed

  2. Domain - Area of interest and expertise that we are working on. Eg: Health, Education, etc.

  3. Input Sources - Any input data sources that adopters use. Eg: Database (MIS, SQL, NoSQL) and Google Sheets / Excel / CSV, adopter applications

  4. Management Information System (MIS) - A data store that houses all the raw data of adopter organizations to generate events

  5. Ingestion - The system that all data from input sources reaches as the first step

  6. Adapter - Generates events from MIS / any other input source and pushes them to cQube Ed through an API

  7. Dataset - High-level data which is computed by aggregating events. It is a data representation of the indicator. Datasets are persistent within cQube Ed. A dataset is created for at least one indicator. This has been explained in detail with examples in this section

  8. Indicator - A visual representation of a dataset(s). Eg: District-wise average attendance %.

  9. Event - A data structure that records an occurrence at a particular time for an entity (eg: school, etc). It is a combination of simple data types (eg: integer, varchar, etc.). An event should always contain a column/set of columns that helps you calculate the Indicator. A table with a timestamp doesn’t necessarily mean that it is an event; it should contribute to either aggregation or filtering of the dataset. This has been explained in detail with examples in this section. Additional details are in this section.

  10. Allowed Data Types - SQL compliant data types found and supported across most RDBMS implementations.

    1. Numeric data types such as int, tinyint, bigint, float, real, etc.

    2. Date and Time data types such as Date, Time, Datetime, etc.

    3. Character and String data types such as char, varchar, text, etc.

    4. Unicode character string data types, for example nchar, nvarchar, ntext, etc.

  11. PII - Personal Identifiable Information

  12. Data Processing - Processing involves:

    1. Transformation of Events to data that updates datasets - Transformation happens through a transformer: f(eventDetails, eventSchema, datasetSchema, dimensionConfig) = [array of columns]

    2. Updating datasets

Processing in cQube Ed is done on the event(s) being emitted.

  13. Dimension - Dimensions describe events. This has been explained in detail with examples in this section

  14. Transformer - Operation / Function being performed on an event & dimension to process them into a dataset. This has been explained in detail with examples in this section

  15. Visualization - A functional block that focuses on rendering data

  16. Charts - A single sheet of information in the form of a table, graph, or diagram

  17. Dashboard - A collection of charts

  18. Dashboard Organizer - A WYSIWYG editor that allows the placement of charts

  19. Insight - An actionable comprehension of certain data and visualizations

  20. Action Adapters - A type of written communication to disseminate information in order to encourage or persuade someone to do something in a certain way

  21. Plugins - An external or internal component that adds functionality to vanilla cQube Ed. Since cQube Ed follows an API-first design and the APIs are exposed, additional functionality is added by creating a plugin.
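The transformer signature given under Data Processing, f(eventDetails, eventSchema, datasetSchema, dimensionConfig) = [array of columns], can be sketched as a plain function. This is a minimal illustration under assumptions: the shapes chosen for the schemas and for `dimension_config` (a key name plus a lookup table) are hypothetical and are not the formats defined in the cQube Ed specs.

```python
from typing import Any, Dict, List


def transform(
    event_details: Dict[str, Any],
    event_schema: Dict[str, type],
    dataset_schema: List[str],
    dimension_config: Dict[str, Any],
) -> List[Any]:
    """Map one event onto the ordered columns of a dataset row."""
    # 1. Check the event against its schema (column name -> expected type).
    for column, expected_type in event_schema.items():
        if not isinstance(event_details.get(column), expected_type):
            raise ValueError("invalid or missing column: " + column)

    # 2. Enrich the event with attributes of its dimension (e.g. school).
    key = dimension_config["key"]
    dimension_row = dimension_config["rows"].get(event_details[key], {})
    merged = {**dimension_row, **event_details}

    # 3. Emit values in the column order the dataset expects.
    return [merged.get(column) for column in dataset_schema]


# Illustrative inputs; the column names are assumptions.
event = {"school_id": 101, "total_present": 20, "total_students": 50,
         "date": "13-11-2022"}
event_schema = {"school_id": int, "total_present": int,
                "total_students": int, "date": str}
dataset_schema = ["date", "school_id", "district", "total_students",
                  "total_present"]
dimension_config = {"key": "school_id",
                    "rows": {101: {"district": "Kangra"}}}

row = transform(event, event_schema, dataset_schema, dimension_config)
```

The returned list is the "array of columns" from the definition: one value per dataset column, ready to be written by the dataset update step.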

2. Use Cases [1]

Persona 1: Deployer

  • 1.1: As a deployer, I can install cQube Ed seamlessly (at a single click), select domain & setup data ingestion, processing and visualization pipeline

  • 1.2: As a deployer, I can define and ingest spec-compliant state data into cQube Ed to generate actionable insights

  • 1.3: As a deployer, I can process the events to generate datasets

  • 1.4: As a deployer, I can visualize any dataset to generate charts for the program dashboard

  • 1.5: As a deployer, I can analyze if the cQube Ed instance is running well or not

Persona 2: Admin

  • 2.1: As an admin, I can choose the insight to be shown on the dashboards to selected users

  • 2.2: As an admin, I can request for an additional insight to be shown on the dashboards to selected users

  • 2.3: As an admin, I can setup program dashboards using the insights generated from datasets and provide access to multiple users

  • 2.4: As an admin, I can configure nudges to decide what nudge has to be sent to whom

Persona 3: User

  • 3.1: As a user, I can view the program dashboard to identify potential actions

  • 3.2: As a user, I can view and receive nudges based on data insights

  • 3.3: As a user, I can share the insights from the dashboards with stakeholders on other channels

Detailed user stories for each use case have been linked here.

3. Design Considerations

  1. Low Tech/Minimal Coding: Leverage existing open source tools and a generalized architecture that allow the adopter to meet their needs through configuration with no or minimal coding (this needs to be balanced with ease of deployment and management needs).

  2. API First: APIs should be built for all functionality and blocks for the independent evolution of the solution. Exposing and documenting internal structures would allow for a pluggable and contributable system all around. Evolution would be critical.

  3. Progressive Enhancement: The design should always allow for changes of modules in a way that there are tiers to improvement in experience.

  4. Modularity: Each individual block in cQube Ed should be modular and allow the adopter to pick and choose relevant components. Blocks should be small and solve a problem in an end-to-end manner to ensure usefulness.

  5. Integrations: Adopters with higher capacity and intent should have the option to enable higher-order complex capabilities as relevant to them. The higher-order capabilities need not come as part of the solution but can be enabled as an integration. Integrations can be enabled using clear boundaries between blocks and enabling communication between them through APIs over commonly shared specifications of data transfer and storage.

  6. Ease of Deployment/Management: A lot of educational bodies have a weak capacity to design, implement, deploy and operate solutions. To achieve scale -

    1. Single VM Deployment - It is important to enable a pre-packaged out-of-the-box reference solution that is easy to deploy and manage on commodity hardware with min specified requirement.

    2. Cloud-agnostic deployment: On-premise deployment on VMs, and deployment on any hyper-scale using cloud-native alternatives should be possible.

  7. Scalable

    1. Ingestion: cQube Ed will ingest aggregates that will be pushed as events, the frequency of which is selected by the state admin. This could lead to a lot of very small events. The ingestion should be expandable to allow for faster ingestion and to keep everything real-time.

    2. SLA Driven: An SLA of 30 mins between ingestion and insight generation. Delays in business operations and decision-making cause the government to miss opportunities and expose them to risk.

    3. Performance: In spite of hardware limitations, cQube Ed should never compromise on performance and should clearly notify the user of what can and cannot be done. Exhaustive benchmarking suites should ensure stable performance at a specified hardware level.

  8. Data Privacy: cQube Ed will not store data at a user level so no PII is part of the system. Only aggregated data and insights will be stored as part of cQube Ed.

  9. Accessibility: The solution should be accessible on mobile devices along with desktops.

  10. Data Security: To be added
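The "SLA Driven" consideration above (30 mins between ingestion and insight generation) lends itself to a simple automated check. The sketch below is an assumption about how such a check could look; `SLA` and `sla_breached` are hypothetical names, and the threshold is a parameter so it can follow whatever SLA a deployment commits to.

```python
from datetime import datetime, timedelta

# Threshold taken from the "SLA Driven" consideration above.
SLA = timedelta(minutes=30)


def sla_breached(ingested_at: datetime, insight_ready_at: datetime,
                 sla: timedelta = SLA) -> bool:
    """Return True when insight generation took longer than the SLA."""
    return insight_ready_at - ingested_at > sla


# Illustrative timestamps only.
t0 = datetime(2022, 11, 13, 9, 0)
late = sla_breached(t0, t0 + timedelta(minutes=45))      # beyond the SLA
on_time = sla_breached(t0, t0 + timedelta(minutes=30))   # exactly at the SLA
```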

4. Technical Architecture Diagram

5. Specifications (WIP)

 

Specs         Links

Event         Link

Dimension     Link

Dataset       Link

Transformer   Link

Indicator     Link

Charts        Link (Dashlet Spec++)

Dashboards    Link

 

The links for Charts and Dashboards will be added at a later stage.

6. Feature Comparison

As a | I will be able to | Feature List | v5.0 (Jan '23) | v6.0 (Mar '23) | v7.0 (May '23)

As a: Field Officer (State, District, Block, Cluster, School, Class)

I will be able to: Select my role as well as geography (eg: District Officer, Agra) and view my pre-defined indicators for VSK on cQube dashboard (desktop-based) in order to conduct review & monitoring and take actions.

Feature list:

  • Ingestion of state data into cQube as per fixed schemas for pre-defined indicators through adapters

  • Processing of ingested data to create aggregated datasets

  • Auto-creation of visualizations of pre-defined indicators for each role from aggregated datasets on desktop-based dashboards

    • For existing users of cQube 4.0, these visualizations will be created on current cQube visualization layer

    • For new users of cQube 5.0, these visualizations will be created on Metabase / Superset

  • Enabling Metabase / Superset as a plugin for visualizations of pre-defined indicators

  • Code-based addition of new indicators & chart types by the state

I will be able to: Select my role & geography (eg: District Officer, Agra) and view & download my pre-defined indicators & suggestive actions for VSK on cQube app (mobile-based) in order to conduct review & monitoring and take actions. I will also be able to send & receive nudges with suggestive actions.

Feature list:

  • Auto-creation of visualizations & insights of pre-defined indicators from aggregated datasets on an app-based dashboard

    • All users will need to adopt Metabase / Superset in order to enable app-based dashboards

  • Enabling pre-defined suggestive actions on the cQube app and desktop-based dashboards by mapping these against indicators & roles

  • Download charts, dashboards & action report as PDF

  • Enabling UCI as a plugin for the users to send & receive pre-defined in-app nudges, showing suggestive actions

As a: State Tech Team

I will be able to: Add new program indicators, visualizations & suggestive actions to be shown as part of VSK. I will also be able to manage users, reports and configure nudges in the state.

Feature list:

  • Admin console to add new program indicators on the dashboard by connecting datasets, creating visualizations, adding suggestive actions and mapping them to a role

  • Management of users & roles through the admin console for logins and authorisation (self-claimed or central)

  • Configuration of report templates and in-app nudges (with an option to automate them) through the admin console

  • One-click installation of cQube

 

 

Note: Embedding charts from the external app in the ref viz app is not allowed - cQube Ed will enable creation of charts using an external WYSIWYG editor based on dashlet spec to enable creating visualizations.

 

The following table differentiates between the core of cQube Ed and the plugins that will be contributed by the adopter community.

 

cQube Ed Core

- All building blocks

- Dashlet spec impl.

cQube Ed Plugins

- Cloud Agnostic

- Horizontal Scalability

- Near Real Time

1-step data ingestion with fact validation and initial processing done on an event. Connection to any data source through an adapter (externally)

- Adapter (Input source) plugins published to open source to ease the ingestion of events

- Out of the box indicators configured
- Admin plugin to manage basic things

 

 

- Ability to define an end to end flow using APIs

 

- Ability to use Superset as a Chart and Dashboard Configurator, additionally as a Renderer.

Only Domain Specification Based Indicator and Visualizations

- Nudges configuration through admin

- Authentication (handled externally)
- Authorization with attribute/role-based access control for the insight, processing, and dataset, viz

- OPA based plugin to manage roles through a UI.

Single script installation on a single machine using the same Ansible scripts

- Horizontally Scalable Deployment on any CSP using Kubernetes without a DevOps needed.

- SLA of 4 hours from event ingestion to it being shown on dashboard

- Benchmarks - Results will be published for an Input source with a predefined domain spec. Scripts will be available publicly

- A UI testing suite for Dashlets

- Usage Telemetry, monitoring

- Event lifecycle monitoring and debugging plugin
- Plugin for external Archival of messages
- Event(s) replaying through plugins

 

7. How to enable an indicator as part of v5.0?

 

cQube Ed v5.0:

cQube Ed v5.0 is a fundamental shift from how the data is currently (as part of v4.0) ingested into cQube.

 

cQube v4.0 had the following ways of data ingestion:

  1. Either aggregated data as CSV uploaded (zipped), or

  2. Emission API (OpenTelemetry)

v5.0 will ingest event(s) in a format that will update the dataset(s) in some shape or form. A dataset can be derived from one or more specified events, independently (i.e. no dependency between events to update the dataset). It may additionally contain dimensions and derived values from other datasets during the mapping process.

 

An example is shown below.

There is a top-down approach that will be followed from an adopter perspective:

The state admin first defines the Indicators that need to be visualized on the main dashboard. For example, District-wise Average Student Attendance % with dropdown filters of Year & Month. The following is a template of how this Indicator will be visualized on a dashboard:

Year Filter 🔽 Month Filter 🔽

District          Average Attendance %

<District Name>   <Average attendance % with color coding>

 

From the Indicators, datasets are defined. There will be at least one dataset for an Indicator being visualized. For example, for the Indicator mentioned above, the dataset will have the following columns:

 

dataset_attendance

Date           School Id                 Count               Sum               Average Attendance %

<dd-mm-yyyy>   <unique id of a school>   <count parameter>   <sum parameter>   <average calculated through sum / count>

 

In order to create this dataset, data will need to be ingested from the state databases. The state will send the data to cQube Ed as per the defined event spec. In this case, the event is attendance. The schema for the attendance event, which is the format in which the state will share its data, is shown below (states can use adapters to convert their MIS / database data into this format):

 

| School Id | Total Present | Total Students | Date |
|---|---|---|---|
| 101 | 20 | 50 | 13-11-2022 |
| 201 | 10 | 46 | 12-10-2022 |
| 301 | 42 | 42 | 11-02-2021 |
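As a minimal sketch, an adapter could check each row against this schema before pushing it. The type name `AttendanceEvent`, the camelCase field names, and the function `validateAttendanceEvent` are illustrative assumptions and are not part of the cQube event spec.

```typescript
// Hypothetical shape of one attendance event, mirroring the table above.
interface AttendanceEvent {
  schoolId: number;
  totalPresent: number;
  totalStudents: number;
  date: string; // dd-mm-yyyy, as in the example rows
}

// Returns a list of problems; an empty list means the event looks valid.
function validateAttendanceEvent(e: AttendanceEvent): string[] {
  const problems: string[] = [];
  if (!Number.isInteger(e.schoolId) || e.schoolId <= 0) {
    problems.push("schoolId must be a positive integer");
  }
  if (!Number.isInteger(e.totalStudents) || e.totalStudents < 0) {
    problems.push("totalStudents must be a non-negative integer");
  }
  if (!Number.isInteger(e.totalPresent) || e.totalPresent < 0) {
    problems.push("totalPresent must be a non-negative integer");
  } else if (e.totalPresent > e.totalStudents) {
    problems.push("totalPresent cannot exceed totalStudents");
  }
  if (!/^\d{2}-\d{2}-\d{4}$/.test(e.date)) {
    problems.push("date must be in dd-mm-yyyy format");
  }
  return problems;
}

// The three example rows from the table above.
const sampleEvents: AttendanceEvent[] = [
  { schoolId: 101, totalPresent: 20, totalStudents: 50, date: "13-11-2022" },
  { schoolId: 201, totalPresent: 10, totalStudents: 46, date: "12-10-2022" },
  { schoolId: 301, totalPresent: 42, totalStudents: 42, date: "11-02-2021" },
];
```

All three sample rows pass this check. A row with more students present than enrolled, or with a date in another format, would be reported back to the input source instead of being ingested.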

 

Some dimensions will also be required to produce the dataset mentioned above. For example, in this case, school will be the dimension for the ‘attendance’ event.

 

| School Id | Name | District | Block | Cluster |
|---|---|---|---|---|
| 101 | D.P.S Public School | Kangra | Baijnath | CGPS Yangpa-1 |
| 201 | St. Mary Convent School | Mandi | Karsog | GSSS Karsog |
| 301 | Holy Child School | Shimla | Shimla | Shimla |

 

The next step is transformation (processing), which converts these events and dimensions into a dataset. For the above example:

 

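-- Upsert (PostgreSQL-style ON CONFLICT): insert the row, or fold the new
-- event into the existing row for the same (date, schoolId).
INSERT INTO dataset_attendance (date, sum, count, average, schoolId)
VALUES ('13-11-2022', 20, 50, 40, 101)
ON CONFLICT ON CONSTRAINT dataset_attendance_unique_date_schoolId
DO UPDATE SET
  count = dataset_attendance.count + EXCLUDED.count,
  sum = dataset_attendance.sum + EXCLUDED.sum,
  -- average is stored as a percentage, hence the factor of 100
  average = 100.0 * (dataset_attendance.sum + EXCLUDED.sum)
            / (dataset_attendance.count + EXCLUDED.count);
-- No WHERE clause is needed: the conflicting row already has schoolId = 101.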
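The arithmetic of this incremental update can also be sketched outside SQL. The following is an illustrative in-memory equivalent, assuming rows are keyed by (date, schoolId) and the average is stored as a percentage. The names `AttendanceRow`, `attendanceTable`, and `upsertAttendance` are hypothetical.

```typescript
// One row of dataset_attendance; field names follow the dataset columns.
interface AttendanceRow {
  date: string;
  schoolId: number;
  count: number;   // running total of enrolled students
  sum: number;     // running total of students present
  average: number; // sum / count, expressed as a percentage
}

// In-memory stand-in for the table, keyed by "date|schoolId"
// (the same pair the unique constraint is assumed to cover).
const attendanceTable = new Map<string, AttendanceRow>();

// Inserts a new row, or folds the event into the existing row.
function upsertAttendance(
  date: string,
  schoolId: number,
  present: number,
  total: number
): AttendanceRow {
  const key = `${date}|${schoolId}`;
  const existing = attendanceTable.get(key);
  const count = (existing ? existing.count : 0) + total;
  const sum = (existing ? existing.sum : 0) + present;
  const average = count === 0 ? 0 : (100 * sum) / count;
  const row: AttendanceRow = { date, schoolId, count, sum, average };
  attendanceTable.set(key, row);
  return row;
}
```

Calling `upsertAttendance("13-11-2022", 101, 20, 50)` yields count 50, sum 20, average 40. A second, hypothetical event for the same school and date with 10 present out of 10 updates the same row to count 60, sum 30, average 50.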

 

This processing will result in the following dataset:

dataset_attendance

| Date | School Id | Count | Sum | Average Attendance % |
|---|---|---|---|---|
| 13-11-2022 | 101 | 50 | 20 | 40 |
| 13-11-2022 | 201 | 60 | 30 | 50 |
| 11-02-2021 | 301 | 63 | 42 | 66.7 |

 

The dataset will then result in the following visualization on the dashboard:

 

Year Filter (2022) 🔽 Month Filter (Nov) 🔽

| District | Average Attendance % |
|---|---|
| Kangra | 45% |
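The roll-up from school-level rows to a district-wise Indicator with Year and Month filters could be sketched as follows. This is an illustration under assumptions: the names `SchoolDayRow`, `districtBySchool`, and `districtAverages` are hypothetical, and school 102 is an invented second Kangra school added only to show that the district figure is computed from summed counts rather than by averaging per-school averages.

```typescript
// School-level dataset row (the stored average is not needed for the roll-up).
interface SchoolDayRow {
  date: string; // dd-mm-yyyy
  schoolId: number;
  count: number;
  sum: number;
}

// School dimension reduced to the one attribute needed here.
const districtBySchool: Record<number, string> = {
  101: "Kangra",
  102: "Kangra", // hypothetical school, not in the dimension table above
  201: "Mandi",
  301: "Shimla",
};

// Returns average attendance % per district for the selected year and month.
function districtAverages(
  rows: SchoolDayRow[],
  year: number,
  month: number
): Record<string, number> {
  const totals: Record<string, { sum: number; count: number }> = {};
  for (const row of rows) {
    const [, mm, yyyy] = row.date.split("-").map(Number);
    if (yyyy !== year || mm !== month) continue; // apply the dropdown filters
    const district = districtBySchool[row.schoolId];
    if (district === undefined) continue; // school missing from the dimension
    const t = totals[district] ?? (totals[district] = { sum: 0, count: 0 });
    t.sum += row.sum;
    t.count += row.count;
  }
  const averages: Record<string, number> = {};
  for (const district of Object.keys(totals)) {
    const t = totals[district];
    averages[district] = t.count === 0 ? 0 : (100 * t.sum) / t.count;
  }
  return averages;
}

const novemberRows: SchoolDayRow[] = [
  { date: "13-11-2022", schoolId: 101, count: 50, sum: 20 },
  { date: "13-11-2022", schoolId: 102, count: 50, sum: 25 }, // hypothetical row
  { date: "13-11-2022", schoolId: 201, count: 60, sum: 30 },
  { date: "11-02-2021", schoolId: 301, count: 63, sum: 42 },
];
```

With these rows, `districtAverages(novemberRows, 2022, 11)` gives 45 for Kangra (45 present out of 100) and 50 for Mandi. The Shimla row is excluded by the filters.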

 

Let’s take another example.

Suppose there are multiple Indicators that need to be visualized on a dashboard, for example -

  • State-wide Average Attendance %

  • District-wise Average Attendance %

  • Block-wise Average Attendance %

  • School-wise Average Attendance %

 

There will be a separate dataset available for each of these Indicators. For example, the respective datasets for the Indicators would be:

  • State_attendance

  • District_attendance

  • Block_attendance

  • School_attendance

 

The following attendance event will affect each of these datasets, which will then flow into the Indicator visualizations on the dashboard.

 

| School ID | Total Present | Total Students | Date |
|---|---|---|---|
| 101 | 20 | 50 | 12-11-2022 |
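One way to picture this fan-out is a single function that applies the event to all four datasets. The sketch below keeps the datasets in memory and is illustrative only: `schoolDims`, `levelDatasets`, `addTo`, and `applyAttendanceEvent` are hypothetical names, and the key layout of each dataset is an assumption.

```typescript
// Slice of the school dimension needed to locate a school in the hierarchy.
interface SchoolDim {
  district: string;
  block: string;
}
const schoolDims: Record<number, SchoolDim> = {
  101: { district: "Kangra", block: "Baijnath" },
};

interface Totals {
  sum: number;   // students present
  count: number; // students enrolled
}

// One in-memory table per Indicator-level dataset.
const levelDatasets: Record<string, Record<string, Totals>> = {
  State_attendance: {},
  District_attendance: {},
  Block_attendance: {},
  School_attendance: {},
};

function addTo(dataset: string, key: string, present: number, total: number): void {
  const table = levelDatasets[dataset];
  const t = table[key] ?? (table[key] = { sum: 0, count: 0 });
  t.sum += present;
  t.count += total;
}

// A single event updates every dataset independently.
function applyAttendanceEvent(
  schoolId: number,
  present: number,
  total: number,
  date: string
): void {
  const dim = schoolDims[schoolId];
  if (!dim) throw new Error(`No school dimension found for school ${schoolId}`);
  addTo("State_attendance", date, present, total);
  addTo("District_attendance", `${date}|${dim.district}`, present, total);
  addTo("Block_attendance", `${date}|${dim.district}|${dim.block}`, present, total);
  addTo("School_attendance", `${date}|${schoolId}`, present, total);
}
```

Calling `applyAttendanceEvent(101, 20, 50, "12-11-2022")` adds 20 present and 50 enrolled to one row in each of the four datasets.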

 

In this way, cQube Ed v5.0 will:

  • Allow for a federated data source (for example - MIS) using Adapters (push-based).

  • Provide the ability to use existing visualizations, create new ones using a UI-based dashboard organizer, or connect datasets (using SQL) to existing BI tools.

  • Provide configurable role-based access, which can be either enforced or set as a preference.

8. Blocks in the future versions of cQube Ed:

Ingestion

Current Implementation -

  • All the datasets are ingested into ‘Emission Data Storage’

  • Data is ingested as aggregated data in a CSV

  • Student-level data for assessment and attendance is stored

 

Proposed Implementation -

  • Data is emitted/pulled through input sources.

  • The adapter then converts the emitted data into event(s) that comply with the event spec. An adapter can also ingest datasets directly and push them to cQube Ed (using the dataset spec). An adapter is custom logic that sits inside the state data center and monitors it for new events.

 

  • Events can also be sourced using an SDK that becomes part of the state applications.

  • To ensure backpressure handling, throughput, and no loss of input events, the API will place events on a lightweight event bus (rather than a resource-intensive solution like Kafka) that treats offsets and message retention as first-class citizens.

  • Since the events are not processed/validated in sync with the API, the admin for input sources should be notified in case of any issues.

  • Events are first-class citizens in cQube Ed and hence can be directly ingested without further changes. Since the API is expected to handle a large number of events, they can either be aggregated at the adapter level or at the processing level.

  • The event bus ensures that:

    • The events are only being stored until they are processed. The SLA for processing has been kept at 4 hours. (Back pressure management)

    • Since cQube Ed will become domain agnostic, the events for each domain will be created upfront as part of the domain solution in a similar manner as shown in the example above. Events are created using an event spec and pushed upstream to the adapter to enforce them at runtime.

    • Similar events can be aggregated, which enables further optimizations.

  • Data can be ingested in real time, at a rate below 1k events/second, in the form of events for aggregate entities - e.g. classroom attendance at that moment. Events can also be uploaded in batches.

  • The state can fix a frequency for data updates in the ingestion spec; each scheduled update will run by truncating and re-inserting all data.
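The offset and retention behaviour described in the list above can be illustrated with a small in-memory bus. This is a sketch under assumptions, not the planned implementation: `SimpleEventBus` and its methods are hypothetical, and a real bus would persist events rather than hold them in an array.

```typescript
// Events are retained only until the consumer commits them as processed.
class SimpleEventBus<T> {
  private pendingEvents: T[] = [];
  private firstPendingOffset = 0; // offset of pendingEvents[0]

  // Appends an event and returns the offset assigned to it.
  publish(ev: T): number {
    this.pendingEvents.push(ev);
    return this.firstPendingOffset + this.pendingEvents.length - 1;
  }

  // Reads up to `max` unprocessed events without removing them, so a
  // consumer that crashes before committing sees the same events again.
  poll(max: number): { offset: number; payload: T }[] {
    return this.pendingEvents
      .slice(0, max)
      .map((payload, i) => ({ offset: this.firstPendingOffset + i, payload }));
  }

  // Marks everything up to and including `offset` as processed and drops it.
  commit(offset: number): void {
    const processed = offset - this.firstPendingOffset + 1;
    if (processed < 1 || processed > this.pendingEvents.length) {
      throw new RangeError(`offset ${offset} is not pending`);
    }
    this.pendingEvents.splice(0, processed);
    this.firstPendingOffset = offset + 1;
  }

  // Number of events still waiting to be processed (the backlog).
  pendingCount(): number {
    return this.pendingEvents.length;
  }
}
```

A processor would poll a batch, update the datasets, and only then commit the last offset of the batch. The backlog size reported by `pendingCount()` is the kind of signal that could be watched against the 4-hour processing SLA.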

 

As explained before, here is an example of what an event will look like in order to visualize a district / block / cluster or school-level Indicator for average attendance %.

 

| School Id | Total Present | Total Students | Date |
|---|---|---|---|
| 101 | 20 | 50 | 13-11-2022 |
| 201 | 10 | 46 | 12-10-2022 |
| 301 | 42 | 42 | 11-02-2021 |

 

Anonymised and Aggregated Event Store (AAES):

This is persistent storage for certain events. The following example explains why this event store is needed:

Suppose a state wants to calculate the number of unique students who have been absent for more than 5 days in a particular district, in order to send a nudge to the district officer.

Since the events being collected as part of cQube Ed do not have student id and are stored only transiently, it will not be possible to visualize this particular Indicator.

Hence, an anonymised and aggregated event store will be required which stores historical and encrypted / anonymised information for certain Indicators (student-level attendance in this case). Having an AAES will enable the visualization of unique students who have been absent for more than 5 days in a particular district.
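A minimal sketch of the query such a store would make possible is shown below. It assumes each stored record carries an anonymised student token (produced upstream, for example by the adapter) instead of a real student id. `AbsenceRecord`, `studentToken`, and `countChronicAbsentees` are hypothetical names.

```typescript
// One record in the hypothetical event store. No real student id is kept;
// studentToken is assumed to be an anonymised identifier created upstream.
interface AbsenceRecord {
  studentToken: string;
  district: string;
  date: string; // dd-mm-yyyy
  present: boolean;
}

// Counts unique students in a district who were absent on more than
// `minDays` distinct dates.
function countChronicAbsentees(
  records: AbsenceRecord[],
  district: string,
  minDays: number = 5
): number {
  const absentDates = new Map<string, Set<string>>(); // token -> absent dates
  for (const r of records) {
    if (r.district !== district || r.present) continue;
    let dates = absentDates.get(r.studentToken);
    if (!dates) {
      dates = new Set<string>();
      absentDates.set(r.studentToken, dates);
    }
    dates.add(r.date); // a Set ignores duplicate records for the same date
  }
  let chronic = 0;
  absentDates.forEach((dates) => {
    if (dates.size > minDays) chronic += 1;
  });
  return chronic;
}
```

The count depends on history per student, which is why transient, aggregate-only events are not enough for this Indicator.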

Other examples of Indicators / Visualisations where AAES will be required are as follows:

  • Count of NIPUN students according to the latest spot assessment taken

  • Count of students who have been consistently scoring <x% on assessments

  • Count of mentors who have met their target visits, and who haven’t

 

Where should AAES be maintained? Who should maintain it?

 

Option A (Outside of cQube Ed, state to maintain) - Recommendation

  • This would put the onus on the state to store the data in a certain format to visualize certain Indicators

  • State will only have to manage the server space to maintain this database

  • State would need to monitor their servers regularly to ensure that the events are being sent so that the respective visualizations don’t fail

 

Option B (As part of cQube Ed, product team to maintain)

  • This would put the onus on the product to manage the required server space, custom to each state’s requirements

  • Hardware costs would increase for the state in order to adopt cQube Ed, as cost for AAES will also get included

  • There will be less dependency on the state to manage this event store

Processing

Processing is a two-step process

  1. Transformation of Events to data that updates datasets - Transformation happens through a transformer.
    f(eventDetails, eventSchema, datasetSchema, dimensionConfig) = [array of columns]

  2. Updating datasets - The transformed columns are applied to the datasets as an UPSERT to the data store.

Processing is triggered by event(s) emitted by the event bus.

 

As explained earlier in the sections, here is an example of a transformer.

 

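-- Upsert (PostgreSQL-style ON CONFLICT): insert the row, or fold the new
-- event into the existing row for the same (date, schoolId).
INSERT INTO dataset_attendance (date, sum, count, average, schoolId)
VALUES ('13-11-2022', 20, 50, 40, 101)
ON CONFLICT ON CONSTRAINT dataset_attendance_unique_date_schoolId
DO UPDATE SET
  count = dataset_attendance.count + EXCLUDED.count,
  sum = dataset_attendance.sum + EXCLUDED.sum,
  -- average is stored as a percentage, hence the factor of 100
  average = 100.0 * (dataset_attendance.sum + EXCLUDED.sum)
            / (dataset_attendance.count + EXCLUDED.count);
-- No WHERE clause is needed: the conflicting row already has schoolId = 101.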
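The transformer signature f(eventDetails, eventSchema, datasetSchema, dimensionConfig) from step 1 could take roughly the following shape. Every type and field name here (`EventSchema`, `DatasetColumn`, `DimensionConfig`, `keyField`, and so on) is an assumption made for illustration; the document does not define these structures.

```typescript
type EventDetails = Record<string, string | number>;

interface EventSchema {
  fields: string[]; // field names the event must carry
}

interface DatasetColumn {
  name: string;
  // Derives the column value from the event and the matched dimension row.
  from: (e: EventDetails, dims: Record<string, string>) => string | number;
}

interface DatasetSchema {
  columns: DatasetColumn[];
}

interface DimensionConfig {
  keyField: string; // event field used to look up the dimension row
  table: Record<string, Record<string, string>>;
}

// f(eventDetails, eventSchema, datasetSchema, dimensionConfig) = [array of columns]
function transform(
  e: EventDetails,
  es: EventSchema,
  ds: DatasetSchema,
  dc: DimensionConfig
): (string | number)[] {
  for (const field of es.fields) {
    if (!(field in e)) throw new Error(`Event is missing field "${field}"`);
  }
  const dims = dc.table[String(e[dc.keyField])] ?? {};
  return ds.columns.map((column) => column.from(e, dims));
}

const attendanceEventSchema: EventSchema = {
  fields: ["schoolId", "totalPresent", "totalStudents", "date"],
};
const schoolDimension: DimensionConfig = {
  keyField: "schoolId",
  table: { "101": { district: "Kangra", block: "Baijnath" } },
};
const attendanceDatasetSchema: DatasetSchema = {
  columns: [
    { name: "date", from: (e) => e["date"] },
    { name: "schoolId", from: (e) => e["schoolId"] },
    { name: "district", from: (_e, dims) => dims["district"] ?? "unknown" },
    { name: "count", from: (e) => e["totalStudents"] },
    { name: "sum", from: (e) => e["totalPresent"] },
  ],
};
```

For the event `{ schoolId: 101, totalPresent: 20, totalStudents: 50, date: "13-11-2022" }`, `transform` returns `["13-11-2022", 101, "Kangra", 50, 20]`, which step 2 would then upsert into the dataset.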

Datasets

Current Implementation -

  • All the processed data is stored in ‘Output Data Storage’ which is query-able

  • Datasets are created on S3

 

Proposed Implementation -

  • Datasets will be maintained on a SQL-compatible data store.

  • The datasets will be JSON-compliant, hence enabling visualization using existing chart types on cQube

  • The datasets will also be SQL-compliant, hence enabling

    • External system to build features faster over cQube Ed (Ecosystem play)

    • Existing SQL-based charting tools, data processing tools, etc.

  • Datasets will be of two types:

    • Auto Datasets - generated automatically based on the domain spec.

    • Custom Datasets - allows for an extension on the existing ones if the spec doesn’t solve for it; can be generated at runtime by using well-defined SQL constructs [1], [2]; custom dataset creation is not part of the pipeline and managed separately for ease of use. Custom datasets can also be created by the states from the auto datasets. (as part of v6.0)

 

As explained earlier in the sections, here is an example of a dataset, which is formed post processing ingested events:

 

dataset_attendance

| Date | School Id | Count | Sum | Average Attendance % |
|---|---|---|---|---|
| 13-11-2022 | 101 | 50 | 20 | 40 |
| 13-11-2022 | 201 | 60 | 30 | 50 |
| 11-02-2021 | 301 | 63 | 42 | 66.7 |

Visualizations/Insights

Current Implementation -

  • Generates fixed types of visualizations based on pre-defined metrics and data sources

  • Logged-in users can view the visualization on the browser over the internet and metrics can also be downloaded for taking action or for further analysis

 

Proposed Implementation -

  • Visualizations will be generated with the current cQube charts from the JSON output of the datasets, to ensure backward compatibility for current cQube users

  • Some external visualization tools will also be provided to create charts using the SQL output of the datasets for new cQube users

  • Over time, a visualization toolkit will also be built which will be divided into three parts: Charts, Components, and Dashboard Organizer

  • Charts - This will get subsumed in the dashlets spec. All charts should conform to a Chart Spec that includes three things - Chart Layout (UI details and expected data format), the query to fetch the data, and the data transformation from queried data to a chart-specific format.

  • Components - Non-chart components that become part of the dashboard like images, markdowns, decision-making, etc. These should be embeddable like charts.

  • Dashboard Organizer - Allows you to position charts and components. Dashboards will always be live with the timestamp for the last updated data and will be updated every 4 hours.
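The three parts of the Chart Spec described above (layout, query, transformation) might be typed as follows. This is a sketch only: the interface and field names are assumptions, and the SQL string is an illustrative query against a hypothetical `District_attendance` dataset.

```typescript
// Generic chart spec: Row is the shape returned by the query,
// Point is the shape the chart layout expects.
interface ChartSpec<Row, Point> {
  layout: { chartType: string; title: string; expectedFields: string[] };
  query: string; // SQL to fetch the data from a dataset
  transform: (rows: Row[]) => Point[]; // queried data -> chart-specific format
}

interface DistrictTotals {
  district: string;
  present: number;
  total: number;
}

interface BarPoint {
  label: string;
  value: number;
}

const districtAttendanceChart: ChartSpec<DistrictTotals, BarPoint> = {
  layout: {
    chartType: "bar",
    title: "District-wise Average Student Attendance %",
    expectedFields: ["label", "value"],
  },
  query:
    "SELECT district, SUM(sum) AS present, SUM(count) AS total " +
    "FROM District_attendance GROUP BY district",
  transform: (rows) =>
    rows.map((r) => ({
      label: r.district,
      value: r.total === 0 ? 0 : (100 * r.present) / r.total,
    })),
};
```

Keeping the query and the transformation inside the spec means a renderer only needs to run the query, call `transform`, and hand the result to whichever chart library matches `layout.chartType`.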

 

Approach for implementation of the visualization layer in cQube will be as follows:

  • Dashlet spec will be leveraged to define the charts

  • In the short term the states can either use current cQube charts or connect any visualization tool as a plugin and leverage it for creation of visualizations

  • In the long term, a chart configurator tool will be integrated within cQube for creation of charts & dashboard in a WYSIWYG manner. The output produced using this tool will be a dashlet.

  • A chart renderer tool will be integrated within cQube Ed to render dashlets

  • Recommendation to build this chart configurator + renderer tool is to embed an open-source tool like Superset, which will allow us to focus more on what is missing (lesser effort) than building it from scratch

  • Now, the charts created through this tool can be visualized in the following ways:

    • On the chart configurator + renderer tool itself

    • On the current cQube Ed visualization layer

    • On any other tools like chart.js, d3js etc.

Dashlet <> Chart Configurator <> Render

Actions

Actions are not seen as a separate entity in cQube. They are part of the Visualization layer and have a 1:1 mapping with the Indicator. Action adapters are used to notify the owner of the Indicator of any deviation from the norm. Please note that even though actions are mapped to an Indicator, they do not need to be triggered from the dashboard. Actions will be shown as part of the dashboard, but can also notify the admin asynchronously.

 

Actions are embedded in charts using the Dashlet spec and defined in the domain spec as part of the Indicator.

Use Cases to be enabled -

  • Rule-based Action generation: Some actions will be generated based on some rules applied to the visualizations. For example, if assessment performance is less than x %, then show y pedagogical recommendations to the teacher.

  • Anomaly detection: Anomalies will be detected in the data on which a visualization is based. For example, anomalous assessment scores submitted by a teacher.

  • Correlation and Causation on factors: Correlation and Causation can be established on a metric and factors affecting it to generate relations. For example, does attendance affect the assessment performance of students in a classroom?

  • Auto-generated / Templated narrative generation: Text-based narratives can be created based on the insights generated to ease decision-making for the users. For example, the narrative can tell the district officer to conduct a review of officers of x blocks on delivery of textbooks as there is a delay in the same in these blocks.

  • Rule-based Nudges: Nudges can be sent through UCI based on some set rules on the visualizations created on cQube Ed. For example, an admin can nudge a district officer to take reviews on student assessments as the overall district performance has been going down. These will be enabled by plugins that a state can create to be able to implement this functionality.

  • Simulator: A simulator can be created for interaction with the data to make decisions. For example, a district officer can simulate improvement in assessment performance if textbooks are delivered to schools by x date. (Here is an example of how this will work.)
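The rule-based use cases above share a simple core: compare an Indicator value against a rule and emit an action when the rule fires. A minimal sketch follows, with hypothetical names (`ActionRule`, `actionsFor`) and made-up thresholds standing in for the unspecified x %.

```typescript
// A rule attached to one Indicator (actions map 1:1 to Indicators).
interface ActionRule {
  indicator: string;
  triggersWhen: (value: number) => boolean;
  action: string; // text shown on the dashboard or sent as a nudge
}

// Returns the actions whose rules fire for the current Indicator values.
function actionsFor(
  indicatorValues: Record<string, number>,
  rules: ActionRule[]
): string[] {
  return rules
    .filter(
      (rule) =>
        rule.indicator in indicatorValues &&
        rule.triggersWhen(indicatorValues[rule.indicator])
    )
    .map((rule) => rule.action);
}

// Thresholds below are placeholders chosen for illustration only.
const exampleRules: ActionRule[] = [
  {
    indicator: "assessment_performance_pct",
    triggersWhen: (v) => v < 40,
    action: "Show pedagogical recommendations to the teacher",
  },
  {
    indicator: "average_attendance_pct",
    triggersWhen: (v) => v < 60,
    action: "Nudge the district officer to review attendance",
  },
];
```

The same result list could feed either the dashboard (shown next to the chart) or an asynchronous notification sent through an action adapter.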

Admin Dashboard

  • State will have a config-driven admin dashboard for ingestion of events, dimensions, and datasets

  • This dashboard will be further extended for visualization configs such as: 2.3.4, 2.3.5, 2.3.6, 2.3.7, 2.3.8, 2.3.9.

  • The implementation of the admin console can be done using form schema and react-admin.

9. Installation

  1. The state deployer will be able to install cQube Ed by running a single command with minimal specs. That will also include a one-time setup for ingestion, processing, and visualization blocks.

  2. The state deployer then selects the domain during cQube Ed installation.

  3. The state deployer is then also able to install helper modules like adapter & Anonymized and Aggregated event store for certain use cases if required.

 

cQube Ed will be a single-tenant system. Each adopter (tenant) will have their own instance (each cQube Ed instance will always run on separate hardware). The detailed architecture of how this will be done is still being laid out and will be added to this document once finalized.

 

10. Software Requirements

The software requirements will change only slightly from the current ones - https://cqube.sunbird.org/use/software-requirements. The idea is to continue building within the existing hardware constraints, but with the ability to scale horizontally if needed.

 

Change in the current requirements:

For ingestion, TypeScript and NestJS (a Node.js framework that uses Express.js by default) will be used.

11. Other important elements:

Authentication

Users should be able to identify themselves through any OAuth2 system (SSO) they already use within their organization, without the need to duplicate their accounts.

Access Control

Since everything (Spec, Ingestion, Storage, Datasets, Visualizations) is exposed through APIs internally between services or externally, there should be a common access control layer at layer 7 that ensures everything is safe. The features to include in this would be

  1. Hierarchy based (Parent <> Child, both in organization hierarchy or Geography)

  2. Role-based

  3. Attribute-based?

  4. DEPA?

Some good layer 7 proxies that can manage these complexities, depending on where we are deploying, are Kong (all) and Envoy (k8s). To keep this from becoming complicated and difficult to manage, there should be a UI that allows adopters to manage their own roles or even import roles. It is assumed that cQube will not require "encryption at store" or additional "encryption during transit" (beyond standard transport encryption such as HTTPS) for any Indicator.

It will not be a prerequisite for the state to have a registry / authenticated database of users of VSK (officers, teachers) in order to implement VSK.
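The hierarchy-based and role-based checks listed above could be combined along the following lines. This is a sketch, not the access control layer itself: the roles, the action names, and the representation of a geography as a path such as ["Kangra", "Baijnath"] are all assumptions.

```typescript
type Role = "state_admin" | "district_officer" | "block_officer";

interface VskUser {
  role: Role;
  // Geography the user is responsible for, from the top of the hierarchy
  // down, e.g. ["Kangra"] or ["Kangra", "Baijnath"]. [] means the whole state.
  scope: string[];
}

// Role-based part: which actions each role may perform (illustrative).
const allowedActions: Record<Role, string[]> = {
  state_admin: ["view", "configure"],
  district_officer: ["view"],
  block_officer: ["view"],
};

// Access is granted only when the role allows the action AND the resource
// sits at or below the user's position in the hierarchy (parent <> child).
function canAccess(user: VskUser, action: string, resourcePath: string[]): boolean {
  const roleAllows = allowedActions[user.role].indexOf(action) !== -1;
  const withinScope =
    user.scope.length <= resourcePath.length &&
    user.scope.every((part, i) => part === resourcePath[i]);
  return roleAllows && withinScope;
}
```

For example, a district officer scoped to ["Kangra"] can view data for ["Kangra", "Baijnath"] but not for ["Mandi", "Karsog"], and cannot configure anything.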

Domain Specification

An entire domain for cQube Ed can be modeled as a class which will include instances of all of the items mentioned above - Event, Dimension, Dataset, Transformer, Indicator, Charts, Dashboard, Actions. The domain can be modified after instantiation using the inbuilt APIs. An example would look like this. The domain config is used by the starter script to initialize a cQube Ed instance.
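As a rough sketch of the idea of a domain modeled as a class, the following covers only a subset of the listed items (events, dimensions, datasets, indicators) and refers to them by name. The class `DomainSpec`, its methods, and the validation rule are hypothetical and do not reflect the actual domain spec API.

```typescript
interface IndicatorDef {
  name: string;
  dataset: string; // name of the dataset the Indicator reads from
}

class DomainSpec {
  readonly events: string[] = [];
  readonly dimensions: string[] = [];
  readonly datasets: string[] = [];
  readonly indicators: IndicatorDef[] = [];

  constructor(readonly domainName: string) {}

  // The add* methods stand in for the "inbuilt APIs" that modify a domain
  // after instantiation; each returns `this` so calls can be chained.
  addEvent(eventName: string): this {
    this.events.push(eventName);
    return this;
  }

  addDimension(dimensionName: string): this {
    this.dimensions.push(dimensionName);
    return this;
  }

  addDataset(datasetName: string): this {
    this.datasets.push(datasetName);
    return this;
  }

  // Every Indicator needs at least one dataset, so reject unknown ones.
  addIndicator(indicator: IndicatorDef): this {
    if (this.datasets.indexOf(indicator.dataset) === -1) {
      throw new Error(`Unknown dataset "${indicator.dataset}"`);
    }
    this.indicators.push(indicator);
    return this;
  }
}

const educationDomain = new DomainSpec("education")
  .addEvent("attendance")
  .addDimension("school")
  .addDataset("dataset_attendance")
  .addIndicator({
    name: "District-wise Average Student Attendance %",
    dataset: "dataset_attendance",
  });
```

A starter script could consume an object like `educationDomain` to create the tables, transformers, and dashlets for the instance.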

Deployment

  1. Networking - Existing network architectures can be reused without changes.

  2. Automated Deployment

    1. Using Ansible scripts on Jenkins on physical machines

    2. Using Ansible scripts on Jenkins on k8s

Monitoring Tools

  1. Health Checks - for all services, databases, and pipelines

  2. Status Pages

  3. OpenTelemetry Based Distributed Debugging

Scalability

  1. Event Bus

    1. RabbitMQ

    2. Kafka

  2. Storage - Horizontally

Security Implications:

  • The doc builds over and above what is already shared here.

  • There are security concerns over access of data as a SQL source - this will be handled through row/column level granular permissions.

  • PII Management - No PII is stored as part of cQube Ed in the original form.

New Use Case Creation

  1. Workflow

    1. Create a domain spec

    2. Redeploy

  2. Development Workflow

    1. `docker-compose -f docker-compose.dev.yaml up -d`. This will start all the services and continue to monitor changes made to the code.