Introduction
This wiki describes the architecture that enables the reporting framework to operate at scale. It discusses the high-level design problems to be solved and introduces the proposed architecture.
Key Design Problems
TBA
Reporting Architecture
Druid Architecture
Druid Data Model
Raw Telemetry
# | Dimension in Druid | Field in Telemetry | Description | Data Type |
---|---|---|---|---|
1 | eid | eid | Event Id | String |
2 | syncts | syncts | Sync Timestamp | Long |
3 | actor_id | actor.id | Actor Id of the event | String |
4 | actor_type | actor.type | Type of the actor | String |
5 | channel_id | context.channel | Channel Id | String |
6 | producer_id | context.pdata.id | Producer Id | String |
7 | producer_pid | context.pdata.pid | Producer Process Id | String |
8 | context_env | context.env | Context Environment | String |
9 | sid | context.sid | Session Id | String |
10 | did | context.did | Device Id | String |
11 | context_cdata_type | context.cdata.type | Correlation Data Type | String |
12 | context_cdata_id | context.cdata.id | Correlation Data Id | Array[String] |
13 | object_id | object.id | Content Id | String |
14 | object_type | object.type | Content Type | String |
15 | object_version | object.ver | Content Version | String |
16 | tags | tags | Tags | Array[String] |
17 | edata_type | edata.type | Event type | String |
18 | edata_subtype | edata.subtype | Event subtype | String |
19 | edata_mode | edata.mode | START event Mode of start | String |
20 | edata_pageid | edata.pageid | Unique pageid | String |
21 | edata_uri | edata.uri | IMPRESSION event Relative URI of the content | String |
22 | edata_id | edata.id | Event data Id | String |
23 | edata_duration | edata.duration | Duration of the event | String |
24 | edata_index | edata.index | ASSESS event Index of the question within a content | String |
25 | edata_pass | edata.pass | ASSESS event Field to identify pass or fail for assessments | String |
26 | edata_score | edata.score | ASSESS event Assessment score | Double |
27 | edata_resvalues | edata.resvalues | ASSESS event Assessment results | Array[Object] |
28 | edata_item_id | edata.item.id | ASSESS event Assessment item id | String |
29 | edata_item_title | edata.item.title | ASSESS event Assessment item title | String |
30 | edata_item_maxscore | edata.item.maxscore | ASSESS event Assessment item max score | Double |
31 | edata_target_id | edata.target.id | ASSESS event Assessment item target id | String |
32 | edata_target_type | edata.target.type | ASSESS event Assessment item target type | String |
33 | edata_rating | edata.rating | FEEDBACK event Ratings | String |
34 | edata_comments | edata.comments | FEEDBACK event Comments | String |
35 | edata_dir | edata.dir | SHARE event direction | String |
36 | edata_items_id | edata.items.id | SHARE event shared item ids | String |
37 | edata_items_type | edata.items.type | SHARE item types | String |
38 | edata_items_origin_id | edata.items.origin.id | SHARE event source id | String |
39 | edata_items_origin_type | edata.items.origin.type | SHARE event source type | String |
40 | edata_items_to_id | edata.items.to.id | SHARE event destination id | String |
41 | edata_items_to_type | edata.items.to.type | SHARE event destination type | String |
42 | edata_state | edata.state | AUDIT event current state | String |
43 | edata_prevstate | edata.prevstate | AUDIT event previous state | String |
44 | edata_size | edata.size | SEARCH event result size | Integer |
45 | edata_filters_dialcodes | edata.filters.dialcodes | SEARCH event List of dialcodes | Array[String] |
46 | dloc_state | ldata.state | State location information for the device | String |
47 | dloc_state_code | ldata.state_code | State ISO code information for the device | String |
48 | dloc_city | ldata.city | City location information for the device | String |
49 | dloc_country_code | ldata.country_code | Country ISO code information for the device | String |
50 | dloc_country | ldata.country | Country location information for the device | String |
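The mapping in the table above can be read as a flattening step: each Druid dimension is populated from a dotted path in the nested telemetry event. The following is a minimal sketch of that idea, not the actual ingestion code. The helper names (`get_path`, `flatten_event`) and the sample event values are hypothetical, and only a subset of the dimensions is listed. Array-valued fields such as `context.cdata` are not handled here.

```python
from typing import Any, Dict

# A subset of the Druid-dimension -> telemetry-field pairs from the table above.
DIMENSION_PATHS: Dict[str, str] = {
    "eid": "eid",
    "syncts": "syncts",
    "actor_id": "actor.id",
    "actor_type": "actor.type",
    "channel_id": "context.channel",
    "producer_id": "context.pdata.id",
    "object_id": "object.id",
    "edata_type": "edata.type",
    "dloc_state": "ldata.state",
}


def get_path(event: Dict[str, Any], dotted_path: str) -> Any:
    """Walk a dotted path through nested dicts; return None if a key is missing."""
    node: Any = event
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def flatten_event(event: Dict[str, Any],
                  paths: Dict[str, str] = DIMENSION_PATHS) -> Dict[str, Any]:
    """Build a flat row keyed by Druid dimension name."""
    return {dim: get_path(event, path) for dim, path in paths.items()}


# Hypothetical event, for illustration only.
sample_event = {
    "eid": "IMPRESSION",
    "syncts": 1546300800000,
    "actor": {"id": "user-1", "type": "User"},
    "context": {"channel": "channel-1", "pdata": {"id": "prod.app", "pid": "app"}},
    "object": {"id": "do_123", "type": "Content"},
    "edata": {"type": "view"},
}

flat = flatten_event(sample_event)
print(flat)
```

Fields that are absent from an event (here `ldata.state`) come out as `None` rather than raising, which mirrors how a sparse dimension would simply be empty for that row.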
Summary Events
# | Dimension in Druid | Field in Summary event | Description | Data Type |
---|---|---|---|---|
1 | eid | eid | Event Id | String |
2 | ver | ver | Version | String |
3 | syncts | syncts | Sync timestamp | Long |
4 | uid | uid | User Id | String |
5 | context_date_range_from | context.date_range.from | Start Date for the summary | String |
6 | context_date_range_to | context.date_range.to | End Date for the summary | String |
7 | context_rollup_l1 | context.rollup.l1 | Context level1 rollup | String |
8 | context_rollup_l2 | context.rollup.l2 | Context level2 rollup | String |
9 | context_rollup_l3 | context.rollup.l3 | Context level3 rollup | String |
10 | context_rollup_l4 | context.rollup.l4 | Context level4 rollup | String |
11 | channel_id | dimensions.channel | Channel Id as dimension from raw telemetry | String |
12 | device_id | dimensions.did | Device Id as dimension from raw telemetry | String |
13 | producer_id | dimensions.pdata.id | Producer Id as dimension from raw telemetry | String |
14 | producer_pid | dimensions.pdata.pid | Producer Process Id as dimension from raw telemetry | String |
15 | session_id | dimensions.sid | Session Id as dimension | String |
16 | session_type | dimensions.type | Type of summary | String |
17 | session_mode | dimensions.mode | Mode of action in the session | String |
18 | object_id | object.id | Content Id | String |
19 | object_type | object.type | Content Type | String |
20 | object_version | object.ver | Content version | String |
21 | object_rollup_l1 | object.rollup.l1 | Object level1 rollup | String |
22 | object_rollup_l2 | object.rollup.l2 | Object level2 rollup | String |
23 | object_rollup_l3 | object.rollup.l3 | Object level3 rollup | String |
24 | object_rollup_l4 | object.rollup.l4 | Object level4 rollup | String |
25 | tags | tags | Tags attached to a summary event | Array[String] |
26 | time_spent | edata.eks.time_spent | Time spent in the session excluding idle time | String |
27 | time_difference | edata.eks.time_diff | Total time in a session including idle time | String |
28 | interaction_count | edata.eks.interact_events_count | Total count of interact events in a session | Long |
29 | summary_env | edata.eks.env_summary.env | High level env within the app (content, domain, resources, community) | String |
30 | summary_env_count | edata.eks.env_summary.count | Count of times the environment has been visited | Integer |
31 | summary_env_time_spent | edata.eks.env_summary.time_spent | Time spent per env | Double |
32 | summary_page_id | edata.eks.page_summary.id | Page id | String |
33 | summary_page_type | edata.eks.page_summary.type | Type of page e.g. view/edit | String |
34 | summary_page_visit_count | edata.eks.page_summary.visit_count | Number of times each page was visited | String |
35 | summary_page_time_spent | edata.eks.page_summary.time_spent | Time taken per page | Double |
36 | item_responses_item_id | edata.eks.item_responses.itemId | Question Id passed in the ASSESS event | String |
37 | item_responses_time_spent | edata.eks.item_responses.timeSpent | Time spent in seconds from ASSESS event | String |
38 | item_responses_pass | edata.eks.item_responses.pass | Pass response for a question from ASSESS event | String |
39 | item_responses_score | edata.eks.item_responses.score | Score from ASSESS event | Array[Integer] |
40 | item_responses_max_score | edata.eks.item_responses.maxScore | Max Score from ASSESS event | Array[Integer] |
41 | item_responses_timestamp | edata.eks.item_responses.time_stamp | Timestamp for each response from ASSESS event | String |
42 | dloc_state | ldata.state | State location information for the device | String |
43 | dloc_state_code | ldata.state_code | State ISO code information for the device | String |
44 | dloc_city | ldata.city | City location information for the device | String |
45 | dloc_country_code | ldata.country_code | Country ISO code information for the device | String |
46 | dloc_country | ldata.country | Country location information for the device | String |
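One way such a mapping could be expressed at ingestion time is Druid's JSON `flattenSpec`, where each output dimension is a `path` field with a JSONPath expression. This wiki does not state that `flattenSpec` is the mechanism used, so treat the following as a sketch under that assumption. The `build_flatten_spec` helper is hypothetical, and only a few rows of the summary table are included.

```python
import json
from typing import Any, Dict

# A subset of the Druid-dimension -> summary-field pairs from the table above.
SUMMARY_PATHS: Dict[str, str] = {
    "channel_id": "dimensions.channel",
    "device_id": "dimensions.did",
    "producer_id": "dimensions.pdata.id",
    "time_spent": "edata.eks.time_spent",
    "interaction_count": "edata.eks.interact_events_count",
}


def build_flatten_spec(paths: Dict[str, str]) -> Dict[str, Any]:
    """Turn a dimension -> dotted-path mapping into a Druid-style flattenSpec."""
    fields = [
        {"type": "path", "name": name, "expr": "$." + dotted}
        for name, dotted in paths.items()
    ]
    # Field discovery is disabled so that only the listed paths become columns.
    return {"useFieldDiscovery": False, "fields": fields}


flatten_spec = build_flatten_spec(SUMMARY_PATHS)
print(json.dumps(flatten_spec, indent=2))
```

Generating the spec from a single mapping keeps the ingestion config and the documented data model from drifting apart, at the cost of one extra build step.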
Aggregates
Granularity → DAY
Druid field name | Druid source field | Aggregate Type |
---|---|---|
total_interactions | interaction_count | SUM |
total_time_spent | time_spent | SUM |
total_sessions | mid | COUNT |
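To make the aggregate definitions concrete, the sketch below computes the same three metrics over a handful of in-memory summary rows, bucketed by day. It illustrates the arithmetic only and is not how Druid performs rollup internally. It assumes `syncts` is epoch milliseconds, that days are bucketed in UTC, and that each row carries a distinct `mid` so that COUNT equals the number of rows. A real rollup would also group by the dimensions, which is omitted here. The sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Any, Dict, Iterable


def day_bucket(syncts_ms: int) -> str:
    """Map an epoch-millisecond timestamp to its UTC calendar day."""
    return datetime.fromtimestamp(syncts_ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d")


def rollup_by_day(rows: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Aggregate summary rows at DAY granularity."""
    totals: Dict[str, Dict[str, Any]] = defaultdict(
        lambda: {"total_interactions": 0, "total_time_spent": 0.0, "total_sessions": 0}
    )
    for row in rows:
        bucket = totals[day_bucket(row["syncts"])]
        bucket["total_interactions"] += row["interaction_count"]   # SUM
        bucket["total_time_spent"] += float(row["time_spent"])     # SUM (stored as String)
        bucket["total_sessions"] += 1                              # COUNT of mid
    return dict(totals)


# Hypothetical summary rows, for illustration only.
rows = [
    {"mid": "m1", "syncts": 1546300800000, "interaction_count": 10, "time_spent": "30.5"},
    {"mid": "m2", "syncts": 1546304400000, "interaction_count": 5, "time_spent": "20"},
    {"mid": "m3", "syncts": 1546387200000, "interaction_count": 7, "time_spent": "12"},
]

daily = rollup_by_day(rows)
print(daily)
```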
Denormalized fields for Content, Device and User model
Content Model
Report JSON Spec
JSON Schema
[{
  id: String,           // Required. Report ID.
  label: String,        // Required. Report label (shown as the menu entry).
  title: String,        // Optional. Report title. Defaults to the report label.
  description: String,  // Optional. Report description. HTML text can be included.
  dataSource: String,   // Required. Location of the data source for the report. Can be an expression. For ex: /<report_id>/{{channel}}/report.json
  charts: [{            // Optional
    datasets: [{
      data: Array[Number],  // Required if `dataExpr` is not provided. Data points to show in the chart.
      dataExpr: String,     // Required if `data` is not provided. Expression pointing to the data in dataSource. For ex: {{data.noOfDownloads}}
      label: String         // Required. Label to display on the chart.
    }],
    labels: Array[String],  // Required if `labelsExpr` is not provided. Labels to show on the x-axis.
    labelsExpr: String,     // Required if `labels` is not provided. Expression pointing to the data in dataSource. For ex: {{data.Date}}
    chartType: String,      // Optional. Defaults to line. Available types: line, bar, radar, pie, polarArea and doughnut.
    colors: [""],           // Optional. Color to show for each dataset. Defaults to ["#024F9D"].
    options: {              // Optional. Display options. For the full set of options, see https://valor-software.com/ng2-charts/
      responsive: Boolean,  // Defaults to true
      ...
    },
    legend: Boolean         // Optional. Whether to show the legend below/above the chart. Defaults to true, positioned at the top.
  }],
  table: {                  // Optional
    "columns": Array[String],        // Required if `columnsExpr` is not provided. Columns to show.
    "values": Array[Array[String]],  // Required if `valuesExpr` is not provided. Column data.
    "columnsExpr": String,           // Required if `columns` is not provided. Expression pointing to the data in dataSource. For ex: {{keys}}
    "valuesExpr": String             // Required if `values` is not provided. Expression pointing to the data in dataSource. For ex: {{tableData}}
  },
  downloadUrl: String       // Location to download the data as CSV
}]
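The required/optional rules in the schema above can be checked mechanically before a report spec is published. The following is a minimal sketch of such a check, assuming the spec has been loaded into a Python dict. The `validate_report` function and its error messages are hypothetical and are not part of the reporting framework.

```python
from typing import Any, Dict, List


def validate_report(report: Dict[str, Any]) -> List[str]:
    """Return a list of rule violations; an empty list means the spec passes."""
    errors: List[str] = []

    # Top-level required fields.
    for key in ("id", "label", "dataSource"):
        if not report.get(key):
            errors.append(f"missing required field: {key}")

    # Each chart needs labels (literal or expression) and well-formed datasets.
    for c, chart in enumerate(report.get("charts", [])):
        if "labels" not in chart and "labelsExpr" not in chart:
            errors.append(f"charts[{c}]: needs labels or labelsExpr")
        for d, dataset in enumerate(chart.get("datasets", [])):
            if "data" not in dataset and "dataExpr" not in dataset:
                errors.append(f"charts[{c}].datasets[{d}]: needs data or dataExpr")
            if not dataset.get("label"):
                errors.append(f"charts[{c}].datasets[{d}]: missing label")

    # The table is optional, but if present needs columns and values.
    table = report.get("table")
    if table is not None:
        if "columns" not in table and "columnsExpr" not in table:
            errors.append("table: needs columns or columnsExpr")
        if "values" not in table and "valuesExpr" not in table:
            errors.append("table: needs values or valuesExpr")

    return errors


valid_report = {
    "id": "usage",
    "label": "Usage Report",
    "dataSource": "/usage/report.json",
    "charts": [{
        "datasets": [{"dataExpr": "{{data.noOfDownloads}}", "label": "# of downloads"}],
        "labelsExpr": "{{data.Date}}",
    }],
}
broken_report = {
    "id": "usage",
    "charts": [{"datasets": [{"label": "x"}]}],
    "table": {"columnsExpr": "{{keys}}"},
}

print(validate_report(valid_report))
print(validate_report(broken_report))
```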
The following example spec defines the general usage report:
{
  id: "usage",
  label: "Diksha Usage Report",
  title: "Diksha Usage Report",
  description: "The report provides a quick summary of the data analysed by the analytics team to track progress of Diksha across states. This report will be used to consolidate insights using various metrics on which Diksha is currently being mapped and will be shared on a weekly basis. The first section of the report will provide a snapshot of the overall health of the Diksha App. This will be followed by individual state sections that provide state-wise status of Diksha",
  dataSource: "/usage/$state/report.json",
  charts: [
    {
      datasets: [{
        dataExpr: "{{data.Number_of_downloads}}",
        label: "# of downloads"
      }],
      labelsExpr: "{{data.Date}}",
      chartType: "line"
    },
    {
      datasets: [{
        dataExpr: "{{data.Number_of_succesful_scans}}",
        label: "# of successful scans"
      }],
      labelsExpr: "{{data.Date}}",
      chartType: "bar"
    }
  ],
  table: {
    "columnsExpr": "{{key}}",
    "valuesExpr": "{{tableData}}"
  },
  downloadUrl: "<report_id>/$state/$timeFilter.csv"
}
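The `dataExpr` and `labelsExpr` values in the example point into the JSON fetched from `dataSource`. The exact expression semantics are not specified on this page, so the sketch below assumes the simplest reading: `{{a.b}}` is a dotted path into the data source document, and `data` holds one array per column. The `resolve_expr` and `resolve_chart` helpers and the sample data are hypothetical.

```python
import re
from typing import Any, Dict

# Matches a whole-string expression such as "{{data.Date}}".
_EXPR = re.compile(r"^\{\{\s*([\w.]+)\s*\}\}$")


def resolve_expr(expr: str, source: Dict[str, Any]) -> Any:
    """Resolve a {{dotted.path}} expression against the data source document."""
    match = _EXPR.match(expr)
    if match is None:
        raise ValueError("not a template expression: " + repr(expr))
    node: Any = source
    for key in match.group(1).split("."):
        node = node[key]  # raises KeyError if the path is absent
    return node


def resolve_chart(chart: Dict[str, Any], source: Dict[str, Any]) -> Dict[str, Any]:
    """Replace the *Expr fields of one chart spec with concrete values."""
    labels = chart["labels"] if "labels" in chart else resolve_expr(chart["labelsExpr"], source)
    datasets = []
    for ds in chart["datasets"]:
        data = ds["data"] if "data" in ds else resolve_expr(ds["dataExpr"], source)
        datasets.append({"label": ds["label"], "data": data})
    # chartType defaults to "line", as stated in the schema.
    return {"chartType": chart.get("chartType", "line"), "labels": labels, "datasets": datasets}


# Hypothetical contents of report.json, for illustration only.
source = {
    "data": {
        "Date": ["2019-01-01", "2019-01-02"],
        "Number_of_downloads": [120, 135],
    }
}
chart = {
    "datasets": [{"dataExpr": "{{data.Number_of_downloads}}", "label": "# of downloads"}],
    "labelsExpr": "{{data.Date}}",
    "chartType": "line",
}

resolved = resolve_chart(chart, source)
print(resolved)
```

Literal `data` and `labels` values take precedence over their expression counterparts here. The schema only says one of each pair is required, so that precedence is a design choice of this sketch rather than documented behaviour.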