...

This document describes the detailed design to generate the data for the portal dashboard reports from Druid and export the data to cloud storage. It mainly consists of 2 parts:

...

following modules (a sketch of the job-request record they share follows the list):

  1. Configure Report API - used to request a new report.
  2. Disable Report API - used to disable report generation.
  3. Report Data Generator - a Spark job that runs on a schedule, generates data from Druid, and exports it to cloud storage for each enabled request.
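The three modules coordinate through a shared record in the Cassandra platform_db.job_request table (written by the two APIs, read by the generator) plus the report artifacts saved to cloud storage. The sketch below is a minimal model of that record and its status lifecycle; the field names and defaults are assumptions inferred from this document, not the actual table definition.

Code Block
// Minimal sketch of the job-request record the three modules share.
// Field names and defaults are assumptions based on this document, not the actual schema.
object JobRequestModel {

  sealed trait ReportStatus
  case object Enabled  extends ReportStatus // set by the Configure Report API
  case object Disabled extends ReportStatus // set by the Disable Report API

  // One row in cassandra platform_db.job_request (hypothetical shape)
  case class ReportJobRequest(
    reportId: String,                  // e.g. "usage"
    jobName: String = "druid-reports",
    status: ReportStatus = Enabled,
    configPath: String = "",           // e.g. "report-id.json" in cloud storage
    queryPath: String = ""             // e.g. "report-id.txt" in cloud storage
  )

  def main(args: Array[String]): Unit = {
    val request = ReportJobRequest(reportId = "usage",
      configPath = "usage.json", queryPath = "usage.txt")
    println(request)
  }
}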

...

Configure Report API:

...

  • Input:
  1. Channel Id
  2. Query Time Interval

...

  1. Report Config Json
  2. Query Json

Request Object

Code Block
{
    "id": "sunbird.analytics.report.submit",
    "ver": "1.0",
    "ts": "2019-03-07T12:40:40+05:30",
    "params": {
        "msgid": "4406df37-cd54-4d8a-ab8d-3939e0223580",
        "client_key": "analytics-team"
    },
    "request": {
        "channel_id": "",
        "interval": "Last 7 days",
        "report_config": {
            "id": "usage",
            "label": "Diksha Usage Report",
            "title": "Diksha Usage Report",
            "description": "The report provides a quick summary of the data analysed by the analytics team to track progess of Diksha across states. This report will be used to consolidate insights using various metrics on which Diksha is currently being mapped and will be shared on a weekly basis. The first section of the report will provide a snapshot of the overall health of the Diksha App. This will be followed by individual state sections that provide state-wise status of Diksha",
            "dataSource": "/usage/report.json",
            "charts": [
                {
                    "datasets": [
                        {
                            "dataExpr": "data.Number_of_downloads",
                            "label": "# of downloads"
                        }
                    ],
                    "labelsExpr": "data.Date",
                    "chartType": "line"
                },
                {
                    "datasets": [
                        {
                            "dataExpr": "data.Number_of_succesful_scans",
                            "label": "# of successful scans"
                        }
                    ],
                    "labelsExpr": "data.Date",
                    "chartType": "bar"
                }
            ],
            "table": {
                "columnsExpr": "key",
                "valuesExpr": "tableData"
            },
            "downloadUrl": "report_id1/report.csv"
        },
        "query": {

        }
    }
}
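The report_config above is what drives rendering on the portal: each chart declares a labelsExpr and one or more dataExpr entries that are evaluated against the data file referenced by dataSource, and downloadUrl points at the exported CSV. The snippet below illustrates one plausible reading of that convention ("data.<column>" selecting a column from the report rows); the expression semantics and the sample rows are assumptions for illustration only, not the portal's actual implementation.

Code Block
// Illustration of how a chart's labelsExpr/dataExpr could select columns from the
// report data rows. The "data.<column>" convention here is an assumption about the
// evaluation rules, not the portal's actual implementation.
object ChartConfigExample {

  // a few rows standing in for the content of "/usage/report.json" (sample values)
  val reportRows: Seq[Map[String, String]] = Seq(
    Map("Date" -> "2019-03-01", "Number_of_downloads" -> "1200"),
    Map("Date" -> "2019-03-02", "Number_of_downloads" -> "1450")
  )

  // resolve an expression like "data.Number_of_downloads" to a column of values
  def resolve(expr: String, rows: Seq[Map[String, String]]): Seq[String] = {
    val column = expr.stripPrefix("data.")
    rows.map(_.getOrElse(column, ""))
  }

  def main(args: Array[String]): Unit = {
    val labels = resolve("data.Date", reportRows)
    val series = resolve("data.Number_of_downloads", reportRows)
    println(labels.zip(series)) // ((2019-03-01,1200), (2019-03-02,1450))
  }
}
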
  • Output:
    1. Saves the report-id and job_name as "druid-reports" in the Cassandra platform_db.job_request table with status "ENABLED".
    2. Saves "report-id.json" (the report config) to azure.
    3. Takes the Druid query from the request if one is provided, otherwise builds the Druid query from the inputs, and saves it to azure as "report-id.txt" (see the interval sketch below).

    Save Druid query: we have the following 2 approaches:

      Approach 1: Save individual report queries separately in azure

      Code Block
      druid/queries/report-id-1.txt
      druid/queries/report-id-2.txt
      

      Approach 2: Save all queries in one XML file like this (a parsing sketch follows below):

      Code Block
      <?xml version = "1.0" encoding = "utf-8"?>
        <reports>
           <report name="report1" id="report_1" enabled="true">
             <query>
               <![CDATA[
                 {
                   "queryType": "groupBy",
                   "dataSource": "telemetry-events",
                   "granularity": "day",
                   "dimensions": ["eid"],
                   "aggregations": [
                     { "type": "count", "name": "context_did", "fieldName": "context_did" }
                   ],
                   "intervals": [ "$report_interval" ]
                   "filter": {
                     "type": "and",
                     "fields": [
                       { "type": "selector", "dimension": "eid", "value": "IMPRESSION" },
                       { "type": "selector", "dimension": "edata_type", "value": "detail" },
                       { "type": "selector", "dimension": "edata_pageid", "value": "collection-detail" },
                       { "type": "selector", "dimension": "context_pdata_id", "value": "prod.diksha.app" }
                     ]
                   },
                   "postAggregations" : [{
                     "type"   : "arithmetic",
                     "name"   : "avg__edata_value",
                     "fn"     : "/",
                     "fields" : [
                       { "type" : "fieldAccess", "name" : "total_edata_value", "fieldName" : "total_edata_value" },
                       { "type" : "fieldAccess", "name" : "rows", "fieldName" : "rows" }
                     ]
                   }]
                 }
               ]]>
             </query>
             <query_params>
               <param name="datasource" value="telemetry-events" optional="false"/>
               <param name="granularity" value="day" optional="false"/>
               <param name="report_interval" value="LAST_7_DAYS" 
               <param name="start_date" value="2019-03-01" optional="true"/>
               <param name="start_date" value="2019-03-07" optional="true"/>
             </query_params>
           </report>
         </reports>
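
If Approach 2 is chosen, the generator has to load this XML, keep only the enabled reports, and substitute the query_params into the CDATA query before executing it. Below is a minimal parsing sketch under those assumptions (it uses the scala-xml module); it is illustrative only, not the actual data-product code.

Code Block
import scala.xml.XML

// Sketch: parse the reports XML (Approach 2), keep only enabled reports,
// and substitute each "$param" placeholder in the query with its configured value.
// Assumes the XML structure shown above; not the actual data-product code.
object ReportQueryLoader {

  case class ReportQuery(id: String, name: String, query: String)

  def load(xmlString: String): Seq[ReportQuery] = {
    val root = XML.loadString(xmlString)
    for {
      report <- root \ "report"
      if (report \@ "enabled") == "true"
    } yield {
      val rawQuery = (report \ "query").text
      val params = (report \ "query_params" \ "param").map { p =>
        (p \@ "name") -> (p \@ "value")
      }
      // replace "$name" placeholders with their configured values
      val resolved = params.foldLeft(rawQuery) { case (q, (name, value)) =>
        q.replace("$" + name, value)
      }
      ReportQuery(report \@ "id", report \@ "name", resolved.trim)
    }
  }

  def main(args: Array[String]): Unit = {
    val xml = """<reports>
                |  <report name="report1" id="report_1" enabled="true">
                |    <query><![CDATA[{ "intervals": [ "$report_interval" ] }]]></query>
                |    <query_params>
                |      <param name="report_interval" value="2019-02-28/2019-03-07"/>
                |    </query_params>
                |  </report>
                |</reports>""".stripMargin
    load(xml).foreach(r => println(s"${r.id}: ${r.query}"))
  }
}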
      


Disable Report API:

  • Input:
    report-id
  • Output:
    Saves the report-id and job_name as "druid-reports" in the Cassandra platform_db.job_request table with status "DISABLED" (see the sketch below).
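A minimal sketch of the single side effect of this API is shown below, using the DataStax Java driver (3.x). The Cassandra host and the job_request primary-key columns used in the WHERE clause are assumptions for illustration; the real table definition may differ.

Code Block
import com.datastax.driver.core.Cluster

// Sketch of the Disable Report API's side effect: flip the job_request row for this
// report to DISABLED so the generator skips it on the next run.
// Host, keyspace and primary-key columns are assumptions for illustration.
object DisableReport {

  def disable(reportId: String, cassandraHost: String = "127.0.0.1"): Unit = {
    val cluster = Cluster.builder().addContactPoint(cassandraHost).build()
    val session = cluster.connect()
    try {
      session.execute(
        "UPDATE platform_db.job_request SET status = 'DISABLED' " +
          "WHERE request_id = ? AND job_name = 'druid-reports'",
        reportId)
    } finally {
      session.close()
      cluster.close()
    }
  }

  def main(args: Array[String]): Unit = disable("usage")
}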

Report Data Generator Data Product:

  • Input:
    1. Get the reports list from config; if it is null/empty, search the Cassandra platform_db.job_request table for job_name = "druid-reports" and status = "ENABLED".
    2. Reading the list from config makes it possible to re-generate reports only for certain requests.
      Example: If only report-1 & report-2 need to be re-run -
      Step 1: Update the data-product config with the above report ids.
      Step 2: Run the Spark job.
  • Output:
    Uploads the report output data to azure.
  • Algorithm (see the sketch after this list):
    1. Read the Druid query from azure and execute it for each report in the list.
    2. Save each query response JSON to azure as "report-data/report-id.json".
    3. Update the report data URL (complete path) in the report config JSON ("report-id.json") in azure.
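To make the algorithm concrete, here is a minimal, self-contained sketch of the generator loop: for each report it loads the saved query, posts it to the Druid broker's native query endpoint, and writes the response as "report-data/report-id.json". The broker URL, local file paths, and the report-list default are assumptions for illustration; the actual job runs as a Spark data product and reads from / writes to azure rather than the local file system.

Code Block
import java.io.OutputStreamWriter
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import scala.io.Source

// Sketch of the Report Data Generator loop. Broker URL, query/report paths and
// file-system storage are stand-ins for the real Druid broker and azure container.
object ReportDataGenerator {

  val brokerUrl = "http://localhost:8082/druid/v2/" // assumption: default broker port

  // POST a Druid query JSON to the broker and return the response body
  def executeDruidQuery(queryJson: String): String = {
    val conn = new URL(brokerUrl).openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setRequestProperty("Content-Type", "application/json")
    conn.setDoOutput(true)
    val writer = new OutputStreamWriter(conn.getOutputStream, StandardCharsets.UTF_8)
    try writer.write(queryJson) finally writer.close()
    val body = Source.fromInputStream(conn.getInputStream, "UTF-8").mkString
    conn.disconnect()
    body
  }

  def main(args: Array[String]): Unit = {
    // 1. reports list: from job config, else from platform_db.job_request (status ENABLED)
    val reportIds = if (args.nonEmpty) args.toSeq else Seq("usage")

    reportIds.foreach { reportId =>
      // 2. read the saved Druid query ("report-id.txt") and execute it
      val query = new String(Files.readAllBytes(Paths.get(s"druid/queries/$reportId.txt")),
        StandardCharsets.UTF_8)
      val response = executeDruidQuery(query)

      // 3. save the response as "report-data/report-id.json"; updating the data URL in the
      //    report config JSON and uploading both to azure is not shown in this sketch
      Files.createDirectories(Paths.get("report-data"))
      Files.write(Paths.get(s"report-data/$reportId.json"),
        response.getBytes(StandardCharsets.UTF_8))
    }
  }
}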