...

This document describes the detailed design to generate the data for the portal dashboard reports from Druid and export the data to cloud storage. It mainly consists of 2 parts:

...

following modules (a sketch of the job-request record they share follows the list):

  1. Configure Report API - used to request a new report.
  2. Disable Report API - used to disable report generation.
  3. Report Data Generator - a Spark job that runs on a schedule, generates data from Druid, and exports it to cloud storage for each enabled request.
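The three modules coordinate through a shared record in the Cassandra platform_db.job_request table (written by the two APIs, read by the generator) plus the report artifacts saved to cloud storage. The sketch below is a minimal model of that record and its status lifecycle; the field names and defaults are assumptions inferred from this document, not the actual table definition.

Code Block
// Minimal sketch of the job-request record the three modules share.
// Field names and defaults are assumptions based on this document, not the actual schema.
object JobRequestModel {

  sealed trait ReportStatus
  case object Enabled  extends ReportStatus // set by the Configure Report API
  case object Disabled extends ReportStatus // set by the Disable Report API

  // One row in cassandra platform_db.job_request (hypothetical shape)
  case class ReportJobRequest(
    reportId: String,                  // e.g. "usage"
    jobName: String = "druid-reports",
    status: ReportStatus = Enabled,
    configPath: String = "",           // e.g. "report-id.json" in cloud storage
    queryPath: String = ""             // e.g. "report-id.txt" in cloud storage
  )

  def main(args: Array[String]): Unit = {
    val request = ReportJobRequest(reportId = "usage",
      configPath = "usage.json", queryPath = "usage.txt")
    println(request)
  }
}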

...

Configure Report API:

...

  • Input:
  1. Channel Id
  2. Query Time Interval

...

  1. Report Config Json
  2. Query Json

Request Object

Code Block
{
    "id": "sunbird.analytics.report.submit",
    "ver": "1.0",
    "ts": "2019-03-07T12:40:40+05:30",
    "params": {
        "msgid": "4406df37-cd54-4d8a-ab8d-3939e0223580",
        "client_key": "analytics-team"
    },
    "request": {
        "channel_id": "",
        "interval": "Last 7 days",
        "report_config": {
            "id": "usage",
            "label": "Diksha Usage Report",
            "title": "Diksha Usage Report",
            "description": "The report provides a quick summary of the data analysed by the analytics team to track progess of Diksha across states. This report will be used to consolidate insights using various metrics on which Diksha is currently being mapped and will be shared on a weekly basis. The first section of the report will provide a snapshot of the overall health of the Diksha App. This will be followed by individual state sections that provide state-wise status of Diksha",
            "dataSource": "/usage/report.json",
            "charts": [
                {
                    "datasets": [
                        {
                            "dataExpr": "data.Number_of_downloads",
                            "label": "# of downloads"
                        }
                    ],
                    "labelsExpr": "data.Date",
                    "chartType": "line"
                },
                {
                    "datasets": [
                        {
                            "dataExpr": "data.Number_of_succesful_scans",
                            "label": "# of successful scans"
                        }
                    ],
                    "labelsExpr": "data.Date",
                    "chartType": "bar"
                }
            ],
            "table": {
                "columnsExpr": "key",
                "valuesExpr": "tableData"
            },
            "downloadUrl": "report_id1/report.csv"
        },
        "query": {

        }
    }
}
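The report_config above is what drives rendering on the portal: each chart declares a labelsExpr and one or more dataExpr entries that are evaluated against the data file referenced by dataSource, and downloadUrl points at the exported CSV. The snippet below illustrates one plausible reading of that convention ("data.<column>" selecting a column from the report rows); the expression semantics and the sample rows are assumptions for illustration only, not the portal's actual implementation.

Code Block
// Illustration of how a chart's labelsExpr/dataExpr could select columns from the
// report data rows. The "data.<column>" convention here is an assumption about the
// evaluation rules, not the portal's actual implementation.
object ChartConfigExample {

  // a few rows standing in for the content of "/usage/report.json" (sample values)
  val reportRows: Seq[Map[String, String]] = Seq(
    Map("Date" -> "2019-03-01", "Number_of_downloads" -> "1200"),
    Map("Date" -> "2019-03-02", "Number_of_downloads" -> "1450")
  )

  // resolve an expression like "data.Number_of_downloads" to a column of values
  def resolve(expr: String, rows: Seq[Map[String, String]]): Seq[String] = {
    val column = expr.stripPrefix("data.")
    rows.map(_.getOrElse(column, ""))
  }

  def main(args: Array[String]): Unit = {
    val labels = resolve("data.Date", reportRows)
    val series = resolve("data.Number_of_downloads", reportRows)
    println(labels.zip(series)) // ((2019-03-01,1200), (2019-03-02,1450))
  }
}
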
  • Output:
    1. Saves the report-id and job_name as "druid-reports" in the Cassandra platform_db.job_request table with status "ENABLED".
    2. Saves "report-id.json" (the report config) to azure.
    3. Takes the Druid query from the request if one is provided, otherwise builds the Druid query from the inputs, and saves it to azure as "report-id.txt" (see the interval sketch below).

    Save Druid query: we have the following 2 approaches:

      Approach 1: Save individual report queries separately in azure

      Code Block
      druid/queries/report-id-1.txt
      druid/queries/report-id-2.txt
      

      Approach 2: Save all queries in one XML file like this (a parsing sketch follows below):

      Code Block
      <?xml version = "1.0" encoding = "utf-8"?>
        <reports>
           <report name="report1" id="report_1" enabled="true">
             <query>
               <![CDATA[
                 {
                   "queryType": "groupBy",
                   "dataSource": "telemetry-events",
                   "granularity": "day",
                   "dimensions": ["eid"],
                   "aggregations": [
                     { "type": "count", "name": "context_did", "fieldName": "context_did" }
                   ],
                   "intervals": [ "$report_interval" ]
                   "filter": {
                     "type": "and",
                     "fields": [
                       { "type": "selector", "dimension": "eid", "value": "IMPRESSION" },
                       { "type": "selector", "dimension": "edata_type", "value": "detail" },
                       { "type": "selector", "dimension": "edata_pageid", "value": "collection-detail" },
                       { "type": "selector", "dimension": "context_pdata_id", "value": "prod.diksha.app" }
                     ]
                   },
                   "postAggregations" : [{
                     "type"   : "arithmetic",
                     "name"   : "avg__edata_value",
                     "fn"     : "/",
                     "fields" : [
                       { "type" : "fieldAccess", "name" : "total_edata_value", "fieldName" : "total_edata_value" },
                       { "type" : "fieldAccess", "name" : "rows", "fieldName" : "rows" }
                     ]
                   }]
                 }
               ]]>
             </query>
             <query_params>
               <param name="datasource" value="telemetry-events" optional="false"/>
               <param name="granularity" value="day" optional="false"/>
               <param name="report_interval" value="LAST_7_DAYS" 
               <param name="start_date" value="2019-03-01" optional="true"/>
               <param name="start_date" value="2019-03-07" optional="true"/>
             </query_params>
           </report>
         </reports>
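
If Approach 2 is chosen, the generator has to load this XML, keep only the enabled reports, and substitute the query_params into the CDATA query before executing it. Below is a minimal parsing sketch under those assumptions (it uses the scala-xml module); it is illustrative only, not the actual data-product code.

Code Block
import scala.xml.XML

// Sketch: parse the reports XML (Approach 2), keep only enabled reports,
// and substitute each "$param" placeholder in the query with its configured value.
// Assumes the XML structure shown above; not the actual data-product code.
object ReportQueryLoader {

  case class ReportQuery(id: String, name: String, query: String)

  def load(xmlString: String): Seq[ReportQuery] = {
    val root = XML.loadString(xmlString)
    for {
      report <- root \ "report"
      if (report \@ "enabled") == "true"
    } yield {
      val rawQuery = (report \ "query").text
      val params = (report \ "query_params" \ "param").map { p =>
        (p \@ "name") -> (p \@ "value")
      }
      // replace "$name" placeholders with their configured values
      val resolved = params.foldLeft(rawQuery) { case (q, (name, value)) =>
        q.replace("$" + name, value)
      }
      ReportQuery(report \@ "id", report \@ "name", resolved.trim)
    }
  }

  def main(args: Array[String]): Unit = {
    val xml = """<reports>
                |  <report name="report1" id="report_1" enabled="true">
                |    <query><![CDATA[{ "intervals": [ "$report_interval" ] }]]></query>
                |    <query_params>
                |      <param name="report_interval" value="2019-02-28/2019-03-07"/>
                |    </query_params>
                |  </report>
                |</reports>""".stripMargin
    load(xml).foreach(r => println(s"${r.id}: ${r.query}"))
  }
}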
      


Disable Report API:

  • Input:
    report-id
  • Output:
    Saves the report-id and job_name as "druid-reports" in the Cassandra platform_db.job_request table with status "DISABLED" (see the sketch below).
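A minimal sketch of the single side effect of this API is shown below, using the DataStax Java driver (3.x). The Cassandra host and the job_request primary-key columns used in the WHERE clause are assumptions for illustration; the real table definition may differ.

Code Block
import com.datastax.driver.core.Cluster

// Sketch of the Disable Report API's side effect: flip the job_request row for this
// report to DISABLED so the generator skips it on the next run.
// Host, keyspace and primary-key columns are assumptions for illustration.
object DisableReport {

  def disable(reportId: String, cassandraHost: String = "127.0.0.1"): Unit = {
    val cluster = Cluster.builder().addContactPoint(cassandraHost).build()
    val session = cluster.connect()
    try {
      session.execute(
        "UPDATE platform_db.job_request SET status = 'DISABLED' " +
          "WHERE request_id = ? AND job_name = 'druid-reports'",
        reportId)
    } finally {
      session.close()
      cluster.close()
    }
  }

  def main(args: Array[String]): Unit = disable("usage")
}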

Report Data Generator Data Product:

  • Input:
    1. Get the reports list from config; if it is null/empty, search the Cassandra platform_db.job_request table for job_name = "druid-reports" and status = "ENABLED".
    2. Reading the list from config makes it possible to re-generate reports only for certain requests.
      Example: If only report-1 & report-2 need to be re-run -
      Step 1: Update the data-product config with the above report ids.
      Step 2: Run the Spark job.
  • Output:
    Uploads the report output data to azure.
  • Algorithm (see the sketch after this list):
    1. Read the Druid query from azure and execute it for each report in the list.
    2. Save each query response JSON to azure as "report-data/report-id.json".
    3. Update the report data URL (complete path) in the report config JSON ("report-id.json") in azure.
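To make the algorithm concrete, here is a minimal, self-contained sketch of the generator loop: for each report it loads the saved query, posts it to the Druid broker's native query endpoint, and writes the response as "report-data/report-id.json". The broker URL, local file paths, and the report-list default are assumptions for illustration; the actual job runs as a Spark data product and reads from / writes to azure rather than the local file system.

Code Block
import java.io.OutputStreamWriter
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import scala.io.Source

// Sketch of the Report Data Generator loop. Broker URL, query/report paths and
// file-system storage are stand-ins for the real Druid broker and azure container.
object ReportDataGenerator {

  val brokerUrl = "http://localhost:8082/druid/v2/" // assumption: default broker port

  // POST a Druid query JSON to the broker and return the response body
  def executeDruidQuery(queryJson: String): String = {
    val conn = new URL(brokerUrl).openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setRequestProperty("Content-Type", "application/json")
    conn.setDoOutput(true)
    val writer = new OutputStreamWriter(conn.getOutputStream, StandardCharsets.UTF_8)
    try writer.write(queryJson) finally writer.close()
    val body = Source.fromInputStream(conn.getInputStream, "UTF-8").mkString
    conn.disconnect()
    body
  }

  def main(args: Array[String]): Unit = {
    // 1. reports list: from job config, else from platform_db.job_request (status ENABLED)
    val reportIds = if (args.nonEmpty) args.toSeq else Seq("usage")

    reportIds.foreach { reportId =>
      // 2. read the saved Druid query ("report-id.txt") and execute it
      val query = new String(Files.readAllBytes(Paths.get(s"druid/queries/$reportId.txt")),
        StandardCharsets.UTF_8)
      val response = executeDruidQuery(query)

      // 3. save the response as "report-data/report-id.json"; updating the data URL in the
      //    report config JSON and uploading both to azure is not shown in this sketch
      Files.createDirectories(Paths.get("report-data"))
      Files.write(Paths.get(s"report-data/$reportId.json"),
        response.getBytes(StandardCharsets.UTF_8))
    }
  }
}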