
Introduction:

This document describes the design for generating dashboard data from Druid and exporting it to cloud storage. It consists mainly of the following modules:

  1. Configure Report API - used to request a new report.
  2. Disable Report API - used to disable report generation.
  3. Report Data Generator - a Spark job that runs on a schedule, generates data from Druid, and exports it to cloud storage for each enabled request.

Configure Report API:

  • Input:
  1. Channel Id
  2. Query Time Interval
  3. Report Config Json
  4. Query Json

Request Object

{
    "id": "sunbird.analytics.report.submit",
    "ver": "1.0",
    "ts": "2019-03-07T12:40:40+05:30",
    "params": {
        "msgid": "4406df37-cd54-4d8a-ab8d-3939e0223580",
        "client_key": "analytics-team"
    },
    "request": {
        "channel_id": "",
        "interval": "Last 7 days",
        "report_config": {
            "id": "usage",
            "label": "Diksha Usage Report",
            "title": "Diksha Usage Report",
            "description": "The report provides a quick summary of the data analysed by the analytics team to track progess of Diksha across states. This report will be used to consolidate insights using various metrics on which Diksha is currently being mapped and will be shared on a weekly basis. The first section of the report will provide a snapshot of the overall health of the Diksha App. This will be followed by individual state sections that provide state-wise status of Diksha",
            "dataSource": "/usage/report.json",
            "charts": [
                {
                    "datasets": [
                        {
                            "dataExpr": "data.Number_of_downloads",
                            "label": "# of downloads"
                        }
                    ],
                    "labelsExpr": "data.Date",
                    "chartType": "line"
                },
                {
                    "datasets": [
                        {
                            "dataExpr": "data.Number_of_succesful_scans",
                            "label": "# of successful scans"
                        }
                    ],
                    "labelsExpr": "data.Date",
                    "chartType": "bar"
                }
            ],
            "table": {
                "columnsExpr": "key",
                "valuesExpr": "tableData"
            },
            "downloadUrl": "report_id1/report.csv"
        },
        "query": {

        }
    }
}
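
For illustration, below is a minimal sketch of how the Configure Report API could deserialize this request into a typed model. It assumes a JSON library such as json4s is available; the class and object names are invented here and only mirror the field names in the sample request above.

import org.json4s._
import org.json4s.jackson.JsonMethods.parse

// Hypothetical model mirroring the sample request body; the class names are assumptions.
case class ReportParams(msgid: String, client_key: String)
case class ReportRequestBody(channel_id: String,
                             interval: String,
                             report_config: JValue,
                             query: JValue)
case class ConfigureReportRequest(id: String,
                                  ver: String,
                                  ts: String,
                                  params: ReportParams,
                                  request: ReportRequestBody)

object RequestParser {
  implicit val formats: Formats = DefaultFormats

  // Deserialize the raw JSON body of a Configure Report API call.
  def parseRequest(body: String): ConfigureReportRequest =
    parse(body).extract[ConfigureReportRequest]
}
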
  • Output:
    1. Saves the report-id and job_name as "druid-reports" in the cassandra platform_db.job_request table with status "ENABLED"
    2. Saves "report-id.json" to azure
    3. Takes the druid query from the request if provided, or builds the druid query from the other inputs
    4. Saves the druid query. We have the following 2 approaches:
      Approach 1: Save individual report queries separately in azure

      druid/queries/report-id-1.txt
      druid/queries/report-id-2.txt
      

      Approach 2: Save all queries in one XML file like the one below (a sketch of run-time parameter substitution follows this section)

      <?xml version = "1.0" encoding = "utf-8"?>
        <reports>
           <report name="report1" id="report_1" enabled="true">
             <query>
               <![CDATA[
                 {
                   "queryType": "groupBy",
                   "dataSource": "telemetry-events",
                   "granularity": "day",
                   "dimensions": ["eid"],
                   "aggregations": [
                     { "type": "count", "name": "context_did", "fieldName": "context_did" }
                   ],
                   "intervals": [ "$report_interval" ]
                   "filter": {
                     "type": "and",
                     "fields": [
                       { "type": "selector", "dimension": "eid", "value": "IMPRESSION" },
                       { "type": "selector", "dimension": "edata_type", "value": "detail" },
                       { "type": "selector", "dimension": "edata_pageid", "value": "collection-detail" },
                       { "type": "selector", "dimension": "context_pdata_id", "value": "prod.diksha.app" }
                     ]
                   },
                   "postAggregations" : [{
                     "type"   : "arithmetic",
                     "name"   : "avg__edata_value",
                     "fn"     : "/",
                     "fields" : [
                       { "type" : "fieldAccess", "name" : "total_edata_value", "fieldName" : "total_edata_value" },
                       { "type" : "fieldAccess", "name" : "rows", "fieldName" : "rows" }
                     ]
                   }]
                 }
               ]]>
             </query>
             <query_params>
               <param name="datasource" value="telemetry-events" optional="false"/>
               <param name="granularity" value="day" optional="false"/>
               <param name="report_interval" value="LAST_7_DAYS" 
               <param name="start_date" value="2019-03-01" optional="true"/>
               <param name="start_date" value="2019-03-07" optional="true"/>
             </query_params>
           </report>
         </reports>
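
A minimal sketch, assuming Approach 2, of how the stored XML could be parsed and the query parameters substituted at run time. The LAST_7_DAYS convention and the substitution of "$<param name>" placeholders are assumptions based on the sample above, not a defined contract.

import java.time.LocalDate
import scala.xml.XML

object QueryBuilder {

  // Resolve a symbolic interval value (assumed convention) into a Druid ISO-8601 interval string.
  def resolveInterval(value: String, today: LocalDate = LocalDate.now()): String = value match {
    case "LAST_7_DAYS" => s"${today.minusDays(7)}/$today"
    case explicit      => explicit // e.g. an explicit interval such as 2019-03-01/2019-03-07
  }

  // Read the reports XML (Approach 2) and return report id -> fully substituted Druid query.
  def loadQueries(xmlString: String): Map[String, String] = {
    val root = XML.loadString(xmlString)
    (root \ "report")
      .filter(report => (report \ "@enabled").text == "true")
      .map { report =>
        val id    = (report \ "@id").text
        val query = (report \ "query").text
        val params = (report \ "query_params" \ "param").map { p =>
          (p \ "@name").text -> (p \ "@value").text
        }.toMap
        val substituted = params.foldLeft(query) {
          case (q, ("report_interval", value)) => q.replace("$report_interval", resolveInterval(value))
          case (q, (name, value))              => q.replace("$" + name, value)
        }
        id -> substituted
      }.toMap
  }
}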
      

Disable Report API:

  • Input:
    report-id
  • Output:
    Updates the report-id and job_name "druid-reports" entry in the cassandra platform_db.job_request table to status "DISABLED" (a rough sketch follows below)
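
As a rough sketch only (not the actual implementation), the status change could be a single update against the platform_db.job_request table. The driver usage is standard DataStax Java driver, but the column names (request_id, status) are assumptions about the table schema.

import com.datastax.driver.core.Cluster

object DisableReport {

  // Mark a report request as DISABLED; column names are assumed for illustration.
  def disable(reportId: String, cassandraHost: String): Unit = {
    val cluster = Cluster.builder().addContactPoint(cassandraHost).build()
    val session = cluster.connect()
    try {
      session.execute(
        "UPDATE platform_db.job_request SET status = 'DISABLED' " +
          "WHERE job_name = 'druid-reports' AND request_id = ?",
        reportId)
    } finally {
      session.close()
      cluster.close()
    }
  }
}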

Report Data Generator Data Product:

  • Input:
    1. Gets the report list from config; if it is null/empty, searches the cassandra platform_db.job_request table for job_name = "druid-reports" and status = "ENABLED"
    2. Reading from config makes it possible to re-generate reports only for selected requests.
      Example: If only report-1 & report-2 need to be re-run -
      Step 1: Update the data-product config with the above report ids.
      Step 2: Run the spark job.
  • Output:
    Uploads report output data to azure
  • Algorithm:
    1. Reads the druid query from azure and executes it for each report in the list
    2. Updates the complete report data URL path in the report config json in azure (a sketch of this flow follows below).
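
A high-level sketch of the generator's main loop corresponding to the algorithm above. The helper functions (fetchEnabledReports, loadDruidQuery, executeDruidQuery, uploadToAzure, updateReportConfig) are placeholders for the steps described in this document, not existing APIs.

object ReportDataGenerator {

  // Placeholder signatures for the steps above; bodies are intentionally left unimplemented.
  def fetchEnabledReports(configuredIds: Seq[String]): Seq[String] = ???          // config list, else cassandra lookup
  def loadDruidQuery(reportId: String): String = ???                              // read the stored query from azure
  def executeDruidQuery(query: String): Seq[Map[String, Any]] = ???               // POST the query to the Druid broker
  def uploadToAzure(reportId: String, rows: Seq[Map[String, Any]]): String = ???  // returns the uploaded data URL
  def updateReportConfig(reportId: String, dataUrl: String): Unit = ???           // rewrite report-id.json with the data URL

  def main(args: Array[String]): Unit = {
    val configuredIds = args.toSeq                        // report ids passed via config for selective re-runs
    val reportIds     = fetchEnabledReports(configuredIds)

    reportIds.foreach { reportId =>
      val query   = loadDruidQuery(reportId)
      val rows    = executeDruidQuery(query)
      val dataUrl = uploadToAzure(reportId, rows)
      updateReportConfig(reportId, dataUrl)
    }
  }
}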