...

Parameter | Mandatory | Description | Comments
report_name | Yes | Name of the report |
query_engine | Yes | Data source | DRUID, CASSANDRA, ELASTICSEARCH
execution_frequency | Yes | Report generation frequency | DAILY, WEEKLY, MONTHLY
channel_id | No | Channel ID for filtering |
report_interval | Yes | Date range for queries | YESTERDAY, LAST_7_DAYS, LAST_WEEK, LAST_30_DAYS, LAST_MONTH, LAST_QUARTER, LAST_3_MONTHS, LAST_6_MONTHS, LAST_YEAR
query | Yes | Query to be executed |
output_format | Yes | Output format of the report | json, csv
output_file_pattern | No | Report output filename pattern | report_id and end_date from the interval are used by default: {report_id}-{end_date}.{output_format}. Other supported placeholders are report_name and timestamp.
output_field_names | Yes | Output field names used in report output |
group_by_fields | No | Fields by which reports are grouped | channel_id, device_id
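The parameter table above can be read as a set of validation rules. The following is a minimal sketch of such a check, not the service's actual implementation. The function name `validate_report_request`, the constant names, and the error message wording are hypothetical. Comparing `query_engine` case-insensitively is also an assumption.

```python
from typing import List

# Allowed values copied from the parameter table.
QUERY_ENGINES = {"DRUID", "CASSANDRA", "ELASTICSEARCH"}
ALLOWED = {
    "execution_frequency": {"DAILY", "WEEKLY", "MONTHLY"},
    "report_interval": {
        "YESTERDAY", "LAST_7_DAYS", "LAST_WEEK", "LAST_30_DAYS", "LAST_MONTH",
        "LAST_QUARTER", "LAST_3_MONTHS", "LAST_6_MONTHS", "LAST_YEAR",
    },
    "output_format": {"json", "csv"},
}
# Parameters marked "Yes" in the Mandatory column.
MANDATORY = (
    "report_name", "query_engine", "execution_frequency",
    "report_interval", "query", "output_format", "output_field_names",
)


def validate_report_request(request: dict) -> List[str]:
    """Return a list of validation errors; an empty list means the request passed."""
    errors = []
    for name in MANDATORY:
        if request.get(name) in (None, "", [], {}):
            errors.append(f"missing mandatory parameter: {name}")

    # Assumption: the engine name is matched case-insensitively.
    engine = request.get("query_engine")
    if isinstance(engine, str) and engine and engine.upper() not in QUERY_ENGINES:
        errors.append(f"unsupported query_engine: {engine}")

    for name, allowed in ALLOWED.items():
        value = request.get(name)
        if isinstance(value, str) and value and value not in allowed:
            errors.append(f"unsupported {name}: {value}")
    return errors
```

Returning all errors at once, rather than failing on the first one, lets a caller fix a malformed request in a single round trip.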


  • Request Object

     {
      "id":"sunbird.analytics.report.submit",
      "ver":"1.0",
      "ts":"2019-03-07T12:40:40+05:30",
      "params":{
         "msgid":"4406df37-cd54-4d8a-ab8d-3939e0223580",
         "client_key":"analytics-team"
      },
      "request":{
         "channel_id":"in.ekstep",
         "report_name":"avg_collection_downloads",
         "query_engine": "druid",
         "execution_frequency": "DAILY",
         "report_interval":"LAST_7_DAYS",
         "output_format": "json",
         "output_field_names": ["Average Collection Downloads"],
         "query_json":{
            "queryType":"groupBy",
            "dataSource":"telemetry-events",
            "granularity":"day",
            "dimensions":[
               "eid"
            ],
            "aggregations":[
               { "type":"count", "name":"context_did", "fieldName":"context_did" }
            ],
            "filter":{
               "type":"and",
               "fields":[
                  { "type":"selector", "dimension":"eid", "value":"IMPRESSION" },
                  { "type":"selector", "dimension":"edata_type", "value":"detail" },
                  { "type":"selector", "dimension":"edata_pageid", "value":"collection-detail" },
                  { "type":"selector", "dimension":"context_pdata_id", "value":"prod.diksha.app" }
               ]
            },
            "postAggregations":[
               {
                  "type":"arithmetic",
                  "name":"avg__edata_value",
                  "fn":"/",
                  "fields":[
                     { "type":"fieldAccess", "name":"total_edata_value", "fieldName":"total_edata_value" },
                     { "type":"fieldAccess", "name":"rows", "fieldName":"rows" }
                  ]
               }
            ],
            "intervals":[
               "2019-02-20T00:00:00.000/2019-01-27T23:59:59.000"
            ]
         }
      }
     }
     
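The request carries both a symbolic `report_interval` and a concrete Druid `intervals` entry. The sketch below shows one way a day-based symbolic interval could be turned into an interval string, under stated assumptions: the function name `resolve_interval` is hypothetical, the range is assumed to end on the day before execution, and the `00:00:00.000` / `23:59:59.000` boundaries simply mirror the sample request. Calendar-based values such as LAST_WEEK or LAST_MONTH are left out because the page does not define them.

```python
from datetime import date, timedelta

# Only the purely day-based intervals are covered in this sketch.
DAY_COUNTS = {"YESTERDAY": 1, "LAST_7_DAYS": 7, "LAST_30_DAYS": 30}


def resolve_interval(report_interval: str, today: date) -> str:
    """Build a 'start/end' interval string ending on the day before `today`."""
    if report_interval not in DAY_COUNTS:
        raise ValueError(f"interval not covered by this sketch: {report_interval}")
    end_day = today - timedelta(days=1)                        # assumed: ends yesterday
    start_day = today - timedelta(days=DAY_COUNTS[report_interval])
    return f"{start_day.isoformat()}T00:00:00.000/{end_day.isoformat()}T23:59:59.000"


# A job running on 2019-03-07 with LAST_7_DAYS covers 2019-02-28 through 2019-03-06.
print(resolve_interval("LAST_7_DAYS", date(2019, 3, 7)))
```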


  • Output:

...

   -- Schema of table
   CREATE TABLE platform_db.druid_reports_configuration (
     report_id text, // hash of report_name and report_interval
     report_config text, // Entire JSON from request
     cron_expression text,
     status text,
     report_query_location text,
     report_output_format text,
     report_output_location text,
     report_output_filename text,
     report_output_file_pattern list<text>,
     report_last_generated timestamp,
     PRIMARY KEY (report_id)
   );
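The schema describes `report_id` as a hash of `report_name` and `report_interval`. The page does not say which hash function or input encoding is used, so the following is only an illustration: SHA-256, the `:` separator, and the function name `make_report_id` are all assumptions.

```python
import hashlib


def make_report_id(report_name: str, report_interval: str) -> str:
    """Derive a deterministic identifier from the report name and interval."""
    # Assumption: the two fields are joined with ':' and hashed with SHA-256.
    key = f"{report_name}:{report_interval}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()


report_id = make_report_id("avg_collection_downloads", "LAST_7_DAYS")
```

Because the identifier depends only on these two fields, resubmitting the same report name with the same interval maps to the same row, while changing the interval yields a separate report.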

Location and file format of the query in Azure:

...


...


...


Job Scheduler Engine:





  • Input:
         - A list of reports from the platform_db.druid_reports_configuration Cassandra table whose cron_expression falls within the current day of execution and whose status is ENABLED.
  • Algorithm:              

...

- The report will be DISABLED in the platform_db.druid_reports_configuration Cassandra table

...

- The report data file will be saved in Azure in the specified format.
- The platform_db.job_request table will be updated with the job status, and the output file details will be updated in platform_db.druid_reports_configuration.

  • Output location and file format in Azure:

Once a request has been submitted and processing is complete, the report data file, named with the report ID suffixed with the report_interval end date, is saved under:

   /druid-reports/report_id-yyyy-mm-dd.csv
   /druid-reports/report_id-yyyy-mm-dd.json
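The output filename follows the `output_file_pattern` parameter, which defaults to `{report_id}-{end_date}.{output_format}` and also supports `report_name` and `timestamp`. A minimal sketch of expanding that pattern is shown below. The function name `build_output_path`, the `yyyy-mm-dd` rendering of `end_date`, and passing `timestamp` as a caller-supplied string are assumptions made for illustration.

```python
from datetime import date

DEFAULT_PATTERN = "{report_id}-{end_date}.{output_format}"
BASE_DIR = "/druid-reports"


def build_output_path(report_id: str, end_date: date, output_format: str,
                      pattern: str = DEFAULT_PATTERN,
                      report_name: str = "", timestamp: str = "") -> str:
    """Expand the filename pattern and place the file under the reports folder."""
    filename = pattern.format(
        report_id=report_id,
        end_date=end_date.isoformat(),   # assumed date format: yyyy-mm-dd
        output_format=output_format,
        report_name=report_name,
        timestamp=timestamp,
    )
    return f"{BASE_DIR}/{filename}"


default_path = build_output_path("abc123", date(2019, 3, 6), "csv")
custom_path = build_output_path(
    "abc123", date(2019, 3, 6), "json",
    pattern="{report_name}-{timestamp}.{output_format}",
    report_name="avg_collection_downloads", timestamp="1551940840",
)
```

Here `abc123` and `1551940840` are placeholder values, not identifiers taken from a real deployment.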

Regenerate Report API:

...