
Configure Report API:

Input parameters

Parameter | Mandatory | Description | Comments
--------- | --------- | ----------- | --------
report_name | Yes | Name of the report |
query_engine | Yes | Data source | DRUID, CASSANDRA, ELASTICSEARCH
execution_frequency | Yes | Report generation frequency | DAILY, WEEKLY, MONTHLY
channel_id | No | Channel id for filtering |
report_interval | Yes | Date range for queries | YESTERDAY, LAST_7_DAYS, LAST_WEEK, LAST_30_DAYS, LAST_MONTH, LAST_QUARTER, LAST_3_MONTHS, LAST_6_MONTHS, LAST_YEAR
query | Yes | Query to be executed |
output_format | Yes | Output format of the report | json, csv
output_file_pattern | No | Report output filename pattern | report_id and end_date from the interval are used by default: {report_id}-{end_date}.{output_format}. Other supported placeholders: report_name, date, timestamp
output_field_names | Yes | Output field names used in the report output |
group_by_fields | No | Fields by which report rows are grouped | channel_id, device_id
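The output_file_pattern placeholders above can be resolved with a small helper. This is an illustrative sketch, not the platform's actual code; the function name and the fixed "now" value are assumptions made for the example:

```python
from datetime import date, datetime

def resolve_file_pattern(pattern, report_id, report_name, output_format,
                         end_date, now=None):
    # Fill the supported output_file_pattern placeholders:
    # report_id, report_name, output_format, end_date, date, timestamp.
    now = now or datetime(2019, 3, 7, 12, 40, 40)  # fixed for reproducibility
    values = {
        "report_id": report_id,
        "report_name": report_name,
        "output_format": output_format,
        "end_date": end_date.isoformat(),
        "date": now.date().isoformat(),
        "timestamp": str(int(now.timestamp())),
    }
    out = pattern
    for key, val in values.items():
        out = out.replace("{" + key + "}", val)
    return out

# The documented default pattern: {report_id}-{end_date}.{output_format}
name = resolve_file_pattern("{report_id}-{end_date}.{output_format}",
                            "avg_collection_downloads",
                            "Average Collection Downloads",
                            "csv", date(2019, 3, 6))
# → avg_collection_downloads-2019-03-06.csv
```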


    • Request Object

      Code Block
       {
        "id":"sunbird.analytics.report.submit",
        "ver":"1.0",
        "ts":"2019-03-07T12:40:40+05:30",
        "params":{
           "msgid":"4406df37-cd54-4d8a-ab8d-3939e0223580",
           "client_key":"analytics-team"
        },
        "request":{
           "channel_id":"in.ekstep",
           "report_name":"avg_collection_downloads",
           "query_engine": "druid",
           "execution_frequency": "DAILY",
           "report_interval":"LAST_7_DAYS",
           "output_format": "json",
           "output_field_names": ["Average Collection Downloads"],
           "query_json":{
              "queryType":"groupBy",
              "dataSource":"telemetry-events",
              "granularity":"day",
              "dimensions":[
                 "eid"
              ],
              "aggregations":[
                 { "type":"count", "name":"context_did", "fieldName":"context_did" }
              ],
              "filter":{
                 "type":"and",
                 "fields":[
                    { "type":"selector", "dimension":"eid", "value":"IMPRESSION" },
                    { "type":"selector", "dimension":"edata_type", "value":"detail" },
                    { "type":"selector", "dimension":"edata_pageid", "value":"collection-detail" },
                    { "type":"selector", "dimension":"context_pdata_id", "value":"prod.diksha.app" }
                 ]
              },
              "postAggregations":[
                 {
                    "type":"arithmetic",
                    "name":"avg__edata_value",
                    "fn":"/",
                    "fields":[
                       { "type":"fieldAccess", "name":"total_edata_value", "fieldName":"total_edata_value" },
                       { "type":"fieldAccess", "name":"rows", "fieldName":"rows" }
                    ]
                 }
              ],
              "intervals":[
                 "2019-01-27T00:00:00.000/2019-02-20T23:59:59.000"
              ]
           }
        }
       }
       

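At execution time the report_interval keyword must be resolved to a concrete Druid interval string ending on the previous day. One way to compute the day-count intervals (an illustrative sketch, not the platform's code; calendar-boundary intervals such as LAST_WEEK, LAST_MONTH and LAST_QUARTER are omitted for brevity):

```python
from datetime import date, timedelta

def resolve_interval(name, today):
    # Map a day-count report_interval keyword to a (start, end) date pair;
    # the range always ends on the day before execution.
    yesterday = today - timedelta(days=1)
    days = {"YESTERDAY": 1, "LAST_7_DAYS": 7, "LAST_30_DAYS": 30,
            "LAST_3_MONTHS": 90, "LAST_6_MONTHS": 180, "LAST_YEAR": 365}
    if name not in days:
        raise ValueError("unsupported interval: " + name)
    start = today - timedelta(days=days[name])
    return start, yesterday

start, end = resolve_interval("LAST_7_DAYS", date(2019, 3, 7))
# Format as a Druid "intervals" entry.
druid_interval = "%sT00:00:00.000/%sT23:59:59.000" % (start, end)
# → 2019-02-28T00:00:00.000/2019-03-06T23:59:59.000
```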

    • Output:

    ...

    Code Block
       # Schema of table
       CREATE TABLE platform_db.druid_reports_configuration (
         report_id text, // hash of report_name and report_interval
         report_config text, // entire report JSON from the request
         cron_expression text,
         status text,
         report_query_location text,
         report_output_format text,
         report_output_location text,
         report_output_filename text,
         report_output_file_pattern list<text>,
         report_last_generated timestamp,
         PRIMARY KEY (report_id)
       );
    
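The schema comment above says report_id is a hash of report_name and report_interval. A plausible construction is sketched below; the exact hashing scheme (MD5 over the joined fields) is an assumption, not the confirmed implementation:

```python
import hashlib

def make_report_id(report_name, report_interval):
    # A deterministic id means resubmitting the same configuration
    # updates a single row instead of creating duplicates.
    key = "%s|%s" % (report_name, report_interval)
    return hashlib.md5(key.encode("utf-8")).hexdigest()

rid = make_report_id("avg_collection_downloads", "LAST_7_DAYS")
```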

    Location and file format of the query in Azure:

    Code Block
    /druid-reports/query/druid/report-id.json
    /druid-reports/query/cassandra/report-id.cql
    


    Job Scheduler Engine:





    • Input:
           -

    ...

    • A list of reports in the reports_configuration Cassandra table whose cron_expression falls within the current day of execution and whose status is ENABLED.
    • Algorithm:              

    ...

                    - The data availability check has the following two criteria:

                            1. Kafka indexing lag: check for zero lag in Druid ingestion.

                            2. Druid segments count: segments should have been created for the previous day.
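The two criteria above can be sketched as a single gate function. Here check_kafka_lag and count_segments stand in for calls to the Druid ingestion and segment-metadata APIs; their names and signatures are assumptions for illustration, not real client methods:

```python
from datetime import date, timedelta

def data_available(check_kafka_lag, count_segments, today):
    # Gate report submission on the two data-availability criteria.
    # 1. Kafka indexing lag: Druid ingestion must show zero lag.
    if check_kafka_lag() != 0:
        return False
    # 2. Druid segments count: segments must exist for the previous day.
    return count_segments(today - timedelta(days=1)) > 0

# Stubbed checks for illustration: zero lag, 24 hourly segments yesterday.
ok = data_available(lambda: 0, lambda d: 24, date(2019, 3, 7))
```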

    ...

                    - Reports based on telemetry-events will be submitted for execution once both criteria are satisfied.

    ...

                    - Reports based on summary-events will be submitted for execution upon satisfying only the second criterion, or after checking for files in Azure.

                    - If report_last_generated does not equal the previous day for a report, the same report is submitted for all pending dates.
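The catch-up step above amounts to enumerating the dates between report_last_generated and yesterday. A sketch (function name is illustrative):

```python
from datetime import date, timedelta

def pending_dates(report_last_generated, today):
    # Dates after the last generated run, up to and including yesterday.
    yesterday = today - timedelta(days=1)
    out = []
    d = report_last_generated + timedelta(days=1)
    while d <= yesterday:
        out.append(d)
        d += timedelta(days=1)
    return out

# Last run on 2019-03-03, scheduler runs on 2019-03-07:
dates = pending_dates(date(2019, 3, 3), date(2019, 3, 7))
# → pending dates 2019-03-04 through 2019-03-06
```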

    • Output:

    - The list of reports is submitted for execution into the platform_db.job_request Cassandra table with status=SUBMITTED and job_name=druid-reports-<report-id>.

    ...

    • Output:

    - The report status will be set to DISABLED in the platform_db.druid_reports_configuration Cassandra table

    ...

    - The report data file will be saved in Azure in the specified format.
    - The platform_db.job_request table will be updated with the job status, and output file details will be updated in platform_db.druid_reports_configuration.

    • Output location and file format in Azure:

    Once a request has been submitted and processing is complete, the report data file, named with the report_id suffixed with the report_interval end date, is saved under:

    Code Block
       /druid-reports/report_id/yyyy-mm-dd.csv
       /druid-reports/report_id/yyyy-mm-dd.json

    Regenerate Report API:

    ...

    - replay-interval

    • Algorithm:

    - Loops through the replay interval, finds the report_output_filename for each date, and resubmits a record to the job_request table for execution.

    - Backs up and deletes older report data files from Azure.

    - report-list can be empty when regenerating all enabled reports.
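Putting the regenerate steps together: for each report and each date in the replay interval, a job_request row is resubmitted. A sketch with an in-memory list standing in for the Cassandra table; the row shape is illustrative, only status and job_name come from the document:

```python
from datetime import date, timedelta

def regenerate(reports, replay_start, replay_end, job_request):
    # Resubmit each report for every date in the replay interval.
    d = replay_start
    while d <= replay_end:
        for report_id in reports:
            job_request.append({
                "job_name": "druid-reports-%s" % report_id,
                "status": "SUBMITTED",
                "date": d.isoformat(),
            })
        d += timedelta(days=1)
    return job_request

jobs = regenerate(["avg_collection_downloads"],
                  date(2019, 3, 1), date(2019, 3, 3), [])
# → one SUBMITTED job per report per day in the replay interval
```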

    • Output:

    - The list of reports is submitted for execution into the platform_db.job_request Cassandra table with status=SUBMITTED and job_name=druid-reports-<report-id>.

    ...