...
Configure Report API:
Input parameters
Parameter | Mandatory | Description | Comments |
---|---|---|---|
report_name | Yes | Name of the report | |
query_engine | Yes | Data source | DRUID, CASSANDRA, ELASTICSEARCH |
execution_frequency | Yes | Report generation frequency | DAILY, WEEKLY, MONTHLY |
channel_id | No | ChannelId for filtering | |
report_interval | Yes | Date range for queries | |
query | Yes | Query to be executed | |
output_format | Yes | Output format of the report | json, csv |
output_file_pattern | No | Report output filename pattern | report_id and end_date from the interval are used by default: {report_id}-{end_date}.{output_format}. Other supported placeholders are: ... |
output_field_names | Yes | Output field names used in the report output | |
group_by_fields | No | Fields by which the report output is grouped | channel_id, device_id |
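To illustrate how the default output_file_pattern could be resolved, here is a minimal Python sketch; the helper name is an assumption for illustration, not part of the API, and only the three placeholders named in the table are handled:

```python
# Sketch: resolving the default output_file_pattern.
# resolve_output_filename is a hypothetical helper; only the {report_id},
# {end_date} and {output_format} placeholders from the table are assumed.

def resolve_output_filename(pattern: str, report_id: str, end_date: str,
                            output_format: str) -> str:
    """Substitute {report_id}, {end_date} and {output_format} into the pattern."""
    return pattern.format(report_id=report_id,
                          end_date=end_date,
                          output_format=output_format)

# Default pattern from the table: {report_id}-{end_date}.{output_format}
filename = resolve_output_filename("{report_id}-{end_date}.{output_format}",
                                   report_id="avg_collection_downloads",
                                   end_date="2019-02-27",
                                   output_format="json")
# → "avg_collection_downloads-2019-02-27.json"
```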
Request Object
```json
{
  "id": "sunbird.analytics.report.submit",
  "ver": "1.0",
  "ts": "2019-03-07T12:40:40+05:30",
  "params": {
    "msgid": "4406df37-cd54-4d8a-ab8d-3939e0223580",
    "client_key": "analytics-team"
  },
  "request": {
    "channel_id": "in.ekstep",
    "report_name": "avg_collection_downloads",
    "query_engine": "druid",
    "execution_frequency": "DAILY",
    "report_interval": "LAST_7_DAYS",
    "output_format": "json",
    "output_field_names": ["Average Collection Downloads"],
    "query_json": {
      "queryType": "groupBy",
      "dataSource": "telemetry-events",
      "granularity": "day",
      "dimensions": ["eid"],
      "aggregations": [
        { "type": "count", "name": "context_did", "fieldName": "context_did" }
      ],
      "filter": {
        "type": "and",
        "fields": [
          { "type": "selector", "dimension": "eid", "value": "IMPRESSION" },
          { "type": "selector", "dimension": "edata_type", "value": "detail" },
          { "type": "selector", "dimension": "edata_pageid", "value": "collection-detail" },
          { "type": "selector", "dimension": "context_pdata_id", "value": "prod.diksha.app" }
        ]
      },
      "postAggregations": [
        {
          "type": "arithmetic",
          "name": "avg__edata_value",
          "fn": "/",
          "fields": [
            { "type": "fieldAccess", "name": "total_edata_value", "fieldName": "total_edata_value" },
            { "type": "fieldAccess", "name": "rows", "fieldName": "rows" }
          ]
        }
      ],
      "intervals": ["2019-02-20T00:00:00.000/2019-02-27T23:59:59.000"]
    }
  }
}
```
- Output:
...
```sql
-- Schema of the report configuration table
CREATE TABLE platform_db.druid_reports_configuration (
    report_id text,                       // hash of report_name and report_interval
    report_name text,
    report_interval text,
    query_engine text,
    report_config text,                   // entire JSON from the request
    cron_expression text,
    status text,
    report_query_location text,
    report_output_format text,
    report_output_location text,
    report_output_filename text,
    report_output_file_pattern list<text>,
    report_last_generated timestamp,
    PRIMARY KEY (report_id)
);
```
Location and file format of the query in Azure:

```text
/druid-reports/query/druid/report-id.json
/druid-reports/query/cassandra/report-id.cql
```
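The path layout above can be sketched as a small helper; the function name is a hypothetical illustration (only the path layout and extensions come from this document, and only the two engines with a documented path are covered):

```python
# Sketch: deriving the Azure path for a report's stored query.
# query_file_path is a hypothetical helper, not part of the platform code.

def query_file_path(query_engine: str, report_id: str) -> str:
    """Build the blob path /druid-reports/query/<engine>/<report-id>.<ext>.

    The extension follows the query engine: Druid queries are stored as
    JSON, Cassandra queries as CQL.
    """
    ext = {"druid": "json", "cassandra": "cql"}[query_engine.lower()]
    return f"/druid-reports/query/{query_engine.lower()}/{report_id}.{ext}"

# e.g. query_file_path("druid", "avg_collection_downloads")
# → "/druid-reports/query/druid/avg_collection_downloads.json"
```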
Job Scheduler Engine:
- Input:
  - A list of reports from the reports_configuration Cassandra table whose cron_expression falls within the current day of execution and whose status is ENABLED.
- Algorithm:
  - The data availability check has the following two criteria:
    1. Kafka indexing lag: check for zero lag in Druid ingestion.
    2. Druid segments count: segments should have been created for the previous day.
  - Reports based on telemetry-events are submitted for execution once both criteria are satisfied.
  - Reports based on summary-events are submitted for execution once the 2nd criterion alone is satisfied, or once the files are available in Azure.
  - If report_last_generated is not equal to the previous day for a report, the same report is submitted for all the pending dates.
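The submission decision above can be sketched as follows; the function and parameter names are illustrative assumptions, not the actual scheduler code:

```python
# Sketch of the data availability check, assuming the two criteria above.
# should_submit and its parameters are hypothetical names for illustration.

def should_submit(event_source: str, kafka_lag: int,
                  previous_day_segments: int, files_in_azure: bool) -> bool:
    """Decide whether a report can be submitted for execution.

    telemetry-events reports need both criteria:
      1. zero Kafka indexing lag in Druid ingestion, and
      2. Druid segments created for the previous day.
    summary-events reports need only criterion 2, or files present in Azure.
    """
    no_lag = kafka_lag == 0
    segments_ok = previous_day_segments > 0
    if event_source == "telemetry-events":
        return no_lag and segments_ok
    if event_source == "summary-events":
        return segments_ok or files_in_azure
    return False

# A telemetry-events report with pending ingestion lag must wait:
# should_submit("telemetry-events", kafka_lag=5,
#               previous_day_segments=3, files_in_azure=False) → False
```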
- Output:
  - The list of reports is submitted for execution into the platform_db.job_request Cassandra table with status=SUBMITTED and job_name=druid-reports-<report-id>.
...
- Output:
  - The report will be marked as DISABLED in the platform_db.druid_reports_configuration Cassandra table.
...
  - The report data file will be saved in Azure in the specified format; the platform_db.job_request table will be updated with the job status, and the output file details will be updated in platform_db.druid_reports_configuration.
Output location and file format in Azure:
Once a request has been submitted and processing is complete, the report data file, with the filename being the report id suffixed with the report_interval end-date, is saved under:

```text
/druid-reports/report_id/report_id-yyyy-mm-dd.csv
/druid-reports/report_id/report_id-yyyy-mm-dd.json
```
Regenerate Report API:
...
- replay-interval
- Algorithm:
  - Loops through the replay interval, finds the report_output_filename for each date, and resubmits it to the job_request table for execution.
  - Backs up and deletes the older report data files from Azure.
  - report-list can be empty when all enabled reports are being regenerated.
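A minimal sketch of the replay loop described above, assuming a daily report interval; the function name and record layout are assumptions (the Azure backup/delete and the actual Cassandra insert are left out), with status and job_name mirroring the scheduler output:

```python
from datetime import date, timedelta

# Sketch of the regenerate loop: one job_request record per date in the
# replay interval. regenerate_reports is a hypothetical helper name.

def regenerate_reports(report_id: str, replay_start: date, replay_end: date):
    """Yield one job_request record per date in the replay interval.

    Record layout mirrors the scheduler output described earlier:
    status=SUBMITTED and job_name=druid-reports-<report-id>.
    """
    day = replay_start
    while day <= replay_end:
        yield {
            "job_name": f"druid-reports-{report_id}",
            "status": "SUBMITTED",
            "report_date": day.isoformat(),
        }
        day += timedelta(days=1)

jobs = list(regenerate_reports("avg_collection_downloads",
                               date(2019, 2, 20), date(2019, 2, 22)))
# 3 records, one per day in the replay interval
```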
- Output:
  - The list of reports is submitted for execution into the platform_db.job_request Cassandra table with status=SUBMITTED and job_name=druid-reports-<report-id>.
...