
Configure Report API:

Input parameters

Parameter | Mandatory | Description | Comments
--------- | --------- | ----------- | --------
report_name | Yes | Name of the report |
query_engine | Yes | Data source | DRUID, CASSANDRA, ELASTICSEARCH
execution_frequency | Yes | Report generation frequency | DAILY, WEEKLY, MONTHLY
channel_id | No | Channel id for filtering |
report_interval | Yes | Date range for queries | YESTERDAY, LAST_7_DAYS, LAST_WEEK, LAST_30_DAYS, LAST_MONTH, LAST_QUARTER, LAST_3_MONTHS, LAST_6_MONTHS, LAST_YEAR
query | Yes | Query to be executed |
output_format | Yes | Output format of the report | json, csv
output_file_pattern | No | Report output filename pattern | report_id and end_date from the interval are used by default: {report_id}-{end_date}.{output_format}. Other supported placeholders: report_name, date, timestamp
output_field_names | Yes | Output field names used in the report output |
group_by_fields | No | Fields by which report rows are grouped | channel_id, device_id
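The output_file_pattern placeholders above can be resolved with a small helper. This is an illustrative sketch, not the platform's actual code; the function name and the fixed "now" value are assumptions made for the example:

```python
from datetime import date, datetime

def resolve_file_pattern(pattern, report_id, report_name, output_format,
                         end_date, now=None):
    # Fill the supported output_file_pattern placeholders:
    # report_id, report_name, output_format, end_date, date, timestamp.
    now = now or datetime(2019, 3, 7, 12, 40, 40)  # fixed for reproducibility
    values = {
        "report_id": report_id,
        "report_name": report_name,
        "output_format": output_format,
        "end_date": end_date.isoformat(),
        "date": now.date().isoformat(),
        "timestamp": str(int(now.timestamp())),
    }
    out = pattern
    for key, val in values.items():
        out = out.replace("{" + key + "}", val)
    return out

# The documented default pattern: {report_id}-{end_date}.{output_format}
name = resolve_file_pattern("{report_id}-{end_date}.{output_format}",
                            "avg_collection_downloads",
                            "Average Collection Downloads",
                            "csv", date(2019, 3, 6))
# → avg_collection_downloads-2019-03-06.csv
```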


    • Request Object

      Code Block
       {
        "id":"sunbird.analytics.report.submit",
        "ver":"1.0",
        "ts":"2019-03-07T12:40:40+05:30",
        "params":{
           "msgid":"4406df37-cd54-4d8a-ab8d-3939e0223580",
           "client_key":"analytics-team"
        },
        "request":{
           "channel_id":"in.ekstep",
           "report_name":"avg_collection_downloads",
           "query_engine": "druid",
           "execution_frequency": "DAILY",
           "report_interval":"LAST_7_DAYS",
           "output_format": "json",
           "output_field_names": ["Average Collection Downloads"],
           "query_json":{
              "queryType":"groupBy",
              "dataSource":"telemetry-events",
              "granularity":"day",
              "dimensions":[
                 "eid"
              ],
              "aggregations":[
                 { "type":"count", "name":"context_did", "fieldName":"context_did" }
              ],
              "filter":{
                 "type":"and",
                 "fields":[
                    { "type":"selector", "dimension":"eid", "value":"IMPRESSION" },
                    { "type":"selector", "dimension":"edata_type", "value":"detail" },
                    { "type":"selector", "dimension":"edata_pageid", "value":"collection-detail" },
                    { "type":"selector", "dimension":"context_pdata_id", "value":"prod.diksha.app" }
                 ]
              },
              "postAggregations":[
                 {
                    "type":"arithmetic",
                    "name":"avg__edata_value",
                    "fn":"/",
                    "fields":[
                       { "type":"fieldAccess", "name":"total_edata_value", "fieldName":"total_edata_value" },
                       { "type":"fieldAccess", "name":"rows", "fieldName":"rows" }
                    ]
                 }
              ],
              "intervals":[
                 "2019-01-27T00:00:00.000/2019-02-20T23:59:59.000"
              ]
           }
        }
       }
       

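At execution time the report_interval keyword must be resolved to a concrete Druid interval string ending on the previous day. One way to compute the day-count intervals (an illustrative sketch, not the platform's code; calendar-boundary intervals such as LAST_WEEK, LAST_MONTH and LAST_QUARTER are omitted for brevity):

```python
from datetime import date, timedelta

def resolve_interval(name, today):
    # Map a day-count report_interval keyword to a (start, end) date pair;
    # the range always ends on the day before execution.
    yesterday = today - timedelta(days=1)
    days = {"YESTERDAY": 1, "LAST_7_DAYS": 7, "LAST_30_DAYS": 30,
            "LAST_3_MONTHS": 90, "LAST_6_MONTHS": 180, "LAST_YEAR": 365}
    if name not in days:
        raise ValueError("unsupported interval: " + name)
    start = today - timedelta(days=days[name])
    return start, yesterday

start, end = resolve_interval("LAST_7_DAYS", date(2019, 3, 7))
# Format as a Druid "intervals" entry.
druid_interval = "%sT00:00:00.000/%sT23:59:59.000" % (start, end)
# → 2019-02-28T00:00:00.000/2019-03-06T23:59:59.000
```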

    • Output:

    ...

    Code Block
       # Schema of table
       CREATE TABLE platform_db.druid_reports_configuration (
         report_id text, // hash of report_name and report_interval
         report_config text, // entire report JSON from the request
         cron_expression text,
         status text,
         report_query_location text,
         report_output_format text,
         report_output_location text,
         report_output_filename text,
         report_output_file_pattern list<text>,
         report_last_generated timestamp,
         PRIMARY KEY (report_id)
       );
    
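The schema comment above says report_id is a hash of report_name and report_interval. A plausible construction is sketched below; the exact hashing scheme (MD5 over the joined fields) is an assumption, not the confirmed implementation:

```python
import hashlib

def make_report_id(report_name, report_interval):
    # A deterministic id means resubmitting the same configuration
    # updates a single row instead of creating duplicates.
    key = "%s|%s" % (report_name, report_interval)
    return hashlib.md5(key.encode("utf-8")).hexdigest()

rid = make_report_id("avg_collection_downloads", "LAST_7_DAYS")
```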

    Location and file format of the query in Azure:

    Code Block
    /druid-reports/query/druid/report-id.json
    /druid-reports/query/cassandra/report-id.cql
    


    Job Scheduler Engine:





    • Input:
           -

    ...

    • A list of reports in the reports_configuration Cassandra table whose cron_expression falls within the current day of execution and whose status is ENABLED.
    • Algorithm:              

    ...

                    - The data availability check has the following two criteria:

                            1. Kafka indexing lag: check for zero lag in Druid ingestion.

                            2. Druid segments count: segments should have been created for the previous day.
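The two criteria above can be sketched as a single gate function. Here check_kafka_lag and count_segments stand in for calls to the Druid ingestion and segment-metadata APIs; their names and signatures are assumptions for illustration, not real client methods:

```python
from datetime import date, timedelta

def data_available(check_kafka_lag, count_segments, today):
    # Gate report submission on the two data-availability criteria.
    # 1. Kafka indexing lag: Druid ingestion must show zero lag.
    if check_kafka_lag() != 0:
        return False
    # 2. Druid segments count: segments must exist for the previous day.
    return count_segments(today - timedelta(days=1)) > 0

# Stubbed checks for illustration: zero lag, 24 hourly segments yesterday.
ok = data_available(lambda: 0, lambda d: 24, date(2019, 3, 7))
```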

    ...

                    - Reports based on telemetry-events will be submitted for execution once both criteria are satisfied.

    ...

                    - Reports based on summary-events will be submitted for execution upon satisfying only the second criterion, or after checking for files in Azure.

                    - If report_last_generated does not equal the previous day for a report, the same report is submitted for all pending dates.
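The catch-up step above amounts to enumerating the dates between report_last_generated and yesterday. A sketch (function name is illustrative):

```python
from datetime import date, timedelta

def pending_dates(report_last_generated, today):
    # Dates after the last generated run, up to and including yesterday.
    yesterday = today - timedelta(days=1)
    out = []
    d = report_last_generated + timedelta(days=1)
    while d <= yesterday:
        out.append(d)
        d += timedelta(days=1)
    return out

# Last run on 2019-03-03, scheduler runs on 2019-03-07:
dates = pending_dates(date(2019, 3, 3), date(2019, 3, 7))
# → pending dates 2019-03-04 through 2019-03-06
```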

    • Output:

    - The list of reports is submitted for execution into the platform_db.job_request Cassandra table with status=SUBMITTED and job_name=druid-reports-<report-id>.

    ...

    • Output:

    - The report status will be set to DISABLED in the platform_db.druid_reports_configuration Cassandra table

    ...

    - The report data file will be saved in Azure in the specified format.
    - The platform_db.job_request table will be updated with the job status, and output file details will be updated in platform_db.druid_reports_configuration.

    • Output location and file format in Azure:

    Once a request has been submitted and processing is complete, the report data file, named with the report_id suffixed with the report_interval end date, is saved under:

    Code Block
       /druid-reports/report_id/yyyy-mm-dd.csv
       /druid-reports/report_id/yyyy-mm-dd.json

    Regenerate Report API:

    ...

    - replay-interval

    • Algorithm:

    - Loops through the replay interval, finds the report_output_filename for each date, and resubmits a record to the job_request table for execution.

    - Backs up and deletes older report data files from Azure.

    - report-list can be empty when regenerating all enabled reports.
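Putting the regenerate steps together: for each report and each date in the replay interval, a job_request row is resubmitted. A sketch with an in-memory list standing in for the Cassandra table; the row shape is illustrative, only status and job_name come from the document:

```python
from datetime import date, timedelta

def regenerate(reports, replay_start, replay_end, job_request):
    # Resubmit each report for every date in the replay interval.
    d = replay_start
    while d <= replay_end:
        for report_id in reports:
            job_request.append({
                "job_name": "druid-reports-%s" % report_id,
                "status": "SUBMITTED",
                "date": d.isoformat(),
            })
        d += timedelta(days=1)
    return job_request

jobs = regenerate(["avg_collection_downloads"],
                  date(2019, 3, 1), date(2019, 3, 3), [])
# → one SUBMITTED job per report per day in the replay interval
```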

    • Output:

    - The list of reports is submitted for execution into the platform_db.job_request Cassandra table with status=SUBMITTED and job_name=druid-reports-<report-id>.

    ...