This page details the monitoring capabilities enabled for the sourcing flows. The goal is to let the system predict possible issues so that they can be addressed proactively.

User Actions to monitor

Any object type (collection, content, question set etc.) is opened for editing
Any object type (collection, content, question set etc.) is saved as draft
Any object type (collection, content, question set etc.) is sent for review
Any object type (collection, content, question set etc.) is published

Dashboards

Error Dashboard

At any given point in time: Number of objects in “FAILED” state
1. Total count, and count based on each primary content category
For any given time period: Number of error events from the front-end (raised whenever a user is shown an error for any of the user actions above):
1. Total count
2. Count based on each event type: Open for Edit, Save, Send for Review, Publish
3. Count based on each primary content category
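
The three counts listed for front-end error events can be sketched as a small aggregation. This is a minimal illustration, not the actual telemetry pipeline: the `ErrorEvent` record and its field names are assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorEvent:
    # Hypothetical record shape; the real telemetry schema may differ.
    object_id: str
    action: str            # "Open for Edit", "Save", "Send for Review" or "Publish"
    primary_category: str  # assumed field name for the primary content category


def summarize_error_events(events):
    """Return the total count plus counts per user action and per category."""
    return {
        "total": len(events),
        "by_action": dict(Counter(e.action for e in events)),
        "by_category": dict(Counter(e.primary_category for e in events)),
    }
```

The caller is expected to pass only the events that fall inside the time period being viewed.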

Search Index Sync Dashboard

For any given time period
1. Search Index lag
  1. Number of updates that happened
  2. % of updates that have synced with the search index
  3. Average time taken to complete the sync
  4. Number of updates that have errors in sync
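
The four search index lag figures above could be derived from per-update records roughly as follows. This is a sketch under assumptions: `UpdateRecord` and its fields are hypothetical, timestamps are taken to be epoch seconds, and an update counts as synced only if it has a sync time and no sync error.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UpdateRecord:
    # Hypothetical record shape, used only for this illustration.
    updated_at: float                  # when the object was updated (epoch seconds)
    synced_at: Optional[float] = None  # when the search index caught up, if it has
    sync_error: bool = False           # True if the sync attempt failed


def search_index_lag(updates):
    """Compute the four Search Index Sync Dashboard figures for one time period."""
    total = len(updates)
    synced = [u for u in updates if u.synced_at is not None and not u.sync_error]
    errors = sum(1 for u in updates if u.sync_error)
    pct_synced = 100.0 * len(synced) / total if total else 0.0
    avg_sync_seconds = (
        sum(u.synced_at - u.updated_at for u in synced) / len(synced)
        if synced
        else None  # no completed syncs, so no average to report
    )
    return {
        "updates": total,
        "pct_synced": pct_synced,
        "avg_sync_seconds": avg_sync_seconds,
        "sync_errors": errors,
    }
```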

Publish pipeline Lag Dashboard

For any given time period
1. Publish lag
  1. Number of objects sent for publish in that period
  2. % of objects sent for publish in that period that have completed processing
  3. Average time taken to complete the processing

Alerts to be triggered

Failed object alert

Trigger condition: At least one object is in “Failed” state at the given point in time
Frequency: Every 2 hours
Details in the alert: List of object ids of the objects in failed state
Action to be taken:
1. Investigate the failed objects and rectify them to unblock users
2. Identify root cause and possible actions to prevent it
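
A minimal sketch of the check behind this alert, assuming object states are available as a mapping from object id to state name. The function name and payload shape are illustrative only; running it every 2 hours is left to whatever job scheduler is in use. The comparison ignores case because this page writes the state both as “FAILED” and “Failed”.

```python
def failed_object_alert(object_states):
    """Return an alert payload listing failed object ids, or None if there are none.

    `object_states` maps object id -> state name (hypothetical input shape).
    """
    failed_ids = sorted(
        object_id
        for object_id, state in object_states.items()
        if state.lower() == "failed"
    )
    if not failed_ids:
        return None  # trigger condition not met: no object is in the failed state
    return {"alert": "Failed object alert", "object_ids": failed_ids}
```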

Error events alert

Trigger condition: At least 10% of the user events triggered during the given time period resulted in errors
Frequency: Every 2 hours, using data from the last 2 hours
Details in the alert: For each error event
1. Object id, User action that triggered the error, Error detail
Action to be taken:
1. Identify root cause and possible actions to prevent it
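
The 10% trigger condition can be sketched as below. This is an illustration under assumptions: error events are represented as dictionaries with hypothetical keys `object_id`, `action` and `error`, and the caller supplies the total number of user events seen in the same two-hour window. The comparison uses integer arithmetic so that a rate of exactly 10% triggers the alert without floating-point rounding issues.

```python
def error_events_alert(total_events, error_events, threshold_pct=10):
    """Return alert details if errors make up at least `threshold_pct` of events.

    `total_events` is the count of all user events in the window;
    `error_events` is a list of dicts describing the events that failed.
    """
    if total_events == 0:
        return None  # nothing happened in the window, so there is no error rate
    if len(error_events) * 100 < threshold_pct * total_events:
        return None  # error rate is below the threshold
    return {
        "alert": "Error events alert",
        "details": [
            {
                "object_id": e["object_id"],
                "action": e["action"],
                "error": e["error"],
            }
            for e in error_events
        ],
    }
```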

Publish lag alert

Trigger condition: At least 5% of the objects sent for review or for publish have not completed the processing within 4 hours
Frequency: Every 4 hours, using data from the last 4 hours
Details in the alert: The following details, for both review lag and publish lag
1. Total number of objects sent for processing
2. Number of objects completed processing
3. Number of objects not completed processing
4. Average time taken for processing
5. Object id of each object that has not completed processing within 4 hours, along with its current state
Action to be taken:
1. Investigate the unprocessed objects and process them to unblock users
2. Identify root cause and possible actions to prevent it
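
The alert details and the 5% trigger could be computed together, as in the following sketch. Several things here are assumptions rather than part of the specification: the `ProcessingRecord` shape, timestamps as epoch seconds, and the reading of "not completed within 4 hours" as "still pending and sent at least 4 hours ago". The same function would be called once for review records and once for publish records.

```python
from dataclasses import dataclass
from typing import Optional

SLA_SECONDS = 4 * 60 * 60  # 4-hour processing window from the trigger condition
THRESHOLD_PCT = 5          # alert when at least 5% of objects are overdue


@dataclass(frozen=True)
class ProcessingRecord:
    # Hypothetical record shape, used only for this illustration.
    object_id: str
    state: str                            # current state of the object
    sent_at: float                        # when it was sent for review or publish
    completed_at: Optional[float] = None  # None while processing is still pending


def publish_lag_report(records, now):
    """Build the alert details and decide whether the alert should fire."""
    total = len(records)
    completed = [r for r in records if r.completed_at is not None]
    pending = [r for r in records if r.completed_at is None]
    # Assumption: "overdue" means still pending at least 4 hours after being sent.
    overdue = [r for r in pending if now - r.sent_at >= SLA_SECONDS]
    avg_seconds = (
        sum(r.completed_at - r.sent_at for r in completed) / len(completed)
        if completed
        else None
    )
    return {
        "total": total,
        "completed": len(completed),
        "not_completed": len(pending),
        "avg_processing_seconds": avg_seconds,
        "overdue": [(r.object_id, r.state) for r in overdue],
        "triggered": total > 0 and len(overdue) * 100 >= THRESHOLD_PCT * total,
    }
```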
