Sunbird Monitoring

Monitoring System overview

High level architecture diagram

What resources are monitored? What metrics are scraped?

  • CPU usage of Pods and Nodes

  • Memory usage of Pods and Nodes

  • Disk usage of Pods and Nodes

  • Network usage of Pods and Nodes

  • Traffic and API metrics such as latency, requests per second, and request / response size

  • Kafka consumer lag metrics

  • Cassandra metrics such as heap usage, compactions, reads and writes

  • Process metrics on the nodes

  • Service endpoints and their health

For an exhaustive list of what is monitored, refer to the grafana dashboards.
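To make "scraped metrics" concrete, the following minimal sketch parses the plain-text format that prometheus exporters expose. The sample payload is hypothetical (it only imitates node exporter output), and the parser handles just the simple case: no timestamps and no commas or escaped quotes inside label values.

```python
from typing import Dict, List, Tuple

# Hypothetical sample of what an exporter's /metrics endpoint might return.
SAMPLE = """\
# HELP node_memory_MemAvailable_bytes Memory information field MemAvailable_bytes.
# TYPE node_memory_MemAvailable_bytes gauge
node_memory_MemAvailable_bytes 2147483648
node_filesystem_avail_bytes{mountpoint="/",fstype="ext4"} 1073741824
"""


def parse_metrics(text: str) -> List[Tuple[str, Dict[str, str], float]]:
    """Parse simple exposition lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated token on the line.
        series, _, value = line.rpartition(" ")
        name, _, label_part = series.partition("{")
        labels: Dict[str, str] = {}
        if label_part:
            for pair in label_part.rstrip("}").split(","):
                if not pair:
                    continue
                key, _, val = pair.partition("=")
                labels[key.strip()] = val.strip().strip('"')
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels, value)
```

Each parsed tuple corresponds to one time series sample that prometheus would store on a scrape.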

What alert rules are configured?

Many alert rules are configured, such as:

  • High cpu usage on nodes

  • High memory usage on nodes

  • High disk usage on nodes

  • Increasing API latencies

For an exhaustive list of alert rules, see the alertrules helm chart: https://github.com/project-sunbird/sunbird-devops/tree/master/kubernetes/helm_charts/monitoring/alertrules
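As an illustration of what a "high cpu usage on nodes" rule can look like, here is a minimal sketch that builds a PrometheusRule manifest (the resource type consumed by the prometheus operator) as a Python dictionary. The rule name, threshold, duration and severity label are assumptions made for this example and are not taken from the alertrules chart.

```python
import json


def high_cpu_rule(threshold_percent: int = 80, duration: str = "10m") -> dict:
    """Build an example PrometheusRule that fires on sustained node CPU usage."""
    # CPU usage = 100 minus the percentage of time spent idle,
    # averaged over all cores of each instance.
    expr = (
        "100 - (avg by (instance) "
        '(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) '
        f"> {threshold_percent}"
    )
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        # Hypothetical name; a real deployment may also need labels that
        # match the operator's rule selector.
        "metadata": {"name": "example-node-cpu"},
        "spec": {
            "groups": [
                {
                    "name": "example-node.rules",
                    "rules": [
                        {
                            "alert": "ExampleHighNodeCpu",
                            "expr": expr,
                            "for": duration,
                            "labels": {"severity": "warning"},
                            "annotations": {
                                "summary": "High CPU usage on {{ $labels.instance }}"
                            },
                        }
                    ],
                }
            ]
        },
    }


if __name__ == "__main__":
    # JSON manifests are accepted by kubectl, so this output could be applied as-is.
    print(json.dumps(high_cpu_rule(), indent=2))
```

The `for` field keeps the alert pending until the condition has held for the whole duration, which avoids firing on short spikes.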

What notifications are configured?

  • The alert rules described in the section above trigger notifications

  • The notifications are sent by email and to a slack channel
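The email and slack delivery described above is typically expressed as an alertmanager configuration with one receiver that has both a `slack_configs` and an `email_configs` entry. The sketch below assembles such a configuration in Python; the receiver name and all argument values are placeholders, and how the Sunbird charts template this configuration is not shown here.

```python
import json


def alertmanager_config(
    slack_channel: str,
    slack_webhook_url: str,
    email_to: str,
    smtp_host: str,
    smtp_from: str,
) -> dict:
    """Build an example alertmanager config with slack and email delivery."""
    receiver_name = "team-notifications"  # hypothetical receiver name
    return {
        "global": {
            # SMTP server as host:port, plus the sender address for alert mails.
            "smtp_smarthost": smtp_host,
            "smtp_from": smtp_from,
        },
        # Route every alert to the single receiver, grouped by alert name.
        "route": {"receiver": receiver_name, "group_by": ["alertname"]},
        "receivers": [
            {
                "name": receiver_name,
                "slack_configs": [
                    {
                        "channel": slack_channel,
                        "api_url": slack_webhook_url,
                        "send_resolved": True,
                    }
                ],
                "email_configs": [{"to": email_to, "send_resolved": True}],
            }
        ],
    }


if __name__ == "__main__":
    config = alertmanager_config(
        slack_channel="#example-alerts",
        slack_webhook_url="https://hooks.slack.com/services/EXAMPLE",
        email_to="oncall@example.com",
        smtp_host="smtp.example.com:587",
        smtp_from="alerts@example.com",
    )
    print(json.dumps(config, indent=2))
```

Because JSON is also valid YAML, the printed output can be compared directly against a rendered alertmanager configuration file.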

Code base structure

The monitoring charts are located here: https://github.com/project-sunbird/sunbird-devops/tree/master/kubernetes/helm_charts/monitoring

additional-scrape-configs

  • This helm chart contains the prometheus scrape configuration, labels, interval and timeout

alertrules

  • This helm chart contains the alert rules

azure-ambari-prometheus-exporter

  • This helm chart is used to install the ambari exporter, which monitors hadoop clusters such as HDInsight

blackbox-exporter

  • This helm chart is used to monitor service or http(s) endpoints and check whether they are healthy

cassandra-jmx-exporter

  • This helm chart is used to monitor cassandra clusters

dashboards

  • This helm chart contains the grafana dashboards

elasticsearch-exporter

  • This helm chart is used to monitor elasticsearch cluster

ingestion-kafka-exporter

  • This helm chart is used to monitor the ingestion kafka cluster

json-path-exporter

  • This helm chart is used to scrape remote jsons and convert them to prometheus metrics

kafka-exporter

  • This helm chart is used to monitor kafka clusters

kafka-lag-exporter

  • This helm chart is used to monitor kafka topic / group lag

kafka-topic-exporter

  • This helm chart is used to monitor kafka topics

oauth2-proxy

  • This helm chart installs oauth2 proxy in the monitoring namespace

processing-kafka-exporter

  • This helm chart is used to monitor the processing kafka cluster

prometheus-operator

  • This helm chart installs the prometheus operator along with grafana, node exporter, kube state metrics and alertmanager

prometheus-redis-exporter

  • This helm chart is used to monitor redis nodes

statsd-exporter

  • This helm chart is used to monitor kong api metrics

ansible role

Overriding the specs

Defining additional specs

Deploying the monitoring stack

  • Use the jenkins job named Monitoring under the Deploy/Kubernetes folder.

  • Fill in the mandatory and optional variables in the private repo template that the monitoring stack requires. For example, set the slack channel name and smtp configuration if you plan to use the alerting capabilities.

Service Monitoring

Backup and Restore

How-tos

Monitoring new resources

  • If a new node needs to be monitored then

    • Install node exporter on the VM using the Opsadminstration/Bootstrap Jenkins job

    • Add the IP of the node under the node-exporter ansible group in the inventory

    • Run the Kubernetes/Monitoring jenkins job with the tag set to all

  • If a new service within Kubernetes that can directly emit prometheus metrics needs to be monitored, then

    • Add a service monitor file in the helm chart (Already covered in previous section)

    • Redeploy the service
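For reference, a service monitor file is a ServiceMonitor resource that tells the prometheus operator which Kubernetes service to scrape. The following sketch builds one as a Python dictionary; the service name, namespace, `app` label and port name are hypothetical, and the exact labels the Sunbird charts use to select services may differ.

```python
import json


def service_monitor(
    name: str,
    namespace: str,
    app_label: str,
    port_name: str,
    path: str = "/metrics",
    interval: str = "30s",
) -> dict:
    """Build an example ServiceMonitor for a service exposing prometheus metrics."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Select the Kubernetes Service objects carrying this label.
            "selector": {"matchLabels": {"app": app_label}},
            # Only look for matching services in the given namespace.
            "namespaceSelector": {"matchNames": [namespace]},
            # "port" refers to a named port on the Service, not a number.
            "endpoints": [
                {"port": port_name, "path": path, "interval": interval}
            ],
        },
    }


if __name__ == "__main__":
    manifest = service_monitor(
        name="example-service",  # hypothetical service
        namespace="dev",
        app_label="example-service",
        port_name="http",
    )
    print(json.dumps(manifest, indent=2))
```

The service itself must expose a named port and serve metrics on the configured path for the scrape to succeed.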

Scraping new metrics

  • Prometheus automatically scrapes new metrics once the target is added to the configuration

  • A target can be

    • An exporter endpoint (example: node exporter endpoint)

    • A service monitor endpoint (example: see service monitor file)
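For the exporter endpoint case, a target ends up as an entry in a prometheus scrape job. The sketch below builds such a job for node exporter endpoints from a list of node IPs. The job name and the IPs are made up for illustration, port 9100 is the node exporter default, and the way the additional-scrape-configs chart renders its jobs from the ansible inventory is not reproduced here.

```python
import json
from typing import List


def node_exporter_job(
    node_ips: List[str],
    port: int = 9100,
    interval: str = "30s",
    timeout: str = "10s",
) -> dict:
    """Build an example scrape job listing node exporter endpoints as targets."""
    return {
        "job_name": "node-exporter-vms",  # hypothetical job name
        "scrape_interval": interval,
        # Prometheus rejects a timeout that is longer than the interval.
        "scrape_timeout": timeout,
        "static_configs": [
            # Each target is a host:port pair; the /metrics path is the default.
            {"targets": [f"{ip}:{port}" for ip in node_ips]}
        ],
    }


if __name__ == "__main__":
    job = node_exporter_job(["10.0.0.4", "10.0.0.5"])
    print(json.dumps(job, indent=2))
```

Adding a new IP to the list adds one more target, and prometheus starts collecting every metric that endpoint exposes on the next scrape after the configuration is reloaded.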

Adding new dashboards

Modify existing dashboard

How to add alert rules?

How to add new service monitor?

How to add notification endpoints (mail, slack, etc.)?

 
