Enhancing logging in Sunbird Desktop App



Background:

Currently, the Desktop App has a configurable logging system, built on top of log4js, that stores backups on log rotation. The app writes logs of all types ('all' | 'trace' | 'fatal' | 'error' | 'off' | 'info' | 'warn' | 'debug') to an app.log file, which rotates at 10 MB and keeps up to 3 compressed backups. It also has a dedicated error.log file that stores only error logs (10 MB | 3 backups). With this setup, it is difficult to query logs by date, and there is no distinction between different kinds of logs.

Solution:

Log Categories:

Sunbird desktop app logs can be categorized into different sections based on the task they relate to or the meaning they convey. Grouping logs by category helps to debug, sync, and analyze different aspects of the Desktop app.



| # | Log Category | Enabled | Syncs to platform | Log storage | Log format |
|---|---|---|---|---|---|
| 1 | App Install / Update | Enabled by default. | No | Last 3 logs will be stored | String |
| 2 | Application | Enabled by default with app log level set to 'INFO'. | No | Last 7 days of logs | String |
| 3 | Debug | Disabled; can be enabled when needed for a short amount of time | No | Last 3 logs will be stored | String |
| 4 | Crash | Enabled by default. | Yes | Last 10 logs will be stored | String |
| 5 | Error | Enabled by default. | Yes | Last 500 logs will be stored | String |
| 6 | Performance | Enabled by default. | Yes | Last 100 logs will be stored | JSON |
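The category table above could be held in code as a single policy map, so that retention and sync decisions are driven by data rather than scattered conditionals. The following is a minimal sketch; the type names, the category keys, and the `syncedCategories` helper are hypothetical and not part of the existing app.

```typescript
// Hypothetical keys for the six categories in the table.
type LogCategory =
  | "install"
  | "application"
  | "debug"
  | "crash"
  | "error"
  | "performance";

// Retention is either "last N logs" or "last N days", as in the table.
type Retention = { kind: "count"; max: number } | { kind: "days"; max: number };

interface CategoryPolicy {
  enabledByDefault: boolean;
  syncsToPlatform: boolean;
  retention: Retention;
  format: "string" | "json";
}

// One entry per table row; values are copied from the table.
const LOG_POLICIES: Record<LogCategory, CategoryPolicy> = {
  install: { enabledByDefault: true, syncsToPlatform: false, retention: { kind: "count", max: 3 }, format: "string" },
  application: { enabledByDefault: true, syncsToPlatform: false, retention: { kind: "days", max: 7 }, format: "string" },
  debug: { enabledByDefault: false, syncsToPlatform: false, retention: { kind: "count", max: 3 }, format: "string" },
  crash: { enabledByDefault: true, syncsToPlatform: true, retention: { kind: "count", max: 10 }, format: "string" },
  error: { enabledByDefault: true, syncsToPlatform: true, retention: { kind: "count", max: 500 }, format: "string" },
  performance: { enabledByDefault: true, syncsToPlatform: true, retention: { kind: "count", max: 100 }, format: "json" },
};

// Categories whose logs are sent to the platform.
function syncedCategories(): LogCategory[] {
  return (Object.keys(LOG_POLICIES) as LogCategory[]).filter(
    (category) => LOG_POLICIES[category].syncsToPlatform,
  );
}
```

A sync job could then iterate over `syncedCategories()` instead of hard-coding which logs leave the device.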

App Install / Update: 

These logs will be generated when the app is being installed on the system (Windows or Ubuntu), by hooking into the electron-builder installation hooks. We can keep 2-3 installation logs.

Application: 

All logs generated by the app at or above the app log level will be collected here. By default, the app log level will be set to 'INFO'.

Debug: 

All logs generated by the app, irrespective of the app log level, will be collected here. This will be enabled at the user's request for a short amount of time. When enabled, the app log level will be set to 'ALL'. This is used when we can't debug from telemetry or from application logs. These logs can be sent when raising support tickets in the future.
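The interaction between the Application threshold and the temporary Debug mode can be sketched as below. This is an illustration only: `LevelController`, `passes`, and the time-boxed `enableDebug` call are hypothetical names, and the level ordering is the conventional one (it is assumed here, not taken from the app's code). The clock is passed in explicitly so the behaviour is easy to test.

```typescript
// Conventional severity order, lowest to highest (assumed).
const LEVEL_ORDER = ["all", "trace", "debug", "info", "warn", "error", "fatal", "off"] as const;
type Level = (typeof LEVEL_ORDER)[number];

// A message passes when its level is at or above the threshold.
function passes(messageLevel: Level, threshold: Level): boolean {
  return LEVEL_ORDER.indexOf(messageLevel) >= LEVEL_ORDER.indexOf(threshold);
}

class LevelController {
  private readonly appLevel: Level;
  private debugUntil = 0; // epoch ms; 0 means debug mode is off

  constructor(appLevel: Level = "info") {
    this.appLevel = appLevel;
  }

  // Turn on debug collection for a limited time window.
  enableDebug(durationMs: number, now: number = Date.now()): void {
    this.debugUntil = now + durationMs;
  }

  // 'all' while debug mode is active, otherwise the configured app level.
  effectiveLevel(now: number = Date.now()): Level {
    return now < this.debugUntil ? "all" : this.appLevel;
  }

  shouldLog(messageLevel: Level, now: number = Date.now()): boolean {
    return passes(messageLevel, this.effectiveLevel(now));
  }
}
```

Because debug mode expires on its own, a user who forgets to switch it off does not keep the app at 'ALL' indefinitely.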

Crash:

These logs are generated when the app (Electron) crashes. They are in minidump format and need further processing before any analysis can be done.

Syncing Crash logs to the platform:

Solution 1: Processing minidump in server

We require an API to sync minidumps and a batch job to process them.

  1. API to store minidumps: This API will accept minidumps from apps and store them in the blob store.
  2. Batch job: This job will retrieve minidumps from the blob store and, using Electron symbols and a minidump library, turn them into meaningful data. This data can be stored in Druid or Elasticsearch for analysis.

                              

Note: This requires an implementation design review.

Solution 2: Processing minidump in the desktop app

We require Electron symbols and a minidump parsing library to process crash logs in the app. Once processed, we can sync the resulting data to the error sync API.

Error:

All unhandled exceptions and unhandled rejections will be collected in these logs. The app keeps the last 500 error logs and syncs them to the platform for further processing using the network queue.
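Collecting unhandled exceptions and rejections while keeping only the most recent entries could look roughly like the sketch below. `BoundedLog`, `ErrorLogEntry`, and `captureErrors` are hypothetical names, and the entry fields are an assumption; only the event names `uncaughtException` and `unhandledRejection` are Node's own.

```typescript
// Assumed shape of one stored error entry.
interface ErrorLogEntry {
  kind: "uncaughtException" | "unhandledRejection";
  message: string;
  stack?: string;
  createdOn: number; // epoch ms
}

// Keeps only the most recent `capacity` entries, dropping the oldest first.
class BoundedLog<T> {
  private readonly capacity: number;
  private entries: T[] = [];

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  push(entry: T): void {
    this.entries.push(entry);
    if (this.entries.length > this.capacity) {
      this.entries.shift();
    }
  }

  snapshot(): T[] {
    return [...this.entries];
  }
}

// Minimal view of an event source; Node's `process` is intended to fit this shape.
type ErrorSource = {
  on(event: string, listener: (reason: unknown) => void): unknown;
};

// Record both kinds of unhandled failure into the bounded store.
function captureErrors(source: ErrorSource, store: BoundedLog<ErrorLogEntry>): void {
  for (const kind of ["uncaughtException", "unhandledRejection"] as const) {
    source.on(kind, (reason) => {
      const err = reason instanceof Error ? reason : new Error(String(reason));
      store.push({ kind, message: err.message, stack: err.stack, createdOn: Date.now() });
    });
  }
}
```

In the app this would be wired up once at startup, for example `captureErrors(process, new BoundedLog<ErrorLogEntry>(500))`. Note that registering an `uncaughtException` listener replaces Node's default behaviour of exiting the process, so the app would need to decide explicitly whether to continue or shut down after logging.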

Syncing Error logs to the platform:

The Sunbird platform has an error log aggregation API; the same can be used here.

Performance: 

This log is generated for each task, API call, etc., capturing the time it took to complete, the device ID (DID), and other metrics (CPU, memory usage, etc.). Perf logs will be generated for the tasks below:

  1. App startup
  2. Content Import
  3. Content Export
  4. Content Download
  5. Content Delete
  6. API calls


Perf log format:
{
    id: '<uuid>',
    timeTaken: '<time taken to complete the task>',
    createdOn: '<created date>',
    type: '<network|system>',
    subType: '<import,download,export etc>',
    did: '<system device id>',
    size: '<size of the object>',
    extras: {<'task specific data'>}
}
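As an illustration of how an entry in this format might be produced, the sketch below types the record and wraps a task with timing. The units are assumptions (milliseconds for `timeTaken`, epoch milliseconds for `createdOn`), and `PerfMeta`, `buildPerfLog`, and `measure` are hypothetical helpers. The id is passed in by the caller; in the app it could come from a UUID generator such as Node's `crypto.randomUUID()`.

```typescript
// Typed version of the perf log format (units are assumed).
interface PerfLog {
  id: string;
  timeTaken: number; // milliseconds (assumption)
  createdOn: number; // epoch milliseconds (assumption)
  type: "network" | "system";
  subType: string; // e.g. "import", "download", "export"
  did: string;
  size?: number;
  extras: Record<string, unknown>;
}

// Caller-supplied description of the task being measured.
interface PerfMeta {
  type: "network" | "system";
  subType: string;
  did: string;
  size?: number;
  extras?: Record<string, unknown>;
}

// Pure builder: easy to test because the timestamps are passed in.
function buildPerfLog(id: string, meta: PerfMeta, startedAt: number, finishedAt: number): PerfLog {
  return {
    id,
    timeTaken: finishedAt - startedAt,
    createdOn: finishedAt,
    type: meta.type,
    subType: meta.subType,
    did: meta.did,
    size: meta.size,
    extras: meta.extras ?? {},
  };
}

// Runs a task and returns its result together with a perf log entry.
async function measure<T>(
  id: string,
  meta: PerfMeta,
  task: () => Promise<T>,
): Promise<{ result: T; log: PerfLog }> {
  const startedAt = Date.now();
  const result = await task();
  return { result, log: buildPerfLog(id, meta, startedAt, Date.now()) };
}
```

A content import could then be wrapped as `measure(id, { type: "system", subType: "import", did }, () => importContent(path))`, where `importContent` stands in for whatever function performs the import.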

Syncing Perf log to the platform:

Note: This requires an implementation design review.

Solution 1: Sync to new API.

We require a new API to sync these logs to the platform and to analyze/visualize them.

Solution 2: Log Telemetry metrics event.

We can convert each perf log to a telemetry metrics event. These events will be synced to the platform along with all the other telemetry events.

Solution 3: Local analytics.

We can create local analytics from all the perf logs and refer to them when needed.


Log levels

The desktop app will have all traditional log levels and some custom levels.


| # | Level | Description |
|---|---|---|
| 1 | INFO | For logging information messages. |
| 2 | DEBUG | For logging messages for debugging purposes. |
| 3 | ERROR | For logging errors. |
| 4 | FATAL | For logging errors that are fatal. |
| 5 | WARN | For logging warning messages. |
| 6 | TRACE | For logging messages to help trace errors. |
| 7 | PERF | For logging all performance information, such as how long an import took or how long a content search took. |
| 8 | DB_ERROR | For logging errors from the database. |
| 9 | NETWORK_ERROR | For logging network-related errors. |
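Since the existing logging is built on log4js, the custom levels could be declared through the `levels` key of the log4js configuration, which maps each level name to a numeric value and a colour. The fragment below is a sketch of such a configuration object; the numeric values chosen for PERF, DB_ERROR, and NETWORK_ERROR are assumptions (they mirror the built-in INFO and ERROR values of 20000 and 40000), and the appender simply restates the 10 MB / 3 backups rotation described in the background.

```typescript
// Custom levels; numeric values and colours are illustrative assumptions.
const customLevels = {
  PERF: { value: 20000, colour: "cyan" }, // same weight as the built-in INFO
  DB_ERROR: { value: 40000, colour: "red" }, // same weight as the built-in ERROR
  NETWORK_ERROR: { value: 40000, colour: "magenta" },
};

// Configuration object intended to be passed to log4js.configure(...).
const loggingConfig = {
  levels: customLevels,
  appenders: {
    app: {
      type: "file",
      filename: "app.log",
      maxLogSize: 10 * 1024 * 1024, // rotate at 10 MB
      backups: 3, // keep 3 rotated files
      compress: true, // gzip the rotated files
    },
  },
  categories: {
    default: { appenders: ["app"], level: "info" },
  },
};
```

With the default category at 'info', PERF entries would pass the threshold because their value is not lower than INFO's. If perf logs should be controlled independently of the app log level, a separate category and appender would be a more suitable design.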