...
We use asynchronous calls to create and update various attributes associated with users, orgs, etc. These create and update operations run in the background as actor operations. If a background call fails ("failed to process the background calls"), the data becomes inconsistent. This is a major issue, because only half of the task has been processed while the requester has already received a success message. Instead, we can use a queue that is not held in memory, so that if the service fails or stops at any point, the pending background tasks are processed as soon as the service restarts. This should give about 99.9% assurance that the data will be processed, which is what we need.
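The guarantee we want rests on two rules: a task is persisted before the requester gets a success response, and the consumer marks a task as done only after processing it. The sketch below illustrates this under stated assumptions: `DurableLog` is a hypothetical in-memory stand-in for one Kafka topic partition plus the consumer group's committed offset (in Kafka both are persisted and survive a restart), and `crash_after` simulates the service stopping after handling a task but before committing it.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DurableLog:
    """Hypothetical stand-in for one Kafka topic partition: an append-only log
    plus the consumer group's committed offset. In Kafka both survive a
    restart; here they are plain in-memory fields for illustration only."""
    records: List[dict] = field(default_factory=list)
    committed_offset: int = 0  # offset of the next record to process

    def append(self, record: dict) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset assigned to the record


def accept_request(log: DurableLog, task: dict) -> str:
    # Persist the background task first; only then report success.
    log.append(task)
    return "success"


def run_consumer(log: DurableLog, handle: Callable[[dict], None],
                 crash_after: int = -1) -> int:
    """Process records from the committed offset onward and return how many
    were handled. The offset is committed only after `handle` returns
    (at-least-once delivery). `crash_after` simulates the service stopping
    right after handling that many records, before committing the last one."""
    handled = 0
    offset = log.committed_offset
    while offset < len(log.records):
        handle(log.records[offset])
        handled += 1
        if handled == crash_after:
            return handled  # "crash": this record's offset is never committed
        offset += 1
        log.committed_offset = offset  # commit after successful processing
    return handled


# Three requests are accepted; the consumer "crashes" after handling the
# second task but before committing it, then a restarted consumer resumes
# from the last committed offset.
log = DurableLog()
for task_id in range(3):
    accept_request(log, {"task_id": task_id})
seen: List[int] = []
run_consumer(log, lambda r: seen.append(r["task_id"]), crash_after=2)
run_consumer(log, lambda r: seen.append(r["task_id"]))
print(seen)  # [0, 1, 1, 2]: task 1 is redelivered, nothing is lost
```

Task 1 is delivered twice, so background handlers should be idempotent (for example, an update that sets a value rather than incrementing it).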
Approaches for implementing Kafka queues for background data tasks.
...
This is the scenario where the message construction logic is wrong. For this case we can set up an alarm, so that we can look into the failed messages as soon as possible and reprocess them.
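The alarm-and-reprocess flow can be sketched as follows, assuming failed messages are parked in a list and that `alert` is a hypothetical notification hook (for example, something that pages the team). `FailedMessageMonitor`, the threshold value and the message fields are illustrative names, not an existing API. The alarm fires once when the number of parked messages reaches the threshold; after the construction bug is fixed, `reprocess` runs every parked message through the corrected handler and keeps only the ones that still fail.

```python
from typing import Callable, List


class FailedMessageMonitor:
    """Parks messages whose processing failed and raises an alarm once the
    number of parked messages reaches `threshold` (an illustrative value)."""

    def __init__(self, alert: Callable[[str], None], threshold: int = 3):
        self.alert = alert  # hypothetical notification hook
        self.threshold = threshold
        self.parked: List[dict] = []
        self.alarm_raised = False

    def record_failure(self, message: dict, error: Exception) -> None:
        self.parked.append({"message": message, "error": repr(error)})
        if not self.alarm_raised and len(self.parked) >= self.threshold:
            self.alarm_raised = True
            self.alert(f"{len(self.parked)} background messages failed; please investigate")

    def reprocess(self, handle: Callable[[dict], None]) -> int:
        """Retry every parked message with the fixed `handle` and return the
        number that succeeded. Messages that still fail stay parked."""
        still_failing: List[dict] = []
        succeeded = 0
        for entry in self.parked:
            try:
                handle(entry["message"])
                succeeded += 1
            except Exception as exc:
                still_failing.append({"message": entry["message"], "error": repr(exc)})
        self.parked = still_failing
        if not self.parked:
            self.alarm_raised = False  # re-arm the alarm for the next incident
        return succeeded


def buggy_handler(message: dict) -> None:
    # Simulates wrongly constructed messages: a required field is missing.
    if "userId" not in message:
        raise KeyError("userId")


alerts: List[str] = []
monitor = FailedMessageMonitor(alert=alerts.append, threshold=2)
for message in [{"id": 1}, {"id": 2}]:
    try:
        buggy_handler(message)
    except Exception as exc:
        monitor.record_failure(message, exc)

# After the construction bug is fixed, replay the parked messages.
processed: List[int] = []
fixed = monitor.reprocess(lambda m: processed.append(m["id"]))
```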
Writing the messages to a local file will not work reliably if we run the service in a Docker container, because each deployment removes the old data in the file.
Instead, we can keep the event messages in AWS S3, Cassandra or Azure, and have Logstash read them from there.
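Because the container's filesystem is not durable, each batch of event messages can instead be written as one object in external storage. A minimal sketch, assuming newline-delimited JSON (one event per line, which suits Logstash's line-oriented inputs with a JSON codec) and a date-based key layout under a fixed prefix that the S3 input's `prefix` option could point at. The `upload` callable, the `background-events/` prefix and the key format are hypothetical; for S3 the hook could wrap boto3's `put_object`, and for Azure the equivalent blob upload call.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, Optional


def to_ndjson(events: Iterable[dict]) -> bytes:
    """Serialise events as newline-delimited JSON: one compact object per line."""
    lines = [json.dumps(e, separators=(",", ":"), sort_keys=True) for e in events]
    return ("\n".join(lines) + "\n").encode("utf-8")


def object_key(now: datetime, prefix: str = "background-events/") -> str:
    """Hypothetical layout: <prefix>YYYY/MM/DD/HHMMSS-<random>.ndjson.
    A fixed prefix lets the reader be pointed at just these objects."""
    return f"{prefix}{now:%Y/%m/%d}/{now:%H%M%S}-{uuid.uuid4().hex}.ndjson"


def write_batch(events: Iterable[dict],
                upload: Callable[[str, bytes], None],
                now: Optional[datetime] = None) -> str:
    """Write one batch of events as a single object and return its key.
    `upload(key, body)` is an assumed hook; for S3 it could wrap boto3's
    put_object(Bucket=..., Key=key, Body=body)."""
    now = now or datetime.now(timezone.utc)
    key = object_key(now)
    upload(key, to_ndjson(events))
    return key


# Local dict standing in for the object store, used only to show the output.
stored: Dict[str, bytes] = {}
key = write_batch(
    [{"op": "updateUser", "userId": "u1"}, {"op": "createOrg", "orgId": "o1"}],
    upload=lambda k, body: stored.__setitem__(k, body),
    now=datetime(2017, 5, 1, 12, 30, 0, tzinfo=timezone.utc),
)
print(stored[key].decode("utf-8"), end="")
```

Writing whole batches as new objects, rather than appending to one object, avoids rewriting existing data on every event.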
Compatibility of AWS S3 / Cassandra / Azure with Logstash:
- AWS S3 : Compatible with Logstash. Elastic, the organisation that develops and maintains Logstash, itself provides an input plugin that reads events from files stored in S3.
More details can be found at https://www.elastic.co/guide/en/logstash/5.3/plugins-inputs-s3.html .
The configuration options are well documented.
- CASSANDRA : Compatible with Logstash through the JDBC input plugin, which is also provided by Elastic.
More details can be found at https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html .
The configuration options are well documented, but the setup is considerably more complex and needs more resources than the AWS S3 approach. Furthermore, we have to add the JDBC driver explicitly and define every configuration detail ourselves.
We also have to write the QUERIES that take the message events out and decide what happens to messages that were already processed. This is slow, because it needs frequent database calls, and we also have to update the database to mark processed messages.
With this approach we have to write the message events into a database, which is more expensive than writing them to a file. We also have to take care of database clean-up, because the data keeps growing and becomes useless after some time.
- AZURE : Compatible with Logstash through a plugin, developed and maintained by Microsoft, that reads and parses data from Azure Storage Blobs: https://github.com/Azure/azure-diagnostics-tools/tree/master/Logstash/logstash-input-azureblob .
Easy to install and use, the same as AWS S3. Its performance compared with AWS S3 has not been analysed, and it is not as widely used as AWS S3.
Approaches to implement event processing :
...
We have to create a utility that generates event messages, so that message generation is logically consistent and efficient. Using this utility we can generate a simple event message containing all the data required to process it, or to reuse it later. We can also group background operations logically, so that an intelligent consumer reads the message and processes all of its background operations according to the defined logic. As long as the consumer handles each group as one unit, this removes the chance of a partial update or create. Further, on the consumer side we can have a storage where we put all event messages that were not processed successfully or had errors, so that we can study them to improve our implementation.
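A sketch of the utility and the consumer, under stated assumptions: an event message carries a list of operations, and each operation name maps to an `(apply, undo)` pair of handlers. The consumer applies the operations in order; if one fails, it undoes the already-applied ones in reverse order and puts the message into a failure store for later study. `make_event`, `process_event`, the field names, the operation names and the in-memory stores are all hypothetical. Undoing by compensation is one way to get all-or-nothing behaviour when the underlying stores share no transaction.

```python
import uuid
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple

Handler = Callable[[dict], None]


def make_event(operations: List[Tuple[str, dict]]) -> dict:
    """Hypothetical event-message utility: wraps one or more background
    operations (name, data) with the metadata needed to process or replay them."""
    return {
        "eventId": str(uuid.uuid4()),
        "createdOn": datetime.now(timezone.utc).isoformat(),
        "operations": [{"name": name, "data": data} for name, data in operations],
    }


def process_event(event: dict,
                  handlers: Dict[str, Tuple[Handler, Handler]],
                  failed_store: List[dict]) -> bool:
    """Apply all operations of the event, or none of them.

    `handlers` maps an operation name to (apply, undo). On the first failure,
    already-applied operations are undone in reverse order (undo is assumed
    to succeed) and the event goes to `failed_store` for later study.
    Returns True if every operation was applied."""
    applied: List[dict] = []
    for op in event["operations"]:
        try:
            apply, _undo = handlers[op["name"]]
            apply(op["data"])
        except Exception as exc:
            for done in reversed(applied):
                handlers[done["name"]][1](done["data"])  # compensate
            failed_store.append({"event": event, "failedOp": op["name"], "error": repr(exc)})
            return False
        applied.append(op)
    return True


# Example with an in-memory "database" and two hypothetical operations.
db: Dict[str, dict] = {}
failed: List[dict] = []


def create_user(d: dict) -> None:
    db[d["userId"]] = {"orgId": None}


def delete_user(d: dict) -> None:
    db.pop(d["userId"], None)


def add_to_org(d: dict) -> None:
    if d["orgId"] != "org1":  # pretend only org1 exists
        raise ValueError(f"unknown org {d['orgId']}")
    db[d["userId"]]["orgId"] = d["orgId"]


def remove_from_org(d: dict) -> None:
    db[d["userId"]]["orgId"] = None


handlers = {"createUser": (create_user, delete_user),
            "addToOrg": (add_to_org, remove_from_org)}

ok = process_event(make_event([("createUser", {"userId": "u1"}),
                               ("addToOrg", {"userId": "u1", "orgId": "org1"})]),
                   handlers, failed)
bad = process_event(make_event([("createUser", {"userId": "u2"}),
                                ("addToOrg", {"userId": "u2", "orgId": "org9"})]),
                    handlers, failed)
```

In the second event the user is created and then removed again when adding it to the unknown org fails, so no half-created user remains and the whole message is kept in `failed` for analysis.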
We will have two types of event messages:
- Transactional : single and grouped events
- Informative : single events.
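The two message types could be modelled as below. This is a sketch assuming the rule stated above: a transactional message holds one or more operations, while an informative message holds exactly one. `EventType`, `validate_event` and the `type` / `operations` field names are hypothetical, not a fixed schema.

```python
from enum import Enum


class EventType(Enum):
    TRANSACTIONAL = "TRANSACTIONAL"  # single or grouped operations
    INFORMATIVE = "INFORMATIVE"      # always a single operation


def validate_event(event: dict) -> None:
    """Raise ValueError if the event does not respect the rules for its type."""
    try:
        event_type = EventType(event.get("type"))
    except ValueError:
        raise ValueError(f"unknown event type: {event.get('type')!r}") from None
    ops = event.get("operations") or []
    if not ops:
        raise ValueError("an event must contain at least one operation")
    if event_type is EventType.INFORMATIVE and len(ops) != 1:
        raise ValueError("an informative event must contain exactly one operation")


# Both of these are valid and pass without raising.
validate_event({"type": "TRANSACTIONAL",
                "operations": [{"name": "createUser"}, {"name": "addToOrg"}]})
validate_event({"type": "INFORMATIVE", "operations": [{"name": "sendWelcomeEmail"}]})
```

Validating at construction time, inside the event-message utility, would catch wrongly built messages before they reach the queue.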
Logical grouping of operations:
...