Scaling Sunbird Infra

We are scaling horizontally wherever possible. This is the main area of focus.
In components where horizontal scaling is not possible, we would scale it vertically

Docker swarm
Elasticsearch cluster
Consumption and creations API's
- Development team will make a change to send / receive such requests to / from elasticsearch or Redis
- This would reduce the stress on the database for such API's and also the response would be faster
YARN cluster
Cassandra cluster
Learning and Search servers

Present Arch:

Single docker swarm cluster with multiple managers and nodes to run up to 256 containers in a overlay network. We have approx. 150 containers running to support 600 tps load

Cons:

Future Arch:

Multi-swarm cluster attaching to load-balancer
Ability to ad/remove cluster without down-time
Blue-Green deployments are possible with Multi-swarm cluster
Multiple backend pools to azure load balancer
Each swarm cluster will be of same capacity
Auto-scaling of services (increasing replicas) can be done by polling prometheus data (cadvisor metrics)

Kibana will be moved to an individual VM
Service label will be changed to identify the log
Logs data from Multiple swarm clusters will be pushed to Log ElasticSearch Cluster
Log aggregation for KP and DP services

Enabling CDN for faster loading of static contents
Dev team will be implementing a fallback to default loading when CDN not available
After analyzing multiple CDN providers, we have selected to use Verizon Standard CDN as it is cost effective, performance and feature set looks satisfying for our needs
Changing a CDN provider can be done easily if we require more features after a go-live evalutation
- Azure POP locations - https://docs.microsoft.com/en-us/azure/cdn/cdn-pop-locations
- Azure CDN compare - https://docs.microsoft.com/en-us/azure/cdn/cdn-features

A production replica infrastructure would be created (without production data) with multi-swarm and other scaled up components.
Various type of load testing would be performed on this infrastructure to test the capacity and stability
Once the data is validated and meets our standards, the same would be planned for production.