Environment Stability - RCA
- ETL job was not executed for the 1.14 pre-prod deployment: the steps were missed even though they were listed in the deployment tracker.
- Swarm update due to a change made by Azure to a particular node.
- Variable values were refactored (500 variables reduced to 60), but this was not communicated to the larger group.
- The peer review on pre-prod was not aimed at validating this change.
- Possible risk in going to production
Suggested resolution
- Can there be a peer review post-deployment?
- Big refactoring activities and DevOps changes could be done in off-peak hours, with engineering involved to help validate. All stakeholders must be informed beforehand.
- Can a dry run happen on a new instance?
- Can there be a DevOps process kit for every environment?
- Can there be a common deployment request tracker?
- Can deployment requests to DevOps be limited to twice a week?
- Can Jira be leveraged for deployment?
- How can stories that move from sprint to sprint be minimised?
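
Possible validation aid for the variable refactoring (sketch only)
One way engineering could help validate a large variable refactoring is to check that every variable the services still read exists in the refactored set. The sketch below assumes the variable names can be exported as plain lists or dictionaries; the function name missing_variables and all sample names and values are hypothetical and not taken from the actual environments.

```python
def missing_variables(required, refactored):
    """Return the required variable names absent from the refactored set, sorted."""
    return sorted(set(required) - set(refactored))


# Hypothetical sample data: names the services read, and the refactored variable set.
required_by_services = ["DB_HOST", "DB_PORT", "ETL_SCHEDULE"]
refactored_variables = {"DB_HOST": "db.internal", "DB_PORT": "5432"}

missing = missing_variables(required_by_services, refactored_variables)
if missing:
    print("Missing after refactoring:", ", ".join(missing))
else:
    print("All required variables are present.")
```

A check like this could run as part of a dry run on a new instance, before the change reaches pre-prod.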