Sr. Site Reliability Engineer
Job Description:
The Sr. SRE will be dedicated full-time to building software that improves the reliability of production systems, fixing issues, responding to incidents, and typically taking on-call responsibilities.
An SRE team will greatly benefit both IT operations and software development. Beyond improving the reliability of production systems, SRE is likely to help IT, support, and development teams spend less time on support escalations, giving them more time to build new features and services.
The Sr. SRE will be mainly responsible for the following activities:
The Sr. SRE is responsible for writing build and deployment automation for various applications and infrastructure using Maven, Ant, Groovy, Terraform, Ansible, etc.
The Sr. SRE must create automated CI/CD and CT pipelines for smooth and frequent software releases, playing a hands-on DevOps automation engineer role.
The Sr. SRE will be in charge of proactively building and implementing services that help IT and support do their jobs better.
This can range from adjusting monitoring and alerting to supporting builds, deployments, and the uptime of various environments.
A site reliability engineer may be tasked with building a homegrown tool from scratch to address weaknesses in software delivery or incident/problem management.
The Sr. SRE must be hands-on with Terraform, Ansible, Python, Jenkins, and CI/CD concepts to help build reliable DevOps tools.
Similarly, a site reliability engineer can expect to spend time fixing support escalation cases. However, as SRE operations mature, systems become more reliable and produce fewer critical production incidents, which in turn leads to fewer support escalations.
Because an SRE team touches so many parts of the engineering and IT organization, it can be a great source of knowledge and can help route issues to the right people and teams.
The Sr. SRE will be responsible for fulfilling this need.
More often than not, site reliability engineers need to take on-call responsibilities. The Sr. SRE is responsible for improving system reliability by optimizing on-call processes.
SRE teams help add automation and context to alerts, enabling better real-time collaborative responses from on-call responders.
Additionally, the Sr. SRE will update runbooks, tools, and documentation to help prepare on-call teams for future incidents.
SRE teams gain exposure to systems in both staging and production, as well as to all technical teams. They work with software development, support, and IT operations and take on on-call duties, so they build up a great deal of historical knowledge over time.
Instead of siloing this knowledge in one team or one person, site reliability engineers can be tasked with documenting much of what they know.
Constant upkeep of documentation and runbooks can ensure that teams get the information they need right when they need it.
The Sr. SRE will be responsible for handling this and for using automation to ease these kinds of tasks.
Without thorough post-incident reviews, there is no way to identify what is working and what is not. The Sr. SRE needs to keep teams honest and ensure that everyone, software developers and IT professionals alike, conducts post-incident reviews, documents their findings, and acts on what they learn.
Site reliability engineers are then often tasked with action items for building or optimizing some part of the SDLC or incident lifecycle to bolster the reliability of their services.
Must have: