The Cloud Services Business Unit delivers the full VMware portfolio of enterprise capabilities as an integrated set of cloud services, enabling consistent infrastructure and operations across every major public cloud and service provider environment.
Our team enables Cloud Providers across the globe to consume VMware products. By offering a wide range of VMware-based cloud services on a geographical basis, Providers can quickly and seamlessly extend their customers' data centers into the cloud using the same VMware products and tools those customers already use on premises.
Role : As a Senior Member of Technical Staff, Site Reliability, you will collaborate closely with product development teams on the management and deployment of multiple SaaS offerings.
You have a background running large-scale applications deployed on Kubernetes in public clouds (AWS, GCP, Azure). You are excited about helping teams succeed in building reliable, self-healing services.
Responsibilities :
Participate in architectural reviews with reliability and resiliency in mind.
Recommend preventive and corrective actions for incidents.
Collaborate with teams on improving the deployment automation, resiliency, and security of our cloud products. You’re intimately familiar with CI / CD tools and methodologies and know how to get the most out of them.
Work with development teams to address reliability and scale concerns across the stack. You’re just as much dev as ops and flourish working in an Agile model.
Help teams improve the observability of their services through application and infrastructure instrumentation. Monitoring, alerting, metrics, and deep introspection of applications are a must and an area you’re passionate about.
Troubleshoot complex operational issues within a microservices-based architecture.
Develop tooling to enhance development and troubleshooting efficiency.
Participate in the on-call rotation to maintain availability as per the SLA.
Required Skills :
5-8 years of SRE / DevOps experience working on highly scalable distributed systems
A solid understanding of cloud-based architectures and concepts, with hands-on experience using Public Clouds and Kubernetes
Scripting experience in any programming language
Experience with metric and log aggregation tools (Prometheus, ELK, etc.)
Experience with monitoring tools such as Grafana / Wavefront
Strong interpersonal communication skills (including listening, speaking, and writing) and ability to work well in a diverse, team-focused environment with other SREs, Engineers, Product Managers, etc.
Preferred Skills :
Experience working with Terraform / Ansible / Helm
Knowledge of relational and non-relational databases, networking, Linux internals, filesystems, web architecture, CI / CD principles
Experience using Git
Category : Engineering and Technology
Subcategory : Site Reliability
Experience : Manager and Professional
Full Time / Part Time : Full Time
Posted Date : 2021-06-08