Senior Site Reliability Engineer - Opportunity for Working Remotely
VMware
Heredia, Costa Rica
1 day ago

Job Description

The Cloud Services Business Unit delivers the full VMware portfolio of enterprise capabilities as an integrated set of cloud services, enabling consistent infrastructure and operations across every major public cloud or service provider environment.

Our team enables Cloud Providers across the globe to consume VMware products. By offering a wide range of VMware-based cloud services on a geographical basis, Providers can quickly and seamlessly extend their customers' data centers into the cloud using the same VMware products and tools those customers already use on premises.

Role: As a Senior Member of Technical Staff, Site Reliability, you will collaborate closely with product development teams on the management and deployment of multiple SaaS offerings.

You have a background running large-scale applications in public clouds (AWS, GCP, Azure) deployed on Kubernetes. You are excited about helping teams succeed in building reliable, self-healing services.

Responsibilities:

  • Participate in architectural reviews with reliability and resiliency in mind.
  • Recommend preventive and corrective actions for incidents.
  • Collaborate with teams on improving deployment automation and the resiliency and security of our cloud products. You're intimately familiar with CI/CD tools and methodologies and know how to get the most out of them.
  • Work comfortably with development teams to address reliability and scale concerns across the stack. You're just as much dev as ops and flourish in an Agile model.
  • Help teams improve the observability of their services through application and infrastructure instrumentation. Monitoring, alerting, metrics, and deep introspection of applications are a must and an area you're passionate about.
  • Troubleshoot complex operational issues within a microservices-based architecture.
  • Develop tooling to enhance development and troubleshooting efficiency.
  • Participate in the on-call rotation to keep availability in line with the SLA.
Required Skills:

  • 5-8 years of SRE / DevOps experience working on highly scalable distributed systems
  • A solid understanding of cloud-based architectures and concepts, with hands-on experience using public clouds and Kubernetes
  • Scripting experience in any programming language
  • Experience with metric and log aggregation tools (Prometheus, ELK, etc.)
  • Experience with monitoring tools such as Grafana or Wavefront
  • Strong interpersonal communication skills (including listening, speaking, and writing) and ability to work well in a diverse, team-focused environment with other SREs, Engineers, Product Managers, etc.
  • Team-player attitude
Preferred Skills:

  • Experience working with Terraform, Ansible, or Helm
  • Knowledge of relational and non-relational databases, networking, Linux internals, filesystems, web architecture, CI / CD principles
  • Experience using Git

    Category : Engineering and Technology

    Subcategory : Site Reliability

    Experience : Manager and Professional

    Full Time / Part Time : Full Time

    Posted Date : 2021-07-15
