Senior Site Reliability Engineer
San Jose, San José, Costa Rica
1 day ago

This position is available anywhere in Mexico

Intelligent Conversation and Communications Cloud (IC3) Carrier Operations Team

Intelligent Conversations and Communications Cloud (IC3) powers billions of real-time customer conversations across Microsoft’s first-party (Teams, Skype) and second-party (Dynamics) solutions.

IC3 enables reliable, high-quality audio and video calling, meeting, and messaging services that work every time, from anywhere, seamlessly across all customer touchpoints.

IC3 makes conversations on our platform more intelligent in real time, empowering best-in-class productivity tools for the modern workplace, where every call, meeting, or chat makes the next one better.

The mission of the IC3 Carrier Operations SRE team is to operate the IC3 PSTN services with end-to-end high availability, performance, and reliability so that customer objectives are consistently met or exceeded.

To achieve this, we work closely with our product and engineering teams and use a variety of home-grown toolsets aimed at aggressive automation for reliability.

We are also a service engineering-focused team running at scale while supporting deployments for new carriers across the globe.


  • Learn and enhance existing tools, and develop new tools that meet new scale and feature demands, with the aim of reducing manual intervention and improving the prevention, detection, and mitigation of service impacts.
  • Work with a team of engineers focused on improving the reliability, scalability, latency, and efficiency of Teams Calling Services powering cloud communications (Azure).
  • Manage problem resolution with service providers.
  • Manage incident response and perform root cause analysis investigations.
  • Review existing processes and drive improvements to support the scale and excellence of Teams Calling Services.
  • Automate and analyze data consumption, and provide operational insights into service reliability and customer experience to Design and Product teams.
  • Partner with data scientists and ML engineers to develop proactive anomaly detection measures.
  • Participate in inner-source and open-source projects to help Microsoft and partners meet next-generation challenges.
  • Participate in recruiting, mentoring, and developing a team of experienced SREs.

  • Participate in the on-call rotation of the local follow-the-sun team.

Required:

  • 3+ years of experience as a software engineer or site reliability engineer directly supporting and developing quality products.
  • 3+ years of coding experience in a backend language: C#, Java, JavaScript, Python, Shell, PowerShell.
  • 3+ years of experience working with distributed systems, microservices, and highly available infrastructure, on-premises or in the cloud.
  • 2+ years of experience working with and deploying to a cloud platform: Azure, IBM Cloud, Google Cloud, AWS, etc.
  • 3 years of experience working in Agile and self-directed teams.
  • Experience building complex SQL queries and presenting the results for end-user or non-technical consumption (bar charts, timelines, trends, etc.).
  • Experience in networking: virtual networks and subnets.
Bonus (desirable skills):

  • Experience with Microsoft Azure, Azure DevOps, ServiceNow, Microsoft Dynamics, or FLOW.
  • Knowledge of, or experience with, Internet network architecture and its operating principles.
  • Experience analyzing network packet captures and signaling traces.
  • Experience with Voice over IP (SBCs, media gateways, circuit-switched telephony, SS7, ISDN / ISUP).