SRE Manager

Descope | Los Altos, California, United States | Onsite

We are seeking an experienced and driven SRE Manager to lead our Site Reliability Engineering team. This role is critical to ensuring the availability, scalability, and performance of our production systems. As the SRE Manager, you will be responsible for managing a team of engineers focused on building automation, enhancing monitoring and observability, improving system reliability, and fostering a culture of operational excellence. You will work closely with development, infrastructure, and security teams to support high-quality product delivery with minimal downtime.

Key Responsibilities:


  • Lead and grow a high-performing SRE team responsible for the reliability, performance, and scalability of production systems.
  • Own the incident management process, postmortems, and root cause analysis to improve system resilience.
  • Drive implementation of SLAs, SLOs, and error budgets across services to align operational goals with business objectives.
  • Champion the use of automation to reduce manual work and improve deployment and recovery times.
  • Collaborate with software engineering and DevOps teams to ensure systems are designed for reliability and operational efficiency.
  • Oversee system monitoring, alerting, and observability efforts using tools like Prometheus, Grafana, Datadog, or similar.
  • Manage on-call rotations and ensure that documentation, runbooks, and playbooks are kept up to date.
  • Identify and drive continuous improvement in system architecture, capacity planning, and deployment strategies.
  • Ensure compliance with security, privacy, and regulatory requirements within the infrastructure.
  • Provide mentorship, performance reviews, and career development opportunities for SRE team members.

Qualifications:

  • Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
  • 4+ years of experience in software engineering, DevOps, or SRE roles.
  • Strong experience with cloud platforms (AWS, GCP, or Azure) and infrastructure-as-code tools (Terraform, Pulumi, etc.).
  • Proficient in programming/scripting languages such as Python, Go, or JavaScript.
  • Deep understanding of Linux systems, networking, and container orchestration (Kubernetes, Docker).
  • Strong knowledge of CI/CD pipelines and release automation.
  • Excellent leadership, communication, and project management skills.
  • Proven track record of building reliable systems at scale and managing incident response in production environments.

Life at Descope

Hi, we're Descope! We help every developer build secure, frictionless authentication and user journeys for any application. Drag-and-drop your auth at https://www.descope.com/sign-up

Thrive Here & What We Value

1. Mission-driven company with a focus on customer identity enhancement in apps
2. Collaborative and supportive workplace environment
3. Career growth and professional development opportunities
4. Innovation and creativity emphasis
5. Teamwork and open communication
