Job Functions
- Ensure reliability, scalability, performance, and security of systems
- Manage and optimize infrastructure for efficiency and minimal downtime
- Develop, maintain, and enhance observability and CI/CD tools
- Lead product initiatives, conduct resiliency reviews, coordinate cross-team efforts, and manage goals, risks, and resources for successful delivery
- Advise on reliability, mentor engineers on best practices, facilitate cross-team communication, and translate stakeholder needs into technical solutions
Job Requirements
- 8+ years of experience in reliability, scalability, performance, security, and enterprise system architecture
- Strong coding skills in at least one language (Go, Python, Java Spring Boot, .NET, etc.)
- Deep knowledge of software applications, technical processes, and emerging disciplines
- Hands-on experience with monitoring and telemetry tools (Grafana, Prometheus, Datadog, Splunk, etc.), SLO alerting, and CI/CD tools (Jenkins, GitHub Actions, GitLab, Terraform)
- Expertise in containerization and orchestration (Kubernetes, Docker, ECS) and troubleshooting networking and distributed system issues
- Experience creating infrastructure resources using Terraform or OpenTofu, with formal training or certification in software engineering and 5+ years of applied experience
Skills
- Drivers: satisfied by making things happen, not coming along for the ride
- Humility: humble team players check their egos and consider the team's needs above their own
- Growth Mindset: approach challenges and failures as learning opportunities