
Site Reliability Engineer - Observability

Second Front Systems | Worldwide | Remote
ABOUT THE ROLE
Second Front Systems' (2F) Product team is seeking a highly skilled and motivated Senior Site Reliability Engineer to join our Observability team. We are a small team working to accelerate the deployment of emerging technology into national security use cases. We are seeking technical professionals who want to operate on the front lines of an exciting and disruptive mission.

As a Senior SRE for Second Front Systems, you'll be responsible for deploying, maintaining, and scaling our observability infrastructure across multiple DoD networks. You'll work with Kubernetes-based platforms and BigBang charts from DoD Platform One, and build automation to make our monitoring stack easier to deploy for new customers. You'll be empowered to collaborate with others to implement infrastructure that delivers unique capabilities for our commercial and government customers, including the Department of Defense.

The Observability team is looking for a strong SRE with deep DevSecOps and Kubernetes experience: someone who has deployed and maintained monitoring infrastructure at scale, with an eye for security in highly regulated environments. Experience with DoD software deployments, Platform One, and single-tenant architectures is highly valued.

We are a fast-growing entrepreneurial team working at the convergence of technology and national security. If this type of effort interests you, come join us!

Note: This position requires U.S. citizenship due to government contract requirements.

What You’ll Do


  • Deploy and maintain observability stack (Grafana, Mimir, Prometheus) across multiple customer clusters and DoD networks
  • Build Helm chart abstractions and automation to streamline monitoring deployments for new customers
  • Troubleshoot and debug complex Kubernetes issues, networking problems, and monitoring stack failures
  • Configure and maintain BigBang charts and DoD Platform One integrations
  • Design and implement infrastructure automation using tools like Pulumi, ArgoCD, and Flux
  • Work with Istio service mesh and Keycloak for authentication in secure environments
  • Monitor and optimize performance of monitoring infrastructure across multiple environments
  • Collaborate with security teams to ensure compliance with NIST requirements and DoD standards
  • Participate in on-call rotation and incident response for production environments

Skills You’ll Bring to Our Team


  • 5+ years of Site Reliability Engineering or DevOps experience
  • Deep experience with Kubernetes administration, troubleshooting, and scaling
  • Hands-on experience deploying and maintaining observability tools (Prometheus, Grafana, Mimir/Cortex)
  • Strong understanding of Helm charts, GitOps practices, and CNCF tooling
  • Experience with service mesh technologies (Istio preferred)
  • Proven ability to debug complex distributed systems and networking issues
  • Understanding of authentication systems and security in regulated environments
  • Ability to work independently and collaborate with team members in a remote environment

Preferred Qualifications


  • Active security clearance or ability to obtain a Secret-level security clearance
  • Previous experience with DoD software deployments and Platform One
  • Experience with BigBang charts and Iron Bank containers
  • Experience working in national security or highly regulated environments
  • Familiarity with compliance frameworks (NIST, FedRAMP, etc.)
  • Experience with infrastructure as code (Pulumi, Terraform)

Technologies We Use


  • Observability: Grafana stack, Prometheus, custom alerting tools
  • Kubernetes: Helm, ArgoCD, Flux, Tekton, BigBang charts
  • Security: Istio, Keycloak, Kyverno
  • Infrastructure: AWS/GCP/Azure, Pulumi, Git/GitLab
  • Languages: YAML, Bash, Go

$160,000 - $180,000 a year

Perks & Benefits

This role is full time. As a public benefit corporation, we’re a team of purpose-driven trailblazers transforming the future of U.S. national security. We hire the best to do their best and, as such, we are committed to providing the perks and benefits you need to be successful, both inside and outside the workplace.

We offer you:

  • Competitive salary
  • 100% healthcare, vision, and dental coverage
  • 401(k) + 3% company contribution
  • Wellness perks (fitness classes, mental health resources)
  • Equity incentive plan
  • Tech + office supplies stipend
  • Annual professional development stipend
  • Flexible paid time off + federal holidays off
  • Parental leave
  • Work from anywhere
  • Referral bonus

Visit our careers page to learn more.


Life at Second Front Systems

At Second Front Systems, we build software that accelerates delivery of emerging commercial technologies to U.S. warfighters. By harnessing insights and methodologies from the private sector and aligning them with government priorities and processes, we enable defense and national security professionals to effectively engage in long-term, continuous competition for access to emerging technologies. Our Atlas Fulcrum software platform equips operators for acquisition warfare by capturing, integrating, and presenting data about solution providers of interest for market research, tech scouting, and evaluation. This software-as-a-service tool allows program managers, acquisition professionals, and national security innovators to compete for the best technology and speed its transition to the warfighter. Second Front is a public benefit corporation and veteran-owned defense company headquartered in Arlington, Virginia, with a bi-coastal and international presence.
