About Beatdapp
Beatdapp is a venture-backed startup delivering the most advanced streaming integrity and recommendation technology in the world. While our roots are in fighting the multi-billion dollar problem of streaming fraud, we have leveraged our "Trust & Safety Operating System" to power a new generation of discovery.
We believe that true personalization starts with verified behavior. By filtering out noise and manipulated signals before they impact the model, we build recommendation engines on a foundation of clean, authentic data.
We are looking for builders who want to work with the world’s best streaming services and music labels to reshape how content is discovered.
The Role
We are seeking an ML/DevOps Engineer who is passionate about building the high-availability systems that allow our machine learning models to thrive in production. In this role, you will work at the intersection of cloud infrastructure and machine learning operations. You will be one of the architects of a system where trillions of data points turn into real-time recommendations.
You will take full ownership of multi-cluster Kubernetes environments, ensuring our API workloads scale seamlessly and our deployment pipelines are automated to perfection.
You will bridge the gap between complex ML research and stable, production-grade delivery, ensuring that as our models grow in complexity, our infrastructure remains fast, secure, and resilient.
Responsibilities
- Cloud Infrastructure & Orchestration: Manage and optimize multi-cluster Kubernetes (K8s) environments. You will implement sophisticated autoscaling policies and node management strategies to support high-availability ML workloads.
- Production Deployment Excellence: Design and orchestrate live service deployments using strategies such as A/B testing and canary releases. You will ensure the system supports seamless rollbacks and API versioning.
- Infrastructure as Code (IaC): Design and maintain our infrastructure using IaC principles to ensure environment consistency and rapid disaster recovery.
- End-to-End Observability: Take ownership of the logging, tracing, and metrics components. You will define error budgets and maintain the health monitoring systems that keep our RecSys engine running 24/7.
- Security & Compliance: Partner with security teams to enforce patch management, secrets handling (IAM/Secret Manager), and data encryption protocols to protect sensitive streaming data.
- Systems Ownership: Automate routine operational tasks and environment provisioning. You will be a primary stakeholder for system uptime, managing outages with a critical-thinking mindset and clear communication.
Successful Candidates Will Have:
- 3+ years of professional experience in DevOps, SRE, or Infrastructure Engineering, preferably supporting data-intensive or ML applications.
- Kubernetes Experience: Deep familiarity with K8s, including experience with compute instances, network configuration (VPCs/Subnets), and scaling API workloads.
- Strong Engineering Chops: Proficiency in writing clean, scalable, and object-oriented code. You are comfortable writing production-grade code that can handle large-scale data in virtualized environments.
- CI/CD Expertise: Proven track record of building automated pipelines, managing image registries (Docker/Podman), and handling complex code versioning.
- Architectural Fluency: A strong understanding of datastores (Relational vs. Non-relational), caching strategies, and data transfer protocols (HTTPS/APIs).
- Security-First Mindset: Experience working with sensitive data, encryption, and secure cloud networking.
Bonus Points
- GCP Experience: Familiarity with Google Cloud Platform (GCP) services and Terraform.
- Service Mesh Experience: Hands-on work with Istio or Linkerd for Kubernetes.
- MLOps Interest: Familiarity with Python and experience deploying dedicated RecSys infrastructure or vector databases.
- Automation Maturity: Experience with GitHub Actions (GHA) and building highly automated, self-healing deployment workflows.
- Documentation Rigor: A strong feel for clear architecture diagrams, code comments, and technical design documents.