Site Reliability Engineer

Unstructured Technologies | San Francisco, California, United States | Hybrid, Onsite

This job is no longer open

Ready to shape the future of AI infrastructure and build systems that power the most advanced unstructured data pipelines in the world?

At Unstructured, we’re building the backbone of generative AI, enabling companies to transform PDFs, HTML, Word docs, images, and more into high-performance data pipelines that scale. Our tools are already used by half of the Fortune 500, and our open-source package has been downloaded 26+ million times. Now we’re entering our next chapter, and we’re hiring a Site Reliability Engineer to help scale our systems and safeguard our infrastructure.

If you’re energized by reliability, love solving infrastructure challenges at scale, and want to help define how modern AI systems run in production, this is your moment.

You’ll work closely with Engineering, Product, and Customer teams to build scalable systems, streamline CI/CD, and make reliability a first-class citizen across everything we deploy.

🏢 This role is hybrid in San Francisco: join us in-office 3x a week for deep collaboration, whiteboard sessions, and hands-on impact.

🔧 What You’ll Own & Drive

🛠 Scale & Stability at the Core

- Design and implement highly available, observable, and scalable infrastructure across cloud environments
- Build resilient systems that meet the demands of enterprise-grade, production AI workloads

⚙️ Automate Everything

- Develop Infrastructure-as-Code using Terraform, Pulumi, and others
- Own CI/CD automation and build reusable pipelines with GitHub Actions and modern DevOps tooling

🚀 Own Kubernetes & Orchestration

- Manage and optimize our Kubernetes clusters and containerized environments
- Tune Helm charts, service mesh configs, and orchestration systems for performance and security

📊 Obsess Over Observability

- Implement and maintain monitoring, logging, and alerting with tools like Prometheus, Grafana, Datadog, and Elastic
- Ensure we can see, understand, and respond to system behavior in real-time

🧪 Drive Production Readiness

- Partner with engineering to prepare features and systems for production rollouts
- Contribute to capacity planning, deployment strategies, and fault-tolerant system design

🔥 Lead Incident Response

- Support and lead incident response processes, postmortems, and root cause analysis
- Champion a culture of blameless retrospectives and continuous improvement

💻 Accelerate Engineering Velocity

- Improve developer experience through tooling, automation, and streamlined feedback loops
- Help teams move faster without sacrificing quality or uptime

🧬 What You Bring

- 4+ years in SRE, DevOps, or Infrastructure Engineering roles supporting high-scale production environments
- Deep experience with cloud platforms like AWS, GCP, or Azure
- Expertise in Kubernetes, Docker, and container orchestration at scale
- Strong Linux systems and networking fundamentals
- Scripting and automation skills (Python, Bash, or Go preferred)
- Proficiency with Infrastructure-as-Code (Terraform, Pulumi, Ansible, or similar)
- Solid understanding of monitoring and observability best practices
- A calm, systems-thinking approach to incident response and reliability

💎 Bonus Points

- Experience supporting ML infrastructure or real-time data pipelines
- Exposure to serverless or event-driven architectures
- Contributions to open-source DevOps projects or communities
- Familiarity with security and compliance in cloud-native environments

🌟 Why You’ll Love It Here

- Impact That Matters: Own the core infrastructure behind AI systems used by the Fortune 500
- Big Technical Challenges: Solve hard, meaningful problems at the cutting edge of cloud and data
- Elite Team: Join a sharp, humble group of engineers who value execution and impact
- SF Office Vibes: Collaborate live with real whiteboards and real humans (not just Slack threads)
- Flexible Culture: Hybrid structure with async-friendly, low-ego collaboration

$190,000 - $250,000 a year

This role's salary is benchmarked against San Francisco market rates to remain competitive with top-tier talent in high-cost-of-living regions. Final compensation may vary based on experience, skill set, and location.


Life at Unstructured Technologies

Transforming Natural Language Data From Raw to Machine Learning-Ready: Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

Thrive Here & What We Value

1. Dynamic Sales Team
2. Flexible Working Hours and Remote Working Opportunities
3. Professional Growth and Development in an Innovative Tech Company
4. Emphasis on Innovation and Creativity
5. Collaborative Work Environment
6. Impactful Role in Shaping the Company's Direction and Driving Innovation in Unstructured Data Processing
7. Remote First Company
8. Competitive Salary and Performance-Based Incentives
9. Comprehensive Benefits Package, Including Health, Dental, and Vision Insurance
10. Vibrant and Inclusive Work Environment Where Your Ideas Matter

