Site reliability engineer at Writer

📐 About this role

We are looking for a foundational member of the cloud infrastructure team at WRITER. In this role, you will help develop and implement our Site Reliability Engineering (SRE) program. The ideal candidate will ensure the reliability, scalability, performance, and security of WRITER’s critical systems, taking a proactive approach so that our high-ROI products reach our customers seamlessly.

🦸🏻‍♀️ Your responsibilities:

Lead the design, implementation, and maintenance of WRITER, Inc.’s cloud infrastructure to ensure high availability and performance
Design and implement scalable cloud automation to support seamless deployment for our largest enterprise customers
Automate infrastructure provisioning and management using Terraform & Python
Collaborate with development teams to optimize cloud resources and enhance system reliability
Develop and maintain monitoring and alerting systems to proactively identify and resolve issues affecting the reliability of our writing solutions
Conduct post-mortem analyses of system failures to identify root causes and implement preventive measures
Optimize and scale our cloud infrastructure to support growing user demand and ensure cost efficiency
Ensure the security and compliance of our systems, adhering to industry standards and regulations
Provide mentorship and technical guidance to junior engineers, fostering a culture of reliability and continuous improvement
Stay current with emerging technologies and industry trends to continuously improve our site reliability practices

⭐ Is this you?

Proven expertise in Site Reliability Engineering with a minimum of 7 years of hands-on experience
Deep understanding of system architecture and infrastructure design to ensure high availability and performance
Bachelor’s degree in Computer Science, Engineering, or a related technical field
Strong proficiency in programming languages such as Python, Java, or Go for automation and monitoring
Experience with cloud platforms like AWS, Azure, or GCP, and their respective services for scalable and resilient systems
Expertise in containerization technologies (e.g., Docker, Kubernetes) and orchestration tools
Knowledge of monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack) to maintain system health and performance
Ability to lead and mentor junior engineers in best practices for reliability and system optimization
Excellent communication skills to collaborate effectively with cross-functional teams and stakeholders
Proactive approach to identifying and mitigating potential system failures and performance bottlenecks

✨ Preferred skills & experience:

Software engineering expertise
Terraform
Python
Kubernetes
Scala
AWS/GCP

🍩 Benefits & perks (UK full-time employees):

Generous PTO, plus company holidays
Comprehensive medical and dental insurance
Paid parental leave for all parents (12 weeks)
Fertility and family planning support
Early-detection cancer testing through Galleri
Competitive pension scheme and company contribution
Annual work-life stipends for:
Home office setup, cell phone, internet
Wellness stipend for gym, massage/chiropractor, personal training, etc.
Learning and development stipend
Company-wide off-sites and team off-sites
Competitive compensation and company stock options

Life at Writer

Writer is the AI writing assistant for smart content teams. We help content leaders scale their messaging, communication style, and must-have language, no matter who's writing. Top companies like Twitter, Intuit, and UiPath have standardized their content creation on Writer.

Thrive Here & What We Value

Generous PTO and company holidays

Medical, dental, and vision coverage for employees and their families

Paid parental leave for all parents (12 weeks)

Fertility and family planning support

Early-detection cancer testing through Galleri

Annual work-life stipends for home office setup, cell phone, internet

Health savings account with company contribution

Wellness stipend for gym, massage/chiropractor, personal training, etc.

Learning and development stipend

Company-wide off-sites and team off-sites
