
Member of Technical Staff - Model Serving / API Backend Engineer


Black Forest Labs is a cutting-edge startup pioneering generative image and video models. Our team invented Stable Diffusion, Stable Video Diffusion, and FLUX.1, and we are looking for a strong candidate to join us in developing and improving our API / model serving backend and services.

Role:

  • Develop and maintain robust APIs for serving machine learning models (see the serving sketch after this list)
  • Transform research models into production-ready demos and MVPs
  • Optimize model inference for improved performance and scalability
  • Implement and manage user preference data acquisition systems
  • Ensure high availability and reliability of model serving infrastructure
  • Collaborate with ML researchers to rapidly prototype and deploy new models
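
For a concrete flavor of the first bullet, here is a minimal sketch of a model-serving endpoint built with FastAPI. The load_model helper, the fake PIL-based model, and the /v1/generate route are assumptions made up for this example; they are not the actual Black Forest Labs serving stack.

    # Minimal model-serving sketch (FastAPI). `load_model` and the route shape
    # are illustrative assumptions, not the real serving stack.
    import base64
    import io

    from fastapi import FastAPI
    from PIL import Image
    from pydantic import BaseModel

    app = FastAPI()
    model = None


    def load_model():
        # Placeholder standing in for loading real diffusion weights onto a GPU.
        def fake_model(prompt: str, width: int, height: int) -> Image.Image:
            return Image.new("RGB", (width, height), color="black")
        return fake_model


    class GenerateRequest(BaseModel):
        prompt: str
        width: int = 1024
        height: int = 1024


    @app.on_event("startup")
    def startup() -> None:
        global model
        model = load_model()  # load once per worker process, not per request


    @app.post("/v1/generate")
    def generate(req: GenerateRequest):
        image = model(req.prompt, width=req.width, height=req.height)
        buf = io.BytesIO()
        image.save(buf, format="PNG")
        return {"image": base64.b64encode(buf.getvalue()).decode("ascii")}

Served with an ASGI server such as uvicorn, this exposes a single JSON endpoint returning a base64-encoded PNG; a production version would add authentication, request limits, and GPU-aware concurrency control.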

Ideal Experience:

  • Strong proficiency in Python and its ecosystem for machine learning, data analysis, and web development
  • Extensive experience with RESTful API development and deployment for ML tasks
  • Familiarity with containerization and orchestration technologies (e.g., Docker, Kubernetes)
  • Knowledge of cloud platforms (AWS, GCP, or Azure) for deploying and scaling ML services
  • Proven track record in rapid ML model prototyping using tools like Streamlit or Gradio (see the prototyping sketch after this list)
  • Experience with distributed task queues and scalable model serving architectures
  • Understanding of monitoring, logging, and observability best practices for ML systems
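
As a sketch of the rapid-prototyping item above, the snippet below wires a placeholder generation function into a Gradio interface. generate_image is a hypothetical stand-in for a real research checkpoint; only the Gradio wiring is the point here.

    # Throwaway Gradio demo sketch for qualitative checks on a research model.
    import gradio as gr
    from PIL import Image


    def generate_image(prompt: str) -> Image.Image:
        # Placeholder standing in for a real diffusion pipeline call.
        return Image.new("RGB", (512, 512), color="gray")


    demo = gr.Interface(
        fn=generate_image,
        inputs=gr.Textbox(label="Prompt"),
        outputs=gr.Image(label="Generated image"),
        title="Research model demo",
    )

    if __name__ == "__main__":
        demo.launch()  # serves a local web UI for quick demos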

Nice to have:

  • Experience with frontend development frameworks (e.g., Vue.js, Angular, React)
  • Familiarity with MLOps practices and tools
  • Knowledge of database systems and data streaming technologies
  • Experience with A/B testing and feature flagging in production environments
  • Understanding of security best practices for API development and ML model serving
  • Experience with real-time inference systems and low-latency optimizations
  • Knowledge of CI/CD pipelines and automated testing for ML systems
  • Expertise in ML inference optimizations, including techniques such as:
      ◦ Reducing initialization time and memory requirements
      ◦ Implementing dynamic batching (see the batching sketch after this list)
      ◦ Utilizing reduced precision and weight quantization (see the precision sketch after this list)
      ◦ Applying TensorRT optimizations
      ◦ Performing layer fusion and model compilation
      ◦ Writing custom CUDA code for performance enhancements
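
The dynamic batching technique listed above could look roughly like this sketch: requests are queued, and a single worker coalesces whatever arrives within a short window into one batched model call. MAX_BATCH, MAX_WAIT_MS, and the stand-in batched call are illustrative values, not a recommended production configuration.

    # Sketch of server-side dynamic batching with asyncio.
    import asyncio

    MAX_BATCH = 8      # maximum requests folded into one forward pass
    MAX_WAIT_MS = 10   # how long the worker waits for stragglers


    async def infer(queue: asyncio.Queue, prompt: str) -> str:
        # Each request enqueues its input plus a future and awaits the batched result.
        fut = asyncio.get_running_loop().create_future()
        await queue.put((prompt, fut))
        return await fut


    async def batch_worker(queue: asyncio.Queue) -> None:
        loop = asyncio.get_running_loop()
        while True:
            # Block for the first request, then opportunistically gather more.
            batch = [await queue.get()]
            deadline = loop.time() + MAX_WAIT_MS / 1000
            while len(batch) < MAX_BATCH:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            prompts = [p for p, _ in batch]
            # Stand-in for one batched model call over all prompts at once.
            outputs = [f"image-for:{p}" for p in prompts]
            for (_, fut), out in zip(batch, outputs):
                fut.set_result(out)


    async def main() -> None:
        queue: asyncio.Queue = asyncio.Queue()
        worker = asyncio.create_task(batch_worker(queue))
        results = await asyncio.gather(*(infer(queue, f"prompt {i}") for i in range(20)))
        print(len(results), "results, e.g.", results[0])
        worker.cancel()


    if __name__ == "__main__":
        asyncio.run(main())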
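
Reduced precision, also listed above, can be sketched with PyTorch half precision on a toy module; the pattern of casting weights and running under autocast and inference mode carries over to larger diffusion backbones, while full weight quantization would typically rely on dedicated libraries.

    # Sketch of reduced-precision inference with PyTorch on a toy MLP.
    import torch
    import torch.nn as nn

    device = "cuda" if torch.cuda.is_available() else "cpu"

    model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))
    model = model.to(device).eval()

    if device == "cuda":
        model = model.half()  # fp16 weights roughly halve memory vs fp32

    x = torch.randn(8, 1024, device=device)

    with torch.inference_mode():
        if device == "cuda":
            with torch.autocast(device_type="cuda", dtype=torch.float16):
                y = model(x.half())
        else:
            y = model(x)  # CPU fallback stays in fp32

    print(y.shape, y.dtype)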
