
Member of Technical Staff - Model Serving / API Backend Engineer


Black Forest Labs is a cutting-edge startup pioneering generative image and video models. Our team invented Stable Diffusion, Stable Video Diffusion, and FLUX.1, and we are looking for a strong candidate to join us in developing and improving our API / model serving backend and services.

Role:

  • Develop and maintain robust APIs for serving machine learning models (see the serving sketch after this list)
  • Transform research models into production-ready demos and MVPs
  • Optimize model inference for improved performance and scalability
  • Implement and manage user preference data acquisition systems
  • Ensure high availability and reliability of model serving infrastructure
  • Collaborate with ML researchers to rapidly prototype and deploy new models
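
For a concrete flavor of the first bullet, here is a minimal sketch of a model-serving endpoint built with FastAPI. The load_model helper, the fake PIL-based model, and the /v1/generate route are assumptions made up for this example; they are not the actual Black Forest Labs serving stack.

    # Minimal model-serving sketch (FastAPI). `load_model` and the route shape
    # are illustrative assumptions, not the real serving stack.
    import base64
    import io

    from fastapi import FastAPI
    from PIL import Image
    from pydantic import BaseModel

    app = FastAPI()
    model = None


    def load_model():
        # Placeholder standing in for loading real diffusion weights onto a GPU.
        def fake_model(prompt: str, width: int, height: int) -> Image.Image:
            return Image.new("RGB", (width, height), color="black")
        return fake_model


    class GenerateRequest(BaseModel):
        prompt: str
        width: int = 1024
        height: int = 1024


    @app.on_event("startup")
    def startup() -> None:
        global model
        model = load_model()  # load once per worker process, not per request


    @app.post("/v1/generate")
    def generate(req: GenerateRequest):
        image = model(req.prompt, width=req.width, height=req.height)
        buf = io.BytesIO()
        image.save(buf, format="PNG")
        return {"image": base64.b64encode(buf.getvalue()).decode("ascii")}

Served with an ASGI server such as uvicorn, this exposes a single JSON endpoint returning a base64-encoded PNG; a production version would add authentication, request limits, and GPU-aware concurrency control.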

Ideal Experience:

  • Strong proficiency in Python and its ecosystem for machine learning, data analysis, and web development
  • Extensive experience with RESTful API development and deployment for ML tasks
  • Familiarity with containerization and orchestration technologies (e.g., Docker, Kubernetes)
  • Knowledge of cloud platforms (AWS, GCP, or Azure) for deploying and scaling ML services
  • Proven track record in rapid ML model prototyping using tools like Streamlit or Gradio (see the prototyping sketch after this list)
  • Experience with distributed task queues and scalable model serving architectures
  • Understanding of monitoring, logging, and observability best practices for ML systems
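
As a sketch of the rapid-prototyping item above, the snippet below wires a placeholder generation function into a Gradio interface. generate_image is a hypothetical stand-in for a real research checkpoint; only the Gradio wiring is the point here.

    # Throwaway Gradio demo sketch for qualitative checks on a research model.
    import gradio as gr
    from PIL import Image


    def generate_image(prompt: str) -> Image.Image:
        # Placeholder standing in for a real diffusion pipeline call.
        return Image.new("RGB", (512, 512), color="gray")


    demo = gr.Interface(
        fn=generate_image,
        inputs=gr.Textbox(label="Prompt"),
        outputs=gr.Image(label="Generated image"),
        title="Research model demo",
    )

    if __name__ == "__main__":
        demo.launch()  # serves a local web UI for quick demos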

Nice to have:

  • Experience with frontend development frameworks (e.g., Vue.js, Angular, React)
  • Familiarity with MLOps practices and tools
  • Knowledge of database systems and data streaming technologies
  • Experience with A/B testing and feature flagging in production environments
  • Understanding of security best practices for API development and ML model serving
  • Experience with real-time inference systems and low-latency optimizations
  • Knowledge of CI/CD pipelines and automated testing for ML systems
  • Expertise in ML inference optimizations, including techniques such as:
      ◦ Reducing initialization time and memory requirements
      ◦ Implementing dynamic batching (see the batching sketch after this list)
      ◦ Utilizing reduced precision and weight quantization (see the precision sketch after this list)
      ◦ Applying TensorRT optimizations
      ◦ Performing layer fusion and model compilation
      ◦ Writing custom CUDA code for performance enhancements
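
The dynamic batching technique listed above could look roughly like this sketch: requests are queued, and a single worker coalesces whatever arrives within a short window into one batched model call. MAX_BATCH, MAX_WAIT_MS, and the stand-in batched call are illustrative values, not a recommended production configuration.

    # Sketch of server-side dynamic batching with asyncio.
    import asyncio

    MAX_BATCH = 8      # maximum requests folded into one forward pass
    MAX_WAIT_MS = 10   # how long the worker waits for stragglers


    async def infer(queue: asyncio.Queue, prompt: str) -> str:
        # Each request enqueues its input plus a future and awaits the batched result.
        fut = asyncio.get_running_loop().create_future()
        await queue.put((prompt, fut))
        return await fut


    async def batch_worker(queue: asyncio.Queue) -> None:
        loop = asyncio.get_running_loop()
        while True:
            # Block for the first request, then opportunistically gather more.
            batch = [await queue.get()]
            deadline = loop.time() + MAX_WAIT_MS / 1000
            while len(batch) < MAX_BATCH:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            prompts = [p for p, _ in batch]
            # Stand-in for one batched model call over all prompts at once.
            outputs = [f"image-for:{p}" for p in prompts]
            for (_, fut), out in zip(batch, outputs):
                fut.set_result(out)


    async def main() -> None:
        queue: asyncio.Queue = asyncio.Queue()
        worker = asyncio.create_task(batch_worker(queue))
        results = await asyncio.gather(*(infer(queue, f"prompt {i}") for i in range(20)))
        print(len(results), "results, e.g.", results[0])
        worker.cancel()


    if __name__ == "__main__":
        asyncio.run(main())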
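
Reduced precision, also listed above, can be sketched with PyTorch half precision on a toy module; the pattern of casting weights and running under autocast and inference mode carries over to larger diffusion backbones, while full weight quantization would typically rely on dedicated libraries.

    # Sketch of reduced-precision inference with PyTorch on a toy MLP.
    import torch
    import torch.nn as nn

    device = "cuda" if torch.cuda.is_available() else "cpu"

    model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))
    model = model.to(device).eval()

    if device == "cuda":
        model = model.half()  # fp16 weights roughly halve memory vs fp32

    x = torch.randn(8, 1024, device=device)

    with torch.inference_mode():
        if device == "cuda":
            with torch.autocast(device_type="cuda", dtype=torch.float16):
                y = model(x.half())
        else:
            y = model(x)  # CPU fallback stays in fp32

    print(y.shape, y.dtype)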
