
Research Scientist / Engineer – Multimodal Capabilities

Luma AI · United States · Remote

About the Role


The Multimodal Capabilities team at Luma unlocks advanced capabilities in our foundation models through research into multimodal understanding and generation. The team tackles fundamental questions about how different modalities can be combined to enable new behaviors, working on the open-ended challenge of what makes multimodal AI systems truly powerful and versatile.

Responsibilities


  • Collaborate with the Foundation Models team to identify capability gaps and research solutions
  • Design datasets, experiments, and methodologies to systematically improve model capabilities across vision, audio, and language
  • Develop evaluation frameworks and benchmarking approaches for multimodal AI capabilities
  • Create prototypes and demonstrations that showcase new multimodal capabilities

Experience


  • Strong programming skills in Python and PyTorch
  • Experience with multimodal data processing pipelines and large-scale dataset curation
  • Understanding of computer vision, audio processing, and/or natural language processing techniques
  • (Preferred) Expertise working with interleaved multimodal data
  • (Preferred) Hands-on experience with Vision Language Models, Audio Language Models, or generative video models

Compensation Range: $200K - $300K
