Product Manager, Model Behavior

Location: San Francisco, California, United States

Type: Onsite

About Cartesia

Our mission is to build the next generation of AI: ubiquitous, interactive intelligence that runs wherever you are. Today, not even the best models can continuously process and reason over a year-long stream of audio, video and text (1B text tokens, 10B audio tokens and 1T video tokens), let alone do this on-device. We're pioneering the model architectures that will make this possible. Our founding team met as PhDs at the Stanford AI Lab, where we invented State Space Models (SSMs), a new primitive for training efficient, large-scale foundation models.

Our team combines deep expertise in model innovation and systems engineering with a design-minded product engineering team to build and ship cutting-edge models and experiences. We're funded by leading investors at Index Ventures and Lightspeed Venture Partners, along with Factory, Conviction, A Star, General Catalyst, SV Angel, Databricks and others. We're fortunate to have the support of many amazing advisors and 90+ angels across many industries, including the world's foremost experts in AI.

About the Role

We're seeking an exceptional Product Manager to drive model quality and behavior excellence for our text-to-speech and speech-to-text products at Cartesia. As our Model Behavior PM, you'll be the bridge between our customers' needs and our model development teams, defining what world-class TTS and STT models should sound like, perform like, and feel like. This role combines deep analytical rigor with customer empathy to continuously elevate our model quality and establish Cartesia as the gold standard in voice AI.

Your Impact

Define and evolve comprehensive evaluation frameworks for TTS and STT model behavior, establishing clear metrics for naturalness, accuracy, prosody, emotion, latency, and user satisfaction across diverse use cases
Conduct systematic competitive analysis by deeply using our products alongside competitors' offerings, identifying quality gaps, behavioral differences, and opportunities for differentiation
Partner closely with data teams to design data collection strategies, labeling guidelines, and dataset curation approaches that directly improve model behavior and performance
Collaborate with evaluation teams to build rigorous testing methodologies, automated evaluation pipelines, and human evaluation protocols that catch edge cases and quality regressions
Engage directly with customers across industries to understand their voice AI requirements, gather qualitative feedback on model behavior, and translate insights into actionable product improvements
Drive cross-functional alignment between research, engineering, data, and GTM teams to prioritize and execute on model behavior improvements that deliver maximum customer impact
Build a deep intuition for what makes TTS and STT models truly great—from subtle pronunciation nuances to handling of edge cases—and champion quality standards across the organization
Create frameworks, documentation, and best practices that help internal teams and customers understand model capabilities, limitations, and optimal usage patterns

What You Bring

6+ years of product management experience with technical products, preferably in AI/ML, audio, or speech technologies
Strong analytical mindset with experience designing evaluation frameworks, defining success metrics, and making data-driven quality decisions
Deep customer empathy with proven ability to conduct user research, synthesize qualitative feedback, and translate needs into product requirements
Technical fluency to work effectively with ML researchers, data scientists, and engineers—understanding model behavior at a detailed level
Exceptional attention to detail and quality standards, with the ability to notice subtle differences in model outputs and articulate what makes one better than another
Experience working cross-functionally with data teams, engineering teams, and evaluation/testing teams
Strong communication skills to advocate for quality and influence technical teams toward customer-centric decisions

Nice to Have

Direct experience with speech technologies (TTS, STT, voice cloning, or conversational AI)
Background in linguistics, audio engineering or speech sciences
Experience with ML model evaluation, A/B testing methodologies, or human evaluation design
Familiarity with audio quality metrics (MOS, WER, CER, prosody analysis)
Prior experience at a company known for exceptional product quality and attention to detail

What We Offer

🍽 Lunch, dinner and snacks at the office
🏥 Fully covered medical, dental, and vision insurance for employees
🏦 401(k)
✈️ Relocation and immigration support
🦖 Your own personal Yoshi

Our Culture

🏢 We’re an in-person team based out of San Francisco. We love being in the office, hanging out together and learning from each other every day.
🚢 We ship fast. All of our work is novel and cutting edge, and execution speed is paramount. We have a high bar, and we don’t sacrifice quality and design along the way.
🤝 We support each other. We have an open and inclusive culture that’s focused on giving everyone the resources they need to succeed.

Compensation Range: $200K - $270K