Principal Bioinformatics Data Scientist

Location: United States

Type: Remote, Onsite

Baylor Genetics is seeking an experienced and visionary Principal Bioinformatics Data Scientist to join our Bioinformatics R&D and Data Science team. This individual will play a pivotal role in advancing our genomic analysis capabilities through the development of innovative computational methods, algorithms, and ML/AI-driven models to support our mission of delivering accurate, fast, and clinically actionable genomic insights.

The Principal Bioinformatics Data Scientist at Baylor Genetics will serve as a scientific and technical leader driving innovation at the intersection of bioinformatics, data science, machine learning, and clinical genomics. This individual will develop and implement advanced computational, statistical, machine learning, and AI solutions to enhance genomic analysis pipelines, accelerate diagnostic workflows, and enable data-driven insights from large-scale genomic and clinical datasets.

This role requires a seasoned scientist-engineer who thrives on solving complex biological problems through advanced data science and who is comfortable bridging research innovation and production-grade deployment in a regulated clinical environment.

The successful candidate will combine deep expertise in bioinformatics, genomics, computer science, and machine learning to lead high-impact initiatives spanning secondary and tertiary analysis, data integration, and model-based interpretation of clinical genomics data.

QUALIFICATIONS:

Education:

Master’s degree or higher (PhD preferred) in Bioinformatics, Computer Science, Data Science, Computational Biology, or a related field.

Experience:
6+ years of professional experience in genomic data science, bioinformatics, computational genomics, or a similar field, including at least 3 years in a senior or lead role.
Proven track record of statistical, machine learning, and AI model development using genomic and clinical data.
Strong experience with secondary and tertiary genomic analysis (alignment, variant calling, annotation, and interpretation).
Experience with data lakehouse platforms (Databricks, Snowflake) and precision health platforms (DNAnexus, Velsera).
Experience with big data, ETL, data visualization, workflow orchestration/logging, and databases (including SQL, NoSQL, and graph-based).
Demonstrated experience working in a clinical or diagnostic genetics environment is highly desirable.
Proficiency in Python, R, C/C++, Java, or similar programming languages.
Expertise in machine learning frameworks (e.g., TensorFlow, PyTorch, Scikit-learn, XGBoost).
Advanced understanding of statistical modeling, including Bayesian inference, GLMs, mixed models, and resampling methods.
Experience applying deep learning architectures (transformers, CNNs, GNNs) to genomic and biomedical data.
Deep understanding of statistical analysis, data modeling, and computational methods used in genomics.
Experience with NGS data formats and genome databases.
Familiarity with cloud computing environments (Azure, AWS, GCP) and distributed computing frameworks (e.g., Spark, Dask).
Deep knowledge of statistical modeling, dimensionality reduction, and data visualization.
Familiarity with CI/CD, containerization (Docker/Kubernetes), and version control (Git).

Core Competencies:
Exceptional analytical, problem-solving, and critical thinking skills.
Ability to translate complex data-driven analyses into actionable biological and clinical insights.
Excellent written and verbal communication skills, with the ability to communicate effectively across disciplines.
Deep understanding of both computational methods and biological context.
Demonstrated leadership in cross-functional team environments.
Passion for innovation in precision medicine and clinical genomics.

DUTIES AND RESPONSIBILITIES:

Algorithms and Model Development:

Design, develop, and optimize computational algorithms, statistical models, and machine learning/AI approaches for genomic data analysis.
Develop algorithms for variant prioritization, pathogenicity prediction, phenotype-genotype association, and diagnostic decision support.
Lead efforts to apply deep learning and predictive modeling to variant interpretation, phenotype correlation, and diagnostic decision support.
Develop and optimize algorithms for secondary and tertiary genomic analysis, variant calling, variant annotation, ACMG classification, and data-driven interpretation of genomic findings.
Apply deep learning architectures (CNNs, RNNs, GNNs, transformers) and probabilistic modeling approaches to improve variant calling, variant interpretation, and disease prediction.
Design and optimize Bayesian, regression, and ensemble models to quantify uncertainty, improve confidence scoring, and support clinical decision-making.
Develop feature engineering and dimensionality reduction strategies for multi-modal data integration (genomic, transcriptomic, phenotypic, and clinical).

Genomic Analysis:
Serve as a subject matter expert (SME) in data science applications for bioinformatics pipeline development, including secondary and tertiary analysis, variant interpretation, and clinical reporting automation.
Drive enhancements to existing bioinformatics pipelines for improved accuracy, performance, and interpretability.
Integrate and analyze multi-omics datasets (genomic, transcriptomic, phenotypic, and clinical) to extract meaningful biological and clinical insights.

System Innovation and Optimization:
Evaluate and enhance existing bioinformatics and data science pipelines for improved accuracy, speed, and scalability.
Collaborate with engineering teams to integrate new algorithms and frameworks into production-grade analysis pipelines.
Drive the development of a novel genomic data platform to support a cohesive data ecosystem.

Research and Development:
Drive innovation in genomics data science through research and development of novel analytical methodologies.
Stay current with emerging tools, frameworks, and technologies in ML, AI, bioinformatics, and genomics, and guide the team in adopting best practices.

Cross-Functional Collaboration:
Partner closely with laboratory scientists, clinical geneticists, software engineers, and data engineers to translate scientific insights into clinical applications.
Serve as a scientific and technical thought leader across cross-disciplinary projects.
Evaluate emerging technologies, frameworks, and methodologies to ensure Baylor Genetics remains at the forefront of computational genomics innovation.
Present findings and strategic recommendations to executive and scientific leadership.

Mentorship:
Provide technical mentorship and guidance to bioinformatics scientists, data scientists, and software engineers.
Contribute to strategic planning and technical direction for the Bioinformatics R&D and Data Science group.

WHY JOIN BAYLOR GENETICS:

At Baylor Genetics, you’ll be part of a world-class organization at the forefront of clinical genomics, transforming patient care through innovation, quality, and collaboration. You’ll have the opportunity to work alongside some of the brightest minds in genomics and data science, contributing directly to cutting-edge diagnostic solutions that impact patients and families worldwide.

PHYSICAL DEMANDS AND WORK ENVIRONMENT:

Frequently required to sit
Frequently required to stand
Frequently required to utilize hand and finger dexterity
Frequently required to talk or hear
Frequently required to utilize visual acuity to operate equipment, read technical information, and/or use a keyboard
Occasionally exposed to bloodborne and airborne pathogens or infectious materials

EEO Statement:

Baylor Genetics is proud to be an equal opportunity employer dedicated to building an inclusive and diverse workforce. We do not discriminate based on race, religion, color, national origin, sex, sexual orientation, age, gender identity, veteran status, disability, genetic information, pregnancy, childbirth, or related medical conditions, or any other status protected under applicable federal, state, or local law.