Senior Site Reliability Engineer (SRE) & Support Lead

Location: Chennai, India | Tamil Nadu, India
Type: Onsite
Banyan Software provides the best permanent home for successful enterprise software companies, their employees, and customers. We are on a mission to acquire, build, and grow great enterprise software businesses all over the world that have dominant positions in niche vertical markets. In recent years, Banyan was named the #1 fastest-growing private software company in the US on the Inc. 5000 and among the top 10 fastest-growing companies by the Deloitte Technology Fast 500. Founded in 2016 with a permanent capital base set up to preserve the legacy of founders, Banyan focuses on a buy-and-hold-for-life strategy for growing software companies that serve specialized vertical markets.

Senior Site Reliability Engineer (SRE) & Support Lead (Touchstream)


Location:

Chennai, India

Reports to:

Head of Integrations

Role Type:

Hands-on senior individual contributor with support leadership responsibilities

Company & Core Product Snapshot


Touchstream is the OTT Operations Hub: a cloud-native SaaS platform for independent, end-to-end monitoring of streaming video systems (CDNs, origin, delivery chain). We serve some of the world’s largest broadcasters, telco/OTT services, and streaming platforms, monitoring tens of thousands of live streams in real time.

Touchstream now unifies its best-selling CDN Monitoring and VirtualNOC into a single platform delivering:

  • Unified data & end-to-end visibility across the streaming workflow
  • Best-in-class incident intelligence and RCA tooling (including timestamped evidence packs)
  • Operating-model improvements via shared views, collaboration, AI MCP Servers and rich knowledge bases
  • Business value and ROI reporting for capacity optimization and performance insights

Role Summary


As Senior SRE & Support Lead, you will own production health for Touchstream’s customer-facing platform and data plane, while also leading the global technical support function as part of your SRE responsibilities. Your mission is twofold:

  • Reliability ownership: ensure high availability, performance, and change safety across the system (UI/API and ingest, process & query pipelines), with strong SLO discipline and continuous improvement.
  • Support leadership: run and evolve the support operation—triage, escalation, incident response coordination, tooling, and (over time) building a strong support team in Chennai to deliver world-class customer outcomes.

This is a high-impact role at the intersection of SRE, incident management, observability engineering, and customer-facing support.

    Responsibilities:


    1) Reliability Ownership (Primary)


    • Define and maintain SLOs, error budgets, and service health reporting.
    • Own availability and performance of:
        • Customer-facing system: UI/API
        • Data plane: ingest, process & query pipelines
    • Drive capacity planning for live-event spikes, load testing, and scaling strategies.
    • Prevent recurring issues through high-quality RCAs and rigorous follow-through.

    2) On-Call & Incident Management (Run the Room)


    • Build and evolve the on-call operating model: severity levels, paging rules, escalation paths, comms templates.
    • Lead high-severity incidents end-to-end: triage, mitigation, rollback, “stop the bleeding” decisions, stakeholder comms.
    • Track MTTA/MTTR and implement systemic improvements over time.

    3) Observability for the Observability Platform (“Meta-Observability”)


    • Own “who watches the watcher?”—monitoring and alerting for Touchstream’s monitoring pipeline itself.
    • Standardize telemetry conventions (logs/metrics/traces) across services.
    • Build and maintain dashboards for:
        • ingest health (per customer / per source)
        • pipeline lag
        • query performance
        • alerting health
    • Tune alerting to reduce noise: dedupe, routing, “symptom vs cause,” threshold hygiene.

    4) Release Engineering & Change Safety (Bulletproof Change Management)


    • Implement guardrails: feature flags, progressive delivery/canaries, automated rollback triggers.
    • Maintain release readiness practices: migration checks, backfills, customer impact assessment, capacity impacts.
    • Drive change metrics: deploy frequency, change failure rate, recovery time from deploys.

    5) Cost & Efficiency Ownership (Cloud Economics)


    • Monitor and optimize cost per GB ingested/stored/queried.
    • Enforce retention policies, tiering, sampling, and query limits without breaking customer value.
    • Make explicit capacity vs. cost tradeoffs—especially around large live events and heavy dashboards.

    6) Security & Resilience Basics (Small-Team Practicality)


    • Baseline controls: access reviews, secrets management, least privilege, dependency scanning.
    • Rate limiting / abuse guardrails, audit logging, security incident response readiness.
    • Backup/restore and lightweight-but-real disaster recovery drills.

    7) Support Leadership & Operations (Explicitly Part of the Role)


    • Serve as the senior escalation point for critical customer issues and high-impact outages.
    • Own the support operating model:
        • ticket triage, prioritization, SLAs, escalation paths, and shift handovers
        • runbooks, playbooks, FAQs, and knowledge base (including formats suitable for AI-assisted support / RAG)
    • Establish and monitor support KPIs (SLA compliance, backlog, customer satisfaction, MTTx) and implement process improvements.
    • Partner with Engineering/Product/Integrations to turn support learnings into reliability fixes and product improvements.
    • Over time: help build, mentor, and lead a team of support/NOC engineers in Chennai.

    8) Customer-Impact Focus (Tenant Health & Trust)


    • Maintain per-tenant “customer health views”: SLO compliance, noisy sources, top offenders, recurring incident patterns.
    • Collaborate with Product on operator workflows: service health panels, incident summaries, status updates.

    Required Qualifications & Skills


    Technical / SRE Foundation


    • 8+ years in SRE, production operations, technical support for SaaS, or NOC/ops roles with strong reliability ownership.
    • Strong Linux fundamentals; comfort with debugging distributed systems.
    • Strong understanding of cloud infrastructure (AWS and/or GCP) and service operations.
    • Experience with monitoring/alerting/logging stacks, incident management, and RCA practices.
    • Ability to automate operational work (Python and/or shell scripting); comfort with APIs and CLI tooling.

    Streaming / OTT Domain (Nice to Have)


    • Strong understanding of video streaming and delivery concepts: HLS, DASH, CMAF, ABR, CDNs, origin, HTTP, caching, DNS, SSL/TLS. Familiarity with AWS Media Services is a big plus.

    Support Leadership & Customer Communication


    • Proven ability to run escalations and communicate clearly in high-pressure incidents.
    • Experience designing support workflows, SLAs, escalation paths, and operational KPIs.
    • Strong written and verbal English; confidence presenting incident status and RCAs to customers.

    Working Style


    • Comfortable with flexible hours to support global customers (overlap with Europe/US time zones as needed).
    • Bias for action, continuous improvement mindset, and strong ownership.

    Desired / Nice-to-Have


    • Prior experience supporting high-scale, always-on streaming events and live operations.
    • Experience with progressive delivery, canarying, feature-flag platforms, and release automation.
    • Familiarity with IT service management frameworks (e.g., ITIL).
    • Security operations exposure (secrets management, vulnerability management, audit logging).

    What You’ll Gain & Why Join


    • A senior, high-ownership role shaping reliability + support for a mission-critical observability platform in OTT streaming.
    • Direct impact on global broadcasters and streaming services—improving viewer experience at scale.
    • Opportunity to build the SRE/support operating model and grow the Chennai support function over time.
    • Collaboration with a globally distributed team across engineering, integrations, operations, and product.

    Diversity, Equity, Inclusion & Equal Employment Opportunity at Banyan: Banyan affirms that inequality is detrimental to our Global Teams, associates, our Operating Companies, and the communities we serve. As a collective, our goal is to impact lasting change through our actions. Together, we unite for equality and equity. Banyan is committed to equal employment opportunities regardless of any protected characteristic, including race, color, genetic information, creed, national origin, religion, sex, affectional or sexual orientation, gender identity or expression, lawful alien status, ancestry, age, marital status, or protected veteran status and will not discriminate against anyone on the basis of a disability.

    We support an inclusive workplace where associates excel based on personal merit, qualifications, experience, ability, and job performance.

    Beware of Recruitment Scams


    We have been made aware of individuals fraudulently posing as members of our Talent Acquisition team and extending fake job offers. These scams may involve requests for personal information or payment for equipment. 

    Protect yourself by following these steps:


    • Verify that all communications from our recruiting team come from an @banyansoftware.com email address.
    • Remember, employers will never request payment or banking information during the hiring process.
    • If you receive a suspicious message, do not respond — instead, forward it to careers@banyansoftware.com and/or report it to the platform where you received it.

    Your safety and security are important to us. Thank you for staying vigilant.
