Director, Site Reliability Engineering

LendingPoint
Las Colinas, Texas, United States
Remote, Hybrid, Onsite

Job Title: Director, Site Reliability Engineering
Reports To: SVP, QA
FLSA Status: Exempt
Department: Technology

JOB SUMMARY:
Responsible for leading the strategy, architecture, and operations of the Site Reliability Engineering (SRE) function at LendingPoint. This includes overseeing infrastructure automation, DevSecOps, CI/CD pipelines, observability, release management, system stability, and incident response. The Director acts as a high-level technical decision-maker, establishing technical standards, guiding architectural decisions, and ensuring the reliability and scalability of systems to support business goals.

ESSENTIAL JOB FUNCTIONS:
· Provide day-to-day leadership to the SRE team, ensuring effective operations, growth, and innovation.
· Manage cloud-native infrastructure, including servers, container clusters, databases, and networks across AWS/GCP/Azure.
· Design and scale CI/CD pipelines and observability tools (Grafana, Prometheus, Dynatrace, Full Story, etc.) for production-grade environments.
· Oversee release planning, coordination, risk mitigation, and change control across engineering and business stakeholders.
· Implement proactive monitoring, alerting, and incident response systems to ensure performance and reliability.
· Lead capacity planning and scaling efforts for high-growth environments and services.
· Drive automation initiatives to optimize operations, reduce manual effort, and improve service quality.
· Manage vendor relationships with cloud providers, data centers, and infrastructure partners to uphold SLAs and resolve issues efficiently.
· Own disaster recovery and business continuity strategies to minimize downtime and ensure data resilience.
· Develop and maintain infrastructure and operational documentation; provide internal training as needed.
· Guide cross-functional release planning across Product, QA, Engineering, and IT Ops to align with business goals.
· Lead retrospectives for major incidents and continuously improve recovery time and system reliability.
· Promote a culture of continuous improvement, learning, and engineering excellence within the team.

MINIMUM QUALIFICATIONS:
· Bachelor's degree in computer science or a related discipline preferred.
· 10+ years of experience in SRE or DevOps roles supporting high-scale systems.
· 5+ years of experience leading SRE/DevOps or release teams.
· Strong expertise in Kubernetes administration, Docker container orchestration, and infrastructure as code (IaC).
· Experience managing production infrastructure on AWS, Azure, or Google Cloud Platform.
· Deep knowledge of monitoring, logging, and alerting tools such as Prometheus, Dynatrace, Full Story, or Nagios.
· Hands-on experience with CI/CD tools (e.g., GitLab CI, Jenkins), IaC (Terraform), and scripting languages (Python, Bash, Go).
· Strong programming background in Java, with experience building and scaling microservices-based platforms.
· Solid understanding of web/API technologies (REST, JSON), observability, and API gateways.
· Experience managing environments across development, QA, staging, and production tiers.
· Proven ability to lead disaster recovery planning, business continuity, and compliance enforcement.
· Certification in relevant areas (e.g., AWS, Azure Administrator, GCP Network Engineer) is a plus.
· Excellent analytical, troubleshooting, and decision-making skills for complex system problems.
· Strong verbal and written communication skills; able to interact at all levels of the organization.

COMPETENCIES:
· Customer Service: Exceptional attitude and a passion for providing outstanding service to internal customers.
· Analytical Skills: Proven capacity to extract and manipulate large datasets in an efficient manner.
· Communications: Exhibits good listening and comprehension. Expresses ideas and thoughts in verbal and written form. Strong presentation skills.
· Compliance & Risk Awareness: Enforces standards and policies to ensure secure, compliant operations.
· Infrastructure Management: Expert in managing cloud infrastructure, scalability, security, and platform efficiency.
· Observability & Incident Response: Establishes comprehensive monitoring and drives high-quality incident handling.
· Problem Solving: Tackles complex systems issues with data-driven strategies and root cause analysis.
· Release & Change Management: Effectively governs the release lifecycle, balancing speed with stability.
· Strategic Communication: Engages cross-functional teams and leadership with clarity, transparency, and influence.
· Team Leadership: Inspires and manages high-performing engineering teams with a focus on trust, agility, and resilience.

SUPERVISORY RESPONSIBILITY
Yes

PHYSICAL DEMANDS
While performing the duties of this job, the employee is regularly required to stand, walk, reach and sit for a minimum of 8 hours with or without reasonable accommodation. The employee is required to use hands to finger, handle, or feel objects and/or tools. The employee is required to talk or hear with or without reasonable accommodation and must sometimes lift and move up to 10 pounds.

WORK ENVIRONMENT
While performing the logistics duties of this job, the employee is frequently exposed to moderate noises such as computers, printers, and other light traffic noise in an office setting. This role is in-office. Remote work may be performed from a pre-approved location, as arranged and scheduled by team management and approved by department leadership.

OTHER DUTIES
Please note this job description is not designed to cover or contain a comprehensive listing of activities, duties or responsibilities that are required of the employee for this job. Duties, responsibilities, and activities may change or be supplemented at any time with or without notice.

Life at LendingPoint

Atlanta-based LendingPoint, a leading fintech balance sheet lender, is committed to redefining who is able to access money at fair rates, and empowering consumers to build financial momentum. LendingPoint’s award-winning leadership team holds intellectual patents for unique modeling of data and credit scoring. Committed to customer-centered excellence, the company is a Better Business Bureau accredited company. Get to know us here on LinkedIn and at lendingpoint.com.

Thrive Here & What We Value

1. Promotes strong cross-functional communications
2. Enhances efficiencies impacting process improvement efforts
3. Drives actions with stakeholders for business goals
4. Fosters creativity and technical excellence within the team
5. Provides technical insights to support decision-making
6. Architectural leadership in resolving inter-program issues
7. Availability to work weekends, if needed
8. Minimum qualifications: Bachelor's degree preferred; CPA certification and 3+ years of experience required
9. Experience with large volume bank reconciliation
10. Advanced MS Excel skills