Description
Summary: The CloudOps Engineer is responsible for ensuring the smooth operation of cloud-based environments by overseeing incident response, monitoring, and optimization activities. This role involves managing various cloud technologies such as storage, network equipment, servers, and hardware components. The CloudOps Engineer will also ensure timely patch management, monitor system performance, and collaborate with other teams to deliver high-quality service to both internal teams and customers.
The position requires a proactive approach to identifying and resolving issues while maintaining operational metrics and documentation.
Key Responsibilities:
- Manage the smooth operation of cloud environments across AWS, Azure, GCP, VMware, and other platforms.
- Monitor cloud infrastructure for incidents, performance issues, and security vulnerabilities.
- Lead and execute system patching across cloud infrastructure, ensuring patches are applied promptly to mitigate security risks.
- Diagnose and resolve issues related to patch deployment and system performance post-patching.
- Ensure compliance with patch management policies by reviewing logs and generating reports on patch status.
- Collaborate with support and development teams to minimize disruptions and address any performance issues.
- Install, configure, and manage patching tools for traditional and cloud environments (Tanium, AWS, Azure, GCP, VMware).
- Analyze and review system performance, identifying bottlenecks and recommending improvements.
- Support the operation of complex cloud infrastructures by responding to alerts, troubleshooting issues, and ensuring maximum uptime.
- Participate in an on-call rotation and provide 24/7 availability to address critical operational issues.
- Maintain and update documentation regarding patching processes, schedules, compliance, and incident resolution.
- Communicate effectively with stakeholders and customers to manage expectations and build long-term relationships.
- Monitor and analyze system logs, responding to incidents and performing root cause analysis (RCA).
- Collaborate with team members to report operational metrics and key performance indicators (KPIs).
- Provide frontline support for Cloud Monitoring, including infrastructure and application management.
- Stay current on emerging technologies, security trends, and best practices to improve operational efficiency.
Requirements
Skills & Qualifications:
- Solid understanding of cloud platforms such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).
- Experience with VMware environments and cloud infrastructure management.
- Proficiency in both Windows and Linux/Unix operating systems.
- Familiarity with network protocols and Windows infrastructure services (DHCP, DNS, FTP, Active Directory, Group Policy, IIS).
- Basic networking and troubleshooting skills, with the ability to analyze and resolve system and network issues.
- Experience with common cloud operations tools such as ServiceNow, LogicMonitor, and Elasticsearch.
- Basic scripting knowledge in PowerShell or Python to automate tasks and improve efficiency.
- Strong communication skills, both verbal and written, with the ability to communicate complex technical issues clearly.
- Ability to multitask, manage competing priorities, and work well under pressure.
- Must be a team player who adheres to defined processes and effectively handles customer escalations.
- Self-motivated with the ability to quickly learn new technologies with minimal support and guidance.
- A proactive mindset to identify trends and potential issues before they cause significant disruptions.
- Willingness to work extended hours, participate in an on-call rotation, and support off-hour production needs.
- Ability to thrive in a fast-paced, dynamic, and flexible work environment.
Desirable Experience:
- Previous experience working in Cloud Operations or DevOps roles.
- Knowledge of patch management tools (e.g., Tanium) and security remediation practices.
- Ability to provide creative solutions to operational challenges through out-of-the-box thinking.
- Experience with cloud and infrastructure monitoring tools.