BS/BA, MS, or PhD (or equivalent) in Computer Science, Engineering, or a related field.
5+ years of experience in DevOps, Site Reliability Engineering, or a senior infrastructure role.
Extensive experience designing, building, and managing large-scale, high-availability infrastructure across both cloud (AWS, GCP, Azure) and on-premises environments, with deep expertise in container orchestration platforms like RKE2 and OpenShift.
Strong knowledge of networking fundamentals (IP, TCP, UDP, etc.) and security concepts (network security, application security, cryptography).
Solid understanding of the technical aspects of software development, system architecture, and Linux operating system internals.
Excellent troubleshooting skills, including solving complex system problems and navigating business ambiguities in hybrid settings.
Deep expertise and extensive experience in DevOps methodologies and frameworks (e.g., CI/CD, GitOps, Infrastructure as Code).
Expert knowledge of, and experience in, defining infrastructure vision, strategy, and roadmaps for hybrid environments.
Expert understanding of the end-to-end product development lifecycle.
Strong understanding of business principles, KPIs, and financial modeling related to infrastructure costs (e.g., CapEx vs. OpEx).
Deep knowledge of the AI/ML industry and relevant market trends.
Extensive experience leading and facilitating cross-functional teams.
Strong experience in mentoring and developing junior team members.