Role Introduction
We’re hiring a skilled Backend Engineer (Lead) to architect, build, and scale the backbone of our
voice AI SaaS platform and internal PaaS solutions. You will create robust, secure, and scalable
server-side systems that handle real-time voice data, WebRTC interactions, and SIP integrations
for enterprise clients. Because the environment is multi-tenant and high-traffic, the platform must be
performant, fault-tolerant, and flexible enough to adapt to customer-specific on-prem deployments.
Responsibilities
- PaaS & Infrastructure
- Design and maintain the core Platform-as-a-Service (PaaS) that supports CozmoX AI’s voice
applications and microservices.
- Oversee deployment strategies that empower internal and external teams to build on top of our
platform with minimal friction.
- Real-Time Communication (WebRTC & SIP)
- Integrate and optimize WebRTC and SIP (Session Initiation Protocol) services to enable
real-time audio streaming, calls, and other voice features.
- Implement signaling, session management, and NAT traversal solutions for voice
communication, ensuring low latency and high availability.
- Multi-Tenant Architecture
- Develop a highly scalable multi-tenant platform with strict data isolation and resource
partitioning to support enterprise customers.
- Work closely with DevOps to ensure efficient resource utilization, especially for GPU-heavy
voice/AI workloads.
- Microservices & API Development
- Design and implement backend microservices that power CozmoX AI’s features, focusing on
resilience, observability, and performance.
- Create and maintain secure, high-throughput REST/GraphQL APIs or SDKs that serve both
internal teams and external clients.
- DevOps & CI/CD
- Own the end-to-end delivery process: automated testing, CI/CD pipelines, infrastructure as
code (Terraform/CloudFormation), and secret management.
- Containerize and orchestrate services with Docker/Kubernetes, handling cloud deployments
and on-prem setups for enterprise customers.
- Observability & Performance
- Set up comprehensive logging, monitoring, and alerting for real-time systems to ensure smooth
call flows and rapid troubleshooting.
- Continuously optimize server and network performance, focusing on minimizing latency for
real-time interactions.
- Security & Compliance
- Implement best-in-class security practices (encryption at rest/in transit, authentication,
authorization) in a multi-tenant environment.
- Ensure compliance with relevant data protection and privacy standards required by enterprise
customers.
- Collaboration & Leadership
- Collaborate with front-end and AI/ML teams and product stakeholders to align on requirements,
APIs, and architecture decisions.
- Mentor junior engineers, guide best practices, and drive architectural discussions for the future
product roadmap.
Qualifications
Professional Experience
- 5+ years of backend development experience, with a proven track record in building
high-availability, large-scale systems.
- Demonstrable exposure to real-time communication protocols and tools (WebRTC, SIP, RTP,
STUN/TURN, etc.).
- Experience with multi-tenant SaaS or PaaS offerings, ensuring data isolation and robust
partitioning.
Technical Expertise
- Proficiency in one or more backend languages or runtimes (Python, Node.js, Go, Java) and their frameworks.
- Hands-on experience with microservices, containerization (Docker), and orchestration
(Kubernetes) in production.
- Strong DevOps fundamentals: CI/CD, Infrastructure as Code (Terraform, CloudFormation),
config/secret management.
- Familiarity with call signaling, session management, and load balancing for RTC or VoIP
systems.
- Understanding of enterprise-level security practices, authentication mechanisms (OAuth, SSO),
and compliance requirements.
Cloud & On-Prem Deployment
- Experience deploying services in public cloud environments (AWS, GCP, or Azure) and
on-premises.
- Knowledge of implementing robust logging and monitoring stacks (ELK, Prometheus, Grafana)
for production systems.
Preferred Skills
- Voice & AI: Knowledge of voice technology stacks, SIP trunking, telephony infrastructure, and
how they integrate with AI/ML pipelines.
- Scalability & Performance: Demonstrated ability to handle high traffic and low-latency
demands in RTC contexts.
- Database & Storage: Working knowledge of both SQL and NoSQL databases, caching layers
(Redis, Memcached), and data partitioning strategies.
- Leadership: Prior experience in leading or mentoring a small engineering team, driving
architectural decisions, and influencing product direction.
- Open Source: Contributions to open-source RTC or backend frameworks; interest in staying
on top of emerging backend/telephony trends.