What You’ll Do
As a Senior DevOps/MLOps Engineer, you’ll help architect, build and scale the infrastructure that powers our agentic environment in production. You’ll work alongside engineers, data scientists, product managers and delivery leads to enable continuous infrastructure deployment, robust monitoring, and fast experimentation.
InteractiveAI runs on two high-performance engines: Product Teams that craft and scale our Agentic IDE, and Implementation Squads that ship high-impact, domain-specific AI solutions. Depending on your craft and ambition, you’ll join the team where you can create outsized value, with a transparent, performance-based path to growth and rewards.
- Design & scale multi-tenant, cloud-agnostic runtimes (Kubernetes/GPU clusters) supporting on-prem, VPC, and hybrid installs
- Automate end-to-end ML pipelines—data ingestion, fine-tuning (LoRA/QLoRA), evaluation, and secure rollout—through robust CI/CD
- Partner with product engineers and client performance squads to ship custom agents from sandbox (≤ 5 days) to production (4–6 weeks) on tight SLAs
- Automate infrastructure using Terraform, Ansible, or similar tools
- Implement and manage containerized workloads (Docker, Kubernetes, etc.)
- Ensure security, compliance, and data governance standards are met
- Troubleshoot production incidents and proactively improve system reliability
What We’re Looking For
We’re looking for someone who can build and help us scale a robust infrastructure for our agentic environment and its ecosystem of solutions – with strong fundamentals, clean execution, and operational maturity.
Minimum Requirements:
- 3+ years in DevOps, Site Reliability, or Infrastructure Engineering roles
- 3+ years deploying and managing AI/ML production workloads on one or more major public clouds (e.g., AWS, GCP or Azure)
- Experience in deploying robust, resilient and distributed cloud solutions at scale
- Proficiency in containerization and orchestration (Docker, Kubernetes)
- Experience building and managing CI/CD pipelines
- Familiarity with infrastructure-as-code tools (Terraform, CloudFormation, Pulumi)
- Strong scripting skills (Python, Bash, or similar)
- Experience with monitoring and logging stacks (e.g., Prometheus, Grafana, ELK)
- Strong communication and collaboration across cross-functional teams
Additional Requirements:
- Experience deploying ML/AI workloads in production
- Exposure to model versioning, tracking, and reproducibility tooling (e.g., MLflow, Weights & Biases)
- Experience implementing security practices in DevOps pipelines
- Familiarity with GDPR, ISO 27001, or other regulatory/compliance frameworks
- Previous work in regulated or enterprise-grade environments is a plus
Interview Process
We keep our process focused and respectful of your time. Most candidates complete it in 2–3 weeks. Here’s what to expect:
- Intro Call – 30 minutes with our team to align on fit and expectations
- Take-Home Challenge – A practical task based on real-world problems
- Technical Interview – Deep dive into the challenge, technical experience, and DevOps engineering
- Cultural and Values Interview – Discussion of your motivation and alignment with our culture and values
- Offer – Final conversation and offer
We’re building a team of builders: people who care about impact, quality, and growth. If that’s you, let’s talk: careers@interactive.ai
About us
InteractiveAI is a fast-growing startup on a mission to empower enterprises with fully managed AI agent lifecycles.
We are building the next generation of enterprise-AI solutions, delivering an end-to-end Agentic IDE alongside an extensible ecosystem of agentic resources and solutions.
Our platform allows companies to orchestrate, monitor, evaluate, deploy and improve AI agents—and soon fine-tune and own their own models.
We value autonomy, speed, and innovation, and we’re building a world-class team to match. Our squads are lean, focused, and execution-driven.
If you thrive in high-performance environments and want to be part of a company that rewards transformational outcomes, this is for you.