DevOps / MLOps Engineer, AI-Native Platform — InterOpera Pte.Ltd.

Job Description

Yukerja Summary

We curated this opening for DevOps / MLOps Engineer, AI-Native Platform at InterOpera Pte.Ltd. from JobStreet (category: Technology & IT). The position is marked as remote; confirm the time zone and candidate location requirements in the official description. Yukerja.com is not the employer; applications are processed on the official source site.

About InterOpera

InterOpera is pioneering intelligence infrastructure that eliminates key enterprise pain points, driving cost efficiency, profitability, and scalable growth. Through real-time decision-making capabilities, we empower clients to identify challenges, generate actionable insights, implement solutions, and track outcomes. Our solutions span sales optimization, risk management, and energy efficiency. We build at the intersection of applied AI and enterprise SaaS, and we are scaling fast across Asia.

We're looking for a DevOps / MLOps Engineer who is passionate about building and operating production-grade AI infrastructure, thrives in a fast-paced startup environment, and wants to help shape the future of enterprise decision-making through scalable cloud platforms, Kubernetes, CI/CD automation, and self-hosted LLM deployments. In this role, you will own the infrastructure powering our AI-native enterprise platform, ensuring that our model serving, observability, and GPU-backed workloads are reliable, secure, cost-efficient, and compliant with the data residency requirements of regulated enterprise clients. If you enjoy working at the intersection of platform engineering and applied AI while enabling high-performance agentic systems and RAG pipelines in production, we'd love to hear from you.

Key Responsibilities

Platform & Infrastructure - the foundation

Cloud & Kubernetes: manage and continuously improve our cloud infrastructure and Kubernetes environments for scalability, reliability, and security.
CI/CD: build and maintain robust CI/CD pipelines for applications and services.
Infrastructure as Code: automate provisioning and operational tasks with Terraform (preferred) and modern IaC practices.
Monitoring & reliability: own observability (metrics, logs, traces, alerting, SLOs) and drive system performance, reliability, and security.
Deployments & troubleshooting: support application deployments and lead production troubleshooting; read and debug backend code (Python, Go, Java, Node.js) to root-cause incidents fast.
Collaboration: partner closely with backend engineers to improve system scalability and reliability across APIs, microservices, and data stores.

MLOps / LLMOps — operating AI in production

Model serving: deploy and operate inference servers (vLLM, TGI, NVIDIA Triton, Ray Serve) behind autoscaling, load-balanced, observable endpoints.
GPU infrastructure: provision and orchestrate GPU nodes on Kubernetes (device plugins, MIG/time-slicing, node pools, CUDA/driver lifecycle); schedule and bin-pack GPU workloads efficiently.
CI/CD for models: build pipelines that take a model from fine-tune → evaluation → packaging → deployment, with a model registry, artifact versioning, and canary / blue-green rollout and rollback of model versions.
AI observability: track latency, throughput, token cost, GPU utilization, and quality/faithfulness regressions; wire evaluation harnesses into CI as release gates so model quality is enforced, not hoped for.
Reproducibility: manage prompt and model versioning and experiment tracking (MLflow, Weights & Biases, or equivalent).
Retrieval infrastructure: run and scale the data plane for RAG — vector stores (pgvector, Qdrant, Weaviate) and embedding pipelines.

Self-Hosted sLLM — the north star

Own the stack: stand up and operate our own small/specialized LLMs (Llama, Mistral, Qwen, Gemma) end-to-end in production.
Inference optimization: deploy quantized models and tune batching, KV-cache, and concurrency for the best latency and cost per token at the ops layer.
Scale & cost: autoscale GPU inference for spiky enterprise traffic, plan capacity, and relentlessly optimize GPU and cloud spend.
Secure & compliant self-hosting: run models in-VPC / in-tenant with strong secrets management and a secure model supply chain, meeting the data-residency and audit expectations of regulated clients (e.g., MAS-style oversight).

What We’re Looking For

Must-Have Experience

Production experience with Linux, Docker, and Kubernetes.
Hands-on experience with at least one major cloud — GCP, AWS, or Azure.
Experience with CI/CD tools and Infrastructure as Code (Terraform preferred).
Good understanding of networking, databases, and monitoring/observability tooling.
Ability to read and troubleshoot backend code (Node.js, Java, Python, Go, or PHP).
Solid understanding of APIs, microservices, and backend architectures.
4+ years in DevOps / SRE / platform engineering, owning services from design through 24/7 production operation.

Strongly Preferred — MLOps / sLLM

GPU infrastructure on Kubernetes — scheduling, autoscaling, and lifecycle of GPU workloads.
Model serving with vLLM, TGI, NVIDIA Triton, Ray Serve, or KServe.
LLMOps / MLOps — model registries, experiment tracking, AI observability, and evaluation-in-CI.
Inference optimization — quantization, batching, and self-hosting of open-source LLMs for cost, latency, and data-residency.

Mindset & Culture Fit

Results-oriented: you make infrastructure decisions with clear awareness of reliability, security, and cost trade-offs and their business impact.
Ownership: an initiative-driven self-starter, comfortable with on-call discipline and adaptable to a fast-paced startup with shifting priorities.
Communication: excellent English (written and verbal); able to work across a multicultural, distributed team. Bahasa Indonesia is a plus.

Nice to Have (Bonus)

Experience with Kafka, Redis, or Elasticsearch.
Security best practices and cloud cost optimization (FinOps), including GPU cost optimization.
Previous backend development experience.
MLOps orchestration with Ray, Kubeflow, SkyPilot, or the Hugging Face stack.
Knowledge graph / GraphRAG infrastructure — graph databases (Neo4j, Memgraph) and graph-aware retrieval. Given where our platform is heading, this is a meaningful bonus.
Experience operating in regulated domains (RegTech / finance, e.g. MAS) with data-residency and audit requirements.
Experience in one of our domains: enterprise sales automation, risk management, or energy / sustainability tech.

Work Environment

High-intensity startup: fast-paced, demanding workload; we ship fast and iterate faster. Occasional weekend or off-hours work when launches or incidents require it.
Flexible hours & remote: flexibility expected for project deadlines and global team collaboration across time zones.

What We Offer

Competitive salary commensurate with experience, plus performance bonus.
Direct ownership of the infrastructure that runs our AI-native platform, including our self-hosted LLMs.
A small, senior, ambitious team where your work is visible and your impact is measurable.
Modern tech stack and the freedom to make the right architectural calls.
