MLOps Engineer
We're seeking talented MLOps Engineers with deep, hands-on expertise in modern ML frameworks (specifically JAX and PyTorch) and in kernel-level programming with Pallas or Triton. The role centers on AI model training and evaluation: you will write and assess MLOps tasks and solutions to generate high-quality training data for frontier AI systems.
This is a full-time engagement of 40 hours per week. Candidates must have no conflicting or concurrent engagements.
Key Responsibilities
Guide research and engineering teams to close knowledge gaps and improve AI model performance in MLOps, training infrastructure, and ML framework-level topics.
Design challenging, domain-relevant tasks and write accurate, well-structured solutions to MLOps and ML systems problems.
Evaluate MLOps tasks and solutions and provide clear, written technical feedback.
Develop guidelines, detailed rubrics, and evaluation frameworks for assessing training pipeline design, distributed systems reasoning, and kernel-level optimization across tasks.
Collaborate with other subject matter experts to ensure consistency and accuracy in training data.
Core Qualifications
2+ years of dedicated professional experience in ML infrastructure, MLOps, or ML systems engineering at a recognized, top-tier organization.
Hands-on production experience with JAX and/or PyTorch at scale.
Experience writing or optimizing custom GPU kernels using Pallas (JAX) or Triton.
Demonstrable career progression.
Ability to commit reliably to at least 40 hours per week on weekdays.
Strong written communication skills and the ability to explain complex technical decisions clearly.