
Machine Learning Infrastructure Engineer
Menlo Park, CA | On-Site | Full-Time/Direct Hire
Inference Optimization is a MUST
Looking for ML infrastructure experts (Bay Area preferred) with deep experience in CUDA, GPU optimization, vLLM, and LLM inference. The focus is purely on language models; no vision or audio.
This role is with our Dillusion-LLM startup client in Menlo Park, CA.
We’re looking for an ML Infrastructure Engineer to help build the infrastructure that powers large-scale model training and real-time inference. You’ll collaborate with world-class researchers and engineers to design high-performance, distributed systems that bring advanced LLMs into production.
Responsibilities
- Design and manage distributed infrastructure for ML training at scale
- Optimize model serving systems for low-latency inference
- Build automated pipelines for data processing, model training, and deployment
- Implement observability tools to monitor performance in production
- Maximize resource utilization across GPU clusters and cloud environments
- Translate research requirements into robust, scalable system designs
Must-Haves
- Master's or PhD in Computer Science, Engineering, or a related field (or equivalent experience)
- Strong foundation in software engineering, systems design, and distributed systems
- Experience with cloud platforms (AWS, GCP, or Azure)
- Proficient in Python and at least one systems-level language (C++/Rust/Go)
- Hands-on experience with Docker, Kubernetes, and CI/CD workflows
- Familiarity with ML frameworks like PyTorch or TensorFlow from a systems perspective