
Machine Learning Infrastructure Engineer
Menlo Park, CA | On-Site | Full-Time/Direct Hire
Inference Optimization is a MUST
Looking for ML infrastructure experts (Bay Area preferred) with deep experience in CUDA, GPU optimization, vLLM, and LLM inference. The focus is purely on language models; no vision or audio.
This role is with our Dillusion-LLM startup client in Menlo Park, CA.
We’re looking for an ML Infrastructure Engineer to help build the infrastructure that powers large-scale model training and real-time inference. You’ll collaborate with world-class researchers and engineers to design high-performance, distributed systems that bring advanced LLMs into production.
Responsibilities
- Design and manage distributed infrastructure for ML training at scale
- Optimize model serving systems for low-latency inference
- Build automated pipelines for data processing, model training, and deployment
- Implement observability tools to monitor performance in production
- Maximize resource utilization across GPU clusters and cloud environments
- Translate research requirements into robust, scalable system designs
Must-Haves
- Master's or PhD in Computer Science, Engineering, or a related field (or equivalent experience)
- Strong foundation in software engineering, systems design, and distributed systems
- Experience with cloud platforms (AWS, GCP, or Azure)
- Proficient in Python and at least one systems-level language (C++/Rust/Go)
- Hands-on experience with Docker, Kubernetes, and CI/CD workflows
- Familiarity with ML frameworks like PyTorch or TensorFlow from a systems perspective