CUDA Kernel Optimizer - ML Engineer

To receive priority consideration, send an email expressing your interest in the role to stephaniemaryna.trafford@halogion.com with the subject “CUDA Kernel Optimizer - ML Engineer”.

Halogion is an independent member of the Mercor referral partner program. We refer candidates to our partner, which collaborates with the world’s leading AI research labs to build and train cutting-edge AI models.

We are engaging advanced CUDA experts who specialize in GPU kernel optimization, performance profiling, and numerical efficiency. These professionals have a deep mental model of how modern GPU architectures execute deep learning workloads. They are comfortable translating algorithmic concepts into finely tuned kernels that maximize throughput while maintaining correctness and reproducibility.

2) Key Responsibilities

Develop, tune, and benchmark CUDA kernels for tensor and operator workloads.

Optimize for occupancy, memory coalescing, instruction-level parallelism, and warp scheduling.

Profile and diagnose performance bottlenecks using Nsight Systems, Nsight Compute, and comparable tools.

Report performance metrics, analyze speedups, and propose architectural improvements.

Collaborate asynchronously with PyTorch Operator Specialists to integrate kernels into production frameworks.

Produce well-documented, reproducible benchmarks and performance write-ups.

3) Ideal Qualifications

Deep expertise in CUDA programming, GPU architecture, and memory optimization.

Proven ability to achieve quantifiable performance improvements across hardware generations.

Proficiency with mixed precision, Tensor Core usage, and low-level numerical stability considerations.

Familiarity with frameworks such as PyTorch, TensorFlow, or Triton (beneficial but not required).

Strong communication skills and independent problem-solving ability.

Demonstrated open-source, research, or performance benchmarking contributions.

4) More About the Opportunity

Ideal for independent contractors who thrive in performance-critical, systems-level work.

Engagements focus on measurable, high-impact kernel optimizations and scalability studies.

Work is fully remote and asynchronous; deliverables are outcome-driven.

Contractors have access to shared benchmarking infrastructure and reproducibility tooling via Mercor support resources.