Design ML/LLM evaluation tasks and rubrics and grade model/agent outputs for frontier AI labs. Remote contract, 30+ hrs/week, $45-140/hr (US/Canada $100-140/hr). 5+ years MLE with SFT/RLHF/reward modeling required.
ML Engineer (Coding Agent Experience)
Job description
About the Role
Mercor is partnering with a leading AI research lab to support a Frontier Code Agents project.
This role focuses on evaluating and improving frontier AI coding models through realistic machine learning engineering workflows and structured technical assessments.
Contributors use professional ML engineering expertise to review, compare, and improve AI-generated implementations involving model training, deployment, MLOps systems, inference infrastructure, and LLM-powered applications.
Key Responsibilities
Evaluate AI Coding Agents
- Review AI-generated implementations involving model training pipelines, inference systems, MLOps workflows, LLM applications, and AI-powered products
Machine Learning Engineering Assessment
- Apply professional judgment to realistic ML engineering scenarios
- Review training workflows, model deployment strategies, inference architecture, production ML systems, and LLM integration approaches
Required Experience
- Minimum 2 years of professional machine learning engineering experience
- Experience building production ML systems, model deployment infrastructure, LLM applications, or AI-powered products
AI Coding Tools
- Hands-on experience with AI coding tools such as Cursor, Claude Code, Codex, Windsurf, Gemini CLI, or similar
Important Note
Mercor currently cannot support H-1B or STEM OPT candidates.
Compensation
- $400 per accepted task (~$85/hr effective rate)
About Mercor
Mercor partners with leading AI labs and enterprises to train and evaluate frontier AI systems using expert human knowledge.