Evaluate AI-generated code and design advanced programming tasks. High-paying remote role ($90–$120/hr).
Agentic Coding Annotator - Online / Offline Tasks
Job description
About Turing
Turing is one of the world’s fastest-growing AI companies, accelerating the advancement and deployment of powerful AI systems.
Turing partners with leading AI labs to improve frontier models across reasoning, coding, agentic behavior, multimodality, multilinguality, STEM, and advanced software intelligence.
---
Role Overview
Turing is seeking experienced software practitioners to work as Agentic Coding Annotators supporting frontier AI coding model evaluation projects.
This role focuses on evaluating realistic software engineering workflows within agentic coding environments. Candidates will review model-generated coding trajectories, validate outputs, design coding tasks, and provide structured annotations and evaluations.
This is an advanced engineering evaluation role requiring strong debugging, validation, and software reasoning capabilities.
---
Key Responsibilities
Online Evaluations
- Interact with blinded AI coding models on predefined software tasks
- Evaluate and rank generated coding trajectories
- Validate outputs through testing and debugging
Offline Evaluations
- Design realistic multi-step coding tasks
- Simulate user workflows and engineering objectives
- Create detailed evaluation rubrics and grading criteria
- Review and assess generated model outputs
General Responsibilities
- Execute coding tasks within agentic coding harnesses
- Run tests, commands, scripts, and debugging workflows
- Inspect logs and generated artifacts
- Perform manual and automated validation checks
- Write evidence-based evaluation rationales
- Maintain process consistency and schema compliance
- Escalate broken environments or unclear workflows
---
Required Skills & Qualifications
- 5+ years of experience in one or more of the following areas:
  - Software Engineering
  - QA Engineering
  - Developer Tooling
  - Data Engineering
  - ML Engineering
  - Similar code-heavy technical roles
Strong programming experience in one or two of the following ecosystems:
- Python
- JavaScript / TypeScript
- Rust
- Java
- C / C++
- Bash / CLI environments
- Haskell
- Swift
- SQL
Candidates must be able to:
- Read unfamiliar codebases
- Run and interpret tests/scripts
- Debug complex issues
- Evaluate implementation correctness
- Analyze edge cases and partial fixes
---
Preferred Qualifications
- Strong Docker experience
- Experience working with large production repositories
- Strong software architecture judgment
- Experience designing realistic engineering tasks
- Ability to create non-trivial coding workflows beyond tutorial-level exercises
---
Work Details
- Fully remote contractor role
- 8 hours/day commitment
- Minimum 4-hour overlap with PST required
- Contract duration: 5 weeks
- Expected start date: next week
---
Why Join Turing
- Work on frontier AI coding systems
- Contribute to advanced LLM evaluation pipelines
- Collaborate with global engineering teams
- Flexible remote-first environment
- High-impact work improving agentic coding models