Remote contract opportunity for experienced Python software engineers to contribute production-grade Python expertise toward frontier AI model training projects. Strong knowledge of modern Python architecture, testing, performance optimization, and scalable systems required.
Software & Data Science Expert
Job description
Mercor is building a benchmark dataset to evaluate AI models on professional document understanding and instruction following within the Technology domain.
Contributors will create high-quality evaluation tasks based on real-world technical materials and workflows, including:
- Technical specifications
- Architecture documentation
- API references
- Codebases
- Web research
- Code execution workflows
Each task includes:
- A clearly defined objective
- Ground-truth outputs
- Objective evaluation criteria
- Detailed scoring rubrics
The goal is to evaluate an AI system's ability to:
- Understand technical documentation
- Follow complex instructions
- Perform multi-step reasoning
- Produce accurate and structured outputs
Key Responsibilities
- Author benchmark tasks based on real-world technical documents
- Create multi-step challenges requiring:
  - Document comprehension
  - Technical reasoning
  - Information synthesis
  - Instruction following
- Define expected outputs and ground-truth answers
- Develop objective evaluation rubrics
- Ensure tasks accurately measure AI performance across technology-focused workflows
- Contribute to benchmark datasets used for frontier AI model evaluation
Ideal Candidate Background
Candidates should have at least 3 years of hands-on experience in one or more of the following domains:
Software Engineering
- Application development
- Software architecture
- APIs
- Technical documentation
- System design
Data Science & Analytics
- Data analysis
- Statistical reasoning
- Data workflows
- Business analytics
- Machine learning concepts
Work Expectations
- Minimum commitment of 15–20 hours per week
- Fully remote work environment
- Flexible schedule
- Independent contractor engagement
Contract & Payment Terms
- Weekly payments through Stripe or Wise
- Remote project work completed on your own schedule
- Projects may be extended, shortened, or concluded based on project requirements and performance
- No access to confidential employer or client information is required
Important Note
Mercor currently cannot support:
- H-1B candidates
- STEM OPT candidates
About Mercor
Mercor partners with leading AI labs and enterprises to train and evaluate frontier AI systems using human expertise.
Contributors work directly on projects that improve next-generation AI models through high-quality evaluation datasets, benchmark creation, and expert review.
You will be redirected to the company's website to complete your application.