Why This Role Exists
Mercor partners with leading AI teams to improve the quality, usefulness, and reliability of general-purpose conversational AI systems.
This role focuses on evaluating and improving chat behavior in large language models (LLMs). You will assess AI-generated responses across diverse topics and provide structured feedback to ensure accuracy, clarity, and alignment with human expectations.
What You’ll Do
- Evaluate LLM-generated responses for accuracy and effectiveness.
- Conduct fact-checking using trusted public sources and verification tools.
- Annotate strengths, weaknesses, and factual inaccuracies.
- Assess reasoning quality, clarity, tone, and completeness.
- Ensure model responses follow conversational and system guidelines.
- Apply consistent annotations using structured taxonomies and evaluation benchmarks.
Who You Are
- Bachelor’s degree holder.
- Native Arabic speaker, or Arabic proficiency at ILR 5 / CEFR C2.
- Fluent in English.
- Experienced user of large language models (LLMs).
- Strong writing and structured analytical skills.
- Highly detail-oriented with strong quality judgment.
- Comfortable working across diverse domains and topics.
- Strong college-level mathematics skills.
Nice-to-Have
- Experience with RLHF, model evaluation, or annotation workflows.
- Background in research, policy, analytics, linguistics, or engineering.
- Experience comparing multiple outputs and making qualitative judgments.
- Familiarity with evaluation rubrics and benchmarking systems.
What Success Looks Like
- You consistently identify factual and reasoning errors.
- Your evaluation artifacts are clear, consistent, and reproducible.
- Your feedback measurably improves AI response quality.
- Your reviews lead to improvements in AI systems before public deployment.
Contract & Payment
- Independent contractor engagement.
- Fully remote with a flexible schedule.
- Weekly payments via Stripe or Wise.
- Geography restricted to Egypt, Saudi Arabia, the UAE, and the USA.
- $22.64 per hour.
About Mercor
Mercor partners with leading AI labs and enterprises to train frontier models using human expertise. Contributors collaborate with researchers to improve advanced AI systems used globally.