Job Summary
Join our customer’s team as an AI Model Evaluator (LLM & Agent Systems) and help shape the future of generative AI and autonomous agent technologies.
In this role, you will benchmark, analyze, and assess cutting-edge AI systems in real-world scenarios. Your structured evaluations and qualitative insights will directly inform model improvements, product refinement, and AI safety standards.
This position is ideal for analytical professionals with experience in AI quality assessment and model evaluation.
Key Responsibilities
- Evaluate outputs from large language models (LLMs) and autonomous agent systems using defined rubrics and guidelines.
- Review multi-step agent actions, including screenshots and reasoning traces, to assess accuracy and quality.
- Apply evaluation standards consistently, identifying edge cases, recurring patterns, and failure modes.
- Provide detailed, structured feedback to support benchmarking and model improvement.
- Participate in calibration sessions to ensure consistent scoring across evaluators.
- Adapt to evolving guidelines and ambiguous evaluation scenarios.
- Document findings clearly and communicate insights effectively to stakeholders.
Required Skills and Qualifications
- Experience in LLM evaluation, AI output analysis, QA/testing, UX research, or similar analytical roles.
- Strong background in AI benchmarking and rubric-based scoring frameworks.
- Exceptional attention to detail and sound judgment in complex scenarios.
- English proficiency (B2+ or equivalent) with strong written and verbal communication skills.
- Ability to work independently in a remote environment.
- Commitment of at least 20 hours per week for the initial contract term.
- Analytical mindset focused on actionable qualitative feedback.
Preferred Qualifications
- Experience with RLHF, annotation workflows, or AI benchmarking frameworks.
- Familiarity with autonomous agent systems or workflow automation tools.
- Background in mobile app or digital product evaluation.
Offer Details
- Job Type: Contract (minimum 2 weeks, with potential extension)
- Openings: 7
- Hourly Pay: $20 – $30 per hour
- Location: Remote
- Minimum Commitment: 20 hours per week