Remote LLM & Agent Evaluation Jobs: 108 Open Now

Freelance Agent Evaluation Engineer

Mindrift

LLM & Agent Evaluation Part-time ↘ +45 regions

English AI Testing +12

Ireland, Belgium, Denmark, Finland, Norway, Sweden
$50/hr
Jun 30, 2026

Please submit your CV in English and indicate your level of English proficiency. Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems.

Prompt Writer

Alignerr

LLM & Agent Evaluation Contractor

AI Training Creativity +12

Remote (Global)
$40 – $120/hr
Jun 29, 2026

Prompt Writer (AI Training) About the Role What if your curiosity, creativity, and way with words could directly shape how the world's most advanced AI systems think, reason, and respond?

AI Chatbot Tester

Alignerr

LLM & Agent Evaluation Contractor

AI Make Algorithms +12

Remote (Global)
$40 – $120/hr
Jun 29, 2026

AI Chatbot Tester About the Role What if your opinions could make AI smarter, safer, and more useful for millions of people? As an AI Chatbot Tester with Alignerr, that's exactly what you'll do.

Enterprise AI Agent Users

Mercor

Agent System Evaluator Contractor Short-term

Technical Writing English Enterprise AI Usage +4

Remote (Global)
$30/hr
Jun 27, 2026

Seeking users with hands-on experience using enterprise AI agent systems to provide structured feedback on real-world workflows and tool behavior. This fully remote contractor role offers flexible scheduling and pays $30 per hour.

Document Review Expert

Mercor

LLM & Agent Evaluation Freelancer

Critical Reading Attention to Detail Instruction Following +3

United States
$35 – $40/hr
Jun 25, 2026

Role Overview Mercor is partnering with a leading AI research laboratory to improve the quality of AI-generated document evaluation systems through structured human review. This role is designed for highly detail-oriented professionals who can critically assess AI-generated…

Internet-Native Bilingual Evaluator Expert (Spanish – Latin America)

Mercor

Bilingual LLM Evaluator Contractor · Part-time

Spanish English LLM Evaluation +11

Remote (Global)
$25/hr
Jun 19, 2026

Mercor is seeking Internet-Native Bilingual Evaluator Experts (Spanish–English) to contribute to cutting-edge AI research focused on multilingual communication, internet culture, and real-world language use. In this role, you will evaluate, annotate, and improve AI model…

Internet-Native Bilingual Evaluator Expert (French)

Mercor

Bilingual LLM Evaluator Contractor · Part-time

French English LLM Evaluation +11

Remote (Global)
$50/hr
Jun 19, 2026

Mercor is seeking Internet-Native Bilingual Evaluator Experts (French–English) to contribute to advanced AI research focused on multilingual communication, internet culture, and real-world language use. This role is ideal for individuals who actively participate in both French-…

Internet-Native Bilingual Evaluator Expert (Japanese)

Mercor

Bilingual LLM Evaluator Contractor · Part-time

Japanese English LLM Evaluation +11

Remote (Global)
$30-$40/hr
Jun 19, 2026

Mercor is seeking Internet-Native Bilingual Evaluator Experts (Japanese–English) to contribute to advanced AI research focused on multilingual communication, internet culture, and contemporary digital language use. This role is ideal for individuals who actively participate in…

AI Quality Analyst (Gemini) - Chinese

Turing

Bilingual LLM Evaluator Contractor · Part-time Short-term

Prompt Engineering AI Evaluation Annotation +7

Remote (Global)
$15/hour
Jun 18, 2026

About Turing Based in San Francisco, California, Turing is a leading research accelerator supporting frontier AI laboratories and enterprises deploying advanced AI systems. Turing helps organizations improve AI performance through high-quality datasets, advanced training…

Ubuntu/Linux User for AI Data Training

SME Careers

LLM & Agent Evaluation Contractor · Part-time Ongoing

Linux Ubuntu Data Annotation +9

Albania, Argentina, Armenia, Bangladesh, Bhutan, Bolivia, Bosnia and Herzegovina, Brazil, Bulgaria, Cambodia, Chile, Colombia, Croatia, Ecuador, Estonia, Georgia, Guyana, Hungary, India, Indonesia, Kosovo, Latvia, Lithuania, Malaysia, Moldova, Montenegro, Nepal, North Macedonia, Pakistan, Paraguay, Peru, Poland, Romania, Serbia, Singapore, Slovakia, Slovenia, Sri Lanka, Suriname, Thailand, Czech Republic, Philippines, United States, Timor-Leste, Ukraine, Uruguay, Vietnam
Up to $20/hr
Jun 17, 2026

As an AI Data Trainer with Linux/Ubuntu, you will work remotely on hourly-paid AI data training projects involving structured, detail-oriented tasks. Responsibilities include installing applications, following detailed instructions, and recording actions while completing…

Kurdish Language Expert

micro1

Bilingual LLM Evaluator Contractor

Kurdish Translation Editing +7

Remote (Global)
$40 – $95/hr
Jun 16, 2026

Job Summary As a Kurdish Language Expert, you will leverage your linguistic expertise to help train next-generation AI systems through translation, editing, linguistic analysis, data annotation, and language quality evaluation tasks. Your contributions will help AI systems…

Wolof Language Expert

micro1

Bilingual LLM Evaluator Contractor

Wolof Translation Localization +7

Remote (Global)
$40 – $95/hr
Jun 16, 2026

Job Summary As a Wolof Language Expert, you will use your linguistic expertise to help train next-generation AI systems through high-quality translation, localization, writing, editing, and language evaluation tasks. Your work will contribute directly to improving how AI…

Personalized Life Assistant Expert

Mercor

LLM & Agent Evaluation Contractor Ongoing

LLM Evaluation Rubric Design Quality Assessment +9

United States
$50 – $200/hr
Jun 15, 2026

About the Role A leading AI research organization is seeking experienced AI power users to evaluate how effectively advanced language models perform real-world, highly personalized life-assistance tasks. Contributors will assess AI outputs across domains such as productivity,…

Senior AI Agent Evaluation Engineer

Mindrift

LLM & Agent Evaluation Part-time

AI English Testing +12

United Kingdom
Up to $50/hr
Jun 15, 2026

Please submit your CV in English and indicate your level of English proficiency. Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems.

AI Quality Analyst (Personalization) - Korean

Turing

Bilingual LLM Evaluator Contractor

Korean AI Evaluation Data Annotation +10

Remote (Global)
$15/hr
Jun 14, 2026

About the Role Turing is seeking Korean-speaking AI Quality Analysts to evaluate a new personalization feature for Gemini. In this role, you will assess how effectively the AI uses information from a user's Gemini conversations, Gmail, Google Search history, and YouTube activity…

AI Safety Experts — English & Marathi

Mercor

Red Team / Safety Evaluator Contractor Short-term

Data Annotation Technical Writing Cybersecurity +12

Remote (Global)
$20 - $22/hour
Jun 11, 2026

Mercor is seeking bilingual AI Safety Experts fluent in both English and Marathi to help evaluate and strengthen the safety of frontier AI systems. This role focuses on red teaming AI models by identifying vulnerabilities, testing misuse scenarios, and generating high-quality…

Dutch Audio Generalist Evaluator Expert

Mercor

Bilingual LLM Evaluator Contractor Short-term

Proofreading Fact-Checking Research +12

United Kingdom, United States, Singapore
$50/hr
Jun 11, 2026

Mercor is seeking Dutch Audio Generalist Evaluator Experts to contribute to a high-impact AI research project focused on audio understanding, transcription, annotation, and model evaluation. Contributors will help train and benchmark advanced language models by converting audio…

Argentinian Spanish Audio Generalist Evaluator Expert

Mercor

Bilingual LLM Evaluator Contractor · Part-time

Transcription Localization Editing +11

United Kingdom, United States
$50/hr
Jun 11, 2026

Mercor is seeking an Argentinian Spanish Audio Generalist Evaluator Expert to contribute to a high-impact AI research project focused on audio understanding, transcription, annotation, and model evaluation. This role supports the training and benchmarking of advanced language…

Senior Software Engineer — Agentic Coding

Alignerr

Agent System Evaluator Contractor · Full-time

AI Training Engineering +12

Remote (Global) (US: WA)
$10 – $40/hr
Jun 11, 2026

Senior Software Engineer — Agentic Coding (AI Training) About the Role What if your software engineering expertise could define how the next generation of AI writes, debugs, and ships code on its own? We're looking for Senior Software Engineers in Seattle to work at the…

Nepali Bilingual Expert

micro1

Bilingual LLM Evaluator Contractor · Part-time

Remote (Global)
Jun 10, 2026

About the Role In this remote contractor role, you'll apply your Nepali language expertise to help train next-generation AI systems. Your work will shape how models learn, reason, and perform through high-quality, real-world input. Key Responsibilities…

AI Safety Experts — English & Odia

Mercor

Red Team / Safety Evaluator Contractor Short-term

Cybersecurity LLM Evaluation Technical Writing +11

Remote (Global)
$20 - $22/hour
Jun 6, 2026

Mercor is seeking bilingual AI Safety Experts fluent in both English and Odia to help evaluate and strengthen the safety of frontier AI systems. This role focuses on red teaming AI models by identifying vulnerabilities, testing misuse scenarios, and generating high-quality…

AI Safety Experts — English & Gujarati

Mercor

Red Team / Safety Evaluator Contractor Short-term

LLM Evaluation Data Annotation Technical Writing +12

Remote (Global)
$20 - $22/hour
Jun 6, 2026

Mercor is seeking bilingual AI Safety Experts fluent in both English and Gujarati to help evaluate and strengthen the safety of frontier AI systems. This role focuses on red teaming AI models by identifying vulnerabilities, testing misuse scenarios, and generating high-quality…

AI Safety Experts — English & Assamese

Mercor

Red Team / Safety Evaluator Contractor Short-term

LLM Evaluation Data Annotation Technical Writing +12

Remote (Global)
$20 - $22/hour
Jun 6, 2026

Mercor is seeking bilingual AI Safety Experts fluent in both English and Assamese to help evaluate and strengthen the safety of frontier AI systems. This role focuses on red teaming AI models by identifying vulnerabilities, testing misuse scenarios, and generating high-quality…

AI Safety Experts — English & Punjabi

Mercor

Red Team / Safety Evaluator Contractor Short-term

Cybersecurity Data Annotation Technical Writing +12

Remote (Global)
$20 - $22/hour
Jun 6, 2026

Mercor is seeking bilingual AI Safety Experts fluent in both English and Punjabi to help evaluate and strengthen the safety of frontier AI systems. This role focuses on red teaming AI models by identifying vulnerabilities, testing misuse scenarios, and generating high-quality…

AI Safety Experts — English & Malayalam

Mercor

Red Team / Safety Evaluator Contractor Short-term

LLM Evaluation Data Annotation Technical Writing +12

Remote (Global)
$20 - $22/hour
Jun 6, 2026

Mercor is seeking bilingual AI Safety Experts fluent in both English and Malayalam to help evaluate and strengthen the safety of frontier AI systems. This role focuses on red teaming AI models by identifying vulnerabilities, testing misuse scenarios, and generating high-quality…

AI Safety Experts — English & Telugu

Mercor

Red Team / Safety Evaluator Contractor Short-term

Cybersecurity Data Annotation Technical Writing +12

Remote (Global)
$20 - $22/hour
Jun 6, 2026

Mercor is seeking bilingual AI Safety Experts fluent in both English and Telugu to help evaluate and strengthen the safety of frontier AI systems. This role focuses on red teaming AI models by identifying vulnerabilities, testing misuse scenarios, and generating high-quality…

AI Safety Experts — English & Tamil

Mercor

Red Team / Safety Evaluator Contractor Short-term

LLM Evaluation Data Annotation Technical Writing +12

Remote (Global)
$20 - $22/hour
Jun 6, 2026

Mercor is seeking bilingual AI Safety Experts fluent in both English and Tamil to help evaluate and strengthen the safety of frontier AI systems. This role focuses on red teaming AI models by identifying vulnerabilities, testing misuse scenarios, and generating high-quality…

AI Safety Experts — English & Kannada

Mercor

Red Team / Safety Evaluator Contractor Short-term

LLM Evaluation Data Annotation Technical Writing +12

Remote (Global)
$20 - $22/hour
Jun 6, 2026

Mercor is seeking bilingual AI Safety Experts fluent in both English and Kannada to help evaluate and strengthen the safety of frontier AI systems. This role focuses on red teaming AI models by identifying vulnerabilities, testing misuse scenarios, and generating high-quality…

AI Safety Experts — English & Urdu

Mercor

Red Team / Safety Evaluator Contractor Short-term

LLM Evaluation Data Annotation Technical Writing +12

Remote (Global)
$20 - $22/hour
Jun 6, 2026

Mercor is seeking bilingual AI Safety Experts fluent in both English and Urdu to help evaluate and strengthen the safety of frontier AI systems. This role focuses on red teaming AI models by identifying vulnerabilities, testing misuse scenarios, and generating high-quality…

AI Safety Experts — English & Bengali

Mercor

Red Team / Safety Evaluator Contractor Short-term

LLM Evaluation Data Annotation Technical Writing +12

Remote (Global)
$20 - $22/hr
Jun 6, 2026

Mercor is seeking bilingual AI Safety Experts fluent in both English and Bengali to help evaluate and strengthen the safety of frontier AI systems. This role focuses on red teaming AI models by identifying vulnerabilities, testing misuse scenarios, and generating high-quality…

Secondary Education Teacher

DataAnnotation.tech

LLM Evaluator (English) Contractor · Part-time

LLM Evaluation Editing Proofreading +7

New Zealand
$25-$30+/hr
Jun 4, 2026

DataAnnotation is seeking Secondary Education Teachers to help train and evaluate AI models. In this role, contributors will assess AI-generated responses, evaluate reasoning quality, and complete writing and editing tasks that help improve AI performance and accuracy.…

AI Evaluation Specialist

micro1

Agent System Evaluator Contractor · Full-time

LLM Evaluation Quality Assurance Technical Writing +10

Remote (Global)
$20 - $35/hr
May 30, 2026

As an AI Evaluation Specialist, you will help train and evaluate next-generation AI systems by designing evaluation tasks, creating grading rubrics, and assessing AI performance across real-world computer-based workflows. This role focuses on structured thinking, precise…

Balinese Bilingual Expert

micro1

Bilingual LLM Evaluator Contractor · Part-time

Transcription Balinese English +12

Remote (Global)
$15 - $95/hr
May 27, 2026

In this role, you'll apply your expertise to help train next-generation AI systems. Your work will shape how models learn, reason, and perform through high-quality, real-world input. No prior experience in AI is required; your domain knowledge is what matters. Key…

Generalist

micro1

LLM Evaluator (English) Contractor

Data Entry Attention to Detail Problem Solving +11

Remote (Global)
$15 - $25/hr
May 19, 2026

As a Generalist, you will contribute to the training and improvement of next-generation AI systems by providing high-quality real-world input, feedback, and operational support. This role is designed for adaptable professionals who can work efficiently across multiple tasks and…

Rubrics Evaluator (Professional Experience)

Mercor

LLM & Agent Evaluation Contractor Short-term

Analytical Reasoning Financial Analysis Rubric Evaluation +10

Remote (Global)
$60-$85/hr
May 17, 2026

Mercor is seeking experienced enterprise professionals to contribute to an AI evaluation and benchmarking project in partnership with a leading AI lab. Contributors will evaluate the quality of AI-generated reasoning, decision-making, and analytical outputs across complex…

Safety Project | Emotional Distress Role-Play Actor

Vetto AI

Red Team / Safety Evaluator Freelancer

Acting Role-Play Improvisation +11

Remote (Global)
$40/deliverable
May 16, 2026

Vetto is hiring contributors for the Safety Project — Emotional Distress (Role-Play Track), an AI safety initiative focused on evaluating how AI systems respond during emotionally sensitive and high-risk conversations. The project centers on testing whether AI systems can…

Research Intern, Multimodal LLM Benchmarking

Centific

LLM & Agent Evaluation Intern · Full-time

Multimodal AI LLM Evaluation Benchmark Design +10

United States
$40/hr
May 16, 2026

Centific is hiring a Research Intern focused on Multimodal LLM Benchmarking to contribute to advanced AI evaluation research involving multimodal foundation models. This internship focuses on designing, executing, and analyzing benchmark systems for AI models operating across:…

Generative AI Associate

Innodata

LLM & Agent Evaluation Contractor · Part-time

Data Annotation LLM Evaluation LLM Prompt Engineering +10

United States
$15/hr
May 16, 2026

Innodata (Nasdaq: INOD) is a global data engineering company. We believe that data and Artificial Intelligence (AI) are inextricably linked. Our mission is to enable the responsible advancement of artificial intelligence by providing the data, evaluation frameworks, and human…

Generalist — Real World Understanding

Mercor

LLM Evaluator (English) Contractor

Critical Thinking Analytical Reasoning Spatial Reasoning +11

Remote (Global)
$34 - $40/hr
May 14, 2026

Mercor is seeking analytically minded generalists to help train AI systems on real-world reasoning and visual understanding tasks. This role focuses on evaluating AI performance across: multi-modal reasoning, real-world interpretation, spatial reasoning, common-sense problem…

Generalist

Mercor

LLM Evaluator (English) Contractor

Communication Skills Analytical Thinking Research +5

Remote (Global)
$35 – $45/hr
May 12, 2026

Mercor is hiring remote generalists to contribute to AI training and evaluation projects for leading AI labs and enterprises. This opportunity involves working on projects focused on improving frontier AI systems using human expertise across a variety of tasks and domains.…

AI Writing Evaluator (Domain Expert)

Volga Partners

LLM Evaluator (English) Contractor · Full-time Short-term

LLM Evaluation AI Writing Evaluation LLM Prompt Engineering +12

United States, Canada
$20 – $23/hr
May 11, 2026

About Volga Partners Volga Partners is a U.S.-based company specializing in Artificial Intelligence, machine learning, business process outsourcing, and professional services for leading technology companies and multinational organizations. Role Overview Volga Partners…

AI Evaluation & Annotation Specialist (Italian)

Volga Partners

Bilingual LLM Evaluator Contractor

Italian LLM Evaluation AI Annotation +12

Italy
$10 – $14/hr
May 11, 2026

Role Overview We are seeking detail-oriented AI Evaluation & Annotation Specialists to help train and improve Large Language Models (LLMs). In this role, you will review AI-generated responses, evaluate output quality, annotate content, and follow structured guidelines to…

Customer Support Specialist

DataAnnotation.tech

LLM Evaluator (English) Contractor Ongoing

Customer Support LLM Evaluation Writing +11

United States (US: VT)
$25–$30+/hr
May 7, 2026

About the Role DataAnnotation is seeking Customer Support Specialists to help train and improve next-generation AI systems. In this role, you will evaluate AI chatbot outputs, assess response quality, and help improve language accuracy, reasoning, and customer-facing…

Freelance Writer

DataAnnotation.tech

LLM Evaluator (English) Contractor Ongoing

Writing Editing Content Development +12

Canada
$25 – $30+/hr
May 7, 2026

About the Role DataAnnotation is hiring Freelance Writers to help train and improve advanced AI language models. In this role, you will create writing and editing tasks for AI systems, evaluate chatbot-generated responses, and help improve the overall quality, logic, and…

AI Content Writing Specialist

DataAnnotation.tech

LLM Evaluator (English) Contractor Ongoing

Content Writing Editing Proofreading +12

United States (US: CA, GA, IL +5)
$25 – $30+/hr
May 7, 2026

About the Role DataAnnotation is hiring AI Content Writing Specialists to help train and improve advanced AI language models. In this role, you will evaluate AI-generated writing, create editing and writing tasks, assess response quality, and help improve chatbot performance…

Generative AI Generalist

DataAnnotation.tech

LLM Evaluator (English) Contractor Ongoing

Generative AI LLM Evaluation AI Training +12

United States (US: MI)
$20+/hr
May 7, 2026

About the Role Surge AI is hiring Generative AI Generalists to help train and improve advanced AI chatbot systems. In this role, you will evaluate AI-generated responses, assess logical reasoning, provide writing and editing tasks, and help improve overall AI model quality and…

Query AI Generalist Trainer (Germany-Based | English & German Required)

RemoExperts

Bilingual LLM Evaluator Contractor

LLM Evaluation German English +11

Germany
$35 – $40/hr
May 7, 2026

About the Role Query AI is hiring Germany-based AI Generalist Trainers to support the evaluation and improvement of advanced AI systems. In this role, you will assess, compare, and rank AI-generated outputs across a variety of domains while helping improve the quality,…

Small Business Owners (AI Response Evaluation)

Turing

Response Rater Contractor Short-term

Business Analysis Critical Thinking LLM Evaluation +10

Remote (Global)
Not specified
May 7, 2026

About Turing Turing is one of the world’s fastest-growing AI companies helping advance frontier AI systems and build real-world AI solutions for businesses worldwide. The company collaborates with leading AI labs to improve reasoning, coding, multimodal understanding,…
