Curated remote opportunities

Remote LLM & Agent Evaluation jobs

Every LLM & Agent Evaluation listing below is remote and traceable to a real, verified employer. 108 open right now, updated daily; when a role goes stale it comes down within 24 hours.

LLM & Agent Evaluation is part of our AI & Data Training domain. Filter by pay, experience level, or country to narrow the list.

80 remote jobs found
LLM & Agent Evaluation Part-time +45 regions
English AI Testing +12 Infrastructure History Quality Assurance Engineering Writing Python FastAPI JavaScript TypeScript React Docker PostgreSQL
  • Ireland, Belgium, Denmark, Finland, Norway, Sweden
  • $50/hr
  • Jun 30, 2026

Please submit your CV in English and indicate your level of English proficiency. Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems.

AI Training Creativity +12 Make Flexible Writing Organization Reasoning Accuracy Consistency Prompt Writing Communication Journalism Education Communications
  • Remote (Global)
  • $40 – $120/hr
  • Jun 29, 2026

Prompt Writer (AI Training) About the Role What if your curiosity, creativity, and way with words could directly shape how the world's most advanced AI systems think, reason, and respond?

AI Make Algorithms +12 Flexible Communication Organization Accuracy Reasoning Testing Writing Editing Journalism Research Assurance AI Chatbots
  • Remote (Global)
  • $40 – $120/hr
  • Jun 29, 2026

AI Chatbot Tester About the Role What if your opinions could make AI smarter, safer, and more useful for millions of people? As an AI Chatbot Tester with Alignerr, that's exactly what you'll do.

Critical Reading Attention to Detail Instruction Following +3 Analytical Reasoning Structured Evaluation Written communication
  • United States
  • $35 – $40/hr
  • Jun 25, 2026

Role Overview Mercor is partnering with a leading AI research laboratory to improve the quality of AI-generated document evaluation systems through structured human review. This role is designed for highly detail-oriented professionals who can critically assess AI-generated…

Bilingual LLM Evaluator Contractor · Part-time
Spanish English LLM Evaluation +11 Quality Assurance Latin American Localization Internet Culture Analysis Social Media Analysis AI Training Translation Review Cultural Localization Content Evaluation Internet Slang Meme Analysis Multilingual Communication
  • Remote (Global)
  • $25/hr
  • Jun 19, 2026

Mercor is seeking Internet-Native Bilingual Evaluator Experts (Spanish–English) to contribute to cutting-edge AI research focused on multilingual communication, internet culture, and real-world language use. In this role, you will evaluate, annotate, and improve AI model…

Bilingual LLM Evaluator Contractor · Part-time
French English LLM Evaluation +11 Quality Assurance Localization QA Internet Culture Analysis Social Media Analysis AI Training Translation Review Cultural Localization Content Evaluation Internet Slang Meme Analysis Multilingual Communication
  • Remote (Global)
  • $50/hr
  • Jun 19, 2026

Mercor is seeking Internet-Native Bilingual Evaluator Experts (French–English) to contribute to advanced AI research focused on multilingual communication, internet culture, and real-world language use. This role is ideal for individuals who actively participate in both French-…

Bilingual LLM Evaluator Contractor · Part-time
Japanese English LLM Evaluation +11 Quality Assurance Localization QA Internet Culture Analysis Social Media Analysis AI Training Translation Review Cultural Localization Content Evaluation Internet Slang Meme Analysis Multilingual Communication
  • Remote (Global)
  • $30 – $40/hr
  • Jun 19, 2026

Mercor is seeking Internet-Native Bilingual Evaluator Experts (Japanese–English) to contribute to advanced AI research focused on multilingual communication, internet culture, and contemporary digital language use. This role is ideal for individuals who actively participate in…

Bilingual LLM Evaluator Contractor · Part-time Short-term
Prompt Engineering AI Evaluation Annotation +7 Content Moderation Quality Assurance Linguistics Critical Thinking Content Analysis Research Written communication
  • Remote (Global)
  • $15/hr
  • Jun 18, 2026

About Turing Based in San Francisco, California, Turing is a leading research accelerator supporting frontier AI laboratories and enterprises deploying advanced AI systems. Turing helps organizations improve AI performance through high-quality datasets, advanced training…

LLM & Agent Evaluation Contractor · Part-time Ongoing
Linux Ubuntu Data Annotation +9 Quality Assurance Attention to Detail AI Model Evaluation App Installation Remote Work Browser Tools Technical Troubleshooting Documentation AI & Data Training
  • Albania, Argentina, Armenia, Bangladesh, Bhutan, Bolivia, Bosnia and Herzegovina, Brazil, Bulgaria, Cambodia, Chile, Colombia, Croatia, Ecuador, Estonia, Georgia, Guyana, Hungary, India, Indonesia, Kosovo, Latvia, Lithuania, Malaysia, Moldova, Montenegro, Nepal, North Macedonia, Pakistan, Paraguay, Peru, Poland, Romania, Serbia, Singapore, Slovakia, Slovenia, Sri Lanka, Suriname, Thailand, Czech Republic, Philippines, United States, Timor-Leste, Ukraine, Uruguay, Vietnam
  • Up to $20/hr
  • Jun 17, 2026

As an AI Data Trainer with Linux/Ubuntu, you will work remotely on hourly-paid AI data training projects involving structured, detail-oriented tasks. Responsibilities include installing applications, following detailed instructions, and recording actions while completing…

Kurdish Translation Editing +7 Proofreading Linguistic analysis Localization Written communication Data Annotation Language Review Cultural Consulting
  • Remote (Global)
  • $40 – $95/hr
  • Jun 16, 2026

Job Summary As a Kurdish Language Expert, you will leverage your linguistic expertise to help train next-generation AI systems through translation, editing, linguistic analysis, data annotation, and language quality evaluation tasks. Your contributions will help AI systems…

Wolof Translation Localization +7 Editing Proofreading Writing Linguistic Review Quality Assurance Cultural Adaptation Communication
  • Remote (Global)
  • $40 – $95/hr
  • Jun 16, 2026

Job Summary As a Wolof Language Expert, you will use your linguistic expertise to help train next-generation AI systems through high-quality translation, localization, writing, editing, and language evaluation tasks. Your work will contribute directly to improving how AI…

LLM & Agent Evaluation Contractor Ongoing
LLM Evaluation Rubric Design Quality Assessment +9 Prompt Engineering AI Training Critical Thinking Written communication Benchmark Development Human Feedback Productivity Career Planning Health Research
  • United States
  • $50 – $200/hr
  • Jun 15, 2026

About the Role A leading AI research organization is seeking experienced AI power users to evaluate how effectively advanced language models perform real-world, highly personalized life-assistance tasks. Contributors will assess AI outputs across domains such as productivity,…

AI English Testing +12 Infrastructure History Quality Assurance Engineering Writing Python FastAPI JavaScript TypeScript React Docker PostgreSQL
  • United Kingdom
  • Up to $50/hr
  • Jun 15, 2026

Please submit your CV in English and indicate your level of English proficiency. Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems.

Korean AI Evaluation Data Annotation +10 Content Moderation LLM Prompt Engineering AI Personalization Quality Assurance Critical Thinking Analytical Reasoning Side-by-Side Evaluation Content Review Research English
  • Remote (Global)
  • $15/hr
  • Jun 14, 2026

About the Role Turing is seeking Korean-speaking AI Quality Analysts to evaluate a new personalization feature for Gemini. In this role, you will assess how effectively the AI uses information from a user's Gemini conversations, Gmail, Google Search history, and YouTube activity…

Red Team / Safety Evaluator Contractor Short-term
Data Annotation Technical Writing Cybersecurity +12 LLM Evaluation English AI Safety Prompt Injection Jailbreaking Adversarial Testing Vulnerability Analysis Risk Management Safety Benchmarking Marathi Annotation Security Research
  • Remote (Global)
  • $20 – $22/hr
  • Jun 11, 2026

Mercor is seeking bilingual AI Safety Experts fluent in both English and Marathi to help evaluate and strengthen the safety of frontier AI systems. This role focuses on red teaming AI models by identifying vulnerabilities, testing misuse scenarios, and generating high-quality…

Bilingual LLM Evaluator Contractor Short-term
Proofreading Fact-Checking Research +12 Critical Thinking Data Annotation LLM Evaluation Dutch English Editing Quality Assurance Transcription Annotation Accuracy Instruction Following Analytical Reasoning
  • United Kingdom, United States, Singapore
  • $50/hr
  • Jun 11, 2026

Mercor is seeking Dutch Audio Generalist Evaluator Experts to contribute to a high-impact AI research project focused on audio understanding, transcription, annotation, and model evaluation. Contributors will help train and benchmark advanced language models by converting audio…

Bilingual LLM Evaluator Contractor · Part-time
Transcription Localization Editing +11 Proofreading Fact-Checking Quality Assurance LLM Evaluation Research English Argentinian Spanish Annotation Audio Evaluation Linguistics Rubric Development
  • United Kingdom, United States
  • $50/hr
  • Jun 11, 2026

Mercor is seeking an Argentinian Spanish Audio Generalist Evaluator Expert to contribute to a high-impact AI research project focused on audio understanding, transcription, annotation, and model evaluation. This role supports the training and benchmarking of advanced language…

Agent System Evaluator Contractor · Full-time
AI Training Engineering +12 Teaching Flexible Organization Reasoning Testing Python JavaScript TypeScript Java C++ Go Rust
  • Remote (Global) (US: WA)
  • $10 – $40/hr
  • Jun 11, 2026

Senior Software Engineer — Agentic Coding (AI Training) About the Role What if your software engineering expertise could define how the next generation of AI writes, debugs, and ships code on its own? We're looking for Senior Software Engineers in Seattle to work at the…

Bilingual LLM Evaluator Contractor · Part-time
  • Remote (Global)
  • Jun 10, 2026

About the Role In this remote contractor role, you'll apply your Nepali language expertise to help train next-generation AI systems. Your work will shape how models learn, reason, and perform through high-quality, real-world input. Key Responsibilities…

Red Team / Safety Evaluator Contractor Short-term
Cybersecurity LLM Evaluation Technical Writing +11 English AI Safety Prompt Injection Jailbreaking Adversarial Testing Vulnerability Analysis Risk Management Safety Benchmarking Odia Annotation Security Research
  • Remote (Global)
  • $20 – $22/hr
  • Jun 6, 2026

Mercor is seeking bilingual AI Safety Experts fluent in both English and Odia to help evaluate and strengthen the safety of frontier AI systems. This role focuses on red teaming AI models by identifying vulnerabilities, testing misuse scenarios, and generating high-quality…

Red Team / Safety Evaluator Contractor Short-term
LLM Evaluation Data Annotation Technical Writing +12 Cybersecurity English AI Safety Prompt Injection Jailbreaking Adversarial Testing Vulnerability Analysis Risk Management Safety Benchmarking Gujarati Annotation Security Research
  • Remote (Global)
  • $20 – $22/hr
  • Jun 6, 2026

Mercor is seeking bilingual AI Safety Experts fluent in both English and Gujarati to help evaluate and strengthen the safety of frontier AI systems. This role focuses on red teaming AI models by identifying vulnerabilities, testing misuse scenarios, and generating high-quality…

Red Team / Safety Evaluator Contractor Short-term
LLM Evaluation Data Annotation Technical Writing +12 Cybersecurity English AI Safety Prompt Injection Jailbreaking Adversarial Testing Vulnerability Analysis Risk Management Safety Benchmarking Assamese Annotation Security Research
  • Remote (Global)
  • $20 – $22/hr
  • Jun 6, 2026

Mercor is seeking bilingual AI Safety Experts fluent in both English and Assamese to help evaluate and strengthen the safety of frontier AI systems. This role focuses on red teaming AI models by identifying vulnerabilities, testing misuse scenarios, and generating high-quality…

Red Team / Safety Evaluator Contractor Short-term
Cybersecurity Data Annotation Technical Writing +12 LLM Evaluation English AI Safety Prompt Injection Jailbreaking Adversarial Testing Vulnerability Analysis Risk Management Safety Benchmarking Punjabi Annotation Security Research
  • Remote (Global)
  • $20 – $22/hr
  • Jun 6, 2026

Mercor is seeking bilingual AI Safety Experts fluent in both English and Punjabi to help evaluate and strengthen the safety of frontier AI systems. This role focuses on red teaming AI models by identifying vulnerabilities, testing misuse scenarios, and generating high-quality…

Red Team / Safety Evaluator Contractor Short-term
LLM Evaluation Data Annotation Technical Writing +12 Cybersecurity English AI Safety Prompt Injection Jailbreaking Adversarial Testing Vulnerability Analysis Risk Management Safety Benchmarking Malayalam Annotation Security Research
  • Remote (Global)
  • $20 – $22/hr
  • Jun 6, 2026

Mercor is seeking bilingual AI Safety Experts fluent in both English and Malayalam to help evaluate and strengthen the safety of frontier AI systems. This role focuses on red teaming AI models by identifying vulnerabilities, testing misuse scenarios, and generating high-quality…

Red Team / Safety Evaluator Contractor Short-term
Cybersecurity Data Annotation Technical Writing +12 LLM Evaluation English AI Safety Prompt Injection Jailbreaking Adversarial Testing Vulnerability Analysis Risk Management Safety Benchmarking Telugu Annotation Security Research
  • Remote (Global)
  • $20 – $22/hr
  • Jun 6, 2026

Mercor is seeking bilingual AI Safety Experts fluent in both English and Telugu to help evaluate and strengthen the safety of frontier AI systems. This role focuses on red teaming AI models by identifying vulnerabilities, testing misuse scenarios, and generating high-quality…

Red Team / Safety Evaluator Contractor Short-term
LLM Evaluation Data Annotation Technical Writing +12 Cybersecurity English AI Safety Prompt Injection Jailbreaking Adversarial Testing Vulnerability Analysis Risk Management Safety Benchmarking Tamil Annotation Security Research
  • Remote (Global)
  • $20 – $22/hr
  • Jun 6, 2026

Mercor is seeking bilingual AI Safety Experts fluent in both English and Tamil to help evaluate and strengthen the safety of frontier AI systems. This role focuses on red teaming AI models by identifying vulnerabilities, testing misuse scenarios, and generating high-quality…

Red Team / Safety Evaluator Contractor Short-term
LLM Evaluation Data Annotation Technical Writing +12 Cybersecurity English AI Safety Prompt Injection Jailbreaking Adversarial Testing Vulnerability Analysis Risk Management Safety Benchmarking Kannada Annotation Security Research
  • Remote (Global)
  • $20 – $22/hr
  • Jun 6, 2026

Mercor is seeking bilingual AI Safety Experts fluent in both English and Kannada to help evaluate and strengthen the safety of frontier AI systems. This role focuses on red teaming AI models by identifying vulnerabilities, testing misuse scenarios, and generating high-quality…

Red Team / Safety Evaluator Contractor Short-term
LLM Evaluation Data Annotation Technical Writing +12 Cybersecurity English AI Safety Prompt Injection Jailbreaking Adversarial Testing Vulnerability Analysis Risk Management Safety Benchmarking Urdu Annotation Security Research
  • Remote (Global)
  • $20 – $22/hr
  • Jun 6, 2026

Mercor is seeking bilingual AI Safety Experts fluent in both English and Urdu to help evaluate and strengthen the safety of frontier AI systems. This role focuses on red teaming AI models by identifying vulnerabilities, testing misuse scenarios, and generating high-quality…

Red Team / Safety Evaluator Contractor Short-term
LLM Evaluation Data Annotation Technical Writing +12 Cybersecurity English AI Safety Prompt Injection Jailbreaking Adversarial Testing Vulnerability Analysis Risk Management Safety Benchmarking Bengali Annotation Security Research
  • Remote (Global)
  • $20 – $22/hr
  • Jun 6, 2026

Mercor is seeking bilingual AI Safety Experts fluent in both English and Bengali to help evaluate and strengthen the safety of frontier AI systems. This role focuses on red teaming AI models by identifying vulnerabilities, testing misuse scenarios, and generating high-quality…

LLM Evaluator (English) Contractor · Part-time
LLM Evaluation Editing Proofreading +7 Critical Thinking Communication Attention to Detail Writing Content Development English Grammar Content Review
  • New Zealand
  • $25 – $30+/hr
  • Jun 4, 2026

DataAnnotation is seeking Secondary Education Teachers to help train and evaluate AI models. In this role, contributors will assess AI-generated responses, evaluate reasoning quality, and complete writing and editing tasks that help improve AI performance and accuracy.…

Agent System Evaluator Contractor · Full-time
LLM Evaluation Quality Assurance Technical Writing +10 Process Improvement Attention to Detail Rubric Development Structured Observation Research Analysis Documentation SaaS Tools Evaluation Frameworks English Communication Reporting
  • Remote (Global)
  • $20 – $35/hr
  • May 30, 2026

As an AI Evaluation Specialist, you will help train and evaluate next-generation AI systems by designing evaluation tasks, creating grading rubrics, and assessing AI performance across real-world computer-based workflows. This role focuses on structured thinking, precise…

Bilingual LLM Evaluator Contractor · Part-time
Transcription Balinese English +12 Linguistic Annotation Cultural Analysis context analysis Written communication Verbal communication Attention to Detail Organizational Skills Grammatical analysis Emotional tone analysis Remote Collaboration Timestamping Independent Work
  • Remote (Global)
  • $15 – $95/hr
  • May 27, 2026

In this role, you'll apply your expertise to help train next-generation AI systems. Your work will shape how models learn, reason, and perform through high-quality, real-world input. No prior experience in AI is required; your domain knowledge is what matters. Key…

Data Entry Attention to Detail Problem Solving +11 Adaptability English Communication Multitasking Typing Remote Collaboration Organizational Skills Workflow Management Documentation Fast-Paced Work Communication Skills General Operations
  • Remote (Global)
  • $15 – $25/hr
  • May 19, 2026

As a Generalist, you will contribute to the training and improvement of next-generation AI systems by providing high-quality real-world input, feedback, and operational support. This role is designed for adaptable professionals who can work efficiently across multiple tasks and…

LLM & Agent Evaluation Contractor Short-term
Analytical Reasoning Financial Analysis Rubric Evaluation +10 LLM Evaluation Strategic Analysis Decision Making Quantitative Reasoning Qualitative Assessment Written communication Enterprise Operations Critical Thinking Operational Analysis Attention to Detail
  • Remote (Global)
  • $60 – $85/hr
  • May 17, 2026

Mercor is seeking experienced enterprise professionals to contribute to an AI evaluation and benchmarking project in partnership with a leading AI lab. Contributors will evaluate the quality of AI-generated reasoning, decision-making, and analytical outputs across complex…

Acting Role-Play Improvisation +11 Emotional Communication Performing Arts Voice Acting Character Performance Conversational Roleplay Emotional Intelligence Communication Skills Drama Theater Performance Empathy AI Safety Testing
  • Remote (Global)
  • $40/deliverable
  • May 16, 2026

Vetto is hiring contributors for the Safety Project — Emotional Distress (Role-Play Track), an AI safety initiative focused on evaluating how AI systems respond during emotionally sensitive and high-risk conversations. The project centers on testing whether AI systems can…

LLM & Agent Evaluation Intern · Full-time
Multimodal AI LLM Evaluation Benchmark Design +10 Machine Learning Natural Language Processing Computer Vision PyTorch Hugging Face Transformers Data Annotation Statistical Analysis Research Methods Python Dataset Curation
  • United States
  • $40/hr
  • May 16, 2026

Centific is hiring a Research Intern focused on Multimodal LLM Benchmarking to contribute to advanced AI evaluation research involving multimodal foundation models. This internship focuses on designing, executing, and analyzing benchmark systems for AI models operating across:…

LLM & Agent Evaluation Contractor · Part-time
Data Annotation LLM Evaluation LLM Prompt Engineering +10 Content Evaluation Relevance Ranking Summarization Transcription Translation Generative AI Large Language Models Critical Thinking Communication Skills AI Training Data
  • United States
  • $15/hr
  • May 16, 2026

Innodata (Nasdaq: INOD) is a global data engineering company. We believe that data and Artificial Intelligence (AI) are inextricably linked. Our mission is to enable the responsible advancement of artificial intelligence by providing the data, evaluation frameworks, and human…

Critical Thinking Analytical Reasoning Spatial Reasoning +11 Visual Understanding Multi-Modal Evaluation Problem Solving LLM Evaluation Real-World Reasoning Contextual Analysis Logical Reasoning Attention to Detail Written communication Ambiguity Handling Research Skills
  • Remote (Global)
  • $34 – $40/hr
  • May 14, 2026

Mercor is seeking analytically minded generalists to help train AI systems on real-world reasoning and visual understanding tasks. This role focuses on evaluating AI performance across: Multi-modal reasoning Real-world interpretation Spatial reasoning Common-sense problem…

Communication Skills Analytical thinking Research +5 LLM Evaluation Problem Solving Attention to Detail Remote Collaboration Data Review
  • Remote (Global)
  • $35 – $45/hr
  • May 12, 2026

Mercor is hiring remote generalists to contribute to AI training and evaluation projects for leading AI labs and enterprises. This opportunity involves working on projects focused on improving frontier AI systems using human expertise across a variety of tasks and domains.…

LLM Evaluator (English) Contractor · Full-time Short-term
LLM Evaluation AI Writing Evaluation LLM Prompt Engineering +12 Content Evaluation Business Communication Academic Writing Critical Thinking Analytical Reasoning Quality Assurance Structured Feedback English Writing Editing Research Analysis Attention to Detail Evaluation Rubrics
  • United States, Canada
  • $20 – $23/hr
  • May 11, 2026

About Volga Partners Volga Partners is a U.S.-based company specializing in Artificial Intelligence, machine learning, business process outsourcing, and professional services for leading technology companies and multinational organizations. Role Overview Volga Partners…

Italian LLM Evaluation AI Annotation +12 Data Annotation Prompt Evaluation Content Review Localization Translation Quality Assurance Critical Thinking Reading Comprehension Attention to Detail Written communication AI Tools Structured Labeling
  • Italy
  • $10 – $14/hr
  • May 11, 2026

Role Overview We are seeking detail-oriented AI Evaluation & Annotation Specialists to help train and improve Large Language Models (LLMs). In this role, you will review AI-generated responses, evaluate output quality, annotate content, and follow structured guidelines to…

LLM Evaluator (English) Contractor Ongoing
Customer Support LLM Evaluation Writing +11 Editing Proofreading Grammar Brand Voice Research Fact-Checking Analytical thinking Attention to Detail Communication AI Training Quality Assurance
  • United States (US: VT)
  • $25 – $30+/hr
  • May 7, 2026

About the Role DataAnnotation is seeking Customer Support Specialists to help train and improve next-generation AI systems. In this role, you will evaluate AI chatbot outputs, assess response quality, and help improve language accuracy, reasoning, and customer-facing…

LLM Evaluator (English) Contractor Ongoing
Writing Editing Content Development +12 Proofreading Grammar Brand Voice AI Training LLM Evaluation Generative AI Content Review Journalism Communications Attention to Detail English Prompt Evaluation
  • Canada
  • $25 – $30+/hr
  • May 7, 2026

About the Role DataAnnotation is hiring Freelance Writers to help train and improve advanced AI language models. In this role, you will create writing and editing tasks for AI systems, evaluate chatbot-generated responses, and help improve the overall quality, logic, and…

LLM Evaluator (English) Contractor Ongoing
Content Writing Editing Proofreading +12 Grammar Brand Voice AI Training LLM Evaluation Generative AI Content Review Writing Journalism Communications Attention to Detail English Prompt Evaluation
  • United States (US: CA, GA, IL +5)
  • $25 – $30+/hr
  • May 7, 2026

About the Role DataAnnotation is hiring AI Content Writing Specialists to help train and improve advanced AI language models. In this role, you will evaluate AI-generated writing, create editing and writing tasks, assess response quality, and help improve chatbot performance…

LLM Evaluator (English) Contractor Ongoing
Generative AI LLM Evaluation AI Training +12 Prompt Evaluation Analytical thinking Writing Editing Research Attention to Detail Reasoning Content Review Quality Assurance English AI Chatbots Data Annotation
  • United States (US: MI)
  • $20+/hr
  • May 7, 2026

About the Role Surge AI is hiring Generative AI Generalists to help train and improve advanced AI chatbot systems. In this role, you will evaluate AI-generated responses, assess logical reasoning, provide writing and editing tasks, and help improve overall AI model quality and…

LLM Evaluation German English +11 Data Annotation AI Training Research Analytical thinking Attention to Detail Fact-Checking Prompt Evaluation Content Review Written communication Reasoning Generative AI
  • Germany
  • $35 – $40/hr
  • May 7, 2026

About the Role Query AI is hiring Germany-based AI Generalist Trainers to support the evaluation and improvement of advanced AI systems. In this role, you will assess, compare, and rank AI-generated outputs across a variety of domains while helping improve the quality,…

Response Rater Contractor Short-term
Business Analysis Critical Thinking LLM Evaluation +10 Prompt Writing Communication Research Customer Operations Business Operations Analytical thinking AI Tools Data Analysis Financial Planning Problem Solving
  • Remote (Global)
  • Not specified
  • May 7, 2026

About Turing Turing is one of the world’s fastest-growing AI companies helping advance frontier AI systems and build real-world AI solutions for businesses worldwide. The company collaborates with leading AI labs to improve reasoning, coding, multimodal understanding,…

Tips for finding remote jobs

  • Set up job alerts on multiple platforms to never miss an opportunity.
  • Highlight your remote work experience and self-management skills.
  • Prepare for video interviews and remote work assessments.
  • Customize your resume and cover letter for each remote position.
  • Build a strong online presence on LinkedIn and professional networks.
