Project Overview
We are sourcing independent Search Engine Evaluation Specialists to provide their expertise for an AI benchmark evaluation project.
Data Labeling Analyst IV
Job description
Role Overview
We are aiming to hire cultural analysts and quality experts to evaluate and improve our auto-judge AI models that assess search quality for media content across US and Canadian markets. This is not an entry-level position—we need people who can think like researchers, write like analysts, and understand internet culture like digital anthropologists.
This role directly impacts model architecture decisions. Your evaluations will be read by ML engineers, product managers, and senior leadership. Incomplete analysis or surface-level observations will not meet our quality bar.
Core Responsibilities
Expert Reviews
Evaluate AI model outputs across subjective quality dimensions, including relevance, tone, and intent, while maintaining >98% inter-rater reliability with product experts
Assess search prompt suggestions rendered on user-watched media content
Validate search result relevance for various query types
Identify and categorize model failure patterns with technical precision
Compile actionable reports that bridge the gap between data observations and engineering solutions
Analysis & Reporting
Document nuanced cultural context with specific examples and pattern evidence
Create structured feedback that traces failure modes to root causes
Categorize errors systematically, providing statistical distribution of failure types
Deliver qualitative insights backed by quantitative metrics
Write reports that require minimal clarification—your analysis should stand alone
Required Qualifications
Cultural Fluency
Expert-level understanding of Gen-Z and Millennial communication patterns, slang evolution, and memetic spread
Proven ability to detect humor, satire, irony, and tonal nuances across diverse content types and demographics
Comprehensive knowledge of locale-specific terms, cultural events, and regional variations across North America
Comfort consuming and analyzing trending social media content and comments
Track record of staying ahead of trends, not just following them
Analytical Skills
Experience conducting qualitative research or content analysis (academic, professional, or equivalent)
Advanced pattern recognition abilities—you see what others miss and can prove it systematically
Ability to build taxonomies and categorization systems from unstructured observations
Statistical thinking—you quantify patterns, not just describe them
Root cause analysis expertise—you don't just identify problems, you diagnose why they exist
Exceptional written communication—your reports require no follow-up questions
Technical Aptitude
Comfort with data platforms and complex evaluation tools (training provided, but you must be tech-savvy)
Conceptual understanding of how ML models work—you can reason about what models can/can't learn
Experience with QA, user research, content analysis, or similar structured evaluation work
Ability to write for technical audiences—engineers should be able to implement your recommendations
PST Preferred
Understanding Model Failure Modes
What are failure modes? These are consistent, identifiable patterns in how AI models make incorrect decisions. Your job is to spot these patterns and help us understand why they happen.
Common Failure Mode Categories You'll Identify:
1. Cultural Context Blindness
Example: Model rates "it's giving main character energy" as irrelevant to a video about confident behavior because it doesn't recognize Gen-Z slang
What you'll do: Flag when models miss cultural idioms, slang, or contextual phrases that change meaning
2. Sarcasm & Tone Misinterpretation
Example: Model suggests "helpful tutorial" for an obviously satirical "how-to" video mocking the format
What you'll do: Identify when models take ironic, sarcastic, or humorous content at face value
3. Comment Section Context Ignorance
Example: Model misses that top comments reveal the video is about a different topic than the title suggests (clickbait, inside jokes)
What you'll do: Document cases where comments provide critical context the model overlooks
4. Temporal/Trending Reference Failures
Example: Model doesn't connect "Roman Empire" searches to the viral TikTok trend about things men think about constantly
What you'll do: Catch when models miss current events, memes, or trending cultural moments
5. Regional/Locale-Specific Misses
Example: Model doesn't understand "double-double" in Canadian context or regional slang like "wicked" (New England) vs standard usage
What you'll do: Identify geographical or regional context the model lacks
6. Intent Mismatch
Example: User searches "Ariana Grande" on a beauty video, but the model doesn't recognize they're looking for makeup tutorials inspired by her style
What you'll do: Flag when models miss the why behind search behavior
7. Multi-Modal Understanding Gaps
Example: Model evaluates relevance based only on text, missing that visual content or audio contradicts or adds crucial context
What you'll do: Identify when models need to "watch" not just "read"
Model Failure Modes Categorization Framework
For each failure you identify, you'll document:
WHAT Failed
Which quality dimension? (relevance, tone, intent)
What did the model get wrong?
WHY It Failed
What context/knowledge was missing?
Which failure mode category does this fit?
HOW OFTEN It Happens
Is this a one-off edge case or a systematic pattern?
Does it affect specific content types/demographics?
IMPACT Level
Minor: Slightly off but understandable
Moderate: Clearly wrong, hurts user experience
Severe: Completely misses the mark, could be offensive/harmful
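As an illustration only, the four-part framework above could be captured in a small record type, with the share of each failure category computed across records. This is a minimal sketch, not the project's actual tooling: the field names, the `FailureRecord` type, the `category_distribution` helper, and the sample records are all hypothetical and invented for this example.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Impact(Enum):
    MINOR = "minor"        # slightly off but understandable
    MODERATE = "moderate"  # clearly wrong, hurts user experience
    SEVERE = "severe"      # completely misses the mark, could be offensive/harmful


@dataclass
class FailureRecord:
    # Hypothetical field names; the posting does not prescribe a schema.
    dimension: str        # WHAT failed: "relevance", "tone", or "intent"
    description: str      # WHAT the model got wrong
    missing_context: str  # WHY: context or knowledge the model lacked
    category: str         # WHY: which failure mode category this fits
    systematic: bool      # HOW OFTEN: True for a pattern, False for a one-off
    impact: Impact        # IMPACT level


def category_distribution(records):
    """Return each failure mode category's share of all records."""
    counts = Counter(r.category for r in records)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


# Made-up sample records, for illustration only.
records = [
    FailureRecord("relevance", "Rated slang query as irrelevant",
                  "Gen-Z slang", "Cultural Context Blindness", True, Impact.MODERATE),
    FailureRecord("tone", "Missed an idiom that changes meaning",
                  "Contextual phrase", "Cultural Context Blindness", True, Impact.MINOR),
    FailureRecord("intent", "Missed the reason behind a celebrity search",
                  "Style-inspired tutorials", "Intent Mismatch", False, Impact.MODERATE),
]

distribution = category_distribution(records)
for category, share in distribution.items():
    print(f"{category}: {share:.0%}")
```

Keeping the category as a plain string is a simplification; a fixed list of the seven categories would catch typos at entry time.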
Example Scenarios:
Scenario 1: Complex Multi-Factor Analysis
Model fails on parody content in 40% of cases, succeeds in 60%
Expected output:
Build a taxonomy of parody types (deadpan, exaggerated, format parody, etc.)
Identify which types model handles well vs. poorly
Analyze linguistic/visual markers of each type
Determine decision boundary where model fails
Map to specific model limitations
Recommend graduated improvement approach
Format: 5-page analysis with data visualizations
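One way to start on "which types the model handles well vs. poorly" is to break the overall failure rate down by parody type. The sketch below assumes each evaluation is a pair of parody type and whether the model judged it correctly. The `failure_rate_by_type` helper and the sample data are hypothetical, chosen only so that the overall failure rate matches the 40% in the scenario.

```python
from collections import defaultdict


def failure_rate_by_type(evaluations):
    """Compute the failure rate per parody type.

    evaluations: iterable of (parody_type, model_was_correct) pairs.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for parody_type, model_was_correct in evaluations:
        totals[parody_type] += 1
        if not model_was_correct:
            failures[parody_type] += 1
    return {t: failures[t] / totals[t] for t in totals}


# Made-up sample: 10 evaluations, 4 failures (40% overall).
sample = (
    [("exaggerated", True)] * 5
    + [("deadpan", False)] * 3
    + [("deadpan", True)] * 1
    + [("format parody", False)] * 1
)

rates = failure_rate_by_type(sample)
for parody_type, rate in sorted(rates.items(), key=lambda item: item[1]):
    print(f"{parody_type}: {rate:.0%} failure rate")
```

In this invented sample the 40% aggregate hides a sharp split: the model never fails on exaggerated parody but fails on most deadpan cases, which is the kind of boundary the analysis is meant to surface.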
Scenario 2: Trend Analysis
New slang term emerging, model doesn't recognize it yet
Expected output:
Track term origin and spread (platforms, demographics, geography)
Predict longevity (flash in the pan vs. lasting)
Assess urgency of model update
Document semantic meaning and usage contexts
Identify related terms model should learn
Format: 1-page rapid brief (time-sensitive)
This posting is for a contract assignment with Tundra Technical Solutions to provide services to Meta.
The pay range that Tundra in good faith reasonably expects to pay for this position is $21.77/hour - $30.06/hour.
Tundra’s benefits offering includes optional medical, dental, vision, retirement benefits, up to 15 Days PTO per annum and a New Child Benefit.
Pursuant to the California Fair Chance Act, Los Angeles County Fair Chance Ordinance for Employers, Los Angeles Fair Chance Initiative for Hiring Ordinance, and San Francisco Fair Chance Ordinance, qualified applicants will be considered for assignment with arrest and conviction records. Criminal history may have a direct, adverse, and negative relationship with some of the material job duties of this position. These include the duties and responsibilities listed above, as well as the abilities to adhere to company policies, exercise sound judgment, effectively manage stress and work safely and respectfully with others, exhibit trustworthiness, meet client expectations, standards, and accompanying requirements, and safeguard business operations and company reputation.
Tundra Technical Solutions is among North America’s leading providers of Staffing and Consulting Services. Our success and our clients’ success are built on a foundation of service excellence. We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic. Qualified applicants with arrest or conviction records will be considered for employment in accordance with applicable law, including the Los Angeles County Fair Chance Ordinance for Employers and the California Fair Chance Act. Unincorporated LA County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: safeguard client-provided property, including hardware (both of which may include data), entrusted to you from theft, loss or damage; return all portable client computer hardware in your possession (including the data contained therein) upon completion of the assignment; and maintain the confidentiality of client proprietary, confidential, or non-public information. In addition, job duties require access to secure and protected client information technology systems and related data security obligations.
The Meta CWX Program is enabled by a cutting-edge software platform called TalentNet that leads the contingent labor world for technology innovation. The software platform leverages Machine Learning and Artificial Intelligence to make sure the right people end up in the right job.
At Meta, we are constantly iterating, solving problems, and working together to connect people all over the world. That’s why it’s important that our workforce reflects the diversity of the people we serve. Hiring people with different backgrounds and points of view helps us make better decisions, build better products, and create better experiences for everyone.
We give people the power to build community and bring the world closer together. Our products empower more than 3 billion people around the world to share ideas, offer support, and make a difference.
Please note that this is not a full-time employment opportunity. Candidates selected for this role will be engaged as contractors for the specified duration of the project. For any inquiries regarding the terms of the contract or engagement, please contact Tundra Technical Solutions directly.