Remote part-time opportunity for contributors interested in evaluating, labeling, summarizing, ranking, and improving generative AI and large language model systems through flexible project-based work.
Research Intern, Multimodal LLM Benchmarking
Job description
Centific is hiring a Research Intern focused on Multimodal LLM Benchmarking to contribute to advanced AI evaluation research involving multimodal foundation models.
This internship focuses on designing, executing, and analyzing benchmark systems for AI models operating across:
- Text
- Images
- Audio
- Video
- Cross-modal retrieval systems
Contributors will work on cutting-edge multimodal evaluation problems involving benchmark design, scoring methodologies, dataset curation, and AI model analysis.
About Centific
Centific is an AI data and infrastructure company specializing in:
- AI evaluation
- Data curation
- Fine-tuned LLMs
- RAG pipelines
- Enterprise AI deployment
The company works with enterprise clients and frontier AI organizations to support safe and scalable AI systems.
Centific’s ecosystem includes:
- 150+ PhDs and data scientists
- 4,000+ AI practitioners and engineers
- 1.8 million domain experts across 230+ markets
Responsibilities
Multimodal Benchmark Design & Development
Design benchmark suites for multimodal foundation models involving:
- Text-image tasks
- Text-audio tasks
- Text-video tasks
- Cross-modal retrieval systems
Define:
- Evaluation formats
- Annotation guidelines
- Scoring criteria
- Benchmark coverage dimensions
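To give a concrete (if simplified) sense of this work, the sketch below shows one way a benchmark item and suite could be represented in Python. The schema and field names (BenchmarkItem, modality, rubric, and so on) are illustrative assumptions, not an existing Centific format.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class BenchmarkItem:
    """One multimodal evaluation example (illustrative schema only)."""
    item_id: str
    modality: str                      # e.g. "text-image", "text-audio", "text-video"
    task: str                          # e.g. "vqa", "captioning", "retrieval"
    prompt: str                        # instruction shown to the model
    media_path: Optional[str] = None   # path to the image/audio/video asset, if any
    references: list = field(default_factory=list)  # gold answers
    rubric: str = ""                   # scoring criteria for annotators or judge models

@dataclass
class BenchmarkSuite:
    """A named collection of items plus a simple coverage check."""
    name: str
    items: list = field(default_factory=list)

    def coverage(self):
        """Count items per modality to inspect benchmark coverage dimensions."""
        counts = {}
        for item in self.items:
            counts[item.modality] = counts.get(item.modality, 0) + 1
        return counts
```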
Benchmark Execution & Analysis
Run multimodal models against benchmark suites
Analyze:
- Performance trends
- Failure modes
- Evaluation outcomes
Produce actionable research summaries and recommendations
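As a simplified illustration of this kind of benchmark run, the sketch below loops a model over benchmark items and tallies per-task accuracy and failure cases. The item format and the model_fn callable are assumptions standing in for whatever model interface a given project uses.

```python
from collections import defaultdict

def evaluate_suite(items, model_fn):
    """Run a model over benchmark items and summarize accuracy and failures.

    `items` is a list of dicts with "task" and "references" keys (plus whatever
    the model needs); `model_fn` is any callable mapping an item to a predicted
    answer string -- the actual model call is abstracted away here.
    """
    per_task = defaultdict(lambda: {"correct": 0, "total": 0})
    failures = []

    for item in items:
        prediction = model_fn(item).strip().lower()
        gold = [ref.strip().lower() for ref in item["references"]]
        correct = prediction in gold  # simple exact match, for illustration only

        stats = per_task[item["task"]]
        stats["total"] += 1
        if correct:
            stats["correct"] += 1
        else:
            failures.append({"item": item, "prediction": prediction})

    accuracy = {task: s["correct"] / s["total"] for task, s in per_task.items()}
    return accuracy, failures
```

Grouping the failure cases by task or modality is then the starting point for the failure-mode analysis described above.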
Metric & Scoring Research
Investigate automated evaluation approaches including:
- Model-as-judge systems
- Reference-free metrics
- Human alignment evaluation
Evaluate trade-offs involving:
- Reliability
- Validity
- Scalability
- Evaluation cost
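For illustration, the sketch below shows the general shape of a model-as-judge scorer: a rubric prompt, a judge call (abstracted here as a judge_fn callable, since the actual judge model and API vary by project), and score parsing. Single-digit rubric scoring like this trades validity for scalability, which is exactly the kind of trade-off this role investigates.

```python
import re

JUDGE_PROMPT = """You are grading a model's answer to a multimodal question.
Question: {question}
Reference answer: {reference}
Model answer: {answer}
Rate the answer from 1 (wrong) to 5 (fully correct) and reply with only the number."""

def judge_score(question, reference, answer, judge_fn):
    """Score one answer with a judge model.

    `judge_fn` stands in for whatever LLM call a project uses; it takes a
    prompt string and returns the judge's raw text reply.
    """
    reply = judge_fn(JUDGE_PROMPT.format(question=question,
                                         reference=reference,
                                         answer=answer))
    match = re.search(r"[1-5]", reply)   # parse the first rating digit, if any
    return int(match.group()) if match else None
```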
Dataset Curation & QA
Support:
- Data collection
- Annotation workflows
- Dataset filtering
- Inter-rater reliability analysis
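As one small example of this QA work, inter-rater reliability is often checked with Cohen's kappa; the sketch below uses scikit-learn's cohen_kappa_score on toy labels from two hypothetical annotators.

```python
from sklearn.metrics import cohen_kappa_score

# Labels assigned by two annotators to the same five benchmark items
# (toy data for illustration only).
annotator_a = ["correct", "incorrect", "correct", "partial", "correct"]
annotator_b = ["correct", "incorrect", "partial", "partial", "correct"]

# Cohen's kappa corrects raw agreement for agreement expected by chance;
# values closer to 1.0 indicate stronger inter-rater reliability.
kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")
```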
Literature Review & Methodology
- Survey multimodal evaluation literature
- Identify gaps in existing benchmark systems
- Propose novel evaluation approaches grounded in current research
Documentation & Communication
Produce, for technical and non-technical audiences:
- Research write-ups
- Benchmark documentation
- Internal reports
- Presentation-ready summaries
Focus Areas
Depending on project alignment, contributors may work on:
Vision-Language Evaluation
- Image captioning
- Visual question answering
- Chart reasoning
- Image-text alignment
Audio & Speech-Language Benchmarking
- Spoken language understanding
- Audio captioning
- Speech-text evaluation
Video Understanding
- Temporal reasoning
- Video QA
- Video-text retrieval
Cross-Modal Robustness
- Distribution shift testing
- Adversarial multimodal inputs
- Robustness analysis
Automated Multimodal Scoring
- Judge-model evaluation systems
- Open-ended multimodal generation evaluation
Required Qualifications
- Currently enrolled in an MS or PhD program in Computer Science, Machine Learning, Statistics, AI, Linguistics, or a related quantitative field
Experience with:
- Multimodal models
- NLP systems
- Vision-language systems
- Audio or video ML tasks
Exposure to:
- Benchmark design
- Model evaluation
- Experimental analysis
Strong Python skills
Experience with PyTorch and Hugging Face Transformers
Basic statistical analysis knowledge
Strong written and verbal communication skills
Preferred Qualifications
Experience with multimodal models such as:
- LLaVA
- GPT-4o
- Gemini
- Flamingo
Familiarity with benchmarks including:
- MMBench
- MMMU
- SeedBench
- VQAv2
- AudioCaps
- ActivityNet-QA
Experience with:
- Annotation tools
- Human evaluation workflows
- Model-as-judge systems
Research publications or open-source contributions in:
- Multimodal ML
- NLP
- AI evaluation
What You'll Gain
- Mentorship from senior AI researchers and ML engineers
- Ownership of publishable multimodal research projects
- Exposure to enterprise AI workflows and applied research teams
- Potential co-authorship opportunities
- Flexible remote work arrangement
- Competitive internship compensation