About the Role
Pareto AI is seeking a Technical Data Delivery Lead to design, operate, and optimize advanced AI data pipelines for leading AI research labs.
This is a highly technical role focused on building and improving data workflows used in training large language models (LLMs). You will work directly with AI researchers and engineering teams to develop scalable systems for data collection, evaluation, and model improvement.
Responsibilities
Pipeline Architecture
- Design end-to-end pipelines for:
  - RLHF (Reinforcement Learning from Human Feedback)
  - SFT (Supervised Fine-Tuning)
  - Red-teaming and model evaluation
- Define annotation schemas, rubrics, and quality frameworks
- Identify risks and optimize workflows before deployment
Agentic System Deployment
- Build and deploy AI agents to automate pipeline tasks
- Develop systems for:
  - Quality control
  - Expert routing
  - Output validation
  - Throughput monitoring
- Work with engineering teams to integrate agentic workflows
Quality Systems
- Define and maintain data quality standards
- Implement:
  - Inter-rater reliability metrics
  - Calibration systems
  - Statistical sampling methods
- Design automated quality checks and validation layers
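As an illustration of the inter-rater reliability metrics named above (this sketch is not part of the role description), agreement between two annotators is commonly measured with Cohen's kappa, which corrects raw agreement for chance. A minimal version, assuming two equal-length lists of categorical labels:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: two-annotator agreement, corrected for chance."""
    assert labels_a and len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled the same.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's marginal label counts.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)
```

Values near 1.0 indicate strong agreement; values near 0 indicate agreement no better than chance. In production quality systems this would typically come from an established library rather than a hand-rolled function.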
Client Collaboration
- Work directly with AI researchers and technical stakeholders
- Translate research requirements into operational workflows
- Communicate performance metrics and risks clearly
Research Integration
- Stay updated on advancements in:
  - LLM training methodologies
  - Evaluation frameworks
  - AI data tooling
- Integrate new approaches into active pipelines
Requirements
- Strong proficiency in Python and SQL
- Deep understanding of:
  - LLM training (RLHF, SFT)
  - Prompt engineering and output behavior
- Experience with agentic frameworks such as:
  - LangChain
  - DSPy
  - AutoGen
- Experience designing or managing data/ML pipelines
- Strong written communication and documentation skills
- Ability to operate in fast-paced, evolving environments
Preferred Qualifications
- Experience with:
  - RL environments and red-teaming workflows
  - Data engineering or ML research support
  - Production-level AI agent systems
- Knowledge of:
  - Inter-rater reliability
  - Annotation quality frameworks
- Experience in client-facing or technical program roles
Compensation
- $120,000 – $160,000 annually
- Equity included
- Remote, US-based role
About Pareto AI
Pareto AI builds human-in-the-loop data pipelines that power the next generation of AI systems. The company collaborates with leading AI organizations, including research teams from Anthropic, Character.AI, and top universities, to create scalable and ethical AI training infrastructure.
Pareto focuses on combining human expertise with AI systems to improve model performance, reliability, and real-world applicability.