Please submit your CV in English and indicate your level of English proficiency. Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems.
STEM Computational Scientific Software & Evaluation Design — Computational Chemistry & Electronic Structure
Job description
About the Project
We're building a large-scale evaluation benchmark for advanced AI reasoning across scientific and engineering domains. Task designers create challenging computational problems that test whether AI systems can use real scientific software tools to solve research-grade problems — from querying simulations and interpreting outputs to designing experimental strategies and recovering hidden information from data.
This is not a typical annotation or labeling role. You'll be designing original, graduate-level computational problems grounded in real scientific workflows, calibrating them against frontier AI models, and iterating on problem design until the difficulty is right.
What You'll Do
- Design problems that require sophisticated use of domain-specific scientific software libraries
- Create problems that require precise multi-step scientific workflows, as well as harder problems that require strategic experimental design
- Test each task against state-of-the-art AI models and refine until difficulty hits the target range
Domain & Tools
Domain: Quantum chemistry calculations with PySCF, including Hartree-Fock, DFT, TDDFT, CASSCF, and post-HF methods. Ideal candidates can design problems around excited-state analysis, orbital diagnostics, method selection for difficult electronic structures, and interpretation of computational artifacts that arise from method limitations.
Requirements
- Graduate-level training in a relevant STEM domain (MS, PhD, or equivalent research experience)
- Demonstrated proficiency with at least one listed scientific software library, evidenced by research publications, open-source contributions, or professional work
- Strong Python programming skills — writing problem setups, oracle functions, and solution validators
- Ability to work independently and iterate on problem designs based on calibration feedback
- Comfortable working in a Linux/terminal environment with remote compute sandboxes
- Available for at least 15–20 hours per week
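For a sense of what "oracle functions and solution validators" can mean in practice, here is a minimal sketch in plain Python. It is only an illustration, not the project's actual harness: the names `Task`, `oracle_excitation_ev`, and `validate`, the tolerance, and the 0.30 Hartree input are all hypothetical placeholders rather than results of a real calculation.

```python
import math
from dataclasses import dataclass

# Hartree-to-electronvolt conversion factor (CODATA 2018).
HARTREE_TO_EV = 27.211386245988


@dataclass(frozen=True)
class Task:
    """Hypothetical container for one benchmark problem."""
    prompt: str
    reference_ev: float  # answer produced by the oracle
    abs_tol_ev: float    # accepted absolute deviation, in eV


def oracle_excitation_ev(energy_hartree: float) -> float:
    """Hypothetical oracle: convert an excitation energy from Hartree to eV.

    In a real task the input would come from an actual calculation
    (for example a TDDFT run); here it is just a number.
    """
    return energy_hartree * HARTREE_TO_EV


def validate(task: Task, submitted_ev) -> bool:
    """Accept a submission only if it is a finite number within tolerance."""
    if isinstance(submitted_ev, bool) or not isinstance(submitted_ev, (int, float)):
        return False
    if not math.isfinite(submitted_ev):
        return False
    return abs(submitted_ev - task.reference_ev) <= task.abs_tol_ev


# Placeholder problem: 0.30 Hartree is an assumed value, not a computed one.
task = Task(
    prompt="Report the lowest singlet excitation energy in eV.",
    reference_ev=oracle_excitation_ev(0.30),
    abs_tol_ev=0.05,
)

print(validate(task, 8.16))          # within tolerance of the reference
print(validate(task, 8.30))          # outside tolerance
print(validate(task, float("nan")))  # rejected as non-finite
```

Keeping the oracle separate from the validator lets the tolerance be tuned during calibration without recomputing the reference answer.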
Nice to Have
- Experience across multiple listed domains or tools
- Familiarity with benchmark or evaluation design
- Background in scientific pedagogy or exam/problem-set design
- Experience with computational reproducibility and containerized environments
Contract & Payment Terms
- Independent contractor engagement, fully remote
- Weekly payments via Stripe or Wise
- Projects may be extended, shortened, or concluded early based on needs and performance
- Unable to support H-1B or STEM OPT candidates
About Mercor
Mercor partners with leading AI labs and enterprises to train frontier models using human expertise. You will be paid competitively, collaborate with leading researchers, and help shape the next generation of AI systems in your area of expertise.