Seeking mathematicians (BS to PhD) to develop advanced mathematical models and evaluation benchmarks for frontier LLMs, covering algebra, number theory, topology, analysis, probability, and applied mathematics. Availability of 4+ hours/day with PST overlap.
STEM Computational Scientific Software & Evaluation Design — Computational Bayesian Statistics and Applied Mathematics
Job description
About the Project
We're building a large-scale evaluation benchmark for advanced AI reasoning across scientific and engineering domains. Task designers create challenging computational problems that test whether AI systems can use real scientific software tools to solve research-grade problems.
Domain & Tools
Computational Bayesian Statistics and Applied Mathematics: work with Bayesian statistics libraries (PyMC, PyStan, PyJAGS, CmdStanPy); applied mathematics and numerical PDE software (FEniCS, FEniCSx, DOLFINx, scikit-fem, FiPy, Devito, Dedalus); computational topology (GUDHI); or differential algebra (DACEyPy). Relevant experience includes MCMC, Bayesian modelling, finite element/finite difference methods, mesh-based numerical modelling, or computational topology. Candidates need not know every package; deep expertise in any one of them is highly valued.
Requirements
- Graduate-level training in a relevant STEM domain (MS, PhD, or equivalent)
- Demonstrated proficiency with at least one listed scientific software library
- Strong Python programming skills
- Available for at least 15–20 hours per week
Contract & Payment Terms
- Independent contractor, fully remote, weekly payments via Stripe or Wise
- Unable to support H-1B or STEM OPT candidates