Systems & Infrastructure Specialist
Job description
About the Role
We are seeking a Systems & Infrastructure Specialist to support high-performance AI environments by building, managing, and optimizing complex infrastructure systems.
In this role, you will work within containerized environments, troubleshoot live systems, and ensure reliability and performance across compute-intensive workflows.
Key Responsibilities
- Troubleshoot and recover infrastructure using command-line tools
- Manage and orchestrate containerized environments (Docker) and CI/CD pipelines
- Build and optimize systems for AI model training workloads
- Respond to system failures and execute real-time recovery strategies
- Collaborate with engineering teams to ensure system reliability
- Document system architectures, incidents, and recovery processes
Requirements
- Strong experience in terminal-based system administration
- Expertise in containerized environments and DevOps workflows
- Proficiency in scripting or programming (Python, Bash, JavaScript, Go, Rust, or C/C++)
- Experience with build systems, databases, and distributed systems
- Strong troubleshooting and problem-solving skills
- Ability to work in high-pressure, real-time environments
Preferred Qualifications
- Experience in high-compute or AI/ML infrastructure environments
- Background in Site Reliability Engineering (SRE) or DevOps
- Familiarity with distributed systems and orchestration tools
Work Details
- Work Type: Fully remote
- Engagement: Contractor (project-based)
- Schedule: Flexible
Compensation
- $40 – $70/hour depending on experience
About the Opportunity
This role focuses on building and maintaining critical infrastructure systems that power advanced AI environments and high-throughput compute workloads.