Site Reliability Engineer Job at SaidGig, Remote

  • SaidGig
  • Remote

Job Description

Role Overview

As a Site Reliability Engineer, you will play a crucial role in training and optimizing AI models within advanced containerized infrastructures. This position focuses on real-time troubleshooting and dynamic process recovery, providing an opportunity to engage in a high-intensity project with potential for future extensions based on performance.

Key Responsibilities
  • Lead the deployment, monitoring, and recovery of complex, containerized AI training environments using advanced terminal techniques.
  • Proactively identify, diagnose, and resolve infrastructure bottlenecks and failures in long-running processes.
  • Orchestrate resilient system builds and manage infrastructure to ensure stability and optimal resource utilization.
  • Collaborate closely with engineering teams to refine CI/CD pipelines and automate routine operational tasks.
  • Manage and optimize filesystem structures, networked storage, and process scheduling in Dockerized sandboxes.
  • Conduct rapid mid-execution replanning during error states and unforeseen runtime issues.
  • Document best practices and emergent solutions, and contribute to knowledge transfer across the team.
Qualifications
  • Demonstrated expert proficiency with terminal-based problem solving and complex system administration.
  • Mastery of dynamic infrastructure recovery and long-running operational process management.
  • Deep expertise in containerized environments (e.g., Docker, Kubernetes) and sandbox orchestration.
  • Strong Python skills, with the ability to script, automate, and debug real-world production systems.
  • Proficiency in Bash and familiarity with JavaScript/TypeScript, Go, Rust, and C/C++.
  • Experience with build systems, package managers, databases, version control, and cryptography tools.
  • Adept at troubleshooting, documenting, and replanning in high-velocity technical environments.
Preferred Qualifications
  • Background in machine learning operations or AI infrastructure.
  • Familiarity with ML frameworks and distributed computing.
  • Experience supporting multi-phase, high-intensity engineering projects.
Work Terms

Employment Type: Contract

Compensation

The hourly rate ranges from $40 to $70.

Eligibility

This position is fully remote.

Job Tags

Remote job, Hourly pay, Contract work
