Experience Inc. Jobs

Job Information

ASRC Federal Holding Company HPC Linux System Administrator in Greenbelt, Maryland

ASRC Federal InuTeq is seeking a Linux System Administrator (HPC) to join our team in support of NASA's Center for Climate Simulation (NCCS) project. ASRC Federal InuTeq provides High Performance Computing services throughout the HPC lifecycle for computational requirements, architecture, acquisition, and operations to federal government customers. Our employees embrace innovation and are committed to a culture of continuous, standards-driven process improvement and assimilation of industry best practices.

  • Configures, installs, maintains, and upgrades Linux HPC clusters (compute, storage, and network) and applications in support of research computing environments.

  • Provides end-user support for problem resolution, and training on Linux and HPC usage best practices.

  • Diagnoses, isolates, and resolves application and system technical problems.

  • Develops scripts and automation to enhance operational services and service quality.

  • Develops, implements, and documents system architectures, new capabilities, and operational standards.

  • Supports compute, storage, and network technology evaluations and assessments.

  • Recommends and implements improvements to existing HPC system management tools and processes.

  • Provides technical expertise to improve HPC cluster performance and resiliency.

  • Leads and collaborates on projects to enhance functionality in areas such as systems monitoring, configuration management, and backups.

This position interacts with the HPC Operations Manager, Program Manager, Site Lead, customer, users, and site staff, and attends regularly scheduled customer meetings to keep stakeholders informed of activities and progress and to answer inquiries concerning all aspects of the program. An individual at this skill level should have demonstrated problem-solving ability in relevant areas of expertise and an interest in mentoring and leading others in small team environments.

Requirements:

  • Bachelor’s degree (B.A./B.S.) in Computer Science, Engineering, Physics, or a related course of study, or an equivalent combination of education and relevant experience.

  • Minimum of 8 years of Linux system administration experience.

  • Experience managing HPC clusters.

  • Extensive knowledge of troubleshooting both hardware and software, with the ability to come on site and replace hardware if needed.

  • Experience managing storage servers and hardware.

  • Knowledge of at least one of CentOS or Red Hat, and experience maintaining and upgrading Linux.

  • Experience with configuration management and orchestration tools such as Puppet, Ansible, Chef, and Cobbler.

  • Experience with system management, monitoring/alerting tools (e.g., Ganglia, Nagios, Prometheus, Zabbix).

  • Understanding of infrastructure technologies including server, storage, network, database, and virtualization.

  • Demonstrated ability to quantify, analyze, determine root cause, and resolve system and communication network issues, and develop preventive actions.

  • Ability to work independently as well as collaboratively within a team, to include the ability to lead moderately complex projects or small project teams.

  • Excellent written and oral communication skills for interacting with customers, team members, and management.

  • Proactive and innovative, with ability to foresee and prevent potential problems.

  • Organizational and time management skills, exceptional follow-through, and ability to manage multiple priorities.

  • Passion for providing excellent customer service.

  • Experience providing support for large Linux HPC clusters used for scientific computing.

  • Scripting/programming capabilities with Bash, Python, Perl.

  • Demonstrated ability to execute and maintain a standard security protocol.

  • Willingness to track tasks through persistent record keeping and project management.

  • U.S. citizenship and the ability to obtain a Public Trust clearance are required.

Preferred Skills:

  • Experience integrating systems or designing solutions for HPC workloads.

  • Experience with MPI and OpenMP.

  • Experience with performance benchmarking using profilers and debuggers to recommend code improvements for scalability and performance.

  • Experience with distributed and parallel file systems such as BeeGFS, GPFS, Lustre, NFS, and Ceph.

  • Familiarity with high-performance networks such as InfiniBand, and with network management.

  • Demonstrated ability to perform complex performance analysis of system processes, I/O subsystems, networks, and other related components.

  • Experience installing, configuring, and maintaining workload management tools such as Slurm, LSF, and PBS.

  • Interest or previous experience in technologies including but not limited to Singularity, Docker, Spack, and other emerging technologies.

ASRC Federal and its Subsidiaries are Equal Opportunity / Affirmative Action employers. All qualified applicants will receive consideration for employment without regard to race, gender, color, age, sexual orientation, gender identification, national origin, religion, marital status, ancestry, citizenship, disability, protected veteran status, or any other factor prohibited by applicable law.
