Job Req ID:  19085

HPC Sr Systems Engineer

 

General Summary/Purpose: 

The Senior HPC Storage Engineer for the Maryland Advanced Research Computing Center (MARCC) will be part of a team that runs a high performance computing facility with over 23,000 cores and several petabytes of storage. The facility serves the High Performance Computing (HPC) and data-intensive science needs of researchers at the University of Maryland, College Park (UMCP) and Johns Hopkins University (JHU). This FTE contributes to the design, organization, planning, and implementation of cutting-edge technology projects for the facility. The Senior Storage Engineer is responsible for the day-to-day administration of storage systems, backups, networking, security, and any other services related to the operation of a large HPC center. If you are looking for an opportunity to contribute to the continued success of MARCC, this will be a challenging and rewarding role.

 

Specific Duties & Responsibilities:

70% - Systems Engineering, Administration, Security, and Oversight

  • Contribute to the design, implementation, testing, management, and support of HPC storage solutions and the overall HPC environment.
  • Thoroughly document system processes so that users can easily find useful information and other IT staff can perform routine tasks and provide backup.
  • Perform troubleshooting and root cause analysis of HPC cluster and file system issues.
  • Maintain job scheduling and storage allocation systems and policies to ensure fair allocation of shared resources.
  • Maintain monitoring systems to facilitate quick, proactive responses to routine failures, and to provide comprehensive performance data logging.
  • Provide general system administration backup and escalation for other staff.
  • Consult with and provide expertise to building engineers and other staff on new facilities to be under control of MARCC.
  • Ensure resources are highly available with limited interruption.
  • Automate user account creation, management, and purging.
  • Contribute to planning sessions on network and security issues for MARCC. Work closely with the central networking group.
  • Implement and maintain security measures to protect data subject to restrictions.
  • Use cluster and configuration management tools (e.g., Bright, xCAT, Puppet, IPMI, Rocks) to help maintain large-scale Linux clusters, supercomputers, storage systems, and smaller systems.
  • Apply scripting and programming skills to automate systems related functions.
  • Develop reports and customize tools that automate the monitoring process of critical systems and alert team of issues automatically.
  • Maintain an effective schedule for systems backups and archive operations for mission critical systems.
  • Audit and maintain user access, authorization and authentication.
  • Research, recommend, and implement new technologies based on their value to the research facility.
  • Perform other systems tasks as assigned by supervisor.

 

20% - Technological Research

  • Architect HPC storage for future clusters, and plan the retirement of aging systems.
  • Offer technical advice on new projects that directly involve HPC at Hopkins.
  • Develop custom tools where necessary, and contribute useful creations back to open source development efforts where appropriate.
  • Research and implement new technologies that are beneficial to HPC.
  • Test and vet new technology in support of HPC efforts.
  • Work with vendors to procure prototypes and demo units.

 

10% - Training/Education

  • Continuously evaluate new tools and technologies for use in existing and future clusters.
  • Attend department and University-sponsored training to increase knowledge, improve skills, and learn new skills. May substitute University training for supervisor-approved, job-related commercial course offerings.

 

Minimum Qualifications (mandatory):

  • Bachelor’s degree.
  • Six years related experience. 
  • Additional education can be substituted for experience and additional experience can be substituted for education.

Equivalence Formula: 30 undergraduate degree credits or 18 graduate degree credits = 1 year of experience. For jobs where equivalence is permitted, up to two years of non-related college course-work may be applied towards the total minimum education/experience required for the respective job.

 

Preferred Qualifications:

  • Minimum 7 years managing Linux servers, with direct experience managing large file systems.
  • Experience as a high-level Linux system administrator.
  • Experience managing mission critical services.
  • In-depth knowledge of TCP/IP networking and related protocols, InfiniBand, etc.
  • Excellent scripting skills (Python, Perl, shell).
  • Knowledge of configuration management and monitoring tools (e.g., Puppet, Nagios).
     

Special Knowledge, Skills, and Abilities:

  • Proven experience deploying large-scale, complex projects.
  • Proven experience across multiple technologies with background in applications, databases, middleware, etc.
  • In-depth knowledge of the design and organization of cutting-edge technology in HPC environments.
  • Management of hierarchical file system infrastructure, software and services, and backups.
  • Understanding and implementation of IT project management best practices.
  • Expert knowledge of Unix/Linux systems administration, including all aspects of management, monitoring, performance analysis, and integration in potentially complex heterogeneous environments.
  • Expert-level knowledge of networking, high-speed interconnects, and network security principles in an HPC environment.
  • Expert knowledge of security measures necessary to protect the facility and its data (firewalls, ACLs, network monitoring).
  • Ability to work effectively with peer institutions to support HPC directives and further the goals of the MARCC facility.
  • Understand, implement, troubleshoot, and support job scheduling, resource management and workload management systems, including diagnosis of failed jobs, implementation of policies, and investigations of new features and services.
  • Advanced knowledge of LAMP (Linux, Apache, SQL, PHP/Python/Perl) technologies and toolkits.
  • Ability to handle high priority escalations whenever necessary.
  • Ability to multitask, while managing time and priorities.
  • Must be adaptable and able to meet conflicting deadlines.
  • Exceptional organizational skills.
  • Ability to automate systems administration tasks wherever possible.
  • Excellent oral and written communication and interpersonal skills.
  • Ability to meet the physical requirements of the position.
  • Ability to keep up to date on emerging technologies.
  • Ability to maintain confidentiality.
  • Excellent customer service skills.
  • Must demonstrate strong critical thinking and analytical reasoning.

 

Additional Preferred Skills:

  • Experience with open source software compilation.
  • Programming skills in C, C++, or another scientific programming language.
  • Experience with MySQL or MariaDB database programming.
  • Knowledge of scientific software applications in academic supercomputing environments.
  • Familiarity or experience with data subject to restrictions.

 

Classified Title: Sr. Systems Engineer 
Working Title: HPC Sr Systems Engineer 
Role/Level/Range: ATP/04/PF 
Starting Salary Range: $80,665 to $110,880 annually
Employee group: Full Time 
Schedule: M-F, 8:30 am - 5:00 pm 
Exempt Status: Exempt  
Location: 01-MD:Homewood Campus 
Department name: 10001373-Physics and Astronomy 
Personnel area: School of Arts & Sciences

 

The successful candidate(s) for this position will be subject to a pre-employment background check.

 

If you are interested in applying for employment with The Johns Hopkins University and require special assistance or accommodation during any part of the pre-employment process, please contact the HR Business Services Office at jhurecruitment@jhu.edu. For TTY users, call via Maryland Relay or dial 711.

 

The following additional provisions may apply depending on which campus you will work at. Your recruiter will advise accordingly.

 

During the Influenza ("the flu") season, as a condition of employment, The Johns Hopkins Institutions require all employees who provide ongoing services to patients or work in patient care or clinical care areas to have an annual influenza vaccination or possess an approved medical or religious exception. Failure to meet this requirement may result in termination of employment.

 

The pre-employment physical for positions in clinical areas, laboratories, working with research subjects, or involving community contact requires documentation of immune status against Rubella (German measles), Rubeola (Measles), Mumps, Varicella (chickenpox), Hepatitis B and documentation of having received the Tdap (Tetanus, diphtheria, pertussis) vaccination. This may include documentation of having two (2) MMR vaccines; two (2) Varicella vaccines; or antibody status to these diseases from laboratory testing. Blood tests for immunities to these diseases are ordinarily included in the pre-employment physical exam except for those employees who provide results of blood tests or immunization documentation from their own health care providers. Any vaccinations required for these diseases will be given at no cost in our Occupational Health office.

 

Equal Opportunity Employer
Note: Job Postings are updated daily and remain online until filled. 

 

EEO is the Law
Learn more:
https://www1.eeoc.gov/employers/upload/eeoc_self_print_poster.pdf
Important legal information
http://hrnt.jhu.edu/legal.cfm

 
