Job Req ID:  9544

Systems Engineer

Purpose of Position

The System Administrator for the Homewood High Performance Cluster (HHPC) manages a research computing cluster, which contains over 3,000 processor cores and is connected to a petabyte of storage that serves the HPC and data intensive science needs of many researchers across the Johns Hopkins University Homewood campus.  The System Administrator oversees systems management for the group’s fileservers, cluster login nodes, and associated switching fabrics and networks, and implements and manages the queuing system that balances usage among groups.


Essential Duties & Responsibilities:

Systems Engineering and Oversight

  • Design, organize, test, and implement cutting-edge hardware designs.
  • Document systems so that users can easily find useful information and other IT staff can perform routine tasks and provide backup.
  • Provide stable solutions for HHPC use.
  • Oversee maintenance of the HHPC community’s technical infrastructure.
  • Maintain HHPC and related clusters.
  • Plan and make purchases to meet the needs of the HHPC community.
  • Maintain job scheduling and storage allocation systems and policies in accordance with the HHPC Steering Committee to ensure fair allocation of shared resources.
  • Maintain extensive monitoring systems to facilitate quick, proactive responses to routine failures, and to provide comprehensive performance data logging.
  • May provide general system administration backup for other facilities or research groups.


Project Management and Outreach

  • Understand HPC technical needs.
  • Conceive, initiate, define, plan, organize, and execute project plans.
  • Develop close ties with participating faculty and their research groups in order to maintain awareness of their computing needs. Facilitate community building among the facility’s users to encourage sharing of solutions.
  • Learn from previous experiences when developing new projects.
  • Work closely with the facility’s faculty steering committee to shape policies, and ensure that these policies are successfully implemented.
  • Create and maintain a stable, secure operating system and software environment, which continues to meet users’ evolving research needs.


Technological Research

  • Plan the retirement of aging systems.
  • Develop custom tools where necessary, and contribute useful creations back to open source development efforts where appropriate.
  • Research new technologies that could be beneficial to HPC.
  • Test and vet new technology in support of HPC efforts.
  • Work with vendors to procure prototypes and demo units.
  • Oversee purchasing of additions to existing clusters.
  • Continuously evaluate new tools and technologies for use in existing and future clusters.
  • Attend department and University-sponsored training to increase knowledge, improve skills, and learn new skills.  May substitute University training for supervisor approved commercial job related course offerings.


Internal and External Contacts

  • This position may interact with an array of departmental and central administrative offices, faculty, staff, researchers, and students, and with numerous external vendors for the purpose of accomplishing HPC technology goals.  Works routinely with University faculty, administrators, students, and researchers. Collaborates regularly with professional colleagues from the central IT@JH organization, and from other academic departments.




Bachelor’s degree required. Equivalent experience may be substituted for education. Master’s degree preferred. Formal training in computer engineering is a strong plus.


Required Experience

  • Minimum 5 years of experience managing Linux servers. Additional education can be substituted for experience.
  • Experience as a high-level Linux Systems Administrator.
  • Experience managing mission-critical services in a 24x7x365 environment.
  • Experience administering High Performance Computing cluster schedulers (e.g., Maui, Slurm, Moab).
  • In-depth knowledge of TCP/IP networking and related protocols.
  • Excellent scripting skills (Python, Perl, shell).


Preferred Experience

  • Experience architecting and managing HPC clusters
  • Expert-level knowledge of configuration management and monitoring tools (Puppet, Nagios, xCAT, etc.)
  • Experience with open source software compilation and Apache administration
  • Experience with open source software development and the open source community.


Equivalency Formula: 30 undergraduate degree credits or 18 graduate degree credits = 1 year of experience. For jobs where equivalency is permitted, up to two years of non-related college coursework may be applied towards the total minimum education/experience required for the respective job.
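
As an illustration only, and not an official HR calculation, the equivalency formula above can be sketched in Python. The function name `equivalent_years`, the choice to add undergraduate and graduate credits together, and the counting of partial years are assumptions made for this sketch; the two-year limit on non-related coursework is not modeled.

```python
# Conversion rates taken from the equivalency formula in the posting.
UNDERGRAD_CREDITS_PER_YEAR = 30
GRAD_CREDITS_PER_YEAR = 18


def equivalent_years(undergrad_credits: float = 0, grad_credits: float = 0) -> float:
    """Convert degree credits into years of experience.

    Assumption: undergraduate and graduate credits are additive and
    fractional years are allowed. The posting does not state either point.
    """
    if undergrad_credits < 0 or grad_credits < 0:
        raise ValueError("credit counts must be non-negative")
    return (undergrad_credits / UNDERGRAD_CREDITS_PER_YEAR
            + grad_credits / GRAD_CREDITS_PER_YEAR)


# 60 undergraduate credits count as 2 years of experience.
print(equivalent_years(undergrad_credits=60))  # 2.0
# 18 graduate credits count as 1 year of experience.
print(equivalent_years(grad_credits=18))       # 1.0
```

How credits are actually evaluated for a given applicant is determined by the recruiter, so treat the result as a rough guide.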


Knowledge and Skill:

  • Knowledge of job scheduling software (e.g. OpenPBS/Torque, Maui, SLURM, Moab).
  • Advanced knowledge of Linux, Apache, MySQL, PHP/Python/Perl (LAMP) technology/toolkits.
  • Apply expert knowledge of Unix/Linux systems administration, including all aspects of management, monitoring, performance analysis, and integration in complex heterogeneous environments
  • Use configuration and hardware management tools (e.g., xCAT, Puppet, IPMI) to help maintain large-scale Linux clusters, supercomputers, storage systems, and smaller systems
  • Understanding of HPC hardware and software technologies.
  • Understanding of large data storage systems.
  • Develop, debug and utilize programs to automate system management tasks and user workflows
  • Understanding of networking technologies, including high-speed networks (Ethernet/InfiniBand).
  • Monitor and optimize services and performance (file systems, network interconnects) using tools such as Nagios and Ganglia.
  • Administer management servers for infrastructure (file servers, monitoring, etc.)
  • Solve escalated systems related issues, coordinate with vendors to isolate hardware problems, install firmware or software patches as necessary
  • Provide in-depth system analysis, problem resolution, design and implementation of system enhancements. This includes both functional and performance issues
  • Working autonomously, design, implement, and maintain the security and monitoring infrastructure for the HHPC
  • Independently research and make technical recommendations regarding the HHPC’s cybersecurity policies, practices, system development, and architecture
  • Respond to security alerts and tickets as required
  • Must have the ability to multi-task and prioritize. 
  • Must be adaptable and able to meet conflicting deadlines.
  • Exceptional organizational skills.
  • Must have excellent oral and written communication skills for customer service, training, evangelism of new technologies, negotiation, and persuasion.
  • Knowledge of networking principles as they apply to cluster computing including protocols, routers and firewalls.
  • Ability to apply techniques to maintain a consistent operating system image across a large number of homogeneous nodes (e.g. PXE boot, CF Node, NFS Root).
  • Ability to meet the physical requirements of the position.
  • Produce effective and thorough technical documentation
  • Maintain individual components of the HHPC computing environment to assure compliance with Johns Hopkins University security standards and practices
  • Provide on-call and off-hours support as assigned.


Role/Level/Range:  ATP/4/PE

Salary Range: Commensurate with Experience

Employee group: Full-time

Schedule: M-F, 37.5 hours/week

Employee subgroup: Exempt

Location: Baltimore, MD

Department Name: Physics and Astronomy

Personnel area: KSAS





The successful candidate(s) for this position will be subject to a pre-employment background check.


If you are interested in applying for employment with The Johns Hopkins University and require special assistance or accommodation during any part of the pre-employment process, please contact the HR Business Services Office. For TTY users, call via Maryland Relay or dial 711.


The following additional provisions may apply depending on the campus where you will work.  Your recruiter will advise accordingly.

During the Influenza ("the flu") season, as a condition of employment, The Johns Hopkins Institutions require all employees who provide ongoing services to patients or work in patient care or clinical care areas to have an annual influenza vaccination or possess an approved medical or religious exception. Failure to meet this requirement may result in termination of employment.


The pre-employment physical for positions in clinical areas, laboratories, working with research subjects, or involving community contact requires documentation of immune status against Rubella (German measles), Rubeola (Measles), Mumps, Varicella (chickenpox), Hepatitis B and documentation of having received the Tdap (Tetanus, diphtheria, pertussis) vaccination. This may include documentation of having two (2) MMR vaccines; two (2) Varicella vaccines; or antibody status to these diseases from laboratory testing. Blood tests for immunities to these diseases are ordinarily included in the pre-employment physical exam except for those employees who provide results of blood tests or immunization documentation from their own health care providers. Any vaccinations required for these diseases will be given at no cost in our Occupational Health office.


Equal Opportunity Employer
Note: Job Postings are updated daily and remain online until filled. 


EEO is the Law


Homewood Campus
