Enterprise IT Architect, Research Computing (IT@JH Research Computing)
IT@JH Research Computing is seeking an Enterprise IT Architect, Research Computing who serves as the principal technologist and deep technical authority for cutting-edge research computing infrastructure, driving the design and evolution of HPC and AI platforms at scale. This role architects and implements next-generation GPU/CPU clusters, high-bandwidth InfiniBand and Ethernet fabrics, large-scale parallel and object storage systems, and advanced AI/ML compute environments optimized for large language models and data-intensive science. The Architect is highly hands-on, building and automating secure, reproducible, and high-performance systems, from kernel-level tuning and container orchestration to workload schedulers and hybrid cloud integration. While primarily technical, the position also provides mentorship and technical guidance to junior engineers, cultivating expertise in areas such as distributed computing, GPU acceleration, high-throughput data pipelines, and exascale-ready architectures. Working closely with the Director and IT Manager, the Architect ensures the research computing ecosystem remains robust, scalable, and future-ready for emerging AI-driven and computational science workloads.
Specific Duties & Responsibilities
Strategy/Leadership
- Collaborate with, and frequently lead, other technical experts, architects, and subject matter experts to contribute to technology elements for the enterprise.
- Develop guiding principles, standards, and best practices consumed by execution teams.
- Responsible for IT vision, strategy, technology innovations, and architecture services of enterprise infrastructure and applications; design, document and implement a strategic roadmap for assigned technologies.
- Address highly complex problems by meeting with clients to observe and understand current processes and the issues related to those processes. Provide written documentation of findings to share with the client and other IT colleagues.
Track a broad range of emerging technologies to:
- Determine maturity and applicability to the enterprise.
- Assess the relative impact on IT strategy and interpret its meaning for the senior IT leadership team.
- Lead and manage strategic activities, including adoption of enterprise products and continuous integration strategies.
Design, Development, and Deployment
- Develop, manage, and maintain standards; partner in the development, management, establishment, and enforcement of technical standards.
- Support the translation of business and technical objectives into solution architecture requirements, and the application of multiple technical solutions to business problems while leveraging enterprise solutions.
- Provide support and enforcement of compliance with enterprise support and identify opportunities to fully leverage offerings to the benefit of the enterprise.
- Develop detailed tasks and project plans by analyzing project scope and milestones for highly complex projects to ensure the product is delivered in a timely fashion according to lifecycle standards.
- Provide experienced leadership for strategic planning in designing and developing comprehensive, innovative, and integrated solutions. Oversee and mentor junior staff by reviewing tasks and milestones for quality standards and provide guidance in system/application design and development.
- Integrate multiple cross-functional processes and disciplines to meet business and technical requirements by bringing perspective from all architecture domains (process, system, application, information, data, and security).
Implementation and Maintenance
- Oversee changes by adhering to the change management policies and procedures for any given project to communicate to all parties the nature, significance, and risk factors of the solution.
- Monitor changes and resolve highly complex problems requiring the highest level of technical expertise by responding as they occur, by reviewing all processing and output of the newly implemented solution, and by proactively ensuring the solution works successfully to satisfy the customer requirements and to provide a smooth transition to the new solution.
- Strong written and oral communication skills to effectively lead change and communicate with business and IT staff.
- The ability to embrace change, adapt to the unexpected, and focus energies, people, and solutions on practical and positive results.
- Other duties as assigned.
In addition to the duties described above
- Lead architecture, design, deployment, and lifecycle management of large-scale CPU and GPU clusters to support campus research needs.
- Define hardware/software reference architectures and lead validated procurements (compute, accelerators, storage tiers, interconnects, power/cooling).
- Own capacity planning and refresh strategy aligned with research demand and grant cycles.
- Architect and tune scheduler and resource-management policies for computing environments.
- Design and operate high-performance storage and data lifecycle solutions.
- Architect and tune fabric and networking for low-latency, high-bandwidth workloads.
- Drive performance engineering and benchmarking for science and AI workloads.
- Lead automation and reproducible operations.
- Ensure reliability and observability: monitoring, logging, SLAs/OLAs, incident response, disaster recovery and high availability planning.
- Advance security and compliance posture to ensure the environment is secure while remaining accessible and meets the requirements of the University. Serve as an expert in translating security compliance requirements into controls within the computing systems.
- Supervise and grow the engineering team: recruit and hire, set goals and performance expectations, conduct reviews, mentor staff, and develop career paths.
- Foster cross-team collaboration and training: coordinate with campus IT, storage/network/security teams, and research groups.
- Track emerging technologies and produce strategic roadmaps and recommendations to align infrastructure with institutional research priorities.
- Contribute to the program's positive and responsive customer service culture.
- Model Johns Hopkins core values, specifically in embracing and valuing different backgrounds, opinions, and experiences; commit to exceptional quality and service; inspire others to achieve their best and have the courage to do the right thing; and be kind.
On-Call Rotation
- Participate in a 24/7 on-call rotation.
Minimum Qualifications
- Bachelor’s Degree.
- Eight years of related work experience with computer systems and applications.
- Additional education may substitute for required experience and additional related experience may substitute for required education beyond a high school diploma/graduation equivalent, to the extent permitted by the JHU equivalency formula.
Preferred Qualifications
- At least 15 years of professional experience in HPC systems engineering or architecture, including proven expertise with cluster design, scheduling systems (e.g., SLURM), distributed storage, and high-speed interconnects.
- Must have a minimum of 5 years of managerial experience.
- Strong skills in Linux systems administration, configuration management, and performance tuning are essential, along with a demonstrated track record of leading major system deployments or upgrades.
- Expertise in collaborating with researchers, engineers, and campus IT to define scalable and reliable solutions, evaluate emerging technologies, and translate complex research requirements into robust, maintainable systems while ensuring the long-term sustainability and efficiency of computing resources.
Technical Skills
(Expected Skills/Proficiency Level)
- Database Management Systems - Intermediate
- Amazon Web Services - Intermediate
- Microsoft Azure - Intermediate
- Virtualization - Intermediate
- IT Infrastructure - Intermediate
- Programming Languages - Advanced
- Integration Testing (HK) - Intermediate
Classified Title: Enterprise IT Architect
Job Posting Title (Working Title): Enterprise IT Architect, Research Computing (IT@JH Research Computing)
Role/Level/Range: ATP/04/PH
Starting Salary Range: $116,600 - $204,000 Annually (Commensurate w/exp.)
Employee group: Full Time
Schedule: Mon-Fri 8:30am-5:00pm
FLSA Status: Exempt
Location: Hybrid/Johns Hopkins Bayview
Department name: IT@JH Research Computing
Personnel area: University Administration