Research Software Engineer – Clinical NLP Specialty (Data Science and AI Institute)
The Johns Hopkins Data Science and AI Institute (DSAI) is a new pan-institutional initiative at Johns Hopkins to advance artificial intelligence and its applications, in part through investments in the software engineering, data science, and machine learning space. DSAI is focused on revolutionizing discovery by advancing artificial intelligence that evolves collaboratively with human intelligence, combining the strengths of each for the betterment of society and the world in which we live. DSAI will bring together the mathematical, computational, and ethical foundations of AI with the domains of Health & Medicine, Scientific Discovery, Engineered Systems, Security & Safety, and People, Policy & Governance.
DSAI seeks a Research Software Engineer - Clinical NLP Specialty with a strong academic background and relevant experience in industry or academia focused on designing and building state-of-the-art clinical NLP systems. This position supports research initiatives in the development and novel application of NLP and large language models to extract insights from unstructured clinical text using techniques such as named entity recognition (NER), negation detection, structured data extraction, diagnosis prediction, risk stratification, temporal reasoning and phenotyping. The successful candidate will play a critical role in designing, implementing, rigorously evaluating, deploying and maintaining robust and scalable NLP pipelines and models to extract meaningful information from unstructured clinical text in secure environments, with the goal of enabling high-impact solutions across a range of biomedical domains. Experience with large language models - such as fine-tuning, prompt engineering, model evaluation, and adapting foundation models for domain-specific clinical tasks - is desirable, particularly in contexts that demand privacy, robustness, and interpretability. The clinical NLP RSE will work closely with clinicians, informatics researchers, data scientists and other RSEs to ensure NLP systems meet application goals with methodological rigor and scientific reproducibility.
DSAI engineers are at the forefront of modern data-intensive science, where professionally developed software is rapidly becoming a key ingredient for success. The DSAI initiative includes the build-out of a substantive and professional-scale software engineering capability, and a dramatic increase in infrastructure, both in hardware and in personnel. JHU has long been a world leader in the broader domains of medicine and public health as well as a wide range of science and engineering fields. This, combined with our ethos of building out capabilities to have demonstrable global impact (e.g., JHU's Coronavirus Resource Center, the award-winning global resource for real-time data and analysis for COVID-19) and our unique large scientific data sets, such as the archives for the Sloan Digital Sky Survey and several simulations, will provide key leverage points that will make DSAI successful.
Specific Duties & Responsibilities
- The successful candidate will participate in ground-breaking research projects that need advanced software solutions requiring expertise in software engineering not commonly found in scientific collaborations.
- The projects will require development of state-of-the-art clinical NLP solutions using the latest deep learning libraries trained on state-of-the-art hardware in secure healthcare computing environments.
- Projects will involve analysis of massive data sets either in the cloud or on premises.
- Projects will require development of novel NLP software pipelines for processing of unstructured clinical notes.
- Some projects may require deep engagement, possibly leading to co-authorship on scientific publications, while others may involve a more casual consulting engagement.
- Projects may require software solutions developed from scratch or refactoring of existing solutions to make them conform to industry standards (quality, efficiency, reusability, robustness, portability, documentation, etc.).
- It is a high-level goal of DSAI to translate the efforts on individual projects into frameworks and template patterns for sustainable scientific infrastructure benefiting future projects.
Special knowledge, skills, and abilities
- Strong NLP, LLM, machine learning and deep learning skills.
- Practical experience building NLP models and pipelines in a secure, HIPAA-compliant healthcare environment.
- Expert-level knowledge of multiple modern NLP and LLM libraries and models.
- Hands-on experience adapting and fine-tuning large language models for domain-specific clinical applications, with attention to data efficiency, interpretability, and reproducibility.
- Demonstrated expertise in prompt engineering, evaluation, and benchmarking of large language models, including applying responsible AI principles in clinical or sensitive-data contexts.
- Expert-level knowledge of the Python programming language.
- Familiarity with or willingness to learn C++ or other languages as may be needed.
- Familiarity with software containerization technologies such as Docker and Singularity.
- Familiarity with the Databricks platform.
- Fluency in the Linux operating system and related tools.
- Familiarity with modern software engineering best practices, such as Git source control, peer code review, test-driven development, build automation and continuous integration / continuous delivery.
- Familiarity with cloud development and deployment.
- Demonstrated leadership and self-direction.
- Willingness to teach others both informally and in short course format.
- Willingness to continually learn new tools and techniques as needed.
- Excellent verbal and written communication.
Minimum Qualifications
- Master's degree in a quantitative discipline such as computer science, engineering, physics or bioinformatics, with a strong scientific computing and/or mathematics background.
- Three years' experience in software development on large clinical NLP projects in industry or academia.
- Additional education may substitute for required experience, and additional related experience may substitute for required education beyond a high school diploma/graduation equivalent, to the extent permitted by the JHU equivalency formula.
Preferred Qualifications
- PhD in a quantitative discipline.
- Five (5) years’ experience as above in clinical NLP.
- Experience in CUDA GPU programming.
- Experience authoring open-source Python packages in PyPI.
- Experience in open-source project governance.
- Experience in open-source community adoption initiatives.
Classified Title: Scientific Software Engineer
Job Posting Title (Working Title): Research Software Engineer – Clinical NLP Specialty (Data Science and AI Institute)
Role/Level/Range: APPTSTAF/01/ST
Starting Salary Range: Commensurate with experience
Employee group: Full Time
Schedule: 37.5 hrs/wk, M-F
FLSA Status: Exempt
Location: Hybrid/Homewood Campus
Department name: DSAI Institute
Personnel area: Whiting School of Engineering