2019 August JAMIA Journal Club Webinar

August 8, 2019
Free for AMIA members; $50 for non-members.
Majid Afshar, MD, MSCR; Ron Price, Jr.

Development and application of a high throughput natural language processing architecture to convert all clinical documents in a clinical data warehouse into standardized medical vocabularies

Co-authors Majid Afshar and Ron Price, Jr. will discuss this month's JAMIA Journal Club selection:

Afshar M, Dligach D, Sharma B, et al. Development and application of a high throughput natural language processing architecture to convert all clinical documents in a clinical data warehouse into standardized medical vocabularies. J Am Med Inform Assoc. 2019 May 30. pii: ocz068. doi: 10.1093/jamia/ocz068. [Epub ahead of print]


Majid Afshar, MD, MSCR
Assistant Professor
Division of Pulmonary and Critical Care
Department of Public Health Sciences
Center for Health Outcomes and Informatics Research
Maywood, IL

Dr. Afshar maintains an NIH-funded informatics and data science laboratory dedicated to prevention and early identification of diseases in critically ill patients, with a focus on substance misuse, respiratory failure, and sepsis. The lab applies natural language processing (NLP) and machine learning methods in collaboration with computer scientist Dr. Dmitriy Dligach and biostatistician Dr. Cara Joyce. Dr. Afshar has built an infrastructure to perform NLP and machine learning tasks with electronic health record data for high throughput computable phenotypes and applied predictive modeling. He has published work deriving and validating NLP classifiers for substance misuse, respiratory failure identification, and prediction of sepsis development. He is currently working on automated methods in substance misuse screening and designing quasi-experimental studies as well as adaptive platform clinical studies using both supervised and unsupervised machine learning methods.

Dr. Afshar received his medical degree and clinical training at Rush University and the University of Maryland. During this time, he also received his Master’s in Clinical Research. He practices clinically as an intensivist at Loyola University Medical Center, where he takes care of patients with the conditions that are also the focus of his research. He is board certified in critical care medicine, pulmonary medicine, and clinical informatics. He previously received an NIH F32 National Research Service Award during his fellowship at the University of Maryland and currently has an NIH K23 Career Development Award.

Ron Price, Jr.
Associate Provost
Office of Informatics and Systems Development
Loyola University Chicago Health Sciences
Maywood, IL

Mr. Price's responsibilities include direction of technology teams that identify, design and implement computing initiatives that seek to advance the strategic goals of Loyola University Chicago Health Sciences. The Health Sciences Campus (HSC) comprises the Stritch School of Medicine (SSOM), the Marcella Niehoff School of Nursing (MNSON) and the Parkinson School of Health Sciences & Public Health. The Office of Informatics and Systems Development (OISD) has significant systems development expertise utilizing open source technologies to create educational resources, clinical and research data repositories and large-scale clinical analytics infrastructure.

Current OISD initiatives focus on traditional high-performance computing resources, “big data” environments and distributed computing approaches that support natural language processing, radiomics and machine learning. OISD supports creation and management of the institution’s PCORI and CTSA/ITM clinical data research networks and provides clinical analytics support to 250+ clinical research projects annually.

Mr. Price joined the SSOM in 1987, and beyond his current Associate Provost role has held positions of Assistant Director of the Division of Medical Informatics, Associate Dean of SSOM, and Loyola University Health System (LUHS) Chief Technologist/Chief Information Security Officer (CISO). Mr. Price has a degree from the University of North Carolina at Chapel Hill and is a RedHat Certified Engineer (RHCE). His accomplishments include numerous publications, patents covering health-care analytics (awarded and pending), and national presentations regarding his work at HSC and the Loyola University Health System.


  • 35-minute discussion between the authors and the JAMIA Student Editorial Board moderators, covering salient features of the published study and its potential impact on practice.
  • 25-minute discussion of questions submitted by listeners via the webinar tools and moderated by JAMIA Student Editorial Board members.


JAMIA Journal Club managers and monthly moderators are JAMIA Student Editorial Board members:

Kelson Zawack, PhD, Postdoctoral Fellow, Biostatistics Department, Yale University, New Haven, CT

Tiffany J. Callahan, MPH, PhD Candidate, Computational Bioscience Program at the University of Colorado Denver Anschutz Medical Campus, Aurora, CO


Daniel Feller, MS, PhD Candidate in Biomedical Informatics, Columbia University, New York, NY


Student Access

Students who are not AMIA members, but whose academic institutions are members of the Academic Forum, are eligible for a complimentary JAMIA Journal Club registration. Please contact Susanne Arnold at susanne@amia.org for the discount code. In the email, please include: full name, Academic Department, and the primary Academic Forum representative of that Academic Department. Note that AMIA Student memberships cost $50 and include access to JAMIA, all JAMIA Journal Clubs, and other webinars of interest to the biomedical informatics community.

Statement of Purpose

Information in the clinical narrative of the electronic health record (EHR) is a rich source of data and comprises a large majority of patient data, but its unstructured format renders it complex and difficult to utilize. Clinical data warehouses (CDWs) of health systems are becoming larger and more efficient in today’s health data ecosystem; therefore, high throughput architectures to manage and process the data are needed. Large-scale efforts at de-identification of clinical notes and curation of the data for research purposes are underway in the National Center for Advancing Translational Sciences (NCATS). Methods in natural language processing (NLP) have proven effective in automatic semantic analyses of clinical documents with concept mapping to standardized medical vocabularies. Several centers have demonstrated success in high throughput NLP, but little guidance exists on optimizing their performance for an entire health system. First, we aim to develop a high throughput NLP architecture using the cTAKES engine to concept map over ten years of clinical documents from our CDW using the Unified Medical Language System (UMLS). Second, we aim to examine the application of our architecture in the context of a hospital 30-day readmission prediction task.

Our high throughput NLP architecture converted our health system’s data corpus of over 84 million unstructured clinical notes into a completely de-identified data repository of nearly 40 billion structured and standardized data elements. This task was accomplished at a rate of over 500,000 documents per hour through our on-premises data center. The results for predicting 30-day hospital readmission demonstrate that mapped concepts from UMLS performed similarly to n-grams. The processed data is a new addition to our clinical research database for researchers and administrators interested in data mining and analytics from any note or report. This may be more appealing for end-users and researchers interested in using clinical notes from their CDW, and our results suggest that concept unique identifier (CUI) features with standardized medical vocabulary are one option for large-scale clinical research in data analytics.
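
To put the reported scale in perspective, the figures above can be combined in a quick back-of-the-envelope calculation. This is only a rough sketch: it treats "over 84 million" and "over 500,000 per hour" as exact values and assumes the rate is sustained around the clock, which the abstract does not state. The function names are illustrative.

```python
# Rough arithmetic on the figures quoted in the abstract.
# Assumption: the rounded figures are treated as exact and the
# throughput is sustained continuously; neither is stated in the source.

NOTES = 84_000_000            # "over 84 million" clinical notes
DOCS_PER_HOUR = 500_000       # "over 500,000 documents per hour"
ELEMENTS = 40_000_000_000     # "nearly 40 billion" data elements


def processing_hours(notes: int, docs_per_hour: int) -> float:
    """Hours needed to process `notes` documents at a constant rate."""
    return notes / docs_per_hour


def elements_per_note(elements: int, notes: int) -> float:
    """Average number of structured data elements produced per note."""
    return elements / notes


hours = processing_hours(NOTES, DOCS_PER_HOUR)
print(f"Approximate processing time: {hours:.0f} hours ({hours / 24:.0f} days)")
print(f"Approximate elements per note: {elements_per_note(ELEMENTS, NOTES):.0f}")
```

Under these assumptions the full corpus corresponds to roughly one week of continuous processing, with a few hundred data elements per note on average.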

Target Audience

The target audience for this activity is professionals and students interested in biomedical and health informatics.

Learning Objectives

The general learning objective for all of the JAMIA Journal Club webinars is that participants will

  • Use a critical appraisal process to assess article validity and to gauge article findings' relevance to practice

After this live activity, the participant should be better able to:

  • Understand how to design a high throughput NLP architecture to produce a deidentified clinical data warehouse of a health system’s corpus of notes converted into standardized medical vocabularies, and
  • Apply concept unique identifiers (CUIs) from a big database/clinical data warehouse to perform data analytics such as applied predictive modelling or phenotyping tasks.
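
As a rough illustration of the second objective, the sketch below contrasts word n-gram counts with CUI counts as features for two short notes. It is not the authors' pipeline: the note texts are invented, the CUI lists are hand-written placeholders rather than cTAKES output, and the helper names are hypothetical.

```python
from collections import Counter


def word_ngrams(text: str, n: int = 1) -> Counter:
    """Count word n-grams in a note after lowercasing and whitespace splitting."""
    tokens = text.lower().split()
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cui_counts(cuis: list) -> Counter:
    """Count concept identifiers already extracted from a note."""
    return Counter(cuis)


# Invented example notes with different wording for a similar finding.
note_a = "patient has shortness of breath"
note_b = "patient reports dyspnea"

# Placeholder identifiers, NOT real UMLS CUIs. In practice these would come
# from a concept-mapping engine; here they are written by hand to show how
# two phrasings could map to one shared concept.
cuis_a = ["CUI_PLACEHOLDER_1"]
cuis_b = ["CUI_PLACEHOLDER_1"]

shared_words = set(word_ngrams(note_a)) & set(word_ngrams(note_b))
shared_cuis = set(cui_counts(cuis_a)) & set(cui_counts(cuis_b))

print("Shared unigram features:", sorted(shared_words))
print("Shared CUI features:", sorted(shared_cuis))
```

In this toy case the only unigram the notes share is "patient", while the concept-level features overlap completely. Either kind of count vector could then be passed to a downstream classifier.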

This JAMIA Journal Club does not offer continuing education credit.

In our dedication to providing unbiased education even when no CE credit is associated with it, we provide planners’ and presenters’ disclosure of relevant financial relationships with commercial interests that have the potential to introduce bias in the presentation:

Disclosures for this Activity

These faculty, planners, and staff who are in a position to control the content of this activity disclose that they and their life partners have no relevant financial relationships with commercial interests: 

JAMIA Journal Club presenters: Majid Afshar, Ron Price, Jr.
JAMIA Journal Club planners: Michael Chiang, Kelson Zawack, Tiffany J. Callahan, Daniel Feller
AMIA staff: Susanne Arnold, Pesha Rubinstein

Contact Info

For questions about webinar access, email Susanne@amia.org.