AMIA 2019 Informatics Summit Workshops

Monday, March 25
8:30 a.m. – 12:00 p.m.

N. Polys, Virginia Tech

Patient outcomes and enterprise efficiencies depend on the quality and timely delivery of information. Increasingly, 3D information is being generated and used in healthcare; this workshop will explore and demonstrate the value of interoperability and the opportunities offered by open, international-standard technologies. From imaging and scanning to 3D printing and virtual reality, Extensible 3D (X3D) provides durable data interchange and portable presentation natively over the Web. This workshop will cover the wide range of methods and patterns used to develop interactive 3D applications based on royalty-free, open ISO/IEC standards. As a high-level scene graph language and API above the graphics library, X3D provides a suite of standards that includes multiple data encodings and language bindings. We will explore the many approaches, tool chains, and applications for building X3D objects and scenes, especially those relevant to medical and health informatics.

A. Solomonides, NorthShore University HealthSystem; K. Fultz Hollis, Oregon Health & Science University; A. Mosa, University of Missouri Medicine; N. Sánchez-Pinto, Northwestern University; L. Sheets, University of Missouri Medicine

What is data governance? As we shall explore in this workshop, it comprises the regulatory principles, policies and strategies adopted, the functions and roles that must be created to implement these policies and strategies, and the consequent architectural designs that provide both a home for the data and, less obviously, an operational expression of policies in the form of controls and audits. The reasons for the extraordinary measures taken by institutions to protect the data lie in the value of that data as a strategic asset and in the internal and external threats to the data. The workshop will include a presentation of background knowledge of principles (especially of recent developments), and an opportunity to role-play various data governance-related positions in an organization. Discussion of principles and of the simulated experience will complete the program.

R. Nagarajan, University of Kentucky; S. Madhavan, Georgetown University; F. Lee, IBM

Exponential growth and diversity in healthcare data from heterogeneous sources demand novel ecosystems for storage, querying, visualization, and analytics. This is especially true for cancer, where there is increasing emphasis on using high-throughput genomic assays for informed clinical decision making. Scalable, distributed open-source platforms such as Apache Spark, in conjunction with multithreaded architectures, can be especially helpful in addressing these analytic challenges. The proposed workshop will introduce the audience to the essentials of Apache Spark, including case studies with publicly available cancer genomic data sets using best-practice GATK pipelines on GPU-enabled Open Power Architectures. The usefulness of the genomic results in virtual tumor boards for treatment planning, as well as genomic data standards, will also be discussed.

L. Rasmussen, E. Whitley, Northwestern University

The practice of reproducible research allows a researcher to recreate exactly the same results each time an analysis is performed, and it applies from experimentation through analysis and publication. For some, the bar to conducting reproducible research may seem too high. This can be caused by technical barriers, a perceived need to switch away from their favorite software, or the impression that reproducible research is an “all-or-nothing” endeavor. In this workshop we will explore how to approach reproducible research: steps for starting small, expanding your capability, and both technical and non-technical strategies to help along the way. The first half of the workshop will engage participants in thinking through reproducible research as it applies to their own projects. The second half will walk through two use cases in applying technology to reproducible research: one on EHR-based phenotypes, and another on natural language processing. Technologies explored will include source code control, electronic laboratory notebooks, containers, and dynamic documents. Given its introductory nature, the workshop will provide only a high-level survey of available tools and software, but it will equip participants to better assess their next steps for incorporating reproducible research practices in their own projects.

T. Mentnech, S. Zhao, University of Utah; K. Fultz Hollis, Oregon Health & Science University

Research reproducibility is not easy and takes time to learn. When we present the methods we used to produce data, can other researchers reproduce the same results? Methods to achieve reproducibility have gained traction since the 2005 article by Young, Ioannidis, and Al-Ubaydli, although more recently Baker reported in Nature that “more than 70 percent of researchers have tried and failed to reproduce another scientist’s experiments, and more than half have failed to reproduce their own experiments.” Many of the tools and principles that support reproducible research have a learning curve that can make researchers hesitate to adopt them. This workshop will give participants the opportunity to learn, discuss, and put into practice the basic principles of clinical reproducibility and, building on improved reproducibility, better ways to share research data. In early 2018 Philip Stark said, “Science should be ‘show me’, not ‘trust me’…” In that spirit, participants will identify parts of their research workflow and practice tasks that improve reproducibility and enhance the rigor and transparency of their work.

Tuesday, March 26
10:30 a.m. – 3:00 p.m.

E. Crowgey, J. Myers, Nemours Alfred I. DuPont Hospital for Children; J. Romano, Columbia University; J. Andrade, S. Volchenboum, University of Chicago

The cost and speed of DNA sequencing have drastically improved due to innovations in next-generation sequencing (NGS). Whole exome sequencing (WES) and targeted gene panel sequencing generate large unstructured datasets that require customized bioinformatic pipelines. This tutorial will provide attendees with (1) an opportunity to run WES bioinformatics pipelines that process raw FASTQ files into variant call format (VCF) files on a high-performance cluster hosted at the University of Chicago, and (2) hands-on examples for interpreting WES variant data using publicly available resources.

L. Pruinelli, University of Minnesota; T. Winden, Kansas University Medical Center; S. Johnson, University of Minnesota

The National Institutes of Health (NIH) supports multiple initiatives for data-driven discovery and for workforce development in data science. Many healthcare leaders lack a broad understanding of the concepts and resources available to conduct real-world analytic projects. The goal of this workshop is to give leaders a foundation for understanding data science principles and tools. This is an intermediate-level, hands-on workshop in which participants will learn all phases of a data science project, so that they gain a better overall understanding and can collaborate more effectively with data science staff at their organization. Participants will be given access to materials prior to the workshop to be pre-loaded on their laptops. Participants are expected to have a basic understanding of analytics and informatics but are not required to have any experience with data science tools. They will undertake a data science project focusing on data preparation, exploratory data analysis, and building a machine learning model using the Python scikit-learn toolset. The workshop will conclude with a discussion of how the model can be integrated with the EHR and how patient outcome data is used to assess model performance.