2016 Joint Summits Tutorials

Monday, March 21

1:30 p.m. – 5:00 p.m.

T01: Tutorial - Fundamentals and Applications of Computational Causal Discovery in Biomedicine

R. Scheines, D. Danks, Carnegie Mellon University; X. Lu, University of Pittsburgh

The last 25 years have produced a revolution in statistical and computational tools for causal inference and discovery in biomedicine. In this tutorial, we focus on causal discovery in large data sets drawn from clinical and translational research.

In the first half, we explain the basics of graphical causal models using multiple examples from biomedicine and other fields. We teach the basic graphical causal model framework, including how to represent and model causal systems, how to model and compute the effects of interventions, and the assumptions that make causal search possible. We also cover several search algorithms for learning about causal structure from background knowledge and data. We discuss why regression and related techniques are unreliable for causal discovery and demonstrate superior alternatives. Throughout, we use the freely available Tetrad program to teach these ideas with hands-on exercises using simulated and real data sets.
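
As a small illustration of why regression alone can mislead, consider the following sketch on simulated data. It is not taken from the tutorial materials or from Tetrad; the three-variable structure (X and Y are independent causes of a common effect Z), the function names, and the sample size are all assumptions chosen for illustration. Regressing Y on X alone correctly gives a slope near zero, but adding Z as a covariate produces a clearly nonzero coefficient on X even though X has no causal effect on Y.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (len(a) - 1)


def simulate_collider(n=20000, seed=0):
    """Hypothetical system: X -> Z <- Y, with X and Y independent."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]
    z = [xi + yi + rng.gauss(0, 1) for xi, yi in zip(x, y)]
    return x, y, z


def slope_simple(x, y):
    """Least-squares coefficient on X when regressing Y on X alone."""
    return cov(x, y) / cov(x, x)


def slope_adjusted(x, y, z):
    """Least-squares coefficient on X when regressing Y on X and Z."""
    det = cov(x, x) * cov(z, z) - cov(x, z) ** 2
    return (cov(x, y) * cov(z, z) - cov(z, y) * cov(x, z)) / det


x, y, z = simulate_collider()
print("Y ~ X     coefficient on X:", round(slope_simple(x, y), 3))
print("Y ~ X + Z coefficient on X:", round(slope_adjusted(x, y, z), 3))
```

Under these assumptions the adjusted coefficient approaches -0.5 in large samples, purely as a consequence of conditioning on the common effect Z.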

In the second half, we turn to more complex causal learning situations. We discuss the problem of causal discovery in the presence of unmeasured confounders (latent variables) and algorithms that can reliably extract causal information even when the measured variables fail to include hidden common causes. We examine causal learning from time series data (repeated measurements, dynamic streams, etc.), as well as feedback systems. Time permitting, we conclude by opening discussion to the types of complex clinical and translational data that tutorial attendees are analyzing and the questions they seek to answer with them.
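
The confounding problem can be sketched with a few lines of simulation. The example below is a hedged illustration only, not one of the algorithms covered in the tutorial: the hidden variable U, the linear Gaussian model, and the function names are assumptions made for this sketch. U causes both X and Y, and X has no effect on Y. In observational data the regression slope of Y on X is clearly nonzero, while under a simulated intervention that sets X independently of U the slope falls to about zero.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (len(a) - 1)


def simulate_confounded(n=20000, seed=1, intervene=False):
    """Hypothetical system: X <- U -> Y, where U is never recorded.

    With intervene=True, X is set at random, which cuts the U -> X edge.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)  # latent common cause
        x = rng.gauss(0, 1) if intervene else u + rng.gauss(0, 1)
        y = u + rng.gauss(0, 1)  # Y depends on U only, never on X
        xs.append(x)
        ys.append(y)
    return xs, ys


def slope(x, y):
    """Least-squares slope of Y regressed on X."""
    return cov(x, y) / cov(x, x)


x_obs, y_obs = simulate_confounded(intervene=False)
x_int, y_int = simulate_confounded(intervene=True)
print("observational slope: ", round(slope(x_obs, y_obs), 3))
print("interventional slope:", round(slope(x_int, y_int), 3))
```

In this model the observational slope approaches 0.5 in large samples even though intervening on X would not change Y at all.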

Prerequisites:

All attendees are expected to bring laptops and to download the Tetrad software (http://www.phil.cmu.edu/tetrad/current.html) before arriving, as the tutorial will include hands-on examples and case studies.

Attendees are expected to know basic statistical principles; no prior graphical modeling experience is needed. Because time constraints mean that much of the introductory material will be heavily abbreviated, attendees should prepare by reviewing presentations from the Center for Causal Discovery 2015 Summer Short Course, from which this tutorial is derived (http://www.ccd.pitt.edu/training/summer-short-course-2015/), and/or the 2013 CMU Workshop, Case Studies of Causal Discovery with Model Search (http://www.hss.cmu.edu/philosophy/casestudiesworkshop.php).

Wednesday, March 23

8:30 a.m. - 12:00 p.m. 

T02: Tutorial - Immuno-informatics Coming of Age: Emerging Approaches and Applications

H. Fan-Minogue, Stanford University

The immune system is a complex system of the human body, composed of numerous cellular components. Traditional immunology approaches have been very successful in analyzing each component and its function on average and across samples, but they often miss the network functioning principles that rely on and arise from interactions among immunological components. Recent advances in genomics technology and bioinformatics approaches have allowed high-resolution acquisition and analysis of high-dimensional, immunologically relevant data. Immuno-informatics, or computational immunology, has thus emerged as a research area that applies high-throughput genomic and bioinformatics approaches to immunology. Immuno-informatics also plays an essential role in Systems Immunology, which aims to gain an integrated understanding of the structure and function of the immune system.

In this tutorial, I will give an introduction to immuno-informatics by explaining the principles of emerging genomics and single-cell technologies that allow deep profiling of immune components, demonstrating new computational and statistical tools that provide comprehensive analysis and visualization of high-dimensional immunological data, and describing applications of immuno-informatics in understanding immune responses during disease. Attendees should expect to learn about the essential technologies used for capturing immunophenotypes at the single-cell level and what they measure. The advantages and trade-offs of these new technologies in contrast to traditional immunology approaches will be discussed. Attendees should also expect to learn about the machine learning approaches used to analyze single-cell data and how they are implemented. Finally, the tutorial will end with a discussion of immunology databases and tools to access and view the data, case studies showing how these data have been used to gain new insights into the immune system to date, and a brainstorming session about how these resources can best be applied in a translational context.

10:30 a.m. - 3:00 p.m.

T03: Tutorial - Cancer Precision Medicine

L. Li, Z. Zhao, Vanderbilt University

Precision medicine holds promise for both cancer research and clinical cancer care, but it poses many translational bioinformatics and medical informatics challenges. This tutorial aims to fill this knowledge gap. We will introduce the analysis of DNA and RNA sequencing data generated by high-throughput technologies and bioinformatics approaches to identify clinically relevant variants from large-scale cancer genomic data; review the molecular pharmacology of cancer drugs; illustrate database integration for cancer drug targets and cancer drugs; demonstrate the operation of cancer precision medicine clinics; and present challenges in cancer precision medicine research. The tutorial is designed at the beginner’s level for the bioinformatics analysis of sequencing data and for drug and target selection. We will focus more on the clinical applications of cancer precision medicine research.

Thursday, March 24

8:30 a.m. - 12:00 p.m. 

T04: Tutorial - Developing Executable Phenotype Algorithms Using the KNIME Analytics Platform 

H. Mo, Vanderbilt University; W. Thompson, Northwestern University; J. Pacheco, Northwestern University; R. Carroll, Vanderbilt University

KNIME Analytics (www.knime.org) is an open source platform that integrates data access, data transformation, statistical analysis, data-mining tools, and snippets of different programming languages in a visual workbench. It was the sixth most popular data science tool in the 2015 KDnuggets poll. The Electronic Medical Records and Genomics (eMERGE) network has implemented a growing number of phenotype algorithms, such as those for colon polyps and type 2 diabetes mellitus (T2DM), as KNIME workflows to enhance their inter-institutional portability. The PhEMA group has also demonstrated an executable implementation of the NQF Quality Data Model (QDM) on KNIME. In addition to the applications that we will demonstrate in this tutorial, KNIME provides toolkits for Next Generation Sequencing (NGS) analyses, data mining, text processing, and social media research. It also provides support for integrating code written in SQL, Java, R, Python, and other programming languages.

Part one of the presentation will cover fundamental KNIME concepts and operations, the development of extract, transform, and load (ETL) workflows, algorithm implementation, and effective communication in KNIME. After introducing basic concepts, we will show how ETL workflows can facilitate intra- and inter-institutional collaboration and that these workflows are executable artifacts which enable reproducible research. We will then implement a phenotype algorithm, developed and validated by eMERGE, to demonstrate the value of KNIME in a more complex workflow. We will conclude this section by emphasizing collaboration as one of the core benefits of KNIME, dedicating time to discuss effective communication using KNIME. In the second part of the presentation, we will discuss how to leverage the extensibility of KNIME for more sophisticated applications, including phenome-wide association studies (PheWAS), natural language processing (NLP), XML processing, and incorporating RESTful web services.

We will limit the lecture time to around 30% so that the audience has ample hands-on experience editing and executing workflows for most of the demonstration. Instructors will be available to answer questions during this hands-on session.