• March 19-23, San Francisco

    2012 Joint Summits on Translational Science

    AMIA presents two essential back-to-back meetings for translational bioinformatics (TBI) and clinical research informatics (CRI) scientists working throughout the spectrum of translational science. A ‘Bridge Day’ joins the two summit meetings for a day of shared sessions, a keynote, and information exchange.


T01: Introduction to Translational Bioinformatics

Indra Neil Sarkar, University of Vermont and Jessica Tenenbaum, Duke University

In 2005, Dr. Elias Zerhouni, Director of the National Institutes of Health (NIH), wrote "It is the responsibility of those of us involved in today’s biomedical research enterprise to translate the remarkable scientific innovations we are witnessing into health gains for the nation... At no other time has the need for a robust, bidirectional information flow between basic and translational scientists been so necessary." Clearly evident in Dr. Zerhouni’s quote is the role biomedical informatics needs to play in facilitating translational medicine. The American Medical Informatics Association (AMIA) now hosts the Joint Summits on Translational Science, of which the Summit on Translational Bioinformatics is one of the two components. This tutorial is designed to teach the basics of the various types of molecular data and methodologies currently used in bioinformatics and genomics research, and how these can interface with clinical data. It will address the hypotheses one can pursue by integrating molecular biological data with clinical data, and will show how to implement systems to address these hypotheses. The tutorial will cover real-world case studies of how genetic, genomic, and proteomic data have been integrated with clinical data.

By the end of the tutorial, participants will be able to:

  • Understand why biologists and clinicians use each measurement technology, the advantages of each, and which are appropriate for studying diseases.
  • Be able to list high-level requirements for an infrastructure relating research and clinical genetic and genomic data.

Outline of Topics:

  • Basic understanding of various genome-scale measurement modalities: sequencing, polymorphisms, haplotypes, proteomics, gene expression, metabolomics, and others
  • Crucial difference between genetic and genomic data
  • Nature and format of expression, polymorphism, proteomics, and sequencing data
  • Overview of the most commonly used structured vocabularies, taxonomies, and ontologies used in genomics research
  • Description of the most frequently used analysis and clustering techniques
  • How the genetic predisposition to disease is studied
  • Use of genetic information across medical specialties
  • How to find clinical genetic tests
  • Use of genomic and clinical data to study patient disease-free status and survival
  • How informatics can be used to identify potential drug targets
  • Types of biomarkers
  • Parallels between research methods in medical informatics and bioinformatics
  • Relating clinical measurements with molecular measurements

Intended Audience: Academic faculty or professionals setting up bioinformatics facilities and/or relating these to clinical data repositories, or to data from General Clinical Research Centers or Clinical and Translational Science Awards; health information professionals responsible for clinical databases or data warehouses and tying these to researchers; informaticians, clinicians, and scientists interested in genetics, functional genomics, and microarray analysis; physicians interested in how medicine is advancing through the use of genomics and genetics; and students.

Content level: 20% Basic, 50% Intermediate, 30% Advanced

T02: Ontology services for translational research in the i2b2 Workbench

Shawn Murphy, Massachusetts General Hospital and Ray Ferguson, Stanford University

The i2b2 platform uses vocabularies extensively in the querying and manipulation of patient data for translational and clinical research. The National Center for Biomedical Ontology (NCBO) offers an extensive range of Web services for accessing ontologies, generating value sets and lexicons, annotating data, and performing information retrieval, which form key elements of software systems in informatics. Given the importance of ontologies for data integration, NCBO and i2b2 have initiated a collaboration to provide access to cutting-edge ontology services from within the i2b2 workbench, enabling cross-institution data transformation. This tutorial will review the drivers of this collaboration, provide experience in using the NCBO's resources, and offer participants an in-depth understanding of how ontologies and terminologies are used in biomedical informatics. It will review the use of ontologies in i2b2 and discuss their integration with the NCBO's Ontology Web Services infrastructure.

By the end of the tutorial, participants will be able to:

  • Understand i2b2 ontology representation.
  • Understand how i2b2 ontologies are manipulated in performing patient queries.
  • Understand the biomedical ontology landscape.
  • Understand the NCBO infrastructure available for data annotation and ontology access.
  • Conceive workflows that utilize NCBO Web Services and i2b2 workbench to solve their own data entry and integration problems.

Outline of Topics:

  • i2b2 ontology representation
    • diagnoses and procedures (items without associated values)
    • laboratory results (items with associated values)
    • items with modifiers
  • Use of i2b2 ontology representation in patient queries
  • Overview of NCBO Web services
  • Web-based tools for ontology search, visualization, and review
  • Tools and Web Services for data annotation and semantic integration
  • Design of custom workflows to utilize ontology-services

Intended Audience: Scientists, researchers, healthcare analysts, database programmers, and informaticians seeking to understand how to optimally use ontologies for clinical data integration; health IT system developers and CIOs seeking to understand how to leverage NIH-funded infrastructure for using ontologies.

Content level: 50% basic; 30% intermediate; 20% advanced

T03: Reusing EHRs for Clinical, Genomic, and Pharmacogenomic Discovery at Vanderbilt and within the eMERGE Network

Joshua Denny and Hua Xu, Vanderbilt University

This tutorial will cover basic themes in the use of EHRs for generating cohorts of patients to serve as cases and controls for given clinical phenotypes. EHRs can be used for many different types of research, including disease-based studies, response to treatment, clinical biomarkers, redefining “normal”, and analysis of changes in clinical variables and parameters over time. Deriving such phenotypes from EHR data can be challenging. Methods typically involve use of billing data, medication records (often unstructured), laboratory data, and natural language processing. After derivation of these phenotypes, populations can be used for clinical research. Linkage to DNA biobanks also enables the possibility of genomic and pharmacogenomic associations. Research within the eMERGE network has demonstrated success with EHR-based genome-wide association studies (GWAS). In addition, use of EHR-linked genetic data uniquely enables phenome-wide association studies (PheWAS), which allow an unbiased scan of the diseases that may be associated with a given genotype.

This tutorial will review the design of EHR-linked biobanks; methods for creating phenotype algorithms with integrated use of NLP, billing codes, laboratory and test results, and medication records (with review of a number of case studies); basic principles of genetic analysis (candidate gene, GWAS, exome chips, and other platforms); use of standard vocabulary in representing and constructing phenotype algorithms; and application of PheWAS to further characterize clinical variants.

In addition to didactic portions of the tutorial, investigators will be encouraged to work through specific phenotypes of interest and begin discussion of phenotype algorithms as real-world examples. Several case-studies will be presented and worked through during the tutorial in an interactive fashion.

Topics to be covered:

  • Design of EHR-linked DNA biobanks
    • Opt-in design
    • Opt-out design
  • De-identification
  • Overview of eMERGE I and II networks, composition, and goals
  • Introduction to genome-wide association studies (GWAS)
    • Discussion of example primary GWAS from eMERGE
    • Examples of “GWAS” reuse for new phenotypes, with significant new findings
  • Phenotyping approaches
    • Types of data well represented in EHRs
    • When to use a fully automated phenotyping approach vs. a “computer-assisted” approach, and the importance of specific review interfaces (with demonstrations)
  • Phenotyping algorithms
    • Case studies in several eMERGE algorithms: what has worked, and what hasn’t
    • Discussion of experiences with cross-institution implementation of phenotype algorithms
    • Examples of disease-based and pharmacogenetic algorithms
    • Classes of data available and strengths/weaknesses of each
    • Multimodal approaches
    • Experiments in transportability of phenotype algorithms across sites
    • Deterministic vs. machine learning approaches
    • Evaluation of the accuracy of different categories of EMR information for building accurate phenotype algorithms
    • Development of sharable phenotype libraries available for public use
  • NLP methods
    • Medication extraction
    • General NLP for conceptual elements
    • General NLP vs. specific NLP
    • Demonstration of some available tools
  • Methods for evaluating the effectiveness (e.g., positive predictive value) of phenotyping algorithms
  • Use of standard vocabularies for data dictionaries and phenotype algorithms
    • Demonstration of eleMAP, which allows efficient browsing of available structured representations of phenotype elements, demographics, and comorbidities
  • Phenome-wide association studies (PheWAS) using EMR data for relevant genomic variants
    • Methodology and general validation
    • Case studies

Intended Audience: clinical and genetic researchers; providers interested in reuse of EHR data; translational bioinformaticians and medical informaticians.

Content level: 30% basic; 50% intermediate; 20% advanced

T04: Introduction to R for Bioinformatics and Biomedicine

David Ruau, Stanford University

R is an open-source statistical computing environment widely used to analyze many different types of genomic data. Bioconductor, a package repository for bioinformatics, contains 467 packages in addition to the 3128 general-purpose packages available for R. This wide array of possibilities makes R a platform particularly well suited to translational bioinformatics research. However, as with other statistical software, the learning curve can be steep for those less versed in computer science. This tutorial is based on the successful workshop “Introduction to R programming” taught at Stanford. Participants will be introduced to the basics of the R language through practical examples from real biomedical research projects. We will show advanced techniques for plugging different resources into R to perform an analysis and produce publication-ready graphics.

At the end of this tutorial, participants will be able to:

  • Import and export data from different sources, including databases.
  • Use R to transform and manipulate their data and produce exploratory statistical graphs.
  • Write small R functions and objects.
  • Pre-process raw DNA microarray data and explore the resulting gene lists.
  • Interpret their results using metadata from the KEGG database.
  • Perform classical statistical tests.

Outline of topics:

  • Introduction to the R console and interactivity concept
  • How to write R code
  • Producing publication grade quality graphics easily
  • Directly downloading raw data from public microarray repositories
  • Clustering and graphical solutions available in R
  • Knowledge database and metadata accessible through R package repositories
  • Advanced topics (GWAS, high-level graphics, reproducible research...)

Intended audience: Academics and professionals wanting to gain hands-on skills for analyzing biomedical or clinical data, as well as an overview of R's capabilities. Translational scientists and students interested in analyzing their genomic data and learning how to integrate it with external resources.

Content level: 30% Basic, 50% Intermediate, 20% Advanced