Findings from CAGI, the Critical Assessment of Genome Interpretation

August 28, 2017
Free for AMIA members; $50 for non-members
Steven E. Brenner, M.Phil., PhD

The Critical Assessment of Genome Interpretation (CAGI, \'kā-jē\) is a community experiment to objectively assess computational methods for determining the phenotypic impacts of genomic variation. The primary goals are to establish the state of the art, to show where future progress may best be made, to highlight innovations and progress, and to build a strong collaborative community. In the CAGI experiments, participants are typically provided genetic variants and make blind predictions of resulting phenotypes. These predictions are evaluated against gold-standard experimental or clinical data by independent assessors. Four CAGI experiments have been conducted to date – a pilot in 2010, and three full-scale events in 2011, 2013, and 2016; CAGI 5 is currently launching for 2017-2018. Each edition of CAGI involves about 10 challenges. The experiment is conducted over a period of about a year, starting with the identification and development of suitable challenges, followed by a period during which participants are invited to submit their predictions, then a term in which the independent assessors evaluate the results, and concluding with a meeting to discuss the outcomes.

There were notable discoveries throughout the CAGI experiments, and general themes emerged. The independent assessment found that predictions from top missense methods are highly statistically significant, but accuracy on individual variants is limited. Moreover, missense methods tend to correlate better with each other than with experiments (for reasons that may reflect the predictive methods and the assays themselves). There might be particular potential for missense interpretation at the extremes of the distribution. Structure-based missense methods excel in a few cases, while evolution-based methods have more consistent performance. Bespoke approaches often enhance performance. There have been notable improvements in the ability to associate broad phenotypic profiles with genomes. The results showed that predicting complex traits from exomes is fraught. Interpretation of non-coding variants shows promise but is not at the level of missense. On clinical challenges, predictors were able to identify causal variants that were overlooked by the clinical laboratory, and it appears that physicians may not always order the most relevant genetic test for their patients. CAGI data show that running multiple uncalibrated methods and considering their consensus often yields undue confidence, because their agreement reflects their correlation with each other; we therefore advise against running multiple uncalibrated variant interpretation tools in clinical analysis. CAGI also highlights the challenges of creating a genetic study that provides a reliable gold standard.

Complete information about CAGI may be found at    

Community assessment has emerged as an effective framework to evaluate and develop approaches in computational biology, especially experiments in which participants are challenged to solve biological problems such as determining the phenotypic consequences of genomic variation, protein structure, and system perturbations. Some such challenges engage a large community to see how well a given method can achieve a given goal. Successful challenge frameworks of this type are able not only to evaluate the effectiveness of methods but also to highlight innovation, progress, and bottlenecks in the field, to guide future research efforts, and to foster strong collaborative communities.

This talk will provide a broad overview of the CAGI experiment, which reflects the current state of human genome interpretation. A major motivation for this talk is to explore the present challenges with the analysis and manipulation of human genomic data and to provide a forum for identifying critical research directions. A second goal is to facilitate communication among researchers who have come to genome interpretation from disparate fields.

Learning Objectives

After participating in this activity, the learner should be better able to:

  • Understand state-of-the-art methods in predicting phenotype from genotypic variants
  • Learn genome variant interpretation methods and how they should be applied in research scenarios and in the clinic
  • Discuss how to participate in CAGI as a predictor, dataset provider, or assessor
  • Know the challenges that exist in developing genomic variant interpretation methods

Speaker Information

Steven E. Brenner, M.Phil., PhD
University of California, Berkeley

Steven E. Brenner is a Professor at the University of California, Berkeley, and also holds appointments at the University of California, San Francisco and Lawrence Berkeley National Laboratory.  Brenner’s undergraduate research was in the first genome laboratory, mentored by Walter Gilbert at Harvard.  He received his M.Phil. from the Department of Biochemistry at Cambridge University, and earned a Ph.D. from the University of Cambridge and the MRC Laboratory of Molecular Biology where he studied with Cyrus Chothia. After graduation, Brenner had a brief fellowship at the Japan National Institute of Bioscience, followed by postdoctoral research supervised by Michael Levitt at Stanford University School of Medicine.

Brenner’s research is primarily in the area of computational genomics, covering topics in individual genome interpretation, protein structure, RNA regulation, and function prediction. Brenner has a commitment to supporting open science and development of the scientific community. He is also currently a director of the Human Genome Variation Society, Co-chair, and is a founding editor of PLoS Computational Biology. He was founding chair of the Computational Biology graduate program at Berkeley. He has served two terms as a director of the International Society for Computational Biology, was founding chair of the Global Alliance for Genomics and Health’s Data Working Group Variant Annotation Task Team, and was a founding director of the Open Bioinformatics Foundation. His recognitions include being a Miller Professor, a Sloan Research Fellow, a Searle Scholar, an AAAS Fellow, an ISCB Fellow, and the recipient of ISCB’s Overton Prize.