TBI Design Challenge
The growing abundance of omic and phenotypic data needs suitable tools to increase access, integration and use by the broader research community. Applications that integrate different types of data through publicly available data repositories provide support for and advancement of translational research.
For the purpose of supporting translational research to improve our understanding of disease, develop an original framework, rapid prototype or working application for clustering phenotypic and omic information. This should be done, minimally, with respect to Lung Adenocarcinoma (LUAD) and Lung Squamous Cell Carcinoma (LUSC) cancers obtained from The Cancer Genome Atlas (TCGA). Although this is a specific application to lung cancer within TCGA, solutions that can be generalized to other cancers/diseases and repositories are preferred.
TBI20: Improving Understanding through TCGA Data Integration
Tuesday, March 24
3:30 p.m. – 5:00 p.m.
GenePool: A Cloud-Based Platform for Interactive Visualization and Integrative Analysis of Genomics and Clinical Data
H. Fan Minogue, M. Sirota, Stanford University School of Medicine; S. Sanga, Station X; D. Hadley, A. Butte, Stanford University School of Medicine; T. Klingler, Station X
Advances in genomic technology harnessed by large-scale efforts from interdisciplinary consortia have generated near-comprehensive genomic data and provided unprecedented opportunities for understanding human health and disease. The Cancer Genome Atlas (TCGA) in particular contains comprehensive clinical, genomic and transcriptomic measurements across more than 16,000 patient samples and more than 30 tumor types. However, the vast volume and complexity of these data have also made it challenging for the biomedical research community to translate them into insights about disease quickly and reliably. Here we present a cloud-based platform, GenePool, which allows rapid interrogation of multi-genome data and their associated metadata from public or private datasets in a secure and interactive way. We demonstrate the functionality of GenePool with two use cases highlighting interactive sample browsing and selection, comparative analysis between disease subtypes and integrated analysis of genomic, proteomic and phenotypic data. These examples also display the promise of GenePool to accelerate the identification of genetic contributions to human disease and the translation of these findings into clinically actionable results.
A Prototype Software Pipeline to Identify Mutated Genes that Have a Similar Effect on Tumor Transcription
S. Piccolo, Brigham Young University
Genome-wide studies have shown that a wide array of somatic mutations occur in non-small cell lung cancers. These mutations influence tumor growth by altering signaling cascades. Patient responses to treatments are highly variable, perhaps because different cascades are affected or because different components within a given cascade have different downstream effects. This manuscript describes a software pipeline for parsing somatic mutation data and grouping mutated genes according to similarities in the gene expression differences between samples that carry a mutation in a given gene and samples that do not. Genes that show relatively high similarity in gene expression are considered to have similar effects on tumor biology, and tumors carrying mutations in them may therefore respond similarly to treatments. The tool's utility is illustrated by examining the effects of KRAS and EGFR mutations on gene expression, using tumor data from The Cancer Genome Atlas.
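The grouping idea in this abstract can be illustrated with a minimal Python sketch. It is not the author's pipeline: the abstract does not name a similarity measure, so the difference-of-means signature and the Pearson correlation used here are assumptions, as are the function names and the toy expression values (G1 to G3, s1 to s4), which are invented for illustration only.

```python
from statistics import mean

def mutation_signature(expression, mutated_samples):
    """Per-gene difference in mean expression: carriers minus non-carriers.

    expression maps gene -> {sample: value}; mutated_samples is the set of
    samples carrying a mutation in the gene of interest (assumed layout).
    """
    signature = {}
    for gene, by_sample in expression.items():
        carriers = [v for s, v in by_sample.items() if s in mutated_samples]
        others = [v for s, v in by_sample.items() if s not in mutated_samples]
        signature[gene] = mean(carriers) - mean(others)
    return signature

def pearson(sig_a, sig_b):
    """Pearson correlation between two signatures over their shared genes."""
    genes = sorted(set(sig_a) & set(sig_b))
    xs = [sig_a[g] for g in genes]
    ys = [sig_b[g] for g in genes]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Toy data, invented for illustration (not TCGA values).
expression = {
    "G1": {"s1": 5.0, "s2": 6.0, "s3": 1.0, "s4": 2.0},
    "G2": {"s1": 1.0, "s2": 1.0, "s3": 3.0, "s4": 3.0},
    "G3": {"s1": 2.0, "s2": 2.0, "s3": 2.0, "s4": 3.0},
}
mutations = {"KRAS": {"s1", "s2"}, "EGFR": {"s3"}}

signatures = {g: mutation_signature(expression, s) for g, s in mutations.items()}
similarity = pearson(signatures["KRAS"], signatures["EGFR"])
```

In this sketch, a high positive `similarity` would suggest that two mutated genes shift expression in the same direction. A real analysis would also need normalization and a significance test.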
CRI Design Challenge: Structuring Clinical Trial Eligibility Criteria
A critical step in clinical and translational science is to structure clinical research eligibility criteria. These criteria exist largely as free text, leading to vagueness and ambiguity in interpretation.
To improve the precision and computability of clinical research eligibility criteria, we solicit submissions for transforming free-text eligibility criteria on ClinicalTrials.gov into executable inclusion rules using whatever knowledge representations you wish. The emphases for this first year of the Design Challenge are the (estimated) computability of the extracted criteria and how much of the natural language expression in the original eligibility criteria those computable elements cover. If you have a preferred form of expression for the extracted criteria, you are free to use it. If not, we encourage, but do not require, the use of the OMOP Common Data Model, which allows for more ready comparison of the various submissions. This approach will allow the results to be used in a number of contexts, from assessing the similarity of trials to automating the specification of patient cohorts.
CRI06: Design Challenge - Structuring Clinical Research Eligibility Criteria
Wednesday, March 25
1:30 p.m. – 3:00 p.m.
A Standardized Approach to Cohort Definition Applied Across a Network of OMOP-Compliant Observational Databases for Clinical Trial Feasibility Assessment
C. Knoll, F. DeFalco, P. Ryan, Janssen Research and Development
Our objective is to demonstrate an approach for identifying patient populations in order to determine the feasibility of a clinical trial protocol. We designed a tool allowing users to define a clinical protocol based on an index rule and inclusion rules, and to execute the protocol criteria against observational databases. Key areas included user interface (UI), criteria data model and query translation. We demonstrated that the system could successfully be applied to an existing clinical protocol from the FDA. The UI allowed complete specification of the index rule and inclusion criteria, and the protocol definition was successfully compiled into executable queries against the OMOP Common Data Model. A standardized approach for protocol specification is feasible and can be applied to clinical trials and observational studies. This solution could complement a natural language capability to extract protocol definitions from ClinicalTrials.gov and execute the extracted definitions against observational data.
Human Readable Expression of Structured Algorithms for Describing and Storing Clinical Study Criteria and for Generating and Visualizing Queries
R. Duryea, M. Danese, Outcomes Insights, Inc.
Clinical research protocols commonly use natural language to specify the criteria used to build studies. This applies not only to prospective studies like clinical trials, but also to observational research using electronic health data. Algorithms defined in natural language are often imprecise, open to multiple interpretations, and generally difficult to reproduce accurately. Researchers could benefit from a language that removes the ambiguity of natural language while increasing the reproducibility of their research algorithms. We created a language that unambiguously defines and structures a set of criteria in a flexible, easily parseable, human-readable JSON format. This format can be translated into SQL queries that run against a predefined data structure, the OMOP Common Data Model (CDM). These queries can be visualized to facilitate communication and understanding of the underlying selection process.
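As a rough illustration of translating JSON criteria into SQL, the Python sketch below walks a small criteria tree and emits a query over OMOP CDM tables. The JSON layout (`op`, `criteria`, `domain`, `concept_ids`) and the function name `to_sql` are hypothetical and are not the authors' language. The concept IDs are placeholders, and the parenthesized `INTERSECT`/`UNION` form assumes a PostgreSQL-style dialect.

```python
import json

# OMOP CDM table and concept column for each supported domain.
DOMAIN_TABLES = {
    "condition": ("condition_occurrence", "condition_concept_id"),
    "drug": ("drug_exposure", "drug_concept_id"),
}

def to_sql(node):
    """Recursively translate a criteria node into a SQL string."""
    if "domain" in node:
        # Leaf: people with at least one record for any listed concept.
        table, column = DOMAIN_TABLES[node["domain"]]
        ids = ", ".join(str(int(c)) for c in node["concept_ids"])
        return f"SELECT person_id FROM {table} WHERE {column} IN ({ids})"
    # Branch: combine child queries with a set operation.
    set_op = {"and": "INTERSECT", "or": "UNION"}[node["op"]]
    parts = [f"({to_sql(child)})" for child in node["criteria"]]
    return f" {set_op} ".join(parts)

# Hypothetical criteria document; concept IDs 111, 222, 333 are placeholders.
criteria = json.loads("""
{
  "op": "and",
  "criteria": [
    {"domain": "condition", "concept_ids": [111]},
    {"domain": "drug", "concept_ids": [222, 333]}
  ]
}
""")

sql = to_sql(criteria)
```

Keeping the criteria as a tree lets the same structure drive both query generation and a visual rendering of the selection logic.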
Structuring Clinical Trial Eligibility Criteria with Common Data Model
G. Levy-Fix, A. Yaman, C. Weng, Columbia University
This paper presents a method for classifying and structuring free-text clinical trial eligibility criteria using the OMOP Common Data Model (CDM). Our method was applied to eligibility criteria text available from the largest clinical trial repository, ClinicalTrials.gov. Structurally complex criteria were simplified and rewritten as simpler sentences connected by the logical operators AND or OR. Semantic annotation using the Unified Medical Language System (UMLS) and other supporting semantics was performed before the criteria were clustered into groups, which were further classified into selected CDM domain categories according to the semantic type combination patterns in each cluster. Key criteria attributes were then extracted and annotated using the OMOP CDM.
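The simplification step, rewriting a complex criterion as simpler clauses joined by AND or OR, can be pictured with a deliberately naive Python sketch. The regular-expression split and the function name `split_criterion` are assumptions for illustration. The authors' method is not described at this level of detail and would need real linguistic analysis to handle scope, negation and coordinated noun phrases.

```python
import re

def split_criterion(text):
    """Split a criterion on the words 'and'/'or' (naive, illustration only).

    Returns (clauses, operators), where operators[i] joins clauses[i]
    and clauses[i + 1].
    """
    cleaned = text.strip().rstrip(".")
    # The capture group keeps the matched operator in the result list,
    # so tokens alternate: clause, operator, clause, ...
    tokens = re.split(r"\s+(and|or)\s+", cleaned, flags=re.IGNORECASE)
    clauses = tokens[0::2]
    operators = [t.upper() for t in tokens[1::2]]
    return clauses, operators

# Made-up example criterion, not taken from ClinicalTrials.gov.
clauses, operators = split_criterion(
    "Age over 18 and diagnosed with type 2 diabetes or prediabetes."
)
```

Each resulting clause could then be passed to semantic annotation. The flat operator list ignores precedence between AND and OR, which a fuller treatment would have to resolve.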