Mining the druggable genome for personalized medicine
Citation: DGIdb - mining the druggable genome. Malachi Griffith*, Obi L Griffith*, Adam C Coffman, James V Weible, Josh F McMichael, Nicholas C Spies, James Koval, Indraniel Das, Matthew B Callaway, James M Eldred, Christopher A Miller, Janakiraman Subramanian, Ramaswamy Govindan, Runjun D Kumar, Ron Bose, Li Ding, Jason R Walker, David E Larson, David J Dooling, Scott M Smith, Timothy J Ley, Elaine R Mardis, Richard K Wilson. Nature Methods (2013) doi:10.1038/nmeth.2689. *These authors contributed equally to this work.

In the era of clinical sequencing and personalized medicine, investigators are frequently presented with lists of mutated or otherwise altered genes implicated in the disease of a specific patient or cohort. Numerous resources exist to help form hypotheses about how such genomic events might be targeted therapeutically. However, using these resources typically involves tedious manual review of literature, clinical trial records, and knowledge bases. No tools currently exist that collect and curate these resources and provide a simple interface for searching lists of genes against the existing compendia of known or potential drug-gene interactions. The drug-gene interaction database (DGIdb) attempts to address this challenge. Drug-gene interactions have been mined, using a combination of expert curation and text-mining, from DrugBank, the Therapeutic Target Database (TTD), PharmGKB, a list of targeted agents in lung cancer, and ClinicalTrials.gov. Genes have also been categorized as potentially druggable according to membership in selected pathways, molecular functions and gene families from the Gene Ontology, dGene, and the “druggable genome” lists from Hopkins and Groom (2002) and Russ and Lampel (2005). Genes are defined according to Entrez and Ensembl, and drugs according to PubChem. DGIdb contains over 40,000 genes and 10,000 drugs involved in over 15,000 drug-gene interactions or belonging to one of 39 potentially druggable gene categories. Users can enter a list of genes to retrieve all known or potentially druggable genes in that list. Results can be filtered by source, interaction type, or treatment type. DGIdb is implemented as part of The Genome Institute’s Genome Modeling System and forms an integral part of the Clin-Seq pipeline for analyzing genomes in a clinical context. It is built on Ruby on Rails and PostgreSQL, with a flexible relational database schema to accommodate metadata from various sources.

The druggable genome can be defined as the genes or gene products that are known or predicted to interact with drugs, ideally with a therapeutic benefit to the patient. Such genes are of particular interest to large-scale cancer profiling efforts such as TCGA, ICGC and others that identify lists of potential cancer driver genes from high-throughput sequence and other genome-wide data. In cancer therapy, the increasing number of targeted drugs (those designed to inactivate proteins carrying activating amino acid changes, as determined by mutational analyses) makes the need for a searchable database of drug-gene interactions more compelling. A similar paradigm exists in the research of other human diseases. Thus, a commonly asked question in such projects is whether potential driver genes are targeted by any known drugs or belong to any putatively druggable gene categories. Along these lines, recent high-profile cancer marker papers have presented “druggable gene” analyses. These analyses attempt to prioritize genes for further study and functional experiments, and ultimately to help guide the design of clinical trials. Unfortunately, there remains a large knowledge gap between clinical domain experts and genomic researchers. The former are intimately familiar with the disease-specific pathways and targeted therapies being used in the field. The latter possess the technical expertise to detect the known and potentially novel driver events hidden in the molecular data of the disease samples under study. There is a critical need for tools that bridge this gap and help both basic and clinical researchers prioritize and interpret the results of genome-wide studies in the context of gene function, clinical phenotypes, treatment decisions and patient outcomes.

Existing resources for querying the druggable genome are problematic. Data are often not made publicly accessible. Searching across multiple sources is difficult due to the plethora of gene and drug identifier systems. Some interfaces permit searching one gene at a time but have no mechanism for searching a list of genes. Others are only available for manual review and have no search interface at all. Web interfaces are generally not user-friendly, and their data are rarely available in formats convenient for systematic analysis. Some data sources are available only as PDF documents or are difficult to obtain, such as the widely used but now unsupported ‘Hopkins and Groom’ and ‘Russ and Lampel’ druggable genome lists. Even when data are made accessible, filtering options are needed so that searches can be made with different levels of stringency. This is necessary because of the inherent trade-off between comprehensiveness and quality in such efforts. Some databases have large numbers of lower-quality interactions, while others have focused on very careful curation of a smaller number. The optimal resource to use depends on the goals of the researcher. Clinical researchers may wish to restrict themselves to carefully curated interactions involving known and approved agents. Basic researchers, on the other hand, may be willing to evaluate experimental therapies or interactions with lower levels of support. To address these challenges we have developed the Drug Gene Interaction Database (DGIdb). Our goal was to create a user-friendly search tool and comprehensive database of genes that have the potential to be druggable, with a particular focus on cancer. We hope to capture and prioritize genes that are known to be targeted by existing drugs, especially targeted drugs rather than broad chemotherapeutics. Our motivation was to make accessible much of the information already available through databases and manuscript supplementary materials.
By cross-mapping identifiers and creating a simple interface to these disparate sources we provide a single destination for druggable genome information against which gene lists can be searched and prioritized for functional characterization.
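
As an illustration of the cross-mapping idea, the following is a minimal Python sketch, not DGIdb's actual implementation. It assumes a small hand-written identifier table; the names `GENE_IDENTIFIERS`, `build_lookup` and `normalize_genes` are hypothetical. Each search term (symbol, alias, Entrez ID or Ensembl ID) is resolved to a single canonical gene symbol, and terms that cannot be resolved are reported rather than silently dropped.

```python
from collections import defaultdict

# Illustrative identifier table: (canonical symbol, identifier system, identifier).
# A real system would load these rows from Entrez and Ensembl exports.
GENE_IDENTIFIERS = [
    ("EGFR", "entrez", "1956"),
    ("EGFR", "ensembl", "ENSG00000146648"),
    ("EGFR", "alias", "ERBB1"),
    ("ERBB2", "entrez", "2064"),
    ("ERBB2", "ensembl", "ENSG00000141736"),
    ("ERBB2", "alias", "HER2"),
    ("BRAF", "entrez", "673"),
    ("BRAF", "ensembl", "ENSG00000157764"),
]


def build_lookup(rows):
    """Map every identifier (upper-cased) to the set of canonical symbols it names."""
    lookup = defaultdict(set)
    for symbol, _system, identifier in rows:
        lookup[identifier.upper()].add(symbol)
        lookup[symbol.upper()].add(symbol)  # a symbol always resolves to itself
    return lookup


def normalize_genes(terms, lookup):
    """Split search terms into matched, ambiguous and unmatched groups."""
    matched, ambiguous, unmatched = {}, {}, []
    for term in terms:
        symbols = lookup.get(term.strip().upper(), set())
        if len(symbols) == 1:
            matched[term] = next(iter(symbols))
        elif symbols:
            ambiguous[term] = sorted(symbols)  # one identifier, several genes
        else:
            unmatched.append(term)
    return matched, ambiguous, unmatched


lookup = build_lookup(GENE_IDENTIFIERS)
matched, ambiguous, unmatched = normalize_genes(
    ["her2", "1956", "ENSG00000157764", "NOTAGENE"], lookup
)
print(matched)
print(unmatched)
```

Keeping ambiguous and unmatched terms in separate groups lets a user see which entries in a gene list need manual review before any drug-gene search is run.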

DGIdb attempts to organize the druggable genome under two main classes. The first class includes genes with known drug interactions. Such drug-gene interactions are useful when a researcher has a list of candidate genes predicted to be activated in disease and wishes to identify drugs that might inhibit or otherwise modulate those genes. The second class includes genes that are 'potentially' druggable according to their membership in gene categories associated with druggability (e.g., kinases). Membership in these categories is useful for prioritizing a list of genes according to their potential for drug development. Genes in the first class have established interactions with drugs, based largely on literature mining and obtained from existing publicly available reviews and databases. These interactions can come from either gene- or drug-centric database models and are not limited by functional category or drug modality. Genes in the second class have properties that make them suitable for drug targeting but may not currently have a drug targeting them. There are various ways to define this class of potentially 'druggable' genes. We drew from several existing efforts and local domain knowledge to define the categories that are most relevant to druggability. These categories tend to be biased towards genes, such as kinases and ion channels, whose products are amenable to targeting by small molecules. For both classes of druggable genes, sources were manually curated and semi-automatically imported. Sources were further prioritized according to trust level as either “expert-curated” or “non-curated” and ranked within these classes according to our own experience and feedback from collaborators. The database can be accessed programmatically or through a web-based interface at dgidb.org. Search results can be filtered and ranked in multiple ways and are easily exported for further analysis or visualization.
We believe DGIdb represents a powerful resource for hypothesis generation. DGIdb may also facilitate prioritization of gene-level events for review by clinical experts and ultimately aid in treatment decision-making.
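
The two-class organization and the trust-based ordering of sources described above can be sketched in Python as follows. This is a toy model under stated assumptions, not the DGIdb schema: the source names `SourceA` and `SourceB`, their trust assignments, the within-level ranks and the function `annotate` are all hypothetical, and the records are illustrative rather than taken from the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    gene: str
    drug: str
    source: str


# Class 1: known drug-gene interactions (illustrative records, placeholder sources).
INTERACTIONS = [
    Interaction("EGFR", "erlotinib", "SourceB"),
    Interaction("EGFR", "gefitinib", "SourceA"),
    Interaction("BRAF", "vemurafenib", "SourceB"),
]

# Class 2: membership in potentially druggable gene categories.
GENE_CATEGORIES = {
    "EGFR": {"kinase"},
    "BRAF": {"kinase"},
    "KCNH2": {"ion channel"},
}

# Hypothetical trust assignment: (trust level, rank within that level; lower is better).
SOURCE_TRUST = {
    "SourceA": ("expert-curated", 1),
    "SourceB": ("non-curated", 1),
}
TRUST_ORDER = {"expert-curated": 0, "non-curated": 1}


def trust_key(interaction):
    """Sort key: expert-curated sources first, then rank within level, then drug name."""
    level, rank = SOURCE_TRUST[interaction.source]
    return (TRUST_ORDER[level], rank, interaction.drug)


def annotate(genes):
    """Report both classes of druggability evidence for each gene in the list."""
    report = {}
    for gene in genes:
        hits = sorted((i for i in INTERACTIONS if i.gene == gene), key=trust_key)
        report[gene] = {
            "interactions": hits,
            "categories": sorted(GENE_CATEGORIES.get(gene, set())),
        }
    return report


report = annotate(["EGFR", "KCNH2", "TP53"])
for gene, evidence in report.items():
    drugs = [i.drug for i in evidence["interactions"]]
    print(gene, drugs, evidence["categories"])
```

In this sketch a gene can carry evidence from both classes, from only one, or from neither, which mirrors how a gene without any known drug can still be flagged as potentially druggable through its category.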

Identifying clinically relevant genes using DGIdb has a number of limitations that should be acknowledged. DGIdb provides links between genes and their known or potential drug associations. It does not currently provide any information regarding the druggability of specific mutations, nor does it guarantee that any given drug-gene association represents an appropriate therapeutic intervention. DGIdb’s concept of a drug-gene interaction or membership in a potentially druggable category is inclusive and largely driven by the underlying data sources and publications. It includes 39 potentially druggable categories and at least 35 interaction types as defined by source datasets. These include inhibitors, activators, cofactors, ligands, vaccines, and in many cases, interactions of unknown type. Wherever possible we provide filtering by source, trust level, interaction type, and drug type to allow the user to quickly disregard possibly spurious candidates.
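
The effect of these filters on search stringency can be illustrated with a short Python sketch. It is a simplified stand-in and does not use the DGIdb interface: the `Record` fields, the `filter_records` function, the source names, the drug-type labels and the `compound_x` and `compound_y` entries are all hypothetical placeholders chosen for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Record:
    gene: str
    drug: str
    source: str
    trust_level: str       # "expert-curated" or "non-curated"
    interaction_type: str  # e.g. "inhibitor", "activator", "unknown"
    drug_type: str         # hypothetical labels, e.g. "targeted", "experimental"


# Illustrative records only; sources and compound_* entries are placeholders.
RECORDS = [
    Record("EGFR", "erlotinib", "SourceA", "expert-curated", "inhibitor", "targeted"),
    Record("EGFR", "compound_x", "SourceB", "non-curated", "unknown", "experimental"),
    Record("BRAF", "vemurafenib", "SourceA", "expert-curated", "inhibitor", "targeted"),
    Record("KIT", "compound_y", "SourceB", "non-curated", "inhibitor", "experimental"),
]


def _allowed(value: str, choices: Optional[Set[str]]) -> bool:
    """A filter set to None means 'no restriction' on that field."""
    return choices is None or value in choices


def filter_records(
    records: Iterable[Record],
    genes: Iterable[str],
    *,
    sources: Optional[Set[str]] = None,
    trust_levels: Optional[Set[str]] = None,
    interaction_types: Optional[Set[str]] = None,
    drug_types: Optional[Set[str]] = None,
) -> List[Record]:
    """Return records for the given genes that pass every active filter."""
    wanted = {g.strip().upper() for g in genes}
    return [
        r for r in records
        if r.gene in wanted
        and _allowed(r.source, sources)
        and _allowed(r.trust_level, trust_levels)
        and _allowed(r.interaction_type, interaction_types)
        and _allowed(r.drug_type, drug_types)
    ]


genes = ["egfr", "BRAF", "TP53"]
# Stringent search: curated sources and a defined interaction type only.
strict = filter_records(
    RECORDS, genes, trust_levels={"expert-curated"}, interaction_types={"inhibitor"}
)
# Permissive search: everything known about the genes in the list.
permissive = filter_records(RECORDS, genes)
print([(r.gene, r.drug) for r in strict])
print([(r.gene, r.drug) for r in permissive])
```

Treating each filter as optional keeps the permissive search as the default and lets stricter searches be built by adding constraints, which matches the trade-off between comprehensiveness and quality discussed earlier.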

This page provides tutorials and other resources on how to use DGIdb.