Mining the druggable genome for personalized medicine
Citation: DGIdb - mining the druggable genome. Malachi Griffith*, Obi L Griffith*, Adam C Coffman, James V Weible, Josh F McMichael, Nicholas C Spies, James Koval, Indraniel Das, Matthew B Callaway, James M Eldred, Christopher A Miller, Janakiraman Subramanian, Ramaswamy Govindan, Runjun D Kumar, Ron Bose, Li Ding, Jason R Walker, David E Larson, David J Dooling, Scott M Smith, Timothy J Ley, Elaine R Mardis, Richard K Wilson. Nature Methods (2013) doi:10.1038/nmeth.2689. *These authors contributed equally to this work.

In the era of clinical sequencing and personalized medicine, investigators are frequently presented with lists of mutated or otherwise altered genes implicated in the disease of a specific patient or cohort. Numerous resources exist to help form hypotheses about how such genomic events might be targeted therapeutically. However, using these resources typically involves tedious manual review of literature, clinical trial records, and knowledge bases. No tools currently exist that collect and curate these resources and provide a simple interface for searching lists of genes against the existing compendia of known or potential drug-gene interactions. The drug-gene interaction database (DGIdb) attempts to address this challenge. Using a combination of expert curation and text-mining, drug-gene interactions have been mined from DrugBank, the Therapeutic Target Database (TTD), PharmGKB, a list of targeted agents in lung cancer (TALC), and other sources. Genes have also been categorized as potentially druggable according to membership in selected pathways, molecular functions and gene families from the Gene Ontology, dGene, and “druggable genome” lists from Hopkins and Groom (2002) and Russ and Lampel (2005). Genes are defined according to Entrez and Ensembl and drugs according to PubChem. DGIdb contains over 40,000 genes and 10,000 drugs involved in over 15,000 drug-gene interactions or belonging to one of 39 potentially druggable gene categories. Users can enter a list of genes to retrieve all known or potentially druggable genes in that list. Results can be filtered by source, interaction type, or treatment type. DGIdb is implemented as part of The Genome Institute’s Genome Modeling System and forms an integral part of the Clin-Seq pipeline for analyzing genomes in a clinical context. It is built on Ruby on Rails and PostgreSQL with a flexible relational database schema to accommodate metadata from various sources.

The druggable genome can be defined as the genes or gene products that are known or predicted to interact with drugs, ideally with a therapeutic benefit to the patient. Such genes are of particular interest to large-scale cancer profiling efforts such as TCGA, ICGC and others that identify lists of potential cancer driver genes from high-throughput sequence and other genome-wide data. In cancer therapy, the increasing number of targeted drugs--those designed to inactivate proteins carrying activating amino acid changes as determined by mutational analyses--makes the need for a searchable database of drug-gene interactions more compelling. A similar paradigm exists in the research of other human diseases. Thus, a commonly asked question in such projects is whether potential driver genes are targeted by any known drugs or belong to any putatively druggable gene categories. Along these lines, recent high-profile cancer marker papers have presented “druggable gene” analyses. These analyses attempt to prioritize genes for further study and functional experiments, and ultimately to help guide the design of clinical trials. Unfortunately, there remains a large knowledge gap between clinical domain experts and genomic researchers. The former are intimately familiar with the disease-specific pathways and targeted therapies being used in the field. The latter, by contrast, possess the technical expertise to detect the known and potentially novel driver events hidden in the molecular data of the disease samples under study. There is a critical need for tools that bridge this gap to help both basic and clinical researchers prioritize and interpret the results of genome-wide studies in the context of gene function, clinical phenotypes, treatment decisions and patient outcomes.

Existing resources for querying the druggable genome are problematic. Data are often not made publicly accessible. Searching across multiple sources is difficult due to the plethora of gene and drug identifier systems. Some interfaces permit searching one gene at a time but have no mechanism for searching a list of genes. Others are only available for manual review and have no search interface at all. Web interfaces are generally not user-friendly, and their data are rarely available in formats convenient for systematic analysis. Some data sources are available only as PDF documents or are difficult to obtain, such as the widely used but now unsupported ‘Hopkins and Groom’ and ‘Russ and Lampel’ druggable genome lists. Even when data are accessible, filtering options are needed so that searches can be made with different levels of stringency. This is necessary because of the inherent trade-off between comprehensiveness and quality in such efforts. Some databases have large numbers of lower quality interactions while others have focused on very careful curation of a smaller number. The optimal resource to use depends on the goals of the researcher. Clinical researchers may wish to restrict themselves to carefully curated interactions involving known and approved agents. Basic researchers, on the other hand, may be willing to evaluate experimental therapies or interactions with lower levels of support. To address these challenges we have developed the Drug Gene Interaction Database (DGIdb). Our goal was to create a user-friendly search tool and comprehensive database of genes that have the potential to be druggable, with a particular focus on cancer. We hope to capture and prioritize genes that are known to be targeted by existing drugs, especially targeted drugs rather than broad chemotherapeutics. Our motivation was to make accessible much of the information already available through databases and manuscript supplementary materials. By cross-mapping identifiers and creating a simple interface to these disparate sources we provide a single destination for druggable genome information against which gene lists can be searched and prioritized for functional characterization.
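The cross-mapping idea can be illustrated with a minimal Python sketch. It assumes a hand-made lookup table and two hypothetical sources (`source_a`, `source_b`) that report the same gene under different identifier systems; the function names and the toy records are illustrative only and do not reflect DGIdb's actual schema or import code.

```python
# Toy cross-mapping table: every identifier we might see -> one canonical symbol.
# The Entrez and Ensembl IDs are the real ones for EGFR and BRAF, but the table
# itself is a hand-made illustration, not DGIdb's mapping data.
ID_TO_SYMBOL = {
    "1956": "EGFR", "ENSG00000146648": "EGFR", "ERBB1": "EGFR", "EGFR": "EGFR",
    "673": "BRAF", "ENSG00000157764": "BRAF", "BRAF": "BRAF",
}


def canonical_symbol(identifier):
    """Return the canonical symbol for an identifier, or None if unmapped."""
    return ID_TO_SYMBOL.get(str(identifier).strip().upper())


def merge_sources(*sources):
    """Group (gene identifier, drug) records from several sources by gene symbol."""
    merged = {}
    for source_name, records in sources:
        for identifier, drug in records:
            symbol = canonical_symbol(identifier)
            if symbol is None:
                continue  # unmapped identifiers are simply skipped in this sketch
            merged.setdefault(symbol, set()).add((drug, source_name))
    return merged


# Hypothetical sources that use different identifier systems for the same gene.
source_a = ("source_a", [("1956", "erlotinib"), ("673", "vemurafenib")])
source_b = ("source_b", [("ENSG00000146648", "gefitinib"), ("erbb1", "erlotinib")])

merged = merge_sources(source_a, source_b)
print(sorted(merged["EGFR"]))
```

Once every record is keyed by one canonical symbol, a gene list only needs to be normalized once before it can be searched against all sources at the same time.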

DGIdb attempts to organize the druggable genome under two main classes. The first class includes genes with known drug interactions. Such drug-gene interactions are useful for the case where a researcher has a list of candidate genes predicted to be activated in disease, and wishes to identify drugs that might inhibit or otherwise modulate those genes. The second class includes genes that are 'potentially' druggable according to their membership in gene categories associated with druggability (e.g., kinases). Membership in these categories is useful for prioritizing a list of genes according to their potential for drug development. The first class consists of established interactions between genes and drugs, based largely on literature mining and obtained from existing publicly available reviews and databases. These can come from either gene- or drug-centric database models and are not limited by functional category or drug modality. The second class represents genes that have properties making them suitable for drug targeting but that may not currently have a drug targeting them. There are various ways to define this class of potentially 'druggable' genes. We drew from several existing efforts and local domain knowledge to define categories that are most relevant to druggability. These categories tend to be biased towards genes, such as kinases and ion channels, that are amenable to targeting by small molecules. For both classes of druggable genes, sources were manually curated and semi-automatically imported. Sources were further prioritized according to trust levels as either “expert-curated” or “non-curated” and ranked within these classes according to our own experience and feedback from collaborators. The database can be accessed programmatically or through a web-based interface. Search results can be filtered and ranked in multiple ways and are easily exported for further analysis or visualization. We believe DGIdb represents a powerful resource for hypothesis generation. DGIdb may also facilitate prioritization of gene-level events for review by clinical experts and ultimately aid in treatment decision-making.
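The two classes described above can be sketched as a simple gene-list search. The following is a minimal Python illustration, assuming toy in-memory tables; the table names, the function `search_gene_list`, and the placeholder gene `MYGENE1` are hypothetical and are not part of DGIdb.

```python
# Class 1: genes with known drug interactions (toy examples).
KNOWN_INTERACTIONS = {
    "EGFR": ["erlotinib", "gefitinib"],
    "BRAF": ["vemurafenib"],
}

# Class 2: membership in potentially druggable gene categories (toy examples).
DRUGGABLE_CATEGORIES = {
    "kinase": {"EGFR", "BRAF", "CDK6"},
    "ion channel": {"KCNH2"},
}


def search_gene_list(genes):
    """Return, for each input gene with a hit, its known drugs and categories."""
    results = {}
    for gene in genes:
        symbol = gene.strip().upper()
        drugs = KNOWN_INTERACTIONS.get(symbol, [])
        categories = sorted(
            name for name, members in DRUGGABLE_CATEGORIES.items()
            if symbol in members
        )
        if drugs or categories:  # genes with no hit in either class are omitted
            results[symbol] = {"drugs": drugs, "categories": categories}
    return results


# MYGENE1 is a made-up placeholder that appears in neither toy table.
hits = search_gene_list(["egfr", "CDK6", "MYGENE1"])
for symbol, hit in hits.items():
    print(symbol, hit)
```

In this sketch a gene such as CDK6 is reported even though the toy interaction table lists no drug for it, which mirrors the role of the second class: flagging candidates for drug development rather than existing therapies.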

Identifying clinically relevant genes using DGIdb has a number of limitations that should be acknowledged. DGIdb provides links between genes and their known or potential drug associations. It does not currently provide any information regarding the druggability of specific mutations, nor does it guarantee that any given drug-gene association represents an appropriate therapeutic intervention. DGIdb’s concept of a drug-gene interaction or membership in a potentially druggable category is inclusive and largely driven by the underlying data sources and publications. It includes 39 potentially druggable categories and at least 35 interaction types as defined by source datasets. These include inhibitors, activators, cofactors, ligands, vaccines, and in many cases, interactions of unknown type. Wherever possible we provide filtering by source, trust level, interaction type, and drug type to allow the user to quickly disregard possibly spurious candidates.
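Filtering at different levels of stringency can be sketched as follows. This is a minimal Python illustration, assuming a hypothetical `Interaction` record and toy data with made-up source names; it is not DGIdb's filtering code. A filter left as `None` is treated as "no restriction".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    gene: str
    drug: str
    source: str            # hypothetical source name
    trust: str             # "expert-curated" or "non-curated"
    interaction_type: str  # e.g. "inhibitor", "antibody", "n/a"


def filter_interactions(records, sources=None, trust_levels=None,
                        interaction_types=None):
    """Keep records matching every filter that is set; None disables a filter."""
    def keep(record):
        return (
            (sources is None or record.source in sources)
            and (trust_levels is None or record.trust in trust_levels)
            and (interaction_types is None
                 or record.interaction_type in interaction_types)
        )
    return [record for record in records if keep(record)]


# Toy records; the sources and trust assignments are invented for illustration.
records = [
    Interaction("EGFR", "erlotinib", "source_a", "expert-curated", "inhibitor"),
    Interaction("EGFR", "cetuximab", "source_a", "expert-curated", "antibody"),
    Interaction("EGFR", "gefitinib", "source_b", "non-curated", "inhibitor"),
    Interaction("BRAF", "vemurafenib", "source_b", "non-curated", "n/a"),
]

# A stringent search: expert-curated inhibitors only.
strict = filter_interactions(records, trust_levels={"expert-curated"},
                             interaction_types={"inhibitor"})
# A permissive search: no filters at all.
permissive = filter_interactions(records)
print([r.drug for r in strict], len(permissive))
```

The stringent call corresponds to a clinical use case that accepts only curated evidence, while the permissive call suits exploratory work where lower-support candidates are still of interest.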

This page provides tutorials and other resources on how to use DGIdb.
Definitions of Drug-Gene Interaction Types

This table has been assembled based upon our own understanding of the definitions of these terms, and we have provided citations to support these definitions. Many of the resources we curate do not provide their own definitions of these terms, and so we encourage users of DGIdb to use these definitions as a starting point, and review interactions of interest from their primary sources. If you have any questions or comments regarding these definitions or the use of DGIdb, please contact us!

Interaction Type | Sources Using This Type | Description | Reference
activator | ChEMBL, DrugBank, MyCancerGenomeClinicalTrial, TTD | An activator interaction is when a drug activates a biological response from a target, although the mechanism by which it does so may not be understood. | DrugBank examples: PMID12070353
adduct | DrugBank | An adduct interaction is when a drug-protein adduct forms by the covalent binding of electrophilic drugs or their reactive metabolite(s) to a target protein. | PMID16199025
agonist | ChEMBL, DrugBank, TALC, TTD | An agonist interaction occurs when a drug binds to a target receptor and activates the receptor to produce a biological response. | Wikipedia - Agonist
allosteric modulator | DrugBank | An allosteric modulator interaction occurs when drugs exert their effects on their protein targets via a different binding site than the natural (orthosteric) ligand site. | PMID24699297
antagonist | ChEMBL, DrugBank, TALC, TTD | An antagonist interaction occurs when a drug blocks or dampens agonist-mediated responses rather than provoking a biological response itself upon binding to a target receptor. | Wikipedia - Receptor Antagonist
antibody | CancerCommons, DrugBank, MyCancerGenome, TALC, TTD | An antibody interaction occurs when an antibody drug specifically binds the target molecule. | Wikipedia - Antibody
antisense oligonucleotide | TALC | An antisense oligonucleotide interaction occurs when a complementary RNA drug binds to an mRNA target to inhibit translation by physically obstructing the mRNA translation machinery. | PMID10228554
binder | DrugBank, TTD | A binder interaction has drugs physically binding to their target. | DrugBank examples: PMID12388666 PMID7584665 PMID14507470
blocker | ChEMBL, DrugBank, TTD | Antagonist interactions are sometimes referred to as blocker interactions; examples include alpha blockers, beta blockers, and calcium channel blockers. | Wikipedia - Receptor Antagonist
chaperone | DrugBank | Pharmacological chaperone interactions occur when substrates or modulators directly bind to a partially folded biosynthetic intermediate to stabilise the protein and allow it to complete the folding process to yield a functional protein. | PMID17597553
cleavage | DrugBank | Cleavage interactions take place when the drug promotes degeneration of the target protein through cleaving of the peptide bonds. | DrugBank example: PMID10666203
cofactor | DrugBank, TTD | A cofactor is a drug that is required for a target protein's biological activity. | Wikipedia - Cofactor
competitive | DrugBank | Competitive antagonists (also known as surmountable antagonists) are drugs that reversibly bind to receptors at the same binding site (active site) on the target as the endogenous ligand or agonist, but without activating the receptor. | Wikipedia - Receptor Antagonist
immunotherapy | MyCancerGenome | In immunotherapy interactions, the result of the drug acting on the target is an induction, enhancement, or suppression of an immune response. | Wikipedia - Immunotherapy
inducer | DrugBank, TTD | In inducer interactions, the drug increases the activity of its target enzyme. | Wikipedia - Enzyme Inducer
inhibitor | CancerCommons, ChEMBL, DrugBank, MyCancerGenome, MyCancerGenomeClinicalTrial, TALC, TTD | In inhibitor interactions, the drug binds to a target and decreases its expression or activity. Most interactions of this class are enzyme inhibitors, which bind an enzyme to reduce enzyme activity. | Wikipedia - Enzyme Inhibitor
inhibitory allosteric modulator | CancerCommons, DrugBank | In inhibitory allosteric modulator interactions, also called negative allosteric modulator interactions, the drug will inhibit activity of its target enzyme. | PMID24699297
inverse agonist | DrugBank | An inverse agonist interaction occurs when a drug binds to the same target as an agonist, but induces a pharmacological response opposite to that of the agonist. | Wikipedia - Inverse Agonist
ligand | DrugBank | In ligand interactions, a drug forms a complex with its target protein to serve a biological function. | Wikipedia - Ligand
modulator | ChEMBL, DrugBank, TTD | In modulator interactions, the drug regulates or changes the activity of its target. In contrast to allosteric modulators, this interaction type may not involve any direct binding to the target. | Modulators. Segen's Medical Dictionary. (2011). Retrieved online October 9 2015.
multitarget | DrugBank | In multitarget interactions, drugs achieve a physiological effect through simultaneous interaction with multiple gene targets. | PMID22768266
n/a | CIViC, ChEMBL, ClearityFoundationBiomarkers, ClearityFoundationClinicalTrial, DoCM, DrugBank, GuideToPharmacologyInteractions, PharmGKB, TALC, TEND, TTD, TdgClinicalTrial | DGIdb assigns this label to any drug-gene interaction for which the interaction type is not specified by the reporting source. | N/A
negative modulator | DrugBank | In a negative modulator interaction, the drug negatively regulates the amount or activity of its target. In contrast to an inhibitory allosteric modulator, this interaction type may not involve any direct binding to the target. | Wikipedia - Allosteric modulator
other/unknown | DrugBank, MyCancerGenome | This is a label given by the reporting source to an interaction that doesn't belong to other interaction types, as defined by the reporting source. | N/A
partial agonist | ChEMBL, DrugBank | In a partial agonist interaction, a drug will elicit a reduced amplitude functional response at its target receptor, as compared to the response elicited by a full agonist. | Wikipedia - Receptor Antagonist
partial antagonist | DrugBank | In a partial antagonist interaction, a drug will only partially reduce the amplitude of a functional response at its target receptor, as compared to the reduction of response by a full antagonist. | PMID6188923
positive allosteric modulator | ChEMBL, DrugBank | In a positive allosteric modulator interaction, the drug increases activity of the target enzyme. | PMID24699297
potentiator | DrugBank | In a potentiator interaction, the drug enhances the sensitivity of the target to the target's ligands. | Wikipedia - Potentiator
product of | DrugBank | These "interactions" occur when the target gene produces the endogenous drug. | N/A
stimulator | DrugBank, TTD | In a stimulator interaction, the drug directly or indirectly affects its target, stimulating a physiological response. | DrugBank examples: PMID23318685 PMID17148649 PMID15955613
suppressor | DrugBank | In a suppressor interaction, the drug directly or indirectly affects its target, suppressing a physiological process. | DrugBank examples: PMID8386571 PMID14967460
vaccine | TALC | In vaccine interactions, the drugs stimulate or restore an immune response to their target. | NCI - Cancer Vaccines
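
The "n/a" convention in the table (a label assigned whenever the reporting source does not specify an interaction type) can be sketched in a few lines of Python. The function name `normalize_interaction_type` and the sample labels are assumptions made for illustration; the sketch only lowercases labels, collapses whitespace, and falls back to "n/a", and it is not DGIdb's import logic.

```python
from collections import Counter


def normalize_interaction_type(raw):
    """Lowercase and tidy a source-supplied label; missing labels become 'n/a'."""
    if raw is None:
        return "n/a"
    label = " ".join(raw.lower().split())  # collapse runs of whitespace
    return label if label else "n/a"


# Hypothetical labels as they might arrive from different sources.
raw_labels = ["Inhibitor", " antagonist ", None, "", "Allosteric  Modulator"]
normalized = [normalize_interaction_type(label) for label in raw_labels]
counts = Counter(normalized)

print(normalized)
print(counts["n/a"])
```

Counting the normalized labels gives a quick view of how many interactions in a result set carry no interaction type and should be reviewed against their primary sources.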