By Christopher Hogue, PhD and Randall C. Willis
In fields ranging from medicine and biotechnology to agriculture and the environment, the genomics revolution of the past decade is slowly giving way to a systems-wide approach to solving biological questions, as scientists are reminded that living organisms are composed of more than just their genes. Initiatives like the Human Genome Project are invaluable, providing researchers with a parts list of life, but they don’t offer much information about how these parts assemble to create cells, tissues and organisms.
Proteome projects have gone a long way toward filling in the gaps, offering scientists information about protein content, numbers and modifications. But in general, even these efforts can only provide a snapshot of what is going on within a cell or organ, and do not always tell researchers how these component parts interact to form complexes and pathways critical to the function of life. More recently, however, academic, government and commercial groups worldwide have begun to address this problem, finding ways to pull together biomolecular interaction and pathway data from various sources into central repositories against which researchers can test their hypotheses and probe for new insights.
Several groups, both commercial and academic, have undertaken a systematic analysis of how biologically important molecules interact both in the cell and in the lab. One company that has generated and analysed protein interaction maps (PIMs) for several years is Hybrigenics (Paris, France). Using a combination of wet (yeast two-hybrid) and dry (bioinformatics) techniques, the company generates PIMs to identify potential drug-development targets, which it then validates using other technologies. Hybrigenics not only uses these technologies for its own research collaborations in the areas of viral diseases and cancers, but also provides them as a commercial service to outside groups.
Other examples of commercial interaction repositories include the Human Interactome Database from Prolexys Pharmaceuticals Inc. (Salt Lake City, UT; formerly Myriad Proteomics); the PathCalling database of yeast interactions from CuraGen Corp. (New Haven, CT); the Pathway Knowledge Base from Ingenuity Systems (Mountain View, CA); and PathArt© from Jubilant Biosys Ltd. (Columbia, MD).
The one problem with commercial systems, however, is that while they typically offer comprehensive packages with well-designed user interfaces, they can also be beyond the financial reach of many small labs. And in many cases, similar databases and bioinformatics packages are freely available through academic or government institutions. Globally, dozens of databases cut a wide swath through the biomolecular literature to identify a spectrum of interactions. Examples include the University of California, Los Angeles’s (Los Angeles, CA) Database of Interacting Proteins (DIP), the University of Rome’s (Rome, Italy) Molecular Interaction (MINT) database, and the European Bioinformatics Institute’s (Cambridge, U.K.) IntAct project.
The largest of these databases, however, can be found in Canada at the Blueprint Initiative (Blueprint), a research program of the Samuel Lunenfeld Research Institute (SLRI) at Mount Sinai Hospital in Toronto, Ont. Led by Christopher Hogue, PhD, Blueprint aims to provide researchers worldwide with free access to the information and tools they need to improve their understanding of basic biology and human health. To achieve this, Blueprint develops, hosts and maintains public databases and bioinformatics software tools.
The central pillar of Blueprint’s efforts is the Biomolecular Interaction Network Database (BIND), which captures data generated by expensive research efforts in a computationally accessible format. BIND records — which span molecular interactions, small-molecule chemical reactions and genetic interaction networks — allow researchers to identify macromolecular complexes, metabolic pathways and potential clues to drug targets and leads. BIND is populated with interaction data directly deposited by researchers or extracted from peer-reviewed literature and a variety of genomic, proteomic, pathway and disease-specific databases, which Blueprint curates and validates using rigorous bioinformatics standards.
Currently, BIND houses more than 120,000 records of paired interactions and complexes involving biopolymers (e.g., proteins, DNA and RNA) and small molecules (e.g., lipids, nucleotides, sugars and ions). Using any of more than 20 different search functions available through BIND’s Web interface, researchers can identify interacting molecules on the basis of their sequences, gene names, publication record and species origin, to name a few, and examine how these interactions interplay with larger molecular networks using BIND’s Interaction Viewer. Alternatively, new features allow researchers to search relatively broad terms, such as cancer, and pinpoint molecules of particular interest based on characteristics such as subcellular co-localization, biological function and binding partners.
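To make the idea of paired interaction records and larger molecular networks concrete, the following is a minimal sketch in Python. It does not reproduce BIND's schema, Web interface or Interaction Viewer; the `Interaction` fields, the function names and the sample records are all hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """One pairwise interaction record (hypothetical fields, not BIND's schema)."""
    molecule_a: str
    molecule_b: str
    species: str
    source: str  # e.g. a publication identifier


def filter_by_species(records, species):
    """Keep only the records annotated with the given species."""
    return [rec for rec in records if rec.species == species]


def build_network(records):
    """Index records as an undirected graph: molecule -> set of partners."""
    network = defaultdict(set)
    for rec in records:
        network[rec.molecule_a].add(rec.molecule_b)
        network[rec.molecule_b].add(rec.molecule_a)
    return network


def partners_within(network, start, max_steps):
    """Breadth-first search: molecules reachable from start in <= max_steps."""
    seen = {start}
    frontier = {start}
    for _ in range(max_steps):
        # Expand one step outward, skipping molecules already visited.
        frontier = {p for m in frontier for p in network.get(m, ())} - seen
        seen |= frontier
    return seen - {start}


# Made-up placeholder records, for illustration only.
RECORDS = [
    Interaction("protA", "protB", "yeast", "ref1"),
    Interaction("protB", "protC", "yeast", "ref2"),
    Interaction("protC", "ATP", "yeast", "ref3"),
    Interaction("protX", "protY", "human", "ref4"),
]
```

Calling `partners_within(build_network(RECORDS), "protA", 2)` walks two steps out from a molecule of interest, which is the kind of neighbourhood a network viewer would draw. A real repository would add identifiers, evidence types and cross-references to each record.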
Furthermore, because each record has been hand-curated and annotated using a variety of informatics tools, molecule descriptions are heavily cross-referenced to supplemental genetic or structural data that might prove important for further analysis. Related tools developed by Blueprint provide researchers with clues as to a molecule’s potential as a drug target. These include the SeqHound data warehouse, which allows researchers to identify homologous proteins or genes in other organisms, and the Small-Molecule Interaction Database (SMID), which identifies small-molecule binding domains.
Another approach to interaction databases, however, is to focus on specific subsets of biological data, such as those pertaining to disease states or model organisms. Such is the case with another database housed at Toronto’s SLRI: the General Repositories for Interaction Datasets (GRID), which is led by Mike Tyers, PhD. GRID focuses its efforts on three key model organisms: the fruit fly, worm and yeast. Built through work with several genome database organizations and through the group’s own research efforts, the GRID repository consists of almost 60,000 biomolecular interactions.
Several disease-specific databases are also freely available on the Internet. For example, the Division of Acquired Immunodeficiency Syndrome (DAIDS) of the National Institute of Allergy and Infectious Diseases (NIAID), in collaboration with Southern Research Institute (Birmingham, AL) and the National Center for Biotechnology Information (NCBI), developed the HIV-1 Human Protein Interaction Database. This database provides scientists involved in HIV/AIDS research with a summary of known direct and indirect interactions among HIV-1 proteins, other proteins found in HIV-1 and the human host cell.
In many cases, however, the information scientists need to develop new diagnostic tools or new drugs does not yet exist and researchers are left to extrapolate new concepts from existing patterns. Fortunately, evolution is conservative by nature and decades of experimentation have shown that biological interactions that occur in one organism often have parallels in others. With this in mind, many database organizations have started to supplement their repositories of experimentally derived data with information gleaned from interactions predicted by sequence similarity across species.
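The logic of carrying interactions across species can be sketched in a few lines of Python. This is only an illustration of the general idea, not the method used by any database named here. It assumes that a table of cross-species sequence matches has already been computed elsewhere, and the protein names are invented.

```python
def predict_by_similarity(known_pairs, similar_in_target):
    """Transfer interactions from a source species to a target species.

    known_pairs: iterable of (a, b) interactions observed in the source species.
    similar_in_target: dict mapping a source protein to a list of
        target-species proteins judged similar by sequence (assumed to be
        precomputed; how the matches are scored is outside this sketch).

    Returns a set of frozensets, each holding one predicted target pair.
    """
    predicted = set()
    for a, b in known_pairs:
        # A prediction needs a sequence match for BOTH partners.
        for target_a in similar_in_target.get(a, []):
            for target_b in similar_in_target.get(b, []):
                predicted.add(frozenset((target_a, target_b)))
    return predicted


# Invented example: two yeast interactions, partial matches in human.
yeast_pairs = [("y1", "y2"), ("y2", "y3")]
matches = {"y1": ["h1"], "y2": ["h2a", "h2b"]}  # y3 has no human match

predictions = predict_by_similarity(yeast_pairs, matches)
```

Here the y1–y2 interaction yields two candidate human pairs, while y2–y3 yields none because y3 has no match. Predictions of this kind are hypotheses to be tested, which is why they supplement rather than replace experimentally derived records.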
In February, SRI International (Menlo Park, CA) announced that it used its PathoLogic software to analyse the human genome, predicting molecular functions for more than 600 human genes and assigning them roles in 135 predicted metabolic pathways. SRI deposited the results of these efforts in the HumanCyc database, part of the much larger BioCyc project. In part, researchers compared genomic data from humans with data from two well-characterized model organisms, E. coli and Arabidopsis thaliana, to develop a sense of the biomolecular machinery involved in human cells. The findings allowed the SRI scientists to identify 203 “probable” missing enzymes in the human genome and provide researchers with a model to which they can compare their findings.
As an adjunct to its efforts to facilitate access to small-molecule information, Blueprint developed a tool called SMID-BLAST, which compares an input gene or protein sequence to consensus sequences found within the SMID repository to identify potential small-molecule interactions. Scientists want to know whether a given protein is a so-called “druggable target,” that is, one that binds to a small molecule. SMID-BLAST shows researchers which amino acids bind to specific small molecules and how these binding sites are conserved in protein families.
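The idea of matching a query against a consensus sequence and reading off binding residues can be illustrated with a deliberately simplified Python sketch. It is not SMID-BLAST and it is not BLAST: it swaps in a plain ungapped sliding-window comparison, and the motif, the `x` wildcard convention, the binding offsets and the query sequence are all made up for the example.

```python
def scan_for_motif(sequence, consensus, binding_offsets, min_identity=0.8):
    """Slide a consensus along a sequence and report good matches.

    sequence: query protein sequence (one-letter codes).
    consensus: motif string; "x" is treated as a wildcard position.
    binding_offsets: positions within the motif assumed to contact the
        small molecule (hypothetical annotation).
    Returns a list of (start, identity, binding_positions), 0-based.
    """
    hits = []
    width = len(consensus)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Count positions that agree with the motif (wildcards always agree).
        matches = sum(1 for s, c in zip(window, consensus) if c == "x" or s == c)
        identity = matches / width
        if identity >= min_identity:
            positions = [start + offset for offset in binding_offsets]
            hits.append((start, identity, positions))
    return hits


# Invented query and motif, for illustration only.
query = "MKTAGSGKSTLLA"
hits = scan_for_motif(query, "GxGKS", binding_offsets=[3, 4])
```

Each hit maps the annotated motif positions back onto the query, which mirrors the kind of answer described above: where in the input protein a small molecule might bind. A production tool would use gapped, statistically scored alignment rather than exact window matching.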
According to a researcher at a Canadian pharmaceutical company, in the absence of predictive tools like these, researchers are left to mine the scientific literature on the basis of information that they already know, making specific assumptions about potential binding partners. These tools offer researchers the opportunity to identify other potential target families by opening the door to information that is outside the researcher’s knowledge base.
Potential for Profit
Beyond the more esoteric, long-range goal of developing a blueprint for life, large-scale biomolecular interaction data offer companies the potential for immediate economic and social benefits. According to a January 2005 Drugs & Market Development Publications report, the market for methods targeting protein-protein interactions has grown steadily in recent years and is expected to move beyond $50 billion US by 2010. Critical to these efforts will be the bioinformatics tools that highlight and predict biomolecular interactions.
In August 2004, Blueprint’s Singapore node, Blueprint Asia, initiated a collaboration with the Novartis Institute for Tropical Diseases (NITD; Singapore) to assemble and curate known protein interactions relevant to the biology of dengue virus.
“By examining information about dengue virus alongside other data in the BIND repository, NITD scientists will gain a better understanding of the dengue life cycle and of complex interactions with host proteins leading to dengue hemorrhagic fever,” says Brian Yates, managing director of Blueprint Asia. “This information can then be used to develop drugs or vaccines to fight the disease.”
The collaboration is also expected to help NITD researchers identify gaps in their information base, which could lead to the exploration of new research avenues.
Almost regardless of the source, however, these interaction databases and bioinformatics tools offer researchers insights into the function of the cell and thereby offer the hope of turning molecular parts lists provided by genome initiatives into blueprints of life.
Christopher Hogue, PhD, is principal investigator of the Blueprint Initiative, a research program of the Samuel Lunenfeld Research Institute of Mount Sinai Hospital. He is also associate professor in the department of biochemistry at the University of Toronto (Toronto, ON).
Randall C. Willis is communications manager at Blueprint and previously served as an editor of Modern Drug Discovery magazine and the Journal of Proteome Research.