The ability to interpret gene or protein interaction networks is becoming a valuable skill in biomedical research. It is useful not only to those who produce or analyze high-throughput array data, but also to those who produce data from single-readout biochemical assays such as RT-PCR, western blotting, and protein-protein interaction assays. This is because network analysis programs can be used to predict interaction partners based on the results of multiple single-gene assays.
This book is meant to be a resource for those who seek to understand what a network of genes or proteins means in terms of the biological processes at work in their high-throughput data. It will certainly help those who are just starting out, but may also offer insights to veterans of this trade.
This book does not cover the myriad methods and nuances of computing high-throughput gene or protein expression data. It is primarily concerned with interpreting the network of genes or proteins that is generated from the computation. Computing the data and interpreting the data each pose their own challenges. My favorite part is interpreting the networks, which I hope you will learn to like after reading this book. As you will see, successful interpretation of gene and protein networks relies more on your knowledge of biology than on your knowledge of bioinformatics.
This book contains two chapters. Preceding the first chapter is a page that lists the tips for interpreting gene or protein networks. This page is meant to be a cheat sheet that lets you quickly reference the tips as you do your analysis. Chapter 1 provides what I consider to be the minimal background information that someone who is just starting out with network interpretation needs in order to understand the tips. This book was written for someone with a graduate-level knowledge of biology. Chapter 2 contains 18 tips, each with a short explanation that will help you interpret interaction networks. I have divided the tips into categories that will help you understand their general purpose. The appendix contains a non-exhaustive list of free online software programs that I have found useful in my own research.
Tips for Interpreting Gene or Protein Interaction Networks
Table of Contents
The 18 Tips Summary Sheet
Chapter 1 – How to Use This Book
Chapter 2 – The 18 Tips Explained
Appendix – Network Analysis Programs
By David H. Nguyen, Ph.D.
Self-Published/Kindle Direct Publishing (2014)
This Kindle eBook is available on Amazon.com for $0.99.
Email me for a free PDF: david.hh.n at berkeley.edu
The Kindle eBook Reader is free for smartphones, computers, and tablets.
Interpreting gene or protein interaction networks is an invaluable skill in biomedical research. Effective interpretation of networks results in the discovery of biological mechanisms that describe the behavior of a research model (cell line, genetically modified organism, tumor, etc.) and guides future experiments. This book offers 18 tips that will help the novice and the veteran researcher interpret high-throughput omics data.
Tip #1. Things may not be what they seem.
The most important piece of advice that I can give about interpreting gene networks is that things may not be what they seem. The biological processes that are enriched in your gene list of interest may not be what is actually happening in your research model. If you are studying gene expression in a tissue that does not contain skeletal muscle, but you receive “muscle development” as an enriched category, it is not necessarily a mistake. Epithelial cells can undergo epithelial-to-mesenchymal transition (EMT), during which they up-regulate genes such as smooth muscle actin. Thus, “muscle development” may actually be EMT.
Here are other hypothetical examples of hidden meanings.
“EMT” may be a stem cell program. EMT has been linked to a process of dedifferentiation into a more stem-like state. Therefore, an enrichment for EMT may be a sign that dedifferentiation is occurring, or that stem cells or progenitor cells have become abundant.
“Organismal development” may be metastasis. The process of organismal development involves the movement of cells to different compartments and then differentiation into mature cell types. This differentiation process can involve the expression of genes that form cell-cell junctions and cell-matrix connections. These same processes are down-regulated during metastasis.
“Lysosome biogenesis” may be autophagy. Autophagy is the process in which a cell digests its own organelles. Autophagy can be a part of senescence, which is cellular aging and dormancy. Lysosomes are vesicles that digest cellular material, including other organelles. During autophagy, lysosomal activity increases. Thus, enrichment for “lysosome biogenesis” may actually indicate autophagy and/or senescence.
“Metabolism” may be enrichment for auxotrophic cells. Auxotrophs are cells that are dependent on a nutrient that is produced by another cell type. The ducts in a mammary gland consist of two main cell layers. The luminal cells line the inside of the duct, while the myoepithelial cells line the outside of the duct. Myoepithelial cells cannot produce the amino acid glutamine, and rely on luminal cells to secrete glutamine, which the auxotrophic myoepithelial cells take up. Luminal and myoepithelial cells have distinct metabolic profiles, as do the subtypes of breast cancer that reflect a luminal-like state compared to those that exhibit a myoepithelial state.
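For readers who have not seen how a category comes to be reported as "enriched", here is a minimal sketch of one common approach, a one-sided hypergeometric test. It is only an illustration: the function name and all of the numbers are hypothetical, and network analysis programs may use different statistics and usually apply multiple-testing corrections that are not shown here.

```python
from math import comb

def enrichment_p_value(population_size, category_size, list_size, overlap):
    """One-sided hypergeometric p-value for category enrichment.

    population_size: number of genes measured (the background)
    category_size:   number of background genes annotated to the category
    list_size:       number of genes in your gene list of interest
    overlap:         number of genes in your list that belong to the category

    Returns the probability of seeing `overlap` or more category genes
    in a list of this size drawn at random from the background.
    """
    max_overlap = min(category_size, list_size)
    total = comb(population_size, list_size)
    tail = sum(
        comb(category_size, k)
        * comb(population_size - category_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total

# Hypothetical numbers: 20,000 genes measured, 200 annotated to
# "muscle development", 100 genes in the list, 8 of them in the category.
p = enrichment_p_value(20000, 200, 100, 8)
print(f"p-value: {p:.2e}")
```

A small p-value only says that the overlap is larger than expected by chance. It says nothing about whether the category label describes what is happening in your research model, which is the point of this tip.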
Tip #2. Interpret the enriched biological functions in light of other experimental data.
In light of Tip #1, enrichment for a biological category is no guarantee that the category really describes what’s happening in your research model. So how might you be more certain that “organismal development” (described in Tip #1) is actually something like metastasis? Well, if the tumors from which you extracted the RNA for microarray analysis also have a high rate of metastasis, then a metastasis program is more likely to be what’s really going on. Conversely, if “organismal development” appears in your results, and the genes in this category are cell-cell junction genes, you might want to investigate your tumor model to determine whether metastasis has happened, or whether there has been an increase in circulating tumor cells that may not yet have metastasized to a distant organ. This tip is the second most important piece of advice.
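One practical way to act on this tip is to look at which genes from your list actually fall inside the enriched category, and then ask what those genes have in common. The sketch below assumes you can export your gene list and the category's gene set as plain lists of gene symbols. The function name, the category membership, and the marker groupings are all hypothetical placeholders for illustration, not curated annotations.

```python
def driver_genes(gene_list, category_genes, marker_sets):
    """Find the genes driving an enriched category and group them.

    gene_list:      genes in your list of interest
    category_genes: genes annotated to the enriched category
    marker_sets:    dict mapping a label to genes you associate with it

    Returns (hits, summary): the sorted overlap between your list and the
    category, and for each label the sorted hits that carry that label.
    """
    hits = set(gene_list) & set(category_genes)
    summary = {
        label: sorted(hits & set(markers))
        for label, markers in marker_sets.items()
    }
    return sorted(hits), summary

# Hypothetical example data (illustrative only).
my_genes = ["CDH1", "OCLN", "TJP1", "ITGB1", "MYC", "GAPDH"]
organismal_development = ["CDH1", "OCLN", "TJP1", "ITGB1", "SOX2", "PAX6"]
markers = {
    "cell-cell junction": ["CDH1", "OCLN", "TJP1", "CLDN7"],
    "cell-matrix adhesion": ["ITGB1", "ITGA6"],
}

hits, summary = driver_genes(my_genes, organismal_development, markers)
print(hits)
for label, genes in summary.items():
    print(label, genes)
```

If most of the hits turn out to be junction or adhesion genes, that is a prompt to check your other experimental data for metastasis, rather than evidence of it.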
Tip #3. Be familiar with fundamental cellular functions.
It is helpful to be familiar with fundamental biological processes. Changes in cell morphology are accompanied by changes in actin polymerization, cell-cell junctions, and cell-matrix interactions. Changes in metabolism are accompanied by changes in glycolysis and the citric acid cycle. Changes in growth rates are accompanied by activated growth factor signaling pathways. It is helpful to know both the canonical and non-canonical players in your favorite intracellular signaling pathway. Review articles are useful for gaining a general knowledge of the key components of these biological processes.
Tip #4. Know the master regulator genes of different tissue types.
Some would say that transcriptional complexity, defined by the number of transcription factors, transcriptional co-regulators, and transcription factor binding elements, makes complex organ systems possible. If you are studying a tissue type or organ, it is helpful to understand the master regulator transcription factors that govern normal organ development. This is especially the case if you are studying a tumor or a mutated organism. Germline tumors, such as teratocarcinomas, can differentiate into hair and teeth. Tumors originating from somatic cells do not have as much differentiation potential, but can exhibit diverse biological activities. Thus, being familiar with the master regulators of organ development will help you understand perturbations in a complex research model.