Role of Bioinformatics in Discovery of Protein Biomarkers
Developments in proteomic technology offer tremendous potential to yield novel biomarkers for routine clinical use, but major hurdles remain in translating them into clinical application. Rigorous experimental design is needed, as are methods to validate some of the unproven techniques currently in use. There is an ongoing debate about where the burden of proof lies: statistically, biologically or clinically, and there is no consensus about what constitutes a meaningful benchmark. It has been pointed out that statistical and machine learning methods are not a crutch for poor experimental design, nor can they extract fundamental insight from poorly designed experiments. It is now clear that the SELDI-TOF MS instrumentation used in the earlier proteomic pattern studies had insufficient resolution to enable unambiguous identification of the putative biomarker molecules, which is needed if they are to be validated as the basis of a simplified, more widely adopted diagnostic. Benchmarks are needed across the whole range, from calibration-style benchmarking, where the linearity of instrument responsiveness is established, to the ultimate benchmark of real clinical usage, as well as for the many challenges in between, such as data normalization, peak detection, identification, quantification and, at some point, classification.
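As an illustration of calibration-style benchmarking, the linearity of instrument response can be checked by fitting a straight line to measured intensities across a dilution series and inspecting the coefficient of determination. The following is a minimal sketch in Python; the function name, the concentrations and the intensities are hypothetical values chosen for illustration and are not taken from any study discussed here.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical dilution series: spiked concentration vs. measured peak intensity
concentrations = [1.0, 2.0, 4.0, 8.0, 16.0]
intensities = [2.1, 4.0, 8.2, 15.9, 32.1]

slope, intercept, r_squared = linear_fit(concentrations, intensities)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r_squared:.4f}")
```

An R^2 close to 1 over the tested range suggests a linear response; what threshold counts as acceptable is a design decision that would have to be fixed for the assay in question.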
For non-hierarchically organized data in proteome databases, it is difficult to view relationships among biological facts. Scientists at Eli Lilly & Co. have demonstrated a platform in which such data can be visualized through a customized hierarchy incorporating Medical Subject Headings (MeSH) classifications. This platform gives users flexibility in updating and manipulating the data. It can also facilitate fresh scientific insight by highlighting biological impacts across different hierarchical branches. They integrated biomarker information from the curated Proteome database using MeSH and the StarTree visualization tool.
A novel framework has been presented for the identification of disease-specific protein biomarkers through the integration of biofluid proteomes and inter-disease genomic relationships using a network paradigm (Dudley and Butte 2009). This led to the creation of a blood plasma biomarker network by linking expression-based genomic profiles from 136 diseases to 1028 detectable blood plasma proteins. The authors also created a urine biomarker network by linking genomic profiles from 127 diseases to 577 proteins detectable in urine. Through analysis of these molecular biomarker networks, they found that the majority (>80%) of putative protein biomarkers are linked to multiple disease conditions and that prospective disease-specific protein biomarkers are found in only a small subset of the biofluid proteomes. These findings illustrate the importance of considering shared molecular pathology across diseases when evaluating biomarker specificity. The proposed framework is amenable to integration with complementary network models of biology, which could further constrain the biomarker candidate space, and establishes a role for the understanding of multiscale, inter-disease genomic relationships in biomarker discovery.
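The idea of separating shared from disease-specific candidates in such a network can be sketched with a small bipartite disease-protein mapping. This is only an illustrative sketch in Python and not the method of Dudley and Butte; the function names, disease labels and protein labels are hypothetical placeholders.

```python
from collections import defaultdict


def build_network(links):
    """Map each protein to the set of diseases it is linked to."""
    protein_to_diseases = defaultdict(set)
    for disease, protein in links:
        protein_to_diseases[protein].add(disease)
    return dict(protein_to_diseases)


def split_by_specificity(protein_to_diseases):
    """Separate proteins linked to exactly one disease from shared ones."""
    specific = {}
    shared = {}
    for protein, diseases in protein_to_diseases.items():
        if len(diseases) == 1:
            specific[protein] = next(iter(diseases))
        else:
            shared[protein] = sorted(diseases)
    return specific, shared


# Hypothetical toy links (disease, protein); not real data
links = [
    ("disease_A", "protein_1"),
    ("disease_B", "protein_1"),
    ("disease_C", "protein_1"),
    ("disease_A", "protein_2"),
    ("disease_C", "protein_2"),
    ("disease_B", "protein_3"),
]

network = build_network(links)
specific, shared = split_by_specificity(network)
shared_fraction = len(shared) / len(network)

print("disease-specific candidates:", specific)
print("fraction linked to multiple diseases:", round(shared_fraction, 2))
```

In this toy example only one of the three proteins is linked to a single disease, mirroring in miniature the observation that most candidates are shared across conditions.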