ABSTRACT In web environments, proteomics data integration in the life sciences must handle data conflicts arising from the heterogeneity of data resources and from incompatibilities between the inputs and outputs of the services used to analyse those resources. The integration of complex, fast-changing biological data repositories can potentially be supported by Grid computing to enable distributed data analysis. This paper presents an approach addressing the data conflict problems of proteomics data integration. We describe a proposed proteomics data integration architecture, in which a heterogeneous data integration system interoperates with Web Services and query processing tools for the virtual and materialised integration of a number of proteomics resources, either locally or remotely. Finally, we discuss how the architecture can be further used to support data maintenance and analysis activities.
Cite this paper
Fan, H. and Wang, F. (2008) Using schema transformation pathways for biological data integration. Journal of Biomedical Science and Engineering, 1, 204-209. doi: 10.4236/jbise.2008.13035.