JBiSE Vol.1 No.3, November 2008
Using schema transformation pathways for biological data integration
Abstract: In web environments, proteomics data integration in the life sciences needs to handle the problem of data conflicts arising from the heterogeneity of data resources and from incompatibilities between the inputs and outputs of services used in the analysis of the resources. The integration of complex, fast-changing biological data repositories can potentially be supported by Grid computing to enable distributed data analysis. This paper presents an approach to addressing the data conflict problems of proteomics data integration. We describe a proposed proteomics data integration architecture, in which a heterogeneous data integration system interoperates with Web Services and query processing tools for the virtual and materialised integration of a number of proteomics resources, either locally or remotely. Finally, we discuss how the architecture can be further used for supporting data maintenance and analysis activities.
Cite this paper: Fan, H. and Wang, F. (2008) Using schema transformation pathways for biological data integration. Journal of Biomedical Science and Engineering, 1, 204-209. doi: 10.4236/jbise.2008.13035.
