IJIS Vol. 2 No. 3, July 2012
Combining Generative/Discriminative Learning for Automatic Image Annotation and Retrieval
Abstract: To bridge the semantic gap in image retrieval, this paper proposes an approach that combines generative and discriminative learning for automatic image annotation and retrieval. We first present continuous probabilistic latent semantic analysis (PLSA) to model continuous quantities. We then propose a hybrid framework that employs continuous PLSA to model the visual features of images in the generative learning stage and uses ensembles of classifier chains to classify the multi-label data in the discriminative learning stage. Because the framework combines the advantages of generative and discriminative learning, it can predict semantic annotations accurately for unseen images. Finally, we conduct a series of experiments on a standard Corel dataset. The experimental results show that our approach outperforms many state-of-the-art approaches.
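As a rough illustration of the discriminative stage named in the abstract, the Python sketch below shows how an ensemble of classifier chains can turn a feature vector into a multi-label annotation. It is not the authors' implementation. The nearest-centroid base classifier, the class names, the toy two-dimensional features (standing in for the per-image representation that continuous PLSA would supply), and the three unnamed labels are all assumptions made for illustration. The random training subsets used by the original ensembles-of-classifier-chains method are omitted for brevity.

```python
import random


class CentroidClassifier:
    """Toy binary classifier (assumption): pick the class with the nearer mean."""

    def fit(self, X, y):
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            if rows:  # a class may be absent from the training data
                self.centroids[label] = [sum(c) / len(rows) for c in zip(*rows)]
        return self

    def predict_one(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda k: sq_dist(self.centroids[k]))


class ClassifierChain:
    """One binary model per label; each model also sees earlier labels."""

    def __init__(self, order):
        self.order = order
        self.models = []

    def fit(self, X, Y):
        self.models = []
        for pos, label in enumerate(self.order):
            earlier = self.order[:pos]
            # Augment features with the true values of labels earlier in the chain.
            Xa = [list(x) + [y[l] for l in earlier] for x, y in zip(X, Y)]
            target = [y[label] for y in Y]
            self.models.append(CentroidClassifier().fit(Xa, target))
        return self

    def predict_one(self, x):
        out = [0] * len(self.order)
        xa = list(x)
        for label, model in zip(self.order, self.models):
            out[label] = model.predict_one(xa)
            xa.append(out[label])  # feed the prediction to the next link
        return out


class EnsembleOfChains:
    """Several chains with random label orders, combined by thresholded voting."""

    def __init__(self, n_chains=5, threshold=0.5, seed=0):
        self.n_chains = n_chains
        self.threshold = threshold
        self.seed = seed
        self.chains = []

    def fit(self, X, Y):
        rng = random.Random(self.seed)
        n_labels = len(Y[0])
        self.chains = []
        for _ in range(self.n_chains):
            order = list(range(n_labels))
            rng.shuffle(order)
            self.chains.append(ClassifierChain(order).fit(X, Y))
        return self

    def predict_one(self, x):
        votes = [chain.predict_one(x) for chain in self.chains]
        n_labels = len(votes[0])
        return [
            int(sum(v[j] for v in votes) / len(votes) >= self.threshold)
            for j in range(n_labels)
        ]


# Hypothetical image features and binary label indicators (one column per label).
X = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
Y = [[1, 0, 1], [1, 0, 1], [0, 1, 1], [0, 1, 1]]

model = EnsembleOfChains(n_chains=5, seed=0).fit(X, Y)
prediction = model.predict_one([0.85, 0.15])
print(prediction)
```

Each chain passes its earlier label predictions forward as extra features, which is how label correlations enter the model. Averaging over several random label orders reduces the dependence on any single ordering.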
Cite this paper: Z. Li, Z. Tang, W. Zhao and Z. Li, "Combining Generative/Discriminative Learning for Automatic Image Annotation and Retrieval," International Journal of Intelligence Science, Vol. 2 No. 3, 2012, pp. 55-62. doi: 10.4236/ijis.2012.23008.

 
 