[1] Robbins, H. (1952) Some Aspects of the Sequential Design of Experiments. Bulletin of the American Mathematical Society, 58, 527-535. http://dx.doi.org/10.1090/S0002-9904-1952-09620-8
[2] Freund, Y. and Schapire, R.E. (1997) A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Journal of Computer and System Sciences, 55, 119-139.
http://dx.doi.org/10.1006/jcss.1997.1504
[3] Auer, P., Cesa-Bianchi, N., Freund, Y. and Schapire, R.E. (2002) The Non-Stochastic Multi-Armed Bandit Problem. SIAM Journal on Computing, 32, 48-77.
http://dx.doi.org/10.1137/S0097539701398375
[4] Allenberg-Neeman, C. and Neeman, B. (2004) Full Information Game with Gains and Losses. Algorithmic Learning Theory: 15th International Conference, 3244, 264-278.
[5] Dani, V., Hayes, T.P. and Kakade, S.M. (2008) The Price of Bandit Information for Online Optimization. In: Platt, J.C., Koller, D., Singer, Y. and Roweis, S., Eds., Advances in Neural Information Processing Systems, MIT Press, Cambridge, 345-352.
[6] Bartlett, P., Dani, V., Hayes, T., Kakade, S., Rakhlin, A. and Tewari, A. (2008) High-Probability Regret Bounds for Bandit Online Linear Optimization. Proceedings of 22nd Annual Conference on Learning Theory (COLT), Helsinki.
[7] Cesa-Bianchi, N. and Lugosi, G. (2012) Combinatorial Bandits. Journal of Computer and System Sciences, 78, 1404-1422. http://dx.doi.org/10.1016/j.jcss.2012.01.001
[8] Uchiya, T., Nakamura, A. and Kudo, M. (2010) Algorithms for Adversarial Bandit Problems with Multiple Plays. In: Hutter, M., Stephan, F., Vovk, V. and Zeugmann, T., Eds., Algorithmic Learning Theory, Lecture Notes in Artificial Intelligence No. 6331, Springer, 375-389.
[9] Auer, P., Cesa-Bianchi, N., Freund, Y. and Schapire, R.E. (1995) Gambling in a Rigged Casino: The Adversarial Multi-Armed Bandit Problem. Proceedings of 36th Annual Symposium on Foundations of Computer Science, Milwaukee, 322-331.
[10] Hochbaum, D.S. (1995) Approximation Algorithms for NP-Hard Problems. PWS Publishing Company, Boston.
[11] He, D., Chen, W., Wang, L. and Liu, T.-Y. (2013) Online Learning for Auction Mechanism in Bandit Setting. Decision Support Systems, 56, 379-386. http://dx.doi.org/10.1016/j.dss.2013.07.004
[12] Park, C. and Lee, J. (2012) Intelligent Traffic Control Based on Multi-Armed Bandit and Wireless Scheduling Techniques. International Conference on Advances in Vehicular System, Technologies and Applications, Venice, 23-27.
[13] Bertsekas, D.P. (1998) Network Optimization. Athena Scientific, Belmont.
[14] Blum, A. and Burch, C. (2000) On-Line Learning and the Metrical Task System Problem. Machine Learning, 39, 35-88. http://dx.doi.org/10.1023/A:1007621832648
[15] Cole, S.J. and Lim, C. (2008) Algorithms for Network Interdiction and Fortification Games. Springer Optimization and Its Applications, 17, 609-644. http://dx.doi.org/10.1007/978-0-387-77247-9_24
[16] Vanëk, O., Jakob, M. and Pëchoucek, M. (2011) Using Agents to Improve International Maritime Transport Security. IEEE Intelligent Systems, 26, 90-95. http://dx.doi.org/10.1109/MIS.2011.23