Received 5 November 2015; accepted 12 January 2016; published 15 January 2016
Both humans and animals live in a rich world of constantly changing external signals; thus the detection of changes and an adequate reaction to environmental variation are crucial for survival and successful adaptation. Learning and decision-making abilities are essential for an adaptive behavioral strategy.
Behavioral studies undoubtedly remain the most prominent tool for exploring the neural mechanisms of learning, memory and decision-making, although experiments of recent years in behavioral and cognitive neuroscience have shaped innovative interdisciplinary approaches that integrate findings from different fields of science. Among them are widely used mathematical-statistical methods. At the same time, as a result of cross-disciplinary studies, various equations expressing biological relations have gained a prominent position within mathematics. A mathematical approach is believed to be able to explain any phenomenon in which at least one variable varies with, or affects, other variables.
The probabilistic and stochastic processes in cognitive neuroscience are considered to constitute the foundations of learning and decision-making. The application of algebraic expressions has been proposed for the description of learning processes. More explicit mathematical methods have been suggested for the assessment of cognitive mechanisms involved in adaptive learning, repeated decision tasks, reinforcement, and strategic changes. Learning ability has been studied in a binary choice task, in which subjects have to choose between two possibilities (correct vs. incorrect); the authors assume that the answers follow a Bernoulli distribution that depends on a hidden state reflecting the subject’s performance. The probability that animals perform better than chance is quantified, and trial-by-trial performance is estimated. To develop a dynamic approach for analyzing learning experiments with binary responses (correct, incorrect), a state-space model of learning was introduced.
The question arises of how to go about understanding the probabilistic and stochastic processes that underlie learning and memory. Because a large number of parameters may be involved in learning, memory, and the formation of an adequate behavioral strategy, it sometimes seems difficult to generalize the results of behavioral studies. From this point of view, a mathematical approach to the problem in general, and quantification of the measured parameters in particular, should be considered the most reasonable means of identifying behavioral features and interpreting numeric data. Nowadays, such an attitude is rather common in behavioral studies. Besides the wide range of traditional statistical methods used for the analysis of behavioral parameters, different mathematical approaches and models have been proposed for data analysis in ethology, psychology, neuroscience, etc.
Markov chains are one of the basic tools of modern statistical computing, providing the basis for numerical simulations conducted in a wide range of disciplines. A Markov chain undergoes transitions from one state to another within the set of values that the process can take. The next state depends only on the current state and not on the sequence of events that preceded it. Markov chains have many applications in biological modeling. They are widely applied in neuroscience research, including studies of behavior, neuron assemblies, ion channels and others. The output of a Markov chain can be different variables, for instance a decision, the animal’s motor output, or neural activity. Markov chains are also used in simulations of brain function, such as the activity of the mammalian neocortex.
There are several ways to study cognitive abilities. We can simply observe, record and analyze neurobiological and behavioral regularities, and/or attempt to construct quantitative models in order to understand the computations that underpin cognitive abilities. What different cognitive studies share is the attempt to identify the involvement of various brain states in behavioral processes. Sometimes the observed and measured behavioral processes need to be segmented into sequential elements in order to explore and quantify the transitions between them. Thorough study of the regularities of animal behavior across the learning process has yielded a large body of information on behavioral sequences, which can be tested with Markov chain analysis. Markov chains have been shown to provide better estimates of learning regularities than other methods used to draw inferences from behavioral data. Such modern approaches contribute to the study of cognitive abilities and their behavioral correlates.
From a neurobiological point of view, it is of interest to extrapolate from experimental data in order to establish quantitatively the degree of learning and the dynamics of memory. The results of behavioral experiments can be predicted by means of a mathematical model that uses one of the main objects of probabilistic-statistical investigation, Markov chains. In this paper we consider a mathematical apparatus that describes direct delayed reactions using discrete-time Markov chains.
Experiments using a modified method of direct delayed reactions made it possible to observe the learning process of the animals, to establish the maximal delay, and to identify an optimal algorithm with minimum errors and maximal reward. Here we discuss the development of optimal algorithms and the dynamics of variability at delays of different duration.
Thirty-one albino rats of both sexes (average weight 150 g) were examined. The animals were housed individually in stainless steel cages under a natural light-dark cycle at a temperature of 20˚C ± 1˚C. The rats had free access to food and water throughout the experiment.
The rats were tested in a wooden T-maze (64 cm in length) with 25 cm walls. The start compartment arm (47 cm in length) joined the two goal arms, each of which was 17 cm in width. Wooden feeders were situated at the far end of each goal arm. The floor under the feeders was electrified. Light and audio signal sources were attached to the top of the starting compartment so that rat spatial learning could be studied in different behavioral tasks using different experimental schemes (Figure 1).
The experiments were conducted on white rats using a modified method of direct delayed reactions. Rats were trained in a T-maze-based spatial learning task that required the animals to make trial-by-trial choices contingent upon their previous experience. The aim of the experiment was to fix the complex perception of food under conditions with two feeders. Ten trials were conducted daily according to a strictly defined spatio-temporal program that specified the sequence of signaled feeders, the time interval between trials and the duration of the delay. Before the delay, the animal was allowed to move in the experimental cabin without the intervention of the experimenter and to obtain food in whichever feeder was defined by the program. The rat could first run to the feeder with food and then to the empty one, or vice versa; if it did not return to the starting compartment, the experimenter returned it by force. The animal was allowed to return from the feeder with food to the start compartment without correction, so that it did not run to the empty feeder. After the delay, the door of the start compartment was opened and the animal was free to behave as it chose. Both direct and indirect reactions were registered. In a direct reaction the animal obtained food, whereas in an indirect reaction it did not. In the protocol, a correct move performed by the rat was marked by “1” and an incorrect one by “0”. The distribution of zeroes and ones was recorded in the protocol (Table 1).
The proposed approach made it possible to characterize animal behavior and describe a learning algorithm. For example, the sequence “11001110” means that in pre-delay behavior (the first 5 digits), the animal leaves the starting compartment without the interference of the experimenter (1), makes a correct move and runs to the feeder where it received food in the previous trial (1). This time it does not receive food, since none was provided by the program (0). It was therefore necessary to correct the movement toward the opposite feeder, where the food was placed. The animal corrects its movement after the interference of the experimenter (0) and returns to the starting compartment (1). The last three digits of the given algorithm describe the delayed reactions. The animal leaves the starting compartment by itself (1), makes a correct move and runs towards the feeder where it received reinforcement before the delay (1). After this, the experimenter returns the animal to the starting compartment (0). Table 2 demonstrates the dynamics of the delayed-reaction algorithms for pre-delay behavior and describes the process of formation of the optimal behavioral strategy. The obtained algorithms are used for further mathematical analysis.
Figure 1. T-maze experimental cabin.
Table 1. The experimental protocol.
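The coding of a trial as an eight-digit string can be illustrated with a minimal Python sketch. It assumes only what the example states: five digits for pre-delay behavior followed by three digits for the delayed reaction. The names `TrialCode` and `split_trial_code` are hypothetical and are not part of the experimental protocol.

```python
from typing import NamedTuple


class TrialCode(NamedTuple):
    pre_delay: tuple  # first five digits: behavior before the delay
    delayed: tuple    # last three digits: behavior after the delay


def split_trial_code(code: str) -> TrialCode:
    """Split an 8-digit 0/1 protocol string into pre-delay and delayed parts."""
    if len(code) != 8 or set(code) - {"0", "1"}:
        raise ValueError("expected a string of eight 0/1 digits")
    digits = tuple(int(ch) for ch in code)
    return TrialCode(pre_delay=digits[:5], delayed=digits[5:])


example = split_trial_code("11001110")
print(example.pre_delay)  # (1, 1, 0, 0, 1)
print(example.delayed)    # (1, 1, 0)
```

Keeping the two parts separate makes it straightforward to analyze pre-delay and delayed behavior as distinct sequences.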
The schematic diagram of behavioral strategies based on the obtained algorithms for one of the experimental days is presented in Figure 2. The diagram shows different behavioral strategies identified as bad (1, 2, 3), medium (4, 5, 6), good (7, 8, 9) and the best (10, 11, 12) learning algorithms.
2.4. Mathematical Description of Behavioral Algorithm
A Markov chain is described as follows: we have a set of experimental trials (states). The process starts in one of these states and moves successively from one state to another. Each move is called a step. If the chain is currently in state ni, then it moves to state nj at the next step with a probability denoted by Pij, which does not depend on which states the chain was in before the current state. The probabilities Pij are called transition probabilities. The process can also remain in the same state, which occurs with probability Pii. An initial probability distribution, defined on n, specifies the starting state. Usually this is done by specifying a particular state as the starting state of the behavioral experiment. We present a behavioral method for estimating these functions.
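The step-by-step description above can be sketched as a short simulation. This is only an illustration: the function name `simulate_chain` and the numerical transition probabilities are assumptions chosen for the example, not values from the experiments.

```python
import random


def simulate_chain(transition, start, steps, seed=None):
    """Simulate a discrete-time Markov chain.

    transition[i][j] is the probability of moving from state i to state j;
    each row must sum to 1. Returns the visited states, including the start.
    """
    rng = random.Random(seed)
    states = list(range(len(transition)))
    path = [start]
    for _ in range(steps):
        current = path[-1]
        # The next state is drawn using only the row of the current state.
        nxt = rng.choices(states, weights=transition[current], k=1)[0]
        path.append(nxt)
    return path


# Hypothetical transition probabilities for the two outcomes 0 and 1.
P = [[0.4, 0.6],
     [0.2, 0.8]]
path = simulate_chain(P, start=0, steps=10, seed=1)
print(path)
```

Because each draw looks only at `path[-1]`, the earlier history has no influence on the next state, which is exactly the property that defines the chain.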
3. Results and Discussion
The initial state of a system or phenomenon and the transition from one state to another are the principal notions in the explanation of a Markov chain. In our experiments the initial state is 0 or 1, and a transition may occur from 0 to 1 or vice versa, from 0 to 0, or from 1 to 1.
For studying Markov chains it is necessary to describe the probabilistic character of the transitions. It is possible for the time intervals between transitions to be constant.
Figure 2. The schematic description of behavioral algorithms. FD1-Feeder 1; FD2-Feeder 2; SC-Start Chamber.
Table 2. The dynamics of delayed reactions’ behavioral algorithms.
All states (n in our case) are numbered. If the system is described by a Markov chain, then the probability that the system will move from state i to state j during the next time interval depends only on i and j and not on the behavior of the system before the transition to state i. In other words, the probability Pij that the system will pass from state i to state j does not depend on the type of behavior before state i. From the definition and the properties of probability it follows that each Pij lies between 0 and 1 and that, for every state i, the probabilities Pij summed over all states j equal 1.
To model the experiment described above, it is convenient to define a Markov chain as follows. Suppose a number of trials is conducted. If, across consecutive trials, the conditional probability of any event in trial n + 1 (with the results of the previous trials considered known) depends only on the result of the last, nth, trial and not on the earlier trials, the Markov condition is said to hold. The observed process is then called a “Markov chain”: a random process with a discrete (finite) state space.
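In symbols, the Markov condition stated above can be written as follows. The random variable $X_n$ (the result of trial $n$) is notation introduced here for illustration only, and the second equality additionally assumes that the transition probabilities do not change from trial to trial.

```latex
\[
\Pr\bigl(X_{n+1} = j \mid X_n = i,\; X_{n-1} = i_{n-1},\; \dots,\; X_1 = i_1\bigr)
  = \Pr\bigl(X_{n+1} = j \mid X_n = i\bigr) = P_{ij}.
\]
```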
Let us assume that the observed behavioral process is a random chain with the Markov property, whose possible values are 0 and 1, and that the transition probabilities are determined (estimated) from the obtained empirical frequencies. As the initial state is 0 or 1, a transition may occur from 0 to 1 or vice versa, from 0 to 0, or from 1 to 1. We consider the question of determining the probability that, given that the chain is in state i, it will be in state j in the course of the behavioral treatment. Put simply, if we know the probability that the result of the first trial is 0, then we can determine the probability that the result of the nth trial will also be 0.
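Estimating transition probabilities from empirical frequencies can be sketched in a few lines. The function name `estimate_transitions` and the outcome sequence below are made up for illustration; the sequence is not data from Table 1.

```python
from collections import Counter


def estimate_transitions(outcomes):
    """Estimate P(next = j | current = i) from a 0/1 sequence by relative frequencies."""
    pair_counts = Counter(zip(outcomes, outcomes[1:]))
    estimates = {}
    for i in (0, 1):
        row_total = pair_counts[(i, 0)] + pair_counts[(i, 1)]
        for j in (0, 1):
            # The estimate is left undefined (None) if no transition out of i was observed.
            estimates[(i, j)] = pair_counts[(i, j)] / row_total if row_total else None
    return estimates


# Made-up outcome sequence for ten trials (assumption, not experimental data).
outcomes = [0, 0, 1, 0, 1, 1, 1, 0, 1, 1]
est = estimate_transitions(outcomes)
print(est[(0, 0)], est[(0, 1)])  # 0.25 0.75
```

Each estimate is the number of observed i-to-j transitions divided by the number of transitions out of state i, so the two estimates for a given i sum to one.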
Let the conditional probability of a transition from state 0 in trial n to state 1 in trial n + 1 be written as:
It should be noted that the events and create a complete system of events (i.e. a system of such events, one of which necessarily occurs in any trial and any two different events cannot occur simultaneously). Therefore, the event can be denoted as follows:
It is easy to determine the probability of the event using total probability formula:
where and are conditional probabilities, i.e. the transition probabilities of Markov chain.
Now we need to recall formulas:
which give recurrence equation for the reliable probability:
In the general case. Therefore when. As when, from the equation (1) we gain formula:
In the case, we obtain:
Analogously, when . The probabilities p and q are called “final” probabilities. They depend only on the transition probabilities and do not depend on the initial state. For large n we can assume that the process is “balanced”: its probabilities are (independently of n) approximately equal to p or q. This feature of a Markov process (chain) is very important in applied science and is called ergodicity.
In the general case, finding a solution to Equation (1) is very difficult, but when Pn does not depend on n, the stationary equation can be written as:
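The independence of the final probabilities from the initial state can be checked numerically. The sketch below repeatedly applies a transition matrix to two different starting distributions; the matrix entries and the helper names `step` and `distribution_after` are assumptions made for the example.

```python
def step(dist, transition):
    """Advance a probability distribution over states by one step of the chain."""
    n = len(transition)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]


def distribution_after(dist, transition, n_steps):
    """Apply `step` n_steps times and return the resulting distribution."""
    for _ in range(n_steps):
        dist = step(dist, transition)
    return dist


# Hypothetical transition probabilities between the outcomes 0 and 1.
P = [[0.4, 0.6],
     [0.2, 0.8]]

from_zero = distribution_after([1.0, 0.0], P, 50)  # chain started in state 0
from_one = distribution_after([0.0, 1.0], P, 50)   # chain started in state 1
print(from_zero)
print(from_one)
```

For this matrix both starting points approach the same distribution, close to 0.25 for state 0 and 0.75 for state 1, which illustrates the "balanced" behavior for large n.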
The solution of stationary Equation (2) is simple:
As the solution of the stationary Equation (2) is known, we can obtain the general solution. We only have to suppose that
If we substitute (4) into the Equation (1) for Pn, we get:
From the stationary Equation (2) the following recurrence equation is correct
, from which
According to assumption (4) we get which gives possibility to move from to Pn.
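As an illustrative sketch of this kind of derivation, suppose that $P_n$ is the probability that the result of trial $n$ is 0, and let $p_{00}$ and $p_{10}$ be the probabilities of obtaining 0 in the next trial after a 0 and after a 1, respectively. The symbols $p_{00}$, $p_{10}$ and $\delta_n$ are introduced only for this sketch and may differ from the notation of Equations (1)-(5). The sketch also assumes $1 - p_{00} + p_{10} \neq 0$.

```latex
\begin{align*}
P_{n+1} &= p_{00}\,P_n + p_{10}\,(1 - P_n)
  && \text{(total probability)}\\
P &= p_{00}\,P + p_{10}\,(1 - P)
  \;\Longrightarrow\; P = \frac{p_{10}}{1 - p_{00} + p_{10}}
  && \text{(stationary case)}\\
P_n &= P + \delta_n
  \;\Longrightarrow\; \delta_{n+1} = (p_{00} - p_{10})\,\delta_n
  && \text{(deviation from the stationary value)}\\
P_n &= P + (p_{00} - p_{10})^{\,n-1}\,(P_1 - P)
  && \text{(closed form)}
\end{align*}
```

Since $|p_{00} - p_{10}| < 1$ except in degenerate cases, the deviation term shrinks geometrically and $P_n$ approaches the stationary value $P$.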
Equation (5) can be used to predict the experimental results: if we know the values of the probabilities a, b and P1 for the first n trials, we can calculate the probability that the result of the next trial will be 0.
The probabilities a, b and P1 can be estimated by corresponding empirical frequencies.
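A prediction of this kind can be sketched in Python under an explicit assumption about the parametrization: `p00` and `p10` denote the probabilities of obtaining 0 after a 0 and after a 1, and `p1` is the probability that the first result is 0. These names and the numerical values are hypothetical and are not claimed to match the definitions of a, b and P1 used in Equation (5).

```python
def predict_prob_zero(p00, p10, p1, n):
    """Probability that the result of trial n is 0 for a two-state Markov chain.

    Uses the closed form of the recurrence
    P_{k+1} = p00 * P_k + p10 * (1 - P_k), starting from P_1 = p1.
    """
    denom = 1.0 - p00 + p10
    if denom == 0.0:
        # Degenerate chain (p00 = 1, p10 = 0): the state never changes.
        return p1
    stationary = p10 / denom
    return stationary + (p00 - p10) ** (n - 1) * (p1 - stationary)


# Hypothetical estimates, e.g. obtained from empirical frequencies.
p00, p10, p1 = 0.4, 0.2, 1.0
for n in (1, 2, 3, 10):
    print(n, predict_prob_zero(p00, p10, p1, n))
```

In this sketch the prediction for trial 1 is `p1` itself, and later predictions move toward the stationary value `p10 / (1 - p00 + p10)`.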
To illustrate this, let’s discuss in detail an algorithm of calculation of the probability (prognosis) using the Equation (5) on the basis of the data of 7th column () of Table 1.
Analogously, the following results are obtained for the 8th - 14th columns of Table 1:
In this paper, a modified method of direct delayed reactions is developed on the basis of results from the theory of Markov chains. The proposed mathematical apparatus allows one to calculate the probability (prognosis) that the result of the following trial will be 0 if estimates of the probabilities a, b and P1 for the first n trials are known. It should be noted that the theoretically calculated probabilities of the possible reactions coincide with the experimental data. This gives us an opportunity to use the above-described method widely in neurophysiological investigations. In addition, it can also be used for delayed reactions carried out by indirect and alternate methods. If we treat each day’s experimental results as one trial, it becomes possible to make a long-term prognosis of the animals’ behavior.
 Houillon, A., Lorenz, R.C., Boehmer, W., Rapp, M.A., Heinz, A., Gallinat, J. and Obermayer, K. (2013) The Effect of Novelty on Reinforcement Learning. Progress in Brain Research, 202, 415-439.
 Mathew, B., Bauer, A.M, Koistinen, P., Reetz, T.C., Léon, J. and Sillanpaa, M.J. (2012) Bayesian Adaptive Markov Chain Monte Carlo Estimation of Genetic Parameters. Heredity, 109, 235-245.
 Smith, A.C., Wirth, S., Suzuki, W.A. and Brown, E.N. (2007) Bayesian Analysis of Interleaved Learning and Response Bias in Behavioral Experiments. Journal of Neurophysiology, 97, 2516-2524.
 Tejada, J., Bosco, G.G., Morato, S. and Roque, A.C. (2010) Characterization of the Rat Exploratory Behavior in the Elevated Plus-Maze with Markov Chains. Journal of Neuroscience Methods, 193, 288-295.
 Zin, T.T., Tin, P., Toriu, T. and Hama, H. (2012) A Series of Stochastic Models for Human Behavior Analysis. IEEE International Conference on Systems, Man, and Cybernetics (SMC), Seoul, 14-17 October 2012, 3251-3256.
 Solway, A. and Botvinick, M.M. (2012) Goal-Directed Decision Making as Probabilistic Inference: A Computational Framework and Potential Neural Correlates. Psychological Review, 119, 120-154.
 Blaettler, F., Kollmorgen, S., Herbst, J. and Hahnloser, R. (2011) Hidden Markov Models in the Neurosciences. In: Dymarski, P., Ed., Hidden Markov Models, Theory and Applications, InTech Publisher, Rijeka, 69-186.
 Arantes, R., Tejada, J., Bosco, G.G., Morato, S. and Roque, A.C. (2013) Mathematical Methods to Model Rodent Behavior in the Elevated Plus-Maze. Journal of Neuroscience Methods, 220, 141-148.
 Khodadadi, A., Fakhari, P. and Busemeyer, J.R. (2014) Learning to Maximize Reward Rate: A Model Based on Semi-Markov Decision Processes. Frontiers in Neuroscience, 8, 101.
 Linderman, S.W., Johnson, M.J., Wilson, M.A. and Chen, Z. (2014) A Nonparametric Bayesian Approach to Uncovering Rat Hippocampal Population Codes during Spatial Navigation. CBMM Memo, 027.
 Ito, M. and Doya, K. (2011) Multiple Representations and Algorithms for Reinforcement Learning in the Cortico-Basal Ganglia Circuit. Current Opinion in Neurobiology, 3, 368-373.
 Schliehe-Diecks, S., Kappeler, P.M. and Langrock, R. (2012) On the Application of Mixed Hidden Markov Models to Multiple Behavioural Time Series. Interface Focus, 2, 180-189.
 Zilli, E.A. and Hasselmo, M.E. (2008) The Influence of Markov Decision Process Structure on the Possible Strategic Use of Working Memory and Episodic Memory. PLoS ONE, 3, e2756.
 Lloyd, K., Becker, N., Jones, M.W. and Bogacz, R. (2012) Learning to Use Working Memory: A Reinforcement Learning Gating Model of Rule Acquisition in Rats. Frontiers in Computational Neuroscience, 6, 87.
 Huh, N., Jo, S., Kim, H., Jung, H.S. and Min, W.J. (2009) Model-Based Reinforcement Learning under Concurrent Schedules of Reinforcement in Rodents. Learning and Memory, 16, 315-323.
 Chen, Z., Kloosterman, F., Brown, E.N. and Wilson, M.A. (2012) Uncovering Spatial Topology Represented by Rat Hippocampal Population Neuronal Codes. Journal of Computational Neuroscience, 33, 227-255.
 Gomes, C.F., Brainerd, C.J., Nakamura, K. and Reyna, V.F. (2014) Markovian Interpretations of Dual Retrieval Processes. Journal of Mathematical Psychology, 59, 50-64.
 Huang, Y. and Rao, R.P.N. (2013) Reward Optimization in the Primate Brain: A Probabilistic Model of Decision Making under Uncertainty. PLoS ONE, 8, e53344.
 Vevea, J.L. (2006) Recovering Stimuli from Memory: A Statistical Method for Linking Discrimination and Reproduction Responses. British Journal of Mathematical and Statistical Psychology, 59, 321-346.
 Costa, A.A., Roque, A.C., Morato, S. and Tinós, R. (2012) A Model Based on Genetic Algorithm for Investigation of the Behavior of Rats in the Elevated Plus-Maze. Intelligent Data Engineering and Automated Learning—IDEAL, the Series Lecture Notes in Computer Science, 19, 151-158.
 McKellar, A.E., Roland, L.R., Walters, J.R. and Kesler, D.C. (2014) Using Mixed Hidden Markov Models to Examine Behavioral States in a Cooperatively Breeding Bird. Behavioral Ecology, 26, 148-157.
 Rao, R.P.N. (2010) Decision Making under Uncertainty: A Neural Model Based on Partially Observable Markov Decision Processes. Frontiers in Computational Neuroscience, 4, 146.