Modeling Abstraction Hierarchy Levels of the Cyber Attacks Using Random Process

1. Introduction

Cyber security provides protection and prevention for a network system. However, security technology is sometimes perceived as an obstacle [1]. For some users, the difficulties of implementing security measures may be overwhelming. The relation between cyber defense and cyber attack is fundamentally a cognitive issue: the cyber attacker wants to manipulate the defender’s reasoning. Our purpose is to establish a cognitive support system for agents, that is, the persons directly involved in the cyber security processes, who are expected to be aware of cyber threats at all times. Based on the human factors/ergonomics concept of abstraction hierarchy, agents whose mental picture is at a high abstraction hierarchy level are better able to defend themselves against cyber threats. Hierarchical knowledge plays an important role in the decision-making process, since decision-makers have to adapt to the requirements of the situation under specific conditions in order to develop the proper actions [2] [3].

In a degraded work situation, the agents ultimately have to implement a concrete solution after analyzing the problem. In cognitive terms, they go down in the abstraction hierarchy of the environment [3] [4] [5]. The decision support system must make it possible to navigate through the different abstraction hierarchy levels, and must intervene in the problem-solving process so that the agents visit the abstraction level best suited to controlling the situation. At a high level of the abstraction hierarchy, the agents can manage the defense against a cyber attack on the system more efficiently [4], because they have a more global and abstract mental representation of the cyber attack and its consequences.

The remainder of this paper is organized as follows. Section 2.1 describes the attack simulation system, in which the cyber security center of the University of Southern Brittany simulates cyber attacks and practices the defense procedure. Section 2.2 explains the relationship between the psychological aspects of the agents and the security levels, and discusses the ergonomic reactions to cyber threats. In Section 3, we develop a statistical model using a hidden Markov chain, with the requisite properties derived from the psychological aspects, to infer the mental picture of an agent from a set of observations. In Section 4, we propose a parametric model based on the hidden Markov chain and validate the behavior of the simulated data from the psychological viewpoint. Section 5 is devoted to the procedure for learning the model from the data; it details the estimation method for the parameters as well as for the abstraction hierarchy level of the mental picture. The survival functions given the state are investigated in Section 6. The nonparametric estimation of the survival functions is described in Section 7. Concluding remarks are given in Section 8.

2. Problem Description

We describe the cyber attack simulation and the psychological aspects associated with the abstraction hierarchy of the cyber threats.

2.1. Attacks Simulation System

A cyber security center at the University of Southern Brittany, France, has received investment to conduct research on cyber attack and cyber defense (http://www.cyber-security-center.com). The simulation system involves two main teams:

1) The attack team (aka red team) plays the role of the attacker. This team creates pseudo-attacks derived from cyber attacks seen around the world, and simulates a sequence of them against the security system of the defense team.

2) The defense team (aka blue team) includes the IT group, the SOC (security operation center) group, the forensic group and the management department. In general, these groups have to detect the attack(s) through abnormal accesses, such as multiple suspicious connections to the server. The groups also report the damage and describe the procedure of the attacks. The description of the attack(s) is sent to the management department. Based on the collected data, the agents’ job is to analyze the severity of the damage and the level of sophistication of the intrusion. After analyzing the situation, they need to find a strategy to defend the system and repair the damage.

The scheme of the attack simulation system is illustrated in Figure 1. From the psychological viewpoint, our concern is the human aspect of these agents. Specifically, we study the mental state of the agent, which affects behavior. The mental state of the agents in the blue team is important, since they are the ones who have to comprehend the situation and make the appropriate decisions. In a stressful situation, their mental state may prevent the agents from evaluating the situation completely. For example, if the agent loses sight of the functional purpose of a potential threat to the system (i.e. invading the system) and focuses only on the form of the attack processes (i.e. the attack’s dynamics), the agent may misjudge the danger of a given attack process and then commit errors.

2.2. Psychological Aspects

2.2.1. Work Domain Analysis of a Cyber Threat

Figure 1. Cyber attack simulation system.

Different hierarchy levels of the mental states are studied in ergonomics science [5] [6]. The abstraction hierarchy levels can be constructed using the Work Domain Analysis (WDA) approach [7], which is the initial phase of cognitive work analysis. The aim of WDA in our scenario is to model the constraints that relate to the purposive and physical context of the cyber threats. One characteristic of WDA is that it is event-independent. In other words, WDA generally represents categories of knowledge about the work domain [8]. Therefore, when confronted with an unanticipated event, the agents can rely on their knowledge of the threat constraints to explore a variety of ways of dealing with the situation. The Abstraction Hierarchy is made up of five abstraction levels [3] [8]:

S5 General purposes: comprehended at the highest level of abstraction hierarchy. When the agent perceives the event at this level, the fundamental purposes of the attack and its origin are recognized thoroughly.

S4 Abstract functions: at this hierarchy level, the agent is capable of understanding the laws, the principles, the attack sophistication and smartness.

S3 Processes: the processes related to the goal, such as a set of dynamic flows of events, information, or a sequence of states. In other words, the agent can perceive the elements required to achieve the goal.

S2 Physical functions: represents the functional values directly associated with the concrete forms, such as Trojans, viruses.

S1 Physical forms: apparent forms such as broken files, attack occurrence, or code lines of a virus, that can be perceived by an agent.

Here we have a one-to-one relation between the abstraction hierarchy levels of the cyber attacks and the mental picture levels of the agent. When the agent is at a certain mental level, that agent perceives the corresponding abstraction hierarchy level of the cyber attack. To perform best, the agent must perceive the attack at the most suitable abstraction hierarchy level. When the diagnosis is carried out at the highest level, and the agent then goes down in the abstraction hierarchy to specify the best solution and envisage several alternatives, the solution will be exhaustive.

As an example, consider a scam email sent through the system (e.g. displayed address service@paypal.com, real address service.paypal@pay-paypal.com). We use it to illustrate the levels of abstraction depicted by the abstraction hierarchy, and the mental model:

In high-level behaviors, the diagnosis stage focuses on the fundamental meaning of the suspected content. The agent tends to visit the high levels of abstraction (abstract functions, general purposes) often in order to better understand the content of the email, which can lead to better performance and an exhaustive solution. In low-level behaviors, the low levels of abstraction are visited more often. The subject’s attention is on the physical form (or physical functions) of the email: the real email address (service.paypal@pay-paypal.com) hidden under the displayed address (service@paypal.com) is perceived, while the interface of the email, replicated from legitimate PayPal emails (icons, images, colors, symbols...), gains trust. Even if the agent recognizes that the email is illegitimate, poor performance may still create a risk (e.g. a Trojan being installed).

From this example, we have learned that there is a relation between the mental behaviors of a human and the abstraction hierarchy level at which a cyber attack is observed. Once again, human-centered security, or self-defense by the agent, is an effective layer of the cyber security system, alongside innovative technologies. However, these levels of abstraction hierarchy, as well as the mental picture levels, can only be deduced from observable data. The observable outcomes that indicate the mental state of the agent are discussed in the sequel.

2.2.2. The Reaction Time to an Arrival Cyber Attack

The way a person interacts with a computer is likely to differ according to that person’s current mental state. Attackers usually do not want their attacks to be detected. Therefore, if the agents lack awareness, an intrusion can be perceived as a normal access, or the detection could come too late. From this argument, we propose the following assumptions:

・ When a person is in a state of high awareness, the actions will be based on fundamental knowledge of the cyber threats, and the situation will be perceived at a high abstraction hierarchy level. Roughly speaking, the brain is always on high alert, which helps it detect an abnormal access early. Even if the detection is a false alarm, the system is still secure.

・ In contrast, a person in a low mental state lacks awareness of the potential dangers of an access. The attack will be perceived at a low abstraction hierarchy level (e.g. physical form), since the brain is too ‘tired’ to process the information needed to detect abnormal activities. In cognitive terms, the reaction consists of low-level behaviors. The agent focuses only on the technical issues and the concrete form rather than on the main purpose of the attack. Therefore, the attack can pass and continue until it reaches its goal(s) or is detected.

With this observation, we let R be a random variable representing the time from the arrival of the cyber threat until the agent becomes aware of its activity. Agents at high hierarchy levels are very likely to spend less time detecting an abnormal access than agents at lower hierarchy levels. Let ${\mu}_{R}$ denote the mean value of R. This value is made up of three components

${\mu}_{R}={b}_{R}+{V}_{R}\left(z\right)+{D}_{R}\left(z\right),$

where ${b}_{R}$ represents the basic reaction time of the agent with respect to the current mental state, i.e. the time needed for the agent to perceive the arrival of an event [9], z denotes the complexity of the attack, ${V}_{R}\left(\cdot \right)$ is the average time needed to comprehend the content of the event, which depends on the complexity of the message, and ${D}_{R}\left(\cdot \right)$ represents the average time required to reach a decision after comprehending the content.

3. Hidden Markov Based Model

Since the mental state of the agent at a given time cannot be observed, and can only be inferred from the observable data, this unobserved information can be considered as a hidden sequence. In this section, we construct a model using the hidden Markov chain to fit the data. In particular, the hidden Markov chain can be applied to model the abstraction hierarchy level of the attack as perceived by the agent, as well as the corresponding mental picture level of that agent. Let us assume that the mental picture state is classified into K (hidden) levels/states. The set of states is denoted by

$S=\left\{{s}_{1},{s}_{2},\cdots ,{s}_{K}\right\}.$

The elements are arranged in increasing order, i.e. state level k is represented by ${s}_{k}$. When no confusion arises, we write ${s}_{i}>{s}_{j}$ if $i>j$.

The mental states of the agent are described by a random process $X=\left({X}_{n}\right)$, where ${X}_{n}\in S$ represents the mental state of the agent at time n, and n is a positive integer in $\left\{\mathrm{1,2,}\cdots \mathrm{,}N\right\}$. We assume that the process satisfies the Markov property given by

$P\left({X}_{n}|{X}_{n-1},{X}_{n-2},\cdots ,{X}_{1}\right)=P\left({X}_{n}|{X}_{n-1}\right).$

This property means that, given the information in the recent past, the present state is independent of the more distant past. The state transition probability distribution ${A}_{n}=\left\{{a}_{ij}^{\left(n\right)}\right\}$, $1\le i\mathrm{,}j\le K$, is the transition matrix whose coefficients

${a}_{ij}^{\left(n\right)}=P\left({X}_{n}={s}_{j}|{X}_{n-1}={s}_{i}\right)$

are the probabilities that the state moves from ${s}_{i}$ to ${s}_{j}$ at time n. The transition probabilities satisfy the stochastic constraints ${a}_{ij}^{\left(n\right)}\ge 0$ and ${\sum}_{j=1}^{K}{a}_{ij}^{\left(n\right)}=1$. Intuitively, one of the factors that can directly affect the mental state is the attack that the agent has suffered. In particular, the more complex the attack that the subject suffered, the more likely the agent is to be at a lower level of mental state at the current observation n. Therefore, the current state depends not only on the previous state of the subject but also on the attacks that occurred in the recent past. With this argument, we describe the transition probabilities including the effect of the cyber attack by

${\tilde{a}}_{ij}\left({z}_{n-1}\right):={a}_{ij}^{\left(n\right)}=P\left({X}_{n}={s}_{j}|{X}_{n-1}={s}_{i},{z}_{n-1}\right),$

where ${z}_{n}$ is the level (or complexity) of the attack at time n. We propose the following requisite properties for the transition probabilities. If ${z}_{n-1}\le {z}_{n}$, i.e. the attack ${z}_{n}$ is no less complex than the attack ${z}_{n-1}$, then

1) if $i>j$, ${\tilde{a}}_{ij}\left({z}_{n-1}\right)\le {\tilde{a}}_{ij}\left({z}_{n}\right)$: the agent is more likely to go down in mental state level,

2) if $i<j$, ${\tilde{a}}_{ij}\left({z}_{n-1}\right)\ge {\tilde{a}}_{ij}\left({z}_{n}\right)$: the agent is less likely to go up in mental state level,

3) ${\tilde{a}}_{ii}\left({z}_{n-1}\right)\le {\tilde{a}}_{ii}\left({z}_{n}\right)$ if ${s}_{i}$ is a low mental level,

4) ${\tilde{a}}_{ii}\left({z}_{n-1}\right)\ge {\tilde{a}}_{ii}\left({z}_{n}\right)$ if ${s}_{i}$ is a high mental level.

With these properties, it is necessary to partition S into subsets of high and low levels from the cognitive viewpoint. With I the totally ordered index set such that $S={\left\{{s}_{i}\right\}}_{i\in I}$, there exists $\omega >\mathrm{inf}I$ such that

${S}_{l}=\left\{{s}_{i}\in S|i<\omega \right\}\text{ is a set of low hierarchy states,}$

${S}_{h}=\left\{{s}_{i}\in S|i\ge \omega \right\}\text{ is a set of high hierarchy states.}$

The sequence $O=\left({O}_{1},{O}_{2},\cdots ,{O}_{N}\right)$ represents the observations, and $V=\left\{{v}_{m}\right\}$ is the set of observable outcomes corresponding to the possible information collected from the agent. The distribution of the observation in each state is given by $B=\left\{{b}_{k}\left(\cdot \right)\right\}$, where ${b}_{k}\left(\cdot \right)$ is the distribution of the observation in state ${s}_{k}$.

Finally, the last component of the hidden Markov chain is the initial state distribution $\pi =\left\{{\pi}_{1},{\pi}_{2},\cdots ,{\pi}_{K}\right\}$ of ${X}_{1}$, where ${\pi}_{i}=P\left({X}_{1}={s}_{i}\right)$, $1\le i\le K$, is the probability that the model is in state ${s}_{i}$ at time $n=1$. Figure 2 shows the general scheme of a hidden Markov chain.

4. Two-State Model

4.1. Model Description

We construct a parametric model for the hidden process $\left({X}_{n}\right)$ that satisfies the properties stated in Section 3. We assume that the set of states S has two states, $S=\left\{0,1\right\}$, whose values represent the low and high levels of mental state respectively. A sequence of attacks $z=\left({z}_{1},{z}_{2},\cdots ,{z}_{n},\cdots ,{z}_{N-1}\right)$ is considered, where ${z}_{n}$, as mentioned, is the level of the attack detected at time n. The variable ${z}_{n}$ takes non-negative integer values. At the cyber security center, the attacks are simulated at four levels: (1) Low, (2) Normal, (3) High and (4) Emergency. We assume that only the most recent attack affects the current mental state, i.e.

$P\left({X}_{n}={x}_{n}|{X}_{n-1}={x}_{n-1},{z}_{n-1},\cdots ,{z}_{1}\right)=P\left({X}_{n}={x}_{n}|{X}_{n-1}={x}_{n-1},{z}_{n-1}\right).$

In order to satisfy the properties of the transition probabilities in the model, it is required that the probability

$P\left({X}_{n}=1|{X}_{n-1}=x,{z}_{n-1}\right)$ (1)

decreases with respect to ${z}_{n-1}$ . We consider the following expression of the transition probability

$P\left({X}_{n}=1|{X}_{n-1}=x,{z}_{n-1}\right)=\mathrm{exp}\left(\left(1+\mathrm{log}\left(1+{z}_{n-1}\right)\right)\mathrm{log}{a}_{x}\right),$ (2)

where ${a}_{x}$ is the probability that the high level status is recorded at the present time n, ${X}_{n}=1$ , given the previous recorded status is x, ${X}_{n-1}=x$ , and there is no effective attack,

$P\left({X}_{n}=1|{X}_{n-1}=x,{z}_{n-1}=0\right)={a}_{x}.$

Figure 2. Hidden Markov chain scheme.

The term “no effective attack” means that the attack is very easy to handle or that it is a false alarm raised by the agent. With this observation, in the absence of any effective attack, ${a}_{1}$ is considered as a parameter representing the “self-maintain” ability of the agent, and ${a}_{0}$ represents the “self-recover” ability of the agent. These two parameters ${a}_{0}$ and ${a}_{1}$ are personal characteristics of an agent and can be measured using the simulated cyber attacks.

From (2), we observe that if the agent is at the high level of mental state, the probability that the agent remains at that level, $P\left({X}_{n}=1|{X}_{n-1}=1,{z}_{n-1}\right)$, decreases with respect to the level of the attack, so that the probability of dropping to the low mental state becomes greater,

$P\left({X}_{n}=0|{X}_{n-1}=1,{z}_{n-1}\right)=1-P\left({X}_{n}=1|{X}_{n-1}=1,{z}_{n-1}\right).$ (3)

Similarly, (2) shows that an agent at the lower level will find it harder to go up in mental level after suffering an effective attack, i.e. $P\left({X}_{n}=1|{X}_{n-1}=0,{z}_{n-1}\right)$ decreases with respect to the level of the recent attack. These are the properties proposed in Section 3.
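As a minimal sketch, Eq. (2) and its complement (3) can be evaluated in a few lines of Python. The function names `p_high` and `transition_matrix` are illustrative choices and do not come from the original model description.

```python
import math


def p_high(a_x: float, z_prev: int) -> float:
    """P(X_n = 1 | X_{n-1} = x, z_{n-1}) as in Eq. (2).

    Equivalent to a_x ** (1 + log(1 + z_prev)), with the natural logarithm.
    """
    if not 0.0 < a_x < 1.0:
        raise ValueError("a_x must lie strictly between 0 and 1")
    if z_prev < 0:
        raise ValueError("attack level must be non-negative")
    return math.exp((1.0 + math.log(1.0 + z_prev)) * math.log(a_x))


def transition_matrix(a0: float, a1: float, z_prev: int):
    """2 x 2 matrix: rows = previous state (0 low, 1 high), columns = next state."""
    p0 = p_high(a0, z_prev)  # recover: low -> high
    p1 = p_high(a1, z_prev)  # maintain: high -> high
    return [[1.0 - p0, p0], [1.0 - p1, p1]]
```

For instance, with ${a}_{1}=0.9$ the probability of staying in the high state equals 0.9 when ${z}_{n-1}=0$ and falls to about 0.78 when ${z}_{n-1}=3$, which illustrates the monotone decrease required by the model.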

4.2. Simulation Study

From (2), we generate a sequence of length 30 with self-recover and self-maintain parameters equal to ${a}_{0}=0.7$ and ${a}_{1}=0.9$, and with $P\left({X}_{1}=1\right)=0.9$. The simulated sequence is given in Table 1. The first row represents the complexity of the past attack that affects the state of ${X}_{n}$. As described in Subsection 4.1, an attack with complexity ${z}_{n}=0$ is ineffective. The second row is the realization $x=\left({x}_{n}\right)$ of $\left({X}_{n}\right)$.

With a high self-maintain value, the mental level of the agent is capable of remaining high even after high-level attacks. A high value of the self-recover parameter helps an agent in the low state regain the high level more easily. Table 2 corresponds to the simulation with a smaller value of the self-recover parameter ( ${a}_{0}=0.4$ ).
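The simulation of the hidden sequence can be sketched as follows. This is only an illustration under stated assumptions: the attack levels are drawn uniformly from 0 to 4 (the paper does not say how the simulated attacks are distributed), and the function name `simulate_states` is hypothetical.

```python
import math
import random


def simulate_states(z, a0, a1, p_init, rng):
    """Draw x_1, ..., x_N from the two-state model of Section 4.

    z[n-1] is the attack level that affects X_{n+1} (so len(z) = N - 1),
    a0 / a1 are the self-recover / self-maintain parameters,
    p_init is P(X_1 = 1), and rng is a random.Random instance.
    """
    a = (a0, a1)
    x = [1 if rng.random() < p_init else 0]
    for z_prev in z:
        # Eq. (2): probability of being in the high state at the next step.
        p = math.exp((1.0 + math.log(1.0 + z_prev)) * math.log(a[x[-1]]))
        x.append(1 if rng.random() < p else 0)
    return x


rng = random.Random(0)
attacks = [rng.randint(0, 4) for _ in range(29)]  # assumed uniform attack levels
states = simulate_states(attacks, a0=0.7, a1=0.9, p_init=0.9, rng=rng)
```

Rerunning the last line with `a0=0.4` gives the kind of comparison reported in Tables 1 and 2: a lower self-recover parameter makes the agent stay longer in the low state.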

The sequence of observations $O=\left({O}_{1},{O}_{2},\cdots ,{O}_{N}\right)$ is simulated from the distributions ${b}_{i}$. Let us assume that the ${b}_{i}$ are Gaussian,

${O}_{n}{|}_{{X}_{n}=i}\sim \mathcal{N}\left({\mu}_{i},{\sigma}_{i}^{2}\right),$

where ${\mu}_{i}$ is the mean value of the observation when the state of the subject is in level i, and ${\sigma}_{i}^{2}$ is the variance. Figure 3 displays the simulated observations and we observe the difference between the two sets of data.
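Given a hidden state sequence, the Gaussian observations can be drawn as in the short sketch below. The helper name `simulate_observations` is an assumption made for illustration; the parameter values are those quoted in the caption of Figure 3.

```python
import random


def simulate_observations(states, mu, sigma2, rng):
    """Draw O_n ~ N(mu[x_n], sigma2[x_n]) for every hidden state x_n.

    mu and sigma2 are indexed by state (0 = low, 1 = high);
    random.Random.gauss expects a standard deviation, hence the square root.
    """
    return [rng.gauss(mu[s], sigma2[s] ** 0.5) for s in states]


rng = random.Random(0)
example_states = [1, 1, 0, 0, 1]
obs = simulate_observations(example_states, mu=(15.0, 5.0), sigma2=(3.0, 2.0), rng=rng)
```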

Table 1. Simulated sequence of length 30 with P(X_{1} = 1) = 0.9, a_{0} = 0.7, a_{1} = 0.9.

Table 2. Simulated sequence of length 30 with P(X_{1} = 1) = 0.9, a_{0} = 0.4, a_{1} = 0.9.

Figure 3. One thousand values of the observation simulated with $P\left({X}_{1}=1\right)=0.9$, ${a}_{0}=0.4$, ${a}_{1}=0.9$; the parameters $\left({\mu}_{0}\mathrm{,}{\sigma}_{0}^{2}\right)$ and $\left({\mu}_{1}\mathrm{,}{\sigma}_{1}^{2}\right)$ are respectively (15, 3) and (5, 2).

5. Estimating the Parameters and Reconstructing the Hidden States

We describe a procedure based on the Maximum Posterior Marginal (MPM) criterion [10] [11], which maximizes the marginal posterior distribution $P\left({X}_{n}|O\right)$. We first recall the forward-backward procedure [12] [13]. The forward and backward probabilities are defined by:

${\alpha}_{n}\left(i\right)=P\left({O}_{1}={o}_{1},\cdots ,{O}_{n}={o}_{n},{X}_{n}={s}_{i}\right),$ (4)

and

${\beta}_{n}\left(i\right)=P\left({O}_{n+1}={o}_{n+1},\cdots ,{O}_{N}={o}_{N}|{X}_{n}={s}_{i}\right).$ (5)

However, the original recursions derived from (4) and (5) suffer from numerical problems [10] [14]. Devijver et al. [14] proposed replacing the joint probabilities with the following normalized quantities:

${\alpha}_{n}\left(i\right)\approx P\left({X}_{n}={s}_{i}|{O}_{1}={o}_{1},\cdots ,{O}_{n}={o}_{n}\right)$ (6)

${\beta}_{n}\left(i\right)\approx \frac{P\left({O}_{n+1}={o}_{n+1},\cdots ,{O}_{N}={o}_{N}|{X}_{n}={s}_{i}\right)}{P\left({O}_{n+1}={o}_{n+1},\cdots ,{O}_{N}={o}_{N}|{O}_{1}={o}_{1},\cdots ,{O}_{n}={o}_{n}\right)}.$ (7)

Using the numerically stable recursions, the forward-backward probabilities are approximated as follows:

・ Forward initialization:

${\alpha}_{1}\left(i\right)=\frac{{\pi}_{i}{b}_{i}\left({o}_{1}\right)}{{\displaystyle \underset{j=1}{\overset{K}{\sum}}}{\pi}_{j}{b}_{j}\left({o}_{1}\right)},\quad \text{for }1\le i\le K.$

・ Forward induction:

${\alpha}_{n}\left(j\right)=\frac{{b}_{j}\left({o}_{n}\right){\displaystyle \underset{i=1}{\overset{K}{\sum}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{\alpha}_{n-1}\left(i\right){a}_{ij}^{\left(n\right)}}{{\displaystyle \underset{l=1}{\overset{K}{\sum}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{b}_{l}\left({o}_{n}\right){\displaystyle \underset{i=1}{\overset{K}{\sum}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{\alpha}_{n-1}\left(i\right){a}_{il}^{\left(n\right)}},\text{\hspace{0.05em}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}\text{for}\text{\hspace{0.17em}}\text{\hspace{0.05em}}1\le j\le K,\text{\hspace{0.05em}}2\le n\le N.$

The backward ${\beta}_{n}\left(i\right)$ is also calculated inductively as follows:

・ Backward initialization:

${\beta}_{N}\left(i\right)=\mathrm{1,}\text{\hspace{0.05em}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}\text{for}\text{\hspace{0.17em}}1\le i\le K$

・ Backward induction:

${\beta}_{n}\left(i\right)=\frac{{\displaystyle \underset{j=1}{\overset{K}{\sum}}}{a}_{ij}^{\left(n+1\right)}{b}_{j}\left({o}_{n+1}\right){\beta}_{n+1}\left(j\right)}{{\displaystyle \underset{l=1}{\overset{K}{\sum}}}{b}_{l}\left({o}_{n+1}\right){\displaystyle \underset{j=1}{\overset{K}{\sum}}}{\alpha}_{n}\left(j\right){a}_{jl}^{\left(n+1\right)}},\quad \text{for }1\le i\le K,\text{ }n=N-1,N-2,\cdots ,1.$
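The normalized recursions can be sketched in plain Python as below. This is an illustrative implementation under assumptions, not the authors' code: the emissions are taken to be Gaussian as in Section 4.2, the list `trans[n]` is assumed to hold the transition matrix used for the step into observation `n` (0-based, so `trans[0]` is unused), and the names `gaussian_pdf` and `forward_backward` are hypothetical. Note that the denominator of the backward step is the same normalizing constant as in the forward step at time n + 1, so it is stored once and reused.

```python
import math


def gaussian_pdf(o, mu, var):
    """Density of N(mu, var) evaluated at o."""
    return math.exp(-((o - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def forward_backward(obs, trans, pi, mu, var):
    """Normalized forward (alpha) and backward (beta) variables.

    obs   : list of N observations
    trans : trans[n][i][j] = P(X_n = j | X_{n-1} = i) for n >= 1 (0-based)
    pi    : initial distribution over the K states
    mu, var : Gaussian emission parameters, one pair per state
    """
    n_obs, k = len(obs), len(pi)
    b = [[gaussian_pdf(o, mu[s], var[s]) for s in range(k)] for o in obs]
    alpha = [[0.0] * k for _ in range(n_obs)]
    beta = [[1.0] * k for _ in range(n_obs)]  # backward initialization
    scale = [0.0] * n_obs                     # forward normalizing constants

    w = [pi[i] * b[0][i] for i in range(k)]   # forward initialization
    scale[0] = sum(w)
    alpha[0] = [v / scale[0] for v in w]

    for n in range(1, n_obs):                 # forward induction
        w = [b[n][j] * sum(alpha[n - 1][i] * trans[n][i][j] for i in range(k))
             for j in range(k)]
        scale[n] = sum(w)
        alpha[n] = [v / scale[n] for v in w]

    for n in range(n_obs - 2, -1, -1):        # backward induction
        beta[n] = [sum(trans[n + 1][i][j] * b[n + 1][j] * beta[n + 1][j]
                       for j in range(k)) / scale[n + 1]
                   for i in range(k)]
    return alpha, beta
```

With this scaling, each row of `alpha` sums to one and the products `alpha[n][i] * beta[n][i]` already sum to one over i, which is a convenient sanity check. For the two-state model, `trans[n]` would be built from the attack level that precedes observation `n`.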

In case of two-state model in Section 4, the transition probabilities ${a}_{ij}^{\left(n\right)}$ are computed by (2) and (3). We define the probability

${\xi}_{n}\left(i,j\right)=P\left({X}_{n}={s}_{i},{X}_{n+1}={s}_{j}|O,\lambda \right)$

of being in the states ${s}_{i}$ and ${s}_{j}$ at times $n$ and $n+1$ respectively, given the model $\lambda $ , where $\lambda $ denotes the complete parameter set of the model and $O$ the sequence of observations.

The probability ${\xi}_{n}\left(i\mathrm{,}j\right)$ can be written using the forward-backward variables

$\begin{array}{c}{\xi}_{n}\left(i,j\right)=\frac{{\alpha}_{n}\left(i\right){a}_{ij}^{\left(n+1\right)}{b}_{j}\left({o}_{n+1}\right){\beta}_{n+1}\left(j\right)}{P\left(O|\lambda \right)}\\ =\frac{{\alpha}_{n}\left(i\right){a}_{ij}^{\left(n+1\right)}{b}_{j}\left({o}_{n+1}\right){\beta}_{n+1}\left(j\right)}{{\displaystyle \underset{l=1}{\overset{K}{\sum}}}{\displaystyle \underset{m=1}{\overset{K}{\sum}}}{\alpha}_{n}\left(l\right){a}_{lm}^{\left(n+1\right)}{b}_{m}\left({o}_{n+1}\right){\beta}_{n+1}\left(m\right)}.\end{array}$

Moreover, the marginal a posteriori probability, i.e. the probability of being in state ${s}_{i}$ at time n given the observations and the model, can be obtained as follows

${\gamma}_{n}\left(i\right)=P\left({X}_{n}={s}_{i}|O,\lambda \right)={\displaystyle \underset{j=1}{\overset{K}{\sum}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{\xi}_{n}\left(i,j\right)=\frac{{\alpha}_{n}\left(i\right){\beta}_{n}\left(i\right)}{{\displaystyle \underset{l=1}{\overset{K}{\sum}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{\alpha}_{n}\left(l\right){\beta}_{n}\left(l\right)}.$

In order to obtain the MPM solution, each element ${\hat{X}}_{n}$ is assigned the state ${s}_{{i}_{n}}$ whose index ${i}_{n}$ maximizes ${\gamma}_{n}\left(i\right)$ over i.
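The MPM decision rule is a per-time-step argmax of the marginal posterior. The sketch below assumes that `alpha` and `beta` are lists of per-state values for each time step (for example produced by a forward-backward pass); the function name `mpm_states` is illustrative.

```python
def mpm_states(alpha, beta):
    """Return (gamma, states): marginal posteriors and their per-step argmax.

    alpha[n][i] and beta[n][i] are the forward and backward variables of
    state i at time n; gamma is renormalized so that each row sums to one.
    """
    gamma, states = [], []
    for a_n, b_n in zip(alpha, beta):
        w = [a * b for a, b in zip(a_n, b_n)]
        total = sum(w)
        g_n = [v / total for v in w]
        gamma.append(g_n)
        # Index of the most probable state at this time step.
        states.append(max(range(len(g_n)), key=g_n.__getitem__))
    return gamma, states
```

Because every time step is decided separately, the reconstructed sequence maximizes the expected number of correctly recovered states, although it is not guaranteed to be the single most probable path.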

The estimate of the model parameters $\lambda $ is updated by the EM algorithm [15] [16]. With $O\mathrm{=}\left({O}_{1}\mathrm{,}\cdots \mathrm{,}{O}_{N}\right)$ the observed data and the state sequence $X=\left({X}_{1},\cdots ,{X}_{N}\right)$ hidden, the complete-data likelihood function is $P\left(O\mathrm{,}X|\lambda \mathrm{,}z\right)$ , where z is the observed sequence of attacks introduced in Section 4. The EM algorithm first finds the expectation of the log-likelihood of the complete data (E-step) with respect to the hidden data X, given the observations and the initial or previous parameter value ${\lambda}^{\prime}$

$\begin{array}{c}Q\left(\lambda \mathrm{,}{\lambda}^{\prime}\right)=E\left(\mathrm{log}P\left(O\mathrm{,}X|\lambda \mathrm{,}z\right)|O\mathrm{,}{\lambda}^{\prime}\mathrm{,}z\right)\\ ={\displaystyle \underset{x\in \mathcal{X}}{\sum}}\mathrm{log}P\left(O\mathrm{,}x|\lambda \mathrm{,}z\right)P\left(x|O\mathrm{,}{\lambda}^{\prime}\mathrm{,}z\right)\mathrm{.}\end{array}$

In fact, for easier calculation, the joint density $P\left(O,x|{\lambda}^{\prime},z\right)=P\left(x|O,{\lambda}^{\prime},z\right)P\left(O|{\lambda}^{\prime},z\right)$ is used in place of $P\left(x|O,{\lambda}^{\prime},z\right)$ . Since the factor $P\left(O|{\lambda}^{\prime}\mathrm{,}z\right)$ does not depend on $\lambda $ , the subsequent steps are not affected. The following form of the function Q is then used

$Q\left(\lambda ,{\lambda}^{\prime}\right)={\displaystyle \underset{x\in \mathcal{X}}{\sum}}\mathrm{log}P\left(O,x|\lambda ,z\right)P\left(O,x|{\lambda}^{\prime},z\right).$ (8)

The second step is to determine the maximum with respect to $\lambda $ of Q (M-step). Given a state sequence x, $P\left(O\mathrm{,}x|\lambda \mathrm{,}z\right)$ is represented as

$P\left(O,x|\lambda ,z\right)={\pi}_{{x}_{1}}{\displaystyle {\prod}_{n=2}^{N}{a}_{{x}_{n-1}{x}_{n}}^{\left(n\right)}}{\displaystyle {\prod}_{n=1}^{N}{b}_{{x}_{n}}\left({o}_{n}\right)}.$

Then the Q function is

$\begin{array}{c}Q\left(\lambda ,{\lambda}^{\prime}\right)={\displaystyle \underset{x\in \mathcal{X}}{\sum}}\mathrm{log}{\pi}_{{x}_{1}}P\left(O,x|{\lambda}^{\prime},z\right)+{\displaystyle \underset{x\in \mathcal{X}}{\sum}}\left({\displaystyle \underset{n=1}{\overset{N}{\sum}}}\mathrm{log}{b}_{{x}_{n}}\left({o}_{n}\right)\right)P\left(O,x|{\lambda}^{\prime},z\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}+{\displaystyle \underset{x\in \mathcal{X}}{\sum}}\left({\displaystyle \underset{n=2}{\overset{N}{\sum}}}\mathrm{log}{a}_{{x}_{n-1}{x}_{n}}^{\left(n\right)}\right)P\left(O,x|{\lambda}^{\prime},z\right).\end{array}$ (9)

The parameters are now separated into three independent terms, and each term can be optimized individually. The first term is

$\begin{array}{c}{\displaystyle \underset{x\in \mathcal{X}}{\sum}}\mathrm{log}{\pi}_{{x}_{1}}P\left(O,x|{\lambda}^{\prime},z\right)={\displaystyle \underset{{x}_{1}=1}{\overset{K}{\sum}}}\mathrm{log}{\pi}_{{x}_{1}}{\displaystyle \underset{{x}_{2}=1}{\overset{K}{\sum}}}\cdots {\displaystyle \underset{{x}_{N}=1}{\overset{K}{\sum}}}P\left(O,{x}_{1},\cdots ,{x}_{N}|{\lambda}^{\prime},z\right)\\ ={\displaystyle \underset{i=1}{\overset{K}{\sum}}}\mathrm{log}{\pi}_{i}P\left(O,{x}_{1}=i|{\lambda}^{\prime},z\right).\end{array}$

The optimization under the constraint ${\sum}_{i=1}^{K}{\pi}_{i}=1$ is solved by using a Lagrange multiplier, and we obtain

${\pi}_{i}=\frac{P\left(O,{x}_{1}=i|{\lambda}^{\prime},z\right)}{P\left(O|{\lambda}^{\prime},z\right)}=P\left({x}_{1}=i|O,{\lambda}^{\prime},z\right).$

The second term in (9) becomes

${\displaystyle \underset{x\in \mathcal{X}}{\sum}}\left({\displaystyle \underset{n=1}{\overset{N}{\sum}}}\mathrm{log}{b}_{{x}_{n}}\left({o}_{n}\right)\right)P\left(O,x|{\lambda}^{\prime},z\right)={\displaystyle \underset{n=1}{\overset{N}{\sum}}}{\displaystyle \underset{i=1}{\overset{K}{\sum}}}\mathrm{log}{b}_{i}\left({o}_{n}\right)P\left(O,{x}_{n}=i|{\lambda}^{\prime},z\right).$

When the distribution of $\left\{{b}_{i}\right\}$ is Gaussian, the solution for the optimization of this term is

${\mu}_{i}=\frac{{\displaystyle \underset{n=1}{\overset{N}{\sum}}}{o}_{n}\times P\left(O,{x}_{n}=i|{\lambda}^{\prime},z\right)}{{\displaystyle \underset{n=1}{\overset{N}{\sum}}}P\left(O,{x}_{n}=i|{\lambda}^{\prime},z\right)},$

and

${\sigma}_{i}^{2}=\frac{{\displaystyle \underset{n=1}{\overset{N}{\sum}}}{\left({o}_{n}-{\mu}_{i}\right)}^{2}\times P\left(O,{x}_{n}=i|{\lambda}^{\prime},z\right)}{{\displaystyle \underset{n=1}{\overset{N}{\sum}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}P\left(O,{x}_{n}=i|{\lambda}^{\prime},z\right)}.$
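The two Gaussian updates are weighted averages and can be sketched as follows. The weights are passed as the marginal posteriors of each state at each time step; since $P\left(O,{x}_{n}=i|{\lambda}^{\prime},z\right)$ differs from the posterior only by the constant factor $P\left(O|{\lambda}^{\prime},z\right)$, which cancels in the ratios, either quantity gives the same result. The function name `gaussian_m_step` is hypothetical.

```python
def gaussian_m_step(obs, gamma):
    """Weighted mean and variance updates for Gaussian emissions.

    obs      : list of N observations o_n
    gamma[n] : list of K weights, proportional to P(O, x_n = i | lambda', z)
    Returns (mu, var), each a list with one entry per state.
    """
    k = len(gamma[0])
    mu, var = [], []
    for i in range(k):
        w = [g[i] for g in gamma]
        total = sum(w)
        m = sum(wn * o for wn, o in zip(w, obs)) / total
        v = sum(wn * (o - m) ** 2 for wn, o in zip(w, obs)) / total
        mu.append(m)
        var.append(v)
    return mu, var
```

With hard assignments (weights equal to 0 or 1), the updates reduce to the ordinary sample mean and variance of the observations attributed to each state.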

The third term in (9) can be written as

$\begin{array}{l}{\displaystyle \underset{x\in \mathcal{X}}{\sum}}\left({\displaystyle \underset{n=2}{\overset{N}{\sum}}}\mathrm{log}{a}_{{x}_{n-1}{x}_{n}}^{\left(n\right)}\right)P\left(O,x|{\lambda}^{\prime},z\right)\\ ={\displaystyle \underset{n=2}{\overset{N}{\sum}}}{\displaystyle \underset{{x}_{n-1}=1}{\overset{K}{\sum}}}{\displaystyle \underset{{x}_{n}=1}{\overset{K}{\sum}}}\mathrm{log}{a}_{{x}_{n-1}{x}_{n}}^{\left(n\right)}{\displaystyle \underset{{x}_{1}=1}{\overset{K}{\sum}}}\cdots {\displaystyle \underset{{x}_{N}=1}{\overset{K}{\sum}}}P\left(O,{x}_{1},\cdots ,{x}_{n-1},{x}_{n}\cdots ,{x}_{N}|{\lambda}^{\prime},z\right)\\ ={\displaystyle \underset{n=2}{\overset{N}{\sum}}}{\displaystyle \underset{i=1}{\overset{K}{\sum}}}{\displaystyle \underset{j=1}{\overset{K}{\sum}}}\mathrm{log}{a}_{ij}^{\left(n\right)}P\left(O,{x}_{n-1}=i,{x}_{n}=j|{\lambda}^{\prime},z\right).\end{array}$

With the two-state model in Section 4, the transition probabilities are expressed as

$P\left({X}_{n}=1|{X}_{n-1}=x,{z}_{n-1}\right)=\mathrm{exp}\left(\left(1+\mathrm{log}\left(1+{z}_{n-1}\right)\right)\mathrm{log}{a}_{x}\right),$

$P\left({X}_{n}=0|{X}_{n-1}=x,{z}_{n-1}\right)=1-\mathrm{exp}\left(\left(1+\mathrm{log}\left(1+{z}_{n-1}\right)\right)\mathrm{log}{a}_{x}\right).$
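As an illustration of how these complexity-dependent transition probabilities behave, the Python sketch below evaluates the probability of moving to state 1 and draws a state path. The function names, the values of $\left({a}_{0},{a}_{1}\right)$ and the complexity sequence are hypothetical choices made for the example, not values from the paper.

```python
import math
import random


def prob_to_state_one(prev_state, z_prev, a):
    """P(X_n = 1 | X_{n-1} = prev_state, z_{n-1}) = a_x ** (1 + log(1 + z_{n-1}))."""
    exponent = 1.0 + math.log(1.0 + z_prev)
    return math.exp(exponent * math.log(a[prev_state]))


def simulate_states(z, a, x0=1, seed=0):
    """Draw a path of states in {0, 1}; z[n - 1] drives the transition into step n."""
    rng = random.Random(seed)
    states = [x0]
    for n in range(1, len(z)):
        p_one = prob_to_state_one(states[-1], z[n - 1], a)
        states.append(1 if rng.random() < p_one else 0)
    return states


# Hypothetical parameters (a_0, a_1) and attack complexities, for illustration only.
a = (0.3, 0.9)
z = [0, 1, 2, 3, 0, 1]
path = simulate_states(z, a, x0=1, seed=1)
```

Since $0<{a}_{x}<1$, a larger complexity ${z}_{n-1}$ increases the exponent and therefore lowers the probability of moving to state 1; with ${z}_{n-1}=0$ the probability is simply ${a}_{x}$.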

For notational convenience, we denote ${g}_{n}\left(z\right)=1+\mathrm{log}\left(1+{z}_{n-1}\right)$. Then the third term of Q can be rewritten as

$\begin{array}{l}{\displaystyle \underset{n=2}{\overset{N}{\sum}}}{\displaystyle \underset{i=1}{\overset{K}{\sum}}}{\displaystyle \underset{j=1}{\overset{K}{\sum}}}\mathrm{log}{a}_{ij}^{\left(n\right)}P\left(O,{x}_{n-1}=i,{x}_{n}=j|{\lambda}^{\prime},z\right)\\ ={\displaystyle \underset{n=2}{\overset{N}{\sum}}}({g}_{n}\left(z\right)\mathrm{log}{a}_{0}P\left(O,{x}_{n-1}=0,{x}_{n}=1|{\lambda}^{\prime},z\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}+\mathrm{log}\left(1-{a}_{0}^{{g}_{n}\left(z\right)}\right)P\left(O,{x}_{n-1}=0,{x}_{n}=0|{\lambda}^{\prime},z\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}+{g}_{n}\left(z\right)\mathrm{log}{a}_{1}P\left(O,{x}_{n-1}=1,{x}_{n}=1|{\lambda}^{\prime},z\right)\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}+\mathrm{log}\left(1-{a}_{1}^{{g}_{n}\left(z\right)}\right)P\left(O,{x}_{n-1}=1,{x}_{n}=0|{\lambda}^{\prime},z\right)).\end{array}$

This term has to be maximized under the constraints $0<{a}_{0},{a}_{1}<1$. This optimization problem is solved numerically with the BFGS algorithm [17]. To assess the estimators, we generate 100 sequences of states $\left({X}_{n}\right)$ of length 3000 from the two-state model in Section 4, and simulate the observations from the Gaussian distributions. The parameters $\left({\mu}_{0}\mathrm{,}{\sigma}_{0}^{2}\right)$ and $\left({\mu}_{1}\mathrm{,}{\sigma}_{1}^{2}\right)$ are (13, 16) and (5, 4), respectively. Table 3 shows the means and standard deviations of the estimators of ${a}_{0}$ and ${a}_{1}$ over the 100 replicates. On average, 93.32% of the hidden states are correctly reconstructed, that is, approximately 2800 of the 3000 hidden states. Figure 4 displays the goodness of fit between the true and the estimated distributions.
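The transition term separates into one part that depends only on ${a}_{0}$ and one that depends only on ${a}_{1}$, so each parameter can be maximized on its own. The Python sketch below is not the BFGS routine used in the paper: it swaps in a golden-section search over $u=\mathrm{log}\,a$, which is sufficient here because each part is concave in $u$ when the weights are non-negative. The function names and the toy weights (standing for the pairwise probabilities $P\left(O,{x}_{n-1}=x,{x}_{n}=j|{\lambda}^{\prime},z\right)$) are assumptions made for the example.

```python
import math


def transition_objective(a, g, w_to_one, w_to_zero):
    """Part of the third term of Q that involves a single parameter a_x.

    g[n] plays the role of g_n(z); w_to_one[n] and w_to_zero[n] stand for
    P(O, x_{n-1} = x, x_n = 1 | ...) and P(O, x_{n-1} = x, x_n = 0 | ...).
    """
    log_a = math.log(a)
    total = 0.0
    for g_n, w1, w0 in zip(g, w_to_one, w_to_zero):
        # log(1 - a ** g_n) written with log1p for numerical stability
        total += g_n * log_a * w1 + math.log1p(-math.exp(g_n * log_a)) * w0
    return total


def maximize_a(g, w_to_one, w_to_zero, tol=1e-10):
    """Golden-section search over u = log(a), keeping a strictly inside (0, 1)."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = math.log(1e-9), math.log(1.0 - 1e-9)

    def f(u):
        return transition_objective(math.exp(u), g, w_to_one, w_to_zero)

    while hi - lo > tol:
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
        if f(c) < f(d):
            lo = c  # the maximizer lies in [c, hi]
        else:
            hi = d  # the maximizer lies in [lo, d]
    return math.exp(0.5 * (lo + hi))


# Toy weights (made up): with g_n = 1 the maximizer is the usual ratio of weights.
a_hat = maximize_a([1.0, 1.0, 1.0], [0.6, 0.7, 0.5], [0.4, 0.3, 0.5])
```

When all ${g}_{n}\left(z\right)$ equal 1, the maximizer reduces to the classical Baum-Welch ratio of summed weights, which gives a quick sanity check for any numerical optimizer used in its place.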

At the Cyber Security Center, we conducted simulated attacks in which students played the role of the agents in the defense team. We collected 67 valid sets of data. The collected outcomes, the reaction times, are shown in Figure 5. As mentioned, four complexity levels of the attacks are observed. The mental states deduced from the observations are represented by circles and stars: the stars represent the low mental level, and the circles represent the high mental level. Figure 6 shows the Gaussian distributions with the estimated parameters. The short reaction times, corresponding to the high mental level, are more concentrated than the reaction times at the low level of the abstraction hierarchy. In this experiment, roughly speaking, the reaction time of a person in the high mental state is usually within three hours, with an average of 1.7 hours.

Table 3. Descriptive statistics for the estimators of a_{0} and a_{1} from 100 samples. The sample length is 3000.

Figure 4. Fit of the estimation for the simulated observations.

Figure 5. An example of the reaction times from 67 observations, and the hierarchy states implied by these observations (circles and stars). Higher states are represented by the circles and lower states by the stars.

Figure 6. The distributions of two states estimated from the observation.

6. Two-State Renewal Model

The time spent in a given state is investigated. We propose to model the variation of the mental level of the agent over time by a piecewise-constant continuous-time process ${\left({X}_{t}\right)}_{t\ge 0}$ with two states. As in the hidden Markov chain based model, we consider the mental level of an agent to be either high or low at any given time. The state space is thus $E=\left\{-1,1\right\}$, where −1 stands for the low mental level and 1 for the high level. For any $t\ge 0$, ${X}_{t}$ takes its value in E and models the mental level of the agent. Indeed, as shown in Figure 7, at each time an agent is considered to be either at the low mental level or at the high mental level.

The process $\left({X}_{t}\right)$ changes its location at random times, called jump times. Let $\left({T}_{k}\right)$ denote the sequence of the jump times of $\left({X}_{t}\right)$. For a renewal process, one also considers the inter-jumping times $\left({S}_{k}\right)$, defined for any $k\ge 1$ by ${S}_{k}={T}_{k}-{T}_{k-1}$. The first inter-jumping time ${S}_{1}$ is usually unknown because the observation period is limited. The sequence $\left({Y}_{k}\right)$ of locations of $\left({X}_{t}\right)$ is also taken into account,

${Y}_{k}={X}_{t}\mathrm{,}\text{\hspace{0.05em}}\text{\hspace{1em}}\text{for}\text{\hspace{0.17em}}{T}_{k}\le t<{T}_{k+1}\mathrm{.}$

The sequence $\left({Y}_{k}\right)$ is assumed to be a Markov chain on $\left(E\mathrm{,}\mathcal{B}\left(E\right)\right)$. By the above construction, the discrete-time process $\left({Y}_{k}\mathrm{,}{S}_{k}\right)$ contains all the information of $\left({X}_{t}\right)$. In our particular case, the behavior of the process $\left({X}_{t}\right)$ also depends on the complexity ${z}_{t}$ of the arriving attacks. The step function ${z}_{t}$ represents the priority of the attack detected at time t; it is non-negative and deterministic for all t. For $k\ge 1$ and $t\ge 0$, the conditional distribution of the ${S}_{k}$’s satisfies

$\begin{array}{l}P\left({S}_{k+1}>t|{Y}_{k},\cdots ,{Y}_{0},{S}_{k},\cdots ,{S}_{1},{z}_{t}\right)\\ =P\left({S}_{k+1}>t|{Y}_{k},{z}_{t}\right)=\mathrm{exp}\left(-{\displaystyle \underset{0}{\overset{t}{\int}}\bar{\lambda}\left({Y}_{k},s,{z}_{t}\right)\text{d}s}\right).\end{array}$

The function $\bar{\lambda}$ is called the conditional jump rate of the process $\left({X}_{t}\right)$. The integral of $\bar{\lambda}$, which is the cumulative jump rate, is also considered,

$\forall \left(y\mathrm{,}t\mathrm{,}{z}_{t}\right)\in E\times {\mathbb{R}}_{+}\times {\mathbb{Z}}_{+}\mathrm{,}\quad \Lambda \left(y\mathrm{,}t\mathrm{,}{z}_{t}\right)={\displaystyle \underset{0}{\overset{t}{\int}}\bar{\lambda}\left(y\mathrm{,}s\mathrm{,}{z}_{t}\right)\text{d}s}\mathrm{.}$

Figure 7. Example of trajectory of the two-state renewal process for modeling mental state level.

The value of ${z}_{t}$ plays a role in the moment of jump of $\left({X}_{t}\right)$. Intuitively, if ${Y}_{k}$ is at the low level, a complex cyber attack will probably prolong the inter-jumping time. In contrast, if ${Y}_{k}$ is at the high level, the inter-jumping time will more likely be shortened. With this argument, we propose the following form of the cumulative jump rate

$\Lambda \left(y\mathrm{,}t\mathrm{,}{z}_{t}\right)={\left(1+{z}_{t}\right)}^{y}{\displaystyle \underset{0}{\overset{t}{\int}}\lambda \left(y\mathrm{,}s\right)\text{d}s}\mathrm{.}$
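As a worked illustration of this form, suppose that the baseline rate is constant, $\lambda \left(y,s\right)={\lambda}_{y}$, and that the complexity stays at a fixed value $z$ during the sojourn. Both are simplifying assumptions made only for this example and are not part of the model. Then

$\Lambda \left(y,t,z\right)={\left(1+z\right)}^{y}{\lambda}_{y}t,\quad P\left({S}_{k+1}>t|{Y}_{k}=y,z\right)=\mathrm{exp}\left(-{\left(1+z\right)}^{y}{\lambda}_{y}t\right),$

so the sojourn time is exponentially distributed with mean ${\left(1+z\right)}^{-y}/{\lambda}_{y}$. For $y=-1$ the mean is $\left(1+z\right)/{\lambda}_{-1}$, which grows with the complexity $z$, whereas for $y=1$ it is $1/\left(\left(1+z\right){\lambda}_{1}\right)$, which shrinks as $z$ increases. This matches the intuition stated above.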

Since prior information about the behavior of the agent in a given state is unavailable and depends on the particular individual, a parametric model cannot be chosen. Therefore, the nonparametric estimation of the cumulative jump rate is studied instead. In the sequel, the number of observed jumps is denoted by m. The cumulative jump rate is estimated by the Nelson-Aalen estimator [18] [19]

${\hat{\Lambda}}_{m}\left(y,t,{z}_{t}\right)={\displaystyle \underset{k=1}{\overset{m}{\sum}}}{R}_{m}\left(y,{S}_{k+1}\right){1}_{\left\{{Y}_{k}=y\right\}}{1}_{\left\{{S}_{k+1}\le t\right\}},$

where ${1}_{A}$ is the indicator function of A, and ${R}_{m}\left(y\mathrm{,}t\right)$ is defined as follows:

${R}_{m}\left(y,t\right)=\left\{\begin{array}{ll}\frac{1}{{L}_{m}\left(y,t\right)}\hfill & \text{if}\text{\hspace{0.17em}}{L}_{m}\left(y,t\right)>0,\hfill \\ 0\hfill & \text{otherwise},\hfill \end{array}\right.$

where ${L}_{m}\left(y\mathrm{,}t\right)$ counts how many of the ${S}_{k+1}$’s are not less than t under state ${Y}_{k}=y$,

${L}_{m}\left(y,t\right)={\displaystyle \underset{k=1}{\overset{m}{\sum}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{1}_{\left\{{Y}_{k}=y\right\}}{1}_{\left\{{S}_{k+1}\ge t\right\}}.$

The first inter-jumping time ${S}_{1}$ is usually omitted since it is unknown. Moreover, when the process $\left({X}_{t}\right)$ is hidden, only an approximation $\left({\hat{S}}_{k}\right)$ of $\left({S}_{k}\right)$ can be obtained. We therefore do not compute the Nelson-Aalen estimator ${\hat{\Lambda}}_{m}\left(y\mathrm{,}t\mathrm{,}{z}_{t}\right)$ itself but an approximation ${\tilde{\Lambda}}_{m}\left(y\mathrm{,}t\mathrm{,}{z}_{t}\right)$ of it built from $\left({\hat{S}}_{k}\right)$; see [20] for details.

Moreover, the conditional survival functions H associated with $\Lambda $ can also be estimated from this approximate cumulative jump rate. These functions take values between 0 and 1, whereas the range of values taken by ${\tilde{\Lambda}}_{m}$ depends on m. The resulting estimator is called the Fleming-Harrington estimator [21] of H. For any $y\in E$, $t\ge 0$, it is given by

${\tilde{H}}_{m}\left(y,t,{z}_{t}\right)=\mathrm{exp}\left(-{\tilde{\Lambda}}_{m}\left(y,t,{z}_{t}\right)\right).$
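The Nelson-Aalen sum and the Fleming-Harrington transform can be sketched in a few lines of Python. In this sketch, `states[k]` is the state occupied during `durations[k]` (the pairing of ${Y}_{k}$ with ${S}_{k+1}$), the function names and the toy sequences are hypothetical, and the complexity ${z}_{t}$ is not used, just as it does not appear on the right-hand side of the estimator displayed above.

```python
import math


def nelson_aalen(states, durations, y, t):
    """Cumulative jump rate estimate for state y at time t.

    Each sojourn of length s <= t spent in state y contributes 1 / L(y, s),
    where L(y, s) counts the sojourns in state y that last at least s.
    """
    in_state = [s for st, s in zip(states, durations) if st == y]
    total = 0.0
    for s in in_state:
        if s <= t:
            at_risk = sum(1 for other in in_state if other >= s)  # at least 1
            total += 1.0 / at_risk
    return total


def fleming_harrington(states, durations, y, t):
    """Survival function estimate exp(-cumulative jump rate)."""
    return math.exp(-nelson_aalen(states, durations, y, t))


# Toy sequences (made up): alternating high (1) and low (-1) sojourns, in hours.
states = [1, -1, 1, -1, 1]
durations = [2.0, 1.0, 3.0, 4.0, 5.0]
survival_high_3h = fleming_harrington(states, durations, 1, 3.0)
```

The estimate equals 1 before the shortest observed sojourn in the state and decreases in steps at each observed sojourn length.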

7. Estimation Procedure

In practice, the process $\left({X}_{t}\right)$ cannot be observed directly. We assume that the observable process is $\left({G}_{t}\right)$, and that the behavior of these signals depends on the process $\left({X}_{t}\right)$. Indeed, the values of ${G}_{t}$ should be small when ${X}_{t}$ is high, and large when ${X}_{t}$ is low. The values of the process $\left({G}_{t}\right)$ are collected in a fixed time interval $\left[\mathrm{0,}T\right]$. For a particular agent, the values of $\left({G}_{t}\right)$ lie in an interval $\left[a\mathrm{,}d\right]\subset {\mathbb{R}}_{+}$. For a finite set $\left\{{t}_{i}|i\in \overline{\mathrm{1:}N}\right\}$ in $\left[\mathrm{0,}T\right]$, let ${V}_{i}={G}_{{t}_{i}}$ be a random variable with continuous probability density function f. The number of modes of f, denoted $N\left(f\right)$, is unknown. However, $N\left(f\right)$ can be ‘guessed’ by using the Silverman test [22]. Intuitively speaking, the frequency of the signal ${G}_{t}$ around the value x can be represented by $f\left(x\right)$. In order to have a clear relation between f and ${G}_{t}$, the following assumptions are proposed.

Assumptions 7.1

1. There exists a pair $\left(b\mathrm{,}c\right)$, with $a<b<c<d$, such that, for all $t\in \left[0,T\right]$, ${G}_{t}<b$ implies ${X}_{t}=1$, and ${G}_{t}>c$ implies ${X}_{t}=-1$.

2. $N\left(f\right)\le 2$ , $f$ has no flat part and has at most one anti-mode (at $\theta $ if $N\left(f\right)=2$ ).

The first assumption expresses the natural behavior that values of ${G}_{t}$ below the threshold b always reflect the high mental level of the agent and, conversely, signals ${G}_{t}$ greater than c reflect the low mental level of the agent. This assumption separates out the values of ${G}_{t}$ for which the mental level is known almost surely. When the signals are between b and c, the mental state of the agent can be either high or low. Note that b and c can be arbitrarily close to each other. The second assumption means in particular that the density function f has either one mode or two modes. If f has one mode, the state of the agent is most likely unchanged, except for the signals outside $\left[b\mathrm{,}c\right]$. Two modes occur, statistically, when the agent has been in both states during the observation.

In the case that $N\left(f\right)=1$, for instance with $\mathrm{mode}\left(f\right)>\frac{b+c}{2}$, we set b as a threshold to determine the hidden states and to approximate the inter-jumping times ${S}_{k}$’s. The instants at which ${G}_{t}$ crosses this threshold lead to the approximation of the ${S}_{k}$’s. The same argument applies when $\mathrm{mode}\left(f\right)\le \frac{b+c}{2}$.

For $x\in \mathbb{R}$ , the kernel density estimator ${f}_{N}\left(x\right)$ of $f\left(x\right)$ is

${f}_{N}\left(x\right)=\frac{1}{N{h}_{N}}{\displaystyle \underset{i=1}{\overset{N}{\sum}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}K\left(\frac{x-G\left({t}_{i}\right)}{{h}_{N}}\right),$

where K is the Gaussian kernel, $K\left(t\right)=\frac{1}{\sqrt{2\text{\pi}}}\mathrm{exp}\left(-\frac{1}{2}{t}^{2}\right)$ for $t\in \mathbb{R}$, and ${h}_{N}$ is the positive real bandwidth. Using the method in [22], we choose ${h}_{N}={h}_{crit}$, which is defined as

${h}_{crit}=min\left\{h\mathrm{:}{f}_{N}\text{\hspace{0.17em}}\text{has}\text{\hspace{0.17em}}\text{at}\text{\hspace{0.17em}}\text{most}\text{\hspace{0.17em}}N\left(f\right)\text{\hspace{0.17em}}\text{modes}\right\}\mathrm{.}$ (10)
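The kernel estimator and the critical bandwidth (10) can be approximated numerically as in the following Python sketch. It counts the modes of ${f}_{N}$ on a regular grid and finds ${h}_{crit}$ by bisection, relying on the property from [22] that, for the Gaussian kernel, the number of modes does not increase as the bandwidth grows. The function names, the grid size, the tolerance and the toy data are assumptions made for the example, and the grid makes the mode count only approximate.

```python
import math


def kde(x, data, h):
    """Gaussian kernel density estimate f_N(x) with bandwidth h."""
    norm = len(data) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in data) / norm


def count_modes(data, h, grid_size=512):
    """Count local maxima of f_N on a regular grid (a numerical approximation)."""
    lo, hi = min(data) - 3.0 * h, max(data) + 3.0 * h
    step = (hi - lo) / (grid_size - 1)
    values = [kde(lo + i * step, data, h) for i in range(grid_size)]
    return sum(
        1
        for i in range(1, grid_size - 1)
        if values[i] > values[i - 1] and values[i] >= values[i + 1]
    )


def critical_bandwidth(data, max_modes, tol=1e-4):
    """Bisection for the smallest h such that f_N has at most max_modes modes."""
    lo = tol
    hi = (max(data) - min(data)) or 1.0  # fall back to 1.0 for constant data
    while count_modes(data, hi) > max_modes:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_modes(data, mid) <= max_modes:
            hi = mid
        else:
            lo = mid
    return hi


# Toy signal values (made up): two well-separated groups of observations.
data = [0.8, 0.9, 1.0, 1.1, 1.2, 4.8, 4.9, 5.0, 5.1, 5.2]
h_one_mode = critical_bandwidth(data, 1)
```

For very small bandwidths the grid becomes too coarse to resolve individual peaks, so a finer grid (or an exact root search on the derivative) would be needed if the search has to go that low.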

We assume that, in the case $N\left(f\right)=2$, ${f}_{N}$ has a unique anti-mode located at ${\theta}_{N}$. In order to properly estimate the density f with $N\left(f\right)$ modes, we also need the following assumptions (for details see [23]).

Assumptions 7.2

1) f is uniformly continuous on $\mathbb{R}$ .

2) $f\in {\mathcal{C}}^{2}\left(\left]a\mathrm{,}d\right[\right)$ .

3) ${\mathrm{lim}}_{t\downarrow a}{f}^{\left(1\right)}\left(t\right)>0$ and ${\mathrm{lim}}_{t\uparrow d}{f}^{\left(1\right)}\left(t\right)<0$ .

Under these assumptions and with ${h}_{N}$ chosen as in (10), the convergence of ${\theta}_{N}$ toward $\theta $ is ensured. When N is large enough, it is possible to construct ${\theta}_{N}$ from the signal $\left({G}_{t}\right)$. The estimator ${\theta}_{N}$ of $\theta $ will be taken as a threshold, and the moments at which ${G}_{t}$ crosses it, or b, or c, will be used to construct an approximation of the ${S}_{k}$’s. We define the sets ${I}^{-}\left(x\right)$ and ${I}^{+}\left(x\right)$ as follows: ${I}^{-}\left(x\right)$ is the subset of $\left\{{t}_{i}|i\in \overline{\mathrm{1:}N}\right\}$ on which the signal does not exceed x, that is, ${I}^{-}\left(x\right)=\left\{{t}_{i}|{G}_{{t}_{i}}\le x\right\}$, and ${I}^{+}\left(x\right)=\left\{{t}_{i}\right\}\backslash {I}^{-}\left(x\right)$. It is noted that

${I}^{-}\left(b\right)\subset {I}^{-}\left({\theta}_{N}\right)\subset {I}^{-}\left(c\right)$

${I}^{+}\left(b\right)\supset {I}^{+}\left({\theta}_{N}\right)\supset {I}^{+}\left(c\right)\mathrm{.}$

For later use, we also define the set $D\left(t\right)=\left\{{t}_{i}|{t}_{i}\le t\right\}$. The procedure for the approximation of the ${S}_{k}$’s is described in two cases, single-mode density and two-mode density. For presentation purposes, we define three temporary sequences $\left({{Y}^{\prime}}_{k}\right)\mathrm{,}\left({{S}^{\prime}}_{k}\right)$ and $\left({{T}^{\prime}}_{k}\right)$, where k is an integer.

Single Mode Density Algorithm

Without loss of generality, assume that $\mathrm{mode}\left(f\right)>\frac{b+c}{2}$, so that the chosen threshold is b. Depending on the first observed signal ${G}_{{t}_{1}}$, we label the state of ${{Y}^{\prime}}_{0}$. If ${G}_{{t}_{1}}<b$, ${{Y}^{\prime}}_{0}$ is set equal to 1. Otherwise, ${{Y}^{\prime}}_{0}$ equals −1. Then the observation time set $\left\{{t}_{i}\right\}$ is updated: the new time set is ${\left\{{t}_{i}\right\}}_{\text{new}}={\left\{{t}_{i}\right\}}_{\text{old}}\backslash D\left({t}_{1}\right)$. Updating $\left\{{t}_{i}\right\}$ in this way also updates the sets ${I}^{\pm}\left(x\right)$. Let us assume ${G}_{{t}_{1}}<b$, so that ${{Y}^{\prime}}_{0}$ is set equal to 1. The procedure to construct $\left({{Y}^{\prime}}_{k}\right)$, $\left({{S}^{\prime}}_{k}\right)$ and $\left({{T}^{\prime}}_{k}\right)$ is described as follows.

Set ${{T}^{\prime}}_{0}={t}_{1}$ and ${{T}^{\prime}}_{1}=\mathrm{min}{I}^{+}\left(b\right)$; the temporary inter-jumping time is approximated by ${{S}^{\prime}}_{1}={{T}^{\prime}}_{1}-{{T}^{\prime}}_{0}$. We then update the set $\left\{{t}_{i}\right\}$ with $D\left({{T}^{\prime}}_{1}\right)$ and label the state ${{Y}^{\prime}}_{1}=-1$. At the second loop, ${{T}^{\prime}}_{2}=\mathrm{min}{I}^{-}\left(b\right)$ and the second temporary inter-jumping time is ${{S}^{\prime}}_{2}={{T}^{\prime}}_{2}-{{T}^{\prime}}_{1}$; we update again the set $\left\{{t}_{i}\right\}$ with $D\left({{T}^{\prime}}_{2}\right)$ and label the state ${{Y}^{\prime}}_{2}=1$. The procedure repeats until the updated set $\left\{{t}_{i}\right\}$ is empty. In the case ${G}_{{t}_{1}}\ge b$, where ${{Y}^{\prime}}_{0}$ equals −1, the procedure is similar. The approximation of the inter-jumping times $\left({\hat{S}}_{k}\right)$ is then $\left({{S}^{\prime}}_{k}\right)$, and the deduced sequence of hidden states $\left({\hat{Y}}_{k}\right)$ is $\left({{Y}^{\prime}}_{k}\right)$.
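The single-threshold procedure amounts to scanning the signal and recording each time it changes side with respect to b. A minimal Python sketch follows; the function name and the toy signal are hypothetical. Two conventions are assumptions of this sketch: a value exactly equal to b is labeled as high (following the definition of ${I}^{-}\left(b\right)$), and the final, incomplete sojourn after the last recorded jump is left out.

```python
def segment_by_threshold(times, signal, b):
    """Approximate states, jump times and inter-jumping times from crossings of b.

    Returns (states, jump_times, durations), playing the roles of
    (Y'_k), (T'_k) and (S'_k): durations[k] is the time spent in states[k].
    """
    if len(times) != len(signal) or not times:
        raise ValueError("times and signal must be non-empty and of equal length")

    def label(g):
        # Small signal values reflect the high level (1), large ones the low level (-1).
        return 1 if g <= b else -1

    states = [label(signal[0])]
    jump_times = [times[0]]
    for t, g in zip(times[1:], signal[1:]):
        if label(g) != states[-1]:
            # First remaining time on the other side of the threshold.
            states.append(label(g))
            jump_times.append(t)
    durations = [later - earlier for earlier, later in zip(jump_times, jump_times[1:])]
    return states, jump_times, durations


# Toy signal (made up): reaction-time-like values observed at integer times.
times = [0, 1, 2, 3, 4, 5, 6]
signal = [1.0, 1.5, 4.0, 5.0, 1.2, 0.8, 6.0]
states, jump_times, durations = segment_by_threshold(times, signal, b=2.0)
```

Scanning the observations once in time order is equivalent to repeatedly taking the minimum of ${I}^{+}\left(b\right)$ or ${I}^{-}\left(b\right)$ after removing the already visited times.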

Two Modes Density Algorithm

When the kernel density has two modes, the three thresholds of interest are $b\mathrm{,}{\theta}_{N}$ and $c$. The procedure to construct the sequences $\left({{Y}^{\prime}}_{k}\right)$, $\left({{S}^{\prime}}_{k}\right)$ and $\left({{T}^{\prime}}_{k}\right)$, with ${{T}^{\prime}}_{0}={t}_{1}$, is described as follows.

Step 1. Compare $min{I}^{-}\left(b\right)$ and $min{I}^{+}\left(c\right)$

if $min{I}^{-}\left(b\right)\le min{I}^{+}\left(c\right)$

set ${{Y}^{\prime}}_{0}=1$ , the high state

update the set $\left\{{t}_{i}\right\}$ with $D\left(t=\mathrm{min}{I}^{-}\left(b\right)\right)$

set ${{T}^{\prime}}_{1}=\mathrm{min}{I}^{+}\left({\theta}_{N}\right)$

else $\left(\mathrm{min}{I}^{-}\left(b\right)>\mathrm{min}{I}^{+}\left(c\right)\right)$

set ${{Y}^{\prime}}_{0}=-1$ , the low state

update the set $\left\{{t}_{i}\right\}$ with $D\left(t=\mathrm{min}{I}^{+}\left(c\right)\right)$

set ${{T}^{\prime}}_{1}=\mathrm{min}{I}^{-}\left({\theta}_{N}\right)$

Step 2. set ${{S}^{\prime}}_{1}={{T}^{\prime}}_{1}-{{T}^{\prime}}_{0}$ ,

Step 3. update the set $\left\{{t}_{i}\right\}$ with $D\left({{T}^{\prime}}_{1}\right)$ ; repeat again from Step 1.

The loop stops when either ${I}^{+}\left(c\right)$ or ${I}^{-}\left(b\right)$ is empty. Suppose that the loop stops at iteration ${K}^{\prime}$. If ${I}^{-}\left(b\right)$ is not empty, then the state is ${{Y}^{\prime}}_{{K}^{\prime}}=1$. Otherwise, if ${I}^{+}\left(c\right)$ is not empty, the state is ${{Y}^{\prime}}_{{K}^{\prime}}=-1$. In case ${I}^{+}\left(c\right)$ and ${I}^{-}\left(b\right)$ are both empty but the set $\left\{{t}_{i}\right\}$ is not empty, the last state is set to the previous state, ${{Y}^{\prime}}_{{K}^{\prime}}={{Y}^{\prime}}_{{K}^{\prime}-1}$. Moreover, ${{S}^{\prime}}_{{K}^{\prime}+1}=\mathrm{max}\left\{{t}_{i}\right\}-{{T}^{\prime}}_{{K}^{\prime}}$. Finally, to obtain the approximations $\left({\hat{Y}}_{k}\right)$ and $\left({\hat{S}}_{k}\right)$, we merge the consecutive values of $\left({{Y}^{\prime}}_{k}\right)$ and $\left({{S}^{\prime}}_{k}\right)$ that share the same state. For example, if we obtain the sequences $\left({{Y}^{\prime}}_{k}\right)=\left({y}_{0}=1,1,-1,-1,-1,1,{y}_{6}=1\right)$ and $\left({{S}^{\prime}}_{k}\right)=\left({s}_{1},{s}_{2},{s}_{3},{s}_{4},{s}_{5},{s}_{6},{s}_{7}\right)$, then $\left({\hat{Y}}_{k}\right)=\left({\hat{Y}}_{0}=1,{\hat{Y}}_{1}=-1,{\hat{Y}}_{2}=1\right)$ and $\left({\hat{S}}_{k}\right)=\left({\hat{S}}_{1}={s}_{1}+{s}_{2},{\hat{S}}_{2}={s}_{3}+{s}_{4}+{s}_{5},{\hat{S}}_{3}={s}_{6}+{s}_{7}\right)$.
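The merging step in this example can be sketched in Python as follows. The function name `merge_runs` is hypothetical, and `states[k]` is paired with `durations[k]`, which corresponds to pairing ${{Y}^{\prime}}_{k}$ with ${{S}^{\prime}}_{k+1}$ in the notation of the text. The numeric durations are made up to mirror the pattern of the example.

```python
from itertools import groupby


def merge_runs(states, durations):
    """Collapse consecutive equal states and add up their durations."""
    if len(states) != len(durations):
        raise ValueError("states and durations must have equal length")
    merged_states, merged_durations = [], []
    # groupby yields one group per run of consecutive pairs sharing the same state.
    for state, run in groupby(zip(states, durations), key=lambda pair: pair[0]):
        merged_states.append(state)
        merged_durations.append(sum(d for _, d in run))
    return merged_states, merged_durations


# Same pattern as the example in the text, with s_1, ..., s_7 set to 1, ..., 7.
y_temp = [1, 1, -1, -1, -1, 1, 1]
s_temp = [1, 2, 3, 4, 5, 6, 7]
y_hat, s_hat = merge_runs(y_temp, s_temp)
```

With these values the merged durations are ${s}_{1}+{s}_{2}$, ${s}_{3}+{s}_{4}+{s}_{5}$ and ${s}_{6}+{s}_{7}$, as in the example.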

With the parametric model described in Section 4, we generated $N=800$ observations of the signal G at the observation times ${t}_{i}=\frac{iT}{N}$ on the interval $\left[\mathrm{0,}T\right]$. The threshold is computed from our procedure (Figure 8). From these simulated data, we give ${\tilde{H}}_{m}\left(y\mathrm{,}t\mathrm{,}{z}_{t}\right)$ for $y\in \left\{\text{High},\text{Low}\right\}$ in Figure 9 with $T=24$ hours. In this simulation, the values are at the high state for 615 of the 800 observation times. Psychologically speaking, the agent is in the high mental state for most of the pseudo-observed time. Statistically, the solid red line represents the ‘survival’ time in the high mental state, and the dashed line represents the ‘survival’ time in the low mental state. Due to technical issues, we did not collect the observation times during the simulation of the cyber attacks. However, these promising results from the simulated observations show the potential application in determining the mental state of an agent. This helps us understand the mental characteristics of each agent based on the behavior of his or her survival functions estimated over a long period of time.

The simulations, estimations, and figures presented in the paper are implemented in the R language [24].

Figure 8. An example of signal G_{t} with the corresponding threshold computed from our procedure.

Figure 9. Fleming-Harrington estimates of the survival functions with respect to t for the abstraction hierarchy states.

8. Concluding Remarks

Cyber security in relation to human behavior, and specifically its cognitive aspects, was explored. The perception of the cyber threats by the agents was described by the Work Domain Analysis. The abstraction hierarchy levels of a cyber threat correspond to the mental picture states of a human user. We also explained the important role of the mental picture level of an agent in the security of the system during cyber attacks.

A non-stationary hidden Markov model was applied to the detection of the mental states of the agent. A parametric two-state model was proposed to simulate the variation of the mental states under the stress of the cyber attacks. The estimation algorithm for the parameters was developed based on the EM algorithm. The reconstruction of the hidden mental states was developed from the maximum posterior marginal method. We also studied the model and the estimation method on simulations as well as on observations from real-world data sets. The time spent in a given state was also investigated, and an estimation based on a nonparametric framework was developed. We anticipate that this approach could contribute significantly to understanding the mental characteristics of agents dealing with cyber threats.

References

[1] Pfleeger, S.L. and Caputo, D.D. (2012) Leveraging Behavioral Science to Mitigate Cyber Security Risk. Computers & Security, 31, 597-611.

[2] Klein, G.A. and Calderwood, R. (1991) Decision Models: Some Lessons from the Field. IEEE Transactions on Systems, Man and Cybernetics, 21, 1018-1026.

[3] Rasmussen, J. (1985) The Role of Hierarchical Knowledge Representation in Decision Making and System Management. IEEE Transactions on Systems, Man and Cybernetics, SMC-15, 234-243.

[4] Meineri, S. and Morineau, T. (2014) How the Psychological Theory of Action Identification Can Offer New Advances for Research in Cognitive Engineering. Theoretical Issues in Ergonomics Science, 15, 451-463.

https://doi.org/10.1080/1463922X.2013.815286

[5] Morineau, T. (2011) Turing Machine Task Analysis: A Method for Modelling Affordances in the Design Process. International Journal of Design Engineering, 4, 58-70.

https://doi.org/10.1504/IJDE.2011.041409

[6] Morineau, T., Frenod, E., Blanche, C. and Tobin, L. (2009) Turing Machine as an Ecological Model for Task Analysis. Theoretical Issues in Ergonomics Science, 10, 511-529.

https://doi.org/10.1080/14639220802368849

[7] Vicente, K.J. (1999) Cognitive Work Analysis: Toward Safe, Productive, and Healthy Computer-Based Work. CRC Press, Boca Raton.

[8] Naikar, N., Hopcroft, R. and Moylan, A. (2005) Work Domain Analysis: Theoretical Concepts and Methodology. Tech. Rep., DTIC Document.

[9] Posner, M.I. (1980) Orienting of Attention. Quarterly Journal of Experimental Psychology, 32, 3-25.

https://doi.org/10.1080/00335558008248231

[10] Fjortoft, R., Delignon, Y., Pieczynski, W., Sigelle, M. and Tupin, F. (2003) Unsupervised Classification of Radar Images Using Hidden Markov Chains and Hidden Markov Random Fields. IEEE Transactions on Geoscience and Remote Sensing, 41, 675-686.

[11] Geman, S. and Geman, D. (1984) Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6, 721-741.

[12] Rabiner, L.R. (1989) A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 77, 257-286.

https://doi.org/10.1109/5.18626

[13] Rabiner, L.R. and Juang, B.H. (1986) An Introduction to Hidden Markov Models. IEEE ASSP Magazine, 3, 4-16.

https://doi.org/10.1109/MASSP.1986.1165342

[14] Devijver, P.A. (1988) Champs aléatoires de Pickard et modélisation d’images digitales. Traitement du Signal, 5, 131-150.

[15] Bilmes, J.A., et al. (1998) A Gentle Tutorial of the EM Algorithm and Its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models.

[16] Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977) Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39, 1-38.

[17] Nocedal, J. and Wright, S. (2006) Numerical Optimization. Springer Science & Business Media, Berlin, Heidelberg.

[18] Andersen, P.K., Borgan, O., Gill, R.D. and Keiding, N. (2012) Statistical Models Based on Counting Processes. Springer Science & Business Media, Berlin, Heidelberg.

[19] Azaïs, R., Dufour, F., Gégout-Petit, A., et al. (2013) Nonparametric Estimation of the Jump Rate for Non-Homogeneous Marked Renewal Processes. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, 49, 1204-1231.

https://doi.org/10.1214/12-AIHP503

[20] Azais, R., Coudret, R. and Durrieu, G. (2014) A Hidden Renewal Model for Monitoring Aquatic Systems Biosensors. Environmetrics, 25, 189-199.

https://doi.org/10.1002/env.2272

[21] Fleming, T.R. and Harrington, D.P. (1984) Nonparametric Estimation of the Survival Distribution in Censored Data. Communications in Statistics—Theory and Methods, 13, 2469-2486.

https://doi.org/10.1080/03610928408828837

[22] Silverman, B.W. (1981) Using Kernel Density Estimates to Investigate Multimodality. Journal of the Royal Statistical Society. Series B (Methodological), 43, 97-99.

[23] Coudret, R., Durrieu, G. and Saracco, J. (2015) Comparison of Kernel Density Estimators with Assumption on Number of Modes. Communications in Statistics— Simulation and Computation, 44, 196-216.

https://doi.org/10.1080/03610918.2013.770530

[24] R Core Team (2015) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

https://www.R-project.org