# A network business availability modeling method based on Markov chain – EURASIP Journal on Wireless Communications and Networking

Aug 19, 2022

### Characteristics of MTNs

MTNs are mainly used to enhance combat forces’ battlefield situational awareness ability, the cooperation ability of each mobile unit, and the sustainability and timeliness of command and control [21]. As networks that do not rely on pre-existing communication infrastructure, MTNs are the basis of battlefield information sharing and present some distinctive features that deserve to be noted [22]. MTNs are team collaboration of a large number of mobile nodes with self-organization and self-repair abilities. Some key characteristics of these networks are multi-hop, center-less, energy-limited, the need for supporting multi-media real-time traffic, and low delay access to distributed resources [23, 24]. The communication link between any pair of nodes is considered to be dynamic. Due to the burstiness and uncertainty of tactical tasks, mobile nodes arranged in different areas need to quickly move to the designated area within the limited time and organize according to the specific task requirements. The topology of MTNs is flexible and reconfigurable since they can cater for a variety of application scenarios. In addition, MTNs have different requirements for delay, traffic volume, packet loss rate, etc., according to the different specific tasks. For example, in some tactical scenarios, low delay is required for situational awareness data transmission, beyond which the communication network is judged to be a failure; In other scenarios, even if the network is free-flow, the volume of traffic does not meet the requirements, the network is still considered a failure.

### Modeling and analysis for Markov steady-state availability of MTNs

The following definitions and elaborations of reliability and availability are introduced to help better understand the model.

A complex system’s reliability can be defined as its capacity to run normally from the beginning of the operation to a certain moment. By contrast, its availability refers to the possibility of normal operation for it may experience multiple failures and repairs during this period. A failure and a repair of a system are regarded as a life cycle. Reliability is the performance evaluation of the system in a life cycle, and availability is in multiple life cycles. To simplify the analysis, we make the following assumptions:

(1) MTN consists of n repairable nodes; (2) The node life X and the node repair time Y obey the exponential distributions with parameters λ and μ, respectively. X and Y are independent of each other; (3) A failed node after repair is considered new, and the node loss is ignored.

The network is operational when k or more nodes are active [27]. When n-k + 1 nodes are out of order, the network fails. At this moment, k-1 normal nodes in the network cease working as well, and no failure occurs until one node is repaired. The network will re-enter the running state when k nodes work normally.

The state of the network is defined by the number of failed nodes. X(t) = j means that at moment t, j nodes of the network have failed and are waiting to be repaired.

It can be proved that {X(t), t ≥ 0} is a time-continuous multi-state homogeneous Markov chain [26], and the state transition probability in (Delta t) is as follows:

$$left{ {begin{array}{*{20}l} {P_{j, j + 1} left( {Delta t} right) = left( {n – j} right)lambda Delta t + Oleft( {Delta t} right), j = 0, 1, ldots , n – k} hfill \ {P_{j, j – 1} left( {Delta t} right) = jmu Delta t + Oleft( {Delta t} right), j = 1, 2, ldots , n – k + 1} hfill \ {P_{jj} left( {Delta t} right) = – left( {n – j} right)lambda Delta t – jmu Delta t + Oleft( {Delta t} right), j = 1, 2, ldots , n – k} hfill \ {P_{n – k – 1, n – k + 1} left( {Delta t} right) = – left( {n – k + 1} right)mu } hfill \ {P_{j, k} left( {Delta t} right) = Oleft( {Delta t} right),;{text{others}};{text{and}};j ne k} hfill \ end{array} } right.$$

(1)

The state transition probability diagram of the network can be described as (not shown with each state transition to itself):

Then the transition probability matrix A is described as follows:

$$left[ {begin{array}{*{20}l} { – nlambda } hfill & {nlambda } hfill & {} hfill & {} hfill & {} hfill & {} hfill \ mu hfill & { – left( {n – 1} right)lambda – mu } hfill & {left( {n – 1} right)lambda } hfill & {} hfill & {} hfill & {} hfill \ {} hfill & {2mu } hfill & { – left( {n – 2} right)lambda – 2mu } hfill & {left( {n – 2} right)lambda } hfill & {} hfill & {} hfill \ {} hfill & {} hfill & {} hfill & { cdots cdots } hfill & {} hfill & {} hfill \ {} hfill & {} hfill & {} hfill & {left( {n – k} right)mu } hfill & { – klambda – left( {n – k} right)mu } hfill & {klambda } hfill \ {} hfill & {} hfill & {} hfill & {} hfill & {left( {n – k + 1} right)mu } hfill & { – left( {n – k + 1} right)mu } hfill \ end{array} } right]$$

(2)

(P_{j} left( t right)) represents the probability that the network is in the j-th state at moment t. The row vector ({mathbf{P}}left( {text{t}} right) = left( {P_{0} left( t right), P_{1} left( t right), ldots , P_{n – k + 1} left( t right)} right)) represents the distribution probability of the network in each state at moment t. The following relationship can be established as

$$left{ {begin{array}{*{20}l} {{mathbf{P^{prime}}}left( t right) = {mathbf{P}}left( t right){mathbf{A}}} hfill \ {{text{Initial}};{text{condition}};{text{P}}left( 0 right)} hfill \ end{array} } right.$$

(3)

The solution is:

$${mathbf{P}}left( {text{t}} right) = {mathbf{P}}left( 0 right)e^{{{varvec{A}}t}} = {mathbf{P}}left( 0 right)mathop sum limits_{{{text{n}} = 0}}^{infty } frac{{t^{n} }}{n!}{mathbf{A}}^{{text{n}}}$$

(4)

According to the properties of the homogeneous Markov process, we have

$$mathop {lim }limits_{t to infty } P_{j}^{^{prime}} left( t right) = 0, mathop {lim }limits_{t to infty } P_{j}^{^{prime}} = pi_{j}$$

(5)

The following results can be obtained by combining (3):

$${mathbf{pi A}} = 0,{ }sum pi_{j} = 1$$

(6)

where ({varvec{pi}}) is that state probability distribution vector in a steady state,({{varvec{uppi}}} = left( {pi_{0} , pi_{1} , ldots , pi_{n – k + 1} } right)).

Substituting (2) into (6) gives rise to:

$$left{ {begin{array}{*{20}l} { – nlambda pi_{0} + mu pi_{1} = 0} hfill \ {nlambda pi_{0} – left( {left( {n – 1} right)lambda + mu } right)pi_{1} + 2mu pi_{2} = 0} hfill \ {left( {n – 1} right)lambda pi_{1} – left( {left( {n – 2} right)lambda + 2mu } right)pi_{2} + 3mu pi_{3} = 0} hfill \ { cdots cdots } hfill \ {left( {n – j} right)lambda pi_{j} – left( {left( {n – left( {j + 1} right)} right)lambda + left( {j + 1} right)mu } right)pi_{j + 1} + left( {j + 2} right)mu pi_{j + 2} = 0} hfill \ { cdots cdots } hfill \ {klambda pi_{n – k} – left( {n – k + 1} right)mu pi_{n – k + 1} = 0} hfill \ {sum pi_{j} = 1} hfill \ end{array} } right.$$

(7)

then (pi_{j}) can be obtained by solving this linear system of equations, and the steady-state availability could be described as:

$$A = sum pi_{i} , i = 0, 1, 2, ldots , n – k$$

(8)

where A means the sum of probabilities when various network devices are in a steady state of normal operation. When the running time of the network tends to be infinite, its availability state will tend to be steady to some extent.

### Business availability based on CSMA/CA protocol

As a prevailing random-access algorithm for wireless LANs, Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA) is used to get avoided of data transmission conflicts among nodes as much as possible. It is frequently used in MTNs. CSMA/CA employs the Binary Exponential Backoff Algorithm. A node must wait for a certain time interval when it wants to send a data frame. Besides, a random backoff time is calculated so that data transmission conflicts can be avoided when trying to access the channel again. In this paper, delay performance is taken as an example to model and analyze the business availability of the network.

It is common for multiple nodes to compete for the channel in wireless networks. The analysis in this paper is based on the following assumptions:

1. (1)

n nodes are competing for the channel to transmit data;

2. (2)

There is no hidden node and the channel is ideal;

3. (3)

The transmission queue for each node is always non-empty.

The Markov model is established according to the characteristics of individual nodes in the network. This model can deduce the probability p of a node to have a conflict in a data transmission within a random time interval. Moreover, it can also derive the steady-state probability (τ) of a node transmitting a data frame in a random time interval. In the Binary Exponential Backoff Algorithm used in CSMA/CA, the backoff time is chosen uniformly in the range (0, w—1), where w represents the size of contention window whose size depends on the number of retransmissions of the data frame. W0 is the minimum size of backoff window. (W_{m} = 2^{m} W_{0}), where m is equal to the number of retransmissions, m  [0, M], and M is the maximum backoff order.

s(t) is the stochastic process of the node’s backoff order, while b(t) is the stochastic process of the node’s backoff time counter at moment t. Each data frame attempts to be sent without regards to its retransmission count. The data frames conflict at a constant and independent probability p. The discrete-time Markov chains of the two-dimensional stochastic process {s(t), b(t)} are modeled as:

the non-empty one-step transition probability is:

$$left{ {begin{array}{*{20}l} {P{ i, k | i, k + 1} = 1, k in left( {0,W_{i} – 2} right), i in left( {0, m} right)} hfill \ {P{ 0, k | i, 0} = left( {1 – p} right)/W_{0} , k in left( {0, W_{i} – 1} right), i in left( {0, m} right)} hfill \ {P{ i, k | i – 1, 0} = p/W_{i} , k in left( {0,W_{i} – 1} right), i in left( {1, m} right)} hfill \ {P{ M, k | M, 0} = p/W_{M} , k in left( {0, W_{M} – 2} right)} hfill \ end{array} } right.$$

(9)

The above equation explains the following in order:

1. (1)

At the beginning of each time interval, the backoff time decreases;

2. (2)

After a data frame is successfully transmitted, the backoff order of a new data frame starts from 0;

3. (3)

The backoff order increases if a data frame fails to transmit at a backoff order of i-1;

4. (4)

The backoff order will not increase when the backoff order reaches the maximum M.

All nodes transmit data frames only when the backoff counter is 0. Then p and (tau) could be calculated as follows

$$p = 1 – left( {1 – tau } right)^{n – 1}$$

(10)

$$tau = frac{2}{{1 + W_{0} + pW_{0} mathop sum nolimits_{i = 0}^{m – 1} left( {2p} right)^{i} }}$$

(11)

where (left( {1 – tau } right)^{n – 1}) is the probability that none of the other n −  1 nodes transmits a data frame.

Then the probability (P_{S}) that a data frame is successfully transmitted; the probability (P_{N}) that a data frame is not transmitted; and the probability (P_{C}) that a data frame is transmitted but conflicts occur could be described as follows:

$$P_{S} = ntau left( {1 – tau } right)^{n – 1}$$

(12)

$$P_{N} = left( {1 – tau } right)^{n}$$

(13)

$$P_{C} = 1 – P_{S} – P_{N}$$

(14)

The average size of the backoff window is an important part of delay analysis. In the backoff stage j, the size of backoff window obeys a uniform distribution in the range (left[ {0,W_{j} – 1} right]). This indicates that in addition to the selection of the minimum and maximum size of backoff window, the size of the average backoff window W is also related to the conflict probability p. That is, the average size of backoff window is related to both the size of the backoff window for the successful transmission phase and the size of the window that has been backed off before the successful transmission phase.

The average size of backoff window for each round could be calculated as:

$$W_{{{text{ave}}}} = mathop sum limits_{i = 0}^{M} frac{{W_{i} – 1}}{2}p^{i} frac{{W_{i} + 1}}{2}left( {1 – p} right)tau$$

(15)

The following equations can be used to calculate the average busy time (Ts) of the channel when the transmission is successful and the average busy time (Tc) of the channel when the transmission is in conflict:

$$T_{{text{s}}} = D_{{{text{IFS}}}} + H + E_{{text{D}}} + delta + S_{{{text{IFS}}}} + A_{{{text{CK}}}}$$

(16)

$$T_{{text{C}}} = D_{{{text{IFS}}}} + H + E_{{text{D}}} + delta$$

(17)

where H is the total length of the physical and MAC layer header fields; (D_{{{text{IFS}}}}) is the distributed inter-frame spacing; (S_{{{text{IFS}}}}) is the short inter-frame space; (delta) is the transmission delay; and (E_{{text{D}}}) is the average data frame transmission delay.

The average transmission delay (left( {E_{{text{D}}} } right)) could be calculated as:

$$E_{{text{D}}} = left( {E_{{text{B}}} + T_{{text{S}}} } right) + E_{{text{C}}} left( {E_{{text{B}}} + T_{{text{C}}} + T_{0} } right)$$

(18)

where EB is the average backoff time of data frames; EC is the average number of conflicts that occur for successful transmission of data frames.

The delay probability distribution (f_{D} left( t right)) is obtained from the average data frame transmission delay (ED) and the corresponding probability of occurrence.

Then, the compliance rate of specific indicators PD can be obtained based on the indicator threshold DB required by the battlefield mission, i. e., the probability that the delay value does not exceed the threshold:

$$P_{{text{D}}} = mathop smallint limits_{0}^{{D_{{text{B}}} }} f_{{text{D}}} left( t right){text{d}}t$$

(19)

According to the definition of the compliance rate, PD can also be expressed as the ratio of the number of data frames (left( {Q_{{left( {E_{{text{D}}} le D_{{text{B}}} } right)}} } right)) whose average data frame transmission delay (ED) is less than or equal to the threshold (DB) to the total number of data frames (Q_{{{text{all}}}}):

$$P_{D} = frac{{Q_{{left( {E_{D} le D_{B} } right)}} }}{{Q_{all} }}$$

(20)

There is a possibility that the network is in an unavailable state, but the performance meets the criteria when the number of failed nodes in the network does not exceed a certain value. Therefore, we propose a business availability model that is defined as

$$A_{{text{p}}} = P_{{text{D}}} *A.$$

(21)

That is, the steady-state availability is multiplied by the compliance rate of the metrics to evaluate the business availability of the network.