The basic idea of the proposed Ano-Det method is as follows. First, we convert the health data of students into lightweight health indexes and store them on the cloud platform. Next, we calculate the similarity between each pair of students based on their health indexes. Finally, we cluster the students based on their health indexes and detect possible anomalies from the clustering results. The details of the Ano-Det method are described below.

### Step 1: Generate each student’s health index

As the example in Fig. 1 indicates, students’ health data monitored by wearable sensors are often expressed as a curve that fluctuates over time. Therefore, we first model the students’ health data with a multi-dimensional matrix \(\kappa\), depicted in Eqs. (1)-(2). Here, we assume that there are *N* students, i.e., \(s_{1}\),…, \(s_{N}\), and *M* health criteria (e.g., heart rate, blood pressure, etc.), i.e., \(c_{1}\),…, \(c_{M}\). Each entry in matrix \(\kappa\), i.e., \(A_{i,j}\) (*i* = 1, 2,…, *N*; *j* = 1, 2,…, *M*), represents student \(s_{i}\)’s health data over criterion \(c_{j}\). Furthermore, as described in Fig. 1, each entry \(A_{i,j}\) is a time-varying curve; therefore, we formulate \(A_{i,j}\) as the vector in Eq. (2), where *K* denotes the number of time points at which wearable sensors monitor and record the health conditions of students. For example, *K* = 3 means that three pieces of health data are recorded by the wearable sensors. In this sense, parameter *K* reflects the monitoring frequency of the health data.

$$\begin{aligned} \kappa = \begin{array}{cc} &{} c_{1}\quad \cdots \quad c_{M}\\ \begin{array}{c} s_{1} \\ \vdots \\ s_{N} \end{array} &{} \left[ \begin{array}{ccc} A_{1, 1} &{} \cdots &{} A_{1, M} \\ \vdots &{} \ddots &{} \vdots \\ A_{N, 1} &{} \cdots &{} A_{N, M} \end{array}\right] \end{array} \end{aligned}$$

(1)

$$\begin{aligned} A_{i, j} = (a_{i, j, 1},\ldots , a_{i, j, K}) \end{aligned}$$

(2)

As Eqs. (1)-(2) show, \(\kappa\) is an \(N*M*K\) tensor. To ease the following calculations, we need to convert the \(N*M*K\) tensor \(\kappa\) into a multi-dimensional vector. To achieve this goal, we first convert each *K*-dimensional vector \(A_{i,j}\) into a single real value. Concretely, we first produce a *K*-dimensional vector *B*, presented in Eq. (3). Each entry in vector *B* is generated by Eq. (4), where function \(\varGamma (-1, 1)\) produces a random value in [-1, 1]. Then, with the *K*-dimensional vectors \(A_{i,j}\) and *B*, we compute their inner product according to Eq. (5); the result is denoted by \(\varOmega _{i, j}\).

$$\begin{aligned} B=(b_{1},\ldots , b_{K}) \end{aligned}$$

(3)

$$\begin{aligned} b_{k} = \varGamma (-1, 1) \quad (k=1, 2,\ldots , K) \end{aligned}$$

(4)

$$\begin{aligned} \varOmega _{i, j} = A_{i, j}*B \end{aligned}$$

(5)
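The projection in Eqs. (3)-(5) can be illustrated with a minimal Python sketch. The function names `random_vector` and `inner_product`, the fixed seed, and the sample heart-rate readings are assumptions made for illustration only; they do not come from the original method description.

```python
import random


def random_vector(k, rng):
    """Eqs. (3)-(4): draw K entries uniformly from [-1, 1]."""
    return [rng.uniform(-1.0, 1.0) for _ in range(k)]


def inner_product(a, b):
    """Eq. (5): inner product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


# Hypothetical data: one student's heart rate sampled at K = 3 time points.
rng = random.Random(0)  # fixed seed only so the sketch is repeatable
heart_rate = [72.0, 75.0, 71.0]
B = random_vector(len(heart_rate), rng)
omega = inner_product(heart_rate, B)  # real-valued projection of A_{i,j}
```

For the hash values of different students to be comparable, this sketch assumes that the same vector *B* is reused for every student rather than redrawn per student.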

According to Eq. (5), \(\varOmega _{i, j}\) is a real value in \((-\infty , +\infty )\). Next, to ease the following calculations, we convert the real value \(\varOmega _{i, j}\) into a Boolean value \(\varPsi _{i, j}\), as formulated in Eq. (6). In Eq. (6), \(\varPsi _{i, j}\) is mapped to 1 or 0. The rationale is as follows: consider a data point *D* and a hyperplane *H* (here, the hyperplane through the origin whose normal vector is *B*); if point *D* is above hyperplane *H*, then the \(\varPsi _{i, j}\) value corresponding to *D* is 1; otherwise, if point *D* is below hyperplane *H*, then the value is 0. In this way, the position of a point relative to hyperplane *H* can be used to evaluate whether two points are close, since points that are close to each other tend to fall on the same side of *H*. This is the theoretical basis behind the hash mapping operation adopted in Eq. (6).

In this way, we convert the *K*-dimensional vector \(A_{i,j}\) in Eq. (2) into a Boolean value \(\varPsi _{i, j}\). Correspondingly, the \(N*M*K\) tensor \(\kappa\) in Eq. (1) is simplified to the \(N*M\) matrix \(\kappa\) in Eq. (7). Next, we further simplify the \(N*M\) matrix \(\kappa\) into an *N*-dimensional vector through the transformation in Eq. (8). Here, \(\pi _{i}\) is the decimal value corresponding to the Boolean vector (\(\varPsi _{i, 1}\),…, \(\varPsi _{i, M}\)). For example, if *M* = 3 and (\(\varPsi _{i, 1}\),…, \(\varPsi _{i, M}\)) = (1, 1, 1), then \(\pi _{i}\) = 7. In this way, we convert the \(N*M\) matrix \(\kappa\) in Eq. (7) into the *N*-dimensional vector \(\kappa\) in Eq. (8). In other words, each student \(s_{i}\) corresponds to a decimal value \(\pi _{i}\). According to index theory, the decimal value \(\pi _{i}\) can be considered the health index of student \(s_{i}\).

$$\begin{aligned} \varPsi _{i, j}=\left\{ \begin{array}{rcl} 1 &{} &{} \text {when } \varOmega _{i, j}>0\\ 0 &{} &{} \text {when } \varOmega _{i, j}<0 \end{array} \right. \end{aligned}$$

(6)

$$\begin{aligned} \kappa = \begin{array}{cc} &{} c_{1}\quad \cdots \quad c_{M}\\ \begin{array}{c} s_{1} \\ \vdots \\ s_{N} \end{array} &{} \left[ \begin{array}{ccc} \varPsi _{1, 1} &{} \cdots &{} \varPsi _{1, M} \\ \vdots &{} \ddots &{} \vdots \\ \varPsi _{N, 1} &{} \cdots &{} \varPsi _{N, M} \end{array}\right] \end{array} \end{aligned}$$

(7)

$$\begin{aligned} \kappa = \begin{array}{cc} \begin{array}{c} s_{1} \\ \vdots \\ s_{N} \end{array} &{} \left[ \begin{array}{c} \pi _{1} \\ \vdots \\ \pi _{N} \end{array}\right] \end{array} \end{aligned}$$

(8)
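The mapping in Eqs. (6)-(8) can be sketched as follows. Two points are assumptions rather than statements from the text: Eq. (6) leaves the case \(\varOmega _{i, j} = 0\) undefined, and this sketch maps it to 0; the Boolean vector is read with \(\varPsi _{i, 1}\) as the most significant bit, which the (1, 1, 1) example cannot confirm or rule out. The function names are illustrative.

```python
def sign_bit(omega):
    """Eq. (6): 1 when omega > 0, otherwise 0.

    Assumption: omega == 0 is mapped to 0, since Eq. (6) does not cover it.
    """
    return 1 if omega > 0 else 0


def bits_to_decimal(bits):
    """Eq. (8): read (Psi_{i,1}, ..., Psi_{i,M}) as a binary number.

    Assumption: the first entry is the most significant bit.
    """
    value = 0
    for bit in bits:
        value = value * 2 + bit
    return value


# Hypothetical projections of one student over M = 3 criteria.
omegas = [0.8, 12.5, 3.1]
bits = [sign_bit(w) for w in omegas]  # all projections are positive
pi_i = bits_to_decimal(bits)          # matches the (1, 1, 1) example in the text
```

With *M* criteria the index \(\pi _{i}\) lies between 0 and \(2^{M} - 1\), so it can be stored as a single small integer per student.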

The advantages of the health index are three-fold. First, a health index contains little private information about a student and hence can be transmitted or released to the cloud platform with less privacy risk, which reduces privacy disclosure concerns when a cloud platform integrates distributed personal data for uniform processing and mining. Second, retrieving similar students based on health indexes is fast. Third, the retrieval results based on health indexes are close to those based on the original, sensitive health data. Therefore, we use the students’ health indexes in the subsequent similarity calculation (Step 2) and anomaly detection (Step 3). In this way, the similarity calculation and anomaly detection process is both time-efficient and privacy-preserving.

### Step 2: Calculate the similarity between each pair of students based on their health indexes

As discussed in Step 1, each student \(s_{i}\) corresponds to a decimal value \(\pi _{i}\). Here, \(\pi _{i}\) is derived from the random vector *B* in Eq. (3), which brings additional uncertainty into the health indexes of students. To reduce this uncertainty, *q* (*q* is an integer larger than 1) decimal values are obtained for each student \(s_{i}\). Concretely, for each \(s_{i}\), we repeat the operations in Eqs. (3)-(8) *q* times to generate \(\pi _{i, 1}\),…, \(\pi _{i, q}\). After that, we get a new matrix \(\kappa\) as specified in Eq. (9). According to Eq. (9), each student \(s_{i}\) corresponds to a *q*-dimensional vector (\(\pi _{i, 1}\),…, \(\pi _{i, q}\)), which is regarded as the health index of student \(s_{i}\).

$$\begin{aligned} \kappa = \begin{array}{cc} \begin{array}{c} s_{1} \\ \vdots \\ s_{N} \end{array} &{} \left[ \begin{array}{c} \left( \pi _{1, 1} \cdots \pi _{1, q} \right) \\ \vdots \\ \left( \pi _{N,1} \cdots \pi _{N, q} \right) \end{array}\right] \end{array} \end{aligned}$$

(9)
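The construction of the *q*-dimensional health index in Eq. (9) can be sketched as below. The sketch assumes that the same *q* random vectors are shared by all students (so that equal entries are meaningful), that a zero projection maps to bit 0, and that bits are read most significant first; the names `make_projections` and `health_index` are hypothetical.

```python
import random


def make_projections(q, k, rng):
    """Draw q random vectors B, each with K entries in [-1, 1] (Eqs. (3)-(4))."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(k)] for _ in range(q)]


def health_index(student, projections):
    """Return (pi_{i,1}, ..., pi_{i,q}) for one student.

    student: list of M criteria, each a list of K readings.
    projections: q vectors of length K, shared by all students (assumption).
    """
    index = []
    for b in projections:
        value = 0
        for readings in student:
            omega = sum(x * y for x, y in zip(readings, b))  # Eq. (5)
            value = value * 2 + (1 if omega > 0 else 0)      # Eqs. (6), (8)
        index.append(value)
    return tuple(index)


# Hypothetical student with M = 2 criteria and K = 3 readings each.
rng = random.Random(42)  # fixed seed only so the sketch is repeatable
projections = make_projections(q=5, k=3, rng=rng)
student = [[72.0, 75.0, 71.0], [118.0, 121.0, 119.0]]
index = health_index(student, projections)  # tuple of q integers in [0, 2**M - 1]
```

Only `index` would need to leave the device in this sketch; the raw readings and the projection step stay local.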

With the health indexes of two students \(s_{i}\) and \(s_{j}\), i.e., (\(\pi _{i, 1}\),…, \(\pi _{i, q}\)) and (\(\pi _{j, 1}\),…, \(\pi _{j, q}\)), we can compute the similarity between \(s_{i}\) and \(s_{j}\), denoted by \(Sim(s_{i}, s_{j})\), using Eqs. (10)-(11). Here, \(Sim(s_{i}, s_{j})\) is the number of dimensions in which the values of \(s_{i}\) and \(s_{j}\) are equal. For example, consider two students \(s_{1}\) and \(s_{2}\) whose health indexes are (1, 2, 3, 4, 5) and (1, 2, 3, 6, 7), respectively. Their first three dimensions match, so \(Sim(s_{1}, s_{2})\) = 3 according to Eqs. (10)-(11). Furthermore, to loosen the matching condition in Eq. (11), we create *p* (*p* is an integer larger than 1) hash tables, i.e., we generate \(\kappa _{1}\),…, \(\kappa _{p}\) by Eq. (9). We then replace Eq. (11) with Eq. (12), in which the matching condition is considerably looser.

$$\begin{aligned} Sim(s_{i}, s_{j}) = \sum \limits _{z=1}^{q} Sim_{i, j, z} \end{aligned}$$

(10)

$$\begin{aligned} \begin{aligned} Sim_{i, j, z} = 1, \text { iff } \pi _{i, z} = \pi _{j, z} \quad (z=1, 2,\ldots , q) \end{aligned} \end{aligned}$$

(11)

$$\begin{aligned} \begin{aligned} Sim_{i, j, z} = 1, \text { iff } \pi _{i, z} = \pi _{j, z} \quad (z=1, 2,\ldots , q) \\ \text {holds in any } \kappa _{1},\ldots , \kappa _{p} \end{aligned} \end{aligned}$$

(12)
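The similarity in Eqs. (10)-(12) can be sketched as below. The sketch reads "holds in any \(\kappa _{1}\),…, \(\kappa _{p}\)" as "holds in at least one of the *p* tables", which is an interpretation rather than something the text states explicitly. The function names and the second-table values are hypothetical.

```python
def similarity(index_i, index_j):
    """Eqs. (10)-(11): number of positions z where the two indexes agree."""
    return sum(1 for a, b in zip(index_i, index_j) if a == b)


def similarity_multi_table(tables_i, tables_j):
    """Eq. (10) with Eq. (12): position z counts if it agrees in any table.

    tables_i, tables_j: lists of p health indexes (one per hash table),
    each a q-dimensional tuple.
    """
    q = len(tables_i[0])
    return sum(
        1
        for z in range(q)
        if any(ti[z] == tj[z] for ti, tj in zip(tables_i, tables_j))
    )


# The example from the text: three of the five positions agree.
s1 = (1, 2, 3, 4, 5)
s2 = (1, 2, 3, 6, 7)
single = similarity(s1, s2)

# Hypothetical second hash table in which the fourth position also agrees.
loose = similarity_multi_table([s1, (9, 9, 9, 4, 0)], [s2, (8, 8, 8, 4, 1)])
```

Under this reading, adding tables can only keep or raise the similarity of a pair, which is why the condition in Eq. (12) is looser than that in Eq. (11).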

### Step 3: Student health condition clustering and anomaly detection

According to the similarity between students calculated in Step 2, we can cluster the students into groups. In general, students whose similarity with each other is large belong to the same group. For example, two students whose similarity is *q* are put into the same group. Here, to discover the most similar students, we set a threshold \(T\) (\(T \le q\)) for \(Sim(s_{i}, s_{j})\). More specifically, only students \(s_{i}\) and \(s_{j}\) whose \(Sim(s_{i}, s_{j})\) is not smaller than *T* are deemed similar. Following this clustering rule, we can divide all the students into groups. Furthermore, students who have no similar students are regarded as anomalies. In this way, anomalous students can be identified while the sensitive information contained in the health data transmitted to the cloud platform remains protected.
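The thresholding rule above can be sketched as follows. The text does not specify how groups are formed beyond the pairwise rule, so this sketch assumes that groups are the connected components of the "similar" relation; the function name `detect_anomalies` and the student labels are hypothetical.

```python
def detect_anomalies(indexes, threshold):
    """Group students by pairwise similarity and flag isolated ones.

    indexes: dict mapping a student label to a q-dimensional health index.
    threshold: T, the minimum similarity for two students to count as similar.
    Returns (groups, anomalies).
    """
    students = list(indexes)
    neighbours = {s: set() for s in students}
    for pos, a in enumerate(students):
        for b in students[pos + 1:]:
            sim = sum(1 for x, y in zip(indexes[a], indexes[b]) if x == y)
            if sim >= threshold:
                neighbours[a].add(b)
                neighbours[b].add(a)

    # Students with no similar peer are regarded as anomalies.
    anomalies = [s for s in students if not neighbours[s]]

    # Assumption: a group is a connected component of the similarity graph.
    groups, seen = [], set()
    for s in students:
        if s in seen or not neighbours[s]:
            continue
        stack, group = [s], []
        seen.add(s)
        while stack:
            current = stack.pop()
            group.append(current)
            for n in neighbours[current]:
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        groups.append(sorted(group))
    return groups, anomalies


# Hypothetical health indexes with q = 5.
indexes = {"s1": (1, 2, 3, 4, 5), "s2": (1, 2, 3, 6, 7), "s3": (9, 9, 9, 9, 9)}
groups, anomalies = detect_anomalies(indexes, threshold=3)
```

The pairwise loop compares every pair of students, so the sketch is quadratic in *N*. The choice of *T* matters: raising it to 4 in this example would leave every student without a similar peer.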

Next, we present the following algorithm to make the Ano-Det method easier to understand.

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
