In this section, we describe the detailed design of BTP, including the task identification and prediction of task-size category. Besides, the main variables used in the paper are shown in Table 1.
Task identification
In our design, task identification consists of the following three steps:
1. Attribute exploration: We first explore a set of flow, community, and application level attributes, which might be useful in task identification. For flow-level attributes, we consider flow’s starting time, average packet size, variance of packet size, average arrival interval of flows. Besides, the data center can be separated into multiple service groups. Given the fact that intra-group communication is frequent while inter-group communication is rare, any two flows belonging to the same community are likely to be inside one task. Finally, we can take advantage of application designs to help task identification. For example, the port assignment rule in Spark reveals that the data transmitted to the same executor typically have the same destination IP and port.
2. Distance calculation: Given the attributes, BTP can calculate distances between flows to capture task relationships. For flows in the same task, they have smaller distances. The key problem is that we need a good metric to effectively reflect the importance of each attribute and task relationships.
In our design, we leverage distance metric learning [35] to obtain the relationship between any two flows. If the distance between two flows is smaller than the given threshold, then they belong to the same task. Otherwise, they belong to the different tasks. The distance between two flows fi and fj can be calculated as
$$d(f_{i}, f_{j}) =||f_{i}- f_{j}||_{A}=sqrt{(f_{i}-f_{j})^{T}A(f_{i}- f_{j})},$$
(1)
where A is the distance matrix and reflects weights of different attributes.
We desire a metric where any pairs of flows in a task have small distances while any pairs of flows belonging to different tasks have distances larger than a given threshold. Here, we simplify the problem by restricting A to be diagonal and adopt Newton-Raphson method [35] solve it.
3. Identifying tasks via clustering: According to the features of aforementioned attributes and distance measurement learning, BTP master leverages the clustering algorithm R-DBSCAN [34] to identify tasks. In fact, R-DBSCAN is a variant of DBSCAN [36] and has several advantages listed below. First, although the number of tasks changes dynamically during the transmission process and cannot be determined, DBSCAN can automatically divide clusters within the ε domain without specifying the number of tasks in advance. Second, the tasks in the data center network may be composed of one single flow and R-DBSCAN can extract such tasks. Finally, R-DBSCAN uses the core objects in the cluster as an index to reduce the running time and time complexity.
Next, we introduce how the clustering algorithm works. It first scans the data set to obtain the parent object set and the child object set while establishing the relationship between the parent and child objects. Here, the parent object set refers to the set of flows that have the same traffic characteristics while the child object set is defined as the set of flows that are not classified. Then the parent object set is regarded as the data set and the master calls the DBSCAN to obtain the cluster of the parent object. Finally, based on the relationship between the parent and child objects, the master can obtain all the child objects in the identified parent object and merges them into a cluster, which is the task.
The prediction mechanism of task-size category
When the task is obtained through the task identification, we can get the task width. When the task is in transit, both its length and size are collected and put them into the data set. According to these information, we use bayesian decision to predict the task-size category.
Table 2 shows the typical traffic characteristics of tasks in the data center network. These characteristics are extracted from the Hive/MapReduce [37] traces, which are collected from the Facebook’s data center with 3,000 machines and 150 ToR switches [38]. As shown in Table 2, although the proportion of short-narrow tasks is larger than other task types, the proportion of short-narrow tasks’ bytes is very small (only 0.01%) and we can ignore the data of short-narrow tasks. Moreover, since the proportions of long-narrow tasks (16%), long-wide tasks(15%) and short-wide tasks(17%) are very similar, the classification problem is balanced. Finally, since there are 526 tasks in Facebook’s data set, we can obtain the amount of data per group according to the measured result in the Table 1. More specifically, the numbers of short-narrow tasks, long-narrow tasks, long-wide tasks and short-wide tasks are 273, 84, 79, 90, respectively. In our design, the classification problem is binary-class. That is, both the task width and task size are 2-classes.
From these traffic characteristics, we observe that the number of bytes of narrow tasks only accounts for 0.68% of total byte amount, but the number of narrow tasks accounts for 68%. However, although the number of wide tasks only accounts for 32%, its byte amount accounts for 99.32%. Therefore, we think that there is a correlation between task width and size. That is, the smaller width of task is, the greater possibility that it is a small task. On the contrary, the larger width of the task, the greater possibility that it is a large task.
In order to verify our conjecture, we further analyze the above traces and try to find the maximum threshold wmax of task width, so that when the task width is smaller than wmax, it is a small task with high probability. If the task size is less than the threshold Ks of small task, then it is a small task. Let Fs(x),Fw(x),Fl(x) denote the cumulative distribution functions of task size, task width and task length, respectively. Therefore, given the threshold Ks, we can calculate the maximum threshold wmax of the task width as
$$begin{aligned} & min ; w_{max} \ & subject ; to ; frac{F_{s,w}(K_{s},w_{max})}{F_{w}(w_{max})}geq alpha, end{aligned}$$
(2)
Where Fs,w(x) is the joint distribution function of task size and task width. That is, when the width of a task is less than wmax, it is the probability that the task is small task. If this probability is greater than α (i.e., 85%), then the minimum value of wmax is the expected threshold value.
Furthermore, we analyze the traces from Facebook’s production data center. Figure 2 shows the distribution of task width. We observe that 80% of small tasks have a task width less than 15. Therefore, when the task width is less than 15, it is a narrow task.
Similar to the literature [20], Ks and wmax are set to 100 MB and 50, respectively. Note that the task width of 50 is also the threshold value for distinguishing the narrow tasks from the wide tasks. Figure 3 (a) and (b) show the distributions of task size and length, respectively.
Figure 3 (a) verifies our conjecture that narrow tasks are more likely to be small tasks while wide tasks are more likely to be large tasks. In Fig. 3 (b), we observe that there are about 80% of narrow tasks whose task lengths are less than 5MB. Therefore, if the width of the task is small, then its task length is most likely to be small. Since the task length is the size of the longest flow in the task, the task size will not exceed the product of task width and length. Therefore, the smaller the task width, the greater the probability that the task is a small task. The above is a qualitative analysis of the relationship between task width and task length. Next, we present the quantitative analysis of the relationship between task width and length.
Since small tasks are preferentially scheduled during the transmission process, we need to identify small tasks from these wide tasks. We guess that this kind of wide task is a small task potentially caused by the following reason: the task width just exceeds the threshold, but the task length is extremely small. As a result, the number of task bytes is small. In order to verify our guess, we further analyze the data and present the results.
Figure 4 (a) and (b) are the cumulative distribution functions of the task width of small tasks (task size < 100MB) and short tasks (task length < 5MB).
Therefore, for the wide task, we want to find the threshold Δ of task width. That is, when the task width is greater than K, it is a small task with high probability. Then we can calculate the task width threshold Δ as
$$begin{aligned} & min ; Delta \ & subject ; to ; frac{P(SW)}{P(W)}geq beta, end{aligned}$$
(3)
where P(W) represents the probability that the task width in the trace is within the range of [K,Δ] while P(SW) is the probability that the task width is within the range of [K,Δ] and it is a small task. When β and the threshold of the small task are respectively set to 85% and 100MB, the threshold of task width Δ is 60. As shown in Fig. 5, when the width of a wide task is 50 ∼ 60, the probability that it is a small task is 85%.
Finally, we predict the category of task size via bayesian decision, which consists of three stages. The first stage is the preparation stage, including feature attribute selection and division. The second stage is the training stage, which trains the sample data and calculates the frequency of each category as well as the probability of each category under different conditions. The third stage is the prediction and evaluation stage that predicts the task-size category and evaluates the accuracy as well as error rate.
Feature attribute selection and division: We explore the three feature attributes: task width w, task length L and task size S. However, Fig. 3 (b) shows that there is a correlation between task length and task width. That is, the smaller the task width, the smaller the task length. Therefore, we ignore the task length attribute and only consider the task width and task size. We assume that it is a small task when S is 0 while it is a large task as the value of S is 1. As shown in Fig. 5, we divide the task width into three set intervals W={w≤50,50<w<60,w≥60}.
Training data: We use Facebook’s real traces to test. These traces are collected from one-hour Hive/MapReduce data warehouse in Facebook’s data center with 3,000 machines and 150 ToR switches. There are 526 tasks and about 7×105 flows in these traces (i.e., data set). In this paper, the data set is divided into training samples and test samples at the ratio of (frac {2}{5}) and (frac {3}{5}), respectively. Then we calculate the small task probability P(S=0) and the large task probability P(S=1) according to the sample data. Finally, we calculate the conditional probability P(wi|S=0) and P(wi|S=1) of task widths with different task-size categories, where wi∈W.
The Prediction and evaluation of task-size category: Given the task width, we can calculate the probability of the task-size category (i.e., P(S=0|wi) and P(S=1|wi)). In our design, if P(S=0|wi) is larger than P(S=1|wi), then it is a small task. Otherwise, it is a large task. Therefore, P(S=0|wi) and P(S=1|wi) can be expressed as
$$P(S=0|w_{i}) =frac{P(w_{i}|S=0)times P(S=0)}{P(w_{i})},$$
(4)
$$P(S=1|w_{i}) =frac{P(w_{i}|S=1)times P(S=1)}{P(w_{i})},$$
(5)
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Disclaimer:
This article is autogenerated using RSS feeds and has not been created or edited by OA JF.
Click here for Source link (https://www.springeropen.com/)