# Low-rate Denial of Service attack detection method based on time-frequency characteristics – Journal of Cloud Computing

Aug 30, 2022

### Experimental settings

The hardware and software configurations of the experimental platform for model training and detection are as follows: hardware: Intel Core i9-12900F, 128 GB RAM (DDR5), NVIDIA RTX 3090; software: Ubuntu 18.04 LTS, CUDA 11.2, PyTorch 1.8.

### Evaluation indicators

We design a fast detection method for LDoS attacks based on the TFD model, with the aim of quickly discovering LDoS attack traffic in the traffic under inspection. The initiation phase of the attack and the type of the attack are not our focus, so the detection objective reduces to a binary classification problem. Normal traffic is defined as negative samples and attack traffic as positive samples, and five metrics are used to evaluate the performance of the TFD model: Accuracy, Precision, Recall, false positive rate (FPR), and F1 value. These metrics are calculated as follows:

$$\mathrm{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}$$

(12)

$$\mathrm{Precision}=\frac{TP}{TP+FP}$$

(13)

$$\mathrm{Recall}=TPR=\frac{TP}{TP+FN}$$

(14)

$$FPR=\frac{FP}{FP+TN}$$

(15)

$$F_1=\frac{2\times \mathrm{Precision}\times \mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}$$

(16)

where TP, TN, FP, and FN denote the relationships between the true and predicted labels; their specific meanings are given by the confusion matrix in Table 4.
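The five metrics above follow directly from the confusion-matrix counts; a minimal Python sketch (the counts below are illustrative, not results from this paper):

```python
def metrics(tp, tn, fp, fn):
    """Compute the five evaluation metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also the true positive rate (TPR)
    fpr = fp / (fp + tn)               # false positive rate
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, fpr, f1

# Illustrative counts only
acc, prec, rec, fpr, f1 = metrics(tp=95, tn=90, fp=10, fn=5)
```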

### TFD model training

The two reconstructors in the TFD model take the same input data, are trained independently of each other, and each outputs results on its own, so the two reconstructors are trained separately. Figure 11 shows the loss function values when the two reconstructors are trained for 100 iterations on the training set of the All-United traffic data.

It can be seen that the loss values of both reconstructors change slowly in the initial training phase, then decrease rapidly, and finally stabilize. The time-domain-based reconstructor shows slight oscillation in its loss value at the start of training, but decreases quickly and is the first to reach a stable state, after about 50 iterations. The frequency-domain-based reconstructor, on the other hand, has a stable but slowly decaying loss at the beginning of training and stabilizes only after 70 iterations, though its steady-state loss is smaller than that of the time-domain-based reconstructor. The time-domain-based reconstructor likely reaches the steady state first because it has fewer parameters than the frequency-domain-based model, making it easier and faster to train, at the cost of oscillations early in training.
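The per-reconstructor training loop can be sketched as follows. This is a minimal stand-in, not the paper's architecture: a linear autoencoder over flattened 16 × 2 feature matrices trained with MSE by plain gradient descent; all shapes, the bottleneck size, and the learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 32))            # 256 flattened 16x2 feature matrices (synthetic)
W = rng.normal(scale=0.1, size=(32, 8))   # encoder weights (assumed bottleneck of 8)
V = rng.normal(scale=0.1, size=(8, 32))   # decoder weights
lr = 1e-2

losses = []
for epoch in range(100):                  # 100 iterations, as in the paper
    Z = X @ W                             # encode
    Xh = Z @ V                            # reconstruct
    E = Xh - X                            # reconstruction error
    losses.append(float(np.mean(E ** 2))) # MSE loss tracked per iteration
    # gradients of the MSE loss w.r.t. decoder and encoder weights
    gV = Z.T @ E * (2 / E.size)
    gW = X.T @ (E @ V.T) * (2 / E.size)
    V -= lr * gV
    W -= lr * gW
```

The loss curve from `losses` is what Fig. 11 plots for each reconstructor; a steady state corresponds to the tail of this curve flattening out.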

In addition, because the number of samples in our dataset is relatively small and the features of each sample form only a 16 × 2 two-dimensional matrix with few trainable parameters, the average time to complete one training iteration over the training set for each attack traffic is no more than 7 seconds, which translates to millisecond-level processing time for a single sample. Moreover, since the detection target is a simple binary classification problem, the computation time for anomaly determination is negligible. This means our detection model completes detection from data input to result output within milliseconds and, even with data collection and pre-processing time added, can be considered capable of near-real-time detection of network traffic.

### Classifier threshold setting

The steady-state reconstruction error of the two reconstructors on the validation set is used as the threshold for anomaly determination during testing. Based on their behaviour during training, the steady state is defined as the period after 70 iterations, and the average of the errors produced from iterations 71 to 100 is calculated as the threshold. In addition, to eliminate chance effects, the average of the thresholds calculated over five runs is taken as the final threshold of each reconstructor, as shown in Table 5.
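The threshold computation can be sketched as follows; the per-iteration validation errors below are synthetic placeholders, not the paper's values.

```python
import numpy as np

def run_threshold(val_errors):
    """Mean validation reconstruction error over iterations 71-100 (steady state)."""
    return float(np.mean(val_errors[70:100]))

# Five independent training runs; each yields 100 per-iteration validation errors.
# A decaying curve plus small noise stands in for the real error trace.
rng = np.random.default_rng(1)
runs = [np.linspace(1.0, 0.05, 100) + rng.normal(scale=0.002, size=100)
        for _ in range(5)]

# Final threshold: average of the five per-run steady-state thresholds.
threshold = float(np.mean([run_threshold(r) for r in runs]))
```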

### Testing results

The testing experiment is divided into two phases: the first tests each attack traffic individually to obtain the detection capability of the TFD model for each LDoS attack; the second tests the All-United dataset, consisting of multiple attack traffics, to evaluate the model's detection capability against complex attacks. A 5-fold cross-validation approach is used, and the mean of the model's detection results on each dataset is taken as its performance metric. The detection results for each LDoS attack alone are shown in Table 6.

As shown in Table 6, all results are averages of five detection runs, which excludes possible chance factors. The proposed method achieves a recall of more than 95% for each attack traffic. Although the false alarm rate for the Slowhttptest attack exceeds 7%, the other attacks remain at a low level, with an overall false alarm rate of 3.88%, which is acceptable for a detector whose goal is security alerting. Detection is best for Pwnloris attacks; Pwnloris is an upgraded version of Slowloris with more obvious features.

Secondly, detection of Hping attacks was also good. The reason is that the Hping program was originally designed for DDoS flooding attacks; in this experiment, to achieve a low-rate attack, Hping is set to generate attack traffic only in the first 0.1 s of each second, with a fixed attack packet size. This gives the Hping attack traffic a pronounced periodic characteristic, making it easier to detect.
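The periodicity argument can be illustrated numerically: a packet-rate signal that is active only in the first 0.1 s of each second has its strongest spectral component at 1 Hz. The signal below is synthetic and only mimics the described on/off pattern.

```python
import numpy as np

fs = 100                        # 100 samples per second (10 ms bins)
t = np.arange(10 * fs) / fs     # 10 seconds of traffic
# Bursts only in the first 0.1 s of every second, as in the Hping setup
rate = ((t % 1.0) < 0.1).astype(float)

# Remove the DC component, then inspect the magnitude spectrum
spectrum = np.abs(np.fft.rfft(rate - rate.mean()))
freqs = np.fft.rfftfreq(rate.size, d=1 / fs)
dominant = freqs[np.argmax(spectrum)]   # strongest frequency component
```

The dominant frequency comes out at 1 Hz, the burst repetition rate, which is exactly the kind of frequency-domain feature the frequency-based reconstructor can exploit.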

The relatively poor detection of Slowhttptest attacks is likely because Slowhttptest generates slow-read attack traffic whose time span is inherently large, while we select features only from the first 16 time steps of each 10-second stream segment. Some of the selected data therefore happens to fall in the "silent period" of the Slowhttptest attack traffic and shows no obvious features, which adversely affects detection.

The TFD model performs well in detecting individual attack traffic, but to study the ability of the proposed method to cope with complex attacks, it must also be tested on the All-United dataset containing multiple attack traffics. After training the model on the training set, the threshold is derived from the validation set, and the same 5-fold cross-validation method is applied to the test set, with the arithmetic mean of the results taken as the final result. As shown in Fig. 12, the average accuracy is 0.9358, the average precision 0.9363, the average recall 0.9407, the average false alarm rate 0.0592, and the average F1 value 0.9384.

Figure 13 shows the recall of the TFD model for each LDoS attack type. The recall for detecting a single LDoS attack alone exceeds 95%, and the detection rate for the mixed attacks also reaches 94%. In addition, the recall here is calculated over flow segments; since an attack generates multiple flow segments, the probability of the attack itself being detected approaches 100%, so undetected attacks are essentially not a problem, and our proposed method is effective on the dataset we designed.
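The flow-segment argument can be made concrete. If each segment of an attack is flagged independently with probability equal to the per-segment recall (independence is an assumption), the probability of missing the attack entirely decays geometrically with the number of segments:

```python
def attack_detection_prob(segment_recall, n_segments):
    """Probability that at least one of n flow segments is flagged."""
    return 1.0 - (1.0 - segment_recall) ** n_segments

# With 94% per-segment recall, a one-minute attack spanning six
# 10-second segments is missed only if all six segments are missed.
p = attack_detection_prob(0.94, 6)
```

Here `p` exceeds 0.9999, which is why the per-attack detection probability is close to 100% even when per-segment recall is 94%.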

To verify the adaptability of our proposed method to heterogeneous network traffic data, we test it on five publicly available datasets: NSL-KDD, DARPA2000, ISCX2016, CICDDoS2019, and UTSA2021, where:

The NSL-KDD dataset is the most commonly used dataset in network traffic anomaly detection research and includes resource-consuming attacks such as ping-of-death, SYN flood, and smurf. NSL-KDD is an improved version of KDD-CUP-99: it removes the redundant data in the KDD-CUP-99 dataset and makes an appropriate selection of the ratio of normal to abnormal data, giving a more reasonable distribution of training and test data.

The DARPA2000 dataset [28] is a standard dataset in the field of network intrusion detection and is one of three separate datasets in the DARPA dataset. Unlike DARPA 1998 and DARPA 1999, the DARPA2000 dataset focuses on attack traffic for Windows NT and adds internal attack and internal eavesdropping data.

The ISCX2016 low-rate denial-of-service attack dataset [29] was generated in a simulation environment: the developers obtained eight different application-layer DoS attack traffics by building web servers with Apache Linux, PHP5, and Drupal v7, and mixed them with the normal traffic from the original ISCX-IDS dataset to form an LDoS attack traffic dataset.

The CICDDoS2019 dataset [25] is a realistic DDoS attack dataset. The latest DDoS attack tools, including reflective attacks, are used to generate the attack traffic. The dataset contains 50,063,112 samples, of which 50,006,249 are DDoS attack samples and 56,863 are normal samples. Because normal samples are so scarce, the dataset is typically used either by loading normal traffic from external sources or by selecting only a subset of the attack samples.

The UTSA2021 dataset [30] is a set of normal and attack traffic at different rates generated on a DNS network testbed, mainly including TCP SYN flood attacks at multiple rates and HTTP slow-read and slow-acquisition attacks. In this study, the Syn50 subset, with an attack peak of 50 r/s, is used for validation.

Since our detection method targets LDoS attacks at the transport and application layers, each dataset is first processed to extract DoS traffic and normal traffic into a new dataset, and feature extraction is then performed on the new dataset to produce the data structure required by our detector. For larger datasets, such as CICDDoS2019, only about 1% of the attack samples are selected to improve computing efficiency.
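The feature-extraction step can be sketched as follows. The two per-time-step features are not specified at this point in the text; packet count and total bytes are assumed for illustration, as is the 0.5 s time-step width used to pick the first 16 steps of a 10-second stream segment.

```python
import numpy as np

def segment_features(timestamps, sizes, seg_start, step=0.5, n_steps=16):
    """Build one 16x2 feature matrix for a stream segment.

    Each row is one time step; the two columns (packet count, total
    bytes) and the step width are illustrative assumptions.
    """
    feats = np.zeros((n_steps, 2))
    for ts, sz in zip(timestamps, sizes):
        idx = int((ts - seg_start) / step)
        if 0 <= idx < n_steps:        # keep only the first 16 time steps
            feats[idx, 0] += 1        # packet count in this step
            feats[idx, 1] += sz       # total bytes in this step
    return feats

# Illustrative packets: three packets inside the first time step
m = segment_features([0.1, 0.2, 0.3], [60, 1500, 60], seg_start=0.0)
```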

Since the network configuration environments of the datasets differ considerably, the TFD model is trained using some normal samples from each dataset, and the threshold for anomaly determination is calculated before detection; the detection set, containing both normal and anomalous samples, is then tested with recall, accuracy, and F1 value as the detection indices. The specific results are shown in Fig. 14. The attack recall, which is our main concern, stays above 90% throughout, reaching more than 98% on NSL-KDD and DARPA2000 and 96% on CICDDoS2019 and UTSA2021, while the 91.7% recall on ISCX2016 is relatively low. The reason is that the attack traffic in NSL-KDD, DARPA2000, and CICDDoS2019 consists of DoS or DDoS attacks, which have more obvious statistical characteristics than LDoS in packet interval and packet size and are thus easily identified. For UTSA2021 we chose the Syn50 subset with an attack peak of 50 r/s, which is more challenging than detecting attack samples from subsets with higher attack rates, so the 96% recall is still a good result. The relatively low recall on ISCX2016 is probably because its attack samples were generated in a different network environment from the normal samples, so the TFD model trained on those normal samples is coarser and has difficulty detecting small changes in the statistical features.

The accuracy of the model on the NSL-KDD, DARPA2000, ISCX2016, and CICDDoS2019 datasets does not reach results as excellent as the recall. The likely reason is that we chose relatively few feature vectors and a simple model structure, which needs far fewer training parameters than large deep network models, so with sufficient training samples an overfitting problem occurs. This makes the model more "demanding" when judging normal traffic, so that some normal samples are mistaken for attack samples.

The UTSA2021 dataset performs best of all, even surpassing the All-United dataset we designed. The reason probably lies in the traffic collection environment. The normal and attack samples of the UTSA2021 dataset are generated and collected in the same network environment, so they have good isomorphism and can be trained into a "purer" classifier. In contrast, the normal samples of the All-United dataset are collected from a real network environment and are inevitably disturbed by external conditions, producing "impurities" during collection, while the attack samples are collected from experimental platforms with similar topology and are less subject to external interference. A model trained with such "impure" data is a "rough" model that may ignore small differences between feature data and misjudge the sample type.

The TFD model shows strong adaptability across these datasets, achieving a recall of 91.7% even on ISCX2016. Since our detection experiments operate on stream fragments and an attack stream is composed of multiple fragments, the detection probability of the attack stream is much higher than that of any single fragment; the TFD detection model we designed therefore fulfils its original design goal of discovering attack samples.