The methodology described in the “Methodology” section is applied to two different scenarios: subsets of 25 and 100 households representing a small and a medium-sized residential area, respectively. Before discussing the results, we describe the parameter selection for our proposed method for the two scenarios.

Parameter selection

The selection of parameters is described and demonstrated for the small residential area consisting of 25 households. The parameters for the medium-sized residential area consisting of 100 households are derived analogously.

Only two parameters need to be selected: p, the number of dimensions used for the fingerprint (step 4 in Fig. 1), and k, the number of neighbors considered during matching (steps 5 and 6). The parameter p is varied between 5 and 50, and k is varied between 1 and 15. Figure 2 illustrates the median accuracy (Y axis) over all weeks of all households with respect to the different values of p and k.
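The sweep over p and k described above can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: it assumes that the fingerprints have already been computed and that their dimensions are ordered so that keeping the first p of them is meaningful, and it assumes leave-one-out nearest-neighbor matching with Euclidean distance and a majority vote. The function names `knn_match`, `median_accuracy`, and `sweep` are hypothetical.

```python
import math
import statistics
from collections import Counter, defaultdict


def knn_match(query, references, k):
    """Return the household id voted for by the k nearest reference weeks.

    `references` is a list of (household_id, fingerprint) pairs.
    Euclidean distance is an assumption made for this sketch.
    """
    nearest = sorted(references, key=lambda ref: math.dist(query, ref[1]))[:k]
    votes = Counter(household for household, _ in nearest)
    # On a tie, most_common keeps first-seen order, i.e. the closer neighbor wins.
    return votes.most_common(1)[0][0]


def median_accuracy(weeks, p, k):
    """Median of the per-household matching accuracies.

    `weeks` is a list of (household_id, fingerprint) pairs, one per week.
    Only the first p fingerprint dimensions are used (assumed ordering).
    """
    truncated = [(household, fp[:p]) for household, fp in weeks]
    hits = defaultdict(list)
    for i, (household, fp) in enumerate(truncated):
        # Leave-one-out: match each week against all other weeks.
        others = truncated[:i] + truncated[i + 1:]
        hits[household].append(knn_match(fp, others, k) == household)
    per_household = [sum(h) / len(h) for h in hits.values()]
    return statistics.median(per_household)


def sweep(weeks, p_values, k_values):
    """Evaluate every (p, k) combination and return a dict of median accuracies."""
    return {
        (p, k): median_accuracy(weeks, p, k)
        for p in p_values
        for k in k_values
    }
```

With real fingerprints, a call such as `sweep(weeks, range(5, 51, 5), range(1, 16))` would cover the parameter ranges named above; the step size of 5 for p is an assumption, since the text only states the range.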

Fig. 2. Impact of the parameters p and k on the median accuracy for the small scenario with 25 households

As can be seen, the dependency of the accuracy on p (X axis) is much more pronounced than on k (pluses for the training set and crosses for the test set, respectively) for any particular value of p. For the sake of visibility, only \(k = 1, 5, 9, 13\) are depicted as single points. The fact that all four points are very close to each other for all values of p shows that the dependency on k is weak. Thus, \(k = 1\) neighbor is chosen for the sake of simplicity.

However, the effect of p is significant: Fig. 2 shows that \(p = 5\) dimensions are too few, since the accuracies of both the training and the test set are low. This indicates that not enough of the available information is used to distinguish different households. For \(10 \le p \le 20\), the accuracies of both the training set (dash-dotted, light grey line) and the test set (solid, dark grey line) increase. For larger values of p, the training accuracy stays high, but the test accuracy drops relative to the training accuracy. This indicates that the features learned during training are mostly specific to the households of the training set and do not generalize to the households of the test set.

Thus, a value of \(p = 25\) is used for the small scenario with 25 households. Analogously, a value of \(p = 20\) is used for the medium-sized scenario with 100 households, as can be seen from Fig. 3. The value of \(k = 1\) is used for both scenarios.

Fig. 3. Impact of the parameters p and k on the median accuracy for the medium-sized scenario with 100 households

Matching performance

In this section, the matching performance achieved with the parameters selected in the previous section is assessed. Figure 4 depicts the per-household matching accuracy for the training set (light grey) and the test set (dark grey) for both scenarios (25 and 100 households per set, respectively). The black dots illustrate the individual per-household matching accuracy.

Fig. 4. Matching performance (accuracy) for the training dataset (light grey) and the test dataset (dark grey) with 25 (left) and 100 households (right) per set, respectively

The overall accuracy within the test set (dark grey) is surprisingly high, considering the simplicity of the approach and the difficulty of the corresponding classification problem with 25 and 100 classes, respectively. For reference, randomly guessing the correct household (class) is expected to yield an accuracy of \(\textit{acc}_{h,\text{rand}} = 1/n_{\text{Test}}\), i.e., 4% and 1% for sets of 25 and 100 households, respectively. This reference (guessing) accuracy is depicted as thick dashed lines in Fig. 4.

Compared to random guessing, the median accuracy of the proposed methodology is roughly 16 and 35 times higher for the small and medium-sized residential areas, respectively. Note that the difficulty of the problem increases with the number of households, which explains why the absolute accuracy is lower for the medium-sized residential area than for the small one. Yet, relative to guessing, the proposed methodology performs significantly better in the medium-sized case.
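The relation between the guessing baseline and the reported factors can be made concrete with a small back-of-the-envelope calculation. The sketch below assumes that "times higher" means a plain multiple of the baseline, and it uses the rounded factors from the text, so the resulting medians are only approximate; the exact values should be read from Fig. 4. The function names are hypothetical.

```python
def random_guess_accuracy(n_households):
    """Expected accuracy when one of n equally likely households is guessed."""
    return 1.0 / n_households


def implied_median_accuracy(n_households, factor_over_guessing):
    """Median accuracy implied by a reported multiple of the guessing baseline."""
    return factor_over_guessing * random_guess_accuracy(n_households)


# (set size, rounded factor over guessing) as stated in the text
for n, factor in [(25, 16), (100, 35)]:
    baseline = random_guess_accuracy(n)
    implied = implied_median_accuracy(n, factor)
    print(f"n={n}: baseline {baseline:.0%}, implied median about {implied:.0%}")
```

Under these assumptions, the absolute median accuracy is lower for the larger set even though its multiple of the baseline is higher, which matches the observation above.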

The black dots in Fig. 4 depict the matching accuracies of the individual households. For some households, the accuracy is nearly 100%, which implies that the corresponding household can be identified based on an arbitrary single week of a year. This is surprising, as one would expect seasonal differences to have a significant impact on the consumption patterns throughout the weeks of a year.

The resulting privacy implication is that some households can be identified very easily from a single, arbitrary week's worth of energy consumption with an approach that uses off-the-shelf algorithms. A number of other households cannot be detected well, i.e., they have a low matching accuracy. However, the matching accuracy for these households is still much better than guessing.

Extreme households

The question arises as to what makes the identification of a household easier or harder, i.e., why the matching accuracy is relatively high or relatively low, respectively. As a first attempt to answer this question, a preliminary descriptive analysis is provided.

Based on the matching results, the most extreme households, i.e., those with the highest and the lowest matching accuracy, are visualized. The consumption data of a whole year of a household is illustrated as a heatmap. The X axis denotes the days of the year from left to right, the Y axis denotes the time of day from top to bottom in intervals of 15 min. The color of each 15-min interval depicts the associated energy consumption in kWh. Dark (purple) represents 0 kWh and bright (yellow) represents 1.4 kWh.
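A heatmap of this kind can be sketched as follows. The sketch assumes one flat, chronologically ordered list of 15-min readings in kWh per household (96 readings per day) and uses matplotlib's `viridis` colormap, which runs from dark purple to yellow; the colormap actually used for the figures is not stated, so this choice is an assumption. The function names are hypothetical, and matplotlib is imported only when plotting is requested.

```python
SLOTS_PER_DAY = 96  # 24 h in 15-min intervals


def to_day_matrix(readings, slots_per_day=SLOTS_PER_DAY):
    """Reshape flat readings into rows = time of day, columns = day of year."""
    if len(readings) % slots_per_day != 0:
        raise ValueError("readings must cover whole days")
    n_days = len(readings) // slots_per_day
    return [
        [readings[day * slots_per_day + slot] for day in range(n_days)]
        for slot in range(slots_per_day)
    ]


def plot_year_heatmap(readings, vmax=1.4):
    """Draw the heatmap; values above vmax (kWh) share the brightest color."""
    import matplotlib.pyplot as plt  # lazy import, only needed for plotting

    matrix = to_day_matrix(readings)
    fig, ax = plt.subplots(figsize=(10, 4))
    # origin="upper" puts the first interval of the day at the top.
    image = ax.imshow(matrix, aspect="auto", origin="upper",
                      cmap="viridis", vmin=0.0, vmax=vmax)
    ax.set_xlabel("Day of year")
    ax.set_ylabel("Time of day (15-min interval index)")
    fig.colorbar(image, ax=ax, label="Energy consumption (kWh)")
    return fig
```

Calling `plot_year_heatmap(readings)` on a full year of data would give one column per day and one row per 15-min interval, as in the description above.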

Figure 5 shows the energy consumption for the household with the highest matching accuracy within the test set of the small residential area. One can see that the consumption is quite regular, i.e., the consumption barely changes between weeks within each of the two periods from April to November and from December to March. Note that the apparent 1-h time shifts in March and October are most likely due to daylight saving time.

Fig. 5. One year of consumption data of the household with the highest matching accuracy from the 25-household test dataset

The regular rectangular areas might stem from a pool pump, as proposed in Burkhart et al. (2018). While this pattern is not the same throughout the whole year, it is comparatively regular over periods of multiple weeks. This suffices, as the proposed methodology only needs to find one of the few similar weeks. The identifiability seems to be related to periodic behavior due to the dominance of Fourier and wavelet features, but this requires further investigation in future work.

Figure 6 shows the household with the lowest matching accuracy within the test set of the small residential area. While its consumption is comparatively regular over the year, it does not show any remarkable features that appear over multiple consecutive weeks. Thus, with the proposed methodology, any given week of this household shares more similarities with weeks from other households than it does with weeks from the same household.

Fig. 6. One year of consumption data of the household with the lowest matching accuracy from the 25-household test dataset

The extreme households of the medium-sized residential area exhibit similar characteristics to those of the small residential area described above. For the sake of completeness, the corresponding heatmaps are visualized in Figs. 7 and 8.

Fig. 7. One year of consumption data of the household with the highest matching accuracy from the 100-household test dataset

Fig. 8. One year of consumption data of the household with the lowest matching accuracy from the 100-household test dataset

Note that this analysis is a first attempt at an explanation. Future analyses might offer further insight into the relevant household-specific characteristics that impact matching accuracy.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
