With urbanization and the increasing frequency of natural disasters, it has become increasingly important to collect accurate spatiotemporal data efficiently on changes in landcover to enable (quasi) real-time monitoring of such changes and set up the opportunity for risk mitigation (El-Masri and Tipple, 2002; Wulder et al., 2008). Remote sensing has been used in a variety of ways to monitor changes in landcover, and the resolution of satellite imagery has steadily improved (Herold et al., 2003; Rogan and Chen, 2004). In addition to satellite data, other sources such as aerial imagery, unmanned aerial vehicle imagery, and point clouds are also being used for this purpose (Rau et al., 2015; Ahmed et al., 2017).

To monitor continuously changing landcover efficiently, there has recently been a shift from research methods focused on increasing classification accuracy to an automated research method that analyzes a large amount of remotely sensed data (DeFries and Chan, 2000). To this end, various machine learning, deep learning, and artificial intelligence (AI) algorithms are being utilized (Rogan et al., 2008; Karpatne et al., 2016; Kussul et al., 2017). In particular, deep learning algorithms based on convolutional neural networks (CNNs) have demonstrated higher performance than machine learning algorithms (Kussul et al., 2017; Guidici and Clark, 2017; Rußwurm and Körner, 2017).

Deep learning methods process large datasets by identifying features in the data at various levels, enabling high-speed analysis and enhanced functionality in data applications (Najafabadi et al., 2015). In the field of computer vision, detected objects of the same category (e.g., people and clothes) are divided into various shapes and patterns (Felzenszwalb et al., 2010). This has led to the emergence of a large number of datasets with object classification and detection fields, such as ImageNet (Lin et al., 2014). An important type of CNN algorithm involves semantic segmentation, which classifies images at the pixel level (Long et al., 2015) and enables high detection accuracy in the remote sensing field; examples of such models include fully convolutional networks, U-Net, ResNet, and DeeplabV3+ (Wang et al., 2020). In particular, for supervised machine learning, large high-quality datasets play an important role in the performance of CNN-based algorithms (Helber et al., 2019).

There are many different types of AI training datasets for the development of machine learning algorithms (Mohamadou et al., 2020). Pascal VOC is a collection of datasets for object detection and classification tasks and has been used to evaluate algorithm performance in various studies and competitions (Everingham et al., 2010; Everingham et al., 2015; Noh et al., 2015). Microsoft COCO (Lin et al., 2014) is a large-scale dataset of approximately 330,000 images that has been used for object detection and segmentation (Common Objects in Context, 2021). In addition, there are datasets such as Cityscapes, which features images captured by in-vehicle sensors for autonomous driving (Cordts et al., 2016; Zhang et al., 2017; The City Scapes Dataset, 2021).

There are also many remote-sensing datasets, such as aerial image collections in which means of transportation, such as cars and ships, are labeled as objects (Xia et al., 2018; Azimi et al., 2021), as well as datasets of thermal or infrared imagery for object detection and tracking (Bondi et al., 2020). An example of an AI dataset for landcover is LandCover.ai, which provides orthophotos covering Poland at resolutions of 0.25 and 0.5 m per pixel (Boguszewski et al., 2021). Skyscapes is an aerial image dataset compiled by the German Aerospace Center that includes various categories, such as buildings and roads (Azimi et al., 2019).

More recently, there has been a focus on increasing the predictive accuracy of multi-resolution remote sensing data. This involves accurately constructing outer boundaries and categories of training data, which entails considerable time and effort (Luo et al., 2018). In addition, in fields that require (quasi) real-time analysis, such as landcover studies in areas undergoing rapid change due to natural disasters and human activities, fast construction of training data is the main goal, with precision as a secondary objective (Choi et al., 2017; Avilés-Cruz et al., 2019).

Therefore, in this study, we present an approach for constructing a landcover dataset that combines multi-resolution remote sensing annotation data using a combination of annotation techniques, verification processes to build precise datasets, and dataset analyses via algorithm application.

Figure 1 shows a flowchart of our research. First, aerial and satellite images at spatial resolutions of 0.51 and 10 m, respectively, were compiled into a high-quality multi-resolution dataset to create a precise landcover AI dataset. By processing the multi-resolution data simultaneously, we were able to construct a large-scale dataset with high precision.

FIGURE 1. Flowchart of this study.

Metadata in JavaScript Object Notation (JSON) format were then constructed in addition to the image dataset. Reference information about the data was added to the AI data (e.g., raw image type, spatial information format (vectors, shapefiles), and the widths and heights of data points).

Next, a separate inspection process was carried out to reduce errors in the constructed dataset. Refined data were checked for incomplete processing, and classification errors were checked with respect to the annotation results to minimize errors for better reliability.

Finally, the SegNet, U-Net, and DeeplabV3+ algorithms, commonly used in semantic segmentation and landcover applications, were applied to the dataset, and the results were analyzed.

Study Area and Construction Datasets for the AI Training Methodology

AI Training Datasets Status in Korea

The dataset constructed in this study is officially provided by AI Hub in Korea (AI Hub in Korea, 2021). The AI Hub is an AI-driven integration platform made available to the public to support the AI infrastructure required for the development of AI technology, products, and services. Training data provided by AI Hub include a total of eight major categories and 43 subcategories, including text data, such as laws and patents; Korea’s unique image data, such as Korean landmark images and Korean facial images; traffic-related image data, such as images of vehicles driving on roads and people walking on sidewalks; and human motion and disease diagnosis images. Landcover information is classified as environmental data in this platform, and includes 53,300 data points (AI Hub in Korea, 2021).

Study Area and Data Acquisition

The metropolitan area of Korea, including the capital city of Seoul, was selected to collect raw data for dataset construction (Figure 2). The study area included the Gyeonggi-do region surrounding Seoul, adjacent to the coastline to the west and Gangwon-do to the east, with Seoul located at the center. It is a large area of 10,185 km², which is about 10% of the country (Gyeonggi Province in Korea, 2021). The subdivided landcover map of the study area (scale of 1:5,000) consisted of 4,109 map sheets (EGIS, 2021).

FIGURE 2. Study area.

Aerial and satellite imagery were used to construct the datasets. The aerial images were taken in 2018 and were produced and distributed by the National Geographic Information Institute (NGII). For satellite imagery, Sentinel-2 images without clouds or snow, captured in 2019–2020 and provided by the European Space Agency, were acquired. The spatial resolutions of the aerial and Sentinel-2 imagery were 0.51 and 10 m, respectively (Table 1). The grades A, B, and C listed for the satellite images in Table 1 indicate cloud cover: A denotes no cloud, B less than 10% cloud, and C less than 25% cloud.
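The cloud grading described for Table 1 can be expressed as a small lookup. The following is only a sketch: the function name `cloud_grade` and the "ungraded" label for images with 25% cloud or more are assumptions made for illustration and are not part of the original workflow.

```python
def cloud_grade(cloud_fraction: float) -> str:
    """Map a cloud-cover fraction (0.0 to 1.0) to the A/B/C grade used in Table 1.

    Assumption: images with 25% cloud or more fall outside the three grades,
    so a hypothetical "ungraded" label is returned for them.
    """
    if not 0.0 <= cloud_fraction <= 1.0:
        raise ValueError("cloud_fraction must be between 0 and 1")
    if cloud_fraction == 0.0:
        return "A"  # no cloud
    if cloud_fraction < 0.10:
        return "B"  # less than 10% cloud
    if cloud_fraction < 0.25:
        return "C"  # less than 25% cloud
    return "ungraded"
```

For example, `cloud_grade(0.05)` falls in grade B under this reading.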

TABLE 1. Original aerial and satellite image acquisition information.

Specifically, each band was converted into a GeoTIFF format using QGIS software, and the red, green, blue, and near-infrared (NIR) bands were layer-stacked. The image information used in this study is shown in Table 1. A total of 396 aerial images and four satellite images were acquired to construct datasets, using aerial and satellite images with fine and coarse annotations.

Training Datasets Annotation

The dataset annotation process involved selecting target areas corresponding to 393 map sheets, based on the digital map sheet at a 1:5,000 scale provided by the NGII. A three-step annotation process was performed on the selected target areas. First, the object classes and metadata were designed. Second, the classification and format of annotations for each object item of an image were defined to construct the annotations. Third, training data were built based on the defined annotations. The classification items and annotation standards were derived from the subdivided landcover map and urban ecological map (Biotope Map) provided by the Ministry of Environment. Landcover classification studies were also referenced (Lee et al., 2020; Lee and Lee, 2020; Korea National Law Information Center, 2021).

For the classification items of aerial imagery, we selected and annotated eight categories (Table 2) commonly used in AI classification that included buildings, roads, and parking lots. Five categories were selected for the satellite imagery. Each object was assigned a specific code number, such as 10 for buildings and 20 for parking lots.

TABLE 2. Differences between the classification categories of aerial and satellite images.

Image annotation was performed to demarcate the boundaries of objects to be classified. Fine annotation was conducted using QGIS open-source software for precise annotation. Coarse annotation was carried out in a similar way using QGIS.

Classification criteria for image annotation were based on the guidelines for the preparation of subdivided landcover maps in Korea (Korea National Law Information Center, 2021). Object items were classified as follows. Roads were required to have linear widths of 12 m or more. Areas of more than 100 m² were classified as buildings; more than 500 m² as parking lots, paddy fields, or bare land; and more than 2,500 m² as forest. Even if an object did not meet the criteria, items that could be clearly distinguished in the image were classified. In addition, items that could not be identified due to shadows were excluded from classification, as well as those with unclear boundaries or properties. Because the satellite imagery had a lower spatial resolution than the aerial imagery, objects such as roads with a linear width of 36 m or more, buildings with an area of 10,000 m², paddy fields/fields of 50,000 m², and forests of 100,000 m² or more were annotated (Table 2).
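The size criteria above can be summarized as a lookup table with a single check. This is a minimal sketch under stated assumptions: the key names and the function `should_annotate` are illustrative, all thresholds are treated as inclusive for simplicity (the text says "more than" for the aerial areas and "or more" for the satellite sizes), and categories without a stated criterion raise an error rather than being guessed.

```python
# Minimum sizes taken from the criteria above; the key names are illustrative.
# Roads are checked by linear width (m); all other classes by area (m^2).
MIN_ROAD_WIDTH_M = {"aerial": 12.0, "satellite": 36.0}
MIN_AREA_M2 = {
    "aerial": {
        "building": 100.0,
        "parking_lot": 500.0,
        "paddy_field": 500.0,
        "bare_land": 500.0,
        "forest": 2500.0,
    },
    "satellite": {
        "building": 10000.0,
        "paddy_field": 50000.0,
        "field": 50000.0,
        "forest": 100000.0,
    },
}


def should_annotate(source, category, size, clearly_visible=False):
    """Return True if an object qualifies for annotation.

    source: "aerial" or "satellite"
    size: linear width in m for roads, area in m^2 otherwise
    clearly_visible: objects that are clearly distinguishable in the image
        are annotated even when they miss the size criterion.
    """
    if clearly_visible:
        return True
    if category == "road":
        return size >= MIN_ROAD_WIDTH_M[source]
    if category not in MIN_AREA_M2[source]:
        raise ValueError(f"no size criterion stated for {category!r} in {source} imagery")
    # Assumption: thresholds are inclusive.
    return size >= MIN_AREA_M2[source][category]
```

Under these assumptions, a 150 m² building qualifies on aerial imagery but not on satellite imagery.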

Based on the common annotation classification criteria for aerial and satellite imagery, buildings included apartment complexes and factories; greenhouses, buildings under construction, solar panels, or structures with green roofs were excluded. For the parking lot category, the annotation was carried out as described, with the exception of access roads and unpaved parking lots without parking lines. In the road category, intersections were annotated separately according to their direction. In the colonnade category, a colonnade of trees, lined up in parallel, was annotated, and cases that were not clearly distinguished from the surroundings were excluded. Paddy fields refer to rice cultivation areas; in the field category, orchards and greenhouse cultivation areas were excluded. For the forest category, non-forest items such as deforested areas and cemeteries in the forest were excluded from annotation. Finally, for the bare land category, as an artificially created area, mining areas were also excluded. All items that did not fall under the classified categories were treated as non-target sites.

Given the difference in resolution of aerial and satellite imagery, parking lots, colonnades (street trees), and bare lands were not classified from satellite imagery. Notably, the classification criteria, even for the same item, can differ for the two types of images. As for the building category of aerial imagery, all buildings that could be categorized were classified. For satellite imagery, apartment buildings were classified into complexes, and detached houses and multiplex housing were classified into blocks. Items in other categories were classified based on the same criteria. An example of the annotation results is shown in Figure 3.
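The relationship between the eight aerial categories and the five satellite categories can be written out as a small mapping. In this sketch, only codes 10 (buildings) and 20 (parking lots) come from the text. The category names are inferred from the annotation rules described above, and every other code number, as well as the reuse of the same codes for satellite annotations, is a hypothetical placeholder.

```python
# Codes 10 (buildings) and 20 (parking lots) are stated in the text.
# All other codes below are hypothetical placeholders for illustration.
AERIAL_CLASSES = {
    "building": 10,
    "parking_lot": 20,
    "road": 30,         # hypothetical code
    "colonnade": 40,    # hypothetical code
    "paddy_field": 50,  # hypothetical code
    "field": 60,        # hypothetical code
    "forest": 70,       # hypothetical code
    "bare_land": 80,    # hypothetical code
}

# Categories that were not classified from the 10 m satellite imagery.
NOT_IN_SATELLITE = {"parking_lot", "colonnade", "bare_land"}

# Assumption: satellite annotations reuse the aerial code numbers.
SATELLITE_CLASSES = {
    name: code
    for name, code in AERIAL_CLASSES.items()
    if name not in NOT_IN_SATELLITE
}

print(len(AERIAL_CLASSES), "aerial categories;", len(SATELLITE_CLASSES), "satellite categories")
```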

FIGURE 3. Example of data annotation for each class.

In this study, the construction of learning data was divided into fine and coarse constructions. Fine construction involves annotating the outer boundary of the classification object as precisely as possible (Figure 3). Coarse construction schematically annotates only representative characteristics of an object (Figure 3). The AI training datasets constructed in this study included both fine and coarse annotations.

All annotated data were constructed into images of 512 × 512 pixels, and original images were segmented by applying a 25% overlap rate. The dataset for AI training consisted of raw images, annotated images, and metadata in JSON format. The annotation datasets were saved using the tagged image file format (TIFF), and annotated images were constructed using an 8-bit grayscale format. The metadata in JSON format contained information about the data, such as the raw image names and the widths and heights of data points (Figure 4).
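The tiling step can be sketched as follows. A 25% overlap on a 512-pixel tile implies a stride of 384 pixels. The function names and the handling of the image edge (shifting the last tile back so that it ends at the border) are assumptions, since the text does not describe how partial tiles were treated.

```python
from typing import List, Tuple


def tile_origins(length: int, tile: int = 512, overlap: float = 0.25) -> List[int]:
    """Return the start offsets of tiles along one image axis."""
    if length < tile:
        raise ValueError("image is smaller than one tile")
    stride = int(tile * (1 - overlap))  # 512 * 0.75 = 384 pixels
    origins = list(range(0, length - tile + 1, stride))
    # Assumption: the final tile is shifted back so the image edge is covered.
    if origins[-1] != length - tile:
        origins.append(length - tile)
    return origins


def tile_grid(width: int, height: int, tile: int = 512,
              overlap: float = 0.25) -> List[Tuple[int, int]]:
    """Return the (x, y) upper-left pixel positions of all tiles, row by row."""
    return [
        (x, y)
        for y in tile_origins(height, tile, overlap)
        for x in tile_origins(width, tile, overlap)
    ]


def tile_footprint_m(resolution_m: float, tile: int = 512) -> float:
    """Ground length of one tile side in metres."""
    return tile * resolution_m
```

With the resolutions used in this study, one tile side covers about 261 m of ground in the aerial imagery (0.51 m per pixel) and 5,120 m in the Sentinel-2 imagery (10 m per pixel).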

FIGURE 4. Example of constructed dataset.

Metadata were composed of three items: image information, annotation, and the data provider, with nine, three, and one sub-item/s, respectively (Table 3). Image information included the image file name, length and width, the type of the original image, image resolution, and provider. For the coordinates sub-item, the coordinate system and upper-left coordinates were provided in the metadata; however, this information was not included in the TIFF images for reasons related to the national security of South Korea. The capture time of the original images was also provided in the image information. Annotation information included annotation identifiers, such as file names, annotation type, and file type. Information on the provider of the datasets was also specified in the metadata.
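A metadata record of this kind, together with a check for missing values, might look like the sketch below. The three top-level items follow the description above, but every key name and example value is hypothetical, and only a subset of the sub-items is shown. The actual schema is the one given in Table 3.

```python
import json

# Hypothetical key names; only a subset of the sub-items in Table 3 is shown.
REQUIRED = {
    "image": ["file_name", "width", "height", "source_type", "resolution_m", "captured"],
    "annotation": ["file_name", "annotation_type", "file_type"],
    "provider": ["name"],
}


def missing_fields(record: dict) -> list:
    """List the 'section.key' paths that are absent or empty in a record."""
    missing = []
    for section, keys in REQUIRED.items():
        values = record.get(section, {})
        for key in keys:
            if values.get(key) in (None, ""):
                missing.append(f"{section}.{key}")
    return missing


# Example record with placeholder values.
example = {
    "image": {
        "file_name": "tile_0001.tif",
        "width": 512,
        "height": 512,
        "source_type": "aerial",
        "resolution_m": 0.51,
        "captured": "2018",
    },
    "annotation": {
        "file_name": "tile_0001_label.tif",
        "annotation_type": "fine",
        "file_type": "tif",
    },
    "provider": {"name": "example provider"},
}

metadata_text = json.dumps(example, indent=2)
```

Calling `missing_fields(json.loads(metadata_text))` returns an empty list for the complete example record.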

TABLE 3. Information contained in the metadata in JSON format.

Training Datasets Verification

The quality of the constructed dataset was inspected to ensure high precision. Data quality inspection was carried out by dividing it into refined data inspection and annotation data inspection. First, refined data were assessed by considering the red, green, blue, and NIR bands of refined satellite data. If an error was identified, then the image was refined again and resolved through a second inspection of the refined product.

Annotation data were assessed as follows. Once annotation was complete, but before the image dataset was constructed into grayscale format, a first inspection was performed. The landcover dataset for AI training was inspected simultaneously. If any errors were found, an error report including the image file name, error type, and error location was prepared. Data with errors and the error report were sent back to training data personnel for correction. Thus, the objectivity and homogeneity of the dataset were secured through cross-validation.

Next, the errors were classified into three categories: unclassified, over-classified, and misclassified (Figure 5). An unclassified error refers to a case in which an object that should be annotated is not. Overclassification occurs when a non-target object is included in the annotation of the target object. Misclassification corresponds to an error in which the annotation class is set incorrectly. Figure 5 shows examples of the three error types. After the first inspection was complete, the image dataset was constructed as a 512 × 512 pixel image, and the metadata created in the form of name/value pairs were checked for missing data to confirm whether information was correctly written for each item. An error log was prepared and managed in the same way as the annotation data inspection. Metadata in which errors were found were also corrected.
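The three error types and the contents of an error report can be modeled as in the sketch below. The per-pixel reading of the definitions, the non-target code of 100, and all class and field names are assumptions made for illustration. The inspection in this study was carried out by human reviewers, not by this code.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple

NON_TARGET = 100  # hypothetical code for non-target sites


class ErrorType(Enum):
    UNCLASSIFIED = "unclassified"
    OVER_CLASSIFIED = "over-classified"
    MISCLASSIFIED = "misclassified"


@dataclass(frozen=True)
class ErrorReport:
    """One entry of an error report: file name, error type, and location."""
    image_file: str
    error_type: ErrorType
    location: Tuple[int, int]  # (x, y) pixel position; representation is an assumption


def classify_error(expected: int, annotated: int) -> Optional[ErrorType]:
    """Compare the expected class code with the annotated one."""
    if expected == annotated:
        return None  # no error
    if annotated == NON_TARGET:
        return ErrorType.UNCLASSIFIED  # a target object was left unannotated
    if expected == NON_TARGET:
        return ErrorType.OVER_CLASSIFIED  # a non-target area was annotated
    return ErrorType.MISCLASSIFIED  # annotated with the wrong class


def summarize(reports) -> Counter:
    """Count the reports per error type."""
    return Counter(report.error_type for report in reports)
```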

FIGURE 5. Examples of fine and coarse annotation error types. The red line is the annotation status, and the yellow line is the annotation that needs to be added or modified.

After the data inspection stage, a total of 49,700 aerial image datasets and 300 satellite image datasets were constructed. The number of datasets constructed for each image type is shown in Table 4. All datasets in Table 4 were used to train the algorithms.

TABLE 4. Number of datasets finally constructed in this study.

Algorithm Application Results

Many semantic segmentation algorithms have been published in recent years. U-Net and SegNet are representative semantic segmentation algorithms, first published in 2015 and 2017, respectively, and they continue to appear in various versions. DeeplabV3+ is a more recent network, published in 2018 (Ronneberger et al., 2015; Badrinarayanan et al., 2017; Chen et al., 2018). In this study, training and performance evaluation were conducted using the SegNet, U-Net, and DeeplabV3+ algorithms. All three were developed from the fully convolutional network (FCN) algorithm and have shown superior performance compared to other algorithms in the FCN series (Garcia-Garcia et al., 2017; Guo et al., 2018; Zhang J. et al., 2019; Asgari Taghanaki et al., 2021). Furthermore, the performance of all three algorithms has been verified in various remote sensing and computer vision studies (Lin et al., 2020; Weng et al., 2020). Because the landcover training data in this study were built at multiple resolutions, these recent, stable, and reliable algorithms were used.

The dataset was split into training and validation/testing data at a ratio of 8:2. This ratio follows previous studies, including Helber et al. (2019) and Friedl et al. (2000), in which 8:2 was commonly used (Shirzadi et al., 2018; Chakraborty et al., 2021; Saha et al., 2021). Previous studies were followed because the research environment and time constraints made it impractical to test a range of split ratios.
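An 8:2 split of this kind can be reproduced with a seeded shuffle. This is a sketch only: the function name, the seed, and the use of random shuffling are assumptions, as the text does not state how tiles were assigned to each subset.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(items: Sequence, train_ratio: float = 0.8,
                  seed: int = 0) -> Tuple[List, List]:
    """Shuffle items reproducibly and split them into training and held-out parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a repeatable split
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Example with the 49,700 aerial tiles, represented here by their indices.
train_ids, held_out_ids = split_dataset(range(49700))
print(len(train_ids), len(held_out_ids))  # 39760 9940
```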

The same hyperparameters were applied to the aerial imagery, the satellite imagery, and all three algorithms. Training was conducted in an environment equipped with at least 11 GB of GPU memory. Given the hardware performance of the computer, the models were trained for 800 epochs with a batch size of 10 (Table 5). The learning rate was set to 1 × 10⁻⁶, and the remaining parameters were left at the default values of the individual algorithms.
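To give a sense of the training budget these settings imply, the sketch below stores the three stated hyperparameters and derives the number of weight updates. It assumes, hypothetically, that the training portion is 80% of the 49,700 aerial tiles (39,760 tiles). The class and function names are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters stated in the text; all others stay at library defaults."""
    epochs: int = 800
    batch_size: int = 10
    learning_rate: float = 1e-6


def steps_per_epoch(n_train: int, batch_size: int) -> int:
    """Number of batches needed to see every training tile once."""
    return math.ceil(n_train / batch_size)


config = TrainConfig()
n_train = 39760  # assumption: 80% of the 49,700 aerial tiles
steps = steps_per_epoch(n_train, config.batch_size)
total_updates = steps * config.epochs
print(steps, total_updates)  # 3976 3180800
```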

TABLE 5. Hyperparameter values used to train the algorithms.

The aerial dataset results are shown in Table 6. U-Net had the highest overall accuracy, of about 77.8%, followed by DeeplabV3+ at 76.3% and SegNet at about 71.5%. However, there was a difference between algorithms in terms of the categories showing the highest accuracy. U-Net showed the highest accuracy for buildings, roads, and paddy fields, whereas SegNet produced the highest accuracy for parking lots and bare land (about 20% higher accuracy than the other two algorithms). The DeeplabV3+ algorithm had the highest accuracy for the forest category. For the non-target category, U-Net yielded the highest accuracy.
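Pixel accuracy, the metric reported here, can be computed as in the following sketch. It assumes that per-category accuracy means the share of reference pixels of a category that received the correct label. The text does not spell out the exact definition, so this reading is an assumption.

```python
from collections import defaultdict


def pixel_accuracy(reference, predicted):
    """Return (overall accuracy, per-class accuracy) for two label sequences.

    reference, predicted: equal-length sequences of class codes, one per pixel.
    """
    if len(reference) != len(predicted):
        raise ValueError("label sequences must have the same length")
    if len(reference) == 0:
        raise ValueError("label sequences must not be empty")
    total = defaultdict(int)
    correct = defaultdict(int)
    for ref, pred in zip(reference, predicted):
        total[ref] += 1
        if ref == pred:
            correct[ref] += 1
    overall = sum(correct.values()) / len(reference)
    # Assumption: per-class accuracy is measured against the reference pixels.
    per_class = {code: correct[code] / total[code] for code in total}
    return overall, per_class
```

For a real tile, the two sequences would be the flattened annotation image and the flattened model output.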

TABLE 6. Pixel accuracy of each algorithm for each category of the aerial image dataset.

Figure 6 shows an example image of the test results produced by each algorithm. DeeplabV3+ segmented forests were the most similar to the annotation data, as the algorithm had the highest pixel accuracy for that category. For the parking lot category, in which the SegNet algorithm yielded the highest accuracy, the algorithm identified parking lots that did not meet the area standard. For paddy fields, the U-Net algorithm displayed the most error-free segmentation.

FIGURE 6. Aerial image results of the SegNet, U-Net, and DeeplabV3+ algorithms.

The results obtained by applying each algorithm to the satellite imagery are shown in Table 7; the overall pixel accuracy was highest for U-Net at 91.4%, followed by SegNet at 88.4% and DeeplabV3+ at 85.8%. Thus, overall accuracy was roughly 10 to 17 percentage points higher than for the aerial imagery. However, unlike the aerial image results, the satellite imagery results had low accuracy in categories other than forest. The field category was particularly difficult to classify because fields resemble forests in shape and are more irregular than paddy fields. This is attributable to the relatively large number of forest items in the dataset, as well as to the much smaller size of the satellite dataset compared with the aerial dataset. These results could likely be improved by applying techniques such as data augmentation and by constructing additional datasets.
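The gap between the two image types can be checked directly from the overall accuracies reported in the text. The snippet below is a simple arithmetic sketch, and the variable names are illustrative.

```python
# Overall pixel accuracies (%) as reported in the text.
AERIAL = {"SegNet": 71.5, "U-Net": 77.8, "DeeplabV3+": 76.3}
SATELLITE = {"SegNet": 88.4, "U-Net": 91.4, "DeeplabV3+": 85.8}

# Difference in percentage points, satellite minus aerial.
gaps = {name: round(SATELLITE[name] - AERIAL[name], 1) for name in AERIAL}

# Algorithm with the highest overall accuracy for each image type.
best_aerial = max(AERIAL, key=AERIAL.get)
best_satellite = max(SATELLITE, key=SATELLITE.get)

for name, gap in gaps.items():
    print(f"{name}: +{gap} percentage points")
```

The differences are 16.9 points for SegNet, 13.6 for U-Net, and 9.5 for DeeplabV3+, with U-Net leading on both image types.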

TABLE 7. Pixel accuracy of each algorithm for each class of the satellite image dataset.

The images resulting from the application of the three algorithms to satellite imagery are shown in Figure 7. As the overall pixel accuracy was high, the segmented boundaries were considered to be more accurate than the aerial imagery. However, the DeeplabV3+ algorithm was found to require additional datasets and reinforcement training. For the U-Net and SegNet algorithms, paddy fields and buildings were clearly segmented compared to the annotations.

FIGURE 7. Satellite image results of the SegNet, U-Net, and DeeplabV3+ algorithms.


In this study, effective AI training data were constructed as a multi-resolution dataset that combines aerial and satellite imagery with spatial resolutions of 0.51 and 10 m, respectively. A total of 396 aerial images were utilized to construct 49,700 AI training datasets, while 14 satellite images were used to build 300 AI training datasets. Using these training data, we examined the possibility of analyzing (quasi-) real-time environmental changes for landcover change prediction purposes. In addition, metadata in JSON format that can be applied directly in AI algorithms provided by Zenodo and GitHub were prepared for all classified landcover category objects (a total of 50,000).

The data were used to construct a large-scale dataset with high precision. In addition, raw data were compiled as metadata reference material. To minimize error, a two-step verification process was performed on refined data and annotated data to improve the quality of the machine learning datasets. In this study, errors were classified into three categories: unclassification, overclassification, and misclassification. Unclassification is an error in which the object to be annotated is not annotated. Overclassification is an error whereby objects to be annotated are annotated beyond their boundary. Misclassification is an error in which annotated objects are mislabeled. All of these errors were corrected through the three-step error verification process. The error rate fell at each step: errors accounted for about 11.4% (1,111,799) of all data in the first step, about 1.58% (156,978) in the second step, and about 0.04% (15,062) in the third step.

Finally, the SegNet, U-Net, and DeeplabV3+ algorithms were applied to the datasets and the results were analyzed; these algorithms showed accuracy levels of 71.5%, 77.8%, and 76.3%, respectively, for aerial images and 88.4%, 91.4%, and 85.8% for satellite images. The learning results using U-Net showed high accuracy overall for both aerial and satellite imagery. Of the object categories classified using the three algorithms, the highest classification accuracy was found for forests (93.81%, 95.56%, and 93.36%, respectively).

The overall accuracy of the algorithms yielded significant results, but low accuracy for the colonnade category was found. This result is due to the fact that a colonnade of street trees occupies a more limited area than other categories. If the area is increased in the future, higher classification performance is expected.

Notably, this study is a pilot study for building AI training data for applications involving AI algorithms. In the future, we expect that our approach can be applied to various AI algorithms. This will enable analyses of appropriate training data and optimal algorithms for individual landcover items. Furthermore, if a new semantic segmentation-based AI algorithm is developed, it will be possible to increase the classification accuracy of landcover items with a smaller area by applying the AI training data of this study to the algorithm.


This study constructed multi-resolution AI learning data to efficiently analyze and predict (quasi-) real-time environmental changes caused by various development projects. Raw data included both satellite (from the Sentinel-2 mission) and aerial imagery. Additionally, a multi-resolution dataset was created so that AI training data could be utilized at various spatial resolutions.

Our approach has three advantages compared to other methods. First, our landcover datasets for AI training were built using data of different resolutions. In this way, improved high-resolution datasets were presented from existing MODIS-based multi-resolution landcover datasets with spatial resolutions of 30 m (Yu et al., 2014). Datasets were constructed such that various spatial resolutions could be used to classify the same landcover items. In addition, the multi-resolution datasets, the product of this study, can be utilized by selecting a resolution suitable for various fields of application, such as landcover classification and land use changes.

Second, the landcover datasets were analyzed with respect to their practicality and accuracy in landcover classification, using three common CNN-based AI algorithms: SegNet, U-Net, and DeeplabV3+. The results showed accuracy levels of 71.5%, 77.8%, and 76.3% for aerial image datasets and 88.4%, 91.4%, and 85.8% for satellite image datasets, respectively. Thus, the landcover datasets for AI training constructed in this study provide a helpful reference for classification and change detection.

In addition, the same landcover classification items were classified with images of multiple spatial resolutions, and the accuracy of the algorithms applied to each classification item was analyzed. For aerial images with a high spatial resolution, U-Net classified buildings with the highest accuracy (83.39%), while SegNet classified roads with an accuracy of 84.31%. With regard to the classification of forests, SegNet, U-Net, and DeeplabV3+ all showed an accuracy of 93% or more for satellite images with relatively low spatial resolution. These results can be used as basic data for selecting an appropriate spatial resolution and algorithm for each classification item in the future. They are also considered to provide an important basis for utilizing the findings of this study.

Through the construction of AI training data and their application to training algorithms, this study determined the suitability of the data for each landcover item. However, our approach has two limitations. First, AI training data for the whole of Korea were not established. Therefore, at present, the representativeness of the training data for each landcover item is still insufficient. Second, additional research is required to select appropriate AI training data and algorithms for each landcover item in the future. The data from this study were found to be suitable for some items, such as forests, which showed an accuracy of about 90% or more.

We expect that our data will be a useful reference for AI landcover classification and change detection, currently an active research area (Kussul et al., 2016; Lyu et al., 2016; Zhang C. et al., 2019; Sefrin et al., 2020; Zhang et al., 2020). In addition, if a landcover dataset for AI learning is built for the whole of Korea in the future, our work will be useful for various environmental studies, beyond classification and change detection. In addition, AI training datasets are expected to be increasingly relevant in the future; the findings of this study should provide a useful reference to this end.

Data Availability Statement

The Aerial data used here are available from the Korea NGII (National Geographic Information Institute).

Author Contributions

S-HL and M-JL conceived and designed the experiments. S-HL performed data collection and processing. S-HL and M-JL analyzed the data. S-HL drafted the manuscript. S-HL and M-JL revised the manuscript. All authors contributed to this manuscript and approved the final version.


This research was conducted at the Korea Environment Institute (KEI) with support from the project “The Application technology and system of satellite image radar in the environmental” by the Korea Environment Industry & Technology Institute (KEITI), funded by the Korea Ministry of Environment (MOE) (2019002650001). This research was also supported by a grant from the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2018R1D1A1B07041203).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.


