Owing to its decentralized, immutable and traceable characteristics, blockchain technology has been applied to data trading markets in recent years and has attracted considerable attention from industry. For example, the Shanghai Data Trading Center [13] uses a consortium (alliance) chain to store transaction-related information in blockchain nodes, ensuring the security, efficiency and credibility of data transactions.

Wang et al. [13] applied blockchain technology to the data market, which improved the transparency and security of data transactions but did not take into account the long-term sustainability of the data market. Zyskind [14] used blockchain to protect the privacy of personal data, turning the blockchain into an automatic access control manager that strengthens data ownership and realizes data storage and access control without third parties. CrowdBC [15] is a blockchain-based crowdsourcing system in which a requester’s task can be solved by a crowd of workers without relying on any trusted third party. The authors focus on the processing of image data in transactions. Baig [16] constructed a blockchain-based data market and introduced a trusted intermediary into the transaction between the buyer and the seller. Although this makes the transaction easier for both parties, it also considerably reduces the security of the system. Dai proposed SDTE [17], a blockchain-based data trading ecosystem. In SDTE, data buyers cannot directly access the original data they purchased; they can only obtain the analysis results of the data, which are generated using Intel SGX (Software Guard Extensions). However, if there are too many malicious data sellers in a transaction, honest data sellers will not receive reasonable compensation.

Spatial data in this paper include [18] remote sensing, mapping and other raw data, such as low-resolution, medium-resolution and high-resolution satellite images, sub-metre high-resolution satellite images, aerial photogrammetry data, series-scale vector data, terrain data and other types of original spatial information data. As shown in Fig. 1, after obtaining spatial data through devices such as satellites, access points and mobile phones, data owners upload the data to a cloud storage platform over the network and then sell them through a data trading platform to data buyers, who may use the data for high-precision target positioning, road condition monitoring or disaster information acquisition.

Fig. 1

Spatial data trading system

The blockchain-based spatial data trading system designed in this paper consists of three main components: a smart contract on the Ethereum blockchain, a trading application for users and a peer-to-peer data transmission network. As shown in Fig. 2, after collecting data for sale, the data seller registers and adds data digest information to the smart contract, while the data itself is stored in IPFS through the data storage module. When a data buyer needs data for a computing task, the buyer uses the data query module to issue an order containing the data requirements through the smart contract. The system then searches for qualified data by means of secure computation. Next, the data pricing module determines the price of the data through an auction. Once payment has been completed, the system delivers the data to the buyer securely. After the data have been delivered and used, the data reputation module evaluates the data quality and the reputation of the seller.
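The order of the steps described above can be illustrated with a small state machine. This is only a sketch of the workflow, not the paper's smart contract: the names `OrderState`, `Order` and `advance` are hypothetical, and a real deployment would implement the corresponding checks in an Ethereum contract.

```python
from enum import Enum, auto


class OrderState(Enum):
    """Hypothetical lifecycle stages of one data order."""
    ISSUED = auto()     # buyer has published the data requirements
    MATCHED = auto()    # qualified data found by the query step
    PRICED = auto()     # auction has fixed a price
    PAID = auto()       # payment completed
    DELIVERED = auto()  # data handed to the buyer
    RATED = auto()      # buyer feedback recorded


# Allowed transitions, in the order the text describes them.
_NEXT = {
    OrderState.ISSUED: OrderState.MATCHED,
    OrderState.MATCHED: OrderState.PRICED,
    OrderState.PRICED: OrderState.PAID,
    OrderState.PAID: OrderState.DELIVERED,
    OrderState.DELIVERED: OrderState.RATED,
}


class Order:
    """An order issued by a buyer; starts in the ISSUED state."""

    def __init__(self, buyer, requirements):
        self.buyer = buyer
        self.requirements = requirements
        self.state = OrderState.ISSUED

    def advance(self, target):
        """Move to `target` only if it is the next step in the workflow."""
        if _NEXT.get(self.state) is not target:
            raise ValueError(
                f"cannot go from {self.state.name} to {target.name}"
            )
        self.state = target
```

Enforcing a single linear path makes it impossible, for example, to deliver data before payment, which is the property the workflow relies on.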

Fig. 2

Data query

Data buyers can actively locate spatial data resources through different query methods (keyword query, region query, nearest neighbor query, reverse nearest neighbor query, etc.) and view the details of spatial data through the data digest, combined with thumbnails and metadata, so as to specify their own requirements. In addition, data buyers can express their specific data requirements as a logical expression or a mathematical function and store it in the blockchain for sellers to query [19, 20]. A seller can thus check, against the buyer’s requirements, whether his or her data meet the buyer’s demands.

Although the query conditions for spatial data are relatively simple, uploading the buyer’s query requirements directly to the blockchain is inappropriate: attackers could easily infer the buyer’s data requirements and learn private information about both the buyer and the seller, which increases the risk of privacy disclosure. Therefore, to protect the privacy of system users, the data should be queried while the query conditions are obfuscated. The most common solution is functional encryption: with the query conditions encrypted, a data buyer holding a decryption key can obtain the query result computed over the ciphertext data without learning anything else about the original data.

Data storage

Most public blockchain systems limit the number and size of transactions in a block, owing either to the limited block size (as in Bitcoin) or to the “Gas” consumed in a block (as in Ethereum). Storing massive data directly on the blockchain is therefore infeasible in the spatial data trading system. Recently, the InterPlanetary File System (IPFS), a distributed data store, has been introduced for storing shared data in various domains such as health care, cloud computing and IoT.

IPFS [21,22,23] is a peer-to-peer, content-addressable, distributed file storage system that runs on a swarm of connected computers. When a file is uploaded to IPFS, it becomes available to all peers in the IPFS network. The uploaded file is divided into chunks, each of which is assigned a unique cryptographic hash. Data added to IPFS are addressed by this hash, which makes the system content addressable, and distributed hash tables (DHTs) are used to locate files. In our setting, traded data files are stored in IPFS while the corresponding IPFS hashes are stored in blocks, which greatly reduces the cost of on-chain storage. In summary, IPFS provides a high-throughput, secure storage model that supports concurrent data access and high storage capacity.
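The idea of content addressing can be sketched with a toy in-memory store. This is not the IPFS API: the class `ToyContentStore`, the use of SHA-256 hex digests as addresses and the newline-separated manifest are all assumptions made for illustration (real IPFS uses content identifiers over a Merkle DAG and a DHT for lookup).

```python
import hashlib


class ToyContentStore:
    """In-memory stand-in for a content-addressed store (not real IPFS)."""

    def __init__(self, chunk_size=256 * 1024):
        self.chunk_size = chunk_size
        self.blocks = {}  # address (hex digest) -> raw bytes

    def _put(self, data):
        # The address of a block is the hash of its content.
        address = hashlib.sha256(data).hexdigest()
        self.blocks[address] = data
        return address

    def add(self, data):
        """Split `data` into chunks, store each, and return a root address."""
        chunk_addresses = [
            self._put(data[i:i + self.chunk_size])
            for i in range(0, len(data), self.chunk_size)
        ]
        # The manifest lists the chunk addresses in order; its own hash
        # becomes the single index that would be recorded on chain.
        manifest = "\n".join(chunk_addresses).encode("ascii")
        return self._put(manifest)

    def get(self, root):
        """Reassemble the original bytes from a root address."""
        manifest = self.blocks[root].decode("ascii")
        addresses = manifest.split("\n") if manifest else []
        return b"".join(self.blocks[a] for a in addresses)
```

Because the address is derived from the content, adding the same file twice yields the same root, and any change to the file yields a different one. Only this short root needs to be stored in a block.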

The cloud storage service of the spatial big data trading platform mainly establishes a storage station that uses computer, Internet, Internet of Things and other technologies to manage, on a daily basis, the products and services traded on the platform and the information generated during data trading, and to produce statistical summaries of product and service transactions quickly and accurately. Thanks to the merits of cloud storage, such as rapid retrieval, high reliability, large capacity and good confidentiality, a great deal of data can be saved safely [24, 25].

Although it is impractical to store the original data in the blockchain, the data digest associated with each data item can be stored on chain, while the original data are stored in IPFS. When the data have been stored successfully in IPFS, the user receives a hash index that can be used to retrieve the file later. This index is stored in the smart contract in place of the data, which relieves the storage bottleneck of the whole system.

We represent a spatial data item as T_i = {hashid, time, space, other attributes}, where hashid is the hash value of the item, i.e. hashid = hash(time, space, other attributes); time is the temporal attribute, indicating when the data were generated; and space is the spatial attribute, usually expressed as longitude and latitude coordinates. To facilitate the query and storage of spatial data, space is encoded with the GeoHash (Geographical Hash) algorithm, which hashes the longitude and latitude information. Other attributes denote the remaining properties of the spatial data besides the temporal and spatial attributes, including signature information.
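A record of this form can be sketched as follows. `geohash_encode` implements the standard GeoHash bit-interleaving scheme; the helper `make_record`, the choice of SHA-256 for hash(...) and the canonical JSON serialization are assumptions for illustration, since the text does not fix them.

```python
import hashlib
import json

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # GeoHash alphabet


def geohash_encode(lat, lon, precision=8):
    """Encode latitude/longitude as a GeoHash string of `precision` chars."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def make_record(time, lat, lon, other, precision=8):
    """Build {hashid, time, space, other attributes} (hypothetical helper)."""
    space = geohash_encode(lat, lon, precision)
    # Assumption: hash(...) is SHA-256 over a canonical JSON encoding.
    payload = json.dumps(
        {"time": time, "space": space, "other": other},
        sort_keys=True,
        separators=(",", ":"),
    )
    hashid = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"hashid": hashid, "time": time, "space": space, "other": other}
```

Nearby locations share GeoHash prefixes, so a region query can often be reduced to a prefix match on the space field, which is one reason the encoding is convenient for storage and lookup.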

Data pricing

As a commodity, data has some unique properties [26, 27], so data pricing must take additional issues into account. First, the marginal cost of data, that is, the cost of copying it, is extremely low, so once a data buyer obtains data from a data seller, the buyer may resell it. Second, the value of data depends not only on its amount but also on its content and quality. For example, a pile of face images is of little value to someone who needs remote sensing images. Third, the value of data is difficult to quantify, and valuations vary greatly among users.

Many studies [13] have found that most data buyers only need statistical results, such as the average value of a data set, or features extracted by training a machine learning model on the data, rather than the data itself. Consequently, the data buyer merely purchases the right to use the data rather than its ownership. In this way, the data are isolated from the data consumer.

Although the value of the data itself is difficult to quantify, it must still be evaluated, because buyers often need data from multiple sellers whose valuations may vary greatly. The Shapley value, a solution concept for distributing benefits and costs fairly among multiple participants, can be used to calculate the contribution of each data item [28]. Since the computational complexity of the Shapley value grows exponentially with the number of data items, approximate or distributed algorithms are often employed in practice.
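To make the computation concrete, the sketch below computes the Shapley value exactly by averaging marginal contributions over all orderings, and approximately by sampling random orderings. The function names and the coalition value function `value` are illustrative assumptions; the paper does not specify which approximation it uses.

```python
import itertools
import random


def _marginals(order, value, totals):
    """Add each player's marginal contribution along one ordering."""
    coalition = frozenset()
    previous = value(coalition)
    for player in order:
        coalition = coalition | {player}
        current = value(coalition)
        totals[player] += current - previous
        previous = current


def shapley_exact(players, value):
    """Exact Shapley values; cost grows factorially with len(players)."""
    players = list(players)
    totals = {p: 0.0 for p in players}
    count = 0
    for order in itertools.permutations(players):
        _marginals(order, value, totals)
        count += 1
    return {p: t / count for p, t in totals.items()}


def shapley_sampled(players, value, samples=1000, seed=0):
    """Monte Carlo estimate from `samples` random orderings."""
    rng = random.Random(seed)
    players = list(players)
    totals = {p: 0.0 for p in players}
    for _ in range(samples):
        order = players[:]
        rng.shuffle(order)
        _marginals(order, value, totals)
    return {p: t / samples for p, t in totals.items()}
```

Here `value` maps a frozenset of data items (or sellers) to the worth of using them together, for instance the accuracy of a model trained on that subset. The sampled version trades exactness for a cost that is linear in the number of samples.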

In this paper, we use auction theory to design the pricing mechanism of the spatial data trading system. Each data seller has a private valuation of his or her data, which includes an assessment of the risk of privacy leakage; likewise, each data buyer has a valuation of the data he or she intends to buy. Because this information is asymmetric, both buyers and sellers may misreport their valuations. To solve this problem, we use auction theory to design an incentive-compatible mechanism under which both parties obtain the highest return by reporting their true valuations. The mechanism is supplemented by a Shapley value calculation algorithm, which makes the pricing mechanism much easier to design.
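Incentive compatibility can be illustrated with a textbook mechanism that is not the mechanism of this paper: a reverse second-price (Vickrey) auction in which one buyer purchases from the seller with the lowest ask and pays the second-lowest ask. The function names and the single-buyer setting are assumptions made for the sketch.

```python
def reverse_second_price(asks):
    """Pick the lowest-ask seller and pay them the second-lowest ask.

    `asks` maps seller name -> reported minimum acceptable price.
    Ties are broken by seller name to keep the outcome deterministic.
    """
    if len(asks) < 2:
        raise ValueError("need at least two sellers")
    ranked = sorted(asks.items(), key=lambda item: (item[1], item[0]))
    winner = ranked[0][0]
    payment = ranked[1][1]
    return winner, payment


def seller_utility(true_cost, reported_ask, rival_asks, seller="me"):
    """Utility of `seller` when reporting `reported_ask` against rivals."""
    asks = dict(rival_asks)
    asks[seller] = reported_ask
    winner, payment = reverse_second_price(asks)
    return payment - true_cost if winner == seller else 0.0
```

Because the payment depends only on the rivals' asks, a seller cannot gain by misreporting: asking too much risks losing a profitable sale, and asking too little risks winning at a price below the true cost. In the usage sketch below, a seller with true cost 10 who underreports 7 against rival asks of 8 and 12 wins but is paid 8, a loss of 2.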

Data reputation

It is hard for data buyers to estimate the quality of a seller’s data before they use it. Owing to the nature of online data trading, sellers usually have a clear information advantage over buyers. Under such asymmetric information, malicious sellers cannot be prevented from cashing out by quoting lower prices and selling low-quality data for improper gain.

To solve this problem, we introduce a reputation mechanism to ensure the long-term sustainability of data trading. As a significant signal of data quality, reputation is an important intangible asset of a data seller. When the quality of goods such as data cannot be observed directly, reputation plays a decisive role and can effectively reduce the uncertainty of data trading. The reputation of a data seller is computed as a reputation score by aggregating the subjective feedback that data buyers provide after they acquire and use the data; the score serves as an indicator of the seller’s data quality.
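One simple way to aggregate buyer feedback into a score is a smoothed average, sketched below. The aggregation rule, the rating range [0, 1] and the parameters `prior_mean` and `prior_weight` are assumptions for illustration; the text above does not specify the formula used.

```python
def reputation_score(ratings, prior_mean=0.5, prior_weight=5.0):
    """Smoothed average of buyer ratings, each in the range [0, 1].

    The prior acts like `prior_weight` imaginary ratings equal to
    `prior_mean`, so a seller with few ratings starts near the prior
    and moves towards the observed average as feedback accumulates.
    """
    ratings = list(ratings)
    for rating in ratings:
        if not 0.0 <= rating <= 1.0:
            raise ValueError("ratings must lie in [0, 1]")
    total = prior_mean * prior_weight + sum(ratings)
    return total / (prior_weight + len(ratings))
```

The smoothing keeps a single early rating, good or bad, from dominating the score, which matters for new sellers in a market where quality is only observed after purchase.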

Security computing

Generally, information in a blockchain system is open to all users, and the execution of all transactions and scripts is transparent. In addition, the computing power of a blockchain smart contract is relatively weak owing to the limitation of block size. Running the computing tasks issued by the buyer directly in the smart contract is therefore neither safe nor feasible, since the computing cost is too high. The same problems arise in data query and data pricing. Furthermore, the nodes of public chains such as Bitcoin and Ethereum do not trust each other and are completely open and transparent, which makes privacy protection even more challenging.

Trusted hardware, such as a trusted execution environment (TEE) [29,30,31], is a common way to implement secure computing. A TEE protects the confidentiality and integrity of the code and data loaded into it. The data seller sends the data to the TEE device of a buyer equipped with TEE hardware; the computing task is then executed inside the TEE and the result is returned to the buyer securely. Hardware isolation protects the data and computation in the TEE from applications running on the operating system, while trusted applications running in the TEE can access the full functionality of the device’s main processor and memory. Typical hardware technologies supporting TEEs are ARM TrustZone and Intel SGX (Software Guard Extensions).

Intel Software Guard Extensions (SGX) [32] provides a widely used TEE implementation for general-purpose computation; in SGX, a TEE instance is known as an enclave. Code running inside an enclave has a protected address space. When enclave data move off the processor to memory, they are transparently encrypted with keys available only to the processor. Thus the operating system, the hypervisor and other users cannot access the enclave’s memory. The enclave’s code and data are measured at startup, and the measurement is signed into an attestation report anchored in a hardware root of trust. Verifying the report shows that the enclave code has not been modified, which lets users confirm the security of the enclave.

In this paper, we use Intel SGX as an exemplary TEE implementation to build a trusted exchange that assists in the fair payment of transactions.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
