How is historical data stored on Conflux?
19 December, 2019
One question frequently asked by our community members:
“How does Conflux store the massive amounts of data produced by its substantially increased network throughput?”
With Conflux’s current network parameters of 4 blocks per second and a block size of 400 KB, a Conflux network running at full capacity has a throughput of 1.2 MB per second, or around 100 GB per day, so historical transactions can exceed 30 TB per year.
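The per-day and per-year figures follow directly from the full-capacity rate. The sketch below starts from the quoted 1.2 MB per second and assumes decimal units (1 GB = 10^9 bytes) and a 365-day year. Under those assumptions the rate works out to roughly 104 GB per day and roughly 38 TB per year, so 30 TB is a round, conservative figure.

```python
# Back-of-the-envelope storage growth for a chain producing data at a fixed rate.
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365  # assumption: ignores leap years


def storage_growth(bytes_per_second: float) -> tuple[float, float]:
    """Return (GB per day, TB per year) in decimal units."""
    per_day = bytes_per_second * SECONDS_PER_DAY
    per_year = per_day * DAYS_PER_YEAR
    return per_day / 1e9, per_year / 1e12


gb_day, tb_year = storage_growth(1.2e6)  # 1.2 MB/s at full capacity
print(f"{gb_day:.1f} GB/day, {tb_year:.1f} TB/year")  # → 103.7 GB/day, 37.8 TB/year
```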
Storing 30 TB is routine for enterprise and industrial-grade applications, and three 10 TB hard drives cost only about 700 to 1,000 USD. Unlike “Enterprise Consortium Chains” such as Hashgraph, however, Conflux must preserve a high degree of decentralization, which means keeping the cost of participating in consensus low.
To lower the barrier to participating in consensus, Conflux was developed with the minimum full-node requirement set at the level of an ordinary 2019 home or office desktop computer. The CPU, hard drive, and memory that a full node consumes, whether for synchronizing, executing, and confirming transactions or for maintaining the Tree-Graph structure, are kept under strict control. The ‘simple solution’ of requiring miners to add new hard drives for data storage was therefore ruled out from the very beginning.
High throughput also raises another problem: when a new node joins, how long does it take to synchronize all the data?
To address the storage and synchronization problems that high throughput brings, Conflux adopts the following solution: for blocks that are considered old, full nodes only need to store the block header, not the block’s full transaction data. Conflux will also provide code for “Archive Nodes”, which store all historical data and which anyone can operate.
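The split between pruning full nodes and archive nodes can be sketched as follows. The article does not say exactly when a block counts as “old”, so the sketch assumes a hypothetical fixed horizon `keep_recent`, measured in epochs (Conflux’s unit of ordering), and simplifies to one block per epoch. The class and field names are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    epoch: int         # position in the ledger (simplification: one block per epoch)
    header: bytes      # serialized header: never deleted
    body: list[bytes]  # transactions: droppable once the block is old


@dataclass
class FullNode:
    keep_recent: int  # hypothetical horizon: bodies older than this many epochs are pruned
    headers: dict[int, bytes] = field(default_factory=dict)
    bodies: dict[int, list[bytes]] = field(default_factory=dict)

    def add_block(self, block: Block) -> None:
        self.headers[block.epoch] = block.header
        self.bodies[block.epoch] = block.body
        self.prune(tip=block.epoch)

    def prune(self, tip: int) -> None:
        # Drop transaction bodies that fell behind the horizon; headers stay forever.
        cutoff = tip - self.keep_recent
        for epoch in [e for e in self.bodies if e < cutoff]:
            del self.bodies[epoch]


class ArchiveNode(FullNode):
    def prune(self, tip: int) -> None:
        pass  # an archive node keeps every transaction body


# Feed the same five blocks to both kinds of node.
node = FullNode(keep_recent=2)
archive = ArchiveNode(keep_recent=2)
for e in range(5):
    blk = Block(e, f"header-{e}".encode(), [f"tx-{e}".encode()])
    node.add_block(blk)
    archive.add_block(blk)

print(len(node.headers), sorted(node.bodies))  # → 5 [2, 3, 4]
print(len(archive.bodies))                     # → 5
```

The full node still holds every header, so it keeps the whole ledger structure while its storage for transaction bodies stays bounded.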
This design differs somewhat from the blockchains most people know, so the rest of this article explains why it does not sacrifice the fundamental properties of a blockchain.
Let’s have a look at the information included in a block header:
First, the block header records every block that the block references, namely its parent and any other blocks it points to. This means the complete Tree-Graph structure is stored on every full node.
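In Conflux, each block has one parent edge (these edges form the tree) and may carry reference edges to other blocks (which turn the tree into a graph). The sketch below uses hypothetical field names and short string ids in place of real hashes, and shows that the whole structure can be rebuilt from headers alone, without any transaction data.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Header:
    block_hash: str                 # hypothetical short id instead of a real 32-byte hash
    parent: Optional[str]           # parent edge; None only for the genesis block
    referees: tuple[str, ...] = ()  # reference edges to other blocks


def build_tree_graph(headers):
    """Rebuild parent edges, reference edges and child lists from headers alone."""
    parent_of = {}
    references = {}
    children = defaultdict(list)
    for hdr in headers:
        references[hdr.block_hash] = list(hdr.referees)
        if hdr.parent is not None:
            parent_of[hdr.block_hash] = hdr.parent
            children[hdr.parent].append(hdr.block_hash)
    return parent_of, references, children


# G is genesis; A and B both extend G; C extends A and also references B.
headers = [
    Header("G", None),
    Header("A", "G"),
    Header("B", "G"),
    Header("C", "A", ("B",)),
]
parent_of, references, children = build_tree_graph(headers)
print(parent_of)        # → {'A': 'G', 'B': 'G', 'C': 'A'}
print(children["G"])    # → ['A', 'B']
print(references["C"])  # → ['B']
```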
Second, Proof-of-Work requires the hash of every block header to be small enough, that is, to have enough leading zeros. Even without the block’s transactions, the header alone is sufficient to prove that enough work went into creating the block. In other words, the block header carries the Proof-of-Work.
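A minimal sketch of why the header alone proves the work: checking the PoW means hashing the header bytes and comparing the result with a target. SHA-256, the 8-byte nonce suffix, and the header layout here are assumptions for illustration, not Conflux’s actual PoW function or header encoding.

```python
import hashlib


def header_hash(header: bytes) -> int:
    # SHA-256 is a stand-in here, not Conflux's actual PoW hash function.
    return int.from_bytes(hashlib.sha256(header).digest(), "big")


def meets_target(header: bytes, target: int) -> bool:
    """Verification needs only the header bytes; no transactions are involved."""
    return header_hash(header) < target


def mine(prefix: bytes, target: int) -> int:
    """Try nonces until prefix + nonce hashes below the target."""
    nonce = 0
    while not meets_target(prefix + nonce.to_bytes(8, "big"), target):
        nonce += 1
    return nonce


# Requiring 12 leading zero bits: about 2**12 = 4096 attempts on average.
target = 1 << (256 - 12)
prefix = b"example header fields"  # hypothetical serialized header without the nonce
nonce = mine(prefix, target)
header = prefix + nonce.to_bytes(8, "big")
print(meets_target(header, target))  # → True
```

Mining is expensive (many attempts), while verifying is a single hash, which is what lets every full node check the work cheaply.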
Thus, the structure connecting the blocks and the contents of every block header are stored on every full node in a decentralized and immutable way. An attacker who wants to rewrite history must pay a price equivalent to the cumulative historical workload.
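One rough way to put a number on the “cumulative historical workload” using headers alone: a hash that is uniformly random over 256 bits falls below a target T with probability T / 2^256, so a valid header took about 2^256 / T attempts on average. The sketch simply sums this over all headers; that flat sum is a simplification, since Conflux’s consensus weighs blocks within the Tree-Graph by its own rules.

```python
def expected_work(target: int) -> int:
    """Average number of hash attempts needed to find a header hash below `target`."""
    # A uniformly random 256-bit hash is below `target` with probability target / 2**256.
    return (1 << 256) // target


def cumulative_work(targets: list[int]) -> int:
    """Estimated work embodied in a set of headers, given each header's target."""
    return sum(expected_work(t) for t in targets)


# 1,000 headers, each requiring 20 leading zero bits.
targets = [1 << (256 - 20)] * 1000
print(cumulative_work(targets))  # → 1048576000
```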
Besides the Tree-Graph structure and the Proof-of-Work, block headers also include Merkle roots that commit to the block’s transactions and their execution results. A Merkle root is a single hash that represents an entire body of data: changing any part of the data changes the root. Every Conflux block header includes these three hash values:
Transaction Root: Corresponds to the contents of all transactions in the block.
State Root: Corresponds to the “world state” after transaction execution, including the balance of every account address and the state of every smart contract. (Because Conflux uses a deferred execution strategy, the State Root in a Conflux block header corresponds to the state after the transactions of the previous block have been executed; the current block’s transactions are executed later, and their effects appear in the State Root of a later block.)
Receipt Root: Corresponds to the receipts produced during contract execution, which record whether the execution succeeded and whether a transfer was made during execution.
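Deferred execution can be illustrated with a toy ledger. In the sketch below, each block’s header records the state root as it stood before that block’s transactions ran, so a block’s own effects first show up in the next header. The one-block deferral follows the description above; the JSON-plus-SHA-256 commitment is a stand-in for the real state commitment, and the toy ledger performs no balance checks.

```python
import hashlib
import json


def state_root(state: dict[str, int]) -> str:
    # Stand-in commitment: SHA-256 of a canonical JSON encoding of all balances.
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


def execute(state: dict[str, int], txs) -> dict[str, int]:
    """Apply (sender, receiver, amount) transfers; no balance checks in this toy."""
    new_state = dict(state)
    for sender, receiver, amount in txs:
        new_state[sender] = new_state.get(sender, 0) - amount
        new_state[receiver] = new_state.get(receiver, 0) + amount
    return new_state


def build_headers(genesis: dict[str, int], blocks):
    """Header i carries the root of the state *before* block i's transactions run."""
    headers, state = [], dict(genesis)
    for txs in blocks:
        headers.append({"state_root": state_root(state), "tx_count": len(txs)})
        state = execute(state, txs)  # effects appear only in the next header
    return headers, state


genesis = {"alice": 10, "bob": 0}
blocks = [[("alice", "bob", 3)], [("bob", "alice", 1)]]
headers, final_state = build_headers(genesis, blocks)
print(headers[1]["state_root"] == state_root({"alice": 7, "bob": 3}))  # → True
print(final_state)  # → {'alice': 8, 'bob': 2}
```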
A user who wants historical transaction data and execution results can query an “Archive Node”, then check whether the data it returns is correct by recomputing the corresponding hash and comparing it with the root stored in the block header.
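Checking archive-node data against a header can be sketched with a Merkle proof: the archive node returns one transaction plus the sibling hashes on its path to the root, and the client recomputes the root and compares it with the Transaction Root in the header it already stores. The binary SHA-256 tree below (odd nodes promoted unchanged) is an assumption for illustration; Conflux’s actual root construction and proof format may differ.

```python
import hashlib


def sha(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """All tree levels, from hashed leaves up to the single root."""
    if not leaves:
        raise ValueError("need at least one leaf")
    levels = [[sha(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        nxt = []
        for i in range(0, len(cur), 2):
            if i + 1 < len(cur):
                nxt.append(sha(cur[i] + cur[i + 1]))
            else:
                nxt.append(cur[i])  # odd node is promoted unchanged
        levels.append(nxt)
    return levels


def merkle_root(leaves: list[bytes]) -> bytes:
    return build_levels(leaves)[-1][0]


def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Archive-node side: sibling hashes on the path from leaf `index` to the root."""
    proof = []
    for level in build_levels(leaves)[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        index //= 2
    return proof


def verify(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Client side: recompute the root and compare with the one in the stored header."""
    node = sha(leaf)
    for sibling, sibling_is_left in proof:
        node = sha(sibling + node) if sibling_is_left else sha(node + sibling)
    return node == root


txs = [b"tx0", b"tx1", b"tx2", b"tx3", b"tx4"]
root = merkle_root(txs)                   # what the block header stores
proof = merkle_proof(txs, 2)              # what an archive node would send
print(verify(b"tx2", proof, root))        # → True
print(verify(b"forged tx", proof, root))  # → False
```

The proof grows with the logarithm of the number of transactions, so a client can check a single historical transaction without downloading the whole block.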
The only attack available to an archive node is refusing to provide transaction data, and at worst this limits the ability to query historical transactions. Even if no honest node keeps the historical transactions, those transactions are merely forgotten, never tampered with, so their immutability is still guaranteed.
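The same check explains why withholding is the only effective attack. In the hypothetical client logic below, each archive node is modeled as a plain callable (a stand-in for a network request), and a flat SHA-256 digest over the block’s transactions stands in for the Transaction Root. A node that lies is simply skipped; if every node refuses or lies, the client gets nothing, but it never accepts forged data.

```python
import hashlib
from typing import Callable, Optional

Body = list[bytes]


def body_digest(txs: Body) -> bytes:
    # Stand-in for the Transaction Root: SHA-256 over length-prefixed transactions.
    h = hashlib.sha256()
    for tx in txs:
        h.update(len(tx).to_bytes(4, "big"))
        h.update(tx)
    return h.digest()


def fetch_verified(expected_root: bytes,
                   archive_nodes: list[Callable[[], Optional[Body]]]) -> Optional[Body]:
    """Ask archive nodes in turn; accept only data matching the header's root."""
    for ask in archive_nodes:
        body = ask()
        if body is None:
            continue     # withheld: try the next node
        if body_digest(body) == expected_root:
            return body  # consistent with the locally stored header
        # tampered: mismatch with the header, so the response is discarded
    return None          # unavailable, but never forged


real = [b"alice->bob:3", b"bob->carol:1"]
root = body_digest(real)  # known from the header every full node keeps


def withholding_node() -> Optional[Body]:
    return None


def tampering_node() -> Optional[Body]:
    return [b"alice->mallory:3", b"bob->carol:1"]


def honest_node() -> Optional[Body]:
    return list(real)


print(fetch_verified(root, [withholding_node, tampering_node, honest_node]) == real)  # → True
print(fetch_verified(root, [withholding_node, tampering_node]))                       # → None
```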
Because an Archive Node has so little room to act “evil”, archive nodes themselves do not need a high degree of decentralization. The Conflux Foundation and the community can jointly maintain several Archive Nodes.
Moreover, because running an Archive Node requires no audit or permission, large users can run their own Archive Node, or commission someone else to run one, to ensure the performance and reliability of their access to historical transaction data.
For a DApp running on Conflux, if the amount of data it must keep accessible at all times is small (for example, 1 MB of new data per year), this business data can be stored in the smart contract’s internal state. Data in a contract’s internal state is saved and synchronized by every full node as part of the world state, so it enjoys the highest degree of availability without the DApp having to run its own archive node.
Conflux is a state-of-the-art public blockchain system that achieves high TPS without sacrificing decentralization or safety. By combining its unique consensus algorithm with a novel structure, the Tree-Graph (TG), Conflux removes consensus as a performance bottleneck and thereby solves a series of problems in bringing public chains to industrial use. In its current first stage, Conflux uses a Proof-of-Work (PoW) mechanism as the basis of its consensus.