Blockchain and the return of scalability issues. TPS much?

“How many transactions per second can you handle?” is a question asked during almost any blockchain AMA. Our answer is typically “enough”, but what is enough, and why is there a limit to blockchain scalability anyway?

Scalability hasn’t been an issue in web tech for almost a decade. Internet giants like Facebook and Amazon have shown it’s possible to handle vast amounts of requests. The anti-patterns (common design mistakes) that lead to scalability issues are public knowledge and the methods to solve them are well documented.

We should be able to apply the same scalability techniques to blockchain. Right?

Self-imposed limits

Where do seemingly arbitrary limits, such as Bitcoin’s 7 transactions per second, come from? Are these network limits or system limits? Surely, with modern servers and high-speed internet connections, it should be possible to handle a lot more transactions.

It’s important to realize that blockchains have self-imposed limits. These are set through two rules: the maximum block size and the block time. The maximum number of transactions per second can be calculated as `(1 / block time) * (max block size / avg transaction size)`.

Examples


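| Blockchain | Block time | Max block size | Avg tx size | Max TPS |
|---|---|---|---|---|
| Bitcoin | 600 s | 1 MB | 280 bytes | 7 tx/s |
| Ethereum | 13 s | 8,000,000 gas | 20,000 gas | 30 tx/s |
| LTO | 60 s | 1 MB | 160 bytes | 104 tx/s |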
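
The formula is easy to check against the table. The snippet below is a minimal sketch that plugs in the table's inputs; the function name `max_tps` is ours, and 1 MB is assumed to mean 1,000,000 bytes.

```python
def max_tps(block_time_s: float, max_block_size: float, avg_tx_size: float) -> float:
    """(1 / block time) * (max block size / avg transaction size)."""
    return (1 / block_time_s) * (max_block_size / avg_tx_size)


# Inputs copied from the table; 1 MB is assumed to be 1,000,000 bytes.
# For Ethereum both sizes are expressed in gas instead of bytes.
chains = {
    "Bitcoin": (600, 1_000_000, 280),
    "Ethereum": (13, 8_000_000, 20_000),
    "LTO": (60, 1_000_000, 160),
}

for name, params in chains.items():
    print(f"{name}: {max_tps(*params):.1f} tx/s")
```

With exactly these inputs the Bitcoin result lands just under 6 tx/s. The commonly quoted 7 tx/s corresponds to a somewhat smaller average transaction of roughly 240 bytes.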

The maximum block size is typically defined in MB. Some blockchains, notably Ethereum, use a different metric: gas. Gas is correlated to the computational effort.

Web servers are also configured to handle a limited number of requests, which prevents them from freezing when overloaded. But those limits sit much closer to the actual system limits than the limits of a typical blockchain node do.

With a blockchain, you can’t fine-tune these limits per node. Instead, limits are applied network-wide. This means that the limit effectively sets the minimum system requirements. But there are other, more important reasons for this limit.

Ethereum will automatically adjust the max gas limit based on demand. When submitting a transaction, you’re free to offer any gas price you want. If there were a huge overcapacity, all transactions would be added to the next block regardless of the offered gas price.

Tuning the max capacity based on the required capacity creates an artificial scarcity.

The downside of this approach is that during peak times, the capacity is reached and transactions stay in the queue. This results in long confirmation times.

LTO Network has opted to enforce a soft and hard minimum transaction fee, allowing for overcapacity. In a Nash equilibrium, the fees should always be at the hard limit, but in practice, everybody respects the soft limit.

Real limits

More transactions mean more data. If LTO operated at its current full capacity, the chain would grow by 1.4 GB per day. Raising the block size to 10 MB would allow for 1000 tx/s, resulting in 14 GB worth of transactions per day or a staggering 5 TB per year.

Disk space is cheap, but requiring terabytes of disk space is a serious system requirement. The real problem wouldn’t be storing this data though. New nodes need to download and process all that data. Synchronization of a new node would take many days.
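
These growth figures follow directly from the block size and block time. A quick sketch of the arithmetic, assuming every block is completely full and using decimal units (1 GB = 1000 MB); the helper name `daily_growth_gb` is ours:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_growth_gb(block_size_mb: float, block_time_s: float) -> float:
    """Chain growth per day in GB, assuming every block is completely full."""
    blocks_per_day = SECONDS_PER_DAY / block_time_s
    return blocks_per_day * block_size_mb / 1000  # decimal MB -> GB


for size_mb in (1, 10):
    per_day = daily_growth_gb(size_mb, block_time_s=60)
    per_year_tb = per_day * 365 / 1000
    print(f"{size_mb} MB blocks: {per_day:.2f} GB/day, {per_year_tb:.2f} TB/year")
```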

Then again, it wouldn’t be running on full capacity all the time. A big block size would mainly help with peak traffic. But this can still pose an issue. The bigger the block, the longer it takes to propagate through the network. This causes issues that are specific to blockchain and not so much a problem in web tech.

With PoW, the long propagation time creates an unfair advantage for those close to the miner of that block. Nodes that are farther away continue to work on mining a block that’s already solved and start on the new block later.

In PoS a similar situation might occur. If the time it takes a block to reach my node exceeds my block delay, I risk missing the opportunity to forge the block.

Reducing the block time has more or less the same effect and gives the same issues. While a small block propagates through the network quickly, the limited time nodes have to forge will result in missed blocks.

Large mining pools will suffer the least from this, as there is no latency when you forge a block yourself. This ultimately incentivizes centralization.

To get around this, the LTO public chain incorporates the NG protocol. After forging a block, a node will start validating new transactions and already propagate them through the network. Once a new node is allowed to forge, it only needs to close and sign the block.

The upcoming summary blocks will allow nodes to store only a fraction of the data. Each (key) block will contain a Merkle hash of all transactions. The transactions themselves are stored separately and are no longer part of the block. A summary block contains the balance changes. Once blocks reach hard finality, nodes are free to remove the transactions and only keep the key blocks and summary blocks.

More importantly, summary blocks greatly reduce the data a new node will receive. The time required to synchronize will be a number of minutes instead of days.
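
To illustrate how a single Merkle hash can stand in for a whole set of transactions, here is a minimal sketch of a Merkle root calculation. It is a generic construction and not the LTO block format: the choice of SHA-256, the duplication of the last hash on odd levels (as Bitcoin does), and the sample transactions are all assumptions.

```python
import hashlib
from typing import List


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(transactions: List[bytes]) -> bytes:
    """Reduce a list of serialized transactions to one 32-byte hash."""
    if not transactions:
        return sha256(b"")
    level = [sha256(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the last hash on odd levels
        # Hash each adjacent pair to build the next, smaller level.
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical transactions, purely for illustration.
txs = [b"alice->bob:5", b"bob->carol:2", b"carol->dave:1"]
root = merkle_root(txs)
print(root.hex())
```

A block only needs to carry the 32-byte root. Changing any transaction changes the root, so a node that has dropped the transactions can still verify them when another node supplies them later.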

Horizontal scaling

If the network latency and required disk space are no longer the bottlenecks, we get to a situation that’s similar to a web environment where system resources like CPU and memory become the limiting factor.

Vertical scaling means requiring bigger and stronger systems to run as nodes. This is not a sustainable solution, as there are limits to the size a system can grow. Also, bigger systems are more expensive. Increasing the cost of running a node reduces profitability and is likely to result in less decentralization.

Internet giants are able to handle a vast amount of requests due to horizontal scaling. They’re able to increase capacity by adding systems to their network rather than having to upgrade systems.

Adding nodes to a blockchain network like Bitcoin, Ethereum or LTO doesn’t increase capacity; it only increases redundancy. This is because blockchains are write-intensive rather than read-intensive, and new transactions are processed and stored on every node.

This is similar to the problem web applications faced with relational databases. While these could be replicated over multiple systems, every system needed to write all the data, which of course has its limits.

Sharding

To solve database scaling limits, web applications started to apply sharding: a technique where the data is denormalized and split up. Each shard stores all data of a limited set of users. If a change applies to multiple users, like a new friendship, it’s written to both shards.
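
The friendship example can be sketched as follows. This is a toy in-memory model rather than a real database setup: the shard count, the hash-based routing in `shard_for`, and the user names are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # assumed, fixed shard count

# Each shard maps a user to that user's set of friends.
shards = [defaultdict(set) for _ in range(NUM_SHARDS)]


def shard_for(user_id: str) -> int:
    """Route a user to a shard with a stable hash of the user id."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def add_friendship(a: str, b: str) -> None:
    # Denormalized: the friendship is written once per user,
    # each copy on the shard that owns that user.
    shards[shard_for(a)][a].add(b)
    shards[shard_for(b)][b].add(a)


def friends_of(user: str) -> set:
    # A read only touches the single shard that owns the user.
    return shards[shard_for(user)].get(user, set())


add_friendship("alice", "bob")
print(friends_of("alice"), friends_of("bob"))
```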

With sharding, the database is no longer able to query information about users on another shard. This means that the application server needs to collect all the data and process the query itself. This is much slower than when the database handles it. As such, the general notion is that performance is traded in for scalability.

While sharding is a proven method in web technology, it has not been universally adopted by blockchains. The issue is that sharding requires coordination. It also requires trust between the nodes, as each node only has partial information and may not be able to assess whether a transaction is valid. A blockchain with sharding requires master nodes (or a similar concept), which undermine the permissionless nature of a blockchain.

Blockchains that focus on smart contracts, like Ethereum, tend to have an architecture with the busy-database antipattern. This antipattern describes a common issue where the data storage is designed as a service that spends a significant proportion of its time running code, rather than responding to requests to store and retrieve data.

Adding too much logic to the data layer greatly reduces the number of requests it’s able to handle. This forces a network towards sharding much quicker compared with a network where logic is handled by application servers and nodes primarily deal with storing data.

This antipattern also means having shards that handle a small portion of addresses, which can be an additional challenge. Transactions need to be atomic; either the full transaction succeeds or it fails. It should never partly succeed. This increases the coordination required and the communication between nodes.

LTO Network has chosen to stay away from sharding. Instead, all efforts are made to prevent the busy-database antipattern. There is only a limited set of transaction types for the public chain and the effects of these are hard-coded. With summary blocks, new LTO nodes can synchronize quickly and without the need for excessive resources.

Non-trivial logic is done in the LTO event-chain layer. Event chains are shared-nothing hash chains. Rather than a single server, event chains are managed by a set of (Docker) containers, running a separate database and application servers. This allows for elastic horizontal scaling.

More importantly, event chains share data peer-to-peer with specific nodes, selected based on the participants specified in the contract. This limits the data that the whole network needs to process and store. More on this subject later.
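
For readers unfamiliar with hash chains: each event stores the hash of the previous event, so a chain can be verified on its own, without any shared state. The sketch below shows the general idea only. It is not the LTO event format; the field names, the JSON serialization, and the all-zero genesis value are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for "no previous event"


def event_hash(previous: str, body: dict) -> str:
    """Hash the event body together with the hash of the previous event."""
    payload = json.dumps({"previous": previous, "body": body}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def append_event(chain: list, body: dict) -> dict:
    previous = chain[-1]["hash"] if chain else GENESIS
    event = {"previous": previous, "body": body, "hash": event_hash(previous, body)}
    chain.append(event)
    return event


def is_valid(chain: list) -> bool:
    """Recompute every hash and check that each event links to the one before it."""
    previous = GENESIS
    for event in chain:
        if event["previous"] != previous:
            return False
        if event["hash"] != event_hash(previous, event["body"]):
            return False
        previous = event["hash"]
    return True


chain = []
append_event(chain, {"action": "propose"})
append_event(chain, {"action": "accept"})
print(is_valid(chain))
```

Because validity depends only on the events in the chain itself, separate chains can live on separate containers without coordinating with each other.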

Conclusion

The architecture of a public blockchain network is significantly different from the private network of internet giants. Because of specific blockchain concerns, it’s not a trivial task to apply the same scaling techniques.

LTO Network reduces the capabilities of the public chain and relies on the private event-chain layer for handling complex logic. The private layer doesn’t have the same concerns as a permissionless public blockchain, making it possible to apply an architecture similar to that of scalable web applications. By taking the unconventional choice of a hybrid blockchain, LTO Network is side-stepping the scalability issues seen with other blockchains.

Arnold Daniels, Lead Architect of LTO Network
