Data storage on the blockchain has been pretty rudimentary. However, that appears to be changing, and as it does, more innovative Web3 apps will be able to emerge.
The Limitations of Data Stored “On Chain”
The data on the blockchain is immutable. The question is: What actually needs to be on the blockchain, and what can be stored off-chain while still preserving integrity and the promise of “decentralized”?
It’s a question that a few chains have been exploring. Solana and NEAR have both been looking at the ever-growing storage burden on node operators, who need to keep a full copy of the entire blockchain. With fast-growing blockchains in particular, that adds up quickly.
The solution? Solana is going to offload much of the data that has lived on its blockchain to Arweave (one of our earlier posts on NFT storage covers Arweave). And the NEAR blockchain is developing an off-chain storage mechanism of its own.
In case you think this is limited to those two chains, Vitalik Buterin has also laid out the “purge” as one of the future evolution steps for the Ethereum blockchain, which will remove data that is no longer needed from the Ethereum blockchain.
The catalyst in this case is the need to reduce the requirements and expense that node operators face. The sheer volume of data, with ALL of it distributed across every node, is driving the innovation that will allow for expansion as well as efficiency.
Web3 Apps face a similar question about how to build robust apps, and how to store and manage the data for those applications appropriately.
How to distinguish between On-Chain vs. Off-Chain storage needs
As I think through the various applications and how they may fit or evolve as a “Web 3 App”, similar questions arise. Here are the questions Web3 Application developers should be asking to ensure the application architecture will be optimized:
- If all data on the blockchain is immutable (i.e. unchangeable), what is the appropriate data to store “on-chain” vs off-chain? What information, once known, is a “fact” that will never change?
- Conversely, if data WILL change, do we need to keep a history of every modification that is made (like corporations do in their accounting systems, so no funny business can occur)?
- If no detailed transaction history must be retained, and there is no detriment to the application or its users should the data change, then what are the off-chain storage options?
- And finally, since most Web3 Applications (Dapps) are actually centralized at the point of their front-end, can other aspects of the application (data storage and databases) also be “centralized”?
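The first three questions above can be read as a small decision procedure. Here is a minimal sketch in Python, assuming a hypothetical `DataItem` description and `suggest_storage` helper (neither comes from any library). It is a thinking aid, not a rule: real projects will also weigh cost, privacy, and performance.

```python
from dataclasses import dataclass


@dataclass
class DataItem:
    """Hypothetical description of one piece of application data."""
    name: str
    will_change: bool        # can the value change after it is first written?
    needs_audit_trail: bool  # must every modification be retained?


def suggest_storage(item: DataItem) -> str:
    """Return a rough storage suggestion based on the questions above."""
    if not item.will_change:
        # A "fact" that never changes fits immutable storage.
        return "on-chain"
    if item.needs_audit_trail:
        # Data changes, but every change must be kept: record each
        # modification as a new immutable entry instead of overwriting.
        return "on-chain, as append-only change records"
    # Data changes and no history is required: a mutable store is enough.
    return "off-chain"


if __name__ == "__main__":
    examples = [
        DataItem("token mint date", will_change=False, needs_audit_trail=False),
        DataItem("account balance", will_change=True, needs_audit_trail=True),
        DataItem("profile picture", will_change=True, needs_audit_trail=False),
    ]
    for item in examples:
        print(f"{item.name}: {suggest_storage(item)}")
```

The example items are illustrative only. The point is that each piece of data gets its own answer, rather than the whole application being forced on-chain or off-chain.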
What about access privileges?
This is likely the elephant in the room for truly transformative Web3 applications that go beyond trading, finance, NFTs, and the metaverse. Can we define access permissions for data within a Web3 application, and can it be done without introducing centralization? That’s a great question! Perhaps we will ultimately end up in a hybrid world, where some things remain centralized.
I think remaining open to the best-for-now architecture choices will allow for innovative Web3 apps that can evolve.
Don’t make an enemy out of Web2
Blockchain developers would be wise to learn from the Web2 apps that came before them. I’ve seen articles that applaud blockchain and attempt to instill fear of using a Relational Database Management System (RDBMS), somehow saying they’re not secure or reliable. That’s NUTS!! Visa says its network can handle more than 24,000 transactions per second, and yes, it’s centralized. Salesforce.com, one of the largest SaaS companies, also handles large numbers of users and transactions, and is centralized.
The Web2 footprint, INCLUDING security, is well thought out and backed by a deep base of experience. And while breaches occur, the theft that occurs in the blockchain world, in both DeFi and cross-chain bridges, is astounding. As we outlined in our post about the causes of the breaches, these thefts are almost completely avoidable, yet they continue to occur.
So – instead of throwing knives back and forth – or taking pot shots at Web2 infrastructure, let’s look at how blockchain will empower the apps of the future – which may not be “purely” decentralized out of the gate. What’s more important? Getting a vision developed that democratizes access to necessary infrastructure, financial, and other applications that put users in control of their destiny? Or only doing what “pure” and “decentralized” blockchain capabilities currently allow for?
But… What are the databases for Blockchain?
Why am I so interested in database technology? Because it can power very fast queries and transaction speeds, and can maintain a level of access permissions in terms of the data that is stored, so only those who ‘should’ see it, can actually see it.
If you look at Blockchain straight-up – it’s a transparent data structure. While it has amazing fault tolerance, it doesn’t offer privacy. There are apps that would be GREAT to have as blockchain web3 apps – but there’s a certain amount of data privacy that also needs to be maintained – so that is a consideration to take into account.
What are the options?
There are several database options. In this post, we look into BigChainDB, as it appears to be pretty robust. But there are also other options that we might blog about at some point.
BigChainDB (https://www.bigchaindb.com/)
Built on top of MongoDB (it was not developed by MongoDB itself), this is a database that aims to combine fast access and querying with the blockchain characteristics of decentralization and immutability, and it can be run as a private network. It allows you to add decentralization and blockchain to your application. BigChainDB uses MongoDB for storage and Tendermint for its network and consensus protocols.
The cool thing about Tendermint is its Byzantine Fault Tolerance, which means that as long as fewer than one third of the nodes in the network fail or are compromised, the network can continue to operate and maintain the integrity of the records.
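To make the fault-tolerance threshold concrete: Tendermint-style BFT consensus tolerates fewer than one third of the validators being faulty, which for n validators means at most f faulty nodes where n ≥ 3f + 1. Here is a small sketch of that arithmetic, assuming every node has equal voting power (the function names are my own, not part of Tendermint):

```python
def max_faulty_nodes(total_nodes: int) -> int:
    """Largest f such that total_nodes >= 3 * f + 1 (equal voting power assumed)."""
    if total_nodes < 1:
        raise ValueError("total_nodes must be at least 1")
    return (total_nodes - 1) // 3


def can_tolerate(total_nodes: int, faulty_nodes: int) -> bool:
    """True if the network can keep operating with this many faulty nodes."""
    return 0 <= faulty_nodes <= max_faulty_nodes(total_nodes)


if __name__ == "__main__":
    for n in (4, 7, 10, 100):
        print(f"{n} nodes tolerate up to {max_faulty_nodes(n)} faulty")
    # 4 nodes tolerate up to 1 faulty
    # 7 nodes tolerate up to 2 faulty
    # 10 nodes tolerate up to 3 faulty
    # 100 nodes tolerate up to 33 faulty
```

So a four-node network survives one bad node, but not two. That is worth keeping in mind when sizing a small private network.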
The challenge with using BigChainDB is that instead of thinking through processes and the data needs as tables, JSON, or key/value pairs, you’ve got to shift your thinking to the storage of “assets” – and this is a shift for developers that come from a traditional DBA mindset. (NOTE: Don’t throw the baby out with the bathwater, because there are aspects of data design that will still serve you very well – just know there’s a mindset shift with BigChainDB).
Per BigChainDB’s website, “An asset can characterize any physical or digital object that you can think of like a car, a data set or an intellectual property right.”
The data is immutable, so it cannot be deleted, but the transaction model has not only a “create” operation but also a “transfer” operation, which lets an asset change hands. The mindset shift is one from process-centric to asset-centric. In this way, if you’ve got object-oriented skills, they’ll come in handy as you model the “asset” objects that your app will manage.
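To get a feel for the asset-centric mindset, here is a toy in-memory sketch of a “create” and “transfer” model. This is NOT the BigChainDB driver or its API. The `Transaction` and `AssetLedger` names are hypothetical, and there is no signing, consensus, or networking. It only illustrates that nothing is updated or deleted: ownership changes by appending a new transaction.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Transaction:
    """One immutable entry in the log (hypothetical shape)."""
    operation: str               # "CREATE" or "TRANSFER"
    asset_id: str
    owner: str                   # owner after this transaction
    data: Optional[dict] = None  # asset description, set on CREATE only


class AssetLedger:
    """Append-only log of asset transactions."""

    def __init__(self) -> None:
        self._log: List[Transaction] = []

    def create(self, asset_id: str, owner: str, data: dict) -> None:
        if any(tx.asset_id == asset_id for tx in self._log):
            raise ValueError(f"asset {asset_id!r} already exists")
        self._log.append(Transaction("CREATE", asset_id, owner, data))

    def transfer(self, asset_id: str, current_owner: str, new_owner: str) -> None:
        # Only the current owner may transfer the asset.
        if self.owner_of(asset_id) != current_owner:
            raise PermissionError(f"{current_owner!r} does not own {asset_id!r}")
        self._log.append(Transaction("TRANSFER", asset_id, new_owner))

    def owner_of(self, asset_id: str) -> str:
        # The most recent transaction for the asset names its owner.
        for tx in reversed(self._log):
            if tx.asset_id == asset_id:
                return tx.owner
        raise KeyError(asset_id)

    def history(self, asset_id: str) -> List[Transaction]:
        return [tx for tx in self._log if tx.asset_id == asset_id]


if __name__ == "__main__":
    ledger = AssetLedger()
    ledger.create("car-001", "alice", {"type": "car", "year": 2020})
    ledger.transfer("car-001", "alice", "bob")
    print(ledger.owner_of("car-001"))      # bob
    print(len(ledger.history("car-001")))  # 2
```

Notice that the full ownership history comes for free. That is the trade you make for giving up in-place updates.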
Start considering what Web3 Dapps might be possible
There are more database options – and some very cool possibilities, but this will get you thinking about what data management ‘could’ start to look like – and open possibilities for Web3 Dapp innovation. Your system needs will dictate which is right for your project. Understanding that there are database solutions that could help you build the next generation of web3 applications opens up the possibilities of what those applications could be beyond what we’ve seen so far.
What’s the catch? What if data needs to change?
In Web2, updating database records is pretty standard. Not everything needs to be permanent. That analysis is where your application architecture must take your functionality into account.
Think about the systems, yes, all centralized, that support literally millions if not billions of users at scale. They must allow data changes. Is that bad? No! It’s an efficient use of data and storage, and it allows your application to operate. Compare it to using Chainlink’s oracles for real-time prices of off-chain resources: that data changes and is updated all of the time. The oracle network is decentralized, but the data can still be updated. Of course this is slightly different, because it’s a real-time price “feed” of real-world prices and rates, not something that is stored and modified regularly.
It’s not a blockchain app in the sense of the underlying price data living on the blockchain; that data originates off-chain. What is on the blockchain are the smart contracts that provide access to the oracle network, so Dapps can read that data in real time when it’s needed.
In the same way, applications need to update data, and that data doesn’t need to be stored on the blockchain. Understanding what data must be stored on-chain vs. off-chain can make a huge difference in the design of an innovative Dapp.
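One common way to split the two, offered here as a sketch and not the only option, is to keep the changeable record in an off-chain database and write only a small fingerprint (a hash) of it to the chain. Anyone can then check that the off-chain copy matches what was last anchored. In the Python sketch below, `off_chain_db` and `on_chain_anchors` are plain in-memory stand-ins I made up for a real database and a real smart contract.

```python
import hashlib
import json

off_chain_db = {}      # stand-in for a mutable off-chain database
on_chain_anchors = []  # stand-in for an append-only on-chain log


def fingerprint(record: dict) -> str:
    """SHA-256 digest of a canonical JSON encoding of the record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def save_record(record_id: str, record: dict) -> None:
    """Update the off-chain copy and anchor its new fingerprint."""
    off_chain_db[record_id] = dict(record)
    on_chain_anchors.append((record_id, fingerprint(record)))


def matches_latest_anchor(record_id: str) -> bool:
    """True if the off-chain record still matches its most recent anchor."""
    for anchored_id, digest in reversed(on_chain_anchors):
        if anchored_id == record_id:
            return fingerprint(off_chain_db[record_id]) == digest
    return False  # never anchored


if __name__ == "__main__":
    save_record("user-1", {"email": "a@example.com"})
    save_record("user-1", {"email": "b@example.com"})  # a legitimate update
    print(matches_latest_anchor("user-1"))             # True

    # Someone edits the database directly, bypassing save_record:
    off_chain_db["user-1"]["email"] = "evil@example.com"
    print(matches_latest_anchor("user-1"))             # False
```

The data itself stays cheap to store and free to change, while the chain only holds a short, immutable trail of what the data looked like at each update.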
In Summary – Construct the “best for now” architecture for your Dapp, and know that it’s a continuously evolving process. That way we can unlock and unblock innovation of Web3 apps, while gaining user adoption, and learning what works and what’s needed, so the entire ecosystem can evolve.