Gideon Greenspan: How to Spot a Half-Baked Blockchain When Chains and Blocks Serve No Useful Purpose

Gideon Greenspan

16 December 2016

About 18 months have passed since the finance sector woke up, en masse, to the possibilities of permissioned Blockchains, or to use the more general term, “distributed ledgers”. The period since has seen a tsunami of activity, including research reports, strategic investments, pilot projects, and the formation of many consortia. No one can accuse the banking world of not taking the potential of this technology seriously.

Naturally, the explosive growth in Blockchain projects has driven the development of permissioned Blockchain platforms, on which those projects are built. For example, our product MultiChain has tripled in usage over the past year, whether we measure web traffic, monthly downloads or commercial inquiries. And of course, there are many other platforms, such as BigChainDB, Chain, Corda, Credits, Elements, Eris, Fabric, Ethereum (deployed in a closed network), HydraChain and Openchain. Not to mention still more startups who have developed some kind of Blockchain platform but have not made it publicly available.

For companies wishing to explore and understand a new technology, an abundance of choice is generally a good thing. However, in the case of Blockchains, which still remain loosely defined and poorly understood, this cornucopia comes with a significant downside: many of the available “Blockchain” platforms don’t actually address the core problem they are meant to solve. And what is that problem? Allow me to quote the succinct video definition by Richard Gendal Brown, CTO of R3, in full:

A distributed ledger is a system that allows parties who don’t fully trust each other to come to consensus about the existence, nature and evolution of a set of shared facts without having to rely on a fully trusted centralized third party.

R3 – Definition of a Distributed Ledger from R3 on Vimeo.

To take an extreme example, consider a bunch of Lego bricks tied together with string. If we use the term “block chain” to describe this fashion item, who’s to say that we’re not describing it accurately? And yet, that particular chain of blocks will not help multiple parties to safely and directly share a database without a central intermediary. Similarly, many “Blockchain” platforms do something related to chains of blocks, but also lack the necessary properties to serve as the basis for a peer-to-peer database.

[Image: another chain of blocks that does not help with database sharing.]

Minimum viable Blockchain

In order to understand the basic requirements of a distributed ledger, it helps to clarify how these systems differ from regular databases, which are controlled by a single entity. For example, let’s consider a simple system for tracking who owns a particular company’s shares. The ledger, as implemented in a database, has one row for each owner containing two columns: the owner’s identifier, such as their name, and the corresponding quantity of shares.

Here are six crucial ways in which this system could fail its users:

Forgery: Transferring shares from one person to another without the sender’s permission.
Censorship: Refusing to fulfill someone’s request to transfer some shares elsewhere.
Reversal: Undoing a transfer that took place at some point in the past.
Illegitimacy: Changing the total quantity of shares in the system without a corresponding action by the issuer.
Inconsistency: Giving different responses to inquiries from different users.
Downtime: Not responding to incoming requests for information at all.

Because of all these possibilities, the shareholders must maintain a high level of trust in whoever is managing this ledger on their behalf. Building and running an organization worthy of that trust comes with substantial hassle and cost.
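To make the trust problem concrete, here is a minimal sketch of such a centrally managed share register. The class name `CentralLedger`, the owner names and the quantities are all invented for illustration. The point is only that nothing in the data structure itself prevents the operator from forging a transfer or inflating the share count.

```python
from dataclasses import dataclass, field


@dataclass
class CentralLedger:
    """Hypothetical share register held by one operator: owner -> quantity."""
    balances: dict = field(default_factory=dict)

    def total(self) -> int:
        return sum(self.balances.values())

    def transfer(self, sender: str, recipient: str, quantity: int) -> None:
        """A legitimate transfer: moves shares and leaves the total unchanged."""
        if quantity <= 0 or self.balances.get(sender, 0) < quantity:
            raise ValueError("invalid transfer")
        self.balances[sender] -= quantity
        self.balances[recipient] = self.balances.get(recipient, 0) + quantity


ledger = CentralLedger({"alice": 60, "bob": 40})
ledger.transfer("alice", "bob", 10)      # requested by alice
total_after_transfer = ledger.total()    # still 100 shares in issue

# The operator can bypass transfer() and edit rows directly.
ledger.balances["bob"] -= 25             # forgery: bob never asked for this
ledger.balances["mallory"] = 25
ledger.balances["mallory"] += 1000       # illegitimacy: shares from thin air
total_after_tampering = ledger.total()
```

The users of this register only ever see whatever the operator chooses to show them, which is why the remaining failure modes (censorship, reversal, inconsistency and downtime) are equally within the operator's power.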

Blockchains or distributed ledgers remove the need for this kind of central database operator, by allowing the users of a database to interact directly with each other on a peer-to-peer basis. In our example, the stockholders could safely hold their shares on a Blockchain which they collectively manage, and make transfers to each other instantly over that chain. (The disadvantage is a significant loss of confidentiality between the chain’s users, which we won’t address here but I’ve previously discussed at length.)

All this brings us back to the question of Blockchain platforms. In order to serve as a viable basis for peer-to-peer database sharing, a Blockchain has to protect its participants against all six types of database failure – forgery, censorship, reversal, illegitimate transactions, inconsistency and downtime. While many products in the market fulfill these requirements, quite a few of them come up short.

I call these Blockchains “half-baked” because they may address some of these risks, but not all. In some respects at least, the database’s users remain dependent on the good behavior of a single participant, which is precisely the scenario we want to avoid.

These half-baked Blockchains come in any number of varieties, but three archetypes stand out as the most common or obvious. I’m not going to name individual products because, well, I don’t want to offend. The Blockchain startup community is small enough that most of us know each other through conferences and other meetings, and the interactions tend to be positive. Nevertheless, if Blockchains (in the sense of useful peer-to-peer databases) are ever going to emerge as a coherent product category, it’s important to distinguish between half-baked and real solutions.

The one validator Blockchain

One pattern we’ve seen a few times is a Blockchain in which only one participant can generate the blocks in which transactions are confirmed. Transactions are sent to this one node instead of being broadcast to the network as a whole, so their acceptance is subject to this party’s whims rather than some kind of majority consensus. Still, once a block has been built by this central party, it is broadcast to the other nodes in the network, who can independently confirm the validity of the transactions within, and record the new block locally and permanently.

To return to our six forms of database malfunction, this type of Blockchain is far from useless. Transactions must be digitally signed by the entity whose funds they move, so they cannot be forged by the central party. They cannot be reversed because each node maintains its own copy of the chain. And transactions cannot perform illegal operations like creating assets out of thin air, because every node independently validates each transaction for correctness. Finally, each node maintains its own copy of the database, so its content is always available for reading.

Unfortunately, four out of six is not enough. The validating node can easily censor individual transactions, by refusing to include them in the blocks it creates. Even if the operators of this node are honest, a system or communications failure can render it unavailable, causing all transaction processing to come to a halt. In addition, depending on the setup, the validating node may be able to transmit different versions of the Blockchain to different participants. In terms of censorship and consistency, the database still contains a single point of failure, on which all the other nodes rely.
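The asymmetry can be sketched in a few lines. In this simplified model (the function names and the tuple format for transactions are assumptions, and signature checks are left out), every node validates blocks against its own copy of the ledger, so an overspending block is rejected. A block that merely omits someone's transaction, however, is indistinguishable from an honest one.

```python
def apply_block(balances, block):
    """Validate a block against a node's own copy of the ledger, then apply it.

    Returns a new balances dict, or raises ValueError if any transaction would
    overspend. Signature checks are omitted in this sketch.
    """
    updated = dict(balances)
    for sender, recipient, quantity in block:
        if quantity <= 0 or updated.get(sender, 0) < quantity:
            raise ValueError(f"invalid transaction from {sender}")
        updated[sender] -= quantity
        updated[recipient] = updated.get(recipient, 0) + quantity
    return updated


def build_block(pending, censored_senders=frozenset()):
    """The single validator alone decides which pending transactions confirm."""
    return [tx for tx in pending if tx[0] not in censored_senders]


node_copy = {"alice": 50, "bob": 50}
pending = [("alice", "bob", 5), ("bob", "alice", 20)]

# Every node rejects a block that spends shares the sender does not hold ...
try:
    apply_block(node_copy, [("alice", "bob", 500)])
    overspend_rejected = False
except ValueError:
    overspend_rejected = True

# ... but a block that silently leaves out bob's transaction is perfectly valid.
censoring_block = build_block(pending, censored_senders={"bob"})
node_copy = apply_block(node_copy, censoring_block)
```

Bob's transfer never confirms, and no other node can tell whether it was censored or simply never sent.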

One platform offers a twist on this scheme, in which blocks are centrally generated by a single node, but a quorum of other designated nodes signs them to indicate consensus. In terms of the risk of inconsistency, this certainly helps. The nodes in the quorum will only lend their signatures to a single version of the Blockchain, which can therefore be considered as authoritative. Nonetheless, the quorum nodes cannot help if the block generator censors transactions, or loses its connection to the Internet. Ultimately, this type of Blockchain still uses a hub-and-spoke architecture, rather than a peer-to-peer network.
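A rough sketch of how such a quorum rule settles the inconsistency question follows. The member names, the majority threshold and the representation of an endorsement as a plain dictionary entry are all assumptions. A real platform would verify a digital signature from each quorum member rather than trusting names.

```python
import hashlib


def block_hash(previous_hash, transactions):
    """Fingerprint of a block: hash of the previous hash plus its transactions."""
    payload = previous_hash + "|" + "|".join(transactions)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_authoritative(candidate_hash, endorsements, quorum_members, threshold):
    """True if at least `threshold` distinct quorum members endorsed this hash.

    `endorsements` maps a member name to the single hash it endorsed, so no
    member can back two competing versions at once.
    """
    supporters = {member for member, endorsed in endorsements.items()
                  if member in quorum_members and endorsed == candidate_hash}
    return len(supporters) >= threshold


quorum = {"q1", "q2", "q3"}
previous = "0" * 64
version_a = block_hash(previous, ["alice->bob:10"])
version_b = block_hash(previous, ["alice->carol:10"])   # conflicting version

endorsements = {"q1": version_a, "q2": version_a,
                "q3": version_b, "outsider": version_b}

a_is_authoritative = is_authoritative(version_a, endorsements, quorum, threshold=2)
b_is_authoritative = is_authoritative(version_b, endorsements, quorum, threshold=2)
```

With a majority threshold, two conflicting versions cannot both pass. Note that nothing here obliges the block generator to include any particular transaction, so censorship and downtime are untouched.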

The shared state Blockchain

Technically speaking, there are many similarities between Blockchains and more traditional distributed databases such as Cassandra and MongoDB. In both cases, transactions can be initiated by any node in the network, and must reach all the other nodes as part of a consensus about the database’s developing state. Both Blockchains and distributed databases have to cope with latency (communication delays which stem from the distance between nodes) and the possibility of some nodes and/or communication links intermittently failing.

Distributed databases have been around for a while, so any Blockchain platform developer would do well to understand their consensus algorithms and the strategies they use to globally order transactions and resolve conflicts. Nonetheless, it’s important not to take the comparison too far, because Blockchains must contend with a crucial additional challenge – an absence of trust between the database’s nodes. Whereas distributed databases focus on providing scalability, robustness and high performance within a single organization’s boundaries, Blockchains must be redesigned in order to safely traverse those boundaries.

To return to our six types of database risk, a node in a distributed database need only worry about downtime, i.e. the possibility of other nodes becoming unavailable. Nodes can safely assume that every transaction and message on the network is valid, and are not concerned with forgery, censorship, reversal, illegitimacy or inconsistency. Their worst problem is dealing with two simultaneous but valid transactions, initiated on different nodes, which affect the same piece of data. Solving these conflicts is by no means trivial, but it’s still a lot easier than worrying about “Byzantine faults”, in which some nodes deliberately act to disrupt the functioning of others.

A database can only be shared safely across trust boundaries if nodes treat all activity on the network with a certain degree of suspicion. For example, every transaction which modifies the database must be individually digitally signed since, in a peer-to-peer architecture, there is no other way to know its true point of origin. Similarly, every incoming message, such as the announcement of a new block, has to be critically assessed for its content and context. Unlike in distributed databases, nodes must not be able to immediately and directly modify another node’s state.
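To illustrate what "individually digitally signed" buys a receiving node, here is a toy example using Lamport one-time signatures, chosen only because they can be written with Python's standard library. This is not the scheme any particular Blockchain platform uses (production systems typically rely on elliptic curve signatures), and the transaction strings are invented.

```python
import hashlib
import secrets


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _digest_bits(message: bytes) -> list:
    """The 256 bits of the message's SHA-256 digest."""
    return [(byte >> shift) & 1 for byte in _sha256(message) for shift in range(8)]


def generate_keypair():
    """Lamport one-time key pair: 256 pairs of secrets and their hashes."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(_sha256(a), _sha256(b)) for a, b in private]
    return private, public


def sign(message: bytes, private) -> list:
    """Reveal one secret per digest bit. Each key pair must be used only once."""
    return [pair[bit] for pair, bit in zip(private, _digest_bits(message))]


def verify(message: bytes, signature, public) -> bool:
    """Check that each revealed secret hashes to the matching public value."""
    bits = _digest_bits(message)
    return len(signature) == 256 and all(
        _sha256(revealed) == pair[bit]
        for revealed, pair, bit in zip(signature, public, bits)
    )


private_key, public_key = generate_keypair()
transaction = b"alice->bob:10"
signature = sign(transaction, private_key)

genuine_accepted = verify(transaction, signature, public_key)
altered_accepted = verify(b"alice->mallory:10", signature, public_key)
```

A node that knows only the sender's public key accepts the genuine transaction and rejects the altered one, no matter which peer relayed it.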

Some “Blockchain” platforms have been developed by starting with a distributed database, and sprinkling some features on top to make them more Blockchainy. For example, by grouping transactions into blocks and storing hashes (digital fingerprints) of those blocks in the database, they aim to add a form of immutability. But unless each node can be sure that its list of hashes cannot be modified by another node, this type of immutability is easily gamed. The standard response to these criticisms is that every security problem can be solved with sufficient time and coding. But this is rather like holding some prisoners in an open field, and trying to stop them escaping with tripwires and ditches. It’s far safer to use a purpose-built concrete structure, whose doors are locked and whose windows are barred.
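The weakness is easy to demonstrate. In the sketch below (block contents and function names are made up), the hashes of the blocks live in a store that any node can write to. A node that rewrites a block and recomputes the hashes leaves the shared store perfectly self-consistent. Only a node holding its own private copy of the earlier hashes notices the change.

```python
import hashlib


def block_hash(previous_hash, transactions):
    """Fingerprint of a block: hash of the previous hash plus its transactions."""
    payload = previous_hash + "|" + "|".join(transactions)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(blocks):
    """Return the list of chained hashes for a list of blocks."""
    hashes, previous = [], "0" * 64
    for transactions in blocks:
        previous = block_hash(previous, transactions)
        hashes.append(previous)
    return hashes


blocks = [["alice->bob:10"], ["bob->carol:5"]]
shared_hashes = build_chain(blocks)    # kept in the shared, writable database
local_hashes = list(shared_hashes)     # one node's private copy

# Another node rewrites history and recomputes the hashes in the shared store.
blocks[0] = ["alice->mallory:10"]
shared_hashes[:] = build_chain(blocks)

consistent_with_shared = build_chain(blocks) == shared_hashes
consistent_with_local = build_chain(blocks) == local_hashes
```

Checking against the shared hashes reports no problem, while the check against the private copy fails. The hashes add immutability only if other nodes cannot overwrite them.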

The one cloud Blockchain

By far the strangest phenomenon I’ve seen is Blockchain platforms which can only be accessed through their developer’s cloud-based platform-as-a-service. To be clear, we’re not talking about some of a Blockchain’s participants choosing to host their nodes on their cloud provider of choice, such as Microsoft Azure or Amazon Web Services. Rather, this is a Blockchain which can only be accessed through APIs exposed by the servers of a company “hosting” it.

Let us grant, for argument’s sake, that a centralized Blockchain provider genuinely has a group of nodes running under its control. What difference does this make to the users of the system who are sending API requests and receiving responses? The participants have no way of assessing if everyone’s transactions have been processed without omission or error. Perhaps the central service is malfunctioning, or perhaps it is censoring or reversing some transactions deliberately. And if you believe the Blockchain provider has no reason to do this, why not use them to host a regular centralized database instead? You’ll get a more mature product with better performance, and suffer none of the risks of working with new technologies. In short, centralized Blockchains are about as useful as Lego on a string.

Solving the mystery

We’ve now seen three types of platform which market themselves as “Blockchains”, and indeed make some use of a chain of blocks, but which don’t solve the fundamental problem for which these systems are designed. To recap, this is to enable a single database to be safely and directly shared across trust boundaries, without a central intermediary.

Apart from pointing out this peculiar phenomenon, I believe it’s instructive to consider what might underlie it. Why are so many Blockchain startups building products which don’t fulfill the promise of this technology, often achieving no more than traditional centralized or distributed databases? Why are so many talented people wasting so much of their time?

I can see two main classes of explanation – technical and commercial. To start with the technical, it is rather tricky to create distributed consensus systems which can tolerate one or more nodes behaving maliciously in unpredictable ways. In the case of MultiChain, we somewhat cheated, by using bitcoin’s battle-hardened reference implementation as a starting point, and then replacing proof of work by a structurally similar consensus algorithm called “mining diversity”. Teams developing a Blockchain node from scratch have to think deeply about asynchronous and adversarial processes – a combination which few programmers have experience of. I can certainly understand the temptation to take a shortcut, such as using a single node to generate blocks, or piggybacking on an existing distributed database, or only running nodes in a trusted environment. Choosing any of these undoubtedly makes life easier for developers, even if this undermines the entire point.
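For readers wondering what a rotation-based alternative to proof of work might look like, here is a simplified sketch. The specific rule, the function name and the `diversity` parameter are assumptions made for illustration and are not spelled out in this article. The idea is that a permitted block creator must sit out for a number of blocks after creating one, so no single node can confirm blocks on its own.

```python
import math


def miner_allowed(miner, recent_miners, permitted_miners, diversity):
    """Hypothetical rotation rule, assumed for illustration.

    A permitted miner may create the next block only if it did not create any
    of the last (spacing - 1) blocks, where
    spacing = ceil(number of permitted miners * diversity).
    """
    if miner not in permitted_miners:
        return False
    spacing = math.ceil(len(permitted_miners) * diversity)
    window = recent_miners[-(spacing - 1):] if spacing > 1 else []
    return miner not in window


permitted = {"a", "b", "c", "d"}
history = ["a", "b", "c"]      # creators of the most recent blocks, oldest first

a_allowed = miner_allowed("a", history, permitted, diversity=0.75)
c_allowed = miner_allowed("c", history, permitted, diversity=0.75)
outsider_allowed = miner_allowed("z", history, permitted, diversity=0.75)
```

With four permitted nodes and a diversity of 0.75, a node must wait for two other blocks before creating another. In this example "a" may proceed, "c" must wait, and an unlisted node is always refused.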

As for commercial reasons, every startup seems to be approaching the Blockchain opportunity from a different angle. Here at Coin Sciences, we’re focused on becoming a (database) software vendor, so we’re distributing MultiChain for free while developing a premium node with additional features. Other startups want to sell subscription services, so they will naturally build a platform which customers cannot host themselves. Some are hoping to centrally control a Blockchain or help their partners to do so (an odd ambition for a disintermediation technology!) and are naturally drawn to consensus algorithms that rely on a single node. And finally, there are companies whose primary goal is to sell consulting services, in which case their platform need not function at all, so long as its website brings in some large customers.

Perhaps another issue is that some Blockchain companies are being run by people who are undoubtedly bursting with talent, but lack a deep understanding of the technology itself. In startups carving out a new field, it’s probably vital for strategic decisions to be taken by people who understand the nature of that field and how it differs from what came before. Not a few Blockchain startups appear to have painted themselves into a corner by pursuing a product vision which is attractive to their customers, but cannot actually be built.

As a user of Blockchains, how can you avoid being caught by these fallacies? When evaluating a particular Blockchain platform, be sure to ask whether it fulfills the six requirements of safe peer-to-peer database sharing: prevention of downtime and inconsistency, as well as transaction forgery, censorship, reversal and illegitimacy. And beware of explanations that consist of too much mumbling or hand waving – they probably mean that the answer is no.
