The Blockchain

Posted by David Harding, Justin O'Brien, Balaji S. Srinivasan

Install 21

Requirements

Pre-Work

We’re going to start the Bitcoin Core daemon (bitcoind) now because it can take a few minutes to start as it runs through integrity checks.

We’ve previously created a Bitcoin Core configuration file in .bitcoin/bitcoin.conf. Since bitcoind is all set up, start it by running the following command:

bitcoind -daemon

It should print “Bitcoin server starting” and immediately return control to your prompt, meaning bitcoind is running in the background. If you run the first bitcoin-cli command below and get an error about Bitcoin starting, wait a minute and try again.

Look at a Bitcoin transaction

Let's look at a transaction to see what information it contains to help us get a feel for how private and not-private Bitcoin can be. We’ll also be getting a feeling for transactions so that we can create some in the next tutorial. Please note that this is just one type of transaction. We’ll look at other types when we start building our own transactions.

In this case, we're going to look at a transaction made by someone who volunteered for this analysis; this transaction was made Friday, 2 Oct 2015 at a small bitcoin-accepting cafe.

## Get the transaction; the final "1" requests the raw transaction be
## decoded into JSON so we can parse it
bitcoin-cli getrawtransaction 506f29619bc80d37d7ec4023907ea03c0baa7ffeeca988ba10aeb4127d54ab39 1 | less

The command above should open the transaction in the less screen pager. You can always press q to quit, and you can use the arrow keys to scroll up, down, left, and right.

Let's go through the results one section at a time (we'll omit some unnecessary parts). In the output below we replaced some actual data with "[...]" because we want you to look at the output displayed by less in your remote terminal.

{ "txid" : "[...]",

The txid is the hash of the bytes in this transaction. A txid is the canonical identifier for any transaction, although in a future tutorial you'll learn how transaction malleability makes the txid dangerous to depend on in some cases.
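As a rough sketch of what "the hash of the bytes" means here: for a transaction serialized the way this one is, the txid is the double SHA-256 hash of the raw transaction bytes, displayed in reversed byte order. The helper name txid_from_raw_hex and the placeholder bytes below are illustrative assumptions, not part of Bitcoin Core. To try it on real data, pass in the hex string printed by getrawtransaction when you leave off the final "1".

```python
import hashlib


def txid_from_raw_hex(raw_hex):
    """Return the txid of a serialized transaction given as a hex string."""
    raw = bytes.fromhex(raw_hex)
    # Hash the raw bytes twice with SHA-256
    digest = hashlib.sha256(hashlib.sha256(raw).digest()).digest()
    # Bitcoin Core displays hashes in reversed byte order
    return digest[::-1].hex()


# Placeholder bytes only; this is NOT a real transaction
example_txid = txid_from_raw_hex("00" * 10)
print(example_txid)
```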

"version" : [...],
"locktime" : [...],

All transactions have a version number and a locktime. Except for a few pranks, all transactions currently on the blockchain are version 1. We'll cover the semantics of the locktime field in a later tutorial, but the locktime field does provide us some identifying information here—one major Bitcoin wallet (Bitcoin Core GUI) sets the locktime for every transaction it makes; since the locktime here is 0, we can assume this transaction was not made with Bitcoin Core wallet.

"vin" : [

The vin array contains the Input Vectors (recall Bitcoin Core is programmed in C++). These correspond to the outputs from previous transactions that are being spent.

   {
        "txid" : "[...]",
        "vout" : [...],

Each input refers to the previous output it's spending by the txid of the transaction that output appeared in and the zero-based index number of the output in the Output Vector (vout). So the input above spends the second output in the transaction with the above txid. We'll look more at that output later in this tutorial.

These two fields together are called the outpoint of the previous transaction; the outpoint is the canonical identifier for a particular output and is usually written with a colon separating the txid from the output index number. For example, we would write the above outpoint as 26a990967f80c01a6df04743c8f3dc9cb13ab9755db71535fdc78b25ebc7c256:1
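If you want to handle outpoints in your own scripts, a small helper like the following may be enough. This is only a sketch: the Outpoint class and parse_outpoint function are hypothetical names invented for illustration, and the only real data is the outpoint string quoted above.

```python
from typing import NamedTuple


class Outpoint(NamedTuple):
    """A txid plus the zero-based index of one of that transaction's outputs."""
    txid: str
    index: int

    def __str__(self):
        return f"{self.txid}:{self.index}"


def parse_outpoint(text):
    """Parse the conventional "txid:index" notation into an Outpoint."""
    txid, _, index = text.rpartition(":")
    if len(txid) != 64:
        raise ValueError("a txid must be 64 hex characters")
    int(txid, 16)  # raises ValueError if the txid is not valid hex
    return Outpoint(txid.lower(), int(index))


outpoint = parse_outpoint(
    "26a990967f80c01a6df04743c8f3dc9cb13ab9755db71535fdc78b25ebc7c256:1"
)
# Index 1 is the second output because indexes are zero-based
print(outpoint.index)
print(outpoint)
```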

       "scriptSig" : {
            "asm" : "[...] [...]",
            "hex" : "[...]"
        },

The scriptSig provides a cryptographic signature or other data that proves that the person creating this transaction is authorized to spend the bitcoins paid to the output being spent. The scriptSig above is shown as hex as well as "assembly code" (asm) for the Bitcoin Script language. Decoded, the hex scriptSig above converts into the following information:

48 ................................ push 0x48 (72) bytes onto the stack

3045022100c7c35b89bd1fb5d2afd5ca
d8c8024b0a1a8912627f2e54bc1b725c
6222dddeda02207b3e9edea53337f5b7
34a491789dfd92151db8ebf0dd3c68f3
582c17c912be7601 .................. 72 bytes (signature)

41 ................................ push 0x41 (65) bytes onto the stack

047b2c9f78bd9720490c77d00b879376
c2d271fa1330799146a9c9635b384882
6b9687cec45d78cbee0fe9ae74560c9c
952f750a4b60a7d358e8759fd1f715bd
6a ................................ 65 bytes (uncompressed pubkey)

Although the two data pushes above aren't labelled in the transaction itself, someone familiar with Bitcoin Script can determine what they are by looking at the data-reading opcodes in the scriptPubKey of the output being spent here.
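The decoding shown above can be reproduced with a few lines of Python. The sketch below assumes the script contains nothing but direct data pushes (opcodes 0x01 through 0x4b, where the opcode itself is the number of bytes to push), which is true for this scriptSig but not for scripts in general. The function name parse_pushes is our own invention.

```python
def parse_pushes(script_hex):
    """Split a script made only of direct data pushes into its data items."""
    script = bytes.fromhex(script_hex)
    pushes = []
    pos = 0
    while pos < len(script):
        length = script[pos]
        # Opcodes 0x01-0x4b mean "push the next <length> bytes"
        if not 0x01 <= length <= 0x4B:
            raise ValueError(f"unsupported opcode 0x{length:02x} at byte {pos}")
        data = script[pos + 1 : pos + 1 + length]
        if len(data) != length:
            raise ValueError("script ends in the middle of a push")
        pushes.append(data)
        pos += 1 + length
    return pushes


# The scriptSig hex from the breakdown above, split the same way
SCRIPT_SIG_HEX = (
    "48"
    "3045022100c7c35b89bd1fb5d2afd5ca"
    "d8c8024b0a1a8912627f2e54bc1b725c"
    "6222dddeda02207b3e9edea53337f5b7"
    "34a491789dfd92151db8ebf0dd3c68f3"
    "582c17c912be7601"
    "41"
    "047b2c9f78bd9720490c77d00b879376"
    "c2d271fa1330799146a9c9635b384882"
    "6b9687cec45d78cbee0fe9ae74560c9c"
    "952f750a4b60a7d358e8759fd1f715bd"
    "6a"
)

signature, pubkey = parse_pushes(SCRIPT_SIG_HEX)
print(len(signature), "byte signature;", len(pubkey), "byte pubkey")
```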

Related to privacy, we can see that this transaction uses an uncompressed public key (pubkey). Most modern wallets use compressed public keys (33 bytes), so we can infer that the creator of this transaction is either spending an old output or is using a non-upgraded (or unmaintained) wallet.
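The inference above rests on the length and first byte of the key: an uncompressed public key is 65 bytes and starts with 0x04, while a compressed one is 33 bytes and starts with 0x02 or 0x03. A minimal check, using a hypothetical helper name and made-up placeholder keys rather than real ones, might look like this:

```python
def pubkey_format(pubkey):
    """Classify a serialized public key by its length and prefix byte."""
    if len(pubkey) == 65 and pubkey[0] == 0x04:
        return "uncompressed"
    if len(pubkey) == 33 and pubkey[0] in (0x02, 0x03):
        return "compressed"
    return "unknown"


# Placeholder keys: only the prefix and length matter for this check
fake_uncompressed = b"\x04" + bytes(64)
fake_compressed = b"\x02" + bytes(32)

print(pubkey_format(fake_uncompressed))
print(pubkey_format(fake_compressed))
```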

Going back to our JSON-encoded transaction:

       "sequence" : [...]

The sequence number is a feature added to Bitcoin by Satoshi Nakamoto to enable what he called "high frequency transactions". Unfortunately the feature was found to be insecure soon after Bitcoin was released. There are future plans to repurpose the sequence number for use in other advanced Bitcoin transactions—but for now, almost all transactions disable this feature by setting sequence to its unsigned 4-byte maximum value.

"vout" : [

We come now to the Output Vector (vout) section of the transaction.

   {
        "value" : [...],

The first output here has a value of 0.01664767 bitcoins, worth about $3.95 USD when the transaction was made. Where did those bitcoins come from? In order for this transaction to be valid, those bitcoins had to have been available in the input we saw above.

The amount is implicit in the input, which is why we didn't see it explicitly declared above. When Bitcoin Core validates received transactions, it checks to ensure all the inputs together have enough value to pay for all of the outputs together.

If the total amount available in the inputs is greater than the amount spent to the outputs, the miner who mines this transaction is allowed to claim the remainder as a transaction fee. This is one reason it's dangerous to create your own transactions by hand: if you accidentally forget to spend all of your input value, you'll lose the remainder to miners.
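The fee arithmetic is easy to sketch. The example below uses the input value and the two output values of this transaction as quoted in the recap later in this tutorial; the function name transaction_fee is an illustrative assumption. Decimal is used instead of floating point so that amounts stay exact.

```python
from decimal import Decimal


def transaction_fee(input_values, output_values):
    """Return the implicit fee: total input value minus total output value."""
    fee = sum(input_values, Decimal(0)) - sum(output_values, Decimal(0))
    if fee < 0:
        raise ValueError("outputs spend more than the inputs provide")
    return fee


inputs = [Decimal("0.25282550")]                          # previous output being spent
outputs = [Decimal("0.01664767"), Decimal("0.23607783")]  # payment and change

fee = transaction_fee(inputs, outputs)
print(fee)  # whatever is left over goes to the miner
```

Had the change output been forgotten, the same subtraction shows that the whole 0.23607783 bitcoins would have been added to the fee.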

Note: there's a proposal to allow spenders to specify the max they think they're spending and have the transaction rejected if the value in the inputs exceeds that amount.

       "scriptPubKey" : {
            "asm" : "[...]",
            "hex" : "[...]",
            "reqSigs" : [...],
            "type" : "[...]",
            "addresses" : [
                "[...]"
            ]
        }
    },

The scriptPubKey specifies the conditions that must be satisfied in order for this output to be spent. In this case, the asm field is fairly useful in providing the Bitcoin Script opcodes that get used:

  • OP_DUP duplicates the data item on the top of the stack. When this code runs, that top item should be a public key.
  • OP_HASH160 hashes the data item on the top of the stack using a certain double hash we call hash160: RIPEMD160(SHA256(data)). This is the hash used to create Bitcoin addresses in their binary form (the encoded form starting with the number 1 is a special version of base58 encoding). When this code runs, that data should be the copy of the public key.
  • 82f6...7914 is a HASH160 of the public key whose corresponding private key is authorized to spend these 0.01664767 bitcoins.
  • OP_EQUALVERIFY compares the top two items on the stack to see if they're bit-identical. If they are, it pops both of them off the stack; if they aren't, it terminates the script in failure.
  • OP_CHECKSIG takes the top item on the stack as a public key and the next item on the stack as a signature and ensures that they both were derived from the same private key, as well as ensuring the signature signed the appropriate data from this transaction. (We'll cover what data gets signed in another tutorial.)

All the rest of the information in the block above is computed by Bitcoin Core for our benefit—it does not appear in the transaction directly. The "reqSigs" tells us that only one signature is required to spend this output—we'll look at a multi-sig output in a later tutorial. The type is "pubkeyhash", which we call P2PKH. The address shown here is the base58check-encoded version of the hash160 hash shown above (we'll look at base58check in another tutorial).
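To see how the asm field relates to the hex field, here is a minimal decoder sketch. It only knows the four opcodes discussed above plus direct data pushes, so it is nowhere near a complete Script parser, and script_to_asm is a name invented for this example. Because the output above is elided, the input is the scriptPubKey hex of the previous output, which is quoted in full in the recap later in this tutorial.

```python
# Only the opcodes used by a P2PKH scriptPubKey
OPCODE_NAMES = {
    0x76: "OP_DUP",
    0xA9: "OP_HASH160",
    0x88: "OP_EQUALVERIFY",
    0xAC: "OP_CHECKSIG",
}


def script_to_asm(script_hex):
    """Translate a simple script from hex into space-separated asm."""
    script = bytes.fromhex(script_hex)
    parts = []
    pos = 0
    while pos < len(script):
        opcode = script[pos]
        pos += 1
        if 0x01 <= opcode <= 0x4B:
            # Direct push: the opcode is the number of data bytes that follow
            data = script[pos : pos + opcode]
            if len(data) != opcode:
                raise ValueError("script ends in the middle of a push")
            parts.append(data.hex())
            pos += opcode
        elif opcode in OPCODE_NAMES:
            parts.append(OPCODE_NAMES[opcode])
        else:
            raise ValueError(f"opcode 0x{opcode:02x} is not handled by this sketch")
    return " ".join(parts)


asm = script_to_asm("76a914d120726c7b2b3dd5341ef79c0c19cae78fa9eeb288ac")
print(asm)
```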

   {
        "value" : [...],
        "scriptPubKey" : {
            "asm" : "[...]",
            "hex" : "[...]",
            "reqSigs" : [...],
            "type" : "[...]",
            "addresses" : [
                "[...]"
            ]
        }
    }
],

Here is a second output to this transaction. Most person-to-person transactions on the blockchain have two outputs; can you figure out why?

The answer is that the full value of any inputs to a transaction must be spent or the remainder goes to miners. So typically part of the input value goes to the person being paid and the rest goes back to the spender as "change". This output was worth about $56 USD when created, so we can think of this cafe patron as paying for his $4 drink with $60 and receiving $56 in change.

Early Bitcoin wallets always placed change as the last output in a transaction, but this reduced privacy—it made it easy to figure out the actual amount paid versus the amount returned in change. Better wallets today randomly place change within the sequence of outputs so no external correlation exists.

Let's recap what data we found in the transaction with an eye to the privacy implications:

  1. From the input (not shown previously), we know what transaction outputs were used to fund this transaction. Here are the relevant details:

     {
         "value" : 0.25282550,
         "n" : 1,
         "scriptPubKey" : {
             "asm" : "OP_DUP OP_HASH160 d120726c7b2b3dd5341ef79c0c19cae78fa9eeb2 OP_EQUALVERIFY OP_CHECKSIG",
             "hex" : "76a914d120726c7b2b3dd5341ef79c0c19cae78fa9eeb288ac",
             "reqSigs" : 1,
             "type" : "pubkeyhash",
             "addresses" : [
                 "1L4m6YkTuPpfY5DkZoYXfSH5Nk8f5ZGJs3"
             ]
         }
     }
    
  2. From the previous transaction output, we know a Bitcoin address belonging to the person now spending the bitcoins (1L4m6YkTuPpfY5DkZoYXfSH5Nk8f5ZGJs3).

  3. From the previous transaction output, we also know how many bitcoins that person previously received (0.2528255).
  4. From the outputs in this transaction, we know how many bitcoins that person paid to various other addresses (0.01664767 out and 0.23607783 back to himself as change).
  5. From the locktime, we know this transaction wasn't created with a recent version of Bitcoin Core.
  6. From the uncompressed public key in the scriptSig input, we know either an old output was spent or the user isn't using a high-quality recent wallet.
  7. We know the user still has at least $56 in bitcoins, and that Coupa Cafe now has at least $4 in bitcoins.
  8. We know the spender here returned change to the spending address, reducing his privacy and the privacy of everyone he makes trades with.
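Items 1 and 2 rely on the fact that anyone can turn the hash160 in a scriptPubKey into the familiar address form. The following sketch of base58check encoding assumes version byte 0x00, the prefix used for mainnet P2PKH addresses; base58check_encode is a name chosen for this example. Running it on the hash160 from the previous output should reproduce the address listed in the recap.

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58check_encode(version, payload):
    """Encode a version byte plus payload as a base58check string."""
    data = bytes([version]) + payload
    # The checksum is the first four bytes of the double SHA-256 of the data
    checksum = hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]
    full = data + checksum

    number = int.from_bytes(full, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = BASE58_ALPHABET[remainder] + encoded

    # Each leading zero byte is represented by the character "1"
    leading_zeros = len(full) - len(full.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


hash160 = bytes.fromhex("d120726c7b2b3dd5341ef79c0c19cae78fa9eeb2")
address = base58check_encode(0x00, hash160)
print(address)  # compare with the address shown in the recap
```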

Coinjoin: a secure & decentralized privacy enhancement

Coinjoin-style transactions were invented by Bitcoin developer Gregory Maxwell to make it difficult to perform the type of naïve blockchain analysis we used above. A coinjoin-style transaction has two or more participants (the more, the better). They do something similar to mixing bitcoins, but they do it trustlessly so nobody can steal anyone else’s bitcoins. Retrieve a sample coinjoin transaction using the following code (we’ll use less again—see the instructions at the top of the previous section for help using it).

## Get a decoded coinjoin-style transaction
bitcoin-cli getrawtransaction 92a78def188053081187b847b267f0bfabf28368e9a7a642780ce46a78f551ba 1 | less

As before, we’re replacing some data values with “[…]”. Look at the transaction displayed on your terminal in less for the correct values. We’re also going to delete more of the irrelevant data this time.

Let’s skip directly to the input section:

"vin" : [
    {
        "txid" : "bc7530978073c78fbb0e020a503748130f5e10690a752eb794f6d87dd096988b",
        "vout" : 0,
    },
    {
        "txid" : "461af0f9c71cefe13db48b3dc396834cc19b0624b08aee7420a5f356e91c4992",
        "vout" : 0,
    },
    {
        "txid" : "a1d13badbaa7ea88a1ff5a347d7b715131dcde7616ce7025876e91e75d84a33c",
        "vout" : 0,
    }
],

We see three inputs; recall that the value of these inputs is implicit in the input, and the actual value is only explicitly provided in the referenced output. As a convenience, we’ll provide those output values below:

  ## bc7530978073c78fbb0e020a503748130f5e10690a752eb794f6d87dd096988b:0
  {
      "value" : 0.01053000,
  }

  ## 461af0f9c71cefe13db48b3dc396834cc19b0624b08aee7420a5f356e91c4992:0
  {
      "value" : 0.19280926,
  }

  ## a1d13badbaa7ea88a1ff5a347d7b715131dcde7616ce7025876e91e75d84a33c:0
  {
      "value" : 0.01000000,
  }

Above we see that three inputs are provided—two approximately 0.01 bitcoins in value and one much larger than that.

Now let’s look at the output amounts:

"vout" : [
    {
        "value" : [...],
        "scriptPubKey" : {
            "addresses" : [
                "[...]"
            ]
        }
    },
    {
        "value" : [...],
        "scriptPubKey" : {
            "addresses" : [
                "[...]"
            ]
        }
    },
    {
        "value" : [...],
        "scriptPubKey" : {
            "addresses" : [
                "[...]"
            ]
        }
    },
    {
        "value" : [...],
        "scriptPubKey" : {
            "addresses" : [
                "[...]"
            ]
        }
    }
],

Above we see four outputs, three of equal amount and one that’s much larger. We can assume that the much larger output is change that was returned to the person who contributed the much larger input—but for the three equal amounts, we don’t know from looking at the blockchain which participant received which share.
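The reasoning above can be mimicked by grouping outputs by value. In the sketch below the output values are hypothetical stand-ins (the real ones are on your terminal), and split_outputs is an invented helper. Outputs that share a value with at least one other output are the ones an observer cannot assign to a particular participant.

```python
from collections import Counter
from decimal import Decimal


def split_outputs(values):
    """Separate output values that repeat from those that appear only once."""
    counts = Counter(values)
    shared = [value for value in values if counts[value] > 1]
    unique = [value for value in values if counts[value] == 1]
    return shared, unique


# Hypothetical values, NOT the ones from the transaction above
outputs = [Decimal("0.5"), Decimal("0.01"), Decimal("0.01"), Decimal("0.01")]

shared, unique = split_outputs(outputs)
print("ambiguous outputs:", len(shared))
print("likely change:", unique)
```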

It’s still possible for an investigator to talk to the participants in the coinjoin transaction and discover who controls which bitcoins, but easily automated analysis of blockchain data is prevented (provided the software randomizes the outputs; some software has failed to do this and allowed automated recovery of the association between input and output).

The coinjoin transaction is trustless because all of the participants have to sign the final transaction before it can be added to the blockchain, so they each have a chance to verify that they’re receiving back the correct amount of bitcoins before signing. This means it can provide increased privacy without any reduction in security.

Exercise: calculate the transaction fee paid by the transaction above.

Anatomy of a soft fork: the BIP30 & BIP34 soft forks

Bitcoin full validation nodes (full nodes) such as Bitcoin Core each independently validate each block on the Bitcoin blockchain to determine whether or not those blocks follow Bitcoin’s rules, such as the rules about the inflation schedule (i.e., the 21 million bitcoin limit). This independent validation allows them to deterministically reach consensus about the state of the Bitcoin ledger without using voting or other corruptible processes.

The rules these nodes use to deterministically reach consensus are called consensus rules; these consensus rules aren’t perfect, so they occasionally need patches to fix bugs or upgrades to add new features. There are two known ways to change the consensus rules:

  1. A hard fork change that causes old nodes to reject blocks created by miners following the new rules.

  2. A soft fork change that upgraded miners enforce by rejecting blocks created by non-upgraded miners (if necessary).

In the first two years of Bitcoin when there was only one Bitcoin software program (now called Bitcoin Core), Satoshi Nakamoto implemented several soft forks by simply updating the code in Bitcoin Core. Now that there are hundreds of Bitcoin programs, more coordination is needed and a Bitcoin Improvement Proposal (BIP) process was created to manage soft forks, hard forks, and other changes that affect multiple programs.

Let’s take a quick look at two closely-related soft forks designed to fix a bug that allowed a block to contain a transaction with the same txid as an earlier transaction. Please open BIP30 in a browser on your laptop and read the short Abstract, Motivation, and Specification sections.

Now let’s look at one set of the duplicated transactions in the blockchain.

## Original transaction
bitcoin-cli getblock $( bitcoin-cli getblockhash 91722 ) | grep -A1 tx

## Duplicate transaction
bitcoin-cli getblock $( bitcoin-cli getblockhash 91880 ) | grep -A1 tx

What happens if we try to retrieve those transactions?

## Retrieve the duplicated transaction
bitcoin-cli getrawtransaction e3bf3d07d4b0375638d5f1db5255fe07ba2c4cb067cd81b84ee974b6585fb468 1

Notice from the output that only one of the transactions is retrieved—the later one. The earlier transaction is now entirely unspendable, meaning the 50 bitcoins in it have been lost forever.

From the specification section of BIP30 we can see the first of the soft forks to prevent future cases of this bug (rephrased slightly):

Blocks are not allowed to contain a transaction whose identifier matches that of an earlier, not-fully-spent transaction in the same chain. This rule initially applied to all blocks whose timestamp is after March 15, 2012, 00:00 UTC

Miners that did not upgrade to the new code by 15 March 2012 could still have created blocks containing duplicate transactions, but if they had, upgraded miners would’ve rejected their blocks.

Exercise: Take a moment and see if you can identify a possible flaw with the plan described above.

Time-based soft forks (a type of “flag day” upgrade) have a serious problem: if the upgraded miners don’t control a majority of the network hash rate at the switchover time, the upgraded miners may hard fork off of the network—that is, they’ll become incompatible with the non-upgraded miners and there will be two incompatible forks of the blockchain.

This isn’t a problem for hard forks where a potential incompatibility exists no matter what upgrade mechanism is used.

Happily, there were no problems with the BIP30 soft fork, but a better fix for the duplicate transaction problem described in BIP30 was implemented using a better soft forking mechanism in BIP34.

Please take a moment to open BIP34 in a web browser on your laptop, and read the entire document (it’s short).

Although the solution described in BIP30 was effective at preventing the problem, it also slowed down full node transaction validation by requiring extra lookups for each transaction. BIP34 prevents the problem by requiring miners to insert some unique data into their coinbase transactions, preventing duplicates to the maximum extent available with SHA256d’s collision and second-preimage resistance.

In order to implement BIP34, a gradual soft fork upgrade was used (the first time for this particular method):

  • Individual miners indicated that they were ready to add the extra data to their coinbase transactions by producing version 2 blocks.
  • Once 750 of the last 1,000 blocks were version 2 (or higher), any new version 2 blocks became invalid if they didn’t include the extra data in their coinbase transactions.
  • Once 950 of the last 1,000 blocks were version 2 (or higher), all blocks had to be version 2 or higher, and they had to comply with the rule to add the extra data to their coinbase transactions.
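The two thresholds above can be expressed as a small function over the versions of the most recent 1,000 blocks. This is only a sketch of the counting rule as described in the list; the function name and the stage labels are our own, and Bitcoin Core's actual implementation is not shown here.

```python
def bip34_stage(last_1000_versions):
    """Return which BIP34 rollout stage applies, given the last 1,000 block versions."""
    if len(last_1000_versions) != 1000:
        raise ValueError("expected exactly 1,000 block versions")

    upgraded = sum(1 for version in last_1000_versions if version >= 2)

    if upgraded >= 950:
        return "all blocks must be version 2+ and include the extra coinbase data"
    if upgraded >= 750:
        return "version 2 blocks must include the extra coinbase data"
    return "signalling only"


# Synthetic windows for illustration
print(bip34_stage([1] * 400 + [2] * 600))
print(bip34_stage([1] * 250 + [2] * 750))
print(bip34_stage([1] * 50 + [2] * 950))
```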

Let’s sample the block version numbers before and shortly after version 2 (or higher) became required at block height 227,930. (Note: the last version 1 block was somewhat earlier, at height 227,835.)

## Get the version number of every 100th block between 195,000 and 230,000
for i in $( seq 195000 100 230000 ) \
  ; do echo -n "$i " \
  ; bitcoin-cli getblock $( bitcoin-cli getblockhash $i ) | grep 'version' \
; done

Some people call this blockchain voting, although it may be more accurate to call it upgrade readiness signalling. The goal is to ensure that the overwhelming majority of miners are ready to enforce a new consensus rule before that rule becomes active.

Hard fork analysis: block size voting

As this document is being written, the Bitcoin community is debating when and how the block size limit should be raised, with almost all proposals suggesting a hard forking change. One popular proposal, BIP100, suggests that miners should be allowed to vote on what block size they want within the range of 1MB to 32MB.

Although there is currently no code that implements BIP100, many miners are expressing support for it by placing the ASCII string “BIP100” in the special coinbase field of their coinbase transactions, or using BIP100-style block size voting strings which start with the ASCII string “BV” followed by 7 to 8 ASCII-encoded digits.
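Before looking at the full script, it may help to see the string matching on its own. The sketch below uses the same regular expression as the script later in this section, applied to made-up coinbase contents; the sample strings and the helper name signals_bip100 are assumptions for illustration, not data taken from real blocks.

```python
import re

# Matches the literal "BIP100" or "BV" followed by at least seven digits
BIP100_PATTERN = re.compile(b"BIP100|BV[0-9]{7}")


def signals_bip100(coinbase_hex):
    """Return True if the hex-encoded coinbase field contains a BIP100 string."""
    return BIP100_PATTERN.search(bytes.fromhex(coinbase_hex)) is not None


# Hypothetical coinbase contents, hex-encoded the way Bitcoin Core reports them
samples = {
    "size vote": b"/BV8000000/".hex(),
    "plain tag": b"mined by example pool BIP100".hex(),
    "no signal": b"/P2Pool/".hex(),
}

for label, coinbase_hex in samples.items():
    print(label, signals_bip100(coinbase_hex))
```

Because search looks for the pattern anywhere in the data, a "BV" string with eight digits also matches: its first seven digits satisfy the expression.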

The following Python3 script will connect to Bitcoin Core and scan the most recent 100 blocks on your local blockchain for the above BIP100 strings and print a tally that gives you an idea about what percentage of the network hash rate is voting for BIP100.

Before running the code below, you need to install the Python3 requests library, which the script uses to communicate with bitcoind over JSON-RPC.

## Install the Python3 requests module
sudo pip3 install requests

You will be prompted for the password you used to log in to this machine. After you’ve installed the module, paste the following code into a file named fork-bip100.py.

## (python3) Objective: count the number of blocks voting for BIP100 or
## using the BVnnnnnnn convention

# Requires: sudo pip3 install requests

## WARNING: read and understand the following link before writing
## production JSON-parsing code for Bitcoin:
## https://en.bitcoin.it/wiki/Proper_Money_Handling_%28JSON-RPC%29

import requests
import json
from codecs import decode
import re

### Code for reading Bitcoin Core configuration file
### taken from https://github.com/massanchik/bitcoin-python3/blob/master/src/bitcoinrpc/config.py
# Copyright (c) 2010 Witchspace <witchspace81@gmail.com>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
"""
Utilities for reading bitcoin configuration files.
"""


def read_config_file(filename):
    """
    Read a simple ``'='``-delimited config file.
    Raises :const:`IOError` if unable to open file, or :const:`ValueError`
    if an parse error occurs.
    """
    f = open(filename)
    try:
        cfg = {}
        for line in f:
            line = line.strip()
            if line and not line.startswith("#"):
                try:
                    (key, value) = line.split('=', 1)
                    cfg[key] = value
                except ValueError:
                    pass  # Happens when line has no '=', ignore
    finally:
        f.close()
    return cfg


def read_default_config(filename=None):
    """
    Read bitcoin default configuration from the current user's home directory.
    Arguments:
    - `filename`: Path to a configuration file in a non-standard location (optional)
    """
    if filename is None:
        import os
        import platform
        home = os.getenv("HOME")
        if not home:
            raise IOError("Home directory not defined, don't know where to look for config file")

        if platform.system() == "Darwin":
            location = 'Library/Application Support/Bitcoin/bitcoin.conf'
        else:
            location = '.bitcoin/bitcoin.conf'
        filename = os.path.join(home, location)

    elif filename.startswith("~"):
        import os
        filename = os.path.expanduser(filename)

    try:
        return read_config_file(filename)
    except (IOError, ValueError):
        pass  # Cannot read config file, ignore

### End: borrowed code

## Bitcoin Core by default listens on localhost:8332
url = "http://localhost:8332"

## Your Bitcoin Core credentials are listed in the file
## .bitcoin/bitcoin.conf as the rpcuser and rpcpassword settings.
## This code will automatically load them into the script.
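config = read_default_config()
## read_default_config() returns None if the file could not be read
if not config or 'rpcuser' not in config or 'rpcpassword' not in config:
  raise SystemExit("Could not read rpcuser and rpcpassword from bitcoin.conf")
auth = (config['rpcuser'], config['rpcpassword'])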

## Prepare a couple counters for the test
bip100_blocks = 0
non_bip100_blocks = 0

## Prepare a regular expression to search binary data for the strings
## associated with BIP100
bip100 = re.compile(b'BIP100|BV[0-9]{7}')

## A simple function to get data from Bitcoin Core over JSON-RPC
def request(payload):
  return requests.post(
    url,
    data=json.dumps(payload),
    auth=auth
  ).json()

## Get the height of the blockchain
block_height = request({
  "method": "getblockchaininfo"
})['result']['blocks']

## Make sure we only parse relevant information
if block_height < 360000:
  print("Block height too low; there are no blocks indicating support for BIP100.")
  print("Try letting Bitcoin Core run for a while to download recent blocks")
  print("and then try again.")
  exit(1)


## Get the header hash of the most recent block
block = request({
  "method": "getbestblockhash"
})['result']


## Loop over the last 100 blocks and process their coinbase
## transactions. This code is not guaranteed to work with Bitcoin Core's
## default settings more than 100 blocks in the past
for i in range(0,100):
  ## Get all the data from the block
  block_data = request({
    "method": "getblock",
    "params": [ block ]
  })

  ## The header hash of the previous block, which we'll use in the next loop
  block = block_data['result']['previousblockhash']

  ## The txid of the coinbase transaction
  coinbase_txid = block_data['result']['tx'][0]

  ## Each coinbase transaction has a single input; this gets it.
  ## (This special input is called the coinbase field)
  coinbase_field = request({
    "method": "getrawtransaction",
    "params": [ coinbase_txid, 1 ]
  })['result']['vin'][0]['coinbase']

  ## Convert the coinbase field to binary for searching
  coinbase_field_binary = decode(coinbase_field, 'hex')

  ## Increment counters based on matching
  if bip100.search(coinbase_field_binary):
    bip100_blocks += 1
  else:
    non_bip100_blocks += 1

## Show our results
print("BIP100 blocks: ", bip100_blocks)
print("Non-BIP100 blocks: ", non_bip100_blocks)

Once you’ve saved the fork-bip100.py file, you can run it with the following command:

## Run fork-bip100.py
python3 fork-bip100.py

It may take a minute or more to run, but afterwards it will print two lines to your console: how many of the last 100 blocks voted for BIP100 and how many did not.