CS 6410: Advanced Systems (Fall 2017)

MVB: A Minimally-Viable Blockchain

(09/11/17 - 05:02PM)

Introduction

In this project you will be implementing a minimally viable blockchain, called MVB, and – consequently – will instantiate a new coin called MVBCoin.

Disclaimer #1: We are partially misleading you. While you will build many of the core components of a blockchain, there are certain components of actual cryptocurrencies (such as Bitcoin) which we will not have you worry about. Without these components, MVBCoin is actually unsafe and susceptible to attacks (such as double spends). Therefore, hold back on trading your MVBCoins!

Disclaimer #2: We assume that you have already familiarized yourself with the Bitcoin paper [1, 2], and have some rudimentary understanding of the protocol. This document will avoid any algorithmic descriptions, and rather simply focus on the technical details. Please read [1, 2] for protocol specifics.

Deliverables

You will be submitting a .zip file of a directory which you will call mvbcoins. The internals of the directory will be laid out in the following manner:

mvbcoins/
    Makefile
    README.txt
    src/

Makefile is executed by simply calling make (See Makefile. Internally, the Makefile should compile your sources and output a file called node inside mvbcoins/
README.txt will include any special information you want us to know about your submission. This could include potentially some steps we need to take before running make. For example, your MVBCoins node is required to work on Fractus and may require some instructions for installing a software package. However, most likely (and ideally), README.txt should be empty and without any special instructions. But if you must absolutely request some special instructions, this is the place to put them. We will, however, not accept special instructions on modification of system-wide environment variables, such as – for example – special setups to GOPATH and GOROOT if you are using GO.

After calling make, mvbcoins will look as follows

mvbcoins/
    Makefile
    README.txt
    node
    src/

If node is compiled in some strange directory, you will be penalized.
The output executable node will be run with the following command line arguments:

bootstrap: If ‘’–bootstrap’’ is passed to node, all other arguments are ignored and node becomes a bootstrap node.
numworkers: The number of worker nodes, e.g. ‘’–numworkers 2’’.
numcores: The number of computational cores available to your node, e.g. ‘’–numcores 2’’.
ports: A space separated list of ports to attach to worker threads, e.g. ‘’–ports 10000 10001 10002’’. The number of workers must be the same as the number of ports passed.
numtxinblock: The number of transactions to be put in a block. Defaults to 50000.
difficulty: Difficulty parameter. Defaults to 1.

node can be written in any of the following languages: C, C++, Javascript, Python, Golang, Java, or Rust. If you want to use a language not listed, ping the TA (Kevin). If you are submitting your program in an interpreted language (such as Python), please ensure that it is wrapped up in a shell command, which then calls the interpreter. All tests will be run as:

./node arguments

Important: Your submission must work on Fractus. In general, a 32/64-bit Linux machine (not any arbitrary Unix, and definitely not Windows), but, in particular, Fractus from MP0.a (Eucalyptus) or MP0.b (OpenStack).

If your project cannot be compiled from the command line using the make file and needs some special IDE or tools, your submission will not be accepted.

Initial steps

You will download on CMS an executable called tester. This executable takes in several parameters. Run ./tester -h to see the list of all accepted arguments and how this tool will help you lay out your wire-protocol requirements.

As an example:

./tester --ports 10000 --testtransaction --withdoublespends --concurrentconns 10 --numtransactions 1000 --verbose

Will launch 10 concurrent nodes that will bombard your implementation on port 10000 using 1000 sequential transactions. Careful about these numbers here. If you enable verbose for a lot of concurrent connections and transactions, you cause your terminal to flush quite a bit of text. If you accidentally write --concurrentconns 1000000 against your single process (which may not be suited yet to handling many connections), your machine may crash. tester is aggressive, so careful with the commands.

You will also download an executable called server. This executable works in the inverse direction of tester, i.e. it simply absorps anything you send to it and doesn’t reply back. It simply prints it out. Careful about how you connect to server, because it may print things out in a jumbled order if you send multiple concurrent messages to it.

Both of these tools will be helpful tools in sanity checks.

Architecture

Conventions: All messages will be raw bytes. By convention, if we say:

M = [VALUE1(6B)||VALUE2(32B)]

This means that the message is a 38 byte message, where the first 6 bytes are VALUE1 and the next 32 bytes are VALUE2.

Now, let’s say VALUE1 is, for example, a port number. If the port is 9000, then the most efficient way of encoding the value 9000 is in a 14 bit number, which you round up to 16 bits, for a total of 2 bytes. YOU WILL NOT BE EFFICIENT. You will, on the other hand, encode each digit as a single character, such that each digit is 1 byte. Furthermore, you must prepend the number with any leading zeros. In other words, you may not encode the port with only 4 bytes. You must encode with 6 bytes. In that case, VALUE1 must be the encoded byte string b’009000’, which as a raw byte array is [48 48 57 48 48 48].

Use tester and server to sanity check your message parsing.

Launch procedure: The first node to be launched must be the bootstrap node. This node defaults to port 8888, and is the main entry point for discovery for all other nodes. You can implement the internals of the boostrap node however you would like. However, once launched the bootstrap node remains online and listens for messages of the format:

PORT(6B)||NODE_ADDRESS(32B)

Where PORT is the port number of a worker thread (to be defined), and NODE_ADDRESS is the node address (a node has multiple worker threads). Make NODE_ADDRESS to be:

NODE_ADDRESS = SHA256("your_netid")

The bootnode will respond with a message of the form

OP_CODE_BOOTNODE(1B)||NUM_PORTS(6B)||PORT_1(6B)||PORT_2(6B)...||PORT_N(6B)

where NUM_PORTS = N, and OP_CODE_BOOTNODE defaults to '4'. Make sure that the bootnode stays alive and repeatedly broadcasts its known set of ports.

Messages

There will be five types of messages that nodes can exchange with one another:

SEND_TRANSACTION
(1B, 32B, 32B, 32B, 32B)
[0, SENDER, RECEIVER, AMOUNT, TIMESTAMP]
Description: The first byte is the opcode, and for SEND_TRANSACTION it defaults to the number 0. The next 128 bytes are SENDER, RECEIVER, AMOUNT, and TIMESTAMP.
SEND_BLOCK
(1B, 32B, 32B, 32B, 32B, 32B, 128B * numTransactionsInBlock)
[1, NONCE, PRIOR HASH, HASH, BLOCKHEIGHT, MINER_ADDRESS, BLOCKDATA]
Description: The first byte is the opcode, defaults to 1.
The NONCE is the nonce of this block, which must be mined.
PRIOR HASH is the hash of the prior block (genesis block hash PRIOR_HASH as SHA256(0)).
HASH = SHA256(NONCE||PRIOR HASH||BLOCKHEIGHT||MINER_ADDRESS||BLOCKDATA).
BLOCKHEIGHT is the height of the block (genesis block has height 0).
MINER_ADDRESS is your node address, as described above.
BLOCKDATA = [SENDER||RECEIVER||AMOUNT||TIMESTAMP||SENDER||RECEIVER||AMOUNT||TIMESTAMP…].
A block has numTransactionsInBlock number of transactions, for a total of 128B * numTransactionsInBlock bytes.
GET_BLOCK
(1B, 32B)
[2, BLOCKHEIGHT]
GET_HASH
(1B, 32B)
[3, BLOCKHEIGHT]
SEND_PORTS
(1B, 6B, 6B * NUM_PORTS)
[4, NUM_PORTS, PORT_1, PORT_2, ..., PORT_N]

Where numTransactionsInBlock defaults to 50000.

Note: It is extremely important to make sure your messages conform to the specified format above. If even a single byte is off, it may be possible for your submission to fail. Use tester and server to sanity check your message formats.

Data Structures

It is up to you to manage the internal datastructures of your node. However, here are a few helpful tips on what you should be maintaining:

The Blockchain: The blockchain itself. Caution: You should be careful about wrapping up your blockchain(s) in a datastructure that can maintain multiple forks of the blockchain if need be.

The Mempool: The mempool is the set of all transactions that have not yet been placed in a block that has been mined. In distributed-computing terms, this is the list of all transactions that have been received, but not yet committed.

The UTXO Set: The UTXO set is the set of all addresses that can spend money. Initially, the UTXO set is initialized as follows:

for i = 0 to 100:
    add {sha256(convert_to_byte_array(string(i))), 100000} to UTXO set

I.e. there are 100 addresses that can spend 100000 MVBCoins each. Remember to do this initialization when playing around with tester, because it assumes you have initialized this set of accounts.

WARNING: Here be dragons, or however the saying goes! It is important to realize that some transactions may need to be rolled-back. For example, suppose you mine a block with height 500, but another block with the same height happened to be mined at the same time (the other block is also valid). Your block got pushed back by the rest of the miners. You now must somehow be careful about rolling back all the transactions that you may have “pre-committed”. I.e. you must roll-back all account changes to your UTXO set.

Milestone #1 (Due: 09/19/17 11:59pm)

For the first milestone, you will be writing a single-process, single-threaded version of MVB. Notice that this means that the command line arguments to node will slight change in functionality. --numworkers will default to 1 and will ignore any number you supply to it, and --ports will only parse the first port and ignore the rest. Your node must do the following:

Remain quiescent until a new transaction is received. (Note, new transactions are received via the SEND_TRANSACTION message.)
Upon receiving a new transaction, check the validity of the transaction against the UTXO set. (I.e. ensure that the transaction is from a SENDER with a positive amount of MVBCoins in their account.)
If the transaction is valid, add to the latest unmined block, and propagate the transaction (i.e. propagate the transaction by sending it to all other nodes via the SEND_TRANSACTION message.) If the transaction is invalid, simply disregard it. Note that by valid we mean that the sender exists and has enough funds.
If the current unmined block reaches numTransactionsInBlock number of transactions, it is time to mine.
To mine, first compute HASH = sha256(NONCE, PRIOR HASH, BLOCKHEIGHT, MINER_ADDRESS, BLOCKDATA), where NONCE is the randomly chosen 32 byte value for this puzzle, PRIOR HASH is the 32 byte hash of the prior block, BLOCKHEIGHT is the 32 byte value of the height of this block (genesis block starts at 0), MINER_ADDRESS is 32 byte address of your node, and BLOCKDATA is the byte array of transactions concatenated together, for a total of 128 bytes * numTransactionsInBlock. After computing the 32 byte HASH using your chosen NONCE, ensure that the first 3 bytes are integer 48 (this is ‘0’). Once an appropriate nonce is found, the block has been mined, and must be broadcast to all peers.
If – while mining – a new block with the same height as the current unmined one is received, the node must drop its current work and accept the new block (no need to ensure that all transactions are valid due to our trust assumption).
If you receive a block that has a height greater than your latest height, then you must use messages GET_BLOCK and GET_HASH to synchronize your local state of the blockchain with the global state of the blockchain. GET_BLOCK will return the block at the requested height, whereas GET_HASH will return the hash of the block at the requested height.

Note that we are purposefully ignoring the internals of the algorithms chosen to perform your blockchain management. We impose no requirements on how you manage your UTXO set, and how you ensure that you roll-back transactions that you have pre-committed after receiving a new block. We only require that if your node detects multiple blockchains, then the winning blockchain is the one with at least 6 blocks over all others. As a node, you always mine only for the longest blockchain.

What/How to submit (Milestone #1)

Zip your entire mvbcoins software distribution directory and submit the output mvbcoins.zip. Use the following command, exactly: zip -r mvbcoins.zip mvbcoins

Milestone #2 (Due: 10/03/17 11:59pm)

For the second milestone, you will optimize your node, by making usage of concurrency.

Unlike the first milestone, the second milestone will be multithreaded, where --numworkers flag will now indicate the number of working ports that your node supports. You can think of this architectural design as a collection of machines, each listening in on a unique port. As a reminder, --numworkers must be the same as the ports passed using --ports flag.

We will also add an additional flag, --numcores, which indicates to your node how many cores there are available. Make usage of this flag in your architectural design (e.g. --numcores can be the number of threads in a thread pool). So, while --numworkers indicates the number of “machines”, --numcores indicates the number of threads you should be making utilization of. For example, if --numworkers is 2 and --numcores is 4, then you will be listening in on two ports, and the aggregate of all ports should use 4 threads.

What/How to submit (Milestone #2)

You will be handing in two things:

Submit a zip of your entire mvbcoins software distribution directory, named mvbcoins.zip. Use the following command, exactly: zip -r mvbcoins.zip mvbcoins
Submit a document describing your performance. It needs to include the following:
a. Initial performance at the end of milestone 1, final performance at the end of milestone 2, and how you improved performance.
b. Include how you profiled and discovered bottlenecks within your implementation.
c. How you designed your second milestone submission to overcome the bottlenecks and take best advantage of the available resources (i.e. your architecture).
d. A graph where the y-axis is the number of blocks/second, and the x-axis is --numcores. Assume default --numtxinblock and 2 --numworkers.
e. Describe what happens to your implementation if you increase --numworkers, but keep --numcores fixed? Similarly, what happens if you increase --numcores, but keep --numworkers fixed?

Assumptions

You can assume that other nodes (miners) are not malicious and will only mine valid blocks. However, you may receive transactions from clients that are double spends (or plain invalid). In other words, when you receive blocks (either from broadcasts or from explicit synchronization requests), you can assume the block you receive back has been correctly mined and has no invalid transactions (from the view of the miner that sent you the block). You must be careful about rolling back transactions appropriately if you detect a fork (as we warned before).

FAQ

TCP or UDP?

TCP.

What should the boostrap node do when it gets a new node connection?

Send all known ports to all known ports. If a node has gone down, remove it from the known-port-list.

I'm confused about the number of worker threads. Is there only one miner?

There are multiple miners (multiple instances of the process node), but each of the miners has only a single thread.

What should I do as soon as I get a new transaction?

Check if the transaction is valid, then to all other miners using the SEND_TRANSACTION message.

References

[1] Bitcoin: A peer-to-peer electronic cash system. Nakamoto, Satoshi. Consulted 1.2012 (2008): 28.

[2] Bitcoin and Cryptocurrency Technologies: A Comprehensive Introduction. A. Narayanan, J. Bonneau, E. Felten, A. Miller, S. Goldfeder. Princeton University Press; 2016.
Read Chapters 1 and 2.