Hadoop File Format
This Hadoop file format reads blocks and transactions from files in HDFS containing crypto ledger data. They can be processed by any MapReduce/Tez/Spark application. Currently the following crypto ledgers are supported:
- Bitcoin. This module provides three input formats (a usage sketch follows this list):
- BitcoinBlockInputFormat: Deserializes blocks containing transactions into Java objects. Each record is an object of the class BitcoinBlock containing transactions (class BitcoinTransaction). Most suitable if you want flexible analytics. The key (i.e. unique identifier) of the block is currently a byte array containing hashMerkleRoot and prevHashBlock (64 bytes).
- BitcoinRawBlockInputFormat: Each record is a byte array containing the raw Bitcoin block data. The key (i.e. unique identifier) of the block is currently a byte array containing hashMerkleRoot and prevHashBlock (64 bytes). Most suitable if you are only interested in a small part of the data and do not want to waste time on deserialization.
- BitcoinTransactionInputFormat: Deserializes Bitcoin transactions into Java objects. Each record is an object of class BitcoinTransaction. Transactions are identifiable by their double hash value (32 bytes) as specified in the Bitcoin specification, which makes it easy to link the inputs of a transaction to the originating transaction. Records do not contain block header data. This makes sense if you want to analyse each transaction independently anyway (e.g. if you want to do some analytics on the scripts within a transaction and combine the results later on).
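For illustration, here is a minimal MapReduce driver that counts transactions using the block format. The package and class names (org.zuinnote.hadoop.bitcoin.format...) and the BitcoinBlock.getTransactions() accessor are assumptions based on the description above; verify the exact names against the project's javadoc.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

// Assumed package/class names -- verify against the hadoopcryptoledger javadoc
import org.zuinnote.hadoop.bitcoin.format.common.BitcoinBlock;
import org.zuinnote.hadoop.bitcoin.format.mapreduce.BitcoinBlockFileInputFormat;

public class BitcoinTransactionCount {

  // For each deserialized block, emit the number of transactions it contains
  public static class BlockMapper
      extends Mapper<BytesWritable, BitcoinBlock, Text, IntWritable> {
    private static final Text KEY = new Text("transactions");

    @Override
    protected void map(BytesWritable key, BitcoinBlock block, Context context)
        throws IOException, InterruptedException {
      // getTransactions() returning a list is an assumption
      context.write(KEY, new IntWritable(block.getTransactions().size()));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "bitcoin-transaction-count");
    job.setJarByClass(BitcoinTransactionCount.class);
    job.setMapperClass(BlockMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Plug in the hadoopcryptoledger block input format
    job.setInputFormatClass(BitcoinBlockFileInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```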
Note that the Hadoop File Format is available on Maven Central, so you no longer need to build it and publish it to a local Maven repository in order to use it.
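The dependency can then be declared directly in your build. A minimal Gradle sketch, assuming the coordinates com.github.zuinnote:hadoopcryptoledger-fileformat (verify group, artifact, and current version on Maven Central):

```
dependencies {
    // Coordinates and version are assumptions -- check Maven Central
    compile group: 'com.github.zuinnote', name: 'hadoopcryptoledger-fileformat', version: '1.0.1'
}
```

Alternatively, to build from source and publish to your local Maven repository: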
Execute:
git clone https://github.com/ZuInnoTe/hadoopcryptoledger.git hadoopcryptoledger
You can build the library by changing to the directory hadoopcryptoledger/inputformat and running the following command:
../gradlew clean build publishToMavenLocal
- Count the number of transactions from files containing Bitcoin Blockchain data
- Count the total number of inputs of all transactions from files containing Bitcoin Blockchain data
- Use Spark to count the number of transactions from files containing Bitcoin Blockchain data (a minimal sketch follows this list)
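The Spark variant reduces to a few lines via newAPIHadoopFile, Spark's standard entry point for Hadoop input formats. A minimal sketch in Java, under the same assumptions about class and package names as the MapReduce example above:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.BytesWritable;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Assumed package/class names -- verify against the hadoopcryptoledger javadoc
import org.zuinnote.hadoop.bitcoin.format.common.BitcoinBlock;
import org.zuinnote.hadoop.bitcoin.format.mapreduce.BitcoinBlockFileInputFormat;

public class SparkTransactionCount {
  public static void main(String[] args) {
    SparkConf sparkConf = new SparkConf().setAppName("bitcoin-transaction-count");
    try (JavaSparkContext sc = new JavaSparkContext(sparkConf)) {
      // Read blocks via the Hadoop input format; keys are the 64-byte block identifiers
      JavaPairRDD<BytesWritable, BitcoinBlock> blocks = sc.newAPIHadoopFile(
          args[0], BitcoinBlockFileInputFormat.class,
          BytesWritable.class, BitcoinBlock.class, new Configuration());
      // Sum the per-block transaction counts (getTransactions() is an assumption)
      long total = blocks.values()
          .map(block -> (long) block.getTransactions().size())
          .reduce(Long::sum);
      System.out.println("Total transactions: " + total);
    }
  }
}
```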
The following configuration options exist (a configuration sketch follows this list):
- "io.file.buffer.size": Size of io Buffer. Defaults to 64K
- "hadoopcryptoledger.bitcoinblockinputformat.maxblocksize": Maximum size of a Bitcoin block. Defaults (since version 1.0.1) to: 2M. If you see exceptions related to this in the log (e.g. due to changes in the Bitcoin blockchain) then increase this.
- "hadoopcryptoledger.bitcoinblockinputformat.filter.magic": A comma-separated list of valid magics to identify Bitcoin blocks in the blockchain data. Defaults to "F9BEB4D9" (Bitcoin main network). Other Possibilities are are (https://en.bitcoin.it/wiki/Protocol_documentation) F9BEB4D9 (Bitcoin main network), FABFB5DA (testnet) ,0B110907 (testnet3), F9BEB4FE (namecoin), FBC0B6DB (Litecoin), FCC1B7DC (Litecoin Testnet)
- "hadoopcryptoledeger.bitcoinblockinputformat.usedirectbuffer": If true then DirectByteBuffer instead of HeapByteBuffer will be used. This option is experimental and defaults to "false".
- "hadoopcryptoledeger.bitcoinblockinputformat.issplitable" (since version 1.0.1): if true then we use the default Hadoop FileInputFormat mechanism to split files (if possible). This implies using a heuristic to find the start of a BitcoinBlock using the magic number. While this should work normally in all of the cases, it cannot be excluded that it uniquely marks the start of a Bitcoin block (e.g. in case it is part of a hash). Defaults to "false". In case of "false" it is recommended to create multiple files of at least the size of one or multiple HDFS blocks containing Bitcoin Blockchain data.
Understanding the structure of Bitcoin data:
Blocks: https://en.bitcoin.it/wiki/Block
Transactions: https://en.bitcoin.it/wiki/Transactions