Use Spark and Scala with Bitcoin Blockchain data
This is a Spark application written in Scala (a Java version is also available) that demonstrates some of the capabilities of the hadoopcryptoledger library. It takes as input a set of files on HDFS containing Bitcoin blockchain data and returns as output the total number of transaction inputs found in that data. It has been tested successfully with the Cloudera Quickstart VM 5.5 and HDP Sandbox 2.5, using Spark 1.5; other Hadoop distributions should work equally well.
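The counting logic itself can be illustrated without Spark or the library. The following is a minimal sketch in plain Scala, not the code of the example application: the case classes `Block`, `Transaction` and `TxInput` are hypothetical stand-ins rather than hadoopcryptoledger types, and the local `Seq` stands in for the distributed collection of blocks that the real job reads from HDFS.

```scala
// Hypothetical, simplified model of Bitcoin data (NOT the hadoopcryptoledger classes).
final case class TxInput(prevTxHash: String, prevOutputIndex: Long)
final case class Transaction(inputs: Seq[TxInput])
final case class Block(transactions: Seq[Transaction])

object InputCounterSketch {
  // Map each block to its number of transaction inputs, then sum the per-block counts.
  def totalInputs(blocks: Seq[Block]): Long =
    blocks.map(b => b.transactions.map(_.inputs.size.toLong).sum).sum

  def main(args: Array[String]): Unit = {
    // Two synthetic blocks: one transaction with 1 input, then two transactions with 2 and 1 inputs.
    val blocks = Seq(
      Block(Seq(Transaction(Seq(TxInput("aa", 0L))))),
      Block(Seq(
        Transaction(Seq(TxInput("bb", 0L), TxInput("bb", 1L))),
        Transaction(Seq(TxInput("cc", 3L)))
      ))
    )
    println(totalInputs(blocks)) // prints 4
  }
}
```

In a Spark job the same map-then-sum shape would typically be applied to an RDD of blocks instead of a local `Seq`, so that the per-block counts are computed in parallel.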
See here how to fetch Bitcoin blockchain data.
After the data has been copied to HDFS, you are ready to use the example.
Clone the repository:
git clone https://github.com/ZuInnoTe/hadoopcryptoledger.git hadoopcryptoledger
You can build the application by changing to the directory hadoopcryptoledger/examples/spark-scala-bitcoinblock and using the following command:
sbt clean assembly test it:test
This also runs the unit tests and the integration tests.
You will find the jar "example-hcl-spark-scala-bitcoinblock.jar" in ./target/scala-2.10.
Make sure that the output directory is clean:
hadoop fs -rm -R /user/bitcoin/output
Run the following command to submit the job using a local master with 8 worker threads:
spark-submit --class org.zuinnote.spark.bitcoin.example.SparkScalaBitcoinBlockCounter --master local[8] ./target/scala-2.10/example-hcl-spark-scala-bitcoinblock.jar /user/bitcoin/input /user/bitcoin/output
After the Spark job has completed, you will find the result in /user/bitcoin/output. You can display it using the following command:
hadoop fs -cat /user/bitcoin/output/part-00000
Understanding the structure of Bitcoin data:
Blocks: https://en.bitcoin.it/wiki/Block
Transactions: https://en.bitcoin.it/wiki/Transactions
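To make the block structure more concrete: a Bitcoin block begins with an 80-byte header consisting of a version (4 bytes), the previous block hash (32 bytes), the Merkle root (32 bytes), a timestamp, the compact target "bits" and a nonce (4 bytes each), with the integer fields stored little-endian. The following is a minimal sketch in plain Scala of reading such a header. `BlockHeader` and `parseHeader` are hypothetical names that are not part of hadoopcryptoledger, and the sample bytes are synthetic.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical container for the six header fields (NOT a hadoopcryptoledger class).
final case class BlockHeader(
  version: Int,
  prevBlockHash: Array[Byte], // 32 bytes
  merkleRoot: Array[Byte],    // 32 bytes
  time: Long,                 // unsigned 32-bit Unix timestamp
  bits: Long,                 // unsigned 32-bit compact target
  nonce: Long                 // unsigned 32-bit nonce
)

object BlockHeaderSketch {
  val HeaderSize = 80

  def parseHeader(bytes: Array[Byte]): BlockHeader = {
    require(bytes.length >= HeaderSize, s"need at least $HeaderSize bytes")
    // Integer fields in the header are little-endian.
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
    val version = buf.getInt()
    val prev = new Array[Byte](32)
    buf.get(prev)
    val merkle = new Array[Byte](32)
    buf.get(merkle)
    // Read a 32-bit value and widen it to Long so it is treated as unsigned.
    def u32(): Long = buf.getInt() & 0xFFFFFFFFL
    val time = u32()
    val bits = u32()
    val nonce = u32()
    BlockHeader(version, prev, merkle, time, bits, nonce)
  }

  def main(args: Array[String]): Unit = {
    // Build a synthetic 80-byte header with zeroed hashes.
    val raw = ByteBuffer.allocate(HeaderSize).order(ByteOrder.LITTLE_ENDIAN)
    raw.putInt(1)                // version
    raw.put(new Array[Byte](32)) // previous block hash
    raw.put(new Array[Byte](32)) // Merkle root
    raw.putInt(1231006505)       // time
    raw.putInt(0x1d00ffff)       // bits
    raw.putInt(2083236893)       // nonce
    val h = parseHeader(raw.array())
    println(s"${h.version} ${h.time} ${h.nonce}") // prints 1 1231006505 2083236893
  }
}
```

In the block files written by a Bitcoin node, each block is additionally preceded by a 4-byte network magic value and a 4-byte block size, which a reader has to skip before the header starts.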