
Analyse Namecoin data using Apache Spark

Jörn Franke edited this page Nov 12, 2017 · 3 revisions

This Spark application demonstrates some of the capabilities of the hadoopcryptoledger library. It takes as input a set of files on HDFS containing Namecoin blockchain data and outputs the total number of transactions found in that data. It has been tested successfully with the Cloudera Quickstart VM 5.5 and the HDP Sandbox 2.5 using Spark 1.5; other Hadoop distributions should work equally well.

Namecoin describes itself as a distributed, blockchain-based domain name and identity system. Namecoin data has the same data structures as Bitcoin data, with two additions: 1) special output scripts for name operations and 2) merged mining/AuxPOW, which gives Bitcoin miners an incentive to mine Namecoin as well. Both introduce additional data structures. The first is addressed by additional utility methods (cf. Useful Utility functions). The second is addressed by support for reading AuxPOW information, which must be activated so that Namecoin blockchain data can be processed properly (see here). Finally, you need to configure the Namecoin network magic instead of the Bitcoin one (cf. Support for Altcoins based on Bitcoin).
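Because the network magic decides which blocks are accepted, it can help to check a block file before running the job. The following is a minimal sketch, not part of hadoopcryptoledger. It assumes that block files start with the four magic bytes (as in Bitcoin-derived clients) and that the Namecoin main network magic is F9BEB4FE (Bitcoin uses F9BEB4D9). The function names block_magic and is_namecoin_file are hypothetical helpers introduced here for illustration.

```shell
#!/bin/sh
# Assumption: Namecoin main network magic; Bitcoin's is F9BEB4D9.
NAMECOIN_MAGIC="F9BEB4FE"

# Print the first four bytes of a file as uppercase hex.
# In a block file these bytes are the network magic of the first block.
block_magic() {
  od -An -tx1 -N4 "$1" | tr -d ' \n' | tr 'a-f' 'A-F'
}

# Succeed (exit status 0) only if the file starts with the Namecoin magic.
is_namecoin_file() {
  [ "$(block_magic "$1")" = "$NAMECOIN_MAGIC" ]
}
```

For example, is_namecoin_file blk00000.dat && echo "Namecoin block file" prints the message only when the file starts with the expected magic. If the check fails, the file is probably Bitcoin data or not a block file at all.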

Getting blockchain data

See here how to fetch Namecoin blockchain data.

After the data has been copied to HDFS, you are ready to use the example.
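If the block files are still on the local file system, the copy step could look roughly like the sketch below. It assumes that the block files are named blk*.dat and that the HDFS target directory is /user/namecoin/input, the input path used by the spark-submit command on this page. The function name upload_namecoin_blocks is a hypothetical helper, not part of the library.

```shell
#!/bin/sh
# Copy local Namecoin block files into HDFS.
# $1: local directory containing blk*.dat files (assumed naming scheme)
# $2: HDFS target directory (defaults to /user/namecoin/input)
upload_namecoin_blocks() {
  src_dir="$1"
  dest_dir="${2:-/user/namecoin/input}"
  # Create the target directory (no error if it already exists),
  # then upload all block files into it.
  hadoop fs -mkdir -p "$dest_dir" &&
    hadoop fs -put "$src_dir"/blk*.dat "$dest_dir"
}
```

A call such as upload_namecoin_blocks /path/to/namecoin/blocks uploads the files, where the path is a placeholder for wherever your Namecoin client stores its block files.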

Building the example

Clone the repository:

git clone https://github.com/ZuInnoTe/hadoopcryptoledger.git hadoopcryptoledger

You can build the application by changing to the directory hadoopcryptoledger/examples/spark-scala-namecoinblock and using the following command:

sbt clean assembly test it:test

This will also execute the integration tests.

You will find the jar "example-hcl-spark-scala-namecoinblock.jar" in ./target/scala-2.10.

Running the example

Make sure that the output directory is clean:

hadoop fs -rm -R /user/namecoin/output

Execute the following command to run the example with a local master (local[8] runs Spark locally with 8 worker threads):

spark-submit --class org.zuinnote.spark.namecoin.example.SparkScalaNamecoinBlockCounter --master local[8] ./target/scala-2.10/example-hcl-spark-scala-namecoinblock.jar /user/namecoin/input /user/namecoin/output

After the Spark job has completed, you will find the result in /user/namecoin/output. You can display it using the following command:

hadoop fs -cat /user/namecoin/output/part-00000

More Information

Blog about Namecoin analytics: https://snippetessay.wordpress.com/2017/10/10/big-data-analytics-on-bitcoins-first-altcoin-namecoin/

Understanding the structure of Bitcoin data (Namecoin is very similar):

Blocks: https://en.bitcoin.it/wiki/Block

Transactions: https://en.bitcoin.it/wiki/Transactions

Generic information about Namecoin: https://en.wikipedia.org/wiki/Namecoin

Namecoin Webpage: https://namecoin.org
