Query: wallet/address clustering, privacy leak analysis #33

Open
bitjson opened this issue Nov 24, 2021 · 2 comments
Labels
enhancement New feature or request query-request Feature request to enable a new kind of query

Comments


bitjson commented Nov 24, 2021

The ecosystem needs better (public, open source) visibility into privacy leaks to continue improving privacy for average users. And in the non-custodial world of cryptocurrency, privacy is protection from theft and physical violence, particularly for less wealthy users and those living under failing regimes.

When users claim funds after chain splits, transactions from multiple chains often reveal far more about their activity than they realize. Chaingraph is uniquely suited for clustering and privacy analysis because we can easily operate across multiple chains. Privacy analysis features need not take node acceptance into account at all: clustering should be performed on all transactions in the database, regardless of chain.

Blockchair's Privacy-o-meter documentation is probably the best summary of available clustering heuristics. (See also this excellent thread about privacy leaks via address types.) Chaingraph should implement and display some of these heuristics by default, and make it easy to enable the rest for block explorer-type applications.
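To make the idea concrete, here is a minimal sketch of one widely known heuristic, common-input ownership: addresses spent together as inputs of the same transaction are assumed to belong to one wallet. The input shape (`txid`, `input_addresses`) and the function name are hypothetical and not part of Chaingraph's schema. CoinJoin-style transactions such as CashFusion deliberately break this assumption, so a real implementation would need to detect and exclude them.

```python
from collections import defaultdict


def cluster_by_common_inputs(transactions):
    """Cluster addresses that appear together as inputs of one transaction.

    `transactions` is an iterable of dicts with a hypothetical shape:
    {"txid": str, "input_addresses": [str, ...]}.
    Returns a list of sets, one set of addresses per cluster.
    """
    parent = {}  # union-find forest: address -> parent address

    def find(address):
        parent.setdefault(address, address)
        # Path halving keeps the trees shallow.
        while parent[address] != address:
            parent[address] = parent[parent[address]]
            address = parent[address]
        return address

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for tx in transactions:
        addresses = tx["input_addresses"]
        for address in addresses:
            find(address)  # register single-input addresses too
        for other in addresses[1:]:
            union(addresses[0], other)

    clusters = defaultdict(set)
    for address in list(parent):
        clusters[find(address)].add(address)
    return list(clusters.values())
```

Because the sketch never asks which chain a transaction was seen on, feeding it transactions from every chain in the database merges clusters across chain splits, which is the behavior described above.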

In addition to those heuristics, we should try to support clustering by timing information. (E.g. merge avoidance isn't very useful if several chains of otherwise disconnected transactions are inactive for months but always move in the same hour.)
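A rough sketch of what timing-based clustering could look like: bucket each group's activity by hour and count how often two otherwise disconnected groups are active in the same bucket. The input shape, the fixed one-hour buckets, and the `min_shared_hours` threshold are all assumptions for illustration. Fixed buckets miss activity that straddles an hour boundary, and block timestamps are coarse, so a real implementation would probably want a sliding window.

```python
from collections import defaultdict
from itertools import combinations

HOUR_SECONDS = 3600


def co_active_pairs(activity, min_shared_hours=2):
    """Find pairs of transaction groups that repeatedly move in the same hour.

    `activity` maps a hypothetical group id (e.g. one chain of connected
    transactions) to a list of Unix timestamps at which it was active.
    Returns {(group_a, group_b): shared_hour_count} for pairs that share
    at least `min_shared_hours` distinct hour buckets.
    """
    groups_by_hour = defaultdict(set)
    for group_id, timestamps in activity.items():
        for timestamp in timestamps:
            groups_by_hour[timestamp // HOUR_SECONDS].add(group_id)

    shared_hours = defaultdict(int)
    for groups in groups_by_hour.values():
        # Sorting gives each pair a single canonical key.
        for pair in combinations(sorted(groups), 2):
            shared_hours[pair] += 1

    return {
        pair: count
        for pair, count in shared_hours.items()
        if count >= min_shared_hours
    }
```

Pairs returned here are only candidates for merging. Busy hours will produce coincidental overlaps, so the threshold would need tuning against real data.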

We should also add opt-in support for tracking and querying the actual address clusters. (#29 will probably be valuable for performance; I imagine we'll want to do most of the computation on the agent before saving transactions to the database.) In addition to being able to query the full list of clustered transactions, it would be fantastic if we supported materializing columns for:

  • transaction count
  • earliest known transaction (by timestamp of earliest including block)
  • most recent transaction (by timestamp of latest including block)
  • total funds received (spent and unspent)
  • total unspent balance
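As an illustration of what those materialized columns would contain, here is a sketch that derives them from the outputs attributed to one cluster. The `ClusterOutput` record and its field names are hypothetical. The sketch assumes every transaction is confirmed and has a single block time, whereas across multiple chains the real column would use the earliest (or latest) including block.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterOutput:
    """One output paid to an address in the cluster (hypothetical shape)."""
    value: int  # satoshis
    funding_txid: str
    funding_block_time: int  # Unix timestamp of the including block
    spending_txid: Optional[str] = None  # None while unspent
    spending_block_time: Optional[int] = None


def summarize_cluster(outputs):
    """Compute the per-cluster summary columns from a list of outputs."""
    block_time_by_txid = {}
    for output in outputs:
        block_time_by_txid[output.funding_txid] = output.funding_block_time
        if output.spending_txid is not None:
            block_time_by_txid[output.spending_txid] = output.spending_block_time

    times = block_time_by_txid.values()
    return {
        "transaction_count": len(block_time_by_txid),
        "earliest_block_time": min(times, default=None),
        "latest_block_time": max(times, default=None),
        # Includes change paid back to the cluster, so this overstates
        # funds received from outside the wallet.
        "total_received": sum(o.value for o in outputs),
        "unspent_balance": sum(
            o.value for o in outputs if o.spending_txid is None
        ),
    }
```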

Finally, it would be nice to support aggregated statistics (depends on #32) for:

  • total clusters (as an estimated wallet count)
  • total clusters active in the past day/month/etc.
  • largest clusters
  • most active clusters (by volume)
  • average and percentile cluster balances
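A sketch of how a few of these statistics could be computed from per-cluster summaries. The record fields (`unspent_balance`, `latest_block_time`), ranking "largest" by unspent balance, and the nearest-rank percentile method are assumptions chosen for illustration, not a proposed design.

```python
import math
import statistics


def nearest_rank_percentile(sorted_values, percent):
    """Nearest-rank percentile of an ascending, non-empty list (0 < percent <= 100)."""
    rank = math.ceil(percent / 100 * len(sorted_values))
    return sorted_values[rank - 1]


def aggregate_clusters(clusters, now, window_seconds=86400, top_n=3):
    """Aggregate statistics over a list of per-cluster summary dicts.

    Each dict is assumed to carry "unspent_balance" (satoshis) and
    "latest_block_time" (Unix timestamp); both names are hypothetical.
    """
    balances = sorted(c["unspent_balance"] for c in clusters)
    return {
        # Each cluster is treated as one estimated wallet.
        "total_clusters": len(clusters),
        "active_clusters": sum(
            1 for c in clusters
            if now - c["latest_block_time"] <= window_seconds
        ),
        "largest_balances": balances[::-1][:top_n],
        "average_balance": statistics.mean(balances) if balances else None,
        "median_balance": (
            nearest_rank_percentile(balances, 50) if balances else None
        ),
        "p90_balance": (
            nearest_rank_percentile(balances, 90) if balances else None
        ),
    }
```

Passing a different `window_seconds` covers the past day/month/etc. variants. In Chaingraph itself these would presumably be computed in the database rather than in application code.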

(Keywords for searchers: coinjoin, coinshuffle, cashshuffle, cashfusion, taint analysis)

@bitjson bitjson added enhancement New feature or request query-request Feature request to enable a new kind of query labels Nov 24, 2021

bitjson commented Jan 22, 2022

An easy first step here is to implement some basic analysis of CashFusion usage. Some ideas/discussion: Rucknium/CashFusionStats#2

@Rucknium

@bitjson Sounds good. As a long-term goal, I will try to incorporate detection of these transaction-level privacy defects into rbch, which could eventually trickle down into CashFusionStats.

My sense is that clustering can be tricky and highly dependent on judgement calls. For example, FATF recently asked Chainalysis, CipherTrace, Coinfirm, Elliptic, Merkle Science, Scorechain, and TRM Labs to estimate the prevalence of illicit activity on the BTC blockchain. The top-line conclusion was:

As set out below, there are significant challenges in the development and interpretation of these market metrics. The blockchain analytic companies often reached starkly divergent results for the same questions, so caution must be exercised in drawing conclusions.

See paragraphs 76-102 of their report.
