ARC - Towards Automated Regulation Analysis for Effective Privacy Compliance

This repository contains the data, artifact and online appendix for the paper.

Introduction

Privacy regulations are being introduced and amended around the globe to effectively regulate the processing of consumer data. These regulations are often analyzed to fulfill compliance mandates and to aid the design of practical systems that improve consumer privacy. At present, however, this analysis is done manually, which makes it error-prone and costs companies significant time, effort, and money. This paper describes the design and implementation of ARC, a framework that transforms unstructured and complex regulatory text into a structured representation, the ARC tuple(s), which can be queried to assist in the analysis and understanding of regulations. We demonstrate ARC’s effectiveness in extracting three forms of tuples with a high F-1 score (avg. 82.1% across all three) using four major privacy regulations: CCPA, GDPR, VCDPA, and PIPEDA. We then build ARCBert, which identifies semantically similar phrases across regulations, enabling compliance analysts to identify common requirements. We run ARC on 16 additional privacy regulations and identify 1,556 ARC tuples and clusters of semantically similar phrases. Finally, we extend ARC to evaluate the compliance of privacy policies by comparing them against the disclosure requirements in the four regulations. Our empirical evaluation with the privacy policies of S&P 500 companies finds 476 missing disclosures, of which 71.05% are true positives when manually validated, and also leads to the discovery of 288 additional missing disclosures from the partial matches identified by ARC.

Data Release

We provide the raw dataset and the data generated by ARC.

S&P 500 Privacy Policies

The crawled HTML privacy policies are listed under html_files. A file name with the _Cal suffix refers to California, _CAN to Canada, _EUR to Europe, and _VIR to Virginia, whereas _GEN refers to a generic privacy policy that applies to all four regulations.
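The suffix convention above can be used to group the crawled policies by region. The following is a minimal sketch, not part of the released artifact: the function names `region_for` and `group_by_region` are hypothetical, and it assumes that each file has an `.html` extension and that the suffix appears at the very end of the file name, just before the extension. The actual naming in html_files may differ.

```python
from pathlib import Path

# Suffixes as documented in this README; the labels are illustrative wording.
SUFFIX_TO_REGION = {
    "_Cal": "California",
    "_CAN": "Canada",
    "_EUR": "Europe",
    "_VIR": "Virginia",
    "_GEN": "Generic (all four regulations)",
}


def region_for(filename):
    """Return the region label for a policy file, or None if no suffix matches.

    Assumption: the suffix closes the file name before the extension,
    and matching is case-sensitive (so "_Cal" matches but "_CAL" does not).
    """
    stem = Path(filename).stem  # file name without its extension
    for suffix, region in SUFFIX_TO_REGION.items():
        if stem.endswith(suffix):
            return region
    return None


def group_by_region(directory="html_files"):
    """Map each region label to the sorted list of matching file names."""
    groups = {}
    for path in sorted(Path(directory).glob("*.html")):
        groups.setdefault(region_for(path.name), []).append(path.name)
    return groups
```

Calling `group_by_region()` from the folder that contains html_files returns a dictionary keyed by region label; files whose names carry none of the documented suffixes are collected under the key `None`, so they can be inspected rather than silently dropped.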

Clustered Phrases

The phrases clustered using the ARCBert model are listed under clustered_phrases. The folder contains the clustered phrases for each semantic role argument.

Online Appendix

We provide the online appendix for the paper here.

Secure-Platforms-Lab-W-M/ARC
