Collect and analyze film certification data from the Central Board of Film Certification (CBFC) in India.

diagram-chasing/censor-board-cuts

censor-board-cuts

Note: Data updates paused as of June 12th, due to breaking changes on the e-Cinepramaan portal

Dataset and related analysis of modifications or cuts made by the Central Board of Film Certification (CBFC), India.

The dataset consists of two main components:

  • Raw Data: Raw category and certificate data from the CBFC website, stored in data/raw/
  • Processed Data: Cleaned-up data enhanced with code-based and LLM-based analysis of cuts, stored in data/data.csv
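
As a quick way to inspect the processed file, the following minimal sketch lists the column names and counts the rows of a CSV using only the standard library. The helper name `summarize_csv` is hypothetical and not part of this repository, and no assumption is made about which columns data/data.csv contains.

```python
import csv


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        columns = list(reader.fieldnames or [])
        row_count = sum(1 for _ in reader)
    return columns, row_count


# Example call from the repository root:
#   columns, row_count = summarize_csv("data/data.csv")
```

Reading the header first, rather than hard-coding column names, keeps the sketch valid even if the dataset's schema changes.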

Preview

Further data is available in the data/ directory.

Data Collection

The following scripts fetch data from the CBFC website:

These scripts incrementally fetch new films and append them to the relevant CSV files. The fetched data then goes through code-based analysis of the metadata and modifications in scripts/analysis/ and LLM-based analysis in scripts/llm/. Next, scripts/imdb/ further enhances the metadata, and scripts/join/ joins all the fetched data together and saves the final result in data/data.csv.
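
The incremental "fetch new films and append" step can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's actual implementation: the function name `append_new_rows` and the `id` key column are hypothetical, and the real scripts may track already-fetched films differently.

```python
import csv
from pathlib import Path


def append_new_rows(csv_path, rows, key="id"):
    """Append only rows whose `key` value is not already in the CSV.

    `key="id"` is an assumed column name, used here for illustration.
    Returns the number of rows actually appended.
    """
    path = Path(csv_path)
    seen = set()
    fieldnames = None

    # Collect the keys that are already stored, if the file exists.
    if path.exists() and path.stat().st_size > 0:
        with path.open(newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            fieldnames = reader.fieldnames
            seen = {row[key] for row in reader}

    # Keep only unseen rows, also de-duplicating within the new batch.
    new_rows = []
    for row in rows:
        value = str(row[key])
        if value not in seen:
            seen.add(value)
            new_rows.append(row)
    if not new_rows:
        return 0

    # Write a header only when creating the file for the first time.
    write_header = fieldnames is None
    if fieldnames is None:
        fieldnames = list(new_rows[0].keys())
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        if write_header:
            writer.writeheader()
        writer.writerows(new_rows)
    return len(new_rows)
```

Because the function skips keys it has already seen, re-running a fetch does not duplicate rows, which is what makes an append-only CSV workable for repeated runs.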

Data Analysis

The code-based analysis is done by a Python script scripts/analysis/main.py that cleans and processes the raw data:

  • Standardizes duration formats and attempts to pull out timestamps from the descriptions.
  • Categorizes modifications based on type (audio, visual, deletion, etc.) and the basic type of content (violence, nudity, etc.) using an LLM.
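
The duration and timestamp handling described above might look something like the sketch below. It assumes timestamps appear as `HH:MM:SS` or `MM:SS` with either `:` or `.` as the separator; the actual formats in the CBFC data, and the logic in scripts/analysis/main.py, may differ. The names `extract_timestamps` and `to_seconds` are hypothetical.

```python
import re

# Assumed formats: HH:MM:SS or MM:SS, separated by ":" or ".".
_TIMESTAMP_RE = re.compile(r"\b(?:(\d{1,2})[:.])?(\d{1,2})[:.](\d{2})\b")


def extract_timestamps(description):
    """Return every timestamp found in the text, converted to seconds."""
    return [
        int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
        for hours, minutes, seconds in _TIMESTAMP_RE.findall(description)
    ]


def to_seconds(duration):
    """Standardize a single duration string to seconds, or None if absent."""
    found = extract_timestamps(duration)
    return found[0] if found else None
```

A pattern this permissive also matches decimal numbers such as `2.50`, so real descriptions would likely need extra context checks before a match is treated as a timestamp.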

TODO

  • Create a dashboard for exploring the data.

Related Projects
