StackOverflow 10k Rows Dataset

Overview

This repository contains 10,000 rows of data scraped from StackOverflow discussions related to Python. The dataset offers insight into common questions, recurring programming challenges, and the level of community engagement within the Python section of StackOverflow.

Contents

Scraping Notebook (Stack_Overflow_Scraping.ipynb): A Jupyter notebook detailing the process used to scrape StackOverflow discussions.
10k Dataset (SO_10k.csv): The raw dataset of 10,000 scraped rows.
Categorization Notebook (Categorization.ipynb): A Jupyter notebook that assigns each post a popularity category based on its scraped features.
10k Categorized Dataset (SO_10k_categorized.csv): The dataset after categorization, based on features such as upvotes, views, and answers.
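This README does not spell out the categorization rules used in Categorization.ipynb. The sketch below is a hypothetical illustration, not the notebook's method: it converts upvotes, views, and answers to percentile ranks, averages them with equal weight, and splits the averaged score into three bands. The function names, the equal weighting, and the labels "low", "medium", and "high" are all assumptions.

```python
from bisect import bisect_left, bisect_right


def percentile_ranks(values: list[float]) -> list[float]:
    """Mid-rank percentile (between 0 and 1) of each value; tied values share a rank."""
    ordered = sorted(values)
    n = len(ordered)
    ranks = []
    for v in values:
        below = bisect_left(ordered, v)           # count of strictly smaller values
        equal = bisect_right(ordered, v) - below  # count of values equal to v
        ranks.append((below + 0.5 * equal) / n)
    return ranks


def categorize(rows: list[dict], labels=("low", "medium", "high")) -> list[str]:
    """Hypothetical popularity labels from the upvotes, views and answers columns."""
    features = ("upvotes", "views", "answers")
    # One list of percentile ranks per feature; float() also accepts CSV strings.
    per_feature = [percentile_ranks([float(r[f]) for r in rows]) for f in features]
    # Equal-weight average of the three ranks for each row.
    scores = [sum(col) / len(features) for col in zip(*per_feature)]
    # Split the combined score into equal-width bands over [0, 1].
    k = len(labels)
    return [labels[min(int(s * k), k - 1)] for s in scores]
```

Ranking first puts the three features on a common 0 to 1 scale, so views (often in the thousands) do not swamp upvotes or answer counts. Rows read with `csv.DictReader` can be passed to `categorize` directly.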

Dataset Schema

The dataset includes the following columns:

link: URL of the discussion.
upvotes: Number of upvotes (can be negative).
answers: Number of answers in the discussion.
views: Number of views for the discussion.
content: The content of the question, excluding code and post notices.
code_length: The character length of the code within the question.
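The schema says `content` excludes code while `code_length` counts the code's characters. One way to derive both from a question's HTML body is sketched below with Python's standard `html.parser`. It assumes code blocks appear inside `<pre>` elements (StackOverflow renders them as `<pre><code>`), that post notices were removed beforehand, and that inline `<code>` spans count as prose. `QuestionSplitter` and `split_question` are hypothetical names; the scraping notebook may apply different rules.

```python
from html.parser import HTMLParser


class QuestionSplitter(HTMLParser):
    """Separate prose text from code blocks in a question's HTML body."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True decodes entities such as &lt;
        self._pre_depth = 0  # > 0 while inside a <pre> code block
        self._text_parts: list[str] = []
        self._code_parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._pre_depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._pre_depth > 0:
            self._pre_depth -= 1

    def handle_data(self, data):
        # Route each text chunk to code or prose depending on the current context.
        if self._pre_depth:
            self._code_parts.append(data)
        else:
            self._text_parts.append(data)

    def result(self) -> tuple[str, int]:
        # Collapse runs of whitespace in the prose; count code characters as-is.
        content = " ".join(" ".join(self._text_parts).split())
        code_length = sum(len(part) for part in self._code_parts)
        return content, code_length


def split_question(html: str) -> tuple[str, int]:
    """Return (content, code_length) for one question body."""
    parser = QuestionSplitter()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.result()
```

For example, a body with one sentence, a one-line code block, and a closing "Thanks" yields the two sentences as `content` and the code block's character count (including its trailing newline) as `code_length`.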

Sample Data

SO_10k

SO_10k_categorized

How to Use

Clone the repository: Get a local copy of the dataset and notebooks for analysis.
Explore the dataset: Use the provided Jupyter notebooks to understand the scraping and categorization processes.
Analysis: Use the categorized dataset for further analysis, such as identifying trends, finding common questions, and measuring how different factors affect post popularity.

Contribution

Contributions to improve the dataset, scraping, or categorization methods are welcome. Please submit a pull request or open an issue to discuss potential enhancements.

Repository Files

Categorization.ipynb
README.md
SO_10k.csv
SO_10k_categorized.csv
Stack_Overflow_Scraping.ipynb
stack_overflow - stack_overflow.csv
stack_overflowMerged (1).csv

Repository: pclk/Stackoverflow-10krows