🧹 Data Sweeper Pro+

Data Sweeper Pro+ is an advanced data cleaning and transformation platform built with Streamlit. It allows users to upload datasets, clean them, analyze them with interactive profiling reports, and export the cleaned data in multiple formats. The app is designed for both technical and non-technical users, offering a user-friendly interface with powerful data processing capabilities.

🚀 Features

1. File Upload

Supports multiple file uploads in CSV and Excel formats.
Handles large datasets efficiently.
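A minimal sketch of how uploaded CSV and Excel files might be read into pandas DataFrames, assuming the reader is chosen from the file extension. `load_uploaded_file` is a hypothetical helper, not taken from app.py. In Streamlit, `st.file_uploader(..., accept_multiple_files=True)` returns a list of file-like objects with a `.name` attribute, so each one could be passed in as `load_uploaded_file(f.name, f)`.

```python
import io
from pathlib import Path
from typing import BinaryIO

import pandas as pd


def load_uploaded_file(name: str, data: BinaryIO) -> pd.DataFrame:
    """Read one uploaded file into a DataFrame based on its extension (hypothetical helper)."""
    suffix = Path(name).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(data)
    if suffix == ".xlsx":
        # .xlsx is read through openpyxl (listed in requirements.txt);
        # legacy .xls files would need the separate xlrd package
        return pd.read_excel(data)
    raise ValueError(f"Unsupported file type: {suffix}")


df = load_uploaded_file("scores.csv", io.BytesIO(b"id,score\n1,9.5\n2,7.0\n"))
print(df.shape)  # (2, 2)
```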

2. Data Profiling

Generate an interactive Profile Report using ydata-profiling to explore:
- Missing values
- Duplicate rows
- Data types
- Statistical summaries
- Correlations
The fully interactive HTML report is embedded directly in the app.
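ydata-profiling builds the full report. As a rough illustration of what some of its sections measure, the same headline checks can be computed with plain pandas. `quick_profile` is a hypothetical helper for illustration only; it does not reproduce the report or describe how app.py calls ydata-profiling.

```python
import numpy as np
import pandas as pd


def quick_profile(df: pd.DataFrame) -> dict:
    """Compute a few of the checks a profile report covers (hypothetical helper)."""
    numeric = df.select_dtypes(include="number")
    return {
        # Number of missing cells per column
        "missing_per_column": {col: int(n) for col, n in df.isna().sum().items()},
        # Rows that exactly repeat an earlier row
        "duplicate_rows": int(df.duplicated().sum()),
        # Column name -> dtype name, e.g. "float64"
        "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
        # count, mean, std, min, quartiles and max of the numeric columns
        "summary": numeric.describe(),
        # Pairwise Pearson correlations between numeric columns
        "correlations": numeric.corr(),
    }


df = pd.DataFrame({
    "age": [25, 32, np.nan, 25],
    "income": [40_000, 52_000, 61_000, 40_000],
    "city": ["Oslo", "Lima", "Pune", "Oslo"],
})
report = quick_profile(df)
print(report["missing_per_column"])  # {'age': 1, 'income': 0, 'city': 0}
print(report["duplicate_rows"])      # 1 (the last row repeats the first)
```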

3. Data Cleaning

Remove duplicate rows.
Handle missing values with strategies like:
- Drop rows
- Fill with mean/median
- KNN imputation
Normalize numerical columns.
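A minimal sketch of these cleaning steps using pandas and scikit-learn's `KNNImputer` (both listed under Dependencies). The helper names are hypothetical, and min-max scaling to [0, 1] is an assumption: the README does not say which normalization app.py applies, and z-score standardization would be an equally valid reading.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer


def remove_duplicates(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows that exactly repeat an earlier row, keeping the first one."""
    return df.drop_duplicates().reset_index(drop=True)


def handle_missing(df: pd.DataFrame, strategy: str, n_neighbors: int = 5) -> pd.DataFrame:
    """Apply one missing-value strategy (hypothetical helper).

    "drop" removes every row with any missing value; "mean", "median" and
    "knn" fill gaps in numeric columns only.
    """
    if strategy == "drop":
        return df.dropna().reset_index(drop=True)
    out = df.copy()
    num_cols = out.select_dtypes(include="number").columns
    if strategy == "mean":
        out[num_cols] = out[num_cols].fillna(out[num_cols].mean())
    elif strategy == "median":
        out[num_cols] = out[num_cols].fillna(out[num_cols].median())
    elif strategy == "knn":
        # Each gap gets the average of that column over the n_neighbors most
        # similar rows, with similarity measured on the non-missing numeric values.
        # Assumes no numeric column is entirely empty (KNNImputer would drop it).
        imputer = KNNImputer(n_neighbors=n_neighbors)
        out[num_cols] = imputer.fit_transform(out[num_cols])
    else:
        raise ValueError(f"Unknown strategy: {strategy}")
    return out


def min_max_normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Rescale each numeric column to [0, 1]; constant columns become 0."""
    out = df.copy()
    num_cols = out.select_dtypes(include="number").columns
    col_min = out[num_cols].min()
    col_range = (out[num_cols].max() - col_min).replace(0, 1)  # avoid dividing by zero
    out[num_cols] = (out[num_cols] - col_min) / col_range
    return out


df = pd.DataFrame({"a": [1.0, 2.0, np.nan, 6.0, 2.0], "b": [10.0, 20.0, 30.0, 40.0, 20.0]})
cleaned = min_max_normalize(handle_missing(remove_duplicates(df), "median"))
```

Chaining the three functions mirrors the order listed above: deduplicate first so repeated rows do not skew the fill values, then impute, then normalize.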

4. Data Transformations

Column Operations:

Select specific columns to keep or reorder them.
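Selecting and reordering columns both come down to indexing the DataFrame with a list of names: columns not in the list are dropped, and the list order becomes the new column order. `select_columns` is a hypothetical helper; in a Streamlit app the list could come from a widget such as `st.multiselect`.

```python
import pandas as pd


def select_columns(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Keep only `columns`, in the given order (hypothetical helper)."""
    missing = [c for c in columns if c not in df.columns]
    if missing:
        raise KeyError(f"Columns not found: {missing}")
    return df[columns].copy()


df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"], "score": [9.5, 7.0]})
subset = select_columns(df, ["score", "id"])  # drops "name" and puts "score" first
print(list(subset.columns))  # ['score', 'id']
```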

Data Type Conversion:

Convert columns to desired data types: string, integer, float, or datetime.
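A sketch of how the four target types might map onto pandas conversions. Two behaviours here are assumptions rather than a description of app.py: unparseable values are coerced to missing (`errors="coerce"`) instead of raising, and integers use pandas' nullable `Int64` dtype so that a column with gaps can still be an integer column. `convert_column` is hypothetical.

```python
import pandas as pd


def convert_column(df: pd.DataFrame, column: str, target: str) -> pd.DataFrame:
    """Return a copy of df with `column` cast to `target` (hypothetical helper)."""
    out = df.copy()
    s = out[column]
    if target == "string":
        out[column] = s.astype("string")
    elif target == "integer":
        # Unparseable values become <NA> via the nullable Int64 dtype;
        # astype("Int64") raises if a parsed value has a fractional part
        out[column] = pd.to_numeric(s, errors="coerce").astype("Int64")
    elif target == "float":
        out[column] = pd.to_numeric(s, errors="coerce").astype("float64")
    elif target == "datetime":
        # Unparseable dates become NaT
        out[column] = pd.to_datetime(s, errors="coerce")
    else:
        raise ValueError(f"Unsupported target type: {target}")
    return out


df = pd.DataFrame({"qty": ["1", "2", "x"], "when": ["2024-01-05", "not a date", "2023-12-31"]})
df = convert_column(convert_column(df, "qty", "integer"), "when", "datetime")
print(df.dtypes)
```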

Feature Engineering:

Add new columns based on existing ones (e.g., sum of two columns).
Extract date parts (e.g., year from a date column).
Apply custom formulas for advanced transformations.
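The three kinds of feature engineering listed above map onto ordinary pandas operations. The column names are illustrative, and evaluating custom formulas with `DataFrame.eval` is an assumption: the README does not say how app.py parses formulas.

```python
import pandas as pd

df = pd.DataFrame({
    "price": [10.0, 12.5, 8.0],
    "tax": [1.0, 1.25, 0.8],
    "order_date": pd.to_datetime(["2023-03-01", "2024-07-15", "2024-11-30"]),
})

# New column from existing ones: sum of two columns
df["total"] = df["price"] + df["tax"]

# Extract a date part: the year from a datetime column
df["order_year"] = df["order_date"].dt.year

# Custom formula: DataFrame.eval parses an expression over column names and,
# when the expression is an assignment, returns a copy with the new column added
df = df.eval("tax_rate = tax / price")
print(df[["total", "order_year", "tax_rate"]])
```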

5. Visualization

Generate interactive charts using Plotly:
- Histograms
- Scatter plots
- Box plots
- Line charts

6. Export Options

Export cleaned data in multiple formats:
- CSV
- Excel
- JSON
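Each export format can be produced in memory as bytes, which is one of the types Streamlit's `st.download_button` accepts for its `data` argument. `export_bytes` is a hypothetical helper; writing Excel through openpyxl matches the dependency list, while `orient="records"` for JSON is an assumption.

```python
import io

import pandas as pd


def export_bytes(df: pd.DataFrame, fmt: str) -> tuple[bytes, str, str]:
    """Serialize df; returns (payload, file extension, MIME type) (hypothetical helper)."""
    if fmt == "csv":
        return df.to_csv(index=False).encode("utf-8"), "csv", "text/csv"
    if fmt == "json":
        # One JSON object per row, keyed by column name
        return df.to_json(orient="records").encode("utf-8"), "json", "application/json"
    if fmt == "excel":
        buffer = io.BytesIO()
        with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
            df.to_excel(writer, index=False, sheet_name="cleaned")
        return (
            buffer.getvalue(),
            "xlsx",
            "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
        )
    raise ValueError(f"Unsupported export format: {fmt}")


df = pd.DataFrame({"id": [1, 2], "city": ["Oslo", "Lima"]})
payload, ext, mime = export_bytes(df, "csv")
```

In the app, the result could then be offered as `st.download_button("Download", data=payload, file_name=f"cleaned.{ext}", mime=mime)`.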

🛠️ Installation

Prerequisites:

Python >= 3.12
pip (Python package manager)

Step-by-Step Guide:

Clone the repository:

```
git clone https://github.com/your-repo/data-sweeper-pro.git
cd data-sweeper-pro
```

Create a virtual environment (optional but recommended):

```
python -m venv myenv
source myenv/bin/activate    # On Linux/macOS
myenv\Scripts\activate       # On Windows
```

Install dependencies:
```
pip install -r requirements.txt
```
Run the app:
```
streamlit run app.py
```
Open the app in your browser at http://localhost:8501.

📂 Directory Structure

```
data-sweeper-pro/
├── .streamlit/
│   └── config.toml       # Streamlit theme configuration
├── app.py                # Main Streamlit application script
├── requirements.txt      # Python dependencies list
├── large_test_data.csv   # Example large dataset for testing (optional)
└── README.md             # Project documentation (this file)
```

📊 Example Use Case

Upload a dataset (large_test_data.csv) containing missing values, duplicates, and mixed data types.
Generate a full profile report to explore the dataset.
Clean the data by removing duplicates, handling missing values, and normalizing numerical columns.
Apply transformations like converting column types or creating new features.
Visualize trends and patterns using interactive charts.
Export the cleaned dataset as a CSV or Excel file.

🧩 Dependencies

The following Python libraries are used in this project:

```
streamlit==1.29.0
pandas==2.1.3
numpy==1.26.4
plotly==5.18.0
ydata-profiling==4.12.2
scikit-learn==1.3.2
openpyxl==3.1.2
scipy==1.11.4
```

Install them using:

```
pip install -r requirements.txt
```

🤝 Contributing

Contributions are welcome! Please follow these steps:

Fork this repository.
Create a new branch (git checkout -b feature-name).
Commit your changes (git commit -m "Add feature-name").
Push to your branch (git push origin feature-name).
Open a pull request.

Repository: Pranshu936/Data_cleaning

Folders and files

Latest commit

History

Repository files navigation

🧹 Data Sweeper Pro+

🚀 Features

1. File Upload

2. Data Profiling

3. Data Cleaning

4. Data Transformations

Column Operations:

Data Type Conversion:

Feature Engineering:

5. Visualization

6. Export Options

🛠️ Installation

Prerequisites:

Step-by-Step Guide:

📂 Directory Structure

📊 Example Use Case

🧩 Dependencies

🤝 Contributing

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages