🎛️ A collection of diverse regression datasets, featuring PyTorch-like dataset classes that automatically download and load datasets.

PM25/Regression-Datasets

🎛️ Regression Datasets

📋 Table of Contents
  1. Installation
  2. Usage
  3. Datasets
  4. License
  5. Contact
  6. Acknowledgments

This repository offers a diverse collection of regression datasets across the vision, audio, and text domains. It provides dataset classes that follow the structure of PyTorch datasets, so each dataset can be downloaded and loaded automatically. All datasets come with a permissive license that permits their use for research purposes.

1. Installation

To install the regsets package, you can use pip:

python -m pip install regsets

Alternatively, you can download a specific dataset file (e.g., utkface.py) and include it in your project to load the dataset locally.

2. Usage

Below are examples of how to use the regsets package for loading datasets.

📸 Vision Datasets

from regsets.vision import UTKFace

utkface_trainset = UTKFace(root="./data", split="train", download=True)

for image, label in utkface_trainset:
    ...

🎧 Audio Datasets

from regsets.audio import VCC2018

vcc2018_trainset = VCC2018(root="./data", split="train", download=True)

for audio, sample_rate, label in vcc2018_trainset:
    ...

📝 Text Datasets

from regsets.text import Amazon_Review

amazon_review_trainset = Amazon_Review(root="./data", split="train", download=True)

for texts, label in amazon_review_trainset:
    (ori, aug_0, aug_1) = texts
    ...
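
The loops above rely on the map-style dataset protocol that PyTorch datasets follow: an object that exposes `__len__` and `__getitem__`. The sketch below uses only the standard library to show why `for sample, label in dataset` works. `ToyRegressionDataset` and its data are hypothetical stand-ins and are not part of regsets.

```python
class ToyRegressionDataset:
    """Hypothetical stand-in for a map-style dataset; not part of regsets."""

    def __init__(self, samples, labels):
        if len(samples) != len(labels):
            raise ValueError("samples and labels must have the same length")
        self.samples = list(samples)
        self.labels = [float(label) for label in labels]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        # List indexing raises IndexError past the end,
        # which is what lets a plain for-loop stop.
        return self.samples[index], self.labels[index]


toy_trainset = ToyRegressionDataset(["a", "b", "c"], [1, 2.5, 4])

for sample, label in toy_trainset:
    print(sample, label)
```

Any object with this shape can be indexed, measured with `len()`, and iterated in the same way as the dataset classes shown in this section.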


3. Datasets

For datasets that do not provide a predefined train-test split, I randomly sample 80% of the data for training and reserve the remaining 20% for testing. Details for each dataset are provided below.
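
As a rough illustration of the 80/20 random split described above, here is a minimal sketch using only the standard library. The function name `split_indices`, the `seed` value, and the shuffling procedure are assumptions made for this example. The repository's actual sampling code and seed are not shown here, so this will not reproduce the same train and test membership.

```python
import random


def split_indices(num_samples, train_percent=80, seed=0):
    """Return (train_indices, test_indices) from a seeded random shuffle."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    # Integer arithmetic avoids floating-point rounding in the split size.
    num_train = num_samples * train_percent // 100
    return indices[:num_train], indices[num_train:]


# 23,705 is the UTKFace total implied by the table below (18,964 + 4,741).
train_idx, test_idx = split_indices(23705)
print(len(train_idx), len(test_idx))  # prints "18964 4741"
```

Fixing the seed keeps the split reproducible across runs, which matters when results on the test portion are compared between experiments.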

📸 Vision Datasets

| Dataset | # Training Data | # Dev Data | # Test Data | Target Range |
|---|---|---|---|---|
| UTKFace | 18,964 | - | 4,741 | [1, 116] |

🎧 Audio Datasets

| Dataset | # Training Data | # Dev Data | # Test Data | Target Range |
|---|---|---|---|---|
| BVCC | 4,974 | 1,066 | 1,066 | [1, 5] |
| VCC2018 | 16,464 | - | 4,116 | [1, 5] |

📝 Text Datasets

| Dataset | # Training Data | # Dev Data | # Test Data | Target Range |
|---|---|---|---|---|
| Amazon Review | 250,000 | 25,000 | 650,000 | [0, 4] |
| Yelp Review | 250,000 | 25,000 | 50,000 | [0, 4] |
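
The target ranges listed above differ widely between datasets, so it can be convenient to rescale labels to [0, 1] before training. The sketch below is one possible way to do that with min-max scaling. It is an optional preprocessing step, not something the regsets package is stated to perform, and `TARGET_RANGES`, `normalize_target`, and `denormalize_target` are hypothetical names introduced for this example.

```python
# Target ranges copied from the tables above; the dictionary keys are illustrative.
TARGET_RANGES = {
    "UTKFace": (1, 116),
    "BVCC": (1, 5),
    "VCC2018": (1, 5),
    "Amazon Review": (0, 4),
    "Yelp Review": (0, 4),
}


def normalize_target(value, dataset_name):
    """Map a raw label to [0, 1] using the dataset's target range."""
    low, high = TARGET_RANGES[dataset_name]
    return (value - low) / (high - low)


def denormalize_target(value, dataset_name):
    """Map a value in [0, 1] back to the dataset's original label scale."""
    low, high = TARGET_RANGES[dataset_name]
    return value * (high - low) + low


print(normalize_target(3, "BVCC"))  # prints "0.5"
```

Keeping the inverse function alongside the forward one makes it easy to report predictions on the original label scale.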


4. License

Distributed under the MIT License. See LICENSE for more information.


5. Contact


6. Acknowledgments

