This repository offers a diverse collection of regression datasets across vision, audio and text domains. It provides dataset classes that follow the PyTorch Datasets structure, allowing users to automatically download and load these datasets with ease. All datasets come with a permissive license, permitting their use for research purposes.
To install the `regsets` package, use pip:
python -m pip install regsets
Alternatively, you can download a single dataset file (e.g., `utkface.py`) and include it in your project to load the dataset locally.
Below are examples of how to use the `regsets` package to load datasets.
# Vision: each sample is an (image, label) pair.
from regsets.vision import UTKFace

utkface_trainset = UTKFace(root="./data", split="train", download=True)
for image, label in utkface_trainset:
    ...
# Audio: each sample is an (audio, sample_rate, label) triple.
from regsets.audio import VCC2018

vcc2018_trainset = VCC2018(root="./data", split="train", download=True)
for audio, sample_rate, label in vcc2018_trainset:
    ...
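# Text: each sample is a (texts, label) pair, where texts holds the
# original text and two augmented versions.
from regsets.text import Amazon_Review

amazon_review_trainset = Amazon_Review(root="./data", split="train", download=True)
for texts, label in amazon_review_trainset:
    (ori, aug_0, aug_1) = texts
    ...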
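
In all three loading examples the label is the last element of each sample, so one small helper can summarize the targets regardless of modality. The following is a minimal sketch under that assumption; `label_range` and the stand-in `toy_samples` list are hypothetical names used for illustration and are not part of `regsets`.

```python
from typing import Iterable, Sequence, Tuple


def label_range(samples: Iterable[Sequence]) -> Tuple[float, float]:
    """Return (min, max) of the labels, assuming the label is the last item of each sample."""
    labels = [float(sample[-1]) for sample in samples]
    if not labels:
        raise ValueError("dataset is empty")
    return min(labels), max(labels)


# Stand-in data shaped like the audio samples: (audio, sample_rate, label).
# A real dataset object would be passed in the same way.
toy_samples = [
    ([0.0, 0.1], 16000, 3.5),
    ([0.2, 0.3], 16000, 1.0),
    ([0.4, 0.5], 16000, 4.25),
]

low, high = label_range(toy_samples)
print(low, high)
```

Passing a real dataset such as `vcc2018_trainset` should work the same way as long as each label can be converted with `float()`, which is an assumption about the label type rather than something the package documents here.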
For datasets that do not provide a predefined train-test split, I randomly sample 80% of the data for training and reserve the remaining 20% for testing. Details for each dataset are provided below.
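
A random 80/20 split of this kind can be sketched with the standard library alone. This is only an illustration of the idea: the function name `split_indices`, the fixed seed, and the rounding rule are assumptions, and the package's actual splitting code may differ.

```python
import random
from typing import List, Tuple


def split_indices(n: int, train_fraction: float = 0.8, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle the indices 0..n-1 and cut them into train and test parts."""
    indices = list(range(n))
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(indices)
    n_train = round(n * train_fraction)  # assumed rounding rule
    return indices[:n_train], indices[n_train:]


# 23,705 is the sum of the UTKFace training and test sizes listed in this README.
train_idx, test_idx = split_indices(23705)
print(len(train_idx), len(test_idx))
```

With 23,705 samples this produces 18,964 training and 4,741 test indices, the same sizes as the UTKFace row in the table, although which samples land in each part depends on the seed and shuffling method.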
Vision datasets:

Dataset | # Training Data | # Dev Data | # Test Data | Target Range |
---|---|---|---|---|
UTKFace | 18,964 | - | 4,741 | [1, 116] |

Audio datasets:

Dataset | # Training Data | # Dev Data | # Test Data | Target Range |
---|---|---|---|---|
BVCC | 4,974 | 1,066 | 1,066 | [1, 5] |
VCC2018 | 16,464 | - | 4,116 | [1, 5] |

Text datasets:

Dataset | # Training Data | # Dev Data | # Test Data | Target Range |
---|---|---|---|---|
Amazon Review | 250,000 | 25,000 | 65,000 | [0, 4] |
Yelp Review | 250,000 | 25,000 | 50,000 | [0, 4] |
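
The Target Range column can be used to rescale labels to [0, 1] before training and to map predictions back afterwards. The sketch below shows this min-max scaling with the UTKFace range; it is a common preprocessing step rather than something `regsets` is stated to do, and the helper names `normalize` and `denormalize` are hypothetical.

```python
def normalize(label: float, low: float, high: float) -> float:
    """Map a label from [low, high] to [0, 1]."""
    return (label - low) / (high - low)


def denormalize(value: float, low: float, high: float) -> float:
    """Map a value from [0, 1] back to [low, high]."""
    return value * (high - low) + low


# Target range for UTKFace, taken from the table above.
UTKFACE_RANGE = (1.0, 116.0)

scaled = normalize(30.0, *UTKFACE_RANGE)
restored = denormalize(scaled, *UTKFACE_RANGE)
print(scaled, restored)
```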
Distributed under the MIT License. See LICENSE for more information.
- Pin-Yen Huang