Dataloader for CIFAR-N (PyTorch)

[Update 5/17/2023] A demo for automatically detecting label errors on CIFAR-N is now available at Docta!

Docta: A Doctor for your data
An advanced data-centric AI platform that offers a comprehensive range of services aimed at detecting and rectifying issues in your data.

This repository is the official dataset release and PyTorch implementation of "Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations", accepted at ICLR 2022. We collected and published re-annotated versions of the CIFAR-10 and CIFAR-100 training data, which contain real-world human annotation errors. We show how these noise patterns deviate from the classically assumed ones and what new challenges they pose. The CIFAR-N website is available at http://www.noisylabels.com/.

Competition: Please refer to the branch ijcai-lmnl-2022 for details of the 1st Learning with Noisy Labels Challenge at IJCAI 2022. The details are also available at http://competition.noisylabels.com/.

Dataloader for CIFAR-N (PyTorch)

CIFAR-10N

import torch
noise_file = torch.load('./data/CIFAR-10_human.pt')
clean_label = noise_file['clean_label']
worst_label = noise_file['worse_label']
aggre_label = noise_file['aggre_label']
random_label1 = noise_file['random_label1']
random_label2 = noise_file['random_label2']
random_label3 = noise_file['random_label3']
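Once the label sets are loaded, a quick sanity check is to measure how often each noisy set disagrees with the clean labels. The snippet below is a minimal sketch of that check, not part of the repository: `noise_rate` is a hypothetical helper, and the toy lists stand in for `noise_file['clean_label']` and one of the noisy label sets. It assumes both inputs are equal-length sequences of integer class ids (lists or NumPy arrays).

```python
def noise_rate(clean_labels, noisy_labels):
    """Return the fraction of positions where the noisy label differs from the clean one."""
    if len(clean_labels) != len(noisy_labels):
        raise ValueError("label sets must have the same length")
    if len(clean_labels) == 0:
        raise ValueError("label sets must be non-empty")
    mismatches = sum(int(c) != int(n) for c, n in zip(clean_labels, noisy_labels))
    return mismatches / len(clean_labels)


# Toy stand-ins for the loaded label sets (hypothetical values, not real CIFAR-N data).
toy_clean = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
toy_noisy = [0, 1, 2, 5, 4, 3, 6, 7, 8, 1]
print(noise_rate(toy_clean, toy_noisy))
```

With the real file, the same call would be applied to `clean_label` and each of `worst_label`, `aggre_label`, and `random_label1` to `random_label3` in turn.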

CIFAR-100N

import torch
noise_file = torch.load('./data/CIFAR-100_human.pt')
clean_label = noise_file['clean_label']
noisy_label = noise_file['noisy_label']
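To train on the loaded labels, one option is to wrap an existing dataset so that it returns the noisy label in place of the original one. The following is a rough sketch under stated assumptions: `NoisyLabelDataset` is a hypothetical class (not from this repository), the base dataset is assumed to be indexable and to return `(image, label)` pairs in the same order as the label file, and the sample index is returned as well because many noisy-label methods track per-sample statistics.

```python
class NoisyLabelDataset:
    """Wrap an indexable dataset of (image, label) pairs and swap in noisy labels."""

    def __init__(self, base_dataset, noisy_labels):
        if len(base_dataset) != len(noisy_labels):
            raise ValueError("dataset and label set must have the same length")
        self.base_dataset = base_dataset
        self.noisy_labels = noisy_labels

    def __len__(self):
        return len(self.base_dataset)

    def __getitem__(self, index):
        # Discard the original label and return the noisy one instead.
        image, _original_label = self.base_dataset[index]
        return image, int(self.noisy_labels[index]), index


# Toy stand-ins for a CIFAR training set and a noisy label set (hypothetical values).
toy_base = [("img0", 0), ("img1", 1), ("img2", 2)]
toy_noisy = [0, 2, 2]
toy_dataset = NoisyLabelDataset(toy_base, toy_noisy)
print(toy_dataset[1])
```

Because the class only defines `__len__` and `__getitem__`, it should be usable as a map-style dataset with a PyTorch `DataLoader`, with the torchvision CIFAR training set as `base_dataset`.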

Dataloader for CIFAR-N (TensorFlow)

Note: The image order of the TensorFlow dataset (tfds.load, binary version of CIFAR) does not match that of the PyTorch dataloader (python version of CIFAR).

CIFAR-10N

import numpy as np
import tensorflow_datasets as tfds

# The .npy file stores a pickled dict, so unwrap it once with .item()
noise_file = np.load('./data/CIFAR-10_human_ordered.npy', allow_pickle=True).item()
clean_label = noise_file.get('clean_label')
worst_label = noise_file.get('worse_label')
aggre_label = noise_file.get('aggre_label')
random_label1 = noise_file.get('random_label1')
random_label2 = noise_file.get('random_label2')
random_label3 = noise_file.get('random_label3')
# The noisy labels follow the image order of this TensorFlow dataloader
train_ds, test_ds = tfds.load('cifar10', split=['train', 'test'], as_supervised=True, batch_size=-1)
train_images, train_labels = tfds.as_numpy(train_ds)
# You may want to replace train_labels with one of the CIFAR-N noisy label sets

Reminder: CIFAR-10N is now available in TensorFlow Datasets. Please check here for more details!

CIFAR-100N

import numpy as np
import tensorflow_datasets as tfds

# The .npy file stores a pickled dict, so unwrap it once with .item()
noise_file = np.load('./data/CIFAR-100_human_ordered.npy', allow_pickle=True).item()
clean_label = noise_file.get('clean_label')
noise_label = noise_file.get('noise_label')
# The noisy labels follow the image order of this TensorFlow dataloader
train_ds, test_ds = tfds.load('cifar100', split=['train', 'test'], as_supervised=True, batch_size=-1)
train_images, train_labels = tfds.as_numpy(train_ds)
# You may want to replace train_labels with the CIFAR-N noisy label set
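Before replacing `train_labels`, it may be worth confirming that the ordered label file really lines up with the images returned by tfds, since a silent misalignment would look like extra label noise. The sketch below illustrates one way to do this: `find_misaligned` is a hypothetical helper, and it assumes that `clean_label` from the file should equal the tfds `train_labels` at every position.

```python
def find_misaligned(reference_labels, clean_labels, limit=5):
    """Return up to `limit` indices where the two label sequences disagree."""
    if len(reference_labels) != len(clean_labels):
        raise ValueError("label sets must have the same length")
    bad_indices = []
    for i, (ref, clean) in enumerate(zip(reference_labels, clean_labels)):
        if int(ref) != int(clean):
            bad_indices.append(i)
            if len(bad_indices) >= limit:
                break
    return bad_indices


# Toy stand-ins for tfds train_labels and the clean_label entry (hypothetical values).
toy_reference = [3, 1, 4, 1, 5]
toy_aligned = [3, 1, 4, 1, 5]
toy_shifted = [1, 4, 1, 5, 3]
print(find_misaligned(toy_reference, toy_aligned))
print(find_misaligned(toy_reference, toy_shifted, limit=2))
```

An empty result suggests the orders agree. A non-empty one suggests the wrong file or a shuffled dataset, for example when `shuffle_files` is enabled in `tfds.load`.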

The mapping from the tfds image order to the PyTorch dataloader image order is given in the following files:

image_order_c10.npy: a numpy array of length 50K; the i-th element is the index, in the PyTorch (python-version) ordering, of the i-th unshuffled tfds (binary-version) CIFAR-10 training image.
image_order_c100.npy: a numpy array of length 50K; the i-th element is the index, in the PyTorch (python-version) ordering, of the i-th unshuffled tfds (binary-version) CIFAR-100 training image.
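The index arrays described above can be used to move any per-image array between the two orderings. The sketch below is one reading of that description, not code from the repository: it takes `image_order[i]` to be the PyTorch index of the i-th tfds image, and the helper names `to_tfds_order` and `to_pytorch_order` are hypothetical. Plain lists are used so the example runs without the real files; with NumPy arrays the first function reduces to `labels[image_order]`.

```python
def to_tfds_order(pytorch_ordered_labels, image_order):
    """Reorder PyTorch-ordered labels so that position i matches the i-th tfds image."""
    return [pytorch_ordered_labels[int(pt_idx)] for pt_idx in image_order]


def to_pytorch_order(tfds_ordered_labels, image_order):
    """Inverse mapping: reorder tfds-ordered labels back into the PyTorch ordering."""
    result = [None] * len(image_order)
    for tfds_idx, pt_idx in enumerate(image_order):
        result[int(pt_idx)] = tfds_ordered_labels[tfds_idx]
    return result


# Toy stand-ins (hypothetical): 4 images instead of 50K.
toy_order = [2, 0, 3, 1]            # tfds image 0 is PyTorch image 2, and so on
toy_pytorch_labels = [10, 11, 12, 13]
toy_tfds_labels = to_tfds_order(toy_pytorch_labels, toy_order)
print(toy_tfds_labels)
print(to_pytorch_order(toy_tfds_labels, toy_order))
```

The round trip returning the original list is a cheap check that the index array is a valid permutation.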

Training on CIFAR-N with Cross-Entropy (PyTorch)

CIFAR-10N

# NOISE_TYPE: [clean, aggre, worst, rand1, rand2, rand3]
# Use human annotations
CUDA_VISIBLE_DEVICES=0 python3 main.py --dataset cifar10 --noise_type NOISE_TYPE --is_human
# Use the synthetic noise that has the same noise transition matrix as human annotations
CUDA_VISIBLE_DEVICES=0 python3 main.py --dataset cifar10 --noise_type NOISE_TYPE

CIFAR-100N

# NOISE_TYPE: [clean100, noisy100]
# Use human annotations
CUDA_VISIBLE_DEVICES=0 python3 main.py --dataset cifar100 --noise_type NOISE_TYPE --is_human
# Use the synthetic noise that has the same noise transition matrix as human annotations
CUDA_VISIBLE_DEVICES=0 python3 main.py --dataset cifar100 --noise_type NOISE_TYPE
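The synthetic option above refers to noise that shares its transition matrix with the human annotations. As an illustration only (this is not necessarily how main.py generates it), the sketch below estimates a class-dependent transition matrix, where entry `[i][j]` is the empirical probability that clean class `i` is labeled as `j`, and then samples synthetic labels from it. The function names and toy labels are hypothetical.

```python
import random


def estimate_transition_matrix(clean_labels, noisy_labels, num_classes):
    """Estimate T[i][j] = P(noisy = j | clean = i) from paired label sequences."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for clean, noisy in zip(clean_labels, noisy_labels):
        counts[int(clean)][int(noisy)] += 1
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # Class never observed: fall back to "no noise" for that row.
            matrix.append([1.0 if j == i else 0.0 for j in range(num_classes)])
        else:
            matrix.append([count / total for count in row])
    return matrix


def sample_synthetic_labels(clean_labels, matrix, seed=0):
    """Draw one synthetic noisy label per sample from the row of its clean class."""
    rng = random.Random(seed)
    classes = list(range(len(matrix)))
    return [rng.choices(classes, weights=matrix[int(c)], k=1)[0] for c in clean_labels]


# Toy two-class stand-ins for clean and human-annotated labels (hypothetical values).
toy_clean = [0, 0, 0, 0, 1, 1, 1, 1]
toy_noisy = [0, 0, 0, 1, 1, 1, 0, 0]
toy_matrix = estimate_transition_matrix(toy_clean, toy_noisy, num_classes=2)
print(toy_matrix)
print(sample_synthetic_labels(toy_clean, toy_matrix))
```

Synthetic labels drawn this way depend only on the clean class, whereas human errors can also depend on the individual image, which is why the two settings are useful to compare.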

Additional dataset information

We include additional side information recorded during the noisy-label collection in side_info_cifar10N.csv and side_info_cifar100N.csv. Both files contain the following fields:

Image-batch: a subset of indexes of the CIFAR training images.
Worker-id: the encrypted worker id on Amazon Mechanical Turk.
Work-time-in-seconds: the time (in seconds) a worker spent on annotating the corresponding image batch.
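As a rough sketch of how the side information could be explored, the snippet below averages the annotation time per worker. It makes several assumptions that should be checked against the real files: that the CSV header uses exactly the field names listed above, that work time parses as a number, and that one row corresponds to one annotated batch. The sample rows are invented for illustration and `mean_work_time_per_worker` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict


def mean_work_time_per_worker(csv_file):
    """Average Work-time-in-seconds over all rows of each Worker-id."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        worker = row["Worker-id"]  # assumed column name
        totals[worker] += float(row["Work-time-in-seconds"])  # assumed column name
        counts[worker] += 1
    return {worker: totals[worker] / counts[worker] for worker in totals}


# Invented sample rows; the real files may encode Image-batch differently.
sample_csv = io.StringIO(
    "Image-batch,Worker-id,Work-time-in-seconds\n"
    "0-9,worker_a,30\n"
    "10-19,worker_a,50\n"
    "20-29,worker_b,20\n"
)
print(mean_work_time_per_worker(sample_csv))
```

For the real data, the in-memory sample would be replaced by `open('side_info_cifar10N.csv', newline='')`.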
