LOD 2023: Perceptrons Under Verifiable Random Data Corruption

Welcome!

This repository contains the code used for the upcoming paper: Aguilar, J. and Diochnos, D., Perceptrons Under Verifiable Random Data Corruption, in LOD 2023. The research was carried out by Jose E. Aguilar Escamilla under the supervision of Dr. Dimitrios I. Diochnos at the School of Computer Science, Gallogly College of Engineering, The University of Oklahoma (OU), as part of the Ronald E. McNair Post-Baccalaureate Achievement Program.

Abstract

We study perceptrons when datasets are randomly corrupted by noise and the corrupted examples are subsequently discarded from the training process. Overall, perceptrons appear to be remarkably stable; their accuracy drops only slightly when large portions of the original datasets have been excluded from training as a response to verifiable random data corruption. Furthermore, we identify a real-world dataset where perceptrons appear to require a longer time for training, both in the general case and in the framework that we consider. Finally, we explore empirically a bound on the learning rate of Gallant's "pocket" algorithm for learning perceptrons and observe that the bound is tighter for non-linearly separable datasets.
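
The corruption setting described in the abstract can be illustrated with a short sketch. It assumes that each training example is corrupted independently with a fixed probability and that corruption is detectable, so corrupted examples are dropped before training. The function name `discard_corrupted` and its parameters are hypothetical and are not taken from the repository's code; plain Python is used here only to keep the example dependency-free.

```python
import random


def discard_corrupted(examples, corruption_rate, seed=None):
    """Return the examples that survive verifiable random corruption.

    Assumption: each example is corrupted independently with probability
    `corruption_rate`, and corrupted examples can be detected and removed.
    """
    if not 0.0 <= corruption_rate <= 1.0:
        raise ValueError("corruption_rate must be in [0, 1]")
    rng = random.Random(seed)
    # Keep an example only when its random draw is at or above the rate.
    return [ex for ex in examples if rng.random() >= corruption_rate]


# Toy dataset of (features, label) pairs, invented for illustration.
data = [((float(x), 1.0), 1 if x > 0 else -1) for x in range(-50, 50)]
clean = discard_corrupted(data, corruption_rate=0.4, seed=0)
print(len(data), len(clean))
```

Only the surviving examples in `clean` would then be passed to the learner, which is how a smaller but uncorrupted training set arises in this framework.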

Requirements & Installation

numpy
pandas
scikit-learn
imbalanced-learn
jupyter
seaborn
matplotlib
tqdm

Code Organization

The code is organized into two main groups:

Perceptron corruption experiment code
Jupyter notebooks for displaying results and cleaning datasets.

Perceptron Experiment Code

The main files containing experiment code are:

corruption_experiment.py
- Contains code used for creating perceptrons, pulling in data, and performing an experiment.
experiment_execution.sh
- Shell script that defines the experiment configuration. Launch this file to run an experiment.
Perceptron Package
- Contains the code defining the Perceptron pocket algorithm.
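
For readers unfamiliar with the pocket algorithm, the following is a minimal sketch, not the implementation in the Perceptron package. It is a simplified variant that runs ordinary perceptron updates and keeps "in the pocket" the weights with the fewest training errors seen so far. All names (`predict`, `count_errors`, `pocket_perceptron`, `max_iterations`) are hypothetical, and the toy AND dataset is invented for illustration.

```python
import random


def predict(weights, x):
    """Classify x as +1 or -1; weights[0] is the bias term."""
    activation = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1 if activation >= 0 else -1


def count_errors(weights, X, y):
    """Number of training examples misclassified by `weights`."""
    return sum(1 for xi, yi in zip(X, y) if predict(weights, xi) != yi)


def pocket_perceptron(X, y, max_iterations=1000, seed=None):
    """Simplified pocket algorithm: perceptron updates plus a 'pocket'
    that stores the weights with the lowest training error so far."""
    rng = random.Random(seed)
    weights = [0.0] * (len(X[0]) + 1)
    pocket, pocket_errors = list(weights), count_errors(weights, X, y)
    for _ in range(max_iterations):
        if pocket_errors == 0:
            break  # the pocket already separates the training data
        i = rng.randrange(len(X))
        if predict(weights, X[i]) == y[i]:
            continue  # correctly classified: no update
        # Standard perceptron update on a misclassified example.
        weights[0] += y[i]
        for j, xij in enumerate(X[i]):
            weights[j + 1] += y[i] * xij
        errors = count_errors(weights, X, y)
        if errors < pocket_errors:
            pocket, pocket_errors = list(weights), errors
    return pocket, pocket_errors


X_and = [(0, 0), (0, 1), (1, 0), (1, 1)]
y_and = [-1, -1, -1, 1]
weights, errors = pocket_perceptron(X_and, y_and, max_iterations=5000, seed=0)
print(weights, errors)
```

On data that is not linearly separable the plain perceptron never settles, which is why returning the pocketed weights rather than the final ones is useful.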

Jupyter Notebooks

These notebooks either display data generated from the experiments, or clean data/generate synthetic datasets.

DataGeneration.ipynb
- Code used for generating linearly separable and non-linearly separable data.
Dataset Analysis.ipynb
- Shows basic information about the SPECT, Bankruptcy, and Spambase datasets and stores them as pkl files.
DataVizualization.ipynb
- Used to observe how SMOTE changes the datasets.
Result_Visualizer.ipynb
- Visualizes the results of experiments on real-world datasets.
ResultAnalysis-MultiDimensional.ipynb
- Visualizes the results of experiments on synthetic datasets.
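
As a rough illustration of what generating synthetic data of both kinds can look like, here is a minimal sketch. It is not the code in DataGeneration.ipynb. It assumes points drawn uniformly from a cube, labels given by a hidden hyperplane through the origin, and label flipping as the way to break linear separability; the function `make_dataset` and its parameters are hypothetical.

```python
import random


def make_dataset(n_samples, n_features, flip_fraction=0.0, seed=None):
    """Generate points labeled by a hidden hyperplane.

    Assumption: with flip_fraction == 0 the labels are linearly separable
    by construction; flipping a fraction of the labels typically makes
    the dataset non-linearly separable.
    """
    rng = random.Random(seed)
    # Hidden target hyperplane through the origin (hypothetical choice).
    target = [rng.uniform(-1, 1) for _ in range(n_features)]
    X, y = [], []
    for _ in range(n_samples):
        x = [rng.uniform(-1, 1) for _ in range(n_features)]
        score = sum(t * xi for t, xi in zip(target, x))
        X.append(x)
        y.append(1 if score >= 0 else -1)
    # Flip the labels of a random subset of the examples.
    n_flips = int(flip_fraction * n_samples)
    for i in rng.sample(range(n_samples), n_flips):
        y[i] = -y[i]
    return X, y, target


X_sep, y_sep, w_sep = make_dataset(200, 5, seed=1)
X_non, y_non, w_non = make_dataset(200, 5, flip_fraction=0.25, seed=1)
print(len(X_sep), len(X_non))
```

Keeping the hidden `target` in the return value makes it easy to check how many labels disagree with the generating hyperplane.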

Jose E. Aguilar Escamilla -- The 9th International Conference on Machine Learning, Optimization, and Data Science.
