Adding a structured dataset
Rick Wierenga edited this page Feb 4, 2020 · 1 revision
With the CSVDataLoader, it's very easy to make a PR adding a structured dataset.
This document shows how to add a structured dataset to S5TF in 5 easy steps.
1. Create a new file
2. Write a header with a title, description, and BibTeX citation
3. Copy and paste the boilerplate code
4. Add general information about your dataset
5. Fill in the column names
$ touch Sources/Datasets/datasets/structured/AdultIncome.swift
// The Adult Data Set. // TODO: title
//
// Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records
// was extracted using the following conditions: ((AAGE>16) && (AGI>100) && (AFNLWGT>1)&& (HRSWK>0)) // TODO: description
//
// Prediction task is to determine whether a person makes over 50K a year.
//
// BibTeX citation: // TODO: BibTeX
// @misc{Dua:2019,
// author = "Dua, Dheeru and Graff, Casey",
// year = "2017",
// title = "{UCI} Machine Learning Repository",
// url = "http://archive.ics.uci.edu/ml",
// institution = "University of California, Irvine, School of Information and Computer Sciences"
// }
Use S5TFCategoricalBatch for classification and S5TFNumericalBatch for regression problems.
import Foundation
import S5TF
import TensorFlow

public struct AdultDataSet: S5TFDataset {
    typealias DataLoader = CSVDataLoader

    public static var train: CSVDataLoader<S5TFCategoricalBatch> { // TODO: S5TFCategoricalBatch/S5TFNumericalBatch
        guard let localURL = Downloader.download(
            fileAt: URL(string: "TODO://add.a.url.com")!,
            cacheName: "cacheName", // the name of the dataset
            fileName: "descriptive_filename.csv"
        ) else {
            fatalError("File not downloaded correctly.")
        }

        return CSVDataLoader<S5TFCategoricalBatch>(
            fromFileAt: localURL.absoluteString,
            columnNames: // TODO(5)
            inputColumnNames: // TODO(5)
            outputColumnNames: // TODO(5)
        )
    }

    public static let info = yourInfo

    private init() {}
}
// swiftlint:disable:next private_over_fileprivate
fileprivate let yourInfo = S5TFDatasetInfo(
    name: // TODO(4) ,
    version: // TODO(4) ,
    description: """
    TODO(4)
    """,
    homepage: // TODO(4) ,
    numberOfTrainExamples: // TODO(4) ,
    numberOfValidExamples: // TODO(4) ,
    numberOfTestExamples: // TODO(4) ,
    numberOfFeatures: // TODO(4) ,
)
fileprivate let adultIncomeInfo = S5TFDatasetInfo(
    name: "Adult Data set",
    version: "0.0.1",
    description: """
    Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records
    was extracted using the following conditions: ((AAGE>16) && (AGI>100) && (AFNLWGT>1)&& (HRSWK>0))
    """,
    homepage: URL(string: "http://archive.ics.uci.edu/ml")!,
    numberOfTrainExamples: 48842,
    numberOfValidExamples: 0,
    numberOfTestExamples: 0,
    numberOfFeatures: 15
)
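If you are unsure what to enter for the example and feature counts, you can derive them from the CSV file itself. The following is a minimal sketch using only Foundation. The helper csvShape is hypothetical and not part of S5TF, and it assumes one record per line with no quoted fields that contain commas or newlines.

```swift
import Foundation

/// Hypothetical helper, not part of S5TF: counts data rows and columns in CSV text.
/// Assumes one record per line and no quoted fields containing commas or newlines.
func csvShape(of text: String, hasHeader: Bool) -> (rows: Int, columns: Int) {
    // Split into lines and drop lines that contain only whitespace.
    let lines = text
        .split(whereSeparator: { $0.isNewline })
        .filter { !$0.trimmingCharacters(in: .whitespaces).isEmpty }
    guard let first = lines.first else { return (0, 0) }

    // Keep empty fields so that "a,,c" still counts as three columns.
    let columns = first.split(separator: ",", omittingEmptySubsequences: false).count
    // A header row is not an example, so it is excluded from the row count.
    let rows = hasHeader ? lines.count - 1 : lines.count
    return (rows, columns)
}

// A tiny made-up sample in the style of the Adult data, without a header row.
let sample = """
39, State-gov, 77516, <=50K
50, Self-emp-not-inc, 83311, <=50K
38, Private, 215646, >50K
"""
let shape = csvShape(of: sample, hasHeader: false)
print("rows: \(shape.rows), columns: \(shape.columns)")
```

Note that the column count includes the output column, which matches the Adult example above where numberOfFeatures is 15 for 14 inputs plus income.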
If your CSV file contains column names in the first row, you can pass nil to columnNames.
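To decide whether nil is appropriate, it can help to check whether the first row is actually a header. The sketch below is a rough heuristic written with Foundation only. The function firstRowLooksLikeHeader is hypothetical, is not part of S5TF, and says nothing about how CSVDataLoader itself reads the file. It treats the first row as a header when none of its fields parses as a number.

```swift
import Foundation

/// Hypothetical heuristic, not part of S5TF: guesses whether the first CSV row is a header.
/// The row is treated as a header when none of its fields parses as a number.
func firstRowLooksLikeHeader(_ text: String) -> Bool {
    guard let firstLine = text.split(whereSeparator: { $0.isNewline }).first else {
        return false
    }
    // Naive comma split: quoted fields containing commas are not handled.
    let fields = firstLine
        .split(separator: ",", omittingEmptySubsequences: false)
        .map { $0.trimmingCharacters(in: .whitespaces) }
    return !fields.contains(where: { Double($0) != nil })
}

// Made-up samples: one with a header row, one that starts directly with data.
let withHeader = "age,workclass,income\n39,State-gov,<=50K"
let withoutHeader = "39,State-gov,<=50K\n50,Private,>50K"
print(firstRowLooksLikeHeader(withHeader))
print(firstRowLooksLikeHeader(withoutHeader))
```

The heuristic fails for files whose columns are all non-numeric, so treat its answer as a hint and confirm by opening the file.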
return CSVDataLoader<S5TFCategoricalBatch>(
    fromFileAt: localURL.absoluteString,
    columnNames: ["age",
                  "workclass",
                  "fnlwgt",
                  "education",
                  "education-num",
                  "marital-status",
                  "occupation",
                  "relationship",
                  "race",
                  "sex",
                  "capital-gain",
                  "capital-loss",
                  "hours-per-week",
                  "native-country",
                  "income"],
    inputColumnNames: ["age",
                       "workclass",
                       "fnlwgt",
                       "education",
                       "education-num",
                       "marital-status",
                       "occupation",
                       "relationship",
                       "race",
                       "sex",
                       "capital-gain",
                       "capital-loss",
                       "hours-per-week",
                       "native-country"],
    outputColumnNames: ["income"]
)
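Writing the input columns out by hand duplicates most of the full column list. One possible alternative, sketched here in plain Swift with no S5TF calls, is to define the full list and the output list once and derive the input list by filtering. The constant names are illustrative only.

```swift
// All column names in file order, as in the Adult example.
let columnNames = ["age", "workclass", "fnlwgt", "education", "education-num",
                   "marital-status", "occupation", "relationship", "race", "sex",
                   "capital-gain", "capital-loss", "hours-per-week",
                   "native-country", "income"]

// The columns the model should predict.
let outputColumnNames = ["income"]

// Every column that is not an output column becomes an input column.
// filter preserves the original order of columnNames.
let inputColumnNames = columnNames.filter { !outputColumnNames.contains($0) }

print(inputColumnNames.count)
```

The three arrays can then be passed as the columnNames, inputColumnNames, and outputColumnNames arguments, so a renamed or added column only needs to be changed in one place.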