Panos Toulis edited this page Jun 26, 2015 · 16 revisions

Large scale linear classification

These are the standard benchmark data sets used in publications, typically modelled with logistic regression, linear models, and linear SVMs.

| Benchmark | Features | Observations (training/test if split) |
| :---- | :----: | :----: |
| rcv1 | 47152 | 781265/23149 |
| covtype | 54 | 581012 |
| delta | 500 | 500000 |
| mnist | 2304 | 60000/10000 |
| sido | 4932 | 12678 |
| alpha | 500 | 250000/250000 |
| beta | 500 | 500000 |
| gamma | 500 | 500000 |
| epsilon | 2000 | 500000 |
| zeta | 2000 | 500000 |
| fd | 900 | 5469800 |
| ocr | 1156 | 3500000 |
| dna | 200 | 50000000 |

Survival analysis

Data sets to be used with the Cox proportional hazards model:

  • medfly

Miscellaneous

Some more data repositories
