Datasets
Panos Toulis edited this page Jun 26, 2015 · 16 revisions
These are the standard benchmark data sets used in publications, modelled with logistic regression, linear models, and linear SVMs.

| Benchmark | Features | Observations (training/test if split) |
| :---- | :----: | :----: |
| rcv1 | 47152 | 781265/23149 |
| covtype | 54 | 581012 |
| delta | 500 | 500000 |
| mnist | 2304 | 60000/10000 |
| sido | 4932 | 12678 |
| alpha | 500 | 250000/250000 |
| beta | 500 | 500000 |
| gamma | 500 | 500000 |
| epsilon | 2000 | 500000 |
| zeta | 2000 | 500000 |
| fd | 900 | 5469800 |
| ocr | 1156 | 3500000 |
| dna | 200 | 50000000 |

Data sets to be used for the Cox proportional hazards model:
- medfly
- Economics dataset: http://economics.mit.edu/faculty/angrist/data1/data/angkru1991
- Several other interesting datasets can be found at https://www.aeaweb.org/articles.php?doi=10.1257/jep.28.2 (could someone take a look and write a quick summary of the opportunities there?)
- http://ocp.hul.harvard.edu/contagion/plague.html
- http://mbostock.github.com/d3/
- http://lib.stat.cmu.edu/apstat/
- http://webscope.sandbox.yahoo.com/catalog.php
- http://pds.lib.harvard.edu/pds/viewtext/7902382?n=18&imagesize=1200&jp2Res=.5&printThumbnails=no
- http://simplystatistics.tumblr.com/post/15182715327/list-of-cities-states-with-open-data-help-me-find
- http://stats.wikimedia.org/
- http://www.kdnuggets.com/datasets/
- http://en.wikipedia.org/wiki/Open_data
- http://lib.stat.cmu.edu/datasets/
- http://thedata.org/
- http://www.exp-platform.com/Pages/default.aspx
- http://www.marinetraffic.com/en/p/api-services
- https://www.tycho.pitt.edu/
- http://www.bls.gov/osmr/
- http://www.statsblogs.com/2014/02/10/scraping-pro-football-data-and-interactive-charts-using-rcharts-ggplot2-and-shiny/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+statsblogs+%28StatsBlogs%29
- http://www.kaggle.com/competitions
- http://www.statsblogs.com/2014/03/11/capturing-intraday-data/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+statsblogs+%28StatsBlogs%29
- http://nces.ed.gov/pubsearch/licenses.asp
- http://astro.temple.edu/~alan/MMST/datasets.html
- http://openeconomics.net/resources/automated-game-play-datasets/
- http://www.wiod.org/new_site/database/wiots.htm
- https://wikileaks.org/sony/emails/

R HPC

- http://stats.stackexchange.com/questions/61845/large-scale-cox-regression-with-r-big-data
- http://cran.r-project.org/web/packages/sgd/index.html