Which datasets are used for main paper (98 datasets) and "small data" (57 datasets) #100
The "easy_import" list seems to contain 175 classification tasks, 69 of which have fewer than 1250 instances. |
Hi Andreas, below are the 98 datasets from Table 1 and the 57 datasets from Table 2. Please let us know if you have more questions. datasets_table_1 = ['openml__visualizing_environmental__3602', 'openml__labor__4', 'openml__monks-problems-2__146065', 'openml__tic-tac-toe__49', 'openml__dermatology__35', 'openml__cardiotocography__9979', 'openml__lung-cancer__146024', 'openml__sonar__39', 'openml__anneal__2867', 'openml__analcatdata_chlamydia__3739', 'openml__iris__59', 'openml__irish__3543', 'openml__heart-c__48', 'openml__ionosphere__145984', 'openml__hayes-roth__146063', 'openml__fri_c3_100_5__3779', 'openml__fri_c0_100_5__3620', 'openml__analcatdata_authorship__3549', 'openml__rabe_266__3647', 'openml__balance-scale__11', 'openml__acute-inflammations__10089', 'openml__MiceProtein__146800', 'openml__banknote-authentication__10093', 'openml__mushroom__24', 'openml__kr-vs-kp__3', 'openml__analcatdata_boxing1__3540', 'openml__musk__3950', 'openml__transplant__3748', 'openml__cjs__14967', 'openml__synthetic_control__3512', 'openml__car-evaluation__146192', 'openml__fertility__9984', 'openml__postoperative-patient-data__146210', 'openml__breast-w__15', 'openml__wdbc__9946', 'openml__car__146821', 'openml__visualizing_livestock__3731', 'openml__mfeat-factors__12', 'openml__Satellite__167211', 'openml__colic__25', 'openml__lymph__10', 'openml__wall-robot-navigation__9960', 'openml__wilt__146820', 'openml__scene__3485', 'openml__mfeat-karhunen__16', 'openml__sick__3021', 'openml__dna__167140', 'openml__socmob__3797', 'openml__page-blocks__30', 'openml__PhishingWebsites__14952', 'openml__spambase__43', 'openml__splice__45', 'openml__churn__167141', 'openml__colic__27', 'openml__ecoli__145977', 'openml__semeion__9964', 'openml__ozone-level-8hr__9978', 'openml__heart-h__50', 'openml__pc1__3918', 'openml__qsar-biodeg__9957', 'openml__autos__9', 'openml__pc4__3902', 'openml__hill-valley__145847', 'openml__satimage__2074', 'openml__pc3__3903', 'openml__mfeat-fourier__14', 
'openml__Australian__146818', 'openml__credit-approval__29', 'openml__cylinder-bands__14954', 'openml__mfeat-zernike__22', 'openml__kc2__3913', 'openml__bank-marketing__14965', 'openml__phoneme__9952', 'openml__elevators__3711', 'openml__breast-cancer__145799', 'openml__SpeedDating__146607', 'openml__kc1__3917', 'openml__adult-census__3953', 'openml__ilpd__9971', 'openml__vehicle__53', 'openml__ada_agnostic__3896', 'openml__tae__47', 'openml__blood-transfusion-service-center__10101', 'openml__jasmine__168911', 'openml__LED-display-domain-7digit__125921', 'openml__diabetes__37', 'openml__Click_prediction_small__190408', 'openml__profb__3561', 'openml__steel-plates-fault__146817', 'openml__jm1__3904', 'openml__glass__40', 'openml__dresses-sales__125920', 'openml__mfeat-morphological__18', 'openml__eucalyptus__2079', 'openml__libras__360948', 'openml__yeast__145793', 'openml__cmc__23', 'openml__analcatdata_dmft__3560']
datasets_table_2 = ["openml__Australian__146818", "openml__LED-display-domain-7digit__125921", "openml__MiceProtein__146800", "openml__acute-inflammations__10089", "openml__analcatdata_authorship__3549", "openml__analcatdata_boxing1__3540", "openml__analcatdata_chlamydia__3739", "openml__analcatdata_dmft__3560", "openml__anneal__2867", "openml__autos__9", "openml__balance-scale__11", "openml__blood-transfusion-service-center__10101", "openml__blood-transfusion-service-center__145836", "openml__breast-cancer__145799", "openml__breast-w__15", "openml__colic__25", "openml__colic__27", "openml__credit-approval__29", "openml__cylinder-bands__14954", "openml__dermatology__35", "openml__diabetes__37", "openml__dresses-sales__125920", "openml__ecoli__145977", "openml__eucalyptus__2079", "openml__fertility__9984", "openml__fri_c0_100_5__3620", "openml__fri_c3_100_5__3779", "openml__glass__40", "openml__hayes-roth__146063", "openml__heart-c__48", "openml__heart-h__50", "openml__hill-valley__145847", "openml__ilpd__9971", "openml__ionosphere__145984", "openml__iris__59", "openml__irish__3543", "openml__kc2__3913", "openml__labor__4", "openml__lung-cancer__146024", "openml__lymph__10", "openml__monks-problems-2__146065", "openml__pc1__3918", "openml__postoperative-patient-data__146210", "openml__profb__3561", "openml__qsar-biodeg__9957", "openml__rabe_266__3647", "openml__socmob__3797", "openml__sonar__39", "openml__synthetic_control__3512", "openml__tae__47", "openml__tic-tac-toe__49", "openml__transplant__3748", "openml__vehicle__53", "openml__visualizing_environmental__3602", "openml__visualizing_livestock__3731", "openml__wdbc__9946", "openml__yeast__145793"] |
I just saw this issue. Are you aware that the datasets for Table 2 have a duplicate? |
@LennartPurucker thanks for pointing this out (cc @crwhite14). It looks like we accidentally pulled two different OpenML tasks (https://openml.org/search?type=task&id=145836 and https://openml.org/search?type=task&id=10101) that appear to be identical because they are based on the same dataset (https://openml.org/search?type=data&id=1464). We could remove the duplicate dataset from any results that include it. |
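For anyone who wants to check the posted lists for this kind of overlap, here is a minimal sketch. It assumes every entry follows the `openml__<name>__<task_id>` pattern seen above; the helper names `split_entry` and `duplicate_candidates` are made up for illustration and are not part of the repository. A shared name is only a hint: the sketch flags candidates, and whether two tasks really use the same data still has to be confirmed on OpenML.

```python
from collections import defaultdict


def split_entry(entry):
    """Split 'openml__<name>__<task_id>' into (name, task_id).

    Assumes the naming pattern used in the lists posted in this thread.
    """
    _prefix, rest = entry.split("__", 1)   # drop the leading 'openml'
    name, task_id = rest.rsplit("__", 1)   # the task id is the last field
    return name, int(task_id)


def duplicate_candidates(entries):
    """Return {name: [task ids]} for names listed under more than one task id."""
    by_name = defaultdict(set)
    for entry in entries:
        name, task_id = split_entry(entry)
        by_name[name].add(task_id)
    return {name: sorted(ids) for name, ids in by_name.items() if len(ids) > 1}


# Small excerpt from datasets_table_2; pass the full list to check everything.
sample = [
    "openml__blood-transfusion-service-center__10101",
    "openml__blood-transfusion-service-center__145836",
    "openml__breast-w__15",
    "openml__fri_c0_100_5__3620",
]
print(duplicate_candidates(sample))
```

Run on the full lists, the same check would also flag `colic` (tasks 25 and 27), since it appears twice in both; whether those two tasks are duplicates is not established in this thread.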
Hi.
I'm trying to compare against some of the results in your work, but it's not clear to me which datasets were used for Table 1 and Table 2.
The Datasets A file contains 108 datasets and the Datasets B file contains 69 datasets, so I'm not sure which 98 were used for Table 1.
I mostly care about the 57 small datasets, but cutting off at 1250 or fewer instances doesn't yield 57 for A, for B, or for the combination of the two.