Create Tutorial 3Models.ipynb by Cruz #78
base: master
A new tutorial showcasing MLJ by C. Cruz
this is ongoing work on /cruz2; it will take me some time to go through the full tutorial and adjust a few things, sorry |
Great, thank you so much
…On Fri, Jun 19, 2020, 1:14 PM Thibaut Lienart ***@***.***> wrote:
this is ongoing work on /cruz2; it will take me some time to go through
the full tutorial and adjust a few things, sorry
|
Actually Clarman, after now reading 2/3 of your tutorial and fixing a few things, I'm a bit uncomfortable with the fact that it's synthetic data. I think a tutorial with this kind of depth would be great with real data, because people could relate to the data, do further analysis, and uncover things that may match their expectations or surprise them. Synthetic data is great for small tutorials where you show one thing, but here it's a bit awkward because the explanations go into quite some depth to give context etc., yet ultimately the data is generated. What do you think? |
Hi
You have a good point. Random data might confuse the reader. I didn't
find real data when I created the lab. Thus, I randomly created the data.
I will try to find some "real" data for the lab. If I find some, I will
create another version of the lab with it and with less analytics.
Thanks for your time and expertise.
…On Sat, Jun 20, 2020, 3:47 PM Thibaut Lienart ***@***.***> wrote:
Actually Clarman, after now reading 2/3 of your tutorial and fixing a few
things, I'm a bit uncomfortable with the fact that it's synthetic data; I
think a tutorial with this kind of depth would be great for real data
because people could relate to the data and do further analysis and uncover
things that may match their expectations or surprise them. Synthetic data
is great for small tutorials where you show one thing; but here it's a bit
awkward because explanations go in quite some depth to give context etc but
ultimately the data is generated.
What do you think?
|
Thanks a lot, this is much appreciated! For good data sources, see https://datasetsearch.research.google.com and also UCI (https://archive.ics.uci.edu/ml/datasets.php?format=&task=&att=&area=&numAtt=&numIns=&type=&sort=dateDown&view=table). For UCI, I'd suggest taking anything that's more recent than 2010 and seems interesting to you. |
This one might be fun: https://archive.ics.uci.edu/ml/datasets/Beijing+Multi-Site+Air-Quality+Data |
If you find one at OpenML, you can load it directly from MLJ using |