<!doctype html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no">
<title>Intro RDM</title>
<link rel="stylesheet" href="reveal/css/reveal.css">
<link rel="stylesheet" href="reveal/css/theme/simple.css">
<!-- Theme used for syntax highlighting of code -->
<link rel="stylesheet" href="reveal/lib/css/zenburn.css">
<!-- Printing and PDF exports -->
<script>
var link = document.createElement( 'link' );
link.rel = 'stylesheet';
link.type = 'text/css';
link.href = window.location.search.match( /print-pdf/gi ) ? 'reveal/css/print/pdf.css' : 'reveal/css/print/paper.css';
document.getElementsByTagName( 'head' )[0].appendChild( link );
</script>
</head>
<body>
<div class="reveal">
<div class="slides">
<!--SLIDE 1-->
<section>
<h3>Responsible Conduct in Research:<br/>Research Data Management</h3><br/>
<p>Vicky Rampin & Nicholas Wolf</p>
<img src="imgs/dataservices.png" width="30%" height="30%" align="middle"><br/>
<p>Get this presentation: <a href="https://nyu-dataservices.github.io/RCR-DataManagement">nyu-dataservices.github.io/RCR-DataManagement</a></p>
<div class='footer'>
<hr/>
<p>Vicky's ORCID: <a href="http://orcid.org/0000-0003-4298-168X/">0000-0003-4298-168X</a> | Nick's ORCID: <a href="http://orcid.org/0000-0001-5512-6151">0000-0001-5512-6151</a><br/>
This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc/4.0/">Creative Commons Attribution-NonCommercial 4.0 International License</a>.</p>
</div>
</section>
<!--SLIDE 2-->
<section>
<h2>The Problem</h2>
</section>
<!--SLIDE 3-->
<section>
<h2>Researchers work with a lot of data...</h2>
<img src="imgs/big-data.png" align="middle">
<h3>...but how should it be organized?</h3>
</section>
<section>
<h2>Most Scientific Research Data From the 1990s Is Lost Forever</h2>
<p><a href="https://www.theatlantic.com/national/archive/2013/12/scientific-data-lost-forever/356422/">Article</a> in <em>The Atlantic</em></p>
<h3>A new study has found that as much as 80 percent of the raw scientific data collected by researchers in the early 1990s is gone forever, mostly because no one knows where to find it.</h3>
</section>
<!--SLIDE 4-->
<section>
<h2>Disappearing Data</h2>
<img src="imgs/disappearing-data.png" align="middle">
</section>
<!--SLIDE 5-->
<section>
<h2>Human Error</h2>
<img src="imgs/econ-error.png" width="50%" height="50%" align="middle"><br/>
<em>Washington Post</em>: <a href="https://www.washingtonpost.com/news/wonk/wp/2016/08/26/an-alarming-number-of-scientific-papers-contain-excel-errors/">"An Alarming Number of Scientific Papers Contain Excel Errors"</a>
<!--SPEAKER NOTES-->
<aside class="notes">
This is another, more recent example: a very famous economics paper that drew the wrong conclusions. The paper claimed that countries with debt over 90% of their gross domestic product (GDP) have a negative growth rate, and it was published just as Greece was going through its economic crisis. But no one could reproduce the authors' conclusions; other researchers could not replicate the paper's results. Eventually, researchers from UMass asked the authors for their data spreadsheet, and it turned out that one of their Excel formulas erroneously excluded 5 countries from the study. If the results had been reproducible from the beginning, this mistake would have been caught much earlier, perhaps by reviewers before publication, and the bad publicity would have been avoided.
</aside>
</section>
<!--SLIDE 6-->
<section>
<h2>The Solution</h2>
</section>
<!--SLIDE 7-->
<section>
<h2>Spell out in detail how you will account for this in your grant (Data Management Plan)</h2>
<img src="imgs/lifecycle.png" style="float:left;">
<p><br/>Managing the way data is collected, processed, analyzed, preserved, and published for greater reuse by <span style="color: #f8981d">the community</span> and <span style="color: #f8981d">the original researcher</span>.
</p>
</section>
<!--SLIDE 8-->
<section>
<h2>What is Data? </h2>
<p>"the recorded factual material commonly accepted in the scientific community as necessary to validate research findings." (U.S. Office of Management and Budget, Circular A-110)</p>
<img src="imgs/dataTypes.png">
</section>
<!--SLIDE 9-->
<section>
<h2>Federal Regulations</h2>
<img src="imgs/dmp-timeline_v2.png" align="middle">
</section>
<!--SLIDE 10-->
<section>
<h2>High-Level View of RDM</h2>
<table>
<tr>
<th>Data Type</th>
<th>Group Roles</th>
<th>Data Storage</th>
<th>Data Archiving</th>
</tr>
<tr>
<td>What format(s) will the data you generate be in?</td>
<td>Who is primarily responsible for carrying out RDM? Set group norms.</td>
<td>Where will you store your data, and how will you back it up?</td>
<td>How will you preserve your data and make it available to others?</td>
</tr>
</table>
</section>
<!--SLIDE 11-->
<section>
<h2>Basically, think to yourself: </h2>
<p class="fragment">if I wanted to use this data in <span style="color: #f8981d">10 years</span>, what would I need to pack with it to make it useful?</p>
<p class="fragment"><strong>Keep all those things</strong></p>
</section>
<!--SLIDE 12-->
<section>
<h2 align="left">Documentation with the <img src="imgs/osf-logo.png" width="10%" height="10%"> Open Science Framework</h2>
<ul>
<li><span style="color: #63d297;">Wiki</span>: document your lab procedures, standards, etc. </li>
<li><span style="color: #63d297;">Collaborators</span>: add collaborators of all levels, on different parts of your project</li>
<li><span style="color: #63d297;">Components</span>: sub-projects to organize your research</li>
<li><span style="color: #63d297;">Version Control</span>: upload files of the same name & OSF will track your versions! </li>
<li><span style="color: #63d297;">Add-Ons</span>: use OSF to bring together tools you use | <a href="https://github.com/ViDA-NYU/reproducibility-news/commit/c13a87dc56e13ba0a80d5988129fad5fbf08e04f">GitHub</a></li>
<li><span style="color: #63d297;">Registrations</span>: when you have an unchanging version of your project, register it & get a DOI!</li>
</ul>
</section>
<section>
<h2 align="left">Documentation with <img src="imgs/jupyter-logo.png" width="10%" height="10%"><br/> Jupyter Notebooks</h2>
<ol>
<li><span style="color: #63d297;">Web Application</span>
<ul>
<li>in-browser code editing: syntax highlighting & indentation</li>
<li>run code in-browser: results appear directly below the code that produced them</li>
<li>display results in LaTeX, HTML, SVG, & more</li>
</ul>
</li>
<li><span style="color: #63d297;">Notebook</span>
<ul>
<li>a complete record of a session, interleaving code with text, maths, & objects</li>
<li>can be exported to LaTeX, PDF, slideshows, web pages, & more</li>
</ul>
</li>
</ol>
</section>
<!--SLIDE 14-->
<section>
<h2>Basically, think to yourself: </h2>
<p class="fragment">if I wanted to use this data in <span style="color: #f8981d">10 years</span>, what would I need to pack with it to make it useful?</p>
<p class="fragment"><strong>Keep all those things</strong></p>
</section>
<!--SLIDE 15-->
<section>
<h2>Documenting Local Files</h2>
<img src="imgs/readme.png" align="middle">
</section>
<!--SLIDE 16-->
<section>
<h2>Storage Rules!</h2>
<img src="imgs/321.png" align="middle">
</section>
<!--SLIDE 17-->
<section>
<h2>NYU Storage Resources</h2>
<font size="4">
<table width="100%">
<tr>
<th> </th>
<th>NYU Google Drive</th>
<th>NYU Box</th>
<th>NYU Research Workspace</th>
</tr>
<tr>
<th>Intended use</th>
<td>General data use requiring password access</td>
<td>General data, including sensitive or secure data</td>
<td>High-capacity data storage</td>
</tr>
<tr>
<th>Storage size</th>
<td>Unlimited</td>
<td>Unlimited</td>
<td>2 TB</td>
</tr>
<tr>
<th>Sharing and user control</th>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
</tr>
<tr>
<th>Versioning and file change tracking</th>
<td>Yes</td>
<td>Some</td>
<td>Snapshots of files</td>
</tr>
<tr>
<th>Funder requirements</th>
<td>Moderate-risk security</td>
<td>High-risk security</td>
<td>U.S.-based data location</td>
</tr>
</table>
</font>
</section>
<!--SLIDE 18-->
<section>
<h2>Anonymizing Data</h2>
<ul>
<li>Identifiers to look for:
<ul>
<li>Direct identifiers (name, DOB, SSN, address, ID numbers, etc.)</li>
<li>Indirect identifiers (variables that, in combination, can identify someone)</li>
</ul><br/>
</li>
<li>Solutions:
<ul>
<li>Removal of identifying variables</li>
<li>Binning/top coding (aggregate values into ranges, or cap outliers so unique values are hidden)</li>
<li>Perturbing (add random noise to values while preserving their overall statistical properties)</li>
</ul>
</li>
</ul>
</section>
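<section data-markdown>
<script type="text/template">
## Anonymizing Data: A Sketch

A minimal JavaScript sketch of the solutions on the previous slide (removal, binning/top coding, and adding random noise). It assumes a flat record with *hypothetical* fields `name`, `ssn`, `address`, `age`, and `income`; the bin width (10) and income ceiling (250000) are illustrative, not recommendations.

```javascript
// Hypothetical sketch: the field names, bin width, and ceiling below are
// illustrative assumptions, not a standard or a complete anonymization method.

// 1. Removal: return a copy of the record without the listed identifier fields.
function removeIdentifiers(record, identifierFields) {
  const copy = { ...record };
  for (const field of identifierFields) {
    delete copy[field];
  }
  return copy;
}

// 2a. Binning: replace an exact integer value with the range it falls in.
function binValue(value, width) {
  const lower = Math.floor(value / width) * width;
  return `${lower}-${lower + width - 1}`;
}

// 2b. Top coding: cap values at a ceiling so unique high outliers are hidden.
function topCode(value, ceiling) {
  return Math.min(value, ceiling);
}

// 3. Perturbation: add zero-mean uniform noise in [-maxNoise, maxNoise).
// `random` defaults to Math.random but can be swapped out (e.g., for testing).
function perturb(value, maxNoise, random = Math.random) {
  return value + (random() * 2 - 1) * maxNoise;
}

// Combine the steps for one record (assumed fields: name, ssn, address, age, income).
function anonymize(record) {
  const result = removeIdentifiers(record, ["name", "ssn", "address"]);
  result.age = binValue(result.age, 10);
  result.income = topCode(result.income, 250000);
  return result;
}
```

For `{ name: "Jane Doe", ssn: "000-00-0000", address: "1 Main St", age: 34, income: 410000 }`, `anonymize` drops the three identifiers and returns an `age` of `"30-39"` and an `income` of `250000`. `perturb` is kept separate because its output is random. None of these steps finds indirect identifiers for you: you still have to decide which variables, in combination, could identify someone.
</script>
</section>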
<!--SLIDE 19-->
<section>
<h2>Long Term Storage</h2>
<table>
<tr>
<td style="width:50%;">Choose what you want to preserve and access in the long term, but no matter WHAT, make sure you keep:<br/><br/>
<ul><li>documentation (lab/field notebooks, etc.)</li>
<li>tools & analysis</li>
</ul>
</td>
<td style="width:50%;">Put your data into an archival format!<br/><br/>
<ul><li>this should be open + accessible</li>
<li>software-agnostic</li>
</ul>
</td>
</tr>
</table>
<img src="imgs/archivalFormats.png" align="middle">
</section>
<!--SLIDE 20-->
<section>
<h2>Archival Storage in Repositories</h2>
<div style="float:left; display:inline-block; width:55%; text-align:left;">
<p>When you publish, you should make the underlying data available in a repository that issues DOIs! You then <span style="color: #63d297;">link</span> that DOI in your "Supplementary Materials" section! <br/><br/>This means that anyone who wants to use your data <span style="color: #63d297;">must</span> go to this repository, download it, and <span style="color: #63d297;">cite their use</span> if they publish using it!</p>
</div>
<div style="float:right; width:45%; text-align:right;"><img src="imgs/dataRepositories.png"><br/><br/>
Example: <a href="http://datadryad.org/">Dryad Data Repository</a></div>
</section>
<!--SLIDE 21-->
<section>
<h2>Advantages to Tracking Citations:</h2>
<ul style="text-align:left;">
<li>Demonstrate to funders and promotion committees that you & your data have a big impact in your field!
<ul><li>they judge proposals on intellectual merit and broader impacts</li>
<li>tangible evidence to weigh against the cost of research</li></ul><br/>
</li>
<li>Monitor usage of datasets!
<ul><li>You can learn which forms of data preparation and data publication are most effective for sharing/open science!</li>
<li>Uncover opportunities for collaboration amongst peers</li></ul>
</li>
</ul>
</section>
<!--SLIDE 22-->
<section>
<h2>Getting Credit for Your Data</h2>
<img src="imgs/data-citations.png" align="middle">
</section>
<!--SLIDE 23-->
<section>
<h2>Data Management To-Do List</h2>
</section>
<!--SLIDE 24-->
<section>
<h2>1. Create a Researcher Identity</h2>
<div style="float:left; display:inline-block; width:58%; text-align:left;">
<p><span style="color: #b7b7b7;">O</span>pen <span style="color:#b7b7b7;">R</span>esearcher & <span style="color: #b7b7b7;">C</span>ontributor <span style="color: #93c47d;">ID</span></p>
<ul>
<li>free! a persistent identifier for researchers (think DOI, but for people)</li>
<li>link all your publications to you rather than to someone with the same name!</li>
<li>many journals ask for an ORCID iD when you submit a manuscript</li>
</ul>
<p>Do you have one? No? Let’s get you one at <a href="http://orcid.org">ORCID.org</a>!</p>
</div>
<div style="float:right; width:42%;"><br/><img src="imgs/linkORCID.png"></div>
</section>
<!--SLIDE 25-->
<section>
<h2>2. Get a Home for Your Research</h2>
<h3>Open Science Framework <img src="imgs/osf-logo.png" width="10%" height="10%"></h3>
<ul>
<li><span style="color: #63d297;">Wiki</span> for documentation!</li>
<li><span style="color: #63d297;">Collaborators</span> of all levels, on different parts of your project!</li>
<li><span style="color: #63d297;">Components</span>: sub-projects to organize your research!</li>
<li><span style="color: #63d297;">Add-Ons</span>: use OSF to bring together tools you use!</li>
</ul>
</section>
<!--SLIDE 26-->
<section>
<h2>3. Know What Data Management Funders Want</h2>
<img src="imgs/funders.png" align="middle">
</section>
<section>
<h2>Applying Best Practices</h2>
<h3>Data Management Plans</h3>
<div style="color: #000000;">
<p>A document that describes how you will collect, organise, manage, store, secure, back up, preserve, and share your data.</p>
</div>
</section>
<!--SLIDE 27-->
<section>
<h2>From NSF’s Data Management Plan Guidelines:</h2>
<div style="text-align:left; font-size: .83em;">
<ol>
<li><strong>the types of data</strong>, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project;</li>
<li><strong>the standards to be used for data and metadata format</strong> and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies);</li>
<li><strong>policies for access and sharing</strong> including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements;</li>
<li><strong>policies and provisions for re-use</strong>, re-distribution, and the production of derivatives; </li>
<li><strong>plans for archiving data</strong>, samples, and other research products, and for preservation of access to them.</li>
</ol>
</div>
</section>
<!--SLIDE 28-->
<section>
<h2>Example Data Management Plans</h2>
<ul>
<li>NSF: <a href="https://dmptool.org/plans/11276/export.pdf">Doctoral Dissertation Research: An Agent-Based Model of Population Changes in a Vulnerable Coastal Environment</a></li><br/>
<li>NSF: <a href="https://dmptool.org/plans/24541/export.pdf">Criminal Victimization and Perceptions of Community Safety in 12 United States Cities</a></li>
</ul>
</section>
<!-- CONCLUSION SLIDE -->
<section>
<h2>Thank you! Questions?</h2><br/>
<p>Email us: <a href="mailto:[email protected]">[email protected]</a> or <a href="mailto:[email protected]">[email protected]</a></p>
<p>Learn more about RDM: <a href="http://guides.nyu.edu/data_management">guides.nyu.edu/data_management</a></p>
<p>Get this presentation: <a href="http://guides.nyu.edu/data_management/resources">guides.nyu.edu/data_management/resources</a></p>
<p>Make an appointment: <a href="http://guides.nyu.edu/appointment">guides.nyu.edu/appointment</a></p>
<div class='footer'>
<hr/>
<p>Vicky's ORCID: <a href="http://orcid.org/0000-0003-4298-168X/">0000-0003-4298-168X</a> | Nick's ORCID: <a href="http://orcid.org/0000-0001-5512-6151">0000-0001-5512-6151</a><br/>
This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc/4.0/">Creative Commons Attribution-NonCommercial 4.0 International License</a>.</p>
</div>
</section>
</div>
</div>
<script src="reveal/lib/js/head.min.js"></script>
<script src="reveal/js/reveal.js"></script>
<script>
// More info https://github.com/hakimel/reveal.js#configuration
Reveal.initialize({
history: true,
// More info https://github.com/hakimel/reveal.js#dependencies
dependencies: [
{ src: 'reveal/plugin/markdown/marked.js' },
{ src: 'reveal/plugin/markdown/markdown.js' },
{ src: 'reveal/plugin/notes/notes.js', async: true },
{ src: 'reveal/plugin/highlight/highlight.js', async: true, callback: function() { hljs.initHighlightingOnLoad(); } }
]
});
</script>
</body>
</html>