Data, notes and slide deck for use with the Digital Humanities Lab's course on OpenRefine
OpenRefine can be downloaded from: https://openrefine.org/download.html
The Library Carpentry lesson, which we will refer to throughout the workshop and which provides a more in-depth tutorial on OpenRefine, can be found at: https://librarycarpentry.org/lc-open-refine/
Note that the Library Carpentry lessons use different example data, which is linked above.
More detailed notes and 'gotchas' for installing OpenRefine are in InstallingOpenRefine.pdf; see also the 'Setup' section in the Library Carpentry notes.
The actual notes from the workshop can be found at: Text II - OpenRefine.pdf. These will be uploaded soon after the workshop each time it runs.
The workshop dataset is an export from the CALM catalogue management system, and holds metadata for the collection also displayed on JSTOR at: https://www.jstor.org/site/university-of-exeter/woolf/
The full dataset, exported directly from the catalogue, has a number of features that can be cleaned collectively, and some that are more problematic to deal with. It is typical of the kind of data that OpenRefine can help to standardise and extract meaning from.
Click on the file above labelled CharlesWoolfSlideCollection_CALMexport.xlsx to download the data.
The images referred to in this metadata were catalogued by the team at Falmouth Archives, and are copyright. The slides themselves were transferred from the Estate of Charles Woolf to the Institute of Cornish Studies in 2016. Metadata is reproduced with permission.
The Library Carpentry lesson also makes a more science-oriented dataset available on its setup page. Feel free to use that dataset, especially if you're working through the LC materials independently.