From b613a1cfe93c3fd1d5b405427867e3bc756e937f Mon Sep 17 00:00:00 2001 From: James Hetherington Date: Wed, 26 Aug 2015 16:47:55 +0100 Subject: [PATCH 1/4] Headers matching those of RITS case studies --- writeup.md | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 writeup.md diff --git a/writeup.md b/writeup.md new file mode 100644 index 0000000..a8a84e8 --- /dev/null +++ b/writeup.md @@ -0,0 +1,14 @@ +British Library/UCL Open Books for e-Research +============================================= + +Summary +-------- + +Background +---------- + +Approach +-------- + +Outcomes +-------- From 08a29ef8e6d56b94bc576d6e392732b465e18649 Mon Sep 17 00:00:00 2001 From: James Hetherington Date: Thu, 27 Aug 2015 13:38:25 +0100 Subject: [PATCH 2/4] Abstract --- writeup.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/writeup.md b/writeup.md index a8a84e8..12fb4e6 100644 --- a/writeup.md +++ b/writeup.md @@ -2,11 +2,15 @@ British Library/UCL Open Books for e-Research ============================================= Summary --------- +------- + +RITS worked with digital scholarship experts and historians to use Legion to analyse a corpus of over sixty thousand digitised public domain books. Held by the British Library, the collection of books from the seventeenth to nineteenth centuries. Research questions such as "how often are different diseases mentioned" were answered. Analyses such as these would have taken over six days on a normal personal computer could be answered in under an hour on the UCL high-performance computing cluster, Legion. We built a framework to enable researchers to express complex textual analyses in simple python functions, and optimised the data layout to make best use of the parallel file system capabilities of Legion. 
Background ---------- + + Approach -------- From 8784a2fe3b7a41a1670aa5c81a6127536f188caf Mon Sep 17 00:00:00 2001 From: James Hetherington Date: Thu, 27 Aug 2015 13:38:48 +0100 Subject: [PATCH 3/4] Update writeup.md --- writeup.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/writeup.md b/writeup.md index 12fb4e6..eb16d85 100644 --- a/writeup.md +++ b/writeup.md @@ -4,7 +4,7 @@ British Library/UCL Open Books for e-Research Summary ------- -RITS worked with digital scholarship experts and historians to use Legion to analyse a corpus of over sixty thousand digitised public domain books. Held by the British Library, the collection of books from the seventeenth to nineteenth centuries. Research questions such as "how often are different diseases mentioned" were answered. Analyses such as these would have taken over six days on a normal personal computer could be answered in under an hour on the UCL high-performance computing cluster, Legion. We built a framework to enable researchers to express complex textual analyses in simple python functions, and optimised the data layout to make best use of the parallel file system capabilities of Legion. +RITS worked with digital scholarship experts and historians to use Legion to analyse a corpus of over sixty thousand digitised public domain books. Provided by the British Library, the collection of books dates from the seventeenth to nineteenth centuries. Research questions such as "how often are different diseases mentioned" were answered. Analyses such as these would have taken over six days on a normal personal computer could be answered in under an hour on the UCL high-performance computing cluster, Legion. We built a framework to enable researchers to express complex textual analyses in simple python functions, and optimised the data layout to make best use of the parallel file system capabilities of Legion. 
Background ---------- From a94be5974e73f3d09a5e0e2bfcf4fc8c3bfe6c5c Mon Sep 17 00:00:00 2001 From: James Hetherington Date: Thu, 27 Aug 2015 16:11:33 +0100 Subject: [PATCH 4/4] Working... --- writeup.md | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/writeup.md b/writeup.md index eb16d85..3694473 100644 --- a/writeup.md +++ b/writeup.md @@ -6,9 +6,22 @@ Summary RITS worked with digital scholarship experts and historians to use Legion to analyse a corpus of over sixty thousand digitised public domain books. Provided by the British Library, the collection of books dates from the seventeenth to nineteenth centuries. Research questions such as "how often are different diseases mentioned" were answered. Analyses such as these would have taken over six days on a normal personal computer could be answered in under an hour on the UCL high-performance computing cluster, Legion. We built a framework to enable researchers to express complex textual analyses in simple python functions, and optimised the data layout to make best use of the parallel file system capabilities of Legion. +![Frequency of mentions of diseases in the ](https://github.com/UCL-dataspring/visualisations/blob/master/diseases/outputs/diseases%20(WEB).png?raw=true) + Background ---------- +In February this year, [Professor Melissa Terras](http://www.ucl.ac.uk/dis/people/melissaterras) of [UCL Digital Humanities](http://www.ucl.ac.uk/dh) and Dr James Baker of the British Library [Digital Research Team](http://britishlibrary.typepad.co.uk/digital-scholarship/) pitched to JISC's [Research Data Spring programme](http://www.jisc.ac.uk/rd/projects/research-data-spring). + +The Data Spring programme is providing funding to a variety of pilot projects in order to find “new technical tools, software and service solutions, which will improve researchers’ workflows and the use and management of their data”. 
+ +The idea that UCL and the British Library (BL) pitched is that the BL has numerous digital datasets, but not the processing power for users to run advanced queries against them or to analyse them. Rapid, indexed full-text search is easy enough, through [] , but many questions require more complex queries: looking for terms in proximity to each other or to illustrations, and cross-correlating words and publishing locations to build temporal and geospatial visualisations +of change. + +BL pitched is that the + +We will use UCL’s world-leading Research Computing to open up this digital data, investigating the needs and requirements of a service that will allow researchers to undertake complex searching of the BL’s digital content. + Approach