This repository contains code for cleaning, enriching and automatically generating reports on the Finnish national bibliography, Fennica.
The live document is deployed in a CSC Rahti container (https://fennica-fennica.rahtiapp.fi). The generated bookdown document consists of several sections, or "chapters". Each section focuses on different fields of the MARC-formatted raw data. Most chapters also include visualizations that give a quick glance at what the data looks like. Processed CSV datasets can also be downloaded for further analyses.
This README describes how to reproduce the analyses and generate the notebook.
The data was downloaded from the National Metadata Repository Melinda. See more: https://melinda.kansalliskirjasto.fi/
Reproducing the workflow, or how to create "Fennica metadata conversions" from scratch
1. Clone the repository to your computer.
# In terminal / GIT
git clone https://github.com/fennicahub/fennica.git
2. Download the dataset from the National Library website
collect.py: this script was provided to us by Osma Suominen (National Library of Finland).
3. Transform the raw data into a readable CSV format by running the Python scripts one by one
4. Pick priority fields from the transformed file
5. Run init.R in RStudio to collect the priority fields into a main data frame
6. Run the <field.R> script in fennica/inst/examples to harmonize each field separately and to create summary tables
7. The main polish functions, which clean and harmonize the different types of data fields, are in fennica/R
8. Render the <field.qmd> file for each field in fennica/inst/examples
9. Render the whole notebook from the RStudio terminal:
quarto render
To render a single file:
quarto render <field_name>.qmd
- Upload the summary tables to Allas by running the allas.R script
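Steps 3 and 4 can be pictured with a minimal Python sketch. This is not the repository's actual conversion code: the function name `pick_priority_fields`, the column names and the sample rows are all hypothetical, and the sketch assumes that the transformed file is a plain CSV with one column per field.

```python
import csv
import io


def pick_priority_fields(csv_text, priority_fields):
    """Keep only the priority columns of a CSV given as text.

    Columns missing from the input are written as empty values so that
    every output row has the same shape.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=priority_fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow({name: row.get(name, "") for name in priority_fields})
    return out.getvalue()


# Hypothetical sample data; the real Fennica column names may differ.
sample = (
    "record_id,title,publication_year,notes\n"
    "1,Example title A,1795,some note\n"
    "2,Example title B,1801,\n"
)

print(pick_priority_fields(sample, ["record_id", "title", "publication_year"]))
```

Passing the priority fields as an explicit list keeps the selection in one place, so adding or dropping a field does not require touching the conversion logic.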
The bookdown document is rendered with GitHub Actions. The generated files are placed in the gh-pages branch of the GitHub repository, copied to Rahti via a webhook, and hosted on an nginx server.
The following notebooks are not actively maintained but may contain useful information on related past work:
- Fennica: a generic overview
- Presentation slide templates (PDF) and code
- A Quantitative Approach to Book Printing in Sweden and Finland, 1640–1828: source code for the figures
- Knowledge production in Finland 1470-1828: Digital Humanities 2016 conference presentation slides (PDF) and code
- Figures and analyses for CCQ2019
The analyses cover several steps including XML parsing, data harmonization, removing unrecognized entries, enriching and organizing the data, carrying out statistical summaries, analysis, visualization and automated document generation.
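As an illustration of the first two steps (XML parsing and harmonization), here is a minimal Python sketch. The repository's own polish functions are written in R, so this is only an approximation of the idea. The sample record, the helper names (`subfield`, `harmonize_year`, `harmonize_title`) and the choice of fields 245$a and 260$c are assumptions made for the example; only the MARCXML namespace and element names come from the MARCXML format itself.

```python
import re
import xml.etree.ElementTree as ET

# Standard MARCXML ("MARC21 slim") namespace.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical record, invented for this example.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Example title /</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">[1795?]</subfield>
  </datafield>
</record>"""


def subfield(record, tag, code):
    """Return the text of the first matching subfield, or None if absent."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    node = record.find(path, MARC_NS)
    return node.text if node is not None else None


def harmonize_year(raw):
    """Extract the first four-digit year from a free-text date, else None."""
    if raw is None:
        return None
    match = re.search(r"\d{4}", raw)
    return int(match.group()) if match else None


def harmonize_title(raw):
    """Strip trailing MARC punctuation and surrounding whitespace."""
    if raw is None:
        return None
    return raw.strip().rstrip("/:;,").strip()


record = ET.fromstring(SAMPLE)
row = {
    "title": harmonize_title(subfield(record, "245", "a")),
    "publication_year": harmonize_year(subfield(record, "260", "c")),
}
print(row)
```

Values that cannot be interpreted are returned as `None` rather than guessed, which mirrors the idea of removing unrecognized entries before the summaries are computed.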
The analyses and full source code are provided in this repository and can be freely reused under the BSD 2-Clause (FreeBSD) open source licence. The analyses are based on R and rely on various R packages.
The original data has been published openly by the National Library of Finland.
The project is currently developed with research and infrastructure funding from the Research Council of Finland (DHL-FI and FIN-CLARIAH). The work is based on past and present collaboration between the Turku Data Science Group (University of Turku), the Helsinki Computational History Group (COMHIS) (University of Helsinki) and the National Library of Finland (Fennica data collection). For the list of contributors, see contributors and the related publications.
Email: [email protected] / [email protected]
The project is under active open development:
- Issues and bug reports
- Pull requests (we will acknowledge contributions)