Official description from the project site (translated from German):

The DFG-funded project aims to create an online, publicly accessible translation bibliography of early modern non-fictional texts, in other words, a collection of non-fiction texts in the broadest sense translated into German in the period 1450–1850.

Besides developing the corresponding technical infrastructure, the primary goal is compiling the datasets: a collection of translations from English and Dutch into German is to be created. The subject matter of the recorded texts ranges, excluding belles lettres, from natural science, medicine, and technology through historical writings and travel reports to theological treatises. In addition, the data from two earlier, likewise DFG-funded bibliographies on Romance-German and Latin-German translation in the early modern period will be incorporated ("Saarbrücker Übersetzungsbibliographie" and "Saarbrücker Übersetzungsbibliographie – Latein").

Altogether, this results in a comprehensive translation bibliography of early modern non-fictional texts with Latin, French, Italian, Spanish, Portuguese, English, and Dutch as systematically recorded source and intermediary languages, which can provide researchers with a broad data basis for studies of the most varied kinds. Beyond that, the developed infrastructure is also to be made available to interested colleagues for setting up their own bibliographic subprojects.

In this sense, the project sees itself as a contribution to the digital humanities: on the one hand, a classic humanities subject is made generally accessible online, and on the other, the IT infrastructure is opened up to interested researchers for collaborative extension.
This is the application that should one day achieve these goals. It is built on Django, its administration tooling, and PostgreSQL. The frontend is built with TailwindCSS and Alpine.js.
The implementation progress as of July 2021 is as follows:
- a new normalized database schema is implemented
- an administration interface is implemented that is used to add and manage entries, including a rudimentary review functionality and versioning of all entries
- all old databases are ported from ancient MySQL dumps to PostgreSQL, cleaned up, and made read-only accessible in the backend. Currently, they are kept in separate tables and not merged into the main data model because they aren't reviewed yet
- a public landing page and basic search functionality are implemented
For my successors:
Hey you! 👋 Thank you for continuing this project. I had a lot of fun working on it and learned a lot. I've handed over a set of files to the team for you to start off.
- an SSH key set. This set has access to both servers. Use it to log in, add your own keys (newly created or your regular ones) to both servers, verify (!) that the new keys work, and then remove my old keys and the keys I created for you from the servers. These are now yours.
- `development_secrets.ini` and `staging_secrets.ini`: their usage is described further down
- `backup.sql`: a backup I created during the last weeks of my involvement. You can use it to seed your local dev environment, or simply download a fresh backup from the servers.
- I've added a list of possible todos/open topics at the end of this document.

Have fun,
Lukas

PS: Please forgive me for the bodies you may find in this project.
- Install Python 3.8, PostgreSQL, and Node.js on your local machine.
- Clone the repository
git clone [email protected]:iued-heidelberg/hueb.git
- Change into the project folder
cd hueb
- Create a virtual environment and activate it
python3 -m venv venv
source venv/bin/activate
You have to activate this environment again whenever you want to start working on the project.
- Install all requirements (pre-commit will check that your commits adhere to the style guide before each commit)
pip3 install -r requirements.txt
pre-commit install
- Template a new `.env` file:
cd src/hueb
cp .env.template .env
- Create a new local database user and database
- Add these credentials to `.env`
- Change into `apps/hueb20` and install npm dependencies
cd apps/hueb20
npm install
cd ../..
- Execute database migrations
./manage.py migrate
- Run tests
pytest
- Run the application
./manage.py runserver
Now you have a running Django application locally and can start developing.
The repository is split into three main sections:
- db_migration
- deployment
- src
db_migration contains the original MySQL dumps, a dump of the databases after importing them into Postgres, a dump of the cleaned-up data, and scripts to turn the raw Postgres dump into the cleaned-up one. This is the place to look if there are inconsistencies in the imported data for the three older datasets.
deployment contains nearly everything associated with the deployment of the application. The only exceptions are the `.github` directory with the CI/CD jobs and the Dockerfile for the main application, both placed in the root of the project.
The usage and function are described further down in this document.
src contains the code of the application. It is structured like a common Django application and contains four apps. The three `hueb_legacy*` apps contain the administration interface for the older datasets. They are minimal, and you probably don't want to add too much functionality to them. `hueb20`, in contrast, is the home of the new data structure, its administration, and the frontend. The contained `data_migrations` folder is left as a reference for how you could implement a migration of the data from the old apps to the new one. Its migrations aren't used and are probably out of date.
Adding new environment variables for configuration is currently a bit of a Rube Goldberg machine that could use improvement. Assuming you want to add a boolean configuration variable, you have to make the following changes:
For development:
- add it to `/hueb/src/hueb/.env`. This is your local configuration file, used when you start the app on your machine. It is deliberately listed in `.gitignore` to prevent secrets from leaking into the repository.
- add it to `/hueb/src/hueb/.env.template`, so that the next person knows which variables must be set.
- add code that parses this environment variable to `/hueb/src/hueb/settings.py`. Look at the already implemented examples for reference. Be careful when adding a boolean flag: Python's casting of the environment variable content is unintuitive at best.
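To illustrate the boolean pitfall: `bool()` on any non-empty string is `True`, even for the string `"False"`. A small parsing helper avoids this. This is only a sketch; `env_bool` is a hypothetical name, not a function that already exists in the project:

```python
import os

def env_bool(name: str, default: bool = False) -> bool:
    """Parse a boolean environment variable explicitly instead of casting it."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "yes", "on")

os.environ["FEATURE_FLAG"] = "False"
assert bool(os.environ["FEATURE_FLAG"]) is True   # naive cast: wrong result
assert env_bool("FEATURE_FLAG") is False          # explicit parse: correct
```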
For CI/CD:
- add it to `hueb/deployment/ansible/roles/docker/templates/env.j2`. This is a template file used by the Docker role to write the `.env` file to the server.
- add new GitHub secrets with the values for staging and production
- add the GitHub secrets to the `extraVars` portion in the deployment steps of `.github/workflows/development_workflow.yml` and `.github/workflows/release_workflow.yml`. These are passed to Ansible to fill in the `env.j2` file.
For manual deployment:
- add them to the `production_secrets.ini` and `staging_secrets.ini` files found under `hueb/deployment/ansible/inventory/`
Deployments are fully automated and are executed via GitHub Actions, Ansible, Docker, and Docker Compose. Secrets are stored in the GitHub secret store.
Important note: This project supports Sentry and Honeycomb as monitoring and observability solutions. I replaced the API credentials with empty strings because the accounts ran under my name. You can add them back by registering accounts for these services and updating the GitHub secrets with the tokens.
Currently we have two different servers hosted in heiCLOUD:
- hueb.iued.uni-heidelberg.de (production)
- hueb-staging.iued.uni-heidelberg.de (staging)
On both servers, the backups are located under `/db_dump/backup/*` and the repository in `/hueb`. The latter is only used to have all deployment scripts and the configuration file (`/hueb/deployment/docker.env`) available locally.
The application consists of three services, listed in the `docker_compose.yml`:

- `hueb`: the Django application running everything
- `proxy`: the Nginx proxy handling SSL and proxying to `hueb`
- `database`: a Postgres database supporting everything, with the cron jobs responsible for backing up the data
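For orientation, a heavily simplified sketch of how such a three-service compose file can look. The actual `docker_compose.yml` in the repository is authoritative; the image names and versions here are illustrative assumptions only:

```yaml
services:
  hueb:
    image: ghcr.io/iued-heidelberg/hueb:latest   # illustrative; the real tag is set per deployment
    env_file: docker.env
    depends_on:
      - database
  proxy:
    image: nginx:alpine                          # illustrative version
    ports:
      - "80:80"
      - "443:443"
  database:
    image: postgres:13                           # illustrative version
    volumes:
      - /db_dump/backup:/db_dump/backup          # backups land here (see above)
```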
Commits pushed to GitHub trigger `.github/workflows/development_workflows.yml`. This workflow runs black, flake8, the tests, and the application and database container builds in parallel. The container images are pushed to GitHub's container registry under the tags TODO.

The staging CI process is normally aborted at this point. The exception is a commit published on the `development` branch, which is deployed via Ansible directly to hueb-staging.iued.uni-heidelberg.de.

A deployment to hueb.iued.uni-heidelberg.de is triggered by pushing a tag matching `v*.*.*` and runs the same steps as staging. It uses the tags TODO for its Docker images and adds two steps that publish release notifications to Sentry and Honeycomb.
The application can be deployed from your host computer via Ansible (`pip3 install ansible`).
Before the first execution, additional dependencies for Ansible must be installed. Change directory to `hueb/deployment/ansible` and execute:
ansible-galaxy role install -r requirements.yml
ansible-galaxy collection install -r requirements.yml
You need a fully configured `production_secrets.ini` or `staging_secrets.ini` file (depending on the target infrastructure) placed in `hueb/deployment/ansible/inventory/`.
Pay particular attention to the `tag` variable. It should contain either a git SHA or a git tag; this value is used to pull the correct Docker image from the registry.
Change your working directory to `hueb/deployment/ansible` and enter the following command to deploy to production:
ansible-playbook -i inventory/production.ini -e @inventory/production_secrets.ini provision.yml
This executes the steps outlined in the `provision.yml` playbook against the host described in `inventory/production.ini`, while supplying variables from `inventory/production_secrets.ini` (the `@` is necessary there).
Exchanging `production` for `staging` targets the other environment.
The available playbooks are:
- `provision.yml` updates the server to run the configured version of the application. An empty server can be provisioned by temporarily removing the `create_backup` role.
- `restore_backup.yml` creates a backup of the database, drops the current database, and restores from another backup file, whose name you are asked for during execution.
- `backup.yml` creates an additional backup of the database at this point in time.
Where could you start? Where is there more work to do?
- Fortify the review system. It is currently implemented without much complexity. Every document starts off as unreviewed, and only people with the review permission are allowed to change this. The flag is not reset if changes are later made to the document. This is tolerable because we currently work in an append-only mode, where a group of colleagues adds documents and others review them; later changes are uncommon. This should be changed. You can search for unreviewed changes by checking whether a new document revision was added by a non-reviewer after the document was originally reviewed.
- Migrate the old datasets to the new data model. The data is in Postgres, and Django models exist, so the migration should be relatively straightforward. You can use `hueb/src/apps/hueb20/data_migrations` as a reference. The bigger challenge is keeping the data sources distinguishable and making sure the data is correct. I suggest continuing to mark all data with its source using the `HUEB_APPLICATIONS` enum provided in `utils.py`. Correctness will be especially challenging for the `hueb_legacy_latein` dataset because it contains tables named `*_new`, which added multiple many-to-many tables. I suspect that the models with two `New`s, like `OriginalNewAuthorNew`, are the most promising ones, which is why they are displayed in the admin UI. But some kind of review is probably necessary.
- Make a better search interface. The current one was created without real feedback or user interaction, more as a proof of concept. The implementation with Q objects is probably fine for a start.
- Make a better backend. The backend is pretty barebones. It has autocompletion and the like, but it could benefit a lot from some kind of guidance/workflow for our users. For example: add a view to see which documents you've added yourself and which changes you made last, or create a nicer list view that isn't so wide, ...
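The "unreviewed later change" check from the first item above can be sketched in plain Python. All class and field names here are hypothetical stand-ins, not the project's actual Django models:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Revision:
    """Simplified stand-in for a versioned document entry."""
    created_at: datetime
    author_is_reviewer: bool

def has_unreviewed_changes(revisions, reviewed_at: datetime) -> bool:
    """True if a non-reviewer added a revision after the initial review."""
    return any(
        rev.created_at > reviewed_at and not rev.author_is_reviewer
        for rev in revisions
    )

reviewed_at = datetime(2021, 7, 1)
history = [
    Revision(datetime(2021, 6, 1), author_is_reviewer=False),  # before review: fine
    Revision(datetime(2021, 8, 1), author_is_reviewer=False),  # after review: flag it
]
assert has_unreviewed_changes(history, reviewed_at) is True
```

In the real application, the same predicate would be expressed as a queryset filter over the stored revisions rather than an in-memory loop.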