AWT-Project

Requirements

  1. Install Docker and Docker Compose; refer to the installation instructions at https://www.docker.com/ and https://docs.docker.com/compose/
  2. If developing or generating the HTML Documentation:
    • Install Python 3.9, see https://www.python.org/downloads/
    • Install pipenv, refer to installation instructions at https://pipenv.pypa.io/en/latest/

Getting started

Run the following in the root folder to start the system:

  • docker-compose up --build to build the server and then start the Neo4j database and the server
  • curl -X POST http://localhost:5000/competencies/initialize to initialize the database and store (takes around 5 minutes; see the Python alternative below), or
  • Go to http://localhost:5000/api/docs and execute the "Initialize" endpoint for competencies
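
As an alternative to curl, the initialization can also be triggered from Python. A minimal sketch assuming the requests package (any HTTP client works):

import requests

# Trigger the one-time initialization; it can take around 5 minutes,
# so allow a generous timeout.
response = requests.post("http://localhost:5000/competencies/initialize", timeout=600)
response.raise_for_status()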

Set up the pre-commit hook

If you haven't already, run pipenv install, and then run

  • pre-commit install

The first time you commit something it will take a little longer to initialize the dependencies, but usually the pre-commit hook only checks the diff, so it should be fast.

Development

Use the following commands for development (in the root folder):

  1. Create a .env file
  2. Paste (and adjust if necessary) the following content into the .env file (the sketch after this list shows how these variables are read):
DB_URI=bolt://localhost:7687
DATA_FILE=./data/skills_de.csv
COURSES_FILE=./data/courses_preprocessed.csv
MODEL_FILES=./data/MLmodel
NLTK_FILES=./data/lemma_cache_data/nltk_data
MORPHYS_FILE=./data/lemma_cache_data/morphys.csv
STOPWORDS_FILE=./data/lemma_cache_data/stopwords-de.txt
ML_DIR=./ML/
LABELED_COMPETENCIES_FILE=./data/preproccessed_labels.csv
  3. docker-compose up db to start only the Neo4j database
  4. pipenv install to install the requirements
  5. pipenv run python -m flask run to start the server (for dev/debug purposes)
  6. curl -X POST http://localhost:5000/competencies/initialize to initialize the database and store (takes around 5 minutes)
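
The server reads these values from the environment at startup. A minimal sketch of loading them in Python, assuming the python-dotenv package (when python-dotenv is installed, flask run also picks up the .env file automatically):

import os
from dotenv import load_dotenv

# Read the .env file from the current working directory into the environment.
load_dotenv()

db_uri = os.environ["DB_URI"]        # e.g. bolt://localhost:7687
data_file = os.environ["DATA_FILE"]  # e.g. ./data/skills_de.csv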

Running the Unit Tests

After completing the development setup above (make sure the database is running), use the following commands to run the tests:

  1. If the database is already initialized, run pipenv run pytest tests/ -k 'not initialize'
  2. If the database is not initialized and you want to test the initialization, run pipenv run pytest tests/ -k 'initialize' (a hypothetical sketch of such a test follows below)
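
The -k flag selects tests by name. For orientation, a hypothetical sketch of a test the 'initialize' filter would match (the actual tests live in tests/ and may use a different client and assertions):

import requests

def test_initialize():
    # Hypothetical: trigger the initialization endpoint and expect success.
    response = requests.post("http://localhost:5000/competencies/initialize", timeout=600)
    assert response.status_code == 200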

Clean up Database

  1. MATCH (a)-[r]->() DELETE a, r to clean up relationships
  2. MATCH (a) DELETE a to clean up nodes (both statements can also be run from Python, see the sketch below)
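
A minimal sketch of running these statements from Python, assuming the official neo4j driver and the DB_URI from the .env file above:

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687")
with driver.session() as session:
    # The two clean-up statements from the list above.
    session.run("MATCH (a)-[r]->() DELETE a, r")
    session.run("MATCH (a) DELETE a")
driver.close()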

Train Machine Learning Competency Extractor

Use the following commands to reproduce the machine learning model used in the machine-learning-based competency extractor:

  1. pipenv run python app/machine_learning.py to create the spaCy files for training and testing the model
  2. cd ML to navigate to the "ML" directory
  3. pipenv run python -m spacy train config.cfg --output ./output to train and test the model with the created spaCy files (see the programmatic sketch below)
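
The training step can also be started from Python; a sketch assuming spaCy 3.x, which exposes the CLI's train command as a function:

from spacy.cli.train import train

# Equivalent to: python -m spacy train config.cfg --output ./output
# (run from within the ML directory)
train("config.cfg", "output")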

Documentation of API

You can find the documentation of our API at http://localhost:5000/api/docs once you have the system up and running.

Generate HTML Documentation of the Project

A recent version of the HTML documentation of the code can be found in the docs/html folder. To manually generate the latest version from the current source code, execute:

  1. pipenv install to install the required dependencies
  2. pipenv run make html to generate the HTML documentation from the current source code

You will find the generated HTML documentation in the build/html folder. Drag and drop the index.html file into a browser to start browsing the documentation.

Preprocessing

To use the preprocessing pipeline, use the following code:

from app.preprocessing_utils import PreprocessorGerman

# course_descriptions holds the raw course description strings to preprocess
prc_pipeline = PreprocessorGerman()
preprocessed_course_descriptions = prc_pipeline.preprocess_course_descriptions(course_descriptions)

Permission troubleshooting

If the data folder doesn't show up or cannot be opened, try sudo chmod -R a+r data.

Machine Learning

To use the trained entity recognition model, use the following code:

import spacy

# path_to_model points to the trained model directory, e.g. ML/output
nlp = spacy.load(path_to_model)

# Run the pipeline on an input string, e.g. a course description
doc = nlp(text)

# The extracted competencies are available as the document's entities
ents = doc.ents