This repository contains the source code for the steps of the tag search engine.
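To build the Docker image and start the service (the container listens on port 8080, published on host port 80, with the local model/ directory mounted at /app/model):
docker build -t op_semantic_engine:latest .
docker run -dit -v "$(pwd)/model:/app/model" -p 80:8080 --env-file .env op_semantic_engine:latest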
We currently use Sentence Transformers models to compute the text embeddings.
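As an illustration of how embeddings support tag search, the sketch below ranks candidate tags by cosine similarity to a query embedding. It assumes cosine similarity as the scoring function and uses tiny hand-written vectors in place of real Sentence Transformers output; the tags, the vectors and the helper names (`cosine_similarity`, `rank_tags`) are hypothetical, and the engine's actual scoring may differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_tags(query_vec: list[float], tag_vecs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (tag, score) pairs sorted from most to least similar to the query."""
    scored = [(tag, cosine_similarity(query_vec, vec)) for tag, vec in tag_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy embeddings; real ones come from a Sentence Transformers model.
tag_vecs = {
    "amenity=cafe": [0.9, 0.1, 0.0],
    "shop=bakery": [0.6, 0.5, 0.1],
    "highway=bus_stop": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.0]  # hypothetical embedding of a query such as "coffee shop"

for tag, score in rank_tags(query_vec, tag_vecs):
    print(f"{tag}: {score:.3f}")
```

With a real model, the vectors would come from `SentenceTransformer(model_name).encode(texts)`; which model this repository uses is not stated here.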
To index the manual mappings (e.g. synonym_json_test.json), run the following command:
bash scripts/index_manual_mapping.sh
It is recommended to remove the previous index first; to do so, set clear_index to True.
To validate the indexing, run the following command, adjusting the --index_name and --synonyms values as needed:
python -m tests.validation.index_validation \
--index_name manual_mapping_v4 \
--synonyms tests/search_data/imr-tag-search-indices.jsonl \
--validate singular
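To check whether the OSM tags are correctly defined, run the following command; incorrectly defined tags are appended to incorrectly_defined_osm_tags.txt: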
python -m tests.validation.osm_tag_validation \
--input_file {e.g. imr-tag-db_v2.jsonl} >> incorrectly_defined_osm_tags.txt
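The exact rules applied by tests.validation.osm_tag_validation are not described here. As a minimal sketch, assuming that "correctly defined" means each tag follows the OSM `key=value` form with a non-empty key and value and no stray whitespace, and assuming each JSONL record stores its tags in a hypothetical `tags` list, such a check might look like this:

```python
import json


def is_valid_osm_tag(tag: str) -> bool:
    """Assumed rule: 'key=value', both parts non-empty, no surrounding spaces, no space in key."""
    key, sep, value = tag.partition("=")
    if not sep or not key or not value:
        return False
    if key != key.strip() or value != value.strip():
        return False
    return " " not in key


def find_invalid_tags(lines, field: str = "tags") -> list[tuple[int, str]]:
    """Return (line number, tag) for every tag that fails the assumed rule.

    `field` is a hypothetical record key; adapt it to the real file layout.
    """
    invalid = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for tag in record.get(field, []):
            if not is_valid_osm_tag(tag):
                invalid.append((line_no, tag))
    return invalid


sample = [
    '{"tags": ["amenity=cafe", "shop=bakery"]}',
    '{"tags": ["amenity cafe", "building="]}',
]
print(find_invalid_tags(sample))
```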
The tag-imr file must not contain duplicate entries. To check for duplicates, run:
python -m app.search_engine.check_duplicates \
--input_file {e.g. imr-tag-db_v2.jsonl} \
--output_file duplicates.txt
If duplicates.txt is not empty, contact the data providers on the team.
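For a quick local check, the sketch below counts exact duplicate records in a JSONL file. It assumes that a duplicate means two lines holding the same JSON object (key order ignored); app.search_engine.check_duplicates may use a different definition, for example duplicate names or tags only. The function name `find_duplicate_records` is hypothetical.

```python
import json
from collections import Counter


def find_duplicate_records(lines) -> dict[str, int]:
    """Map each canonical JSON record that occurs more than once to its count."""
    counts: Counter[str] = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Canonicalize so records differing only in key order compare equal.
        canonical = json.dumps(json.loads(line), sort_keys=True, ensure_ascii=False)
        counts[canonical] += 1
    return {record: n for record, n in counts.items() if n > 1}


sample = [
    '{"name": "cafe", "tags": ["amenity=cafe"]}',
    '{"tags": ["amenity=cafe"], "name": "cafe"}',
    '{"name": "bakery", "tags": ["shop=bakery"]}',
]
for record, count in find_duplicate_records(sample).items():
    print(count, record)
```

On a real file, pass an open file handle, e.g. `find_duplicate_records(open("imr-tag-db_v2.jsonl", encoding="utf-8"))`.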
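To delete the manual_mapping index, run the following command, replacing [HOST] with the host of the search cluster:
curl -X DELETE "[HOST]:9200/manual_mapping?pretty"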
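The same request can be sent from Python with only the standard library. This is a minimal sketch assuming plain HTTP on port 9200 without authentication (curl defaults to HTTP when no scheme is given); the helper names `build_delete_index_request` and `delete_index` are hypothetical.

```python
import urllib.request


def build_delete_index_request(host: str, index: str, port: int = 9200) -> urllib.request.Request:
    """Build (but do not send) the DELETE request that the curl command issues."""
    url = f"http://{host}:{port}/{index}?pretty"
    return urllib.request.Request(url, method="DELETE")


def delete_index(host: str, index: str, timeout: float = 10.0) -> str:
    """Send the DELETE request and return the response body.

    urlopen raises urllib.error.HTTPError on 4xx/5xx, e.g. if the index does not exist.
    """
    request = build_delete_index_request(host, index)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Usage: `print(delete_index("localhost", "manual_mapping"))`. Building the request separately makes it easy to inspect the URL before anything is deleted.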
Alternatively, a detailed description can be found at link.