This toolkit provides an orchestrator microservice that exposes PrimeQA's retriever and reader modules, along with other search capabilities such as IBM Watson Discovery, through a REST server.
With this orchestrator, you can fetch documents using either a neural retriever such as ColBERT from PrimeQA or an external search service such as IBM Watson Discovery, and then use PrimeQA's reader to extract answer spans from the retrieved documents.
- PrimeQA: If you don't have access to a running PrimeQA instance, please refer to the PrimeQA repository for details on setting up and running a local one.
- Watson Discovery (Optional): Follow the instructions on IBM Cloud to configure a Watson Discovery V2 service.
- Set up and activate a virtual environment (as shown below) or use Miniconda
# Install virtualenv
pip3 install virtualenv
# Create a new virtual environment for this project. If using pyenv, path_to_python_3.9_executable will be ~/.pyenv/versions/3.9.x/bin/python
virtualenv --python=<path_to_python_3.9_executable> venv
# Activate virtual environment
source venv/bin/activate
- Install dependencies
pip install -r requirements.txt
pip install -r requirements_test.txt
🐛 grpcio and grpcio-tools have limited support on Apple Silicon (M1, M2). Please refer to grpc GitHub issue #25082 for details or download appropriate wheels from here.
The orchestrator service's REST server supports mutual, or two-way, TLS authentication (also known as mTLS). The application's config.ini file contains the default certificate paths, but they can be overridden using environment variables.
Self-signed certificates are generated and packaged with the Docker build. Self-signed certs may also be required for local development and testing. To generate them yourself, follow the steps below:
#!/usr/bin/env bash
# Make necessary directories
mkdir -p security/
mkdir -p security/certs/
mkdir -p security/certs/ca security/certs/server security/certs/client
# Generate CA key and CA cert
openssl req -x509 -days 365 -nodes -newkey rsa:4096 -subj "/C=US/ST=New York/L=Yorktown Heights/O=IBM/OU=Research/CN=example.com" -keyout security/certs/ca/ca.key -out security/certs/ca/ca.crt
# Generate Server key (without passphrase) and Server cert signing request
openssl req -nodes -new -newkey rsa:4096 -subj "/C=US/ST=New York/L=Yorktown Heights/O=IBM/OU=Research/CN=example.com" -keyout security/certs/server/server.key -out security/certs/server/server.csr
# Sign Server cert
openssl x509 -req -days 365 -in security/certs/server/server.csr -CA security/certs/ca/ca.crt -CAkey security/certs/ca/ca.key -CAcreateserial -out security/certs/server/server.crt
# Generate Client key (without passphrase) and Client cert signing request
openssl req -nodes -new -newkey rsa:4096 -subj "/C=US/ST=New York/L=Yorktown Heights/O=IBM/OU=Research/CN=example.com" -keyout security/certs/client/client.key -out security/certs/client/client.csr
# Sign Client cert
openssl x509 -req -days 365 -in security/certs/client/client.csr -CA security/certs/ca/ca.crt -CAkey security/certs/ca/ca.key -CAserial security/certs/ca/ca.srl -out security/certs/client/client.crt
# Delete signing requests
rm -rf security/certs/server/server.csr
rm -rf security/certs/client/client.csr
IMPORTANT:
- By default, the application tries to load certs from /opt/tls. You will need to update the appropriate tls_* variables in config.ini during local use.
- We recommend generating certificates with an official signing authority and using them via volume mounts in the application container.
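For local testing against a server running with mTLS enabled, a client has to trust the CA and present its own certificate. The following is a minimal sketch using only Python's standard library. The function name `build_mtls_context` is illustrative, and the paths in the usage note assume the directory layout created by the script above. Because the generated server certificate uses CN=example.com, hostname verification against `localhost` may fail unless the certificate is regenerated with a matching name.

```python
import ssl


def build_mtls_context(ca_cert, client_cert, client_key):
    """Return an SSLContext that trusts `ca_cert` and presents a client certificate.

    The argument names are illustrative; pass whichever paths your setup uses.
    """
    # Verify the server against the given CA instead of the system trust store.
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_cert)
    # Present the client certificate and key so the server can authenticate us.
    context.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return context
```

With the self-signed files in place, the context could be built as `build_mtls_context("security/certs/ca/ca.crt", "security/certs/client/client.crt", "security/certs/client/client.key")` and passed to `urllib.request.urlopen(url, context=...)`.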
- Open your Python IDE and select the created virtual environment
- Open orchestrator/services/config/config.ini, then set require_ssl = True (if you wish to use TLS authentication) and rest_port
- Generate GRPC:
#!/usr/bin/env bash
set -xeuo pipefail
python -m grpc_tools.protoc -I ./orchestrator/integrations/primeqa/protos --python_out=orchestrator/integrations/primeqa/grpc_generated --grpc_python_out=orchestrator/integrations/primeqa/grpc_generated orchestrator/integrations/primeqa/protos/indexer.proto
python -m grpc_tools.protoc -I ./orchestrator/integrations/primeqa/protos --python_out=orchestrator/integrations/primeqa/grpc_generated --grpc_python_out=orchestrator/integrations/primeqa/grpc_generated orchestrator/integrations/primeqa/protos/parameter.proto
python -m grpc_tools.protoc -I ./orchestrator/integrations/primeqa/protos --python_out=orchestrator/integrations/primeqa/grpc_generated --grpc_python_out=orchestrator/integrations/primeqa/grpc_generated orchestrator/integrations/primeqa/protos/reader.proto
python -m grpc_tools.protoc -I ./orchestrator/integrations/primeqa/protos --python_out=orchestrator/integrations/primeqa/grpc_generated --grpc_python_out=orchestrator/integrations/primeqa/grpc_generated orchestrator/integrations/primeqa/protos/retriever.proto
2to3 --fix=import --nobackups --write orchestrator/integrations/primeqa/grpc_generated
- Open application.py and run/debug
- Go to http://localhost:{rest_port}/docs
- To be able to use the reader, indexer and retriever services, be sure you have access to a running instance of the PrimeQA container
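The gRPC generation step in the list above repeats one `grpc_tools.protoc` invocation for each of the four proto files. As a rough sketch, the same commands can be assembled in a loop. The paths and flags are copied from that script; the helper names `protoc_command` and `all_commands` are illustrative and not part of the project.

```python
import sys

PROTO_DIR = "orchestrator/integrations/primeqa/protos"
OUT_DIR = "orchestrator/integrations/primeqa/grpc_generated"
PROTO_NAMES = ["indexer", "parameter", "reader", "retriever"]


def protoc_command(name):
    """Build the grpc_tools.protoc argument list for one proto file."""
    return [
        sys.executable, "-m", "grpc_tools.protoc",
        "-I", f"./{PROTO_DIR}",
        f"--python_out={OUT_DIR}",
        f"--grpc_python_out={OUT_DIR}",
        f"{PROTO_DIR}/{name}.proto",
    ]


def all_commands():
    """Return every command of the generation step, in order."""
    commands = [protoc_command(name) for name in PROTO_NAMES]
    # Final step from the shell script: rewrite imports in the generated code.
    commands.append(["2to3", "--fix=import", "--nobackups", "--write", OUT_DIR])
    return commands


# Print the commands only; nothing is executed here.
for command in all_commands():
    print(" ".join(command))
```

To actually run them from the repository root, each list could be passed to `subprocess.run(command, check=True)`.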
- Open config.ini and set rest_port
- Open Dockerfile and set the same value to port
- Build the Docker image:
docker build -f Dockerfile -t primeqa-orchestrator:$(cat VERSION) .
- Run the Docker container:
docker run --rm --name primeqa-orchestrator -d -p <port>:<port> --mount type=bind,source="$(pwd)"/store,target=/store -e STORE_DIR=/store primeqa-orchestrator:$(cat VERSION)
- Go to http://{Container's public URL}:{rest_port}/docs
- To be able to use the reader, indexer and retriever services, be sure you have access to a running instance of the PrimeQA container
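Both ways of running the service depend on the rest_port value in config.ini, and the docs URL follows from it. The sketch below shows one way to read the two options mentioned above and derive that URL. The `[service]` section name in the sample is hypothetical (the real config.ini may be organized differently), and the assumption that require_ssl = True implies an https URL is not confirmed by this README.

```python
import configparser

# Hypothetical sample; the section name is an assumption for illustration.
SAMPLE_CONFIG = """
[service]
require_ssl = True
rest_port = 50059
"""


def find_option(parser, option):
    """Return the first value found for `option` in any section, else None."""
    for section in parser.sections():
        if parser.has_option(section, option):
            return parser.get(section, option)
    return parser.defaults().get(option)


def docs_url(config_text, host="localhost"):
    """Derive the /docs URL from config.ini-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    port = find_option(parser, "rest_port")
    if port is None:
        raise ValueError("rest_port is not set")
    require_ssl = (find_option(parser, "require_ssl") or "false").strip().lower() == "true"
    scheme = "https" if require_ssl else "http"
    return f"{scheme}://{host}:{port}/docs"


print(docs_url(SAMPLE_CONFIG))
```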
- Before first use, you will need to specify a few necessary configurations to connect to third-party dependencies. These settings are intentionally left blank for security purposes.
- Go to the STORE_DIR directory on your local machine and copy the primeqa.json file into that directory.
- You will need to add/update the settings portion of the primeqa.json file. Primarily, add the service_endpoint information (inclusive of port) for PrimeQA in the retriever and reader sections in settings.

a. To use an IBM® Watson Discovery based retriever, add/update the Watson Discovery entry in the retrievers section as follows

"Watson Discovery": {
  "service_endpoint": "<IBM® Watson Discovery Cloud/CP4D Instance Endpoint>",
  "service_api_key": "<API key (If using IBM® Watson Discovery Cloud instance)>",
  "service_project_id": "<IBM® Watson Discovery Project ID>"
}

b. For PrimeQA based retrievers, add/update the PrimeQA related section in retrievers as follows

"PrimeQA": {
  "service_endpoint": "<Primeqa Instance Endpoint>:<Port>"
}

c. For PrimeQA based readers, add/update the PrimeQA related section in readers as follows

"PrimeQA": {
  "service_endpoint": "<Primeqa Instance Endpoint>:<Port>",
  "beta": 0.7
}

For example, to enable an IBM® Watson Discovery instance based retriever, a PrimeQA based retriever and a PrimeQA based reader, the settings will look as follows

{
  "retrievers": {
    "Watson_Discovery": {
      "service_endpoint": "<IBM® Watson Discovery CP4D Instance Endpoint>",
      "service_api_key": "<API key (If using IBM® Watson Discovery Cloud instance)>",
      "service_project_id": "<IBM® Watson Discovery Project ID>"
    },
    "PrimeQA": {
      "service_endpoint": "<Primeqa Instance Endpoint>:<Port>"
    }
  },
  "readers": {
    "PrimeQA": {
      "service_endpoint": "<Primeqa Instance Endpoint>:<Port>",
      "beta": 0.7
    }
  }
}
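Since the endpoints ship blank and the examples above use angle-bracket placeholders, a quick check before starting the service can catch entries that were never filled in. This is a minimal sketch, assuming the settings object has the retrievers/readers shape shown above; `unset_endpoints` is a hypothetical helper, and `primeqa.example.com:50051` is a made-up endpoint used only for the demonstration.

```python
import json


def unset_endpoints(settings):
    """Return 'section/name' labels whose service_endpoint is empty or a <placeholder>."""
    problems = []
    for section in ("retrievers", "readers"):
        for name, entry in settings.get(section, {}).items():
            endpoint = str(entry.get("service_endpoint", "")).strip()
            if not endpoint or endpoint.startswith("<"):
                problems.append(f"{section}/{name}")
    return problems


# Demonstration data: the retriever is configured, the reader is still a placeholder.
example = json.loads("""
{
  "retrievers": {"PrimeQA": {"service_endpoint": "primeqa.example.com:50051"}},
  "readers": {"PrimeQA": {"service_endpoint": "<Primeqa Instance Endpoint>:<Port>", "beta": 0.7}}
}
""")

print(unset_endpoints(example))
```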
NOTE: The final scoring and ranking is done with a weighted sum of the reader answer scores and the retriever search hit scores. The beta field is the weight assigned to the reader scores, and 1-beta is the weight assigned to the retriever scores.
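The weighting described in the note can be written out directly. The sketch below is an illustration of the formula beta * reader + (1 - beta) * retriever, not the orchestrator's actual code; the function names and the sample scores are made up, and whether the service normalizes scores before combining them is not stated here.

```python
def combined_score(reader_score, retriever_score, beta=0.7):
    """Weighted sum: beta for the reader score, 1 - beta for the retriever score."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must be between 0 and 1")
    return beta * reader_score + (1.0 - beta) * retriever_score


def rank(candidates, beta=0.7):
    """Sort (label, reader_score, retriever_score) tuples by combined score, best first."""
    return sorted(
        candidates,
        key=lambda item: combined_score(item[1], item[2], beta),
        reverse=True,
    )


# Made-up scores: "A" has the stronger reader score, "B" the stronger retriever score.
candidates = [("A", 0.9, 0.2), ("B", 0.5, 0.9)]
print([label for label, _, _ in rank(candidates, beta=0.7)])
print([label for label, _, _ in rank(candidates, beta=0.3)])
```

With these sample values, a high beta favors the answer the reader scored best, while a low beta favors the answer whose source document the retriever scored best.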
- To see all available retrievers, execute the [GET] /retrievers endpoint
curl -X 'GET' 'http://{PUBLIC_IP}:50059/retrievers' -H 'accept: application/json'
- To see all available readers, execute the [GET] /readers endpoint
curl -X 'GET' 'http://{PUBLIC_IP}:50059/readers' -H 'accept: application/json'
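The same two requests can be issued from Python with the standard library. This is a sketch under the assumption that the service listens on port 50059 over plain HTTP, as in the curl commands above; `endpoint_url` and `fetch_json` are illustrative names, and the shape of the returned JSON is not specified in this README.

```python
import json
import urllib.request


def endpoint_url(host, resource, port=50059):
    """Build the URL for a listing endpoint such as 'retrievers' or 'readers'."""
    return f"http://{host}:{port}/{resource}"


def fetch_json(url, timeout=10):
    """GET `url` with an 'accept: application/json' header and decode the body."""
    request = urllib.request.Request(url, headers={"accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Building the URLs needs no network access.
print(endpoint_url("localhost", "retrievers"))
print(endpoint_url("localhost", "readers"))
```

With a running orchestrator, `fetch_json(endpoint_url("localhost", "retrievers"))` would return the decoded response.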
curl -X 'GET' \
'http://localhost:50059/feedbacks?application=reading&application=qa&_format=primeqa' \
-H 'accept: application/json' > feedbacks.json
curl -X 'GET' \
'http://localhost:50059/feedbacks?application=retrieval&_format=primeqa' \
-H 'accept: application/json' > feedbacks.json
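The feedback queries above repeat the application parameter once per value. A small sketch of building such a query string with the standard library follows; `feedbacks_url` is an illustrative helper, and the host and port defaults mirror the curl commands. Note that both curl commands redirect to the same feedbacks.json, so running the second overwrites the output of the first; using a separate file name per query avoids that.

```python
from urllib.parse import urlencode


def feedbacks_url(applications, host="localhost", port=50059, fmt="primeqa"):
    """Build the /feedbacks URL with one 'application' parameter per entry."""
    pairs = [("application", app) for app in applications]
    pairs.append(("_format", fmt))
    return f"http://{host}:{port}/feedbacks?{urlencode(pairs)}"


print(feedbacks_url(["reading", "qa"]))
print(feedbacks_url(["retrieval"]))
```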
Keep PrimeQA documentation reference in sync
Any time this README file is updated, a PR must be opened on the PrimeQA repository to apply the same modifications to the associated file used on the documentation page.
Do not modify initial image path