Code for the Serverless LLM article on picovoice.ai: picoLLM on Lambda.
THIS DEMO EXCEEDS AWS FREE TIER USAGE. YOU WILL BE CHARGED BY AWS IF YOU DEPLOY THIS DEMO.
You will need the following in order to deploy and run this demo:

- A Picovoice Console account with a valid AccessKey.
- An AWS account.
- AWS SAM CLI installed and set up. Follow the official guide completely.
- A valid Docker installation.
- Clone the `serverless-picollm` repo:

  ```console
  git clone https://github.com/Picovoice/serverless-picollm.git
  ```
- Download a Phi2-based `.pllm` model from the picoLLM section of the Picovoice Console.
  > **Tip:** Other models will work as long as they are chat-enabled and fit within the AWS Lambda code size and memory limits.
  > You will also need to update the `Dialog` object in `client.py` to the appropriate class.
  > For example, if using Llama3 with the `llama-3-8b-instruct-326` model, the line in `client.py` should be updated to:
  >
  > ```python
  > dialog = picollm.Llama3ChatDialog(history=3)
  > ```
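  As a rough sketch of how these dialog classes fit together with the picoLLM SDK (the flow in this repo's actual code may differ, and the AccessKey and model path below are placeholders):

  ```python
  import picollm

  # Placeholders: use your own AccessKey and downloaded model file.
  pllm = picollm.create(
      access_key="${YOUR_ACCESS_KEY_HERE}",
      model_path="models/<your-model>.pllm",
  )

  dialog = picollm.Phi2ChatDialog(history=3)  # match the dialog class to your model family
  dialog.add_human_request("What is the capital of France?")

  # prompt() renders the stored turns into the model's chat template.
  res = pllm.generate(dialog.prompt())
  dialog.add_llm_response(res.completion)  # keep the turn in history for follow-ups
  ```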
- Place the downloaded `.pllm` model in the `models/` directory.
- Replace `"${YOUR_ACCESS_KEY_HERE}"` inside the `src/app.py` file with your AccessKey obtained from Picovoice Console.
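  For reference, the relevant line in `src/app.py` looks something like this (the variable name here is illustrative):

  ```python
  access_key = "${YOUR_ACCESS_KEY_HERE}"  # replace with your AccessKey from Picovoice Console
  ```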
- Use AWS SAM CLI to build the app:

  ```console
  sam build
  ```
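  Since the function ships as a container image, `sam build` also builds the Docker image. As a hypothetical sketch of what such a Dockerfile can look like (the repo's actual `Dockerfile` may differ):

  ```dockerfile
  # Illustrative only; see the repo's actual Dockerfile.
  FROM public.ecr.aws/lambda/python:3.11

  # Install picoLLM and the other Python dependencies.
  COPY requirements.txt .
  RUN pip install -r requirements.txt

  # Bundle the handler and the .pllm model into the image.
  COPY src/app.py ${LAMBDA_TASK_ROOT}/
  COPY models/ ${LAMBDA_TASK_ROOT}/models/

  CMD ["app.handler"]
  ```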
- Use AWS SAM CLI to deploy the app, following the guided prompts:

  ```console
  sam deploy --guided
  ```
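  The guided run saves your answers to `samconfig.toml`, so later deploys can be a plain `sam deploy`. The saved config looks roughly like this (values are illustrative):

  ```toml
  version = 0.1
  [default.deploy.parameters]
  stack_name = "picollm-lambda"
  region = "us-west-2"
  capabilities = "CAPABILITY_IAM"
  confirm_changeset = true
  ```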
- At the end of the deployment, AWS SAM CLI will print an outputs section. Make note of the `WebSocketURI`. It should look something like this:
  ```console
  CloudFormation outputs from deployed stack
  ----------------------------------------------------------------------------------------------------------------
  Outputs
  ----------------------------------------------------------------------------------------------------------------
  Key                 HandlerFunctionFunctionArn
  Description         HandlerFunction function ARN
  Value               arn:aws:lambda:us-west-2:000000000000:function:picollm-lambda-HandlerFunction-ABC123DEF098

  Key                 WebSocketURI
  Description         The WSS Protocol URI to connect to
  Value               wss://ABC123DEF098.execute-api.us-west-2.amazonaws.com/Prod
  ----------------------------------------------------------------------------------------------------------------
  ```

  In this example, the `WebSocketURI` is `wss://ABC123DEF098.execute-api.us-west-2.amazonaws.com/Prod`.
  > **Note:** If you make any changes to the model, the `Dockerfile`, or `app.py`, you will need to repeat the build and deploy steps.
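  That is, rebuild and redeploy; thanks to the saved config from the guided run, `--guided` can be omitted:

  ```console
  sam build && sam deploy
  ```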
- Run `client.py`, passing in the `WebSocketURI` copied from the deployment step:

  ```console
  python client.py -u <WebSocket URL>
  ```
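  Under the hood, this is just a WebSocket conversation with the API Gateway endpoint. A minimal sketch of such a client, using the `websockets` package (the real `client.py` and its message format may differ):

  ```python
  import asyncio
  import websockets

  async def chat(uri: str) -> None:
      async with websockets.connect(uri) as ws:
          # Send one prompt, then print messages as the Lambda streams them back.
          await ws.send(input("> "))
          async for message in ws:
              print(f"< {message}")

  asyncio.run(chat("wss://ABC123DEF098.execute-api.us-west-2.amazonaws.com/Prod"))
  ```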
- Once connected, the client will give you a prompt. Type in your chat message and picoLLM will stream back a response from the Lambda!

  ```console
  > What is the capital of France?
  < The capital of France is Paris.
  < [Completion finished @ `6.35` tps]
  ```
  > **Important:** When you first send a message you may get the following response: `< [Lambda is loading & caching picoLLM. Please wait...]`.
  > This means picoLLM is loading the model as Lambda streams it from the Elastic Container Registry.
  > Because of the nature and limitations of AWS Lambda, this process can take a few minutes or more.
  > Subsequent messages and connections will not take as long, as Lambda caches the image layers.
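  A common way to get this warm-start behavior, and plausibly what `src/app.py` does (shown here only as a hypothetical sketch), is to cache the loaded model at module scope so warm invocations of the same container reuse it:

  ```python
  import picollm

  _pllm = None  # survives across warm invocations of this Lambda container

  def _get_pllm():
      global _pllm
      if _pllm is None:  # cold start: load (and thereby cache) the model
          _pllm = picollm.create(
              access_key="${YOUR_ACCESS_KEY_HERE}",      # placeholder
              model_path="/var/task/models/model.pllm",  # illustrative path inside the image
          )
      return _pllm
  ```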