Example of running Valohai execution on Snowpark

This code example uses the Insurance dataset from Kaggle and originated from tx-smitht's repository with adjustment to work better with Valohai.

In this example, there are 4 available steps that can be used to construct pipelines

Data loading:
- Load the data from the csv file
- Insert 10000 rows into a table called SOURCE_OF_TRUTH
- Insert the rest of the data into a table called INCOMING_DATA_SOURCE
Model training:
- Use the data in SOURCE_OF_TRUTH to train and test a model to predict the insurance charges
- Log the model to Snowflake's Model registry
- Model files are also saved in Valohai store
Mock streaming data:
- Use the some of the data in INCOMING_DATA_SOURCE and insert it to LANDING_TABLE to mock the process of data coming in.
Running inference:
- The model will be used to run inference on the incoming data
- The result is saved in the INSURANCE_GOLD where the predicted price is added for comparison
Deploy a streamlit app as a long runing service in Snowpark Container Service:
- Trigger a long runing service in Snowpark Container Service, which host a Streamlit app that connects to Snowflake database and display the data.
- This example uses Streamlit, but you can also host FastAPI, Flash or any other framework app.

This example uses snowflake-ml-python for training the model, but it can also be easily replaced with other popular libraries as you have full control over the environment in Valohai.

Prerequisites

Streamlit app docker image

Use snowflake credentials to login to docker CLI
Build docker image of the insurance_streamlit app, using the Dockerfile file.
Push the image into Snowflake registry

Preparing the database in Snowflake

CREATE or REPLACE TABLE INSURANCE.ML_PIPE.SOURCE_OF_TRUTH (
	AGE NUMBER(38,0),
	GENDER VARCHAR(16777216),
	BMI FLOAT,
	CHARGES FLOAT,
	CHILDREN NUMBER(38,0),
	SMOKER VARCHAR(16777216),
	REGION VARCHAR(16777216),
	MEDICAL_HISTORY VARCHAR(16777216),
	FAMILY_MEDICAL_HISTORY VARCHAR(16777216),
	EXERCISE_FREQUENCY VARCHAR(16777216),
	OCCUPATION VARCHAR(16777216),
	COVERAGE_LEVEL VARCHAR(16777216)
);

CREATE or REPLACE TABLE INSURANCE.ML_PIPE.INCOMING_DATA_SOURCE (
	AGE NUMBER(38,0),
	GENDER VARCHAR(16777216),
	BMI FLOAT,
	CHARGES FLOAT,
	CHILDREN NUMBER(38,0),
	SMOKER VARCHAR(16777216),
	REGION VARCHAR(16777216),
	MEDICAL_HISTORY VARCHAR(16777216),
	FAMILY_MEDICAL_HISTORY VARCHAR(16777216),
	EXERCISE_FREQUENCY VARCHAR(16777216),
	OCCUPATION VARCHAR(16777216),
	COVERAGE_LEVEL VARCHAR(16777216)
);


CREATE OR REPLACE EVENT TABLE INSURANCE.ML_PIPE.MODEL_TRACES;
ALTER ACCOUNT SET EVENT_TABLE = INSURANCE.ML_PIPE.MODEL_TRACES;

CREATE or REPLACE TABLE INSURANCE.ML_PIPE.LANDING_TABLE (
	AGE NUMBER(38,0),
	GENDER VARCHAR(16777216),
	BMI FLOAT,
	CHARGES FLOAT,
	CHILDREN NUMBER(38,0),
	SMOKER VARCHAR(16777216),
	REGION VARCHAR(16777216),
	MEDICAL_HISTORY VARCHAR(16777216),
	FAMILY_MEDICAL_HISTORY VARCHAR(16777216),
	EXERCISE_FREQUENCY VARCHAR(16777216),
	OCCUPATION VARCHAR(16777216),
	COVERAGE_LEVEL VARCHAR(16777216)
);

-- Create the stream on the landing table
CREATE OR REPLACE STREAM STREAM_ON_LANDING ON TABLE LANDING_TABLE;

-- Create a gold table for the records and their predictions to land
CREATE OR REPLACE TABLE INSURANCE_GOLD(
    AGE NUMBER(38,0),
	GENDER VARCHAR(16777216),
	BMI FLOAT,
	CHILDREN NUMBER(38,0),
	SMOKER VARCHAR(16777216),
	REGION VARCHAR(16777216),
	MEDICAL_HISTORY VARCHAR(16777216),
	FAMILY_MEDICAL_HISTORY VARCHAR(16777216),
	EXERCISE_FREQUENCY VARCHAR(16777216),
	OCCUPATION VARCHAR(16777216),
	COVERAGE_LEVEL VARCHAR(16777216),
    METADATA$ROW_ID VARCHAR(16777216),
    METADATA$ISUPDATE BOOLEAN,
    METADATA$ACTION VARCHAR(16777216),
    METADATA_UPDATED_AT DATE,
    CHARGES FLOAT,
    PREDICTED_CHARGES FLOAT
);

Note: If neccessary, grant access to the user role used in Valohai Environment

GRANT ALL ON SCHEMA INSURANCE.ML_PIPE TO ROLE <your-role>;

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
insurance_streamlit		insurance_streamlit
utils		utils
.dockerignore		.dockerignore
.gitignore		.gitignore
Dockerfile		Dockerfile
Readme.md		Readme.md
deploy_streamlit_app.py		deploy_streamlit_app.py
load_insurance_data.py		load_insurance_data.py
make_predictions.py		make_predictions.py
requirements.txt		requirements.txt
simulate_streamed_data.py		simulate_streamed_data.py
test_insurance_connection.py		test_insurance_connection.py
train_and_save_model.py		train_and_save_model.py
valohai.yaml		valohai.yaml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Example of running Valohai execution on Snowpark

Prerequisites

Streamlit app docker image

Preparing the database in Snowflake

About

Uh oh!

Releases

Packages

Uh oh!

Languages

valohai/snowpark-example

Folders and files

Latest commit

History

Repository files navigation

Example of running Valohai execution on Snowpark

Prerequisites

Streamlit app docker image

Preparing the database in Snowflake

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages