- Create a stage to store the PDFs (the demo uses a stage called `RAG`).
- Load the PDFs via the UI.
- Go to Projects » Notebooks and upload `4_rag_sf_notebook.ipynb`.
- In the Packages drop-down, add the packages listed in the `environment.yml` file.
- Run the notebook!
- Utilize the `environment.yml` file to set up your Python environment for the demo. Examples in the terminal:
  - `conda env create -f environment.yml`
  - `micromamba create -f environment.yml -y`
- Create a `.env` file and populate it with your account details:

      SNOWFLAKE_ACCOUNT = abc123
      SNOWFLAKE_USER = username
      SNOWFLAKE_PASSWORD = your_password
      SNOWFLAKE_ROLE = your_role
      SNOWFLAKE_WAREHOUSE = warehouse_name
      SNOWFLAKE_DATABASE = database_name
      SNOWFLAKE_SCHEMA = schema_name
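Libraries such as `python-dotenv` can load these values for you, but the idea is simple enough to sketch with the standard library alone. The sketch below is a minimal, assumption-laden parser (the helper name `load_env` and its lenient `KEY = value` handling are illustrative, not part of the demo code); it exports the parsed values so a Snowpark session builder can later read them from `os.environ`.

```python
import os


def load_env(path: str = ".env") -> dict:
    """Parse simple KEY = value lines from a .env file and export them.

    Hypothetical helper: skips blank lines and # comments, strips
    whitespace around keys and values, and mirrors everything into
    os.environ so later code can read the credentials from there.
    """
    values = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # ignore blanks, comments, and malformed lines
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    os.environ.update(values)
    return values
```

With the variables exported, a Snowpark session would typically be built from them (e.g. passing `os.environ["SNOWFLAKE_ACCOUNT"]` and friends into `Session.builder.configs(...)`).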
This lesson will:
- Create a stage for your unstructured documents (PDFs in this case).
- Create a UDF named `readpdf` that reads a PDF in as raw text.
- Create a UDF that chunks the text, leveraging LangChain.
- Create a vector store leveraging Cortex to create embeddings out of the chunks.
- Show the vector store.
- Create a table that will track all inputs and outputs from the Streamlit app.
- Showcase how we can query the most relevant chunks from the vector store.
- Showcase how we can leverage Cortex LLMs to get answers from the relevant chunks.
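The pipeline above (chunk, embed, then retrieve the nearest chunks) can be illustrated in plain Python. In the actual demo the chunking is done with LangChain (commonly a splitter like `RecursiveCharacterTextSplitter`) and the embeddings come from Cortex, so the toy sliding-window splitter, the cosine-similarity ranking, and all function names and sizes below are stand-in assumptions, not the demo's implementation:

```python
from math import sqrt


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Toy chunker: fixed-size windows with overlap between neighbours,
    standing in for LangChain's text splitters."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_vec: list[float], chunk_vecs: list[list[float]],
          chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the query,
    mimicking what a vector-similarity query over the store does."""
    scored = sorted(zip(chunks, chunk_vecs),
                    key=lambda cv: cosine(query_vec, cv[1]), reverse=True)
    return [c for c, _ in scored[:k]]
```

In Snowflake itself this ranking is done in SQL over the vector store, with Cortex producing the embeddings on both the chunk side and the query side.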
Copy the Streamlit app code into SiS (Streamlit in Snowflake), ask the question "What % of Snowflake customers process unstructured data?", and watch it in action!
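Before the app hands the question to a Cortex LLM (e.g. via `SNOWFLAKE.CORTEX.COMPLETE` in SQL), it has to stitch the retrieved chunks and the user's question into a single prompt. A hedged sketch of that assembly step, with the helper name and prompt wording being illustrative assumptions rather than the demo's exact code:

```python
def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Hypothetical prompt builder: joins the retrieved chunks into a
    context block and appends the user's question, so the LLM answers
    grounded in the retrieved text rather than its own knowledge."""
    context = "\n---\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The Streamlit app would then pass the resulting string to the LLM call and also log the question and answer into the tracking table created earlier.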