pull #344 (Open)

Wants to merge 29 commits into base branch feat/add-multiple-pdfs.

Commits (changes from all 29 commits)
- ae1a1bd fix unique key props warning (mayooear, Mar 21, 2023)
- fccd3b0 Merge pull request #17 from mayooear/fix/next-key-props (mayooear, Mar 21, 2023)
- fe06fc8 general frontend optimization (mayooear, Mar 23, 2023)
- 1b6fac8 specify node version in engines, update README for pnpm use (mayooear, Mar 23, 2023)
- 55f58da Merge pull request #39 from mayooear/frontend-fixes (mayooear, Mar 23, 2023)
- 10c66b0 upgrade langchain, add customPDFLoader (mayooear, Mar 27, 2023)
- 53f5ae6 Merge branch 'main' into feat/upgrade-langchain (mayooear, Mar 27, 2023)
- 46bb0ad Merge pull request #66 from mayooear/feat/upgrade-langchain (mayooear, Mar 27, 2023)
- 581f809 Update .env.example (mayooear, Mar 27, 2023)
- b4c88e1 Merge pull request #67 from mayooear/update/pineconeindexenv (mayooear, Mar 27, 2023)
- 5bd2a3b add directory loader to load multiple pdf files (mayooear, Mar 28, 2023)
- 90381f0 Merge branch 'main' into feat/add-directory-loader (mayooear, Mar 28, 2023)
- ef4046d Merge pull request #71 from mayooear/feat/add-directory-loader (mayooear, Mar 28, 2023)
- 37fc719 Update README.md (mayooear, Apr 1, 2023)
- a6075e5 Update README.md (mayooear, Apr 1, 2023)
- 6db8ba8 Update README.md (mayooear, Apr 3, 2023)
- 7a6d82f langchain retrievers (mayooear, Apr 10, 2023)
- a74abf2 remove merge conflicts (mayooear, Apr 10, 2023)
- b00e3c0 global pnpm installation (mayooear, Apr 11, 2023)
- 191e87c upgrade langchain and pinecone, migrate from pnpm to yarn (mayooear, Apr 13, 2023)
- aff71aa updated README.md (mayooear, Apr 13, 2023)
- f1ee996 Merge pull request #165 from mayooear/feat/retriever (mayooear, Apr 13, 2023)
- 0a6dc57 upgrade dependencies, clean up env files, updated pdfloader (mayooear, May 25, 2023)
- ea4948c Update LangChain version to current, update history passing (jacoblee93, Jul 11, 2023)
- 7f6c375 Update message passing (jacoblee93, Jul 11, 2023)
- ba9b663 Update deps, naming (jacoblee93, Aug 11, 2023)
- 66d183f Merge pull request #376 from jacoblee93/feature_langchain_update (mayooear, Aug 11, 2023)
- 31aec79 Update LangChain and Pinecone client, use expression language for chain (jacoblee93, Nov 13, 2023)
- 138bba4 Merge pull request #434 from jacoblee93/jacob/update_versions (mayooear, Nov 13, 2023)
8 changes: 5 additions & 3 deletions .env.example
@@ -1,6 +1,8 @@
OPENAI_API_KEY=

# Update these with your Supabase details from your project settings > API
PINECONE_API_KEY=
# Update these with your pinecone details from your dashboard.
# PINECONE_INDEX_NAME is in the indexes tab under "index name" in blue
# PINECONE_ENVIRONMENT is in indexes tab under "Environment". Example: "us-east1-gcp"
PINECONE_API_KEY=
PINECONE_ENVIRONMENT=

PINECONE_INDEX_NAME=
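
For context, the three Pinecone variables above are consumed by the client in `utils/pinecone-client.ts`, a file this diff does not touch. Below is a minimal sketch of how that client might be initialized against `@pinecone-database/pinecone` 1.1.0 (the version pinned in `package.json` further down); the validation and option names here are assumptions, not code from the PR:

```ts
// Hypothetical sketch of utils/pinecone-client.ts (not part of this diff).
import { Pinecone } from '@pinecone-database/pinecone';

if (!process.env.PINECONE_API_KEY || !process.env.PINECONE_ENVIRONMENT) {
  throw new Error('Missing Pinecone credentials in .env file');
}

// The 1.x client still requires an environment alongside the API key.
export const pinecone = new Pinecone({
  apiKey: process.env.PINECONE_API_KEY,
  environment: process.env.PINECONE_ENVIRONMENT,
});
```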
2 changes: 2 additions & 0 deletions .gitignore
@@ -38,3 +38,5 @@ next-env.d.ts

#Notion_db
/Notion_DB

.yarn/
56 changes: 35 additions & 21 deletions README.md
@@ -1,31 +1,39 @@
# GPT-4 & LangChain - Create a ChatGPT Chatbot for Your PDF Docs
# GPT-4 & LangChain - Create a ChatGPT Chatbot for Your PDF Files

Use the new GPT-4 api to build a chatGPT chatbot for Large PDF docs (56 pages used in this example).
Use the new GPT-4 api to build a chatGPT chatbot for multiple Large PDF files.

Tech stack used includes LangChain, Pinecone, Typescript, Openai, and Next.js. LangChain is a framework that makes it easier to build scalable AI/LLM apps and chatbots. Pinecone is a vectorstore for storing embeddings and your PDF in text to later retrieve similar docs.

[Tutorial video](https://www.youtube.com/watch?v=ih9PBGVVOO4)

[Get in touch via twitter if you have questions](https://twitter.com/mayowaoshin)
[Join the discord if you have questions](https://discord.gg/E4Mc77qwjm)

The visual guide of this repo and tutorial is in the `visual guide` folder.

**If you run into errors, please review the troubleshooting section further down this page.**

Prelude: Please make sure you have already downloaded node on your system and the version is 18 or greater.

## Development

1. Clone the repo
1. Clone the repo or download the ZIP

```
git clone [github https url]
```

2. Install packages

First run `npm install yarn -g` to install yarn globally (if you haven't already).

Then run:

```
pnpm install
yarn install
```

After installation, you should now see a `node_modules` folder.

3. Set up your `.env` file

- Copy `.env.example` into `.env`
@@ -37,28 +45,30 @@ OPENAI_API_KEY=
PINECONE_API_KEY=
PINECONE_ENVIRONMENT=

PINECONE_INDEX_NAME=

```

- Visit [openai](https://help.openai.com/en/articles/4936850-where-do-i-find-my-secret-api-key) to retrieve API keys and insert into your `.env` file.
- Visit [pinecone](https://pinecone.io/) to create and retrieve your API keys.
- Visit [pinecone](https://pinecone.io/) to create and retrieve your API keys, and also retrieve your environment and index name from the dashboard.

4. In the `config` folder, replace the `PINECONE_INDEX_NAME` and `PINECONE_NAME_SPACE` with your own details from your pinecone dashboard.
4. In the `config` folder, replace the `PINECONE_NAME_SPACE` with a `namespace` where you'd like to store your embeddings on Pinecone when you run `npm run ingest`. This namespace will later be used for queries and retrieval.

5. In `utils/makechain.ts` chain change the `QA_PROMPT` for your own usecase. Change `modelName` in `new OpenAIChat` to a different api model if you don't have access to `gpt-4`. See [the OpenAI docs](https://platform.openai.com/docs/models/model-endpoint-compatibility) for a list of supported `modelName`s. For example you could use `gpt-3.5-turbo` if you do not have access to `gpt-4`, yet.
5. In `utils/makechain.ts` chain change the `QA_PROMPT` for your own usecase. Change `modelName` in `new OpenAI` to `gpt-4`, if you have access to `gpt-4` api. Please verify outside this repo that you have access to `gpt-4` api, otherwise the application will not work.

## Convert your PDF to embeddings
## Convert your PDF files to embeddings

1. In `docs` folder replace the pdf with your own pdf doc.
**This repo can load multiple PDF files**

2. In `scripts/ingest-data.ts` replace `filePath` with `docs/{yourdocname}.pdf`
1. Inside `docs` folder, add your pdf files or folders that contain pdf files.

3. Run the script `npm run ingest` to 'ingest' and embed your docs
2. Run the script `yarn run ingest` to 'ingest' and embed your docs. If you run into errors troubleshoot below.

4. Check Pinecone dashboard to verify your namespace and vectors have been added.
3. Check Pinecone dashboard to verify your namespace and vectors have been added.

## Run the app

Once you've verified that the embeddings and content have been successfully added to your Pinecone, you can run the app `npm run dev` to launch the local dev environment and then type a question in the chat interface.
Once you've verified that the embeddings and content have been successfully added to your Pinecone, you can run the app `npm run dev` to launch the local dev environment, and then type a question in the chat interface.

## Troubleshooting

@@ -67,18 +77,22 @@ In general, keep an eye out in the `issues` and `discussions` section of this re
**General errors**

- Make sure you're running the latest Node version. Run `node -v`
- Try a different PDF or convert your PDF to text first. It's possible your PDF is corrupted, scanned, or requires OCR to convert to text.
- `Console.log` the `env` variables and make sure they are exposed.
- Make sure you're using the same versions of LangChain and Pinecone as this repo.
- Check that you've created an `.env` file that contains your valid (and working) API keys.
- If you change `modelName` in `OpenAIChat` note that the correct name of the alternative model is `gpt-3.5-turbo`
- Pinecone indexes of users on the Starter(free) plan are deleted after 7 days of inactivity. To prevent this, send an API request to Pinecone to reset the counter.
- Check that you've created an `.env` file that contains your valid (and working) API keys, environment and index name.
- If you change `modelName` in `OpenAI`, make sure you have access to the api for the appropriate model.
- Make sure you have enough OpenAI credits and a valid card on your billings account.
- Check that you don't have multiple OPENAPI keys in your global environment. If you do, the local `env` file from the project will be overwritten by systems `env` variable.
- Try to hard code your API keys into the `process.env` variables if there are still issues.

**Pinecone errors**

- Make sure your pinecone dashboard `environment` and `index` matches the one in your `config` folder.
- Make sure your pinecone dashboard `environment` and `index` matches the one in the `pinecone.ts` and `.env` files.
- Check that you've set the vector dimensions to `1536`.
- Switch your Environment in pinecone to `us-east1-gcp` if the other environment is causing issues.

If you're stuck after trying all these steps, delete `node_modules`, restart your computer, then `pnpm install` again.
- Make sure your pinecone namespace is in lowercase.
- Pinecone indexes of users on the Starter(free) plan are deleted after 7 days of inactivity. To prevent this, send an API request to Pinecone to reset the counter before 7 days.
- Retry from scratch with a new Pinecone project, index, and cloned repo.

## Credit

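Step 5 of the updated README points at `utils/makechain.ts`, which is not included in the visible part of this diff. As a hedged illustration of the `modelName` change the README describes, the model construction might look roughly like this (import path and options assume LangChain 0.0.186, the version this PR pins; the exact file contents are an assumption):

```ts
// Hypothetical excerpt from utils/makechain.ts (that file is not shown in this diff).
import { ChatOpenAI } from 'langchain/chat_models/openai';

const model = new ChatOpenAI({
  temperature: 0, // keep answers focused for question answering
  modelName: 'gpt-3.5-turbo', // switch to 'gpt-4' only if your OpenAI account has gpt-4 API access
});
```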
2 changes: 1 addition & 1 deletion components/layout.tsx
@@ -14,7 +14,7 @@ export default function Layout({ children }: LayoutProps) {
</nav>
</div>
</header>
<div className="container">
<div>
<main className="flex w-full flex-1 flex-col overflow-hidden">
{children}
</main>
8 changes: 6 additions & 2 deletions config/pinecone.ts
@@ -1,8 +1,12 @@
/**
* Change the index and namespace to your own
* Change the namespace to the namespace on Pinecone you'd like to store your embeddings.
*/

const PINECONE_INDEX_NAME = 'langchainjsfundamentals';
if (!process.env.PINECONE_INDEX_NAME) {
throw new Error('Missing Pinecone index name in .env file');
}

const PINECONE_INDEX_NAME = process.env.PINECONE_INDEX_NAME ?? '';

const PINECONE_NAME_SPACE = 'pdf-test'; //namespace is optional for your vectors

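For reference, the index name and namespace exported from this config feed both the ingest script and the chat route. A rough sketch of how `scripts/ingest-data.ts` might use them after this change (that script is not part of the visible diff, so treat the function name and structure below as illustrative):

```ts
// Hedged sketch of the ingest side; scripts/ingest-data.ts itself is not shown in this diff.
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
import { PineconeStore } from 'langchain/vectorstores/pinecone';
import type { Document } from 'langchain/document';
import { pinecone } from '@/utils/pinecone-client';
import { PINECONE_INDEX_NAME, PINECONE_NAME_SPACE } from '@/config/pinecone';

export async function embedDocuments(docs: Document[]) {
  const index = pinecone.Index(PINECONE_INDEX_NAME);
  // Embed each chunk with OpenAI and upsert the vectors into the configured namespace.
  await PineconeStore.fromDocuments(docs, new OpenAIEmbeddings(), {
    pineconeIndex: index,
    namespace: PINECONE_NAME_SPACE,
    textKey: 'text',
  });
}
```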
5 changes: 5 additions & 0 deletions declarations/pdf-parse.d.ts
@@ -0,0 +1,5 @@
declare module 'pdf-parse/lib/pdf-parse.js' {
import pdf from 'pdf-parse';

export default pdf;
}
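
This declaration exists so TypeScript accepts a deep import of pdf-parse's library entry point, which sidesteps the debug-mode file read that the package's top-level index performs. A rough sketch of how the custom PDF loader added in this PR might use it (the loader itself is not shown in the visible diff, so the function below is illustrative):

```ts
// Hypothetical loader sketch; the actual customPDFLoader added in this PR is not shown here.
import fs from 'fs/promises';
import { Document } from 'langchain/document';

export async function loadPdfAsDocument(filePath: string): Promise<Document> {
  // The deep import matches the module declaration above and avoids pdf-parse's
  // debug branch, which tries to read a bundled test PDF on plain import.
  const { default: pdf } = await import('pdf-parse/lib/pdf-parse.js');
  const raw = await fs.readFile(filePath);
  const parsed = await pdf(raw);
  return new Document({
    pageContent: parsed.text,
    metadata: { source: filePath, pdf_numpages: parsed.numpages },
  });
}
```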
Binary file removed docs/MorseVsFrederick.pdf
4 changes: 2 additions & 2 deletions package.json
@@ -16,11 +16,11 @@
},
"dependencies": {
"@microsoft/fetch-event-source": "^2.0.1",
"@pinecone-database/pinecone": "^0.0.10",
"@pinecone-database/pinecone": "1.1.0",
"@radix-ui/react-accordion": "^1.1.1",
"clsx": "^1.2.1",
"dotenv": "^16.0.3",
"langchain": "0.0.33",
"langchain": "^0.0.186",
"lucide-react": "^0.125.0",
"next": "13.2.3",
"pdf-parse": "1.1.1",
2 changes: 1 addition & 1 deletion pages/_document.tsx
@@ -1,4 +1,4 @@
import { Html, Head, Main, NextScript } from "next/document";
import { Html, Head, Main, NextScript } from 'next/document';

export default function Document() {
return (
86 changes: 53 additions & 33 deletions pages/api/chat.ts
@@ -1,6 +1,7 @@
import type { NextApiRequest, NextApiResponse } from 'next';
import { OpenAIEmbeddings } from 'langchain/embeddings';
import { PineconeStore } from 'langchain/vectorstores';
import type { Document } from 'langchain/document';
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
import { PineconeStore } from 'langchain/vectorstores/pinecone';
import { makeChain } from '@/utils/makechain';
import { pinecone } from '@/utils/pinecone-client';
import { PINECONE_INDEX_NAME, PINECONE_NAME_SPACE } from '@/config/pinecone';
@@ -11,52 +12,71 @@ export default async function handler(
) {
const { question, history } = req.body;

console.log('question', question);
console.log('history', history);

//only accept post requests
if (req.method !== 'POST') {
res.status(405).json({ error: 'Method not allowed' });
return;
}

if (!question) {
return res.status(400).json({ message: 'No question in the request' });
}
// OpenAI recommends replacing newlines with spaces for best results
const sanitizedQuestion = question.trim().replaceAll('\n', ' ');

const index = pinecone.Index(PINECONE_INDEX_NAME);

/* create vectorstore*/
const vectorStore = await PineconeStore.fromExistingIndex(
index,
new OpenAIEmbeddings({}),
'text',
PINECONE_NAME_SPACE, //optional
);
try {
const index = pinecone.Index(PINECONE_INDEX_NAME);

res.writeHead(200, {
'Content-Type': 'text/event-stream',
'Cache-Control': 'no-cache, no-transform',
Connection: 'keep-alive',
});
/* create vectorstore*/
const vectorStore = await PineconeStore.fromExistingIndex(
new OpenAIEmbeddings({}),
{
pineconeIndex: index,
textKey: 'text',
namespace: PINECONE_NAME_SPACE, //namespace comes from your config folder
},
);

const sendData = (data: string) => {
res.write(`data: ${data}\n\n`);
};
// Use a callback to get intermediate sources from the middle of the chain
let resolveWithDocuments: (value: Document[]) => void;
const documentPromise = new Promise<Document[]>((resolve) => {
resolveWithDocuments = resolve;
});
const retriever = vectorStore.asRetriever({
callbacks: [
{
handleRetrieverEnd(documents) {
resolveWithDocuments(documents);
},
},
],
});

sendData(JSON.stringify({ data: '' }));
//create chain
const chain = makeChain(retriever);

//create chain
const chain = makeChain(vectorStore, (token: string) => {
sendData(JSON.stringify({ data: token }));
});
const pastMessages = history
.map((message: [string, string]) => {
return [`Human: ${message[0]}`, `Assistant: ${message[1]}`].join('\n');
})
.join('\n');
console.log(pastMessages);

try {
//Ask a question
const response = await chain.call({
//Ask a question using chat history
const response = await chain.invoke({
question: sanitizedQuestion,
chat_history: history || [],
chat_history: pastMessages,
});

const sourceDocuments = await documentPromise;

console.log('response', response);
sendData(JSON.stringify({ sourceDocs: response.sourceDocuments }));
} catch (error) {
res.status(200).json({ text: response, sourceDocuments });
} catch (error: any) {
console.log('error', error);
} finally {
sendData('[DONE]');
res.end();
res.status(500).json({ error: error.message || 'Something went wrong' });
}
}
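
One practical consequence of this rewrite is that the route now answers with a single JSON payload, `{ text, sourceDocuments }`, instead of the previous server-sent event stream, so the frontend fetch logic has to change with it. A minimal, hypothetical client-side sketch follows (the actual frontend component is not part of this diff):

```ts
// Hedged sketch of calling the rewritten /api/chat route from the browser.
type ChatTurn = [question: string, answer: string];

export async function askQuestion(question: string, history: ChatTurn[]) {
  const res = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    // The handler expects { question, history } in the request body.
    body: JSON.stringify({ question, history }),
  });

  const body = await res.json();
  if (!res.ok) {
    // The handler returns { error } on failures and { message } when the question is missing.
    throw new Error(body.error ?? body.message ?? 'Request failed');
  }

  // On success the handler responds with { text, sourceDocuments }.
  return { text: body.text, sourceDocuments: body.sourceDocuments };
}
```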