Tokenization Demo

Understand how Large Language Models (LLMs) see your context and prompts

Built With

Next.js + Tailwind CSS
The Tiktoken library (via the js-tiktoken package)
Node version 20 or higher

Start the project

Requires Node version 20+

From the project root directory, run the following command.

npm install

This project requires no environment variables and uses no paid third-party services. It simply passes the text entered on the frontend to the backend API route, which tokenizes the text using Tiktoken.

Start the app.

npm run dev

Project structure

In this example we opted to use Next.js and the App Router, which colocates the frontend and backend code in a single repository.

Frontend Client

The frontend uses Next.js and Tailwind CSS to let users enter free-form text. This text is split by word on the client side, and when the user clicks the Tokenize text button it is sent to the backend API route, where the tiktoken library converts it to tokens.

The tiktoken library breaks the text into tokens, which are often whole words but can also be word fragments, and assigns each token an integer ID according to its vocabulary.
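To make the idea of a vocabulary lookup concrete, here is a minimal sketch. It is not how tiktoken works internally: tiktoken applies byte pair encoding against a vocabulary of roughly 100k entries, while this toy uses a greedy longest-match over a hypothetical four-entry vocabulary. The names `toyVocab` and `toyEncode` are invented for illustration.

```typescript
// Hypothetical toy vocabulary, invented for this example.
// Real tiktoken vocabularies are far larger and the IDs differ.
const toyVocab = new Map<string, number>([
  ["token", 0],
  ["ization", 1],
  [" ", 2],
  ["demo", 3],
]);

// Greedy longest-match lookup: at each position, take the longest vocabulary
// entry that matches and emit its ID. This is a simplification for
// illustration, not the byte pair encoding procedure tiktoken runs.
function toyEncode(text: string, vocab: Map<string, number>): number[] {
  const ids: number[] = [];
  let pos = 0;
  while (pos < text.length) {
    let matched = false;
    // Try the longest candidate substring first, then shrink it.
    for (let end = text.length; end > pos; end--) {
      const id = vocab.get(text.slice(pos, end));
      if (id !== undefined) {
        ids.push(id);
        pos = end;
        matched = true;
        break;
      }
    }
    if (!matched) {
      throw new Error(`No vocabulary entry matches at position ${pos}`);
    }
  }
  return ids;
}

console.log(toyEncode("tokenization demo", toyVocab)); // [0, 1, 2, 3]
```

Note that the single word "tokenization" becomes two IDs here, which mirrors how a real tokenizer can split one word into several tokens.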

Backend API route

This project exposes one API route, /api/tokenize, which uses the Tiktoken library to tokenize the text it receives from the frontend:

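import { NextRequest, NextResponse } from 'next/server';
import { encodingForModel } from 'js-tiktoken';

export async function POST(req: NextRequest) {
  try {
    // Load the encoding (vocabulary) that matches the target model.
    const enc = encodingForModel('gpt-3.5-turbo');

    // The frontend sends the raw text as { inputText } in the JSON body.
    const { inputText } = await req.json();
    console.log(`inputText: ${inputText}`);

    // Convert the text into an array of integer token IDs.
    const tokens = enc.encode(inputText);

    return NextResponse.json({ tokens }, { status: 200 });
  } catch (error) {
    // An Error object serializes to {} in JSON, so return its message instead.
    const message = error instanceof Error ? error.message : String(error);
    return NextResponse.json({ error: message }, { status: 500 });
  }
}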
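For reference, a client could call this route roughly as follows. This is a sketch under assumptions, not the project's actual frontend code: the helper names `buildTokenizeRequest` and `tokenizeText` and the `FetchLike` type are hypothetical, and the only details taken from the route above are the /api/tokenize path, the `inputText` request field, and the `tokens` response field.

```typescript
// Minimal request/response shapes, declared locally so the sketch is self-contained.
interface TokenizeRequestInit {
  method: string;
  headers: Record<string, string>;
  body: string;
}

interface FetchLikeResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

// Hypothetical type: any fetch-compatible function, such as the browser's fetch.
type FetchLike = (url: string, init: TokenizeRequestInit) => Promise<FetchLikeResponse>;

// Build the request separately so it can be inspected without a network call.
function buildTokenizeRequest(inputText: string): { url: string; init: TokenizeRequestInit } {
  return {
    url: "/api/tokenize",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ inputText }),
    },
  };
}

// Hypothetical client helper: POST the text and return the token IDs.
async function tokenizeText(inputText: string, fetchImpl: FetchLike): Promise<number[]> {
  const { url, init } = buildTokenizeRequest(inputText);
  const res = await fetchImpl(url, init);
  if (!res.ok) {
    throw new Error(`Tokenize request failed with status ${res.status}`);
  }
  const data = (await res.json()) as { tokens: number[] };
  return data.tokens;
}
```

In a browser component this could be invoked as `tokenizeText(text, fetch)`. Passing the fetch function in as a parameter is a design choice for this sketch that makes the helper easy to exercise with a stub.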
