A high-performance Retrieval-Augmented Generation (RAG) server built with TypeScript, Express.js, and Qdrant as its vector database. The server provides APIs for document parsing, embedding, and semantic search.
- Document Processing: Parse various document formats using Apache Tika
- Text Embedding: Support for multiple embedding providers (OpenAI, Cohere)
- Vector Storage: Qdrant integration for efficient vector storage and retrieval
- Semantic Search: Fast similarity search across embedded documents
- Text Tokenization: Built-in tokenization with tiktoken
- Docker Support: Complete Docker Compose setup for easy deployment
- TypeScript: Full type safety and modern development experience
The server exposes the following REST API endpoints under `/api/v1`:

- `/parse` - Document parsing and text extraction
- `/embed` - Document embedding and vector storage
- `/search` - Semantic search across embedded documents
- `/tokenize` - Text tokenization services
- Runtime: Node.js with TypeScript
- Framework: Express.js
- Vector Database: Qdrant
- Document Processing: Apache Tika
- Embedding Providers: OpenAI, Cohere
- Validation: Zod schemas
- Logging: Consola
- Containerization: Docker & Docker Compose
- Node.js 22+
- Docker and Docker Compose
- Environment variables configured (see Configuration)
- Clone the repository

  ```bash
  git clone https://github.com/hopkins385/rag-server-ts.git
  cd rag-server-ts
  ```

- Install dependencies

  ```bash
  npm install
  ```

- Configure environment (see the sketch after these steps)

  ```bash
  cp .env.example .env
  # Edit .env with your configuration
  ```

- Start the development server

  ```bash
  npm run dev
  ```
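Since Zod is already part of the stack for validation, the sketch below shows one way the environment configuration copied in the configure step could be checked at startup. The variable names here are illustrative assumptions only; the authoritative keys live in .env.example.

```typescript
// Sketch: validate environment configuration at startup with Zod.
// NOTE: these variable names are assumptions for illustration only;
// see .env.example for the actual keys used by the project.
import { z } from "zod";

const envSchema = z.object({
  PORT: z.coerce.number().default(3000),
  QDRANT_URL: z.string().url(),
  TIKA_URL: z.string().url(),
  OPENAI_API_KEY: z.string().optional(),
  COHERE_API_KEY: z.string().optional(),
});

// Fails fast with a descriptive ZodError if a required variable is missing or malformed.
export const config = envSchema.parse(process.env);
```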
- Start the complete stack

  ```bash
  docker-compose up -d
  ```

  This will start:

  - RAG Server (API)
  - Qdrant (Vector Database)
  - Apache Tika (Document Processing)

- For development with hot reload

  ```bash
  docker-compose -f docker-compose.dev.yml up -d
  ```
```http
POST /api/v1/embed/file
Content-Type: application/json

{
  "mediaId": "unique-media-id",
  "recordId": "unique-record-id",
  "mimeType": "application/pdf",
  "filePath": "/path/to/document.pdf"
}
```
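A minimal TypeScript client call for this endpoint might look like the sketch below; the base URL (http://localhost:3000) and the response handling are assumptions, so adjust them to your deployment.

```typescript
// Sketch: submit a file for embedding and vector storage.
// The base URL is an assumption; the response body is logged as-is
// because its exact shape is not documented here.
async function embedFile(): Promise<void> {
  const res = await fetch("http://localhost:3000/api/v1/embed/file", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      mediaId: "unique-media-id",
      recordId: "unique-record-id",
      mimeType: "application/pdf",
      filePath: "/path/to/document.pdf",
    }),
  });
  if (!res.ok) {
    throw new Error(`Embed request failed with status ${res.status}`);
  }
  console.log(await res.json());
}

embedFile().catch(console.error);
```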
```http
POST /api/v1/search/vector
Content-Type: application/json

{
  "query": "your search query",
  "recordIds": ["record-id-1", "record-id-2"]
}
```
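A search call can follow the same pattern; as above, the base URL and response shape are assumptions.

```typescript
// Sketch: semantic search across two previously embedded records.
// Inspect the JSON response for the result fields your deployment returns.
async function searchVectors(): Promise<void> {
  const res = await fetch("http://localhost:3000/api/v1/search/vector", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      query: "your search query",
      recordIds: ["record-id-1", "record-id-2"],
    }),
  });
  console.log(await res.json());
}

searchVectors().catch(console.error);
```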
```http
POST /api/v1/tokenize/text
Content-Type: application/json

{
  "text": "text to tokenize"
}
```
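Tokenization can be exercised the same way (base URL again assumed):

```typescript
// Sketch: tokenize a piece of text via the tokenize endpoint.
async function tokenizeText(): Promise<void> {
  const res = await fetch("http://localhost:3000/api/v1/tokenize/text", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text: "text to tokenize" }),
  });
  console.log(await res.json());
}

tokenizeText().catch(console.error);
```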
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Follow TypeScript best practices
- Add tests for new features
- Update documentation as needed
- Run linting and type checking before committing
- Use conventional commit messages
This project is licensed under the MIT License - see the LICENSE file for details.
- Qdrant - Vector similarity search engine
- Apache Tika - Content analysis toolkit
- OpenAI - AI platform for embeddings
- Cohere - Natural language AI platform
If you have any questions or run into issues, please:
- Check the Issues page
- Create a new issue with detailed information
- Join our community discussions
Built with ❤️ and Appreciation by Sven Stadhouders