Fast, native semantic search for your local documents.
About · Features · Installation · Development
Codexify is a desktop application that brings powerful semantic search to your local documents. While there are many document search tools available, Codexify differentiates itself by being:
- Fast: Native performance with optimized vector operations
- Private: All processing happens locally on your machine
- Smart: Uses modern AI embeddings for semantic understanding
- Simple: Clean, intuitive interface for managing document collections
- 📚 Document Support
  - PDF files
  - Word documents (.docx)
  - Text files
  - HTML files
  - EPUB books
- 🔍 Smart Search
  - Semantic understanding of queries
  - Results ranked by relevance
  - Cross-collection search
  - Highlighted matches
- 📁 Collection Management
  - Create document collections
  - Drag-and-drop file import
  - Batch operations
  - Collection filtering
- 🎨 Modern UI
  - Clean, minimal interface
  - Dark/light mode support
  - Responsive design
  - Native platform feel
# Clone the repository
git clone https://github.com/sidmohan0/codexify.git
# Enter the project directory
cd codexify/electron
# Install dependencies
npm install
# Start the application
npm start
Codexify is built with:
- Electron for cross-platform desktop support
- Transformers.js for AI embeddings
- SQLite for document storage
- React + shadcn/ui for the interface
codexify/electron/
├── src/
│   ├── components/     # UI components
│   ├── lib/            # Utility functions
│   ├── model.js        # AI model handling
│   ├── db.js           # Database operations
│   └── index.js        # Main electron process
├── assets/             # Images and static files
└── package.json
See DEV_NOTES.md for detailed development information and roadmap.
┌──────────────────────────────────────┐
│             Electron App             │
├──────────────────────────────────────┤
│  ┌────────────┐       ┌────────────┐ │
│  │   React    │  IPC  │    Main    │ │
│  │  Frontend  │<─────>│  Process   │ │
│  │            │       │            │ │
│  └────────────┘       └─────┬──────┘ │
│         ▲                   │        │
│         │                   ▼        │
│  ┌──────┴──────┐    ┌──────────────┐ │
│  │  shadcn/ui  │    │ Transformers │ │
│  └─────────────┘    │   Pipeline   │ │
│                     └───────┬──────┘ │
│                             │        │
│                     ┌───────▼──────┐ │
│                     │  SQLite DB   │ │
│                     │ (Vector DB)  │ │
│                     └──────────────┘ │
└──────────────────────────────────────┘
- Document Processing
  - Text Extraction: Native parsers for PDF (pdf-parse), DOCX (mammoth), HTML, EPUB
  - Chunking: Custom text segmentation with overlap for context preservation
- Embedding Generation
  - Model: Xenova/all-MiniLM-L6-v2 (384-dimensional embeddings)
  - Framework: Transformers.js for local inference
  - Optimization: Quantized model for faster processing
- Vector Search
  - Storage: SQLite with BLOB storage for embeddings
  - Similarity: Cosine similarity with L2 normalization
  - Ranking: Score-based with semantic highlighting
Document Ingestion
Document → Text Extraction → Chunking → Embedding → SQLite Storage
-
Search Process
Query → Embedding → Vector Similarity → Result Ranking → UI Display
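The "Vector Similarity → Result Ranking" steps of the search process can be sketched as a brute-force scan. This is an illustration under stated assumptions, not Codexify's actual code: `l2Normalize`, `dot`, `rankChunks`, and the `{ id, embedding }` record shape are hypothetical names, and the tiny 2-dimensional vectors stand in for real 384-dimensional embeddings.

```javascript
// Scale a vector to unit length (L2 normalization).
function l2Normalize(vec) {
  let sumSquares = 0;
  for (const x of vec) sumSquares += x * x;
  const norm = Math.sqrt(sumSquares);
  if (norm === 0) return Float32Array.from(vec); // leave zero vectors unchanged
  return Float32Array.from(vec, (x) => x / norm);
}

// For unit-length vectors, the dot product equals the cosine similarity.
function dot(a, b) {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Hypothetical ranking step: score every stored chunk against the query
// embedding and return the topK best matches, highest score first.
function rankChunks(queryEmbedding, chunks, topK = 5) {
  const q = l2Normalize(queryEmbedding);
  return chunks
    .map((chunk) => ({ id: chunk.id, score: dot(q, l2Normalize(chunk.embedding)) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Toy data: 2-dimensional vectors in place of real embeddings.
const storedChunks = [
  { id: "a", embedding: [1, 0] },
  { id: "b", embedding: [0, 1] },
  { id: "c", embedding: [1, 1] },
];
const results = rankChunks([2, 0], storedChunks, 2);
console.log(results.map((r) => r.id)); // → [ 'a', 'c' ]
```

If embeddings are normalized once at ingestion time, the per-query work reduces to one dot product per stored chunk, which is one plausible reason a linear scan stays fast for small collections.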
- Embedding Generation: ~100ms per text chunk
- Vector Search: Sub-second for collections < 10k documents
- Memory Usage: ~200MB baseline, scales with collection size
- Storage: ~1.5KB of embedding data per text chunk (384-dim float32 vector = 1,536 bytes)
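The storage figure follows from simple arithmetic: 384 dimensions × 4 bytes per float32 = 1,536 bytes. The sketch below shows one way an embedding could be packed into a `Buffer` for a SQLite BLOB column and read back. The helper names `embeddingToBlob` and `blobToEmbedding` are hypothetical and not taken from `db.js`; only Node's built-in `Buffer` and typed arrays are used.

```javascript
const EMBEDDING_DIM = 384; // dimension stated above for all-MiniLM-L6-v2

// Pack an embedding into a Buffer suitable for a SQLite BLOB column.
// Float32Array.from copies the input, so the caller's data is never aliased.
function embeddingToBlob(embedding) {
  const floats = Float32Array.from(embedding);
  return Buffer.from(floats.buffer, floats.byteOffset, floats.byteLength);
}

// Unpack a BLOB back into a Float32Array. The bytes are copied into a fresh
// ArrayBuffer first, so the result is correctly aligned for 4-byte floats.
function blobToEmbedding(blob) {
  const copy = new Uint8Array(blob);
  return new Float32Array(copy.buffer, 0, copy.byteLength / 4);
}

const blob = embeddingToBlob(new Float32Array(EMBEDDING_DIM));
console.log(blob.length); // → 1536
```

The chunk text and any metadata are stored in addition to this, so the per-chunk footprint on disk will be somewhat larger than the vector alone.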
Contributions are welcome! Please read the Contributing Guide for details on the code of conduct and the process for submitting pull requests.
This project is licensed under the MIT License - see the LICENSE file for details.