This project implements an advanced job matching system powered by AI vector embeddings and MongoDB Atlas Vector Search. Using natural language processing (NLP) and machine learning, it transforms job postings and user profiles into semantic vectors, enabling precise, context-aware job matching with confidence scores and detailed reasoning.
The system fetches job postings, generates embeddings using the mixedbread-ai/mxbai-embed-large-v1 model from Hugging Face, and stores them in MongoDB Atlas. It then uses vector similarity search to match user profiles to jobs, providing ranked results with confidence scores and match explanations. Built with Spring Boot, it offers a robust RESTful API for integration.
- AI-Driven Embeddings: Leverages the mixedbread-ai/mxbai-embed-large-v1 transformer model for high-quality semantic vector representations.
- Vector Search: Utilizes MongoDB Atlas Vector Search for efficient k-nearest neighbors (KNN) matching.
- Semantic Matching: Captures meaning beyond keywords, e.g., "Java developer" matches "Software engineer with Java skills."
- Confidence Scoring: Provides similarity scores (0-1) and human-readable match reasons.
- RESTful API: Exposes endpoints for generating embeddings and finding job matches.
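To make the similarity scores above concrete, here is a minimal sketch of cosine similarity between two vectors. It is not code from this project: the class and method names are hypothetical, and the `(1 + cosine) / 2` mapping to a 0-1 range is the normalization that the MongoDB documentation describes for cosine indexes, included here only as an assumption about how the reported scores relate to raw cosine values.

```java
final class CosineSimilaritySketch {

    // Plain cosine similarity: dot(a, b) / (|a| * |b|). Result lies in [-1, 1].
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("Cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Assumed mapping from raw cosine to a 0-1 score: (1 + cosine) / 2.
    static double normalizedScore(double cosine) {
        return (1.0 + cosine) / 2.0;
    }

    public static void main(String[] args) {
        double[] profile = {1.0, 2.0, 3.0};
        double[] job = {2.0, 4.0, 6.0}; // same direction as profile
        double raw = cosine(profile, job);
        System.out.println("raw cosine = " + raw + ", score = " + normalizedScore(raw));
    }
}
```

Vectors pointing in the same direction score close to 1 regardless of their length, which is why two differently worded descriptions of the same role can still rank highly.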
- Language: Java
- Framework: Spring Boot
- Database: MongoDB Atlas with Vector Search
- AI Model: Hugging Face mixedbread-ai/mxbai-embed-large-v1 via LangChain4J
- Dependencies:
  - Spring Data MongoDB
  - LangChain4J
  - MongoDB Java Driver
- Trigger: An HTTP GET request to /generate-embeddings initiates the process.
- Fetch Posts: Retrieves all job postings from a repository (assumed to be pre-populated).
- Generate Embeddings:
  - Job descriptions are extracted and sent to the mixedbread-ai/mxbai-embed-large-v1 model.
  - The model tokenizes the text, processes it through transformer layers, and outputs 1024-dimensional vectors.
  - Vectors are converted to BsonArray format for MongoDB compatibility.
- Store: Inserts documents (job details + embeddings) into the JobPost collection in the MongoDB Atlas sample_db database.
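The embedding flow above can be sketched with plain Java types. This is an illustration under stated assumptions, not the project's code: `JobPost`, `StoredJob`, and `embedAll` are hypothetical names, the embedder is passed in as a `Function<String, float[]>` stand-in for the real model call, and the result is collected in a list instead of being written to MongoDB. The `List<Double>` produced by `toDoubleList` is the kind of value that would then be converted into a BSON array.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class EmbeddingPipelineSketch {

    // Hypothetical shapes for a job posting before and after embedding.
    record JobPost(String title, String description) {}
    record StoredJob(String title, String description, List<Double> embedding) {}

    // Widen each float component to a double so it can be stored as a numeric array.
    static List<Double> toDoubleList(float[] vector) {
        List<Double> values = new ArrayList<>(vector.length);
        for (float component : vector) {
            values.add((double) component);
        }
        return values;
    }

    // Embed every description and pair the vector with the job details.
    // The dimension check guards against storing vectors the index cannot search.
    static List<StoredJob> embedAll(List<JobPost> posts,
                                    Function<String, float[]> embedder,
                                    int expectedDimensions) {
        List<StoredJob> stored = new ArrayList<>();
        for (JobPost post : posts) {
            float[] vector = embedder.apply(post.description());
            if (vector.length != expectedDimensions) {
                throw new IllegalStateException(
                        "Expected " + expectedDimensions + " dimensions but got " + vector.length);
            }
            stored.add(new StoredJob(post.title(), post.description(), toDoubleList(vector)));
        }
        return stored;
    }
}
```

With the real model, `expectedDimensions` would be 1024, matching the vector size stated above.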
- Trigger: An HTTP POST request to /jobs/match with a user profile string (e.g., "Experienced Java developer").
- Generate User Embedding:
  - The user profile is processed by the same Hugging Face model to create a vector.
- Vector Search:
  - MongoDB Atlas performs a KNN search using the user embedding against job embeddings.
  - Returns the top matches (10 in this configuration) with cosine similarity scores.
- Process Results:
  - Confidence scores are set from MongoDB’s searchScore (0-1).
  - Match reasons are generated based on score thresholds (>0.8 = "very strong", >0.6 = "good") and tech keyword overlaps.
- Return: Delivers a list of JobMatch objects as JSON.
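The result-processing step can be illustrated with a small sketch of how a match reason might be derived from a score and shared keywords. Only the two thresholds (>0.8 and >0.6) come from the description above; the class name, the keyword list, the "partial" label for lower scores, and the wording of the message are assumptions made for illustration.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class MatchReasonSketch {

    // Hypothetical keyword list; the project's actual list is not shown in this README.
    static final List<String> TECH_KEYWORDS = List.of("java", "spring", "mongodb", "python", "sql");

    // Lower-case the text and split it on anything that is not a letter, digit, '+' or '#'.
    static Set<String> tokens(String text) {
        return new LinkedHashSet<>(List.of(text.toLowerCase(Locale.ROOT).split("[^a-z0-9+#]+")));
    }

    // Keywords that appear in both the profile and the job description.
    static Set<String> sharedKeywords(String profile, String jobDescription) {
        Set<String> profileTokens = tokens(profile);
        Set<String> jobTokens = tokens(jobDescription);
        Set<String> shared = new LinkedHashSet<>();
        for (String keyword : TECH_KEYWORDS) {
            if (profileTokens.contains(keyword) && jobTokens.contains(keyword)) {
                shared.add(keyword);
            }
        }
        return shared;
    }

    // Thresholds from the README: >0.8 is "very strong", >0.6 is "good".
    // The "partial" fallback is an assumption.
    static String reason(double score, String profile, String jobDescription) {
        String strength = score > 0.8 ? "very strong" : score > 0.6 ? "good" : "partial";
        String base = "Semantic similarity is " + strength;
        Set<String> shared = sharedKeywords(profile, jobDescription);
        return shared.isEmpty() ? base : base + "; shared skills: " + String.join(", ", shared);
    }

    public static void main(String[] args) {
        System.out.println(reason(0.85, "Experienced Java developer", "Backend engineer, Java and MongoDB"));
    }
}
```

Because both comparisons are strict, a score of exactly 0.8 falls into the "good" band in this sketch.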
- Java 17+: Required to run the Spring Boot application.
- MongoDB Atlas: A cluster with Vector Search enabled.
  - Create a vector index named vector_index on the JobPost.embedding field.
- Hugging Face Account: For API access to the mixedbread-ai/mxbai-embed-large-v1 model.
- Maven: To build and manage dependencies.
- Environment Variables:
  - ATLAS_CONNECTION_STRING: MongoDB Atlas connection string (e.g., mongodb+srv://<user>:<pass>@<cluster>.mongodb.net/).
  - HUGGING_FACE_ACCESS_TOKEN: Your Hugging Face API token.
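Since the application depends on both environment variables, it can help to check them at startup. The sketch below is one possible way to do that and is not taken from the project: `EnvConfigSketch` and `Settings` are hypothetical names, and the variables are read from a `Map` so the check can be exercised without touching the real environment.

```java
import java.util.Map;

final class EnvConfigSketch {

    // Hypothetical holder for the two values the README lists as required.
    record Settings(String atlasConnectionString, String huggingFaceToken) {}

    // Read both variables and fail fast with a clear message if either is missing.
    static Settings load(Map<String, String> env) {
        return new Settings(
                require(env, "ATLAS_CONNECTION_STRING"),
                require(env, "HUGGING_FACE_ACCESS_TOKEN"));
    }

    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing required environment variable: " + name);
        }
        return value;
    }
}
```

In a running application the call would be `EnvConfigSketch.load(System.getenv())`.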
- Clone the Repository:

      git clone https://github.com/Jujuwryy/AIJobMatcher.git
      cd AIJobMatcher