Extract structured job information from any job posting URL with AI-powered intelligence
- Universal Job URL Support - Extract from LinkedIn, Indeed, Glassdoor, and more
- AI-Powered Extraction - Uses Google Gemini 2.0 Flash for intelligent data parsing
- Advanced Web Scraping - Playwright-powered headless browser automation for reliable data extraction
- Job Saving & Management - Save and organize extracted job information
- User Authentication - Secure login/register system with email-based accounts
- Responsive Design - Beautiful glassmorphism UI that works on all devices
- Glassmorphism Design - Modern, elegant UI with blur effects and gradients
- Real-time Feedback - Loading states, error handling, and success notifications
- Intuitive Interface - Clean, user-friendly design with smooth animations
- Mobile-First - Fully responsive design optimized for mobile devices
- Headless Browser Automation - Playwright for reliable web scraping across supported job platforms
- RESTful API - Clean, well-documented API endpoints
- Database Integration - PostgreSQL/SQLite support with Django ORM
- Docker Support - Containerized deployment ready
- Production Ready - Configured for deployment on Render, Heroku, and more
JobScrape/
├── Frontend/                  # Modern JavaScript frontend
│   ├── index.html             # Main application interface
│   ├── script.js              # Core application logic
│   └── style.css              # Glassmorphism styling
└── Backend/                   # Django REST API
    └── jobscrape/
        ├── jobscaper_api/     # Django project settings
        ├── scraper/           # Job scraping functionality
        │   ├── views.py       # Playwright scraping logic
        │   ├── models.py      # Job data models
        │   └── serializers.py # API serializers
        ├── userauth/          # User authentication system
        ├── Dockerfile         # Container configuration
        └── requirements.txt   # Python dependencies
The application extracts job data in two stages:
- Playwright Scraping Stage
  - Headless Chromium browser automation
  - Dynamic content rendering and JavaScript execution
  - Robust error handling and timeout management
  - Compatibility with all major job sites
- AI Processing Stage
  - Google Gemini 2.0 Flash for intelligent data extraction
  - Structured JSON output formatting
  - Context-aware information parsing
  - Error recovery for incomplete data
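In the AI processing stage, the model's reply has to be coerced into a fixed JSON shape, even when it arrives wrapped in a Markdown code fence, surrounded by extra prose, or with fields missing. The following is a minimal sketch of that post-processing, not the project's actual code: the helper name parse_model_output and the field names in JOB_FIELDS are hypothetical, chosen to mirror the extracted fields this README lists.

```python
import json
import re

# Hypothetical field names; the real project may use different keys.
JOB_FIELDS = ("title", "company", "location", "job_type", "pay", "platform", "description")

# Matches an optional Markdown code-fence wrapper (three backticks, optional "json" tag).
_FENCE_RE = re.compile(r"^`{3}(?:json)?\s*(.*?)\s*`{3}$", re.DOTALL)


def parse_model_output(raw: str) -> dict:
    """Hypothetical helper: turn a model reply into a dict with every expected key.

    Missing or unparseable fields fall back to None instead of raising.
    """
    text = raw.strip()
    match = _FENCE_RE.match(text)
    if match:
        text = match.group(1)

    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fallback: parse the outermost {...} span if the model added prose around it.
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end <= start:
            data = {}
        else:
            try:
                data = json.loads(text[start:end + 1])
            except json.JSONDecodeError:
                data = {}

    # A JSON array or scalar is not a usable job record.
    if not isinstance(data, dict):
        data = {}

    # Keep only known keys and fill the gaps with None.
    return {field: data.get(field) for field in JOB_FIELDS}
```

Returning every key, even when empty, lets the frontend render the same layout whether or not the model found a value.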
- Python 3.11+
- Node.js (for frontend development)
- Google Gemini API key
- PostgreSQL (optional, SQLite for development)
- Clone the repository
  git clone <repository-url>
  cd JobScrape/Backend/jobscrape
- Set up environment variables
  # Create .env file
  SECRET_KEY=your-secret-key-here
  DEBUG=True
  GEMINI_API_KEY=your-gemini-api-key
  ALLOWED_HOSTS=localhost,127.0.0.1
  DATABASE_URL=sqlite:///db.sqlite3  # or your PostgreSQL URL
- Install dependencies
  pip install -r requirements.txt
  playwright install chromium  # Playwright also needs its own Chromium build
- Run migrations
  python manage.py migrate
- Start the development server
  python manage.py runserver
- Navigate to the frontend directory
  cd ../../Frontend
- Open in a browser
  - Simply open index.html in your browser, or
  - Serve it with a local server on a port other than 8000 (the Django development server's default), and make sure that origin is allowed by CORS_ALLOWED_ORIGINS:
    python -m http.server 5500
- Configure the API endpoint
  - Update API_BASE_URL in script.js to point to your backend
- Enter Job URL - Paste any job posting URL into the input field
- Extract Information - Click "Extract Job Info" to process the URL
- Review Results - View extracted job title, company, location, pay, etc.
- Save Job - Log in or register to save jobs to your personal collection
- Manage Jobs - View, organize, and delete saved jobs
- LinkedIn
- Indeed
- Glassdoor
- Monster
- ZipRecruiter
- And many more!
- POST /api/register/ - User registration
- POST /api/login/ - User login
- POST /api/fetch/ - Extract job information from a URL
- GET /api/jobs/?email={email} - Get a user's saved jobs
- POST /api/jobs/ - Save a job
- DELETE /api/jobs/{id}/?email={email} - Delete a job
- GET / - Health check endpoint
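These endpoints can be called from Python with the standard library alone. This sketch only builds the requests, so they can be inspected before sending; the base URL (Django's default local address), the JSON body key "url" for /api/fetch/, and the helper names are assumptions, not the documented API.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

# Assumed local development address (Django's default runserver host and port).
API_BASE_URL = "http://127.0.0.1:8000/api"


def fetch_job_request(job_url: str) -> Request:
    """Hypothetical helper for POST /api/fetch/; the "url" body key is an assumption."""
    body = json.dumps({"url": job_url}).encode("utf-8")
    return Request(
        f"{API_BASE_URL}/fetch/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def list_jobs_request(email: str) -> Request:
    """Hypothetical helper for GET /api/jobs/?email={email}, URL-encoding the address."""
    return Request(f"{API_BASE_URL}/jobs/?{urlencode({'email': email})}", method="GET")


def delete_job_request(job_id: int, email: str) -> Request:
    """Hypothetical helper for DELETE /api/jobs/{id}/?email={email}."""
    return Request(
        f"{API_BASE_URL}/jobs/{quote(str(job_id))}/?{urlencode({'email': email})}",
        method="DELETE",
    )
```

Passing any of these objects to urllib.request.urlopen() sends it. Encoding the email with urlencode matters because characters such as "+" would otherwise be read as a space in the query string.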
- Glassmorphism - Modern glass-like effects with backdrop blur
- Gradient Backgrounds - Beautiful animated gradients
- Smooth Animations - CSS transitions and keyframe animations
- Responsive Layout - Mobile-first design approach
- Floating Cards - Glassmorphism cards with hover effects
- Animated Buttons - Gradient buttons with hover states
- Loading States - Spinning animations and progress indicators
- Error Handling - User-friendly error messages and notifications
- Intelligent Extraction - AI-powered job information parsing
- Structured Output - Consistent JSON format for extracted data
- Multi-platform Support - Works across different job posting formats
- Error Recovery - Handles malformed or incomplete job listings
- Headless Browser Automation - Reliable scraping using Chromium browser
- Dynamic Content Support - Handles JavaScript-rendered job listings
- Cross-Platform Compatibility - Works with LinkedIn, Indeed, Glassdoor, and more
- Robust Error Handling - Graceful fallbacks for failed scraping attempts
- Timeout Management - Configurable timeouts for different job platforms
- Content Extraction - Extracts both HTML content and visible text for AI processing
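The feature list mentions per-platform timeouts, while the Playwright example in this README uses a single 60-second value. One way such configuration could look is a host-to-timeout table with a default; the table, its values, and the timeout_for name are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical per-platform timeouts in milliseconds (the unit Playwright uses).
# The values are illustrative, not the project's configuration.
DEFAULT_TIMEOUT_MS = 60_000
PLATFORM_TIMEOUTS_MS = {
    "linkedin.com": 90_000,
    "indeed.com": 45_000,
    "glassdoor.com": 75_000,
}


def timeout_for(url: str) -> int:
    """Return the navigation timeout for a job URL based on its host."""
    host = (urlparse(url).hostname or "").lower()
    for domain, timeout_ms in PLATFORM_TIMEOUTS_MS.items():
        # Match the domain itself or any subdomain (e.g. www.linkedin.com),
        # but not look-alikes such as notlinkedin.com.
        if host == domain or host.endswith("." + domain):
            return timeout_ms
    return DEFAULT_TIMEOUT_MS
```

The result would be passed straight to Playwright, e.g. await page.goto(url, timeout=timeout_for(url)).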
- Job Title
- Company Name
- Location
- Job Type (Remote, On-site, Hybrid)
- Salary/Pay Information
- Platform Source
- Job Description (when available)
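The fields above map naturally onto a small record type. This hypothetical ExtractedJob dataclass (not the project's Django model) shows one way to keep missing values as None and to normalize the job type to the three values listed.

```python
from dataclasses import asdict, dataclass
from typing import Optional

# Spellings accepted for the three job types; anything else becomes None.
_JOB_TYPES = {"remote": "Remote", "on-site": "On-site", "onsite": "On-site", "hybrid": "Hybrid"}


@dataclass
class ExtractedJob:
    """Hypothetical container for one extracted job posting."""
    title: Optional[str] = None
    company: Optional[str] = None
    location: Optional[str] = None
    job_type: Optional[str] = None      # "Remote", "On-site" or "Hybrid"
    pay: Optional[str] = None           # kept as free text, e.g. "$50/hr"
    platform: Optional[str] = None
    description: Optional[str] = None   # not every page exposes one

    @classmethod
    def from_dict(cls, data: dict) -> "ExtractedJob":
        # "On Site", "on-site" and "Onsite" all normalize to "On-site".
        raw_type = str(data.get("job_type") or "").strip().lower().replace(" ", "-")
        return cls(
            title=data.get("title"),
            company=data.get("company"),
            location=data.get("location"),
            job_type=_JOB_TYPES.get(raw_type),
            pay=data.get("pay"),
            platform=data.get("platform"),
            description=data.get("description"),
        )
```

asdict(job) turns a record back into a plain dict when it needs to be serialized.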
# Required
SECRET_KEY=your-production-secret-key
GEMINI_API_KEY=your-gemini-api-key
ALLOWED_HOSTS=your-domain.com,www.your-domain.com
# Optional
DATABASE_URL=postgresql://user:pass@host:5432/dbname
DEBUG=False
- Render - Easy deployment with automatic builds
- Heroku - Cloud platform with PostgreSQL support
- DigitalOcean - VPS deployment with Docker
- AWS - Scalable cloud infrastructure
Update script.js to point to your backend:
const API_BASE_URL = 'https://your-backend-domain.com/api';
Key settings in settings.py:
- ALLOWED_HOSTS - Configure for your domain
- CORS_ALLOWED_ORIGINS - Frontend domain for CORS
- DATABASES - Database configuration
- STATIC_ROOT - Static files location
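The environment variables listed for production have to be parsed into these settings. A minimal sketch of that parsing for settings.py, assuming plain os.environ access rather than any particular helper package; env_bool and env_list are hypothetical names, and reading CORS_ALLOWED_ORIGINS from the environment is an assumption, since this README only names it as a setting.

```python
import os

# Hypothetical helpers for settings.py; variable names match the .env keys in this README.


def env_bool(name: str, default: bool = False) -> bool:
    """Interpret common truthy strings ("True", "1", "yes", "on") as True."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def env_list(name: str, default: str = "") -> list[str]:
    """Split a comma-separated variable such as ALLOWED_HOSTS, dropping blank entries."""
    return [item.strip() for item in os.environ.get(name, default).split(",") if item.strip()]


# In settings.py these would feed Django's own settings:
# SECRET_KEY = os.environ["SECRET_KEY"]                    # fail fast if missing
# DEBUG = env_bool("DEBUG", default=False)                 # off unless explicitly enabled
# ALLOWED_HOSTS = env_list("ALLOWED_HOSTS")
# CORS_ALLOWED_ORIGINS = env_list("CORS_ALLOWED_ORIGINS")  # assumed variable name
```

Defaulting DEBUG to off means a missing variable cannot accidentally enable debug mode in production.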
The application uses Playwright for robust web scraping:
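# Example scraping implementation (a method in scraper/views.py)
from playwright.async_api import async_playwright

async def extract_job_data(self, url):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        try:
            page = await browser.new_page()
            # Playwright timeouts are in milliseconds: 60000 ms = 60 s
            await page.goto(url, timeout=60000)
            content = await page.content()                # rendered HTML
            visible_text = await page.inner_text('body')  # text a user would see
        finally:
            # Close the browser even if navigation times out or fails
            await browser.close()
    # Send extracted content to Gemini AI for processing
    return self.call_gemini(visible_text)
Key Features: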
- Headless Mode - Runs without GUI for server deployment
- Timeout Handling - 60-second timeout for slow-loading pages
- Content Extraction - Both HTML and visible text for AI processing
- Browser Management - Automatic cleanup and resource management
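The example above makes a single attempt, so a transient failure such as a navigation timeout ends the request. A hypothetical retry wrapper with exponential backoff could sit around a coroutine like extract_job_data; the scrape_with_retries name and the attempt and delay defaults are illustrative assumptions.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def scrape_with_retries(
    scrape: Callable[[str], Awaitable[T]],
    url: str,
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Hypothetical wrapper: await scrape(url), retrying with exponential backoff.

    The retry policy (3 attempts, waits of 1 s then 2 s) is an illustrative assumption.
    """
    last_error: Exception | None = None
    for attempt in range(attempts):
        try:
            return await scrape(url)
        except Exception as exc:  # e.g. a navigation timeout
            last_error = exc
            if attempt < attempts - 1:
                # Wait base_delay * 1, * 2, * 4, ... before the next try.
                await asyncio.sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"Scraping failed after {attempts} attempts: {url}") from last_error
```

A bound method works as the scrape argument, e.g. await scrape_with_retries(self.extract_job_data, url). Chaining the final error with "from" keeps the original exception available for logging.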
# Run Django tests
python manage.py test
# Run specific app tests
python manage.py test scraper
python manage.py test userauth
- Manual testing with different job URLs
- Browser compatibility testing
- Mobile responsiveness testing
- Test scraping with various job platforms (LinkedIn, Indeed, Glassdoor)
- Verify timeout handling for slow-loading pages
- Test error recovery for failed scraping attempts
- Validate AI processing of scraped content
- Email-based user authentication
- Secure password hashing
- CORS configuration for frontend
- Input validation and sanitization
- SQL injection prevention with Django ORM
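For input validation, the submitted job URL is the main untrusted input, since the server opens it in a real browser. A sketch of basic checks, assuming only http(s) URLs should be accepted; validate_job_url is a hypothetical helper, and because it does not resolve DNS it will not catch hostnames that point at internal addresses, so it is a first filter rather than a complete defense against server-side request forgery.

```python
import ipaddress
from urllib.parse import urlparse


def validate_job_url(raw: str, max_length: int = 2048) -> str:
    """Hypothetical helper: return a cleaned job URL or raise ValueError.

    Checks: http(s) scheme only, a hostname is present, and the host is not a
    literal private, loopback or link-local IP address.
    """
    url = raw.strip()
    if not url or len(url) > max_length:
        raise ValueError("URL is empty or too long")

    parsed = urlparse(url)
    if parsed.scheme not in {"http", "https"}:
        raise ValueError("Only http and https URLs are supported")
    host = parsed.hostname
    if not host:
        raise ValueError("URL has no host")

    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return url  # a hostname rather than an IP literal
    if ip.is_private or ip.is_loopback or ip.is_link_local:
        raise ValueError("Private or local addresses are not allowed")
    return url
```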
- Caching - Database query optimization
- Static Files - WhiteNoise for static file serving
- Database Indexing - Optimized database queries
- CDN Ready - Static assets ready for CDN deployment
- Scraping Optimization - Efficient Playwright browser management
- Health check endpoints
- Error logging and monitoring
- Performance metrics tracking
- Scraping success rate monitoring
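A scraping success rate can be tracked per platform with two counters. This hypothetical in-memory ScrapeStats class illustrates the calculation only; its counts reset when the process restarts, so a deployment would need to persist or export them.

```python
from collections import Counter
from urllib.parse import urlparse


class ScrapeStats:
    """Hypothetical in-memory tracker of scrape outcomes per host."""

    def __init__(self) -> None:
        self.attempts: Counter[str] = Counter()
        self.successes: Counter[str] = Counter()

    def record(self, url: str, ok: bool) -> None:
        # Group www.linkedin.com and linkedin.com under one key.
        host = (urlparse(url).hostname or "unknown").removeprefix("www.")
        self.attempts[host] += 1
        if ok:
            self.successes[host] += 1

    def success_rate(self, host: str) -> float:
        """Fraction of successful scrapes for a host (0.0 if never attempted)."""
        attempts = self.attempts[host]
        return self.successes[host] / attempts if attempts else 0.0
```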
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
- Follow PEP 8 for Python code
- Use ESLint for JavaScript
- Maintain consistent formatting
This project is licensed under the MIT License - see the LICENSE file for details.
- Google Gemini AI - For intelligent job information extraction
- Playwright - For reliable web scraping and browser automation
- Django - For the robust backend framework
- Glassmorphism Design - For the beautiful UI inspiration
- Issues - Report bugs and feature requests on GitHub
- Documentation - Check the code comments for detailed explanations
- Community - Join our community discussions
Made with ❤️ by Tobi
Transform your job search with AI-powered intelligence!