Version 3.1 | Docker Hub | User Guide
A system for downloading, processing, and visualizing Sentinel-2 satellite embeddings (2018-2025) with an interactive web interface.
TEE integrates geospatial data processing with deep learning embeddings to create an interactive exploration platform. The system:
- Downloads Tessera embeddings from GeoTessera for multiple years
- Processes embeddings into RGB visualizations and pyramid tile structures
- Extracts vectors for efficient similarity search
- Visualizes embeddings through an interactive web-based viewer
- Enables temporal analysis by switching between years
- Download embeddings for years 2018-2025 (depending on data availability)
- Select which years to process during viewport creation
- Switch between years instantly in the viewer
- Temporal coherence in similarity search through year-specific vector data
- Zoomable, pannable map interface using Leaflet.js
- Real-time embedding visualization with year selector
- Pixel-level extraction of embeddings
- Similarity search to find matching locations across the viewport
- Create custom geographic viewports interactively
- Landmark/geocode search — type a place name (e.g. "London") to jump the map and auto-fill the viewport name
- Direct coordinate input — enter lat/long coordinates (e.g. "51.5074, -0.1278")
- Click-to-lock preview box — 5km box follows the mouse, locks on click, repositionable
- Multi-year processing with progress tracking
- Automatic navigation to viewer after processing
- Full cleanup on cancel/delete — removes mosaics, pyramids, vectors, and cached embeddings tiles; shared tiles used by other viewports are preserved
- Click pixels on the embedding map to extract embeddings
- All similarity search runs locally in the browser — no queries sent to server
- Vector data (embeddings + coordinates) downloaded once and cached in IndexedDB
- Brute-force L2 search over ~250K vectors completes in ~100-200ms
- Real-time threshold slider for instant local filtering
- Labels and search are fully private — only tile images are fetched from the server
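The browser-side search is conceptually simple: one vectorized L2 distance pass over every cached vector, then a threshold filter. A minimal Python sketch of the same idea (NumPy stands in for the browser's typed arrays; the data sizes and names here are illustrative, not the viewer's actual code):

```python
import numpy as np

def similarity_search(vectors, query, threshold):
    """Brute-force L2 search: indices of all vectors within `threshold`
    of the query embedding, nearest first. Illustrative sketch of the
    client-side search, not TEE's implementation."""
    dist = np.sqrt(((vectors - query) ** 2).sum(axis=1))
    hits = np.flatnonzero(dist <= threshold)
    return hits[np.argsort(dist[hits])], dist

rng = np.random.default_rng(0)
vectors = rng.normal(size=(10_000, 128)).astype(np.float32)  # demo-sized set
query = vectors[42]                                          # exact match exists
hits, dist = similarity_search(vectors, query, threshold=10.0)
```

Because the distances are computed once per query, lowering the threshold only re-filters an existing array, which is why the slider feels instant.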
- Auto-cluster the viewport into k groups using K-means on the embedding space — runs entirely in a Web Worker
- Segmentation results appear as a temporary preview overlay with a floating panel
- Promote individual clusters (or all at once) to permanent saved labels with full metadata (embedding, source pixel, threshold)
- Promoted labels support timeline analysis, cross-viewport re-matching, and all other label features
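The auto-cluster step is plain K-means over the embedding vectors. The viewer runs an equivalent loop in a Web Worker; this NumPy version is a sketch only (naive first-k initialization is used for determinism, where real implementations typically use k-means++ or random restarts):

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Plain K-means over embedding vectors. Sketch of the clustering the
    segmentation Web Worker performs; not the viewer's actual code."""
    # Naive init: first k points (real implementations use k-means++/random)
    centers = X[:k].copy()
    for _ in range(iters):
        # Assign each vector to its nearest center (squared L2)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its cluster (skip empty clusters)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels, centers

# Two well-separated blobs should land in different clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (50, 8)), rng.normal(5, 0.1, (50, 8))])
labels, centers = kmeans(X, k=2)
```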
- Upload a ground-truth shapefile (.zip) with expert habitat labels
- Select a class field and choose classifiers: k-NN, Random Forest, XGBoost, MLP
- Tunable hyperparameters per classifier (expand with the ... button):
  - k-NN: k (1–50), weights (uniform/distance)
  - Random Forest: number of trees (10–500), max depth
  - XGBoost: boosting rounds (10–500), max depth (1–15), learning rate (0.01–1.0)
  - MLP: hidden layer architecture (64,32 / 128,64 / 256,128,64), max iterations (50–1000)
- Configurable max training pixels (default 10,000) — increase up to 100,000 for denser ground truth
- Runs stratified learning-curve evaluation server-side with log-spaced training sizes (10 up to max) and 5 random repeats each
- Results rendered as a Chart.js line chart with macro-average F1 vs training pixels (log scale) and shaded ±1 std bands
- Ground-truth polygons are overlaid on the satellite panel in red with hover tooltips showing class labels
- Useful for benchmarking how well Tessera embeddings separate habitat classes at different sample sizes
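The server-side evaluation loop can be sketched as follows. This is a simplified stand-in: a 1-NN classifier replaces the four configurable classifiers, splits are random rather than stratified, and macro-F1 is computed by hand; only the overall shape (log-spaced sizes, repeated resampling, mean ± std per size) mirrors the description above:

```python
import numpy as np

def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(f1s))

def learning_curve(X, y, sizes, repeats=5, seed=0):
    """(size, mean_f1, std_f1) per training size, via a 1-NN stand-in."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    curve = []
    for n in sizes:
        scores = []
        for _ in range(repeats):
            idx = rng.permutation(len(X))
            tr, te = idx[:n], idx[n:]
            # 1-NN prediction: label of the nearest training vector
            d2 = ((X[te][:, None, :] - X[tr][None, :, :]) ** 2).sum(axis=2)
            y_pred = y[tr][d2.argmin(axis=1)]
            scores.append(macro_f1(y[te], y_pred, classes))
        curve.append((n, float(np.mean(scores)), float(np.std(scores))))
    return curve

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(i * 4, 0.2, (100, 16)) for i in range(3)])
y = np.repeat([0, 1, 2], 100)
curve = learning_curve(X, y, sizes=[10, 30, 100])
```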
- Track how label coverage changes over time — click "Timeline" on any saved label to see pixel counts across all available years (2018–2025)
- Uses the label's stored embedding and threshold for consistent comparison
- Results displayed in a modal with a proportional bar chart (colored with the label's color) and a percentage change summary (e.g. "33% decrease from 2019 to 2023")
- Loads each year's vector data from IndexedDB cache (or downloads in background) without disrupting the current session
- All computation stays client-side — label privacy is preserved
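The timeline computation reduces to the same thresholded distance count, repeated per year, followed by a percent-change summary. A hedged sketch (the `yearly_vectors` dict, function names, and demo data are illustrative, not TEE's code):

```python
import numpy as np

def timeline_counts(yearly_vectors, label_embedding, threshold):
    """Count matching pixels per year using the label's stored embedding
    and threshold (same rule every year, so counts are comparable)."""
    counts = {}
    for year in sorted(yearly_vectors):
        d = np.sqrt(((yearly_vectors[year] - label_embedding) ** 2).sum(axis=1))
        counts[year] = int((d <= threshold).sum())
    return counts

def change_summary(counts, start, end):
    """e.g. '33% decrease from 2019 to 2023' (rounded to whole percent)."""
    pct = 100.0 * (counts[end] - counts[start]) / counts[start]
    word = "increase" if pct >= 0 else "decrease"
    return f"{abs(round(pct))}% {word} from {start} to {end}"

emb = np.zeros(128)
years = {
    "2019": np.vstack([np.zeros((300, 128)), np.ones((100, 128)) * 9]),
    "2023": np.vstack([np.zeros((200, 128)), np.ones((200, 128)) * 9]),
}
counts = timeline_counts(years, emb, threshold=1.0)
summary = change_summary(counts, "2019", "2023")
```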
The viewer includes a 6-panel layout toggle for advanced analysis:
- OSM — OpenStreetMap geographic reference
- RGB — Satellite imagery with label painting tools
- Embeddings Y1 — First year embeddings with similarity search
- PCA / UMAP — Dimensionality reduction of embedding space (PCA computed in-browser, UMAP server-side)
- Heatmap — Temporal distance heatmap (Y1 vs Y2 pixel-by-pixel differences)
- Embeddings Y2 — Second year embeddings for temporal comparison
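Panel 4's PCA runs in-browser, and the underlying math is just an SVD of the centred embedding matrix. A NumPy sketch of the equivalent computation (illustrative only, not the viewer's JavaScript):

```python
import numpy as np

def pca_project(X, n_components=3):
    """Project embeddings onto their top principal components via SVD of
    the centred data. The in-browser PCA is equivalent in spirit."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 128))
coords = pca_project(X, n_components=3)  # e.g. map the top 3 PCs to RGB
```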
A Validation mode replaces the bottom row with a controls panel and a learning-curve chart for evaluating classifier performance on uploaded ground-truth shapefiles.
Key capabilities: one-click similarity search, real-time threshold control, persistent colored label overlays, cross-panel synchronized markers, UMAP visualization with satellite RGB coloring, temporal distance heatmap, year-based label updates, cross-year label timeline analysis, and ground-truth validation with learning curves.
Labels are stored in browser localStorage (private, survive reloads). Labels can be exported/imported as compact JSON files for sharing — they are portable across viewports since matching uses embedding distance, not coordinates.
A consolidated Export dropdown provides three formats:
- Labels (JSON) — compact metadata for sharing and re-importing into TEE
- Labels (GeoJSON) — FeatureCollection with 10m polygons per pixel, aligned to zoom-18 Mercator projection for pixel-perfect overlay in QGIS/GIS tools. Properties include `label_name`, `label_color`, `distance`, and `threshold`.
- Map (JPG) — high-resolution satellite image with label overlays and legend, rendered at zoom level 18
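The zoom-18 Mercator grid the GeoJSON export aligns to follows the standard slippy-map "world pixel" convention. A sketch of that textbook conversion (not TEE's exact implementation):

```python
import math

def latlon_to_world_pixel(lat, lon, zoom=18, tile_size=256):
    """Standard Web Mercator world-pixel coordinates at a given zoom.
    Illustrates the zoom-18 grid labels are snapped to; textbook
    slippy-map math, not TEE's own code."""
    scale = tile_size * (1 << zoom)
    x = (lon + 180.0) / 360.0 * scale
    siny = math.sin(math.radians(lat))
    y = (0.5 - math.log((1 + siny) / (1 - siny)) / (4 * math.pi)) * scale
    return x, y

px = latlon_to_world_pixel(51.5074, -0.1278)  # central London
```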
- Python 3.8+ (or Docker)
- ~5GB storage per viewport (varies by number of years)
- Install Docker Desktop:
  - Mac: `brew install --cask docker` or download from docker.com
  - Windows/Linux: download from docker.com
- Pull and run from Docker Hub (easiest):

  ```bash
  docker pull sk818/tee:stable
  docker run -d --name tee --restart unless-stopped \
    -p 8001:8001 -v /data:/data -v /data/viewports:/app/viewports \
    sk818/tee:stable
  ```

  Management (users, quotas, updates):

  ```bash
  docker cp tee:/app/scripts/manage.sh ~/manage.sh && chmod +x ~/manage.sh
  sudo ./manage.sh
  ```
  Or build from source:

  ```bash
  git clone https://github.com/sk818/TEE.git tee
  cd tee
  docker build -t tee .
  docker run -p 8001:8001 -v ~/tee_data:/data tee
  ```

  Or with docker-compose:

  ```bash
  docker-compose up -d
  ```
- Open browser: navigate to http://localhost:8001
- Clone the repository:

  ```bash
  git clone https://github.com/sk818/TEE.git tee
  cd tee
  ```
- Create and activate virtual environment:

  ```bash
  python3 -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Start the server:

  ```bash
  bash restart.sh
  ```

  Web server runs on http://localhost:8001 (serves both API and tiles).
- Create a viewport: open http://localhost:8001, click "+ Create New Viewport", search for a location or click the map, select years, and click Create.
| | Local (single machine) | Server (VM behind Apache) |
|---|---|---|
| Setup | `bash restart.sh` | `sudo bash deploy.sh` then `sudo bash restart.sh` |
| User | Your user | `tee` system user |
| Data | `~/data/` | `/home/tee/data/` |
| Logs | `./logs/` | `/var/log/tee/` |
| Binding | 0.0.0.0 (direct access) | 127.0.0.1 (Apache proxies) |
| Tiles | Served on :8001 (same process) | Apache proxies everything to :8001 |
| HTTPS | N/A | Apache handles TLS; set `TEE_HTTPS=1` |
restart.sh auto-detects the environment: if a tee system user exists, services run as tee with server settings; otherwise they run as the current user in local mode. No code changes needed between server and laptop.
```bash
bash restart.sh
# Web server on http://localhost:8001 (waitress — serves API, tiles, and static files)
```

Data is stored in `~/data/` by default (override with `TEE_DATA_DIR`). Logs go to `./logs/`.
First-time setup:

```bash
cd /opt
sudo git clone https://github.com/sk818/TEE.git tee
cd /opt/tee
sudo bash deploy.sh    # Creates tee user, venv, data dirs
sudo -u tee /opt/tee/venv/bin/python3 scripts/manage_users.py add admin
sudo bash restart.sh   # Start services
curl http://localhost:8001/health   # Verify
```

Day-to-day operations:

```bash
cd /opt/tee
sudo git pull && sudo bash restart.sh   # Update and restart
sudo bash shutdown.sh                   # Stop services
bash status.sh                          # Check status
tail -f /var/log/tee/web_server.log     # View logs
```

The viewer uses relative URLs, so it works identically behind a local or remote server. Configure your reverse proxy to forward all traffic to Django/waitress on port 8001 — API, tiles, and static files are all served from a single process.
TEE supports optional per-user authentication. When enabled, unauthenticated users can browse in read-only demo mode with a Login button in the header. Logged-in users see their username, a Change Password button, and a Logout button.
Authentication is controlled by the presence of a passwd file in the data directory (/data/passwd). If no passwd file exists, auth is disabled and all users have open access with no quota limits.
Copy the management script out of the container once, then use it to manage everything:
```bash
docker cp tee:/app/scripts/manage.sh ~/manage.sh && chmod +x ~/manage.sh
sudo ./manage.sh
```

This gives an interactive menu:

```
TEE Management
1) List users
2) Add user
3) Remove user
4) Set quota
5) Update container
6) Exit
```
The script manages the /data/passwd file directly on the host and uses the Docker image to generate bcrypt password hashes — no extra dependencies needed.
Remove all users via the management script (option 3), or delete the passwd file:

```bash
rm /data/passwd
```

When the last user is removed, the script deletes the passwd file automatically, returning to open access. No server restart is needed — the passwd file is re-read on every request.
The admin user has special privileges:
- No disk quota — can create viewports without size limits
- All other users default to a 2 GB disk quota (configurable per user)
Each non-admin user has a disk quota for viewport data (default 2 GB). Set per-user quotas via the management script (option 4) — accepts values like 4G, 512M, or bare MB.
The quota is stored as an optional third field in the passwd file:

```
admin:$2b$05$hash
user2:$2b$05$hash:4096
```
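Reading that format is a one-line split, because bcrypt hashes contain `$` but never `:`. A parsing sketch (the 2 GB default comes from the quota description above; admin's "no quota" special case is handled elsewhere and omitted here; this is not TEE's own parser):

```python
DEFAULT_QUOTA_MB = 2048  # non-admin default (2 GB), per the quota docs above

def parse_passwd_line(line):
    """Parse 'user:bcrypt_hash[:quota_mb]'. bcrypt hashes contain '$'
    but never ':', so a plain split is safe. Illustrative sketch only."""
    parts = line.strip().split(":")
    user, pw_hash = parts[0], parts[1]
    quota_mb = int(parts[2]) if len(parts) > 2 else DEFAULT_QUOTA_MB
    return user, pw_hash, quota_mb
```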
Logged-in users can change their password via the Password button in the header. Passwords must be at least 6 characters.
When deploying behind HTTPS, set TEE_HTTPS=1 to mark session cookies as secure:
```bash
export TEE_HTTPS=1
```

| Variable | Default | Description |
|---|---|---|
| `TEE_DATA_DIR` | `~/data` | Data directory (mosaics, pyramids, vectors, passwd) |
| `TEE_APP_DIR` | Project root | Application directory (auto-detected from `lib/config.py`) |
| `TEE_MODE` | `desktop` | `desktop` (DEBUG=True) or `production` (DEBUG=False, security headers) |
| `TEE_HTTPS` | unset | Set to `1` to mark session cookies as Secure (for HTTPS) |
| `GEOTESSERA_API_KEY` | — | GeoTessera API credentials (if required) |
Modify `viewports/{name}.txt` to customize preset viewports:

```
name: My Viewport
description: Optional description
bounds: 77.55,13.0,77.57,13.02
```
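The `bounds` line is a `min_lon,min_lat,max_lon,max_lat` string. A small parsing-and-validation sketch for working with these files (illustrative only, not TEE's own parser):

```python
def parse_bounds(bounds):
    """Parse a 'min_lon,min_lat,max_lon,max_lat' bounds string and
    sanity-check ordering and range. Sketch; not lib/viewport_utils.py."""
    min_lon, min_lat, max_lon, max_lat = (float(v) for v in bounds.split(","))
    if not (min_lon < max_lon and min_lat < max_lat):
        raise ValueError(f"bounds out of order: {bounds}")
    if not (-180 <= min_lon <= 180 and -90 <= min_lat <= 90):
        raise ValueError(f"bounds out of range: {bounds}")
    return (min_lon, min_lat, max_lon, max_lat)

box = parse_bounds("77.55,13.0,77.57,13.02")
```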
The system processes satellite embeddings through five main stages with parallel multi-year processing. All pipeline execution flows through lib/pipeline.py::PipelineRunner, providing consistent behavior for both web-based and CLI entry points.
```bash
./venv/bin/python3 setup_viewport.py --years 2023,2024,2025 --umap-year 2024
```

This runs the full pipeline: download → RGB → pyramids → vectors → UMAP. PCA is computed client-side in the browser (no pipeline stage needed).
Or use the web interface: bash restart.sh, open http://localhost:8001, click "+ Create New Viewport", select years and click Create. Processing runs in the background with status tracking.
Each stage processes all selected years in parallel:
```bash
python3 download_embeddings.py --years 2019,2021,2025
```

- Downloads Sentinel-2 embeddings from GeoTessera (all years concurrently)
- Saves as GeoTIFF files in `~/data/mosaics/`
```bash
python3 create_rgb_embeddings.py
```

- Converts 128D embeddings to RGB using the first 3 bands
- Outputs to `~/data/mosaics/rgb/`
```bash
python3 create_pyramids.py
```

- Creates multi-level zoom pyramids (0-5) with 3x nearest-neighbor upscaling
- Viewer becomes available once ANY year has pyramids
- Output: `~/data/pyramids/{viewport}/{year}/`
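Nearest-neighbor upscaling by an integer factor is just block replication. A NumPy sketch of the 3x step between zoom levels (illustrative, not `create_pyramids.py`'s exact code):

```python
import numpy as np

def upscale_nn(tile, factor=3):
    """Nearest-neighbour upscale by an integer factor: each pixel becomes
    a factor x factor block. Sketch of the 3x pyramid step."""
    return np.repeat(np.repeat(tile, factor, axis=0), factor, axis=1)

tile = np.array([[1, 2],
                 [3, 4]], dtype=np.uint8)
big = upscale_nn(tile)
```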
```bash
python3 extract_vectors.py
```

- Extracts vectors from embeddings for all years
- Labeling controls become available once ANY year has vectors
- Output: `~/data/vectors/{viewport}/{year}/`
```bash
python3 compute_umap.py {viewport_name} {year}
```

- Computes 2D UMAP projection (~1-2 min for 264K embeddings)
- Used by the 6-panel layout (Panel 4)
- UMAP visualization becomes available once computed
- Output: `~/data/vectors/{viewport}/{year}/umap_coords.npy`
| Stage | Feature | Available When |
|---|---|---|
| After Stage 3 (Pyramids) | Basic viewer with maps | ANY year has pyramids |
| After Stage 4 (Vectors) | Labeling/similarity search, PCA (Panel 4) | ANY year has vectors |
| After Stage 5 (UMAP) | UMAP visualization (Panel 4) | UMAP computed for any year |
Check pipeline status via:

```bash
curl http://localhost:8001/api/operations/pipeline-status/{viewport_name}
```

List all viewports:
GET /api/viewports/list
Get current viewport:
GET /api/viewports/current
Switch viewport:
POST /api/viewports/switch
Content-Type: application/json
{"name": "viewport_name"}
Create new viewport:
POST /api/viewports/create
Content-Type: application/json
{
"bounds": "min_lon,min_lat,max_lon,max_lat",
"name": "My Viewport",
"years": ["2021", "2024"] // Optional: default is [2024]
}
Check viewport readiness:
GET /api/viewports/{viewport_name}/is-ready
Returns: {ready: bool, message: string, has_embeddings: bool, has_pyramids: bool, has_vectors: bool, has_umap: bool, years_available: [string]}
Get available years:
GET /api/viewports/{viewport_name}/available-years
Returns: {success: bool, years: [2024, 2023, ...]}
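The endpoints above can be scripted with nothing but the standard library. A hedged sketch using `urllib` (assumes a TEE server running locally on port 8001; `build_request`/`call` are helper names invented here):

```python
import json
import urllib.request

BASE = "http://localhost:8001"  # assumes a locally running TEE server

def build_request(path, payload=None):
    """Build a urllib Request for TEE's JSON API: GET when payload is
    None, POST with a JSON body otherwise. Illustrative helper only."""
    data = json.dumps(payload).encode() if payload is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return urllib.request.Request(BASE + path, data=data, headers=headers)

def call(path, payload=None):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(path, payload)) as resp:
        return json.load(resp)

# e.g. call("/api/viewports/list")
# e.g. call("/api/viewports/switch", {"name": "viewport_name"})
```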
Check auth status:
GET /api/auth/status
Returns: {auth_enabled: bool, logged_in: bool, user: string|null}
Log in:
POST /api/auth/login
Content-Type: application/json
{"username": "admin", "password": "secret"}
Log out:
POST /api/auth/logout
Change password (requires active session):
POST /api/auth/change-password
Content-Type: application/json
{"current_password": "old", "new_password": "new"}
Upload ground-truth shapefile:
POST /api/evaluation/upload-shapefile
Content-Type: multipart/form-data
file: <.zip containing .shp/.dbf/.shx/.prj>
Returns: {fields: [{name, unique_count, samples}], geojson: <GeoJSON>}
Run learning-curve evaluation:
POST /api/evaluation/run
Content-Type: application/json
{
"viewport": "cumbria",
"year": "2024",
"field": "Group_Name",
"classifiers": ["nn", "rf", "xgboost", "mlp"],
"max_train": 10000,
"params": {
"nn": {"n_neighbors": 5, "weights": "uniform"},
"rf": {"n_estimators": 100, "max_depth": null},
"xgboost": {"n_estimators": 100, "max_depth": 6, "learning_rate": 0.3},
"mlp": {"hidden_layers": "64,32", "max_iter": 200}
}
}
Returns: {training_sizes, classifiers: {<name>: {mean_f1, std_f1}}, classes, total_labelled_pixels, elapsed_seconds, models_available}
Download trained model:
GET /api/evaluation/download-model/<classifier>
Returns: .joblib file containing {model, class_names} trained on all labelled data
```
TEE/
├── README.md                    # This file
├── requirements.txt             # Python dependencies
├── Dockerfile                   # Docker container definition
├── docker-compose.yml           # Docker Compose configuration
│
├── deploy.sh                    # First-time VM setup (creates tee user, venv, dirs)
├── restart.sh                   # Start/restart web + tile servers
├── shutdown.sh                  # Stop all servers
├── status.sh                    # Show project status (git, data, services)
│
├── manage.py                    # Django management script
├── tee_project/                 # Django project settings
│   ├── settings/                # Split settings (base, desktop, production)
│   ├── urls.py                  # Root URL configuration
│   └── wsgi.py                  # WSGI entry point (used by waitress)
│
├── api/                         # Django app — API endpoints
│   ├── middleware.py            # Auth middleware (passwd file + sessions)
│   ├── auth_views.py            # Login/logout/status/change-password
│   ├── tasks.py                 # Background task tracking
│   ├── helpers.py               # Shared utilities
│   └── views/                   # Endpoint modules
│       ├── viewports.py         # Viewport CRUD and status
│       ├── pipeline.py          # Downloads and processing
│       ├── compute.py           # UMAP, distance heatmap
│       ├── tiles.py             # Tile serving with LRU cache and ETag support
│       ├── vector_data.py       # Vector data serving
│       ├── evaluation.py        # Validation: shapefile upload, learning curves
│       └── config.py            # Health, static files, client config
│
├── public/                      # Web interface
│   ├── viewer.html              # Embedding viewer (3-panel and 6-panel layouts)
│   ├── viewport_selector.html   # Viewport creation and management
│   ├── login.html               # Login page
│   └── README.md                # Frontend documentation
│
├── scripts/                     # Management scripts
│   └── manage_users.py          # Add/remove/list users for authentication
│
├── lib/                         # Python utilities
│   ├── config.py                # Centralized configuration (paths, env vars)
│   ├── pipeline.py              # Unified pipeline orchestration
│   ├── viewport_utils.py        # Viewport file operations
│   ├── viewport_writer.py       # Viewport configuration writer
│   └── progress_tracker.py      # Progress tracking utilities
│
├── viewports/                   # Viewport configurations (user-created, gitignored)
│   └── README.md                # Viewport directory documentation
│
├── download_embeddings.py       # GeoTessera embedding downloader
├── create_rgb_embeddings.py     # Convert embeddings to RGB
├── create_pyramids.py           # Build zoom-level pyramid structure
├── extract_vectors.py           # Extract vectors for similarity search
├── compute_umap.py              # Compute UMAP projection
└── setup_viewport.py            # Orchestrate full workflow
```
Download specific years only:

```bash
python3 download_embeddings.py --years 2023,2024
```

Process single viewport: set the active viewport first, then run the pipeline scripts.
- Check if port 8001 is in use: `lsof -i:8001`
- Check logs: `tail logs/web_server.log` (local) or `tail /var/log/tee/web_server.log` (server)
- If map tiles fail to load, restart the server: `bash restart.sh`
- Cancelling or deleting a viewport now automatically cleans up cached embeddings tiles in `~/data/embeddings/`
- Tiles shared with other viewports are preserved
- To manually clear all embeddings caches (when no viewports need them): `rm -rf ~/data/embeddings/global_0.1_degree_representation/`
- Verify pyramids exist: `ls ~/data/pyramids/{viewport}/{year}/`
- Check vectors: `ls ~/data/vectors/{viewport}/{year}/`
- Re-run `create_pyramids.py` or `extract_vectors.py` as needed
- Check vectors were extracted for the selected year
- Reduce similarity threshold for faster results
- Process fewer years per viewport
- Verify embeddings were downloaded: `ls ~/data/mosaics/*_{year}.tif`
- Confirm pyramids exist for that year
- Check that vectors were extracted
Memory & storage:
- ~550MB steady state, ~850MB peak during pipeline processing
- ~150-300MB per year per viewport for embeddings; ~500MB-1GB per year for pyramid tiles
Typical processing times:
| Stage | Time (per year) | Notes |
|---|---|---|
| Download embeddings | 5-15 min | All years download in parallel |
| Create RGB | 2-5 min | All years process in parallel |
| Build pyramids | 5-10 min | All years process in parallel |
| Extract vectors | 5-15 min | All years process in parallel |
| Total | 17-45 min | Same time for 1 year or 8 years |
Multiple years are downloaded and processed concurrently — total time is approximately the same whether you request 1 year or 8 years. Features become available incrementally as each stage completes (see Incremental Feature Availability).
MIT License - See LICENSE file for details
- S. Keshav - Primary development and design
- Claude Opus 4.6 - AI-assisted development and feature implementation
For issues or questions:
- Check the troubleshooting section
- Review server logs: `/var/log/tee/web_server.log` (server) or `logs/web_server.log` (local)
- Verify data files exist in `~/data/`
- Check browser console for JavaScript errors
Thanks to Julia Jones (Bangor), David Coomes (Cambridge), Anil Madhavapeddy (Cambridge), and Sadiq Jaffer (Cambridge) for their insightful feedback on half-baked versions of the code.
If you use this project in research, please cite:
```bibtex
@software{tee2025,
  title={TEE: Tessera Embeddings Explorer},
  author={Keshav, S. and Claude Opus 4.6},
  year={2025},
  url={https://github.com/sk818/TEE}
}
```