Bounding box restriction
ellenhp committed Feb 22, 2024
1 parent 9beb1ff commit e25dd6a
Showing 866 changed files with 746 additions and 43,434 deletions.
51 changes: 29 additions & 22 deletions Cargo.lock

(Generated file; diff not rendered.)

3 changes: 1 addition & 2 deletions Cargo.toml
@@ -1,9 +1,8 @@
[workspace]
resolver = "2"
members = [
-"airmail", "airmail_common", "airmail_index", "airmail_service",
+"airmail", "airmail_indexer", "airmail_service",
]


[profile.release]
debug = 1
3 changes: 1 addition & 2 deletions Dockerfile
@@ -4,8 +4,7 @@ RUN apt update && apt install -y libssl-dev clang pkg-config

WORKDIR /usr/src/airmail
COPY ./airmail ./airmail
-COPY ./airmail_common ./airmail_common
-COPY ./airmail_index ./airmail_index
+COPY ./airmail_indexer ./airmail_indexer
COPY ./airmail_service ./airmail_service
COPY ./Cargo.toml ./Cargo.toml
COPY ./Cargo.lock ./Cargo.lock
13 changes: 5 additions & 8 deletions README.md
@@ -1,12 +1,12 @@
# 📫 Airmail 📫

-Airmail is an extremely lightweight geocoder[^1] written in pure Rust. Built on top of [tantivy](https://github.com/quickwit-oss/tantivy), it offers a low memory footprint and fast indexing (the planet takes under 3 hours on my hardware). Airmail currently supports English queries based on place names and addresses in North American address formats. Other languages and address formats work, but have not been systematically tested.
+Airmail is an extremely lightweight geocoder[^1] written in pure Rust. Built on top of [tantivy](https://github.com/quickwit-oss/tantivy), it offers a low memory footprint and fast indexing (index the planet in under 3 hours!). Airmail aims to support international queries in several languages, but in practice it's still very early days and there are definitely bugs preventing correct behavior.

[^1]: A geocoder is a search engine for places. When you type in "vegan donut shop" into your maps app of choice, a geocoder is what shows you nearby places that fit your query.

### Features

-Airmail's killer feature is the ability to query remote indices, e.g. on S3. This lets you keep your index hosting costs fixed while you scale horizontally, and lowers the baseline costs associated with hosting a planet instance by around 2x-10x compared to other geocoders.
+Airmail's killer feature is the ability to query remote indices, e.g. on S3. This lets you keep your index hosting costs fixed while you scale horizontally. The baseline cost of a global Airmail deployment is about $5 per month.

### Roadmap

@@ -21,13 +21,10 @@ Airmail's killer feature is the ability to query remote indices, e.g. on S3. Thi
- [x] Query remote indices.
- [x] Support and test planet-scale indices.
- [x] International address queries.
-- [ ] Categorical search, e.g. "coffee shops near me".
-- [ ] Bounding box biasing and restriction.
-- [ ] Minutely updates?
+- [x] Categorical search, e.g. "coffee shop seattle".
+- [x] Typo tolerance (limited to >=8 character input tokens)
+- [x] Bounding box biasing and restriction.
- [ ] Systematic/automatic quality testing in CI.
-- [ ] Alternate results, e.g. returning Starbucks locations for "Dunkin Donuts" queries on the US west coast.[^2]
-
-[^2]: This will likely need to be done with a vector database and some machine learning, and may have major hosting cost implications. TBD.

### License

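The roadmap entry in the README diff above says typo tolerance is limited to input tokens of at least 8 characters. A minimal sketch of that gating rule follows. The function name, the constant name, and the edit distance of 1 are assumptions made for illustration; the commit does not show how Airmail actually implements this.

```rust
/// Minimum token length (in characters) for fuzzy matching,
/// taken from the README roadmap entry.
const MIN_FUZZY_TOKEN_CHARS: usize = 8;

/// Returns the maximum edit distance to allow for a query token.
/// Hypothetical helper: the distance of 1 for long tokens is an
/// assumption, not a value stated in the commit.
fn max_edit_distance(token: &str) -> u8 {
    // Count Unicode scalar values rather than bytes, so that
    // non-ASCII place names are measured by characters.
    if token.chars().count() >= MIN_FUZZY_TOKEN_CHARS {
        1
    } else {
        0
    }
}
```

Under these assumptions, short tokens such as "seattle" (7 characters) must match exactly, while longer ones such as "starbucks" may differ by one edit.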
8 changes: 6 additions & 2 deletions airmail/Cargo.toml
@@ -8,11 +8,10 @@ edition = "2021"
[dependencies]
levenshtein_automata = "0.2.1"
s2 = "0.0.12"
-tantivy = { version = "0.21.1", features = ["quickwit"] }
+tantivy = "0.21.1"
tantivy-common = "0.6.0"
tantivy-fst = "0.4.0"
tempfile = "3.9.0"
-airmail_common = { path = "../airmail_common" }
log = "0.4.20"
serde_json = "1"
serde = { version = "1", features = ["derive"] }
@@ -32,3 +31,8 @@ tantivy-jieba = "0.10.0"
cached = "0.48.1"
lazy_static = "1.4.0"
regex = "1.10.3"
+geo = "0.27.0"
+
+[features]
+invasive_logging = []
+remote_index = ["tantivy/quickwit"]
10 changes: 5 additions & 5 deletions airmail/src/directory/uffd.rs
@@ -120,8 +120,8 @@ async fn fetch_and_resume(
"Critical: Failed to fetch chunk: {} after 5 attempts",
chunk_idx,
);
-// Find something better to do here maybe?
-panic!();
+// They'll try again I guess?
+uffd.wake(dst_ptr as *mut c_void, 4096).unwrap();
}

fn dont_need(page_start: usize) {
@@ -139,7 +139,7 @@ pub(crate) fn handle_uffd(uffd: Uffd, mmap_start: usize, _len: usize, artifact_u
let uffd = Arc::new(uffd);
let requested_pages = Arc::new(Mutex::new(HashSet::new()));
let chunk_cache: Arc<Mutex<LruCache<usize, Vec<u8>>>> =
-Arc::new(Mutex::new(LruCache::new(NonZeroUsize::new(4).unwrap())));
+Arc::new(Mutex::new(LruCache::new(NonZeroUsize::new(8).unwrap())));
let (sender, mut receiver): (Sender<usize>, Receiver<usize>) =
tokio::sync::broadcast::channel(100);
loop {
@@ -164,12 +164,12 @@ pub(crate) fn handle_uffd(uffd: Uffd, mmap_start: usize, _len: usize, artifact_u
addr,
thread_id,
} => {
-debug!("Pagefault: {:?} {:?} {:?} {:?}", kind, rw, addr, thread_id);
+trace!("Pagefault: {:?} {:?} {:?} {:?}", kind, rw, addr, thread_id);
let offset = addr as usize - mmap_start;
let chunk_idx = offset / CHUNK_SIZE;
trace!("Locking recent chunks to check for cached chunk");
if let Some(chunk) = chunk_cache.blocking_lock().get(&chunk_idx) {
-debug!("Using cached chunk: {}", chunk_idx);
+trace!("Using cached chunk: {}", chunk_idx);
let offset_into_chunk = offset % CHUNK_SIZE;
unsafe {
let _ = uffd.copy(

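The page-fault handler in `airmail/src/directory/uffd.rs` above maps a faulting address to a chunk index and an offset within that chunk before consulting the LRU cache. The arithmetic can be sketched in isolation as below. The `CHUNK_SIZE` value, the `ChunkLocation` struct, and the `locate_fault` function are assumptions for illustration; the diff does not show the real constant, and the original code computes these values inline.

```rust
/// Hypothetical chunk size. The real CHUNK_SIZE is defined outside
/// the lines shown in this diff, so this value is an assumption.
const CHUNK_SIZE: usize = 2 * 1024 * 1024;

/// Where a faulting address falls inside the remote artifact.
/// Illustrative type; not part of the commit.
#[derive(Debug, PartialEq, Eq)]
struct ChunkLocation {
    /// Index of the chunk to fetch (or to look up in the cache).
    chunk_idx: usize,
    /// Byte offset of the faulting address within that chunk.
    offset_into_chunk: usize,
}

/// Mirrors the handler's arithmetic:
/// `offset = addr - mmap_start`, then divide and take the remainder.
/// Returns `None` if the address lies below the mapping start,
/// a case the sketch guards against instead of underflowing.
fn locate_fault(addr: usize, mmap_start: usize) -> Option<ChunkLocation> {
    let offset = addr.checked_sub(mmap_start)?;
    Some(ChunkLocation {
        chunk_idx: offset / CHUNK_SIZE,
        offset_into_chunk: offset % CHUNK_SIZE,
    })
}
```

Keying the cache on `chunk_idx` means every page fault inside the same chunk resolves to one cache entry, which is why raising the LRU capacity from 4 to 8 in this commit directly increases how many distinct chunks can be served without a refetch.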