
Use of https://photon.komoot.io #614

Open · lonvia opened this issue Jan 2, 2025 · 25 comments

Comments

@lonvia

lonvia commented Jan 2, 2025

I'm the administrator for https://photon.komoot.io. While reviewing our server logs recently, I've noticed that dawarich is making rather intense use of the service. Looking at the logs from the past couple of days, I see about 60% of all requests coming from your application. This is slightly beyond what can be called 'fair use'.

I see that you already provide your own Photon instance for Patreon supporters. That's great. I've also seen that you rate limit access to photon.komoot.io. Awesome. Sadly, it doesn't look like it is enough. The server behind https://photon.komoot.io is really just one tiny server and won't be able to keep up with this kind of traffic forever. Any chance to actively discourage use of photon.komoot.io and instead provide people with the instructions to set up their own server?

One observation: It looks like you really only need city/country data from the reverse geocode. That means you could work with a geocoding database that is orders of magnitude smaller than the full Photon database. I've recently experimented with an SQLite-based Nominatim database. This application looks like it could be a good use case for it. Simply run a tiny Nominatim server with an admin SQLite DB in your docker image and use that.

@solderdot72

solderdot72 commented Jan 2, 2025

Hi lonvia,

I'm one of the users of DaWarIch and therefore I feel guilty for imposing part of that load :-(

I'd love to switch to a self-hosted server, but I'm just a hobbyist running a Raspberry Pi. All the instructions I've found so far indicate that running a reverse geocoding server requires a lot of storage and computing power, way more than a Raspi can handle. I also found instructions for limiting the data to a single country, which tremendously reduces the amount of storage required.

Limiting to a single country is not an option for me, but I can narrow the list down to 10-20 countries. Unfortunately, I did not find a tutorial describing how to set up a server for a hand-picked list of countries.

Since you state that only part of the information provided by the reverse geocoding server is needed, only a small part of the database is required, which should further reduce the demands.

If I knew how to set up such a low-profile self-hosted server, I would gladly give it a try. Can you show me how, or point me to a place on the web where I can educate myself, preferably in a Docker/Portainer environment?

@lonvia
Author

lonvia commented Jan 3, 2025

This needs a bit of a longer explanation. I'm working under the assumption here that dawarich indeed only uses city/state/country information from the reverse geocoding result. If it needs more fine-grained information then the picture changes slightly.

Creating your own database is not entirely trivial, and you indeed need a machine with a bit more computing power. Running the geocoder on a ready-made database, however, only needs a bit of storage space. It is not very expensive in terms of CPU power and will happily run on a Raspberry Pi, at least for the amount of requests that dawarich will produce. Luckily for us, city/state/country information doesn't really change a lot, so you don't really need to update your geodatabase. To get your own private geocoder, you need to find a powerful machine once and produce the database. Then copy it over to your Pi and use it forever.

To create a geocoding database from OpenStreetMap data, you need to use Nominatim. Photon doesn't do that itself. It only exports data from a Nominatim database. You can install Nominatim easily via Docker. Given that we are only interested in city-level data, you can configure Nominatim to use the admin style. It's so little data, it will be done in a couple of hours and you can probably run it on your laptop if you happen to have 500GB of space left. You can further reduce time and size by running the import with --reverse-only and --no-updates.
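A sketch of that import (untested; `NOMINATIM_IMPORT_STYLE` and the flag names are taken from the Nominatim 4.x CLI, and the extract URL is just an example, so double-check everything against the Nominatim documentation):

```shell
# Sketch: build a slimmed-down, reverse-only Nominatim database.
# The Geofabrik extract URL is an example; any OSM extract (or the planet) works.
wget https://download.geofabrik.de/europe/germany-latest.osm.pbf

export NOMINATIM_IMPORT_STYLE=admin   # keep only admin-boundary/place data
nominatim import --osm-file germany-latest.osm.pbf --reverse-only --no-updates
```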

As a next step, you could create a Photon database from the Nominatim database. However, I wouldn't recommend that. Photon only imports point geometries, it doesn't know about the whole area of towns and cities. That is really bad for reverse geocoding because it has to make educated guesses about the closest city. Nominatim is much better here. It keeps the whole area geometries. That's why I was suggesting SQLite. Dump Nominatim's database into an sqlite file by following these instructions. Then:

  • copy that file over to your Pi
  • install nominatim with virtualenv nominatim-venv; ./nominatim-venv/bin/pip install nominatim-api falcon uvicorn
  • point Nominatim to your sqlite database: echo NOMINATIM_DATABASE_DSN=sqlite:dbname=mydb.sqlite > .env
  • fire up uvicorn to create a Nominatim server: ./nominatim-venv/bin/uvicorn --host 127.0.0.1 --port 8080 --factory nominatim_api.server.falcon.server:run_wsgi
  • point dawarich to the internal service

(Disclaimer: I haven't tested these instructions, so please consult the Nominatim and uvicorn documentation for details.)

The SQLite file will be about 12GB. That should be manageable for a Raspberry Pi. If it is still too large, then Nominatim's style can be streamlined even more.
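The Pi-side steps above, collected into a single script (untested, per the disclaimer; consult the Nominatim and uvicorn documentation for details):

```shell
# Sketch: serve reverse geocoding from the SQLite dump on a Raspberry Pi.
# Assumes mydb.sqlite has already been copied into the current directory.
virtualenv nominatim-venv
./nominatim-venv/bin/pip install nominatim-api falcon uvicorn

# Point Nominatim at the SQLite database
echo "NOMINATIM_DATABASE_DSN=sqlite:dbname=mydb.sqlite" > .env

# Serve the Nominatim API locally; dawarich then points at http://127.0.0.1:8080
./nominatim-venv/bin/uvicorn --host 127.0.0.1 --port 8080 --factory \
    nominatim_api.server.falcon.server:run_wsgi
```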

@Freika
Owner

Freika commented Jan 3, 2025

@lonvia thank you for reaching out about the increased usage of photon.komoot.io. First of all, I would like to say huge thanks for it. It's been a really important service for Dawarich to rely on.

I'd be happy to somehow reduce the usage of the service in Dawarich, but I don't really know how to do that if I'm going to keep the reverse geocoding feature. I'm working on some features to utilize more than country and city information, so data on addresses and organizations is also useful for Dawarich.

What's already done:

What can be done:

  • I have some ideas on how to reduce the number of reverse geocoding requests from Dawarich
  • I'll make an announcement in release notes of one of the next releases to highlight the problem and encourage people to set up their own photon instances
  • I'll update the page on reverse geocoding on the website, to highlight the problem there too
  • I'll also research if there are any alternatives to Photon that could be integrated into Dawarich

If you have any other ideas, please let me know. I'd be happy to help reduce the load on photon.komoot.io from Dawarich.

@hopeseekr

Is this why it took 35 days to import my Records.json? If so, I'd gladly host or donate or do whatever to have one running locally...

Also, this is probably a one-time bump because of the Google Timeline shutdown.

@lonvia
Author

lonvia commented Jan 5, 2025

If you have a Linux machine with 16GB RAM(*) and 200GB SSD to spare, running your own Photon is super-easy:

  • download and unpack the Photon planet dump
  • download the Photon jar
  • install OpenJDK: sudo apt install openjdk-17-jdk-headless (replace '17' with whatever your OS has to offer)
  • run it: java -jar photon-0.6.1.jar
  • point Dawarich to http://localhost:2322 (or wherever your machine is located)

(*) The usual recommendation for a planet-size DB is 128GB RAM but that's in order to get reasonable throughput. When running a private instance that is only supposed to do a bit of reverse geocoding, you can get away with significantly less memory.
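The list above, as commands (a sketch; the dump URL is the one used elsewhere in this thread, and the jar version and URL should be checked against the Photon releases page):

```shell
# Download and unpack the Photon planet dump (large: tens of GB compressed)
wget http://download1.graphhopper.com/public/photon-db-latest.tar.bz2
tar xjf photon-db-latest.tar.bz2

# Install a JDK and run Photon; it listens on port 2322 by default
sudo apt install openjdk-17-jdk-headless
wget https://github.com/komoot/photon/releases/download/0.6.1/photon-0.6.1.jar
java -jar photon-0.6.1.jar
```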

@raphpa

raphpa commented Jan 5, 2025

There is also a dockerized version:
https://github.com/rtuszik/photon-docker

Works fine for me, takes about 1.4GB of RAM and pretty much no CPU load

@cosmindv

cosmindv commented Jan 5, 2025

Hello, first of all, excellent work and documentation. My guess is the huge usage might be related to the fact that more people had free time this holiday to try this alternative to Google Timeline. I would like to run a separate docker instance of https://github.com/rtuszik/photon-docker in order to not contribute to the high load.

I am using Synology to host Dawarich and don't have much experience with it.
I can't find any steps on how to load the modified docker-compose after I add my own PHOTON_API_HOST without losing all the progress and data.

@joaoferreira-git

joaoferreira-git commented Jan 6, 2025

There is also a dockerized version: https://github.com/rtuszik/photon-docker

Works fine for me, takes about 1.4GB of RAM and pretty much no CPU load

Just finished installing this and switching Dawarich to use the local photon, and it's zooming through requests, even on spinning rust in my unraid server.

If someone is going to do this, just make sure to add PHOTON_API_USE_HTTPS=false in the docker envs of the dawarich container, otherwise it will try to use HTTPS on an HTTP endpoint.
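For reference, the two settings in question could be sketched as a docker-compose fragment (service and host names here are hypothetical examples, not Dawarich's actual compose file):

```yaml
# Sketch only: adjust service names to your own setup
services:
  dawarich_app:
    environment:
      - PHOTON_API_HOST=photon:2322      # local photon instance, host:port
      - PHOTON_API_USE_HTTPS=false       # the local endpoint speaks plain HTTP
```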

@ragnarkarlsson

I've spun up a photon sidecar container to do my part and help reduce the influx, but I would like to be confident it's actually using the local photon and not komoot.io. Can this be easily determined? Should I see it in my Sidekiq container's logs?

@Freika
Owner

Freika commented Jan 6, 2025

@ragnarkarlsson In the console (https://dawarich.app/docs/FAQ#how-to-enter-dawarich-console), run Geocoder.config to see the configuration. It should look somewhat like this:

{:timeout=>5,
 :lookup=>:photon,
 :ip_lookup=>:ipinfo_io,
 :language=>:en,
 :http_headers=>{"X-Api-Key"=>"xxx"},
 :use_https=>true,
 :http_proxy=>nil,
 :https_proxy=>nil,
 :api_key=>nil,
 :basic_auth=>{},
 :logger=>:kernel,
 :kernel_logger_level=>2,
 :always_raise=>:all,
 :units=>:km,
 :distances=>:linear,
 :cache=>#<Redis client v5.3.0 for redis://dawarich_redis:6379>,
 :cache_prefix=>nil,
 :cache_options=>{:expiration=>1 day},
 :photon=>{:use_https=>true, :host=>"photon.dawarich.app"}}

Then, to test if it is working, you can do the following:

point = Point.last

Geocoder.search([point.latitude, point.longitude])

The response should contain some geocoding data and should not throw an error

@alternativesurfer

alternativesurfer commented Jan 6, 2025

If you have a Linux machine with 16GB RAM(*) and 200GB SSD to spare, running your own Photon is super-easy: […]

No need to even put it on an SSD.
I have mine running in Docker following these instructions: https://github.com/rtuszik/photon-docker with the data directory pointed at a spinning-rust file server mount. The Docker container is only using approx. 7GB on my SSD datastores while the data sits on my cheap storage.

The uncompressed planet-size DB is 169GB, for those interested.

Throughput seems decent enough (I don't know how to actually gauge that).

@ragnarkarlsson

@ragnarkarlsson In the console (https://dawarich.app/docs/FAQ#how-to-enter-dawarich-console), run Geocoder.config to see the configuration. […] The response should contain some geocoding data and should not throw an error

Wonderful, thank you. I'm now confident I'm running local 100% 👍🏻

@Freika
Owner

Freika commented Jan 7, 2025

@lonvia what's done so far:

  • Announcement posted in release notes, explaining the situation and encouraging the use of self-hosted Photon
  • In-app notification created with the same text
  • Support implemented for Geoapify as an alternative reverse geocoding service
  • Instructions updated on the Reverse Geocoding page on Dawarich website
  • Photon-specific env vars removed from default docker-compose.yml to encourage users to make their own decision

Hopefully, this will help, and I'll work on support for more reverse geocoding providers in the future.

@makanimike

I also think this is probably a one-time rush. Christmas break + dawarich users importing years' worth of google maps data.
I have been importing 13 years of google maps data the last couple of weeks.

Regardless, I will aim to fire up my own instance on the weekend to do my part.

Thank you to both parties for the solution-focused communication.

@tbelway

tbelway commented Jan 17, 2025

I built my own container that I deploy for local use; I can provide instructions to help reduce the workload on the public photon instance. Give me a few minutes to sanitize my work.

:edit:
Added it below. It's also pipeline-ready if anyone has their own Git instance (GitLab, Gitea, whatever) with pipelines.
It should work with both public and private registries too, so in theory someone could host this on Docker Hub or wherever and it would just work.

@tbelway

tbelway commented Jan 17, 2025

entrypoint.sh

#!/bin/bash
# Replace REGISTRY_NAME and REGISTRY_NAMESPACE with your instances
USER_AGENT="docker: $REGISTRY_NAME/$REGISTRY_NAMESPACE"

# Download elasticsearch index
# Download the search index on first start
if [ ! -d "/photon/photon_data/elasticsearch" ]; then
    echo "Downloading search index"

    # Let graphhopper know where the traffic is coming from
    wget --user-agent="$USER_AGENT" -O - http://download1.graphhopper.com/public/photon-db-latest.tar.bz2 | bzip2 -cd | tar x
fi

# Start photon if the elastic index exists
if [ -d "/photon/photon_data/elasticsearch" ]; then
    echo "Starting photon"
    java -jar photon.jar "$@"
else
    echo "Could not start photon, the search index could not be found"
fi

sleep 90

get-latest.sh

#!/bin/bash

# Variables
GH_USERNAME="komoot"
GH_REPO="photon"


# Functions
get_latest_tag() {
    curl -s https://api.github.com/repos/${GH_USERNAME}/${GH_REPO}/releases/latest | jq -r .tag_name
}

get_latest_tag

Containerfile

# replace $REGISTRY_NAME, $REGISTRY_NAMESPACE, and $IMAGE
# I built this using almalinux9-minimal
FROM $REGISTRY_NAME/$REGISTRY_NAMESPACE/$IMAGE:latest

# Install pbzip2 for parallel extraction
RUN microdnf -y update \
    && microdnf -y install \
        bzip2 \
        java-21-openjdk \
        pbzip2 \
        tar \
        wget \
    && rm -rf /var/cache/yum

WORKDIR /photon
ADD https://github.com/komoot/photon/releases/download/0.6.0/photon-0.6.0.jar /photon/photon.jar
#ADD https://github.com/komoot/photon/releases/download/0.4.2/photon-0.4.2.jar /photon/photon.jar
COPY entrypoint.sh ./entrypoint.sh

VOLUME /photon/photon_data
EXPOSE 2322

ENTRYPOINT /bin/bash /photon/entrypoint.sh

build.sh

#!/bin/bash

####################################################################
# Variables
####################################################################
# General Variables
# Replace $REGISTRY_NAME, and $REGISTRY_NAMESPACE with your values
REGISTRY=$REGISTRY_NAME
REGISTRY_NAMESPACE=$REGISTRY_NAMESPACE

# Variable Requirements
podman login $REGISTRY_NAME

###############################################################################################################################
# Functions
###############################################################################################################################

# Informational Functions
usage() {
cat << EOF

usage: ${SCRIPT_NAME} [options]

This script builds the Photon container image and pushes it to a registry

OPTIONS:
    -D      Enable DEBUG mode
    -f      Custom Containerfile with absolute or relative directory
    -F      Format of containerfile
    -h      Show this message
    -I      Initialize image
    -R      Rebuild the image even if no new base image version exists
    -S      Scheduled build, should only be used by automation (gitea actions).
    -v      Version to build, defaults to 'latest'
EOF
exit 1
}

get_parameters() {
    while getopts "Df:F:hIRSv:" option_name; do
        case "$option_name" in
            "D")
                DEBUG="YES"
                ;;
            "f")
                OPT_CONTAINERFILE="$OPTARG"
                ;;
            "F")
                OPT_FORMAT_TYPE="$OPTARG"
                ;;
            "h")
                usage
                ;;
            "I")
                OPT_INITIALIZE="TRUE"
                ;;
            "R")
                OPT_REBUILD="TRUE"
                ;;
            "S")
                OPT_SCHEDULED="TRUE"
                ;;
            "v")
                OPT_VERSION="$OPTARG"
                ;;
            "?")
                echo "Error: Unknown option $OPTARG"
                usage
                ;;
            ":")
                echo "Error: No argument value for option $OPTARG"
                usage
                ;;
            *)
                echo "Error: Unknown error while processing options"
                usage
                ;;
        esac
    done

    if [[ -z $OPT_CONTAINERFILE ]]; then
        OPT_CONTAINERFILE="Containerfile"
    fi

    if [[ -z $OPT_FORMAT_TYPE ]]; then
        OPT_FORMAT="--format docker"
    elif [[ $OPT_FORMAT_TYPE == "podman" ]]; then
        OPT_FORMAT=""
    else
        OPT_FORMAT="--format ${OPT_FORMAT_TYPE}"
    fi

    if [[ $DEBUG == "YES" ]]; then
        SILENCE_OUTPUT=""
	    SILENCE_STOUT=" "
        echo "Debug mode enabled."
    fi

    if [[ -z $OPT_REBUILD ]]; then
        OPT_REBUILD="FALSE"
    fi

    if [[ -z $OPT_SCHEDULED ]]; then
        OPT_SCHEDULED="FALSE"
    fi

    if [[ -z $OPT_VERSION ]]; then
        JDK_IMAGE_ID_DIGEST=$(podman pull quay.io/almalinuxorg/9-minimal)
        JDK_IMAGE_VERSION=$(podman image inspect --format '{{json .}}' "${JDK_IMAGE_ID_DIGEST}" | jq -r '. | {Id: .Id, Digest: .Digest, RepoDigests: .RepoDigests, Labels: .Config.Labels}' | grep org.opencontainers.image.version | cut -d ":" -f2 | cut -d " " -f2 | tr -d '"')
        TAG=${JDK_IMAGE_VERSION}
        CUSTOM_JDK_IMAGE_ID_DIGEST=$(podman pull $REGISTRY/$REGISTRY_NAMESPACE:latest)
        CUSTOM_JDK_IMAGE_VERSION=$(podman image inspect --format '{{json .}}' "${CUSTOM_JDK_IMAGE_ID_DIGEST}" | jq -r '. | {Id: .Id, Digest: .Digest, RepoDigests: .RepoDigests, Labels: .Config.Labels}' | grep org.opencontainers.image.version | cut -d ":" -f2 | cut -d " " -f2 | tr -d '"')
    fi

    if [[ "${OPT_SCHEDULED}" == "TRUE" ]]; then
        OPT_REBUILD="FALSE"
    fi
}

# Build Functions
container_initialize() {
    echo "Initializing..."
    podman login ${REGISTRY}
    podman build -f ${OPT_CONTAINERFILE} -t ${REGISTRY}/${REGISTRY_NAMESPACE}:latest
    podman push ${REGISTRY}/${REGISTRY_NAMESPACE}:latest
#    podman logout ${REGISTRY}
    echo "Initialized!"

}

container_scheduled_build() {
    if [[ "${JDK_IMAGE_VERSION}" != "${CUSTOM_JDK_IMAGE_VERSION}" ]]; then
        podman login ${REGISTRY}
        podman build -f ${OPT_CONTAINERFILE} -t ${REGISTRY}/${REGISTRY_NAMESPACE}:${TAG} -t ${REGISTRY}/${REGISTRY_NAMESPACE}:latest
        podman push ${REGISTRY}/${REGISTRY_NAMESPACE}:${TAG}
        podman push ${REGISTRY}/${REGISTRY_NAMESPACE}:latest
        podman logout ${REGISTRY}
    else
        echo "No new version, exiting..."
        exit 1
    fi
}

container_rebuild() {
    podman login ${REGISTRY}
    podman build -f ${OPT_CONTAINERFILE} -t ${REGISTRY}/${REGISTRY_NAMESPACE}:${TAG} -t ${REGISTRY}/${REGISTRY_NAMESPACE}:latest
    podman push ${REGISTRY}/${REGISTRY_NAMESPACE}:${TAG}
    podman push ${REGISTRY}/${REGISTRY_NAMESPACE}:latest
    podman logout ${REGISTRY}
}

###############################################################################################################################
# Main Script
###############################################################################################################################
get_parameters "$@"

if [[ "$OPT_INITIALIZE" == "TRUE" ]]; then
    container_initialize
fi

if [[ "$OPT_SCHEDULED" == "TRUE" ]]; then
    container_scheduled_build
fi

if [[ "$OPT_REBUILD" == "TRUE" ]]; then
    container_rebuild
fi

@tbelway

tbelway commented Jan 17, 2025

@lonvia what's done so far: […]

To save you the time, feel free to use my code above to build out a dawarich-photon docker image, with a disclaimer as to the image size. The timeout may also need to be modified for slower connections... I actually populated the photon data manually and then started the container, because of the large download and the slow internet I had at the time...

@elyobelyob

Just built my first Dawarich docker and also created a photon-docker as well. I was wondering if it'd be useful for the photon devs if we created a pool of our instances to help lower their usage. My build took a 77GB file which uncompressed is 193GB, so I'm not sure how many average users will do this. But if we could create a round-robin way of pooling our instances, I'd be happy to share mine with other users. Say 100 people did this, I'd only have 1% of the traffic. The more the merrier. Is this something worth investigating?

@Freika
Owner

Freika commented Jan 19, 2025

Just built my first Dawarich docker and also created a photon-docker as well. […]

I think it's a good idea and I'll try to work on a tool to group private photon instances soon

@Freika
Owner

Freika commented Jan 20, 2025

To anyone willing to provide their private Photon instance for Dawarich users:

Please post your instances in the discussion and I'll add them to the main message.

@lindner

lindner commented Jan 22, 2025

Idea:

  • Add a commented-out photon config in docker-compose.yml so folks can spin that up easily.
  • To use that photon instance, add some documentation reminding people that PHOTON_API_HOST takes a port as well as a hostname. For me, I had to set it to dawarich-photon:2322
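A commented-out service along those lines (a sketch based on rtuszik/photon-docker; the service name, image tag, and volume are assumptions, not tested values):

```yaml
# Uncomment to run a local photon alongside dawarich (untested sketch)
#  photon:
#    image: rtuszik/photon-docker:latest
#    volumes:
#      - photon_data:/photon/photon_data
#    ports:
#      - "2322:2322"
#    restart: unless-stopped
```

With this uncommented, PHOTON_API_HOST would then point at the service name plus port, e.g. photon:2322.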

@nachbelichtet

I've tried to enable Geoapify. I created a Geoapify project with the reverse geocoding API and added a new environment variable
GEOAPIFY_API_KEY=XXXXXXXX
then restarted my container and added a new reverse geocoding job.
It appears that it doesn't do anything: no jobs in Sidekiq and no API calls visible in the Geoapify usage stats.
I'm on Dawarich 0.23.6 because of problems with 0.24 (database not found etc.)

@Freika
Owner

Freika commented Feb 15, 2025

I've tried to enable Geoapify. […] It appears that it doesn't do anything. No jobs in sidekiq and no API calls visible in the Geoapify API use stats.

Is your sidekiq instance running?

@nachbelichtet

Is your sidekiq instance running?

Yes, it is. And Imports are working fine.

@nachbelichtet

I managed to get version 0.24.1 running. Then I tested my Geoapify API with https://api.geoapify.com/v1/geocode/reverse?lat=52.47944744483806&lon=13.213967739855434&format=json&apiKey=xxxxxxx and got valid data back. A job for reverse geocoding is created in Dawarich, which finishes in less than a minute.

