-
As already mentioned in the other thread, this has been fixed in the new Python frontend: mismatching scripts now incur a penalty. You can try it on nominatim.openstreetmap.org, which now runs the Python frontend. Penalties for lesser-known names are a different story and need a completely different search index. I have ideas, but an implementation is still a long way off.
-
Hi, since this is possible now, I upgraded my Docker container to 4.3, but it seems that "Ain" still returns "Aisne" using the
-
I upgraded to 4.4 and I still get this:
returns:
There must be some caching involved, no?
-
Following up on #3132, I am opening this new discussion because I think the specific case I previously described can occur for different queries, and this needs its own thread.
As a reminder, this is the original issue:
Aisne is a French county. Its Greek translation is Αιν.
So, once this name is tokenized into keywords, one of the tokens is `ain` (because `ν` is matched to the Latin character `n`).
Problem: when I query "Ain", it consequently returns Aisne, whereas I expect Ain, which is actually another French county.
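The collision described above can be reproduced with a minimal sketch. It assumes a tiny hand-written Greek-to-Latin table (`GREEK_TO_LATIN` and `to_token` are hypothetical names invented for this illustration); it is not Nominatim's actual tokenizer, only a demonstration of how transliterating to Latin makes the two names indistinguishable.

```python
# Hypothetical, hand-written Greek-to-Latin table covering only the letters
# needed here; a real tokenizer would rely on a full transliteration library.
GREEK_TO_LATIN = {"α": "a", "ι": "i", "ν": "n"}


def to_token(name: str) -> str:
    """Lowercase a name and map each known Greek letter to a Latin one."""
    return "".join(GREEK_TO_LATIN.get(ch, ch) for ch in name.lower())


greek_token = to_token("Αιν")  # Greek translation of Aisne
latin_token = to_token("Ain")  # Latin-script query for the other county

# Both names collapse to the same token, so the query can match either place.
print(greek_token, latin_token, greek_token == latin_token)
```

Once both names are reduced to the same token, only ranking signals such as importance or a script-mismatch penalty can tell them apart.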
If I set accept-language to en_GB, then "Ain" correctly returns Ain, not Aisne.
So I have several questions about this issue, which may occur for many places around the world whose names are translated into other alphabets and whose tokens are wrongly mapped to Latin characters:
"Ain" is not equal to "Αιν". Could the tokenizer use all the characters available in UTF-8, including non-Latin ones?
The keywords (plus importance) seem to carry more weight in queries than an exact match. Is this intended? (In other words, is the quick fix you suggested in the last thread realistic?)
When no accept-language is defined, shouldn't the name in the country's original language (the default name) carry the most weight? For example, without accept-language, if I type Ain, the matching would first search only the original-language/default names and find the most probable one, which is Ain. Only if this first search returns no result would it run a second search against all the keywords. I guess this has not been done because we all write place names in our own script rather than the local one. For example, when looking for Moscou, the search I described would first look for Moscou among the default names, where Москва would not match, and would then find Москва through its full list of keywords. In any case, we could have several sets of keywords; I'm not sure which approach would be best.
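To make the two-stage idea concrete, here is a rough sketch under stated assumptions: `Place`, `search`, and the `PLACES` sample data are all hypothetical and only illustrate the proposed lookup order, not how Nominatim stores or ranks results.

```python
from dataclasses import dataclass, field


@dataclass
class Place:
    default_name: str  # name in the local language
    keywords: set[str] = field(default_factory=set)  # tokens from all translations


def search(query: str, places: list[Place]) -> list[Place]:
    """Try default names first; fall back to the full keyword sets."""
    q = query.lower()
    # Stage 1: exact match against the default (local-language) names only.
    exact = [p for p in places if p.default_name.lower() == q]
    if exact:
        return exact
    # Stage 2: no default name matched, so search every keyword token.
    return [p for p in places if q in p.keywords]


# Hypothetical sample data; "ain" is listed for Aisne to mimic the collision.
PLACES = [
    Place("Ain", {"ain"}),
    Place("Aisne", {"aisne", "ain"}),
    Place("Москва", {"москва", "moscou", "moscow"}),
]

print([p.default_name for p in search("Ain", PLACES)])
print([p.default_name for p in search("Moscou", PLACES)])
```

In this sketch, "Ain" is resolved in the first stage and never reaches Aisne's keywords, while "Moscou" has no default-name match and is found through the keyword fallback.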