Comment by eichin
5 hours ago
Trying not to turn this into "falsehoods developers believe about geographic names", but having done natural-language geocoding at scale (MetaCarta 2002-2010, acquired by Nokia), the most valuable thing was a growing set of tagged training data, partly because we were actually building the models out of it, but also because it would detect regressions. I suspect you needed something similar to "keep the LLMs in line", but you need it for any more artisanal development approach too. (I'm a little surprised you even have a single-value-return search() function; issue#44 is just the tip of the iceberg. https://londonist.com/london/features/places-named-london-th... is a pretty good hint that a range of answers with probabilities attached is a minimum starting point...)
Thanks for this - it's interesting that I have come to the same conclusion as well.
My reworked approach is to return a list of results with a probability or certainty score.
When someone searches for London, I need some way to give priority to London, UK.
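Roughly, a minimal sketch of that idea in Python. The `Place` class, the `weight` field, and the numbers are all hypothetical placeholders (not real data, and not what the project actually uses); the assumption is that some prior such as population can be turned into a weight, which is then normalised into a score per candidate:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    name: str
    region: str
    weight: float  # hypothetical prior, e.g. something derived from population


# Illustrative placeholder weights only, not real figures.
PLACES = [
    Place("London", "Kentucky, USA", 1.0),
    Place("London", "England, UK", 90.0),
    Place("London", "Ontario, Canada", 9.0),
]


def search(query, places=PLACES):
    """Return every place matching the query, best first, as (place, score).

    Scores are the candidates' weights normalised so they sum to 1.
    An unknown name yields an empty list rather than a single guess.
    """
    key = query.strip().casefold()
    matches = [p for p in places if p.name.casefold() == key]
    total = sum(p.weight for p in matches)
    if total == 0:
        return []
    ranked = sorted(matches, key=lambda p: p.weight, reverse=True)
    return [(p, p.weight / total) for p in ranked]


if __name__ == "__main__":
    for place, score in search("london"):
        print(f"{place.name}, {place.region}: {score:.2f}")
```

With this shape the caller still gets London, UK first, but the other Londons are not thrown away and can be re-ranked if more context turns up.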
My dataset is sourced from an open-source JSON file, which I am now pre-processing to identify all the name collisions in it.
There are so many collisions!
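The collision pass is roughly the following sketch. The record layout (a list of dicts with a `name` key) is an assumption for illustration, since the real JSON file may be shaped differently, and `find_collisions` is a hypothetical helper name:

```python
from collections import defaultdict


def find_collisions(records):
    """Group records by normalised name and keep only names shared by 2+ records."""
    groups = defaultdict(list)
    for rec in records:
        # Normalise so "London", " london " and "LONDON" land in the same group.
        groups[rec["name"].strip().casefold()].append(rec)
    return {name: recs for name, recs in groups.items() if len(recs) > 1}


if __name__ == "__main__":
    # Hypothetical sample records, standing in for the parsed JSON file.
    sample = [
        {"name": "London", "country": "GB"},
        {"name": "london ", "country": "CA"},
        {"name": "Paris", "country": "FR"},
    ]
    for name, recs in find_collisions(sample).items():
        print(name, [r["country"] for r in recs])
```

Each colliding group can then be fed into the scoring step instead of being collapsed to one winner.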
Could I pick your brains and have you critique my approach? Thanks.