Comment by DrStartup

7 months ago

Entity resolution is the killer feature. Context engineering is the problem with this benchmark attempt. The agent plan seemed to be one-shot, and the fact that the LLMs could write their own tools without validation or specific multi-shot examples is worrisome. To me, way too much is left to the whims of the LLMs, without proper context.

mfrye0 7 months ago

Yes, none of the top LLMs can do entity resolution well yet. I constantly see them conflate entities with similar names - they'll confidently cite 3 sources about what appears to be one company, but the sources are actually about 3 different businesses with similar names.

The fundamental issue is that LLMs don't have a concept of canonical entity identity. They pattern-match on text similarity rather than understanding that "Apple Inc." and "Apple Records" are different entities. It gets even worse when you realize companies can legally have identical names in the same country - at that point name matching cannot tell them apart at all.

Without proper entity grounding, any business logic built on top becomes unreliable.
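
To make the name-vs-identity point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Source` record, the `registry_id` field, the 0.6 threshold, and the placeholder IDs are made up, and a real pipeline would key on whatever registry identifier it can actually obtain.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Source:
    """One scraped mention of a business (hypothetical record shape)."""
    name: str
    jurisdiction: str  # where the business is registered
    registry_id: str   # placeholder for a registration number


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive text similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def same_by_name(a: Source, b: Source, threshold: float = 0.6) -> bool:
    """Naive matching: treat similar-looking names as the same entity."""
    return name_similarity(a.name, b.name) >= threshold


def same_by_identity(a: Source, b: Source) -> bool:
    """Grounded matching: compare canonical identifiers, ignore the name."""
    return (a.jurisdiction, a.registry_id) == (b.jurisdiction, b.registry_id)


# Placeholder IDs, not real registration numbers.
apple_inc = Source("Apple Inc", "US", "PLACEHOLDER-001")
apple_records = Source("Apple Records", "GB", "PLACEHOLDER-002")

# Two distinct businesses that legally share a name in one country.
acme_a = Source("Acme Services", "US", "PLACEHOLDER-101")
acme_b = Source("Acme Services", "US", "PLACEHOLDER-102")

if __name__ == "__main__":
    for left, right in [(apple_inc, apple_records), (acme_a, acme_b)]:
        print(
            left.name, "vs", right.name,
            "| by name:", same_by_name(left, right),
            "| by identity:", same_by_identity(left, right),
        )
```

With identical names the text score is a perfect match, so no threshold can separate the two Acme records. Only the identifier comparison does.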