Comment by dataflow
14 hours ago
Thanks, yeah. I think strong prefiltering is almost always doable. If nothing else, I usually know the time range of the relevant emails and probably the sender/recipient or some keywords, and I know how to filter out a big chunk of the irrelevant emails (mailing lists, etc.), so I'm hoping each search won't actually involve that much data. What I don't know is which models would be most suitable, even in cases where the data fits.
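A rough sketch of that prefiltering step, assuming the mail is available as stdlib `email.message.Message` objects (e.g. loaded from an mbox or Maildir export). The function names are hypothetical, and treating `List-Id`/`List-Unsubscribe` headers as the mark of mailing-list traffic is a heuristic I'm assuming, not a guarantee. Only the messages that survive would then be handed to a model.

```python
import email
import email.utils
from datetime import datetime, timezone
from email.message import Message
from typing import Iterable, List, Optional


def is_mailing_list(msg: Message) -> bool:
    # Heuristic (assumption): list traffic usually carries List-Id (RFC 2919)
    # or List-Unsubscribe (RFC 2369) headers.
    return msg.get("List-Id") is not None or msg.get("List-Unsubscribe") is not None


def message_date(msg: Message) -> Optional[datetime]:
    raw = msg.get("Date")
    if raw is None:
        return None
    try:
        dt = email.utils.parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        return None
    # Treat timezone-less dates as UTC so they compare with aware bounds.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def participants(msg: Message) -> set:
    # Collect lowercase addresses from From/To/Cc.
    fields = msg.get_all("From", []) + msg.get_all("To", []) + msg.get_all("Cc", [])
    return {addr.lower() for _, addr in email.utils.getaddresses(fields) if addr}


def searchable_text(msg: Message) -> str:
    # Subject plus every text/plain part, lowercased for keyword matching.
    parts = [msg.get("Subject", "")]
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            if payload:
                charset = part.get_content_charset() or "utf-8"
                try:
                    parts.append(payload.decode(charset, errors="replace"))
                except LookupError:  # unknown charset name
                    parts.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(parts).lower()


def prefilter(messages: Iterable[Message], start: datetime, end: datetime,
              addresses: Optional[Iterable[str]] = None,
              keywords: Optional[Iterable[str]] = None) -> List[Message]:
    wanted = {a.lower() for a in (addresses or ())}
    terms = [k.lower() for k in (keywords or ())]
    kept = []
    for msg in messages:
        if is_mailing_list(msg):
            continue
        dt = message_date(msg)
        if dt is None or not (start <= dt < end):  # half-open time range
            continue
        if wanted and not (participants(msg) & wanted):
            continue
        if terms:
            text = searchable_text(msg)
            if not any(t in text for t in terms):
                continue
        kept.append(msg)
    return kept
```

The filters are ordered cheapest first (headers before decoding bodies), so most irrelevant mail is discarded before any body text is read.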
As an example of the kind of query I'm interested in, I want a model that can tell me all the flights I took within a given time range (which means it would have to filter out cancellations). Or, for a given flight, the departure and arrival times and time zones (or the city and country, so I can look up the time zone). Stuff like that. (Travel is just an example, obviously; I have other topics to ask about.) Each query doesn't involve a terribly large number of emails, but the email formats vary too much across senders to write custom tooling for each case.