Comment by cypherfox

25 days ago

I'm working on a tool to auto-label emails in Gmail (first) based on what you've labeled in the past.

It pulls down up to 400 emails for each custom label and trains a custom model just for you, which then labels new incoming email.

For emails that are likely, but not certain, to belong to a particular label, I use a 'Proposed/{label}' approach: you just archive them in Gmail, and the tool detects that they've been archived with the proposed label and moves them to the correct label. (Essentially, the archive action serves as the acceptance criterion.) Similarly, I treat re-labeling by the user as a negative signal and include that data as a counter-example.
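
To make the 'Proposed/{label}' flow concrete, here is a minimal sketch of the routing and feedback logic. It is an illustration only: the thresholds, function names, and return shapes are hypothetical and not taken from the actual tool, and it leaves out the Gmail API calls entirely.

```python
from typing import Optional, Tuple

PROPOSED_PREFIX = "Proposed/"

# Hypothetical cutoffs; the real tool's thresholds are not stated.
CONFIDENT = 0.90
LIKELY = 0.60


def route(label: str, score: float) -> Optional[str]:
    """Pick the Gmail label name to apply for a predicted label and score."""
    if score >= CONFIDENT:
        return label                    # certain enough: apply directly
    if score >= LIKELY:
        return PROPOSED_PREFIX + label  # likely: ask the user implicitly
    return None                         # too uncertain: leave unlabeled


def interpret_feedback(
    proposed: str, archived: bool, relabeled_to: Optional[str]
) -> Optional[Tuple[str, str, bool]]:
    """Turn a user action on a proposed email into (move_to, example_label, is_positive).

    Returns None while the user has not acted on the email yet.
    """
    label = proposed[len(PROPOSED_PREFIX):]
    if relabeled_to is not None:
        # User picked a different label: counter-example for the proposal.
        return (relabeled_to, label, False)
    if archived:
        # Archiving with the proposed label still attached counts as acceptance.
        return (label, label, True)
    return None
```

Checking the re-label case before the archive case is a design choice in this sketch: an explicit correction by the user is treated as stronger evidence than an archive.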

It's working well for my own accounts, and the back-end is pretty legendary, but Google requires a costly security audit before it can be turned into a real product.

It always frustrated me that Google won't use their ML systems to label emails for me based on what I've done before. So I scratched that itch.

I'm using very straightforward BERT models right now, but I'm exploring using something a little more intelligent. I'm also exploring a multi-stage process, because a lot of emails can be categorized using much simpler techniques.
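
One way the multi-stage idea could look, as a rough sketch under assumptions: cheap stages run first and the heavier model is only called when none of them produces a label. The sender lookup stage, the email dictionary shape, and all names below are hypothetical; the actual simpler techniques being explored are not specified.

```python
from typing import Callable, Dict, List, Optional

Email = Dict[str, str]
Stage = Callable[[Email], Optional[str]]


def sender_stage(sender_to_label: Dict[str, str]) -> Stage:
    """Hypothetical cheap stage: exact sender-address lookup built from past labels."""
    def stage(email: Email) -> Optional[str]:
        return sender_to_label.get(email["from"].lower())
    return stage


def classify(email: Email, stages: List[Stage], model: Callable[[Email], str]) -> str:
    """Try each cheap stage in order; fall back to the model only if all abstain."""
    for stage in stages:
        label = stage(email)
        if label is not None:
            return label
    return model(email)


if __name__ == "__main__":
    # Stand-in for the BERT classifier; a real model call would go here.
    def model_stub(email: Email) -> str:
        return "Misc"

    stages = [sender_stage({"billing@example.com": "Receipts"})]
    print(classify({"from": "Billing@example.com", "subject": "Invoice"}, stages, model_stub))
    print(classify({"from": "friend@example.org", "subject": "Lunch?"}, stages, model_stub))
```

Each stage returns None to abstain, so new cheap stages can be added to the list without touching the fallback model.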

It's a great Machine Learning project, with a back-end that really runs spectacularly on Temporal and Kubernetes, and it's useful to me, so...wins all around.

I do wish I could make it a product, though.