
Comment by gwern

3 years ago

"There are only two hard problems in CS, cache invalidation and naming." Dan Luu covered cache the other day (https://danluu.com/cache-incidents/) so I guess OP can be said to cover naming: users really hate the constant friction and uncertainty of naming. 'Untitled.doc' is the "blinking twelve" of writing or notetaking systems: just enough effort and trivial inconveniences to stop people from fixing it.

But as always, the solution to every problem in automation is some more automation, and it seems like we could do better than forcing users to aversively write titles. A neural net like GPT-J or T5 is good at abstractive summarization, so there's no reason a note-taking program couldn't make titles strictly optional: use the internal ID as the placeholder, and update the title based on the contents until such time as the user wishes to manually edit or write it themself.

This is the sort of gradual automation which works best in practice: I write my note as I please, the NN does the best it can, and when I read it, I realize it missed the important thing (and I realize the important thing in the first place!) and I edit it. Or I don't because it's fine, and I move on to the next thing.
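A minimal sketch of the "titles are optional" idea, to make the fallback order concrete: a manual title wins, otherwise a generated one, otherwise the internal ID. The `Note` class and `suggest_title` are hypothetical names, and `suggest_title` is only a crude first-sentence heuristic standing in for a real summarization model such as the ones mentioned above.

```python
import re
from dataclasses import dataclass
from typing import Optional


def suggest_title(body: str, max_words: int = 8) -> str:
    """Stand-in for a summarization model (assumption): first sentence, truncated."""
    first_sentence = re.split(r"(?<=[.!?])\s+", body.strip(), maxsplit=1)[0]
    words = first_sentence.split()[:max_words]
    return " ".join(words).rstrip(".!?,;:")


@dataclass
class Note:
    note_id: str                         # internal ID, used as the placeholder title
    body: str = ""
    manual_title: Optional[str] = None   # set only when the user edits the title

    @property
    def title(self) -> str:
        # 1. A title the user wrote always wins.
        if self.manual_title is not None:
            return self.manual_title
        # 2. Otherwise regenerate from the current contents on every read.
        # 3. Fall back to the internal ID while the note is still empty.
        return suggest_title(self.body) or self.note_id


note = Note("note-0001")
print(note.title)                        # placeholder: the internal ID
note.body = "Cache invalidation is hard. Naming is harder."
print(note.title)                        # generated from the contents
note.manual_title = "Two hard problems"
print(note.title)                        # manual edit is never overwritten
```

Computing the title on read means it tracks the contents automatically, and the generator can be swapped for a better model without touching stored notes.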

(And making links/backlinks likewise. We don't need to make the user keep the whole note-taking graph in their head and curate it themselves: just use more ML. I do something similar with text embeddings on notes on gwern.net: I write a note, and then it gets embedded and a list of suggested links is inserted for me to edit. I find it relieving to write while knowing that it'll suggest links I've forgotten. A poor man's remembrance agent (https://www.aaai.org/Papers/Symposia/Spring/1996/SS-96-02/SS...). Is it perfect? Far from it. But when it's not, I just delete the suggestions and move on.)
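A rough sketch of what "embed the note, then suggest links" could look like. The comment does not say how the suggestions are actually ranked, so nearest neighbours by cosine similarity is an assumption here, and the three-dimensional vectors are made-up toy values standing in for real embeddings from some model or API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_links(new_vec, library, k=3, min_score=0.0):
    """Return up to k note titles whose embeddings are closest to new_vec.

    `library` maps note title -> embedding vector (assumed precomputed).
    """
    scored = [(cosine(new_vec, vec), title) for title, vec in library.items()]
    scored = [(score, title) for score, title in scored if score > min_score]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties by title
    return [title for _, title in scored[:k]]


# Toy embeddings (hypothetical values, not from a real model).
library = {
    "spaced repetition": [0.9, 0.1, 0.0],
    "sourdough": [0.0, 0.1, 0.9],
    "memory palaces": [0.8, 0.3, 0.1],
}
new_note_vec = [1.0, 0.2, 0.0]
print(suggest_links(new_note_vec, library, k=2))
```

The output is only a list of candidates; the human step of deleting bad suggestions stays in the loop.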

An easy low-tech hack for this would be to take the same approach that the Friends showrunners used for naming episodes. Unless the user specifies otherwise, the note system should encourage lazy placeholder titles like "my note about using ML to name notes".

I am currently developing exactly this - a note-taking system similar to Roam/Obsidian/etc. which, instead of relying on explicit links, automatically searches for related notes. My approach is to use some form of automatic, semantics-aware keyword extraction and to link other notes not only to the current note as a whole but also to its specific terms and sections. For example, if a note contains a recipe, each ingredient would link to the other recipes that use it.

I was not aware that such a system already exists. Can you point me to some references? Is it a private system you developed? Are you willing to share more details? Thanks!
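The term-level linking described above can be sketched with a plain inverted index. This assumes the keyword extraction has already happened (the hard part, not shown), so each note arrives with a list of terms; `build_term_index` and `links_per_term` are hypothetical names for illustration.

```python
from collections import defaultdict


def build_term_index(notes):
    """Map each extracted term to the set of note titles that contain it."""
    index = defaultdict(set)
    for title, terms in notes.items():
        for term in terms:
            index[term.lower()].add(title)
    return index


def links_per_term(title, notes, index):
    """For one note, list the other notes sharing each of its terms."""
    links = {}
    for term in notes[title]:
        others = index[term.lower()] - {title}
        if others:  # skip terms that appear in no other note
            links[term] = sorted(others)
    return links


# Terms are assumed to come from an upstream keyword extractor.
notes = {
    "Pancakes": ["flour", "eggs", "milk"],
    "Omelette": ["eggs", "butter"],
    "Bread": ["flour", "yeast"],
}
index = build_term_index(notes)
print(links_per_term("Pancakes", notes, index))
```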

  • Instead of reinventing the wheel, did you consider writing an extension for Obsidian? That would allow you to focus on the extraction and aggregation, rather than investing a lot of time in building a usable text editor (which is a non-trivial task).

  • I think I would skip trying to do any entity or term extraction initially. That's going well beyond basic linking. Stick with something like TF-IDF similarity or LDA topic modeling to keep things fast and straightforward until you have the UI/UX worked out, which is always more work than it looks like. (The reason I used neural net embeddings was that the OA API had just added them, and it was a fun excuse to try that out.)
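For a sense of how little the TF-IDF baseline needs, here is a standard-library-only sketch. The tokenizer, the smoothed IDF formula, and the function names are choices made for this illustration, not anything specified in the thread; a real system would more likely reach for an existing library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """docs: name -> text. Returns name -> sparse {term: weight} vector."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())  # each doc counts once per term
    # Smoothed inverse document frequency (one common variant).
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return {
        name: {t: c * idf[t] for t, c in term_counts.items()}
        for name, term_counts in counts.items()
    }


def cosine_sparse(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(name, vectors, k=3):
    """Names of up to k other notes with nonzero similarity, best first."""
    scored = [
        (cosine_sparse(vectors[name], vec), other)
        for other, vec in vectors.items()
        if other != name
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in scored[:k] if score > 0]


docs = {
    "a": "neural net embeddings for notes",
    "b": "embeddings and neural net summarization",
    "c": "sourdough bread recipe",
}
vectors = tfidf_vectors(docs)
print(most_similar("a", vectors, k=1))
```

Because the vectors are sparse dictionaries, notes that share no vocabulary score exactly zero and are never suggested, which is also the main weakness relative to embeddings: synonyms do not match.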