Comment by skinfaxi

3 hours ago

Would you ever consider writing up or sharing your setup?

The ingredients are:

1. Bun.Cron API to run a script every minute

2. Bun.$ (Bun Shell) to execute the macOS command to take a screenshot (I do this for all connected screens at that moment)

3. Bun.Image to downscale everything to 1x in case some of the screenshots are 2x

4. Bun Shell again to run a JXA (JavaScript for Automation) script that uses Apple's Vision framework to OCR the image into a file

5. Bun Shell to run the Swift compiler in its one-off eval mode with an inline Swift helper. The helper runs the built-in LLM from the Foundation Models framework with a system prompt that tells it what the OCR said and instructs it to glean what may be on the screen (this can't be done with JXA because the models are not exposed through ObjC APIs)

6. For each screenshot, continuously, take the previous day summary file and the last OCR/context results and produce a new summary of the day
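
A rough sketch of how steps 2 and 6 might be wired up in TypeScript, kept as pure functions so it runs without a Mac. All names here (`ScreenContext`, `screencaptureArgs`, `buildSummaryPrompt`), the file naming scheme, and the prompt wording are assumptions for illustration, not taken from the setup described above. The one external fact relied on is that macOS `screencapture` accepts `-x` (no sound) and one output file per display.

```typescript
// Hypothetical names throughout; none of these come from the original setup.

/** What is known about one screenshot after OCR (step 4) and the model pass (step 5). */
interface ScreenContext {
  screen: number; // 1-based index of the display the screenshot came from
  ocrText: string; // raw OCR output
  description: string; // the model's guess at what is on screen
}

/** Build the argument list for macOS `screencapture`, one output file per display. */
function screencaptureArgs(dir: string, stamp: string, displayCount: number): string[] {
  const files = Array.from(
    { length: displayCount },
    (_, i) => `${dir}/${stamp}-screen${i + 1}.png`,
  );
  // -x silences the shutter sound; each file name maps to the next display.
  return ["-x", ...files];
}

/** Build the prompt for step 6: fold the latest observations into the running summary. */
function buildSummaryPrompt(
  previousSummary: string,
  contexts: ScreenContext[],
  time: string,
): string {
  const observations = contexts
    .map((c) => `Screen ${c.screen} (${time}):\n${c.description}\nOCR text:\n${c.ocrText}`)
    .join("\n\n");
  // The instruction wording below is an assumption, not the author's prompt.
  return [
    "Update the running summary of the day with the new observations.",
    "Keep links, channel names and other details that may need recalling later.",
    "",
    "Previous summary:",
    previousSummary || "(none yet)",
    "",
    "New observations:",
    observations,
  ].join("\n");
}
```

The argument array would be handed to whatever runs the command (Bun Shell in the list above), and the prompt string would go to the model whose reply overwrites the summary file, so each run only ever needs the last summary plus the newest OCR results.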

I plan on adding extra information from the OS like the currently opened windows, currently focused window, time of day etc. into the mix, but so far it hasn't been needed. It produces reports of a good enough quality for me.

I `grep` these daily summaries whenever I need to recall a link I saw, find what channel a message I spotted was in, or take another look at that one tab I already closed, maybe re-opening it by its OCR'd URL.
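
For anyone who would rather do the lookup from a script than from `grep`, here is a minimal sketch of the same idea in TypeScript. The helper names (`searchSummaries`, `extractUrls`), the `Hit` shape, and the assumption that summaries are already loaded as a file-name-to-contents map are all hypothetical; the URL regex is deliberately simple and will miss OCR'd URLs that lost their `http(s)://` prefix.

```typescript
// Hypothetical helper names; the comment above only says the summaries are searched with grep.

interface Hit {
  file: string; // summary file the match came from
  line: number; // 1-based line number, like `grep -n` reports
  text: string; // the matching line
}

/** grep-like search over already-loaded summary files (file name -> contents). */
function searchSummaries(summaries: Record<string, string>, pattern: RegExp): Hit[] {
  // Drop the g/y flags so test() carries no lastIndex state from line to line.
  const matcher = new RegExp(pattern.source, pattern.flags.replace(/[gy]/g, ""));
  const hits: Hit[] = [];
  for (const [file, contents] of Object.entries(summaries)) {
    contents.split("\n").forEach((text, index) => {
      if (matcher.test(text)) {
        hits.push({ file, line: index + 1, text });
      }
    });
  }
  return hits;
}

/** Pull http(s) URLs out of a line so a closed tab can be reopened. */
function extractUrls(text: string): string[] {
  return text.match(/https?:\/\/[^\s"'<>)\]]+/g) ?? [];
}
```

Calling `searchSummaries(loaded, /some keyword/i)` and then `extractUrls(hit.text)` on each hit gives the candidate links to reopen.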