Comment by deemkeen

3 months ago

great insights, i'll read up on it and see if it can be useful, thx

np. There's a diagram, "Linux bcc/BPF tracing tools" [-1], in the bcc README [0]; it also appears in [1], which explains eBPF, bcc, and bpftrace.

filetop, dirtop, and vfsstat use bpf to trace the VFS layer. [4]

[-1] "Linux bcc/BPF tracing tools" https://www.brendangregg.com/BPF/bcc_tracing_tools_early2019...

[0] iovisor/bcc: https://github.com/iovisor/bcc

[1] "Linux Extended BPF (eBPF) Tracing Tools", Dtrace book: https://www.brendangregg.com/ebpf.html

If running an AI agent in a container (with devcontainers and e.g. VS Code): good container policy avoids granting the container the CAP_SYS_ADMIN capability; the least-privilege approach is to grant limited capabilities like CAP_BPF (plus CAP_PERFMON, CAP_NET_RAW, and CAP_SYS_PTRACE as needed) [3].
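To check what a container was actually granted, here's a rough Python sketch that decodes the CapEff bitmask from /proc/self/status. The bit numbers come from the kernel's capability.h; the helper names (decode_caps, effective_caps, audit) and the audit messages are made up for illustration.

```python
"""Decode a Linux capability bitmask such as CapEff in /proc/self/status."""

# Bit positions from include/uapi/linux/capability.h.
CAPS = {
    "CAP_NET_RAW": 13,
    "CAP_SYS_PTRACE": 19,
    "CAP_SYS_ADMIN": 21,
    "CAP_PERFMON": 38,
    "CAP_BPF": 39,
}


def decode_caps(hex_mask: str) -> set[str]:
    """Return the names from CAPS whose bit is set in a hex mask."""
    mask = int(hex_mask, 16)
    return {name for name, bit in CAPS.items() if mask & (1 << bit)}


def effective_caps(status_path: str = "/proc/self/status") -> set[str]:
    """Read the CapEff line for the current process (Linux only)."""
    with open(status_path) as f:
        for line in f:
            if line.startswith("CapEff:"):
                return decode_caps(line.split()[1])
    return set()


def audit(caps: set[str]) -> str:
    """Hypothetical policy check: flag CAP_SYS_ADMIN, require CAP_BPF."""
    if "CAP_SYS_ADMIN" in caps:
        return "over-privileged: CAP_SYS_ADMIN is set"
    if "CAP_BPF" not in caps:
        return "CAP_BPF is missing; tracing programs will likely fail to load"
    return "ok"
```

Inside the container, print(audit(effective_caps())) gives a quick answer before trying to load anything.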

[3] https://medium.com/@techdevguides/using-bpftrace-with-limite...

[4] bpfcc-tools manpages: https://manpages.debian.org/unstable/bpfcc-tools/index.html

Though ripgrep wins, VS Code also fails at monitoring large workspaces because of inotify limits, so some way to parse fs events from bcc and libdtrace with Python would be great.

Prompt 1: Create a Python project named, idk, dbpftrace, with a pyproject.toml, a README, and Sphinx /docs, with bcc and python-dtrace as dependencies; then, in dbpftrace/, parse the fs syscall events of a pid and its descendants from bcc (eBPF) or python-dtrace (DTrace), depending on which OS we're running.
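The "pid and descendants" part can be sketched independently of the tracer. This assumes a pid → parent pid mapping (here called parent_of, a hypothetical name) that a real tool would build from fork/exec events or from /proc, and events shaped as dicts with a "pid" key, which is also an assumption.

```python
from collections import defaultdict


def descendants(root: int, parent_of: dict[int, int]) -> set[int]:
    """Return root plus every pid whose ancestor chain reaches root."""
    children = defaultdict(list)
    for pid, ppid in parent_of.items():
        children[ppid].append(pid)
    seen, stack = {root}, [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:  # guards against cycles in bad input
                seen.add(child)
                stack.append(child)
    return seen


def filter_events(events: list[dict], root: int, parent_of: dict[int, int]) -> list[dict]:
    """Keep only events whose pid is root or one of its descendants."""
    wanted = descendants(root, parent_of)
    return [e for e in events if e["pid"] in wanted]
```

This works on a static snapshot; a live tracer would have to update parent_of as fork events arrive, otherwise short-lived children get missed.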

Edit:

Prompt 1B: Create a Go package named dbpftrace with a README and docs, then parse the fs syscall events of a pid and its descendants from bpftrace or dtrace stdout, depending on which OS we're running.

  • Prompt 1C: Create a Go package named dbpftrace with a README and docs, then create a cli utility named dbpftrace to:

    parse the fs syscall events of a pid and its descendants (like bpftrace) using libbpfgo and godtrace

    Use either (cilium/ebpf or libbpfgo or gobpf) or (godtrace or (CGO or FFI) bindings to libdtrace) depending on which OS, by default

    cilium/ebpf: https://github.com/cilium/ebpf

    aquasecurity/libbpfgo: https://github.com/aquasecurity/libbpfgo

    iovisor/gobpf w/ bcc: https://github.com/iovisor/gobpf

    chzyer/godtrace: https://github.com/chzyer/godtrace

    oracle/dtrace-utils/tree/devel/libdtrace: https://github.com/oracle/dtrace-utils/tree/devel/libdtrace

    From > awesome-ebpf > Kernel docs, examples, Go libraries:

  • Thanks for the thoughtful pointers — super helpful.

    Where diffwatch is today: it’s a portable directory watcher (fsnotify → inotify/kqueue/ReadDirectoryChangesW) that coalesces events and renders live unified diffs in a tiny TUI.
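For anyone who wants the core idea without the TUI, here's a minimal Python sketch of "keep the last snapshot, emit a unified diff on change" using the standard difflib module. It is not diffwatch's actual code (that's Go); render_diff and DiffTracker are names invented for this example.

```python
import difflib


def render_diff(path: str, old: str, new: str) -> str:
    """Return a unified diff between two versions of a file's text."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(lines)


class DiffTracker:
    """Remember the last seen text per path and diff against it."""

    def __init__(self) -> None:
        self._last: dict[str, str] = {}

    def update(self, path: str, new_text: str) -> str:
        """Store new_text and return the diff from the previous snapshot."""
        old_text = self._last.get(path, "")
        self._last[path] = new_text
        return render_diff(path, old_text, new_text)
```

A watcher would call update(path, text) each time a coalesced event fires; an empty string means nothing changed.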

    What I’m planning based on your suggestions (and others here):

    1. Two-tier design

    Default (no admin): keep the current directory-watch mode for quick, portable use.

    Power mode (attach): diffwatch attach --pid <PID> | --cmd "<…>" to trace a process and its children and feed any touched paths into the same diff UI.

    2. Per-OS backends for “attach”

    Linux: eBPF/bpftrace when available; fallback to strace -ff -e trace=file for zero extra deps.

    macOS: opensnoop (DTrace-based) or fs_usage (kdebug-based).

    Windows: ETW (Kernel File provider) via a tiny helper (e.g., KrabsETW) that streams JSON events.
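For the Linux strace fallback above, the parsing side could look roughly like this Python sketch. It assumes strace's default text output with an optional "[pid N]" prefix, maps syscalls onto the create/write/rename/unlink ops, and skips failed calls and read-only opens. The OPS table and parse_strace_line are hypothetical names; escaped characters in paths are not decoded, and relative paths are passed through as strace printed them.

```python
import re

# Optional "[pid N] " prefix, then call(args) = return-value.
LINE_RE = re.compile(
    r'^(?:\[pid\s+(?P<pid>\d+)\]\s+)?'
    r'(?P<call>\w+)\((?P<args>.*)\)\s+=\s+(?P<ret>-?\d+)'
)
# A double-quoted string, allowing backslash escapes inside it.
PATH_RE = re.compile(r'"((?:[^"\\]|\\.)*)"')

OPS = {
    "creat": "create",
    "rename": "rename", "renameat": "rename", "renameat2": "rename",
    "unlink": "unlink", "unlinkat": "unlink",
}


def parse_strace_line(line: str, default_pid: int = 0):
    """Turn one strace line into {"pid", "op", "path"}, or None to skip it."""
    m = LINE_RE.match(line.strip())
    if m is None or int(m["ret"]) < 0:
        return None  # unparseable, unfinished/resumed, or failed syscall
    paths = PATH_RE.findall(m["args"])
    if not paths:
        return None
    call = m["call"]
    if call in ("open", "openat"):
        flags = PATH_RE.sub("", m["args"])  # look at flags, not the path text
        if "O_CREAT" in flags:
            op = "create"
        elif "O_WRONLY" in flags or "O_RDWR" in flags:
            op = "write"
        else:
            return None  # read-only open: nothing to diff
    else:
        op = OPS.get(call)
        if op is None:
            return None
    pid = int(m["pid"]) if m["pid"] else default_pid
    # For renames the destination (last quoted argument) is the path to diff.
    path = paths[-1] if op == "rename" else paths[0]
    return {"pid": pid, "op": op, "path": path}
```

Note that an open for writing only tells you a write may follow, so treating it as "write" is an approximation.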

    3. Admin rights caveat

    macOS (DTrace) and Windows (ETW kernel) typically require admin, and eBPF on Linux needs root or capabilities such as CAP_BPF and CAP_PERFMON. I’ll keep the default dir-watch mode as the “no-admin” path, and document the elevated-rights requirement clearly for “attach”.

    4. Normalized event stream

    All backends emit a common JSON line: {"ts": "...", "pid": 1234, "op": "create|write|rename|unlink|close", "path": "..."}

    A short stability window (debounce + retry on transient ENOENT) then applies before the file is read for diffing.
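A possible shape for the event line and the stability window, as a Python sketch. The field names follow the JSON line above; the 0.2 s window, the retry counts, and the names Event, Debouncer, and read_with_retry are assumptions made for the example.

```python
import json
import time
from dataclasses import asdict, dataclass


@dataclass
class Event:
    ts: str
    pid: int
    op: str  # one of: create, write, rename, unlink, close
    path: str

    def to_json_line(self) -> str:
        """Serialize to one NDJSON line (without the trailing newline)."""
        return json.dumps(asdict(self))


class Debouncer:
    """Hold each path until it has been quiet for `window` seconds."""

    def __init__(self, window: float = 0.2, clock=time.monotonic):
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self._last_seen: dict[str, float] = {}

    def touch(self, path: str) -> None:
        """Record activity on a path, restarting its quiet period."""
        self._last_seen[path] = self.clock()

    def ready(self) -> list[str]:
        """Pop and return the paths whose quiet period has elapsed."""
        now = self.clock()
        done = [p for p, t in self._last_seen.items() if now - t >= self.window]
        for p in done:
            del self._last_seen[p]
        return sorted(done)


def read_with_retry(path: str, attempts: int = 3, delay: float = 0.05, sleep=time.sleep):
    """Read a file, retrying when it is briefly missing (e.g. mid atomic save)."""
    for i in range(attempts):
        try:
            with open(path, "rb") as f:
                return f.read()
        except FileNotFoundError:
            if i + 1 < attempts:
                sleep(delay)
    return None  # still missing: treat as deleted
```

The main loop would call touch() per incoming event and poll ready() on a timer, reading and diffing only the paths it returns.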

    5. Scalability & ergonomics

    Handle editor/atomic-save tempfiles gracefully.

    Respect .gitignore and add --exclude/--include globs.

    Guardrails for watch count limits; skip non-regular files; optional --record (NDJSON) and --save-patch.
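The filtering items above could start from something like this Python sketch. It uses plain fnmatch globs, which are simpler than real .gitignore semantics (no negation, no directory-anchored rules), and the tempfile patterns are illustrative guesses rather than a vetted list. All function names are hypothetical.

```python
import fnmatch
import os
import stat

# Illustrative editor/atomic-save leftovers; not an exhaustive list.
TEMP_PATTERNS = ("*~", "*.swp", "*.swx", ".#*", "#*#", "*.tmp")


def _matches(path: str, patterns) -> bool:
    """True if the full path or its basename matches any glob."""
    name = os.path.basename(path)
    return any(
        fnmatch.fnmatchcase(path, p) or fnmatch.fnmatchcase(name, p)
        for p in patterns
    )


def is_editor_tempfile(path: str) -> bool:
    """Guess whether a path is an editor swap/backup/temp file."""
    return _matches(path, TEMP_PATTERNS)


def should_watch(path: str, include=(), exclude=()) -> bool:
    """Apply --exclude first, then --include (if any include globs are given)."""
    if _matches(path, exclude):
        return False
    return _matches(path, include) if include else True


def is_regular_file(path: str) -> bool:
    """True only for regular files; symlinks, sockets, FIFOs, dirs are skipped."""
    try:
        return stat.S_ISREG(os.lstat(path).st_mode)
    except OSError:
        return False  # vanished or unreadable: nothing to diff
```

Exclude wins over include here; whether that's the right precedence for the CLI is a design choice still to be made.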

    6. Containers / agents

    Nice follow-on: diffwatch attach --cmd ... inside a container (or attach by PID in the container namespace) to confine the blast radius for agent runs.

    Ask: I’d love help and pointers to minimal tracer scripts:

    A small bpftrace/DTrace snippet that reliably captures opens/writes/renames for a PID(+children).

    A tiny Windows ETW consumer example focused on File I/O, filtered by PID, emitting JSON.

    Repo: https://github.com/deemkeen/diffwatch

    I’ll open issues for:

    “Attach mode” backends (Linux/macOS/Windows)

    .gitignore/globs

    Event coalescing + transient ENOENT handling

    JSON recording / patch export

    If you or anyone else wants to collaborate, I’ll tag the issues good first issue / help wanted and am happy to review PRs quickly. Thanks again for the nudge to go beyond plain FS events; the PID/container “attach” mode should make agent debugging much more robust.
