Comment by Helmut10001
2 days ago
My experiences somewhat confirm these observations, but I also had one that was different. Two weeks of debugging IPsec issues with Gemini. Initially, I imported all the IPsec documentation from OPNsense and pfSense into Gemini and gave it the general context in which I was operating (in reference to 'keeping your context clean'). Then I added my initial settings for both sides (sensitive information redacted!). Afterwards, I entered a long feedback loop, posting logs and asking and answering questions.
At the end of the two weeks, I observed that the LLM was much less likely to become distracted. Sometimes I would dump whole forum threads or SO posts into it, and it would say "this is not what we are seeing here, because of [earlier context or finding]". I eliminated all dead ends logically and informed it of this (yes, it can help with the reflection, but I had to make the decisions). In the end, I found the cause of my issues.
This somewhat confirms what a user here on HN said a few days ago: LLMs are good at compressing complex information into simpler forms, but not at expanding simple ideas into complex ones. As long as my input was larger than the output (in either complexity or length), I was happy with the results.
I could have done this without the LLM. However, it was helpful in that it stored facts from the outset that I had either forgotten or was unable to retrieve quickly in new contexts. It also made it easier to identify time patterns in large log files (the kind of analysis sketched below), which helped me debug my site-to-site connection. I also optimized many other settings along the way, resolving more than just the most problematic issue. So in addition to fixing my problem, I learned quite a bit. The 'state' was only occasionally incorrect about my current parameter settings, but this was always easy to correct. This confirms what others have already seen: if you know where you are going and treat it as a tool, it is helpful. However, don't try to offload decisions onto it or let it direct you in the wrong direction.
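To give a flavor of the time-pattern analysis, here is a minimal sketch of bucketing IPsec log events by minute. The file name and the syslog/charon-style timestamp format are assumptions for illustration, not taken from the actual setup:

```python
import re
from collections import Counter

# Count IPsec log events per minute to surface time patterns
# (e.g. rekey storms or periodic tunnel drops). The file name and the
# syslog/charon-style timestamp format are assumptions for illustration.
STAMP = re.compile(r"^(\w{3}\s+\d+\s\d{2}:\d{2}):\d{2}")  # bucket: "May 12 14:37"

buckets = Counter()
with open("ipsec.log") as log:
    for line in log:
        m = STAMP.match(line)
        if m and "charon" in line:
            buckets[m.group(1)] += 1

# Busiest minutes first; spikes often line up with DPD timeouts
# or rekey intervals on one side of the tunnel.
for minute, count in buckets.most_common(10):
    print(f"{minute}  {count:>5} events")
```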
Overall, about 350k tokens were used (roughly 300k words). Here's a related blog post [1] with my overall path, though it doesn't correspond directly to this specific issue. (Please don't recommend WireGuard; I am aware of it.)
[1]: https://du.nkel.dev/blog/2021-11-19_pfsense_opnsense_ipsec_cgnat/
Recently, Gemini helped me fix a bug in a PPP driver (Zephyr OS) without my having any prior knowledge of PPP, or really of driver development. I would copy-paste logs of raw PPP frames in hex, and it would decode everything and explain the meaning of each byte. In about an hour, I knew enough about PPP to fix the bug and submit a patch.
https://g.co/gemini/share/7edf8fa373fe
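To give a sense of what that decoding involves, here is a minimal sketch of parsing an HDLC-framed PPP frame. The example frame, the protocol table, and the FCS bytes are illustrative assumptions, not material from the actual session:

```python
# A minimal sketch of decoding an HDLC-framed PPP frame (RFC 1662 framing,
# RFC 1661 protocol numbers). The example frame below is made up, and its
# trailing FCS bytes are placeholders, not a valid checksum.
PROTOCOLS = {0xC021: "LCP", 0xC023: "PAP", 0x8021: "IPCP", 0x0021: "IPv4"}

def decode_ppp(frame: bytes) -> None:
    assert frame[0] == 0x7E and frame[-1] == 0x7E, "missing flag bytes"
    body = frame[1:-1]
    addr, ctrl = body[0], body[1]             # should be 0xFF, 0x03
    proto = int.from_bytes(body[2:4], "big")  # e.g. 0xC021 = LCP
    payload, fcs = body[4:-2], body[-2:]      # last two bytes: FCS
    print(f"addr=0x{addr:02X} ctrl=0x{ctrl:02X} proto=0x{proto:04X} "
          f"({PROTOCOLS.get(proto, 'unknown')}) payload={payload.hex()} "
          f"fcs={fcs.hex()}")

# An LCP Configure-Request with no options (code=01, id=01, length=4):
decode_ppp(bytes.fromhex("7EFF03C02101010004ABCD7E"))
```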
Or you could just read the PPP RFC [0].
I’m not saying that your approach is wrong. But most LLM workflows are either brute-forcing the solution or getting stuck in a local minimum. It’s like doing thousands of experiments on falling objects to figure out gravity while there’s a physics textbook nearby.
[0]: https://datatracker.ietf.org/doc/html/rfc1661
Ironically, I could’ve read all 50 pages of that RFC and still missed the actual issue. What really helped was RFC 1331 [0], specifically the "Async-Control-Character-Map" section.
That said, I’m building a product - not a PPP driver - so the quicker I can fix the problem and move on, the better.
[0] https://datatracker.ietf.org/doc/html/rfc1331
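For the curious, a minimal sketch of what that section describes. The function name and example bytes are illustrative assumptions; see RFC 1331/1662 for the real rules:

```python
# Sketch of the Async-Control-Character-Map logic (RFC 1331/1662 octet
# stuffing): accm is a 32-bit map where a set bit n means control
# character n must be escaped on the async line.
FLAG, ESC = 0x7E, 0x7D

def accm_escape(payload: bytes, accm: int) -> bytes:
    out = bytearray()
    for b in payload:
        if b in (FLAG, ESC) or (b < 0x20 and accm & (1 << b)):
            out += bytes((ESC, b ^ 0x20))  # 0x7D, then the byte XOR 0x20
        else:
            out.append(b)
    return bytes(out)

# With the default ACCM of 0xFFFFFFFF, every control character is escaped.
# A mismatch between peers here is a classic source of corrupted frames.
print(accm_escape(b"\x11\x7e\x41", 0xFFFFFFFF).hex())  # -> 7d317d5e41
```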
I could also walk everywhere, but sometimes technology can help.
There’s no way I could fully read that RFC in an hour. And that’s before you even know which reading to focus your attention on; at that point, you’re just being a worse LLM.
Interesting that it works for you. I tried something similar several times with frames from a 5G network, and it mixed fields from 4G and 5G in its answers (or even from non-cellular network protocols, because they had features similar to the 5G protocol I was looking at). Occasionally, the explanation was completely invented or based on discussions of features planned for future versions.
I have really learned to mistrust and double-check every single line those systems produce. Same for writing code: everything they produce looks nice and reasonable on the surface, but when you dig deeper it falls apart, unless it's something very, very basic.
Similarly, I found the results pretty mixed whenever a library or framework with a lot of releases/versions was involved. The LLM tends to mix and match features from across versions.
Yes, it feels like setting the `-h` flag for logs (human-readable).
That's some impressive prompt engineering to keep it on track for that long; nice work! I'll have to try out some longer-form chats with Gemini and see what I get.
I totally agree that LLMs are great at compressing information; I've set up the docs feature in Cursor to index several entire large documentation websites for major libraries and it's able to distill relevant information very quickly.
In Gemini, it is really nice to have the large 1M-token window. However, around 100k tokens it starts to make mistakes and refactor its own code.
Sometimes it is good to start a new chat or switch to Claude.
And it really helps to be very precise in wording the specification of what you want to achieve, or to repeat it occasionally with some added request lines.
GIGO in reality :)
Oh my, I hate it when it rewrites >1k LOC. I have to instruct it to "modify only ..., do not touch the rest" and so forth, but GPT often doesn't listen to this; Claude does. I dunno about Gemini.
LLMs are good at interpolating but bad at extrapolating.
To be fair, all AI/ML and even statistical methods are bad at extrapolating.
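A toy demonstration of that point, with an assumed degree-7 polynomial fit (the degree and ranges are arbitrary illustrative choices):

```python
import numpy as np

# A polynomial fitted to sin(x) on [0, 2*pi] interpolates well inside
# that range and falls apart outside it.
rng = np.random.default_rng(0)
x_train = rng.uniform(0, 2 * np.pi, 50)
coeffs = np.polyfit(x_train, np.sin(x_train), deg=7)

for x in (np.pi, 3 * np.pi):  # inside vs. outside the training range
    print(f"x={x:6.3f}  true={np.sin(x):+.3f}  fit={np.polyval(coeffs, x):+.3f}")
```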