Comment by someoneontenet

1 year ago

I talk to AI like a caveman, mostly. Instead of over-optimizing my prompt I just try to find the minimal representation that gets the LLM to understand my problem and solve it for me, and I've been very productive with this strategy. What would someone like me get out of prompt engineering? Or is it more for things like agents, bots, and assistants?

I'd guess the latter.

My SO wanted Google Assistant at home after getting used to it in our Android Automotive-based car. So I've been dabbling with local LLMs as a learning experience.

I've got one prompt which classifies the question, i.e. whether it's asking about the weather, a math question, a knowledge question, etc. There I ask it to output only the category, so I can easily do different things based on that.
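
As a rough sketch, the classify-and-route part is just this (assuming a hypothetical ask_llm() helper that wraps whatever local inference API you use; the prompt wording is the one from the full example below):

    CATEGORIES = ["knowledge question", "math calculation or equation",
                  "information about public transport",
                  "information about weather", "unknown"]

    def classify(question):
        # ask_llm() is a stand-in for whatever local inference call you use
        prompt = ("Try to categorize the following question in the following "
                  "categories: " + ", ".join(CATEGORIES) + ". Output only the "
                  "category. The question is: " + question)
        answer = ask_llm(prompt).strip().lower()
        # Fall back to "unknown" if the model doesn't reply with a bare category
        return answer if answer in CATEGORIES else "unknown"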

For knowledge-based stuff I include our town and country, tell it to use metric units and be brief, and tell it to ask clarifying questions if needed. If I don't, it'll use miles, or both miles and km, be too long-winded, and assume too much.

For calculations, I've been asking it to output Octave code that computes the answer, giving the result in a specific variable name and without explanation. If it can't, it should output a special sequence. Without that, it'll include explanations of what the code does and be inconsistent with variable naming.
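
The per-category prompts are then just templates filled in before the call. Roughly (a sketch, with TOWN/COUNTRY as placeholders; the exact math prompt is in the example further down):

    TOWN, COUNTRY = "...", "..."  # our location, filled in at startup

    PROMPTS = {
        "knowledge question": (
            "We live in {town}, {country}. Use metric units and be brief. "
            "Ask clarifying questions if needed. The question is: {q}"
        ),
        "math calculation or equation": (
            "Convert the following math question into octave/matlab syntax "
            "which calculates the answer if possible. The output variable "
            "should be x. Only output the code. If the question is unclear or "
            "it's not possible to calculate, output three exclamation points "
            "followed by a brief explanation. The question is: {q}"
        ),
    }

    def answer(question):
        prompt = PROMPTS.get(classify(question))
        if prompt is None:
            return "Sorry, I can't help with that yet."
        # str.format ignores unused keyword arguments, so one call fits all
        return ask_llm(prompt.format(town=TOWN, country=COUNTRY, q=question))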

Been using Gemma 9B so far, which performs well on my aging 2080 Ti. I haven't actually put all the pieces together yet (my SO asked last weekend), but it seems very promising, and adding the extra instructions for each task radically changes the output and makes this approach viable.

Btw, I know there are probably tons of these assistants out there. I just enjoy figuring out how things work.

  • What hardware do you use for audio I/O? And what do you use for activation? I feel the rest is pretty simple to cobble together... Should be a few days of work to surpass Siri...

    • This is the tricky bit, yes. I bought some Pi HATs[1] earlier, and they seem to work fine when testing, but I haven't tried to use them properly yet. Alternatively I have some condenser microphones and an audio card HAT for the Pi.

      So I hope to be able to use that with Pi Zero W. Played around with using PulseAudio so I could add compression plugins and such, which seems to work.
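
      For the capture side, something like the sounddevice Python package makes the recording part trivial (a sketch; sample rate and duration are assumptions):

          import sounddevice as sd
          from scipy.io import wavfile

          FS = 16000  # 16 kHz mono is Whisper's native rate

          def record(seconds=5):
              # Records from the default source (PulseAudio here), so any
              # compression plugins configured there apply transparently
              audio = sd.rec(int(seconds * FS), samplerate=FS, channels=1,
                             dtype="int16")
              sd.wait()
              return audio

          wavfile.write("utterance.wav", FS, record())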

      Haven't actually attempted wake-word detection yet, but the first port of call there will be to try the ones Rhasspy uses[2].

      Since I want to integrate with HA I might end up expanding on Rhasspy, or I might just use parts of it, i.e. once I determine it's a HA command and not, say, a question about dinosaurs, inject the cleaned transcription into the Rhasspy pipeline.
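
      For the HA side, one option (an assumption on my part, I haven't wired this up) is to just POST the cleaned transcription to Home Assistant's conversation API with a long-lived access token:

          import requests

          HA_URL = "http://homeassistant.local:8123"  # assumed hostname
          HA_TOKEN = "..."  # long-lived access token from the HA profile page

          def send_to_ha(text):
              resp = requests.post(
                  f"{HA_URL}/api/conversation/process",
                  headers={"Authorization": f"Bearer {HA_TOKEN}"},
                  json={"text": text},
              )
              resp.raise_for_status()
              return resp.json()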

      [1]: https://www.seeedstudio.com/ReSpeaker-2-Mics-Pi-HAT.html

      [2]: https://rhasspy.readthedocs.io/en/latest/wake-word/


  • Do you mind sharing your code at some point? Is it using home automation? How are you asking it anything? Is it via your phone, or have you set up a speaker device, or what?

    • If I manage to make something useful I'll certainly do that.

      Planning on using small Pi Zero 2-based "boxes" with a microphone, see my other reply. Though I've only done cursory tests.

  • Just for illustration, here's a concrete example. As noted I'm a total n00b and just playing around, but I think it's illustrative, including the limitations of these LLMs.

    First I used a Norwegian-tuned Whisper large model to transcribe some audio. The audio was intentionally unfocused, to simulate a real session. It outputs English, so it does the translation directly as well, hence the somewhat weird sentences and the use of "cake shape" rather than "cake pan". The output from Whisper was as follows:

    OK. Yes, think I have a cake shape. I have a recipe for a cake shape ... Wait a minute. So, I have a recipe meant for a cake shape of 30 cm. I will use a cake shape of 24 cm. How much less do I have to do the recipe to ... That it should fit.
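
    (With the reference whisper package, that step looks roughly like this; I actually load a Norwegian-tuned large checkpoint instead of the stock one, but the translate-to-English part is the same:)

        import whisper

        # task="translate" makes Whisper emit English directly,
        # which is why the transcript reads like a rough translation
        model = whisper.load_model("large")
        result = model.transcribe("utterance.wav", task="translate")
        print(result["text"])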

    For the following I use Gemma 9B as mentioned.

    First I pass it through a cleanup step:

    Input> The following statement is an audio transcript of a question or statement. It may contain errors, corrections and unrelated utterances. Rewrite it so that it is suitable for an email or text message. Remove superfluous or unrelated parts. Do not extrapolate beyond what is in the transcript. The transcript is: [Whisper output]

    Output> I have a cake recipe for a 30cm cake shape. I want to use a 24cm cake shape. How much less do I need to make the recipe so it fits?

    Next I pass it through a classification step:

    Input> Try to categorize the following question in the following categories: knowledge question, math calculation or equation, information about public transport, information about weather, unknown. Output only the category. The question is: [cleaned output]

    Output> math calculation or equation

    For math questions I pass it to the following prompt:

    Input> Convert the following math question into octave/matlab syntax which calculates the answer if possible. The output variable should be x. Only output the code. If the question is unclear or it's not possible to calculate, output three exclamation points followed by a brief explanation. If you need additional information, include that in the explanation. The question is: [cleaned output]

    Output> !!! The question requires calculating the volume ratio between two cake shapes, which is not possible with only the provided dimensions.

    So we hit a limitation. For people it's obvious that two cake pans would have similar heights, and thus the answer is calculable. However, we didn't tell it that it could make assumptions, so let's try that:

    Input> Convert the following math question into octave/matlab syntax which calculates the answer if possible. The output variable should be x. Only output the code. You can make some conservative assumptions, but if the question is unclear or it's not possible to calculate, output three exclamation points followed by a brief explanation. If you need additional information, include that in the explanation. If you make some assumptions, include that as comments at the start of the code. The question is: [cleaned output]

    Output>

        % Assuming the cake recipe volume is proportional to the cake shape volume
        r1 = 30/2; % Radius of the 30cm cake
        r2 = 24/2; % Radius of the 24cm cake
        h = 10; % Height of the cakes
        vol1 = pi * r1^2 * h; % Volume of the 30cm cake
        vol2 = pi * r2^2 * h; % Volume of the 24cm cake
        x = (vol1 - vol2) / vol1; % Ratio of reduction needed
    

    So we got a satisfactory answer. However, here again you see limitations. It doesn't truly understand math. There's no need to assume an explicit height, as the heights cancel out in the final calculation: x = 1 - (r2/r1)^2 = 1 - (24/30)^2 = 0.36, i.e. reduce the recipe by 36% regardless of height.

    Ok, let's try that last prompt on a different and ambiguous question:

    What's the cutoff frequency for an LC circuit with R equals 500 ohm and C equals 10 nanofarad?

    Output>

        ```octave
        % Assuming L is not given, we cannot calculate the cutoff frequency.
        % The cutoff frequency formula requires the inductance (L) value.
        !!! Need the inductance (L) value to calculate the cutoff frequency.
        ```
    

    First, it correctly determined the prompt was ambiguous to the point of not being answerable. However, it also suddenly included Markdown fences, which I didn't tell it to do and which it didn't do in the previous case. So maybe the prompt needs to be even more explicit...
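
    So the calling code needs to be defensive anyway: strip stray fences, check for the !!! sentinel, and only then hand the code to Octave. A sketch of what I have in mind:

        import re
        import subprocess

        def run_generated(code):
            # Strip Markdown fences the model sometimes adds unprompted
            code = re.sub(r"```[a-z]*", "", code).strip()
            if "!!!" in code:
                # The sentinel can follow comment lines, as in the LC example
                return None, code.split("!!!", 1)[1].strip()
            # Let Octave evaluate the snippet and print the result variable x
            result = subprocess.run(
                ["octave", "--quiet", "--eval", code + "; disp(x)"],
                capture_output=True, text=True, timeout=10,
            )
            return result.stdout.strip(), None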

If you're trying to ship something to production that has consistent behavior within bounds and handles edge cases, you'll need to do quite a bit of work. For ChatGPT use, your strategy works fine.

  • E.g. the way a librarian Googles stuff vs how an average user does. Googling "highest average income USA state" is fine for an average user. A research librarian would need to be more sophisticated because they're being asked to solve more sophisticated problems.

I'm starting to use LLMs more and more for technical/programming questions, but my success rate so far is only about 50/50. I haven't tried a lot of them, but so far Copilot is my least favorite. When I ask it a technical question, it seems to give me answers to a question /sort of like/ but not exactly like what I was asking, essentially treating every prompt like an X/Y problem. That's much more frustrating than just telling me my question is rare enough that it doesn't know the answer.

I actually found out that sometimes the less specific you are, the better the result. Sometimes even giving them "drunken" prompts with just a few keywords is enough, just something to let them understand where you're trying to get to, and they fill in the rest. If you specify too much they tend to hyperfixate on things.

I imagine you might do well to write your prompt with similar language (diction, sentence construction, etc.) to what you'd find in the output/answer you are attempting to evoke from the LLM. These are prediction machines, after all.

I mostly use Gemini and I basically treat it like a Google search most of the time. I find all the pronouns and sentence structure are usually completely unnecessary.

Every now and then Gemini decides I'm doing something unsafe and I need to get creative.

These complex system prompts are mostly there to somehow restrict what users can get the bots to do, and then users try even more complex responses to try to get around that.

Yep.

With Claude - “Figure out what is going on here/do you see why I’m trying to do…”

With OpenAI - Sometimes being crazy detailed is the only way I can get it to comply

I do something similar to you. I was commenting on this in another thread:

From the visual codeViz thread ---

https://news.ycombinator.com/item?id=41393458

...

I've been wanting to have a GPT directly inside Blender to talk Geometry Nodes, because I want to tie geometry nodes to external data, running as Python inside Blender, which draws object geometry that suitably shows/diagrams out the nodes of my game. I'm slowly piecing together 'The Oligarchs', an updated Illuminati-style game, but with updates using AI to create nodes directly from Oligarchs' IRL files, such as their SEC filings, the Panama Papers, and everything the tools on HN are suited to creating. I went to school for Softimage & Alias|WAVEFRONT (which became MAYA) animation in 1995 :-)

So I like your DNA.

I want to unpack the relationships of the Oligarchs, programmatically, with hexagonal nodes, similar to this[0], but driven by a node-based, Python-blocks, GraphQL hierarchy. And I am slowly learning how to get GPT bots to spit out the appropriate elements for me to get there.

[0] - https://www.youtube.com/watch?v=vSr6yUBs8tY

(I've posted a bunch of disjointed information on this on HN, more specifically about how to compartmentalize GPT responses and code, how to drive them to write code using a style guide, and how to gather data using structured rules for how the outputs need to be presented.)

EDIT:

I throw commands at it like this, where I tell it to "give me a ps1 that sets up a FastAPI directory structure, creates the venv, touches the correct files, gives me a readme, and follows the best practices for FastAPI from [this github repo from netflix]".

And it gave me that script...

Then, the screenshots below show what I do when I want to document it: I'll take that script and tell it to give me a webUI to run it and invoke it, and to add logging and dashboards.

I do this to practice making tooling logic doo-dads on the fly, and then iterate through them.

https://i.imgur.com/7YOjJf8.png

https://i.imgur.com/KecrvfZ.png

https://i.imgur.com/tKYsmb9.png

https://i.imgur.com/nCGOfSU.png

https://i.imgur.com/ayDrXZA.png

Etc -- I always make it diagram. Now I can throw a bunch of blocks in a directory and tell it to grab the components from the directory and build [THIS INTENT].app for me.