Comment by volkercraig

10 hours ago

I don't think you understand what an "LLM" is. They're text generators. We've had autopilot since the 1930s that relies on measurable things... like PID loops, direct sensor input. You don't need the "language model" part to run an autopilot, that's just silly.

You seem to be talking past him and ignoring what he's actually saying.

LLMs are a higher-level construct than PID loops. With something like an autopilot I can give the controller a command like 'Go from A to B' and chain commands like that to accomplish a task.

With an LLM in the loop I can give the drone a complex command that I'd never be able to encode for a controller alone: "Fly a grid over my neighborhood, document the location of every flower garden, and take pictures of each one."

And if an LLM is just a 'text generator', then it's a pretty damned spectacular one, because it can take free-form input and turn it into a set of useful commands.
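To make the "free-form input into useful commands" point concrete, here is a minimal sketch of what that translation layer might look like. It assumes an OpenAI-style chat API; the command vocabulary, model name, and `plan_mission` helper are made up for illustration and aren't anything from OP's project.

```python
# Hypothetical sketch: translate a free-form mission into structured commands
# that a conventional autopilot / PID layer could execute. The command schema
# here (goto, grid_survey, photograph) is invented for illustration.
import json
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "Translate the user's mission into a JSON array of drone commands. "
    "Allowed commands: goto(lat, lon, alt_m), grid_survey(bounds, spacing_m), "
    "photograph(label). Output JSON only, no prose."
)

def plan_mission(mission_text: str) -> list[dict]:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": mission_text},
        ],
    )
    return json.loads(resp.choices[0].message.content)

# The structured plan is then handed to the existing low-level controller:
for cmd in plan_mission("Fly a grid over my neighborhood and photograph every flower garden"):
    print(cmd)  # e.g. {"cmd": "grid_survey", "bounds": [...], "spacing_m": 20}
```

The LLM only does the language-to-plan step; the PID loops and sensors still do the actual flying.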

  • They are text generators, and yes they are pretty good at it, but that really is all they are: they don't actually learn and they don't actually think. Every "intelligence" feature from every major AI company relies on semantic trickery and context-window management. It even says it right on the tin: Large LANGUAGE Model.

    Let me put it this way: what OP built is an airplane in which the pilot doesn't have a control stick but does have a keyboard, and they type commands into the airplane to fly it. Involving language there is a silly, unnecessary step.

    Now what you're describing is a language problem, which is orchestration, and that is more suited to an LLM.

    • "they don't actually learn"

      Give the LLM agent write access to a text file to take notes and it can actually learn. Not really reliable, but some people seem to get useful results (rough sketch below). They ain't just text generators anymore.

      (but I agree that a keyboard does not seem like the smartest way to control a plane)
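As a rough illustration of the notes-file idea in the reply above: the agent gets a tool to append observations to a plain text file, and the file is read back into the context on the next run. The file name and tool functions here are hypothetical, not from any particular framework.

```python
# Minimal sketch of "memory" via a notes file: nothing in the model's weights
# changes; the agent just re-reads its own notes at the start of each session.
from pathlib import Path

NOTES = Path("agent_notes.txt")  # hypothetical location

def write_note(text: str) -> str:
    """Append a note; exposed to the LLM as a tool call."""
    with NOTES.open("a", encoding="utf-8") as f:
        f.write(text.rstrip() + "\n")
    return "noted"

def read_notes() -> str:
    """Everything remembered so far, prepended to the next prompt."""
    return NOTES.read_text(encoding="utf-8") if NOTES.exists() else ""

write_note("Garden at 3rd and Main photographed; battery drops fast in wind.")
print(read_notes())
```

Whether that counts as "learning" is exactly the argument above, but it does persist across sessions.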

LLMs can do chat completion, but chat completion isn't all they can do. There are multimodal models for images, voice, video, and probably more. The drone's camera feeds images to the LLM, which then decides what action to take based on what it sees. It's similar to asking ChatGPT "there is a tree in this picture; if you were operating a drone, what action would you take to avoid a collision?", except that the "there is a tree" part comes from the LLM's own image recognition and the system prompt is "recognize objects and avoid collisions". I'm simplifying a lot, of course, but it is essentially generating navigational directions from visual context using image recognition.
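For what that loop might look like in code, here is a rough sketch: one camera frame plus a short system prompt goes to a vision-capable chat model, which answers with a single avoidance action. The model name, prompt, and action vocabulary are assumptions for illustration, not OP's actual setup.

```python
# Hypothetical sketch: ask a vision-capable chat model for an avoidance action
# given the drone's current camera frame.
import base64
from openai import OpenAI

client = OpenAI()

def avoidance_action(jpeg_bytes: bytes) -> str:
    frame_b64 = base64.b64encode(jpeg_bytes).decode()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "You are piloting a drone. Recognize obstacles and reply "
                        "with exactly one of: CLIMB, DESCEND, YAW_LEFT, YAW_RIGHT, HOLD."},
            {"role": "user", "content": [
                {"type": "text", "text": "Current forward camera frame:"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"}},
            ]},
        ],
    )
    return resp.choices[0].message.content.strip()

# A tree filling the frame should come back as something like CLIMB or YAW_LEFT;
# the low-level flight controller then turns that into motor commands.
```

That's the sense in which the model is "generating navigational directions" rather than chat text.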

Maybe this is my confusion: is this simulator just flying from point A to point B? It seems to be handling collision avoidance while trying to locate and identify the targets. That seems quite a bit more complex than what you're describing as having been solved since the 1930s.

"You don't need the "language model" part to run an autopilot, that's just silly."

I think most of us understood that reproducing what an existing autopilot can do was not the goal. My inexpensive DJI quadcopter has impressive abilities in this area as well. But I cannot give it a mission in natural language and expect it to execute it. Not even close.