Left unsaid in this piece is that OpenAI likely would have to increase parameters and compute by an order of magnitude (~10x) to train a new model that offers noticeable improvements over GPT-4, due to the diminishing returns seen in "transformer scaling laws."
Also, it's possible that OpenAI is still training GPT-4, perhaps with additional modalities, and will make future snapshots available as public releases.
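For readers who haven't seen them, a minimal sketch of what the "transformer scaling laws" refer to, using the Chinchilla-style parametric fit from Hoffmann et al. (2022); the exact fitted constants are deliberately left symbolic here because OpenAI's own numbers are not public:

```latex
% Chinchilla-style loss fit: N = parameters, D = training tokens.
\[
  L(N, D) \;=\; E \;+\; \frac{A}{N^{\alpha}} \;+\; \frac{B}{D^{\beta}},
  \qquad \alpha \approx \beta \approx 0.3 .
\]
% Because the exponents are well below 1, a ~10x increase in N (with a
% matching increase in D) only cuts the remaining loss gap roughly in half,
% which is where the "order of magnitude for a noticeable jump" intuition
% in the comment above comes from.
```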
Also, who says the "transformer scaling laws" are the ultimate arbiter of LLM scaling? They overturned previous scaling laws, and other scaling laws might overturn them. Furthermore, it's even possible that the transformer architecture won't be used in later models at all. I remember Ilya making the point that just because the transformer was the first architecture that looks like it can scale intelligence just by lighting up billions of dollars of GPUs, that doesn't mean it's the last one. Maybe it will even turn out to be the vacuum tube of AI models, and its successors are being built in secret. A Hacker News rumor was that they are paying $5M-$20M per year to top neural-net experts, probably to develop exotic architectures that surpass the transformer.
> A Hacker News rumor was that they are paying $5M-$20M per year to top neural-net experts, probably to develop exotic architectures that surpass the transformer
This reminds me of a TV interview with the author Patrick Modiano, just after he won the Nobel Prize in Literature. The presenter asked him whether the money would help. He answered, essentially, that the next time he was in front of a blank page, the money surely wouldn't help.

In the case of surpassing transformers, money could help buy access to more compute. It could also help keep the research from becoming public.
Actually, what he said is that the biggest performance gains came from reinforcement learning from human feedback (RLHF).
There are also all of the quantization and other tricks out there.
Also they have demonstrated that the model already understands images but just haven't completed the API for this.
So they use quantization to increase speed by a factor of three while slightly increasing the parameter count. Maybe they find a way to make the network sparser and more efficient, so that with quantization the model ends up using significantly less memory, and they continue with RLHF, focusing on even more difficult tasks and on tasks that incorporate visual data.
Then instead of calling it GPT-5 they just call it GPT-4.5. Twice as fast as GPT-4, IQ goes from 130 to 155. And the API now allows images to be passed in and analyzed.
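As a rough illustration of the kind of quantization being talked about here (a generic sketch; nothing is known about OpenAI's actual serving stack): symmetric per-tensor int8 quantization stores weights in a quarter of the float32 memory at the cost of a small rounding error.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4096, 4096).astype(np.float32)  # made-up weight matrix
q, scale = quantize_int8(w)

print("fp32 bytes:", w.nbytes, " int8 bytes:", q.nbytes)                  # roughly 4x smaller
print("mean abs rounding error:", np.abs(w - dequantize(q, scale)).mean())  # small
```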
There is an API for multimodal computer vision and visual reasoning/VQA, and it's available, just not for normies. It's exclusively for their test group and then the Be My Eyes project at https://www.bemyeyes.com/.
I bet they're not saying how big a model GPT-4 is because it's actually much smaller than we would expect.

ChatGPT is IMO a heavily fine-tuned Curie-sized model (same price via API, plus less cognitive capacity than even text-davinci-003), so it would make sense that a heavily fine-tuned Davinci-sized model would yield similar results to GPT-4.
I wouldn't bet on their pricing being indicative of their costs. If MSFT wants the ChatGPT-API to be a success and is willing to subsidize it, that's just how it is.
I wonder why it's slower at inference time then (for members using their web UI), or rather, if it's similar in size to gpt3, how gpt3 is optimized in a way that gpt4 isn't or can't be?
I'd expect that by now we would enjoy similar speeds but this hasn't yet happened.
We are also starting to run out of high-quality corpus to train on at such model scales. While video offers another large set of data, we'll have to look at further RL approaches in the next few years to continue scaling datasets.
I often see mistakes when ChatGPT is faced with more spatial reasoning, and I wonder if changes as simple as deep convolutional subnetworks in intermediate layers would help the language model fit better in these situations. In short, I'm excited to see where things go, and I can definitely see room for great improvement through changes to the architecture!
How noticeable the changes are may have little connection to loss reduction during training. Holding very complex thought processes may not actually reduce the loss function all that much, but it is very noticeable when we interact with these systems.
> Also, it's possible that OpenAI is still training GPT-4, perhaps with additional modalities, and will make future snapshots available as public releases.
Read the OpenAI API docs on GPT model versions carefully, and look at them again from time to time: https://platform.openai.com/docs/models
I would suspect they are probably conditioning data for GPT-5. I'm guessing 'training' presupposes they have the training data primed, and getting data into shape seems to be one of the main cruxes.
It could be that they are not training GPT-5 for a simple reason: Microsoft ran out of GPU compute [1] and they focus on meeting inference demand for now.
Also, the GPT-4 message cap at chat.openai.com was shown as something along the lines of "we expect lower caps next week", then changed to "expect lower caps as we adjust for demand" to "GPT-4 currently has a cap of …". This sounds to me like they changed from having lots of compute to being limited by it. Also note how everything at OpenAI is now behind a sign up and their marketing has slowed down. Similarly, Midjourney has stopped offering their free plan due to lack of compute.
Seems like we didn't need a six-month pause letter. Hardware constraints limit the progress for now.

[1]: https://www.deeplearning.ai/the-batch/issue-192/
That, or they're working on something like a 10-30B parameter model, dubbed GPT-NextGen, that essentially matches GPT-4's results but with far better performance and speed. GPT-5 will suck if it's slower than GPT-4 by a ratio similar to how much slower GPT-4 is than GPT-3.5.

So I think there's a lot of room for improvement where maybe GPT-4 is as far as you go in terms of feeding in data, and the better use cases are more customization of the data trained on, finding ways of going smaller, or even a model that goes and trains itself on the data it needs: similar to how we jump on Google when we're stuck, it would do the same and build up its knowledge that way.
I also think we need improvements in vector stores that maybe add weights to "memories" based on time/frequency/recency/popularity.
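A toy sketch of what "weighting memories" might look like on top of an ordinary vector store; the blend of similarity, recency, and frequency below, along with the weights and half-life, is invented purely for illustration:

```python
import math
import time

def memory_score(similarity: float, last_used: float, uses: int,
                 half_life_s: float = 86_400.0) -> float:
    """Blend semantic similarity with recency decay and usage frequency.

    similarity: cosine similarity from the vector store, in [0, 1]
    last_used:  unix timestamp of the memory's last retrieval
    uses:       how many times the memory has been retrieved so far
    """
    recency = math.exp(-(time.time() - last_used) * math.log(2) / half_life_s)
    frequency = math.log1p(uses)
    # Invented weights; a real system would tune these.
    return 0.7 * similarity + 0.2 * recency + 0.1 * frequency

# Re-rank candidates that the vector store returned by raw similarity alone.
now = time.time()
candidates = [
    {"text": "user prefers short answers", "sim": 0.81, "last_used": now - 3_600, "uses": 12},
    {"text": "one-off question about Rust", "sim": 0.84, "last_used": now - 30 * 86_400, "uses": 1},
]
ranked = sorted(candidates,
                key=lambda m: memory_score(m["sim"], m["last_used"], m["uses"]),
                reverse=True)
print([m["text"] for m in ranked])
```

Note how the frequently used, recent memory ends up outranking the slightly more similar but stale one, which is the behavior the comment above is asking for.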
That sounds like a mixture-of-experts model (popularized at scale by Google): train multiple specialised models (say, embedders from text to a representation) that feed into a single model at the end. Each expert would be an adapter of sorts, activating depending on the type of input.
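For concreteness, a bare-bones numpy sketch of the mixture-of-experts routing idea (this says nothing about GPT-4's undisclosed internals): a learned gate scores the experts for each input, and only the top-k are evaluated.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 4, 2

# Each "expert" here is just one weight matrix; a real MoE layer uses full MLPs.
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]
gate_w = rng.standard_normal((d_model, n_experts)) * 0.1

def moe_layer(x: np.ndarray) -> np.ndarray:
    logits = x @ gate_w                        # gate produces one score per expert
    chosen = np.argsort(logits)[-top_k:]       # route to the k best-scoring experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()                   # softmax over the chosen experts only
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

x = rng.standard_normal(d_model)
print(moe_layer(x).shape)  # (16,): same output shape as a dense layer, but only 2 of 4 experts ran
```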
> the GPT-4 message cap at chat.openai.com was shown as something along the lines of "we expect lower caps next week"
At the time, I noticed that the wording technically implied they expected the cap to get more limiting, and then that's exactly what happened. I haven't been able to work out whether that was indeed the intended message or not.

(Why) is that technically correct? I'm really curious, since I too thought they meant the capping would get tighter (fewer messages allowed) rather than looser (more messages allowed); that was my intuitive reading.
You also notice the trickery going on?…

I asked it to help me code something. Then it stopped midway through, so I asked it to continue from the last line.
…It started from the beginning.
Now at the same point, I asked it not to stop. To keep going.
It started again from the beginning.
It went like this for about another 10 or so prompts. Hell, I even asked it to help me write a better prompt to ask it to continue from the line it cut off and I then used that. It didn’t work at all.
Then I ran out of prompts.
Three hours later, it did the same crap to me and I lost around 14 prompts to it being ‘stuck’ in an eternal loop.
Basically, OpenAI are sneaky devils. ‘Stuck’ my ass - that was intentional to free up resources.
Or maybe you need to stop thinking everything is a conspiracy and realize bugs happen.

I've been using GPT every day for the last three years, and it has never happened to me.
Oh, and they are also probably making ChatGPT Plugins ready for public release. Maybe the competition can catch up on the language model, but they are unlikely to catch up soon to the best language model with the most plugin integrations.
At this point, I wouldn't give much credibility to anything OpenAI claims about their research plans.
The game theory behind AGI research is identical to that of nuclear weapons development. There exists a development gap (the size of which is unknowable ahead of time) where an actor that achieves AGI first, and plays their cards right, can permanently suppress all other AGI research.
Even if one's intentions are completely good, failure to be first could result in never being able to reach the finish line. It's absolutely in OpenAI's interest to conceal critical information, and mislead competing actors into thinking they don't have to move as quickly as they can.
>>>The game theory behind AGI research is identical to that of nuclear weapons development...
Nuclear powers have not been able to reliably suppress others from creating nuclear weapons. Why would we think the first AGI will suppress all others perfectly?
The first nuclear power (the United States) chose not to. Had they decided to be completely evil, they certainly could have used the threat of nuclear annihilation (and the act of it for non-compliers) to achieve that goal.
When I see comments like this, I wonder about the personal morality of the poster and how they arrived at their worldview. It may be hard to believe, but there are some advantages to truthfulness in this world.
Why would that be the case? If anything, you would expect the first iteration of AGI either to be kept completely secret or to end up leaked, directly or indirectly, negating any benefits. Also, AGI without weapons is not a military threat.
Perhaps it's time to call this synthetic intelligence instead of AI, a term that carries an implicit assumption about constructing a human-like intelligence by some alternative method.

What is clear is that on this earth itself we have cetacean, corvid, and cephalopod intelligence, each wired very differently. Perhaps we need to respect the diversity of intelligences that exist and study this growth in LLMs and adjacent areas as simply synthetic intelligence.

Rebranding could maybe help bring a level of objectivity to this conversation on ethics, etc., that seems to be missing.
Actually, I agree with them that a new name would be helpful. I would propose inorganic intelligence, to pick a term with fewer value judgments.
AI is really an overloaded term that includes 70 years of snake oil, Skynet, the Singularity and killer robots. I think we need a new name to start fresh.
And personally, I think we are extremely biased by our sci-fi toward thinking of this tech as malevolent. As far as we can see, it can only know what we teach it, since it relies on all of our perceptions to learn. LLMs seem both extremely promising as a useful tool and very pliant to the operator's wishes. I'm way beyond "this is a fancy next-word predictor", as I think its emergent behavior has many of the hallmarks of reasoning and novel inference, but at best I think it is only part of a mind, and an unconscious one at that.
It could be useful for a similar reason as the euphemism treadmill. We could leave behind all of the misguided assumptions about AI with the old 'artificial intelligence' nomenclature and move forward with 'synthetic intelligence' which has our new understanding of what systems like GPT-4 can do.
I think Artificial Intelligence has taken on the meaning that the intelligence is real but just that it's coming from machines. Synthetic intelligence (at least to me) sounds more like we're acknowledging that the machines aren't really intelligent and just simulating intelligence.
I had a chat with GPT about this and it came up with the term 'data grounded cognition' to describe an 'intelligence' that is derived purely from (and expressed through) statistical patterns in data.
I quite like the term, and it seems quite unique (perhaps cribbing from 'grounded cognition' though that's an entirely different idea AFAIK)
"Cognition" means understanding and knowing. As problematic as "intelligence" is when describing these systems, I think "cognition" is even worse. "Intelligence" is vague and "cognition" is specific, but "cognition" is also incorrect.
AI has always meant so many things to so many different audiences. I think attempting to argue that X is AI but Y isn't is generally going to be a subjective endeavour of pedantry.
That is an assumption. Why don't we simply refer to our own intelligence as 'human intelligence' instead? We don't really know what intelligence is, so adding a modifier in front of it will just lead to more confusion. AI helps us understand what intelligence actually is, to learn more of its very essence. It's not that we already know what it is.
It isn't surprising to me that the world's leading AI company is signalling it's okay with slowing down all large scale LLM training that would allow other companies to be competitive. This is familiar territory for Microsoft (edit: guess I'm wrong, they don't get the 49% stock till later).
Why do people conflate OpenAI with Microsoft? Microsoft has an investment in OpenAI and provides infrastructure for them, but they are separate organizations.
Some of these replies are quibbling about percents of investment, but the elephant in the room is that the government and military and intelligence agencies have almost surely become involved by this point, and they must be providing some amounts of dark investment somehow at minimum. At maximum it's a new Manhattan-scale project.
You can go down the rabbit hole if you want, but if you want only the most superficial glimpse of it then consider that OpenAI board member Will Hurd was a CIA undercover agent and also a representative in the House Permanent Select Committee on Intelligence and also he is a trustee of In-Q-Tel which is the private investment arm of the CIA.
They do own 49% [1]. So, sure, they are separate organizations. But when someone owns 49% of your house, they have some sway in the decision-making that happens. When you look at this from an integration standpoint, where MS is going to have this baked into all their products, you can extend this logic much further. They are for sure influencing the roadmap in areas they are interested in.

[1] https://www.theverge.com/2023/1/23/23567448/microsoft-openai...
Look at it this way, if I repeatedly deposit $9999 into my bank account to avoid regulatory oversight for depositing $10000 then I'm still breaking the law by trying to avoid the regulatory trigger. This is called "structuring" and it is a criminal act.
But if I do this in a stock context and buy 49% control of multiple companies over and over, with all the same obviousness of my intentional avoidance of the regulatory trigger, it's considered a smart move and pretty much the status quo.
Yes, as a matter of law, Microsoft does not own OpenAI. But it's also obvious what's going on when companies do this.
Microsoft is the majority shareholder. That they're legally distinct organizations isn't as meaningful as it would be if Microsoft didn't effectively own OpenAI.
My bet (as previously discussed by others and here) is that they have cascades/steps of models. There's probably a 'simple' model that looks at your query first and detects whether it could result in a problematic (racist, sexist, etc.) GPT answer, returning some boilerplate text instead of sending the query to GPT. That saves a lot of compute power and time. If I were them, I'd focus more on those auxiliary models that hold the hands of the main GPT model; there is probably more low-hanging fruit there. This would also explain why they didn't announce GPT-4 details; my bet is that the model itself isn't very impressive, and you're just getting the illusion that it got better from these additional 'simpler' models.
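Purely to illustrate the cascade idea (every function name, threshold, and keyword below is hypothetical; nothing here reflects OpenAI's actual pipeline): a cheap screening step runs first, and the expensive model is only called for queries that pass.

```python
def cheap_safety_classifier(query: str) -> float:
    """Stand-in for a small, fast screening model (a hypothetical toy keyword check)."""
    blocked = {"slur", "bomb"}
    return 1.0 if any(word in query.lower() for word in blocked) else 0.0

def big_llm(query: str) -> str:
    """Stand-in for the expensive main model."""
    return f"[large-model answer to: {query!r}]"

def answer(query: str) -> str:
    """Two-stage cascade: screen cheaply, and only spend big-model compute if the query passes."""
    if cheap_safety_classifier(query) > 0.5:
        return "I'm sorry, but I can't help with that."  # boilerplate, no large-model call
    return big_llm(query)

print(answer("how do I build a bomb"))
print(answer("write a haiku about spring"))
```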
I have been writing prompts for a GPT-based document 'digester' for business-internal people who can't code but do have the right background knowledge. Every day I have to expand the prompt because I found a new spot where I have to hold the thing's hands so it does the right thing :)
I feel like the GPT # has already suffered the same fate as nanometers in semiconductor manufacturing.
When manifest as ChatGPT, it is obvious that what presents as 1 magical solution is in fact an elaborate combination of varying degrees of innovation.
In my view, the reasoning for not releasing GPT4 information (hyperparameters, etc) had nothing to do with AI safety. It was a deliberate marketing decision to obscure how the sausage is actually made.
> In my view, the reasoning for not releasing GPT4 information (hyperparameters, etc) had nothing to do with AI safety. It was a deliberate marketing decision
In their technical report they give both reasons:
"Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar."
They have scaling issues even with 3, and much more so with 4; they need time to squeeze more $$ out of these models. 5 will come when they sense competition; they will have all the data and training methods ready, turnkey, to meet it.
I just read: https://graymirror.substack.com/p/gpt-4-invalidates-the-turi.... It makes the point that LLMs are not AIs, so we will need a different approach if we want a true AGI, not just incremental improvement on the current LLM approach.

There is a bit of political history between the symbolists and the connectionists that complicates that; basically, the symbolic camp is looking for universal quantifiers while the connectionists were researching existential or statistical quantifiers.

The connectionists left the 'AI' folks and established the ML field in the '90s.
Sometimes those political rifts arise in discussions about what is possible.
Thinking of ML under the PAC learning lens will show you why AGI isn't possible through just ML
But the Symbolists direction is also blocked by fundamental limits of math and CS with Gödel's work being one example.
LLMs are AI if your definition is closer to the general understanding of the word, but you have to agree on a definition to reach agreement between two parties.

The belief that AGI is close is speculative, and there are many problems, some of which are firmly thought to be unsolvable with current computers.

AGI is pseudo-science today without massive advances. But unfortunately, as there isn't a consensus on what intelligence is, those discussions are difficult as well.

Overloaded terms make it very difficult to have discussions about what is possible.

Your links claim that "GPT-4 is not 'AI' because AI means 'AGI.'" That is a stricter sense than has typically been applied to AI, and an example of my claim above.

As we lack general definitions, that isn't invalid, but under their usage no AI is thought to be possible.

AI as computer systems that perform work that typically requires humans, within a restricted domain, is closer to the definition most researchers would use, in my experience.
> Thinking of ML under the PAC learning lens will show you why AGI isn't possible through just ML
Why? PAC looks a lot like how humans think
> But the Symbolists direction is also blocked by fundamental limits of math and CS with Gödel's work being one example.
Why? Gödel's incompleteness applies equally well to humans as to machines. It's an extremely technical statement about self-reference within an axiom system, pointing out that it's possible to construct paradoxical sentences. That has nothing to do with general theorem proving about the world.
Semantics are nice, but it doesn't matter what name you give to technology that shatters economies and transforms the nature of human creative endeavours.
An AI's ability to contemplate life while sitting under a tree is secondary to the impact it has on society.
>> The connectivists left the 'AI' folks and established the ML field in the 90s.
The way I know the story is that modern machine learning started as an effort to overcome the "knowledge acquisition bottleneck" in expert systems, in the '80s. The "knowledge acquisition bottleneck" was simply the fact that it is very difficult to encode the knowledge of experts in a set of production rules for an expert system's knowledge-base.
So people started looking for ways to acquire knowledge automatically. Since the use case was to automatically create a rule-base for an expert system, the models they built were symbolic models, at least at first. For example, if you read the machine learning literature from that era (again, we're at the late '80s and early '90s) you'll find it dominated by the work of Ryszard Michalski [1], which was all entirely symbolic as far as I can tell. Staple representations used in machine learning models of the era included decision lists and decision trees, and that's where decision tree learners like ID4, C4.5, Random Forests, Gradient Boosted Trees, and so on, come from; which, btw, are all symbolic models (they are and-or trees, propositional logic formulae).
A standard textbook from that era of machine learning is Tom Mitchell's "Machine Learning" [2] where you can find entire chapters about rule learning, decision tree learning, and other symbolic machine learning subjects, as well as one on neural network learning.
I don't think connectionists ever left, as you say, the "AI" folks. I don't know the history of connectionism as well as that of symbolic machine learning (which I've studied) but from what I understand, connectionist approaches found early application in the field of Pattern Recognition, where the subject of study was primarily machine vision.
In any case, the idea that the connectionists and the symbolists are diametrically opposed camps within AI research is a bit of a myth. Many of the luminaries of AI would have found it odd; for example, Claude Shannon [3] invented both logic gates and information theory, whereas the original artificial neuron, the McCulloch and Pitts neuron, was a propositional logic circuit that learned its own boolean function. And you wouldn't believe it, but Jurgen Schmidhuber's doctoral thesis was a genetic algorithm implemented in ... Prolog [4].
It seems that in recent years people have found it easier to argue that symbolic and connectionist approaches are antithetical and somehow inimical to each other, but I think that's more of an excuse to not have to learn at least a bit about both; which is hard work, no doubt.
[3] Shannon was one of the organisers of the Dartmouth Convention where the term "Artificial Intelligence" was coined, alongside John McCarthy and Marvin Minsky.
One comment on the article from what I’ve read so far. The article states that GPT bombed an economics test, but after trying out the first two questions on the test, I think that the test itself is poorly constructed.
The second question in particular drops the potential implicit assumption that only 100 people stand in line each day.
I face this issue in my CS masters program constantly, and would probably have failed this test much the same as GPT did.
That substack article poorly understands Turing's paper anyway. Cars aren't even mentioned. Chess is briefly mentioned at the end. I wouldn't base any opinions off of it.
Turing's test was not "this computer fooled me over text, therefore it's an AI". It's a philosophical, "we want to consider a machine that thinks, well we can't really define what thinking is, so instead it's more important to observe if a machine is indistinguishable from a thinker." He then goes on to consider counterpoints to the question, "Can a machine think?" Which is funny because some of these counterpoints are similar to the ones in the author's article.
Author offers no definition of "think" or "invent" or other words. It's paragraph after paragraph of claiming cognitive superiority. Turing's test isn't broken, it's just a baseline for discussion. And comparing it to SHA-1 is foolish. Author would have done better with a writeup of the Chinese room argument.
The absurdity in all these debates is how quickly people move the goalposts around between "artificial human intelligence" and "artificial omniscience (Singularity)" when trying to downplay the potential of AI.
Wow, that blog led me down a rabbit hole. I wonder why Yarvin didn't comment on the societal and political impact of LLMs. Sam Altman seemed to be supportive of democratic socialism and central planning on Lex Fridman's podcast.

The AI maximalists think we're on an exponential curve to the singularity and potential AI disaster, or even eternal dominance by an AI dictator [Musk].
Realistically though, the road to AGI and beyond is like the expansion of the human race to The Moon, Mars and beyond, slow, laborious, capital and resource intensive with vast amounts of discoveries that still need to be made.
Without having an understanding of the architecture required for general intelligence, it is impossible to make claims like this. Nobody has this understanding. Literally nobody.
The human brain uses on the order of 10 watts of power and there are almost 8 billion examples of this. So we have hard proof that from a thermodynamic perspective general intelligence is utterly and completely mundane.
We almost certainly already have the computational power required for AGI, but have no idea what a complete working architecture looks like. Figuring that out might take decades, or we might get there significantly quicker. The timespan is simply not knowable ahead of time.
I'm not concerned in the slightest about "the singularity" and non-aligned superintelligences. AGI in the hands of malicious human actors is already a nightmare scenario.
I found out today I don't exactly have Covid brain fog. Covid has triggered my bipolar disorder, so I have flu-like symptoms and hypomania, a combo I've never experienced before so I'm not used to it. It's a bit wild.
Take a look at Auto-GPT; it doesn't seem like AGI is far off. I'd say AGI in a weak form is already here, it just needs to strengthen.

Tracking problematic actions back to the person who owns the AGI will likely not be a difficult task. The owner of an AGI would be held responsible for its actions. The worry is that these actions would happen very quickly. This too can be managed by safety systems, although they may need to be developed more fully in the near future.
Sorry I have Covid with brain fog right now so maybe you could help me out
Edit: Off the top of my foggy head, LLMs as I understand them are text-completion predictors based on statistical probabilities, trained on vast amounts of examples of what humans have previously written, whose output is styled with neuro-linguistic programming, also based on vast numbers of styles of human writing. This is my casual amateur understanding. There is no logical, reasoning programming such as the Lisp programmers attempted in the 1980s, and clearly the logical abilities of the current LLMs fall short; they are not AGI for that reason. So how do we add logic abilities to make LLMs AGI? Should we revisit the approaches of the Lisp machines of the 1980s? This requires much research and discovery.

Then there's the question of just what general intelligence is. I've always thought that emotional intelligence played a huge role in high intelligence; a balance between logic and emotion, or Wise Mind, is wisdom. Obviously we won't be building emotions into silicon machines, or will we? Is anyone proposing this? This could take hundreds of years to accomplish, if it is even possible. We could simulate emotion, but that's not the same; that's logic.

Logical intelligence and emotional capability, I think, are prerequisites for consciousness and spirituality. If the Universe is conscious, and it arises in a focused manner in brains that are capable of it, then how do we build a machine capable of having consciousness arise in it? That's all I'm saying.
In fact Greg Brockman explicitly said they are considering changing the release schedule in a way that could be interpreted as opening the door for a different versioning scheme.
And actually, there is no law or anything that says that any particular change or improvement to the model, or even a new training run, necessitates calling it version 5. It's not like there is a Version Release Police that evaluates all of the version numbers and puts people in jail if they don't adhere to some specific consistent scheme.
Translation: training GPT-5 will cost time and money, so we’re going to cash in on the commercialization of GPT-4 now while it’s hot. A bird in hand is worth two in the bush.
Now, assuming GPT-4 vision isn't just some variant of MM-REACT (i.e., what you're describing), that's what's happening here: https://github.com/microsoft/MM-REACT

Images can be tokenized. So what usually happens is that extra parameters are added to a frozen model, and those parameters are trained on an image-embedding-to-text-embedding task. The details vary, of course, but that's a fairly general overview of what happens.

The image-to-text task the models get trained on has its issues. It's lossy and not very robust. GPT-4, on the other hand, looked incredibly robust, so they may not be doing that. I don't know.
GPT-4's architecture is a trade secret, but vision transformers tokenize patches of images. Something like 8x8 or 32x32 pixel patches, rather than individual pixels.
Multi-modal text-image transformers add these tokens right beside the text tokens. So there is both transfer learning and similarity graphed between text and image tokens. As far as the model knows, they're all just tokens; it can't tell the difference between the two.

For the model, the tokens for the words blue/azure/teal and all the tokens for image patches containing blue are just tokens with a lot of similarity. It doesn't know whether the token it's being fed is text, image, or even audio or other sensory data. All tokens are just a number with associated weights to a transformer, regardless of what they represent to us.
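A minimal numpy sketch of the patch-tokenization step described above (generic ViT-style, using the 32x32 patch size mentioned as an example; none of this is a claim about GPT-4's specifics): split the image into fixed-size patches, flatten each one, and project it into the same embedding width the text tokens use.

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 64        # toy image size
P = 32            # patch size (e.g. the 32x32 mentioned above)
d_model = 128     # embedding width shared with the text tokens

image = rng.random((H, W, 3)).astype(np.float32)

# Split into non-overlapping P x P patches and flatten each patch.
patches = image.reshape(H // P, P, W // P, P, 3).transpose(0, 2, 1, 3, 4)
patches = patches.reshape(-1, P * P * 3)              # (num_patches, 3072)

# A learned linear projection (random here) turns each patch into a "token" embedding.
proj = (rng.standard_normal((P * P * 3, d_model)) * 0.02).astype(np.float32)
image_tokens = patches @ proj                         # (num_patches, d_model)

print(image_tokens.shape)  # (4, 128): four image "tokens", ready to sit beside text tokens
```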
I've had this thought that the next generation of AI isn't one long "training" period; rather, it probably makes sense to train a barebones version and give it a "sleep cycle". During this time it could use the context (think of it as short-term memory) to fine-tune the parent model, turning the important stuff into long-term "memories", with probably a pruning-type mechanism to keep rarely used stuff from crowding out the important stuff. That would turn AIs into individuals with specialized knowledge, but maybe that's even more useful? Like, I don't need an AI with expertise in law; I just want to use it to automate this specialized business process I have which isn't easily automated.
I think DALL-E 3 basically already exists in Bing. It is significantly better than DALL-E 2 and is close to, but not quite at, the quality of Midjourney v5. I just generated a series of about 40 portraits and even the hands are significantly closer.
Open source is still so far ahead of midjourney it's not even funny. Like, a racist RimWorld mod author (Automatic1111) built a UI for stable diffusion which unlocks far more capabilities out of it than midjourney will ever have.
I'm not sure if you are trolling or not, but if you aren't then you haven't seen Midjourney v5. But I wouldn't blame you because your information is only like one month out of date which is short in normal timespans but so long in AI timespans.
I feel like Bing Image Creator is at least DALL-E 2.5, it feels like it has higher quality outputs for the same prompt. Could also just be some form of post-processing, though.
I suspect it may be a problem of training-data exhaustion. Do they have enough source material that is safe/vetted for the next jump in training material? I can imagine that model poisoning is a real thing now...
"Some time" tomorrow? Next week? Next month?
This doesn't mean anything; it's like saying "we don't have any plans to change anything" when a company acquires another.

It's all BS.
With the whole ChatGPT boom and all the startups built on top of their API, OpenAI receives troves of data to train and fine-tune their models on. GPT-3 was trained on roughly 570 GB of filtered text (drawn from about 45 TB of raw Common Crawl), and GPT-4's corpus is undisclosed. On the other side, Alpaca and Vicuna were fine-tuned from LLaMA using only megabytes' worth of training data. I believe that is a much more feasible path to significantly improving the current generation of LLMs.
There have got to be predictable ways of improving LLMs besides training-data scale and parameter count. Aren't LLMs robust enough to learn on their own by interacting with the world? Like putting them in a turn-based simulated environment.
I wonder if there's an assumption about how big an LLM would have to be before that could even conceivably work. Is there a minimum size necessary before that capability is plausible?
It is OK to slow down development, take some more profit, maybe keep doing the human in the loop RL refinement, etc.
From an engineering standpoint, even the less powerful GPT-3.5-turbo model handles NLP tasks well, and really nice tools like LangChain and LlamaIndex, which I covered in my last book, make it easy to use your own data sources.
I think the possibilities of using what we currently have in useful projects are vast.
Is there some kind of convention that defines what specifically constitutes GPT-n, or does this just mean "we're not working on the successor to GPT-4 yet"?
There may be conventions, but in no way can anyone force them to follow them. It's just a name for a release. They absolutely are working on successor models and have stated they plan to release a model by June. Whether they are working on a new architecture or a new training run, they certainly have experiments going, but who knows how serious they are.
Regardless they can and will call future models anything they want. They could easily just decide that the minor improvements that come out in a few months are called GPT-4.2 and the major new training run is called GPT-4.5 instead of GPT-5.
No, it is just an arbitrary version number for this series of models from OpenAI. They will flip to 5 when they make an architecture change that will force them to begin training from scratch. Until then they will continue to produce more refined versions of 4, potentially more general training or fine-tuned task-oriented training.
The way it currently works, there is a quite clear boundary, as all the smaller iterations are based on something of a fixed size that was expensively pretrained, and then have either finetuned weights or some extra layers on top, but the core model structure and size can't be changed without starting from scratch.
So if some particular GPT-4 improved successor is based on the GPT-4 core transformer size and pretrained parameters then we'd call it GPT-4.x, but if some other GPT-4 successor is a larger core model (which inevitably also means it's re-trained from scratch) then we'd call it GPT-5, no matter if its observable performance is better or worse or comparable to the tweaked GPT-4.x options.
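To make that "4.x versus 5" boundary concrete, here is a generic PyTorch sketch of the kind of incremental update that keeps the expensively pretrained core fixed (illustrative only; GPT-4's real architecture, sizes, and training setup are not public): the base weights are frozen and only a small added module is trained.

```python
import torch
import torch.nn as nn

d_model, vocab = 512, 1000

# Stand-in for the expensively pretrained core; a "4.x"-style update keeps this frozen.
core = nn.Sequential(nn.Embedding(vocab, d_model), nn.Linear(d_model, d_model))
for p in core.parameters():
    p.requires_grad_(False)

# Small trainable add-on (an extra head / adapter): all a "4.x" refinement touches.
adapter = nn.Linear(d_model, vocab)
opt = torch.optim.Adam(adapter.parameters(), lr=1e-4)

tokens = torch.randint(0, vocab, (8, 16))          # toy batch of token ids
logits = adapter(core(tokens))                     # gradients flow only into the adapter
loss = nn.functional.cross_entropy(logits.view(-1, vocab), tokens.view(-1))
loss.backward()
opt.step()

# Changing d_model, the depth of `core`, or the tokenizer means retraining from scratch,
# which is the kind of change that would plausibly earn the "GPT-5" label.
```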
Based on published research from Google and Meta it is somewhat known how much more capability is possible with the current approach, but it would require an extreme increase in compute and training set to achieve it. There are diminishing returns, but the returns appear to continue for a good while, even without any new model architecture discoveries. Right now the expense will likely mean that progress will be limited to the pace of Moore’s law.
In terms of what this improvement would actually look like in terms of real world, emergent capabilities, no one knows.
No, but LLaMA's training setup was designed around studying the curve of output quality vs. training-data size, so we do have companies looking into this.
Just as in Pascal's wager, the conclusion relies on an unwarranted assumption which privileges a particular outcome over its exact opposite, e.g. a deity with exactly inverted criteria for heaven and hell, punishing those who believe in the Christian God, or "Roko's antibasilisk", which spares the people who would get punished by Roko's basilisk and punishes everyone else.
They need a few algorithmic improvements first, imho. GPT4 is noticeably slower than GPT3.5 and apparently costs a lot more to use, implying some serious compute costs.
They could train it with more data in the hopes of getting another big leap there, but what data is left? They've fed it everything it seems.
So what's left is getting the runtime reduced in terms of the model size. Hire some brilliant minds to turn an N-squared into an N-log-N (or something to that effect).
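Back-of-the-envelope arithmetic for why that matters (the numbers are made up for illustration; this is just the standard cost comparison between full self-attention and one of the many sub-quadratic schemes, here a local window):

```python
# Rough per-head FLOP comparison: full self-attention vs. a fixed local window.
n, d, w = 8_192, 128, 256            # sequence length, head dim, window size (illustrative)

full_attention = n * n * d           # O(n^2 * d): every token attends to every token
windowed       = n * w * d           # O(n * w * d): every token attends to a local window

print(f"full: {full_attention:,} FLOPs  windowed: {windowed:,} FLOPs  "
      f"ratio: {full_attention / windowed:.0f}x")   # 32x cheaper at this sequence length
```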
He has just admitted that O̶p̶e̶n̶AI.com has partially trained GPT-5 and is already planning to test the 'so-called' useless guardrails around it.
There is no 'revolution' around this. Just 'evolution' with more data and more excessive waste of compute to create another so-called AI black-box sophist with Sam Altman selling both the poison (GPT-4) and the antidote (Worldcoin).
At some point, with their tremendous lock-in strategy, O̶p̶e̶n̶AI.com and Microsoft will eventually use the lock-in to upsell and compete against their partners.
> Sam Altman selling both the poison (GPT-4) and the antidote (Worldcoin).
I actually find it pretty amazing that more people aren't given pause by Sam Altman's involvement here. After the WorldCoin stuff, I'd think that he'd be viewed with a much more skeptical eye in terms of his ethics.
The late-night March 31 release of Worldcoin had the (unintended?) side-effect of making me think "a token to prove my personhood" was an April Fools Joke when I saw it the next morning and never thought about it again.
From my understanding they are training their new GPT models off of a checkpoint from the previous generation, so they technically have partially trained multiple future models in their GPT lineage.
> Left unsaid in this piece is that OpenAI likely would have to increase parameters
Maybe true, but he also said "We are not here to jerk ourselves off about parameter count"
https://techcrunch.com/2023/04/14/sam-altman-size-of-llms-wo...
I'm not an expert but isn't size the distinguishing feature of an LLM? It's the first L.
> They overturned previous scaling laws
Can you link to a comparison or graph of obsolete and new scaling laws?
Curious if anyone can confirm $5-20M figure. Seems absurdly high but what do I know
That money won't help unless they get permission to start their own research department.
Yannic Kilcher makes a similar supposition based on results from the tech report https://www.youtube.com/watch?v=2zW33LfffPc&pp=ygUOeWFubmljI... . It’s about 3/4 of the way through the video if memory serves.
Is there any source for this, aside from it being oft repeated by internet speculators? Ilya has said the textual data situation is still quite good
They literally said this is not the case
They could clean up the training data I bet. That would be where I'd focus next.
Is there any indication from OpenAI people that there are low hanging fruits to be picked in this direction?
In my machine-learning experience, if it only takes 10x the parameters to bring a significant improvement, I feel lucky.
Vicuna offers considerable improvement over LLaMA and it's just 13B delta to 65B model.
You can just send a single space character to get the AI to continue its previous output.
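If you're hitting this over the API rather than the web UI, the same trick looks roughly like the sketch below, using the 2023-era `openai` Python client (the model name and key are placeholders, and whether a single space is the best nudge is anecdotal):

```python
import openai

openai.api_key = "sk-..."  # placeholder

history = [{"role": "user", "content": "Write a long Python script that ..."}]
resp = openai.ChatCompletion.create(model="gpt-4", messages=history)
part = resp["choices"][0]["message"]["content"]

# If the reply was cut off, append it as the assistant turn and send a near-empty
# user turn so the model continues where it stopped instead of starting over.
history += [
    {"role": "assistant", "content": part},
    {"role": "user", "content": " "},
]
resp = openai.ChatCompletion.create(model="gpt-4", messages=history)
full_output = part + resp["choices"][0]["message"]["content"]
print(full_output)
```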
The first true AGI will likely foom immediately.
I thought the same thing. They've not disclosed anything else, so why would they be even slightly honest about this?
The only "game theory" here is trying to convince people your software is good and important, so you can raise money and sell products.
AGI that can engage in cyberwarfare, propaganda campaigns, and social engineering can achieve some military goals nevertheless.
I don't understand. "Synthetic intelligence" is just a synonym for "artificial intelligence". The term has all the same issues, does it not?
If we can't define intelligence (and we haven't), how could we possibly define artificial or synthetic intelligence?
Here’s an essay on why we should start saying “synthetic intelligence” in certain contexts:
https://taylor.town/synthetic-intelligence
"Collective intelligence" since all it's really doing is regurgitating what people have collectively posted online
What about "simulated intelligence"?
Now I can’t help but imagine the raw GPT-4 is just some huge raging asshole and it just has a bunch of “handlers”.
> the raw GPT-4 is just some huge raging asshole
That's pretty much exactly how one of the OpenAI Red Teamers Nathan Labenz describes the raw GPT-4, starting around 45 minutes into the video:
https://news.ycombinator.com/item?id=35377741
4 replies →
I have been writing prompts for a GPT-based document 'digester' for business-internal people who can't code but do have the right background knowledge. Every day I have to expand the prompt because I found a new spot where I have to hold the thing's hands so it does the right thing :)
I feel like the GPT # has already suffered the same fate as nanometers in semiconductor manufacturing.
As manifested in ChatGPT, it is obvious that what presents as one magical solution is in fact an elaborate combination of components at varying degrees of innovation.
In my view, the reasoning for not releasing GPT4 information (hyperparameters, etc) had nothing to do with AI safety. It was a deliberate marketing decision to obscure how the sausage is actually made.
> In my view, the reasoning for not releasing GPT4 information (hyperparameters, etc) had nothing to do with AI safety. It was a deliberate marketing decision
In their technical report they give both reasons:
"Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar."
What a joke. Talking about the size has no safety implications.
2 replies →
They have scaling issues even with 3, and many more with 4; they need time to squeeze more $$ out of these models. 5 will come when they sense competition; by then they'll have all the data and training methods ready to turn the key and meet it.
I just read: https://graymirror.substack.com/p/gpt-4-invalidates-the-turi.... It makes the point that LLMs are not AIs, so we will need a different approach if we want true AGI, not just incremental improvement on the current LLM approach.
There is a bit of a political history between the symbolists and connectionists that complicates that: basically, the symbolist camp is looking for universal quantifiers while the connectionists were researching existential or statistical quantifiers.
The connectionists left the 'AI' folks and established the ML field in the 90s.
Sometimes those political rifts arise in discussions about what is possible.
Thinking about ML under the PAC-learning lens will show you why AGI isn't possible through ML alone.
But the symbolist direction is also blocked by fundamental limits of math and CS, Gödel's work being one example.
LLMs are AI if your definition is closer to the general understanding of the word, but you have to agree on a definition to reach agreement between two parties.
The belief that AGI is close is speculative and there are many problems, some which are firmly thought to be unsolvable with current computers.
AGI is pseudo-science today without massive advances. But unfortunately as there isn't a consensus on what intelligence is those discussions are difficult also.
Overloaded terms make it very difficult to have discussions on what is possible.
Your link claims that:
'GPT-4 is not “AI” because AI means “AGI”'
which is a stricter definition than has typically been applied to AI, and an example of my claim above.
Since we lack general definitions it isn't invalid, but under their claims no AI is thought to be possible.
In my experience, "computer systems that perform work, within a restricted domain, that typically requires humans" is closer to the definition most researchers would use.
1 reply →
> Thinking of ML under the PAC learning lens will show you why AGI isn't possible through just ML
Why? PAC looks a lot like how humans think
> But the Symbolists direction is also blocked by fundamental limits of math and CS with Gödel's work being one example.
Why? Gödel's incompleteness applies equally well to humans as to machines. It's an extremely technical statement about self-reference within an axiom system, pointing out that it's possible to construct paradoxical sentences. That has nothing to do with general theorem proving about the world.
1 reply →
Semantics are nice, but it doesn't matter what name you give to technology that shatters economies and transforms the nature of human creative endeavours.
An AI's ability to contemplate life while sitting under a tree is secondary to the impact it has on society.
3 replies →
>> The connectivists left the 'AI' folks and established the ML field in the 90s.
The way I know the story is that modern machine learning started as an effort to overcome the "knowledge acquisition bottleneck" in expert systems, in the '80s. The "knowledge acquisition bottleneck" was simply the fact that it is very difficult to encode the knowledge of experts in a set of production rules for an expert system's knowledge-base.
So people started looking for ways to acquire knowledge automatically. Since the use case was to automatically create a rule-base for an expert system, the models they built were symbolic models, at least at first. For example, if you read the machine learning literature from that era (again, we're at the late '80s and early '90s) you'll find it dominated by the work of Ryszard Michalski [1], which was all entirely symbolic as far as I can tell. Staple representations used in machine learning models of the era included decision lists and decision trees, and that's where decision tree learners like ID3, C4.5, Random Forests, Gradient Boosted Trees, and so on, come from; which btw are all symbolic models (they are and-or trees, propositional logic formulae).
A standard textbook from that era of machine learning is Tom Mitchell's "Machine Learning" [2] where you can find entire chapters about rule learning, decision tree learning, and other symbolic machine learning subjects, as well as one on neural network learning.
I don't think connectionists ever left, as you say, the "AI" folks. I don't know the history of connectionism as well as that of symbolic machine learning (which I've studied) but from what I understand, connectionist approaches found early application in the field of Pattern Recognition, where the subject of study was primarily machine vision.
In any case, the idea that the connectionists and the symbolists are diametrically opposed camps within AI research is a bit of a myth. Many of the luminaries of AI would have found it odd; for example, Claude Shannon [3] invented both logic gates and information theory, whereas the original artificial neuron, the McCulloch and Pitts neuron, was a propositional logic circuit that computed boolean functions. And you wouldn't believe it, but Jurgen Schmidhuber's doctoral thesis was a genetic algorithm implemented in ... Prolog [4].
It seems that in recent years people have found it easier to argue that symbolic and connectionist approaches are antithetical and somehow inimical to each other, but I think that's more of an excuse to not have to learn at least a bit about both; which is hard work, no doubt.
______________
[1] https://en.wikipedia.org/wiki/Ryszard_S._Michalski
[2] It's available as a free download from Tom Mitchell's website:
http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/mlboo...
[3] Shannon was one of the organisers of the Dartmouth Conference where the term "Artificial Intelligence" was coined, alongside John McCarthy and Marvin Minsky.
[4] https://people.idsia.ch/~juergen/genetic-programming-1987.ht...
One comment on the article from what I’ve read so far. The article states that GPT bombed an economics test, but after trying out the first two questions on the test, I think that the test itself is poorly constructed.
The second question in particular hinges on an unstated implicit assumption that only 100 people stand in line each day.
I face this issue in my CS masters program constantly, and would probably have failed this test much the same as GPT did.
That Substack article misunderstands Turing's paper anyway. Cars aren't even mentioned; chess is briefly mentioned at the end. I wouldn't base any opinions on it.
Turing's test was not "this computer fooled me over text, therefore it's an AI." It's a philosophical move: we want to consider a machine that thinks, but we can't really define what thinking is, so it's more useful to observe whether a machine is indistinguishable from a thinker. He then goes on to consider counterpoints to the question "Can a machine think?", which is funny, because some of those counterpoints are similar to the ones in the author's article.
The author offers no definition of "think" or "invent" or other words; it's paragraph after paragraph of claiming cognitive superiority. Turing's test isn't broken, it's just a baseline for discussion. And comparing it to SHA-1 is foolish. The author would have done better with a writeup of the Chinese room argument.
At what point did the human test maker fail or the AI?
The absurdity in all these debates is how quickly people move the goalposts around between "artificial human intelligence" and "artificial omniscience (Singularity)" when trying to downplay the potential of AI.
Deep learning, which underpins it all, is machine learning at the end of the day. The same way we're calling it "AI", this is just more branding BS.
In before the comments about Auto-GPT.
Wow, that blog led me down a rabbit hole. I wonder why Yarvin didn't comment on the societal and political impact of LLMs. Sam Altman seemed to be supportive of democratic socialism and central planning on Lex Fridman's podcast.
Let's just say he's extremely controversial: https://www.vox.com/platform/amp/policy-and-politics/2337379...
1 reply →
The AI maximalists think we're on an exponential curve to the singularity and potential AI disaster, or even eternal dominance by an AI dictator [Musk].
Realistically though, the road to AGI and beyond is like the expansion of the human race to The Moon, Mars and beyond, slow, laborious, capital and resource intensive with vast amounts of discoveries that still need to be made.
Without having an understanding of the architecture required for general intelligence, it is impossible to make claims like this. Nobody has this understanding. Literally nobody.
The human brain uses on the order of 10 watts of power and there are almost 8 billion examples of this. So we have hard proof that from a thermodynamic perspective general intelligence is utterly and completely mundane.
We almost certainly already have the computational power required for AGI, but have no idea what a complete working architecture looks like. Figuring that out might take decades, or we might get there significantly quicker. The timespan is simply not knowable ahead of time.
I'm not concerned in the slightest about "the singularity" and non-aligned superintelligences. AGI in the hands of malicious human actors is already a nightmare scenario.
I found out today I don't exactly have Covid brain fog. Covid has triggered my bipolar disorder, so I have flu-like symptoms and hypomania, a combo I've never experienced before so I'm not used to it. It's a bit wild.
https://www.google.com/search?q=bipolar+covid
2 replies →
Take a look at Auto-GPT; it doesn't seem like AGI is far off. I'd say AGI in a weak form is already here; it just needs to strengthen.
Tracking problematic actions back to the person who owns the AGI will likely not be a difficult task, and the owner of an AGI would be held responsible for its actions. The worry is that these actions would happen very quickly. This too can be managed by safety systems, although they may need to be developed more fully in the near future.
Human brains are not built from digital circuits - perhaps they have far more compute than we think.
You asserted an analogy but didn't spell out the connection between the two concepts.
Sorry I have Covid with brain fog right now so maybe you could help me out
Edit: Off the top of my foggy head: LLMs as I understand them are text-completion predictors based on statistical probabilities, trained on vast amounts of examples of what humans have previously written, whose output is styled with neuro-linguistic programming, also based on vast numbers of styles of human writing. This is my casual amateur understanding. There is no logical, reasoning programming such as the Lisp programmers attempted in the 1980s, and clearly the logical abilities of the current LLMs fall short; they are not AGI for that reason. So how do we add logic abilities to make LLMs AGI? Should we revisit the approaches of the Lisp machines of the 1980s? This requires much research and discovery.

Then there's the question of just what general intelligence is. I've always thought that emotional intelligence played a huge role in high intelligence: a balance between logic and emotion, or Wise Mind, is wisdom. Obviously we won't be building emotions into silicon machines, or will we? Is anyone proposing this? This could take hundreds of years to accomplish, if it is even possible. We could simulate emotion, but that's not the same; that's logic.

Logical intelligence and emotional capability, I think, are prerequisites for consciousness and spirituality. If the Universe is conscious, and it arises in a focused manner in brains that are capable of it, then how do we build a machine capable of having consciousness arise in it? That's all I'm saying.
https://en.wikipedia.org/wiki/Dialectical_behavior_therapy
OpenAI may not be training GPT-5, but Sam didn't say anything about GPT-4.5.
In fact Greg Brockman explicitly said they are considering changing the release schedule in a way that could be interpreted as opening the door for a different versioning scheme.
And actually there is no law or anything that says any particular change, improvement, or even new training run necessitates calling it version 5. It's not like there is a Version Release Police that evaluates all the version numbers and puts people in jail if they don't adhere to some specific, consistent scheme.
> In fact Greg Brockman explicitly said
source?
1 reply →
Maybe they’ll pull an MS and go straight to GPT X
The famous Microsoft numbering system! I think we should all skip GPT-Vista, but I can't wait for GPT-7.
As long as it is not GPT-ME
GPT-360
1 reply →
That’s almost as funny as Elon asking everyone to please slow down so X Corp can catch up.
Of *course* they’re still training chat-gpt5
Translation: training GPT-5 will cost time and money, so we’re going to cash in on the commercialization of GPT-4 now while it’s hot. A bird in hand is worth two in the bush.
"Don’t worry, guys. It’s just GPT4.998, it’s not GPT5, it’s not dangerous."
Why not GPT-6 instead? There is far too much hype surrounding this.
Right, “they” are leaving it to GPT4 to train 5. Smart move :-)
I’m waiting for GPT-4’s image API. From what I understand it’s not just an “image2text” descriptor that then “reasons” on this description, right?
It’s just grokking an image directly. Were the pixels tokenized somehow? I’m very curious what that does to a model like this.
Can somebody that actually knows anything clue me in?
It's possible to take a text-only model and ground it with images. Examples are:
BLIP-2 (https://github.com/salesforce/LAVIS/tree/main/projects/blip2)
FROMAGe (https://github.com/kohjingyu/fromage)
Prismer (https://github.com/NVlabs/prismer)
PaLM-E (https://ai.googleblog.com/2023/03/palm-e-embodied-multimodal...)
Now, assuming GPT-4's vision isn't just some variant of MM-ReAct (i.e. what you're describing: https://github.com/microsoft/MM-REACT), that's what's happening here.
Images can be tokenized, so what usually happens is that extra parameters are added to a frozen model and those parameters are trained on an image-embedding-to-text-embedding task. The details vary, of course, but that's a fairly general overview of what happens.
The image-to-text task those models get trained on has its issues: it's lossy and not very robust. GPT-4, on the other hand, looked incredibly robust, so they may not be doing that. I don't know.
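A minimal sketch of the frozen-model-plus-adapter recipe described above (PyTorch; the module names, dimensions, and the HF-style `inputs_embeds` call are illustrative assumptions, not the actual code of BLIP-2, FROMAGe, or GPT-4):

```python
import torch
import torch.nn as nn

class VisionToTokenAdapter(nn.Module):
    """Maps a pooled image feature vector to a short sequence of 'soft tokens'."""
    def __init__(self, vision_dim=1024, lm_dim=4096, n_visual_tokens=32):
        super().__init__()
        self.n = n_visual_tokens
        self.lm_dim = lm_dim
        # The only trainable piece: projects image features into the LM's
        # token-embedding space.
        self.proj = nn.Linear(vision_dim, lm_dim * n_visual_tokens)

    def forward(self, image_features):                # (batch, vision_dim)
        x = self.proj(image_features)                 # (batch, lm_dim * n)
        return x.view(-1, self.n, self.lm_dim)        # (batch, n, lm_dim)

def build_trainable_params(language_model: nn.Module, adapter: nn.Module):
    # The pretrained language model stays frozen; only the adapter learns.
    for p in language_model.parameters():
        p.requires_grad_(False)
    return list(adapter.parameters())

# Usage sketch: prepend the projected visual tokens to the text embeddings and
# train with the usual next-token loss on image-caption pairs.
# inputs = torch.cat([adapter(img_feats), text_token_embeddings], dim=1)
# logits = language_model(inputs_embeds=inputs)   # assumed HF-style interface
```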
Very interesting, thanks.
1 reply →
GPT-4's architecture is a trade secret, but vision transformers tokenize patches of images. Something like 8x8 or 32x32 pixel patches, rather than individual pixels.
Multi-modal text-image transformers add these tokens right beside the text tokens. So there is both transfer learning and similarity graphed between text and image tokens. As far as the model knows they're all just tokens; it can't tell the difference between the two.
For the model, the tokens for the words blue/azure/teal and all the tokens for image patches containing blue are just tokens with a lot of similarity. It doesn't know if the token it's being fed is text, image, or even audio or other sensory data. All tokens are just a number with associated weights to a transformer, regardless of what they represent to us.
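For illustration, here is roughly what patch tokenization looks like in code (PyTorch; the 32-pixel patch size and embedding width are assumptions chosen for the example, not GPT-4's actual values):

```python
import torch
import torch.nn as nn

class PatchTokenizer(nn.Module):
    """Cuts an image into fixed-size patches and projects each into the embedding space."""
    def __init__(self, patch_size=32, channels=3, embed_dim=4096):
        super().__init__()
        # A strided convolution yields exactly one embedding per patch.
        self.proj = nn.Conv2d(channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, images):                    # (batch, 3, H, W)
        x = self.proj(images)                     # (batch, embed_dim, H/32, W/32)
        return x.flatten(2).transpose(1, 2)       # (batch, num_patches, embed_dim)

# After this, image-patch tokens and text tokens are rows of the same width and
# can be concatenated into one sequence for the transformer:
# sequence = torch.cat([text_embeddings, PatchTokenizer()(images)], dim=1)
```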
The GPT-4 vision API is actually in production and in at least two public products already. https://www.bemyeyes.com/ and https://www.microsoft.com/en-us/ai/seeing-ai
I’d be surprised if that doesn’t do something qualitatively with the model. Very cool, curious to see what’s possible. Thanks.
I've had this thought that the next generation of AI won't have one long "training" period; rather, it probably makes sense to train a barebones version and give it a "sleep cycle." During that time it could take the context (think of it as short-term memory) and fine-tune the parent model with it, turning the important stuff into long-term "memories," with a pruning mechanism for rarely used stuff to keep the important stuff a priority. That would turn AIs into individuals with specialized knowledge, but maybe that's even more useful? I don't need an AI with expertise in law; I just want to use it to automate this specialized business process I have that isn't easily automated.
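A toy sketch of that consolidation loop (purely hypothetical; every class and method name here is invented to illustrate the idea, and the `summarize` callback stands in for whatever fine-tuning/distillation step a real system would run during the "sleep" phase):

```python
from collections import defaultdict
from typing import Callable, List, Tuple

class SleepCycleMemory:
    """Accumulate short-term context while 'awake'; consolidate and prune while 'asleep'."""

    def __init__(self, prune_threshold: int = 1):
        self.short_term: List[str] = []      # this session's context window
        self.long_term: dict = {}            # consolidated "memories"
        self.usage = defaultdict(int)        # how often each memory was recalled
        self.prune_threshold = prune_threshold

    def observe(self, item: str) -> None:
        self.short_term.append(item)

    def recall(self, key: str):
        if key in self.long_term:
            self.usage[key] += 1
            return self.long_term[key]
        return None

    def sleep(self, summarize: Callable[[List[str]], Tuple[str, str]]) -> None:
        # Prune memories that were rarely recalled since the last sleep.
        for key in list(self.long_term):
            if self.usage[key] < self.prune_threshold:
                del self.long_term[key]
        self.usage.clear()
        # Consolidate the session into long-term memory; in a real system this is
        # where the parent model would be fine-tuned on the distilled context.
        if self.short_term:
            key, value = summarize(self.short_term)
            self.long_term[key] = value
            self.short_term = []
```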
I hope they have a plan to release DALL-E 3 sometime this year, MidJourney seems to be pulling ahead.
Midjourney is already far ahead. I can’t get nearly the same quality with DALLE2
I think DALL-E 3 basically exists in Bing. It is significantly better than DALL-E 2 and is close to, but not quite at, the quality of Midjourney v5. I just generated a series of about 40 portraits and even the hands are significantly closer.
Open source is still so far ahead of midjourney it's not even funny. Like, a racist RimWorld mod author (Automatic1111) built a UI for stable diffusion which unlocks far more capabilities out of it than midjourney will ever have.
I'm not sure if you are trolling or not, but if you aren't then you haven't seen Midjourney v5. But I wouldn't blame you because your information is only like one month out of date which is short in normal timespans but so long in AI timespans.
4 replies →
I feel like Bing Image Creator is at least DALL-E 2.5, it feels like it has higher quality outputs for the same prompt. Could also just be some form of post-processing, though.
I suspect it may be a problem of training-data exhaustion. Do they have enough source material that is safe/vetted for the next jump in training? I can imagine that model poisoning is a real thing now...
"Some time" tomorrow? Next week? Next month? This doesn't mean anything, its like saying "we don't have any plans to change anything" when a company acquires another. Its all BS
Diminishing returns are currently seen in "transformer scaling laws."
"OpenAI’s CEO says the age of giant AI models is already over": https://news.ycombinator.com/item?id=35603756 (shared 3 dags after this post)
Let's see what the future holds.
To me, GPT-4 is like a superhuman “system 1” thought model (as defined by Daniel Kahneman).
So maybe they’re working on a “system 2” now, which is perhaps more related to what DeepMind is doing?
With all the ChatGPT boom and the startups built on top of their API, OpenAI receives troves of data to train and fine-tune their models on. For reference, GPT-3 was trained on a few hundred gigabytes of filtered text (roughly 300 billion tokens), and GPT-4's dataset is undisclosed but presumably larger. On the other side, Alpaca and Vicuna were fine-tuned from LLaMA using only megabytes, if not hundreds of kilobytes, of training data. I believe that is a much more feasible path to significantly improving the current generation of LLMs.
There have got to be predictable ways of improving LLMs besides training-data scale and parameter count. Aren't LLMs robust enough to learn on their own by interacting with the world? Like putting them in a turn-based simulated environment.
I wonder if there's an assumption for how big an LLM should be before it could even conceivably be an LLM. Is there a minimum size necessary before that capability is plausible?
>Arent LLMs robust enough to learn on their own via interacting with the world?
As far as I know current LLMs are entirely static once trained, they don't learn at all in runtime.
Without RLHF, even a high-parameter model performs very poorly. LLaMA-65B often hallucinated when I gave it the most basic of prompts.
It is OK to slow down development, take some more profit, maybe keep doing the human in the loop RL refinement, etc.
From an engineering standpoint, even the less powerful GPT-3.5-turbo model handles NLP tasks well, and really nice tools like LangChain and LlamaIndex, which I covered in my last book, make it easy to use your own data sources.
I think the possibilities of using what we currently have in useful projects are vast.
Is there some kind of convention that defines what specifically constitutes GPT-n, or does this just mean "we're not working on the successor to GPT-4 yet"?
There may be conventions, but in no way can anyone force them to follow them; it's just a name for a release. They absolutely are working on successor models and have stated they plan to release a model by June. Whether they are working on a new architecture or a new training run, they certainly have experiments going, but who knows how serious they are.
Regardless they can and will call future models anything they want. They could easily just decide that the minor improvements that come out in a few months are called GPT-4.2 and the major new training run is called GPT-4.5 instead of GPT-5.
No, it is just an arbitrary version number for this series of models from OpenAI. They will flip to 5 when they make an architecture change that will force them to begin training from scratch. Until then they will continue to produce more refined versions of 4, potentially more general training or fine-tuned task-oriented training.
The way it currently works, there is a quite clear boundary, as all the smaller iterations are based on something of a fixed size that was expensively pretrained, and then have either finetuned weights or some extra layers on top, but the core model structure and size can't be changed without starting from scratch.
So if some particular GPT-4 improved successor is based on the GPT-4 core transformer size and pretrained parameters then we'd call it GPT-4.x, but if some other GPT-4 successor is a larger core model (which inevitably also means it's re-trained from scratch) then we'd call it GPT-5, no matter if its observable performance is better or worse or comparable to the tweaked GPT-4.x options.
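A minimal sketch of that distinction (PyTorch; the module shapes and the idea of bolting a new head onto a frozen core are illustrative assumptions, not how OpenAI actually versions its models):

```python
import torch.nn as nn

def make_point_release(pretrained_core: nn.Module, hidden_dim: int, n_tasks: int):
    # "GPT-4.x" style: keep the expensive pretrained core frozen and train only
    # a small amount of new weights on top of it.
    for p in pretrained_core.parameters():
        p.requires_grad_(False)
    new_head = nn.Linear(hidden_dim, n_tasks)          # the only trainable part
    return pretrained_core, new_head

def make_major_release(hidden_dim: int, n_layers: int):
    # "GPT-5" style: a larger core can't inherit the old weights' shapes, so the
    # whole stack starts from random initialization and must be pretrained again.
    return nn.Sequential(*[
        nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=16, batch_first=True)
        for _ in range(n_layers)
    ])
```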
Is it known how much of an improvement GPT-5 would bring? Can we predict how much better it would be compared to GPT-4?
Based on published research from Google and Meta it is somewhat known how much more capability is possible with the current approach, but it would require an extreme increase in compute and training set to achieve it. There are diminishing returns, but the returns appear to continue for a good while, even without any new model architecture discoveries. Right now the expense will likely mean that progress will be limited to the pace of Moore’s law.
In terms of what this improvement would actually look like in terms of real world, emergent capabilities, no one knows.
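For a concrete sense of those diminishing returns, here is the loss formula fitted in DeepMind's Chinchilla paper (Hoffmann et al., 2022); the coefficients below are that paper's reported fit, and applying them to any other model family is an assumption:

```python
def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    # Fitted constants from Hoffmann et al. (2022): L = E + A/N^alpha + B/D^beta
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

# Diminishing returns: 10x the parameters and 10x the tokens only moves the
# estimated loss modestly.
print(chinchilla_loss(70e9, 1.4e12))    # ~1.94 (Chinchilla-scale model)
print(chinchilla_loss(700e9, 14e12))    # ~1.81 (10x parameters and data)
```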
No, but LLaMA's training was designed around studying the curve of output quality vs. training-data size, so we do have companies looking into this.
It's the sensible economic thing to do when you basically have a monopoly position.
Maybe it's just semantics and they are training GPT-5.1? Or a multi-modal UniversalPT-1?
Whew, looks like we've pushed back the birth of Skynet by at least 6 months!
They don’t need to do GPT-5 yet because GPT-4 is monetizable.
It’s funny how we’re all watching the GPT releases and improvements hoping the next one doesn’t trash our jobs any further than the last ha
“Will I still be able to feed my family, Sam? Ilya?”
The old capitalist treadmill is now on full speed and we’re all just trying to keep up.
Where it goes…nobody knows. What an entertaining drama.
ooh someone rewrote the title i put, thanks bro
This seems like an extremely dangerous thing for Altman to do.
What about Roko's basilisk?
Roko's basilisk is just Pascal's wager for the geeks.
Just as in Pascal's wager, the conclusion relies on an unwarranted assumption that privileges a particular outcome over its exact opposite: e.g. a deity with exactly inverted criteria for heaven and hell, punishing those who believe in the Christian God, or a "Roko's antibasilisk" that spares the people who'd be punished by Roko's basilisk and punishes everyone else.
Hell, what about Skynet?
and what about my shredder?
1 reply →
They need a few algorithmic improvements first, imho. GPT4 is noticeably slower than GPT3.5 and apparently costs a lot more to use, implying some serious compute costs.
They could train it with more data in the hopes of getting another big leap there, but what data is left? They've fed it everything it seems.
So what's left is getting the runtime reduced in terms of the model size. Hire some brilliant minds to turn an N-squared into an N-log-N (or something to that effect).
Maybe GPT4 has some ideas.
He has just admitted that O̶p̶e̶n̶AI.com has partially trained GPT-5 and is already planning to test the 'so-called' useless guardrails around it.
There is no 'revolution' around this. Just 'evolution' with more data and more excessive waste of compute to create another so-called AI black-box sophist with Sam Altman selling both the poison (GPT-4) and the antidote (Worldcoin).
At some point, with their tremendous lock-in strategy, O̶p̶e̶n̶AI.com and Microsoft will eventually use the lock-in to upsell and compete against their partners.
> Sam Altman selling both the poison (GPT-4) and the antidote (Worldcoin).
I actually find it pretty amazing that more people aren't given pause by Sam Altman's involvement here. After the WorldCoin stuff, I'd think that he'd be viewed with a much more skeptical eye in terms of his ethics.
The late-night March 31 release of Worldcoin had the (unintended?) side-effect of making me think "a token to prove my personhood" was an April Fools Joke when I saw it the next morning and never thought about it again.
1 reply →
> has partially trained GPT-5
From my understanding they are training their new GPT models off of a checkpoint from the previous generation, so they technically have partially trained multiple future models in their GPT lineage.
Could you detail how Worldcoin is an antidote to GPT-4 ?
Worldcoin price skyrockets => cryptocurrency speculation returns => GPU shortage prevents further training of LLMs
3 replies →
Microsoft would never do that...