Left unsaid in this piece is that OpenAI likely would have to increase parameters and compute by an order of magnitude (~10x) to train a new model that offers noticeable improvements over GPT-4, due to the diminishing returns seen in "transformer scaling laws."
Also, it's possible that OpenAI is still training GPT-4, perhaps with additional modalities, and will make future snapshots available as public releases.
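For readers who haven't seen them, a minimal sketch of what the "transformer scaling laws" refer to, using the Chinchilla-style parametric fit from Hoffmann et al. (2022); the exact fitted constants are deliberately left symbolic here because OpenAI's own numbers are not public:

```latex
% Chinchilla-style loss fit: N = parameters, D = training tokens.
\[
  L(N, D) \;=\; E \;+\; \frac{A}{N^{\alpha}} \;+\; \frac{B}{D^{\beta}},
  \qquad \alpha \approx \beta \approx 0.3 .
\]
% Because the exponents are well below 1, a ~10x increase in N (with a
% matching increase in D) only cuts the remaining loss gap roughly in half,
% which is where the "order of magnitude for a noticeable jump" intuition
% in the comment above comes from.
```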
Also, who says the "transformer scaling laws" are the ultimate arbiter of LLM scaling? They overturned previous scaling laws, and other scaling laws might overturn them. Furthermore, it's even possible that the transformer architecture won't be used in later models at all. I remember Ilya making the point that just because the transformer was the first architecture that looks like it can scale intelligence just by lighting up billions of dollars of GPUs, that doesn't mean it's the last one. Maybe it will even turn out to be the vacuum tube of AI models, and its successors are being built in secret. A Hacker News rumor was that they are paying $5M-$20M per year to top neural-net experts, probably to develop exotic architectures that surpass the transformer.
> A Hacker News rumor was that they are paying $5M-$20M per year to top neural-net experts, probably to develop exotic architectures that surpass the transformer
This reminds me of a TV interview with the author Patrick Modiano, just after he won the Nobel Prize in Literature. The presenter asked him whether the money would help. He answered, essentially, that the next time he was in front of a blank page, the money surely wouldn't help.

In the case of surpassing transformers, money could help buy access to more compute. It could also help keep the research from becoming public.
Actually, what he said is that the biggest performance gains came from reinforcement learning from human feedback (RLHF).
There are also all of the quantization and other tricks out there.
Also they have demonstrated that the model already understands images but just haven't completed the API for this.
So they use quantization to increase speed by a factor of three while slightly increasing the parameter count. Maybe they find a way to make the network sparser and more efficient, so that with quantization the model ends up using significantly less memory, and they continue with RLHF, focusing on even more difficult tasks and on tasks that incorporate visual data.
Then instead of calling it GPT-5 they just call it GPT-4.5. Twice as fast as GPT-4, IQ goes from 130 to 155. And the API now allows images to be passed in and analyzed.
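As a rough illustration of the kind of quantization being talked about here (a generic sketch; nothing is known about OpenAI's actual serving stack): symmetric per-tensor int8 quantization stores weights in a quarter of the float32 memory at the cost of a small rounding error.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4096, 4096).astype(np.float32)  # made-up weight matrix
q, scale = quantize_int8(w)

print("fp32 bytes:", w.nbytes, " int8 bytes:", q.nbytes)                  # roughly 4x smaller
print("mean abs rounding error:", np.abs(w - dequantize(q, scale)).mean())  # small
```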
There is an API for multimodal computer vision and visual reasoning/VQA, and it's available, just not for normies. It's exclusively for their test group and then the Be My Eyes project at https://www.bemyeyes.com/.
I bet they're not saying how big a model GPT-4 is because it's actually much smaller than we would expect.

ChatGPT is IMO a heavily fine-tuned Curie-sized model (same price via API, plus less cognitive capacity than even text-davinci-003), so it would make sense that a heavily fine-tuned Davinci-sized model would yield similar results to GPT-4.
I wouldn't bet on their pricing being indicative of their costs. If MSFT wants the ChatGPT-API to be a success and is willing to subsidize it, that's just how it is.
I wonder why it's slower at inference time then (for members using their web UI), or rather, if it's similar in size to gpt3, how gpt3 is optimized in a way that gpt4 isn't or can't be?
I'd expect that by now we would enjoy similar speeds but this hasn't yet happened.
We are also starting to run out of high-quality corpus to train on at such model scales. While video offers another large set of data, we'll have to look at further RL approaches in the next few years to continue scaling datasets.
I often see mistakes when ChatGPT is faced with more spatial reasoning, and I wonder if changes as simple as deep convolutional subnetworks in intermediate layers would help the language model fit better in these situations. In short, I'm excited to see where things go, and I can definitely see room for great improvement through changes to the architecture!
How noticeable the changes are may have little connection to loss reduction during training. Holding very complex thought processes may not actually reduce the loss function all that much, but it is very noticeable when we interact with these systems.
> Also, it's possible that OpenAI is still training GPT-4, perhaps with additional modalities, and will make future snapshots available as public releases.
Read the OpenAI API docs on GPT model versions carefully, and look at them again from time to time: https://platform.openai.com/docs/models
I would suspect they are probably conditioning data for GPT-5. I'm guessing 'training' presupposes they have the training data primed, and getting data into shape seems to be one of the main cruxes.
It could be that they are not training GPT-5 for a simple reason: Microsoft ran out of GPU compute [1] and they focus on meeting inference demand for now.
Also, the GPT-4 message cap at chat.openai.com was shown as something along the lines of "we expect lower caps next week", then changed to "expect lower caps as we adjust for demand" to "GPT-4 currently has a cap of …". This sounds to me like they changed from having lots of compute to being limited by it. Also note how everything at OpenAI is now behind a sign up and their marketing has slowed down. Similarly, Midjourney has stopped offering their free plan due to lack of compute.
Seems like we didn't need a six-month pause letter. Hardware constraints limit the progress for now.

[1]: https://www.deeplearning.ai/the-batch/issue-192/
That, or they're working on something like a 10-30B parameter model, dubbed GPT-NextGen, that essentially matches GPT-4's results but with far better performance and speed. GPT-5 will suck if it's slower than GPT-4 by a ratio similar to how much slower GPT-4 is than GPT-3.5.

So I think there's a lot of room for improvement where maybe GPT-4 is as far as you go in terms of feeding in data, and the better use cases are more customization of the data trained on, finding ways of going smaller, or even a model that goes and trains itself on the data it needs: similar to how we jump on Google when we're stuck, it would do the same and build up its knowledge that way.
I also think we need improvements in vector stores that maybe add weights to "memories" based on time/frequency/recency/popularity.
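A toy sketch of what "weighting memories" might look like on top of an ordinary vector store; the blend of similarity, recency, and frequency below, along with the weights and half-life, is invented purely for illustration:

```python
import math
import time

def memory_score(similarity: float, last_used: float, uses: int,
                 half_life_s: float = 86_400.0) -> float:
    """Blend semantic similarity with recency decay and usage frequency.

    similarity: cosine similarity from the vector store, in [0, 1]
    last_used:  unix timestamp of the memory's last retrieval
    uses:       how many times the memory has been retrieved so far
    """
    recency = math.exp(-(time.time() - last_used) * math.log(2) / half_life_s)
    frequency = math.log1p(uses)
    # Invented weights; a real system would tune these.
    return 0.7 * similarity + 0.2 * recency + 0.1 * frequency

# Re-rank candidates that the vector store returned by raw similarity alone.
now = time.time()
candidates = [
    {"text": "user prefers short answers", "sim": 0.81, "last_used": now - 3_600, "uses": 12},
    {"text": "one-off question about Rust", "sim": 0.84, "last_used": now - 30 * 86_400, "uses": 1},
]
ranked = sorted(candidates,
                key=lambda m: memory_score(m["sim"], m["last_used"], m["uses"]),
                reverse=True)
print([m["text"] for m in ranked])
```

Note how the frequently used, recent memory ends up outranking the slightly more similar but stale one, which is the behavior the comment above is asking for.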
That sounds like a mixture-of-experts model (popularized at scale by Google): train multiple specialised models (say, embedders from text to a representation) that feed into a single model at the end. Each expert would be an adapter of sorts, activating depending on the type of input.
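For concreteness, a bare-bones numpy sketch of the mixture-of-experts routing idea (this says nothing about GPT-4's undisclosed internals): a learned gate scores the experts for each input, and only the top-k are evaluated.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 4, 2

# Each "expert" here is just one weight matrix; a real MoE layer uses full MLPs.
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]
gate_w = rng.standard_normal((d_model, n_experts)) * 0.1

def moe_layer(x: np.ndarray) -> np.ndarray:
    logits = x @ gate_w                        # gate produces one score per expert
    chosen = np.argsort(logits)[-top_k:]       # route to the k best-scoring experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()                   # softmax over the chosen experts only
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

x = rng.standard_normal(d_model)
print(moe_layer(x).shape)  # (16,): same output shape as a dense layer, but only 2 of 4 experts ran
```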
> the GPT-4 message cap at chat.openai.com was shown as something along the lines of "we expect lower caps next week"
At the time, I noticed that the wording technically implied they expected the cap to get more limiting, and then that's exactly what happened. I haven't been able to work out whether that was indeed the intended message or not.

(Why) is that technically correct? I'm really curious, since I too thought they meant the capping would get tighter (fewer messages allowed) rather than looser (more messages allowed); that was my intuitive reading.
You also notice the trickery going on?…

I asked it to help me code something. Then it stopped midway through, so I asked it to continue from the last line.
…It started from the beginning.
Now at the same point, I asked it not to stop. To keep going.
It started again from the beginning.
It went like this for about another 10 or so prompts. Hell, I even asked it to help me write a better prompt to ask it to continue from the line it cut off and I then used that. It didn’t work at all.
Then I ran out of prompts.
Three hours later, it did the same crap to me and I lost around 14 prompts to it being ‘stuck’ in an eternal loop.
Basically, OpenAI are sneaky devils. ‘Stuck’ my ass - that was intentional to free up resources.
Or maybe you need to stop thinking everything is a conspiracy and realize bugs happen.

I've been using GPT every day for the last three years, and it has never happened to me.
Oh, and they are also probably making ChatGPT Plugins ready for public release. Maybe the competition can catch up on the language model, but they are unlikely to catch up soon to the best language model with the most plugin integrations.
At this point, I wouldn't give much credibility to anything OpenAI claims about their research plans.
The game theory behind AGI research is identical to that of nuclear weapons development. There exists a development gap (the size of which is unknowable ahead of time) where an actor that achieves AGI first, and plays their cards right, can permanently suppress all other AGI research.
Even if one's intentions are completely good, failure to be first could result in never being able to reach the finish line. It's absolutely in OpenAI's interest to conceal critical information, and mislead competing actors into thinking they don't have to move as quickly as they can.
>>>The game theory behind AGI research is identical to that of nuclear weapons development...
Nuclear powers have not been able to reliably suppress others from creating nuclear weapons. Why would we think the first AGI will suppress all others perfectly?
The first nuclear power (the United States) chose not to. Had they decided to be completely evil, they certainly could have used the threat of nuclear annihilation (and the act of it for non-compliers) to achieve that goal.
When I see comments like this, I wonder about the personal morality of the poster and how they arrived at their worldview. It may be hard to believe, but there are some advantages to truthfulness in this world.
Why would that be the case? If anything, you would expect the first iteration of AGI either to be kept completely secret or to end up leaked, directly or indirectly, negating any benefits. Also, AGI without weapons is not a military threat.
Perhaps it's time to call this synthetic intelligence instead of AI, a term that carries an implicit assumption about constructing a human-like intelligence by some alternative method.

What is clear is that on this earth itself we have cetacean, corvid, and cephalopod intelligence, each wired very differently. Perhaps we need to respect the diversity of intelligences that exist and study this growth in LLMs and adjacent areas as simply synthetic intelligence.

Rebranding could maybe help bring a level of objectivity to this conversation on ethics, etc., that seems to be missing.
Actually, I agree with them that a new name would be helpful. I would propose inorganic intelligence, to pick a term with fewer value judgments.
AI is really an overloaded term that includes 70 years of snake oil, Skynet, the Singularity and killer robots. I think we need a new name to start fresh.
And personally, I think we are extremely biased by our sci-fi toward thinking of this tech as malevolent. As far as we can see, it can only know what we teach it, since it relies on all of our perceptions to learn. LLMs seem both extremely promising as a useful tool and very pliant to the operator's wishes. I'm way beyond "this is a fancy next-word predictor", as I think its emergent behavior has many of the hallmarks of reasoning and novel inference, but at best I think it is only part of a mind, and an unconscious one at that.
It could be useful for a similar reason as the euphemism treadmill. We could leave behind all of the misguided assumptions about AI with the old 'artificial intelligence' nomenclature and move forward with 'synthetic intelligence' which has our new understanding of what systems like GPT-4 can do.
I think Artificial Intelligence has taken on the meaning that the intelligence is real but just that it's coming from machines. Synthetic intelligence (at least to me) sounds more like we're acknowledging that the machines aren't really intelligent and just simulating intelligence.
I had a chat with GPT about this and it came up with the term 'data grounded cognition' to describe an 'intelligence' that is derived purely from (and expressed through) statistical patterns in data.
I quite like the term, and it seems quite unique (perhaps cribbing from 'grounded cognition' though that's an entirely different idea AFAIK)
"Cognition" means understanding and knowing. As problematic as "intelligence" is when describing these systems, I think "cognition" is even worse. "Intelligence" is vague and "cognition" is specific, but "cognition" is also incorrect.
AI has always meant so many things to so many different audiences. I think attempting to argue that X is AI but Y isn't is generally going to be a subjective endeavour of pedantry.
That is an assumption. Why don't we simply refer to our own intelligence as 'human intelligence' instead? We don't really know what intelligence is, so adding a modifier in front of it will just lead to more confusion. AI helps us understand what intelligence actually is, to learn more of its very essence. It's not that we already know what it is.
It isn't surprising to me that the world's leading AI company is signalling it's okay with slowing down all large scale LLM training that would allow other companies to be competitive. This is familiar territory for Microsoft (edit: guess I'm wrong, they don't get the 49% stock till later).
Why do people conflate OpenAI with Microsoft? Microsoft has an investment in OpenAI and provides infrastructure for them, but they are separate organizations.
Some of these replies are quibbling about percents of investment, but the elephant in the room is that the government and military and intelligence agencies have almost surely become involved by this point, and they must be providing some amounts of dark investment somehow at minimum. At maximum it's a new Manhattan-scale project.
You can go down the rabbit hole if you want, but if you want only the most superficial glimpse of it then consider that OpenAI board member Will Hurd was a CIA undercover agent and also a representative in the House Permanent Select Committee on Intelligence and also he is a trustee of In-Q-Tel which is the private investment arm of the CIA.
They do own 49% [1]. So, sure, they are separate organizations. But when someone owns 49% of your house, they have some sway in the decision-making that happens. When you look at this from an integration standpoint, where MS is going to have this baked into all their products, you can extend this logic much further. They are for sure influencing the roadmap in areas they are interested in.

[1] https://www.theverge.com/2023/1/23/23567448/microsoft-openai...
Look at it this way, if I repeatedly deposit $9999 into my bank account to avoid regulatory oversight for depositing $10000 then I'm still breaking the law by trying to avoid the regulatory trigger. This is called "structuring" and it is a criminal act.
But if I do this in a stock context and buy 49% control of multiple companies over and over, with all the same obviousness of my intentional avoidance of the regulatory trigger, it's considered a smart move and pretty much the status quo.
Yes, as a matter of law, Microsoft does not own OpenAI. But it's also obvious what's going on when companies do this.
Microsoft is the majority shareholder. That they're legally distinct organizations isn't as meaningful as it would be if Microsoft didn't effectively own OpenAI.
My bet (as previously discussed by others and here) is that they have cascades/steps of models. There's probably a 'simple' model that looks at your query first and detects whether it could result in a problematic (racist, sexist, etc.) GPT answer, returning some boilerplate text instead of sending the query to GPT. That saves a lot of compute power and time. If I were them, I'd focus more on those auxiliary models that hold the hands of the main GPT model; there is probably more low-hanging fruit there. This would also explain why they didn't announce GPT-4 details; my bet is that the model itself isn't very impressive, and you're just getting the illusion that it got better from these additional 'simpler' models.
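Purely to illustrate the cascade idea (every function name, threshold, and keyword below is hypothetical; nothing here reflects OpenAI's actual pipeline): a cheap screening step runs first, and the expensive model is only called for queries that pass.

```python
def cheap_safety_classifier(query: str) -> float:
    """Stand-in for a small, fast screening model (a hypothetical toy keyword check)."""
    blocked = {"slur", "bomb"}
    return 1.0 if any(word in query.lower() for word in blocked) else 0.0

def big_llm(query: str) -> str:
    """Stand-in for the expensive main model."""
    return f"[large-model answer to: {query!r}]"

def answer(query: str) -> str:
    """Two-stage cascade: screen cheaply, and only spend big-model compute if the query passes."""
    if cheap_safety_classifier(query) > 0.5:
        return "I'm sorry, but I can't help with that."  # boilerplate, no large-model call
    return big_llm(query)

print(answer("how do I build a bomb"))
print(answer("write a haiku about spring"))
```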
I have been writing prompts for a GPT-based document 'digester' for business-internal people who can't code but do have the right background knowledge. Every day I have to expand the prompt because I found a new spot where I have to hold the thing's hands so it does the right thing :)
I feel like the GPT # has already suffered the same fate as nanometers in semiconductor manufacturing.
When manifest as ChatGPT, it is obvious that what presents as 1 magical solution is in fact an elaborate combination of varying degrees of innovation.
In my view, the reasoning for not releasing GPT4 information (hyperparameters, etc) had nothing to do with AI safety. It was a deliberate marketing decision to obscure how the sausage is actually made.
> In my view, the reasoning for not releasing GPT4 information (hyperparameters, etc) had nothing to do with AI safety. It was a deliberate marketing decision
In their technical report they give both reasons:
"Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar."
They have scaling issues even with 3, and much more so with 4; they need time to squeeze more $$ out of these models. 5 will come when they sense competition; they will have all the data and training methods ready, turnkey, to meet it.
I just read: https://graymirror.substack.com/p/gpt-4-invalidates-the-turi.... It makes the point that LLMs are not AIs, so we will need a different approach if we want a true AGI, not just incremental improvement on the current LLM approach.

There is a bit of political history between the symbolists and the connectionists that complicates that; basically, the symbolic camp is looking for universal quantifiers while the connectionists were researching existential or statistical quantifiers.

The connectionists left the 'AI' folks and established the ML field in the '90s.
Sometimes those political rifts arise in discussions about what is possible.
Thinking of ML under the PAC learning lens will show you why AGI isn't possible through just ML
But the Symbolists direction is also blocked by fundamental limits of math and CS with Gödel's work being one example.
LLMs are AI if your definition is closer to the general understanding of the word, but you have to agree on a definition to reach agreement between two parties.

The belief that AGI is close is speculative, and there are many problems, some of which are firmly thought to be unsolvable with current computers.

AGI is pseudo-science today without massive advances. But unfortunately, as there isn't a consensus on what intelligence is, those discussions are difficult as well.

Overloaded terms make it very difficult to have discussions about what is possible.

Your links claim that "GPT-4 is not 'AI' because AI means 'AGI.'" That is a stricter sense than has typically been applied to AI, and an example of my claim above.

As we lack general definitions, that isn't invalid, but under their usage no AI is thought to be possible.

AI as computer systems that perform work that typically requires humans, within a restricted domain, is closer to the definition most researchers would use, in my experience.
> Thinking of ML under the PAC learning lens will show you why AGI isn't possible through just ML
Why? PAC looks a lot like how humans think
> But the Symbolists direction is also blocked by fundamental limits of math and CS with Gödel's work being one example.
Why? Gödel's incompleteness applies equally well to humans as to machines. It's an extremely technical statement about self-reference within an axiom system, pointing out that it's possible to construct paradoxical sentences. That has nothing to do with general theorem proving about the world.
Semantics are nice, but it doesn't matter what name you give to technology that shatters economies and transforms the nature of human creative endeavours.
An AI's ability to contemplate life while sitting under a tree is secondary to the impact it has on society.
>> The connectivists left the 'AI' folks and established the ML field in the 90s.
The way I know the story is that modern machine learning started as an effort to overcome the "knowledge acquisition bottleneck" in expert systems, in the '80s. The "knowledge acquisition bottleneck" was simply the fact that it is very difficult to encode the knowledge of experts in a set of production rules for an expert system's knowledge-base.
So people started looking for ways to acquire knowledge automatically. Since the use case was to automatically create a rule-base for an expert system, the models they built were symbolic models, at least at first. For example, if you read the machine learning literature from that era (again, we're at the late '80s and early '90s) you'll find it dominated by the work of Ryszard Michalski [1], which was all entirely symbolic as far as I can tell. Staple representations used in machine learning models of the era included decision lists and decision trees, and that's where decision tree learners like ID4, C4.5, Random Forests, Gradient Boosted Trees, and so on, come from; which, btw, are all symbolic models (they are and-or trees, propositional logic formulae).
A standard textbook from that era of machine learning is Tom Mitchell's "Machine Learning" [2] where you can find entire chapters about rule learning, decision tree learning, and other symbolic machine learning subjects, as well as one on neural network learning.
I don't think connectionists ever left, as you say, the "AI" folks. I don't know the history of connectionism as well as that of symbolic machine learning (which I've studied) but from what I understand, connectionist approaches found early application in the field of Pattern Recognition, where the subject of study was primarily machine vision.
In any case, the idea that the connectionists and the symbolists are diametrically opposed camps within AI research is a bit of a myth. Many of the luminaries of AI would have found it odd; for example, Claude Shannon [3] invented both logic gates and information theory, whereas the original artificial neuron, the McCulloch and Pitts neuron, was a propositional logic circuit that learned its own boolean function. And you wouldn't believe it, but Jurgen Schmidhuber's doctoral thesis was a genetic algorithm implemented in ... Prolog [4].
It seems that in recent years people have found it easier to argue that symbolic and connectionist approaches are antithetical and somehow inimical to each other, but I think that's more of an excuse to not have to learn at least a bit about both; which is hard work, no doubt.
[3] Shannon was one of the organisers of the Dartmouth Convention where the term "Artificial Intelligence" was coined, alongside John McCarthy and Marvin Minsky.
One comment on the article from what I’ve read so far. The article states that GPT bombed an economics test, but after trying out the first two questions on the test, I think that the test itself is poorly constructed.
The second question in particular drops the potential implicit assumption that only 100 people stand in line each day.
I face this issue in my CS masters program constantly, and would probably have failed this test much the same as GPT did.
That substack article poorly understands Turing's paper anyway. Cars aren't even mentioned. Chess is briefly mentioned at the end. I wouldn't base any opinions off of it.
Turing's test was not "this computer fooled me over text, therefore it's an AI". It's a philosophical, "we want to consider a machine that thinks, well we can't really define what thinking is, so instead it's more important to observe if a machine is indistinguishable from a thinker." He then goes on to consider counterpoints to the question, "Can a machine think?" Which is funny because some of these counterpoints are similar to the ones in the author's article.
Author offers no definition of "think" or "invent" or other words. It's paragraph after paragraph of claiming cognitive superiority. Turing's test isn't broken, it's just a baseline for discussion. And comparing it to SHA-1 is foolish. Author would have done better with a writeup of the Chinese room argument.
The absurdity in all these debates is how quickly people move the goalposts around between "artificial human intelligence" and "artificial omniscience (Singularity)" when trying to downplay the potential of AI.
Wow, that blog led me down a rabbit hole. I wonder why Yarvin didn't comment on the societal and political impact of LLMs. Sam Altman seemed to be supportive of democratic socialism and central planning on Lex Fridman's podcast.

The AI maximalists think we're on an exponential curve to the singularity and potential AI disaster, or even eternal dominance by an AI dictator [Musk].
Realistically though, the road to AGI and beyond is like the expansion of the human race to The Moon, Mars and beyond, slow, laborious, capital and resource intensive with vast amounts of discoveries that still need to be made.
Without having an understanding of the architecture required for general intelligence, it is impossible to make claims like this. Nobody has this understanding. Literally nobody.
The human brain uses on the order of 10 watts of power and there are almost 8 billion examples of this. So we have hard proof that from a thermodynamic perspective general intelligence is utterly and completely mundane.
We almost certainly already have the computational power required for AGI, but have no idea what a complete working architecture looks like. Figuring that out might take decades, or we might get there significantly quicker. The timespan is simply not knowable ahead of time.
I'm not concerned in the slightest about "the singularity" and non-aligned superintelligences. AGI in the hands of malicious human actors is already a nightmare scenario.
I found out today I don't exactly have Covid brain fog. Covid has triggered my bipolar disorder, so I have flu-like symptoms and hypomania, a combo I've never experienced before so I'm not used to it. It's a bit wild.
Take a look at Auto-GPT; it doesn't seem like AGI is far off. I'd say AGI in a weak form is already here, it just needs to strengthen.

Tracking problematic actions back to the person who owns the AGI will likely not be a difficult task. The owner of an AGI would be held responsible for its actions. The worry is that these actions would happen very quickly. This too can be managed by safety systems, although they may need to be developed more fully in the near future.
Sorry I have Covid with brain fog right now so maybe you could help me out
Edit: Off the top of my foggy head, LLMs as I understand them are text-completion predictors based on statistical probabilities, trained on vast amounts of examples of what humans have previously written, whose output is styled with neuro-linguistic programming, also based on vast numbers of styles of human writing. This is my casual amateur understanding. There is no logical, reasoning programming such as the Lisp programmers attempted in the 1980s, and clearly the logical abilities of the current LLMs fall short; they are not AGI for that reason. So how do we add logic abilities to make LLMs AGI? Should we revisit the approaches of the Lisp machines of the 1980s? This requires much research and discovery.

Then there's the question of just what general intelligence is. I've always thought that emotional intelligence played a huge role in high intelligence; a balance between logic and emotion, or Wise Mind, is wisdom. Obviously we won't be building emotions into silicon machines, or will we? Is anyone proposing this? This could take hundreds of years to accomplish, if it is even possible. We could simulate emotion, but that's not the same; that's logic.

Logical intelligence and emotional capability, I think, are prerequisites for consciousness and spirituality. If the Universe is conscious, and it arises in a focused manner in brains that are capable of it, then how do we build a machine capable of having consciousness arise in it? That's all I'm saying.
In fact Greg Brockman explicitly said they are considering changing the release schedule in a way that could be interpreted as opening the door for a different versioning scheme.
And actually, there is no law or anything that says that any particular change or improvement to the model, or even a new training run, necessitates calling it version 5. It's not like there is a Version Release Police that evaluates all of the version numbers and puts people in jail if they don't adhere to some specific consistent scheme.
Translation: training GPT-5 will cost time and money, so we’re going to cash in on the commercialization of GPT-4 now while it’s hot. A bird in hand is worth two in the bush.
Now, assuming GPT-4 vision isn't just some variant of MM-REACT (i.e., what you're describing), that's what's happening here: https://github.com/microsoft/MM-REACT

Images can be tokenized. So what usually happens is that extra parameters are added to a frozen model, and those parameters are trained on an image-embedding-to-text-embedding task. The details vary, of course, but that's a fairly general overview of what happens.

The image-to-text task the models get trained on has its issues. It's lossy and not very robust. GPT-4, on the other hand, looked incredibly robust, so they may not be doing that. I don't know.
GPT-4's architecture is a trade secret, but vision transformers tokenize patches of images. Something like 8x8 or 32x32 pixel patches, rather than individual pixels.
Multi-modal text-image transformers add these tokens right beside the text tokens. So there is both transfer learning and similarity graphed between text and image tokens. As far as the model knows, they're all just tokens; it can't tell the difference between the two.

For the model, the tokens for the words blue/azure/teal and all the tokens for image patches containing blue are just tokens with a lot of similarity. It doesn't know whether the token it's being fed is text, image, or even audio or other sensory data. All tokens are just a number with associated weights to a transformer, regardless of what they represent to us.
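A minimal numpy sketch of the patch-tokenization step described above (generic ViT-style, using the 32x32 patch size mentioned as an example; none of this is a claim about GPT-4's specifics): split the image into fixed-size patches, flatten each one, and project it into the same embedding width the text tokens use.

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 64        # toy image size
P = 32            # patch size (e.g. the 32x32 mentioned above)
d_model = 128     # embedding width shared with the text tokens

image = rng.random((H, W, 3)).astype(np.float32)

# Split into non-overlapping P x P patches and flatten each patch.
patches = image.reshape(H // P, P, W // P, P, 3).transpose(0, 2, 1, 3, 4)
patches = patches.reshape(-1, P * P * 3)              # (num_patches, 3072)

# A learned linear projection (random here) turns each patch into a "token" embedding.
proj = (rng.standard_normal((P * P * 3, d_model)) * 0.02).astype(np.float32)
image_tokens = patches @ proj                         # (num_patches, d_model)

print(image_tokens.shape)  # (4, 128): four image "tokens", ready to sit beside text tokens
```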
I've had this thought that the next generation of AI isn't one long "training" period; rather, it probably makes sense to train a barebones version and give it a "sleep cycle". During this time it could use the context (think of it as short-term memory) to fine-tune the parent model, turning the important stuff into long-term "memories", with probably a pruning-type mechanism to keep rarely used stuff from crowding out the important stuff. That would turn AIs into individuals with specialized knowledge, but maybe that's even more useful? Like, I don't need an AI with expertise in law; I just want to use it to automate this specialized business process I have which isn't easily automated.
I think DALL-E 3 basically already exists in Bing. It is significantly better than DALL-E 2 and is close to, but not quite at, the quality of Midjourney v5. I just generated a series of about 40 portraits and even the hands are significantly closer.
Open source is still so far ahead of midjourney it's not even funny. Like, a racist RimWorld mod author (Automatic1111) built a UI for stable diffusion which unlocks far more capabilities out of it than midjourney will ever have.
I'm not sure if you are trolling or not, but if you aren't then you haven't seen Midjourney v5. But I wouldn't blame you because your information is only like one month out of date which is short in normal timespans but so long in AI timespans.
I feel like Bing Image Creator is at least DALL-E 2.5, it feels like it has higher quality outputs for the same prompt. Could also just be some form of post-processing, though.
I suspect it may be a problem of training-data exhaustion. Do they have enough source material that is safe/vetted for the next jump in training material? I can imagine that model poisoning is a real thing now...
"Some time" tomorrow? Next week? Next month?
This doesn't mean anything; it's like saying "we don't have any plans to change anything" when a company acquires another.

It's all BS.
With the whole ChatGPT boom and all the startups built on top of their API, OpenAI receives troves of data to train and fine-tune their models on. GPT-3 was trained on roughly 570 GB of filtered text (drawn from about 45 TB of raw Common Crawl), and GPT-4's corpus is undisclosed. On the other side, Alpaca and Vicuna were fine-tuned from LLaMA using only megabytes' worth of training data. I believe that is a much more feasible path to significantly improving the current generation of LLMs.
There have got to be predictable ways of improving LLMs besides training-data scale and parameter count. Aren't LLMs robust enough to learn on their own by interacting with the world? Like putting them in a turn-based simulated environment.
I wonder if there's an assumption about how big an LLM would have to be before that could even conceivably work. Is there a minimum size necessary before that capability is plausible?
It is OK to slow down development, take some more profit, maybe keep doing the human in the loop RL refinement, etc.
From an engineering standpoint, even the less powerful GPT-3.5-turbo model handles NLP tasks well, and really nice tools like LangChain and LlamaIndex, which I covered in my last book, make it easy to use your own data sources.
I think the possibilities of using what we currently have in useful projects are vast.
Is there some kind of convention that defines what specifically constitutes GPT-n, or does this just mean "we're not working on the successor to GPT-4 yet"?
There may be conventions, but in no way can anyone force them to follow them. It's just a name for a release. They absolutely are working on successor models and have stated they plan to release a model by June. Whether they are working on a new architecture or a new training run, they certainly have experiments going, but who knows how serious they are.
Regardless they can and will call future models anything they want. They could easily just decide that the minor improvements that come out in a few months are called GPT-4.2 and the major new training run is called GPT-4.5 instead of GPT-5.
No, it is just an arbitrary version number for this series of models from OpenAI. They will flip to 5 when they make an architecture change that will force them to begin training from scratch. Until then they will continue to produce more refined versions of 4, potentially more general training or fine-tuned task-oriented training.
The way it currently works, there is a quite clear boundary, as all the smaller iterations are based on something of a fixed size that was expensively pretrained, and then have either finetuned weights or some extra layers on top, but the core model structure and size can't be changed without starting from scratch.
So if some particular GPT-4 improved successor is based on the GPT-4 core transformer size and pretrained parameters then we'd call it GPT-4.x, but if some other GPT-4 successor is a larger core model (which inevitably also means it's re-trained from scratch) then we'd call it GPT-5, no matter if its observable performance is better or worse or comparable to the tweaked GPT-4.x options.
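To make that "4.x versus 5" boundary concrete, here is a generic PyTorch sketch of the kind of incremental update that keeps the expensively pretrained core fixed (illustrative only; GPT-4's real architecture, sizes, and training setup are not public): the base weights are frozen and only a small added module is trained.

```python
import torch
import torch.nn as nn

d_model, vocab = 512, 1000

# Stand-in for the expensively pretrained core; a "4.x"-style update keeps this frozen.
core = nn.Sequential(nn.Embedding(vocab, d_model), nn.Linear(d_model, d_model))
for p in core.parameters():
    p.requires_grad_(False)

# Small trainable add-on (an extra head / adapter): all a "4.x" refinement touches.
adapter = nn.Linear(d_model, vocab)
opt = torch.optim.Adam(adapter.parameters(), lr=1e-4)

tokens = torch.randint(0, vocab, (8, 16))          # toy batch of token ids
logits = adapter(core(tokens))                     # gradients flow only into the adapter
loss = nn.functional.cross_entropy(logits.view(-1, vocab), tokens.view(-1))
loss.backward()
opt.step()

# Changing d_model, the depth of `core`, or the tokenizer means retraining from scratch,
# which is the kind of change that would plausibly earn the "GPT-5" label.
```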
Based on published research from Google and Meta it is somewhat known how much more capability is possible with the current approach, but it would require an extreme increase in compute and training set to achieve it. There are diminishing returns, but the returns appear to continue for a good while, even without any new model architecture discoveries. Right now the expense will likely mean that progress will be limited to the pace of Moore’s law.
In terms of what this improvement would actually look like in terms of real world, emergent capabilities, no one knows.
No, but LLaMA's training setup was designed around studying the curve of output quality vs. training-data size, so we do have companies looking into this.
Just as in Pascal's wager, the conclusion relies on an unwarranted assumption which privileges a particular outcome over its exact opposite, e.g. a deity with exactly inverted criteria for heaven and hell, punishing those who believe in the Christian God, or "Roko's antibasilisk", which spares the people who would get punished by Roko's basilisk and punishes everyone else.
They need a few algorithmic improvements first, imho. GPT4 is noticeably slower than GPT3.5 and apparently costs a lot more to use, implying some serious compute costs.
They could train it with more data in the hopes of getting another big leap there, but what data is left? They've fed it everything it seems.
So what's left is getting the runtime reduced in terms of the model size. Hire some brilliant minds to turn an N-squared into an N-log-N (or something to that effect).
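Back-of-the-envelope arithmetic for why that matters (the numbers are made up for illustration; this is just the standard cost comparison between full self-attention and one of the many sub-quadratic schemes, here a local window):

```python
# Rough per-head FLOP comparison: full self-attention vs. a fixed local window.
n, d, w = 8_192, 128, 256            # sequence length, head dim, window size (illustrative)

full_attention = n * n * d           # O(n^2 * d): every token attends to every token
windowed       = n * w * d           # O(n * w * d): every token attends to a local window

print(f"full: {full_attention:,} FLOPs  windowed: {windowed:,} FLOPs  "
      f"ratio: {full_attention / windowed:.0f}x")   # 32x cheaper at this sequence length
```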
He has just admitted that O̶p̶e̶n̶AI.com has partially trained GPT-5 and is already planning to test the 'so-called' useless guardrails around it.
There is no 'revolution' around this. Just 'evolution' with more data and more excessive waste of compute to create another so-called AI black-box sophist with Sam Altman selling both the poison (GPT-4) and the antidote (Worldcoin).
At some point, with their tremendous lock-in strategy, O̶p̶e̶n̶AI.com and Microsoft will eventually use the lock-in to upsell and compete against their partners.
> Sam Altman selling both the poison (GPT-4) and the antidote (Worldcoin).
I actually find it pretty amazing that more people aren't given pause by Sam Altman's involvement here. After the WorldCoin stuff, I'd think that he'd be viewed with a much more skeptical eye in terms of his ethics.
The late-night March 31 release of Worldcoin had the (unintended?) side-effect of making me think "a token to prove my personhood" was an April Fools Joke when I saw it the next morning and never thought about it again.
From my understanding they are training their new GPT models off of a checkpoint from the previous generation, so they technically have partially trained multiple future models in their GPT lineage.
> Left unsaid in this piece is that OpenAI likely would have to increase parameters
Maybe true, but he also said "We are not here to jerk ourselves off about parameter count"
https://techcrunch.com/2023/04/14/sam-altman-size-of-llms-wo...
I'm not an expert but isn't size the distinguishing feature of an LLM? It's the first L.
> They overturned previous scaling laws
Can you link to a comparison or graph of obsolete and new scaling laws?
Curious if anyone can confirm $5-20M figure. Seems absurdly high but what do I know
That money won't help unless they get permission to start their own research department.
Yannic Kilcher makes a similar supposition based on results from the tech report https://www.youtube.com/watch?v=2zW33LfffPc&pp=ygUOeWFubmljI... . It’s about 3/4 of the way through the video if memory serves.
Is there any source for this, aside from it being oft repeated by internet speculators? Ilya has said the textual data situation is still quite good
They literally said this is not the case
They could clean up the training data I bet. That would be where I'd focus next.
Is there any indication from OpenAI people that there are low hanging fruits to be picked in this direction?
In my machine-learning experience, if it only takes 10x the parameters to bring a significant improvement, I feel lucky.
Vicuna offers considerable improvement over LLaMA and it's just 13B delta to 65B model.
You can just send a single space character to get the AI to continue its previous output.
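If you're hitting this over the API rather than the web UI, the same trick looks roughly like the sketch below, using the 2023-era `openai` Python client (the model name and key are placeholders, and whether a single space is the best nudge is anecdotal):

```python
import openai

openai.api_key = "sk-..."  # placeholder

history = [{"role": "user", "content": "Write a long Python script that ..."}]
resp = openai.ChatCompletion.create(model="gpt-4", messages=history)
part = resp["choices"][0]["message"]["content"]

# If the reply was cut off, append it as the assistant turn and send a near-empty
# user turn so the model continues where it stopped instead of starting over.
history += [
    {"role": "assistant", "content": part},
    {"role": "user", "content": " "},
]
resp = openai.ChatCompletion.create(model="gpt-4", messages=history)
full_output = part + resp["choices"][0]["message"]["content"]
print(full_output)
```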
The first true AGI will likely foom immediately.
I thought the same thing. They've not disclosed anything else, so why would they be even slightly honest about this?
The only "game theory" here is trying to convince people your software is good and important, so you can raise money and sell products.
AGI that can engage in cyberwarfare, propaganda campaigns, and social engineering can achieve some military goals nevertheless.
I don't understand. "Synthetic intelligence" is just a synonym for "artificial intelligence". The term has all the same issues, does it not?
If we can't define intelligence (and we haven't), how could we possibly define artificial or synthetic intelligence?
Here’s an essay on why we should start saying “synthetic intelligence” in certain contexts:
https://taylor.town/synthetic-intelligence
"Collective intelligence" since all it's really doing is regurgitating what people have collectively posted online
What about "simulated intelligence"?
Now I can’t help but imagine the raw GPT-4 is just some huge raging asshole and it just has a bunch of “handlers”.
> the raw GPT-4 is just some huge raging asshole
That's pretty much exactly how one of the OpenAI Red Teamers Nathan Labenz describes the raw GPT-4, starting around 45 minutes into the video:
https://news.ycombinator.com/item?id=35377741
4 replies →
I have been writing prompts for a GPT-based document 'digester' for business-internal people who can't code but do have the right background knowledge. Every day I have to expand the prompt because I found a new spot where I have to hold the thing's hands so it does the right thing :)
I feel like the GPT # has already suffered the same fate as nanometers in semiconductor manufacturing.
As manifested in ChatGPT, it is obvious that what presents as one magical solution is in fact an elaborate combination of components at varying degrees of innovation.
In my view, the reasoning for not releasing GPT4 information (hyperparameters, etc) had nothing to do with AI safety. It was a deliberate marketing decision to obscure how the sausage is actually made.
> In my view, the reasoning for not releasing GPT4 information (hyperparameters, etc) had nothing to do with AI safety. It was a deliberate marketing decision
In their technical report they give both reasons:
"Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar."
What a joke. Talking about the size has no safety implications.
2 replies →
They have scaling issues even with 3, and many more with 4; they need time to squeeze more $$ out of these models. 5 will come when they sense competition; by then they'll have all the data and training methods ready to turn the key and meet it.
I just read: https://graymirror.substack.com/p/gpt-4-invalidates-the-turi.... It makes the point that LLMs are not AIs, so we will need a different approach if we want true AGI, not just incremental improvement on the current LLM approach.
There is a bit of a political history between the symbolists and connectionists that complicates that: basically, the symbolist camp is looking for universal quantifiers while the connectionists were researching existential or statistical quantifiers.
The connectionists left the 'AI' folks and established the ML field in the 90s.
Sometimes those political rifts arise in discussions about what is possible.
Thinking about ML under the PAC-learning lens will show you why AGI isn't possible through ML alone.
But the symbolist direction is also blocked by fundamental limits of math and CS, Gödel's work being one example.
LLMs are AI if your definition is closer to the general understanding of the word, but you have to agree on a definition to reach agreement between two parties.
The belief that AGI is close is speculative and there are many problems, some which are firmly thought to be unsolvable with current computers.
AGI is pseudo-science today without massive advances. But unfortunately as there isn't a consensus on what intelligence is those discussions are difficult also.
Overloaded terms make it very difficult to have discussions on what is possible.
Your link claims that:
'GPT-4 is not “AI” because AI means “AGI”'
which is a stricter definition than has typically been applied to AI, and an example of my claim above.
Since we lack general definitions it isn't invalid, but under their claims no AI is thought to be possible.
In my experience, "computer systems that perform work, within a restricted domain, that typically requires humans" is closer to the definition most researchers would use.
1 reply →
> Thinking of ML under the PAC learning lens will show you why AGI isn't possible through just ML
Why? PAC looks a lot like how humans think
> But the Symbolists direction is also blocked by fundamental limits of math and CS with Gödel's work being one example.
Why? Gödel's incompleteness applies equally well to humans as to machines. It's an extremely technical statement about self-reference within an axiom system, pointing out that it's possible to construct paradoxical sentences. That has nothing to do with general theorem proving about the world.
1 reply →
Semantics are nice, but it doesn't matter what name you give to technology that shatters economies and transforms the nature of human creative endeavours.
An AI's ability to contemplate life while sitting under a tree is secondary to the impact it has on society.
3 replies →
>> The connectivists left the 'AI' folks and established the ML field in the 90s.
The way I know the story is that modern machine learning started as an effort to overcome the "knowledge acquisition bottleneck" in expert systems, in the '80s. The "knowledge acquisition bottleneck" was simply the fact that it is very difficult to encode the knowledge of experts in a set of production rules for an expert system's knowledge-base.
So people started looking for ways to acquire knowledge automatically. Since the use case was to automatically create a rule-base for an expert system, the models they built were symbolic models, at least at first. For example, if you read the machine learning literature from that era (again, we're at the late '80s and early '90s) you'll find it dominated by the work of Ryszard Michalski [1], which was all entirely symbolic as far as I can tell. Staple representations used in machine learning models of the era included decision lists and decision trees, and that's where decision tree learners like ID3, C4.5, Random Forests, Gradient Boosted Trees, and so on, come from; which btw are all symbolic models (they are and-or trees, propositional logic formulae).
A standard textbook from that era of machine learning is Tom Mitchell's "Machine Learning" [2] where you can find entire chapters about rule learning, decision tree learning, and other symbolic machine learning subjects, as well as one on neural network learning.
I don't think connectionists ever left, as you say, the "AI" folks. I don't know the history of connectionism as well as that of symbolic machine learning (which I've studied) but from what I understand, connectionist approaches found early application in the field of Pattern Recognition, where the subject of study was primarily machine vision.
In any case, the idea that the connectionists and the symbolists are diametrically opposed camps within AI research is a bit of a myth. Many of the luminaries of AI would have found it odd; for example, Claude Shannon [3] invented both logic gates and information theory, whereas the original artificial neuron, the McCulloch and Pitts neuron, was a propositional logic circuit that computed boolean functions. And you wouldn't believe it, but Jurgen Schmidhuber's doctoral thesis was a genetic algorithm implemented in ... Prolog [4].
It seems that in recent years people have found it easier to argue that symbolic and connectionist approaches are antithetical and somehow inimical to each other, but I think that's more of an excuse to not have to learn at least a bit about both; which is hard work, no doubt.
______________
[1] https://en.wikipedia.org/wiki/Ryszard_S._Michalski
[2] It's available as a free download from Tom Mitchell's website:
http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/mlboo...
[3] Shannon was one of the organisers of the Dartmouth Conference where the term "Artificial Intelligence" was coined, alongside John McCarthy and Marvin Minsky.
[4] https://people.idsia.ch/~juergen/genetic-programming-1987.ht...
One comment on the article from what I’ve read so far. The article states that GPT bombed an economics test, but after trying out the first two questions on the test, I think that the test itself is poorly constructed.
The second question in particular hinges on an unstated implicit assumption that only 100 people stand in line each day.
I face this issue in my CS masters program constantly, and would probably have failed this test much the same as GPT did.
That Substack article misunderstands Turing's paper anyway. Cars aren't even mentioned; chess is briefly mentioned at the end. I wouldn't base any opinions on it.
Turing's test was not "this computer fooled me over text, therefore it's an AI." It's a philosophical move: we want to consider a machine that thinks, but we can't really define what thinking is, so it's more useful to observe whether a machine is indistinguishable from a thinker. He then goes on to consider counterpoints to the question "Can a machine think?", which is funny, because some of those counterpoints are similar to the ones in the author's article.
The author offers no definition of "think" or "invent" or other words; it's paragraph after paragraph of claiming cognitive superiority. Turing's test isn't broken, it's just a baseline for discussion. And comparing it to SHA-1 is foolish. The author would have done better with a writeup of the Chinese room argument.
At what point did the human test maker fail or the AI?
The absurdity in all these debates is how quickly people move the goalposts around between "artificial human intelligence" and "artificial omniscience (Singularity)" when trying to downplay the potential of AI.
Deep learning, which underpins it all, is machine learning at the end of the day. The same way we're calling it "AI", this is just more branding BS.
In before the comments about Auto-GPT.
Wow, that blog led me down a rabbit hole. I wonder why Yarvin didn't comment on the societal and political impact of LLMs. Sam Altman seemed to be supportive of democratic socialism and central planning on Lex Fridman's podcast.
Let's just say he's extremely controversial: https://www.vox.com/platform/amp/policy-and-politics/2337379...
1 reply →
The AI maximalists think we're on an exponential curve to the singularity and potential AI disaster, or even eternal dominance by an AI dictator [Musk].
Realistically though, the road to AGI and beyond is like the expansion of the human race to The Moon, Mars and beyond, slow, laborious, capital and resource intensive with vast amounts of discoveries that still need to be made.
Without having an understanding of the architecture required for general intelligence, it is impossible to make claims like this. Nobody has this understanding. Literally nobody.
The human brain uses on the order of 10 watts of power and there are almost 8 billion examples of this. So we have hard proof that from a thermodynamic perspective general intelligence is utterly and completely mundane.
We almost certainly already have the computational power required for AGI, but have no idea what a complete working architecture looks like. Figuring that out might take decades, or we might get there significantly quicker. The timespan is simply not knowable ahead of time.
I'm not concerned in the slightest about "the singularity" and non-aligned superintelligences. AGI in the hands of malicious human actors is already a nightmare scenario.
I found out today I don't exactly have Covid brain fog. Covid has triggered my bipolar disorder, so I have flu-like symptoms and hypomania, a combo I've never experienced before so I'm not used to it. It's a bit wild.
https://www.google.com/search?q=bipolar+covid
2 replies →
Take a look at Auto-GPT; it doesn't seem like AGI is far off. I'd say AGI in a weak form is already here; it just needs to strengthen.
Tracking problematic actions back to the person who owns the AGI will likely not be a difficult task, and the owner of an AGI would be held responsible for its actions. The worry is that these actions would happen very quickly. This too can be managed by safety systems, although they may need to be developed more fully in the near future.
Human brains are not built from digital circuits - perhaps they have far more compute than we think.
You asserted an analogy but didn't spell out the connection between the two concepts.
Sorry I have Covid with brain fog right now so maybe you could help me out
Edit: Off the top of my foggy head: LLMs as I understand them are text-completion predictors based on statistical probabilities, trained on vast amounts of examples of what humans have previously written, whose output is styled with neuro-linguistic programming, also based on vast numbers of styles of human writing. This is my casual amateur understanding. There is no logical, reasoning programming such as the Lisp programmers attempted in the 1980s, and clearly the logical abilities of the current LLMs fall short; they are not AGI for that reason. So how do we add logic abilities to make LLMs AGI? Should we revisit the approaches of the Lisp machines of the 1980s? This requires much research and discovery.

Then there's the question of just what general intelligence is. I've always thought that emotional intelligence played a huge role in high intelligence: a balance between logic and emotion, or Wise Mind, is wisdom. Obviously we won't be building emotions into silicon machines, or will we? Is anyone proposing this? This could take hundreds of years to accomplish, if it is even possible. We could simulate emotion, but that's not the same; that's logic.

Logical intelligence and emotional capability, I think, are prerequisites for consciousness and spirituality. If the Universe is conscious, and it arises in a focused manner in brains that are capable of it, then how do we build a machine capable of having consciousness arise in it? That's all I'm saying.
https://en.wikipedia.org/wiki/Dialectical_behavior_therapy
OpenAI may not be training GPT-5, but Sam didn't say anything about GPT-4.5.
In fact Greg Brockman explicitly said they are considering changing the release schedule in a way that could be interpreted as opening the door for a different versioning scheme.
And actually there is no law or anything that says any particular change, improvement, or even new training run necessitates calling it version 5. It's not like there is a Version Release Police that evaluates all the version numbers and puts people in jail if they don't adhere to some specific, consistent scheme.
> In fact Greg Brockman explicitly said
source?
1 reply →
Maybe they’ll pull an MS and go straight to GPT X
The famous Microsoft numbering system! I think we should all skip GPT-Vista, but I can't wait for GPT-7.
As long as it is not GPT-ME
GPT-360
1 reply →
That’s almost as funny as Elon asking everyone to please slow down so X Corp can catch up.
Of *course* they’re still training chat-gpt5
Translation: training GPT-5 will cost time and money, so we’re going to cash in on the commercialization of GPT-4 now while it’s hot. A bird in hand is worth two in the bush.
"Don’t worry, guys. It’s just GPT4.998, it’s not GPT5, it’s not dangerous."
Why not GPT-6 instead? There is far too much hype surrounding this.
Right, “they” are leaving it to GPT4 to train 5. Smart move :-)
I’m waiting for GPT-4’s image API. From what I understand it’s not just an “image2text” descriptor that then “reasons” on this description, right?
It’s just grokking an image directly. Were the pixels tokenized somehow? I’m very curious what that does to a model like this.
Can somebody that actually knows anything clue me in?
It's possible to take a text-only model and ground it with images. Examples are:
BLIP-2 (https://github.com/salesforce/LAVIS/tree/main/projects/blip2)
FROMAGe (https://github.com/kohjingyu/fromage)
Prismer (https://github.com/NVlabs/prismer)
PaLM-E (https://ai.googleblog.com/2023/03/palm-e-embodied-multimodal...)
Now, assuming GPT-4's vision isn't just some variant of MM-ReAct (i.e. what you're describing: https://github.com/microsoft/MM-REACT), that's what's happening here.
Images can be tokenized, so what usually happens is that extra parameters are added to a frozen model and those parameters are trained on an image-embedding-to-text-embedding task. The details vary, of course, but that's a fairly general overview of what happens.
The image-to-text task those models get trained on has its issues: it's lossy and not very robust. GPT-4, on the other hand, looked incredibly robust, so they may not be doing that. I don't know.
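A minimal sketch of the frozen-model-plus-adapter recipe described above (PyTorch; the module names, dimensions, and the HF-style `inputs_embeds` call are illustrative assumptions, not the actual code of BLIP-2, FROMAGe, or GPT-4):

```python
import torch
import torch.nn as nn

class VisionToTokenAdapter(nn.Module):
    """Maps a pooled image feature vector to a short sequence of 'soft tokens'."""
    def __init__(self, vision_dim=1024, lm_dim=4096, n_visual_tokens=32):
        super().__init__()
        self.n = n_visual_tokens
        self.lm_dim = lm_dim
        # The only trainable piece: projects image features into the LM's
        # token-embedding space.
        self.proj = nn.Linear(vision_dim, lm_dim * n_visual_tokens)

    def forward(self, image_features):                # (batch, vision_dim)
        x = self.proj(image_features)                 # (batch, lm_dim * n)
        return x.view(-1, self.n, self.lm_dim)        # (batch, n, lm_dim)

def build_trainable_params(language_model: nn.Module, adapter: nn.Module):
    # The pretrained language model stays frozen; only the adapter learns.
    for p in language_model.parameters():
        p.requires_grad_(False)
    return list(adapter.parameters())

# Usage sketch: prepend the projected visual tokens to the text embeddings and
# train with the usual next-token loss on image-caption pairs.
# inputs = torch.cat([adapter(img_feats), text_token_embeddings], dim=1)
# logits = language_model(inputs_embeds=inputs)   # assumed HF-style interface
```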
Very interesting, thanks.
1 reply →
GPT-4's architecture is a trade secret, but vision transformers tokenize patches of images. Something like 8x8 or 32x32 pixel patches, rather than individual pixels.
Multi-modal text-image transformers add these tokens right beside the text tokens. So there is both transfer learning and similarity graphed between text and image tokens. As far as the model knows they're all just tokens; it can't tell the difference between the two.
For the model, the tokens for the words blue/azure/teal and all the tokens for image patches containing blue are just tokens with a lot of similarity. It doesn't know if the token it's being fed is text, image, or even audio or other sensory data. All tokens are just a number with associated weights to a transformer, regardless of what they represent to us.
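For illustration, here is roughly what patch tokenization looks like in code (PyTorch; the 32-pixel patch size and embedding width are assumptions chosen for the example, not GPT-4's actual values):

```python
import torch
import torch.nn as nn

class PatchTokenizer(nn.Module):
    """Cuts an image into fixed-size patches and projects each into the embedding space."""
    def __init__(self, patch_size=32, channels=3, embed_dim=4096):
        super().__init__()
        # A strided convolution yields exactly one embedding per patch.
        self.proj = nn.Conv2d(channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, images):                    # (batch, 3, H, W)
        x = self.proj(images)                     # (batch, embed_dim, H/32, W/32)
        return x.flatten(2).transpose(1, 2)       # (batch, num_patches, embed_dim)

# After this, image-patch tokens and text tokens are rows of the same width and
# can be concatenated into one sequence for the transformer:
# sequence = torch.cat([text_embeddings, PatchTokenizer()(images)], dim=1)
```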
The GPT-4 vision API is actually in production and in at least two public products already. https://www.bemyeyes.com/ and https://www.microsoft.com/en-us/ai/seeing-ai
I’d be surprised if that doesn’t do something qualitatively with the model. Very cool, curious to see what’s possible. Thanks.
I've had this thought that the next generation of AI won't have one long "training" period; rather, it probably makes sense to train a barebones version and give it a "sleep cycle." During that time it could take the context (think of it as short-term memory) and fine-tune the parent model with it, turning the important stuff into long-term "memories," with a pruning mechanism for rarely used stuff to keep the important stuff a priority. That would turn AIs into individuals with specialized knowledge, but maybe that's even more useful? I don't need an AI with expertise in law; I just want to use it to automate this specialized business process I have that isn't easily automated.
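A toy sketch of that consolidation loop (purely hypothetical; every class and method name here is invented to illustrate the idea, and the `summarize` callback stands in for whatever fine-tuning/distillation step a real system would run during the "sleep" phase):

```python
from collections import defaultdict
from typing import Callable, List, Tuple

class SleepCycleMemory:
    """Accumulate short-term context while 'awake'; consolidate and prune while 'asleep'."""

    def __init__(self, prune_threshold: int = 1):
        self.short_term: List[str] = []      # this session's context window
        self.long_term: dict = {}            # consolidated "memories"
        self.usage = defaultdict(int)        # how often each memory was recalled
        self.prune_threshold = prune_threshold

    def observe(self, item: str) -> None:
        self.short_term.append(item)

    def recall(self, key: str):
        if key in self.long_term:
            self.usage[key] += 1
            return self.long_term[key]
        return None

    def sleep(self, summarize: Callable[[List[str]], Tuple[str, str]]) -> None:
        # Prune memories that were rarely recalled since the last sleep.
        for key in list(self.long_term):
            if self.usage[key] < self.prune_threshold:
                del self.long_term[key]
        self.usage.clear()
        # Consolidate the session into long-term memory; in a real system this is
        # where the parent model would be fine-tuned on the distilled context.
        if self.short_term:
            key, value = summarize(self.short_term)
            self.long_term[key] = value
            self.short_term = []
```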
I hope they have a plan to release DALL-E 3 sometime this year, MidJourney seems to be pulling ahead.
Midjourney is already far ahead. I can’t get nearly the same quality with DALLE2
I think DALL-E 3 basically exists in Bing. It is significantly better than DALL-E 2 and is close to, but not quite at, the quality of Midjourney v5. I just generated a series of about 40 portraits and even the hands are significantly closer.
Open source is still so far ahead of midjourney it's not even funny. Like, a racist RimWorld mod author (Automatic1111) built a UI for stable diffusion which unlocks far more capabilities out of it than midjourney will ever have.
I'm not sure if you are trolling or not, but if you aren't then you haven't seen Midjourney v5. But I wouldn't blame you because your information is only like one month out of date which is short in normal timespans but so long in AI timespans.
4 replies →
I feel like Bing Image Creator is at least DALL-E 2.5, it feels like it has higher quality outputs for the same prompt. Could also just be some form of post-processing, though.
I suspect it may be a problem of training-data exhaustion. Do they have enough source material that is safe/vetted for the next jump in training? I can imagine that model poisoning is a real thing now...
"Some time" tomorrow? Next week? Next month? This doesn't mean anything, its like saying "we don't have any plans to change anything" when a company acquires another. Its all BS
Diminishing returns are currently seen in "transformer scaling laws."
"OpenAI’s CEO says the age of giant AI models is already over": https://news.ycombinator.com/item?id=35603756 (shared 3 dags after this post)
Let's see what the future holds.
To me, GPT-4 is like a superhuman “system 1” thought model (as defined by Daniel Kahneman).
So maybe they’re working on a “system 2” now, which is perhaps more related to what DeepMind is doing?
With all the ChatGPT boom and the startups built on top of their API, OpenAI receives troves of data to train and fine-tune their models on. For reference, GPT-3 was trained on a few hundred gigabytes of filtered text (roughly 300 billion tokens), and GPT-4's dataset is undisclosed but presumably larger. On the other side, Alpaca and Vicuna were fine-tuned from LLaMA using only megabytes, if not hundreds of kilobytes, of training data. I believe that is a much more feasible path to significantly improving the current generation of LLMs.
There have got to be predictable ways of improving LLMs besides training-data scale and parameter count. Aren't LLMs robust enough to learn on their own by interacting with the world? Like putting them in a turn-based simulated environment.
I wonder if there's an assumption for how big an LLM should be before it could even conceivably be an LLM. Is there a minimum size necessary before that capability is plausible?
>Arent LLMs robust enough to learn on their own via interacting with the world?
As far as I know current LLMs are entirely static once trained, they don't learn at all in runtime.
Without RLHF, even a high-parameter model performs very poorly. LLaMA-65B often hallucinated when I gave it the most basic of prompts.
It is OK to slow down development, take some more profit, maybe keep doing the human in the loop RL refinement, etc.
From an engineering standpoint, even the less powerful GPT-3.5-turbo model handles NLP tasks well, and really nice tools like LangChain and LlamaIndex, which I covered in my last book, make it easy to use your own data sources.
I think the possibilities of using what we currently have in useful projects are vast.
Is there some kind of convention that defines what specifically constitutes GPT-n, or does this just mean "we're not working on the successor to GPT-4 yet"?
There may be conventions, but in no way can anyone force them to follow them; it's just a name for a release. They absolutely are working on successor models and have stated they plan to release a model by June. Whether they are working on a new architecture or a new training run, they certainly have experiments going, but who knows how serious they are.
Regardless they can and will call future models anything they want. They could easily just decide that the minor improvements that come out in a few months are called GPT-4.2 and the major new training run is called GPT-4.5 instead of GPT-5.
No, it is just an arbitrary version number for this series of models from OpenAI. They will flip to 5 when they make an architecture change that will force them to begin training from scratch. Until then they will continue to produce more refined versions of 4, potentially more general training or fine-tuned task-oriented training.
The way it currently works, there is a quite clear boundary, as all the smaller iterations are based on something of a fixed size that was expensively pretrained, and then have either finetuned weights or some extra layers on top, but the core model structure and size can't be changed without starting from scratch.
So if some particular GPT-4 improved successor is based on the GPT-4 core transformer size and pretrained parameters then we'd call it GPT-4.x, but if some other GPT-4 successor is a larger core model (which inevitably also means it's re-trained from scratch) then we'd call it GPT-5, no matter if its observable performance is better or worse or comparable to the tweaked GPT-4.x options.
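A minimal sketch of that distinction (PyTorch; the module shapes and the idea of bolting a new head onto a frozen core are illustrative assumptions, not how OpenAI actually versions its models):

```python
import torch.nn as nn

def make_point_release(pretrained_core: nn.Module, hidden_dim: int, n_tasks: int):
    # "GPT-4.x" style: keep the expensive pretrained core frozen and train only
    # a small amount of new weights on top of it.
    for p in pretrained_core.parameters():
        p.requires_grad_(False)
    new_head = nn.Linear(hidden_dim, n_tasks)          # the only trainable part
    return pretrained_core, new_head

def make_major_release(hidden_dim: int, n_layers: int):
    # "GPT-5" style: a larger core can't inherit the old weights' shapes, so the
    # whole stack starts from random initialization and must be pretrained again.
    return nn.Sequential(*[
        nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=16, batch_first=True)
        for _ in range(n_layers)
    ])
```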
Is it known how much of an improvement GPT-5 would bring? Can we predict how much better it would be compared to GPT-4?
Based on published research from Google and Meta it is somewhat known how much more capability is possible with the current approach, but it would require an extreme increase in compute and training set to achieve it. There are diminishing returns, but the returns appear to continue for a good while, even without any new model architecture discoveries. Right now the expense will likely mean that progress will be limited to the pace of Moore’s law.
In terms of what this improvement would actually look like in terms of real world, emergent capabilities, no one knows.
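For a concrete sense of those diminishing returns, here is the loss formula fitted in DeepMind's Chinchilla paper (Hoffmann et al., 2022); the coefficients below are that paper's reported fit, and applying them to any other model family is an assumption:

```python
def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    # Fitted constants from Hoffmann et al. (2022): L = E + A/N^alpha + B/D^beta
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

# Diminishing returns: 10x the parameters and 10x the tokens only moves the
# estimated loss modestly.
print(chinchilla_loss(70e9, 1.4e12))    # ~1.94 (Chinchilla-scale model)
print(chinchilla_loss(700e9, 14e12))    # ~1.81 (10x parameters and data)
```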
No, but LLaMA's training was designed around studying the curve of output quality vs. training-data size, so we do have companies looking into this.
It's the sensible economic thing to do when you basically have a monopoly position.
Maybe it's just semantics and they are training GPT-5.1? Or a multi-modal UniversalPT-1?
Whew, looks like we've pushed back the birth of Skynet by at least 6 months!
They don’t need to do GPT-5 yet because GPT-4 is monetizable.
It’s funny how we’re all watching the GPT releases and improvements hoping the next one doesn’t trash our jobs any further than the last ha
“Will I still be able to feed my family, Sam? Ilya?”
The old capitalist treadmill is now on full speed and we’re all just trying to keep up.
Where it goes…nobody knows. What an entertaining drama.
ooh someone rewrote the title i put, thanks bro
This seems like an extremely dangerous thing for Altman to do.
What about Roko's basilisk?
Roko's basilisk is just Pascal's wager for the geeks.
Just as in Pascal's wager, the conclusion relies on an unwarranted assumption that privileges a particular outcome over its exact opposite: e.g. a deity with exactly inverted criteria for heaven and hell, punishing those who believe in the Christian God, or a "Roko's antibasilisk" that spares the people who'd be punished by Roko's basilisk and punishes everyone else.
Hell, what about Skynet?
and what about my shredder?
1 reply →
They need a few algorithmic improvements first, imho. GPT4 is noticeably slower than GPT3.5 and apparently costs a lot more to use, implying some serious compute costs.
They could train it with more data in the hopes of getting another big leap there, but what data is left? They've fed it everything it seems.
So what's left is getting the runtime reduced in terms of the model size. Hire some brilliant minds to turn an N-squared into an N-log-N (or something to that effect).
Maybe GPT4 has some ideas.
He has just admitted that O̶p̶e̶n̶AI.com has partially trained GPT-5 and is already planning to test the 'so-called' useless guardrails around it.
There is no 'revolution' around this. Just 'evolution' with more data and more excessive waste of compute to create another so-called AI black-box sophist with Sam Altman selling both the poison (GPT-4) and the antidote (Worldcoin).
At some point, with their tremendous lock-in strategy, O̶p̶e̶n̶AI.com and Microsoft will eventually use the lock-in to upsell and compete against their partners.
> Sam Altman selling both the poison (GPT-4) and the antidote (Worldcoin).
I actually find it pretty amazing that more people aren't given pause by Sam Altman's involvement here. After the WorldCoin stuff, I'd think that he'd be viewed with a much more skeptical eye in terms of his ethics.
The late-night March 31 release of Worldcoin had the (unintended?) side-effect of making me think "a token to prove my personhood" was an April Fools Joke when I saw it the next morning and never thought about it again.
1 reply →
> has partially trained GPT-5
From my understanding they are training their new GPT models off of a checkpoint from the previous generation, so they technically have partially trained multiple future models in their GPT lineage.
Could you detail how Worldcoin is an antidote to GPT-4 ?
Worldcoin price skyrockets => cryptocurrency speculation returns => GPU shortage prevents further training of LLMs
3 replies →
Microsoft would never do that...