Comment by janalsncm
5 days ago
The term “model” is one of those super overloaded terms. Depending on the conversation it can mean:
- a product (most accurate here imo)
- a specific set of weights in a neural net
- a general architecture or family of architectures (BERT models)
So while you could argue this is a “model” in the broadest sense of the term, it’s probably more descriptive to call it a product. Similarly, we call LLMs “language” models even though they can do a lot more than that, such as drawing images.
I'm pretty sure only the second is properly called a model, and "BERT models" are simply models with the BERT architecture.
If someone says something is a BERT “model” I’m not going to assume they are serving the original BERT weights (definition 2).
I probably won’t even assume it’s the OG BERT. It could be ModernBERT or RoBERTa or one of any number of other variants, and simply saying it’s a BERT model is usually the right level of detail for the conversation.
It depends on the time. Five years ago it was quite well defined as the last one, maybe the second one in some contexts. Especially when the distinction was important, it was always the last one. In our case it was. We trained models to obtain weights. We even stored models and weights separately, because models change more slowly than weights. You could choose a model and a set of weights and run them. You could change the weights at any time.
Then marketing, and huge amount of capital came.
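The model/weights separation described in the comment above can be sketched in plain Python. This is an illustrative toy, not any particular framework's API; the class and method names (`LinearModel`, `load_weights`) are hypothetical. The point is that the "model" is the fixed architecture, while the weights live in a separate, swappable container:

```python
# Minimal sketch of the model/weights separation described above.
# Names (LinearModel, load_weights) are illustrative, not a real framework API.

class LinearModel:
    """The 'model': a fixed architecture, y = w * x + b."""

    def __init__(self):
        self.w = 0.0  # parameters start uninitialized
        self.b = 0.0

    def load_weights(self, weights):
        # Weights are stored separately and can be swapped at any time.
        self.w = weights["w"]
        self.b = weights["b"]

    def __call__(self, x):
        return self.w * x + self.b


# One model, two interchangeable sets of weights (e.g. two checkpoints):
checkpoint_a = {"w": 2.0, "b": 1.0}
checkpoint_b = {"w": -1.0, "b": 0.0}

model = LinearModel()
model.load_weights(checkpoint_a)
print(model(3.0))  # 7.0

model.load_weights(checkpoint_b)  # same model, new weights
print(model(3.0))  # -3.0
```

In real frameworks the same split shows up as, e.g., a network definition versus a checkpoint file loaded into it, which is why "model" can plausibly refer to either half.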
It seems unlikely "model" was ever equivalent in meaning to "architecture". Otherwise there would be just one "CNN model" or just one "transformer model", insofar as there is a single architecture involved.