Comment by cubefox

7 days ago

I'm pretty sure only the second is properly called a model, and "BERT models" are simply models with the BERT architecture.

If someone says something is a BERT “model” I’m not going to assume they are serving the original BERT weights (definition 2).

I probably won’t even assume it’s the OG BERT. It could be ModernBERT or RoBERTa or one of any number of other variants, and simply saying it’s a BERT model is usually the right level of detail for the conversation.

It depends on the time. Five years ago it was quite well defined that it meant the last one, or maybe the second one in some contexts. Especially when the distinction was important, it was always the last one. In our case it was. We trained models to produce weights. We even stored models and weights separately, because models change more slowly than weights. You could choose a model and a set of weights and run them together. You could swap the weights at any time.

Then marketing, and a huge amount of capital, came along.
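As a minimal sketch of the old workflow described above, assuming a PyTorch-style setup (the class name `TinyClassifier` and the checkpoint filename are hypothetical): the model is code that changes slowly, the weights are a separate artifact, and any compatible set of weights can be loaded into the same model.

```python
import torch
import torch.nn as nn

# The "model": architecture, hyperparameters, activation choice.
# This changes slowly; it lives in version control.
class TinyClassifier(nn.Module):
    def __init__(self, in_dim: int = 784, hidden: int = 128, classes: int = 10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = TinyClassifier()

# The "weights": a separate artifact that changes with every training run.
torch.save(model.state_dict(), "weights_run_042.pt")  # hypothetical filename

# Later: pick a model and a set of weights, and run them together.
# Swapping in a different checkpoint does not touch the model code.
model.load_state_dict(torch.load("weights_run_042.pt"))
model.eval()
```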

  • It seems unlikely that "model" was ever equivalent in meaning to "architecture". Otherwise there would be just one "CNN model" or just one "transformer model", insofar as a single architecture is involved.

    • First, hyperparameters. Second, the organization, or connections. Third, the cost function. Fourth, the activation function. Fifth, the type of learning. And so on.

      These are not weights. They were parts of the model.
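A plain-Python sketch of that distinction (every name here is made up for illustration): a model specification can bundle the hyperparameters, connectivity, cost function, activation function, and learning type listed above, while the weights remain a separate artifact produced by training under that spec.

```python
from dataclasses import dataclass, field

# Hypothetical model specification: the parts of a "model" that are
# not learned weights.
@dataclass
class ModelSpec:
    hyperparameters: dict = field(
        default_factory=lambda: {"layers": 4, "hidden_dim": 256}
    )
    connections: str = "fully_connected"   # organization / topology
    cost_function: str = "cross_entropy"
    activation: str = "relu"
    learning_type: str = "supervised"

spec = ModelSpec()

# The weights are a separate artifact, e.g. filled in by a training
# run configured by the spec above.
weights = None
```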