Comment by simonw

1 year ago

Key quote (from Character.AI):

“Over the past two years, however, the landscape has shifted; many more pre-trained models are now available. Given these changes, we see an advantage in making greater use of third-party LLMs alongside our own. This allows us to devote even more resources to post-training and creating new product experiences for our growing user base.”

My interpretation is that Character.AI realized they don't actually need to train their own foundation models from scratch to support their product - they can build cheaper, faster and probably better if they use LLMs trained by other companies (could be GPT-4o/Claude/Gemini via APIs, could be Llama 3.1 self-hosted).

If they're not training foundation models any more, the talents of people like Noam Shazeer aren't so important to them. They need to focus on product development instead.

I think this highlights the winner-take-all stakes of frontier AI. It also suggests there is little to be gained from specializing at the model layer. Building a brand might actually be more profitable in the short term, since you can swap in the latest models as they become available. In other words, if advancing SOTA AI is your dream, a product company may not be the right place. And if building a product company is your dream, then building foundational AI might not be the best strategy.

  • > if advancing the SOTA AI is your dream, a product company may not be the right place.

    Does Meta get in the way of this?

    It's hard to compete with a company that is dead set on spending billions and seemingly wants to drive your SOTA AI product revenue to 0.

    If you are OpenAI or Anthropic right now, it seems like trying to run a great restaurant at a reasonable price right next to a good (great?) restaurant that is serving everyone for free.

    • My take is that this has more to do with the coming years than the current climate.

      I think it is just a consequence of the cost of getting to the next level of AI. Estimates for training a GPT-5-level foundational model are on the order of $1 billion, and it isn't going to get cheaper from there. So even if your model is a bit better than the free models available today, unless you are spending that $1 billion+ now, you are going to look weak in 6 months to a year. And by then the GPT-6+ training costs will be even higher, so you can't just wait and play catch-up. You are probably right as well that there is a fear a competitor based on an open-source model gets close enough in capability to generate bad publicity.

      I imagine character.ai (like inflection) did calculations and realized that there was no clear path to recoup that magnitude of investment based on their current product lines. And when they brainstormed ways to increase return they found that none of the paths strictly required a proprietary foundational model. Just my speculation, of course.


  • Specializing will happen in product implementation, not model implementation.

    LLMs are becoming akin to tools, like programming languages. They’re blank slates, but require implementation to become special.

I'd argue that their own foundational models are getting outperformed by the Llama finetunes on HF and at this point they're shifting cost structures (getting rid of training clusters in favor of hosted inference).

Strange take, since Noam was CEO. He didn't get kicked out; he left. Character.AI and its remaining employees are going to have a tough time surviving now that Google has hollowed it out.

> If they're not training foundation models any more, the talents of people like Noam Shazeer aren't so important to them.

Why is the CEO important to model development regardless of talents? They've raised $150m+, have $15m+ ARR and ~200 employees, etc. Shouldn't the CEO be CEOing?

Edit: reading the comments below, it seems like maybe he decided the expected value of trying to clear the hurdle of their valuation/liquidation preferences on a $250k/year CEO salary was lower than a $5m+/year salary plus RSUs from Google?