Comment by canyon289

4 days ago

It's a crucial question. I wrote up a long answer here. Let me know if it helps.

https://news.ycombinator.com/item?id=44913558

Thanks for the reply!

It does help to figure out where in the space this model fits. I'm still a bit confused about this part:

>since it needs to be shaped to match specific tasks, we did our best to design it to be a flexible starting point for LLM-style tasks and worked with partners to put it into the right frameworks and places for you all to be able to shape it to what you need it to be.

What does shaping mean in this case? What tools are used, what requirements are there, both in terms of hardware and knowledge?

I would like to go beyond being spoon-fed by large companies' high-usability products, both to improve my knowledge and to avoid being a victim of potential future rug pulls. In the classic software world, I guess the equivalent would be someone who runs open source software, navigates the extra complexity, and occasionally collaborates with the projects.

But I don't know what that looks like in the AI world. I've gone through some courses on machine learning, but learning the basics of Hessian matrices and gradient descent seems as detached from the practical point I'm after as taking a compilers class is from learning React, so I think I've been looking in the wrong places (?).

  • > What does shaping mean in this case? What tools are used, what requirements are there, both in terms of hardware and knowledge?

    I'll try making an analogy to another task I like, which is cooking. In cooking, the chef has to make big decisions like what the overall meal is going to look like, but also detailed ones like what the main course is versus the side, and, even more detailed, what the proportion of side dish to main dish is, which ingredients to use, how long to cook something, etc.

    It's kind of the same with ML models, whether AI or not. When I build smaller Bayesian models, I make specific choices about the model architecture, which data I use, the array shape of the output, etc.

    The tools used here are largely JAX or PyTorch, often in a framework like Flax or some other higher-level NN package. You then usually pair them with libraries that provide NN optimizers, data loaders, etc. PyTorch is more batteries-included than the JAX ecosystem, which separates these out.

    One of the best ways to get a grasp of all of this is to implement some small models yourself. These pieces will start to become more apparent and concrete, especially because as an end user you're not exposed to them, the same way most end users are not exposed to compilers.
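For anyone who wants to try the "implement some small models yourself" route, here is a minimal sketch in plain Python (standard library only, so no JAX or PyTorch install is needed). It assumes a made-up toy task, fitting y = 3x + 2 from noisy samples, and every function name (`make_data`, `data_loader`, `model`, `grad_mse`, `sgd_step`, `train`) is hypothetical, not from any library. Each piece mentioned above (the data choice, the architecture and output shape, the optimizer, the data loader) is its own function, loosely mirroring how the JAX ecosystem keeps them separate.

```python
import random

# Hypothetical toy setup: learn y = 3x + 2 from noisy samples with a
# one-weight, one-bias linear model. Each piece is a separate function,
# loosely mirroring how the JAX ecosystem splits model, optimizer, and data.


def make_data(n, seed=0):
    """Data choice: n noisy samples of y = 3x + 2 with x drawn from [-1, 1]."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x + 2.0 + rng.gauss(0.0, 0.1)) for x in xs]


def data_loader(data, batch_size, rng):
    """Data loader: shuffle the sample order, then yield mini-batches."""
    order = list(range(len(data)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [data[i] for i in order[start:start + batch_size]]


def model(params, x):
    """Architecture choice: a single linear unit with a scalar output."""
    w, b = params
    return w * x + b


def grad_mse(params, batch):
    """Gradient of mean squared error w.r.t. (w, b), derived by hand."""
    gw = gb = 0.0
    for x, y in batch:
        err = model(params, x) - y
        gw += 2.0 * err * x
        gb += 2.0 * err
    n = len(batch)
    return (gw / n, gb / n)


def sgd_step(params, grads, lr):
    """Optimizer: plain stochastic gradient descent."""
    return tuple(p - lr * g for p, g in zip(params, grads))


def train(epochs=200, batch_size=16, lr=0.1, seed=0):
    """Training loop: one reshuffled pass over the data per epoch."""
    rng = random.Random(seed)
    data = make_data(256, seed)
    params = (0.0, 0.0)
    for _ in range(epochs):
        for batch in data_loader(data, batch_size, rng):
            params = sgd_step(params, grad_mse(params, batch), lr)
    return params


if __name__ == "__main__":
    w, b = train()
    print(f"learned w={w:.2f}, b={b:.2f}")  # expect values close to w=3, b=2
```

The gradient is derived by hand only because the model is tiny. In JAX you would typically get it from `jax.grad`, and in PyTorch from autograd via `loss.backward()`, with the optimizer and loader coming from libraries such as Optax, `torch.optim`, and `torch.utils.data.DataLoader`. Swapping the linear `model` for something bigger is where the architecture and output-shape decisions start to matter.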