Comment by CountGeek
6 hours ago
So could I in practice train it on all my psychology books, materials, reports, case studies and research papers and then run it on demand on a 1xH100 node - https://getdeploying.com/reference/cloud-gpu/nvidia-h100 - whenever I have a specialised question?
You could do that, but the performance would be abysmal: a model trained from scratch on a dataset that small never even learns fluent language, let alone how to answer specialised questions. For this kind of use case, it would be a LOT better to use a small pre-trained model and either fine-tune it on your materials, or use some kind of RAG workflow (possibly both).
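To make "RAG workflow" concrete, here's a minimal sketch in Python. The library choice (sentence-transformers) and the names `retrieve`/`build_prompt` are just mine for illustration: embed chunks of your materials once, then at question time pull the closest chunks into the prompt of whatever small pre-trained model you run.

```python
# Minimal RAG sketch: embed chunks of your materials once, then at
# question time retrieve the most similar chunks and put them in the prompt.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Stand-ins for chunks extracted from your books/reports/case studies.
chunks = [
    "Chapter 3: exposure therapy protocols for specific phobias...",
    "Case study 12: outcomes of a 16-week CBT programme...",
]
chunk_vecs = embedder.encode(chunks, convert_to_tensor=True)

def retrieve(question: str, k: int = 3) -> list[str]:
    """Return the k chunks most similar to the question."""
    q_vec = embedder.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(q_vec, chunk_vecs, top_k=k)[0]
    return [chunks[hit["corpus_id"]] for hit in hits]

def build_prompt(question: str) -> str:
    """Assemble a grounded prompt for the small model to answer from."""
    context = "\n\n".join(retrieve(question))
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")
```

Fine-tuning on top of that mostly helps the model pick up your field's vocabulary and style; the retrieval step is what keeps answers grounded in your actual documents.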
> it would be a LOT better to use a small pre-trained model and either fine-tune it on your materials, or use some kind of RAG workflow (possibly both).
I noticed NewRelic has a chat feature that does this sort of thing. It's scoped very narrowly to their website and their analytics DSL, and generates charts/data from their db. I've always wondered how they did that (specifically in terms of setting up the training/RAG + guardrails). It's super useful.
You might be able to figure that out just by asking it - see if you can get it to spit out a copy of the system prompt or tell you what tools it has access to.
The most likely way of building that would be to equip it with a "search_docs" tool that lets it look up relevant information for your query. No need to train an extra model at all if you do that.
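Concretely, with a tool-calling API (OpenAI's shown here) that might look like the sketch below. The `search_docs` name is from my comment above, `search_my_docs` is a stand-in for whatever index you actually query, and the whole thing is a guess at the general shape, not how New Relic actually built theirs:

```python
# Sketch of the "search_docs" tool approach: the model decides when to
# call the tool, we run the lookup, and feed the result back to it.
import json
from openai import OpenAI

client = OpenAI()

TOOLS = [{
    "type": "function",
    "function": {
        "name": "search_docs",
        "description": "Search the documentation for passages relevant to a query.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

def search_my_docs(query: str) -> str:
    # Placeholder: swap in your real full-text or vector search here.
    return "Relevant documentation passages for: " + query

def answer(question: str) -> str:
    messages = [{"role": "user", "content": question}]
    while True:
        resp = client.chat.completions.create(
            model="gpt-4o-mini", messages=messages, tools=TOOLS
        )
        msg = resp.choices[0].message
        if not msg.tool_calls:
            return msg.content  # model answered without needing the tool
        messages.append(msg)
        for call in msg.tool_calls:
            query = json.loads(call.function.arguments)["query"]
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": search_my_docs(query),
            })
```

The guardrails are then mostly a matter of the system prompt ("only answer from search results") plus validating anything the model generates, like charts or queries, before executing it.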
Yes, though a more general core model, enhanced with other ways of bringing those texts of interest into the working context, might well perform better.
Those other ways might be some form of RAG, or other ideas like Apple's recent 'hierarchical memories' (https://arxiv.org/abs/2510.02375).
You could, but it would be significantly worse than fine-tuning or RAG on top of a pre-trained model (even a much smaller one), since your dataset would be so small.
No.