You’re subtly pushing the same product in basically every one of your comments. If these are good-faith comments, please edit out the product name; it’s unnecessary, and pushing it from a green account just makes people consider you a spammer. Establish yourself first.
They've stated "I'm working at io.net" quite openly, but I agree they should at least note their employment in their bio; otherwise it's a very poorly executed astroturf post (phrased as if they're an experimenting user and not a dev).
Or he could disclose it, which he did in a different comment on a different story.
I agree that green accounts can come across as suspicious, and if it were me, I'd disclose the affiliation each time I mentioned the product.
> On the infra side, training a 1.5B model in ~4 hours on 8×H100 is impressive.
It's hard to compare without more details about the training process and the dataset, but is it? Genuine question, because I had the opposite impression. For example, I recently did a full finetuning run on a 3B model, chewing through a 146k-entry dataset (116k of the entries have reasoning traces, so they're not short), in 7 hours on a single RTX 6000.
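As a rough back-of-envelope sketch of why I'm asking: normalizing by GPU-hours (the only assumption here, since the GPUs aren't equivalent), my run works out to roughly 20k samples per GPU-hour, while the 8×H100 run burns 32 GPU-hours total; its dataset size isn't stated, so raw GPU-hours is all we can put next to it.

    # Back-of-envelope comparison of the two runs mentioned above.
    # Assumption: GPU-hours is used as a crude normalizer across different GPUs.
    # The 8xH100 run's dataset size isn't stated, so only its total GPU-hours
    # can be reported, not per-sample throughput.

    rtx6000_samples = 146_000        # full finetune, 3B model
    rtx6000_gpu_hours = 7 * 1        # 7 hours on a single RTX 6000

    h100_gpu_hours = 4 * 8           # ~4 hours on 8xH100 = 32 GPU-hours

    samples_per_gpu_hour = rtx6000_samples / rtx6000_gpu_hours
    print(f"RTX 6000 run: {samples_per_gpu_hour:,.0f} samples per GPU-hour")
    print(f"8xH100 run:   {h100_gpu_hours} GPU-hours total (dataset size unknown)")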
Honestly, I think we can improve our training throughput drastically with a few more optimizations, but we've been spending most of our time on model quality improvements instead.