Comment by spacebacon

10 hours ago

Now do it without the fine tuning.

The HF zool4nd3r demo may be useful

14 comments

spacebacon

Your method appears to be similar to LoRA but simply less expressive. Some kind of manipulation to layers 7, 14, and 21. Did you compare with other layers? This is obviously extremely specific to a particular backbone.

Also, your documents use a ton of nonstandard jargon, which only serves to confuse laypeople and annoy anyone who is familiar with ML. Saying your change adds “semiotic awareness” is meaningless when your experiments claim only marginal improvements. Clearly the model had most of the capability before.

More generally, who is it for? People who have expertise in ML are not going to take it seriously. People who don’t?

spacebacon 7 hours ago

It is not LoRA. LoRA fine-tunes capabilities into the model. SRT Adapter is a small overlay on a frozen model whose purpose is to make internal reasoning observable. It surfaces what the model is activating at moments of high divergence.
Layers 7, 14, and 21 were chosen after probing. They showed the strongest regime signals. We did compare other layers. The term semiotic awareness is just shorthand for detecting and modulating higher-order meaning patterns. If the term is unhelpful I will drop it.
The capability gains are often marginal on standard benchmarks. The intended value is observability and steerability without retraining the backbone.
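
To make the "overlay on a frozen model" idea concrete, here is a minimal toy sketch in plain Python. It is not the SRT Adapter code and does not use any real backbone: the depth (`NUM_LAYERS`), hidden size (`WIDTH`), the layer function, and all names are assumptions made for illustration. Only the tap indices 7, 14, and 21 come from the comment above. The sketch shows the observability half of the claim: hidden states are copied out at the tapped layers while the backbone's weights and its output stay untouched.

```python
import math
import random

TAP_LAYERS = (7, 14, 21)  # layer indices named in the thread
NUM_LAYERS = 24           # assumption: toy depth, not the real backbone's
WIDTH = 4                 # assumption: toy hidden size


def make_frozen_backbone(seed=0):
    """Build fixed random weight matrices standing in for a frozen model."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-0.5, 0.5) for _ in range(WIDTH)] for _ in range(WIDTH)]
        for _ in range(NUM_LAYERS)
    ]


def layer_forward(weights, hidden):
    """One toy layer: residual connection plus tanh of a matrix-vector product."""
    return [
        h + math.tanh(sum(w * x for w, x in zip(row, hidden)))
        for h, row in zip(hidden, weights)
    ]


def forward_with_taps(backbone, hidden, taps=TAP_LAYERS):
    """Run the backbone unchanged and copy hidden states at the tapped layers."""
    captured = {}
    for index, weights in enumerate(backbone):
        hidden = layer_forward(weights, hidden)
        if index in taps:
            captured[index] = list(hidden)  # read-only copy; the pass is not altered
    return hidden, captured


backbone = make_frozen_backbone()
prompt_state = [1.0, 0.0, -1.0, 0.5]  # hypothetical input representation
output, captured = forward_with_taps(backbone, prompt_state)
plain_output, _ = forward_with_taps(backbone, prompt_state, taps=())
```

Here `captured` holds one hidden-state vector per tapped layer, and `output` matches `plain_output` because reading the taps changes nothing in the forward pass. Anything that modulates the hidden states, as the adapter is described as doing, would sit on top of this and would change the output.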

anentropic 9 hours ago

Tip: neither the "30 second TL;DR" nor the intro paragraph above it really explains to anyone unfamiliar with your (possibly novel?) jargon what it does

janalsncm 8 hours ago

“Semiotic awareness” is not standard ML terminology. The dictionary definition of semiotic is simply “relating to symbols”, so it’s a bit grandiose to say you gave Qwen “awareness of symbols” when in reality it’s a marginal improvement, if even true.
Also, to say that a philosopher who died 100 years ago inspired a new attention head is another instance of GPT off his rocker again. You don’t need MAH to contextualize “freedom” in a sentence. Attention already does that.

spacebacon 9 hours ago

Thank you. I would appreciate additional feedback on how I can improve that.
Edit: It's not GPT, nor off its rocker. This repo empirically proved computational semiotics with reference to C.S. Peirce, Paul Kockelman, and many other respected contemporary semioticians.
- anentropic 8 hours ago
  
  Just try to explain why I should use it and why it's different or better than alternatives - in terms of some qualities of the results rather than how it's implemented
  The technical implementation details are also useful to have, but they're a bit hard to parse into "what is this?"
  
- janalsncm 8 hours ago
  
  You should write your readmes by hand. You’ll learn a lot more that way, and it’ll help to ground the project.
  

nextaccountic 9 hours ago

How does this help with making an LLM write in a particular style present in a large corpus? Is there a training step? Or can SRT use the raw data as is? (That seems infeasible.)

Also is SRT really suitable for style transfer?

I mean this seems to be another network overlaid on top of the LLM steering it, but it needs some target to determine whether the underlying LLM drifted away from it

spacebacon 7 hours ago

SRT does involve a training step, but only on the small adapter and not on the base model. It learns to shift internal representations toward a target discourse regime or style.
It is an overlay, but it works by modulating meaning-level patterns called regimes rather than fixed steering vectors. Because it can read its own effect on the hidden states, it gives a way to observe whether output is staying in the target regime or drifting.
It is not raw data in and raw style out. The adapter needs examples that define the desired regime.
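
As a rough illustration of "train only the adapter, leave the base model frozen", here is a toy sketch in plain Python. It is not how SRT is implemented: the two-dimensional "hidden state", the fixed linear map standing in for the base model, the single learned shift vector standing in for the adapter, and the distance-to-centroid drift measure are all assumptions chosen to keep the example small. A single learned shift is closer to a plain steering vector than to the regime modulation described above, so read it only as a picture of which parameters get updated.

```python
import math

# Assumption: a fixed linear map stands in for the frozen base model.
BASE_WEIGHTS = ((0.5, -0.2), (0.1, 0.3))  # tuples, so nothing can update them


def base_hidden(x):
    """Hidden state produced by the frozen base for input x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in BASE_WEIGHTS]


def adapted_hidden(x, shift):
    """Hidden state after the (hypothetical) adapter adds its learned shift."""
    return [h + s for h, s in zip(base_hidden(x), shift)]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def train_shift(inputs, target, steps=200, lr=0.1):
    """Gradient descent on mean squared distance to target; only `shift` changes."""
    shift = [0.0] * len(target)
    for _ in range(steps):
        grad = [0.0] * len(target)
        for x in inputs:
            hidden = adapted_hidden(x, shift)
            for i, (h, t) in enumerate(zip(hidden, target)):
                grad[i] += 2.0 * (h - t) / len(inputs)
        shift = [s - lr * g for s, g in zip(shift, grad)]
    return shift


def drift(x, shift, target):
    """Euclidean distance between the adapted hidden state and the target regime."""
    return math.dist(adapted_hidden(x, shift), target)


# Hypothetical examples that define the desired regime, and inputs to be steered.
regime_examples = [[2.0, 0.0], [2.0, 1.0]]
inputs = [[0.0, 1.0], [0.0, 2.0]]

target = centroid([base_hidden(x) for x in regime_examples])
shift = train_shift(inputs, target)
drift_before = [drift(x, [0.0, 0.0], target) for x in inputs]
drift_after = [drift(x, shift, target) for x in inputs]
```

After training, each value in `drift_after` is smaller than the matching value in `drift_before`, and `BASE_WEIGHTS` is unchanged. The same `drift` function can be evaluated on new inputs, which is one simple way to picture monitoring whether output stays near the target regime.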