Comment by quanto
1 year ago
Fascinating paper. Thanks for the ref.
Operating transistors outside the linear region (the saturated "on") on a billion+ scale is something that we as engineers and physicists haven't quite figured out, and I am hoping that this changes in future, especially with the advent of analog neuromorphic computing. The quadratic region (before the "on") is far more energy efficient and the non-linearity could actually help with computing, not unlike the activation function in an NN.
Of course, the modeling the nonlinear behavior is difficult. My prof would say for every coefficient in SPICE's transistor models, someone dedicated his entire PhD (and there are a lot of these coefficients!).
I haven't been in touch with the field since I moved up the stack (numerical analysis/ML) I would love to learn more if there has been recent progress in this field.
The machine learning model didn’t discover something that humans didn’t know about. It abused some functions specific to the chip that could not be repeated in production or even on other chips or other configurations of the same chip.
That is a common problem with fully free form machine learning solutions: They can stumble upon something that technically works in their training set, but any human who understood the full system would never actually use due to the other problems associated with it.
> The quadratic region (before the "on") is far more energy efficient
Take a look at the structure of something like CMOS and you’ll see why running transistors in anything other than “on” or “off” is definitely not energy efficient. In fact, the transitions are where the energy usage largely goes. We try to get through that transition period as rapidly as possible because minimal current flows when the transistors reach the on or off state.
There are other logic arrangements, but I don’t understand what you’re getting at by suggesting circuits would be more efficient. Are you referring to the reduced gate charge?
> Take a look at the structure of something like CMOS and you’ll see why running transistors in anything other than “on” or “off” is definitely not energy efficient. In fact, the transitions are where the energy usage largely goes. We try to get through that transition period as rapidly as possible because minimal current flows when the transistors reach the on or off state.
Sounds like you might be thinking of power electronic circuits rather than CMOS. In a CMOS logic circuit, current does not flow from Vdd to ground as long as either the p-type or the n-type transistor is fully switched off. The circuit under discussion was operated in subthreshold mode, in which one transistor in a complementary pair is partially switched on and the other is fully switched off. So it still only uses power during transitions, and the energy consumed in each transition is lower than in the normal mode because less voltage is switched at the transistor gate.
> In a CMOS logic circuit, current does not flow from Vdd to ground as long as either the p-type or the n-type transistor is fully switched off.
Right, but how do you get the transistor fully switched off? Think about what happens during the time when it’s transitioning between on and off.
You can run the transistors from the previous stage in a different part of the curve, but that’s not an isolated effect. Everything that impacts switching speed and reduces the current flowing to turn the next gate on or off will also impact power consumption.
There might be some theoretical optimization where the transistors are driven differently, but at what cost of extra silicon and how delicate is the balance between squeezing a little more efficiency and operating too close to the point where minor manufacturing changes can become outsized problems?
Seems like this overfitting problem could have been trivially fixed by running it on more than one chip, no?
Unfortunately not. This is analogous to writing a C program that relied on undefined behavior on the specific architecture and CPU of your developer machine. It’s not portable.
The behavior could change from one manufacturing run to another. The behavior could disappear altogether in a future revision of the chip.
The behavior could even disappear if you change some other part of the design that then relocated the logic to a different set of cells on the chip. This was noted in the experiment where certain behavior depended on logic being placed in a specific location, generating certain timings.
If you rely on anything other than the behavior defined by the specifications, you’re at risk of it breaking. This is a problem with arriving at empirical solutions via guess and check, too.
Ideally you’d do everything in simulation rather than on-chip where possible. The simulator would only function in ways supported by the specifications of the chip without allowing undefined behavior.
3 replies →
[dead]
The previous poster was probably thinking about very low power analog circuits or extremely slow digital circuits (like those used in wrist watches), where the on-state of the MOS transistors is in the subthreshold conduction region (while the off state is the same off state as in any other CMOS circuits, ensuring a static power consumption determined only by leakage).
Such circuits are useful for something powered by a battery that must have a lifetime measured in years, but they cannot operate at high speeds.
In other words, optimization algorithms in general are prone to overfitting. Fortunately there are techniques to deal with that. Thing is, once you find a solution that generalize better to different chips, it probably won't be as small as the solution found.
I'm having trouble understanding. Chips with very high transistor counts tend to use saturation/turn-off almost exclusively. Very little is done in the linear region because it burns a lot of power and it's less predictable.
> Operating transistors outside the linear region (the saturated "on")
Do fuzz pedals count?
To be fair, we know they work and basically how they work, but the sonic nuances can be very hard to predict from a schematic.
>Operating transistors outside the linear region (the saturated "on") on a billion+ scale
The whole point of switching transistors is that we _only_ operate them in the fully saturated on or totally off IV-curve region?
Subthreshold circuits are commercially available, just unpopular since all the tools are designed for regular circuits. And the overlap between people who understand semiconductors and people who can make computational tools is very limited, or it's just cheaper to throw people+process shrinks at the problem.
I believe neuromorphic spiking hardware will be the step to truly revolutionize the field of anthropod contagion issues.
Can’t tell if this is a joke or not
I came in already knowing what neuromorphic hardware is and I'm also unsure
6 replies →
Bug zapper
at last, something possibly more buggy than vibe coding!
My thoughts, exactly.