Comment by noduerme

3 years ago

Fuck, imagine how many doctoral theses I could've written every time I tweaked a few lines of code to try some abstract way of recombining outputs I didn't fully understand. I missed the boat. All this jargon is absolutely for show, though. Purely intended to create the impression that there's some kind of moat to the "discovery". There are much clearer ways to express "we fucked around with putting the outputs of this black box back into the inputs", but I guess that doesn't impress the rubes.

I really, really wish this culture of expressing simple things in ornate ways would die. All it does is make knowledge less accessible.

  • I think this applies to everything right now. Papers like this are just ridiculous examples. In like, 6th grade, I won second place at the LA county science fair for coding a simulation of a coyote's life in HyperCard (with tons of graphs). Yay. Y'know what? That shit and those graphs would've been incomprehensible to the judges if it hadn't been written in plain language, in an attempt to make them understand what they were looking at. My entire career since has been an attempt to communicate, and to alleviate the pain points in communication between parties by writing software that encapsulated their descriptions of what they needed. And likewise I never pretended to be smarter or know more than my clients did: everything must be explained and comprehensible in normal-people language. People need to know how shit works, especially if they're paying for it.

    Or they should.

    Or if they don't know and don't care, they're fucking negligent.

    Especially if they say "wow that sounds smart, let's let these guys run our weapons program".

    To your point, the reason this ornate language thrives, and the reason people get away with complacency about how their own systems work, boils down to a silent pact between managers and engineers to sweep everything under the rug out of laziness and ill will. There's something blatantly mendacious and evil (in the banal way) about the arrangement: managers sign off on black boxes that were blessed by complex-sounding papers, so that upper management can wash their hands of the results.

    [edit] maybe I'm just bitter because I spent hours today pondering exactly how many engineers at Monsanto must have known about the dangers of AstroTurf, and how many raised their hand, or hid behind a spreadsheet

    https://frontofficesports.com/investigation-links-astroturf-...

  • “Engineers who like to pretend to be mathematicians”, as I once heard it put.

    • In math, this is mostly an English problem I think. Next time you find a Wikipedia math page to be an impenetrable wall of jargon, click the Wikipedia language tool and choose another language, any will do.

      Then use Chrome's tool to machine-translate the foreign-language version back to English. I've found that this invariably makes the article more coherent than the native English-language Wikipedia math page.

      It says something about the culture for sure.


  • What I find entertaining/confounding is how difficult the abstracts to these new AI papers are to understand. It feels like academia is pushing this style, so it’s hard to blame the authors since they have to play the game.

    For reference I have an undergrad degree in computer science, have been working professionally for 25 years, and am fairly data centric in my work.

    I’m hoping that when I run this through GPT-4 to get an explanation for a mortal software developer, something sensible comes out the other end.

  • "Not math-y enough"/ "Needs more math" is a very common feedback ML/AI researchers get when writing papers.

    The other day I was watching a live-stream of a doctoral defense, as the thesis was quite relevant to my work.

    So one of the committee members kept picking at and criticizing the math, asking questions like "You are supposed to be the bleeding edge on this topic, why is the math so simple? Did you research more rigorous theories to explain it?" etc. (The candidate was awarded the doctorate, though.)

    So, I dunno, if that's how things are now, it makes sense to me that authors go overboard with complicated notation, even when they could have written it much more simply. It probably makes the work seem more rigorous and legit.

    Doesn't really take that much more time, and it covers your ass from "not rigorous enough" gotchas - though at the expense of readability.

    • Go read any article from the first 200 years or so of Phil. Trans. (the Philosophical Transactions of the Royal Society). There's lots of crucial science there written in a way that doesn't have the modern trappings of the form. It's good reading. Maybe some style perturbations borrowed from earlier eras would be good.

      https://www.biodiversitylibrary.org/bibliography/62536 (menu on the right)

      Benjamin Franklin, Robert Boyle, Isaac Newton, Maxwell, Ohm, and Volta - they're all there. If that style was good enough for them ...

  • If the excuse is true and the "ornate" language really is a dense representation of information, then it should be fairly trivial to have an LLM agent unsummarize it.

    There could be a webservice that offers a parallel track of layman's translations of any paper.

  • That's literally the entire field of philosophy after the ~18th century.

    • Yep, ~18th century. Didn't Wittgenstein and/or Nietzsche say something similar? Words are inadequate for communication, and all philosophy is playing with words.

      But language is all we have to communicate, so I guess we're stuck with it.

  • I wish also. When I was young and new, I wasted so much time trying to parse 'arcane' math that was really something simple but dressed up as complicated to give it weight.

Watching the AI community rediscover automatic differentiation 20+ years after the field was considered "mature" was equal parts frustrating and fascinating. The frustrating part was watching them rewrite the history of discovery without any sort of sense or rigor ... and that was also the most fascinating part!
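For anyone who hasn't met the idea: automatic differentiation isn't numerical approximation or symbolic algebra; it computes exact derivatives by propagating them through ordinary arithmetic. A minimal sketch of the forward-mode flavor using dual numbers (all names here are illustrative, not from any particular library):

```python
# Forward-mode automatic differentiation via dual numbers.
# Each Dual carries a value and the derivative of that value,
# and the chain/product rules are applied by operator overloading.
class Dual:
    def __init__(self, val, grad=0.0):
        self.val = val    # function value
        self.grad = grad  # derivative carried alongside the value

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.grad + other.grad)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule applied automatically
        return Dual(self.val * other.val,
                    self.val * other.grad + self.grad * other.val)

    __rmul__ = __mul__


def derivative(f, x):
    # Seed the derivative slot with 1.0 and read the result off the output.
    return f(Dual(x, 1.0)).grad


# d/dx (x*x + 3x) at x = 2 is 2*2 + 3 = 7
print(derivative(lambda x: x * x + 3 * x, 2.0))  # → 7.0
```

This is essentially the technique that was already textbook material decades before backprop frameworks "rediscovered" it as reverse-mode AD.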

  • This is indeed the frustration.

    I'm waiting for some fresh group of grad students to make a breakthrough using a reinvented version of Pearl's "do" calculus, or maybe they'll make some narrow breakthrough using Bayes nets and everyone will geek out on those for a while.

    *I do think transformers (much like feed-forward networks + backprop from 2012-2018) are probably a lasting software architecture for inference applications, until we come up with new hardware and move beyond GPU-focused computing.

    It's exciting to see it all working, but disheartening how ahistorical these last few years have been in AI - with the exception of Brooks, Sutton, and a few other greybeards in the field who say similarly.

    The funny thing is that this constantly happens in every field ever; humans truly excel at repeating history without learning from the past.

    Another example:

    - HTML served by static file servers

    - HTML generated by backend

    - HTML enhanced with small JS snippets

    - HTML generated by frontend, but served by backend

    - Go to step one, not learning why anyone moved on from the previous method

    • Poor training, poor communication, & knowledge not being curated.

      When the best method of getting advice on the internet is to post the wrong answer, you know the system is broken.

I think the main motivation in ML theory that touches the current SOTA is not "expressing simple ideas with jargon for show". Jargon is necessary, however unnecessary it may seem to some (mostly very practical) engineers and software people who are used to expressing themselves quickly and pragmatically. It's a jargon for the mathematics of machine learning, which is pretty unstandardized, so to speak, so you need to define your terms yourself. And without jargon and clear proofs, what you do is brainstorming at most. The value of such work is that its statements are clear, proved, and contain hypotheses which future papers can test.

Here is an example: to explain the existence of adversarial examples, there are two jargon-free suggestions in circulation: 1) that the decision boundary is too nonlinear, and 2) that the decision boundary is too linear. These explanations contradict each other, are stated without any real proof, and can unfortunately be heard in most adversarial-example papers. If we had clear formulations of these two statements, we could have tested both claims, but unfortunately the papers that suggested these theories didn't put in the effort to define their terms and state their suggestions as clear, formal claims.
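To make the "too linear" claim concrete: for a linear model, the fast-gradient-sign perturbation shifts the output by eps times the L1 norm of the weights, which grows with input dimensionality even when each coordinate moves imperceptibly. A toy sketch (all numbers here are made up for illustration):

```python
import numpy as np

# Illustrating the "too linear" hypothesis behind adversarial examples:
# a tiny per-coordinate perturbation aligned with sign(w) moves a linear
# model's output by eps * ||w||_1, which is large in high dimensions.
rng = np.random.default_rng(0)
d = 1000                                    # input dimensionality
w = rng.choice([-1.0, 1.0], size=d) * 0.1   # weights of a linear "classifier"
x = rng.normal(size=d)                      # a clean input
eps = 0.01                                  # per-coordinate perturbation budget

logit_clean = w @ x
x_adv = x + eps * np.sign(w)                # FGSM step for a linear model
logit_adv = w @ x_adv

# Output shift is eps * ||w||_1 = 0.01 * 1000 * 0.1 = 1.0,
# even though no coordinate of x changed by more than 0.01.
print(logit_adv - logit_clean)              # → roughly 1.0
```

The opposing "too nonlinear" hypothesis makes a different testable prediction, which is exactly why formal statements of both would be worth having.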

I studied ML just over a decade ago. I actually compared MLPs to SVMs and had a similar thought to this. It does seem like there's been a regression in understanding some of the fundamentals and older tools of the trade.

I guess everyone gets focused on the newer things.

Really does seem like people rediscovering older endpoints.

  • There's been a huge flood of vanilla software engineers into ML, retconning it as "a subfield of computer science" (computability is a minor concern compared to the statistical underpinnings). They pretend to know the math because they can read the equations, then claim with utmost confidence that actually they're doing all the hard work in ML because they are experts in calling APIs and integrating into products, however useful or useless.

  • That’s science. You can’t expect everyone to know everything. It’s a preprint, so this is the first opportunity to provide feedback.

You think it's that easy, but, as is frequently the case with transformers, I believe there's more here than meets the eye.

which jargon here is "just for show"?

  • > we show that over-parameterization catalyzes global convergence by ensuring the feasibility of the SVM problem and by guaranteeing a benign optimization landscape devoid of stationary points

    does this mean 'an over-parameterized transformer problem is a convex svm problem'?

    • The irony is that your "simplification" uses even more "jargon."

      But yes, that's how I would read it, and I also see no issue at all with the language in the paper. These terms are used for precision and have meaning to those in the field. Papers are written for other experts, not laymen.


    • I read it the same way as you did, or at least it's an approximation.

      In general that's not really surprising. I remember discussions from some years ago about larger networks leading to smoother loss surfaces.
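      For anyone outside the subfield, the "SVM problem" being referred to is, glossing over the paper's specific setup (which I haven't reproduced here), the classic hard-margin formulation:

      ```latex
      % Hard-margin SVM for labeled data (x_i, y_i), y_i \in \{-1, +1\}.
      % "Feasibility" in the quoted abstract means a hyperplane satisfying
      % these constraints exists at all, i.e., the data are separable.
      \min_{w,\,b} \ \tfrac{1}{2}\lVert w \rVert^2
      \quad \text{s.t.} \quad y_i \left( w^\top x_i + b \right) \ge 1 \quad \forall i
      ```

      The abstract's claim, as quoted above, is that over-parameterization ensures an analogue of this problem is feasible, which is what makes the optimization landscape "benign".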

Or, optimistically, this is really how they think about these things, and you should simply be happy they're not trying to obfuscate their findings.

You're not wrong. Applied ML articles are not worth reading.

  • I wouldn’t go this far; applied ML articles are my favorite articles. If you’re in the arena, it’s good to see things that other people have done from a practical perspective, so you can ape it in your own work or not give it further consideration.