Comment by alankay

10 years ago

What if "data" is a really bad idea?

72 comments

alankay

Data like that sentence? Or all of the other sentences in this chat? I find 'data' hard to consider a bad idea in and of itself, i.e. if data == information, records of things known/uttered at a point in time. Could you talk more about data being a bad idea?

alankay 10 years ago
What is "data" without an interpreter? (And when we send "data" somewhere, how can we send it so its meaning is preserved?)
- richhickey 10 years ago
  
  Data without an interpreter is certainly subject to (multiple) interpretation :) For instance, the implications of your sentence weren't clear to me, in spite of it being in English (evidently, not indicated otherwise). Some metadata indicated to me that you said it (should I trust that?), and when. But these seem to be questions of quality of representation/conveyance/provenance (agreed, important) rather than critiques of data as an idea. Yes, there is a notion of sufficiency ('42' isn't data).
  Data is an old and fundamental idea. Machine interpretation of un- or under-structured data is fueling a ton of utility for society. None of the inputs to our sensory systems are accompanied by explanations of their meaning. Data, something given, seems to be the raw material of pretty much everything else interesting, and interpreters are secondary and, perhaps essentially, varied.
  
- sandal 10 years ago
  
  The more meaning you pack into a message, the harder the message is to unpack.
  So there's this inherent tradeoff between "easy to process" and "expressive" -- and I imagine deciding which side you want to lean toward depends on the context.
  Check this out for a practical example: https://www.practicingruby.com/articles/information-anatomy
  (not a Ruby article, but instead about essential structure of messages, loosely inspired by ideas in Gödel, Escher, Bach)
- olantonan 10 years ago
  
  So the idea is to always send the interpreter, along with the data? They should always travel together?
  Interesting. But, practically, the interpreter would need to be written in such a way that it works on all target systems. The world isn't set up for that, although it should be.
  Hm, I now realize your point about HTML being idiotic. It should be a description, along with instructions for parsing and displaying it (?)
  
mmiller 10 years ago
Take a look here:
https://tekkie.wordpress.com/2010/07/05/sicp-what-is-meant-b...
- richhickey 10 years ago
  
  Data, and the entirety of human understanding and knowledge derived from recording, measurement and analysis of data, predates computing, so I don't see the relevance of these recent, programming-centric notions in a discussion of its value.
  
- NotUsingLinux 10 years ago
  
  Your blog looks very interesting. You should share some links to it here on Hacker News!
  

deterministic 10 years ago

Data is semantically defined by the processes using/interpreting it. Not by the data itself. So Rich Hickey is right and Alan Kay is wrong.
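
A minimal sketch of that claim, under stated assumptions: the four byte values and the `interpreters` table below are purely illustrative and do not come from the discussion. The same payload yields three different values depending on which process reads it.

```python
import struct

# The same four bytes, with no meaning attached to them (illustrative values).
payload = bytes([0x00, 0x00, 0x80, 0x3F])

# Hypothetical "interpreters": each is just a function from bytes to a value.
interpreters = {
    "little-endian float32": lambda b: struct.unpack("<f", b)[0],
    "little-endian uint32": lambda b: struct.unpack("<I", b)[0],
    "big-endian uint32": lambda b: struct.unpack(">I", b)[0],
}

# Apply every interpreter to the identical payload.
readings = {name: interpret(payload) for name, interpret in interpreters.items()}

for name, value in readings.items():
    print(f"{name}: {value}")
```

Nothing in `payload` says which reading is intended; that choice lives entirely in the receiving code.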

ayoshi 10 years ago

Yes, I think if we could get rid of this notion we could probably move in interesting directions. Another way to look at it: if we take any object of sufficient complexity in the universe, how could it interact with other objects of sufficient complexity? If we look at humans as first-order augmentation devices for other humans, it's notable that the complexity of their internal state is much higher than the complexity of the input they receive in any sufficiently small time frame (whatever measurement you decide to take). Basically, the whole state is encoded internally, by means of successive undifferentiated input. In that sense, neural networks, for example, don't work with data as such: data presupposes an internal structure that is absent from the input from the standpoint of the network itself. It is the network's job to convert that input into something we can reasonably call "data". Moreover, this knowledge is encoded in its internal state, which is essentially the "interpreter" bundled in.
Another angle that I like to think from is this: TRIZ has a concept of an ideal device, something that performs its function with the minimum overhead required; best of all, the function is performed by itself, in the absence of any device. If we imagine the computer (in a very generic sense) to be such a device, it stands to reason that ideally it will require minimal, or even no, input. Obviously it means that we don't need to encode meaning or interpretation into it through directed formal input. The only way for that to happen is for the computer to have a sufficiently complex internal state, capable of converting directed, or even self-acquired, input into whatever we can eventually call "data".
This logic could possibly be applied to some minimal object: we could look for a unit capable of performing a specific function on a defined range of inputs, building the meaning from its internal state. The second task, then, would be to find a way to compose those objects, provided they have no common internal state, and to build systems in which the combination of those states would yield a larger possible field of operation. A third interesting question would be: how can we build up the internal state of another object, given that we want to feed it input requiring interpretation further down the line, building up from whatever minimum we already have?

alankay 10 years ago
Welcome to Claude Shannon! It's not about the message but about the receiver ...
- jsa-aerial 10 years ago
  
  Actually it is as much about the sender and the message as the receiver.
  
jsa-aerial 10 years ago

Data isn't the carrier, it isn't the signal (information), and it certainly isn't the meaning (interpretation). A reasonable first approximation is that data is _message_.