Comment by ssivark

11 hours ago

Couldn't help riffing off on a tangent from the title (since the article is about diagramming tools)...

Dylan Beattie has a thought-provoking presentation for anyone who believes that "plain text" is a simple / solid substrate for computing: "There's no such thing as plain text" https://www.slideshare.net/slideshow/theres-no-such-thing-as... (you'll find many videos from different conferences)

rmunn 7 hours ago

Haven't watched the videos yet, but from the slides, it looks like part of the issue he was talking about was encodings (there's a slide illustrating UTF-16LE vs. UTF-16BE, for example). Thankfully, with UTF-8 becoming the default everywhere (so that you need a really good reason not to use it for any given document), we're back at "yes, there is such a thing as plain text" again. It has a much larger set of valid characters, but if you receive a text file without knowing its encoding, you can just assume it's UTF-8 and have a 99.7% chance of being right.

FINALLY.
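
For the encoding point above, a minimal Python sketch: the same character serializes to different bytes under UTF-16LE and UTF-16BE, and a reader with no metadata can honor a byte order mark (BOM) if one is present and otherwise try UTF-8 first. The `guess_decode` helper and its Latin-1 fallback are hypothetical choices for illustration, not anything from the talk, and nothing here measures the 99.7% figure.

```python
import codecs


def utf16_variants(ch: str) -> dict[str, bytes]:
    """Return the raw bytes of `ch` in both UTF-16 byte orders and in UTF-8."""
    return {
        "utf-16-le": ch.encode("utf-16-le"),
        "utf-16-be": ch.encode("utf-16-be"),
        "utf-8": ch.encode("utf-8"),
    }


def guess_decode(data: bytes) -> tuple[str, str]:
    """Hypothetical heuristic: honor a BOM if present, else try UTF-8,
    else fall back to Latin-1 (which accepts every byte sequence)."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8"), "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        # The "utf-16" codec reads the BOM to pick the byte order, then strips it.
        return data.decode("utf-16"), "utf-16"
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return data.decode("latin-1"), "latin-1"


if __name__ == "__main__":
    for name, raw in utf16_variants("\u00e9").items():
        print(name, raw.hex(" "))
    print(guess_decode("na\u00efve".encode("utf-8")))
    print(guess_decode("na\u00efve".encode("latin-1")))
```

Trying UTF-8 first works as a heuristic because text in legacy 8-bit encodings rarely happens to form valid UTF-8 multi-byte sequences.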

bmitc 1 hour ago

The point is, a lot of work went into making that happen. That is, plain text as it exists today is not some inherent property of computing. It is a binary protocol, and displaying text through fonts is not a trivial matter either.

So my question is: what are we leaving on the table by overfocusing on text? What about graphs and visual elements?

ButlerianJihad 2 hours ago

vaxocentrism, or “All the World’s a VAX”
http://www.catb.org/esr/jargon/html/V/vaxocentrism.html

thaumasiotes 4 hours ago

> Thankfully, with UTF-8 becoming the default everywhere (so that you need a really good reason not to use it for any given document), we're back at "yes, there is such a thing as plain text" again.

Whenever I hear this, I hear "all text files should be 50% larger for no reason".

UTF-8 is pretty similar to the old code page system.
- mort96 4 hours ago
  
  Hm? UTF-8 encodes all of ASCII with one byte per character, and is pretty efficient for everything else. I think the only advantage UTF-16 has over UTF-8 is that some ranges (such as Han characters I believe?) are often 3 bytes of UTF-8 while they're 2 bytes of UTF-16. Is that your use case? Seems weird to describe that as "all text files" though?
  
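
To put numbers on the size exchange above, a short Python sketch comparing encoded sizes. The sample strings are arbitrary picks. It encodes with `utf-16-le` rather than `utf-16` so the 2-byte BOM that Python's plain `utf-16` codec prepends does not skew the count.

```python
SAMPLES = {
    "ASCII": "plain text",
    "Greek": "\u03b1\u03b2\u03b3",       # αβγ
    "Han": "\u7eaf\u6587\u672c",         # 纯文本
    "Emoji": "\U0001F642",               # 🙂, outside the Basic Multilingual Plane
}


def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 bytes, UTF-16 bytes) for `text`, excluding any BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


if __name__ == "__main__":
    for label, text in SAMPLES.items():
        u8, u16 = encoded_sizes(text)
        print(f"{label:6} code points={len(text):2} utf-8={u8:2} utf-16={u16:2}")
```

ASCII is half the size in UTF-8, Greek ties, common Han characters (U+4E00 to U+9FFF) take 3 bytes in UTF-8 versus 2 in UTF-16, and characters outside the BMP take 4 bytes in both.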

lelanthran 5 hours ago

I can't tell what the argument is just from the slideshow. The main point appears to be that code pages, UTF-16, etc., are all "plain text" but not really.

If that really was the argument, then in 2026 it is obsolete: UTF-8 is everywhere.

benj111 5 hours ago

He has a YouTube channel, there's a talk on there.
He also discusses code pages etc.
I don't think the thesis is wrong. E.g., when I think plain text I think ASCII, so we're already disagreeing about what 'plain text' is. His point isn't that we don't have a standard; it's that we've had multiple standards for what we think is the most basic of formats, with lots of hidden complications.
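
The "multiple standards" point is easy to see in a few lines of Python: the same bytes produce different text depending on which code page the reader assumes. The `decode_as` helper and the sample string are illustrative only; `cp1252` (Windows Western European) and `cp437` (original IBM PC) are real codecs shipped with Python.

```python
def decode_as(raw: bytes, encodings: list[str]) -> dict[str, str]:
    """Decode the same bytes under several assumed encodings.

    errors="replace" substitutes U+FFFD for bytes the codec cannot map,
    instead of raising UnicodeDecodeError.
    """
    return {enc: raw.decode(enc, errors="replace") for enc in encodings}


if __name__ == "__main__":
    # Bytes written by a UTF-8 producer...
    raw = "Caf\u00e9 costs 5\u20ac".encode("utf-8")
    # ...read back by consumers that each assume a different "plain text".
    for enc, text in decode_as(raw, ["utf-8", "cp1252", "cp437", "ascii"]).items():
        print(f"{enc:7} {text!r}")
```

Only the UTF-8 reader gets the original back; the others produce mojibake or replacement characters without any error being raised.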

zahlman 2 hours ago

Nice. I've used the phrase before, with the vague notion that a proper talk must already exist.

carra 3 hours ago

I read that article a long time ago, and for me it's a hard disagree. A system as complex and quirky as Unicode can never be considered "plain", and even today it is common for something Unicode-related to break in many apps. ASCII is still the only text system that really works well everywhere, which I consider a must for calling something plain text.

And yes, ASCII mostly means limiting things to English, but in many environments that's almost expected. I would defend this even though I'm not a native English speaker myself.
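
One concrete example of the Unicode quirkiness described above, as a small Python sketch using the standard `unicodedata` module: two strings that render identically can compare unequal because one uses a precomposed character and the other a base letter plus a combining accent. The `is_plain_ascii` helper is a hypothetical name wrapping `str.isascii()`.

```python
import unicodedata


def is_plain_ascii(text: str) -> bool:
    """True if every code point is in the 7-bit ASCII range (U+0000..U+007F)."""
    return text.isascii()


precomposed = "caf\u00e9"   # 'é' as a single code point, U+00E9
decomposed = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# They render identically but compare unequal and have different lengths.
print(precomposed == decomposed)            # False
print(len(precomposed), len(decomposed))    # 4 5

# Normalizing to NFC composes 'e' + U+0301 into U+00E9, so they now match.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(is_plain_ascii("plain text"), is_plain_ascii(precomposed))  # True False
```

An app that compares, hashes, or deduplicates strings without normalizing first is exactly the kind of thing that breaks; restricted to ASCII, the problem never arises.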

d-us-vb 1 hour ago

I feel like that isn’t a very useful definition of plain text. If you mean “ASCII”, say ASCII.
Plain text is text intended to be interpreted as bytes that map simply to characters. Complexity is irrelevant.
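
How "simply" bytes map to characters depends on the encoding. In ASCII, one byte is one character; in UTF-8, a character takes 1 to 4 bytes, so cutting a byte buffer at an arbitrary offset can split a character. A minimal Python sketch, where `utf8_char_boundary` is a hypothetical helper that backs up over UTF-8 continuation bytes (bit pattern `10xxxxxx`):

```python
def utf8_char_boundary(data: bytes, limit: int) -> int:
    """Largest cut index <= limit such that data[:index] does not end
    inside a multi-byte UTF-8 sequence."""
    limit = min(limit, len(data))
    # If the byte at the cut is a continuation byte, the cut splits a character.
    while 0 < limit < len(data) and (data[limit] & 0xC0) == 0x80:
        limit -= 1
    return limit


if __name__ == "__main__":
    text = "na\u00efve"              # naïve
    data = text.encode("utf-8")
    print(len(text), len(data))      # 5 6

    try:
        data[:3].decode("utf-8")     # cuts 'ï' (0xC3 0xAF) in half
    except UnicodeDecodeError as exc:
        print("naive byte slice failed:", exc.reason)

    cut = utf8_char_boundary(data, 3)
    print(data[:cut].decode("utf-8"))  # na
```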