Comment by quibono

18 hours ago

CLRF vs LF strikes again. Partly at least.

I wonder why there's even a max line length limit in the first place? I.e., is it there for a technical reason, or is it just display-related?

Wait, now we have to deal with Carriage Line Return Feeds too?

I wonder if the person who had the idea of virtualizing the typewriter carriage knew how much trouble they would cause over time.

  • Yeah, and using two bytes for a single line termination (or separation or whatever)? Why make things more complicated and take more space at the same time?

    • Remember that back in the mists of time, computers used typewriter-esque machines for user interaction and text output. You had to send a CR followed by an LF to go to the next line on the physical device. Storing both characters in the file meant the OS didn't need to insert any additional characters when printing. Having two separate characters let you do tricks like overstriking (just send CR, no LF); a small sketch of this follows below.

      7 replies →
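
      To make the overstrike trick concrete, here's a rough sketch, assuming an ordinary terminal emulator (which still honours the old teletype semantics): a bare CR returns the "carriage" to column 0 without advancing, so the next write overprints the current line; CR plus LF returns and advances.

          import sys, time

          # CR alone returns to column 0 without feeding a new line,
          # so the second write overstrikes the first.
          sys.stdout.write("progress:   0%\r")
          sys.stdout.flush()
          time.sleep(1)
          sys.stdout.write("progress: 100%\r\n")  # CR + LF: return *and* advance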

I haven't seen them other than in the submission, but if the length matches up it may be that they were processed from raw email; the RFC defines a length to wrap at.

Edit: yes, I think that's most likely what it is (and it's SHOULD 78ch; MUST 998ch). I was forgetting that it also specifies the CRLF usage; it's not (necessarily) related to Windows at all here as described in TFA.

Here it is in my 'notmuch-more' email lib: https://github.com/OJFord/amail/blob/8904c91de6dfb5cba2b279f...
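
For illustration, here's a minimal sketch of what that wrapping could look like, assuming a plain-text body and the RFC 5322 limits mentioned above (78 characters recommended, 998 hard, excluding the CRLF); the function name and the limits-as-constants are just for this example:

    import textwrap

    SOFT_LIMIT = 78   # RFC 5322: lines SHOULD be no longer than this...
    HARD_LIMIT = 998  # ...and MUST be no longer than this, excluding the CRLF

    def wrap_body(text: str) -> str:
        # Re-wrap each paragraph at the soft limit and join with CRLF,
        # the line ending mail uses on the wire.
        out = []
        for paragraph in text.split("\n"):
            out.extend(textwrap.wrap(paragraph, width=SOFT_LIMIT) or [""])
        return "\r\n".join(out) + "\r\n"

    wrapped = wrap_body("one very long paragraph " * 20)
    assert all(len(line) <= HARD_LIMIT for line in wrapped.split("\r\n"))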

  • > it's not (necessarily) related to Windows at all here as described in TFA.

    The article doesn't claim that it's Windows related. The article is very clear in explaining that the spec requires =CRLF (3 characters), then mentions (in passing) that CRLF is the typical line ending on Windows, then speculates that someone replaced the two characters CRLF with a one-character newline, as on Unix or other OSs (there's a short decoding sketch after this sub-thread).

    • Ok, yeah, I may have misinterpreted that bit in the article. It would be a totally reasonable assumption if you didn't happen to know that about email, though; it wasn't a judgement regardless.
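
    To make the "=CRLF" soft-break point concrete, here's a minimal decoding sketch using Python's quopri module (the sample text is made up; only the soft line break matters):

        import quopri

        # In quoted-printable, "=" followed by CRLF is a *soft* line break:
        # it exists only so the encoded form respects the line-length limit.
        encoded = (b"This sentence was split mid-word by the encoder purely to s=\r\n"
                   b"tay under the length limit; decoding joins it back together.")

        print(quopri.decodestring(encoded).decode("ascii"))  # one continuous sentence
        print(encoded.decode("ascii"))  # read "raw", the stray "=" is still there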

I am just wondering how it is a good idea for a server to insert characters into a user's input. If a colleague were to propose this, I'd laugh in his face.

It's just so hacky, I can't believe it's a real-life solution.

  • “Insert characters”?

    Consider converting the original text (maintaining the author’s original line wrapping and indentation) to base64. Has anything been “inserted” into the text? I would suggest not. It has been encoded (see the round-trip sketch below).

    Now consider an encoding that leaves most of the text readable, translates some things based on a line length limit, and some other things based on transport limitations (e.g., passing through 7-bit systems). As long as one follows the correct decoding rules, the original remains intact; nothing is “inserted.” The problem is that someone just knowledgeable enough to know that email is human-readable, but not aware of the proper decoding, has attempted to “clean up” the email for sharing.

    • Okay, it does sound better from this POV. Still weird, as it's a client/UI concern, not something a server is supposed to do; what's next, adding “bold” tags to the title? Lol

      1 reply →
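
    And here's the base64 version of the same idea as a tiny round-trip sketch: the encoded form looks nothing like the original, yet decoding recovers every byte, wrapping and indentation included (the sample text is invented).

        import base64

        original = ("Dear list,\n"
                    "    here is a carefully indented\n"
                    "    and deliberately wrapped paragraph.\n")

        encoded = base64.b64encode(original.encode("utf-8"))
        decoded = base64.b64decode(encoded).decode("utf-8")

        assert decoded == original  # nothing inserted, nothing lost; just encoded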

  • When you post a comment on HN, the server inserts HTML tags into your input. Isn't that essentially the same thing?

    • No, because there is a clear separation between the content and the envelope. You wouldn't expect the post office to open your physical letters and write routing instructions for the postmen inside them.

      But I agree with the sibling comment: it makes more sense when it's called “encoding” instead of “inserting chars into the original stream”.

      1 reply →

  • It's called escaping, and almost every protocol has it. HN must convert the & symbol to &amp; for displaying in HTML. Many wire protocols like SATA or Ethernet must insert a 1 after a certain number of consecutive 0s to maintain electrical balance. Don't remember which ones — don't quote me that it's SATA and Ethernet.
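
    For instance, a minimal sketch of the same idea with Python's standard library (nothing HN-specific about it): the substitution does insert characters, but it is completely reversible.

        import html

        original = 'if a < b && c > 0 { print("ok") }'
        escaped = html.escape(original)            # '&' -> '&amp;', '<' -> '&lt;', ...
        assert html.unescape(escaped) == original  # round-trips exactly
        print(escaped)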