Comment by quibono

18 hours ago

CLRF vs LF strikes again. Partly at least.

I wonder why there's even a max line length limit in the first place? I.e., is it there for a technical reason, or is it just display-related?

Wait, now we have to deal with Carriage Line Return Feeds too?

I wonder if the person who had the idea of virtualizing the typewriter carriage knew how much trouble they would cause over time.

  • Yeah, and using two bytes for a single line termination (or separation or whatever)? Why make things more complicated and take more space at the same time?

    • Remember that back in the mists of time, computers used typewriter-esque machines for user interaction and text output. You had to send a CR followed by an LF to go to the next line on the physical device. Storing both characters in the file meant the OS didn't need to insert any additional characters when printing. Having two separate characters let you do tricks like overstriking (just send CR, no LF); a small sketch of this follows below.

      7 replies →
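
      To make the overstrike trick concrete, here's a rough sketch, assuming an ordinary terminal emulator (which still honours the old teletype semantics): a bare CR returns the "carriage" to column 0 without advancing, so the next write overprints the current line; CR plus LF returns and advances.

          import sys, time

          # CR alone returns to column 0 without feeding a new line,
          # so the second write overstrikes the first.
          sys.stdout.write("progress:   0%\r")
          sys.stdout.flush()
          time.sleep(1)
          sys.stdout.write("progress: 100%\r\n")  # CR + LF: return *and* advance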

I haven't seen them other than in the submission, but if the length matches up it may be that they were processed from raw email; the RFC defines a length to wrap at.

Edit: yes, I think that's most likely what it is (and it's SHOULD 78ch; MUST 998ch). I was forgetting that it also specifies the CRLF usage; it's not (necessarily) related to Windows at all here as described in TFA.

Here it is in my 'notmuch-more' email lib: https://github.com/OJFord/amail/blob/8904c91de6dfb5cba2b279f...
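
For illustration, here's a minimal sketch of what that wrapping could look like, assuming a plain-text body and the RFC 5322 limits mentioned above (78 characters recommended, 998 hard, excluding the CRLF); the function name and the limits-as-constants are just for this example:

    import textwrap

    SOFT_LIMIT = 78   # RFC 5322: lines SHOULD be no longer than this...
    HARD_LIMIT = 998  # ...and MUST be no longer than this, excluding the CRLF

    def wrap_body(text: str) -> str:
        # Re-wrap each paragraph at the soft limit and join with CRLF,
        # the line ending mail uses on the wire.
        out = []
        for paragraph in text.split("\n"):
            out.extend(textwrap.wrap(paragraph, width=SOFT_LIMIT) or [""])
        return "\r\n".join(out) + "\r\n"

    wrapped = wrap_body("one very long paragraph " * 20)
    assert all(len(line) <= HARD_LIMIT for line in wrapped.split("\r\n"))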

  • > it's not (necessarily) related to Windows at all here as described in TFA.

    The article doesn't claim that it's Windows related. The article is very clear in explaining that the spec requires =CRLF (3 characters), then mentions (in passing) that CRLF is the typical line ending on Windows, then speculates that someone replaced the two characters CRLF with a one-character newline, as on Unix or other OSs (there's a short decoding sketch after this sub-thread).

    • Ok, yeah, I may have misinterpreted that bit in the article. It would be a totally reasonable assumption if you didn't happen to know that about email, though; it wasn't a judgement regardless.
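
    To make the "=CRLF" soft-break point concrete, here's a minimal decoding sketch using Python's quopri module (the sample text is made up; only the soft line break matters):

        import quopri

        # In quoted-printable, "=" followed by CRLF is a *soft* line break:
        # it exists only so the encoded form respects the line-length limit.
        encoded = (b"This sentence was split mid-word by the encoder purely to s=\r\n"
                   b"tay under the length limit; decoding joins it back together.")

        print(quopri.decodestring(encoded).decode("ascii"))  # one continuous sentence
        print(encoded.decode("ascii"))  # read "raw", the stray "=" is still there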

I am just wondering how it is a good idea for a server to insert characters into a user's input. If a colleague were to propose this, I'd laugh in his face.

It's just so hacky, I can't believe it's a real-life solution.

  • “Insert characters”?

    Consider converting the original text (maintaining the author’s original line wrapping and indentation) to base64. Has anything been “inserted” into the text? I would suggest not. It has been encoded (see the round-trip sketch below).

    Now consider an encoding that leaves most of the text readable, translates some things based on a line length limit, and some other things based on transport limitations (e.g., passing through 7-bit systems). As long as one follows the correct decoding rules, the original remains intact; nothing is “inserted.” The problem is that someone just knowledgeable enough to know that email is human-readable, but not aware of the proper decoding, has attempted to “clean up” the email for sharing.

    • Okay, it does sound better from this POV. Still weird, as it's a client/UI concern, not something a server is supposed to do; what's next, adding “bold” tags to the title? Lol

      1 reply →
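
    And here's the base64 version of the same idea as a tiny round-trip sketch: the encoded form looks nothing like the original, yet decoding recovers every byte, wrapping and indentation included (the sample text is invented).

        import base64

        original = ("Dear list,\n"
                    "    here is a carefully indented\n"
                    "    and deliberately wrapped paragraph.\n")

        encoded = base64.b64encode(original.encode("utf-8"))
        decoded = base64.b64decode(encoded).decode("utf-8")

        assert decoded == original  # nothing inserted, nothing lost; just encoded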

  • When you post a comment on HN, the server inserts HTML tags into your input. Isn't that essentially the same thing?

    • No, because there is a clear separation between the content and the envelope. You wouldn't expect the post office to open your physical letters and write routing instructions for the postmen inside them.

      But I agree with the sibling comment: it makes more sense when it's called “encoding” instead of “inserting chars into the original stream”.

      1 reply →

  • It's called escaping, and almost every protocol has it. HN must convert the & symbol to &amp; for displaying in HTML. Many wire protocols like SATA or Ethernet must insert a 1 after a certain number of consecutive 0s to maintain electrical balance. Don't remember which ones — don't quote me that it's SATA and Ethernet.
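
    For instance, a minimal sketch of the same idea with Python's standard library (nothing HN-specific about it): the substitution does insert characters, but it is completely reversible.

        import html

        original = 'if a < b && c > 0 { print("ok") }'
        escaped = html.escape(original)            # '&' -> '&amp;', '<' -> '&lt;', ...
        assert html.unescape(escaped) == original  # round-trips exactly
        print(escaped)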