Comment by esafak

2 months ago

> "Modern" languages try to avoid exceptions by using sum types and pattern matching plus lots of sugar to make this bearable. I personally dislike both exceptions and its emulation via sum types. ... I personally prefer to make the error state part of the objects: Streams can be in an error state, floats can be NaN and integers should be low(int) if they are invalid.

Special values like NaN are half-assed sum types. The latter give you compiler guarantees.

27 comments

SJMG 2 months ago

Not a defense of the poison value approach, but in this thread Araq (Nim's principal author) lays out his defense for exceptions.

https://forum.nim-lang.org/t/9596#63118

kace91 2 months ago

I’d like to see their argument for it. I see no help in pushing NaN as a number through a code path corrupting all operations it is part of, and the same is true for the others.

snek_case 2 months ago
The reason NaN exists is performance, AFAIK. On a GPU, for example, you can't really have exceptions. You don't want to be constantly checking "did this individual floating-point op produce an error?" It's easier and faster for the individual floating-point unit to flag the output as a NaN. Obviously NaNs long predate GPUs, but floating-point support was also hardware accelerated in a variety of ways for a long time.
That being said, I agree that the way NaNs propagate is messy. You can end up only finding out that there was an error much later during the program's execution and then it can be tricky to find out where it came from.
- beagle3 2 months ago
  
  The alternative is checking the result of every operation, or using “signaling NaNs” that raise an exception on a (properly configured) scalar operation on a CPU. As soon as non-scalar code is involved (SIMD or GPU), quiet NaNs with strategically placed explicit tests along the computation become the only reasonable/efficient option.
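A minimal sketch of the "quiet NaN plus one strategically placed test" pattern described above, written in plain Python. The function names are made up for illustration. Note one assumption: Python raises on float division by zero instead of returning NaN, so the sketch produces NaN explicitly to imitate what IEEE 754 hardware would do.

```python
import math

def normalize(values):
    """Divide each value by the total; a zero total yields NaN, as 0/0 would in hardware."""
    total = sum(values)
    return [v / total if total != 0 else math.nan for v in values]

def mean_of_logs(values):
    """Pipeline with no per-operation checks: a NaN anywhere flows through to the result."""
    scaled = normalize(values)
    # log of a non-positive or NaN input is mapped to NaN instead of raising
    logs = [math.log(v) if v > 0 else math.nan for v in scaled]
    return sum(logs) / len(logs)

def checked_mean_of_logs(values):
    """Same pipeline with a single explicit test at the end."""
    result = mean_of_logs(values)
    if math.isnan(result):
        raise ValueError("NaN produced somewhere in the pipeline")
    return result
```

The trade-off from the thread is visible here: the check is cheap and batched, but the error message cannot say which step produced the NaN.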
cb321 2 months ago
There is no direct argument/guidance that I saw for "when to use them", but masked arrays { https://numpy.org/doc/stable/reference/maskedarray.html } (an alternative to sentinels in array processing sub-languages) have been in NumPy (following its antecedents) from its start. I'm guessing you could do a code-search for its imports and find arguments pro & con in various places surrounding that.
From memory, I have heard "infecting all downstream" as both "a feature" and "a problem". Experience with numpy programs did lead to sentinels in the https://github.com/c-blake/nio Nim package, though.
Another way to try to investigate popularity here is to see how much code uses signaling NaN vs. quiet NaN and/or arguments pro/con those things / floating point exceptions in general.
I imagine all of it comes down to questions of how locally can/should code be forced to confront problems, much like arguments about try/except/catch kinds of exception handling systems vs. other alternatives. In the age of SIMD there can be performance angles to these questions and essentially "batching factors" for error handling that relate to all the other batching factors going on.
Today's version of this wiki page also includes a discussion of integer NaN: https://en.wikipedia.org/wiki/NaN . It notes that the R language uses the minimal signed value (i.e. 0x80000000) of integers for NA.
There is also the whole database NULL question: https://en.wikipedia.org/wiki/Null_(SQL)
To be clear, I am not taking some specific position, but I think all these topics inform answers to your question. I think it's something with trade-offs that people have a tendency to over-simplify based on a limited view.
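To make the integer-sentinel idea mentioned above concrete, here is a small sketch in Python. `INT32_NA`, `add_na`, and `total` are hypothetical names, and the sketch only shows how a reserved value "infects" downstream results; it does not model 32-bit overflow or how R itself is implemented.

```python
# The minimal signed 32-bit value, bit pattern 0x80000000, reserved to mean "not available".
INT32_NA = -2**31

def add_na(a, b):
    """Add two integers, propagating the NA sentinel if either operand is NA."""
    if a == INT32_NA or b == INT32_NA:
        return INT32_NA
    return a + b

def total(values):
    """Sum a sequence; one NA anywhere makes the whole result NA."""
    acc = 0
    for v in values:
        acc = add_na(acc, v)
    return acc
```

Whether that propagation counts as "a feature" or "a problem" is exactly the disagreement described above.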
- kace91 2 months ago
  
  >To be clear, I am not taking some specific position, but I think all these topics inform answers to your question. I think it's something with trade-offs that people have a tendency to over-simplify based on a limited view.
  That's fair. I wasn't dismissing the practice, just commenting that it's a shame the author didn't clarify their preference.
  I don't think the popularity angle is a good proxy for the usefulness/correctness of the practice. Many factors can influence popularity.
  Performance is a very fair point. I don't know enough to understand the details, but I could see it being a strong argument. It is counterintuitive to move forward with calculations known to be useless, but maybe the cost of checking all calculations for validity is larger than the savings from skipping the invalid ones early.
  There is a catch, though. NumPy and R are very oriented to calculation pipelines, which is a very different use case from general programming, where the side effects of undetected 'corrupt' values can be more serious.
  
otabdeveloper4 2 months ago

There is no argument. It's literally just a "programming is hard, let's go shopping" sentiment.

elcritch 2 months ago

The compiler can still enforce checks, such as with nil checks for pointers.

In my opinion it’s overall cleaner if the compiler handles enforcing it when it can. Something like “ensure variable is initialized” can just be another compiler check.

Combined with an effects system that lets you control which errors to enforce checking on or not. Nim has a nice `forbids: IOException` that lets users do that.

umanwizard 2 months ago
> The compiler can still enforce checks, such as with nil checks for pointers.
Only sometimes, when the compiler happens to be able to understand the code fully enough. With sum types it can be enforced all the time, and bypassed when the programmer explicitly wants it to be.
- wavemode 2 months ago
  
  There's nothing preventing this for floats and ints in principle. e.g. the machine representation could be float, but the type in the eyes of the compiler could be `float | nan` until you check it for nan (at which point it becomes `float`). Then any operation which can return nan would return `float | nan` instead.
  tbh this system (assuming it works that way) would be more strict at compile-time than the vast majority of languages.
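A rough illustration of the narrowing idea above, using Python type hints. Python's type system cannot tell a NaN apart from any other `float`, so this sketch substitutes `None` for NaN and `Optional[float]` for the hypothetical `float | nan` type; the function names are invented for the example.

```python
import math
from typing import Optional

def checked_sqrt(x: float) -> Optional[float]:
    """Return None in the cases where an IEEE 754 sqrt would return NaN."""
    if math.isnan(x) or x < 0:
        return None
    return math.sqrt(x)

def doubled_root(x: float) -> float:
    root = checked_sqrt(x)
    # Before the check, root is Optional[float]; a static checker such as
    # mypy would flag `root * 2` here.
    if root is None:
        return 0.0  # explicit fallback chosen for this sketch
    # After the check, the checker narrows root to plain float.
    return root * 2
```

The runtime representation stays an ordinary float; only the static type changes after the check, which is the property described in the comment.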
  
ux266478 2 months ago
Both of these things respectively are just pattern matches and monads, just not user-definable ones.
- xigoi 2 months ago
  
  On the other hand, it’s more ergonomic and readable because you don’t need to declare a new name.
  if name != nil: echo name
  versus
  case name of Some(unwrappedName): echo unwrappedName

saghm 2 months ago

Yeah, I'm not sure I've ever seen NaN called out as an example to be emulated before, rather than something people complain about.

echelon 2 months ago
Holy shit, I'd love to see NaN as a proper sum type. That's the way to do it. That would fix everything.
- ameliaquining 2 months ago
  
  I suspect that this would result in a lot of .unwrap() calls or equivalent, and people would treat them as line noise and find them annoying.
  An approach that I think would have most of the same correctness benefits as a proper sum type while being more ergonomic: Have two float types, one that can represent any float and one that can represent only finite floats. Floating-point operations return a finite float if all operands are of finite-float type, or an arbitrary float if any operand is of arbitrary-float type. If all operands are of finite-float type but the return value is infinity or NaN, the program panics or equivalent.
  (A slightly more out-there extension of this idea: The finite-float type also can't represent negative zero. Any operation on finite-float-typed operands that would return negative zero returns positive zero instead. This means that finite floats obey the substitution property, and (as a minor added bonus) can be compared for equality by a simple bitwise comparison. It's possible that this idea is too weird, though, and there might be footguns in the case where you convert a finite float to an arbitrary one.)
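The two-float-type proposal above could look roughly like this in Python. `FiniteFloat` is a hypothetical class, not an existing library type, and only multiplication is implemented to keep the sketch short. Plain `float` plays the role of the "arbitrary float" type.

```python
import math

class FiniteFloat:
    """Hypothetical wrapper that can only hold finite, non-negative-zero values."""
    __slots__ = ("value",)

    def __init__(self, value: float):
        if not math.isfinite(value):
            # Stand-in for "the program panics or equivalent"
            raise ArithmeticError(f"non-finite result: {value!r}")
        # Adding 0.0 turns -0.0 into +0.0 and leaves every other value unchanged
        self.value = value + 0.0

    def __mul__(self, other):
        if isinstance(other, FiniteFloat):
            # Both operands finite-typed: the result must be finite or we raise
            return FiniteFloat(self.value * other.value)
        # Mixed with an arbitrary float: the result is an arbitrary float
        return self.value * other
```

With this shape, an overflow between two `FiniteFloat` operands fails at the operation that caused it, while code that opts into plain `float` keeps the usual IEEE 754 propagation.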
  

lairv 2 months ago

That's why I always disliked calling null the "billion dollar mistake". Null and Option<T> are basically the same; the mistake is not checking it at compile time.

the_gipsy 2 months ago

...and if everything was wrapped in Option<>.
If my grandmother had wheels, she'd be a bike.