Comment by beart

7 hours ago

I agree with your first point. I've seen this same issue crop up in several other ORMs.

As to your second point: VARCHAR uses N + 2 bytes, whereas NVARCHAR uses N*2 + 2 bytes for storage (at least on SQL Server). The vast majority of character fields in databases I've worked with do not need to store Unicode values.
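To make that arithmetic concrete, here is a minimal Python sketch. It assumes a single-byte code page for VARCHAR (Latin-1 is used as a stand-in), UTF-16 code units for NVARCHAR, and the 2-byte overhead quoted above. The function names are hypothetical and not part of any SQL Server API.

```python
def varchar_bytes(text: str, overhead: int = 2) -> int:
    """Approximate VARCHAR storage: one byte per character plus overhead.

    Assumption: a single-byte code page, with Latin-1 as a stand-in.
    Raises UnicodeEncodeError for characters outside that code page.
    """
    return len(text.encode("latin-1")) + overhead


def nvarchar_bytes(text: str, overhead: int = 2) -> int:
    """Approximate NVARCHAR storage: two bytes per UTF-16 code unit plus overhead."""
    return len(text.encode("utf-16-le")) + overhead


# Hypothetical coded values of the kind discussed in this thread.
for value in ["ACCT-001", "cost center 42"]:
    print(f"{value!r}: VARCHAR={varchar_bytes(value)} bytes, "
          f"NVARCHAR={nvarchar_bytes(value)} bytes")
```

For an 8-character code such as `ACCT-001` this gives 10 bytes versus 18, so the gap grows linearly with the length of the value.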

wvenable 7 hours ago

> The vast majority of character fields in databases I've worked with do not need to store Unicode values.

This has not been my experience at all. Exactly the opposite, in fact. ASCII is dead.

SigmundA 7 hours ago
The vast majority of text fields I see are coded values that are perfectly fine using ASCII, but I deal mostly with English-language systems.
Text fields that users can type into directly, especially multiline ones, tend to need Unicode, but they are far fewer.
- psidebot 5 hours ago
  
  Some examples of coded fields that may be known to be ASCII: order name, department code, business title, cost center, location id, preferred language, account type…
- simonask 6 hours ago
  
  English has plenty of Unicode — claiming otherwise is such a cliché…
  Unicode is a requirement everywhere human language is used, from Earth to the Boötes Void.
  
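One way to see both sides of this disagreement is to check which values actually fall outside ASCII. The sketch below is illustrative only: the sample values and the `needs_unicode` helper are made up for the example, and `str.isascii()` requires Python 3.7 or later.

```python
def needs_unicode(value: str) -> bool:
    """Return True if the value contains any non-ASCII character."""
    return not value.isascii()


# Hypothetical samples: two coded values and three words seen in English text.
samples = ["CC-1040", "en-US", "naïve", "café", "Boötes"]

for value in samples:
    kind = "needs Unicode" if needs_unicode(value) else "plain ASCII"
    print(f"{value}: {kind}")
```

Codes generated by a system tend to pass the ASCII check, while anything typed by a person can fail it, which is roughly the split both commenters describe.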

_3u10 7 hours ago

Generally, if it stores user input it needs to support Unicode. That said, UTF-8 is probably a much better choice than UTF-16/UCS-2.

Dwedit 2 hours ago

The one place UTF-16 massively wins is text that would be two bytes per character as UTF-16 but three bytes as UTF-8. That's mainly Chinese, Japanese, Korean, etc.
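The size difference is easy to check in Python. This is a rough sketch with made-up sample strings; `encoded_sizes` is a hypothetical helper, and `utf-16-le` is used so that no byte-order mark is counted.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (utf8_bytes, utf16_bytes) for text, excluding any byte-order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Hypothetical samples: one ASCII string and three short CJK strings.
samples = {
    "ASCII": "order status",
    "Japanese": "日本語",
    "Korean": "한국어",
    "Chinese": "中文",
}

for label, text in samples.items():
    utf8, utf16 = encoded_sizes(text)
    print(f"{label}: UTF-8={utf8} bytes, UTF-16={utf16} bytes")
```

The CJK samples come out one third smaller in UTF-16, while the ASCII sample doubles in size, so which encoding wins depends on the mix of text being stored.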
SigmundA 7 hours ago
UTF-8 is a relatively new thing in MSSQL and had lots of issues initially. I agree it's better and should have been implemented in the product long ago.
I have avoided it and have not followed whether the issues are fully resolved; I would hope they are.
- kstrauser 7 hours ago
  
  > UTF-8 is a relatively new thing in MSSQL and had lots of issues initially. I agree it's better and should have been implemented in the product long ago.
  Their insistence on making the rest of the world go along with their obsolete pet scheme would be annoying if I ever had to use their stuff for anything ever. UTF-8 was conceived in 1992, and here we are in 2026 with a reasonably popular database still considering it the new thing.
  

SigmundA 7 hours ago

To complicate matters, SQL Server can do NVARCHAR compression, but they should have just done UTF-8 long ago:

https://learn.microsoft.com/en-us/sql/relational-databases/d...

Also, UTF-8 is actually just a varchar collation, so you don't use nvarchar with that, lol?