Comment by CamouflagedKiwi
9 hours ago
> of course a dictionary program will include code to talk to dictionary-providing web sites.
I wouldn't say that is just a given. If I've apt-get installed a dictionary, I might expect that the whole thing is on my machine. It's not like we haven't had dictionaries in physical books for centuries... It seems like stardict is very much an online thing, which I suppose could be legit, but the whole thing does seem like a trap.
>> of course a dictionary program will include code to talk to dictionary-providing web sites.
Maybe to download a dictionary, but not to provide the same services that the dictionary program provides locally.
It's a generational thing. I would guess that someone who expects applications to phone home, on the off chance that they are actually otherwise local, is likely someone pretty young who hasn't lived in a world of locally installed software that doesn't talk to anything.
If we search for the author's bio, that seems to check out. They are a well-credentialed CS person; obviously they know that dictionary programs such as translation pop-ups can have offline dictionaries, and they mention that. But they are a person of their time, with a corresponding set of "of courses".
Today, an application that is locally installed and works with offline data is like a statement of quaint chivalry, promulgated by a few remaining Don Quixotes of computing. (It saddens me to say. So much so that this analogy brings me insufficient amusement.)
Wouldn't someone's expectation instead depend on the nature of the application, and what data it needs? My expectation is that an application does not access the network unless it requires a resource only available from the network. I would totally expect a "Yelp" application to make network requests as part of its core functionality. Yelp is an online service, and in order to use it, you have to talk to the network, and you're generally requesting data that might often change, so you need fresh copies. Same for an Internet browser, or ftp or git (for remotes) or things like that. I would not expect a spell checker to need to access a network because it can all be done locally and the spelling of words doesn't change often enough to need a fresh dictionary from the network over and over. And I certainly would not expect the software to send data to the network. I would also not expect a calculator application to request math functions from the network or send my equations to a network service so that the network service could provide a result.
For many languages, there simply isn't a comprehensive dictionary file that could be redistributed legally as part of a free-software offline dictionary application. You either settle for a few thousand words put together by a handful of volunteers, or you redistribute a commercial dictionary illegally, or you have to connect to an online service to provide sufficient coverage legally.
Even if it's "legit", it shouldn't be using unencrypted HTTP.
Why? Should it use the dict protocol, then?
How about HTTPS?
Because without HTTPS it's trivial to MITM that clipboard content if they're always sending it via HTTP.
People in your coffee shop on the same WiFi could read it.
I get that some people don't realize that's how TCP/IP works, and the Firesheep stuff all happened 15 years ago. But it's a bit worrying to see a frequent HN contributor challenging that.
That's why we now push for HTTPS everywhere.
The venerable ding does well with a local dictionary - and it's packaged in Debian too
https://www-user.tu-chemnitz.de/~fri/ding/
But only English-German, sadly
That stood out to me as well. It's a sad world when people expect even simple functionality to be a live service.
At some point I started running GUI apps without network access, first with firejail and then bubblewrap. This was before flatpak became a thing. I still use a collection of bash scripts that built up over time to run applications in a sandbox.
One might even expect a program to use a common Unix preinstalled dictionary.
"words" is nothing but a list of words. It does not contain definitions for those words, which is what one expects from a dictionary.
Hmm, you are correct.
Dumb question... Could you do a per-word bloom filter to do online spell checking without actually disclosing the words you're checking?
There are two scenarios, I believe: first, accidentally sending a (decent) password, and second, keeping the server from learning what you actually look up.
For the first case, sending a hash would prevent the server from learning a password that is not in the dictionary; something like password5 would hash to gibberish.
For the second, the server needs to know what to actually send back. I believe Google's malicious website check works (or used to) by truncating a hash and then just sending the answer for some 128 or so websites and having the browser figure out which of them the user wanted to visit. That creates some deniability over which website you actually visited and should also be usable to prevent the server from learning what you actually looked up.
So yes, I think you could design a more secure protocol. Though, general security disclaimer: the people trying to read your letters probably spend more time attacking than I spend writing this post.
A Bloom filter lookup is by hash, and given the relatively small set of words in English, it would be pretty easy for the server to reverse the hash sent to it. Thus a Bloom filter wouldn't be very private.
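To illustrate the reversal point, here is a minimal sketch, assuming the client sends a plain unsalted SHA-256 of the word. The word list and function names are made up for the example; a real server would precompute the table over its whole dictionary.

```python
import hashlib
from typing import Optional

def word_hash(word: str) -> str:
    """Hex SHA-256 of a word; stands in for whatever hash a client might send."""
    return hashlib.sha256(word.encode("utf-8")).hexdigest()

# Hypothetical tiny word list; a real dictionary is still small enough to enumerate.
WORDS = ["apple", "banana", "cherry", "kiwi"]

# The server builds the hash -> word table once, up front.
REVERSE = {word_hash(w): w for w in WORDS}

def server_guess(received_hash: str) -> Optional[str]:
    """Return the word behind a hash if it is a dictionary word, else None."""
    return REVERSE.get(received_hash)
```

Any dictionary word is recovered with a single table lookup, while a string outside the word list (password5, say) simply has no entry.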
Additionally, a typical spell checker feature is to provide alternative, correct, spellings, rather than just telling you whether a word is correctly spelled.
I bet there's some cool way to do this with zero-knowledge or homomorphic cryptography though!
There’s also a way simpler way: send a hash prefix to server, get a list of matches. Google Safe Browsing does this with URLs, for example.
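A rough sketch of the hash-prefix idea, not Safe Browsing's actual API: the entries, the prefix length, and the function names are all assumptions for illustration. The client reveals only the first few hex digits of the hash and picks the real match locally.

```python
import hashlib
from collections import defaultdict

PREFIX_LEN = 4  # hex digits revealed to the server; an assumed value

def word_hash(word):
    return hashlib.sha256(word.encode("utf-8")).hexdigest()

# Hypothetical server-side data: word -> definition.
ENTRIES = {"kiwi": "a flightless bird", "ding": "a ringing sound", "apple": "a fruit"}

# Server index: hash prefix -> {full hash: definition}.
BUCKETS = defaultdict(dict)
for word, definition in ENTRIES.items():
    h = word_hash(word)
    BUCKETS[h[:PREFIX_LEN]][h] = definition

def server_lookup(prefix):
    """Server side: sees only the prefix, returns every entry sharing it."""
    return dict(BUCKETS.get(prefix, {}))

def client_lookup(word):
    """Client side: send the prefix, then match the full hash locally."""
    h = word_hash(word)
    candidates = server_lookup(h[:PREFIX_LEN])
    return candidates.get(h)
```

How much this hides depends on how many words share a prefix. With a short word list most buckets hold a single entry, so the prefix would have to be short enough that each bucket contains many words.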
You should be able to do a k-means type thing, where your query is for an entire group and you grab the field from the chunk locally.
But you might still be able to use some frequency sampling to predict the words used, unless those chunks are very very carefully constructed.
> A Bloom filter lookup is by hash, and given the relatively small set of words in English, it would be pretty easy for the server to reverse the hash sent to it. Thus a Bloom filter wouldn't be very private.
The typical use of a Bloom filter is to have it locally as a prefilter, not to send hashes to the server.
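For reference, a minimal sketch of a Bloom filter held locally as a prefilter. The sizes, the hashing scheme, and the class itself are illustrative assumptions, not taken from any particular spell checker.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions by hashing the item with k different prefixes.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

# The filter ships with the application; nothing is sent anywhere to query it.
known_words = BloomFilter()
for w in ["apple", "banana", "cherry", "kiwi"]:
    known_words.add(w)
```

A word that is not in the filter is definitely unknown, so it can be underlined without any network traffic. A hit only means "probably known", which is where a slower exact lookup would come in if one is needed.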
> I bet there's some cool way to do this with zero-knowledge or homomorphic cryptography though!
The code for which would almost certainly be larger than a fully local dictionary for any human language.
> a typical spell checker feature is to provide alternative, correct, spellings, rather than just telling you whether a word is correctly spelled.
I personally don't use that one; for me the red underline is enough.
Just want to mention that the feature in question here is for translation, not spell checking.
This sort of crap makes me sure I’ll be employable forever.
I may not be on top of the latest trends, but at least I understand how computers work and what they can actually do.