Comment by CamouflagedKiwi

9 hours ago

> of course a dictionary program will include code to talk to dictionary-providing web sites.

I wouldn't say that's a given. If I've apt-get installed a dictionary, I might expect that the whole thing is on my machine. It's not like we haven't had dictionaries in physical books for centuries... It seems like stardict is very much an online thing, which I suppose could be legit, but the whole thing does seem like a trap.

33 comments

phkahler 2 minutes ago

>> of course a dictionary program will include code to talk to dictionary-providing web sites.

Maybe to download a dictionary, but not to provide the same services that the dictionary program provides locally.

kazinator 8 hours ago

It's a generational thing. I would guess that someone who expects applications to phone home, on the off chance that they are actually otherwise local, is likely someone pretty young who hasn't lived in a world of locally installed software that doesn't talk to anything.

If we search for the author's bio, that seems to check out. They are a well-credentialed CS person; obviously they know that dictionary programs such as translation pop-ups can have offline dictionaries, and they mention that. But they are a person of their time, with a corresponding set of "of courses".

Today, an application that is locally installed and works with offline data is like a statement of quaint chivalry, promulgated by a few remaining Don Quixotes of computing. (It saddens me to say. So much that this analogy brings me insufficient amusement.)

ryandrake 16 minutes ago

Wouldn't someone's expectation instead depend on the nature of the application, and what data it needs? My expectation is that an application does not access the network unless it requires a resource only available from the network. I would totally expect a "Yelp" application to make network requests as part of its core functionality. Yelp is an online service, and in order to use it, you have to talk to the network, and you're generally requesting data that might often change, so you need fresh copies. Same for an Internet browser, or ftp or git (for remotes) or things like that. I would not expect a spell checker to need to access a network because it can all be done locally and the spelling of words doesn't change often enough to need a fresh dictionary from the network over and over. And I certainly would not expect the software to send data to the network. I would also not expect a calculator application to request math functions from the network or send my equations to a network service so that the network service could provide a result.

yorwba 4 hours ago

For many languages, there simply isn't a comprehensive dictionary file that could be redistributed legally as part of a free-software offline dictionary application. You either settle for a few thousand words put together by a handful of volunteers, or you redistribute a commercial dictionary illegally, or you have to connect to an online service to provide sufficient coverage legally.

hdjrudni 9 hours ago

Even if it's "legit", it shouldn't be using unencrypted HTTP.

sam_lowry_ 7 hours ago
Why? Should it use the dict protocol, then?
- rootnod3 7 hours ago
  
  How about HTTPS?
- mattmanser 7 hours ago
  
  Because without HTTPS it's trivial to MITM that clipboard content if they're always sending it via HTTP.
  People in your coffee shop on the same WiFi could read it.
  I get that some people don't realize that's how TCP/IP works and that the Firesheep stuff all happened 15 years ago. But it's a bit worrying to see a frequent HN contributor challenging that.
  That's why we now push for HTTPS everywhere.
  
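A minimal sketch of the HTTPS point, assuming a hypothetical lookup client: refuse to send anything unless the endpoint uses HTTPS. The name `require_https` and the `dict.example` URLs are made up for illustration and are not taken from stardict.

```python
from urllib.parse import urlsplit


def require_https(url: str) -> str:
    """Return the URL unchanged if its scheme is HTTPS, otherwise raise ValueError."""
    scheme = urlsplit(url).scheme.lower()
    if scheme != "https":
        raise ValueError(f"refusing to send data over {scheme or 'an unknown scheme'}: {url}")
    return url


if __name__ == "__main__":
    # Hypothetical endpoints, for illustration only.
    for candidate in ("https://dict.example/lookup", "http://dict.example/lookup"):
        try:
            print("ok:", require_https(candidate))
        except ValueError as err:
            print("blocked:", err)
```

A check like this only guards the scheme. Certificate verification is still left to whatever HTTP library makes the actual request.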

pantalaimon 6 hours ago

The venerable ding does well with a local dictionary - and it's packaged in Debian too

https://www-user.tu-chemnitz.de/~fri/ding/

mkesper 3 hours ago

But only English-German, sadly

account42 7 hours ago

That stood out to me as well. It's a sad world when people expect even simple functionality to be a live service.

mayama 7 hours ago

At some point I started running GUI apps without network access, first with firejail and then bubblewrap. This was before flatpak became a thing. I still use a collection of bash scripts that built up over time to run applications in a sandbox.
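A rough sketch of the no-network idea, not the scripts mentioned above: it builds a bubblewrap command line whose `--unshare-net` flag puts the program in its own network namespace. The helper name `no_network_argv` and the choice of bind mounts are assumptions; a real GUI app will likely need extra binds (a writable home directory, display sockets) that are left out here.

```python
import shlex


def no_network_argv(command):
    """Build a bwrap argv that runs `command` in a separate network namespace."""
    return [
        "bwrap",
        "--ro-bind", "/", "/",   # read-only view of the host filesystem (assumed layout)
        "--dev", "/dev",         # fresh /dev
        "--proc", "/proc",       # fresh /proc
        "--unshare-net",         # new network namespace, cut off from host interfaces
        "--die-with-parent",     # kill the sandbox if the launcher exits
        *command,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so this works without bwrap installed.
    print(shlex.join(no_network_argv(["stardict"])))
```

Passing the resulting list to `subprocess.run` would launch the program, assuming `bwrap` is installed and unprivileged user namespaces are enabled.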

waterhouse 7 hours ago

  ~> wc -cl /usr/share/dict/words
  235976 2493885 /usr/share/dict/words

One might even expect a program to use a common Unix preinstalled dictionary.

dkiebd 6 hours ago
"words" is nothing but a list of words. It does not contain definitions for those words, which is what one expects from a dictionary.
- waterhouse 6 hours ago
  
  Hmm, you are correct.
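A word list without definitions is still enough for the spell-checking case raised elsewhere in the thread. A minimal sketch, assuming a newline-separated list such as `/usr/share/dict/words`; the function names are made up, and suggestions come from Python's standard `difflib.get_close_matches`, which is simple but slow on a list of this size.

```python
import difflib
from pathlib import Path


def load_words(path="/usr/share/dict/words"):
    """Read a newline-separated word list into a lowercase set."""
    text = Path(path).read_text(encoding="utf-8", errors="ignore")
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def check(word, words, max_suggestions=3):
    """Return (is_known, suggestions) using only the local word set."""
    w = word.lower()
    if w in words:
        return True, []
    # Fuzzy-match against the local list; nothing leaves the machine.
    return False, difflib.get_close_matches(w, words, n=max_suggestions, cutoff=0.8)


if __name__ == "__main__":
    path = Path("/usr/share/dict/words")
    words = load_words(path) if path.exists() else {"dictionary", "network", "local"}
    print(check("dictionnary", words))
```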

yjftsjthsd-h 9 hours ago

Dumb question... Could you do a per-word bloom filter to do online spell checking without actually disclosing the words you're checking?

yk 5 hours ago

There are two scenarios, I believe: first, accidentally sending a (decent) password, and second, the server not learning what you actually look up.
For the first case, sending a hash would prevent the server from learning a password that is not in the dictionary; something like password5 would hash to gibberish.
For the second, the server needs to know what to actually send back. I believe Google's malicious website check works (or used to) by truncating a hash and then just sending the answer for some 128 or so websites and having the browser figure out which of them the user wanted to visit. That creates some deniability over which website you actually visited and should also be usable to prevent the server from learning what you actually looked up.
So yes, I think you could design a more secure protocol. Though, general security disclaimer: the people trying to read your letters probably spend more time attacking than I spent writing this post.
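The truncated-hash idea can be sketched roughly as follows. This is an in-process toy, not Google's actual protocol: the client reveals only a short SHA-256 prefix, the "server" returns every entry in that bucket, and the client picks the exact match locally. `PREFIX_HEX_LEN` and all function names are assumptions made for the example.

```python
import hashlib
from collections import defaultdict

PREFIX_HEX_LEN = 2  # assumed: 2 hex chars = 256 buckets; a shorter prefix means bigger buckets


def word_hash(word: str) -> str:
    """Full SHA-256 hex digest of the lowercased word."""
    return hashlib.sha256(word.lower().encode("utf-8")).hexdigest()


def build_index(entries: dict) -> dict:
    """Server side: group {word: definition} entries by hash prefix."""
    index = defaultdict(dict)
    for word, definition in entries.items():
        h = word_hash(word)
        index[h[:PREFIX_HEX_LEN]][h] = definition
    return index


def server_lookup(index: dict, prefix: str) -> dict:
    """Server side: sees only the prefix and returns the whole bucket."""
    return dict(index.get(prefix, {}))


def client_lookup(word: str, index: dict):
    """Client side: send the prefix, then select the exact match locally."""
    h = word_hash(word)
    bucket = server_lookup(index, h[:PREFIX_HEX_LEN])  # stands in for a network call
    return bucket.get(h)


if __name__ == "__main__":
    index = build_index({"kiwi": "a flightless bird", "ding": "a ringing sound"})
    print(client_lookup("kiwi", index))
    print(client_lookup("password5", index))
```

The server still learns which bucket was requested, so the deniability depends on each bucket holding many plausible words.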

markasoftware 9 hours ago
A Bloom filter lookup is by hash, and given the relatively small set of words in English, it would be pretty easy for the server to reverse the hash sent to it. Thus a Bloom filter wouldn't be very private.
Additionally, a typical spell checker feature is to provide alternative, correct spellings, rather than just telling you whether a word is correctly spelled.
I bet there's some cool way to do this with zero-knowledge or homomorphic cryptography though!
- notpushkin 5 hours ago
  
  There’s also a way simpler way: send a hash prefix to server, get a list of matches. Google Safe Browsing does this with URLs, for example.
  
- shakna 8 hours ago
  
  You should be able to do a K-means type thing. Where your query is an entire group, and you grab the field from the chunk locally.
  But you might still be able to use some frequency sampling to predict the words used, unless those chunks are very very carefully constructed.
- Sesse__ 4 hours ago
  
  > A Bloom filter lookup is by hash, and given the relatively small set of words in English, it would be pretty easy for the server to reverse the hash sent to it. Thus a Bloom filter wouldn't be very private.
  The typical use of a Bloom filter is to have it locally as a prefilter, not to send hashes to the server.
- account42 7 hours ago
  
  > I bet there's some cool way to do this with zero-knowledge or homomorphic cryptography though!
  The code for which would almost certainly be larger than a fully local dictionary for any human language.
- bmacho 5 hours ago
  
  > a typical spell checker feature is to provide alternative, correct, spellings, rather than just telling you whether a word is correctly spelled.
  I personally don't use that one; for me the red underline is enough.
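The "local prefilter" use of a Bloom filter mentioned above might look roughly like this. It is a sketch under assumptions: the filter would be built from the server's word list and shipped to the client, and only words that are possibly present would ever be sent. The class layout, sizes, and `should_query_server` are illustrative, not any particular library's API.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def should_query_server(word: str, known: BloomFilter) -> bool:
    """Only words the local filter considers possibly present are worth sending."""
    return word.lower() in known


if __name__ == "__main__":
    known = BloomFilter()
    for w in ("kiwi", "dictionary"):
        known.add(w)
    print(should_query_server("Kiwi", known))
    print(should_query_server("password5", known))
```

False positives are possible, so a small fraction of unknown strings would still be sent; a larger `num_bits` lowers that rate.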

CGamesPlay 8 hours ago

Just want to mention that the feature in question here is for translation, not spell checking.

wat10000 6 hours ago

This sort of crap makes me sure I’ll be employable forever.

I may not be on top of the latest trends, but at least I understand how computers work and what they can actually do.