Comment by knuckleheads

3 days ago

Great minds think alike ;) Some rambly thoughts off the top of my head about picking games:

I am still nailing down how many words to put in, and therefore how long the game is. Right now somewhere between 30 and 75 feels good; I had it at 100 and friends complained that it was too long. To get a sense of your data, make a histogram of the total number of words per letter set. This differs per language and gives you a sense of what's up, as well as how good your word universe is. Conjugations/inflections help pad this out a lot as well.
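A minimal sketch of that histogram, assuming you already have a mapping from each candidate letter set to the set of words it allows. The name `words_by_set`, the bucket size, and the toy data are all made up for illustration; how you build the mapping depends on your game's rules.

```python
from collections import Counter

def words_per_set_histogram(words_by_set, bucket=10):
    """Count how many letter sets fall into each word-count bucket.

    Returns {bucket_start: number_of_letter_sets}, sorted by bucket_start.
    """
    hist = Counter()
    for words in words_by_set.values():
        hist[(len(words) // bucket) * bucket] += 1
    return dict(sorted(hist.items()))

# Toy data (hypothetical): letter set -> words playable with it.
sample_sets = {
    "aelrst": {"late", "rate", "tale", "stare"},
    "deinor": {"done", "ride"},
    "cehimn": {"mine"},
}

# Crude text histogram: one '#' per letter set in the bucket.
for start, count in words_per_set_histogram(sample_sets, bucket=2).items():
    print(f"{start:>3}+ {'#' * count}")
```

If most letter sets land well below your target game length, the word list is probably too thin for that language.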

Get the word frequencies for each word; wordfreq is helpful. Then do a greedy algorithm: start accumulating a list of letter sets, one for each day. For the first day, take the letter set that maximizes the sum of the squares of the Zipf frequencies from wordfreq and has the number of words that you want for that day. For the next day, remove all the previously used words from the words for your possible letter sets and repeat the sum-of-squares calculation. Then just keep running that, and each day's pick will favor the most common words that haven't been seen yet.
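Roughly, the greedy pass could look like the sketch below. It is one reading of the description, not necessarily the exact implementation: the function name, `words_by_set`, and the toy scores are hypothetical, and it assumes the word-count limit is checked against a letter set's full word list while only unseen words contribute to the score. The `zipf` argument is any function from word to Zipf score; with wordfreq installed you could pass `lambda w: zipf_frequency(w, "de")`.

```python
def pick_daily_sets(words_by_set, zipf, days, min_words=30, max_words=75):
    """Greedily pick one letter set per day.

    Each day, choose the letter set whose not-yet-seen words have the
    largest sum of squared Zipf scores, among sets of acceptable size.
    """
    seen = set()                      # words used on earlier days
    remaining = dict(words_by_set)    # letter sets not yet scheduled
    schedule = []
    for _ in range(days):
        best_key, best_score = None, -1.0
        for key, words in remaining.items():
            # Assumption: game length depends on the full word list.
            if not (min_words <= len(words) <= max_words):
                continue
            score = sum(zipf(w) ** 2 for w in words - seen)
            if score > best_score:
                best_key, best_score = key, score
        if best_key is None:
            break                     # no eligible letter set left
        schedule.append(best_key)
        seen |= remaining.pop(best_key)
    return schedule

# Toy data (hypothetical letter sets and scores).
toy_sets = {
    "abc": {"cab", "abba"},
    "abd": {"bad", "dab", "abba"},
    "xyz": {"zyx"},
}
toy_zipf = {"cab": 4.0, "abba": 3.0, "bad": 5.0, "dab": 2.0, "zyx": 1.0}

schedule = pick_daily_sets(toy_sets, toy_zipf.get, days=3,
                           min_words=2, max_words=3)
print(schedule)
```

Squaring the Zipf score weights a few very common words more heavily than many middling ones, which matches the goal of surfacing the most common unseen words first.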

Additionally, I tried to filter out a lot of words up front before running the game-covering algorithm. Nothing vulgar, nothing obscure or too obsolete. It's relatively cheap to run a very large word list through GPT-5 and ask a couple of questions about each word. Do that once and you have filtered out a fair portion of the list. Build in a system for ad hoc blocking and it gets you most of the way there.
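As a sketch of how the two filters might combine, assume the one-off model pass has already been saved as a per-word dictionary of yes/no answers. The `verdicts` format, the flag names, and the sample words here are invented for illustration; the model call itself is left out.

```python
def filter_words(words, verdicts, blocklist=frozenset()):
    """Keep words that pass the precomputed checks and the manual blocklist.

    verdicts: {word: {"vulgar": bool, "obscure": bool, "obsolete": bool}}
              (hypothetical format for the cached answers)
    blocklist: words blocked by hand after the fact
    """
    kept = []
    for word in words:
        flags = verdicts.get(word, {})   # unknown words pass by default
        if flags.get("vulgar") or flags.get("obscure") or flags.get("obsolete"):
            continue
        if word in blocklist:
            continue
        kept.append(word)
    return kept

# Toy data (hypothetical).
sample_words = ["haus", "dreck", "thou", "katze"]
sample_verdicts = {
    "dreck": {"vulgar": True},
    "thou": {"obsolete": True},
}
clean = filter_words(sample_words, sample_verdicts, blocklist={"katze"})
print(clean)
```

Keeping the blocklist separate from the cached answers means a complaint about a single word can be handled without rerunning the expensive pass.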

That is very helpful about the buttons and formatting. Thank you so much! I'll have that fixed up soon. Big lol on the shuffle algorithm, I hardly ever use it so I hadn't noticed. Thanks! I am in the middle of studying German, yes. What I've found is that it is very helpful for introducing me to new words, ones I'm not sure I would have seen otherwise. It also helps me see the patterns in how the words are formed, just from trying to puzzle things out. I have to take a lot of guesses, and I feel like I am getting better at guessing which letters will end up going where, even if I have never heard the word before.