Comment by gabriel666smith

2 days ago

> Where are you going with this?

I don't know!

Adding code now.

Is this clearer, at least for the initial 'word set' generation? I can add it to the repo if so:

Concept:

The system predicts and generates words based on Fibonacci distances: instead of looking at the next or previous word, it looks at words that are 2, 3, 5, 8, 13, 21, etc. positions away (following the Fibonacci sequence).

Key Components

1. Training Phase

Takes a text file and extracts all words, then builds two prediction models:

Forward model: "If I see word X at position N, what word appears at position N+2, N+3, N+5, N+8, etc.?"

Backward model: "If I see word X at position N, what word appeared at position N-2, N-3, N-5, N-8, etc.?"
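In code, the training step looks roughly like this. A minimal sketch of the idea in Python; the function and variable names are illustrative, not the actual repo code:

```python
from collections import defaultdict

FIB_DISTANCES = [2, 3, 5, 8, 13, 21]

def train(words):
    # Nested count tables: model[word][distance][other_word] -> count
    forward = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    backward = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for i, word in enumerate(words):
        for d in FIB_DISTANCES:
            if i + d < len(words):
                forward[word][d][words[i + d]] += 1   # word at N, partner at N+d
            if i - d >= 0:
                backward[word][d][words[i - d]] += 1  # word at N, partner at N-d
    return forward, backward
```

So forward["the"][2] ends up as a table of every word seen 2 positions after "the", with its count.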

2. Generation Phase

Starts with seed words (user input). For each seed word, it predicts what should come before and after using Fibonacci distances.

Uses bidirectional validation: a word is only chosen if it's probable in BOTH the forward and backward directions.

This attempts to produce more coherent, contextually consistent text.

It then runs multiple passes in which generated words become new starting points for further generation, building a richer, more developed word set. The words with the strongest association values become the final set of words available for generation.
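And the generation loop, sketched under the same assumptions (parameters like passes and top_k are made up for illustration; forward and backward are the nested count tables from training; only the 'after' direction is shown here, since 'before' is symmetric):

```python
from collections import defaultdict

FIB_DISTANCES = [2, 3, 5, 8, 13, 21]

def generate_word_set(seeds, forward, backward, passes=3, top_k=10):
    pool = set(seeds)
    for _ in range(passes):
        scores = defaultdict(float)
        for a in pool:
            for d in FIB_DISTANCES:
                ahead = forward.get(a, {}).get(d, {})
                total_ahead = sum(ahead.values())
                for b, count in ahead.items():
                    behind = backward.get(b, {}).get(d, {})
                    if a not in behind:
                        continue  # fails the bidirectional check: discard
                    p_fwd = count / total_ahead               # P_forward(B | A)
                    p_bwd = behind[a] / sum(behind.values())  # P_backward(A | B)
                    scores[b] += p_fwd * p_bwd
        # the strongest associations join the pool for the next pass
        pool |= {w for w, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]}
    return pool
```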

When you say you build "prediction models" - what exactly is that? Is it just a list of possibilities for each word at each position?

So for example forward[the][2] is a list of words that can come 2 places after "the"? Either with duplicates for more likely words, or with probabilities that you can sample, like a Markov model.

Or is the "prediction model" some sort of neural network, or something else?

When you say a word is only chosen if it's probable in both the forward and backward direction, what does that mean?

I still can't see any code in your repo.

  • I've just added the model generation code! I hope that's helpful.

    1. Yes, that's exactly right: it stores counts. If "Hacker" appeared 2 places before "News" multiple times, the counts would reflect that.

    Later, when generating, these counts are turned into probabilities by normalising (dividing each count by the total). There's a rough sketch of this at the end of this comment.

    2. So I think this part is a Fibonacci-structured Markov-like model, not a neural network.

    3.

    > When you say a word is only chosen if it's probable in both the forward and backward direction, what does that mean?

    This is the key part, potentially.

    When generating, the script does this:

    Forward model: “Given seed word A, what words appear fib_distance ahead?” → gives P_forward(B | A)

    Backward model: “Given candidate word B, what words appear fib_distance behind?” → gives P_backward(A | B)

    It then checks both directions.

    If word B is predicted in both directions, the script multiplies the two probabilities. If a word only shows up in the forward direction and never in the backward training data (or vice versa), it gets discarded.

    It’s a kind of bidirectional filter to avoid predictions that only hold in one direction.
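    To connect 1 and 3, here's roughly what that scoring looks like (a sketch, not the exact repo functions; forward and backward are the nested count tables keyed as model[word][distance][other_word]):

    ```python
    def normalise(counts):
        # raw counts -> probabilities, dividing each count by the total
        total = sum(counts.values())
        return {w: c / total for w, c in counts.items()}

    def bidirectional_score(a, b, d, forward, backward):
        # P_forward(B | A) * P_backward(A | B): zero unless the pair has
        # been seen in BOTH directions, so one-way predictions are discarded
        p_fwd = normalise(forward.get(a, {}).get(d, {})).get(b, 0.0)
        p_bwd = normalise(backward.get(b, {}).get(d, {})).get(a, 0.0)
        return p_fwd * p_bwd
    ```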

    I'm learning a lot of these words as I go, so questions like these are really helpful for me - thanks.