Comment by helloplanets

7 days ago

Whether the tokens are created manually or programmatically isn't really relevant here. What matters is the order and number of tokens, combined with the ingestion -> output logic the LLM API / inference engine operates on. Many current models definitely have a tendency to start veering off after 100k tokens, which makes context pruning important as well.
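
Just as a rough sketch of what that pruning could look like (not any particular tool's implementation, and the names/heuristics here are made up), assuming a chat-style message list and a crude token estimate:

```python
# Hypothetical sketch: keep the newest messages within a token budget,
# always preserving the system prompt. Token counts are rough estimates.
def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English-ish text.
    return max(1, len(text) // 4)

def prune_context(messages: list[dict], budget: int = 100_000) -> list[dict]:
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    kept: list[dict] = []
    used = sum(estimate_tokens(m["content"]) for m in system)
    # Walk from newest to oldest, dropping whatever no longer fits the budget.
    for msg in reversed(rest):
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost

    return system + list(reversed(kept))
```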

What if you just automatically append the .md file to the end of the context instead of prepending it at the start, and add a note that the instructions in the .md file should always be prioritized?
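
Something along these lines, assuming the context is just a concatenated string and the file path is only illustrative:

```python
# Hypothetical sketch: build the prompt with the instructions file appended
# last, plus a note telling the model to prioritize it over earlier content.
from pathlib import Path

def build_prompt(conversation: str, instructions_path: str = "AGENTS.md") -> str:
    instructions = Path(instructions_path).read_text()
    return (
        f"{conversation}\n\n"
        "---\n"
        "The instructions below should always take priority over anything above.\n\n"
        f"{instructions}"
    )
```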