Comment by gthompson512

12 hours ago

Sorry if I missed it, but what about keeping the suffixes and trying some fine-tuning on the source, then clustering sentences, or at least pages, which given the media should be consistent-ish?

Great question, and something I've been thinking about. I stripped suffixes mostly to normalize some of the repeated endings (aiin, dy, etc.) that felt like filler, but you’re totally right that keeping them might preserve structure I lost.
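To make the trade-off concrete, here is a minimal sketch of the two options: dropping a suffix entirely versus keeping it as a separate marker token. Everything here is an assumption for illustration. Only the endings "aiin" and "dy" come from this thread; the function names, the `+` marker convention, and the sample tokens are hypothetical and are not the actual preprocessing used.

```python
# Hypothetical suffix list: only "aiin" and "dy" are mentioned in the thread.
SUFFIXES = ("aiin", "dy")


def strip_suffix(token, suffixes=SUFFIXES):
    """Drop the longest matching suffix, but never reduce a token to nothing."""
    for suf in sorted(suffixes, key=len, reverse=True):
        if token.endswith(suf) and len(token) > len(suf):
            return token[: -len(suf)]
    return token


def split_suffix(token, suffixes=SUFFIXES):
    """Keep the suffix as its own marker token instead of discarding it."""
    stem = strip_suffix(token, suffixes)
    if stem == token:
        return [token]
    return [stem, "+" + token[len(stem):]]


# Illustrative tokens only, not taken from any real transcription run.
line = ["qokaiin", "chedy", "ol"]
stripped = [strip_suffix(t) for t in line]
kept = [piece for t in line for piece in split_suffix(t)]
```

With the second variant the stem vocabulary is still normalized, but the endings stay in the sequence, so any positional or structural signal they carry remains available to a model.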

Clustering by sentence or page would be interesting too — I haven't gone that far yet, but it’d be fascinating to see if there’s consistency across visual/media sections. Appreciate the insight!
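A rough sketch of what page-level clustering could look like, using only the standard library. This is not something that has been run on the source. It assumes each page is already a list of tokens, compares pages by cosine similarity of raw token counts, and groups them with a simple greedy threshold rule rather than a tuned clustering algorithm. The names `cosine` and `cluster_pages`, the threshold value, and the toy pages are all hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two token-count Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_pages(pages, threshold=0.5):
    """Greedy grouping: a page joins the first cluster whose seed page
    (the first page placed in it) is at least `threshold` similar."""
    vectors = {name: Counter(tokens) for name, tokens in pages.items()}
    clusters = []
    for name in pages:
        for cluster in clusters:
            if cosine(vectors[name], vectors[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([name])
    return clusters


# Toy input: page name -> token list (placeholder tokens, not real data).
pages = {
    "p1": ["a", "b", "a"],
    "p2": ["a", "b"],
    "p3": ["x", "y"],
}
groups = cluster_pages(pages)
```

Comparing the resulting groups against the known visual/media sections would be one way to check for the consistency mentioned above. The result depends on page order and on the threshold, so it is only a first pass.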