
Comment by nerdponx

4 months ago

How did you construct the embedding? Sum of individual token vectors, or something more sophisticated?

Modern embedding models (particularly those with context windows of 2048+ tokens) let you YOLO and just plop the entire text blob in, and you'll still get meaningful vectors.
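
For example, a minimal sketch with sentence-transformers; the model name is just one long-context option I'm assuming here, not a specific recommendation:

```python
from sentence_transformers import SentenceTransformer

# Assumption: any embedding model with a long context window works here;
# BAAI/bge-m3 (~8k-token context) is one example that loads this way.
model = SentenceTransformer("BAAI/bge-m3")

# Embed the whole blob in one call -- no chunking, no token-level pooling.
doc = open("review.txt").read()
vec = model.encode(doc, normalize_embeddings=True)
print(vec.shape)  # one dense vector for the entire document
```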

Formatting the input text with a consistent schema is optional but recommended; it makes comparisons between vectors more apples-to-apples.
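
Something like this, where the template and field names are purely illustrative:

```python
# Render every record through one fixed template so each vector encodes
# the same fields in the same order. Field names here are made up.
records = [
    {"title": "Great battery", "category": "electronics", "body": "Lasts two days."},
    {"title": "Runs small",    "category": "apparel",     "body": "Order a size up."},
]

TEMPLATE = "title: {title}\ncategory: {category}\nbody: {body}"
texts = [TEMPLATE.format(**r) for r in records]

# Reusing the `model` from the sketch above.
vecs = model.encode(texts, normalize_embeddings=True)
print(vecs.shape)  # (2, embedding_dim)
```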