Comment by Scipio_Afri

8 days ago

Great. Do you have any details on how you produced this? The "reproducible code" isn't really reproducible. The "hierarchical topic model" that you mentioned - which model was used?

The code provided reproduces the analytical results from the annotated data. My impression is that you're more interested in the details of the annotation process, rather than having run into an issue with that code?

My company's core technology extends topic models to enable arbitrary hierarchical graphs, with additional branches beyond the topic and word branches. We expose those annotations in a SQL interface. It's an alternative or complementary approach to embeddings/LLMs for working with text data. In this case, the hierarchy broke submissions down into paragraphs, added a layer to pool them into submissions, and added one more layer to pool them by year (on the topic branch).
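
To make the nesting concrete, here is a rough toy sketch of the paragraph -> submission -> year pooling. It is not our actual model: the real thing infers these levels jointly, whereas this just averages per-paragraph topic weights upward. The data, the `pool` helper, and the two-topic vectors are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: (year, submission_id, paragraph topic weights).
paragraphs = [
    (2023, "a", [0.5, 0.5]),
    (2023, "a", [1.0, 0.0]),
    (2023, "b", [0.0, 1.0]),
    (2024, "c", [0.25, 0.75]),
]

def mean_vectors(vectors):
    """Element-wise mean of equal-length topic-weight vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def pool(rows, key):
    """Group rows by key(row) and average the topic vector (last field)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[-1])
    return {k: mean_vectors(vs) for k, vs in groups.items()}

# Layer 1: pool paragraphs into submissions.
by_submission = pool(paragraphs, key=lambda r: (r[0], r[1]))

# Layer 2: pool submissions into years.
submission_rows = [(year, sub, vec) for (year, sub), vec in by_submission.items()]
by_year = pool(submission_rows, key=lambda r: r[0])

print(by_submission)
print(by_year)
```

Note that the year level averages over submissions rather than over raw paragraphs, so a long submission does not dominate its year. That is a choice made for this sketch, not a statement about how the real model weights things.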

Our word branch is a bit more complicated, but we have some extended documentation on our website if you are interested in digging a bit deeper. Always happy to chat more about the technical details of our topic models if you have any questions!

Overview of Our Technology: https://blog.sturdystatistics.com/posts/technology/

Technical Docs: https://docs.sturdystatistics.com