Comment by fareesh

5 days ago

What solutions are folks using to answer queries like "How many of these 1000 podcast transcripts have a positive view of Hillary Clinton"? Seems like you would need a way to map-reduce and count? And some kind of agent assigner/router on top of it?

We do a lot of things with podcast and other audio media at https://listenalert.com

But in general we found the best course of action is simply to label everything. Our customers will want those answers, and RAG won’t really work at the scale of “all podcasts from the last 6 months: what is the trend in sentiment toward Hillary Clinton, and what are the top topics and entities mentioned nearby?” So we take a more “brute force” approach :-)
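To make the "label everything" idea concrete, here is a rough sketch: label each transcript once per entity, store the labels, and answer count and trend questions with plain aggregation. Everything here is an assumption for illustration, not how any particular product works; `classify_sentiment` is a hypothetical keyword stub standing in for whatever LLM or classifier you would really call.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    transcript_id: str
    month: str      # "YYYY-MM"
    entity: str
    sentiment: str  # "positive", "negative" or "neutral"


def classify_sentiment(text, entity):
    """Hypothetical stand-in for an LLM/classifier call.

    Returns None when the entity is not mentioned at all.
    """
    lowered = text.lower()
    if entity.lower() not in lowered:
        return None
    if any(word in lowered for word in ("great", "admire", "support")):
        return "positive"
    if any(word in lowered for word in ("terrible", "dislike", "oppose")):
        return "negative"
    return "neutral"


def label_all(transcripts, entities):
    """The 'map' step: label every transcript once, up front."""
    labels = []
    for t in transcripts:
        for entity in entities:
            sentiment = classify_sentiment(t["text"], entity)
            if sentiment is not None:
                labels.append(Label(t["id"], t["month"], entity, sentiment))
    return labels


def count_positive(labels, entity):
    """The 'reduce' step for 'how many have a positive view of X'."""
    return sum(
        1 for l in labels if l.entity == entity and l.sentiment == "positive"
    )


def sentiment_trend(labels, entity):
    """Per-month sentiment counts for one entity, sorted by month."""
    trend = defaultdict(Counter)
    for l in labels:
        if l.entity == entity:
            trend[l.month][l.sentiment] += 1
    return {month: dict(counts) for month, counts in sorted(trend.items())}
```

The expensive labeling runs once per transcript; after that, every new question over the same labels is a cheap exact count rather than a retrieval problem.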

At the moment this repo is designed to handle more RAG-oriented use cases, i.e. those that require recalling the "top pieces of information" relevant to a given question/context. In your specific example, right now, FastGraphRAG would select the nodes that represent podcasts that are connected to Hillary Clinton, feed them to an LLM which would then select the ones that are positively associated with her. As a next step, we plan to weight the connections between nodes given the query. This way, PageRank will explore only edges which carry the concept "positively associated with", and only the right podcasts would be selected and returned, without having to ask an LLM to classify them. Note that this is basically a fuzzy join and so it will produce only a "best-effort" answer rather than an exact one.
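A toy illustration of the query-weighted idea, in case it helps: a small personalized PageRank where edges whose relation matches the query keep full weight and all others are down-weighted. This is only a sketch under stated assumptions and not FastGraphRAG's actual code or API; the function names, the exact-match relation test, and the `off_weight` value are all made up for the example (a real system would presumably score edges by fuzzy similarity to the query).

```python
def weight_edges(typed_edges, wanted_relation, off_weight=0.01):
    """Turn (source, target, relation) triples into {source: {target: weight}}.

    Edges matching the wanted relation get weight 1.0; the rest get a small
    off_weight so the random walk mostly ignores them.
    """
    weighted = {}
    for src, dst, relation in typed_edges:
        weight = 1.0 if relation == wanted_relation else off_weight
        weighted.setdefault(src, {})[dst] = weight
    return weighted


def personalized_pagerank(edges, seed, damping=0.85, iterations=50):
    """Power iteration for PageRank with all restart mass on one seed node."""
    nodes = sorted(set(edges) | {t for out in edges.values() for t in out})
    scores = {n: 1.0 if n == seed else 0.0 for n in nodes}
    for _ in range(iterations):
        # Restart mass always goes back to the seed.
        nxt = {n: (1.0 - damping) if n == seed else 0.0 for n in nodes}
        for src in nodes:
            out = edges.get(src, {})
            total = sum(out.values())
            if total == 0:
                # Dangling node: send its mass back to the seed.
                nxt[seed] += damping * scores[src]
                continue
            for dst, weight in out.items():
                nxt[dst] += damping * scores[src] * weight / total
        scores = nxt
    return scores


# Hypothetical mini-graph: one entity node linked to three podcasts.
typed_edges = [
    ("Hillary Clinton", "podcast_a", "positively associated with"),
    ("Hillary Clinton", "podcast_b", "negatively associated with"),
    ("Hillary Clinton", "podcast_c", "positively associated with"),
]
graph = weight_edges(typed_edges, "positively associated with")
scores = personalized_pagerank(graph, seed="Hillary Clinton")
ranked_podcasts = sorted(
    (n for n in scores if n.startswith("podcast_")),
    key=lambda n: scores[n],
    reverse=True,
)
```

Here `podcast_b` still gets a small nonzero score, which is the "best-effort" nature of the fuzzy join: the output is a ranking, not an exact filter or count.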

I don't have a dev answer, but in case it's relevant, I've seen commercial services that I imagine are doing something similar on the back end; Ground News is one of them. I wish they had monthly subs for their top tier plan rather than only annual, but it seems like a cool product. I haven't actually used it though.

  • What feature(s) of the top tier plan do you wish you had? I have no idea how their subs work but have seen a few ads for the product so have a vague idea that it rates news for bias but don’t see how that would involve many different tiers of subs.

    • It’s been a while since I looked, but unless they changed it, you needed the top tier plan to get a report analyzing the biases of your reading choices and recommending things to balance it out.

Anticipate what kinds of questions users might ask, pre-compute the answers, and store them as natural-language sentences in a vector database.
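Roughly what that could look like, as a self-contained sketch: the pre-computed answers are stored as sentences and retrieved by similarity to the incoming question. The bag-of-words `embed` function and the in-memory `AnswerStore` class are hypothetical stand-ins for a real embedding model and vector database, and the stored sentences are made-up sample data.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding'; a real setup would use an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class AnswerStore:
    """In-memory stand-in for a vector database of pre-computed answers."""

    def __init__(self):
        self._rows = []  # list of (vector, sentence)

    def add(self, sentence):
        self._rows.append((embed(sentence), sentence))

    def search(self, query, k=1):
        q = embed(query)
        ranked = sorted(
            self._rows, key=lambda row: cosine(q, row[0]), reverse=True
        )
        return [sentence for _, sentence in ranked[:k]]


# Made-up sample answers, computed ahead of time by some batch job.
store = AnswerStore()
store.add("2 of 3 podcast transcripts had a positive view of Hillary Clinton.")
store.add("The most mentioned topic across the podcast transcripts was the economy.")

best = store.search(
    "How many podcast transcripts have a positive view of Hillary Clinton?"
)
```

The catch is the same as with any cache: it only helps for questions you anticipated, so it works best paired with a fallback for the ones you didn't.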