Comment by langitbiru
15 hours ago
I did consider building a tool like this before I pivoted to something else. I'm learning Mandarin Chinese from a YouTube playlist of study materials. NotebookLM doesn't support Chinese yet, so please make sure your app supports Mandarin Chinese so I can use it. :)
A way to find specific material would be nice. Think of converting the whole playlist into something like a RAG corpus, so you can search for anything across the playlist.
Wow, thanks for this validation! Hearing from someone who almost built the solution themselves confirms we’re on the right track.
You hit the nail on the head regarding language support.
Mandarin/Multilingual Support: Absolutely, supporting a wide range of languages, especially Mandarin, is a top priority. Since we focus on extracting the official subtitles provided by YouTube, the language support is inherently tied to what the YouTube platform offers. On our side, the main job is making sure the backend parses and stores those subtitle files without mangling the text (UTF-8 end to end, no lossy re-encoding), so CJK (Chinese, Japanese, Korean) languages are handled cleanly from Day 1.
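To make that concrete, here's a rough sketch of the kind of backend handling I mean (illustrative only, not our production code). It assumes the subtitle track was already fetched as WebVTT, e.g. with yt-dlp's `--skip-download --write-auto-subs --sub-langs "zh.*"` flags, and the file name is just a placeholder:

```python
import json
import re
from pathlib import Path

# Minimal WebVTT cue extractor. The key point for CJK support is simply
# decoding as UTF-8 and never forcing the text back through ASCII.
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d{3} --> ")

def vtt_to_cues(path: Path) -> list[dict]:
    cues, current = [], None
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if TIMESTAMP.match(line):
            current = {"start": line.split(" --> ")[0], "text": ""}
            cues.append(current)
        elif line and current is not None:
            # Drop inline styling tags (<c>, <i>, ...) that auto-captions add.
            current["text"] += re.sub(r"<[^>]+>", "", line)
    return cues

if __name__ == "__main__":
    cues = vtt_to_cues(Path("lesson01.zh-Hans.vtt"))  # hypothetical file
    # ensure_ascii=False keeps 汉字 readable instead of \uXXXX escapes.
    print(json.dumps(cues[:3], ensure_ascii=False, indent=2))
```

The same parser works unchanged for Japanese or Korean tracks; the only language-specific part is which subtitle track you ask YouTube for.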
The RAG/Semantic Search Idea: That is an excellent feature suggestion and exactly where I see the tool evolving! Instead of just giving the user a zip file of raw data, the true value is transforming that data into a searchable corpus. Using RAG to search across the transcripts of an entire playlist or channel is something we're actively exploring as a roadmap feature; it turns the tool from a downloader into a Research Assistant.
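Just to show the shape of what I mean, here's a very rough retrieval sketch (the embedding model and the sample cues are placeholders, not decisions we've made). It assumes the `sentence-transformers` package and a multilingual model so Mandarin queries match Mandarin transcripts:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Multilingual model so a query in Chinese (or English) can match Chinese cues.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

# Hypothetical cues gathered from every video in the playlist:
# (video_id, start_time, text)
cues = [
    ("vid_001", "00:01:12", "今天我们学习方向补语的用法。"),
    ("vid_004", "00:09:45", "这个句子里的'了'表示动作完成。"),
]
embeddings = model.encode([text for _, _, text in cues], normalize_embeddings=True)

def search(query: str, top_k: int = 3):
    """Return the cues most similar to the query, with their scores."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = embeddings @ q  # cosine similarity, since vectors are normalized
    best = np.argsort(-scores)[:top_k]
    return [(cues[i], float(scores[i])) for i in best]

print(search("怎么用'了'?"))
```

A real version would chunk cues into larger windows and persist the vectors in a proper store, but the retrieval step stays about this simple, and the timestamps let us link every answer back to the exact moment in the video.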
Thanks for the use case and the specific requirements! They help us prioritize the architecture.
> Since we focus on extracting the official subtitles provided by YouTube, the language support is inherently tied to what the YouTube platform offers.
You can use the video understanding in Gemini models to extract subtitles even when the video doesn't have official subtitles. That's expensive, for sure, but I think you should offer this option to willing users.
That is a fantastic point, and you've perfectly articulated the core trade-off we're facing: Accuracy vs. Cost.
You are 100% right. For the serious user (researcher, data analyst, etc.), the lack of official subtitles is a non-starter. Relying solely on official captions severely limits the available corpus.
The suggestion to use powerful models like Gemini for high-accuracy, custom transcription is excellent, but as you noted, the costs can spiral quickly, especially with bulk processing of long videos.
Here is where we are leaning for the business model:
We are committed to keeping the Bulk Download of all YouTube-provided subtitles free, but we must implement a fair-use limit on the number of requests per user to manage the substantial bandwidth and processing costs.
We plan to introduce a "Pro Transcription" tier for those high-value, high-volume use cases. This premium tier would cover:
Unlimited/High-Volume Bulk Requests.
LLM-Powered Transcription: Access to high-accuracy models (like the ones you mentioned) with custom context injection, bypassing the "no official subs" problem entirely and covering the heavy processing costs; a rough sketch of that flow is below.
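To give a feel for what that could look like under the hood, here's a minimal sketch along the lines you suggested, using the `google-generativeai` package and Gemini's video understanding (the file name, model choice, and prompt are placeholders, not a committed design):

```python
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# Upload the video, then wait for Google's side to finish processing it.
video = genai.upload_file("lesson01.mp4")  # hypothetical local file
while video.state.name == "PROCESSING":
    time.sleep(5)
    video = genai.get_file(video.name)

model = genai.GenerativeModel("gemini-1.5-pro")
prompt = (
    "Transcribe the spoken Mandarin in this video. "
    "Return one line per utterance in the form: [mm:ss] transcript."
)
response = model.generate_content([video, prompt])
print(response.text)
```

The cost here scales with video length, which is exactly why this sits in the paid tier rather than the free bulk download.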
We are currently doing market research on fair pricing for the Pro tier. Your input helps us frame the value proposition immensely. Thank you for pushing us on this critical commercial decision!