Comment by simonw
14 days ago
This thread so far (at 310 comments) summarized by Llama 4 Maverick:
hn-summary.sh 43595585 -m openrouter/meta-llama/llama-4-maverick -o max_tokens 20000
Output: https://gist.github.com/simonw/016ea0fd83fc499f046a94827f9b4...
And with Scout I got complete junk output for some reason:
hn-summary.sh 43595585 -m openrouter/meta-llama/llama-4-scout -o max_tokens 20000
Junk output here: https://gist.github.com/simonw/d01cc991d478939e87487d362a8f8...
I'm running it through OpenRouter, so maybe I got proxied to a broken instance?
I managed to run it through Scout on Groq directly (with the llm-groq plugin), but that had a 2048-token limit on output size for some reason:
hn-summary.sh 43595585 -m groq/meta-llama/llama-4-scout-17b-16e-instruct -o max_tokens 2048
Result here: https://gist.github.com/simonw/a205c5fc131a1d4e9cd6c432a07fe...
I'm a little unimpressed by its instruction following here; the summaries I get from other models are a lot closer to my system prompt. Here's the same thing against Gemini 2.5 Pro, for example (massively better): https://gist.github.com/simonw/f21ecc7fb2aa13ff682d4ffa11ddc...
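For anyone curious, here's a minimal sketch of what an hn-summary.sh-style pipeline might look like: fetch the full comment tree from the Algolia HN API and pipe it into the llm CLI. This is an approximation, not the actual script, and the system prompt is a placeholder:

    #!/bin/bash
    # Approximation of an hn-summary.sh-style wrapper (not the real script).
    # Fetch the full comment tree for a post from the Algolia HN API, then pipe
    # the JSON to the llm CLI; extra arguments (-m model, -o max_tokens N) pass through.
    set -euo pipefail
    POST_ID="$1"
    shift
    curl -s "https://hn.algolia.com/api/v1/items/${POST_ID}" | \
      llm "$@" -s 'Summarize the themes of the opinions expressed here, quoting representative comments.'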
I tried summarizing the thread so far (339 comments) with a custom system prompt [0] and a user prompt that captures the structure (hierarchy and upvotes) of the thread [1].
This is the output that we got (based on the HN-Companion project) [2]:
Llama 4 Scout - https://gist.github.com/annjose/9303af60a38acd5454732e915e33...
Llama 4 Maverick - https://gist.github.com/annjose/4d8425ea3410adab2de4fe9a5785...
Claude 3.7 - https://gist.github.com/annjose/5f838f5c8d105fbbd815c5359f20...
The summaries from Scout and Maverick both look good (comparable to Claude), and with this structure, Scout seems to follow the prompt slightly better.
In this case, we used the models 'meta-llama/llama-4-maverick' and 'meta-llama/llama-4-scout' from OpenRouter.
--
[0] - https://gist.github.com/annjose/5145ad3b7e2e400162f4fe784a14...
[1] - https://gist.github.com/annjose/d30386aa5ce81c628a88bd86111a...
[2] - https://github.com/levelup-apps/hn-enhancer
edited: To add OpenRouter model details.
This is the script that assembles the structured comments and generates the summary - https://github.com/levelup-apps/hn-enhancer/blob/main/script...
You can run it as: node summarize-comments.js <post_id>
Example: node summarize-comments.js 43597782
The summary will be written to the "output" folder.
You need to set the environment variable (in this case OPENROUTER_API_KEY, because Llama 4 is currently available via OpenRouter).
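Under the hood this presumably boils down to a chat-completions call to OpenRouter; here's a minimal curl sketch, assuming the standard OpenAI-compatible endpoint (the prompt contents are placeholders, not the project's actual prompts):

    # Sketch of the kind of request the summarizer would send to OpenRouter.
    # Message contents are placeholders; the structured comment tree goes in the user message.
    curl -s https://openrouter.ai/api/v1/chat/completions \
      -H "Authorization: Bearer $OPENROUTER_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "meta-llama/llama-4-scout",
        "messages": [
          {"role": "system", "content": "Summarize this Hacker News discussion."},
          {"role": "user", "content": "<structured comment tree with hierarchy and upvotes>"}
        ],
        "max_tokens": 20000
      }'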
As another data point, Maverick has taken the #2 position on LMArena, just behind Gemini 2.5 Pro.
That Gemini 2.5 one is impressive. I found it interesting that the blog post didn't mention Gemini 2.5 at all. Okay, it was released pretty recently, but 10 days seems like enough time to run the benchmarks, so maybe the results would have made Llama 4 look worse?
I'm sure it does, as Gemini 2.5 Pro has been making every other model look pretty bad.
Meta will most likely compare against it when they release the upcoming Llama 4 reasoning model.
LM Arena ranks it second, just below Gemini 2.5 Pro.
> I'm a little unimpressed by its instruction following
I've been trying the 109B version on Groq, and it seems less capable than Gemma 3 27B.
I have found the Gemini 2.5 Pro summary genuinely interesting: it adequately describes what I've read.
Have you thought about automating HN summaries for, say, the top 5 posts at 8 AM EST?
That would be a simple product to test the market. If successful, it could be easily extended to a weekly newsletter summary.
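A minimal sketch of how that could be scheduled with cron, assuming a hypothetical wrapper script (hn-top-summaries.sh is a made-up name) that summarizes the current top N stories:

    # Hypothetical cron entry: summarize the top 5 posts every day at 8 AM,
    # assuming the machine's timezone is set to US Eastern.
    # hn-top-summaries.sh is a placeholder name, not an existing script.
    0 8 * * * /usr/local/bin/hn-top-summaries.sh 5 >> /var/log/hn-summaries.log 2>&1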
This is a great idea! It's exactly what I was also thinking, and I've started working on a side project. Currently the project can create summaries like this [1].
Since HN homepage stories change throughout the day, I thought it would be better to create the newsletter based on https://news.ycombinator.com/item?id=43597782
https://hnup.date/ ;)
Yes, this is great, but I'd like to pick a different voice. The current one feels too robotic.
Here's the link for the model on OpenRouter: https://openrouter.ai/meta-llama/llama-4-maverick
> had a 2048 limit on output size for some reason
It's a common issue with Ollama; maybe it's running something similar under the hood?
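If an Ollama-style backend really is involved, the output cap is typically lifted via request options; here's a sketch (the model tag is a placeholder, and whether Groq does anything similar under the hood is just speculation):

    # Sketch: raising the output and context limits on an Ollama backend.
    # "some-llama4-tag" is a placeholder model name.
    curl -s http://localhost:11434/api/chat -d '{
      "model": "some-llama4-tag",
      "messages": [{"role": "user", "content": "Summarize this thread..."}],
      "options": {"num_predict": 8192, "num_ctx": 16384}
    }'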
It doesn’t seem that impressive to me either.