Comment by simonw

14 days ago

This thread so far (at 310 comments) summarized by Llama 4 Maverick:

    hn-summary.sh 43595585 -m openrouter/meta-llama/llama-4-maverick -o max_tokens 20000

Output: https://gist.github.com/simonw/016ea0fd83fc499f046a94827f9b4...
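
For context, hn-summary.sh is a thin wrapper around my llm CLI. Roughly (this is a from-memory sketch, not the exact script), it pulls the comment tree from the Algolia HN API and pipes it to llm, passing any extra arguments straight through:

    #!/bin/bash
    # Sketch: fetch the full comment tree for a post from the Algolia HN API,
    # flatten it to "author: text" lines, and pipe it to llm. Extra arguments
    # (-m, -o ...) pass through to llm unchanged.
    post_id="$1"; shift
    curl -s "https://hn.algolia.com/api/v1/items/${post_id}" |
      jq -r 'recurse(.children[]?) | select(.text != null) | "\(.author): \(.text)"' |
      llm "$@" -s "Summarize the themes of the opinions expressed here."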

And with Scout I got complete junk output for some reason:

    hn-summary.sh 43595585 -m openrouter/meta-llama/llama-4-scout -o max_tokens 20000

Junk output here: https://gist.github.com/simonw/d01cc991d478939e87487d362a8f8...

I'm running it through OpenRouter, so maybe I got proxied to a broken instance?

I managed to run Scout on Groq directly (with the llm-groq plugin), but that had a 2048-token limit on output size for some reason:

    hn-summary.sh 43595585 -m groq/meta-llama/llama-4-scout-17b-16e-instruct -o max_tokens 2048

Result here: https://gist.github.com/simonw/a205c5fc131a1d4e9cd6c432a07fe...
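
For anyone reproducing the Groq route, it's just the plugin plus an API key; roughly:

    # Sketch of the Groq setup: the llm-groq plugin registers models under the groq/ prefix.
    llm install llm-groq
    export GROQ_API_KEY="..."
    hn-summary.sh 43595585 -m groq/meta-llama/llama-4-scout-17b-16e-instruct -o max_tokens 2048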

I'm a little unimpressed by its instruction following here; the summaries I get from other models stick a lot closer to my system prompt. Here's the same thing against Gemini 2.5 Pro for example (massively better): https://gist.github.com/simonw/f21ecc7fb2aa13ff682d4ffa11ddc...

I tried summarizing the thread so far (339 comments) with a custom system prompt [0] and a user prompt that captures the structure (hierarchy and upvotes) of the thread [1].
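
To make that concrete, here is a hypothetical fragment of what that structured user prompt looks like (the exact format is in [1]); path-style IDs encode the hierarchy and each comment carries its score:

    [1] (score: 95) userA: Top-level comment text...
    [1.1] (score: 42) userB: Reply to comment 1...
    [1.1.1] (score: 7) userC: Reply to that reply...
    [2] (score: 88) userD: Another top-level comment...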

This is the output that we got (based on the HN-Companion project) [2]:

Llama 4 Scout - https://gist.github.com/annjose/9303af60a38acd5454732e915e33...

Llama 4 Maverick - https://gist.github.com/annjose/4d8425ea3410adab2de4fe9a5785...

Claude 3.7 - https://gist.github.com/annjose/5f838f5c8d105fbbd815c5359f20...

The summaries from Scout and Maverick both look good (comparable to Claude), and with this structure Scout seems to follow the prompt slightly better.

In this case, we used the models 'meta-llama/llama-4-maverick' and 'meta-llama/llama-4-scout' from OpenRouter.
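
OpenRouter exposes an OpenAI-compatible chat completions endpoint, so each call looks roughly like this, with the system prompt [0] and the structured thread [1] as the two messages:

    curl -s https://openrouter.ai/api/v1/chat/completions \
      -H "Authorization: Bearer $OPENROUTER_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "meta-llama/llama-4-maverick",
        "messages": [
          {"role": "system", "content": "<system prompt from [0]>"},
          {"role": "user", "content": "<structured thread from [1]>"}
        ]
      }'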

--

[0] - https://gist.github.com/annjose/5145ad3b7e2e400162f4fe784a14...

[1] - https://gist.github.com/annjose/d30386aa5ce81c628a88bd86111a...

[2] - https://github.com/levelup-apps/hn-enhancer

Edited to add OpenRouter model details.

  • This is the script that assembles the structured comments and generates the summary - https://github.com/levelup-apps/hn-enhancer/blob/main/script...

    You can run it as: node summarize-comments.js <post_id>

    Example: node summarize-comments.js 43597782

    And the summary will be put in the "output" folder.

    You need to set the OPENROUTER_API_KEY environment variable, since Llama 4 is currently available via OpenRouter.
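
    Putting those steps together, a minimal run looks like:

        export OPENROUTER_API_KEY="..."    # required; Llama 4 is served via OpenRouter
        node summarize-comments.js 43597782
        # the summary is written to the "output" folder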

  • As another data point, Maverick has taken the #2 position on LMArena, just behind Gemini 2.5 Pro.

That Gemini 2.5 one is impressive. I found it interesting that the blog post didn't mention Gemini 2.5 at all. Okay, it was released pretty recently, but 10 days seems like enough time to run the benchmarks, so maybe the results would have made Llama 4 look worse?

  • I'm sure it does, as Gemini 2.5 Pro has been making every other model look pretty bad.

  • Meta will most likely compare against it when they release the upcoming Llama 4 reasoning model.

> I'm a little unimpressed by its instruction following

I've been trying the 109B version on Groq, and it seems less capable than Gemma 3 27B.

I have found the Gemini 2.5 Pro summary genuinely interesting: it adequately describes what I've read.

Have you thought about automating HN summaries for, say, the top 5 posts at 8 AM EST?

That would be a simple product to test the market. If successful, it could be easily extended to a weekly newsletter summary.
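
A minimal sketch of the plumbing, assuming the hn-summary.sh wrapper from upthread and the Algolia front-page API:

    # Hypothetical cron entry: every day at 13:00 UTC (8 AM EST)
    # 0 13 * * * /path/to/summarize-top5.sh

    # summarize-top5.sh: summarize the current top 5 front-page posts
    curl -s "https://hn.algolia.com/api/v1/search?tags=front_page&hitsPerPage=5" |
      jq -r '.hits[].objectID' |
      while read -r id; do
        hn-summary.sh "$id" -m openrouter/meta-llama/llama-4-maverick > "summary-$id.md"
      done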

> had a 2048 limit on output size for some reason

It's a common issue with Ollama; maybe Groq is running something similar under the hood?
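
In Ollama's case the usual culprits are the num_ctx and num_predict options, which default to fairly small values; a sketch of raising them per request through its API:

    # Sketch: lift Ollama's context and output caps for one generate call
    curl -s http://localhost:11434/api/generate -d '{
      "model": "llama3",
      "prompt": "Summarize this thread...",
      "options": { "num_ctx": 8192, "num_predict": 4096 }
    }'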