Comment by simonw

14 days ago

This thread so far (at 310 comments) summarized by Llama 4 Maverick:

    hn-summary.sh 43595585 -m openrouter/meta-llama/llama-4-maverick -o max_tokens 20000

Output: https://gist.github.com/simonw/016ea0fd83fc499f046a94827f9b4...
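
For context, hn-summary.sh is a thin wrapper around my llm CLI. Roughly (this is a from-memory sketch, not the exact script), it pulls the comment tree from the Algolia HN API and pipes it to llm, passing any extra arguments straight through:

    #!/bin/bash
    # Sketch: fetch the full comment tree for a post from the Algolia HN API,
    # flatten it to "author: text" lines, and pipe it to llm. Extra arguments
    # (-m, -o ...) pass through to llm unchanged.
    post_id="$1"; shift
    curl -s "https://hn.algolia.com/api/v1/items/${post_id}" |
      jq -r 'recurse(.children[]?) | select(.text != null) | "\(.author): \(.text)"' |
      llm "$@" -s "Summarize the themes of the opinions expressed here."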

And with Scout I got complete junk output for some reason:

    hn-summary.sh 43595585 -m openrouter/meta-llama/llama-4-scout -o max_tokens 20000

Junk output here: https://gist.github.com/simonw/d01cc991d478939e87487d362a8f8...

I'm running it through OpenRouter, so maybe I got proxied to a broken instance?

I managed to run Scout on Groq directly (with the llm-groq plugin), but that had a 2048-token limit on output size for some reason:

    hn-summary.sh 43595585 -m groq/meta-llama/llama-4-scout-17b-16e-instruct -o max_tokens 2048

Result here: https://gist.github.com/simonw/a205c5fc131a1d4e9cd6c432a07fe...
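
For anyone reproducing the Groq route, it's just the plugin plus an API key; roughly:

    # Sketch of the Groq setup: the llm-groq plugin registers models under the groq/ prefix.
    llm install llm-groq
    export GROQ_API_KEY="..."
    hn-summary.sh 43595585 -m groq/meta-llama/llama-4-scout-17b-16e-instruct -o max_tokens 2048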

I'm a little unimpressed by its instruction following here; the summaries I get from other models stick a lot closer to my system prompt. Here's the same thing against Gemini 2.5 Pro for example (massively better): https://gist.github.com/simonw/f21ecc7fb2aa13ff682d4ffa11ddc...

I tried summarizing the thread so far (339 comments) with a custom system prompt [0] and a user prompt that captures the structure (hierarchy and upvotes) of the thread [1].
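
To make that concrete, here is a hypothetical fragment of what that structured user prompt looks like (the exact format is in [1]); path-style IDs encode the hierarchy and each comment carries its score:

    [1] (score: 95) userA: Top-level comment text...
    [1.1] (score: 42) userB: Reply to comment 1...
    [1.1.1] (score: 7) userC: Reply to that reply...
    [2] (score: 88) userD: Another top-level comment...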

This is the output that we got (based on the HN-Companion project) [2]:

Llama 4 Scout - https://gist.github.com/annjose/9303af60a38acd5454732e915e33...

Llama 4 Maverick - https://gist.github.com/annjose/4d8425ea3410adab2de4fe9a5785...

Claude 3.7 - https://gist.github.com/annjose/5f838f5c8d105fbbd815c5359f20...

The summaries from Scout and Maverick both look good (comparable to Claude), and with this structure Scout seems to follow the prompt slightly better.

In this case, we used the models 'meta-llama/llama-4-maverick' and 'meta-llama/llama-4-scout' from OpenRouter.
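
OpenRouter exposes an OpenAI-compatible chat completions endpoint, so each call looks roughly like this, with the system prompt [0] and the structured thread [1] as the two messages:

    curl -s https://openrouter.ai/api/v1/chat/completions \
      -H "Authorization: Bearer $OPENROUTER_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "meta-llama/llama-4-maverick",
        "messages": [
          {"role": "system", "content": "<system prompt from [0]>"},
          {"role": "user", "content": "<structured thread from [1]>"}
        ]
      }'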

--

[0] - https://gist.github.com/annjose/5145ad3b7e2e400162f4fe784a14...

[1] - https://gist.github.com/annjose/d30386aa5ce81c628a88bd86111a...

[2] - https://github.com/levelup-apps/hn-enhancer

Edited to add OpenRouter model details.

  • This is the script that assembles the structured comments and generates the summary - https://github.com/levelup-apps/hn-enhancer/blob/main/script...

    You can run it as: node summarize-comments.js <post_id>

    Example: node summarize-comments.js 43597782

    And the summary will be put in the "output" folder.

    You need to set the OPENROUTER_API_KEY environment variable, since Llama 4 is currently available via OpenRouter.
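
    Putting those steps together, a minimal run looks like:

        export OPENROUTER_API_KEY="..."    # required; Llama 4 is served via OpenRouter
        node summarize-comments.js 43597782
        # the summary is written to the "output" folder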

  • As another data point, Maverick has taken the #2 position on LMArena, just behind Gemini 2.5 Pro.

That Gemini 2.5 one is impressive. I found it interesting that the blog post didn't mention Gemini 2.5 at all. Okay, it was released pretty recently, but 10 days seems like enough time to run the benchmarks, so maybe the results would have made Llama 4 look worse?

  • I'm sure it does, as Gemini 2.5 Pro has been making every other model look pretty bad.

  • Meta will most likely compare against it when they release the upcoming Llama 4 reasoning model.

> I'm a little unimpressed by its instruction following

I've been trying the 109B version on Groq, and it seems less capable than Gemma 3 27B.

I have found the Gemini 2.5 Pro summary genuinely interesting: it adequately describes what I've read.

Have you thought about automating HN summaries for, say, the top 5 posts at 8 AM EST?

That would be a simple product to test the market. If successful, it could be easily extended to a weekly newsletter summary.
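
A minimal sketch of the plumbing, assuming the hn-summary.sh wrapper from upthread and the Algolia front-page API:

    # Hypothetical cron entry: every day at 13:00 UTC (8 AM EST)
    # 0 13 * * * /path/to/summarize-top5.sh

    # summarize-top5.sh: summarize the current top 5 front-page posts
    curl -s "https://hn.algolia.com/api/v1/search?tags=front_page&hitsPerPage=5" |
      jq -r '.hits[].objectID' |
      while read -r id; do
        hn-summary.sh "$id" -m openrouter/meta-llama/llama-4-maverick > "summary-$id.md"
      done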

> had a 2048 limit on output size for some reason

It's a common issue with Ollama; maybe Groq is running something similar under the hood?
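
In Ollama's case the usual culprits are the num_ctx and num_predict options, which default to fairly small values; a sketch of raising them per request through its API:

    # Sketch: lift Ollama's context and output caps for one generate call
    curl -s http://localhost:11434/api/generate -d '{
      "model": "llama3",
      "prompt": "Summarize this thread...",
      "options": { "num_ctx": 8192, "num_predict": 4096 }
    }'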