Comment by danielhanchen
10 hours ago
Oh I wrote up a post on X on this exact question! https://x.com/danielhanchen/status/1979389893165060345?s=20
1. Cursor used online RL to get +28% approval rate: https://cursor.com/blog/tab-rl
2. Vercel used RFT for their AutoFix model for V0: https://vercel.com/blog/v0-composite-model-family
3. I think Perplexity's Sonar for Deep Research Reasoning was a finetuned model: https://docs.perplexity.ai/docs/getting-started/overview
4. DoorDash uses LoRA and QLoRA for a "Generalized Attribute Extraction model": https://careersatdoordash.com/blog/unleashing-the-power-of-l...
5. NASA flood water detection https://earthdata.nasa.gov/news/nasa-ibm-openly-release-geospatial-ai-foundation-model-nasa-earth-observation-data6
6. Online RL for robotics - imagine teaching a robot in the future via some mini finetuning
7. OpenAI's RFT page has more: https://developers.openai.com/api/docs/guides/rft-use-cases
8. For larger models - https://www.mercor.com/blog/expert-data-drives-model-perform...