Comment by warpech
3 years ago
GitHub is a great example of mostly server-side generated HTML. Their output HTML is very stable: the structure rarely changes, is semantic, logical, and self-documenting. This has helped hundreds of browser extensions (e.g. Refined GitHub, ZenHub) and userscripts, including dozens of my own.
Of course, treating HTML as an API or data exchange format is fragile, and some GitHub staff might see it as hindering progress. However, it has been that way for many years and has benefited the community. It is perhaps less beneficial for GitHub, which would probably prefer that integrations go through the official API.
If GitHub's HTML changes to a dynamic React-powered div-soup, that might be the end of browser extensions and userscripts. And another reason for power users to flock to other platforms.
Edit: React does not necessarily mean div-soup, but I have seen too many React-powered div-soups to expect GitHub's HTML to stay the same.
> If GitHub's HTML changes to a dynamic React-powered div-soup, that might be the end of browser extensions and userscripts.
But that feels more like a failure of userscript developers (myself included). Why don't we have great tooling for modifying React apps (or do we, and I just don't know about it)? The number of such apps is already high and is only going to increase.
For modifying the Tidal UI I had to use MutationObservers, tied to how the components render. It's not great, and I'd rather modify the React app more directly, but hey, it does work reliably (albeit only because the app doesn't change much): https://gist.github.com/solarkraft/edd9d49bcf0f548b1aa285da7...
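For anyone who hasn't used the MutationObserver approach, here is a rough sketch of the general pattern. It is not taken from the gist: the function names, the selector argument, and the `data-userscript-patched` marker attribute are all made up for illustration. The idea is to patch matching elements once, then re-run the patch whenever the app inserts new nodes.

```typescript
// Minimal structural type so the sketch type-checks without the DOM lib.
// Real DOM Elements satisfy this shape.
interface ElementLike {
  matches(selector: string): boolean;
  querySelectorAll(selector: string): ArrayLike<ElementLike>;
  hasAttribute(name: string): boolean;
  setAttribute(name: string, value: string): void;
}

// Hypothetical marker attribute used to avoid patching the same node twice.
const PATCHED_ATTR = "data-userscript-patched";

// Apply `patch` once to `root` (if it matches) and to every matching descendant.
// Returns how many elements were newly patched.
function patchMatching(
  root: ElementLike,
  selector: string,
  patch: (el: ElementLike) => void
): number {
  const candidates: ElementLike[] = [];
  if (root.matches(selector)) candidates.push(root);
  const found = root.querySelectorAll(selector);
  for (let i = 0; i < found.length; i++) candidates.push(found[i]);

  let patched = 0;
  for (const el of candidates) {
    if (el.hasAttribute(PATCHED_ATTR)) continue; // already handled, stay idempotent
    el.setAttribute(PATCHED_ATTR, "1");
    patch(el);
    patched++;
  }
  return patched;
}

// Browser-only wiring: patch what is already rendered, then watch for
// nodes the app adds later. Does nothing outside a browser.
function watchAndPatch(selector: string, patch: (el: ElementLike) => void): void {
  const g = globalThis as any;
  if (typeof g.MutationObserver !== "function" || !g.document) return;

  patchMatching(g.document.documentElement, selector, patch);

  const observer = new g.MutationObserver((mutations: any[]) => {
    for (const m of mutations) {
      const added = m.addedNodes;
      for (let i = 0; i < added.length; i++) {
        // nodeType 1 is ELEMENT_NODE; skip text and comment nodes.
        if (added[i].nodeType === 1) patchMatching(added[i], selector, patch);
      }
    }
  });
  observer.observe(g.document.documentElement, { childList: true, subtree: true });
}
```

In a userscript this would be called as something like `watchAndPatch(".some-component", el => { /* tweak el */ })`, with the selector being whatever the target app happens to render. That is exactly the fragile part: the selector is tied to the rendered output, so it breaks when the components change.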
Can we do better?
> Can we do better?
I hope so! Some people are interested in this field. I think we would need more flexible, adaptive scrapers based on content rather than on the HTML structure. This week alone, I discussed "AI assisted web scraping" with two people:
- https://news.ycombinator.com/item?id=33555426