Comment by npunt

1 year ago

I just want to run a headless browser that logs into my socials periodically, scrapes the stuff I want from the accounts I follow, and puts it into a less addictive format that provides an upper limit on my possible exposure and engagement. I’m happy to run it locally from my device. Ideally it redirects links to socials I come across too. Where is such a thing?

This was the dream of RSS, but it went against the long-term business interests of the corporations hosting content, so it has since largely faded away.

  • I literally reached this discussion thread via the RSS feed for HN.

    It’s out there. Social networking sites with a vested interest in monopolizing your attention don’t use it. So I don’t use them.

    • Outside of the tech community, RSS feeds see little use. Listings for my local music and arts scenes are on Instagram and Facebook. Underground community events are planned using proprietary group chats.

      Free culture and open internet activists lost this battle.

  • A lot of news sites still support RSS. Could be that not very many regular people understand RSS or want to use it, aside from podcasts.

You may want social media to be pipes, not platforms.

Convince the government to forbid the business model in which most users are not paying customers but the product offered to advertisers. Then social platforms will not care if you use whatever client you desire.

Big Social shareholders don’t want it, though. Being a double-sided market is addictive, and no one can compete with them if they capture the market by not charging money.

  • > Convince the government to forbid the business model in which most users are not paying customers but the product offered to advertisers

    I like broadcast TV and radio (especially radio while driving) and think that most Americans would object to their removal.

    Just look at the uproar over a few NFL games being unavailable on broadcast TV for a hint as to how well such a ban might go.

    • > I like broadcast TV and radio (especially radio while driving)

      If you’re suggesting applying the same model to social media (where they don’t get to know a single thing about the user, strictly one-way ads), then I’d be totally for it.

      However, I don’t think they’ll find this model profitable enough (advertisers like to target), and because charging users is easier with social media than with radio broadcasting, the barrier to start doing that is lower.

      That is, disallowing profiting from PII and only allowing one-way ads in social media, while difficult to enforce, will also mean they start charging users anyway. So why not skip that model altogether?

      Local broadcast models don’t work on global scale anyway.

  • The new issue is AI. Even sites with no ads will want limits on scraping / APIs / composability.

This can be done with little AWS Lambda scripts that periodically scrape (or call the API of) whatever sites you want and e-mail you the results. All the credentials to log in to whatever sites can be personal/dedicated to your instance (so no real API limits), and the usage will almost certainly fall into the AWS free tier since it's only for you.

The ideal install workflow would be to have a repo of AWS CloudFormation templates to automate the installation of the lambdas for different sites in your account. Anyone can open an AWS account, and using CloudFormation takes a few fields and a button click.

Also, if the scripts are developed properly, they are runnable locally. A sane developer will run them locally during development, and then test deployed before releasing.
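
To make the "runnable locally" point concrete, here is a minimal sketch of what one such script might look like, assuming the site offers an RSS 2.0 feed. The names `parse_rss`, `format_digest`, `MAX_ITEMS`, and the `feed_url` / `limit` event keys are made up for illustration; only the `lambda_handler(event, context)` shape comes from Lambda's Python runtime. The e-mail step and any login handling are left out.

```python
import sys
import xml.etree.ElementTree as ET
from urllib.request import urlopen

MAX_ITEMS = 10  # hypothetical cap on how many entries a digest may contain


def parse_rss(xml_text, limit=MAX_ITEMS):
    """Return up to `limit` (title, link) pairs from an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        if len(items) >= limit:
            break  # enforce the upper limit on exposure
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        items.append((title, link))
    return items


def format_digest(items):
    """Render the entries as a plain-text digest, one entry per two lines."""
    return "\n".join(f"- {title}\n  {link}" for title, link in items)


def lambda_handler(event, context):
    """Entry point in the shape the AWS Lambda Python runtime calls.

    `event["feed_url"]` and `event["limit"]` are assumed keys, not a standard.
    """
    with urlopen(event["feed_url"], timeout=10) as resp:
        xml_text = resp.read().decode("utf-8", errors="replace")
    items = parse_rss(xml_text, event.get("limit", MAX_ITEMS))
    return {"digest": format_digest(items)}


# Local run: the same handler is called directly, no AWS involved.
if __name__ == "__main__" and len(sys.argv) > 1:
    print(lambda_handler({"feed_url": sys.argv[1]}, None)["digest"])
```

Saved as, say, `digest.py` (a hypothetical filename), it can be run locally with `python digest.py <feed-url>`. A deployed version would call the same handler on a schedule and pass the returned digest to whatever mail-sending step you prefer.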

  • With an AWS IP and a bot usage pattern they’ll surely ban your account pretty quickly or put you in front of a CAPTCHA. I wish it was as easy as a small script. Without anti-bot techniques, sites would be overrun by scraping bots. Try to scrape a Cloudflare-protected site, for example. They’re really good at figuring out if you’re human or a bot. IIRC they even fingerprint your TLS handshake or cipher suite, which ultimately made me give up on headless Chrome and Puppeteer even after proxying through my residential IP, spoofing user-agent and screen size, and rate limiting. Unfortunately, there’s no way to distinguish good bots for personal usage from bad bots.

  • In theory, anything is possible with months of developer work. The trouble is, there are billions of people addicted to social media. There aren't many widespread solutions to scrape it. Whenever a scraper becomes even remotely popular, Facebook takes action against it, as accessing posts outside the walled garden is a violation of their terms of service. Currently, I am using a combination of Feedbro and Nitter to scrape all the accounts I want to follow. They currently work with Facebook and have not been blocked.

    • Yes.

      But there is no aggregation - each user runs their own instances. For any site that offers an API, the API would need to have breaking changes to disable this, or block access from AWS.

      It's easy to make this work for a developer-like crowd (very little time to write). It would work for most developers just fine, and could, with more considerable development time, be good for anyone.

      Distributed guerilla social media deconstruction.

I'm reminded of an iPad app from 10 years ago that did roughly this, creating a magazine-like interaction with your Twitter, Google Reader, etc.

This is what I made for myself at https://www.bulletyn.co - regular email digests of content from Reddit, HN, RSS feeds, etc. It's helped me significantly cut down the amount of time I spend on Reddit particularly.

I had been planning to add Twitter before the API changes...alas.

I want the vision that Rabbit is selling: a headless browser that periodically scrolls through my socials and has AI assemble all entries into a readable digest. I would settle for manually scrolling through my timeline with a screen recorder that OCRs all the text and removes the fluff.

That sounds like a case of be the change you wish to see. Such a project sounds rather substantial, not only initially but also in upkeep. Might want to be more specific, such as "does this exist for my favorite social network called <insert network here>?"