
Comment by philipkglass

9 days ago

I would like to be able to pull content out of the Wayback Machine with a proper API [1]. I'd even be willing to pay a combination of per-request and per-gigabyte fees to do it. But then I think about the Archive's special status as a non-profit library, and I'm not sure that offering paid API access (even just to cover costs) is compatible with the organization as it exists.

[1] It looks like this might exist at some level, e.g. https://github.com/hartator/wayback-machine-downloader, but I've been trying to use it for a couple of weeks, and every day I try it I get an HTTP 5xx error or "connection refused."

https://github.com/internetarchive/wayback/tree/master/wayba...

https://akamhy.github.io/waybackpy/

https://wiki.archiveteam.org/index.php/Restoring
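For pulling a single page, the publicly documented availability endpoint (`https://archive.org/wayback/available`) combined with the `id_` playback modifier may be enough. This is a minimal, unofficial sketch using only the standard library. The helper names are hypothetical, and the response shape it parses (`archived_snapshots` → `closest` → `url`) is my reading of the public docs, so check it against a live response. It won't make the service more reliable: 5xx errors and refused connections still surface as exceptions from `urlopen`.

```python
from __future__ import annotations

import json
import re
import urllib.parse
import urllib.request

# Public availability endpoint; the JSON shape parsed below is assumed from its docs.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query_url(url: str, timestamp: str | None = None) -> str:
    """Build an availability query; `timestamp` is an optional yyyyMMddhhmmss prefix."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode(params)


def raw_capture_url(snapshot_url: str) -> str:
    """Turn /web/<timestamp>/<original> into /web/<timestamp>id_/<original>.

    The id_ modifier asks playback for the archived bytes without the Wayback
    toolbar or link rewriting. URLs already in id_ form are left unchanged."""
    return re.sub(r"/web/(\d{1,14})/", r"/web/\1id_/", snapshot_url, count=1)


def closest_raw_url(response: dict) -> str | None:
    """Pick the closest snapshot out of a parsed availability response, if any."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return raw_capture_url(closest["url"])


def fetch_closest(url: str, user_agent: str, timestamp: str | None = None) -> bytes | None:
    """Network call: look up the closest capture, then download its raw content."""
    headers = {"User-Agent": user_agent}
    lookup = urllib.request.Request(availability_query_url(url, timestamp), headers=headers)
    with urllib.request.urlopen(lookup, timeout=30) as resp:
        raw_url = closest_raw_url(json.load(resp))
    if raw_url is None:
        return None
    with urllib.request.urlopen(urllib.request.Request(raw_url, headers=headers), timeout=60) as resp:
        return resp.read()
```

Usage would be something like `fetch_closest("nuclearweaponarchive.org", my_user_agent)`, where `my_user_agent` is whatever identifying string you choose. Keeping the lookup and URL rewriting as pure functions makes it easy to swap in a different HTTP client or add retries later.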

  • Yes, there are documents and third party projects indicating that it has a free public API, but I haven't been able to get it to work. I presume that a paid API would have better availability and the possibility of support.

    I just tried waybackpy and I'm getting errors with it too when I try to reproduce their basic demo operation:

      >>> from waybackpy import WaybackMachineSaveAPI
      >>> url = "https://nuclearweaponarchive.org"
      >>> user_agent = "Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0"
      >>> save_api = WaybackMachineSaveAPI(url, user_agent)
      >>> save_api.save()
      Traceback (most recent call last):
        File "<python-input-4>", line 1, in <module>
          save_api.save()
          ~~~~~~~~~~~~~^^
        File "/Users/xxx/nuclearweapons-archive/venv/lib/python3.13/site-packages/waybackpy/save_api.py", line 210, in save
          self.get_save_request_headers()
          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
        File "/Users/xxx/nuclearweapons-archive/venv/lib/python3.13/site-packages/waybackpy/save_api.py", line 99, in get_save_request_headers
          raise TooManyRequestsError(
          ...<4 lines>...
          )
      waybackpy.exceptions.TooManyRequestsError: Can not save 'https://nuclearweaponarchive.org'. Save request refused by the server. Save Page Now limits saving 15 URLs per minutes. Try waiting for 5 minutes and then try again.

    • Reach out to patron services at support @ archive dot org. Also, your API limits will be higher if you authenticate your requests with the API keys from your IA account instead of making anonymous requests.
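Putting the two suggestions in this subthread together (authenticate, and back off when the "15 URLs per minute" limit from the traceback kicks in), a request to Save Page Now might be sketched like this. The endpoint, the `Accept: application/json` header, and the `Authorization: LOW access:secret` scheme are my understanding of the public SPN2 documentation, not something confirmed here, so treat them as assumptions. The function names and the key placeholders are hypothetical. Only HTTP status errors are retried; a refused connection raises `URLError` and is passed straight through.

```python
from __future__ import annotations

import time
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable

# Save Page Now 2 endpoint and auth scheme as I understand the public SPN2 docs;
# treat both as assumptions and check them against the current documentation.
SAVE_ENDPOINT = "https://web.archive.org/save"
RETRYABLE = frozenset({429, 502, 503, 504})


def build_save_request(url: str, access_key: str, secret_key: str) -> urllib.request.Request:
    """Build an authenticated SPN request using an account's S3-style key pair."""
    body = urllib.parse.urlencode({"url": url}).encode("ascii")
    return urllib.request.Request(
        SAVE_ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Accept": "application/json",
            "Authorization": f"LOW {access_key}:{secret_key}",
        },
    )


def send_with_backoff(
    request: urllib.request.Request,
    opener: Callable[[urllib.request.Request], bytes],
    max_attempts: int = 4,
    base_delay: float = 300.0,
    retry_on: frozenset[int] = RETRYABLE,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Send `request`, sleeping and retrying on rate-limit or transient 5xx errors.

    The first wait is five minutes, matching the advice in the rate-limit
    message, and each later wait doubles. Other HTTP errors, and the last
    failure once attempts run out, are re-raised to the caller."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return opener(request)
        except urllib.error.HTTPError as err:
            if err.code not in retry_on or attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable: the loop either returns or raises")


def urlopen_bytes(request: urllib.request.Request) -> bytes:
    """Default opener: perform the HTTP request and return the response body."""
    with urllib.request.urlopen(request, timeout=120) as resp:
        return resp.read()
```

A call would look like `send_with_backoff(build_save_request("https://nuclearweaponarchive.org", ACCESS, SECRET), urlopen_bytes)`, with `ACCESS` and `SECRET` standing in for your own keys. Injecting `opener` and `sleep` keeps the retry policy testable without touching the network.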

I wish there were some kind of file search for the Wayback Machine. Like "list all .S3M files on members.aol.com before 1998". It would've made looking for obscure nostalgia much easier.
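The CDX server API can approximate part of this, with caveats: it only knows about URLs that were actually crawled, and the `filter` regex is matched against fields of each capture record (here the original URL), not against file contents. This is a minimal sketch assuming the `matchType`, `to`, `filter`, `fl`, `collapse`, and `output` parameters behave as described in the CDX server README; the helper names are hypothetical. A host as large as members.aol.com may also need the API's pagination or resume-key options, or a narrower prefix, to get complete results.

```python
from __future__ import annotations

import json
import re
import urllib.parse
import urllib.request

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def case_insensitive(text: str) -> str:
    """Regex-escape `text`, matching letters in either case (s3m -> [sS]3[mM])."""
    return "".join(f"[{c.lower()}{c.upper()}]" if c.isalpha() else re.escape(c) for c in text)


def cdx_file_search_url(host: str, extension: str, before_year: int, limit: int = 1000) -> str:
    """Build a CDX query for captures under `host` whose URL ends in `.extension`."""
    params = [
        ("url", f"{host}/"),
        ("matchType", "prefix"),                    # everything under the host
        ("to", str(before_year - 1)),               # inclusive bound: through the prior year
        ("filter", f"original:.*\\.{case_insensitive(extension)}$"),
        ("fl", "timestamp,original,mimetype,statuscode"),
        ("collapse", "urlkey"),                     # collapse adjacent captures of the same URL
        ("output", "json"),
        ("limit", str(limit)),
    ]
    return CDX_ENDPOINT + "?" + urllib.parse.urlencode(params)


def parse_cdx_json(payload: str | bytes) -> list[dict[str, str]]:
    """output=json returns a list of rows whose first row holds the field names."""
    rows = json.loads(payload)
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def search(host: str, extension: str, before_year: int, user_agent: str) -> list[dict[str, str]]:
    """Network call: run the query and return one dict per matching capture."""
    req = urllib.request.Request(cdx_file_search_url(host, extension, before_year),
                                 headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=120) as resp:
        return parse_cdx_json(resp.read())
```

For the example in the comment, the call would be `search("members.aol.com", "s3m", 1998, my_user_agent)`. Each returned dict carries a timestamp and original URL, which together form a playback URL of the form `https://web.archive.org/web/<timestamp>/<original>`.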