← Back to context

Comment by INTPenis

8 hours ago

Lazy has nothing to do with it, codeberg simply doesn't work.

Most of my friends who use codeberg are staunch cloudflare-opponents, but cloudflare is what keeps Gitlab alive. Fact of life is that they're being attacked non-stop, and need some sort of DDoS filter.

Codeberg has that anubis thing now I guess? But they still have downtime, and the worst thing ever for me as a developer is having the urge to code and not being able to access my remote. That is what murders the impression of a product like codeberg.

Sorry, just being frank. I want all competitors to large monopolies to succeed, but I also want to be able to do my job/passion.

Maybe I'm too old school, but both GitHub and Codeberg for me are asyncronous "I want to send/share the code somehow", not "my active workspace I require to do work". But reading

> the worst thing ever for me as a developer is having the urge to code and not being able to access my remote.

Makes it seem like GitHub/Codeberg has to be online for you to be able to code, is that really the case? If so, how does that happen, you only edit code directly in the GitHub web UI or how does one end up in that situation?

  • For me it's a soft block rather than a hard block. I use multiple computers so when I switch to the other one I usually do a git pull, and after every commit I do a push. If that gets interrupted, then I have resort to things like rsyncing over from the other system, but more than once I've lost work that way. I'm strongly considering just standing up a VM and using "just git" and foregoing any UI, but I make use of other features like CI/CD and Releases for distribution, so the VM strategy is still just a bandaid. When the remote is unavailable, it can be very disruptive.

    • > If that gets interrupted, then I have resort to things like rsyncing over from the other system

      I'm guessing you have SSH access between the two? You could just add it as another remote, via SSH, so you can push/pull directly between the two. This is what I do on my home network to sync configs and other things between various machines and OSes, just do `git remote add other-host git+ssh://user@10.55/~/the-repo-path` or whatever, and you can use it as any remote :)

      Bonus tip: you can use local paths as git remote URLs too!

      > but more than once I've lost work that way.

      Huh, how? If you didn't push it earlier, you could just push it later? Some goes for pull? I don't understand how you could lose anything tracked in git, corruption or what happened?

      1 reply →

    • > just standing up a VM and using "just git"

      That's what I do. Control your entire world yourself.

    • If you can rsync from the other system, and likely have an SSH connection between them, why don't you just add it as an additional remote and git pull from it directly?

      5 replies →

  • For some projects, the issue tracker is a pretty integral part of the documentation. Sure, you can host your own issue tracker somewhere, but that's still shifting a center point somewhere, in a theoretically decentralized system. I've frequently wished the issue tracker was part of the repository. Also -- love them or hate them -- LLMs would probably love that too.

  • My main exposure to Codeberg is Zig and it has an issue tracker there and I pull in changes from it.

    For how infrequent I interface with Codeberg I have to say that my experience has been pretty terrible when it comes to availability.

    So I guess the answer is: the availability is bad enough that even infrequent interactions with it are a problem.

  • > Makes it seem like GitHub/Codeberg has to be online for you to be able to code, is that really the case?

    I can understand that work with other active contributors, but I agree with you that it is a daft state of affairs for a solo or mostly-solo project.

    Though if you have your repo online even away from the big places, it will get hit by the scrapers and you will end up with admin to do because of that, even if it doesn't block your normal workflow because your main remote is not public.

  • You’re right this is the proper way to use git. And I encourage developers to use their own cloud storage (or remote volume) for their primary remote.

    Even with the best habits, there will be the few times a month where you forgot to push everything up and you’re blocked from work.

    Codeberg needs to meet the highest ability levels for it to be viable.

  • I was shaking my head in disbelief when reading that part too. I mean, git's whole raison d'etre, back when it was introduced, was that you do not need online access to the repo server most of the time.

    • It's getting even worse if you read the thread about Claude going down the other day. People were having mini panic attacks.

    • > I mean, git's whole raison d'etre, back when it was introduced, was that you do not need online access to the repo server most of the time.

      So what ? That's not how most people prefer to use it.

    • > git's whole raison d'etre […] was that you do not need online access to the repo server most of the time

      Not really. The point of git was to make Linus' job of collating, reviewing, and merging, work from a disparate team of teams much less arduous. It just happens that many of the patterns needed for that also mean making remote temporarily disconnected remote repositories work well.

      2 replies →

I've had the same experience.

Philosophically I think it's terrible that Cloudflare has become a middleman in a huge and important swath of the internet. As a user, it largely makes my life much worse. It limits my browser, my ability to protect myself via VPNs, etc, and I am just browsing normally, not attacking anything. Pragmatically though, as a webmaster/admin/whatever you want to call it nowadays, Cloudflare is basically a necessity. I've started putting things behind it because if I don't, 99%+ of my traffic is bots, and often bots clearly scanning for vulnerabilities (I run mostly zero PHP sites, yet my traffic logs are often filled with requests like /admin.php and /wp-admin.php and all the wordpress things, and constant crawls from clearly not search engines that download everything and use robots.txt as a guide of what to crawl rather than what not to crawl. I haven't been DDoSed yet, but I've had images and PDFs and things downloaded so many times by these things that it costs me money. For some things where I or my family are the only legitimate users, I can just firewall-cmd all IPs except my own, but even then it's maintenance work I don't want to have to do.

I've tried many of the alternatives, and they often fail even on legitimate usecases. I've been blocked more by the alternatives than I have by Cloudflare, especially that one that does a proof of work. It works about 80% of the time, but that 20% is really, really annoying to the point that when I see that scren pop up I just browse away.

It's really a disheartening state we find ourselves in. I don't think my principles/values have been tested more in the real world than the last few years.

  • Either I am very lucky or what I am doing has zero value to bots, because I've been running servers online for at least 15 years, and never had any issue that couldn't be solved with basic security hygiene. I use cloudflare as my DNS for some servers, but I always disable any of their paid features. To me they could go out of business tomorrow and my servers would be chugging along just fine.

    • Sometime it is not security , it could be just bandwidth or CPU.

      I have website small enough not to attract too many bot, but sometime, something very innocent can bring my website down.

      For example, I put a php ical viewer.. and some crawler start loading the calendar page, taking up all the cpu cycle.

      1 reply →

  • > and use robots.txt as a guide of what to crawl rather than what not to crawl

    Mental note, make sure my robots.txt files contain a few references to slowly returning pages full of almost nonsense that link back to each other endlessly…

    Not complete nonsense, that would be reasonably easy to detect and ignore. Perhaps repeats of your other content with every 5th word swapped with a random one from elsewhere in the content, every 4th word randomly misspelt, every seventh word reversed, every seventh sentence reversed, add a random sprinkling of famous names (Sir John Major, Arc de Triomphe, Sarah Jane Smith, Viltvodle VI) that make little sense in context, etc. Not enough change that automatic crap detection sees it as an obvious trap, but more than enough that ingesting data from your site into any model has enough detrimental effect to token weightings to at least undo any beneficial effect it might have had otherwise.

    And when setting traps like this, make sure the response is slow enough that it won't use much bandwidth, and the serving process is very lightweight, and just in case that isn't enough make sure it aborts and errors out if any load metric goes above a given level.

    • Hot damn, this is a great idea! Reminds me fondly of an old project a friend and I built that looks like an SSH prompt or optionally an unauthed telnet listener, which looks and feels enough like a real shell that we would capture some pretty fascinating sessions of people trying to explore our system or load us with malware. Eventually somebody figured it out and then DDoSed the hell out of our stuff and would not stop hassling us. It was a good reminder that yanking people's chains sometimes really pisses them off and can attract attention and grudges that you really don't want. My friend ended up retiring his domain because he got tired of dealing with the special attention. It did allow us to capture some pretty fascinating data though that actually improved our security while it lasted.

    • This is one reason why most crawlers ignore robots.txt now. The other reason is that bandwidth/bots are cheap enough now that they don't need web admins to help them optimize their crawlers

  • While I sympathise, I disagree with your stance. Cloudflare handle a large % of the Internet now because of people putting sites that, as you admitted, don't need to be behind it there.

OP is about Github. Have you seen the Github uptime monitor? It’s at 90% [1] for the last 90 days. I use both Codeberg and Github a lot and Github has, by far, more problems than Codeberg. Sometimes I notice slowdowns on Codeberg, but that’s it.

[1] https://mrshu.github.io/github-statuses/

  • To be fair, Github has several magnitudes higher of users running on it than Codeberg. I'm also a Codeberg user, but I don't think anyone has seen a Forgejo/Gitea instance working at the scale of Github yet.

    • I don't think OP was making a value judgment or anything. It's just weird to say you won't consider Codeberg because you need reliability when Codeberg's uptime is at 100% and Github's is at 90%.

    • To be fair, GitHub has several magnitudes higher of revenue to support that. Including from companies like mine who are paying them good money and get absolutely sub-par service and reliability from them. I'd be happy for Codeberg to take my money for a better service on the core feature set (git hosting, PRs, issues). I can take my CI/CD elsewhere, we self-host runners anyway.

    • I think the idea is that a Forgejo/Gitea instance should never have to work at anywhere near the scale of GitHub. Codeberg provides its Forgejo host as a convenience/community thing but it's not being built to be a central service.

My own git server has been hit severely by scrapers. They're scraping everything. Commits, comparisons between commits, api calls for files, everything.

And pretty much all of them, ByteDance, OpenAI, AWS, Claude, various I couldn't recognize. I basically just had to block all of them to get reasonable performance for a server running on a mini-pc.

I was going to move to codeberg at some point, but they had downtime when I was considering it, I'd rather deal with that myself then.

  • Anyone actually scraping git repos would probably just do a 'git clone'. Crawling git hosts is extremely expensive, as git servers have always been inadvertent crawler traps.

    They generate a URL for every version of every file on every commit and every branch and tag, and if that wasn't enough, n(n+1)/2 git diffs for every file on every commit it has exited on. Even a relatively small git repo with a few hundred files and commit explodes into millions of URLs in the crawl frontier. Server side many of these are very expensive to generate as well so it's really not a fantastic interaction, crawler and git host.

    If you run a web crawler, you need to add git host detection to actively avoid walking into them.

The whole point of git is to be decentralized so there is no reason for you to not have your current version available even when a remote is offline.

  • The original intent of the authors is by now irrelevant. The current "point" of git is that it's the most used version control solution, with good tooling support from third parties. Nothing more. And most people prefer to use it in a centralised fashion.

  • How do people even on hacker news of all places conflate git with a code hosting platform all the time? Codeberg, GitHub or whatever are for tracking issues, running CI, hosting builds, and much more.

    The idea that you shouldn't need a code hosting platform because git is decentralized is so out of place that it is genuinely puzzling how often it pops up.

    • How do people on hacker news keep having reading issues?

      The parent post mentionned: "the worst thing ever for me as a developer is having the urge to code and not being able to access my remote."

      Emphasis one "code", not triaging issues, merging other people's branches, etc.

      Besides there are tools to sync forgejo repositories including PRs and issues.

    • OP didn't conflate them.

      They said they want to be able to rely on their git remote.

      The people responding are saying "nah, an unreliable remote is fine because you can use other remotes" which doesn't address their problem. If Codeberg is unreliable, then why use it at all? Especially for CI, issues, and collab?

      1 reply →

  • It's also trivial to have multiple remotes, I do in most of my repos. When one has issues I just push to the other instead of both.

Probably has happened at some point, but personally, I have not been hit with/experienced downtime of Codeberg yet. The other day however GitHub was down again. I have not used Gitlab for a while, and when I used it, it worked fine, and its CI seems saner than Github's to me, but Gitlab is not the most snappy user experience either.

Well, Codeberg doesn't have all the features I did use of Gitlab, but for my own projects I don't really need them either.

> for me as a developer is having the urge to code and not being able to access my remote

I think that's the moment when you choose to self host your whatever git wrapper. It really isn't that complicated to do and even allows for some fun (as in cheap and productive) setups where your forge is on your local network or really close to your region and you (maybe) only mirror or backup to a bigger system like Codeberg/GitHub.

In our case, we also use that as an opportunity to mirror OCI/package repositories for dependencies we use in our apps and during development so not only builds are faster but also we don't abuse free web endpoints with our CI/CD requests.

I agree. I switched to Codeberg but switched back after a few months. Funny enough, I found there to be more unreported downtime on Codeberg than GitHub.

> Lazy has nothing to do with it, codeberg simply doesn't work.

Been working on it for months now, it does work, lol.

I find irony in that Git was made to get rid of central repos, and then we re-introduce them.

  • That is what we have been doing for quite some time now, from what I gathered. Every time I see something becoming popular, I am like "Hmm, I've seen this before", and I really have. They just gave it a fancier name with a fancier logo and did some marketing and there you go, old is new.