Comment by simonw
5 days ago
S3: "Block Public Access is now enabled by default on new buckets."
On the one hand, this is obviously the right decision. The number of giant data breaches caused by incorrectly configured S3 buckets is enormous.
But... every year or so I find myself wanting to create an S3 bucket with public read access so I can serve files out of it. And every time I need to do that I find something has changed, my old recipe doesn't work any more, and I have to figure it out again from scratch!
The thing to keep in mind with the "Block Public Access" setting is that it's a redundancy built in to save people from making really big mistakes.
Even if you have a terrible and permissive bucket policy or ACLs (legacy but still around) configured for the S3 bucket, if you have Block Public Access turned on - it won't matter. It still won't allow public access to the objects within.
If you turn it off but you have a well scoped and ironclad bucket policy - you're still good! The bucket policy will dictate who, if anyone, has access. Of course, you have to make sure nobody inadvertently modifies that bucket policy over time, or adds an IAM role with access, or modifies the trust policy for an existing IAM role that has access, and so on.
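If it helps anyone reassembling that recipe, here's roughly the two-step dance in boto3 as of this writing: deliberately relax Block Public Access, then attach a tightly scoped policy. The bucket name is hypothetical; treat it as a sketch, not gospel:

    import json
    import boto3  # assumes AWS credentials are already configured

    s3 = boto3.client("s3")
    bucket = "my-public-assets"  # hypothetical bucket name

    # Deliberately relax Block Public Access so a bucket policy *can* grant public reads.
    s3.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,     # still refuse public ACLs
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": False,  # allow a public bucket policy
            "RestrictPublicBuckets": False,
        },
    )

    # Tightly scoped policy: anonymous GetObject only, nothing else.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PublicReadOnly",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }],
    }
    s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))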
I think this is the key of why I find it confusing: I need a very clear diagram showing which rules override which other rules.
My understanding is that there isn't actually any "overriding" in the sense of two rules conflicting and one of them having to "win" and take effect. I think it's more that an enabled rule is always in effect, but it might overlap with another rule, in which case removing one of them still won't remove the restrictions on the area of overlap. It's possible I'm reading too much into your choice of words, but it does sound like there's a chance the confusion stems from an incorrect assumption about how the various permissions interact.
That being said, there's certainly a lot more that could go into making a system like that easier for developers. One thing that springs to mind is tooling that can describe which rules are currently in effect that limit (or grant, depending on the model) permissions for something. That would make it clearer when overlapping rules affect the permissions of something, which in turn would make it much clearer why something is still not accessible from a given context despite one of the rules being removed.
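Some of that inspection tooling half-exists already. A rough boto3 sketch of asking AWS which barriers are currently up for a bucket (note that account-level Block Public Access is separate, via the s3control API):

    import boto3

    s3 = boto3.client("s3")
    bucket = "some-bucket"  # hypothetical

    # Which Block Public Access flags are set on this bucket?
    bpa = s3.get_public_access_block(Bucket=bucket)
    print(bpa["PublicAccessBlockConfiguration"])

    # Does AWS itself consider the bucket policy "public"?
    status = s3.get_bucket_policy_status(Bucket=bucket)
    print("Policy is public:", status["PolicyStatus"]["IsPublic"])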
2 replies →
They don't really override each other; they act like stacked barriers - like a garage door in front of a car, whether or not the car itself is locked. Access is granted only if every relevant layer allows it.
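A toy model of that shape - purely the "every layer must allow it" idea, not real AWS evaluation logic:

    def public_read_allowed(bpa_enabled: bool, policy_grants_public: bool,
                            acl_grants_public: bool) -> bool:
        """Toy model: no layer 'overrides' another. Block Public Access is an
        unconditional barrier; removing one grant doesn't help if none remain."""
        if bpa_enabled:
            return False  # the garage door is down, nothing else matters
        return policy_grants_public or acl_grants_public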
This sort of thing drives me nuts in interviews, when people are like, are you familiar with such-and-such technology?
Yeah, what month?
If you're aware of changes, then explain that there were changes over time. That's it.
You seem to be lacking the experience of what actually happens in interviews.
You say this, someone challenges you, now you're on the defensive during an interview and everyone has a bad taste in their mouth. Yeah, that's how it goes.
3 replies →
I just stick CloudFront in front of those buckets. You don't need to expose the bucket at all then and can point it at a canonical hostname in your DNS.
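For anyone curious, the modern shape of that setup is an Origin Access Control: the bucket stays fully private (Block Public Access stays on) and only your distribution can read from it. A sketch of the bucket-policy half, with hypothetical IDs:

    import json
    import boto3

    s3 = boto3.client("s3")
    bucket = "my-site-origin"  # hypothetical
    distribution_arn = "arn:aws:cloudfront::111122223333:distribution/EDFDVBD6EXAMPLE"  # hypothetical

    # With Block Public Access left ON, grant read access only to the
    # CloudFront service principal, scoped to this one distribution.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "cloudfront.amazonaws.com"},
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
            "Condition": {"StringEquals": {"AWS:SourceArn": distribution_arn}},
        }],
    }
    s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))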
That’s definitely the “correct” way of doing things if you’re writing infra professionally. But I do also get that more casual users might prefer not to incur the additional cost or complexity of having CloudFront in front. Though at that point, one could reasonably ask if S3 is the right choice for casual users.
S3 + CloudFront is also incredibly popular, so you can find recipes for automating it in any technology you want: Terraform, Ansible, plain bash scripts, CloudFormation (god forbid).
22 replies →
I'd argue putting CloudFront on top of S3 is less complex than getting the permissions and static sharing setup right on S3 itself.
1 reply →
It's actually incredibly cheap. I think our software distribution costs, in the account I run, are around $2.00 a month. That's pushing out several thousand MSI packages a day.
2 replies →
>S3 is the right choice for casual users.
It's so simple for storing and serving a static website.
Are there good and cheap alternatives?
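For what "simple" means here: it's the S3 static website endpoint, which is HTTP-only and requires a publicly readable bucket. A minimal sketch, bucket name hypothetical:

    import boto3

    s3 = boto3.client("s3")
    bucket = "my-static-site"  # hypothetical; must allow public reads for this mode

    # Turn on the (non-HTTPS) S3 website endpoint with index/error documents.
    s3.put_bucket_website(
        Bucket=bucket,
        WebsiteConfiguration={
            "IndexDocument": {"Suffix": "index.html"},
            "ErrorDocument": {"Key": "404.html"},
        },
    )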
3 replies →
For the sake of understanding, can you explain why putting CloudFront in front of the buckets helps?
CloudFront lets you map your S3 bucket to both:
- signed URLs, in case you want session-based file downloads (sketch below)
- default public files, e.g. for a static site
You can also map a domain (or sub-domain) to CloudFront with a CNAME record and serve the files via your own domain.
CloudFront distributions are also a CDN, so files are served from an edge location close to the user, which speeds up your site.
For low to mid-range traffic, CloudFront with S3 is cheaper because CloudFront's network egress costs less. For large amounts of traffic, CloudFront costs can balloon very fast - but in those scenarios S3 costs are prohibitive too!
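For the signed-URL case mentioned above, here's a hedged sketch using botocore's CloudFrontSigner; the key pair ID, key path, and domain are all hypothetical:

    from datetime import datetime, timedelta

    from botocore.signers import CloudFrontSigner
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import padding

    def rsa_signer(message: bytes) -> bytes:
        # Sign with the private key registered to your CloudFront key group.
        with open("cloudfront_private_key.pem", "rb") as f:
            key = serialization.load_pem_private_key(f.read(), password=None)
        return key.sign(message, padding.PKCS1v15(), hashes.SHA1())

    signer = CloudFrontSigner("K2JCJMDEHXQW5F", rsa_signer)  # hypothetical key pair ID
    url = signer.generate_presigned_url(
        "https://d111111abcdef8.cloudfront.net/downloads/report.pdf",
        date_less_than=datetime.utcnow() + timedelta(hours=1),
    )
    print(url)  # expiring link for a session-based download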
Not always that simple - for example, if you want to automatically load /foo/index.html when the browser requests /foo/, you'll need to either use the static website hosting feature of S3 (the bucket can't be private) or set up some Lambda@Edge or similar fiddly shenanigans.
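The Lambda@Edge flavour of those shenanigans is small, at least. A sketch of a Python origin-request handler that appends index.html so the bucket can stay private behind CloudFront:

    def handler(event, context):
        # CloudFront origin-request event: rewrite "directory" URIs so S3
        # receives a real object key instead of a trailing slash.
        request = event["Records"][0]["cf"]["request"]
        uri = request["uri"]
        if uri.endswith("/"):
            request["uri"] = uri + "index.html"
        elif "." not in uri.rsplit("/", 1)[-1]:
            request["uri"] = uri + "/index.html"
        return request  # pass the rewritten request on to the S3 origin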
I’m getting deja vu, didn’t they already do this like 10 years ago because people kept leaving their buckets wide open?
This is exactly what I use LLMs for. To just read the docs for me and pull out the base level demo code that's buried in all the AWS documentation.
Once I have that I can also ask it for the custom tweaks I need.
Back when GPT4 was the new hotness, I dumped the markdown text from the Azure documentation GitHub repo into a vector index and wrapped a chatbot around it. That way, I got answers based on the latest documentation instead of a year-old model's fuzzy memory.
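Roughly that shape, as a minimal sketch - the embedding model and naive chunking here are placeholders, not what I actually ran:

    from pathlib import Path

    import numpy as np
    from sentence_transformers import SentenceTransformer  # placeholder embedding model

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Chunk the docs repo's markdown files and embed each chunk.
    chunks = []
    for path in Path("azure-docs").rglob("*.md"):  # hypothetical local clone
        text = path.read_text(errors="ignore")
        chunks += [text[i:i + 1000] for i in range(0, len(text), 1000)]
    embeddings = model.encode(chunks, normalize_embeddings=True)

    def top_k(question: str, k: int = 5) -> list[str]:
        # Cosine similarity reduces to a dot product on normalized vectors.
        q = model.encode([question], normalize_embeddings=True)[0]
        scores = embeddings @ q
        return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

    # Stuff the best chunks into the LLM prompt as grounding context.
    context = "\n---\n".join(top_k("How do I enable Block Public Access?"))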
I now have the daunting challenge of deploying an Azure Kubernetes cluster with... shudder... Windows Server containers on top. There's a mile-long list of deprecations and missing features that were fixed just "last week" (or whatever). That is just too much work to keep up with for mere humans.
I'm thinking of doing the same kind of customised chatbot but with a scheduled daily script that pulls the latest doco commits, and the Azure blogs, and the open GitHub issue tickets in the relevant projects and dumps all of that directly into the chat context.
I'm going to roll up my sleeves next week and actually do that.
Then, then, I'm going to ask the wizard in the machine how to make this madness work.
Pray for me.
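If anyone wants to attempt the same thing, the daily refresh step needn't be much more than this kind of sketch (the repo path and issues URL are hypothetical stand-ins):

    import subprocess
    from pathlib import Path

    import requests

    # Pull the latest docs commits (assumes the repo is already cloned).
    subprocess.run(["git", "-C", "azure-docs", "pull", "--ff-only"], check=True)

    # Grab the open issues from a relevant project via the GitHub API.
    issues = requests.get(
        "https://api.github.com/repos/Azure/AKS/issues",
        params={"state": "open", "per_page": 100},
        timeout=30,
    ).json()

    # Dump everything into one context file for the chatbot to ingest.
    with open("context.txt", "w") as out:
        for path in Path("azure-docs").rglob("*.md"):
            out.write(path.read_text(errors="ignore") + "\n")
        for issue in issues:
            out.write(f"ISSUE #{issue['number']}: {issue['title']}\n{issue.get('body') or ''}\n")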
I just want a service that does this. Pulls the latest docs into a vector DB with a chat front-end. Not the Windows containers bit.
This could not possibly go wrong...
You're braver than me if you're willing to trust the LLM here - fine if you're ready to properly review all the relevant docs once you have code in hand, but there are some very expensive risks otherwise.
This is LLM as semantic search - so it's way, way easier to start from the basic example code and google to confirm that it's correct than it is to read the docs from scratch and piece together the basic example code yourself. Especially for things like configuration and permissions.
1 reply →
There’s nothing brave in this. It generally works the way it should and even if it doesn’t - you just go back to see what went wrong.
I take code from stack overflow all the time and there’s like a 90% chance it can work. What’s the difference here?
7 replies →
They'll teach you how for $250 and a certification test...
I honestly don't mind that you have to jump through hurdles to make your bucket publicly available, or that it's annoying. That to me seems like a feature, not a bug.
I think the OP's objection is not that hurdles exist but that they get moved every time you try to run the track.
Sure... but last time I needed to jump through those hurdles I lost nearly an hour to them!
I'm still not sure I know how to do it if I need to again.