Cloudflare Flagship

17 hours ago (developers.cloudflare.com)

Never underestimate the power of a zero-network-hop abstraction over f(feature_name, context).

And context can be extremely tailored to your niche: specific inventory, from a specific supplier, for a specific user of a specific B2B client of a specific business model subtype, who should or shouldn’t see certain features on that specific inventory at certain times.

When you can write your own logic, and just run this in a tight loop as easily and performantly as you can use a constant, it makes your business incredibly agile. Think some text might change for some customers? Just write the code to make it configurable, and you get tests and flags for free.

Sadly, that zero-hop setup requires a sophisticated client execution engine, which it doesn’t appear Cloudflare has implemented here. Makes sense for their memory constrained workers, less sense for traditional infrastructure.

Statsig has an approach here that I quite like:

> To be able to do this, Server SDKs hold the entire ruleset of your project in memory - a representation of each gate or experiment in JSON. On client SDKs, we evaluate all of the gates/experiments when you call initialize - on our servers.

https://docs.statsig.com/sdks/how-evaluation-works

You can also roll your own - just sync your rulesets to a few data structures every few seconds in a background thread and atomically swap the reference to them. Then you just need a CRUD interface over the applicability ruleset dimensions.
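
For illustration, a minimal Python sketch of that roll-your-own setup. Everything in it is hypothetical (`RulesetStore`, the `fetch` callable, the flag names). It assumes rules can be expressed as plain predicates over a context dict, and it relies on rebinding a reference being atomic in CPython. A background thread rebuilds the snapshot and swaps it in, so reads never block and never touch the network:

```python
import threading
from typing import Callable, Mapping

# flag name -> predicate over the evaluation context (illustrative shape)
Ruleset = Mapping[str, Callable[[dict], bool]]


class RulesetStore:
    """Holds a ruleset snapshot in memory and refreshes it in the background."""

    def __init__(self, fetch: Callable[[], Ruleset], interval_s: float = 5.0):
        self._fetch = fetch                # hypothetical loader, e.g. a DB query
        self._interval_s = interval_s
        self._rules: Ruleset = fetch()     # load once synchronously at startup
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()

    def refresh(self) -> None:
        new_rules = self._fetch()          # build the new snapshot off to the side
        self._rules = new_rules            # single reference rebind: the swap

    def _run(self) -> None:
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._interval_s):
            try:
                self.refresh()
            except Exception:
                pass                       # keep serving the last good snapshot

    def is_enabled(self, flag: str, context: dict, default: bool = False) -> bool:
        rules = self._rules                # read the reference once per evaluation
        rule = rules.get(flag)
        return rule(context) if rule else default
```

Because each evaluation grabs the reference once, a swap in the middle of a request cannot mix old and new rules. The CRUD interface would sit behind `fetch`.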

Just be careful to have governance on who can play with which would-be constants. Great power and great responsibility and all that!

  • When reading your comment, it just reminds me of how feature flags can be misused as application configuration/customization. An antipattern I have already observed at various organizations.

    For me feature flags go along with trunk based development to enable features in QA settings, but not on PROD yet, for PO/PM testing. Trunk based development allows for fast/easy devops, without complicated branching strategies.

    Application configuration is, for me, part of the application and has the business context for customizing the application accordingly. Not sure if there are specific frameworks/tools out there. But one should clearly distinguish these two.

    • Yes, feature flags are conflated with dynamic configs (or "remote configs"). The difference is crucial, hence why people are talking past each other.

      https://docs.statsig.com/dynamic-config/overview

      https://engineering.atspotify.com/2020/10/spotifys-new-exper...

      https://blog.x.com/engineering/en_us/topics/infrastructure/2...

      Or in code:

        # Feature flag: variant gate for rollout.
        # Evaluate the gate once so every branch sees the same result.
        gate = sdk.check_gate(user, "new_checkout_flow")
        if gate == 'open':
            render_new_checkout()
        elif gate == 'warning':
            render_warning_checkout()
        else:
            render_old_checkout()
      
        # Dynamic config — structured values for tuning behavior
        config = sdk.get_config(user, "checkout_settings")
        timeout_ms   = config.get("timeout_ms", 5000)
        max_items    = config.get("max_items", 50)
        allowed_tlds = config.get("allowed_tlds", [".com", ".org"])
      

      Feature flags ought to be temporary — you ship the feature, ramp it to 100%, then delete the flag. Dynamic configs are more ok as permanent knobs — rate limits, feature parameters, business rules you expect a PM to keep tuning. Think gating a new feature vs changing text copy on a website. Former has much higher chance of breakage because it's meant to change control flow, latter is relatively safe.

      In practice, the forcing function for separating the two concepts is an experimentation platform: human control of feature flags gets shared with a new automated allocation system that manipulates dynamic config under the hood. That's how Statsig built their system and, in part, why they could sell for a billion. Whereas companies that ignored the difference, like LaunchDarkly, struggled outside of feature flags.

    • > it just reminds me of how feature flags can be misused as application configuration/customization. An antipattern I have already observed at various organizations.

      Feature flags are perfect for configuration and customization. Why using them for this purpose is 'misuse' is beyond me, and I've heard this claim from multiple people. They're literally configuration: a feature with a flag to turn it on or off, or to give the flag a value. Where's the misuse? Is it a problem that I'm not running experiments when switching over from Redis to Valkey or whatever?

  • Which is not hard to do (it is a modulo over a Mersenne Twister or something similar), but in my recent gigs just Flipper with an optional "state of the flags table as of now" endpoint was more than enough. That modulo+random combo required tools like LaunchDarkly to ship SDKs in several languages, and the ones I had to work with were just a plain horrible fit for their language of choice. But because the evaluation was relegated to the edge, the whole system got way more complex than desirable. In actuality, I think a refetch of the current flags table "for this customer" every so often is just fine, and way less of a nuisance.

    So glad Flipper exists and I don't have to deal with this stuff anymore.

  • > Sadly, that zero-hop setup requires a sophisticated client execution engine, which it doesn’t appear Cloudflare has implemented here.

    It doesn't have to be sophisticated and they don't need to implement it themselves. They piggy-back on OpenFeature where the client libraries have a simple targeting rule evaluation engine integrated.

  • Statsig has worked great at my work, really polished and rich feature set. Their tooling to identify unused flags as candidates for removal is neat.

    The per-seat billing we have in our agreement is a bit rough but it's workable.

    • Statsig is a half-baked product bought out by OpenAI for data harvesting. We already reported 2 documentation issues and 1 critical technical issue, and we're barely using it.

  • > Sadly, that zero-hop setup requires a sophisticated client execution engine, which it doesn’t appear Cloudflare has implemented here. Makes sense for their memory constrained workers, less sense for traditional infrastructure.

    wait what? what kind of logic do you need to do that CF Workers can't do?

Gold-plated booleans-as-a-service

  • I’ve seen whole teams set up at companies fail to provide these booleans-as-a-service well. There are whole companies like LaunchDarkly for them.

    If you boil it down to this, you may as well boil down every service that exists to bits-as-a-service.

    Turns out there's legitimate business value in these things, and complexity in delivering them.

  • I don't mind it. I don't want to keep track of thousands of feature flags in my DB, have to create an admin dash, etc.

    You could call any SaaS tool "excel-as-a-service" and it would hold the same power as your comment.

Looking at the docs for their JS SDK, they have this warning:

> The client provider requires an API token to fetch flag values. This token is not scoped to a single app, so anyone with the token can evaluate flags across all apps in your account. Use the client provider with caution in public-facing applications.

https://developers.cloudflare.com/flagship/sdk/client-provid...

Can anyone clarify... why the client SDK, designed to be deployed to browsers, requires caution? Does this mean that any client could send requests with a new targetingKey and observe other users' flags?

While flags probably shouldn't be critical information, this seems like an interesting design choice.

This is nice, but I’m still waiting for this to be delivered (which ironically is probably using Flagship):

https://blog.cloudflare.com/enterprise-grade-features-for-al...

—-

I don’t believe a single enterprise-only feature has made its way to lower-tier (paid) accounts yet.

I’m most interested in:

https://developers.cloudflare.com/speed/optimization/content...

I am a mere mortal when it comes to understanding the technicalities, but I know I find it relatively easy to use Cloudflare and all I want to say is keep up the good work.

Cloudflare are winning these days, they’re just lacking good fine grained permissions. You still have to make an entirely separate account for prod, which messes up SSO since one domain can only be bound to one account.

I've never understood feature flags. How are they fundamentally different to a Boolean in a database?

  • The flags (whether they be booleans, strings, numbers, or anything else) are the trivial part. It's the targeting and rollout rules (i.e. who gets to see which flags), and the requirements for extremely fast and consistent evaluation of these rules, that can get surprisingly complicated fast. Folks who have rolled their own usually find that product management or marketing or sales wants to target using more complex rules, and the problem balloons.

    I agree that the problem is not particularly hard in the grand scheme of things, but it is actually quite big, meaning it requires a lot of features that aren't obvious at first glance.

    Edit: Thought of another analogy that may help explain the complexity. At their heart, feature flags are really a permissioning system: only certain users get access to certain pieces of functionality. Anyone who has ever dealt with permission systems knows how complex they can be: group membership, including hierarchical groups, roles, ACLs, etc. All of those things are really analogous (actually, a subset really) to the various types of targeting rules that can be used in a feature flags system.
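
    To make the ACL analogy concrete, here is a rough Python sketch of ordered targeting rules. The names (`Condition`, `Rule`, `Flag`) and the three operators are made up for illustration and do not mirror any vendor's schema. The first rule whose conditions all match decides the value, otherwise the default applies:

    ```python
    from dataclasses import dataclass, field
    from typing import Any, Callable

    # Operators a targeting condition may use (an illustrative subset).
    OPERATORS: dict[str, Callable[[Any, Any], bool]] = {
        "eq": lambda actual, expected: actual == expected,
        "in": lambda actual, expected: actual in expected,
        "gte": lambda actual, expected: actual is not None and actual >= expected,
    }


    @dataclass
    class Condition:
        attribute: str
        op: str
        value: Any

        def matches(self, context: dict) -> bool:
            return OPERATORS[self.op](context.get(self.attribute), self.value)


    @dataclass
    class Rule:
        conditions: list[Condition]   # all must match (logical AND)
        serve: Any                    # value returned when the rule matches


    @dataclass
    class Flag:
        default: Any
        rules: list[Rule] = field(default_factory=list)

        def evaluate(self, context: dict) -> Any:
            # First matching rule wins, like an ordered ACL.
            for rule in self.rules:
                if all(c.matches(context) for c in rule.conditions):
                    return rule.serve
            return self.default
    ```

    Hierarchical groups, percentage splits, and rule priorities would all bolt onto this, which is roughly where the "quite big" part comes from.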

  • Percentage rollouts, RBAC, audit history, A/B testing, multivariate - it gets complex quick. That boolean turns into a whole system you have to maintain and operate.

  • They're not always booleans - for example, we often see feature flags being used for A/B rollouts.

    Cloudflare themselves even use them internally as such, by shipping new features/builds to their free customers first, and then progressively larger customers after a settling period.

    Feature flags can also be randomly turned on, for a sort of fuzz testing. Don't think of them just as 'new things' - it could be 'changed behavior'.

    I guess you could think of them as a boolean on every client but they're generally not implemented that way.

    • Really any "constant". Failure thresholds, timeouts, API versions or endpoints, LLM model id

  • This is just an implementation detail, a feature flag can very well be implemented with a Boolean in a database.

    To me the main appeal of feature flags is that they allow you to work in the main branch on large features that often require months and many commits to finish. This, at least to me, results in a more lightweight and more iterative development process. This contrasts with maintaining a separate branch, perhaps with a separate deployment target, for large in-development features.

  • efficient delivery of the single bit (and especially the flip event) to the desired audience is the use case. the actual payload almost doesn't matter as long as it's reasonably small.

  • These are booleans with a bit more context. They may only apply to a particular geographic area, and may have dependencies: if we turn off flag X, we automatically turn off flag Y.
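
    A small sketch of the dependency part, with hypothetical names throughout (`effective_state`, the `requires` map). A flag counts as on only if it is switched on itself and every flag it depends on is effectively on, so turning off X implicitly turns off Y:

    ```python
    def effective_state(flag: str, states: dict[str, bool],
                        requires: dict[str, list[str]],
                        _seen: frozenset = frozenset()) -> bool:
        """Return True only if the flag and all of its prerequisites are on."""
        if flag in _seen:
            raise ValueError(f"dependency cycle at {flag!r}")
        if not states.get(flag, False):
            return False
        # Recurse into prerequisites, tracking the path to detect cycles.
        return all(effective_state(dep, states, requires, _seen | {flag})
                   for dep in requires.get(flag, []))
    ```

    With `requires = {"Y": ["X"]}`, flipping X off makes Y evaluate to off without touching Y's own stored value, so turning X back on restores Y.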

  • It's the tooling around them.

    How do you set a boolean to only return true for queries to 5% of the fleet? And which 5% of the fleet? And then ramp up on a predefined cadence? Or how about returning true only for customers in the preview group for the feature? Does the database return false automatically if the 5% of the fleet where it's true start crashing or throwing exceptions? Does it hook into your observability stack?

    Fundamentally, sure, you could just implement it as a boolean in the database. It's the integration and tooling that works with the rest of your stack that makes it worthy of the name "feature flag".
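
    A hedged sketch of what answering a few of those questions might look like in Python. All names here are invented (`bucket`, `ramp_percent`, `is_enabled`), SHA-256 is just one reasonable choice of hash, and `error_rate` is assumed to be fed in from whatever observability stack you have:

    ```python
    import hashlib
    from datetime import datetime, timedelta

    # (time since rollout start, percent enabled), sorted by time
    Schedule = list[tuple[timedelta, float]]


    def bucket(flag: str, unit_id: str) -> int:
        """Map (flag, unit) to a stable bucket in [0, 10000), i.e. basis points."""
        digest = hashlib.sha256(f"{flag}:{unit_id}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % 10_000


    def ramp_percent(started: datetime, now: datetime, schedule: Schedule) -> float:
        """Percent for the latest schedule step already reached (0 before the start)."""
        percent = 0.0
        for offset, step_percent in schedule:
            if now - started >= offset:
                percent = step_percent
        return percent


    def is_enabled(flag: str, unit_id: str, *, started: datetime, now: datetime,
                   schedule: Schedule, preview_group: frozenset = frozenset(),
                   error_rate: float = 0.0, max_error_rate: float = 0.05) -> bool:
        if error_rate > max_error_rate:
            return False                  # kill switch: observed errors too high
        if unit_id in preview_group:
            return True                   # preview customers always get the feature
        # The same unit always lands in the same bucket, so ramping up only adds units.
        return bucket(flag, unit_id) < ramp_percent(started, now, schedule) * 100
    ```

    Hashing on the flag name plus the unit id answers "which 5%": it is a stable 5% per flag, not the same hosts for every flag.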

    • Thanks, good reply. I can see the argument for sure.

      I guess I like boring software too much to reach for a dependency but I do see how the tooling matters here.

  • That’s all it is. This only exists to lock you into cloudflare even more.

    • Then why did they deliberately make it compatible with Open Feature, explicitly making it easy to swap out a different Open Feature provider?

      Oh, that's right, you just spouted a "big company bad" mantra without bothering to read the article. Look, I know saying RTFA goes against the HN guidelines, but the amount of increasingly lazy spew I see from folks (or bots) who haven't bothered to read the article is so tiresome and annoying.

OpenFeature was new to me, neat! Anyone have experience integrating this? https://openfeature.dev

  • I have had a lot of experience with OpenFeature, and have early commits in a few of the client libraries. It's definitely the future of feature flagging, and the ecosystem is really growing.

    Full disclosure, I am the CTO of Flagsmith, and we have seen a clear curve in adoption of OpenFeature over the last few years. It used to be that we were pushing customers to try it out, now they come to us with OpenFeature as a requirement.

    The vendor support is pretty mature now and there is coverage across almost all languages. If you're integrating feature flags into a new service, or looking to migrate from e.g. home-grown to a third party solution, OpenFeature is definitely the way I would recommend going.

  • It’s pretty useful. We used it at a previous company. We built a custom backend, but used the spec and SDKs.

    It took like 2 weeks to build a full custom backend. SDKs across languages worked flawlessly (okay, we did find one bug, reported it, and it was fixed within the day)

I really like the speed at which Cloudflare is executing toward becoming a critical infrastructure player with all of those new product offerings. That said, not everything needs to be serverless. Their Gen 13 hardware looks impressive, and it’s a pity you can’t rent it by the hour like AWS EC2 Metal instances.

Feature flags are often ridiculously over engineered.

Check a config, DB value, or env var to dynamically go one path or the other.

That’s all, you must either have a small feature or refactor the code to easily switch at a high level.

If you are not able to do so easily, then yes, complex feature flags implementations might help you, to coordinate feature activation between micro services.

Or if you have many features then a dashboard might be useful.

But I would argue that both are serious indicators that you should avoid feature flags. They are better for local and temporary changes; otherwise the complexity compounds and it becomes hard to manage and maintain.

  • There's an argument to be made for being able to turn on a feature for a certain segment (e.g. low-revenue users in Italy) so you can see what the business/performance impact is.

    Of course you don't want users to lose the feature once they exceed your revenue threshold or cross the border, so you'll need to implement some kind of tracking. Your analytics and error tracking also need to communicate with the feature flag service.

    Definitely not rocket science but more complex than an environment variable.

    • Enterprise software is full of this kind of stuff. Half our customers are on year-old UIs because they don't want to re-up contracts yet.

      That is, features are contractual and when you've only got 50 customers but they're all paying high 6 figures does anyone really care about feature flag complexity?

    • > There's an argument to be made for being able to turn on a feature for a certain segment

      Not just an argument, it's the entire point of feature flags for UI experiments, which is an essential practice. Dynamic adjustment of the cohorts (or even just an immediate kill switch if it's a disaster) is required.

  • The main thing about feature flags is discipline: create them purposefully, remove them as soon as they don't add value any more. KISS applies.

Funny that my app already uses a custom feature flag solution built on... Cloudflare Workers

I’m always excited when Cloudflare starts offering things that I had to use other providers for because I know it will be solid.

We used Statsig at Function. It started out as 2 of us using it on one product and within 12 months, large amounts of our product copy and rollouts were driven off of it.

Statsig has client-side evals so you can write rules and rollouts based on internal concepts without Statsig’s servers processing a piece of user data. Hoping Cloudflare can build a sophisticated product here so I don’t have to use another product in the future!

  • You use a 3rd party for feature flags? I'm not "roll my own" for everything, but feature flags have not been an issue to roll.

    • There's feature flags then there's staged rollouts gated by multiple variables with statistical analysis

i see @btown's comment below but also just for education about this space:

- anyone have comments/comparisons about launchdarkly vs posthog vs statsig (is it still alive after openai?) vs _____ vs cloudflare flagship?

like a "beginner/intermediate/advanced" progression of what to look out for/what you will want when it comes to feature flags would be highly helpful for me and many others here

A bit tangential but related: these are things I'm never sure I should be shipping on day one with mobile apps (Flutter in particular): Flagships, bug gathering, A/B testing?

I feel a strong inclination to, but it's also way too early, before any real users can prove PMF. I've been using Google stuff but wonder if Flagship and perhaps other Cloudflare offerings can help.

The other side is that again it feels too early for this stuff and I just want to ship something quickly.

The work involved

I'm out of my league on this discussion, but it reminds me of the Configuration Database (CDB) used for most modern aircraft.

More of this please: essential tools for building modern software must be OSS. I'm fine with paying for a hosted version, but just the benefit of learning one tool and being able to use it everywhere (Linux, k8s, Python, etc.) is amazing.

Never understood this. Why follow an over-engineered standard and depend on a 3rd-party API spec and a 3rd-party vendor? If you cannot call home from your service, you have bigger problems. And once you can call home, it is just... a single JSON file.

Has anyone struggled to run their own feature flagging service? After root-causing slow app starts to the equivalent offering from Firebase, I've been cautious about using any off-the-shelf solutions.

  • It's literally a field in your database. I could never fathom why this needs to be an outsourced service never mind an entire company.

    • It can get complicated quickly if you're actually using it in a production system. At my prev enterprise saas company we had feature flags that could be turned on per customer / per environment (dev, staging, prod) with permission + logging model such that our support team could also toggle flags with history of who turned on what. We also had "per user" feature flags for certain test users at companies and had DSL rules to evaluate the features

    • When you start, yes. But then you want segments (how you segment your users), rollout strategies, etc. It will get complicated fast.

I love the direction Cloudflare is taking and the features they're shipping; they are really impressive. And since I use their service on several projects, I can say I am overall very happy with the service.

The only thing is, they are heading strongly into AWS territory, and not in a good way. Finding what I need and what I use has become harder as time goes by. By contrast, Azure (MS), even though it looks crazy complex, lets you find things once you get used to it.

Missing gradual rollout of feature flag changes themselves. Yes, you can do percentage-based rollouts for individual features, but you should still have the ability to canary all changes before they cause an insta-sev.

Anybody and everybody could use a mature LaunchDarkly alternative.

  • According to their page, they are an AI company, so I don’t see why anyone would choose them for feature flags.

This makes perfect sense for Cloudflare.

And I'm sure they can drive down the cost, compared to, say, LaunchDarkly.

Am I the only one worried about Cloudflare becoming too powerful?

We went through this with E-mail: we slept through the period when Google, Microsoft and AWS were growing, and we ended up with them dictating the terms. Today I get 90% of my spam from Google, Microsoft and AWS and they don't care: they can safely ignore spam reports, because at this point they are Too Big to Block.

I have a feeling we are moving towards the same problem with Cloudflare and the web. Tomorrow Cloudflare will start dictating what we can or cannot do and we will not be able to do anything about it. This has already begun: their arbitrary "bot-filtering" for example.

  • Cloudflare is already too powerful; their anti-DDoS solution is just too good. But their serverless products/features don't really build on that, they are just another hosting company.

  • > Am I the only one worried about Cloudflare becoming too powerful?

    No, it gets brought up in every single thread about Cloudflare. And if this wasn't a feature release that people seem to like, the top comment would probably be talking about how Cloudflare is terrible for the internet.

I don’t have experience with the tools Cloudflare has been shipping this year so I can’t speak about the quality, but they have really been pushing out a lot of new products and services, no doubt due to agentic coding.

This is what "Building for the future" looks like post-layoffs, huh?

Can't even ship with app-scoped tokens...

Fix your stupid Turnstile that blocks humans first. You are gatekeeping the whole Internet. Even Anubis works better than the Clownflare garbage.

If anyone is interested, you can implement something like that with a few lines of code on the front end. We expose a function that generates a uniformly-distributed hash that you can use for A/B testing and other uses:

  Q.Data.variant()

https://github.com/Qbix/Q.js/blob/main/src/js/Q.minimal.js#L...

And on the back end, you'd use it like this:

https://github.com/Qbix/Platform/blob/main/platform/classes/...

Essentially, this can support a huge number of "variants" and within each variant you can have N equal segments. That will help you do A/B testing and flipping features on or off.
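
For readers who want the general shape without opening the links, a minimal Python sketch of the idea. This is not the Q.js or Platform code; `segment` and `assign` are hypothetical names and SHA-256 is an assumed choice of hash. Salting with the experiment name keeps a user's segment in one experiment independent of their segment in another:

```python
import hashlib


def segment(experiment: str, user_id: str, segments: int) -> int:
    """Return a stable segment index in [0, segments) for this user and experiment."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % segments


def assign(experiment: str, user_id: str, arms: list[str]) -> str:
    """Pick one of N equally sized arms, e.g. ["control", "treatment"]."""
    return arms[segment(experiment, user_id, len(arms))]
```

The same user always gets the same arm for a given experiment, with no state stored anywhere.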

Feature flags are so ridiculously simple I have never needed to outsource this to someone else.

  • Do your running services receive streaming updates when Flags are toggled? Is your rule-engine evaluated locally?