Comment by btown

15 hours ago

Never underestimate the power of a zero-network-hop abstraction over f(feature_name, context).

And context can be extremely tailored to your niche: specific inventory, from a specific supplier, for a specific user of a specific B2B client of a specific business model subtype, who should or shouldn’t see certain features on that specific inventory at certain times.

When you can write your own logic, and just run this in a tight loop as easily and performantly as you can use a constant, it makes your business incredibly agile. Think some text might change for some customers? Just write the code to make it configurable, and you get tests and flags for free.
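To make the f(feature_name, context) idea concrete, here is a minimal sketch of an in-process evaluator. Everything in it is hypothetical: the `Rule` type, the feature names, and the context keys are invented for illustration and are not any vendor's API. The point is only that evaluation is a dictionary lookup plus a few predicate calls, with no network hop.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping

Context = Mapping[str, Any]


@dataclass(frozen=True)
class Rule:
    """One targeting rule: if matches(context) is true, the feature resolves to value."""
    matches: Callable[[Context], bool]
    value: Any


# Hypothetical in-memory ruleset; names and context keys are made up for this sketch.
RULES: dict[str, list[Rule]] = {
    "show_bulk_discount": [
        Rule(lambda c: c.get("supplier") == "acme" and c.get("client_tier") == "enterprise", True),
        Rule(lambda c: c.get("business_model") == "marketplace", False),
    ],
    "checkout_banner_text": [
        Rule(lambda c: c.get("client_id") == "client-42", "Net-30 terms available"),
    ],
}


def f(feature_name: str, context: Context, default: Any = None) -> Any:
    """Return the value of the first matching rule, or default if none match."""
    for rule in RULES.get(feature_name, ()):
        if rule.matches(context):
            return rule.value
    return default


ctx = {"supplier": "acme", "client_tier": "enterprise", "business_model": "wholesale"}
print(f("show_bulk_discount", ctx, default=False))        # first rule matches
print(f("checkout_banner_text", ctx, default="Welcome"))  # no rule matches, falls back
```

Because `f` is a plain function over in-memory data, it can be called inside a loop and unit-tested like any other code.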

Sadly, that zero-hop setup requires a sophisticated client execution engine, which it doesn’t appear Cloudflare has implemented here. Makes sense for their memory constrained workers, less sense for traditional infrastructure.

Statsig has an approach here that I quite like:

> To be able to do this, Server SDKs hold the entire ruleset of your project in memory - a representation of each gate or experiment in JSON. On client SDKs, we evaluate all of the gates/experiments when you call initialize - on our servers.

https://docs.statsig.com/sdks/how-evaluation-works

You can also roll your own - just sync your rulesets to a few data structures every few seconds in a background thread and atomically swap the reference to them. Then you just need a CRUD interface over the applicability ruleset dimensions.
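A rough sketch of that roll-your-own approach, under some assumptions: `RulesetStore` and the `fetch` callable are hypothetical names, and `fetch` stands in for whatever reads your rulesets (a database query, an HTTP call). The sketch also relies on CPython, where rebinding an attribute is a single reference assignment, so readers see either the old ruleset or the new one, never a half-built one.

```python
import threading
from typing import Any, Callable, Mapping

Ruleset = Mapping[str, Any]


class RulesetStore:
    """Holds the current ruleset in memory and refreshes it in the background."""

    def __init__(self, fetch: Callable[[], Ruleset], interval_s: float = 5.0):
        self._fetch = fetch
        self._interval_s = interval_s
        self._rules: Ruleset = dict(fetch())  # initial load so reads never see an empty store
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()

    def refresh(self) -> None:
        new_rules = dict(self._fetch())  # build the new ruleset fully before publishing it
        self._rules = new_rules          # swap the reference in one assignment

    def _run(self) -> None:
        # Event.wait returns False on timeout, True once stop() has been called.
        while not self._stop.wait(self._interval_s):
            try:
                self.refresh()
            except Exception:
                pass  # keep serving the last good ruleset if a sync fails

    def get(self, name: str, default: Any = None) -> Any:
        rules = self._rules  # take one snapshot reference for the whole read
        return rules.get(name, default)


# Fake source whose answer changes on the second fetch (illustration only).
version = {"n": 0}

def fake_fetch() -> Ruleset:
    version["n"] += 1
    return {"new_checkout": version["n"] >= 2}

store = RulesetStore(fake_fetch, interval_s=5.0)
print(store.get("new_checkout"))  # value from the initial fetch
store.refresh()                   # what the background thread does on each tick
print(store.get("new_checkout"))  # value from the second fetch
```

In a real service you would call `store.start()` once at boot. Swallowing sync errors is a deliberate simplification here; you would likely want to log them and alert on staleness.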

Just be careful to have governance on who can play with which would-be constants. Great power and great responsibility and all that!

Reading your comment just reminds me of how feature flags can be misused as application configuration/customization. It's an antipattern I have already observed at various organizations.

For me feature flags go along with trunk based development to enable features in QA settings, but not on PROD yet, for PO/PM testing. Trunk based development allows for fast/easy devops, without complicated branching strategies.

Application configuration is, for me, part of the application and has the business context for customizing the application accordingly. Not sure if there are specific frameworks/tools out there. But one should clearly distinguish these two.

  • > it just reminds me of how feature flags can be misused as application configuration/customization. It's an antipattern I have already observed at various organizations.

    feature flags are perfect for configuration and customization, why using them for this purpose is 'misuse' is beyond me and I've heard this claim from multiple people. they're literally configuration. feature with a flag to turn it on, off or give the flag a value. where's the misuse? is it a problem I'm not running experiments when switching over redis to valkey or whatever?

    • Feature flags need to be treated as short-lived and experimental otherwise they end up getting abused for everything and make it very difficult to reason about your application.

      If it's config/customization, it should be in code. If it's experimental it can be a flag until it solidifies, and then it needs to get moved to code.

      When I was at Shopify a couple of years ago, they mandated that feature flags be short-lived (2-4 weeks tops, with some exceptions), because flags would otherwise sit in the code for months or never get cleaned up. At that point it's hard to tell whether something is genuinely a "feature flag" or just a normal part of the system.

      Feature flags being flipped in prod was also a major source of incidents, in part because people didn't treat them as experimental and with the associated risk profile of something experimental.

      The only exception where having long-lived flags was useful and required was for operational killswitches (E.g. disable Apple Pay because it's having issues), but that is explicitly not application config.

    • One well-known issue is that when you have a lot of separate feature flags that can interact, you explode the number of test cases you have to cover. For example, if you have three boolean feature flags that can interact in a module that has 100 test cases, you actually have 800 test cases (2^3 = 8 flag combinations × 100) if you are going to test every possible combination of flags. Many teams don't test them all because they "already know" that doesn't apply here, and find out in production which combination of feature flags is unworkable.

  • Yes, feature flags are conflated with dynamic configs (or "remote configs"). The difference is crucial, hence why people are talking past each other.

    Feature flags are gates for whether a piece of code runs; basically, an if-condition. Dynamic configs are a mechanism for changing runtime values without redeploying[1].

    For example:

      # Feature flag — variant gate for rollout
      flag = SOME_CONSTANT_DEFINED_IN_MY_REPO  # remote config not needed
      if flag == 'open':
          render_new_checkout()
      elif flag == 'warning':
          render_warning_checkout()
      else:
          render_old_checkout()
    
      # Raw dynamic/remote config pulled — structured values for tuning behavior
      config = sdk.get_config(user, "checkout_settings")
      timeout_ms   = config.get("timeout_ms", 5000)
      max_items    = config.get("max_items", 50)
      allowed_tlds = config.get("allowed_tlds", [".com", ".org"])
    

    In practice, feature flags are implemented on top of dynamic configs to remotely handle the temporary lifecycle of a feature — aka, ship a new block of code, ramp its execution up to 100%, then delete the flag. Whereas dynamic configs are a deeper primitive meant for semi-permanent/safer operations like rate limits or business rules you'd expect a PM to change like the text copy on a marketing website.

    As I've seen, the forcing function for separating the two concepts is the experimentation platform: human control of feature flags gets shared with a new automated allocation system that manipulates dynamic config under the hood. That's how Statsig built their system and, in part, why they could sell for a billion. Whereas companies that ignored the difference, like LaunchDarkly, struggled outside of feature flags.

    [1] https://docs.statsig.com/dynamic-config/overview, https://engineering.atspotify.com/2020/10/spotifys-new-exper..., https://blog.x.com/engineering/en_us/topics/infrastructure/2...

  • > it just reminds me of how feature flags can be misused as application configuration/customization

    They literally are configuration.

    • Oh yeah, let's make a web request per service invocation to figure out what to serve for the invocation!

      Guys, this is exactly the kind of banal crap that turns a simple app into a monstrous beast that won't work unless it's connected to the internet.

Which is not hard to do (it is a modulo over a Mersenne Twister or something similar), but in my recent gigs just Flipper with an optional "state of the flags table as of now" endpoint was more than enough. That modulo+random combo required tools like LaunchDarkly to ship SDKs in several languages, and the ones I had to work with were just a plain horrible fit for their language of choice. And because the evaluation was relegated to the edge, the whole system got way more complex than desirable. In actuality, I think a refetch of the current flags table "for this customer" every so often is just fine, and way less of a nuisance.

So glad Flipper exists and I don't have to deal with this stuff anymore.
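For readers wondering what the "modulo" trick mentioned above looks like, here is a hedged sketch of deterministic percentage bucketing. It swaps in a SHA-256 hash where the comment mentions a Mersenne Twister, and it is not how Flipper or LaunchDarkly actually implement rollouts; `bucket` and `is_enabled` are hypothetical names.

```python
import hashlib


def bucket(flag_name: str, user_id: str, buckets: int = 10_000) -> int:
    """Map (flag, user) to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    # Take the first 8 bytes as an integer, then reduce with a modulo.
    return int.from_bytes(digest[:8], "big") % buckets


def is_enabled(flag_name: str, user_id: str, rollout_percent: float) -> bool:
    """True if the user falls inside the rollout; rollout_percent is 0 to 100."""
    # With 10,000 buckets, each percentage point covers 100 buckets.
    return bucket(flag_name, user_id) < rollout_percent * 100


# The same user always gets the same answer for the same flag and percentage.
print(is_enabled("new_checkout", "user-123", 25.0))
print(is_enabled("new_checkout", "user-123", 25.0))
```

Because the bucket depends only on the flag name and user ID, any process in any language that computes the same hash makes the same decision. Raising the percentage only ever adds users and never flips an already-enabled user off.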

> Sadly, that zero-hop setup requires a sophisticated client execution engine, which it doesn’t appear Cloudflare has implemented here.

It doesn't have to be sophisticated and they don't need to implement it themselves. They piggy-back on OpenFeature where the client libraries have a simple targeting rule evaluation engine integrated.

Statsig has worked great at my work, really polished and rich feature set. Their tooling to identify unused flags as candidates for removal is neat.

The per-seat billing we have in our agreement is a bit rough but it's workable.

  • Statsig is a half-baked product bought out by OpenAI for data harvesting. We already reported 2 documentation issues and 1 critical technical issue, and we're barely using it.

    • Well, OpenAI already sold it (but kept the team), so it’s in someone else’s hands now.

> Sadly, that zero-hop setup requires a sophisticated client execution engine, which it doesn’t appear Cloudflare has implemented here. Makes sense for their memory constrained workers, less sense for traditional infrastructure.

wait what? what kind of logic do you need to do that CF Workers can't do?