Comment by la_fayette

7 hours ago

When reading your comment, it just reminds me of how feature flags can be misused as application configuration/customization. An antipattern I have already observed at various organizations.

For me, feature flags go along with trunk-based development: they enable features in QA settings, but not yet on PROD, for PO/PM testing. Trunk-based development allows for fast, easy DevOps without complicated branching strategies.

Application configuration is, for me, part of the application and has the business context for customizing the application accordingly. Not sure if there are specific frameworks/tools out there. But one should clearly distinguish these two.

Yes, feature flags are conflated with dynamic configs (or "remote configs"). The difference is crucial, hence why people are talking past each other.

Feature flags are gates for whether a piece of code runs; at its most basic, a flag is an if-conditional. Dynamic configs are a mechanism for changing runtime values without redeploying[1].

For example:

  # Feature flag — variant gate for rollout
  flag = SOME_CONSTANT_DEFINED_IN_MY_REPO  # remote config not needed
  if flag == 'open':
      render_new_checkout()
  elif flag == 'warning':
      render_warning_checkout()
  else:
      render_old_checkout()

  # Raw dynamic config pulled — structured values for tuning behavior
  config = sdk.get_config(user, "checkout_settings")
  timeout_ms   = config.get("timeout_ms", 5000)
  max_items    = config.get("max_items", 50)
  allowed_tlds = config.get("allowed_tlds", [".com", ".org"])

In practice, feature flags are usually built on top of dynamic configs and are used for releases: ship the feature, ramp it to 100%, then delete the flag. Dynamic configs are operations-oriented: rate limits, feature parameters, and business rules you expect a PM to keep tuning, like changing marketing copy. The former has a much higher chance of breakage because it's meant to change control flow; the latter is relatively safe as the underlying primitive.
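
To make that layering concrete, here is a rough sketch of a flag implemented as a thin gate over a dynamic-config entry, with a percentage ramp. Everything in it is an assumption for illustration: the in-memory `CONFIG_STORE`, the names `get_config`, `bucket`, and `flag_enabled`, and the hash-based bucketing are hypothetical and are not how Statsig or any other vendor necessarily does it.

```python
import hashlib

# Hypothetical in-memory stand-in for a remote dynamic-config store.
CONFIG_STORE = {
    "new_checkout_rollout": {"enabled": True, "percent": 25},
    "checkout_settings": {"timeout_ms": 5000, "max_items": 50},
}


def get_config(name: str) -> dict:
    """Return the stored config dict, or an empty dict if it is missing."""
    return CONFIG_STORE.get(name, {})


def bucket(user_id: str, salt: str) -> int:
    """Map a user to a stable bucket in [0, 100) by hashing salt and id."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def flag_enabled(flag_name: str, user_id: str) -> bool:
    """Feature flag as a gate: the config holds values, this decides control flow."""
    cfg = get_config(flag_name)
    if not cfg.get("enabled", False):
        return False
    return bucket(user_id, flag_name) < cfg.get("percent", 0)
```

Ramping to 100% is then just a config change (`percent` going from 25 to 100), and deleting the flag means removing both the `flag_enabled` call and the config entry.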

As I've seen it, the forcing function for separating the two concepts is the experimentation platform: human control of feature flags ends up shared with a new automated allocation system that manipulates dynamic config under the hood. That's how Statsig built their system and, in part, why they could sell for a billion. Whereas companies that ignored the difference, like LaunchDarkly, struggled outside of feature flags.

[1] https://docs.statsig.com/dynamic-config/overview, https://engineering.atspotify.com/2020/10/spotifys-new-exper... https://blog.x.com/engineering/en_us/topics/infrastructure/2...

> it just reminds me of how feature flags can be misused as application configuration/customization. An antipattern I have already observed at various organizations.

feature flags are perfect for configuration and customization, why using them for this purpose is 'misuse' is beyond me and I've heard this claim from multiple people. they're literally configuration. feature with a flag to turn it on, off or give the flag a value. where's the misuse? is it a problem I'm not running experiments when switching over redis to valkey or whatever?

  • Feature flags need to be treated as short-lived and experimental; otherwise they end up getting abused for everything and make it very difficult to reason about your application.

    If it's config/customization, it should be in code. If it's experimental it can be a flag until it solidifies, and then it needs to get moved to code.

    When I was at Shopify a couple of years ago, they mandated that feature flags be short-lived (2-4 weeks tops, with some exceptions) because otherwise they would be left in the code for months or never cleaned up. At that point it's hard to tell whether something is genuinely a "feature flag" or just a normal part of the system.

    Feature flags being flipped in prod was also a major source of incidents, in part because people didn't treat them as experimental and with the associated risk profile of something experimental.

    The only exception where having long-lived flags was useful and required was for operational killswitches (E.g. disable Apple Pay because it's having issues), but that is explicitly not application config.

    • Agreed.

      This is the kind of design wisdom that’s both true and difficult to win an argument over.

      It reminds me of arguments related to over-engineering and complexity. The principles are super important to having a codebase that scales and continues to be efficient to work in as the team grows, but they are hard to objectively measure.

      Locally or in isolation something may sound like a great idea. Being able to step back and see the greater ripple effects requires some experience and intuition that can’t always be used to convince people otherwise.

    • I disagree that just about anything you listed is a problem, except that cleaning up is absolutely required.

      Notably feature flags triggering incidents is expected and desired vs the alternative of shipping the code and having to roll a release back because there is no other way to remove the feature from prod.

    • Runtime evaluated feature flags can always be used for control plane levers and emergency handbrakes.

      You just have to label them as such and prevent other teams from fiddling with them.

      This is not an antipattern, it's just semantic hand-wringing.

      My team managed critical systems in the online flow of billions of dollars of daily payment volume. We also wrote the feature flag system that the rest of the company used. Not only were we completely fine with feature flags as long-lived control plane levers, we heavily used the system that way ourselves.

      You just have to clearly distinguish between ephemeral rollout flags (and clean them up or expire them) and the permanent control plane levers.

      It's the exact same functionality for both sets of tools. Just different practices around the two usages.

  • One well-known issue is that when you have a lot of separate feature flags that can interact, you explode the number of test cases you have to cover. For example, if you have three on/off feature flags that can interact in a module that has 100 test cases, you actually have 800 test cases (2^3 = 8 flag combinations) if you are going to test with each possible combination of flags. Many teams don't test them all because they "already know" that doesn't apply here, and find out in production which combination of feature flags is unworkable.
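
A quick sketch of that blow-up, assuming every flag is a plain on/off boolean. The flag names and the helper `flag_combinations` are hypothetical; the 100 base test cases come from the example above.

```python
from itertools import product


def flag_combinations(flag_names):
    """Yield every on/off assignment for the given boolean flags."""
    for values in product([False, True], repeat=len(flag_names)):
        yield dict(zip(flag_names, values))


# Hypothetical flags that interact in one module.
flags = ["new_checkout", "apple_pay", "valkey_cache"]
combos = list(flag_combinations(flags))

base_test_cases = 100
total_runs = base_test_cases * len(combos)
print(len(combos), total_runs)  # 8 800
```

Each extra boolean flag doubles the count, so ten interacting flags would already mean 1024 combinations per test case.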

> it just reminds me of how feature flags can be misused as application configuration/customization

They literally are configuration.

  • Oh yeah, let's make a web request per service invocation to figure out what to serve for the invocation!

    Guys, this is exactly the kind of banal crap that makes a simple app into a monstrous beast that won't work unless it's connected to the internet.

    • There's no web request per service invocation.

      Feature flags are set once at startup (or specific events like hard refresh, or new login) and then simply included in the request headers.

      It's not rocket science, but I'm sure people are free to overcomplicate it.
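
A minimal sketch of the evaluate-once approach described in that last reply, assuming flags are fetched when a session starts and then travel with each request. `fetch_flags`, `Session`, `handle_request`, and the `X-Feature-Flags` header name are all hypothetical, and the flag values are made up.

```python
import json

FETCH_COUNT = 0  # counts calls to the (hypothetical) flag service


def fetch_flags(user_id: str) -> dict:
    """Stand-in for one call to a flag service; values are made up."""
    global FETCH_COUNT
    FETCH_COUNT += 1
    return {"new_checkout": True, "warning_banner": False}


class Session:
    """Evaluates flags once (startup or login) and reuses the result."""

    def __init__(self, user_id: str):
        self.user_id = user_id
        self.flags = fetch_flags(user_id)

    def headers(self) -> dict:
        # "X-Feature-Flags" is an illustrative header name, not a standard.
        return {"X-Feature-Flags": json.dumps(self.flags, sort_keys=True)}


def handle_request(headers: dict) -> str:
    """Service side: read flags from the header, no lookup needed."""
    flags = json.loads(headers.get("X-Feature-Flags", "{}"))
    return "new_checkout" if flags.get("new_checkout") else "old_checkout"


session = Session("user-1")
responses = [handle_request(session.headers()) for _ in range(3)]
```

Three requests are served here with a single fetch. The trade-off is that a flag flipped mid-session is not seen until the next startup, hard refresh, or login.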