Comment by jackfranklyn

2 months ago

The "split long functions" advice works well for most code but falls apart in transaction processing pipelines.

I work on financial data processing where you genuinely have 15 sequential steps that must run in exact order: parse statement, normalize dates, detect duplicates, match patterns, calculate VAT, validate totals, etc. Each step modifies state that the next step needs.

Splitting these into separate functions creates two problems: (1) you end up passing huge context objects between them, and (2) the "what happens next" logic gets scattered across files. Reading the code becomes an archaeology exercise.

What I've found works better: keep the orchestration in one longer function but extract genuinely reusable logic (date parsing, pattern matching algorithms) into helpers. The main function reads like a recipe - you can see the full flow without jumping around, but the complex bits are tucked away.

70 lines is probably fine for CRUD apps. But domains with inherently sequential multi-step processes sometimes just need longer functions.

I'll admit this may be naive, but I don't see the problem based on your description. Split each step into its own private function, pass the context by reference / as a struct, unit test each function to ensure its behavior is correct. Write one public orchestrator function which calls each step in the appropriate sequence and test that, too. Pull logic into helper functions whenever necessary, that's fine.

I do not work in finance, but I've written some exceptionally complex business logic this way. With a single public orchestrator function you can just leave the private functions in place next to it. Readability and testability are enhanced by chunking out each step and making logic obvious. Obviously this is a little reductive, but what am I missing?
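
A minimal sketch of what I mean, in TypeScript with made-up names (nothing here is domain-accurate, it's just the shape):

```typescript
// Hypothetical shapes - a real pipeline would carry far more fields.
interface StatementLine { raw: string; }
interface PipelineContext {
  lines: StatementLine[];
  normalizedDates?: Date[];
  vatTotal?: number;
}

// Each step is a small private function: takes the context, returns it updated.
function parseStatement(ctx: PipelineContext): PipelineContext {
  // ...parse ctx.lines into structured records...
  return ctx;
}

function normalizeDates(ctx: PipelineContext): PipelineContext {
  return { ...ctx, normalizedDates: [] /* ...real normalization... */ };
}

function calculateVat(ctx: PipelineContext): PipelineContext {
  return { ...ctx, vatTotal: 0 /* ...real VAT maths... */ };
}

// One public orchestrator that reads like a recipe and gets tested end to end.
export function processStatement(lines: StatementLine[]): PipelineContext {
  let ctx: PipelineContext = { lines };
  ctx = parseStatement(ctx);
  ctx = normalizeDates(ctx);
  ctx = calculateVat(ctx);
  return ctx;
}
```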

  • You're not missing much - what you describe is roughly what I do. My original comment was pushing back against the "70 lines max" orthodoxy, not against splitting at all.

    The nuance: the context struct approach works well when steps are relatively independent. It gets messy when step 7 needs to conditionally branch based on something step 3 discovered, and step 12 needs to know about that branch. You end up with flags and state scattered through the struct, or you start passing step outputs explicitly, and the orchestrator becomes a 40-line chain of if/else deciding which steps to run.

    For genuinely linear pipelines (parse → transform → validate → output), private functions + orchestrator is clean. For pipelines with lots of conditional paths based on earlier results, I've found keeping more in the orchestrator makes the branching logic visible rather than hidden inside step functions that check context.flags.somethingWeird.

    Probably domain-specific. Financial data has a lot of "if we detected X in step 3, skip steps 6-8 and handle differently in step 11" type logic.
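
    Roughly the shape I'm describing, sketched in TypeScript with invented step names:

    ```typescript
    // Hypothetical context and step stubs, just to show the shape.
    interface Ctx { transferDetected?: boolean; vatTotal?: number; }
    const matchPatterns  = (c: Ctx): Ctx => ({ ...c, transferDetected: false });
    const codeAsTransfer = (c: Ctx): Ctx => ({ ...c });
    const calculateVat   = (c: Ctx): Ctx => ({ ...c, vatTotal: 0 });
    const validateTotals = (c: Ctx): Ctx => ({ ...c });
    const codeNormally   = (c: Ctx): Ctx => ({ ...c });

    // The branch lives in the orchestrator, not behind context.flags checks
    // buried inside each step function.
    function processTransaction(ctx: Ctx): Ctx {
      ctx = matchPatterns(ctx);        // "step 3" discovers the transfer
      if (ctx.transferDetected) {
        ctx = codeAsTransfer(ctx);     // handle differently, skip "steps 6-8"
      } else {
        ctx = calculateVat(ctx);
        ctx = validateTotals(ctx);
        ctx = codeNormally(ctx);
      }
      return ctx;
    }
    ```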

> Each step modifies state that the next step needs.

I've been bitten by this. It's not the length that's the problem so much as the surface area over which a long function can stealthily mutate its variables. If you have a bunch of steps in one function all modifying the same state, the logic that determines the final value of a widely-used, widely-edited variable can get hard to decipher.

Writing a function like that now, I'd want to make very sure that everything involved is immutable & all the steps are as close to pure functions as I can get them. I feel like it'd get shorter as a consequence of that, just because pure functions are easier to factor out, but that's not really my objective. Maybe step 1 is a function that returns a `Step1Output` which gets stored in a big State object, and step 2 accesses those values as `state.step1Output.x`. If I absolutely must have mutable state, I'd keep it small, explicit, and as separate from the rest of the variables as possible.
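
A rough TypeScript sketch of that idea (names are placeholders, not anyone's real pipeline):

```typescript
// Each step returns its own immutable, typed output...
interface Step1Output { readonly x: number; }
interface Step2Output { readonly y: string; }

// ...and the accumulated state is just a bag of those outputs.
interface State {
  readonly step1Output: Step1Output;
  readonly step2Output?: Step2Output;
}

function step1(raw: string): Step1Output {
  return { x: raw.length };                          // stand-in logic
}

function step2(state: State): Step2Output {
  return { y: `saw ${state.step1Output.x} chars` };  // reads, never mutates
}

function run(raw: string): State {
  const afterStep1: State = { step1Output: step1(raw) };
  return { ...afterStep1, step2Output: step2(afterStep1) };
}
```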

  • Yeah the immutability angle is the right instinct. The Step1Output approach is essentially what we landed on - each phase returns a typed result that gets composed into the final state. The tricky bit is when phase 7 needs to check something from phase 3's output to decide whether to run at all. You end up with either a growing "context" object that accumulates results, or a lot of explicit parameters threading through.

    The discipline tax is real though. Pure functions are easier to test in isolation but harder to trace when you're debugging "why did this transaction get coded wrong" and the answer spans 6 different step outputs.

    • > harder to trace when you're debugging "why did this transaction get coded wrong" and the answer spans 6 different step outputs

      That's definitely a pain, but I'm not sure it's easier when this is one variable being mutated in six different places. I think you're just running into the essential complexity of the problem.


The typestate pattern, common in Rust code, lets the compiler verify that operations are executed in the right order and that earlier states aren't accidentally referenced. Here's a good description: https://cliffle.com/blog/rust-typestate/

  • Thanks for the link - typestate is exactly the kind of compile-time guarantee I wish we had in JS/TS land. The pattern would be perfect for enforcing "you can't call postToAccounting() until validateTotals() has been called" at the type level.

    We're in Node.js so the best we can do is runtime checks and careful typing. I've experimented with builder patterns that sort of approximate this - each method returns a new type that only exposes the valid next operations - but it's clunky compared to proper typestate.

    The real benefit isn't just preventing out-of-order calls, it's making invalid states unrepresentable. Half our bugs come from "somehow this transaction reached step 9 without having the field that step 5 should have populated."
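
    The builder approximation looks roughly like this - a sketch with made-up names, not our actual code:

    ```typescript
    // Each stage exposes only the operations that are valid next, so
    // postToAccounting-before-validateTotals simply won't type-check.
    interface Totals { readonly net: number; readonly vat: number; }

    class ValidatedTransaction {
      constructor(private readonly totals: Totals) {}
      postToAccounting(): void {
        console.log("posting", this.totals);   // only reachable after validation
      }
    }

    class DraftTransaction {
      constructor(private readonly totals: Totals) {}
      validateTotals(): ValidatedTransaction {
        if (this.totals.net < 0) throw new Error("negative net total");
        return new ValidatedTransaction(this.totals);
      }
      // deliberately no postToAccounting() here
    }

    // usage:
    new DraftTransaction({ net: 100, vat: 20 }).validateTotals().postToAccounting();
    ```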

I frequently write software like this (in other domains). Structs exist (in most languages) specifically for packaging up state to pass around, so I don't really buy that passing around a huge context is the problem. I'm not opposed to long functions (they can frequently be easier to understand than deep callstacks), especially in languages that are strictly immutable like Clojure or where mutation is tightly controlled like Rust. Otherwise it really helps to break problems down into subproblems with discrete state and success/failure conditions, even though that results in more boilerplate to manage.

  • Fair point on structs - the context object itself isn't the problem, it's what ends up inside it. When the struct is a clean data container (transaction, amounts, dates, account codes) it works great.

    Where I've seen it go sideways is when it accumulates process state: wasTransferDetected, skipVATCalculation, needsManualReview, originalMatchConfidence. Now you have a data object that's also a control flow object, and understanding the code means understanding which flags get set where and what downstream checks them.

    Your point about discrete success/failure conditions is well taken though. We moved toward exactly that - each phase either returns its result or an explicit error, and the orchestrator handles the failures instead of stuffing error flags into the context for later. Bit more boilerplate but much easier to reason about.
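
    Roughly the shape we ended up with, sketched in TypeScript (names are hypothetical):

    ```typescript
    // A small Result type instead of error flags stuffed into the context.
    type Result<T> = { ok: true; value: T } | { ok: false; error: string };

    interface VatOutput { readonly vat: number; }

    function calculateVat(net: number): Result<VatOutput> {
      if (net < 0) return { ok: false, error: "negative net amount" };
      return { ok: true, value: { vat: net * 0.2 } };  // illustrative rate only
    }

    // The orchestrator handles the failure explicitly, rather than letting a
    // later phase discover a half-populated context.
    function processNet(net: number): string {
      const vat = calculateVat(net);
      if (!vat.ok) return `needs manual review: ${vat.error}`;
      return `vat is ${vat.value.vat}`;
    }
    ```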

This is an admirable goal, but it's complicated by the inevitable conditional logic around what to do in certain situations. Push the conditions into the sub methods, or keep them in the primary method? Thinking about which chunks of logic are independently testable can be a valuable guide. A lot of discipline is required so these primary methods don't become spaghetti themselves.

  • The testability angle is the deciding factor for me. Business logic that can be tested without the full pipeline context goes in sub methods. Orchestration decisions stay in the primary method.

    Concrete example: "is this a transfer between accounts?" is pure business logic - takes a transaction and a list of bank accounts, returns true/false. That gets its own function with its own tests.

    But "if it's a transfer, skip VAT calculation and use a different account mapping" is orchestration. Pushing that into the transfer detection function means it now needs to know about VAT and account mapping, which breaks isolation. Keeping it in the primary method means you can see all the skip/branch decisions in one place.

    The spaghetti risk is real. What helps: keeping the orchestrator as declarative as possible. "if transfer detected, result = handleAsTransfer()" rather than inline logic. The primary method becomes a readable list of conditions and delegations, not nested logic.
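
    Putting that split into code, very roughly (TypeScript, invented names, just a sketch):

    ```typescript
    interface BankAccount { readonly id: string; }
    interface Transaction { readonly counterpartyAccountId?: string; readonly net: number; }

    // Pure business logic: testable with no pipeline context at all.
    function isTransferBetweenAccounts(tx: Transaction, accounts: BankAccount[]): boolean {
      return accounts.some(a => a.id === tx.counterpartyAccountId);
    }

    // Stand-ins for the two handling paths.
    const handleAsTransfer = (tx: Transaction): string => `transfer of ${tx.net}`;
    const handleWithVat    = (tx: Transaction): string => `sale of ${tx.net} plus VAT`;

    // Orchestration: the skip/branch decision is a one-liner, visible in one place.
    function codeTransaction(tx: Transaction, accounts: BankAccount[]): string {
      return isTransferBetweenAccounts(tx, accounts)
        ? handleAsTransfer(tx)
        : handleWithVat(tx);
    }
    ```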