
Comment by jackfranklyn

2 hours ago

The "split long functions" advice works well for most code but falls apart in transaction processing pipelines.

I work on financial data processing where you genuinely have 15 sequential steps that must run in exact order: parse statement, normalize dates, detect duplicates, match patterns, calculate VAT, validate totals, etc. Each step modifies state that the next step needs.

Splitting these into separate functions creates two problems: (1) you end up passing huge context objects between them, and (2) the "what happens next" logic gets scattered across files. Reading the code becomes an archaeology exercise.

What I've found works better: keep the orchestration in one longer function but extract genuinely reusable logic (date parsing, pattern matching algorithms) into helpers. The main function reads like a recipe - you can see the full flow without jumping around, but the complex bits are tucked away.
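A minimal sketch of that shape (the step names, types, and helpers are invented for illustration, not taken from a real pipeline): the orchestration stays in one top-to-bottom function, while the genuinely reusable bits like date parsing and pattern matching live in small pure helpers.

```typescript
// Illustrative only: Txn, Statement, and the step contents are made up.

interface Txn { date: Date; amount: number; description: string; duplicate?: boolean; vat?: number; }
interface Statement { transactions: Txn[]; vatTotal: number; }

// Reusable helpers: pure, pipeline-agnostic, easy to unit test.
function parseFlexibleDate(text: string): Date {
  // accept e.g. "2024-01-31" or "31/01/2024"
  const dmy = text.match(/^(\d{2})\/(\d{2})\/(\d{4})$/);
  return dmy ? new Date(`${dmy[3]}-${dmy[2]}-${dmy[1]}`) : new Date(text);
}
function matchesAny(description: string, patterns: RegExp[]): boolean {
  return patterns.some(p => p.test(description));
}

// The orchestrator stays long on purpose: the full flow reads top to bottom.
function processStatement(lines: string[]): Statement {
  // step 1: parse
  const transactions: Txn[] = lines.map(line => {
    const [date, amount, ...desc] = line.split(",");
    return { date: parseFlexibleDate(date), amount: Number(amount), description: desc.join(",") };
  });

  // step 2: detect duplicates (same date + amount + description)
  const seen = new Set<string>();
  for (const t of transactions) {
    const key = `${t.date.toISOString()}|${t.amount}|${t.description}`;
    t.duplicate = seen.has(key);
    seen.add(key);
  }

  // step 3: calculate VAT on lines matching known patterns
  const vatPatterns = [/invoice/i, /purchase/i];
  for (const t of transactions) {
    if (matchesAny(t.description, vatPatterns)) t.vat = t.amount * 0.2;
  }

  // step 4: totals
  const vatTotal = transactions.reduce((sum, t) => sum + (t.vat ?? 0), 0);
  return { transactions, vatTotal };
}
```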

70 lines is probably fine for CRUD apps. But domains with inherently sequential multi-step processes sometimes just need longer functions.

> Each step modifies state that the next step needs.

I've been bitten by this. It's not the length that's the problem so much as the surface area over which a long function can stealthily mutate its variables. If a bunch of steps in one function all modify the same state, the logic that determines the final value of a widely-used, widely-edited variable can get hard to decipher.

Writing a function like that now, I'd want to make very sure that everything involved is immutable & all the steps are as close to pure functions as I can get them. I feel like it'd get shorter as a consequence of that, just because pure functions are easier to factor out, but that's not really my objective. Maybe step 1 is a function that returns a `Step1Output` which gets stored in a big State object, and step 2 accesses those values as `state.step1Output.x`. If I absolutely must have mutable state, I'd keep it small, explicit, and as separate from the rest of the variables as possible.
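Roughly what I mean, as a sketch (only `Step1Output` and `state.step1Output.x` come from the paragraph above; everything else is invented):

```typescript
// Each step is close to a pure function: typed input in, typed output out.
interface Step1Output { readonly x: number; readonly parsedLines: readonly string[]; }
interface Step2Output { readonly normalized: readonly string[]; }

interface PipelineState {
  readonly step1Output: Step1Output;
  readonly step2Output?: Step2Output;
}

function step1(raw: string): Step1Output {
  const parsedLines = raw.split("\n").filter(l => l.length > 0);
  return { x: parsedLines.length, parsedLines };
}

function step2(input: Step1Output): Step2Output {
  return { normalized: input.parsedLines.map(l => l.trim().toLowerCase()) };
}

// The orchestrator builds a new state object at each step instead of mutating
// shared variables, so "who last touched this value" is always clear.
function run(raw: string): PipelineState {
  const step1Output = step1(raw);
  const state: PipelineState = { step1Output };
  const step2Output = step2(state.step1Output);
  return { ...state, step2Output };
}
```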

I'll admit this may be naive, but I don't see the problem based on your description. Split each step into its own private function, pass the context by reference / as a struct, unit test each function to ensure its behavior is correct. Write one public orchestrator function which calls each step in the appropriate sequence and test that, too. Pull logic into helper functions whenever necessary, that's fine.

I do not work in finance, but I've written some exceptionally complex business logic this way. With a single public orchestrator function you can just leave the private functions in place next to it. Readability and testability are enhanced by chunking out each step and making logic obvious. Obviously this is a little reductive, but what am I missing?
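A minimal sketch of that layout, with an invented `Context` and step names: one public orchestrator, private step functions that each take the context, and every piece testable on its own.

```typescript
// Hypothetical context fields and steps, for illustration only.
interface Context {
  raw: string;
  lines: string[];
  total: number;
  errors: string[];
}

class StatementProcessor {
  // Public orchestrator: the sequence lives in one place.
  process(raw: string): Context {
    const ctx: Context = { raw, lines: [], total: 0, errors: [] };
    this.parse(ctx);
    this.validate(ctx);
    this.sumTotals(ctx);
    return ctx;
  }

  // Private steps: each mutates the shared context and can be unit tested by
  // constructing a Context in the state it expects.
  private parse(ctx: Context): void {
    ctx.lines = ctx.raw.split("\n").filter(l => l.trim().length > 0);
  }

  private validate(ctx: Context): void {
    for (const line of ctx.lines) {
      if (isNaN(Number(line.split(",")[1]))) ctx.errors.push(`bad amount: ${line}`);
    }
  }

  private sumTotals(ctx: Context): void {
    ctx.total = ctx.lines.reduce((sum, l) => sum + (Number(l.split(",")[1]) || 0), 0);
  }
}
```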

I frequently write software like this (in other domains). Structs exist (in most languages) specifically for packaging up state to pass around, so I don't really buy that passing around a huge context is the problem. I'm not opposed to long functions (they can frequently be easier to understand than deep call stacks), especially in languages that are strictly immutable like Clojure or where mutation is tightly controlled like Rust. Otherwise it really helps to break problems down into sub-problems with discrete state and success/failure conditions, even though that results in more boilerplate to manage.
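One common way to express "discrete state and success/failure conditions" is a small result type per step; the names below are illustrative, not from any particular library.

```typescript
type Result<T> = { ok: true; value: T } | { ok: false; error: string };

function parseAmount(text: string): Result<number> {
  const n = Number(text);
  return Number.isFinite(n)
    ? { ok: true, value: n }
    : { ok: false, error: `not a number: ${text}` };
}

function applyVat(amount: number): Result<number> {
  return amount >= 0
    ? { ok: true, value: amount * 1.2 }
    : { ok: false, error: "negative amount" };
}

// The orchestrator threads results; every failure point is explicit,
// at the cost of some boilerplate.
function priceWithVat(text: string): Result<number> {
  const amount = parseAmount(text);
  if (!amount.ok) return amount;
  return applyVat(amount.value);
}
```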

This is an admirable goal, but it's complicated by the inevitable conditional logic around what to do in certain situations. Do you push the conditions into the sub-methods, or keep them in the primary method? Thinking about what makes a testable chunk of logic can be a valuable guide. A lot of discipline is required so these primary methods don't become spaghetti themselves.
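To make that trade-off concrete, here is the same (made-up) branch placed in the orchestrator versus inside a step: the first keeps the decision visible at the top level but grows the primary method, the second keeps the orchestrator linear but hides the decision.

```typescript
// Condition in the orchestrator:
function processA(ctx: { foreign: boolean; amount: number }): number {
  return ctx.foreign ? convertCurrency(ctx.amount) : ctx.amount;
}

// Condition pushed into the step:
function processB(ctx: { foreign: boolean; amount: number }): number {
  return normalizeCurrency(ctx);
}
function normalizeCurrency(ctx: { foreign: boolean; amount: number }): number {
  return ctx.foreign ? convertCurrency(ctx.amount) : ctx.amount;
}

function convertCurrency(amount: number): number {
  return amount * 0.85; // placeholder rate
}
```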