
Comment by unoti

13 days ago

> you can work on your problem, or you can customize the language to fit your problem better

There’s a thing I’m whispering to myself constantly as I work on software: “if I had something that would make this easy, what would it look like?”

I do this continuously, whether I'm working in C++ or Python. Although the author was talking about Lisp here, the approach can be applied in any language. Split the problem up into an abstraction that makes it look easy. Then dive in and build that abstraction, ask yourself again what you'd need to make this level easy, and repeat.

Sometimes it takes a lot of work to make some of those parts look and be easy.

In the end, the whole thing looks easy, and your reward is someone auditing the code and saying that you work on a code base of moderate complexity and they’re not sure if you’re capable enough to do anything that isn’t simple. But that’s the way it is sometimes.

Yes! I call this sort of top-down programming "wishful thinking." It is much easier to explain to people these days, because of machine learning tools.

“if you can just trust that chat GPT will later fill in whatever stub functions you write, how would you write this program?” — and you can quickly get going, “well, I guess I would have a queue, while the queue is not empty I pull an item from there, look up its responsible party in LDAP, I guess I need to memoize my LDAP queries so let's @cache that LDAP stub, if that party is authorized we just log the access to our S3-document, oh yeah I need an S3-document I am building up... otherwise we log AND we add the following new events to the queue...”
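A quick sketch of that wishful-thinking flow in Python -- every function here (`lookup_responsible_party`, `is_authorized`, `follow_up_events`) is a hypothetical stub you trust will get filled in later, and `@cache` does the LDAP memoization mentioned above:

```python
from collections import deque
from functools import cache

# Stub functions written "wishfully": trust that they get filled in later.
@cache
def lookup_responsible_party(item: str) -> str:
    # hypothetical LDAP lookup, memoized so repeated queries are free
    return f"owner-of-{item}"

def is_authorized(party: str) -> bool:
    return True  # placeholder authorization policy

def follow_up_events(item: str) -> list[str]:
    return []  # placeholder: denied accesses may spawn new events

def process(initial_items: list[str]) -> list[tuple[str, str, str]]:
    queue = deque(initial_items)
    log = []
    while queue:
        item = queue.popleft()
        party = lookup_responsible_party(item)
        if is_authorized(party):
            log.append((item, party, "allowed"))
        else:
            log.append((item, party, "denied"))
            queue.extend(follow_up_events(item))
    return log
```

The point is that you can write the loop's skeleton first and only then decide what each stub actually does.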

It is not the technique that has most enhanced what I write (that is probably a variant of functional core, imperative shell), but it's a pretty solid way to break the writer's block that you face in any new app.

  • "Wishful thinking" is exactly what it's called in SICP. You write the code to solve your problem using the abstraction you want, then you implement the abstraction.

  • I want to hear more about this functional core/imperative shell....

    • Satvik gave you a fine link in a sibling comment, but I like to add something that I call "shell services," so let me give maybe the smallest example I've got of what it looks like: a little Python lambda. First you have a core module, which only holds data structures and deterministic transforms between them:

          core/app_config.py   (data structures to configure services)
          core/events.py       (defines the core Event data structure and such)
          core/grouping.py     (parses rule files for grouping Events to send)
          core/parse_events.py (registers a bunch of parsers for events from different sources)
          core/users.py        (defines the core user data structures)
      

      (there's also an __init__.py to mark it as a module, and so forth). There is some subtlety: for instance, events.py contains the logic to turn an event into a Slack message string or an email, and an AppConfig contains the definition of what groups there are and whether they should send an email, a Slack message, or both. But everything here is a deterministic transform. For instance, `parse_event` doesn't yet know what User to associate an event with, so `users.py` defines a `UserRef` that might be looked up later to figure out more about a user, and there is a distinction between an `EventWithRef`, which contains a `list[UserRef]` of user refs to try, and an `Event`, which contains a User.
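      A sketch of what those core data structures might look like -- the exact fields are my invention, but the UserRef/EventWithRef/Event distinction is as described above:

```python
from dataclasses import dataclass
from enum import Enum

class UserRefType(Enum):
    # which external system can resolve this reference (hypothetical variants)
    EMAIL = "email"
    AWS_USER_ID = "aws_user_id"

@dataclass
class UserRef:
    # an unresolved pointer to a user; a shell service resolves it later
    ref_type: UserRefType
    value: str

@dataclass
class User:
    name: str
    email: str

@dataclass
class EventWithRef:
    # parsed event before user resolution: several refs to try, in order
    message: str
    user_refs: list[UserRef]

@dataclass
class Event:
    # fully resolved event, ready for grouping and notification
    message: str
    user: User
```

Everything here is plain data; no I/O, no connections, nothing nondeterministic.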

      Then there's the services/ module, which is for interactions with external systems. These are intentionally as bare as possible:

          services/audit_db.py      (saves events to a DB to dedupe them)
          services/config.py        (reads live config params from an AWS environment)
          services/notification.py  (sends emails and slack messages)
          services/user_lookup.py   (user queries like LDAP to look up UserRefs)
      

      If they need to hold onto a connection, like `user_lookup` holds an LDAP connection and `audit_db` holds a database connection, then these are classes whose __init__ takes some subset of an AppConfig to configure itself. Otherwise, as for the email/Slack sends in the notification service, these are just functions which take part of the AppConfig as a parameter.
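      A minimal sketch of such a connection-holding service class, with hypothetical names (`AuditDBConfig`, `save_events`) and a placeholder standing in for the real DB driver:

```python
from dataclasses import dataclass

@dataclass
class AuditDBConfig:
    # the subset of AppConfig that the audit DB service needs
    dsn: str
    table: str

class AuditDBService:
    """Holds the long-lived DB connection; methods stay as thin as possible."""

    def __init__(self, config: AuditDBConfig):
        self.config = config
        self.conn = self._connect(config.dsn)

    def _connect(self, dsn: str):
        # placeholder for a real driver call, e.g. psycopg.connect(dsn)
        return {"dsn": dsn}

    def save_events(self, events: list) -> int:
        # one query in, one count out -- no business decisions here
        return len(events)
```

The service knows how to talk to its external system and nothing else.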

      These functions are as simple as possible. There are a couple audit_db functions which perform -gasp- TWO database queries, but it's for a good reason (e.g. lambdas can be running in parallel so I want to atomically UPDATE some rows as "mine" before I SELECT them for processing notifications to send). They take core data structures as inputs and generate core data structures as outputs and usually I've arranged for some core data structure to "perfectly match" what the service produces (Python TypedDict is handy for this in JSON-land).
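      The TypedDict trick might look like this -- `AuditRow` is a hypothetical shape chosen to "perfectly match" what a JSON-speaking DB driver hands back:

```python
from typing import TypedDict

class AuditRow(TypedDict):
    # shaped to match one JSON row as the service produces it
    event_id: str
    reported: bool

def parse_row(raw: dict) -> AuditRow:
    # the service does only the dumb translation; decisions live in core
    return AuditRow(event_id=str(raw["event_id"]), reported=bool(raw["reported"]))
```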

      "Simple" can be defined approximately as "not having if statements": you can say that basically all if/then logic should be moved to the functional core. This requires a bit of care, because for instance a UserRef contains an enum (a UserRefType) and user_lookup will switch on this to determine which lookup it should perform: should I ask LDAP about an email address, should I ask it about an Amazon user ID, or should I ask some other non-LDAP system? I don't consider that sort of switch statement to be if/then complexity. So the rule of thumb is that the decision of which lookups to do is made in core code, and then actually doing one is performed by the UserLookupService.

      If you grok type theory, the idea more briefly is, "you shouldn't have if/then/else here, but you CAN have try/catch and you CAN accept a sum type as your argument and handle each case of the sum type slightly differently."

      Finally there's the parent structure,

          main.py        (main entrypoint for the lambda)
          migrator.py    (a quick DB migration script)
          ../sql/        (some migrations to run)
          ../test/       (some tests to run)
      

      Here's the deal: main.py is like 100 lines long, gluing the core to the services. So if you printed it, it's only three pages of reading, and then you know: "oh, this lambda gets an AppConfig from the config service, initializes some other services with that, does the database migrations, and then after all that setup is done, it proceeds in two phases."

      In the first, ingestion, phase it parses its event arguments to EventWithRefs, looks up the list of user refs to a User, and makes an Event with it; then it labels those events with their groups, checks those groups against an allowlist and drops some events accordingly, and otherwise inserts those events into the database, skipping duplicates. Once all of that ingestion is done, phase two, reporting, starts: it reserves any unreported records in the database, groups them by their groups, and for each group tells the notification service "here's a bunch of notifications to send"; for each successful send, it marks all of the events it was processing as reported. Last, it purges any records older than the retention policy and closes the database connection. You get the story in broad overview in three pages of readable code.

      migrator.py adds about two printed pages more to do database migrations; in its current form it makes its own DB connections from strings, so it doesn't depend on core/ or services/. It's kind of an "init container" app, except AWS Lambda isn't containerized in that way.
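      A toy sketch of that two-phase glue, with every helper stubbed out so it runs self-contained (none of these names are from the real codebase):

```python
def load_config():
    return {"allow": {"ops"}}  # placeholder for the config service

def parse_events(raw):
    return list(raw)  # placeholder for the event parsers

def resolve(e, config):
    return {"msg": e, "group": "ops"}  # placeholder user resolution + grouping

def allowed(e, config):
    return e["group"] in config["allow"]

def make_audit_db(config):
    store = []  # in-memory stand-in for the audit DB service
    return {
        "insert": lambda evs: store.extend(evs),
        "reserve_unreported": lambda: list(store),
        "mark_reported": lambda batch: None,
    }

def group_by(events, config):
    groups = {}
    for e in events:
        groups.setdefault(e["group"], []).append(e)
    return groups.items()

def send_notifications(group, batch, config):
    pass  # placeholder for the notification service

def handler(raw_events, context=None):
    """Hypothetical lambda entrypoint mirroring the two-phase story."""
    config = load_config()
    audit_db = make_audit_db(config)
    # Phase 1: ingestion -- parse, resolve, filter by allowlist, insert
    events = [resolve(e, config) for e in parse_events(raw_events)]
    events = [e for e in events if allowed(e, config)]
    audit_db["insert"](events)
    # Phase 2: reporting -- reserve, group, notify, mark reported
    for group, batch in group_by(audit_db["reserve_unreported"](), config):
        send_notifications(group, batch, config)
        audit_db["mark_reported"](batch)
    return len(events)
```

The real main.py would be longer, but the shape is the same: pure glue, no business logic of its own.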

      The test folder is maybe the most important part, because based on this decoupling,

      - The little pieces of logic that haven't been moved out of main.py yet can be tested by mocking. This can be reduced arbitrarily far -- in theory there is no reason that a Functional Core, Imperative Shell program needs mocks. (Without mocks, the assurance that main.py works is that main.py looks like it works, worked previously, and hasn't changed; it's pure glue and high-level architecture. If it does need to change, the assurance that it works is that it was deployed to dev and worked fine there, so the overall architecture should be OK.)

      - The DB migrations can be tested locally by spinning up a DB with some example data in it, and running migrations on it.

      - The core folder can be tested exhaustively by local unit tests. This is why it's all deterministic transforms between data structures -- that's actually, if you like, what mocking is, it's an attempt to take nondeterministic code and make it deterministic. The functional core, is where all the business logic is, and because it's all deterministic it can all be tested without mocking.

      - The services, can be tested pretty well by nonlocal unit "smoke"/"integration" tests, which just connect and verify that "if you send X and parse the response to data structure Y, no exception gets thrown and Y has some properties we expect etc." This doesn't fully test the situations where the external libraries called by services, throw exceptions that aren't caught. So like you can easily test "remote exists" and "remote doesn't exist" but "remote stops existing halfway through" is untested and "remote times out" is tricky.

      - The choice to test stuff in services, depends a lot on who has control over it. AuditDBService is always tested against another local DB in a docker container with test data preloaded, because we control schema, we control data, it's just a hotspot for devs to modify. config.py's `def config_from_secrets_manager()` is always run against the AWS Secrets Manager in dev. UserLookupService is always tested against live LDAP because that's on the VPN and we have easy access to it. But like NotificationService, while it probably should get some sort of API token and send to real Slack API, we haven't put in the infrastructure for that and created a test Slack channel or whatever... so it's basically untested (we mock the HTTP requests library, I think?). But it's also something that nobody has basically ever had to change.
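      To make the "exhaustively testable core" point above concrete, a core transform and its mock-free unit test might look like this (hypothetical names):

```python
# A core transform: a pure function from event + config to a decision.
def should_notify(event: dict, allowlist: set[str]) -> bool:
    # all the if/then logic lives here, where it is trivially testable
    return event["group"] in allowlist and not event.get("duplicate", False)

# Exhaustive local unit tests -- no mocks, no network, no DB.
def test_should_notify():
    assert should_notify({"group": "ops"}, {"ops"})
    assert not should_notify({"group": "dev"}, {"ops"})
    assert not should_notify({"group": "ops", "duplicate": True}, {"ops"})

test_should_notify()
```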

      Once you see that you can just exhaustively test everything in core/ it becomes really addictive to structure everything this way. "What's the least amount of stuff I can put into this shell service, how can I move all of its decisions to the functional core, oh crap do I need a generic type to hold either a User or list[UserRef] possible lookups?" etc.

“The reasonable man adapts himself to the world: the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man.”

George Bernard Shaw, Man and Superman

C++ is for when complexity cost is worth trading for performance gain. What type of person successfully finds simplicity working in C++?

  • > C++ is for when complexity cost is worth trading for performance gain. What type of person successfully finds simplicity working in C++?


    Be the change you want to see!

    Every language has the raw materials available to turn the codebase into an inscrutable complex mess, C++ more than others. But it’s still possible to make it make sense with a principled approach.

  • Those of us who grew up with the language always used it instead of C when given the option; thus, even if no one can claim expertise, we know it well enough to find it relatively simple to work with.

    In the same vein, Python looks simple on the surface, but in reality it is quite a deep language once people move beyond using it as a DSL for C/Fortran libraries, or in introduction-to-programming scenarios.

I think Casey Muratori calls this "compression". And yeah (see other child comment), he does it in C++ :-)

Creating a Domain Specific Language (DSL) for a given task is a classic approach to solving problems.