Comment by agumonkey

3 days ago

When I realized that I was calling openapi-generator to create client-side call stubs on a non-small service-oriented project, I started missing J2EE EJB. And it takes a lot to miss EJB.

I'd like to ask the seasoned devs and engineers here: is it the normal industry-wide blind spot where people still crave, and are happy creating, 12 different descriptions of the same things across remote, client, unit tests, e2e tests, ORM, and API schemas, all the while feeling much more productive than <insert monolith here>?

I've seen some systems with a lot of pieces where teams have attempted to avoid repetition and arranged to use a single source of schema truth to generate various other parts automatically, and it was generally more brittle and harder to maintain because different parts of the pipeline were owned by different teams and operated on different schedules. Furthermore, it became hard to onboard into these environments and figure out how to make changes and deploy them safely. Sometimes the repetition is really the lesser evil.

  • I see, it's also reminiscent of the saying that "microservices" are an organisational solution. It's just that I also see a lot of churn and friction due to incoherent versions and specs not being managed in sync right now (some solutions exist or are coming, though).

  • > I've seen some systems with a lot of pieces where teams have attempted to avoid repetition and arranged to use a single source of schema truth to generate various other parts automatically, and it was generally more brittle and harder to maintain because different parts of the pipeline were owned by different teams and operated on different schedules.

    I'm not sure what would lead to this setup. For years there have been frameworks that support generating their own OpenAPI spec, and even API gateways that not only take that OpenAPI spec as input for their routing configuration but also support exporting their own.

  • > it was generally more brittle and harder to maintain

    It depends on the system in question; sometimes it's really worth it. Such setups are brittle by design; otherwise you get teams that ship fast but produce bugs that surface randomly at runtime.

    • Absolutely, it can work well when there is a team devoted to the schema registry and helping with adoption. But it needs to be worth it to be able to amortize the resources, so probably best for bigger organizations.

I keep pining for a stripped-down gRPC. I like the *.proto file format, and at least in principle I like the idea of using code-generation that follows a well-defined spec to build the client library. And I like making the API responsible for defining its own error codes instead of trying to reuse and overload the transport protocol's error codes and semantics. And I like eliminating the guesswork and analysis paralysis around whether parameters belong in the URL, in query parameters, or in some sort of blob payload. And I like having a well-defined spec for querying an API for its endpoints and message formats. And I like the well-defined forward and backward compatibility rules. And I like the explicit support for reusing common, standardized message formats across different specs.

But I don't like the micromanagement of field encoding formats, and I don't like the HTTP/2 streaming and trailers stuff that makes it impossible to directly consume gRPC APIs from JavaScript running in the browser, and I don't like the code generators that produce unidiomatic client libraries that follow Google's awkward and idiosyncratic coding standards. It's not that I don't see their value, *per se*. It's more that these kinds of features create major barriers to entry for both users and implementers. And they are there to solve problems that, as the continuing predominance of ad-hoc JSON slinging demonstrates, the vast majority of people just don't have.

  • I can write frickin' bash scripts that handle JSON APIs with curl, jq, heredocs and all that.

    A lot of people just do whatever comes to mind first and don't think about it so they don't get stuck with analysis paralysis.

       curl --fail
    

    Handling failure might be the real hardest programming problem, ahead of naming and caches and such. It boggles my mind, the hate people have for Exceptions, which at least make you "try" quite literally if you don't want the system to barrel past failures. Some seem nostalgic for errno, and others will fight mightily with Either<A,B> or Optional<X> or other monads and wind up barreling past failures in the end anyway. A 500 is a 500.
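
    A quick TypeScript sketch of that last point (everything here is hypothetical): a result type lets the error arm be quietly dropped, while an uncaught throw stops the program.

        // Result-style: the type system makes you branch, but a lazy branch
        // that ignores the error still compiles, and execution carries on.
        type Either<E, A> = { ok: true; value: A } | { ok: false; error: E };

        function fetchUser(id: string): Either<string, { id: string; name: string }> {
          return { ok: false, error: "HTTP 500" }; // pretend the call failed
        }

        const r = fetchUser("42");
        if (r.ok) {
          console.log(r.value.name);
        } // no else branch: the failure is silently dropped

        // Exception-style: with no try/catch, the same failure halts the program.
        function fetchUserOrThrow(id: string): { id: string; name: string } {
          throw new Error("HTTP 500");
        }

        fetchUserOrThrow("42"); // uncaught: fails fast instead of barreling past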

    • I worked at a place that had a really great coding standard for working with exceptions:

      1. Catch exceptions from third-party code and from talking to the outside world right away.

      2. Never catch exceptions that we throw ourselves.

      3. Only (and always) throw exceptions when you're in a state where you can't guarantee graceful recovery. Exceptions are for those exceptional circumstances where the best thing to do is fail fast and fail hard.
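
      A rough TypeScript sketch of how those three rules might look in practice (the config-loading scenario and all names are invented for illustration):

          interface Config { serviceName: string }

          // Rule 1: catch exceptions from the outside world right away, at the boundary.
          async function loadConfig(url: string): Promise<Config | null> {
            try {
              const res = await fetch(url);       // third-party / network call
              if (!res.ok) return null;           // expected failure, not exceptional
              return (await res.json()) as Config;
            } catch {
              return null;                        // network hiccup handled here, once
            }
          }

          // Rules 2 and 3: our own throws mean "no graceful recovery possible",
          // so nothing upstream catches them; we fail fast and fail hard.
          function applyConfig(config: Config | null): void {
            if (config === null) {
              throw new Error("No usable configuration; refusing to start");
            }
            // ...proceed normally
          }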

Brb, I'm off to invent another language independent IDL for API definitions that is only implemented by 2 of the 5 languages you need to work with.

I'm joking, but I did actually implement essentially that internally. We start with TypeScript files as its type system is good at describing JSON. We go from there to JSON Schema for validation, and from there to the other languages we need.
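
To give a concrete (hypothetical) flavor of such a setup, the service-definition files can stay as plain interfaces, which is what keeps them easy to map onto JSON Schema and onto other languages afterwards:

    // service-definition.ts (illustrative only): plain types, no generics,
    // so they translate cleanly to JSON Schema and then to other languages.
    export interface CreateInvoiceRequest {
      customerId: string;
      lineItems: LineItem[];
      currency: "USD" | "EUR";   // string-literal unions become enums downstream
      note?: string;             // optional property
    }

    export interface LineItem {
      sku: string;
      quantity: number;
      unitPriceCents: number;
    }

    export interface CreateInvoiceResponse {
      invoiceId: string;
      totalCents: number;
    }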

  • > Brb, I'm off to invent another language independent IDL for API definitions that is only implemented by 2 of the 5 languages you need to work with.

    Watch out, OpenAPI is now 3 versions deep and supports both JSON and YAML.

  • Anything I could read to imitate that workflow?

    • I haven't written anything up - maybe one day - but our stack is `ts-morph` to get some basic metadata out of our "service definition" typescript files, `ts-json-schema-generator` to go from there to JSON Schema, `quicktype-core` to go to other languages.

      Schema validation and type generation vary by language. When we need to validate schemas in JS/TS land, we're using `ajv`. Our generation step exports the JSON Schema to a valid JS file, and we load that up with AJV and grab schemas for specific types using `getSchema`.
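
      The generate-and-validate steps look roughly like the sketch below (the file path, schema key, and type name are placeholders; the `ts-morph` metadata pass and the `quicktype-core` output step are omitted):

          import { createGenerator } from "ts-json-schema-generator";
          import Ajv from "ajv";

          // 1. TypeScript -> JSON Schema for every exported type in the definition file.
          const schema = createGenerator({
            path: "service-definition.ts",   // placeholder path
            type: "*",
          }).createSchema("*");

          // 2. Load the generated schema into AJV and grab a validator per type.
          const ajv = new Ajv();
          ajv.addSchema(schema, "service");
          const validate = ajv.getSchema("service#/definitions/CreateInvoiceRequest");

          const incomingBody: unknown = { customerId: "c_1", lineItems: [], currency: "USD" };
          if (validate && !validate(incomingBody)) {
            console.error(validate.errors);
          }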

      I evaluated (shallowly) for our use case (TS/JS services, PHP monolith, several deployment platforms):

      - typespec.io (didn't like having a new IDL, mixes transport concerns with service definition)

      - trpc (focused on TS-only codebases, not multi-language)

      - OpenAPI (too verbose to write by hand, too focused on HTTP)

      - protobuf/thrift/etc (too heavy, we just want JSON)

      I feel like I came across some others, but I didn't see anyone just using TypeScript as the IDL. I think it's quite good for that purpose, but of course it is a bit too powerful. I have yet to put in guardrails that will error out when you get a bit too type happy, or use generics, etc.
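
      For what it's worth, that kind of guardrail can be a few lines of `ts-morph` run in CI; a sketch (file name hypothetical) that simply rejects generic type declarations:

          import { Project } from "ts-morph";

          // Fail the build if a service-definition type uses type parameters.
          const project = new Project();
          const file = project.addSourceFileAtPath("service-definition.ts"); // placeholder

          const offenders = [...file.getInterfaces(), ...file.getTypeAliases()]
            .filter((decl) => decl.getTypeParameters().length > 0)
            .map((decl) => decl.getName());

          if (offenders.length > 0) {
            throw new Error(`Generic types are not allowed in service definitions: ${offenders.join(", ")}`);
          }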

It's not that we like it, it's just that most other solutions are so complex and difficult to maintain that repetition is really not that bad a thing.

I was, however, impressed with FastAPI, a Python framework which brought together API implementation, data types, and Swagger/OpenAPI spec generation in a very nice package. I still had to take care of integration tests myself, but with pytest that's easy.

So there are some solutions that help avoid schema duplication.

  • FastAPI + SQLModel does remove many layers, that is true, but you still have other services requiring lots of boilerplate.

My experience is that all of these layers have identical data models when a project begins, and it seems like you have a lot of boilerplate to repeat every time to describe "the same thing" in each layer.

But then, as the project evolves, you actually discover that these models have specific differences in different layers, even though they are mostly the same, and it becomes much harder to maintain them as {common model} + {differences} than it is to just admit that they are different, related models.

For some examples of very common differences:

- different base types required for different languages (particularly SQL vs middleware (MDW) vs JavaScript)

- different framework or language-specific annotations needed at different layers (public/UNIQUE/needs to start with a capital letter/@Property)

- extra attached data required at various layers (computed properties, display styles)

- object-relational mismatches

The reality is that your MDW data model is different from your database schema and different from your UI data model (and there may be multiple layers within any of these). Any attempt to force them to conform, or to keep them automatically in sync, will fail unless the tooling also encodes all of the logic of those differences.
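
A small TypeScript illustration (field names invented): even a trivial "user" ends up with three related-but-different shapes, and the deltas are exactly the kinds of differences listed above.

    // Database row: storage-oriented names, constraints live in SQL.
    interface UserRow {
      id: number;              // SERIAL PRIMARY KEY
      email: string;           // UNIQUE NOT NULL
      created_at: string;      // timestamps arrive as ISO strings
    }

    // API / middleware model: different casing, no storage details.
    interface UserDto {
      id: number;
      email: string;
      createdAt: string;
    }

    // UI model: extra computed/display-only attributes that never hit the database.
    interface UserViewModel {
      id: number;
      email: string;
      displayName: string;     // computed property
      avatarColor: string;     // display style
    }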

  • Has anybody ever worked with model-driven methodologies? The central model is then derived into the other definitions.

Having 12 different independent copies means nobody on your 30-person multi-region team is blocked.