LLMs are powerful, but enterprises are deterministic by nature
Over the last year, we’ve been experimenting with LLMs inside enterprise systems.
What keeps surfacing is a fundamental mismatch: LLMs are probabilistic and non-deterministic, while enterprises are built on predictability, auditability, and accountability.
Most current approaches try to “tame” LLMs with prompts, retries, or heuristics. That works for demos, but starts breaking down when you need explainability, policy enforcement, or post-incident accountability.
We’ve found that treating LLMs as suggestion engines rather than decision makers changes the architecture completely. The actual execution needs to live in a deterministic control layer that can enforce rules, log decisions, and fail safely.
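A rough sketch of the shape we mean, with made-up action names and a toy policy (illustrative only, not our actual implementation):

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class SuggestedAction:
    """What the model is allowed to produce: a proposal, never an execution."""
    action: str       # e.g. "refund_customer" (hypothetical action name)
    params: dict
    rationale: str    # model-provided explanation, kept for the audit trail

# Deterministic policy: explicit, versioned, human-reviewable.
ALLOWED_ACTIONS = {"refund_customer": {"max_amount": 100.00}}

AUDIT_LOG: list[dict] = []

def execute(suggestion: SuggestedAction) -> str:
    """Deterministic control layer: enforce rules, log the decision, fail closed."""
    policy = ALLOWED_ACTIONS.get(suggestion.action)
    allowed = (
        policy is not None
        and suggestion.params.get("amount", float("inf")) <= policy["max_amount"]
    )
    decision = "executed" if allowed else "rejected"
    # (the real side effect would happen here, only when allowed)
    AUDIT_LOG.append({
        "ts": datetime.now(timezone.utc).isoformat(),
        "action": suggestion.action,
        "params": suggestion.params,
        "rationale": suggestion.rationale,
        "decision": decision,
    })
    return decision
```

The point isn't the specific rules; it's that the rules, the logging, and the failure behavior live outside the model, where they can be reviewed and audited.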
Curious how others here are handling this gap between probabilistic AI and deterministic enterprise systems. Are you seeing similar issues in production?
My experience working at a large F500 company:
A non-technical PM asked me (an early-career SWE) to develop an agentic pipeline/tool that could ingest 1000+ COBOL programs related to a massive 30+ year old legacy system (many of which have multiple interrelated sub-routines) and spit out a technical design document that can help modernize the system in the future.
- I have limited experience with architecture & design at this point in my career.
- I do not understand the business context of a system that old, or any of the decisions made over that time.
- I have no business stakeholders or people capable of validating the output.
- I am the sole developer being tasked with this initiative.
- My current organization has next to no engineering standards or best practices.
No one in this situation is interested in these problems except me. My situation isn't unique: everyone is high on AI and looking to cram LLMs & agents into everything without any real explanation of what problem they solve or how to measure the outcome.
I admire you for thinking about this kind of issue, I wish I could work with more individuals who do :(
This resonates a lot, and I think your example actually captures the core failure mode really well.
What your PM asked for isn’t an “agentic pipeline” problem - it’s an organizational knowledge and accountability problem. LLMs are being used as a substitute for missing context, missing ownership, and missing validation paths.
In a system like that (30+ years, COBOL, interdependent routines), the hardest parts are not parsing code — they are understanding why things exist, which constraints were intentional, and which tradeoffs are still valid. None of that lives in the code, and no model can infer it reliably without human anchors.
This is where I have seen LLMs work better as assistive tools rather than autonomous agents: helping summarize, cluster, or surface patterns — but not being expected to produce “the” design document, especially when there is no stakeholder capable of validating it.
Without determinism around inputs, review, and ownership, the output might look impressive but it’s effectively unverifiable. That’s a risky place to be, especially for early-career engineers being asked to carry responsibility without authority.
I don’t think the problem is that LLMs are not powerful enough — it is that they are often being dropped into systems where the surrounding structure (governance, validation, incentives) simply isn’t there.
Your task is certainly doable though.
You can ask AI to focus on the functional aspects and create a design-only document. It can do that in chunks. You don't need to know about COBOL best practices now, that's an implementation detail. Is the plan to modernize the COBOL codebase or to rewrite in a different language?
See what this skill does in Claude Code, you want something similar: https://github.com/glittercowboy/get-shit-done/blob/main/get...
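Roughly the shape I mean, as a sketch of the chunked approach (call_llm is a placeholder for whatever model or agent framework you end up using, and the .cbl glob is an assumption about how the sources are laid out):

```python
from pathlib import Path

def call_llm(prompt: str) -> str:
    """Placeholder for whatever model or agent framework you end up using."""
    raise NotImplementedError

def summarize_program(name: str, source: str) -> str:
    # One program per call keeps you inside context limits and keeps every
    # section traceable back to a single source file for review.
    prompt = (
        f"Describe, in design-document terms, what the program {name} does: "
        "inputs, outputs, business rules, and calls to other programs. "
        "Ignore COBOL implementation details.\n\n" + source
    )
    return call_llm(prompt)

def build_design_doc(cobol_dir: str) -> str:
    sections = [
        f"## {path.name}\n{summarize_program(path.name, path.read_text())}"
        for path in sorted(Path(cobol_dir).glob("*.cbl"))
    ]
    # A second pass can cluster related programs into a system-level view.
    return "\n\n".join(sections)
```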
First off, what you shared is cool, thank you. Especially considering it captures problems I need to address (token limitations, context transfer, managing how agents interact & execute their respective tasks).
My challenge specifically is that there is no real plan. It feels like a constant push to use these tools without any real clarity or objective. I know a lot of the job is about solving business problems, but no one asking me to do this has a clear idea of what success looks like, or defined acceptance criteria to say the outputs are correct.
I also understand this is an enterprise / company issue, not that the problem is impossible or the idea itself is bad. It's just a common theme I am seeing where this stuff fails in enterprises because few are actually thinking about how to apply it... as evidenced by the fact that I got more from your comment than I otherwise get attempting to collaborate in my own organization.
I resonate strongly with your framing. LLMs as suggestion engines, deterministic layer for execution.
I'm building something similar with security as the focus: deterministic policy that agents can't bypass (regardless of prompt injection). Same principle - deterministic enforcement guiding a probabilistic base.
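As a toy illustration of where that enforcement sits (the tool names and dispatcher are made up):

```python
# The allow-list is enforced on the tool call itself, outside the model,
# so nothing in a prompt (or an injected instruction) can widen it.
READ_ONLY_TOOLS = {"search_tickets", "read_runbook"}

def run_tool(tool_name: str, args: dict) -> str:
    """Placeholder for the actual tool executor."""
    return f"ran {tool_name} with {args}"

def dispatch(tool_name: str, args: dict) -> str:
    if tool_name not in READ_ONLY_TOOLS:
        # Fail closed: anything outside the allow-list needs human approval.
        raise PermissionError(f"tool '{tool_name}' requires human approval")
    return run_tool(tool_name, args)
```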
Would love to hear more about your use case. What kinds of enterprise workflows are you targeting? Is security becoming a blocker?
That sounds very aligned. I like the way you phrased it - deterministic policy that agents cannot bypass is exactly the right boundary, especially once you assume prompt injection and misalignment are not edge cases but normal operating conditions.
On the use case side, what we have been seeing (and discussing internally) isn’t one narrow workflow so much as a recurring pattern across domains: anywhere an LLM starts influencing actions that have irreversible or accountable consequences.
That shows up in security, but also in ops, infra, finance, and internal tooling - places where “suggesting” is fine, but executing without a gate is not. In those environments, the blocker usually isn’t model capability; it is the lack of a deterministic layer that can enforce constraints, log decisions, and give people confidence about why something was allowed or stopped.
Security tends to surface this problem first because the blast radius is obvious, but we are starting to see similar concerns come up once agents touch production systems, money, or compliance-sensitive workflows.
I am curious from your side — are you finding that security teams are more receptive to this model than other parts of the org, or are you still having to convince people that “agent autonomy” needs hard boundaries?
> LLMs are probabilistic and non-deterministic
This is a polite way of saying unreliable and untrustworthy.
The problem facing enterprises is best understood by viewing LLMs like any other unreliable program.
> We’ve found that treating LLMs as suggestion engines rather than decision makers changes the architecture completely.
Figures. Look at the disruption LLM "suggestions" are inflicting on scientific journals, court cases and open source projects worldwide.
I don’t disagree with the underlying concern. In practice, “probabilistic” often does translate to unreliable when you put these systems in environments that expect reproducibility and accountability.
Where I think the framing matters is in how we respond architecturally. Treating LLMs as “just another unreliable program” is reasonable — but enterprises already have patterns for dealing with unreliable components: isolation, validation, gates, and clear ownership of side effects.
The problem we’re seeing is that LLMs are often dropped past those boundaries — allowed to directly author decisions or actions — which is why the downstream damage you mention (journals, courts, OSS) feels so chaotic.
The “suggestion engine” framing isn’t meant to excuse that behavior; it’s meant to reassert a familiar control model. Suggestions are cheap. Execution and publication are not. Once you draw that line explicitly, you can start asking the same questions enterprises always ask: who approves, what’s logged, and what happens when this is wrong?
Without that separation, I agree — you’re effectively wiring an unreliable component straight into systems that assume trust, and the failure modes shouldn’t surprise anyone.
I've been thinking about how ISO-9000 will be reconciled with LLMs. Will businesses abandon their ISO-9000 certifications in favor of "We use AI", or will ISO-9000 adapt in some way to the "need" for LLMs?
I doubt ISO-9000 gets “replaced” so much as interpreted more strictly in the presence of LLMs. ISO-9000 isn’t about how work is done — it’s about whether processes are defined, repeatable, auditable, and improvable.
From that lens, LLMs actually create tension rather than an escape hatch. A system whose outputs can’t be reproduced, explained, or bounded makes it harder to demonstrate compliance, not easier. Saying “we use AI” doesn’t satisfy requirements around traceability, corrective action, or process control.
My guess is that ISO-style frameworks will push organizations toward explicitly classifying where LLMs are allowed to operate: as advisory inputs, as drafting aids, or as automation under defined controls — with clear ownership and validation steps around them.
In other words, the pressure probably won’t be to loosen standards, but to reassert them: define where probabilistic components sit, what checks exist before outputs become authoritative, and how failures are detected and corrected. Without that structure, it’s hard to see how certification survives unchanged.
If enterprises are deterministic, that's what coding LLMs are for: to create the deterministic part with the help of the LLM.
I mostly agree — using LLMs to help author deterministic code is a good fit. The distinction I’m trying to draw is that once that code exists, the determinism has to live outside the model.
LLMs can assist in creating the rules, but they shouldn’t be the place where those rules are enforced or bypassed at runtime.
reminds me of this article > https://unstract.com/blog/understanding-why-deterministic-ou...
Thanks for sharing — that article points at the same core tension. Determinism isn’t about rejecting probabilistic systems, it’s about deciding where uncertainty is allowed to live.
What keeps breaking in practice is when probabilistic reasoning leaks into places that expect reproducibility and accountability.