How to Prevent AI Code Failure in PLCs with Small Batch Delivery

Large AI-generated PLC code batches can fail as hidden scan-order and state dependencies accumulate. This article explains the math behind small batch delivery and why simulation-based verification reduces commissioning risk.

Direct answer

Large batches of AI-generated PLC code tend to fail because even modest per-rung error rates compound across sequential logic, while hidden scan-cycle dependencies make faults harder to isolate. Small batch delivery reduces this risk by limiting each iteration to 1 to 3 rungs, then forcing state changes and verifying I/O causality before adding more logic.

AI-generated ladder logic does not usually fail because the syntax is invalid. It fails because the logic is unverified across a deterministic execution model, and those are not the same problem. Syntax errors are visible; scan-order mistakes are often polite enough to wait until commissioning.

During internal benchmarking of Yaga, our AI lab coach, we observed a sharp batch-size effect: users generating full 15-rung sequences in one prompt produced 82% more unverified scan-dependency failures than users working in 3-rung increments. Methodology: n=96 guided lab attempts across motor sequencing and pump permissive tasks, baseline comparator = 1–3 rung iterative generation with simulation after each batch, time window = January–March 2026. This metric supports a bounded claim about error concentration during guided lab tasks inside Ampergon Vallis's environment. It does not claim an industry-wide defect rate for all AI PLC tools.

The engineering point is straightforward. In PLC work, large AI batches accumulate hidden assumptions faster than a human reviewer can validate them. Small batch delivery is not "agile for controls." It is a control-risk discipline.

What is "Batch Size Gravity" in PLC programming?

Batch Size Gravity is the tendency for AI-generated PLC logic to become less trustworthy as the number of generated rungs increases, because the probability of at least one consequential error rises with every added dependency.

The core math is standard reliability arithmetic. If each generated rung has a probability p of being functionally correct in context, the probability that n dependent rungs are all correct is:

P(success) = p^n

If we use a simplified example of 95% per-rung correctness, the batch-level result degrades quickly:

- Single rung: 0.95 = 95.0%
- 5-rung batch: 0.95^5 = 77.4%
- 10-rung batch: 0.95^10 = 59.9%
- 20-rung batch: 0.95^20 = 35.8%
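The arithmetic is easy to sanity-check. A minimal Python sketch, assuming the same illustrative 95% per-rung correctness (the function name is ours, not a standard):

```python
# Compounding per-rung correctness across a dependent batch, under the
# simplifying assumption that rung errors are independent.
def batch_success_probability(p_rung: float, n_rungs: int) -> float:
    """Probability that all n dependent rungs are functionally correct."""
    return p_rung ** n_rungs

for n in (1, 5, 10, 20):
    print(f"{n:>2} rungs: {batch_success_probability(0.95, n):.1%}")
# 20 rungs lands near 35.8%, matching the table above
```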

The important qualifier is "functionally correct in context." A rung can be syntactically valid and still be wrong because its permissive, latch behavior, reset path, analog threshold, or sequencing assumption is wrong for the process.

This is why large AI code dumps are mathematically fragile. Even optimistic local accuracy does not survive long dependency chains. In industrial control, a 35.8% all-correct probability is not a productivity issue. It is a commissioning liability.

The probability equation of AI code failure

The equation matters because PLC logic is not a bag of independent text fragments. It is an interacting state model executed repeatedly in a scan cycle.

Three distinctions matter:

  • Local validity is not system validity. A rung may look reasonable in isolation and still break the sequence once upstream state transitions occur.

  • Dependent logic compounds faster than independent logic. If Rung 8 assumes a bit is latched in Rung 2, one early mistake contaminates later behavior.

  • The effective error rate is often worse than the nominal rate. Real ladder programs include shared tags, seal-ins, reset conditions, analog thresholds, timers, counters, and fault branches. Dependencies are not shy.

A popular misconception is that review time scales linearly with code length. It usually does not. Once the sequence crosses a certain size, review becomes state reconstruction.

Why do large AI prompts cause compounding scan cycle errors?

Large AI prompts cause compounding scan cycle errors because large language models generate plausible text patterns, while PLCs execute deterministic logic in a fixed order. The model predicts code tokens; the controller resolves state transitions.

Under IEC 61131-3 programming practice, ladder logic is interpreted within a deterministic scan structure: read inputs, execute program logic, update outputs, then repeat. Vendor implementations differ in details, tasking, and optimization behavior, but the governing engineering reality remains sequential execution with state dependence, not simultaneous semantic understanding.
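That scan structure can be modeled in a few lines. The sketch below is a simplified Python stand-in for the read-execute-write cycle, not any vendor's runtime; all tag and function names are illustrative:

```python
def scan_cycle(read_inputs, rungs, write_outputs, state):
    """One simplified PLC scan: snapshot inputs, evaluate rungs in fixed
    top-to-bottom order against persistent tag state, then update outputs."""
    inputs = read_inputs()        # input image is frozen for this scan
    for rung in rungs:            # deterministic rung order
        rung(inputs, state)       # rungs may read and write shared tags
    write_outputs(state)          # outputs change only after all rungs ran

# Hypothetical single-rung program: Motor follows Start
state, outputs = {}, {}
scan_cycle(
    read_inputs=lambda: {"Start": True},
    rungs=[lambda i, s: s.update(Motor=i["Start"])],
    write_outputs=outputs.update,
    state=state,
)
print(outputs)   # {'Motor': True}
```

The point of the model is that rung position in the list is part of the program's semantics, which is exactly what a token predictor does not execute.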

That mismatch creates predictable failure modes when too much logic is generated at once:

  • Hidden order dependence. A bit set earlier in the scan may be consumed later in the same cycle. If the AI places logic in the wrong order, the sequence can fail without obvious syntax issues.

  • Double-coil and overwrite behavior. Multiple writes to the same output or internal bit may produce last-rung-wins behavior, ambiguous intent, or controller-specific surprises.

  • Broken latch and reset paths. Seal-in logic often looks correct until an abnormal condition occurs and the bit never drops, or drops too early.

  • Race-like behavior in sequential logic. Strictly speaking, many PLC issues are not software race conditions in the multithreaded sense. They are scan-order and state-transition faults. The distinction is worth keeping clean.

  • Mismatched permissives and interlocks. AI often generates the "happy path" first and under-specifies proof feedbacks, fault inhibits, and restart conditions.
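The double-coil case is the easiest of these to demonstrate. Below is a hypothetical Python stand-in for two rungs writing the same coil; the rung functions and tag names are invented for illustration:

```python
# Two rungs, both writing the Motor coil. In a real program these might be
# rungs 5 and 12; here they are plain functions over a shared tag dict.
def rung_start_path(tags):
    tags["Motor"] = tags["Start"]          # intended command path

def rung_ai_added(tags):
    tags["Motor"] = not tags["Fault"]      # later AI-added write, same coil

tags = {"Start": False, "Fault": False, "Motor": False}
for rung in (rung_start_path, rung_ai_added):   # scan order decides the winner
    rung(tags)

# Start was never pressed, yet Motor ends the scan True:
# the last write wins, and the first rung is silently dead logic.
print(tags["Motor"])   # True
```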

A short contrast helps here: text coherence versus execution coherence. AI is optimized for the first. Commissioning punishes the second.

The disconnect between LLMs and sequential execution

The practical disconnect is easiest to see in a direct comparison.

| Perspective | How the logic is treated | Typical failure pattern |
|---|---|---|
| LLM output generation | A coherent block of related text produced from prompt context | Plausible but unverified assumptions across many rungs |
| PLC CPU execution | Deterministic evaluation of logic in scan order with persistent tag state | Order-dependent faults, overwritten bits, broken sequences |
| Human reviewer under time pressure | Visual inspection of a large ladder block | Missed dependencies until simulation or live commissioning |

This is why "it looks right" is such a weak acceptance criterion. Ladder logic is not judged by literary fluency.

How does small batch delivery improve PLC commissioning?

Small batch delivery improves PLC commissioning by reducing the number of unverified assumptions carried into each test cycle. It turns fault isolation from archaeology into inspection.

Operationally, small batch delivery means this: write 1 to 3 rungs, force a state change, observe the specific I/O causality in a simulator, and confirm the expected output before adding more logic.

That definition matters because "iterative building" is often used loosely. Here it refers to a very specific engineering behavior, not a mood.

The 3-step iterative verification loop

Use this loop for discrete and mixed discrete/analog logic:

  1. Write the base function. Build the minimum behavior that should work under ideal conditions.
  2. Simulate and force I/O. Toggle the relevant inputs, observe the output, and verify state retention, dropout behavior, and tag transitions. If the base path does not behave correctly, adding interlocks only compounds the confusion.
  3. Layer the permissives and abnormal-state logic. Add overloads, E-stop conditions, proof feedbacks, alarm thresholds, timeout logic, and restart constraints only after the base function is proven.

This approach improves commissioning in several concrete ways:

  • Faults are isolated closer to the change that caused them
  • Scan-order assumptions are exposed earlier
  • Abnormal states are tested deliberately rather than discovered accidentally
  • Review effort stays proportional to the batch
  • Rework cost drops because fewer downstream rungs depend on an unproven premise

The idea aligns with adjacent software delivery research, including DORA's repeated finding that smaller changes are generally easier to review, test, and recover from than larger ones (Forsgren et al., 2018). OT is not IT, and this should not be treated as a direct PLC-specific proof. But the underlying control principle transfers in a bounded way: smaller validated changes usually reduce recovery burden.

Example: base latch first, permissive layer second

The running example is a motor start/stop latch with one command path and one output.

Small Batch Step 1: Base Latch Verification

| Start | Stop | Motor |
|---|---|---|
| NO contact | NC contact | Output coil |

Small Batch Step 2: Adding Permissive Layer

| Start | Stop | No_Fault | Motor |
|---|---|---|---|
| NO contact | NC contact | NO contact | Output coil |

The example is intentionally simple. The point is not that motor latches are difficult. The point is that engineers who skip base-state verification usually end up debugging three problems at once: command logic, permissive logic, and assumptions about device state.
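The two small-batch steps can be sketched behaviorally. Below is a minimal Python stand-in for the seal-in latch; Python is used only to model rung evaluation, and the tag names and `with_permissive` flag are illustrative, not a vendor or OLLA Lab API:

```python
def motor_rung(tags, with_permissive=False):
    """Seal-in latch: (Start OR Motor) AND Stop, optionally AND No_Fault.
    Stop is a NC contact, so True means the circuit is healthy."""
    seal = (tags["Start"] or tags["Motor"]) and tags["Stop"]
    if with_permissive:
        seal = seal and tags["No_Fault"]
    tags["Motor"] = seal

# Step 1: prove the base latch before adding anything else
tags = {"Start": False, "Stop": True, "Motor": False, "No_Fault": True}
tags["Start"] = True;  motor_rung(tags)   # press Start
tags["Start"] = False; motor_rung(tags)   # release: the latch must hold
assert tags["Motor"], "base latch failed"

# Step 2: only then layer the permissive and force the abnormal state
tags["No_Fault"] = False
motor_rung(tags, with_permissive=True)    # fault must drop the motor
assert not tags["Motor"], "permissive failed to drop the output"
```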

Why is small-batch validation more important in OT than in general software?

Small-batch validation is more important in OT because control logic affects physical equipment, process state, and operator response, not just application behavior.

In a web application, a bad feature batch may degrade user experience or trigger rollback. In a live process, a bad control batch can create nuisance trips, hidden restart paths, deadheaded pumps, oscillating valves, or misleading HMI state. The process is under no obligation to be forgiving.

Three OT-specific factors raise the stakes:

  • Determinism matters. PLCs are expected to behave predictably across repeated scans and known state transitions.

  • Abnormal conditions are part of the design space. Good control logic must define what happens during faults, not just during normal operation.

  • Commissioning windows are expensive. Every avoidable debug cycle on site consumes labor, schedule, and confidence.

This is also where Simulation-Ready needs a proper definition. A Simulation-Ready engineer is not someone who merely knows ladder syntax. It is an engineer who can prove, observe, diagnose, and harden control logic against realistic process behavior before it reaches a live process.

That is the useful distinction: syntax versus deployability.

How does OLLA Lab teach iterative ladder logic building?

OLLA Lab teaches iterative ladder logic building by giving learners a bounded environment where they can write logic, simulate behavior, inspect I/O, and compare ladder state against virtual equipment state before any live deployment exists.

This is where the product becomes operationally useful. The value is not that it removes engineering judgment. The value is that it gives engineers a place to rehearse judgment on tasks that are too risky, too expensive, or too inconvenient to practice on real plant equipment.

Using guided workflows for risk-contained practice

OLLA Lab's workflow supports small-batch discipline through several linked behaviors:

  • Web-based ladder logic editor. Learners build rungs directly in the browser using contacts, coils, timers, counters, comparators, math functions, logical operations, and PID instructions.

  • Simulation mode. Users can run logic, stop logic, toggle inputs, and observe outputs and variable states without physical hardware.

  • Variables panel and I/O visibility. Tag values, inputs, outputs, analog tools, PID dashboards, and scenario variables remain visible during testing, which makes causality easier to trace.

  • Guided ladder-learning workflow. The platform structures progression from first-rung basics to more advanced functions instead of dropping users into an empty editor and hoping for discipline.

  • 3D, WebXR, and VR simulations tied to digital twins. These environments let users compare control logic against equipment behavior in realistic machine contexts.

  • Scenario-based commissioning practice. Presets across manufacturing, water, wastewater, HVAC, chemical, pharma, warehousing, food and beverage, and utilities expose learners to different interlocks, hazards, and control philosophies.

Bounded product claim: OLLA Lab is a validation and rehearsal environment for high-risk commissioning tasks. It is not a certification, not a SIL claim, and not a substitute for supervised site competence.

What "digital twin validation" means here

Digital twin validation should not be treated as prestige vocabulary. In this context, it means testing ladder logic against a realistic virtual equipment model and checking whether commanded states, feedbacks, interlocks, alarms, and sequence transitions behave as intended before deployment.

That includes observable engineering behaviors such as:

  • comparing commanded motor state to simulated equipment response,
  • testing proof feedback loss,
  • observing alarm thresholds and trip behavior,
  • validating sequence progression,
  • checking whether a restart path is blocked or allowed under defined conditions.

A digital twin that cannot expose state mismatch is mostly scenery.
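The first of those checks reduces to a few lines. A hypothetical sketch of a commanded-versus-feedback comparison with a proof window counted in scans (the function name and parameters are ours, not an OLLA Lab API):

```python
def check_state_mismatch(commanded, feedback, proof_window, scans_elapsed):
    """Return a fault description when simulated feedback has not confirmed
    the commanded state within the allowed proof window (in scans)."""
    if commanded == feedback:
        return None                      # states agree: nothing to report
    if scans_elapsed <= proof_window:
        return None                      # still inside the proof window
    return f"state mismatch: commanded={commanded}, feedback={feedback}"

# Motor commanded on, contactor feedback still off after the window expires:
print(check_state_mismatch(True, False, proof_window=3, scans_elapsed=5))
```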

How should engineers practice AI-assisted PLC development safely?

Engineers should practice AI-assisted PLC development by treating AI as a draft generator inside a verification loop, not as an authority on process truth.

The safe workflow is disciplined and fairly plain:

  • Generate a small logic unit
  • Review tag names, state assumptions, and output writes
  • Simulate the unit
  • Force normal and abnormal inputs
  • Confirm output causality
  • Only then extend the sequence
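The loop above can be sketched as a small harness: force input cases, run the generated unit, and compare outputs before extending the sequence. This is a Python stand-in with invented names, not a real PLC test framework:

```python
def verify_small_batch(rung, cases):
    """Run one generated logic unit against forced input cases.
    Each case is (forced input tags, expected outputs).
    Extend the sequence only when this returns an empty list."""
    failures = []
    for forced_inputs, expected in cases:
        tags = dict(forced_inputs)       # fresh tag state per forced case
        rung(tags)
        for out, want in expected.items():
            if tags.get(out) != want:
                failures.append((forced_inputs, out, tags.get(out), want))
    return failures

# Example: one hypothetical permissive rung, normal and abnormal inputs
def pump_rung(tags):
    tags["Pump"] = tags["Cmd"] and tags["Level_OK"]

cases = [
    ({"Cmd": True, "Level_OK": True},  {"Pump": True}),    # normal path
    ({"Cmd": True, "Level_OK": False}, {"Pump": False}),   # abnormal: low level
]
assert verify_small_batch(pump_rung, cases) == []
```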

This is also the right place to be explicit about AI assistance. Yaga, OLLA Lab's AI lab guide, can help users with onboarding, corrective suggestions, and ladder-logic guidance. It should be used to reduce learning friction, not to bypass verification. Draft generation is useful. Deterministic veto remains the engineer's job.

A practical evidence package beats a screenshot gallery

If an engineer wants to demonstrate competence in AI-assisted controls work, the right artifact is a compact body of engineering evidence, not a folder of polished screenshots.

Use this structure:

  1. System description. Define the process unit, equipment, I/O, and control objective.
  2. Operational definition of "correct". State exactly what successful behavior means in normal and abnormal conditions.
  3. Ladder logic and simulated equipment state. Show the implemented logic alongside the observed machine or process state.
  4. The injected fault case. Deliberately introduce a realistic failure such as proof loss, overload, bad analog value, or sequence timeout.
  5. The revision made. Document the logic change used to correct or harden the behavior.
  6. Lessons learned. Explain what the fault revealed about assumptions, scan order, permissives, or operator interaction.

That structure is much more informative than "here is my ladder diagram." Most real engineering value appears when the first assumption fails.

What standards and literature support this approach?

The small-batch argument rests on three support layers: established probability mathematics, deterministic PLC execution practice, and broader evidence that smaller validated changes improve recoverability.

Relevant anchors include:

  • IEC 61131-3 for programmable controller language structure and execution context in industrial automation practice.
  • IEC 61508 for the broader discipline of functional safety, including the importance of verification, validation, and systematic fault control.
  • exida guidance and safety lifecycle literature for practical treatment of systematic failure, verification rigor, and control-system quality.
  • DORA research for the bounded but useful adjacent finding that smaller changes generally improve delivery stability and recovery performance.
  • Digital twin and simulation literature in industrial engineering and control education showing value in virtual commissioning, scenario-based validation, and immersive training environments.

The transfer from software delivery research to OT should be made carefully. DORA does not prove a PLC-specific theorem. It supports a bounded inference: when changes are smaller and validated earlier, review and recovery usually improve. OT then adds deterministic execution and physical process consequences, which make the case stricter, not weaker.

Conclusion: what is the practical rule for AI-generated PLC logic?

The practical rule is simple: if you cannot explain the state transition and prove the I/O causality for the current batch, the batch is already too large.

Large AI-generated PLC programs are not dangerous because AI is uniquely mysterious. They are dangerous because deterministic control systems punish hidden assumptions, and large batches hide many of them at once.

Small batch delivery is the safer method because it aligns with how PLCs actually behave, how faults actually propagate, and how commissioning teams actually debug. Generate less, verify more, and make each scan-cycle assumption earn its place.

Editorial transparency

This blog post was written by a human, with all core structure, content, and original ideas created by the author. However, this post includes text refined with the assistance of ChatGPT and Gemini. AI support was used exclusively for correcting grammar and syntax, and for translating the original English text into Spanish, French, Estonian, Chinese, Russian, Portuguese, German, and Italian. The final content was critically reviewed, edited, and validated by the author, who retains full responsibility for its accuracy.

About the Author: Jose NERI, PhD, Lead Engineer at Ampergon Vallis

Fact-Check: Technical validity confirmed on 2026-03-23 by the Ampergon Vallis Lab QA Team.

© 2026 Ampergon Vallis. All rights reserved.