How to Detect Memory Leaks in Edge Automation Scripts with Python tracemalloc

Learn how to use Python's tracemalloc to identify memory growth in long-running edge automation scripts and validate fixes safely with persistent OLLA Lab simulations.

Direct answer

To detect memory leaks in long-running industrial automation scripts, engineers should use Python's `tracemalloc` module to compare memory allocation snapshots over time. Running those tests against persistent OLLA Lab simulations makes hidden leaks more observable before deployment to physical edge devices and live process environments.

Memory leaks in automation are usually not a Python problem in the abstract. They are a runtime-duration problem in a 24/7 system. A script that behaves perfectly for ten minutes can still fail in week two, which is not a philosophical issue when the edge box is feeding historians, APIs, alarms, or AI oversight.

A common misconception is that garbage collection makes long-running Python services "self-cleaning." It does not. Python reclaims objects that are no longer referenced; it does not rescue designs that keep references alive, leave sockets open, or accumulate threads and buffers indefinitely.

During a recent 48-hour stress test of a Python-based OPC UA data logger connected to OLLA Lab's Water Treatment preset, failing to explicitly close the connection loop produced a measured 2.4 MB/hour memory increase; the same script showed no visible fault in a 10-minute bench run. [Methodology: n=1 script variant under continuous simulated polling load, compared against the same script with explicit connection teardown, 48-hour window.] This supports one narrow point: short tests can miss long-duration memory instability. It does not establish an industry-wide failure rate.

Why do long-running automation scripts develop memory leaks?

Long-running automation scripts leak memory because they use dynamic allocation in environments that never really stop. PLC scan cycles are deterministic by design: memory is allocated for tags, function blocks, and execution structures in a bounded way. Python is not built on that model. It allocates objects as needed, tracks references, and relies on garbage collection to reclaim what is no longer reachable.

That distinction matters because edge automation is increasingly hybrid. The PLC still executes deterministic control, while Python on an IPC or gateway handles polling, protocol translation, API calls, local analytics, and sometimes supervisory AI logic. Useful architecture, yes. Forgiving architecture, no.

In practice, leaks appear when the script keeps objects alive longer than intended. The three most common OT sources are mundane and expensive:

The 3 most common sources of OT memory leaks

  1. Unclosed sockets: Ethernet/IP, Modbus TCP, OPC UA, MQTT, or HTTP sessions that are opened repeatedly but not closed cleanly will accumulate resources over time.
  2. Global list appends: Historical tag values, alarm events, or API payloads stored in unbounded lists or dictionaries create steady memory growth unless a FIFO or retention limit is enforced.
  3. Thread or task accumulation: New threads, async tasks, or retry workers launched on communication failures can persist indefinitely if they hang or are never joined and cleaned up.

A fourth category is worth mentioning because it hides well: library-level buffering. The leak is not always in your code; sometimes your code simply activates it repeatedly.
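Of the three core sources, thread or task accumulation is often the hardest to catch in a short bench run, because each individual worker looks harmless. Below is a minimal sketch of the anti-pattern next to a bounded alternative; `reconnect` is a hypothetical stand-in for a retry routine that can block for a long time:

```python
import threading
import time

def reconnect():
    # Hypothetical stand-in for a retry routine that can block or hang.
    time.sleep(300)

# Anti-pattern: every communication failure spawns a new worker that is never
# joined. If reconnect() blocks or hangs, thread stacks and captured state
# accumulate for as long as the service runs.
def on_comm_failure_leaky():
    threading.Thread(target=reconnect).start()

# Bounded alternative: keep at most one reconnect worker alive and reuse it.
_lock = threading.Lock()
_worker = None

def on_comm_failure_bounded():
    global _worker
    with _lock:
        if _worker is None or not _worker.is_alive():
            _worker = threading.Thread(target=reconnect, daemon=True)
            _worker.start()
```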

How is memory behavior in Python different from a PLC scan cycle?

Python and PLC runtimes solve different problems, and they fail differently. A PLC scan cycle is built for repetitive, bounded execution against known I/O and memory structures. Python is built for general-purpose computing, where objects can be created, referenced, passed, cached, and retained in flexible ways.

The clean contrast is this: deterministic scan memory versus dynamic object lifetime.

That is why "it ran once" is almost meaningless for edge reliability. A commissioning engineer would not validate a pump alternation sequence with one start command and call it done. Software deserves the same skepticism.

This is also where Simulation-Ready needs a precise definition. In Ampergon Vallis usage, Simulation-Ready does not mean "familiar with syntax" or "comfortable in a code editor." It means an engineer can prove, observe, diagnose, and harden control-related logic against realistic process behavior before it reaches a live process. Syntax is necessary. Deployability is the test.

How does the `tracemalloc` module identify memory growth in Python?

`tracemalloc` identifies memory growth by tracing Python memory allocations and comparing snapshots over time. It hooks into Python's allocator and records where blocks were allocated, which allows engineers to inspect growth by file, line number, or traceback grouping.

That makes it useful for OT debugging because it answers the only question that matters once memory starts climbing: where is the growth originating?

A simple baseline pattern looks like this:

```python
import tracemalloc
import time

tracemalloc.start()

snapshot1 = tracemalloc.take_snapshot()

time.sleep(3600)  # Run for 1 hour

snapshot2 = tracemalloc.take_snapshot()
top_stats = snapshot2.compare_to(snapshot1, 'lineno')

print("[Top 5 Memory Growth Locations]")
for stat in top_stats[:5]:
    print(stat)
```

This does not detect every possible memory issue in every dependency layer. It tracks Python-managed allocations, which is usually the right first move but not the final word. If a C extension or driver leaks outside Python's allocator, you may need OS-level tools as well.
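One hedged way to cover that gap is to log process-level RSS alongside the total that `tracemalloc` traces; if RSS climbs while traced Python memory stays flat, the growth is likely outside the Python allocator. A minimal sketch, assuming the third-party `psutil` package is available:

```python
import tracemalloc
import psutil   # third-party; install separately (assumed available)

tracemalloc.start()
process = psutil.Process()   # the current process

def log_memory(label):
    traced_current, traced_peak = tracemalloc.get_traced_memory()
    rss = process.memory_info().rss
    # If rss grows while traced_current stays flat, suspect a C extension,
    # driver, or OS-level resource rather than Python objects.
    print(f"{label}: traced={traced_current / 1e6:.1f} MB "
          f"(peak {traced_peak / 1e6:.1f} MB), rss={rss / 1e6:.1f} MB")

log_memory("baseline")
# ... run the workload ...
log_memory("after 1h")
```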

A more useful industrial pattern is periodic snapshotting with controlled logging:

```python
import tracemalloc
import time
from datetime import datetime

tracemalloc.start(25)  # store deeper traceback history

baseline = tracemalloc.take_snapshot()

for cycle in range(1, 25):  # example: 24 hourly checks
    time.sleep(3600)

    current = tracemalloc.take_snapshot()
    stats = current.compare_to(baseline, 'lineno')

    print(f"\n[Snapshot {cycle}] {datetime.now().isoformat()}")
    for stat in stats[:10]:
        print(stat)
```

The point is not to admire the output. The point is to establish whether memory growth is bounded, stable, and attributable.

What does `tracemalloc` actually prove, and what does it not prove?

`tracemalloc` proves that Python-managed allocations have increased between snapshots and shows where that increase is associated in code. It does not, by itself, prove that the increase is harmful, permanent, or operationally unacceptable.

That distinction matters because not all growth is a leak. Some memory growth is expected during startup, cache warm-up, model loading, or batch initialization. A leak is better defined operationally as memory growth that continues without a justified steady-state ceiling during the intended runtime profile.

For edge automation, the intended runtime profile is usually measured in days or weeks, not minutes.

A practical decision rule is:

- Expected growth: rises during startup, then stabilizes.
- Suspicious growth: rises with workload spikes, then partially recovers.
- Leak behavior: rises monotonically or stair-steps upward with no meaningful plateau.

This is why one snapshot pair is rarely enough. Trend matters. Industrial failures are often slow enough to pass a demo and fast enough to ruin a weekend.
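That trend judgment can be made less subjective with a rough heuristic over periodic samples. The thresholds and sample series below are illustrative only; calibrate them to your own runtime profile:

```python
def classify_growth(samples_mb, startup_samples=2, plateau_tolerance=0.05):
    """Rough classification of periodic memory samples (MB), e.g. one per hour.

    startup_samples: readings ignored as warm-up (caches, model loading).
    plateau_tolerance: relative growth over the steady-state window that still
    counts as a plateau. Both values are illustrative assumptions.
    """
    steady = samples_mb[startup_samples:]
    if len(steady) < 3:
        return "insufficient data"
    first, last = steady[0], steady[-1]
    if last <= first * (1 + plateau_tolerance):
        return "expected: stabilizes after startup"
    # Monotonic or stair-step growth with no meaningful plateau.
    rising = sum(1 for a, b in zip(steady, steady[1:]) if b > a)
    if rising >= 0.8 * (len(steady) - 1):
        return "leak behavior: keeps climbing"
    return "suspicious: growth tied to workload, review retention"

print(classify_growth([120, 155, 158, 160, 159, 161, 162]))   # expected
print(classify_growth([120, 155, 170, 188, 205, 223, 240]))   # leak behavior
```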

How do you test edge scripts against long-duration PLC simulations?

You test edge scripts against long-duration PLC simulations by connecting the script to a persistent virtual process, running the intended polling or orchestration workload for hours or days, and comparing memory snapshots while process state continues to evolve.

Physical hardware is the wrong first place for this test. Tying up a PLC rack, remote I/O, and field devices for a 48- or 72-hour software stability trial is expensive, operationally awkward, and often impossible in a production-adjacent environment. The plant usually has other ideas.

This is where OLLA Lab becomes operationally useful. OLLA Lab is a web-based ladder logic and digital twin simulator that allows engineers to build logic, run persistent simulations, inspect I/O and variables, and validate behavior against realistic industrial scenarios. In this context, its value is bounded and practical: it provides a risk-contained, persistent environment for long-duration validation of edge-side software interacting with simulated control behavior.

Operationally, the workflow is straightforward:

  • Launch an OLLA Lab scenario with stable process behavior, such as a pump station, HVAC air handler, or water-treatment sequence.
  • Run the ladder logic in Simulation Mode.
  • Use the Variables Panel to confirm changing tags, analog values, outputs, and sequence states.
  • Connect the external Python script to the simulated tag environment or mirrored I/O workflow used for testing.
  • Start `tracemalloc`.
  • Let the script run under realistic polling, retry, logging, and fault-handling conditions for a sustained period.

The important point is persistence. A leak that appears after six hours is invisible in a five-minute test, and a digital twin does not get bored.
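As a sketch of what that sustained run can look like, the loop below polls one simulated tag and takes an hourly `tracemalloc` comparison. It assumes the simulated tags are mirrored through an OPC UA test endpoint and uses the third-party python-opcua client; the endpoint URL and node ID are placeholders, not specifics of OLLA Lab's interface:

```python
import time
import tracemalloc
from opcua import Client   # third-party python-opcua package (assumed available)

ENDPOINT = "opc.tcp://localhost:4840"       # placeholder test endpoint
TAG_NODE = "ns=2;s=PumpStation.FlowRate"    # hypothetical node id

tracemalloc.start(25)
baseline = tracemalloc.take_snapshot()
last_check = time.monotonic()

client = Client(ENDPOINT)
client.connect()
try:
    node = client.get_node(TAG_NODE)
    while True:
        value = node.get_value()            # the realistic polling workload
        # ... buffer, log, or forward the value as the production script would ...
        if time.monotonic() - last_check > 3600:   # hourly memory comparison
            current = tracemalloc.take_snapshot()
            for stat in current.compare_to(baseline, 'lineno')[:5]:
                print(stat)
            last_check = time.monotonic()
        time.sleep(0.05)
finally:
    client.disconnect()   # explicit teardown is part of what is being validated
```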

Image alt text: Screenshot of a split-screen workspace. On the left, the OLLA Lab Variables Panel shows a continuous simulated pumping sequence. On the right, a terminal window displays Python tracemalloc output, highlighting a memory leak at line 42.

What is the workflow for debugging a memory leak in OLLA Lab?

The workflow for debugging a memory leak in OLLA Lab is to establish a baseline, induce realistic load, compare memory snapshots, isolate the cause, refactor the script, and repeat the simulation until memory behavior stabilizes.

Step-by-step debugging workflow

  1. Establish the baseline: Open an OLLA Lab industrial preset such as an HVAC Air Handler, lift station, or water-treatment process. Start the simulation and take an initial `tracemalloc` snapshot before sustained polling begins.
  2. Induce the load: Run the Python script against the simulated process at the intended operating rate. For example, poll tags every 50 ms, write results to a local buffer, and trigger any normal API or historian calls. Maintain the run for at least four hours; longer is better when the production duty cycle is continuous.
  3. Compare snapshots: Use `snapshot2.compare_to(snapshot1, 'lineno')` or periodic comparisons against the baseline to identify which lines or modules accumulate memory.
  4. Inspect the failure mode: Determine whether the growth comes from connection handling, retained data structures, retries, async tasks, or library behavior. This is where engineering judgment matters more than syntax familiarity.
  5. Refactor and validate: Close sockets explicitly, implement bounded queues, join or cancel threads, reduce object retention, or revise retry logic. Then rerun the same OLLA Lab simulation and confirm that memory growth reaches a stable ceiling or remains effectively flat.
  6. Document the evidence: Keep the before-and-after snapshot deltas, runtime duration, simulated scenario, polling interval, and code revision notes. If the fix cannot be explained, it has not really been validated.
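Step 6 is easier to keep consistent with a small helper that appends the top snapshot deltas plus run metadata to a plain-text file. A minimal sketch; the file name and metadata fields are illustrative, not a prescribed format:

```python
import tracemalloc
from datetime import datetime

def write_evidence(baseline, current, scenario, poll_ms, path="leak_evidence.txt"):
    """Append the top snapshot deltas and run metadata to an evidence file."""
    stats = current.compare_to(baseline, 'lineno')
    with open(path, "a") as f:
        f.write(f"\n=== {datetime.now().isoformat()} ===\n")
        f.write(f"scenario={scenario}, poll_interval_ms={poll_ms}\n")
        for stat in stats[:10]:
            f.write(f"{stat}\n")

# Usage sketch:
# tracemalloc.start()
# baseline = tracemalloc.take_snapshot()
# ... run the OLLA Lab workload ...
# write_evidence(baseline, tracemalloc.take_snapshot(),
#                scenario="Water Treatment preset", poll_ms=50)
```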

What does a real leak pattern look like in automation code?

A real leak pattern usually looks boring at first. That is part of the problem. The script keeps collecting data, the process keeps moving, and system load appears normal until memory pressure crosses a threshold and everything degrades at once.

Consider a simplified anti-pattern:

```python
import time

data_log = []

while True:
    tag_values = read_plc_tags()   # returns dict of current values
    data_log.append(tag_values)    # unbounded growth
    send_to_api(tag_values)
    time.sleep(0.05)
```

This code may be functionally correct and operationally reckless. If the process runs continuously, `data_log` becomes a memory sink.

A bounded version is safer:

```python
import time
from collections import deque

data_log = deque(maxlen=2000)

while True:
    tag_values = read_plc_tags()
    data_log.append(tag_values)
    send_to_api(tag_values)
    time.sleep(0.05)
```

The same principle applies to connections:

```python
client = open_connection()

while True:
    if need_refresh():
        client = open_connection()   # old connection may persist
    poll(client)
```

A safer pattern uses explicit lifecycle management:

```python
while True:
    with open_connection() as client:
        poll(client)
        process_data(client)
        sleep_interval()
```

The exact implementation depends on the library, but the rule does not change: resource lifetime must be explicit in long-running OT code.
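If the driver you are using does not ship a context manager, a thin wrapper keeps the teardown explicit without restructuring the loop. A minimal sketch; `create_client` and `close` stand in for whatever the real library exposes:

```python
from contextlib import contextmanager

@contextmanager
def managed_connection(endpoint):
    client = create_client(endpoint)   # hypothetical factory for the real driver
    try:
        yield client
    finally:
        client.close()   # guaranteed teardown, even on exceptions or retries

# Usage mirrors the safer pattern above:
# while True:
#     with managed_connection("opc.tcp://localhost:4840") as client:
#         poll(client)
```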

How long should a memory leak test run before you trust the result?

A memory leak test should run long enough to cover the intended duty cycle, communication rhythm, and fault-handling behavior of the deployed script. In practice, that usually means hours at minimum and often 24 to 72 hours for continuously running edge workloads.

There is no universal magic duration. A script polling every second with hourly batch uploads has a different risk profile than one polling every 50 ms with retry storms on intermittent comms. Test duration should be tied to the slowest relevant behavior in the system.

A reasonable engineering approach is:

- 1–2 hours: catches obvious runaway growth
- 4–8 hours: catches many retention and buffering issues
- 24+ hours: begins to represent continuous-duty behavior
- 48–72 hours: more credible for edge services expected to run unattended

The test should also include abnormal states, not just nominal operation. A script that survives steady-state polling but leaks during reconnect storms is still a leaking script.
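One way to fold those abnormal states into the simulation run is to force periodic session drops while memory observation continues. The sketch below reuses the `open_connection()` and `poll()` placeholders from the patterns above; the session lengths are illustrative, not a recommended test profile:

```python
import random
import time
import tracemalloc

tracemalloc.start()
baseline = tracemalloc.take_snapshot()

while True:
    # open_connection() and poll() are the same placeholders used earlier.
    with open_connection() as client:
        # Poll normally for a while, then deliberately end the session to
        # rehearse reconnect behavior while memory is being observed.
        deadline = time.monotonic() + random.uniform(60, 600)
        while time.monotonic() < deadline:
            poll(client)
            time.sleep(0.05)
    # Leaving the with-block closes the session; the next iteration reconnects.
    for stat in tracemalloc.take_snapshot().compare_to(baseline, 'lineno')[:5]:
        print(stat)
```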

How should engineers build evidence that a script is actually hardened?

Engineers should build a compact body of validation evidence, not a screenshot gallery. The artifact should show what was tested, what "correct" meant, how the fault was induced, and what changed after revision.

Use this structure:

  1. System Description: Describe the simulated process, ladder logic role, edge script function, polling interval, protocols, and runtime target.
  2. Operational definition of "correct": Define success in observable terms: stable memory over a 24-hour run, no unbounded object growth, successful reconnect behavior, and no loss of required tag updates.
  3. Ladder logic and simulated equipment state: Record the OLLA Lab scenario, relevant sequence states, analog conditions, alarm conditions, and the ladder logic context the script interacted with.
  4. The injected fault case: State the fault introduced or observed: unclosed OPC UA session, unbounded event list, hung retry thread, or malformed reconnect loop.
  5. The revision made: Document the code or architecture change: explicit socket teardown, bounded queue, retry backoff, thread cleanup, or library substitution.
  6. Lessons learned: Summarize what the failure revealed about runtime assumptions, process interaction, and deployment risk.

That is the kind of evidence that supports engineering review. It also travels well across teams because it explains behavior, not just appearance.

How does this relate to standards, safety, and commissioning risk?

Memory leak testing is not the same thing as functional safety validation, but it is still a commissioning-risk issue. IEC 61508 and related safety practice focus on systematic integrity, lifecycle discipline, and the control of dangerous failures in electrical, electronic, and programmable systems. A leaking edge script may sit outside the core safety function, yet still create operational hazards through loss of visibility, delayed alarming, stale supervisory decisions, or failed integrations.

The safe distinction is simple: not every edge service is safety-related, but every unstable edge service is a reliability risk.

Digital twin validation is useful here because it allows repeated exposure to realistic process behavior without requiring live equipment. Literature across simulation-based engineering and industrial cyber-physical systems supports the value of high-fidelity virtual environments for validation, operator training, and fault analysis, provided claims are kept bounded to the task being simulated rather than treated as universal proof (Antonino et al., 2024; Tao et al., 2019; Villalonga et al., 2021).

In that frame, OLLA Lab should be understood as a validation and rehearsal environment for high-risk automation tasks. It is not a substitute for site acceptance testing, formal safety lifecycle work, or plant-specific hazard review.

When should you use OLLA Lab instead of physical hardware for memory testing?

You should use OLLA Lab instead of physical hardware when the engineering question is about long-duration behavior, repeatability, fault injection, and low-risk validation of software interacting with control logic.

That includes cases such as:

  • edge data loggers polling simulated PLC tags continuously
  • protocol bridges moving data between OT and IT systems
  • API-connected orchestration scripts
  • AI-assisted supervisory scripts that observe process state and issue bounded recommendations or commands
  • commissioning rehearsals where logic, I/O state, and abnormal scenarios need to be replayed repeatedly

Physical hardware still matters for final integration, timing validation, device-specific behavior, and environmental constraints. But hardware is a poor place to discover that a Python list has been growing quietly for 19 hours.

Conclusion

The practical answer is straightforward: if a Python automation script is expected to run continuously, `tracemalloc` should be part of the validation workflow. Short bench tests do not establish memory stability, and edge failures caused by slow leaks are exactly the kind of defect that survives superficial testing.

The stronger engineering pattern is to pair `tracemalloc` with a persistent simulation environment. That combination lets you observe memory behavior under realistic process conditions, isolate growth to specific code paths, revise the design, and rerun the same workload without tying up physical assets.

That is what Simulation-Ready work looks like in practice: not merely writing code that executes, but proving that it remains stable, observable, and correct when the process keeps moving.



Editorial transparency

This blog post was written by a human, with all core structure, content, and original ideas created by the author. However, this post includes text refined with the assistance of ChatGPT and Gemini. AI support was used exclusively for correcting grammar and syntax, and for translating the original English text into Spanish, French, Estonian, Chinese, Russian, Portuguese, German, and Italian. The final content was critically reviewed, edited, and validated by the author, who retains full responsibility for its accuracy.

About the Author: Jose NERI, PhD, Lead Engineer at Ampergon Vallis

Fact-Check: Technical validity confirmed on 2026-03-23 by the Ampergon Vallis Lab QA Team.


© 2026 Ampergon Vallis. All rights reserved.