How do I identify and remove users who have long-running transactions from my OpenEdge database? (Linux and Windows)
CAVEAT: This is paliative care! With this approach we are simply protecting the database, technically the BI file, from misuse, or rather from not being re-used as intended. The root cause for this sort of intervention is bad code. Such logic must be identified and refactored to prevent old transactions from wreaking havoc in your system in the first place.
Use ProTop Real-Time (RT) to help identify the procedure name and line number implicated in your long running transaction(s), and pass that information along to your development team for analysis and correction. See Finding Code: Old Transactions for more detail.
In the meantime...
ProTop includes a feature, referred to as "trxmon", which can monitor and remove Progress client sessions that hold transactions open longer than is healthy for your environment.
NOTE: A transaction that qualifies for disconnect by trxmon will be:
- ACTIVE (Stat = ACTV in the ProTop RT "x" panel)
- Have a duration greater than trxThreshold (see below)
- And be idle for more than trxZapAfter (see below)
AND: The ProTop agent (pt3agent) needs to be running with elevated privileges (root/administrator) to allow it to disconnect users or run proGetStack against sessions.
The components of this feature exist in subdirectories in your PROTOP installation:
- etc/trxmon.friendlyName.cfg - set the parameters that control trxmon's behavior; see Setup below
- bin/trxmon.sh[.bat] - this is "trxmon", schedule this to run in etc/schedule.*.cfg
- util/trxmon.p - called by trxmon.sh[.bat]
- bin/disconnect[.bat] - default disconnect script called by trxmon.p
- bin/disconn.local - called by bin/disconnect, if it exists (allows you to add custom functionality)
- bin/zapconnect[.bat] - called by trxmon if the session becomes stuck; bin/disconnect sent the disconnect message and the disconnect message was received by the session, but the session is not disconnecting; it will let you know if manual intervention is required
- bin/disconnx.local - called by bin/disconnectx, if it exists (allows you to add custom functionality)
- bin/killprosession.sh - not recommended for automation, but it can be run from bin/disconnx.local when you want to be more aggressive about removing the session; it uses progressively more aggressive attempts to kill the offending process; can also be run manually; read the script for more details and cautions
Setup
-
Copy etc/trxmon.cfg to etc/trxmon.friendlyName.cfg
-
Edit etc/trxmon.friendlyName.cfg and update the variables to your liking:
-
monInt 60 - how often do we check for old, idle transactions?
-
trxZapAfter 600 - minimum "db idle" while also in a transaction before being disconnected
-
-
Then, uncomment the correct set of disconnect script variables for your OS (this is for *nix):
# unix:4. Now add trxmon.sh[.bat] to your schedule.*.cfg, for example, for Unix, to run trxmon against the resource "friendlyName", every 15 minutes, add this line:
#
disconScript bin/disconnect # simple disconnect
killScript bin/zapconnect # take stern measures
# windows:
#
# disconScript bin\disconnect # simple disconnect
# killScript bin\zapconnect # take stern measures
for unix:
0,15,30,45 * * * * trxmon.sh friendlyName > ${PTTMP}/trxmon.err 2>&1
or for Windows
0,15,30,45 * * * * cnd /c trxmon.bat friendlyName > %PTTMP%\trxmon.err 2>&1
The trxmon loops internally every monInt seconds until it is asked to stop. When the above line attempts to start trxmon and finds it is already running, the attempt will exit. Adding [NOALERT] at the end of the line suppresses the alert normally sent to the portal when a job is run.
Shut Down
To stop the current run of the transaction monitor, remove tmp/trxmon.friendlyName.flg. The scheduler will restart it according to the configuration you provided, at the next quarter-hour in the example above.
To permanently disable trxmon, comment it out or remove it from your etc/schedule.*.cfg file.