trxmon - Disconnect Long Running Sessions (ProTop 330+)
How do I identify and remove users who have long-running transactions from my OpenEdge database? (Linux and Windows)
NOTICE: Do you have an older version of trxmon running in your environment? If so, see the last section of this article, which describes the conversion program provided in version 330.
CAVEAT: This is palliative care! This approach simply protects the database, specifically its before-image (BI) file, whose clusters cannot be reused while an old transaction stays open. The root cause that makes this sort of intervention necessary is bad code. Such logic must be identified and refactored so that old transactions cannot wreak havoc in your system in the first place. Use ProTop Real-Time (RT) to help identify the procedure name and line number implicated in your long-running transaction(s), and pass that information to your development team for analysis and correction. See Finding Code: Old Transactions for more detail.
In the meantime...
ProTop includes a feature referred to as "trxmon", which can monitor and remove Progress client sessions that hold transactions open longer than is healthy for your environment.
NOTE: trxmon disconnects a session only when its transaction is:
- ACTIVE (Stat = ACTV in the ProTop RT "x" panel),
- older than trxThreshold (see below), and
- idle for more than trxZapAfter (see below).
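All three conditions must hold at the same time. Users listed in trxUserExcludeList (see Setup) are never disconnected. The Python sketch below states that rule and assumes durations are measured in seconds. The `TrxInfo` record and its fields are hypothetical stand-ins for whatever util/trxmon.p actually reads from the database.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass
class TrxInfo:
    """Hypothetical snapshot of one session's open transaction."""
    user: str
    status: str         # e.g. "ACTV", as shown in the ProTop RT "x" panel
    duration_secs: int  # how long the transaction has been open (assumed seconds)
    idle_secs: int      # how long the session has been db-idle (assumed seconds)


def qualifies_for_disconnect(trx: TrxInfo, trx_threshold: int, trx_zap_after: int,
                             exclude_users: FrozenSet[str] = frozenset()) -> bool:
    """True only when all three documented conditions hold and the user is not exempt."""
    if trx.user in exclude_users:  # trxUserExcludeList exemption
        return False
    return (trx.status == "ACTV"                   # active transaction
            and trx.duration_secs > trx_threshold  # older than trxThreshold
            and trx.idle_secs > trx_zap_after)     # idle longer than trxZapAfter


stale = TrxInfo("batch01", "ACTV", duration_secs=900, idle_secs=700)
busy = TrxInfo("web07", "ACTV", duration_secs=900, idle_secs=5)
print(qualifies_for_disconnect(stale, 600, 600))  # True
print(qualifies_for_disconnect(busy, 600, 600))   # False: not idle long enough
```

A long transaction on its own is therefore not enough to trigger a disconnect. The session must also have stopped doing database work for longer than trxZapAfter.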
CAUTION: The ProTop agent (pt3agent) must run with elevated privileges (root/administrator) to disconnect sessions that belong to other users or to run proGetStack against sessions.
The components of this feature live in the following subdirectories of your ProTop installation:
- etc/protop.<custId>.cfg - set the parameters that control trxmon's behavior; see Setup below
- bin/trxmon.sh[.bat] - the trxmon launcher itself; schedule it to run in etc/schedule.*.cfg
- util/trxmon.p - called by trxmon.sh[.bat]
- bin/disconnect[.bat] - default disconnect script called by trxmon.p
- bin/disconn.local - called by bin/disconnect, if it exists (allows you to add custom functionality)
- bin/zapconnect[.bat] - called by trxmon when a session is stuck: bin/disconnect sent the disconnect message and the session received it, but the session has not disconnected; it will let you know if manual intervention is required
- bin/disconnx.local - called by bin/disconnectx, if it exists (allows you to add custom functionality)
- bin/killprosession.sh - not recommended for automation, but it can be run from bin/disconnx.local when you want to be more aggressive about removing the session. It makes progressively more aggressive attempts to kill the offending process and can also be run manually; read the script for details and cautions
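The escalation path through these scripts can be summarized as a small decision function. This is a hypothetical Python sketch of the sequence described above, not ProTop's code. The boolean inputs (whether the disconnect message was sent and whether the session received it) are assumptions about the state trxmon tracks. The action names simply mirror bin/disconnect and bin/zapconnect.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"              # nothing to do for this session
    DISCONNECT = "disconnect"  # run trxDisconScript (bin/disconnect by default)
    WAIT = "wait"              # disconnect sent, not yet received by the session
    ZAPCONNECT = "zapconnect"  # run trxKillScript (bin/zapconnect by default)


def next_action(qualifies: bool, connected: bool,
                disconnect_sent: bool, disconnect_received: bool) -> Action:
    """Hypothetical escalation: disconnect first, zapconnect only when stuck."""
    if not connected or not qualifies:
        return Action.NONE
    if not disconnect_sent:
        return Action.DISCONNECT
    if disconnect_received:
        # The session got the message but is still connected: treat it as stuck.
        return Action.ZAPCONNECT
    return Action.WAIT


print(next_action(True, True, False, False))  # Action.DISCONNECT
print(next_action(True, True, True, True))    # Action.ZAPCONNECT
```

Keeping zapconnect behind a confirmed, received disconnect reflects the intent of the list above: the gentler script always gets the first chance.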
Setup
- If you do not already have one, copy etc/protop.cfg to, e.g., etc/protop.<custId>.cfg (or any other config file hierarchy name as required)
- Edit etc/protop.<custId>.cfg and update the variables as needed:
# trxmon
#
trxMonInt 60 # how often do we check for old,
# idle transactions? minimum 10
# seconds
trxThreshold 600 # minimum transaction duration
# before being considered for
# zapping, minimum 60
trxZapAfter 600 # minimum "db idle" while also in
# a transaction before being
# disconnected, minimum 60
trxDisconScript disconnect # script to disconnect sessions
# ($protop/bin and .sh or .bat
# will be added)
trxKillScript zapconnect # script to take stern measures
# ($protop/bin and .sh or .bat
# will be added)
trxDisconNag 3600 # how often to nag (send an alert)
# about disconnecting sessions
trxKillNag 3600 # how often to nag (send an
# alert) about killing sessions
trxStuckNag 3600 # how often to nag (send an
# alert) about stuck sessions
trxDisconAlert alert # what sort of alert to send when
# disconnecting sessions
trxKillAlert alarm # what sort of alert to send when
# killing sessions
trxStuckAlert page # what sort of alert to send when
# sessions are stuck
trxUserExcludeList "" # a comma-separated list of users
# to be exempted from disconnection
- Now add trxmon.sh[.bat] to your etc/schedule.*.cfg. For example, to run trxmon against the resource "friendlyName" every 15 minutes, add this line:
For Unix:
0,15,30,45 * * * * trxmon.sh friendlyName > ${PTTMP}/trxmon.err 2>&1
For Windows:
0,15,30,45 * * * * cmd /c trxmon.bat friendlyName > %PTTMP%\trxmon.err 2>&1
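The trxmon section follows a simple `name value # comment` layout, and a few lines of Python can read it back. This sketch assumes that layout exactly as shown above. It raises values below the documented minimums (10 for trxMonInt, 60 for trxThreshold and trxZapAfter) up to those minimums. Whether ProTop itself clamps, warns, or rejects such values is an assumption, and `parse_trx_settings` is a hypothetical helper name.

```python
import shlex
from typing import Dict, List, Union

# Minimums documented in the trxmon section of etc/protop.<custId>.cfg.
MINIMUMS = {"trxMonInt": 10, "trxThreshold": 60, "trxZapAfter": 60}
# Numeric settings (the *Nag intervals are assumed to be seconds as well).
INT_KEYS = {"trxMonInt", "trxThreshold", "trxZapAfter",
            "trxDisconNag", "trxKillNag", "trxStuckNag"}

Value = Union[int, str, List[str]]


def parse_trx_settings(text: str) -> Dict[str, Value]:
    """Read 'name value # comment' lines for trx* settings (hypothetical helper)."""
    settings: Dict[str, Value] = {}
    for line in text.splitlines():
        if not line.lstrip().startswith("trx"):
            continue  # skip comment-only lines and non-trxmon settings
        tokens = shlex.split(line, comments=True)  # drops '# ...', unquotes ""
        if len(tokens) < 2:
            continue
        name, raw = tokens[0], tokens[1]
        if name in INT_KEYS:
            # Assumption: values below the documented minimum are raised to it.
            settings[name] = max(int(raw), MINIMUMS.get(name, 0))
        elif name == "trxUserExcludeList":
            settings[name] = [u.strip() for u in raw.split(",") if u.strip()]
        else:
            settings[name] = raw
    return settings


sample = """# trxmon
trxMonInt 5 # how often do we check for old,
# idle transactions? minimum 10
trxThreshold 600 # minimum transaction duration
trxUserExcludeList "batch01, admin" # users exempt from disconnection
"""
cfg = parse_trx_settings(sample)
print(cfg["trxMonInt"])           # 10 (raised to the documented minimum)
print(cfg["trxUserExcludeList"])  # ['batch01', 'admin']
```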
trxmon loops internally every trxMonInt seconds until it is asked to stop. If the scheduler launches trxmon while an instance is already running, the new instance exits. Adding [NOALERT] at the end of the schedule line suppresses the alert that is normally sent to the portal when a job runs.
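trxmon re-examines sessions on every trxMonInt pass. The nag settings (trxDisconNag, trxKillNag, trxStuckNag) limit how often the corresponding alert is repeated. The Python sketch below shows one way to implement such per-category throttling, assuming intervals in seconds. The `Nagger` class and its category names are hypothetical, and the injectable clock exists only so the behavior can be demonstrated without waiting.

```python
import time
from typing import Callable, Dict, Optional


class Nagger:
    """Rate-limit alerts per category (a sketch, not ProTop's implementation)."""

    def __init__(self, intervals: Dict[str, int],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.intervals = intervals  # e.g. {"disconnect": trxDisconNag, ...}
        self.clock = clock
        self.last_sent: Dict[str, float] = {}

    def should_send(self, category: str) -> bool:
        """True if no alert of this category was sent within its nag interval."""
        now = self.clock()
        last: Optional[float] = self.last_sent.get(category)
        if last is None or now - last >= self.intervals[category]:
            self.last_sent[category] = now
            return True
        return False


fake_now = [0.0]
nag = Nagger({"disconnect": 3600, "kill": 3600, "stuck": 3600},
             clock=lambda: fake_now[0])
print(nag.should_send("disconnect"))  # True: the first alert is always sent
fake_now[0] = 60.0                    # next pass, with trxMonInt = 60
print(nag.should_send("disconnect"))  # False: still inside the trxDisconNag window
fake_now[0] = 3600.0
print(nag.should_send("disconnect"))  # True: the window has elapsed
```

The severity of each alert would come from trxDisconAlert, trxKillAlert, or trxStuckAlert. The throttle only decides whether to send.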
Shut Down
To stop the current run of the transaction monitor, remove tmp/trxmon.friendlyName.flg. The scheduler restarts it at its next scheduled time, which is the next quarter-hour in the example above.
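The flag file acts as the on/off switch for the loop. The Python sketch below models that behavior under three assumptions:
- the monitor creates the flag when it starts,
- an existing flag means another instance is running, and
- the loop continues only while the flag exists.

`run_monitor` and `check_once` are hypothetical names, and ProTop's own single-instance check may work differently.

```python
import os
import tempfile
import time
from typing import Callable


def run_monitor(flag_path: str, interval_secs: int,
                check_once: Callable[[], None],
                sleep: Callable[[float], None] = time.sleep) -> int:
    """Loop while flag_path exists; return the number of passes made."""
    try:
        # O_EXCL makes creation atomic: it fails if the flag already exists.
        fd = os.open(flag_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return 0  # assumption: an existing flag means another instance is running
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    passes = 0
    while os.path.exists(flag_path):  # removing the flag stops the loop
        check_once()
        passes += 1
        sleep(interval_secs)  # trxMonInt
    return passes


with tempfile.TemporaryDirectory() as d:
    flag = os.path.join(d, "trxmon.friendlyName.flg")
    passes_seen = []

    def one_pass() -> None:
        passes_seen.append(1)  # stand-in for one trxmon check
        if len(passes_seen) == 3:
            os.remove(flag)    # simulate an operator removing the flag

    print(run_monitor(flag, 60, one_pass, sleep=lambda s: None))  # 3
```

In this model, a flag left behind by a crashed run would block restarts until someone removes it.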
To permanently disable trxmon, comment it out or remove it from your etc/schedule.*.cfg file.
Upgrading trxmon from older versions of ProTop (pre-330)
A conversion program (run from $PROTOP) is provided to translate old-style etc/trxmon*.cfg files to the new syntax in etc/protop*.cfg.
LINUX
bpro -p util/convtrxcfg.p > convtrxcfg.log 2>&1
WINDOWS
bpro -p util\convtrxcfg.p -basekey INI -ininame etc\protop.ini > convtrxcfg.log 2>&1
The conversion program renames properties as needed and, when done, renames the old configuration files with a “.old” suffix. Only non-default values are added, as appropriate, to the etc/protop.*.cfg files.
NOTE: etc/protop.cfg will NOT be modified. A plain etc/trxmon.cfg maps to etc/protop.<custId>.cfg, the top-level config file.
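The rule that only non-default values are carried over can be illustrated with a short Python sketch. It makes two assumptions:
- the old settings have already been read into a dictionary under their new property names, and
- the values shown in the Setup example are the defaults.

`non_default_lines` and `retire_old_cfg` are hypothetical helpers, not part of util/convtrxcfg.p.

```python
import os
import tempfile
from typing import Dict, List

# Assumption: the values from the Setup example stand in for ProTop's defaults.
DEFAULTS: Dict[str, str] = {
    "trxMonInt": "60", "trxThreshold": "600", "trxZapAfter": "600",
    "trxDisconScript": "disconnect", "trxKillScript": "zapconnect",
    "trxUserExcludeList": "",
}


def non_default_lines(settings: Dict[str, str]) -> List[str]:
    """Return 'name value' lines only for settings that differ from DEFAULTS."""
    lines = []
    for name, value in settings.items():
        if DEFAULTS.get(name) == value:
            continue  # default value: nothing to write
        # Quote empty values or values containing spaces, as in trxUserExcludeList "".
        shown = f'"{value}"' if value == "" or " " in value else value
        lines.append(f"{name} {shown}")
    return lines


def retire_old_cfg(path: str) -> str:
    """Rename an old config file with a .old suffix and return the new path."""
    new_path = path + ".old"
    os.replace(path, new_path)
    return new_path


old = {"trxMonInt": "60", "trxThreshold": "1200", "trxUserExcludeList": "batch01,admin"}
print(non_default_lines(old))  # ['trxThreshold 1200', 'trxUserExcludeList batch01,admin']

with tempfile.TemporaryDirectory() as d:
    cfg_path = os.path.join(d, "trxmon.cfg")
    open(cfg_path, "w").close()
    print(os.path.basename(retire_old_cfg(cfg_path)))  # trxmon.cfg.old
```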