ProTop Knowledge Base

ProTop Real-Time (RT) Monitoring

Panel Details

Performance Dashboard (d)

The dashboard panel is a high-level view of some of the more important metrics of a running database. Here you'll find information on database activity (reads, writes, etc...), connections (brokers, servers, users, AppServers, etc...) and so much more.

Remember that the values displayed are typically rates, expressed as values per second. If you prefer to see raw numbers, see the page on switching between raw and rate sampling.

This information is also available on the web in the ProTop Trends dashboard (requires the commercial version of ProTop).

Activity

ProTop RT Label	Alertable Metric*	Description
Hit%	HR	Percentage of database reads read from the database buffer cache
Log Reads	logRd	Number of logical record reads
Log Writes	logWr	Number of logical record writes
OS reads	osRd	Number of reads from disk
OS writes	osWr	Number of writes to disk
LogRd / LogWr	N/A	Ratio of logical reads to logical writes
LogRd / RecRd	N/A	Ratio of logical reads to records reads
Rec Reads	recRd	Number of records reads
Idx Reads	idxRd	Number of index reads
Rec Creates	recCr	Number of records creates
Idx Creates	idxCrx	Number of index creates
Rec Updates	recUp	Number of records updates
Rec Deletes	recDl	Number of records deletes
Idx Deletes	idxDlx	Number of index deletes
Idx Blk Spl	idxBlkSplit	Number of index block splits
Rec Waits	recWt	Number of processes waiting for records
Resrc Waits	resrcWt	Number of processes waiting for an internal database resource
Latch Waits	latchTMO	Number of processes waiting for a latch to access information in shared memory
Latch Reqs	latchReq	Number of latch requests
Latch / logRd	N/A	The ratio of latches requests to logical reads
Commits	trx	Number of transactions commits
Undos	trxUndo	Number of transactions undos

Insights to Activity

Hit%: This is the most used and abused metric in the database world. To be blunt, anything under 99% is bad. Let me explain:

If you have a 99% hit ratio, then one in every 100 database reads is not found in the buffer cache, forcing a read from disk. If the buffer hit ratio is 98%, this means there were two disk reads for every 100 logical reads. This is 100% worse than a 99% buffer hit percentage (i.e. twice as bad). It is not 1% worse.

There are all kinds of benchmarks out in the GoogleSphere talking about small random IOs that are typical of database accesses. Any way you slice it, a disk request will be thousands of times slower than a cache hit, and I don't care what your SAN vendor told you about his super-incredible-all-flash storage subsystem.

99% or bust !!

LogRd / RecRd: Normally, this should be a little more than 2, as each record read usually involves a read from an index block that contains the ROWID of the record and then the actual read of the data from the data block. The little more comes from associated reads like the template record and the traversal of the index from the root, through the branches, and finally to the leaf node.

Confused? Ask the experts.

Locks

ProTop RT Label	Alertable Metric*	Description
Lock Tbl HWM	lkTblHWM	Lock table concurrent locks high water mark. This is the highest number of concurrent record locks since the database was started or statistics were zeroed
Curr # Locks	currLks	Current number of record locks
Lock Tbl %	lkTblpct	Current number of locks as a percentage of the highest allowable number of entries in the lock table (-L startup parameter)
Rec Lk/s	N/A	Record locks per second
Lk Dura (ms)	lkDurMS	The average duration of a lock in milliseconds

System Metrics

ProTop RT Label	Alertable Metric*	Description
BogoMIPS	bogoMips	BogoMIPS measures the single-threaded performance of a 4GL client
Random IO (ms)	ioResponse	The time it takes to perform a random rea from a database file that is not in memory
Sync IO	synchIO	Time required to grow a bi file to 96 MB on the BI file system
(MB/sec)	synchIOMb	Sync IO time presented in MB/sec
User Exp SHM	usrExpSHM	Shared memory 4GL client db reads/second
User Exp LH	usrExpLH	Client/server 4GL client reads/sec, connecting via the localhost connection (not shown - part of commercial ProTop)
User Exp CS	usrExpCS	Client/server 4GL client reads/sec, connecting via across the LAN from another server (not shown - part of commercial ProTop)

Insights to System Metrics

BogoMIPS is the measure of single-threaded performance when running 4GL code. Higher numbers are better. Values of 6 and up mean the server is running on modern Intel Xeon CPUs (as of 2020) or comparable, 4-6 are older CPUs. Anything under 4 is old and slow.

We also expect the value to be consistent: if it's 5.2, then it should be around 5.2 all the time. If you see wild swings in bogoMIPS then it could be an indicator that your CPUs are over-burdened. If top/topas/nmon/glance (UNIX) or Performance Monitor (Windows) insists that your server is not using a lot of CPU, then a wildly varying bogoMIPS could point to an overprovisioned hypervisor.

Sync IO: Numbers greater than 10 are concerning. Numbers much greater than 10 are an indicator of a major disk subsystem problem.

How long can you freeze writes to the database before people notice? Our rule-of-thumb is 2 seconds. When the BI file needs to grow by one cluster, writes to the database are frozen. So what's your BI cluster size? Values of 8MB - 64MB are common. In our screen shot at the top of this page, we see that we can grow the BI file at 7.83 MB per second. If your BI cluster size is 8 MB or 16 MB, then probably no one will notice. If it's 64 MB, then all writes to the database will freeze for 64 / 7.83 = 8.17 seconds while the BI file expands by one cluster. I think your users will notice that.

But be careful! The easy answer is not necessarily to shrink your BI cluster size. You want your shortest checkpoints to last at least 30 seconds to give all the moving parts time to catch up (fish around our blog page for articles on BI cluster size). Shrinking your BI cluster size might simply be trading one performance problem for another.

With the commercial version of ProTop, we trend these metrics over years so that you can easily see when and how changes in infrastructure affected your OpenEdge systems.

Confused? Ask the experts.

RM Chain Activity

ProTop RT Label	Alertable Metric*	Description
Examined	rm_examined	Number of RM blocks examined to see if space is available
New RM	rm_allocNew	Number of new blocks created in the RM chain
From RM	rm_fromRM	Number of blocks allocated from the RM chain
RM Locked	rm_rmlocked	Number of blocks locked
From free	rm_fromFree	Number of blocks allocated from the Free chain
Frnt2Bk	rm_frnt2bk	Number of blocks moved from the front of the chain to the back because not enough space was available
Mod Bufs	modbufs	Number of modified buffers
Evicted	evictbufs	Number of blocks evicted from the RM chain

Insight into the RM Chain

The Record Manager (RM) chain is a linked list of pointers to data blocks containing space for new records. In a type 2 storage area each object has its own RM chain, so when a new record is added, for example, the DB manager first searches the RM chain, looking for a block with sufficient space to hold the new record. By default, up to 5 blocks on the chain will be examined before the inserting process gives up and allocates a new block (see the recspdepth parameter in the configuration screen).

Custom Application Metrics

Metric	Description
New Orders	Placeholder for user-defined metrics. See Custom Application KPIs.

What are Custom Metrics?

With the commercial version of ProTop, you can write your own Key Performance Indicators (KPI) and have them appear both in the Real-Time monitor and in the web portal. For example, you may wish to track the number of orders processed or the number of widgets manufactured so that you can correlate business activity against database activity. ProTop supports up to 8 such Custom Application KPIs.

Asynchronous Writers

ProTop RT Label	Alertable Metric*	Description
APW Writes	apwWrts	Number of APW writes
APW Write%	apwWrtPct	Percentage of database buffer block writes performed by APW
Bufs Scanned	apwBufsScan	Number of database buffers scanned
APW Scan Wrs	apwSanWrts	Number of scanned database buffers written
APW Q Wrts	apwQueWrts	APW queue writes
Chkpt Q Wrts	apwChkPtWrts	Checkpoint queue writes
Flushed Bufs	bufsFlushed	Number of database buffers flushed to disk at the end of the most recent checkpoint
Chkpt Len	minChkPtLen	Length, in HH:MM:SS, of the shortest checkpoint since the beginning of the ProTop session

AI/BI Activity

ProTop RT Label	Alertable Metric*	Description
Notes	biNotes	Number of notes written to the BI log / AI log
Wrts to log	biNotes	Number of notes written to the BI log / AI log
BI/AI Wrts	biWrts	Number of notes written to the BI file / AI file
BI/AI Write%	biwWrtPct	Percentage of BI / AI notes written by BIW / AIW
Partial Wr	biPartWrt	BI / AI blocks written, not because they are full but because they have been in memory over n seconds (default 3).
Busy Waits	biBusyWt	Number of times a process waited because the BIW / AIW was busy. Non-zero is normal.
Empty Wts	biEmptyWt	Number of times a process waited for an empty BI / AI buffer. Should always be zero.

DB Status and Recoverability

ProTop RT Label	Alertable Metric*	Description
DB UpTime	dbUpTime	Length of time since the database server process started
Backup Age	dbBkUpAge	String version of the elapsed time since the last full backup using probkup including days i.e. 10d 15:08. Will contain "Never!" if a backup has not occured. Third-party backups, OS backups, and incremental backups are not tracked.
Oldest TRX	OldTRX	Age of the oldest open transaction
Curr BIClstr	currBI	Current BI cluster in use
Old BIClstr	oldBI	Oldest BI cluster in use
Num BIClstrs	numBI	Number of BI clusters in use
BI MB Used	biUsed	Megabytes of BI file in use by BI clusters
Curr AI Ext	ai_Curr	AI extent number of the currently busy AI extent
Curr Seq#	ai_Seq	AI sequence number of the currently busy AI extent
Empty AI	ai_Empty	Number of AI extents empty and available
Full AI	ai_Full	Number of full AI extents awaiting archiving
Locked AI	ai_Locked	Number of locked AI extents (only useful with OpenEdge Replication)
pica Used	picaUsed	Number of Replication message entries in use.
pica Used%	picaUsedPct	Percent of total Replication message entries in use.

Recoverability Insight

BI MB Used: The physical size of the BI file(s) does not offer any insight into the actual amount of BI space used. A 10 GB BI file does not mean that you are actively using 10 GB of BI. ProTop calculates the actual number of BI clusters in use in order to calculate the effective size of the BI file.

Full AI Extents: Filling all AI extents has two rather unfortunate implications: if the AI extents are fixed in size, the database will crash, or at least stall if the -aistall parameter is in effect. More stealthily, online backups will fail as these require an AI file switch, impossible if there are no empty AI extents available. More often than we can count, we've come across situations where database backups have failed for weeks without anyone noticing.

OpenEdge Replication: OER adds another level of risk that must be mitigated: locked AI extents and full pica buffers. The DB Service Queue (commonly called the pica buffer because of the -pica startup parameter) contains pointers to AI notes in the AI files that have not yet been replicated to the target database(s). If the DB Service Queue fills, modifications to the database are halted until space becomes available. Additionally, full AI files cannot be cleared for reuse as long as they contain AI notes that have not yet been replicated, potentially leading to the same situation as the "all full" AI extents mentioned in the previous paragraph.

The commercial version of ProTop monitors all of these important metrics automatically.

Still confused? Ask the experts.

Connection Information

ProTop RT Label	Alertable Metric*	Description
Connections	con_Total	Number of connections to the database
-n %	dashNpct	Number of connections to the database as a percentage of maximum number of connections as set during database server startup (-n)
Brokers	con_Brok	Number of Broker processes
SQL Servers	con_SQLServ	Number of SQL Server processes
SQL Clients	con_SQL	Number of SQL client processes
4gl Servers	con_4glServ	Number of 4GL Server processes
4gl RemCnx	con_Remc	Number of remote client / server connections
App Server	con_APSV	Number of App Server connections
Web Speed	con_wta	Number of Web Speed connections
Local	con_Self	Number of connections using shared memory
Batch	con_Batch	Number of batch processes
BIW/AIW/WDOG	con_BIW	Values of 0 or 1 indicate the existence [1] or not [0] of the BI writer process, AI writer process and Watchdog process
AI Mgmt	con_aimd	Value of 0 or 1 indicates the existence [1] or not [0] of the AI File Management Daemon process
APWs	con_APW	Number of Asynchronous Page Writers
RPLA/RPLS	con_rpla	Number of OpenEdge Replication Agents (possible values are 0, 1 or 2) and OpenEdge Replication Servers (0 or 1)
Utilities	con_Util	Number of database utilities, such as promon or proutil running
TRX	con_TRX	Number of open transactions
Blocked rec	con_BlockRec	Number of users blocked waiting for a record
Other Blkd	con_Block	Number of users blocked for any reason other than waiting for a record (ex.: latch)

Helper Processes:

AIW and BIW: These processes read AI or BI buffers from memory and write them to disk. However, it is not true that they are responsible for 100% of AI/BI writes. Any process, in order to respect WAL (Write-Ahead Logging), may execute an AI or BI buffer write.

WDOG: The watchdog scans the user table for local processes and checks to see if the associated PID still exists. If not, it assumes the cleanup duty for the vanished process, undoing open transactions, freeing up resources, etc.

AIMGT: The AI File Management Daemon is responsible for rotating and archiving AI files. It relies on the values of -aiarcdir (AI Archive Directory) to know where to archive full AI files, and, optionally, -aiarcinterval (AI Archive Interval), to know at what frequency to rotate AI files. If -aiarcinterval is not set, the presumption is that the AI files are of fixed size and the AIMGT archives them when they fill.

Insight into Connection Information

AIW/BIW/WDOG and AIMGT: each database can only have 0 or 1 of these helper processes. We strongly recommend 1 over 0!

APW: While it is possible to start more than 1 APW, 99% of databases should not require more than 1. If you really think that you need 2 or more, share your use case on ProTop Community. We might agree with you, but we probably won't. At the very least, we'll be happy to explain why.