How to configure OpenEdge Replication monitoring.
Replication Monitoring Configuration on the source database server:
- Edit etc/pt3agent.[*].cfg
- Add the “replAgent” data collector to ptInitDC
- Restart the agent by removing tmp/pt3agent.[*].flg. The dbmonitor will restart the agent shortly.
The pt3agent on the source database server uses the repl.properties file to connect to one target database.
NOTE: The source agent can connect to only one replication target DB (connecting to both targets is a planned future enhancement).
If extra parameters are required to connect to the target database (e.g. -U, -P), copy [PROTOPDIR]/etc/replx.pfx to [PROTOPDIR]/etc/replx.pf and add the parameters to that .pf file (there is only one global replx.pf).
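The copy-and-append step above can be sketched in Python. This is only an illustration, not part of ProTop: the function name `prepare_replx_pf` is hypothetical, and the parameter values you pass in are placeholders for whatever your target database actually needs.

```python
import shutil
from pathlib import Path


def prepare_replx_pf(protop_dir, extra_params):
    """Copy etc/replx.pfx to etc/replx.pf and append extra connection parameters.

    protop_dir   -- the ProTop install directory ([PROTOPDIR] in the text)
    extra_params -- lines to append, e.g. ["-U someuser", "-P somepassword"]
    """
    etc = Path(protop_dir) / "etc"
    template = etc / "replx.pfx"
    target = etc / "replx.pf"

    # Copy the template only if replx.pf is missing, so earlier edits survive.
    if not target.exists():
        shutil.copyfile(template, target)

    existing = target.read_text(encoding="utf-8")
    with target.open("a", encoding="utf-8") as pf:
        # Make sure the first appended parameter starts on its own line.
        if existing and not existing.endswith("\n"):
            pf.write("\n")
        for param in extra_params:
            pf.write(param + "\n")
    return target
```

A call such as `prepare_replx_pf("/path/to/protop", ["-U someuser", "-P somepassword"])` would leave the template untouched and add one parameter per line to replx.pf. Since the file is global, the same parameters apply to every replication connection the agent makes.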
[This connection is currently used to calculate the lag between the source and the target, but this information has proven unreliable in OE 10 and 11; its reliability in OE 12 is not yet known.]
replagent.p
Alert metrics are included in the default etc/alert.cfg (see the top of that file for more detail on these parameters):
| Metric Name | Type | Operator | Threshold | Sensitivity | Frequency | Message and Parameters | Action | 
| --- | --- | --- | --- | --- | --- | --- | --- |
| trxBehind | num | > | 1000 | "" | "hourly" | "Replication lag &1 &2 &3" | alert # verify usefulness | 
| agentCommStat | num | <> | 1 | "" | "hourly" | "Replication agent comm status &1 &2 &3" | alert | 
| agentStatus | num | <> | 3049 | "5:5" | "28800" | "Replication agent status &1 &2 &3" | page | 
| agentStatus | num | <> | 3049 | "3:3" | "28800" | "Replication agent status &1 &2 &3" | alarm | 
| agentStatus | num | <> | 3049 | "" | "hourly" | "Replication agent status &1 &2 &3" | alert | 
| picaFree | num | < | 50000 | "" | "hourly" | "&1 &2 &3" | alarm | 
| picaUsed | num | > | 1000 | "" | "hourly" | "&1 &2 &3" | alarm | 
| picaUsed | num | > | 0 | "" | "hourly" | "&1 &2 &3" | alert | 
| picaUsedPct | num | > | 30 | "" | "hourly" | "&1 &2 &3" | alarm | 
| picaUsedPct | num | > | 5 | "" | "hourly" | "&1 &2 &3" | alert | 
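To make the Operator and Threshold columns concrete, here is a minimal Python sketch of how a numeric rule could be read: a rule fires when `value OPERATOR threshold` is true. It is not ProTop's actual evaluation logic. The names `OPERATORS`, `RULES`, and `fired_actions` are hypothetical, and the Sensitivity and Frequency columns are deliberately ignored.

```python
import operator

# Comparison operators as they appear in the Operator column.
OPERATORS = {">": operator.gt, "<": operator.lt, "<>": operator.ne}

# (metric, operator, threshold, action) copied from a few rows of the table.
RULES = [
    ("picaUsedPct", ">", 30, "alarm"),
    ("picaUsedPct", ">", 5, "alert"),
    ("agentCommStat", "<>", 1, "alert"),
]


def fired_actions(metric, value, rules=RULES):
    """Return the actions of every rule for `metric` whose comparison is true."""
    return [
        action
        for name, op, threshold, action in rules
        if name == metric and OPERATORS[op](value, threshold)
    ]


print(fired_actions("picaUsedPct", 12))   # only the "> 5" rule matches
print(fired_actions("picaUsedPct", 45))   # both picaUsedPct rules match
print(fired_actions("agentCommStat", 1))  # connected, nothing fires
```

Read this way, the two picaUsedPct rows act as tiers: a value between 5 and 30 fires only the alert, and a value above 30 satisfies both rows.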
- agentCommStat: a value of 1 means the agent is connected, while 2 means disconnected
- agentStatus: the value reported by “dsrutil [db] -C status -detail”. The desired value is 6021 on the source and 3049 on the target. Your etc/alert.[*].cfg should always check for “num <> 3049”
Note: ProTop reads the agent status from the _repl-AgentControl VST, even on the source side, which is why the alert checks for 3049 rather than 6021.
The meaning of other values is available in the Progress documentation, including:
3048: Startup synchronization
3050: Recovery synchronization
3051: Online backup of the target DB
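The status values mentioned in this article can be collected into a small lookup, for example to make alert messages easier to read. This is a hedged Python sketch, not ProTop code: `AGENT_STATUS`, `describe_agent_status`, and `needs_attention` are hypothetical names, and the table contains only the codes listed here, so any other value falls back to a pointer to the Progress documentation.

```python
# Status codes and labels taken from this article only.
AGENT_STATUS = {
    3048: "Startup synchronization",
    3049: "Desired state on the target",
    3050: "Recovery synchronization",
    3051: "Online backup of the target DB",
    6021: "Desired state on the source",
}


def describe_agent_status(code):
    """Return 'code: label', or a fallback for codes not listed above."""
    label = AGENT_STATUS.get(code, "Unknown, see the Progress documentation")
    return f"{code}: {label}"


def needs_attention(code):
    """Mirror the 'num <> 3049' check used in etc/alert.cfg."""
    return code != 3049


print(describe_agent_status(3050))
print(needs_attention(3049))
```

Note that `needs_attention` also returns True for transient states such as 3048 or 3051. That matches the alert rule as written, so a short synchronization or an online backup can be expected to trigger it.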
There are a few known bugs in OE Replication where the VST data is incorrect. In those cases, we rely on secondary alerts to reveal a problem that the replAgent data collector does not report. For example, the number of locked AI files will increase, the “ai_Locked” metric in alert.cfg will fire an alert, and numerous database log file alerts may fire as well.
