Investigating sync errors in the Robot log and diagnosing data issues
Data for Audits and Staff entered into Internal Audit can be sync'd with Retain via the Robot service. Once a record from the system is copied to Retain it is considered 'matched', and any update in Internal Audit results in a corresponding update in Retain.
Sync problems can be caused in one of two ways:
- Where databases are not treated as a pair - for example:
- if one is restored to an earlier point in time, the other must be restored to the same point
- If one is copied to another environment, the other must be copied with it
- Very early versions (< v4) of the Robot could sometimes add multiple copies of the same person
- Sometimes users will add the same person or job manually to both applications, and expect (incorrectly) that they are both the same entity, and should be sync'ing.
Where two databases get out of sync you can find:
- Records from one of the databases being matched with incorrect records in the other.
- Only one partner in a pair being present.
Problems will normally be discovered by users, who will notice duplications of audits or staff, or that an update in Internal Audit doesn't show up in Retain.
The Robot service log will usually show errors such as the following:
ERROR: , Pentana.Tng.ServerRobot.RobotService, 15, 18/09/2021 13:57:09, Audit has a RetainID of 765 in Pentana, but no matching record is found in Retain.
Expected a JOB record to have the same ID. Are you sure these databases match?
ERROR: , Pentana.Tng.ServerRobot.RobotService, 15, 18/09/2021 13:57:09, Person has a RetainID of 1150 in Pentana, but no matching record is found in Retain.
Expected a RES record to have the same ID. Are you sure these databases match?
Where do we go from here?
We know that data can be in the following states:
-
Good
- Matched and up to date, so records are all good
- Retain record has not yet been added (if the robot has not yet run successfully since the record was added to Internal Audit)
-
May be good, requires review and maybe fixing
- Record exists in Retain only - likely it was made manually, and this is usually the case for things like Vacation, Training and other tasks that are not audit jobs. In some cases users will add a Retain record to compensate for a broken Robot job, and this causes an issue because you end up with the same logical thing in Internal Audit and Retain, but not linked via ID
-
Not good, with easy manual fix
- Matched pair of records out of sync (e.g. if a change was removed from the queue and not applied or if the robot has not yet run successfully since the update) - this can be fixed by making a change to the Internal Audit record and forcing a re-sync
-
Not good, requires database update
- Exists in Internal Audit only, yet has a Retain ID
- GUID is missing or doesn't match on both sides
- More than one Internal Audit record has the same Retain ID
We also know that correctly matched record pairs have the following properties:
-
People / Resources
PentanaDB.dbo.Person.Name = RetainDB.dbo.RES.RES_DESCR
PentanaDB.dbo.Person.GUID = RetainDB.dbo.RES.RES_TEXT1
PentanaDB.dbo.Person.RetainID = RetainDB.dbo.RES.RES_ID -
Audits / jobs
PentanaDB.dbo.Audit.Name = RetainDB.dbo.JOB.JOB_DESCR
PentanaDB.dbo.Audit.GUID = RetainDB.dbo.JOB.JOB_TEXT1
PentanaDB.dbo.Audit.RetainID = RetainDB.dbo.JOB.JOB_ID
So on this basis we can query the data and flag each record with a 'diagnosis'. The script Pentana - Retain sync ID mismatch analysis vx.sql compares the records across both databases and provides this feedback.
To use:
- Disable the Retain sync robot jobs, so that no further data changes are made before any fix script is run
- Run the script Pentana - Retain sync ID mismatch analysis.sql against the system. It's safe for production, no data changes are made. Before running the script, the Internal Audit and Retain database names must be substituted (placeholders 'RetainDB' and 'PentanaDB' are used in numerous locations)
- The script outputs two tables containing an analysis of the problems. Details of these are in comments in the script, so do read the comments.
Records and record pairs will be tagged as follows:
|
Detail |
Action required |
|
Matched |
None |
|
Name will update when robot runs |
None |
|
Record will add to Retain when robot runs |
None |
|
Check: in Retain only |
The record is in Retain only - check with users that this is correct (as would be the case for Vacation, Training etc). |
|
GUID mismatching, requires update |
The ID and Name match, but the Retain value at RES_TEXT1 or JOB_TEXT1 must be updated with the correct value via a fix script |
|
Internal Audit-only record with erroneous Retain ID |
The Retain ID stored in the Internal Audit record is wrong. If there is a correct one then the Internal Audit record must be updated. If there is no correct Retain record to match to, then the Internal Audit record should be set to 0 and the robot will add it next time it runs. These both require a script to fix |
|
Check: Name and GUID vary |
Where there is an ID on both sides, but the names and GUIDs match. It is necessary to check with the users whether these are the same record, and whether the Name and GUID need updating or whether the ID link needs breaking (in which case it's an 'Erroneous Retain ID') |
|
Missing GUID |
Add the Internal Audit GUID to the Retain *_TEXT2 field |
|
Duplicate Retain ID |
Figure out which is wrong in Internal Audit and remove the Retain ID from that record via script |
Note: while it is unusual to delete records in Internal Audit this can happen, leaving orphaned/soft-deleted Retain records (as indicated by the DELETED_BIT column). See this article for an example: The time it might be permissible to have a 'duplicate' Retain resource.
Once the results are available we can:
- Investigate problem data
- Produce a fix script if required, and then:
- Test and validate in UAT
- Backup, run and validate in Production
- Note that we can no longer make changes to Retain databases or support the Retain application. If there is a sync issue then it might be necessary for the customer to engage Retain International support and provide them with our guidance on changes required.