Cache database Replication has stopped working.

Viewpoint
Teradata Employee

Hi Techos,

I have encountered the following message in dcs.log: "Cache database replication has stopped working".

Here is the background on what I have done.

1. I set up a cluster in our DEV environment between Viewpoint Server 'A' and Viewpoint Server 'B'. 

2. The cluster roles are as follows:

    Active Server : VP Server A

    Standby Server : VP Server B

3. I have checked the distributed cluster properties file on both servers, and the configuration matches point 2.

4. The active server, VP Server 'A', hosts the active cache database and the dcs.

5. I monitored the cluster for a day or two and checked the cache replication between the servers. I ran the following command and checked the replication statistics.

grep "Database Replication Stats" dcs.log

result : Total Log Count 5, Restored segment files = 1, Unsuccessful Archive Attempts = 0, Unrestored segment files = 1.

6. While the cluster was working properly, I checked the logs for the Unrestored segment files count, and the highest value I saw was 4.

7. One day I stopped the dcs service on the Active Server 'A', and the Standby Server 'B' took over the role of running the dcs collection. This part worked fine.

8. After 1 hour I started the dcs service on the Active Server 'A' again. Active dcs collection resumed on Server 'A', and Server 'B' went back into its usual standby mode.

9. When I first set up the cluster, I configured the servers to send email alerts. When Active Server 'A' took the dcs collection back, the following email was sent from Active Server 'A':

"Cache database replication has stopped working".

The standby cache database on Server 'B' is no longer restoring WAL segment files from the active cache database on 'Server A'. 

This email alert will be sent every hour unless the problem is corrected. 

10. To investigate, I checked the dcs log and ran the following command:

grep "Database Replication Stats" dcs.log

result : Total Log Count 2000, Restored segment files = 1, Unsuccessful Archive Attempts = 0, Unrestored segment files = 2700.

11. I observed that the count for Unrestored segment files increased to 2700. 

12. This email is triggered every time the dcs on active Server 'A' takes over again from standby Server 'B'. I confirmed this over a few test attempts.

13. Could anyone please let me know how to fix this issue?

      Do I need to clean up the unrestored segment files and set up the cluster again, or is there another way to resolve it?
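To make the two readings above easier to compare, here is a rough Python sketch that parses a "Database Replication Stats" line and flags a growing backlog. The field names are copied from the output quoted in this post. The function names and the "restored count flat, unrestored count rising" rule are my own assumptions for illustration, not Viewpoint's actual alert logic, and the real log line may have a different layout.

```python
import re

# Field names copied from the stats line quoted in this post; the real
# dcs.log line may carry extra prefixes (timestamp, log level), which
# the search below simply skips over.
FIELDS = (
    "Total Log Count",
    "Restored segment files",
    "Unsuccessful Archive Attempts",
    "Unrestored segment files",
)


def parse_replication_stats(line):
    """Return {field name: int} for every known field found in the line."""
    stats = {}
    for field in FIELDS:
        # Accept both "name 5" and "name = 5", since both forms appear above.
        match = re.search(r"\b" + re.escape(field) + r"\s*=?\s*(\d+)", line)
        if match:
            stats[field] = int(match.group(1))
    return stats


def backlog_growing(before, after):
    """Heuristic (an assumption, not Viewpoint's alert logic): the standby
    is falling behind if the restored count did not move while the
    unrestored count went up."""
    restored_delta = after["Restored segment files"] - before["Restored segment files"]
    unrestored_delta = after["Unrestored segment files"] - before["Unrestored segment files"]
    return restored_delta == 0 and unrestored_delta > 0


healthy = parse_replication_stats(
    "Total Log Count 5, Restored segment files = 1, "
    "Unsuccessful Archive Attempts = 0, Unrestored segment files = 1."
)
after_failback = parse_replication_stats(
    "Total Log Count 2000, Restored segment files = 1, "
    "Unsuccessful Archive Attempts = 0, Unrestored segment files = 2700."
)
print(backlog_growing(healthy, after_failback))
```

With the two readings from points 5 and 10, the restored count stays at 1 while the unrestored count goes from 1 to 2700, so the check reports a growing backlog.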

Your assistance is gratefully appreciated.