Heartbeat data not getting logged anymore for some machines

We are running version 3.2.1.66 and recently noticed there are roughly 10 servers (various locations, different computer groups in EventSentry, various subnets, various OS, all on the domain) which are no longer reporting Heartbeat data on the Agent Status page. I tried restarting the heartbeat service, pushing the configuration out, redeploying the agent, rebooting one of the servers, uninstalling and reinstalling the agent, setting custom heartbeat settings, etc., all with no luck.

I set one of the nodes to a high log level but am not seeing anything strange in the log. In the web reports, I went to Status --> Heartbeat --> Detailed and searched for one of the affected nodes. Two records with the same name appeared, and selecting either one returned the same data.

I then went to Maintenance Wizard --> Certain Instances and checked "Include hosts without an agent", which displayed both records. The first looked like the one with the agent installed, since it was monitoring multiple different instances. The second only had Heartbeat and Ping Tracking. I deleted the second one to see if that fixed the problem, but it didn't: after some time there were again two records in the Maintenance Wizard. Since the first record also had Heartbeat and Ping Tracking instances, I deleted those instances as well.

I still have two records in the Maintenance Wizard when "Include hosts without an agent" is selected. The agent record no longer has the Heartbeat and Ping Tracking instances, but the second record does, and the Agent Status page still doesn't report the heartbeat data.

Any ideas?

Comments

  • We're sorry for the issues you're experiencing with the agent status page, as well as the duplicate entries.

    It looks like you have duplicate host entries in one of the EventSentry tables, resulting in a duplicate entry in the heartbeat status table, with the potential side effect of the host not showing up on the agent status page. Different components in EventSentry appear to be using different computer entries when they log to the database.

    Did somebody change those computers recently, e.g. assign IP addresses or move the computers into different groups?

    Are you comfortable issuing SQL commands? If so, we can send you some instructions to confirm that this is indeed the case. You can also email our support team, who can assist you further in getting this resolved.
  • To my knowledge, no changes have been made to these nodes recently, other than upgrading the agent when we went to version 3.2.1.66 at the end of July.

    I can work with our DBAs to run any SQL commands needed to resolve the issue.
  • Thank you. I'll post the instructions here. Please feel free to email support at any time if you have a valid maintenance agreement.

    ===========================
    IMPORTANT: Before making any changes, I recommend the following:

    1. Make a backup of the ESEventlogComputer table, even if it's just into Notepad/Excel. Any changes you make can affect the integrity of all features, since all data points to this table.

    2. The EventSentry agent of the computer in question, the "Heartbeat Agent" and the "EventSentry Collector" (if used) services should all be STOPPED while you issue the commands. If that's not possible, make sure to restart them immediately after you make the change.
    ===========================
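
    One way to take that backup inside the database itself is to copy the table before touching it. This is only a sketch: it assumes Microsoft SQL Server syntax, and the copy's name, ESEventlogComputer_backup, is made up for this example. On other database engines the equivalent is usually CREATE TABLE ... AS SELECT.

    ```sql
    -- Copy every row of the lookup table into a new table.
    -- ESEventlogComputer_backup is a hypothetical name; pick any unused one.
    SELECT *
    INTO ESEventlogComputer_backup
    FROM ESEventlogComputer;

    -- Sanity check: both counts should match.
    SELECT COUNT(*) AS original_rows FROM ESEventlogComputer;
    SELECT COUNT(*) AS backup_rows FROM ESEventlogComputer_backup;
    ```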

    First you would need to enumerate all computer entries for the affected host(s) to determine how many duplicates there are. E.g., if the host in question is called "FILESERVER", then run the following SQL Statement:

    SELECT * FROM ESEventlogComputer WHERE eventcomputer='FILESERVER'

    This will likely return 2 or more rows. We want to end up with only one row under the original name: the one with the highest "id" value.

    As such, note down all results EXCEPT for the row which has the highest value for the "id" column. For example, you may get these rows:

    17 FILESERVER
    24 FILESERVER
    87 FILESERVER
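
    Since several hosts are affected, it may be quicker to list every name that has more than one row in a single pass instead of querying host by host. A sketch, using only the two columns shown above (id and eventcomputer):

    ```sql
    -- List every computer name that appears more than once,
    -- along with the id that would be kept (the highest one).
    SELECT eventcomputer,
           COUNT(*) AS entries,
           MAX(id)  AS id_to_keep
    FROM ESEventlogComputer
    GROUP BY eventcomputer
    HAVING COUNT(*) > 1
    ORDER BY eventcomputer;
    ```

    For the three example rows alone, this would return a single row: FILESERVER, 3, 87.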


    Now we will rename the duplicate entries (deleting them from this table won't work, since rows in other tables reference these ids):

    UPDATE ESEventlogComputer SET eventcomputer='FILESERVER-ARCHIVE1' WHERE id=17
    UPDATE ESEventlogComputer SET eventcomputer='FILESERVER-ARCHIVE2' WHERE id=24

    You can of course rename the duplicate entries to anything you'd like; appending "-ARCHIVE#" is just one option. Repeating the earlier statement, SELECT * FROM ESEventlogComputer WHERE eventcomputer='FILESERVER', should now return only one row:

    87 FILESERVER
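
    If you would rather be able to back out, the rename and the check can be wrapped in a transaction. This is a sketch assuming Microsoft SQL Server syntax and the example ids above; other engines spell the transaction statements slightly differently.

    ```sql
    BEGIN TRANSACTION;

    UPDATE ESEventlogComputer SET eventcomputer='FILESERVER-ARCHIVE1' WHERE id=17;
    UPDATE ESEventlogComputer SET eventcomputer='FILESERVER-ARCHIVE2' WHERE id=24;

    -- Should now return exactly one row (id 87 in this example).
    SELECT * FROM ESEventlogComputer WHERE eventcomputer='FILESERVER';

    -- If the result looks right:
    COMMIT TRANSACTION;
    -- Otherwise, undo both renames instead:
    -- ROLLBACK TRANSACTION;
    ```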

    Now restart the EventSentry agent on FILESERVER and also restart the Heartbeat Monitor (and collector if used) services, and the duplicate entries should be a thing of the past.

    Thank you for bringing this to our attention and sorry for the inconvenience. This issue should be fixed in the upcoming 3.3 release of EventSentry.
  • Is it accurate that FILESERVER-ARCHIVE1 will have all the historical data and FILESERVER will start fresh like a newly monitored node?
  • What will most likely happen is that all historical non-heartbeat data will be available under FILESERVER-ARCHIVE1, whereas all heartbeat status/history data will be unchanged, since it was most likely already written to the correct lookup entry.

    Does that make sense?
  • Yes, thank you!