The temptation that every MSP understands
The easiest response to a noisy monitoring dashboard is suppression. Disable the alert. Put the device into maintenance mode. Delete the monitoring component. Move on to the next urgent thing, because there is always a next urgent thing.
It’s understandable. An interface that is still administratively enabled with nothing plugged into it feels like a low priority. A licence showing as expired on a service the client might not even use feels like clutter. Error rates on virtual interfaces feel like noise. So teams suppress and exclude, and over time the monitoring system tells you less and less about what is actually happening on your network.
The problem is that businesses change. Policies that made sense eighteen months ago get forgotten. Devices put into maintenance mode for a supplier investigation never come out because nobody remembers to check. And then a client phones asking why they weren’t told about something, and the honest answer is that the alert was turned off three years ago for a reason nobody can recall.
Root cause resolution: what we did instead
We decided to resolve every alert at the source rather than suppress it on the monitoring side. The team divided the work by expertise. Robin handled the MikroTik router fleet, where around 90 devices needed SNMPv3 Engine ID configuration. Jacques worked through the firewall alerts, investigating each one to determine whether the issue was genuine. Rudie covered servers and Zabbix process items. A final concentrated push, with everyone working together, cleared the remaining alerts.
The specifics varied. Firewall interfaces that were enabled but unused were administratively shut down, which isn’t just tidier monitoring; it’s a genuine security improvement. If an interface is administratively disabled, someone can’t simply walk into a building, plug a laptop into an open port, and start probing the network. For compliance frameworks that require unused ports to be disabled, this moved clients from theoretical policy to actual enforcement.
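As an illustration of how unused ports can be found, here is a minimal sketch that flags interfaces which are administratively up but have no link. It is not the tooling we used. The status codes are the standard IF-MIB values (1 = up, 2 = down), while the `Interface` record, the `in_use_by_design` flag, and the device and port names are assumptions made up for the example. A real inventory would come from SNMP polling or a device export.

```python
from dataclasses import dataclass

# Standard IF-MIB status values: 1 = up, 2 = down.
ADMIN_UP = 1
OPER_DOWN = 2


@dataclass(frozen=True)
class Interface:
    """Hypothetical inventory record, one per polled interface."""
    device: str
    name: str
    admin_status: int  # ifAdminStatus as polled
    oper_status: int   # ifOperStatus as polled
    in_use_by_design: bool = False  # e.g. a documented standby port


def shutdown_candidates(interfaces):
    """Return interfaces that are enabled but have no link.

    These are candidates for review, not automatic shutdown: a port may be
    idle only because the device attached to it is switched off.
    """
    return [
        i for i in interfaces
        if i.admin_status == ADMIN_UP
        and i.oper_status == OPER_DOWN
        and not i.in_use_by_design
    ]


# Illustrative data only.
inventory = [
    Interface("fw-01", "port1", admin_status=1, oper_status=1),
    Interface("fw-01", "port5", admin_status=1, oper_status=2),
    Interface("fw-01", "port6", admin_status=2, oper_status=2),
    Interface("fw-01", "port7", admin_status=1, oper_status=2,
              in_use_by_design=True),
]

for i in shutdown_candidates(inventory):
    print(f"{i.device} {i.name}: enabled with no link, review for admin shut")
```

The output is a review list rather than a list of actions, so a person still confirms with the client before a port is disabled.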
Licences flagged as expired or unused prompted us to go back to clients and ask whether those services should actually be active. In several cases, they shouldn’t have been, meaning the client had a capability sitting dormant that they’d forgotten about. Checks that had relied on ping alone were switched to SNMP where it made more sense, giving us richer data. Devices still running SNMP v2 were identified and queued for the SNMPv3 upgrade, which adds authentication and encryption and had only been partially rolled out.
Not everything needed fixing in the traditional sense. TACACS tunnels that show as down on backup links, for example, are behaving exactly as designed: they only come up when the primary fails over. Error rates on virtual interfaces turned out to be normal behaviour when users connect and disconnect. We stopped monitoring error rates on virtual interfaces but kept the check active on physical fibre connections, where error rates genuinely indicate a problem developing on a link.
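The virtual-versus-physical rule can be written down as a small filter. The sketch below is an assumption about how such a rule might look, not our actual Zabbix configuration. It uses IANA ifType numbers as the deciding field, and both the choice of which types count as physical and the interface names are illustrative.

```python
# IANA ifType values used in this sketch.
ETHERNET_CSMACD = 6      # physical Ethernet, copper or fibre
PPP = 23
SOFTWARE_LOOPBACK = 24
PROP_VIRTUAL = 53
TUNNEL = 131
L2VLAN = 135

# Assumption: only physical Ethernet links get error-rate checks.
PHYSICAL_IF_TYPES = {ETHERNET_CSMACD}


def should_monitor_errors(if_type: int) -> bool:
    """Return True if error-rate monitoring applies to this interface type."""
    return if_type in PHYSICAL_IF_TYPES


# Hypothetical interfaces as (name, ifType) pairs.
interfaces = [
    ("sfp1", ETHERNET_CSMACD),
    ("vpn-users", TUNNEL),
    ("vlan20", L2VLAN),
    ("pppoe-out1", PPP),
]

monitored = [name for name, if_type in interfaces
             if should_monitor_errors(if_type)]
print(monitored)
```

An allow-list of physical types is used rather than a block-list of virtual ones, so an unrecognised interface type is left out of error-rate monitoring until someone decides it belongs there.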
What the clean monitoring baseline actually looks like
After the concentrated effort (roughly one day of coordinated work across the team), the monitoring estate went from over 300 alerts to six. Those six are known issues with active investigations underway. Five hosts are currently paused, each because a supplier is actively working on something and we’re waiting for resolution.

Out of over 2,000 monitored items across the entire client estate, only a handful required intervention. That’s actually reassuring, because it validates that our deployment standards are working correctly. The cleanup didn’t reveal systemic problems with how we build environments; it revealed the accumulated small decisions that every busy team makes when something isn’t urgent enough to fix right now.
The difference is that now, when the dashboard shows an alert, it means something. The team’s daily experience has fundamentally changed from filtering signal out of noise to responding to genuine operational health information. And we’ve committed to running this exercise annually, making sure that the monitoring estate stays honest rather than slowly drifting back into comfortable suppression.
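A periodic check for stale pauses is one way to stop paused hosts from quietly becoming permanent between reviews. The following is a minimal sketch under stated assumptions: the `PausedHost` record, the 30-day limit, and the host names are all hypothetical, and in practice the data would be exported from the monitoring platform rather than typed in by hand.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class PausedHost:
    """Hypothetical record of a host whose monitoring is paused."""
    host: str
    paused_on: date
    reason: str  # why the pause exists, e.g. a supplier ticket


def stale_pauses(paused, today, max_age_days=30):
    """Return pauses that have no recorded reason or exceed the age limit."""
    limit = timedelta(days=max_age_days)
    return [
        p for p in paused
        if not p.reason.strip() or today - p.paused_on > limit
    ]


# Illustrative data only.
paused_hosts = [
    PausedHost("core-sw-01", date(2024, 5, 20), "Supplier RMA in progress"),
    PausedHost("branch-fw-02", date(2024, 3, 1), "Supplier investigating"),
    PausedHost("nas-03", date(2024, 5, 30), ""),
]

for p in stale_pauses(paused_hosts, today=date(2024, 6, 1)):
    print(f"{p.host}: paused since {p.paused_on}, reason: {p.reason or 'none'}")
```

A pause with no reason is flagged regardless of age, because the missing reason is exactly what makes a suppression impossible to justify later.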
The proactive IT monitoring philosophy underneath
We’re not here simply to monitor networks and react when things go down. We’re here to look at the overall health of the services we offer our clients and make sure those services are genuinely providing value, not just generating dashboards that look busy but hide the things that actually matter.
When every alert on the dashboard represents a genuine issue rather than accumulated noise, monitoring becomes what it was always meant to be: an honest view of operational health that drives real decisions.
