My Techie Guy

Wednesday, August 13, 2014

Cacti graphs NOT returning the actual interface traffic throughput statistics (bits/sec)

Having successfully installed and configured Cacti on RHEL 6.2, I was very excited and couldn't wait to start monitoring my routers and switches. I work in a network that has over 100 routers and 500 switches, so you can imagine what a relief Cacti was going to bring to my life!

However, my happiness was short-lived. After I graphed my first interface, Cacti returned wrong values for the bandwidth utilization. This was a high-speed interface (Gigabit Ethernet) that carries all our mobile internet traffic. The current throughput was about 400 Mbps, yet the Cacti graph was reporting a maximum of 120 Mbps, and the graph looked as if the link were dropping traffic (crazy graph)! See the screenshot below:

Problem:

What you get when you graph a high-speed interface in Cacti using SNMPv1 and 32-bit counters
There was no way I was going to believe this graph, because another proprietary tool was reporting 400 Mbps and my link was very stable. I also logged in to the router, ran the interface statistics command and observed an average of 350 Mbps. This meant one of two things: either my Cacti tool had issues, or my router didn't love Cacti! At first I suspected the router, because it's a HUAWEI NE40 and I have a bias when it comes to Chinese equipment! On so many occasions it has fallen short of the standard protocols. But I was wrong this time: the router was fine, and it was my Cacti installation that had issues!
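
A rough calculation hints at why the graph topped out near 120 Mbps. This is only a sketch, assuming a 5-minute (300-second) polling interval and a 32-bit octet counter; the function names are made up for illustration. Between two polls, a 32-bit counter can register at most 2^32 octets, so anything beyond that is lost to rollover:

```python
COUNTER32_MODULUS = 2 ** 32  # a 32-bit counter rolls over to 0 after this many octets
POLL_INTERVAL_S = 300        # assumption: 5-minute polling interval


def max_measurable_bps(interval_s=POLL_INTERVAL_S):
    """Highest rate a 32-bit octet counter can represent between two polls."""
    return COUNTER32_MODULUS * 8 / interval_s


def apparent_rate_bps(true_rate_bps, interval_s=POLL_INTERVAL_S):
    """Rate a poller would compute if the counter wrapped unnoticed."""
    octets_sent = int(true_rate_bps * interval_s / 8)
    observed_delta = octets_sent % COUNTER32_MODULUS  # whole wraps are invisible
    return observed_delta * 8 / interval_s


print(f"Ceiling with 32-bit counters: {max_measurable_bps() / 1e6:.1f} Mbps")
print(f"A true 400 Mbps would look like: {apparent_rate_bps(400e6) / 1e6:.1f} Mbps")
```

Under these assumptions the ceiling is roughly 114.5 Mbps, which is in the same range as the maximum the graph was showing, and a steady 400 Mbps link would be graphed at a much lower, misleading value.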

As always, I hit Google to find some answers, and below is what I discovered before I finally fixed my Cacti installation (I am super excited I did, and if you are in the same situation, I wish you the best).

Below are the lessons I learned; call them solutions:

Solution:

1. For high-speed interfaces, you should use 64-bit counters if the device you are trying to monitor supports them (refer to this article).
High Speed interfaces (100 Mbps or above)
  • ifHCInOctets: 1.3.6.1.2.1.31.1.1.1.6 (64-bit Octets in counter)
  • ifHCOutOctets: 1.3.6.1.2.1.31.1.1.1.10 (64-bit Octets out counter)
  • ifHCInUcastPkts: 1.3.6.1.2.1.31.1.1.1.7 (64-bit Packets in counter)
  • ifHCOutUcastPkts: 1.3.6.1.2.1.31.1.1.1.11 (64-bit Packets out counter)
  • ifHighSpeed: 1.3.6.1.2.1.31.1.1.1.15 (An estimate of the interface's current bandwidth in units of 1 Mbps)
Low Speed interfaces
Lower-speed interfaces can get by with 32-bit counters. If you use 32-bit counters on high-speed interfaces, they can wrap quickly: a 10 Mbps stream of back-to-back, full-size packets causes ifInOctets to wrap in just over 57 minutes. At 100 Mbps, the minimum wrap time is 5.7 minutes, and at 1 Gbps, the minimum is 34 seconds.
  • ifInOctets: 1.3.6.1.2.1.2.2.1.10 (32-bit Octets in counter)
  • ifOutOctets: 1.3.6.1.2.1.2.2.1.16 (32-bit Octets out counter)
  • ifInUcastPkts: 1.3.6.1.2.1.2.2.1.11 (32-bit Packets in counter)
  • ifOutUcastPkts: 1.3.6.1.2.1.2.2.1.17 (32-bit Packets out counter)
  • ifSpeed: 1.3.6.1.2.1.2.2.1.5 (Currently negotiated speed of the interface - Max: 4.294 Gbps)
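
The wrap times quoted above can be reproduced with a few lines of arithmetic. The sketch below assumes the link is fully saturated, so these are minimum wrap times; `wrap_time_seconds` is just an illustrative helper name:

```python
def wrap_time_seconds(counter_bits, rate_bps):
    """Seconds until an octet counter of the given width rolls over at a constant bit rate."""
    return (2 ** counter_bits) * 8 / rate_bps


for label, rate in [("10 Mbps", 10e6), ("100 Mbps", 100e6), ("1 Gbps", 1e9)]:
    t32 = wrap_time_seconds(32, rate)
    print(f"{label}: 32-bit counter wraps after {t32 / 60:.1f} min ({t32:.0f} s)")

# The same link speed with a 64-bit counter, expressed in years.
years = wrap_time_seconds(64, 1e9) / (365 * 24 * 3600)
print(f"1 Gbps: 64-bit counter wraps after about {years:.0f} years")
```

At 1 Gbps a 64-bit counter takes thousands of years to wrap, which is why the ifHC* counters are the safe choice for fast links.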
2. This was a great piece of information to land on, and I thought my problems were solved, but I had more to learn. I deleted the graph I had created with 32-bit counters, created one with 64-bit counters and hoped for the best. But sorry, the graph came out just as before: crazy!
After another hour bouncing around Google, I found out that 64-bit counters (the Counter64 type) are only supported from SNMPv2 onwards, yet I was still using SNMPv1!

3. To confirm that my router supports the 64-bit OIDs, I ran snmpwalk from my Cacti server, and it returned the correct values. Replace community_string with your actual community string (e.g. "public") and ip_address with the IP address of the router you are trying to monitor.

# 64-bit Octets In counter (ifHCInOctets)
snmpwalk -v2c -c community_string ip_address 1.3.6.1.2.1.31.1.1.1.6
# 64-bit Octets Out counter (ifHCOutOctets)
snmpwalk -v2c -c community_string ip_address 1.3.6.1.2.1.31.1.1.1.10
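
To sanity-check the counters by hand, you can run the walk twice a known number of seconds apart and turn the difference into a bit rate. The sketch below is one way to do that, assuming the usual `... = Counter64: value` output lines; the sample text and its numbers are made up for illustration and do not come from my router:

```python
import re

# Matches the trailing interface index and the Counter64 value on each line.
LINE_RE = re.compile(r"\.(\d+) = Counter64: (\d+)\s*$")


def parse_counter64(walk_output):
    """Map interface index -> counter value from snmpwalk text output."""
    counters = {}
    for line in walk_output.splitlines():
        match = LINE_RE.search(line)
        if match:
            counters[int(match.group(1))] = int(match.group(2))
    return counters


def rate_bps(first, second, seconds):
    """Average bits per second between two octet-counter readings."""
    delta = (second - first) % 2 ** 64  # tolerate a single 64-bit rollover
    return delta * 8 / seconds


# Hypothetical output of two walks taken 60 seconds apart.
SAMPLE_T0 = """IF-MIB::ifHCInOctets.1 = Counter64: 1000000000
IF-MIB::ifHCInOctets.2 = Counter64: 52000000000"""
SAMPLE_T1 = """IF-MIB::ifHCInOctets.1 = Counter64: 1000750000
IF-MIB::ifHCInOctets.2 = Counter64: 55000000000"""

before = parse_counter64(SAMPLE_T0)
after = parse_counter64(SAMPLE_T1)
for index in sorted(before):
    mbps = rate_bps(before[index], after[index], 60) / 1e6
    print(f"ifIndex {index}: {mbps:.1f} Mbps")
```

If the rate computed this way agrees with what the router itself reports, the device side is fine and any remaining problem is in the monitoring setup.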

4. At this point, I was using 64-bit counters and SNMPv2 for the snmpwalk test above, so I thought it was a done deal (problem solved!). I deleted the device from Cacti, created it again using SNMPv2 and created the graph using 64-bit counters (combining the two pieces of information I had learned). This time round, the graph was empty :-(. And whenever I ran the SNMP query in debug mode via Cacti, the query was successful but "NO SNMP DATA RETURNED!!!!!"

5. Google was again my immediate friend, and after about 5 hours of reading other people's problems, I landed on my third important piece of information: the php-snmp module requires PHP 5.4 or above to handle the "snmpbulkwalks" used with SNMPv2. Refer to this article for details.

6. I was running PHP 5.3 at that point and had never installed the php-snmp module. I think my installation was using NET-SNMP to do its snmpwalks, which should be enough to get Cacti working if you are only dealing with SNMPv1. See how I installed and enabled the php-snmp module here.

7. I upgraded my PHP installation to PHP 5.5, enabled the php-snmp module, restarted Apache, and everything was smooth afterwards. The SNMP "Verbose Query" was returning data, so I re-created my devices with SNMPv2 and tested. See how my graph returned the correct values (400 Mbps); it was smooth and accurate. I wish the same for you. Have fun :-)

Cacti Monitoring High Speed Interfaces with SNMPv2 and 64-bit counters
