Checking the Health of an Ethernet Cable on Exadata

Exachk runs a number of health checks.  One of them verifies the connectivity and settings of the Ethernet cables in the Exadata environment.  If one is found amiss, the following appears in the Exachk summary report:

FAIL OS Check One or more Ethernet network cables are not connected. Node_Name

To validate the Exachk findings, you are asked to run the following as root on the specific node that is reporting the issue:
for cable in $(ls /sys/class/net | grep '^eth'); do printf '%s: ' "$cable"; cat "/sys/class/net/$cable/carrier"; done

When this command is run on one of the other nodes, which Exachk did not flag, the following is returned:
eth0: 1
eth1: 1
eth2: 1
eth3: cat: /sys/class/net/eth3/carrier: Invalid argument
eth4: 1
eth5: 1

Running it on the third node, where the failure was reported, returns the following.  It clearly shows the issue on eth1 and eth4, which report a value of 0 and require a physical check of the connections.

eth0: 1
eth1: 0
eth2: 1
eth3: cat: /sys/class/net/eth3/carrier: Invalid argument
eth4: 0
eth5: 1
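
To make a failing interface stand out, the same loop can be wrapped in a small function that labels each value.  This is only a minimal sketch: check_carriers and its optional directory argument are hypothetical names, not part of Exachk, and it assumes that a carrier file which cannot be read means the interface is not up.

```shell
# Sketch: label the carrier state of every eth* interface.
# check_carriers is a hypothetical helper; the optional first argument
# (default /sys/class/net) only exists so the function can be pointed
# at a different directory.
check_carriers() {
    local base="${1:-/sys/class/net}"
    local path name state
    for path in "$base"/eth*; do
        [ -e "$path" ] || continue      # glob matched nothing
        name="${path##*/}"              # strip the directory part
        if state=$(cat "$path/carrier" 2>/dev/null); then
            case "$state" in
                1) printf '%s: 1 (link up)\n' "$name" ;;
                0) printf '%s: 0 (NO LINK, check the cable)\n' "$name" ;;
                *) printf '%s: %s (unexpected value)\n' "$name" "$state" ;;
            esac
        else
            # Assumption: a failed read (e.g. "Invalid argument")
            # means the interface is not up.
            printf '%s: carrier unreadable (interface not up)\n' "$name"
        fi
    done
}
```

Run as root, check_carriers with no arguments reads /sys/class/net, just like the one-liner.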

“Invalid argument” signifies that the interface is not in use (it is not up, so the kernel has no carrier state to report), so disregard eth3.  The values of 0 are the issue and show that we have a problem.  Andy Colvin (our very own Enkitec Exadata Ninja!) recommended doing the following to check a bit further.

As root on the third node, run ethtool against the interface:

$ ethtool eth3
Settings for eth3:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Advertised auto-negotiation: Yes
        Speed: Unknown!
        Duplex: Unknown! (255)
        Port: Twisted Pair
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: pumbg
        Wake-on: g
        Current message level: 0x00000003 (3)
        Link detected: no

Yep, the lights are on, but nobody’s home… 🙂
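
When several interfaces need the same look, the one line that matters in the ethtool output is “Link detected”.  A minimal sketch of pulling out just that value follows; link_detected is a hypothetical helper name, and it assumes the ethtool output format shown above.

```shell
# Sketch: print the "Link detected" value from ethtool output on stdin.
# link_detected is a hypothetical helper, not an ethtool feature.
# Exits non-zero if the input has no "Link detected" line.
link_detected() {
    awk -F': *' '/Link detected/ { print $2; found = 1 } END { exit !found }'
}

# Example use (as root) for the two interfaces that reported 0:
#   for nic in eth1 eth4; do
#       printf '%s: ' "$nic"
#       ethtool "$nic" | link_detected
#   done
```

A value of “no” from ethtool agrees with a carrier value of 0, which points at the cable or the port rather than at the configuration.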

At this point, someone needs to *physically* check the hardware.  In the incident above, it was suspected that someone had yanked on the cable and damaged it.  The cable was replaced, and the tests above were run again to verify that all was well.

March 7, 2013