Checking the Health of an Ethernet Cable on Exadata
Exachk runs a number of health checks. One of them verifies the connectivity and settings of the Ethernet cables in the Exadata environment. If one is found amiss, the following is returned in the Exachk summary report:
FAIL | OS Check | One or more Ethernet network cables are not connected. | Node_Name |
To validate the Exachk findings, you are asked to run the following as root on the specific node that is reporting the issue:
for cable in `ls /sys/class/net | grep ^eth`; do printf '%s: ' "$cable"; cat /sys/class/net/$cable/carrier; done
When this command is run on one of the other nodes, which Exachk did not flag, the following is returned:
eth0: 1
eth1: 1
eth2: 1
eth3: cat: /sys/class/net/eth3/carrier: Invalid argument
eth4: 1
eth5: 1
Running it on the third node, where the failure was reported, returns the following. The value of “0” for eth1 and eth4 clearly identifies the problem: those connections need a physical check.
eth0: 1
eth1: 0
eth2: 1
eth3: cat: /sys/class/net/eth3/carrier: Invalid argument
eth4: 0
eth5: 1
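The three outcomes seen in this output (a 1, a 0, and a read error) can be told apart in a small script. The following is only a sketch, not part of Exachk: the function name check_carriers and its optional directory argument are hypothetical, and the wording of the messages is an assumption. It reads the same carrier files as the one-liner above and returns a non-zero status if any interface reports 0.

```shell
# Sketch: classify the carrier state of every eth* interface.
# The optional first argument (default /sys/class/net) is a hypothetical
# convenience so the function can be pointed at a copy of the tree.
check_carriers() {
    local net_dir="${1:-/sys/class/net}"
    local path iface state
    local failed=0
    for path in "$net_dir"/eth*; do
        # Skip anything without a carrier entry (also covers "no match").
        [ -e "$path/carrier" ] || continue
        iface="${path##*/}"
        if state=$(cat "$path/carrier" 2>/dev/null); then
            case "$state" in
                1) echo "$iface: link up" ;;
                0) echo "$iface: NO CARRIER (check cable)"; failed=1 ;;
                *) echo "$iface: unexpected value '$state'" ;;
            esac
        else
            # cat failed, e.g. with "Invalid argument" as for eth3 above.
            echo "$iface: carrier unreadable (interface not in use)"
        fi
    done
    return "$failed"
}
```

Run as root with no arguments (check_carriers), this would flag eth1 and eth4 on the node above and report eth3 as unreadable rather than printing the raw cat error.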
“Invalid argument” signifies that the interface is not in use, so disregard eth3. The values of 0 are the issue and show that we have a problem. Andy Colvin (our very own Enkitec Exadata Ninja!) recommended the following to check a bit further.
As root on the third node, issue the following: ethtool eth3
$ ethtool eth3
Settings for eth3:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: Unknown!
        Duplex: Unknown! (255)
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: pumbg
        Wake-on: g
        Current message level: 0x00000003 (3)
        Link detected: no
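Only a few fields of that output matter for this check: Speed, Duplex and Link detected. As a rough sketch, they can be pulled out with awk. The function name summarize_ethtool and its one-line output format are hypothetical; it simply reads ethtool output on standard input and assumes the field labels shown above.

```shell
# Sketch: condense `ethtool <iface>` output (read from stdin) to one line.
# Assumes the "Speed:", "Duplex:" and "Link detected:" labels shown above.
summarize_ethtool() {
    awk -F': *' '
        /^[[:space:]]*Speed:/         { speed = $2 }
        /^[[:space:]]*Duplex:/        { duplex = $2 }
        /^[[:space:]]*Link detected:/ { link = $2 }
        END { printf "speed=%s duplex=%s link=%s\n", speed, duplex, link }
    '
}
```

Used as ethtool eth3 | summarize_ethtool, the output above would reduce to: speed=Unknown! duplex=Unknown! (255) link=no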
Yep, the lights are on, but nobody’s home… 🙂
At this point, someone *physically* needs to check the hardware. For the incident above, it was suspected that someone had yanked on the cable and damaged it. The cable was replaced, and the tests above were run again to verify that all was well.
Funny, did a remote analysis of networking problems during hotsos where a network was an issue. Used ethtool too.
Hey Frits, (long time no see… :))
Feel free to offer any additional advice!
Kellyn
Hi,
It’s a nice feature, thanks for the update, Kellyn.