Scientific Linux Forum.org



> Network communication issue with lotsa interfaces.
watkinb
 Posted: Jan 10 2014, 05:57 PM
SLF Newbie
Group: Members
Posts: 9
Member No.: 2887
Joined: 10-January 14

I have an old rack mount computer (3.0GHz, 512MB RAM) running SL6.4 that was installed from the LiveCD.

In the computer, I have 6 PCI cards that are carriers for 4-port fiber optic PMC mezzanine interfaces. This puts the total number of interfaces at 25 (including the built-in Ethernet port).

The computer used to run Fedora Core 3, and all of the interfaces functioned fine at that time. I've installed SL6.4 on the machine, and it now appears that all but around 6 (sometimes more, sometimes fewer) of the interfaces will communicate. The catch is that this behavior isn't consistent across reboots: sometimes the ones that won't communicate after one reboot start working on the next.

All interfaces show that they're up and that a link is detected.
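
A minimal sketch of one way to cross-check "up" and "link detected" for many interfaces at once, assuming the kernel exposes the usual `operstate` and `carrier` files under `/sys/class/net/<iface>/`. The `link_states` helper and its `sysfs_root` parameter are made up for illustration, and the syntax is kept simple so it should run on an old Python 2 as well as Python 3.

```python
import os

def _read(path):
    # carrier may be unreadable while an interface is administratively down,
    # so any read error is reported as "unknown" instead of raising.
    try:
        with open(path) as fh:
            return fh.read().strip()
    except (IOError, OSError):
        return "unknown"

def link_states(sysfs_root="/sys/class/net"):
    """Return {iface: (operstate, carrier)} for each interface directory."""
    states = {}
    for iface in sorted(os.listdir(sysfs_root)):
        base = os.path.join(sysfs_root, iface)
        if not os.path.isdir(base):
            continue  # skip plain files that are not interfaces
        states[iface] = (_read(os.path.join(base, "operstate")),
                         _read(os.path.join(base, "carrier")))
    return states

if __name__ == "__main__":
    if os.path.isdir("/sys/class/net"):
        for name, (oper, carrier) in sorted(link_states().items()):
            print("%s: operstate=%s carrier=%s" % (name, oper, carrier))
```

A carrier of 1 only says the PHY sees a link; as this thread shows, that does not guarantee traffic actually passes.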

This computer is not networked and doesn't have USB ports, so unfortunately I'm unable to provide the full lspci output.

The lspci output identifies the interfaces as:

CODE
Advanced Micro Devices [AMD] 79C970 [PCnet32 LANCE] (rev 44)


ethtool detects them as MII ports.

Any ideas or things for me to try?

If any more information is needed, let me know. I'm not a Linux guru, but I'm not a complete newbie either. I'm in the "know enough to be dangerous" category.

Thanks in advance!

I know my first post is about a problem. I'm on a short timeline at the moment. I promise I'll introduce myself in the introduction thread at some point.
watkinb
 Posted: Jan 13 2014, 01:59 PM

Something else I thought of... is there a limit to how many connections NetworkManager can handle?

Currently, I'm going to try disabling NetworkManager and its services, and editing rc.local to add "ifconfig" commands for each interface as well as "ethtool" commands to configure speed, duplex, etc.
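
With 24 ports, typing the rc.local lines by hand is error-prone, so here is a rough sketch that generates them. Everything specific in it is an assumption: the `rc_local_lines` helper is made up, the 192.168.N.1 addressing is a placeholder, and the speed/duplex values must be replaced with whatever the link partners actually expect.

```python
def rc_local_lines(count, first_index=1, speed=10, duplex="full"):
    """Build ifconfig/ethtool command strings for `count` eth interfaces."""
    lines = []
    for n in range(first_index, first_index + count):
        iface = "eth%d" % n
        # Placeholder addressing: one private /24 per interface.
        addr = "192.168.%d.1" % n
        lines.append("ifconfig %s %s netmask 255.255.255.0 up" % (iface, addr))
        # Force speed/duplex instead of relying on autonegotiation.
        lines.append("ethtool -s %s speed %d duplex %s autoneg off"
                     % (iface, speed, duplex))
    return lines

if __name__ == "__main__":
    # Print the lines so they can be reviewed before pasting into rc.local.
    print("\n".join(rc_local_lines(24)))
```

The script only prints text; nothing is applied to the system until the reviewed lines are copied into rc.local.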

The old version of Fedora (2) that was on the computer had its /etc/profile file edited to do just that (except for the ethtool commands). Even though a network manager is installed on Fedora, and the service was running, it didn't create any interface config (ifcfg) files.

Any thoughts?
watkinb
 Posted: Jan 13 2014, 04:37 PM

Results of more tinkering:

Computer 1:
- Four of 12 ports (4 ports per PCI interface) won't communicate. Link is detected on both the interfaces and their partners.
- The 4 ports that won't communicate are all on the same PCI interface.

I originally thought this was possibly a bad card... however, on...

Computer 2:
- Four of 12 ports WERE communicating fine, but stopped after several minutes. Link is detected on both the interfaces and their partners.
- The 4 ports that STOPPED communicating are all on the same PCI interface...

Interestingly enough, the two computers are 16-slot PCI rack mount computers, and the failing PCI devices are in the same slot number on both machines.

Looking at the ifconfig results for the failing ports, most of them are reporting errors, dropped packets and overruns... but at least 2 of the 8 failing ports are reporting "0" for those values.
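
A rough sketch for pulling those counters out of saved ifconfig output, so that all ports can be compared at a glance. It assumes the classic net-tools layout (`RX packets:... errors:... dropped:... overruns:...`); newer ifconfig builds print a different format. The sample text and the helper names are made up for illustration and are not output from the machines in this thread.

```python
import re

# Made-up sample in the classic net-tools layout.
SAMPLE = """\
eth20     Link encap:Ethernet  HWaddr 00:00:00:00:00:14
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:12 dropped:340 overruns:7 frame:0
          TX packets:15 errors:0 dropped:0 overruns:0 carrier:0

eth21     Link encap:Ethernet  HWaddr 00:00:00:00:00:15
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:15 errors:0 dropped:0 overruns:0 carrier:0
"""

COUNTERS = re.compile(
    r"^\s+(RX|TX) packets:(\d+) errors:(\d+) dropped:(\d+) overruns:(\d+)")

def error_counters(text):
    """Return {iface: {"RX": {...}, "TX": {...}}} parsed from ifconfig text."""
    result = {}
    iface = None
    for line in text.splitlines():
        if line and not line[0].isspace():
            # A line starting in column 0 opens a new interface stanza.
            iface = line.split()[0]
            result[iface] = {}
            continue
        match = COUNTERS.match(line)
        if match and iface is not None:
            result[iface][match.group(1)] = {
                "errors": int(match.group(3)),
                "dropped": int(match.group(4)),
                "overruns": int(match.group(5)),
            }
    return result

def suspicious(counters):
    """Sorted interfaces with any nonzero error, drop, or overrun count."""
    bad = []
    for iface, directions in counters.items():
        if any(v for d in directions.values() for v in d.values()):
            bad.append(iface)
    return sorted(bad)

if __name__ == "__main__":
    print(suspicious(error_counters(SAMPLE)))
```

Saving the output of `ifconfig -a` twice, a minute apart, and comparing the parsed results would show which counters are still climbing.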
watkinb
 Posted: Jan 14 2014, 03:08 PM

Really? No one has any input? Have I stumped even the SL masters?
Nicram
 Posted: Jan 14 2014, 03:38 PM
SLF Junior
Group: Members
Posts: 35
Member No.: 1399
Joined: 23-March 12

I think it is hard to find someone with hardware similar to yours.
Have you tried a different kernel version?

Some time ago the Unbreakable kernel could be used with SL; maybe you should try it if it's still available :)
watkinb
 Posted: Jan 14 2014, 05:35 PM

Fair enough, I understand.

I was hoping though that maybe I could be given some pointers on things to try from those more familiar with Linux than I am.

Another option I had thought of was to blow away SL and go back to Fedora, since it used to work, but the new link partner interfaces I'm connecting to don't seem to want to communicate with ANY of my machine's interfaces (a protocol issue, maybe?).

Another option is for me to find updated drivers for my interfaces and install them on Fedora, but all I can find is the source code, and I know nothing about compiling it for use on the old Fedora system... which doesn't have the kernel headers or compiler or even an internet connection (these are offline PCs that I'm working with).

I'll have a look at what I can do with a different kernel version. Thanks for the suggestion!
burakkucat
 Posted: Jan 14 2014, 09:58 PM
SLF Administrator
Group: Admins
Posts: 202
Member No.: 14
Joined: 10-April 11

If you do decide to investigate alternate kernels, I would suggest that you try the kernel-lt package that is available from the ELRepo Project.

--------------------
100% Linux and, previously, Unix. Co-founder of the ELRepo Project.
watkinb
 Posted: Jan 15 2014, 11:48 AM

Well, I haven't gotten to trying alternative kernels just yet (I haven't done it before), but in some of my investigation yesterday, it appears that it may have something to do with the number of interrupts.

Running cat /proc/interrupts shows that the 24 network interfaces are divided among 4 IRQs, all using IO-APIC-fasteoi.
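
To make the IRQ sharing easier to see, here is a small sketch that groups NIC names by IRQ from /proc/interrupts-style text. The sample is invented for illustration (IRQ numbers, counts, and which ports share which line are all placeholders), and `nics_per_irq` is a made-up helper name. It assumes the usual layout: a header row of CPU columns, then one row per IRQ with per-CPU counts, the controller type, and a comma-separated device list.

```python
# Invented sample; not copied from the machine in this thread.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
  0:        132          0          0          0   IO-APIC-edge      timer
 16:      90210          0          0          0   IO-APIC-fasteoi   eth0, eth1, eth2, eth3, eth4, eth5
 17:      88104          0          0          0   IO-APIC-fasteoi   eth6, eth7, eth8, eth9, eth10, eth11
 18:      91377          0          0          0   IO-APIC-fasteoi   eth12, eth13, eth14, eth15, eth16, eth17
 19:        412          0          0          0   IO-APIC-fasteoi   eth18, eth19, eth20, eth21, eth22, eth23
NMI:          0          0          0          0   Non-maskable interrupts
"""

def nics_per_irq(text, prefix="eth"):
    """Return {irq: (controller, [nic, ...])} for IRQs that serve NICs."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row has one column per CPU
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep or not label.strip().isdigit():
            continue  # skip summary rows such as NMI, LOC, ERR
        fields = rest.split()
        if len(fields) <= ncpu + 1:
            continue  # no device names on this row
        controller = fields[ncpu]
        devices = [d.strip() for d in " ".join(fields[ncpu + 1:]).split(",")]
        nics = [d for d in devices if d.startswith(prefix)]
        if nics:
            table[int(label)] = (controller, nics)
    return table

if __name__ == "__main__":
    for irq, (controller, nics) in sorted(nics_per_irq(SAMPLE).items()):
        print("IRQ %d (%s): %d NICs: %s"
              % (irq, controller, len(nics), ", ".join(nics)))
```

On a live system the same function could be fed the contents of /proc/interrupts to see whether the failing ports all sit on one shared line.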

The network interfaces that are NOT communicating appear to be the last ones detected upon startup (eth20-23), so it acts like I'm hitting some sort of limitation. tcpdump shows NOTHING on any of the failing channels. The only indication that something is talking to them (if anything is) is that the "Dropped Packets" number keeps incrementing each time I run ifconfig.

This morning, I tried reimaging the computer back to the version of Fedora I had on it before. NONE of the 24 interfaces can ping or communicate with their link partners. However, executing tcpdump shows that all channels are receiving and sending data (and the RX counter is incrementing with ifconfig). But the sent data isn't making it back to the link partner as it was with SL.

Running tcpdump and viewing just the ARP traffic, I see ARP requests making it to the ports and ARP replies going OUT on those ports (with the correct MAC addresses), but on the link partners I see only the ARP requests going out, and not the replies that the Fedora computer shows as being sent.

cat /proc/interrupts shows the same grouping of interrupts (different interrupt numbers though) and the same 6 interfaces per interrupt... but using IO-APIC-level.

I've looked at options for interrupt leveling (when I had SL on the computer), but it appears that it's already enabled. The computer is a dual core hyperthreading computer (2 processors, 4 siblings total) and when running full tilt with the networking, I'm less than 50% CPU load and still have over 100MB of RAM left. (200MB on Fedora).

I know that's a lot to post, but does anything stand out as red flags?

Summary:

Fedora - Receiving and sending out data on all NICs, but link partners are not receiving the sent data from any NICs.

SL - Receiving and sending successfully on all but the last 4 NICs (eth20-23).
watkinb
 Posted: Jan 21 2014, 10:43 AM

Just wanted to provide an update to this.

Although I really wanted to stay with Scientific Linux, just out of curiosity I tried installing a newer version of Fedora (14 was the latest release compatible with my old setup), and although it detected and renamed my eth ports in some strange order, it seems to detect and communicate through all of them.

I don't know what the limitation is on SL6, but I wasn't able to get it to work with more than 20 ports. As was said, having so many ports isn't really a common configuration, so I don't hold it against SL, but just wish I could have gotten it working.

On the plus side, I added much more knowledge and experience with Linux in the process of troubleshooting.
redman
 Posted: Jan 24 2014, 03:40 PM
Retired SLF Administrator
Group: Admins
Posts: 1276
Member No.: 2
Joined: 8-April 11

QUOTE (watkinb @ Jan 21 2014, 12:43 PM)
... just out of curiosity I tried installing a newer version of Fedora (14 was the latest distro compatible with my old setup)

You are free to play with Fedora 14, but please be aware that it is not supported anymore.
So your system will not receive security updates and such, whereas the EL6 clones are still kept up to date.

--------------------
"Sometimes the best helping hand you can give is a good, firm push."
watkinb
 Posted: Jan 28 2014, 12:04 PM

QUOTE (redman @ Jan 24 2014, 03:40 PM)
QUOTE (watkinb @ Jan 21 2014, 12:43 PM)
... just out of curiosity I tried installing a newer version of Fedora (14 was the latest distro compatible with my old setup)

You are free to play with Fedora 14, but please be aware that it is not supported anymore.
So your system will not receive security updates and such, whereas the EL6 clones are still kept up to date.


Well, this setup is not connected to any external network, so security isn't really a concern. The only network connections are between our unit under test and the setup itself.

Additionally, F14 was the latest version compatible with my old hardware. After that, the minimum RAM requirements were too large.