[Bridge] Bridged vlan issue. Jonathan Thibault Tue Jun 17 00:02:24 2008

I have a strange issue with bridged vlan interfaces. I've discussed it at length on the ebtables mailing list and got a fair bit of valuable feedback there. It is still unclear where the problem lies, but it definitely seems ARP-related.

First of all, this is kernel 2.6.23. I have two tg3 gigabit interfaces on the box, conveniently named 'out' and 'in'. The vlans are on the 'in' side of the bridge (in.2, in.3, in.4 ... in.6), while the 'out' interface is plain untagged ethernet.

As it is now, I only use ebtables to filter out anything that isn't IPv4 or ARP; I do the rest of my filtering through iptables. There is also no STP on the bridge or anywhere in our network, though we might use it once I get this fixed.
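
For readers who want to reproduce the filter, here is a minimal dry-run sketch in Python that only builds the ebtables commands for an "IPv4 and ARP only" policy and prints them. The FORWARD chain and the default-DROP policy are assumptions; the post does not say which chain or rule layout is actually in use.

```python
# Dry-run sketch: build (but do not execute) ebtables commands that let
# only IPv4 and ARP frames cross the bridge.
# ASSUMPTION: the rules live in the FORWARD chain of the filter table;
# the original post does not say which chain is used.

def ebtables_ipv4_arp_only(chain="FORWARD"):
    """Return argv lists for an 'accept IPv4 and ARP, drop the rest' policy."""
    return [
        # Accept rules go in first so nothing is cut off while loading.
        ["ebtables", "-A", chain, "-p", "IPv4", "-j", "ACCEPT"],
        ["ebtables", "-A", chain, "-p", "ARP", "-j", "ACCEPT"],
        # Everything else falls through to the chain policy.
        ["ebtables", "-P", chain, "DROP"],
    ]


if __name__ == "__main__":
    for argv in ebtables_ipv4_arp_only():
        print(" ".join(argv))
```

Printing the commands rather than running them keeps the sketch safe to try on a production box; feeding each list to `subprocess.run` would apply them.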

In its current, working condition, the bridge (br0) has interfaces 'in.2' and 'out' with the clients on the 'in' side of the bridge, and the internet gateway on the 'out' side. Does the job brilliantly.

I start having problems when in.3 is added to the bridge (it exists and is up on the box, just not on the bridge). There are still no clients in vlan 3, but when I add it to the bridge, the bridge won't relay ARP replies from the gateway to some of my clients in vlan 2, effectively cutting off their internet access.

The strange thing is that I see the reply come into the 'out' interface (with tcpdump), I see it on the 'br0' interface, and I also see it on the in.2 interface, where it should be on its way to the customer. But when I put a hub between the customer and the bridge box, I never see it there. It's as if the ARP reply vanished just before it got fed to the ethernet cable. To the Linux box, it's been sent, but it never shows up on the trunk.
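
A capture on in.2 sees the frame before it is handed down to the underlying 'in' device, so it does not prove what actually goes out on the wire. When comparing against the hub capture, it may help to know what the frame should look like on the trunk. The following is a minimal Python sketch, not a diagnosis: it builds the 802.1Q-tagged ARP reply that vlan 2 ought to carry and classifies a raw frame as tagged (with its VID) or untagged. The MAC and IP values in the usage lines are made-up placeholders.

```python
import struct

ETH_P_8021Q = 0x8100  # 802.1Q tag protocol identifier
ETH_P_ARP = 0x0806    # ARP ethertype


def build_tagged_arp_reply(dst_mac, src_mac, vid, sender_ip, target_ip):
    """Build Ethernet + 802.1Q + ARP reply bytes (no padding, no FCS)."""
    tci = vid & 0x0FFF  # priority bits and DEI left at zero
    eth = dst_mac + src_mac + struct.pack("!HH", ETH_P_8021Q, tci)
    eth += struct.pack("!H", ETH_P_ARP)
    # hardware type 1 (Ethernet), protocol 0x0800 (IPv4),
    # address lengths 6 and 4, opcode 2 (reply)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2)
    arp += src_mac + sender_ip + dst_mac + target_ip
    return eth + arp


def classify(frame):
    """Return (vid or None, ethertype) for a raw Ethernet frame."""
    (etype,) = struct.unpack("!H", frame[12:14])
    if etype == ETH_P_8021Q:
        tci, inner = struct.unpack("!HH", frame[14:18])
        return tci & 0x0FFF, inner
    return None, etype


if __name__ == "__main__":
    # Placeholder addresses, for illustration only.
    client = bytes.fromhex("020000000001")
    gateway = bytes.fromhex("020000000002")
    frame = build_tagged_arp_reply(
        client, gateway, 2, bytes([192, 0, 2, 1]), bytes([192, 0, 2, 10])
    )
    vid, etype = classify(frame)
    print("vid=%s ethertype=0x%04x" % (vid, etype))
```

Running `classify` over frames captured on the hub would show whether the replies are missing entirely or are leaving with an unexpected tag or no tag at all.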

I've also validated this by testing with only 'in.2' and 'out' on the bridge: I see both requests and replies for affected customers go through the hub, and everything works.

I know the tg3 driver does some vlan acceleration of sorts, and that might have something to do with it, but something tells me I'd then have the same problem with just one vlan interface on the bridge.
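
One way to check the acceleration theory on newer systems is to look at the offload flags that `ethtool -k` reports. The sketch below, in Python, pulls the `rx-vlan-offload` and `tx-vlan-offload` lines out of that output. It assumes a kernel and ethtool considerably newer than the 2.6.23 in this post; whether 2.6.23 exposes these flags at all is not verified here. The sample text is illustrative, not captured from the box described.

```python
# Sketch: extract VLAN offload flags from `ethtool -k <dev>` output.
# ASSUMPTION: the output uses the "rx-vlan-offload" / "tx-vlan-offload"
# feature names printed by newer ethtool releases.

VLAN_KEYS = ("rx-vlan-offload", "tx-vlan-offload")

# Illustrative sample only, not real output from the box in this post.
SAMPLE = """Features for in:
rx-checksumming: on
rx-vlan-offload: on
tx-vlan-offload: on [fixed]
"""


def vlan_offload_state(ethtool_k_output):
    """Return {feature: 'on' | 'off'} for the VLAN offload features found."""
    state = {}
    for line in ethtool_k_output.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        words = value.split()
        if sep and key in VLAN_KEYS and words:
            state[key] = words[0]  # drop a trailing "[fixed]" marker
    return state


if __name__ == "__main__":
    print(vlan_offload_state(SAMPLE))
```

Where the driver allows it, `ethtool -K <dev> rxvlan off txvlan off` turns these off, which would make for a quick comparison test in a maintenance window.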

As I said before, this only manifests in our production environment, so I have to be pretty careful with scheduling tests and whatnot, but I'd very much love some ideas for figuring out where the vanishing packets go.

Bridge mailing list