Since there was some interest in an overview of the network design and layout of the MCH2022 network, I did a small brain dump in this post.

Terrain

The terrain where MCH2022 took place is the same as for SHA2017, so a lot of permanent infrastructure is already in place. From a network standpoint, the terrain is divided up into three parts called N, E and L. Each letter designates an area and is followed by a number designating the building or field cabinet. The main buildings are N0, E0, and L0. N0 is closest to the road, E0 is next to the radio tower, and L0 is next to the permanent building. Each field cabinet is connected to one of these buildings with 8 fibre cores and 20 copper strands; for example, E1 is connected to E0, and L1 is connected to L0. Between N0 and E0, and between E0 and L0, there are 40 fibre cores. Between N0 and L0 there are 8 cores.
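For my own overview, the permanent cabling roughly maps to the sketch below. This is just a minimal Python model: the per-link counts are the ones mentioned above, but the field cabinets shown are only examples, not the full inventory.

```python
# Rough model of the permanent cabling on the terrain.
# Core/strand counts per link are taken from the text above; the field
# cabinets listed (E1, L1) are only examples, not the full inventory.
links = {
    # between the three main buildings
    ("N0", "E0"): {"fibre_cores": 40},
    ("E0", "L0"): {"fibre_cores": 40},
    ("N0", "L0"): {"fibre_cores": 8},
    # each field cabinet hangs off one of the main buildings
    ("E0", "E1"): {"fibre_cores": 8, "copper_strands": 20},
    ("L0", "L1"): {"fibre_cores": 8, "copper_strands": 20},
}

def link(a: str, b: str) -> dict:
    """Look up a link regardless of the order the endpoints are given in."""
    return links.get((a, b)) or links.get((b, a)) or {}

print(link("E1", "E0"))  # {'fibre_cores': 8, 'copper_strands': 20}
```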

Uplink

N0 is the closest to the public road. Inside it, the uplink dark fibre towards Almere is terminated on an ADVA QuadFlex 3000, which transports 2x100Gbit waves inside a 50GHz channel. One 100Gbit connection is patched to E0, the other one to L0.

In Almere, we have an alien wave towards Amsterdam. At NIKHEF the other ADVA is placed; one 100Gbit is broken out locally and the other one is patched through towards SmartDC in Rotterdam, together with a second 100Gbit link to connect the two routers between NIKHEF and Rotterdam. With this, we end up with 4 routers connected up in a square. Each link between them is 100Gbit, except the link between the E0 and L0 routers, which is an LACP bundle of 2x100Gbit.
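Sketched out, the resulting square looks roughly like this. Note that the description above does not pin down which field router lands at NIKHEF and which at SmartDC, so that pairing is an assumption in this sketch.

```python
# The four backbone routers and the links between them, capacities in Gbit/s.
# Which field router (E0 or L0) terminates at NIKHEF vs. SmartDC is assumed
# here purely for the sake of the sketch.
from collections import Counter

backbone_links_gbit = {
    ("NIKHEF", "SmartDC"): 100,  # second 100G link between the datacentres
    ("NIKHEF", "E0"): 100,       # 100G wave broken out locally at NIKHEF
    ("SmartDC", "L0"): 100,      # 100G wave patched through to Rotterdam
    ("E0", "L0"): 200,           # LACP bundle of 2x100G on the field
}

degree = Counter()
for a, b in backbone_links_gbit:
    degree[a] += 1
    degree[b] += 1

# every router has 2 backbone links, so any single link can fail and the
# square still has a path around
print(degree)
```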

Backbone

The backbone of the network consists of an SR-MPLS network between the 4 routers. For the IGP we used IS-IS, and on top of this we ran a full-mesh iBGP to propagate the full table from the routers in NIKHEF and SmartDC towards the routers on the field.
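To put a number on the full mesh: with four routers that is six iBGP sessions. A quick sketch of that enumeration; the loopback addresses are documentation space, not the real addressing plan.

```python
# Sketch: enumerate the iBGP sessions for a full mesh between the four
# backbone routers. Loopbacks are documentation addresses, not the real ones.
from itertools import combinations

loopbacks = {
    "NIKHEF":  "192.0.2.1",
    "SmartDC": "192.0.2.2",
    "E0":      "192.0.2.3",
    "L0":      "192.0.2.4",
}

sessions = list(combinations(sorted(loopbacks), 2))
for a, b in sessions:
    print(f"iBGP: {a} ({loopbacks[a]}) <-> {b} ({loopbacks[b]})")

# n*(n-1)/2 sessions: 4 routers -> 6 sessions
print(len(sessions))
```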

Between the two routers in E0 and L0 we also ran EVPN-MPLS to be able to provide sitewide L2 services for other teams.

Distribution layer

In E0, L0, and also across the dike (a location called L2A) there was a distribution switch. These switches were connected with an ESI-LAG towards the routers in E0 and L0. Switches and locations that could not be multihomed were connected to this distribution layer. The L2A location was needed because in total we only had 2 fibre cores available for the whole area outside of the dike; this switch was therefore uplinked with 100Gbit BiDi optics.

Access layer logical design

Each access switch has its own /24 of IPv4 space and a /64 of IPv6 space. Given we have EVPN-MPLS between the two routers on the field and wanted some form of redundancy, each access switch was connected with an active-active ESI-LAG to the two routers. Together with the active-active ESI-LAG, the gateway for each access switch was also an active-active gateway on both routers; that way each router would locally handle traffic destined for other destinations.
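A minimal sketch of what that per-switch addressing comes down to, using Python's ipaddress module. The parent prefixes and switch names below are made up for illustration, not the actual MCH2022 allocations.

```python
# Sketch: hand every access switch its own /24 of IPv4 and /64 of IPv6.
# Parent prefixes and switch names are illustrative only.
from ipaddress import ip_network

ipv4_pool = ip_network("10.128.0.0/16").subnets(new_prefix=24)
ipv6_pool = ip_network("2001:db8::/48").subnets(new_prefix=64)

switches = ["sw-e0-01", "sw-l0-01", "sw-field-07"]  # hypothetical names

plan = {name: (next(ipv4_pool), next(ipv6_pool)) for name in switches}
for name, (v4, v6) in plan.items():
    # the first host address in each subnet would be the active-active gateway
    print(f"{name}: {v4} (gw {v4[1]})  {v6} (gw {v6[1]})")
```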

Access layer physical design

On the physical layer, we built fibre rings originating from E0 and ending up in L0. Each ring had up to 4 locations in it, and at each location we used a CWDM OADM to drop one wavelength and add it again in the other direction. For the optics we used 40Gbit-LR4s everywhere, running in breakout mode (4x10). This means that on the E0/L0 side we had 1 optic in each router where all 4 channels would be active, but on the switch side, where we also had LR4s, only one channel would be active. We did configure all 4 channels on each switch so we did not have to worry about where we would place which switch.
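The trick here is that the four lanes of a 40GBASE-LR4 optic sit on the CWDM grid (1271/1291/1311/1331 nm), so each OADM on a ring drops exactly one 10Gbit lane at its site. A small sketch of that mapping; the site names are illustrative, not the real ring plan.

```python
# Sketch: which 10G lane of a 40GBASE-LR4 breakout ends up at which site on a
# ring. Lane wavelengths are the standard LR4/CWDM ones; site names are made up.
LR4_LANES_NM = [1271, 1291, 1311, 1331]

def ring_plan(sites):
    """Assign one dropped wavelength (one 10G lane) per OADM site, in order."""
    if len(sites) > len(LR4_LANES_NM):
        raise ValueError("a ring carries at most 4 sites (one per LR4 lane)")
    return {site: LR4_LANES_NM[i] for i, site in enumerate(sites)}

print(ring_plan(["bar", "family-village", "team-tents"]))
# {'bar': 1271, 'family-village': 1291, 'team-tents': 1311}
```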

Lessons learned

IPv6 RAs with an active-active anycast gateway were broken in the version of Junos we ran, and an upgrade to a newer version did not help, so we had to revert to announcing 2 different gateways and letting the clients pick. Running a 2-node EVPN-MPLS setup with Juniper does not allow single-node operation: if a node loses its last EVPN BGP session, it will shut down all its ESI interfaces.