© 2010 by Juniper Networks, Inc. All rights reserved. Juniper Networks, the Juniper Networks logo, Junos, NetScreen, and ScreenOS are registered trademarks of Juniper Networks, Inc. in the United States and other countries. JUNOSe is a trademark of Juniper Networks, Inc. All other trademarks, service marks, registered trademarks, or registered service marks are the property of their respective owners. Juniper Networks assumes no responsibility for any inaccuracies in this document. Juniper Networks reserves the right to change, modify, transfer, or otherwise revise this publication without notice. Products made or sold by Juniper Networks or components thereof might be covered by one or more of the following patents that are owned by or licensed to Juniper Networks: U.S. Patent Nos. 5,473,599, 5,905,725, 5,909,440, 6,192,051, 6,333,650, 6,359,479, 6,406,312, 6,429,706, 6,459,579, 6,493,347, 6,538,518, 6,538,899, 6,552,918, 6,567,902, 6,578,186, and 6,590,785. Printed in the USA by Vervante Corporation. Version History: v1, June 2010.
Key Contributors
Chandra Shekhar Pandey is a Juniper Networks Director of Solutions Engineering. He is responsible for service provider, enterprise and OEM partner solutions engineering and validation. Chandra has more than 18 years of networking experience designing ASICs, architecting systems and designing solutions to address customers' challenges in the service provider, MSO and enterprise markets. He holds a bachelor's degree in Electronics Engineering from K.N.I.T., Sultanpur, India and an MBA in High Tech and Finance from Northeastern University, Boston, MA.

Louise Apichell is a Juniper Networks Senior Technical Writing Specialist in the Solutions Marketing Group. She assisted as a content developer, chief editor and project manager in organizing, writing and editing this book. Louise specializes in writing and editing all types of technical collateral, such as white papers, application notes, implementation guides, reference architectures and solution briefs.

Ravinder Singh is a Juniper Networks Director of Solution Architecture and Technical Marketing in the Solutions Marketing Group. He is responsible for creating technical knowledge bases and has significant experience working with sales engineers and channels to support Juniper's Cloud/Data Center Solutions for enterprises, service providers and key OEM alliances. Prior to this role, Ravinder was responsible for Enterprise Solutions Architecture and Engineering, where his team delivered several enterprise solutions including Adaptive Threat Management, Distributed Enterprise and the Juniper Simplified Data Center. Ravinder holds bachelor's and master's degrees in Electronics and a master of business degree in IT Management and Marketing.

Mike Barker is a Juniper Networks Technical Marketing Director, Solutions Engineering and Architectures. In this role, he focuses on developing architectures and validating multi-product solutions that create business value for enterprise and service provider customers. Prior to this role, Mike served in various consulting and systems engineering roles for federal, enterprise and service provider markets at Juniper Networks, Acorn Packet Solutions and Arbor Networks. Earlier in his career, Mike held network engineering positions at Cable & Wireless, Stanford Telecom and the USAF. Mr. Barker holds a Bachelor of Science degree in Business Management from Mount Olive College and an MBA from Mount St. Mary's University.

Karen Joice is a Juniper Networks Marketing Specialist who provided the technical illustrations for this book. Karen has been a graphic artist and marketing professional for more than 15 years, specializing in technical illustrations, Flash, and Web design, with expertise in print production.

You can purchase a printed copy of this book, or download a free PDF version, at: juniper.net/books.
Authors' Acknowledgments
The authors would like to take this opportunity to thank Patrick Ames, whose direction and guidance was indispensable. To Nathan Alger, Lionel Ruggeri, and Zach Gibbs, who provided valuable technical feedback several times during the development of this booklet, your assistance was greatly appreciated. Thanks also to Cathy Gadecki for helping in the formative stages of the booklet. There are certainly others who helped in many different ways, and we thank you all.
Preface
ENTERPRISES DEPEND MORE THAN EVER BEFORE on the efficiency of their data center infrastructure and the performance of their business applications to improve employee productivity, reduce operational costs and increase revenue. To achieve these objectives, virtualization, simplification and consolidation are three of the most crucial initiatives for the enterprise. These initiatives not only demand high-performance server and network technologies, but also require smooth integration between the two to achieve optimal performance. Hence, successful integration of servers with a simplified networking infrastructure is pivotal.

This guide provides enterprise architects, sales engineers, IT developers, system administrators and other technical professionals guidance on how to design and implement a high-performance data center using Juniper Networks infrastructure and IBM Open Systems. With a step-by-step approach, readers can gain a thorough understanding of design considerations, recommended designs, technical details and sample configurations that exemplify simplified data center network design. This approach is based on testing performed using Juniper Networks devices and IBM servers in Juniper Networks solution labs.

The IBM Open System Servers solution, including IBM Power Systems, System x, and BladeCenter systems, comprises the foundation for a dynamic infrastructure. IBM server platforms help consolidate applications and servers and virtualize system resources while improving overall performance, availability and energy efficiency, providing a more flexible, dynamic IT infrastructure.

Juniper Networks offers a unique best-in-class data center infrastructure solution based on open standards. It optimizes performance and enables consolidation, which in turn increases network scalability and resilience, simplifies operations, and streamlines management while lowering overall Total Cost of Ownership (TCO). The solution also automates network infrastructure management, making existing infrastructure easily adaptable and flexible, especially for third-party application deployment.

Key topics in this book focus on the following routing and switching solutions in Juniper's simplified two-tier data center network architecture with IBM Open Systems:

- Best practices for integrating Juniper Networks EX and MX Series switches and routers with IBM Open Systems.
- Configuration details for the various spanning tree protocols, such as Spanning Tree Protocol (STP), Multiple Spanning Tree Protocol (MSTP), Rapid Spanning Tree Protocol (RSTP), and VLAN Spanning Tree Protocol (VSTP), and deployment scenarios such as RSTP/MSTP and VSTP/Per-VLAN Spanning Tree (VSTP/PVST) with Juniper EX and MX Series switches and routers connecting to IBM BladeCenter.
- Details for Layer 2 and Layer 3 multicast scenarios with Protocol Independent Multicast (PIM) and Internet Group Management Protocol (IGMP) snooping. Scenarios include a video streaming client running on IBM servers, with PIM implemented on the network access and core/aggregation tiers along with IGMP snooping at the access layer.
- Low latency network design and techniques, such as Class of Service (CoS), for improving data center network performance.
- Methods for increasing data center resiliency and high availability, with configuration details for protocols and features such as Virtual Router Redundancy Protocol (VRRP), Redundant Trunk Group (RTG), Link Aggregation (LAG), Routing Engine redundancy, virtual chassis, Nonstop Bridging (NSB), Nonstop Routing (NSR), Graceful Restart (GR) and In-Service Software Upgrade (ISSU).

Juniper Networks realizes that the scope of data center network design encompasses many facets, for example servers, storage and security. Therefore, to narrow the scope of this book, we have focused on network connectivity implementation details based on Juniper EX and MX Series switches and routers and IBM Open Systems. However, as new relevant technologies and best practices evolve, we will continue to revise this book to include additional topics. Please send us your feedback with any new or relevant ideas that you would like to see in future revisions of this book, or in other Validated Solutions books, at: solutions-engineering@juniper.net.
Chapter 1
Introduction
TODAY'S DATA CENTER ARCHITECTS and designers do not have the luxury of simply adding more and more devices to solve networking's constant and continuous demands, such as higher bandwidth requirements, increased speed, rack space, tighter security, storage, interoperability among many types of devices and applications, and more and more diverse and remote users.

This chapter discusses in detail the data center trends and challenges now facing network designers. Juniper Networks and IBM directly address these trends and challenges with a data center solution that improves data center efficiency by simplifying the network infrastructure, by reducing recurring maintenance and software costs, and by streamlining daily management and maintenance tasks.
Trends
Although there are several types of data centers supporting a wide range of applications, such as financial services, web portals, content providers, and IT back office operations, they all share certain trends:

More Data Than Ever Before
Since the dawn of the computer age, many companies have struggled to store their electronic records. That struggle can be greater than ever today, as regulatory requirements can force some companies to save even more records than before. The growth of the Internet may compound the problem; as businesses move online, they need to store enormous amounts of data such as customer account information and order histories. The total capacity of shipped storage systems is soaring by more than 50 percent a year, according to market researcher IDC. The only thing growing faster than the volume of data itself is the amount of data that must be transferred between data centers and users. Numerous large enterprises are consolidating their geographically distributed data centers into mega data centers to take advantage of cost benefits and economies of scale, increased reliability, and the latest virtualization technologies. According to research conducted by Nemertes, more than 50 percent of companies consolidated their dispersed data centers into fewer but larger data centers in the last 12 months, with even more planning to consolidate in the upcoming 12 months.

Server Growth
Servers are continuing to grow at a high annual rate of 11 percent, while storage is growing at an even higher rate of 22 percent, both of which are putting tremendous strain on data centers' power and cooling capacity. According to Gartner, OS and application instability is increasing server sprawl, with utilization rates of 20 percent leading to an increased adoption of server virtualization technologies.

Evolution of Cloud Services
Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet. Large enterprises are adopting cloud-computing methodology in their mega data centers. Smaller businesses that cannot afford to keep up with the cost and complexity of maintaining their privately owned data centers may look to outsource those functions to cloud-hosting providers.
Challenges
Today's major data center challenges include scale and virtualization; complexity and cost; interconnectivity for business continuity; and security:

Scale and Virtualization
With the evolution of mega data centers and cloud-computing architectures, tremendous strain is being placed on current network architectures. Scaling networking and security functions can quickly become a limiting factor to the success of growing data centers as they strive to meet stringent performance and high-availability requirements. However, simply adding more equipment does not always satisfy the appetite of hungry mega data centers. If the network and security architecture does not enable application workload mobility and quick responses to variable capacity requirements to support multi-tenancy within servers (as required in a cloud environment), then the full value of data center virtualization cannot be realized.

Complexity and Cost
Many data centers have become overly complex, inefficient and costly. Networking architectures have stagnated for over a decade, resulting in network device sprawl and increasingly chaotic network infrastructures designed largely to work around low-performance and low-density devices. The ensuing capital expenses, rack space, power consumption and management overhead all add to the overall cost, not to mention the environmental impact. Unfortunately, instead of containing costs and reallocating the savings into enhancing and accelerating business practices, the IT budget all too often is misappropriated into sustaining already unwieldy, rapidly growing data center operations. Emerging applications that use Service Oriented Architecture (SOA) and Web services are increasingly computational and network intensive; the network, however, is not efficient. Gartner (2007) asserts that 50 percent of the Ethernet switch ports within the data center are used for switch interconnectivity.

Interconnectivity for Business Continuity
As data centers expand, they can easily outgrow a single location. When this occurs, enterprises may have to open new centers and transparently interconnect these locations so they can interoperate and appear as one large data center. Enterprises with geographically distributed data centers may want to virtually consolidate them into a single, logical data center in order to take advantage of the latest technology.

Security
The shared infrastructure in the data center or cloud should support multiple customers, each with multiple hosted applications; provide complete, granular and virtualized security that is easy to configure and understand; and support all major operating systems on a plethora of mobile and desktop devices. In addition, a shared infrastructure should integrate seamlessly with existing identity systems, check host posture before allowing access to the cloud, and make all of this accessible to thousands of users, while protecting against sophisticated application attacks, Distributed Denial of Service (DDoS) attacks and hackers.

Today, a data center infrastructure solution requires a dynamic infrastructure, a high performance network and a comprehensive network management system.
IBM Power Systems
The IBM Power Systems family of servers includes proven server platforms that help consolidate applications and servers and virtualize system resources while improving overall performance, availability and energy efficiency, providing a more flexible, dynamic IT infrastructure. A Power server can run up to 254 independent servers, each with its own processor, memory and I/O resources, within a single physical Power server. Processor resources can be assigned at a granularity of 1/100th of a core.

IBM System x
The IBM System x3850 X5 server is the fifth generation of the Enterprise X-Architecture, delivering innovation with enhanced reliability and availability features to enable optimal performance for databases, enterprise applications and virtualized environments. According to a recent IBM Redbooks paper, a single IBM System x3850 X5 host server can support up to 384 virtual machines. For details, please refer to High Density Virtualization Using the IBM System x3850 X5 at www.redbooks.ibm.com/technotes/tips0770.pdf.

IBM BladeCenter
The BladeCenter is built on IBM X-Architecture to run multiple business-critical applications with simplification, cost reduction and improved productivity. Compared to first generation Xeon-based blade servers, IBM BladeCenter HS22 blade servers can help improve the economics of your data center with:

- Up to 11 times faster performance
- Up to 90 percent reduction in energy costs alone
- Up to 95 percent IT footprint reduction
- Up to 65 percent less in connectivity costs
- Up to 84 percent fewer cables

For detailed benefits concerning the IBM BladeCenter, please refer to www-03.ibm.com/systems/migratetoibm/systems/bladecenter/.
Network virtualization with MPLS in the Juniper Networks MX Series 3D Universal Edge Routers and the Juniper Networks M Series Multiservice Edge Routers enables network segmentation across data centers and to remote offices for applications and departments, without the need to build separate or overlay networks. The Juniper Networks Junos operating system runs across the network infrastructure, providing one operating system, enhanced through a single release train and developed upon a common modular architecture, giving enterprises a 1-1-1 advantage. J-Care Technical Services provide automated incident management and proactive analysis assistance through the Advanced Insight Solutions technology resident in Junos OS.

MX Series 3D Universal Edge Routers
The Juniper Networks MX Series 3D Universal Edge Routers are a family of high-performance Ethernet routers with powerful switching features designed for enterprise and service provider networks. The MX Series provides unmatched flexibility and reliability to support advanced services and applications, and it addresses a wide range of deployments, architectures, port densities and interfaces. High-performance enterprise networks typically deploy MX Series routers in high-density Ethernet LAN and data center aggregation, and in the data center core. The MX Series provides carrier-grade reliability, density, performance, capacity and scale for enterprise networks with mission critical applications. High availability features such as nonstop routing (NSR), fast reroute, and unified in-service software upgrade (ISSU) ensure that the network is always up and running. The MX Series delivers significant operational efficiencies enabled by Junos OS, and supports a collapsed architecture requiring less power, cooling and space. The MX Series also provides open APIs for easily customized applications and services, enabling enterprise networks to profit from the tremendous growth of Ethernet transport with the confidence that the platforms they install now will have the performance and service flexibility to meet their evolving requirements.

The MX Series 3D Universal Edge Routers include the MX80 and MX80-48T, MX240, MX480 and MX960. Their common key features include:

- 256K multicast groups
- 1M MAC addresses and IPv4 routes
- 6K L3VPN and 4K VPLS instances
- Broadband services router
- IPsec
- Session border controller
- Video quality monitoring

As a member of the MX Series, the MX960 is a high-density Layer 2 and Layer 3 Ethernet platform with up to 2.6 Tbps of switching and routing capacity, and it offers the industry's first 16-port 10GbE card. It is optimized for emerging Ethernet network architectures and services that require high availability, advanced QoS, and the performance and scalability to support mission critical networks. The MX960 platform is ideal where SCB and Routing Engine redundancy are required. All major components are field-replaceable, increasing system serviceability and reliability and decreasing mean time to repair (MTTR). Enterprise customers typically deploy the MX960 or MX480 in their data center core.

NOTE    We deployed the MX480 in this handbook. However, the configurations and discussions pertaining to the MX480 also apply to the entire MX product line.

EX Series Ethernet Switches
Among the EX Series Ethernet Switches, the EX4200 Ethernet switches with Virtual Chassis technology and the EX8200 modular chassis switches are commonly deployed in the enterprise data center. We used the EX4200 and EX8200 for most of our deployment scenarios.

EX4200 Ethernet Switches with Virtual Chassis Technology
The EX4200 line of Ethernet switches with Virtual Chassis technology combines the high availability and carrier-class reliability of modular systems with the economics and flexibility of stackable platforms, delivering a high-performance, scalable solution for data center, campus, and branch office environments. The EX4200 Ethernet switches with Virtual Chassis technology have the following major features:

- Deliver the high availability, performance and manageability of chassis-based switches in a compact, power-efficient form factor.
- Offer the same connectivity, Power over Ethernet (PoE) and Junos OS options as the EX3200 switches, with an additional 24-port fiber-based platform for Gigabit aggregation deployments.
- Enable up to 10 EX4200 switches (with Virtual Chassis technology) to be interconnected as a single logical device supporting up to 480 ports.
- Provide redundant, hot-swappable, load-sharing power supplies that reduce mean time to repair (MTTR), while Graceful Routing Engine Switchover (GRES) ensures hitless forwarding in the unlikely event of a switch failure.
- Run the same modular, fault-tolerant Junos OS as other EX Series switches and all Juniper routers.

EX8200 Modular Chassis Switches
The EX8200 modular chassis switches have the following major features:

- High-performance 8-slot (EX8208) and 16-slot (EX8216) switches support data center and campus LAN core and aggregation layer deployments.
- Scalable switch fabric delivers up to 320 Gbps per slot.
- 48-port 10/100/1000BASE-T and 100BASE-FX/1000BASE-X line cards support up to 384 (EX8208) or 768 (EX8216) GbE ports per chassis.
- 8-port 10GBASE-X line cards with SFP+ interfaces deliver up to 64 (EX8208) or 128 (EX8216) 10GbE ports per chassis.
- Carrier-class architecture includes redundant internal Routing Engines, switch fabrics, and power and cooling, all ensuring uninterrupted forwarding and maximum availability.
- Run the same modular, fault-tolerant Junos OS as other EX Series switches and all Juniper routers.

Juniper Networks' high-performance data center network architecture reduces cost and complexity by requiring fewer tiers of switching, and by consolidating security services, a common operating system, and one extensible model for network management. As shown in Figure 1.1, the Junos OS runs on many data center network switching, routing and security platforms, including the Juniper Networks EX Series, MX Series and SRX Series, as well as the IBM j-type data center network products (Juniper Networks is the original equipment manufacturer, or OEM, for the EX and MX Series). For details concerning product mapping between IBM and Juniper Networks products, see Table 1.1 at the end of this chapter or visit the website IBM and Junos in the Data Center: A Partnership Made for Now at https://simplifymydatacenter.com/ibm.
Figure 1.1    Junos Operating System Runs on the Entire Data Center Network: Security, Routers, and Switching Platforms
Figure 1.2    IBM and Juniper Networks products deployed across the data center (two-page topology diagram): remote/cloud users reach the data center over SSL VPN (SA6500) and the WAN; small branch and headquarters sites connect through SRX Series gateways (SRX100, SRX650, SRX3600) and WXC Series platforms (WXC2600, WXC3400); the core/aggregation tier is built from MX240/MX480/MX960 routers and EX8208/EX8216 switches, with EX4200 Virtual Chassis at the access tier; security is provided by SRX5600/SRX5800, Unified Access Control (IC6500, IC4500) and the SBR Appliance; and network management spans NetView, Netcool Network Manager, Provisioning Manager, Tivoli Storage Manager FastBack and IBM identity products (Federated Identity Manager, Access Manager), alongside IBM System x, System p, System z and BladeCenter servers and storage
IBM Tivoli and Juniper Networks Junos Space for a Comprehensive Network Management Solution

Managing the data center network often requires many tools from different vendors, as the typical network infrastructure often is a complex meshed network deployment. This type of deployment combines different network topologies and often includes devices from multiple vendors and network technologies for delivery. IBM Tivoli products and Juniper Networks Junos Space together can manage data center networks effectively and comprehensively. The tools include:

- IBM Systems Director
- Tivoli Netcool/OMNIbus
- IBM Tivoli Provisioning Manager
- Junos Space Network Application Platform
- Juniper Networks Junos Space Ethernet Activator
- Juniper Networks Junos Space Security Designer
- Juniper Networks Junos Space Route Insight Manager
- Juniper Networks Junos Space Service Now

MORE    For the latest IBM and Juniper Networks data center solution, visit http://www.juniper.net/us/en/company/partners/global/ibm/#dynamic.
secure public cloud to ensure that high priority applications are given preference over lower priority ones when computing resources become constrained. IBM and Juniper are installing these advanced networking capabilities into IBM's nine worldwide Cloud Labs for customer engagements. Once installed, IBM and Juniper will be able to seamlessly move client-computing workloads between private and publicly managed cloud environments, enabling customers to deliver reliably on service-level agreements (SLAs). In July 2009, Juniper and IBM broadened their strategic relationship by entering into an OEM agreement that enables IBM to provide Juniper's Ethernet networking products and support within IBM's data center portfolio. The addition of Juniper's products to IBM's data center networking portfolio provides customers with a best-in-class networking solution and accelerates both companies' shared vision of advancing the economics of networking and the data center by reducing costs, improving services and managing risk.
IBM j-type Data Center Products and Juniper Networks Products Cross Reference
The IBM j-type e-series Ethernet switches and m-series Ethernet routers use Juniper Networks technology. Table 1.1 shows the mapping of IBM switches and routers to their corresponding Juniper Networks models. For further product information, please visit the website IBM and Junos in the Data Center: A Partnership Made For Now at https://simplifymydatacenter.com/ibm.

Table 1.1    Mapping of IBM j-type Data Center Network Products to Juniper Networks Products

IBM Machine Type and Model    IBM Description                                Juniper Networks Product
4273-E48                      IBM j-type e-series Ethernet Switch J48E       EX4200
4274-E08                      IBM j-type e-series Ethernet Switch J08E       EX8208
4274-E16                      IBM j-type e-series Ethernet Switch J16E       EX8216
4274-M02                      IBM j-type m-series Ethernet Router J02M       MX240
4274-M06                      IBM j-type m-series Ethernet Router J06M       MX480
4274-M11                      IBM j-type m-series Ethernet Router J11M       MX960
4274-S34                      IBM j-type s-series Ethernet Appliance J34S    SRX3400
4274-S36                      IBM j-type s-series Ethernet Appliance J36S    SRX3600
4274-S56                      IBM j-type s-series Ethernet Appliance J56S    SRX5600
4274-S58                      IBM j-type s-series Ethernet Appliance J58S    SRX5800
Chapter 2
Design Considerations
Figure 2.1    Data center network overview: private WAN and Internet connectivity through M Series and SRX Series Internet access gateways; a core network of EX4200 switches or core aggregation routers; network services including IDP Series intrusion detection and prevention; and IP storage and infrastructure networks
servers associated to particular applications. For example, a security service such as traffic SYN checking/sequence number checking must apply to any server that is exposed to public networks. The network services tier requires:

- High performance devices, for example high performance firewalls, to process traffic associated with large numbers of endpoints such as networks, servers and applications.
- Virtualization capabilities, such as virtual instances, to secure many simultaneous logical services.
Design Considerations
The following key design considerations are critical attributes for designing today's data center network architecture:

- High availability and disaster recovery
- Security
- Simplicity
- Performance
- Innovation

NOTE    The design considerations discussed in this handbook are not necessarily specific to Juniper Networks solutions and can be applied universally to any data center network design, regardless of vendor selection.
Security
The critical resources in any enterprise location are typically the applications themselves, along with the servers and supporting systems such as storage and databases. Financial, human resources, and manufacturing applications and their supporting data typically represent a company's most critical assets and, if compromised, can create a potential disaster for even the most stable enterprise. The core network security layers must protect these business critical resources from unauthorized user access and attacks, including application-level attacks. The security design must employ layers of protection from the network edge through the core to the various endpoints; this is known as defense in depth. A layered security solution protects critical network resources that reside on the network: if one layer fails, the next layer stops the attack and/or limits the damage that can occur. This level of security allows IT departments to apply the appropriate level of resource protection to the various network entry points based upon their different security, performance and management requirements.
Layers of security that should be deployed at the data center include the following:

- DoS protection at the edge.
- Firewalls to tightly control who and what gets in and out of the network.
- VPNs to provide secure remote access.
- Intrusion Prevention System (IPS) solutions to prevent a more generic set of application layer attacks.

Further, application-layer firewalls and gateways also play a key role in protecting specific application traffic such as XML. For further details, refer to the National Institute of Standards and Technology (NIST) recommended best practices, as described in the Guide to General Server Security at http://csrc.nist.gov/publications/nistpubs/800-123/SP800-123.pdf.

Policy-based networking is a powerful concept that enables devices in the network to be managed efficiently, especially within virtualized configurations, and can provide granular levels of network access control. The policy and control capabilities should allow organizations to centralize policy management while offering distributed enforcement at the same time. The network policy and control solution should provide appropriate levels of access control, policy creation, and management, as well as network and service management, ensuring secure and reliable networks for all applications. In addition, the data center network infrastructure should integrate easily into a customer's existing management frameworks and third-party tools, such as Tivoli, and provide best-in-class centralized management, monitoring and reporting services for network services and the infrastructure.
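To make the firewall layer concrete, the following is a minimal sketch of a stateless Junos firewall filter of the kind that could sit at a data center entry point. The filter name, the management prefix and the port list are hypothetical placeholders, and a production design would pair stateless filters like this with the stateful firewall, IPS and DoS protections described above:

    firewall {
        family inet {
            filter protect-entry-point {
                term allow-management {
                    from {
                        source-address {
                            10.10.0.0/24;            /* hypothetical management subnet */
                        }
                        protocol tcp;
                        destination-port [ ssh https ];
                    }
                    then accept;
                }
                term deny-and-count {
                    then {
                        count entry-point-discards;  /* record what the filter drops */
                        discard;
                    }
                }
            }
        }
    }

Such a filter takes effect only once it is applied to an interface, for example with set interfaces ge-0/0/0 unit 0 family inet filter input protect-entry-point.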
Simplicity
Simplicity can be achieved by adopting new architectural designs, new technologies, and network operating systems. The two-tier network architecture is a new design that allows network administrators to simplify the data center infrastructure. Traditionally, data center networks were constructed using a three-tier design approach, resulting in access, aggregation and core layers. A large number of devices must be deployed, configured and managed within each of these tiers, increasing cost and complexity. This is primarily because of scalability requirements, performance limitations and key feature deficiencies in traditional switches and routers. Juniper Networks products support a data center network design that requires fewer devices, interconnections and network tiers. Moreover, the design also enables the following key benefits:

- Reduced latency due to fewer device hops
- Simplified device management
- Significant power, cooling and space savings
- Fewer system failure points
Figure 2.2 shows data center network design trends: from a traditional data center network, to a network consisting of a virtualized access tier and collapsed aggregation and core tiers, to a network with improved network virtualization on the WAN.
Figure 2.2    Data Center Network Design Trends. The figure contrasts three designs: a traditional three-tier network (WAN gateway, core and aggregation tiers, with multiple L2/L3 switches at aggregation, multiple managed L2 access switches, and multiple layers in the network); a simplified design in which up to 10 EX4200 Ethernet switches are managed as a single device with Virtual Chassis technology while a high-performance L2/L3 collapsed core/aggregation built on EX8208 and EX8216 Ethernet switches reduces the number of devices; and a collapsed aggregation and core layer with an MPLS-capable core and WAN interfaces available on MX240, MX480 and MX960 Ethernet routers.
Converged I/O technology is a new technology that simplifies the data center infrastructure by supporting flexible storage and data access on the same network interfaces on the server side, and by consolidating storage area networks (SANs) and LANs into a single logical infrastructure. This simplification and consolidation make it possible to dynamically allocate any resource, including routing, switching, security services, storage systems, appliances and servers, without compromising performance.

Keeping in mind that network devices are complex, designing an efficient hardware platform is not, by itself, sufficient to achieve an effective, cost-efficient and operationally tenable product. Software in the control plane plays a critical role in the development of features and in ensuring device usability. Because Junos is a proven modular network operating system that runs across different platforms, implementing Junos is one of the best approaches to simplifying the daily operations of the data center network. In a recent study titled The Total Economic Impact of Juniper Networks Junos Network Operating System, Forrester Consulting reported a 41 percent reduction in overall network operational costs, based on dollar savings across specific task categories including planned events, reduction in the frequency and duration of unplanned network events, the sum of planned and unplanned events, the time needed to resolve unplanned network events, and adding infrastructure.

As the foundation of any high performance network, Junos exhibits the following key attributes, as illustrated in Figure 2.3:

- One operating system, with a single source base and a single consistent feature implementation.
- One software release train, extended through a highly disciplined and firmly scheduled development process.
- One common modular software architecture that stretches across many different Junos hardware platforms, including the MX Series, EX Series, and SRX Series.
Figure 2.3    Junos: A 1-1-1 Advantage. One OS and one architecture spanning security, routing and switching, with frequent scheduled releases (9.6, 10.0, 10.1).
Performance
To address performance requirements related to server virtualization, centralization and data center consolidation, the data center network should boost the performance of all application traffic, whether local or remote. The data center should offer LAN-like user experience levels for all enterprise users irrespective of their physical location. To accomplish this, the data center network should optimize applications, servers, storage and network performance.
WAN optimization techniques, including data compression, TCP and application protocol acceleration, bandwidth allocation, and traffic prioritization, improve the performance of network traffic. In addition, these techniques can be applied to data replication, and to backup and restoration between data centers and remote sites, including disaster recovery sites. Within the data center, Application Front Ends (AFEs) and load balancing solutions boost the performance of both client-server and Web-based applications, as well as speeding Web page downloads. In addition, designers must consider offloading CPU-intensive functions, such as TCP connection processing and HTTP compression, from backend applications and Web servers. Beyond application acceleration, critical infrastructure components such as routers, switches, firewalls, remote access platforms and other security devices should be built on non-blocking modular architectures so that they have the performance characteristics necessary to handle the higher volumes of mixed traffic types associated with centralization and consolidation. Designers also should account for remote users.

Juniper Networks' innovative silicon chipset and virtualization technologies deliver a unique high performance data center solution. Junos Trio represents Juniper's fourth generation of purpose-built silicon and is the industry's first network instruction set, a new silicon architecture unlike traditional ASICs and network processing units (NPUs). The new architecture leverages customized network instructions that are designed into silicon to maximize performance and functionality, while working closely with Junos software to ensure programmability of network resources. The new Junos One family thus combines the performance benefits of ASICs with the flexibility of network processors to break the standard trade-offs between the two. Built in 65-nanometer technology, Junos Trio includes four chips with a total of 1.5 billion transistors and 320 simultaneous processes, yielding total router throughput of up to 2.6 terabits per second and up to 2.3 million subscribers per rack, far exceeding the performance and scale possible with off-the-shelf silicon. Junos Trio includes advanced forwarding, queuing, scheduling, synchronization and end-to-end resiliency features, helping customers provide service-level guarantees for voice, video and data delivery. Junos Trio also incorporates significant power efficiency features to enable more environmentally conscious data center and service provider networks. The Junos Trio chipset with revolutionary 3D Scaling technology enables networks to scale dynamically for more bandwidth, subscribers and services, all at the same time and without compromise, while delivering rich business, residential and mobile services at massive scale using half as much power per gigabit. The new chipset includes more than 30 patent-pending innovations in silicon architecture, packet processing, QoS and energy efficiency.

The Juniper Networks data center network architecture employs a mix of virtualization technologies, such as Virtual Chassis technology with VLANs, MPLS-based advanced traffic engineering, VPN-enhanced security, QoS, VPLS, and other virtualization services. These virtualization technologies address many of the challenges introduced by server, storage and application virtualization.
For example, Virtual Chassis supports low-latency server live migration from server to server in completely different racks within a data center, and from server to server between data centers in a flat Layer 2 network, when these data centers are within reasonably close proximity. Virtual Chassis with MPLS allows the Layer 2 domain to extend across data centers to support live migration from server to server when data centers are distributed over significant distances. Juniper Networks virtualization technologies support the low latency, throughput, QoS and high availability required by server and storage virtualization. MPLS-based virtualization addresses these requirements with advanced traffic engineering to provide bandwidth guarantees, label switching and intelligent path selection for optimized low latency, and fast reroute for extremely high availability across the WAN. MPLS-based VPNs enhance security, with QoS to efficiently meet application and user performance needs. These virtualization technologies serve to improve efficiency and performance with greater agility while simplifying operations. For example, acquisitions and new networks can be folded quickly into the existing MPLS-based infrastructure without reconfiguring the network to avoid IP address conflicts. This approach creates a highly flexible and efficient data center WAN.
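As a rough sketch of how such a Layer 2 extension might be configured, the following shows a BGP-signaled VPLS routing instance on an MX Series router. It assumes an existing MPLS core with BGP configured for l2vpn signaling, and the instance name, interface and route-distinguisher/target values are hypothetical:

    routing-instances {
        dc-interconnect {
            instance-type vpls;
            interface ge-1/0/0.0;              /* Ethernet interface facing the local data center */
            route-distinguisher 65000:100;
            vrf-target target:65000:100;
            protocols {
                vpls {
                    site dc-west {             /* hypothetical site name for this data center */
                        site-identifier 1;
                    }
                }
            }
        }
    }

A peer instance with a different site-identifier on the remote data center router completes the stretched Layer 2 domain.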
Innovation
Innovation, for example green initiatives, influences data center design. A green data center is a repository for the storage, management and dissemination of data in which the mechanical, lighting, electrical and computer systems provide maximum energy efficiency with minimum environmental impact. As older data center facilities are upgraded and newer data centers are built, it is important to ensure that the data center network infrastructure is highly energy and space efficient. Network designers should consider power, space and cooling requirements for all network components, and they should compare different architectures and systems so that they can ascertain the environmental and cost impacts across the entire data center. In some environments, it might be more efficient to implement high-end, highly scalable systems that can replace a large number of smaller components, thereby promoting energy and space efficiency. Green initiatives that track resource usage, carbon emissions and the efficient utilization of resources such as power and cooling are important factors when designing a data center. Among the many energy-efficient Juniper devices, the MX960 is presented in Table 2.1 to demonstrate its effects on reductions in energy consumption and footprint within the data center.

Table 2.1    Juniper Networks MX960 Power Efficiency Analysis

Characteristics                    Juniper Networks Core MX960 (2x Chassis)
Line-rate 10 GigE (ports)          96
Throughput per chassis (Mpps)      720
Output current (Amps)              187.84
Output power (Watts)               9020.00
Heat dissipation (BTU/Hr)          36074.33
Chassis required (rack space)      2 chassis
Rack space (racks)                 2/3rds of a single rack
Figure 2.4    Two-tier design: MX480 routers at the core tier and EX8200 switches at the access tier, connecting to IBM PowerVM servers, each with dual VIOS, LPARs and virtual switches
Figure 2.5    End-to-end network virtualization: redundant MX960 routers with VRF #1 and VRF #2 mapped to virtualized firewall, NAT and IPS service instances, above an EX4200 Virtual Chassis access layer serving HR, Finance and Guest departments
Two MX960 routers are shown to indicate high availability between these devices, providing end-to-end network virtualization for applications by mapping Virtual Routing and Forwarding (VRF) instances in the MX Series to security zones in the SRX. In Figure 2.5, for example, VRF #1 is mapped to security zones Firewall #1, NAT #1, and IPS #1, and VRF #2 is mapped to Firewall #2 and NAT #2. For details concerning network virtualization on the MX Series, refer to the Juniper Networks white paper Extending the Virtualization Advantage with Network Virtualization: Virtualization Techniques in Juniper Networks MX Series 3D Universal Edge Routers at www.juniper.net/us/en/local/pdf/whitepapers/2000342-en.pdf.
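A minimal sketch of one such VRF on the MX Series follows; the instance name, interface and route-distinguisher/target values are hypothetical, and the actual mapping to firewall, NAT and IPS services is completed by the corresponding SRX zone and policy configuration:

    routing-instances {
        vrf-1 {
            instance-type vrf;
            interface ge-2/0/0.100;            /* hypothetical VLAN-tagged subinterface toward the access tier */
            route-distinguisher 65000:1;
            vrf-target target:65000:1;         /* import/export target shared by this virtual network */
        }
    }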
Access Tier
We typically deploy the EX4200 Ethernet Switch with Virtual Chassis technology as a top-of-rack virtual chassis in the access tier. The EX4200, together with server virtualization technology, supports high availability and high maintainability, two key requirements for mission critical, online applications.
Figure 2.6    Deploying PowerVM Using Dual VIOS and Dual Top-of-Rack Virtual Chassis

As illustrated in Figure 2.6:

- The Power 570 servers are deployed with dual Virtual I/O Servers (VIOS): the primary VIOS runs in active mode while the secondary VIOS runs in standby mode.
- The primary VIOS connects to one top-of-rack virtual chassis while the secondary one connects to another top-of-rack virtual chassis.
The typical bandwidth between the PowerVM's VIOS and the top-of-rack virtual chassis switch is 4 Gbps, realized as 4 x 1 Gbps NIC ports combined in a LAG; the bandwidth can scale up to 8 Gbps by aggregating eight ports in a LAG interface. The two Hardware Management Consoles (HMCs) connect to two different top-of-rack virtual chassis, for example HMC 1 and HMC 2. Besides preventing a single point of failure (SPOF), this approach also provides a highly available maintenance architecture for the network: when a VIOS or virtual chassis instance requires maintenance, operators can upgrade the standby VIOS or virtual chassis while the environment runs business as usual, then switch the environment to the upgraded version without disrupting application service. For connecting a larger number of servers, it is straightforward to duplicate the top-of-rack virtual chassis deployment at the access layer. Figure 2.7 shows a top-of-rack virtual chassis with seven EX4200s connected to a group of 56 Power 570 systems. To connect an additional 56 Power 570 systems, an additional top-of-rack virtual chassis is deployed at the access layer. As a result, the access layer can connect a large number of Power 570 systems. After addressing all the connectivity issues, we must not lose sight of the importance of performance in the other network layers, and of network security, because we are operating the data center network as one secured network.
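As an illustrative sketch of such a 4 x 1 Gbps bundle on the EX4200 virtual chassis (the member interfaces and VLAN name are hypothetical, and the VIOS side must be configured for matching link aggregation):

    chassis {
        aggregated-devices {
            ethernet {
                device-count 2;                /* number of ae interfaces available on the switch */
            }
        }
    }
    interfaces {
        ge-0/0/0 {
            ether-options {
                802.3ad ae0;                   /* bind this member link to LAG ae0 */
            }
        }
        /* ge-0/0/1 through ge-0/0/3 are bound to ae0 in the same way */
        ae0 {
            aggregated-ether-options {
                lacp {
                    active;                    /* negotiate the bundle with LACP */
                }
            }
            unit 0 {
                family ethernet-switching {
                    vlan {
                        members server-vlan;   /* hypothetical VLAN carrying server traffic */
                    }
                }
            }
        }
    }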
Figure 2.7    Top-of-Rack Virtual Chassis with Seven EX4200s Connected to Power 570 Systems (EX8200 core layer, EX4200 access layer, server layer)
The EX4200 top-of-rack virtual chassis supports different types of physical connections. The EX4200 provides 48 1000BASE-T ports and two ports for 10 Gbps XFP transceivers through its XFP uplink module. The XFP port can uplink to other network devices or it can connect to the IBM Power Systems, based on user requirements. Table 2.2 lists three typical 10 Gbps connections used in a Power System and the XFP uplink module required for each EX4200 connection.

MORE    For further details concerning IBM PowerVM and EX4200 top-of-rack virtual chassis scalability, refer to Implementing IBM PowerVM Virtual Machines on Juniper Networks Data Center Networks at www.juniper.net/us/en/local/pdf/implementation-guides/8010049-en.pdf.

Table 2.2    Physical Connectivity Between IBM Power 570 and EX4200 XFP Uplink Module

EX4200 Uplink                                                  Cable
XFP Uplink Module with XFP LR 10 Gbps Optical Transceiver      SMF
XFP Uplink Module with XFP LR 10 Gbps Optical Transceiver      SMF
XFP Uplink Module with XFP SR 10 Gbps Optical Transceiver      MMF
Chapter 3
Implementation Overview
THIS CHAPTER SERVES AS a reference to the later chapters in this handbook by presenting an overview of the next generation intra-data center network implementation scenarios. The implementation scenarios summarized in this chapter address the requirements previously discussed in Chapter 2. The network topology of this reference data center is covered specifically as a part of this chapter.

Chapters 4 through 8 focus on the technical aspects of the implementation, primarily including server connectivity, STP, multicast, performance, and high availability.
Figure 3.1    Reference data center topology: an MX480/EX8200 core/aggregation tier running VRRP, RTG, STP and PIM, connected over 10GbE links to an EX4200 access tier of top-of-rack switches (Virtual Chassis 1 and 2 plus standalone EX4200s), with LAG toward ToR1 and ToR2, STP toward ToR3, and 1GbE links to the servers. (VC: EX4200 Virtual Chassis; RTG: Redundant Trunk Group; LAG: Link Aggregation Group; VRRP: Virtual Router Redundancy Protocol; STP: Spanning Tree Protocol)
NOTE    Each individual implementation can differ based on network design and requirements.

The topology described here consists of the following tiers and servers:

- Core/aggregation tier consisting of EX8200s or MX480s.
- Access tier comprised of EX4200s. These access switches can be deployed either individually or configured to form a virtual chassis; either option can be implemented as top-of-rack switches to meet different Ethernet port density requirements. Pertaining to the topology under discussion:
  - Three EX4200 switches form a virtual chassis (VC1), functioning as top-of-rack switch 1 (ToR1).
  - Two EX4200 switches form a virtual chassis (VC2), functioning as top-of-rack switch 2 (ToR2).
  - EX4200-1, EX4200-2, and EX4200-3 are three individual access switches, functioning as top-of-rack switches (ToR3).
- Servers, where the IBM BladeCenter, IBM x3500 and IBM PowerVM reside for all scenarios presented. For ease of configuration, one server type is used for each scenario. Servers are segmented into different VLANs, for example VLAN A, B, and C, as shown in Figure 3.1 (a VLAN configuration sketch follows this list).

The physical network topology consists of the following connections:

- The servers connect to the access tier through multiple 1GbE links with Link Aggregation (LAG) to prevent a single point of failure (SPOF) in the physical link and to improve bandwidth.
- The access switches connect to the core layer with multiple 10GbE links.
- At the core tier, the MX480s and EX8200s interconnect to each other using redundant 10GbE links. These devices connect to the WAN edge tier, which interconnects the different data centers and connects to external networks.

NOTE    Choosing different connection configurations is based on network design and requirements. Redundant physical links are extremely important for achieving network high availability.
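As a hedged illustration of that VLAN segmentation, the following EX Series snippet defines the three VLANs and assigns a server-facing access port to one of them; the VLAN IDs and interface name are hypothetical:

    vlans {
        vlan-a {
            vlan-id 100;                       /* hypothetical IDs for VLANs A, B and C */
        }
        vlan-b {
            vlan-id 200;
        }
        vlan-c {
            vlan-id 300;
        }
    }
    interfaces {
        ge-0/0/10 {
            unit 0 {
                family ethernet-switching {
                    port-mode access;
                    vlan {
                        members vlan-a;        /* server port placed in VLAN A */
                    }
                }
            }
        }
    }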
Multicast
Multicast optimizes the delivery of video streaming and improves overall network infrastructure efficiency. In Chapter 6, we present multicast implementation scenarios, including Protocol Independent Multicast (PIM) and IGMP snooping. In these scenarios, the video streaming client runs on IBM servers; PIM is implemented on the core/aggregation tier, while IGMP snooping is implemented on the access tier.
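As a minimal sketch of that split (the RP address is hypothetical, and interface lists would normally be narrower than all), PIM sparse mode on the core/aggregation devices might look like:

    protocols {
        pim {
            rp {
                local {
                    address 10.255.0.1;        /* hypothetical rendezvous point address on this router */
                }
            }
            interface all {
                mode sparse;
            }
        }
    }

while the EX4200 access switches constrain multicast flooding with IGMP snooping:

    protocols {
        igmp-snooping {
            vlan all;                          /* snoop IGMP reports on all VLANs */
        }
    }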
Performance
In Chapter 7, two methods for improving data center network performance are covered in detail:

- Using CoS to manage traffic (a configuration sketch follows this list).
- Considering latency characteristics when designing networks using Juniper Networks data center network products.
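As a hedged sketch of the first method on an MX Series router (the forwarding-class names, queue numbers, percentages and interface are hypothetical), CoS might reserve bandwidth for video over best-effort traffic as follows:

    class-of-service {
        forwarding-classes {
            queue 0 best-effort;
            queue 5 video;                     /* dedicated queue for streaming video */
        }
        interfaces {
            ge-0/0/1 {
                scheduler-map dc-schedulers;
            }
        }
        scheduler-maps {
            dc-schedulers {
                forwarding-class video scheduler video-sched;
                forwarding-class best-effort scheduler be-sched;
            }
        }
        schedulers {
            video-sched {
                transmit-rate percent 40;      /* guaranteed share for video */
                priority high;
            }
            be-sched {
                transmit-rate percent 60;
            }
        }
    }

A complete design would also attach classifiers so that incoming traffic is sorted into these forwarding classes.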
Latency
The evolution of Web services and SOA has been critical to the integration of applications that use standard protocols such as HTML. This tight integration of applications with web services has generated an increase of roughly 30 to 75 percent in east-west traffic (server-to-server traffic) within the data center. As a result, latency between servers must be reduced. Reduced latency can be achieved by:

- Consolidating the number of devices, and thus the tiers, within the data center.
- Extending the consolidation between tiers using techniques such as virtual chassis. With virtual chassis, multiple access layer switches can be grouped logically to form one single switch. This reduces the latency to a few microseconds because traffic from the server does not need to be forwarded through multiple devices to the aggregation layer.

In the latency implementation scenario, we primarily focus on how to configure the MX480 for measuring Layer 2 and Layer 3 latency.
High Availability
High availability provides continuous service availability through redundancy, stateful recovery from failures, and proactive fault prediction, minimizing failure recovery time. Junos OS provides several high availability features to improve the user experience and to reduce network downtime and maintenance. For example, features such as virtual chassis (supported on the EX4200), Nonstop Routing/Bridging (NSR/NSB, both supported on the MX Series), GRES, GR and Routing Engine redundancy can help increase availability at the device level. The Virtual Router Redundancy Protocol (VRRP), Redundant Trunk Group (RTG) and LAG features control the flow of traffic over chosen devices and links. The ISSU feature on the MX Series reduces network downtime for a Junos OS software upgrade. For further details concerning the variety of high availability features, see Chapter 8: Configuring High Availability.

Each high availability feature can address certain technical challenges but may not address all the challenges that today's customers experience. To meet network design requirements, customers can implement one or many high availability features. In the following section, we discuss high availability features by comparing their characteristics and limitations within the following groups:

- GRES and GR versus NSR/NSB
- Routing Engine switchover
- Virtual Chassis
- VRRP
Comparing GRES and GR to NSR/NSB

Table 3.1 provides an overview of the GRES, GR and NSR/NSB high availability features available in Junos.

Table 3.1    High Availability Features in Junos OS

HA Feature: GRES
Functions: Provides uninterrupted traffic forwarding. Maintains kernel state between the Routing Engines and the Packet Forwarding Engine.
Implementation Considerations: Incapable of providing router redundancy by itself; works with GR protocol extensions.

HA Feature: GR (protocol extensions)
Functions: Allows a failure of a neighboring router not to disrupt adjacencies or traffic forwarding for a certain time interval. Enables adjoining peers to recognize an RE switchover as a transitional event, which prevents them from starting the process of reconverging network paths.
Implementation Considerations: Neighbors are required to support graceful restart. Network churn and processing are not proportional to the effective change.

HA Feature: NSR/NSB
Functions: RE switchover is transparent to network peers; no peer participation is required. No drop in adjacencies or sessions, and minimal impact on convergence. Allows switchover to occur at any point, even when routing convergence is in progress.
Implementation Considerations: Unsupported protocols must be refreshed using the normal recovery mechanisms inherent in each protocol.
Nonstop active routing/bridging and graceful restart are two different mechanisms for maintaining high availability when a router restarts. A router undergoing a graceful restart relies on its neighbors to restore its routing protocol information, and requires a restart process in which the neighbors exit a wait interval and start providing routing information to the restarting router. NSR/NSB does not require such a restart: both the primary and backup Routing Engines exchange updates with neighbors, so routing information exchange continues seamlessly when the primary Routing Engine fails and the backup takes over.

NOTE    NSR cannot be enabled when the router is configured for graceful restart.
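As a minimal sketch on a dual-Routing Engine MX Series router, this combination is typically enabled with the statements below (NSR additionally requires commit synchronize so that both REs hold the same configuration). Per the NOTE above, to use graceful restart instead you would configure set routing-options graceful-restart and omit nonstop-routing:

    chassis {
        redundancy {
            graceful-switchover;               /* GRES: preserve kernel and PFE state across switchover */
        }
    }
    routing-options {
        nonstop-routing;                       /* NSR: maintain routing protocol state on the backup RE */
    }
    protocols {
        layer2-control {
            nonstop-bridging;                  /* NSB: preserve Layer 2 protocol state */
        }
    }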
Table 3.2    Routing Engine Switchover Implementation Options

Implementation Option: Dual Routing Engines only (no high availability features enabled)
Process Flow: All physical interfaces are taken offline. Packet Forwarding Engines restart. The backup Routing Engine restarts the routing protocol process (rpd). The new primary Routing Engine discovers all hardware and interfaces. The switchover takes several minutes, and all of the router's adjacencies are aware of the physical (interface alarms) and routing (topology) change.

Implementation Option: GRES
Process Flow: Interface and kernel information is preserved during switchover. The switchover is faster because the Packet Forwarding Engines are not restarted.

Implementation Option: GRES with NSR
Process Flow: Traffic is not interrupted during the switchover. Interface, kernel and routing protocol information is preserved.

Implementation Option: GRES with GR
Process Flow: Traffic is not interrupted during the switchover. Interface and kernel information is preserved. Graceful restart protocol extensions quickly collect and restore routing information from the neighboring routers.
Virtual Chassis
Between 2 and 10 EX4200 switches can be connected and configured to form a single virtual chassis that acts as a single logical device to the rest of the network. A virtual chassis typically is deployed in the access tier. It provides high availability to the connections between the servers and access switches. The servers can be connected to different member switches of the virtual chassis to prevent SPOF.
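A hedged sketch of a preprovisioned three-member EX4200 virtual chassis follows; the serial numbers are hypothetical placeholders for the members' actual serial numbers:

    virtual-chassis {
        preprovisioned;
        member 0 {
            role routing-engine;               /* eligible to become the primary RE */
            serial-number ABC0123456789;       /* hypothetical serial numbers */
        }
        member 1 {
            role routing-engine;               /* eligible backup RE */
            serial-number ABC0123456790;
        }
        member 2 {
            role line-card;                    /* forwarding-only member */
            serial-number ABC0123456791;
        }
    }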
Implementation Scenarios
Table 3.3 summarizes the implementation scenarios presented in this handbook. It provides mapping between each scenario, network tier, and devices. Using this table as a reference, you can map the corresponding chapter to each particular implementation scenario.
Table 3.3 Implementation Scenarios

Tier              Implementation Scenario          Chapter    Device Support
Access            Spanning Tree (MSTP/RSTP/VSTP)   Chapter 5  EX4200, EX8200, MX Series
Access            PIM                              Chapter 6  EX4200, EX8200, MX Series
Access            IGMP snooping                    Chapter 6  EX4200, EX8200, MX Series
Access            CoS                              Chapter 7  EX4200, EX8200, MX Series
Access            Virtual Chassis                  Chapter 8  EX4200
Access            VRRP                             Chapter 8  EX4200, EX8200, MX Series
Aggregation/Core  ISSU                             Chapter 8  MX Series only
Aggregation/Core  RTG                              Chapter 8  EX Series only
Aggregation/Core  Non-Stop Routing                 Chapter 8  MX Series only
Aggregation/Core  GR                               Chapter 8  EX4200, EX8200, MX Series
Aggregation/Core  LAG                              Chapter 8  EX4200, EX8200, MX Series
Table 3.4 functions as a reference aid to help customers understand how the Juniper Networks products and features available in Junos 9.6 can be implemented in their networks. This table summarizes the implementation scenarios and their supported products that are defined in detail later in this guide.

Table 3.4 Mapping of Implementation Scenarios to Juniper Networks Supported Products

                EX4200   EX8200   MX480
Performance
  CoS           Yes      Yes      Yes
Multicast
  PIM           Yes      Yes      Yes
  IGMP          Yes      Yes      Yes
Chapter 4
Connecting IBM Servers in the Data Center Network
IBM System p and PowerVM Production Networks . . . . . 46
IBM System p and PowerVM Management Networks . . . . . 47
Configuring IBM System p Servers and PowerVM . . . . . 48
IBM PowerVM Network Deployment . . . . . 56
Junos Operating System Overview . . . . . 58
Configuring Network Devices . . . . . 61
Figure 4.1 IBM Power Systems Virtualization Overview

The VIOS, also called the Hosting Partition, is a special-purpose LPAR in the server that provides virtual I/O resources to client partitions. The VIOS owns resources such as physical network interfaces and storage connections. The network and storage resources reachable through the VIOS can be shared by the client partitions running on the machine, enabling administrators to minimize the number of physical servers deployed in their network. In PowerVM, client partitions can communicate with each other on the same server without requiring access to physical Ethernet adapters. Physical Ethernet adapters are required to allow communication between applications running in the client partitions and external networks. A Shared Ethernet Adapter (SEA) in the VIOS bridges the physical Ethernet adapters in the server to the virtual Ethernet adapters functioning within the server.
Because the SEA functions at Layer 2, the original MAC address and VLAN tags of the frames associated with the client partitions (virtual machines) are visible to other systems in the network. For further details, refer to IBM's white paper Virtual Networking on AIX 5L at www.ibm.com/servers/aix/whitepapers/aix_vn.pdf. In PowerVM, the physical Network Interface Card (NIC) typically is allocated to the VIOS for improved utilization, whereas in the IBM System p, the physical NIC is exclusively allocated to an LPAR.
Figure 4.2 IBM Power Systems Management Networks
As illustrated in Figure 4.2, IBM Power Systems management requires two networks: an out-of-band management network and an HMC private management network.
The out-of-band management network connects the HMC and client networks so that a client's request for access can be routed to the HMC. The HMC private management network is dedicated to communication between the HMC and its managed servers. The network uses a selected range of non-routable IP addresses, and a Dynamic Host Configuration Protocol (DHCP) server in the HMC handles IP allocation. Each p server connects to the private management network through its Flexible Service Processor (FSP) ports. Through the HMC private management network, the HMC manages servers in the following steps:
1. Connect the p server's FSP port to the HMC private management network so that the HMC and the server are in the same broadcast domain; the HMC runs the DHCP server (dhcpd).
2. Power on the server. The server's FSP runs the DHCP client and requests an IP address, which is allocated by the HMC.
3. The HMC communicates with the server and updates its managed server list with the new server.
4. The HMC performs operations on the server, for example powering the server on and off, creating LPARs, setting shared adapters (Host Ethernet and Host Channel), and configuring virtual resources.
Figure 4.3 Server FSP and NIC Connectivity to the HMC
To allocate (or remove) the NIC on the LPAR, perform the following steps:
1. Select LPAR.
2. Select Configuration >> Manage Profiles.
3. Select the profile that you want to change.
4. Select the I/O tab.
5. Select the NIC (physical I/O resource).
6. Click Add to add the NIC (or Remove to remove the NIC).
7. Select OK to save changes, then click Close.
NOTE
The NIC can be allocated to multiple profiles. Because the NIC allocation is exclusive during profile runtime, only one profile can activate and use the NIC. If the NIC is already in use by an active LPAR and you attempt to activate another LPAR that requires the same NIC adapter, the activation process is aborted. Adding or removing the NIC requires reactivating the LPAR profile.
Figure 4.4
This section provides steps for the following:
Creating a virtual Ethernet switch
Removing a virtual Ethernet switch
Creating a virtual Ethernet adapter
Removing a virtual Ethernet adapter
Changing virtual Ethernet adapter properties
To Create a Virtual Ethernet Switch 1. Select server (Systems Management >> Servers >> select server).
2. Select Configuration >> Virtual Resources >> Virtual Network Management. 3. Select Action >> Create VSwitch. 4. Enter a name for the VSwitch then select OK. 5. Click Close to close dialog. To Remove a Virtual Ethernet Switch 1. Select server (Systems Management >> Servers >> select server).
2. Select Configuration >> Virtual Resources >> Virtual Network Management. 3. Select Action >> Remove VSwitch. 4. Click Close to close dialog. To Create a Virtual Ethernet Adapter 1. Select server (Systems Management >> Servers >> select server).
2. Select LPAR. 3. Select Configuration >> Manage Profiles. 4. Select the profile that you want to change. 5. Select Virtual Adapters tab. 6. Select Actions >> Create >> Ethernet Adapter (see Figure 4.5).
Figure 4.5
7. In the Virtual Ethernet Adapter Properties window (as shown in Figure 4.5), enter the following:
a. Adapter ID (a default value displays).
b. VSwitch: the virtual Ethernet switch that this adapter connects to.
c. VLAN ID: the VLAN ID for untagged frames; the VSwitch adds and removes the VLAN header.
d. Select the checkbox "This adapter is required for partition activation."
e. Select the checkbox "IEEE 802.1q compatible adapter" to control whether VLAN-tagged frames are allowed on this adapter.
f. Use Add, Remove, New VLAN ID, and Additional VLANs to add or remove the VLAN IDs that are allowed for VLAN-tagged frames.
g. Select the checkbox "Access external network" only on LPARs used for bridging traffic from the virtual Ethernet switch to some other NIC. Typically this should be kept unchecked for regular LPARs and checked for the VIOS.
h. Click OK to save the changes made in the profile and then select Close.

To Remove a Virtual Ethernet Adapter
1. Select server (Systems Management >> Servers >> select server).
2. Select LPAR. 3. Select Configuration >> Manage Profiles. 4. Select the profile that you want to change. 5. Select Virtual Adapters tab. 6. Select the Ethernet Adapter that you want to remove. 7. Select: Actions >> Delete.
8. Click OK to save the changes made in the profile and then select Close.

To Change a Virtual Ethernet Adapter's Properties
1. Select server (Systems Management >> Servers >> select server).
2. Select LPAR.
3. Select Configuration >> Manage Profiles.
4. Select the profile that you want to change.
5. Select the Virtual Adapters tab.
6. Select the Ethernet Adapter that you want to change.
7. Select Actions >> Edit.
8. Enter the required information in the fields, as illustrated in Figure 4.5.
9. Click OK to save the changes made in the profile and then select Close.
Table 4.1 lists and defines the parameters associated with this command.

Table 4.1 Parameters

target_device: the physical adapter to be bridged, for example ent1 in Figure 4.6.
virtual_ethernet_adapters: the virtual Ethernet adapter (or adapters) to bridge, for example ent2.
DefaultVirtualEthernetAdapter: the virtual adapter that handles untagged frames.
SEADefaultPVID: the port VLAN ID (PVID) used for untagged frames on the SEA.
Figure 4.6 Shared Ethernet Adapter in the VIOS

ent1: Ethernet interface to the NIC assigned to the VIOS LPAR
ent2: Ethernet interface to the virtual switch
ent3: Shared Ethernet Adapter (logical device)
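The parameter names in Table 4.1 correspond to the VIOS mkvdev command that creates the SEA. As a hedged example (adapter names taken from Figure 4.6; the exact invocation from the original text does not survive in this extract):

mkvdev -sea ent1 -vadapter ent2 -default ent2 -defaultid 1

Here ent1 is the target (physical) device, ent2 serves as both the bridged virtual adapter and the default adapter for untagged frames, and 1 is the SEA default PVID; on success the VIOS reports the new logical device (ent3 in Figure 4.6) as available.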
Figure 4.7 Host Ethernet Adapter (HEA)
Because the HEA creates a virtual network for the client partitions and bridges that virtual network to the physical network, it replaces the need for both the virtual Ethernet and the Shared Ethernet Adapter. In addition, the HEA enhances performance and improves utilization because it eliminates the need to move packets (using virtual Ethernet) between partitions and then through a SEA to the physical Ethernet interface. For detailed information, refer to IBM's Redpaper Integrated Virtual Ethernet Adapter Technical Overview and Introduction at www.redbooks.ibm.com/abstracts/redp4340.html.

The HEA is configured through the HMC. The following list includes some HEA configuration rules:
An LPAR uses only one logical port to connect to the HEA.
The HEA consists of one or two groups of logical ports. Each group has 16 logical ports (16 or 32 total for the HEA).
Each group of logical ports can have one or two external ports assigned to it (predefined).
A logical port group consists of one or two Ethernet switch partitions, one for each external port.
An LPAR can have only one logical port connected to an Ethernet switch partition, which means that only one logical port can connect to the external port.
MCS increases bandwidth between the LPAR and the NIC. MCS reduces the number of logical ports; for MCS=2 the number of logical ports is 16/2=8. For MCS to take effect, a server restart is required.
Only one logical port in a port group can be set to promiscuous mode.

In this section, we discuss the following HEA configurations:
Configuring a HEA physical port
Adding a LHEA logical port
Removing a LHEA logical port
To Configure a HEA Physical Port
1. Select server (Systems Management >> Servers >> select server).
2. Select Hardware Information >> Adapters >> Host Ethernet.
3. Select the adapter (port).
4. Click the Configure button.
5. Enter parameters for the following fields: Speed, Duplex, Maximum receiving packet size (jumbo frames), Pending Port Group Multi-Core Scaling value, Flow control, and Promiscuous LPAR.
6. Click OK to save your changes.
Figure 4.8
To Add a LHEA Logical Port
1. Select server (Systems Management >> Servers >> select server).
2. Select LPAR.
3. Select Configuration >> Manage Profiles.
4. Select the profile that you want to change.
5. Select the Logical Host Ethernet Adapters (LHEA) tab.
6. Select the external port that the LHEA connects to.
7. Click Configure.
8. In the Logical port field, select one port (1-16); if MCS is greater than 1, some logical ports are identified as Not Available.
9. Select the checkbox Allow all VLAN IDs, or enter the actual VLAN ID in the "VLAN to add" field, as shown in Figure 4.9.
10. Click OK.
Figure 4.9
To Remove a LHEA Logical Port
1. Select server (Systems Management >> Servers >> select server).
2. Select LPAR.
3. Select Configuration >> Manage Profiles.
4. Select the profile that you want to change.
5. Select the Logical Host Ethernet Adapters (LHEA) tab.
6. Select the external port that the LHEA connects to.
7. Click the Reset button.
8. Click OK to close the window.
9. Click OK to save changes and close the window.
Figure 4.10 HMC Private Management Network (192.168.128.0/17) Connecting p5 and p6 Servers
The HMC runs on a Linux server with two network interfaces: one connects to a private network for all managed p5/p6 systems (the on-board Ethernet adapter on the servers, controlled by the FSP process); the other connects to a management network. In the management network, the management workstation accesses the HMC Web interface through a Web browser. There are two ways to set up communication with the LPARs (logical partitions):
Using a console window through the HMC.
Using Telnet/SSH over the management network.

Each LPAR has one dedicated Ethernet interface for connecting to the management network, using the first physical port on the HEA (IVE), which is shared among LPARs. Each LPAR must connect to the virtual Ethernet switch using a virtual Ethernet adapter. You create the virtual Ethernet switch and the virtual Ethernet adapters using the HMC. Virtual Ethernet adapters for VIOS LPARs must have the Access External Network option enabled. The VIOS LPAR, which runs a special version of AIX, performs the bridging between the virtual Ethernet switch (implemented in the Hypervisor) and the external port. For bridging frames between the physical adapter on the NIC and the virtual Ethernet adapter connected to the virtual Ethernet switch, another logical device (the SEA) is created in the VIOS.

As illustrated in Figure 4.11, the typical network deployment with the access switch and an LPAR (virtual machine) is as follows:
The access switch connects to the physical NIC, which is assigned to ent1 in the VIOS.
The ent3 (SEA) bridges ent1 (the physical NIC) and ent2 (the virtual Ethernet adapter).
The ent2 (virtual Ethernet adapter) is created and dedicated to the LPAR that runs Red Hat Enterprise Linux.
The ent3 also supports multiple VLANs. Each VLAN is associated with one logical Ethernet adapter, for example ent4.
Figure 4.11 Typical Network Deployment with the Access Switch and LPAR

ent1: Ethernet interface to the NIC assigned to the VIOS LPAR
ent2: Ethernet interface to the virtual switch (switch in the Hypervisor)
ent3: Shared Ethernet Adapter (logical device)
ent4: Logical Ethernet adapter for one VLAN
Figure 4.12 Junos OS Architecture
Routing Engines
The Routing Engine runs the Junos operating system, which includes the FreeBSD kernel and the software processes. The primary operator processes include the device control process (dcd), routing protocol process (rpd), chassis process (chassisd), management process (mgd), traffic sampling process (sampled), automatic protection switching process (apsd), Simple Network Management Protocol process (snmpd), and system logging process (syslogd). The Routing Engine installs directly into the chassis and interacts with the Packet Forwarding Engine.
Junos Processes
Junos processes run on the Routing Engine and maintain the routing tables, manage the routing protocols used on the router, control the router interfaces, control some chassis components, and act as the interface for system management and user access to the router. The major processes are discussed in detail later in this section. A Junos process is a UNIX process that runs nonstop in the background while the machine is running. All of the processes are operated through the command-line interface (CLI). Each process is a piece of the software with a specific function or area to manage, and the processes run in separate, protected address spaces. The following sections briefly cover two major Junos processes: the routing protocol process (rpd) and the management process (mgd).
Management Process
Several databases connect to the management process (mgd). The configuration schema database, /var/db/schema.db, is built at initialization time by merging the packages /usr/lib/dd/libjkernel-dd.so, /usr/lib/dd/libjroute-dd.so, and /usr/lib/dd/libjdocs-dd; it controls what the user interface (UI) is. The configuration database is /var/db/juniper.db. The mgd works closely with the CLI, allowing the CLI to communicate with all the other processes, and it knows which process is required to execute a given command (user input). When the user enters a command, the CLI communicates with mgd over a UNIX domain socket using Junoscript, an XML-based remote procedure call (RPC) protocol. The mgd is connected to all the processes, and each process has a UNIX domain management socket.
If the command is legal, the socket opens and mgd sends the command to the appropriate process. For example, the chassis process (chassisd) implements the actions for the command show chassis hardware. The process sends its response to mgd in XML form and mgd relays the response back to the CLI. Mgd plays an important part in the commit check phase. When you edit a configuration on the router, you must commit the change for it to take effect. Before the change actually is made, mgd subjects the candidate configuration to a check phase. The management process writes the new configuration into the config db (juniper.db).
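As a hedged illustration of this exchange (the RPC shown is the documented XML equivalent of show chassis hardware; the reply is abbreviated):

<rpc>
    <get-chassis-inventory/>
</rpc>
<rpc-reply>
    <chassis-inventory>
        <chassis>
            <name>Chassis</name>
            <description>MX480</description>
        </chassis>
    </chassis-inventory>
</rpc-reply>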
Table 4.2
Methods for Connecting IBM Servers to Juniper Switches and Routers

Connection type: The network device acts as a Layer 2 switch.
Description: To the IBM servers, the network device appears as a Layer 2 switch. The network device interfaces and the IBM servers' NICs are in the same Layer 2 broadcast domain. Because the network device interfaces are not configured with Layer 3 IP addresses, they do not provide routing functionality.

Connection type: The network device acts as a Layer 2 switch with routed Layer 3 interfaces.
Description: To the IBM servers, the network device appears as a Layer 2 switch. The network device interfaces and the IBM servers' NICs are in the same Layer 2 broadcast domain. The network device interfaces are configured with Layer 3 IP addresses so that they can route traffic to other connected networks.

Connection type: The network device acts as a Layer 3 router.
Description: To the IBM servers, the network device appears as a Layer 3 router with a single Ethernet interface and IP address. The network device does not provide Layer 2 switching functionality.
The next section presents several typical methods for configuring the MX Series routers and EX Series switches.
Configuring Layer 2 Switching

As illustrated in the following configuration, two Ethernet ports are in the same broadcast domain: the ge-5/1/5 interface is configured with an untagged VLAN, while the ge-5/1/7 interface is configured with a tagged VLAN. Ethernet interfaces on MX Series routers can support one or many VLANs, and each Ethernet VLAN is mapped to one logical interface. If logical interfaces are used to separate traffic into different VLANs, we recommend using the same number for the logical interface (unit) and the VLAN ID. For instance, the logical interface and the VLAN ID in the following sample use the same number (100):
interfaces ge-5/1/5 {
    unit 0 {
        family bridge;
    }
}
interfaces ge-5/1/7 {
    vlan-tagging;
    encapsulation flexible-ethernet-services;
    unit 100 {
        encapsulation vlan-bridge;
        family bridge;
    }
}
bridge-domains {
    Data01 {
        // The remainder of this bridge domain does not survive in this
        // extract; a minimal completion would resemble the following.
        domain-type bridge;
        vlan-id 100;
    }
}
NOTE Interface MTU configuration on Juniper network devices:

MX Series Routers
Physical interface: set interfaces <interface-name> mtu <mtu>
IRB interface: set interfaces irb mtu <mtu>

EX Series Ethernet Switches
Physical interface: set interfaces <interface-name> mtu <mtu>
VLAN interface: set interfaces vlan mtu <mtu>
VLAN unit: set interfaces vlan unit 100 family inet mtu <mtu>
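As a hedged example of the commands in the preceding table (not from the original text; a 9216-byte jumbo MTU is assumed):

// MX Series: physical interface and IRB interface
set interfaces ge-5/1/5 mtu 9216
set interfaces irb mtu 9216
// EX Series: VLAN interface and the IP MTU on a VLAN unit
set interfaces vlan mtu 9216
set interfaces vlan unit 100 family inet mtu 9192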
Chapter 5
Configuring Spanning Tree Protocols
THIS CHAPTER FOCUSES on the different spanning tree protocols (STP, RSTP, MSTP, and VSTP) that are used in Layer 2 networks to prevent loops. Typically, STP is supported only on legacy equipment and has been replaced with RSTP and other variants of spanning tree. Support for RSTP is mandatory on all devices that are capable of spanning tree functionality. When interoperating with legacy switches, an RSTP-capable switch automatically reverts to STP. We discuss STP in this chapter to provide a background on spanning tree functionality.
1. Root Bridge Election.
2. Root Port Election (on non-root switches).
3. Designated Port Election (on each network segment).

Figure 5.1 shows three switches: one root and two non-root bridges. The ports on the root bridge are the designated ports (DP). The ports with the least cost to the root bridge are the root ports (RP). All other interfaces running STP on the non-root bridges are alternate ports (ALT).
Generating and transmitting BPDUs from all nodes at the configured hello interval, irrespective of whether they receive any BPDUs from the RP. This allows the nodes to monitor any loss of hello messages and thus detect link failures more quickly than STP.
Expediting changes in topology by directly transitioning a port (either an edge port or a port connected to a point-to-point link) from a blocked to a forwarding state.
Providing a distributed model in which all bridges in the network actively participate in network connectivity.
Figure 5.1 STP Network

New interface types defined in RSTP are:
Point to Point
Edge
Shared or Non-edge

Point to Point: A point-to-point (P2P) interface provides a direct connection between two switches. Usually, a full-duplex interface is set automatically to be P2P.

Edge: The edge interface is another enhancement in RSTP that helps reduce convergence time compared to STP. Ports connected to servers (with no bridges attached) are typically defined as edge ports. Changes made to the status of an edge port do not result in changes to the forwarding network topology and thus are ignored by RSTP.
Shared or Non-edge: A shared or non-edge interface is an interface that is half-duplex or that has more than two bridges on the same LAN.

Compared to STP, RSTP separates the concepts of port state and port role; the state and role of an RSTP port are independent. A port can send or receive BPDUs or data based on its current state, while the role of a port depends on its position in the network and is determined by BPDU comparison during convergence. Table 5.1 shows the mapping between RSTP port states and roles.

Table 5.1 Mapping between RSTP Port States and Roles

RSTP State   RSTP Role
Forwarding   Root
Forwarding   Designated
Discard      Alternate
Discard      Backup
Discard      Disabled

The Alternate role in RSTP is analogous to the Blocked port in STP. Defining an edge port allows a port to transition directly into a forwarding state, eliminating the 30-second delay that occurs with STP.
Figure 5.2 MSTP Example
Figure 5.2 shows three MSTIs: A, B, and C. Each of these instances consists of one or more VLANs. BPDUs specific to a particular instance are exchanged within each of the MSTIs. The CIST handles all BPDU information that is required to maintain the topology across regions; the CIST is the instance that is common to all regions. With MSTP, bridge priorities and related configurations can be applied on a per-instance basis. Thus, the root bridge of one instance does not have to be the root bridge of a different instance.
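As a hedged sketch of this per-instance control (set-style equivalents of the bracketed snippets shown later in this chapter), a switch can be made the root for one instance but not for another:

// Lowest priority wins the root election for MSTI 1 only.
set protocols mstp configuration-name MSTP
set protocols mstp msti 1 vlan 1122
set protocols mstp msti 1 bridge-priority 0
set protocols mstp msti 2 vlan 71
set protocols mstp msti 2 bridge-priority 32k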
Spanning tree protocol support on the devices in this design:

Device                       STP                              RSTP   MSTP   VSTP / PVST+
EX4200                       STP                              RSTP   MSTP   VSTP
EX8200                       STP                              RSTP   MSTP   VSTP
MX Series                    Configuration not supported;     RSTP   MSTP   VSTP
                             works with RSTP (backwards
                             compatible)
IBM BladeCenter (Cisco ESM)  Configuration not supported;     (same  MSTP   PVST+, Rapid-PVST+
                             works with MSTP/PVST (backwards  as
                             compatible)                      STP)
Configuring RSTP/MSTP
Figure 5.3 shows a sample MSTP network that can be used to configure and verify RSTP/MSTP functionality. The switches and IBM BladeCenter connect in a full mesh and are assigned these bridge priorities:

EX4200-A: 0K (lowest bridge priority number)
MX480: 8K
EX8200: 16K
IBM BladeCenter (Cisco ESM): 16K
EX4200-B: 32K

We configure EX4200-A as the root bridge. Two MSTP instances, MSTI-1 and MSTI-2, correspond to VLANs 1122 and 71, respectively. One or both of these VLANs are configured on the links between the switches in this spanning tree network. Table 5.3 shows the association between the links, VLANs, and MSTI instances.

Table 5.3 Association between Links, VLANs and MSTI Instances

MSTI Instance(s)   VLAN ID(s)
MSTI-1             1122
MSTI-2             71
MSTI-1, MSTI-2     1122, 71

Each link participates in MSTI-1 if it carries VLAN 1122, in MSTI-2 if it carries VLAN 71, and in both instances if it carries both VLANs.
Figure 5.3 Spanning Tree MSTP/RSTP
Another instance, MSTI-0 (which constitutes the CIST), is created by default to exchange the overall spanning tree information for all MSTIs between the switches. The blade servers connect to each of the switches as hosts/servers. The switch ports that connect to these BladeCenter servers are defined as edge ports, and the servers are assigned IP addresses. The selection of the root bridge is controlled by explicit configuration; that is, a bridge can be prevented from being elected as a root bridge by enabling root protection.
Configuration Snippets
The following code pertains to the EX4200-A (RSTP/MSTP):
// Enable RSTP by assigning bridge priorities.
// Set priorities on interfaces to calculate the least-cost path.
// Enable root protection so that the interface is blocked for the RSTP
// instance when it receives superior BPDUs. Also, define the port to be
// an edge port.
rstp {
    bridge-priority 4k;
    interface ge-0/0/0.0 {
        priority 240;
    }
    interface ge-0/0/7.0 {
        priority 240;
        edge;
        no-root-port;
    }
    interface ge-0/0/9.0 {
        priority 240;
        edge;
        no-root-port;
    }
    interface ge-0/0/20.0 {
        priority 240;
    }
    interface ge-0/0/21.0 {
        priority 240;
    }
}

// Enable MSTP by assigning bridge priorities.
// Set priorities on interfaces.
// Enable root protection so that the interface is blocked when it receives
// superior BPDUs. An operator can configure a bridge not to be elected as a
// root bridge by enabling root protection, which increases user control
// over the placement of the root bridge in the network. Also, define the
// port to be an edge port.
// Define MSTI-1, provide a bridge priority for the instance, and associate
// a VLAN with the instance.
// Define MSTI-2, provide a bridge priority for the instance, and associate
// a VLAN and an interface with the instance.
chandra@EX-175-CSR# show protocols mstp
configuration-name MSTP;
bridge-priority 8k;
interface ge-0/0/0.0 {
    priority 240;
}
interface ge-0/0/7.0 {
    priority 240;
    edge;
    no-root-port;
}
interface ge-0/0/9.0 {
    priority 240;
    edge;
    no-root-port;
}
interface ge-0/0/20.0 {
    priority 224;
}
interface ge-0/0/21.0 {
    priority 192;
}
interface ge-0/0/23.0 {
    priority 224;
}
msti 1 {
    bridge-priority 8k;
    vlan 1122;
}
msti 2 {
    bridge-priority 8k;
    vlan 71;
    interface ge-0/0/23.0 {
        priority 224;
    }
}
interface ge-0/0/13.0 {
    priority 224;
}
interface ge-0/0/14.0 {
    priority 192;
}
interface ge-0/0/15.0 {
    priority 240;
    edge;
    no-root-port;
}
// Define MSTI-1, provide a bridge priority for the instance, and associate
// a VLAN with the instance.
msti 1 {
    bridge-priority 0;
    vlan 1122;
}
// Define MSTI-2, provide a bridge priority for the instance, and associate
// a VLAN and an interface with the instance.
msti 2 {
    bridge-priority 0;
    vlan 71;
    interface ge-0/0/13.0 {
        priority 224;
    }
}
}

chandra@HE-RE-0-MX480# show protocols mstp
bridge-priority 8k;
interface ge-5/1/1 {
    priority 224;
}
interface ge-5/1/2 {
    priority 192;
}
interface ge-5/2/2 {
    priority 192;
}
interface ge-5/3/3 {
    priority 224;
}
interface ge-5/3/4 {
    priority 240;
    edge;
    no-root-port;
}
msti 1 {
    bridge-priority 4k;
    vlan 1122;
}
msti 2 {
    bridge-priority 4k;
    vlan 71;
    interface ge-5/1/1 {
        priority 224;
    }
}
Verification
Based on the sample network, administrators can verify the RSTP/MSTP configuration by issuing show commands to confirm that two MSTI instances and one common instance (MSTI-0) are present on each switch. The following CLI sample shows these three MSTI instances and the VLANs associated with each of them:
chandra@SPLAB-EX-180> show spanning-tree mstp configuration
MSTP information
Context identifier    : 0
Region name           : MSTP
Revision              : 0
Configuration digest  : 0xeef3ba72b1e4404425b44520425d3d9e

MSTI    Member VLANs
0       0-70,72-1121,1123-4094
1       1122
2       71
Each of these instances should have an RP (ROOT), a BP (ALT), and a DP (DESG) of its own:
chandra@SPLAB-EX-180> show spanning-tree interface
Spanning tree interface parameters for instance 0
Interface    Port ID   Designated  Designated          Port   State  Role
                       port ID     bridge ID           Cost
ge-0/0/10.0  240:523   240:513     0.0019e2544040      20000  FWD    ROOT
ge-0/0/11.0  240:524   240:524     32768.0019e2544ec0  20000  FWD    DESG
ge-0/0/12.0  224:525   224:525     32768.0019e2544ec0  20000  FWD    DESG
ge-0/0/13.0  224:526   224:526     32768.0019e2544ec0  20000  FWD    DESG
ge-0/0/14.0  192:527   192:213     8192.001db5a167d1   20000  BLK    ALT
ge-0/0/15.0  240:528   240:528     32768.0019e2544ec0  20000  FWD    DESG
ge-0/0/36.0  128:549   128:549     32768.0019e2544ec0  20000  FWD    DESG
ge-0/0/46.0  128:559   128:559     32768.0019e2544ec0  20000  FWD    DESG

Spanning tree interface parameters for instance 1
Interface    Port ID   Designated  Designated          Port   State  Role
                       port ID     bridge ID           Cost
ge-0/0/10.0  128:523   128:513     1.0019e2544040      20000  FWD    ROOT
ge-0/0/12.0  128:525   128:525     32769.0019e2544ec0  20000  FWD    DESG
ge-0/0/14.0  128:527   192:213     4097.001db5a167d1   20000  BLK    ALT
ge-0/0/15.0  128:528   128:528     32769.0019e2544ec0  20000  FWD    DESG

Spanning tree interface parameters for instance 2
Interface    Port ID   Designated  Designated          Port   State  Role
                       port ID     bridge ID           Cost
ge-0/0/10.0  128:523   128:513     2.0019e2544040      20000  FWD    ROOT
ge-0/0/13.0  224:526   224:526     16386.0019e2544ec0  20000  FWD    DESG
ge-0/0/14.0  128:527   192:213     4098.001db5a167d1   20000  BLK    ALT
The following CLI output shows the MSTI-0 information on the Root Bridge. All ports are in the forwarding state.
chandra@EX-175-CSR> show spanning-tree interface
Spanning tree interface parameters for instance 0
Interface    Port ID   Designated  Designated          Port   State  Role
                       port ID     bridge ID           Cost
ge-0/0/0.0   240:513   240:513     12288.0019e2544040  20000  FWD    DESG
ge-0/0/7.0   240:520   240:520     12288.0019e2544040  20000  FWD    DESG
ge-0/0/9.0   240:522   240:522     12288.0019e2544040  20000  FWD    DESG
ge-0/0/20.0  240:533   240:533     12288.0019e2544040  20000  FWD    DESG
ge-0/0/21.0  240:534   240:534     12288.0019e2544040  20000  FWD    DESG
ge-0/0/24.0  128:537   128:537     12288.0019e2544040  20000  FWD    DESG
ge-0/0/25.0  128:538   128:538     12288.0019e2544040  20000  FWD    DESG
1. Check that only the information from instance MSTI-0 (but not MSTI-1 and MSTI-2) is available on all switches.
2. Confirm that there is only one direct path to any other interface within each MSTI instance on a switch. All other redundant paths should be designated as Blocked. Use the show spanning-tree interface command for this purpose. 3. Verify that a change in priority on any MSTI instance on a switch is propagated through the entire mesh using the show spanning-tree interface command.
Configuring VSTP/PVST+/Rapid-PVST+
Figure 5.4 depicts a sample network consisting of a mesh of EX8200/EX4200 and MX480 devices with the Cisco ESM switch. For interoperability, PVST+ must be enabled on the Cisco device and VSTP on the Juniper devices. Two VLANs, 1122 and 71, are created on all devices, and VSTP is enabled for both of these VLANs.
Figure 5.4 Spanning Tree VSTP/PVST+
Table 5.4 lists the bridge priorities for each of the VLANs.

Table 5.4 VSTP Bridge Priorities

VLAN ID   Device     Bridge Priority
71        EX4200-A   8K
71        EX4200-B   4K
71        EX8200     12K
71        MX480      16K
1122      EX4200-A   16K
1122      EX4200-B   32K
1122      EX8200     24K
1122      MX480      16K
Verification
Based on the sample setup shown in Figure 5.4, verify interoperability of the VSTP configuration with Cisco PVST+ by performing the following steps.
1. Verify that each of the switches with VSTP/PVST+ enabled has two spanning trees corresponding to the two VLANs. Each VLAN has its own RP (ROOT), BP (ALT), and DP (DESG). Use the show spanning-tree command.
chandra@SPLAB-EX-180> show spanning-tree interface
Spanning tree interface parameters for VLAN 1122
Interface    Port ID   Designated  Designated          Port   State  Role
                       port ID     bridge ID           Cost
ge-0/0/10.0  128:523   128:513     17506.0019e2544040  20000  FWD    ROOT
ge-0/0/12.0  224:525   224:525     33890.0019e2544ec0  20000  FWD    DESG
ge-0/0/14.0  240:527   240:213     17506.001db5a167d0  20000  BLK    ALT
ge-0/0/15.0  240:528   240:528     33890.0019e2544ec0  20000  FWD    DESG

Spanning tree interface parameters for VLAN 71
Interface    Port ID   Designated  Designated          Port   State  Role
                       port ID     bridge ID           Cost
ge-0/0/10.0  128:523   128:523     4167.0019e2544ec0   20000  FWD    DESG
ge-0/0/13.0  224:526   224:526     4167.0019e2544ec0   20000  FWD    DESG
ge-0/0/14.0  240:527   240:527     4167.0019e2544ec0   20000  FWD    DESG
2. Confirm that there is only one direct active path per VLAN instance to any other non-root bridge. All redundant paths should be identified as Blocked. Use the output of the show spanning-tree interface command for this purpose. Rebooting the root bridge should cause the device with the next-lowest priority to step up as the root for the particular VLAN. This information must be updated in the VLAN table on all devices.
3. Verify that the original root bridge again becomes the primary (active) root after the reboot. This information should be updated on all devices in the mesh.

NOTE Any change in bridge priorities in either of the VSTP instances must be propagated through the mesh.
Configuration Snippets
The following code pertains to the EX4200-A.
chandra@EX-175-CSR> show configuration protocols vstp
// Define a VLAN bc-external; assign bridge and interface priorities.
// Enable root protection so that the interface is blocked when it receives
// superior BPDUs. Also, define the port to be an edge port.
vlan bc-external {
    bridge-priority 16k;
    interface ge-0/0/7.0 {
        priority 240;
        edge;
        no-root-port;
    }
    interface ge-0/0/20.0 {
        priority 224;
    }
    interface ge-0/0/21.0 {
        priority 240;
    }
}
// Define a VLAN bc-internal; assign bridge and interface priorities.
// Enable root protection so that the interface is blocked when it receives
// superior BPDUs. Also, define the port to be an edge port.
vlan bc-internal {
    bridge-priority 8k;
    interface ge-0/0/9.0 {
        priority 240;
        edge;
        no-root-port;
    }
    interface ge-0/0/21.0 {
        priority 240;
    }
    interface ge-0/0/23.0 {
        priority 224;
    }
}
vlan 1122 {
    bridge-priority 16k;
    interface ge-5/1/1 {
        priority 240;
    }
    interface ge-5/1/2 {
        priority 240;
    }
    interface ge-5/2/2 {
        priority 240;
    }
    interface ge-5/3/3 {
        priority 240;
    }
    interface ge-5/3/4 {
        priority 240;
    }
}
Chapter 6
Supporting Multicast Traffic
Figure 6.1 IGMP and PIM in a Multicast Network
IGMP manages the membership of hosts and routers in multicast groups. IP hosts use IGMP to report their multicast group memberships to any neighboring multicast routers. In addition, IGMP is used as the transport for several related multicast protocols, such as DVMRP and PIMv1. Three versions of IGMP are supported by hosts and routers:

IGMPv1: The original protocol, defined in RFC 1112. An explicit join message is sent to the router, but a timeout is used to determine when hosts leave a group.
IGMPv2: Defined in RFC 2236. Among other features, IGMPv2 adds an explicit leave message to the join message so that routers can easily determine when a group has no listeners.
IGMPv3: Defined in RFC 3376. IGMPv3 supports the ability to specify which sources can send to a multicast group. This type of multicast group is called a source-specific multicast (SSM) group, and its multicast address range is 232/8. IGMPv3 is also backwards compatible with IGMPv1 and IGMPv2.

For SSM mode, we can configure the multicast source address so that only that source can send traffic to the multicast group. In this example, we create group 225.1.1.1 and accept IP address 10.0.0.2 as the only source:
user@host# set protocols igmp interface fe-0/1/2 static group 225.1.1.1 source 10.0.0.2
user@host# set protocols igmp interface fe-0/1/2 static group 225.1.1.1 source 10.0.0.2 source-count 3
user@host# set protocols igmp interface fe-0/1/2 static group 225.1.1.1 source 10.0.0.2 source-count 3 source-increment 0.0.0.2
user@host# set protocols igmp interface fe-0/1/2 static group 225.1.1.1 exclude source 10.0.0.2
NOTE
The SSM configuration requires that the IGMP version on the interface be set to IGMPv3.
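As a hedged one-line sketch of that requirement, using the same interface as the example above:

user@host# set protocols igmp interface fe-0/1/2 version 3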
When we enable IGMP static group membership, data is forwarded to an interface without that interface receiving membership reports from downstream hosts.

NOTE When we configure static IGMP group entries on point-to-point links that connect routers to a rendezvous point (RP), the static IGMP group entries do not generate join messages toward the RP.
Protocol          Sparse Mode   Implicit Join   Explicit Join   (S,G) SBT
DVMRP             No            Yes             No              Yes
MOSPF             No            No              Yes             Yes
PIM Dense Mode    No            Yes             No              Yes
PIM Sparse Mode   Yes           No              Yes             Yes
CBT               Yes           No              Yes             No
Because PIM Sparse Mode and PIM Dense Mode are the most widely deployed techniques, they were used in this reference design.
PIM dense mode requires only a multicast source and a series of multicast-enabled routers running PIM dense mode to allow receivers to obtain multicast content. Dense mode ensures that traffic reaches its prescribed destinations by periodically flooding the network with multicast traffic, and it relies on prune messages to ensure that subnets where all receivers are uninterested in a particular multicast group stop receiving its packets.

PIM sparse mode requires establishing special routers called rendezvous points (RPs) in the network core. The RP is the point where upstream join messages from interested receivers meet downstream traffic from the source of the multicast group content. A network can have many RPs, but PIM sparse mode allows only one RP to be active for any multicast group.

A multicast router typically has two kinds of IGMP interfaces: upstream and downstream. We must configure PIM on the upstream IGMP interfaces to enable multicast routing, to perform reverse path forwarding for multicast data packets so as to populate the multicast forwarding table for the upstream interfaces, and, in the case of PIM sparse mode, to distribute IGMP group memberships into the multicast routing domain. Only one pseudo PIM interface is required to represent all IGMP downstream (IGMP-only) interfaces on the router. Therefore, PIM is generally not required on the IGMP downstream interfaces, reducing the amount of router resources, such as memory, that are consumed.

IGMP and Nonstop Active Routing

NSR configurations include passive support for IGMP in association with PIM. The primary Routing Engine uses IGMP to determine its PIM multicast state, and this IGMP-derived information is replicated on the backup Routing Engine. IGMP on the new primary Routing Engine (after a failover) relearns the state information quickly through normal IGMP operation. In the interim, the new primary Routing Engine retains the IGMP-derived PIM state received through the replication process from the original primary Routing Engine. This state information times out unless refreshed by IGMP on the new primary Routing Engine. No additional IGMP configuration is required.

Filtering Unwanted IGMP Reports at the IGMP Interface Level

The group-policy statement enables the router to filter unwanted IGMP reports at the interface level. When this statement is enabled on a router running IGMP version 2 (IGMPv2) or version 3 (IGMPv3), after the router receives an IGMP report, the router compares the group against the specified group policy and performs the action configured in that policy; for example, the router rejects the report if the report does not match the addresses or networks defined in the policy. To enable IGMP report filtering for an interface, include the group-policy statement:
protocols { igmp { interface ge-1/1/1.0 { group-policy reject_policy; } } }
policy-options {
    // IGMPv2 policy
    policy-statement reject_policy {
        from {
            route-filter 192.1.1.1/32 exact;
        }
        then reject;
    }
    // IGMPv3 policy
    policy-statement reject_policy {
        from {
            route-filter 192.1.1.1/32 exact;
            source-address-filter 10.1.0.0/16 orlonger;
        }
        then reject;
    }
}
IGMP Configuration Command Hierarchy

To configure the Internet Group Management Protocol (IGMP), include the following igmp statement:
igmp {
    accounting;                          // accounting purposes
    interface interface-name {
        disable;
        (accounting | no-accounting);    // individual interface-specific accounting
        group-policy [ policy-names ];
        immediate-leave;                 // see Note 1 at the end of this snippet
        oif-map map-name;
        promiscuous-mode;                // see Note 2 at the end of this snippet
        ssm-map ssm-map-name;
        static {
            group multicast-group-address {
                exclude;
                group-count number;
                group-increment increment;
                source ip-address {
                    source-count number;
                    source-increment increment;
                }
            }
        }
        version version;                 // see Note 3 at the end of this snippet
    }
    query-interval seconds;
    query-last-member-interval seconds;  // default 1 second
    query-response-interval seconds;     // default 10 seconds
    robust-count number;                 // see Note 4 at the end of this snippet
    traceoptions {                       // tracing purposes
        file filename <files number> <size size> <world-readable | no-world-readable>;
        flag flag <flag-modifier> <disable>;
        // flag can be: [leave (IGMPv2 only) | mtrace | packets | query | report]
    }
}
NOTE 1
Use this statement only on IGMP version 2 (IGMPv2) interfaces to which one IGMP host is connected. If more than one IGMP host is connected to a LAN through the same interface, and one host sends a leave group message, the router removes all hosts on the interface from the multicast group. The router loses contact with the hosts that must remain in the multicast group until they send join requests in response to the routers next general group membership query.
NOTE 2 By default, IGMP interfaces accept IGMP messages only from the same subnetwork. The promiscuous-mode statement enables the router to accept IGMP messages from different subnetworks.

NOTE 3 By default, the router runs IGMPv2. If a source address is specified in a statically configured multicast group, the IGMP version must be set to IGMPv3. Otherwise, the source is ignored, only the group is added, and the join is treated as an IGMPv2 group join. When we reconfigure the router from IGMPv1 to IGMPv2, the router continues to use IGMPv1 for up to 6 minutes and then uses IGMPv2.

NOTE 4 The robustness variable provides fine-tuning to allow for expected packet loss on a subnetwork. The value of the robustness variable is used in calculating the following IGMP message intervals:

Group member interval = (robustness variable x query-interval) + (1 x query-response-interval)
Other querier present interval = (robustness variable x query-interval) + (0.5 x query-response-interval)
Last-member query count = robustness variable

By default, the robustness variable is set to 2. Increase this value if you expect a subnetwork to lose packets.
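As a worked example, assuming the default robustness variable of 2, the 10-second default query response interval noted in the snippet above, and the standard 125-second query interval: the group member interval is (2 x 125) + (1 x 10) = 260 seconds, and the other querier present interval is (2 x 125) + (0.5 x 10) = 255 seconds.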
Figure 6.2 PIM with OSPF: MX480 Multicast Router and EX4200 Topology
}
interface ge-5/2/5.0 {
    static {
        group 239.168.1.4;
    }
}

{master}[edit]
chandra@HE-RE-1-MX480# show protocols pim
rp {
    local {
        address 8.8.8.8;
    }
}
interface all {
    mode sparse;
}
interface fxp0.0 {
    disable;
}

chandra@HE-RE-1-MX480# show protocols ospf
area 0.0.0.0 {
    interface ge-5/2/5.0;
    interface lo0.0 {
        passive;
    }
    interface fxp0.0 {
        disable;
    }
}

chandra@HE-RE-1-MX480# show routing-options
router-id 8.8.8.8;
rp {
    static {
        address 8.8.8.8;
    }
}
interface vlan.1119;
interface me0.0 {
    disable;
}
interface all {
    mode sparse;
}

chandra@EX-175-CSR# show interfaces lo0
unit 0 {
    family inet {
        address 6.6.6.6/32;
    }
}

chandra@EX-175-CSR# show protocols ospf
area 0.0.0.0 {
    interface ge-0/0/44.0;
    interface lo0.0 {
        passive;
    }
    interface me0.0 {
        disable;
    }
}

chandra@EX-175-CSR# show routing-options
router-id 6.6.6.6;
Group: 239.168.1.1
    Source: *
    RP: 8.8.8.8
    Flags: sparse,rptree,wildcard
    Upstream interface: Local
Group: 239.168.1.1
    Source: 10.10.10.254
    Flags: sparse,spt
    Upstream interface: ge-5/2/6.0
Group: 239.168.1.2
    Source: *
    RP: 8.8.8.8
    Flags: sparse,rptree,wildcard
    Upstream interface: Local
Group: 239.168.1.2
    Source: 10.10.10.254
    Flags: sparse,spt
    Upstream interface: ge-5/2/6.0
chandra@HE-RE-1-MX480> show pim source
Instance: PIM.master Family: INET

Source 8.8.8.8
    Prefix 8.8.8.8/32
    Upstream interface Local
    Upstream neighbor Local

Source 10.10.10.254
    Prefix 10.10.10.0/24
    Upstream interface ge-5/2/6.0
    Upstream neighbor 10.10.10.2

Source 10.10.10.254
    Prefix 10.10.10.0/24
    Upstream interface ge-5/2/6.0
    Upstream neighbor Direct
chandra@EX-175-CSR# run show pim neighbors
H = Hello Option Holdtime, L = Hello Option LAN Prune Delay,
P = Hello Option DR Priority

Interface     IP  V  Mode  Option  Uptime    Neighbor addr
ge-0/0/44.0   4   2        HPLG    01:06:07  22.11.5.5

chandra@EX-175-CSR# run show pim source
Instance: PIM.master Family: INET

Source 8.8.8.8
    Prefix 8.8.8.8/32
    Upstream interface ge-0/0/44.0
    Upstream neighbor 22.11.5.5

Source 10.10.10.254
    Prefix 10.10.10.0/24
    Upstream interface ge-0/0/44.0
    Upstream neighbor 22.11.5.5
Figure 6.3 PIM with RIP: EX8200 Multicast Router and EX4200 Topology
    immediate-leave;
}
interface ge-0/0/2.2211;
interface all;

chandra@EX-175-CSR# show protocols pim
rp {
    static {
        address 9.9.9.9;
    }
}
interface vlan.2211;
interface me0.0 {
    disable;
}
interface all {
    mode sparse;
}

chandra@EX-175-CSR# show interfaces lo0
unit 0 {
    family inet {
        address 6.6.6.6/32;
    }
}
chandra@EX-175-CSR# show protocols rip
send broadcast;
receive both;
group jweb-rip {
    export jweb-policy-rip-direct;
    neighbor ge-0/0/2.0;
    neighbor lo0.0;
    neighbor vlan.2211;
}

chandra@EX-175-CSR# show policy-options
policy-statement jweb-policy-rip-direct {
    term 1 {
        from {
            protocol [ direct rip ];
            interface [ ge-0/0/2.0 ge-0/0/17.0 ];
        }
        then accept;
    }
    term 2 {
        then accept;
    }
}
IGMP Snooping
An access switch usually learns unicast MAC addresses by checking the source address field of the frames it receives. However, a multicast MAC address can never be the source address of a frame. As a result, the switch floods multicast traffic on the VLAN, consuming significant amounts of bandwidth.
IGMP snooping regulates multicast traffic on a VLAN to avoid flooding. When IGMP snooping is enabled, the switch intercepts IGMP packets and uses their content to build a multicast cache table. The cache table is a database of multicast groups and their corresponding member ports, and it is used to regulate multicast traffic on the VLAN. When the switch receives multicast packets, it uses the cache table to selectively forward the packets only to the ports that are members of the destination multicast group.

As illustrated in Figure 6.4, the access switch EX4200 connects four hosts and segments their data traffic with two VLANs: host1 and host2 belong to VLAN1, and host3 and host4 belong to VLAN2. Hosts in the same VLAN might take different actions on whether to subscribe to or unsubscribe from a multicast group. For instance, host1 has subscribed to multicast group 1, while host2 is not interested in multicast group 1 traffic; host3 has subscribed to multicast group 2, while host4 is not interested in multicast group 2 traffic. The EX4200 IGMP snooping feature accommodates this so that host1 receives multicast group 1 traffic and host2 does not, and host3 receives multicast group 2 traffic and host4 does not.
Figure 6.4 IGMP Snooping on the EX4200 with Two VLANs
Hosts can join multicast groups in two ways:
By sending an unsolicited IGMP join message to a multicast router that specifies the IP multicast group that the host is attempting to join.
By sending an IGMP join message in response to a general query from a multicast router.

A multicast router continues to forward multicast traffic to a VLAN if at least one host on that VLAN responds to the periodic general IGMP queries. To leave a multicast group, a host can either not respond to the periodic general IGMP queries, which results in a silent leave, or send a group-specific IGMPv2 leave message.
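The following hedged sketch (not from the original text, assuming a VLAN named v100 and an uplink on ge-0/0/10) shows how this join/leave behavior is tuned per VLAN on an EX Series switch:

// Prune a port as soon as a leave is received; suitable when only
// one host sits behind each port.
set protocols igmp-snooping vlan v100 immediate-leave
// Identify the uplink toward the multicast router so reports and
// multicast traffic are forwarded out of it.
set protocols igmp-snooping vlan v100 interface ge-0/0/10.0 multicast-router-interface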
Figure 6.5 IGMP Snooping Topology with VLANs v100 and v200
NOTE
By default, IGMP snooping is not enabled. Statements configured at the VLAN level apply only to that particular VLAN. On the MX Series Ethernet Routers, the Junos CLI expresses a Layer 2 broadcast domain as a bridge domain, so IGMP snooping is configured at the [bridge-domains] configuration hierarchy. The related multicast snooping options stanza is as follows:
multicast-snooping-options { flood-groups [ ip-addresses ]; forwarding-cache { threshold suppress value <reuse value>; } graceful-restart <restart-duration seconds>; ignore-stp-topology-change; }
Figure 6.6 MX480, EX8200, EX4200 and IBM BladeCenter IGMP Traffic Flow with IGMP Snooping
Not local: 0
Receive unknown: 0
Timed out: 2

IGMP Type   Received   Transmitted   Receive Errors
Queries     156        12            0
Reports     121        121           0
Leaves      2          2             0
Other       0          0             0
Figure 6.7 MX480 and IBM x3500 IGMP Traffic Flow with IGMP Snooping
chandra@HE-RE-1-MX480> show configuration bridge-domains 1118
domain-type bridge;
vlan-id 1118;
interface ge-5/2/4.0;
interface ge-5/2/6.0;
protocols {
    igmp-snooping {
        interface ge-5/2/6.0 {
            multicast-router-interface;
        }
        interface ge-5/2/4.0 {
            static {
                group 239.168.1.4;
            }
        }
    }
}
Chapter 7
Understanding Network CoS and Latency
Class of Service . . . . . 106
Configuring CoS . . . . . 111
Latency . . . . . 115
AN APPLICATION'S PERFORMANCE directly relies on network performance. Network performance typically refers to bandwidth, because bandwidth is the primary measure of computer network speed and represents the overall capacity of a connection. Greater capacity typically generates improved performance. However, network bandwidth is not the only factor that contributes to network performance.

The performance of an application relies on different network characteristics. Some real-time applications, such as voice and video, are extremely sensitive to latency, jitter, and packet loss, while some non-real-time applications, such as web applications (HTTP), email, File Transfer Protocol (FTP), and Telnet, do not require any specific reliability from the network, and a best-effort policy works well in transmitting these traffic types.
In today's converged networks, including data/voice converged networks and data/storage converged networks, and in cloud-ready data centers with server virtualization, different types of applications are transmitted over the same network. To ensure application performance for all types of applications, additional provisions are required within the network to minimize latency and packet loss. This chapter covers two techniques for improving data center network performance:
Using class of service (CoS) to manage packet loss.
Considering latency characteristics when designing networks using Juniper Networks data center network products.
Class of Service
Typically, when a network experiences congestion and delay, some packets are dropped. As an aid in preventing dropped packets, Junos CoS allows an administrator to divide traffic into classes and to offer various levels of throughput and packet loss when congestion and delay occur, so that packet loss happens only according to rules configured on the system. In designing CoS applications, we must consider service needs, and we must thoroughly plan and design the CoS configuration to ensure consistency across all routers in a CoS domain. We must also consider all the routers and other networking equipment in the CoS domain to ensure interoperability among different types of equipment. Before proceeding further with implementing CoS in Junos, we should understand the CoS components and packet flow through the CoS process.
Figure 7.1 Packet Flow Through the Junos CoS Process
The following list describes the key steps in the CoS process, together with the corresponding configuration hierarchy for each step; a consolidated configuration sketch follows the list.

1. Classifying: This step examines packet markings (for example, EXP bits, IEEE 802.1p bits, or DSCP bits) to separate incoming traffic. One or more classifiers must be assigned to a physical or logical interface to separate the traffic flows; the classifier assignment is configured at the [edit class-of-service interfaces] hierarchy level in the Junos CLI. In addition, the classifier statement defines how to assign a packet to a forwarding class with a loss priority; this is configured at the [edit class-of-service classifiers] hierarchy level. For details concerning packet loss priority and forwarding classes, see Defining Loss Priorities and Defining Forwarding Classes later in this chapter. Furthermore, each forwarding class can be assigned to a queue at the [edit class-of-service forwarding-classes] hierarchy level.

2. Policing: This step meters traffic. It changes the forwarding class and loss priority if a traffic flow exceeds its pre-defined service level.

3. Scheduling: This step manages all attributes of queuing, such as transmission rate, buffer depth, priority, and Random Early Detection (RED) profile. A scheduler map is assigned to the physical or logical interface at the [edit class-of-service interfaces] hierarchy level. In addition, the scheduler statement defines how traffic is treated in the output queue (for example, the transmit rate, buffer size, priority, and drop profile) at the [edit class-of-service schedulers] hierarchy level. Finally, the scheduler-maps statement assigns a scheduler to each forwarding class at the [edit class-of-service scheduler-maps] hierarchy level.

4. Packet Dropping: This step manages drop profiles to avoid TCP synchronization and to protect high priority traffic from being dropped. The drop profile defines how aggressively to drop packets that are using a particular scheduler; it is configured at the [edit class-of-service drop-profiles] hierarchy level.

5. Rewrite Marking: This step rewrites the packet CoS fields (for example, EXP or DSCP bits) according to the forwarding class and loss priority of the packet. The rewrite rule takes effect as the packet leaves a logical interface that has a rewrite rule; it is configured at the [edit class-of-service rewrite-rules] hierarchy level.
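The skeleton below is a hedged sketch, not a configuration from this guide; the classifier, scheduler, and drop-profile names are invented for illustration. It shows where each of the steps lands in the [edit class-of-service] hierarchy (policing, step 2, is configured separately under the firewall hierarchy; see Defining Policers for Traffic Classes later in this chapter):

class-of-service {
    classifiers {
        dscp data-classifier {                  // step 1: classify on DSCP
            forwarding-class best-effort {
                loss-priority high code-points 000000;
            }
        }
    }
    forwarding-classes {
        queue 0 best-effort;                    // map class to queue
    }
    drop-profiles {
        soft-drop {                             // step 4: packet dropping
            fill-level 80 drop-probability 50;
        }
    }
    schedulers {
        be-sched {                              // step 3: scheduling
            transmit-rate percent 30;
            buffer-size percent 30;
            drop-profile-map loss-priority high protocol any drop-profile soft-drop;
        }
    }
    scheduler-maps {
        sm1 {
            forwarding-class best-effort scheduler be-sched;
        }
    }
    interfaces {
        ge-0/0/9 {
            scheduler-map sm1;                  // attach scheduling
            unit 0 {
                classifiers {
                    dscp data-classifier;       // attach classification
                }
            }
        }
    }
}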
Packet Classifiers

Classifier         Function
dscp               Handles incoming IPv4 packets.
dscp-ipv6          Handles incoming IPv6 packets.
exp                Handles MPLS packets using Layer 2 headers.
ieee-802.1         Handles Layer 2 CoS.
ieee-802.1ad       Handles the IEEE 802.1ad (DEI) classifier.
inet-precedence    Handles incoming IPv4 packets. IP precedence mapping requires only the upper three bits of the DSCP field.
Using Code-Point Aliases

Using code-point aliases, an operator assigns a name to a pattern of code-point bits. This name can be used instead of the bit pattern when configuring other CoS components, such as classifiers, drop-profile maps, and rewrite rules. For example:

ieee-802.1 {
    be 000;
    af12 101;
    af11 100;
    be1 001;
    ef 010;
}

Defining Loss Priorities

Loss priority affects the scheduling of a packet without affecting the packet's relative ordering. An administrator can use the packet loss priority (PLP) bit as part of a congestion control strategy and can use the loss priority setting to identify packets that have experienced congestion. Typically, an administrator marks packets exceeding a specified service level with a high loss priority and sets the loss priority by configuring a classifier or a policer. The loss priority is used later in the work flow to select one of the drop profiles used by random early detection (RED).
Defining Forwarding Classes

The forwarding class affects the forwarding, scheduling, and marking of policies applied to packets as they move through a router. Table 7.2 summarizes the mapping between queues and forwarding classes for both the MX and EX Series.

Table 7.2 Forwarding Classes for the MX480, EX4200, and EX8200

Forwarding Class        MX Series Queue    EX Series Queue
Voice (EF)              Q3                 Q5
Video (AF)              Q2                 Q4
Data (BE)               Q0                 Q0
Network Control (NC)    -                  Q7
The forwarding class plus the loss priority defines the per-hop behavior. If the use case requires associating forwarding classes with next hops, the forwarding policy options are available only on the MX Series.

Defining Comprehensive Schedulers

An individual router interface has multiple queues assigned to store packets. The router determines which queue to service based on a particular method of scheduling. This process often involves a determination of which type of packet should be transmitted before another type of packet. Junos schedulers allow an administrator to define the priority, bandwidth, delay buffer size, rate control status, and RED drop profiles to be applied to a particular queue for packet transmission.

Defining Policers for Traffic Classes

Policers allow an administrator to limit traffic of a certain class to a specified bandwidth and burst size. Packets exceeding the policer limits can be discarded or can be assigned to a different forwarding class, a different loss priority, or both. Juniper defines policers with filters that can be associated with input or output interfaces; a policer of this kind is sketched after Table 7.3, which compares the [edit class-of-service] configuration statements supported on the Juniper Networks MX480, EX8200, and EX4200.
Table 7.3 [edit class-of-service] Statement Support on the MX480, EX8200, and EX4200

drop-profiles, fabric, forwarding-classes, forwarding-policy, fragmentation-maps, host-outbound-traffic, interfaces, multi-destination, restricted-queues, rewrite-rules, routing-instances, scheduler-maps, schedulers, traffic-control-profiles, translation-table, tri-color

The MX480 supports all of the statements listed; the EX8200 and EX4200 support smaller subsets (10 and 7 of the statements, respectively).
Configuring CoS
In this section, we demonstrate a sample CoS configuration scenario on the EX4200. Two blade servers connect to two different interfaces and simulate production traffic by issuing a ping command; the test device (N2X) generates significant network traffic, classified as background traffic, through the EX4200 to one of the blade servers. This background traffic contends with the production traffic, causing packet loss in the production traffic. Because the EX4200 is central to network traffic aggregation in this scenario, it is reasonable to apply a CoS packet loss policy on the EX4200 to ensure that no packet loss occurs in the production traffic.

NOTE The configuration scenario and snippet are also applicable to MX Series Ethernet Routers.
Configuration Description
As illustrated in Figure 7.2, the EX4200 is the DUT; it interconnects the IBM blade servers and the Agilent N2X traffic generator.
Figure 7.2 CoS Test Setup: the N2X (ge-2/3/1, 11.22.1.100; ge-3/4/4, 11.22.1.200) connects to the EX4200 on ge-0/0/25 and ge-0/0/24; the EX4200 connects on ge-0/0/9 (11.22.1.9) to an IBM BladeCenter pass-through module (via the eth1 interface on the 9th blade) and on ge-0/0/7 (11.22.1.7) to an IBM BladeCenter pass-through module (via the eth1 interface on the 7th blade)
The test includes the following steps: 1. The N2X generates network traffic as background traffic onto the EX4200 through two ingress GigE ports (ge-0/0/24 and ge-0/0/25).
2. The EX4200 forwards the background traffic to a single egress GigE port (ge-0/0/9). 3. At the same time, the blade server uses the ping command to generate production traffic onto the EX4200 through a different interface (ge-0/0/7). 4. The EX4200 also forwards the production traffic to the same egress port (ge-0/0/9). From a packet loss policy perspective, the production traffic is low loss priority, while the background traffic is high loss priority.
To verify the status of packets on the ingress and egress ports, we use the show interfaces queue <ge-0/x/y> command to confirm that only high loss priority packets from the BACKGROUND forwarding class are tail dropped.
NOTE
The configuration used in this setup was sufficient to confirm CoS functionality in its simplest form. Other detailed configuration options are available and can be enabled as needed. Refer to the CoS configuration statement hierarchy levels in the Junos Software CLI User Guide at www.juniper.net/techpubs/software/junos/junos95/swref-hierarchy/hierarchy-summary-configuration-statement-classof-service.html#hierarchy-summary-configuration-statement-class-of-service.

The following steps summarize the setup configuration process.

1. Configure the setup as illustrated in Figure 7.2 and by reviewing the CoS configuration code snippet.
2. Create some simple flows on the N2X to send from each port to port ge-0/0/9.
3. Send the traffic at 50% line rate from each port to 11.22.1.9 (in the absence of two ports, one port can be used to send 100% traffic).
4. Configure the DUT to perform CoS-based processing so that ingress traffic from source 11.22.1.7 arriving on interface ge-0/0/7 is classified into a high forwarding class with a low probability of being dropped, while traffic arriving on interfaces ge-0/0/24 and ge-0/0/25 is classified with a high probability of being dropped.
5. Start the ping from 11.22.1.7 to 11.22.1.9.
6. Tune the line-rate parameter of the N2X traffic arriving at ge-0/0/9.
7. Observe the egress and ingress port statistics to confirm that the ping traffic is tagged with the higher forwarding class and does not get dropped, while traffic coming from ports ge-0/0/24 and ge-0/0/25 gets dropped on ingress.
code-point-aliases {    // associate the code-point aliases
    ieee-802.1 {
        be 000;
        af12 101;
        af11 100;
        be1 001;
        ef 010;
    }
}
forwarding-classes {    // assign the four queues to the forwarding classes
    queue 0 BACKGROUND;
    queue 3 CONVERSATIONAL;
    queue 2 INTERACTIVE;
    queue 1 STREAMING;
}
interfaces {
    ge-0/0/9 {    // associate the scheduler map, rewrite rules, and classifier with the interface
        scheduler-map SCHED-MAP;
        unit 0 {
            classifiers {
                ieee-802.1 DOTP-CLASSIFIER;
            }
            rewrite-rules {
                ieee-802.1 DOTP-RW;
            }
        }
    }
}
rewrite-rules {    // define the rewrite rule for each forwarding class and set the code points to be used in each case
    ieee-802.1 DOTP-RW {
        forwarding-class CONVERSATIONAL {
            loss-priority low code-point ef;
        }
        forwarding-class INTERACTIVE {
            loss-priority low code-point af12;
        }
        forwarding-class STREAMING {
            loss-priority low code-point af11;
        }
        forwarding-class BACKGROUND {
            loss-priority high code-point be;
        }
    }
}
scheduler-maps {    // define the scheduler map entry for each forwarding class
    SCHED-MAP {
        forwarding-class BACKGROUND scheduler BACK-SCHED;
        forwarding-class CONVERSATIONAL scheduler CONV-SCHED;
        forwarding-class INTERACTIVE scheduler INTERACT-SCHED;
        forwarding-class STREAMING scheduler STREAMING-SCHED;
    }
}
schedulers {    // specify the scheduler properties for each forwarding class; the priorities assigned here define how the scheduler handles the traffic
    CONV-SCHED {
        transmit-rate remainder;
        buffer-size percent 80;
        priority strict-high;
    }
    INTERACT-SCHED;
    STREAMING-SCHED {
        transmit-rate percent 20;
    }
    BACK-SCHED {
        transmit-rate remainder;
        priority low;
    }
}

chandra@EX> show configuration firewall
family ethernet-switching {    // configure a multifield classifier for better granularity; the CONVERSATIONAL class gets higher priority than BACKGROUND
    filter HIGH {
        term 1 {
            from {
                source-address {
                    11.22.1.7/32;
                }
            }
            then {
                accept;
                forwarding-class CONVERSATIONAL;
                loss-priority low;
            }
        }
        term 2 {
            then {
                accept;
                count all;
            }
        }
    }
    filter LOW {
        term 1 {
            from {
                source-address {
                    11.22.1.100/32;
                    11.22.1.101/32;
                }
            }
            then {
                accept;
                forwarding-class BACKGROUND;
                loss-priority high;
            }
        }
        term 2 {
            then {
                accept;
                count all;
            }
        }
    }
}

chandra@EX> show configuration interfaces ge-0/0/24
unit 0 {
    family ethernet-switching {    // assign the firewall filter to the interface
        port-mode access;
        filter {
            input LOW;
            output LOW;
        }
    }
}

chandra@EX> show configuration interfaces ge-0/0/25
unit 0 {
    family ethernet-switching {
        port-mode access;
        filter {
            input LOW;
            output LOW;
        }
    }
}

chandra@EX> show configuration interfaces ge-0/0/7
unit 0 {
    family ethernet-switching {
        port-mode access;
        filter {
            input HIGH;
            output HIGH;
        }
    }
}

chandra@EX> show configuration interfaces ge-0/0/9
unit 0 {
    family ethernet-switching {
        port-mode access;
    }
}
Latency
Network latency is critical to business. Today, competitiveness in the global financial markets is measured in microseconds. High-performance computing and financial trading demand an ultra-low-latency network infrastructure, and voice and video traffic is time-sensitive and typically requires low latency. Because network latency in a TCP/IP network can be measured at different layers, such as Layer 2/3, and for different types of traffic, such as unicast or multicast, it often refers to one of the following: Layer 2 unicast, Layer 3 unicast, Layer 2 multicast, or Layer 3 multicast. Latency is often measured at various frame sizes: 64, 128, 256, 512, 1024, 1280, and 1518 bytes for Ethernet. The simulated traffic throughput is a critical factor in the accuracy of test results: for a 1 Gbps full-duplex interface, the transmitting (TX) and receiving (RX) throughput of the simulated traffic must approach 1 Gbps, and the TX/RX throughput ratio must be at least 99%. Measuring network latency often requires sophisticated test appliances, such as those from Agilent (N2X), Spirent Communications, and IXIA. NetworkWorld validated Juniper Networks EX4200 performance, including Layer 2 unicast, Layer 3 unicast, Layer 2 multicast, and Layer 3 multicast latency. For detailed test results, please refer to www.networkworld.com/reviews/2008/071408-test-juniper-switch.html.
In this section, we discuss the concept of measuring device latency and demonstrate the sample configuration for measuring Layer 2 and Layer 3 unicast latency on the MX480.
Measuring Latency
The IETF standard RFC 2544 defines performance test criteria for measuring the latency of a DUT. As shown in Figure 7.3, the ideal way to test DUT latency is to use a tester with both transmitting and receiving ports. The tester connects to the DUT with two connections: the transmitting port of the tester connects to the receiving port of the DUT, and the sending port of the DUT connects to the receiving port of the tester. The same setup also applies to measuring the latency of multiple DUTs, as shown in Figure 7.3.
Figure 7.3 Measuring Latency: a tester connected to a single DUT, and a tester connected to two DUTs (DUT 1 and DUT 2) in series
Figure 7.4 illustrates two latency test scenarios. In one scenario, we measured the latency of the MX480; in the other, we measured the end-to-end latency of the MX480 and Cisco's ESM. We used Agilent's N2X, with transmitting port ge-2/3/1 and receiving port ge-3/4/4, as the tester.
Figure 7.4 Latency Setup: the N2X (ge-2/3/1, 11.22.1.2; ge-3/4/4, 11.22.2.2) connects to the MX480 (ge-5/3/5, 11.22.1.1; ge-5/3/7, 11.22.2.1); the MX480 also connects to the Cisco ESM (Ports 18 and 20) in the IBM BladeCenter, providing both a device latency path and an end-to-end latency path
In the first test scenario, the N2X and MX480 connections, represented by the dashed line, are made from the sending port (ge-2/3/1) of the N2X to the receiving port (ge-5/3/5) of the MX480, and from the sending port (ge-5/3/6) of the MX480 back to the receiving port (ge-3/4/4) of the tester. In the second test scenario, the connections among the N2X, the MX480, and Cisco's ESM (represented by the solid line in Figure 7.4) occur in the following order:

- Connection from the sending port of the N2X to the receiving port of the MX480
- Connection from the sending port of the MX480 to the receiving port (Port 18) of Cisco's ESM
- Connection from the sending port (Port 20) of Cisco's ESM to the receiving port of the N2X
Chapter 8
Configuring High Availability
Routing Engine Redundancy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 Graceful Routing Engine Switchover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 Virtual Chassis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 Nonstop Active Routing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 Nonstop Bridging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126 Graceful Restart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126 In-Service Software Upgrade . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 Virtual Router Redundancy Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130 Link Aggregation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 Redundant Trunk Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
IMPLEMENTING HIGH AVAILABILITY (HA) is critical when designing a network. Operators can implement high availability using one or more of the several methods described in Chapter 3: Implementation Overview.
This chapter covers the following software-based high availability features that operators can enable in the data center:

- Routing Engine Redundancy
- Graceful Routing Engine Switchover (GRES)
- Virtual Chassis
- Nonstop Routing (NSR)
- Nonstop Bridging (NSB)
- Graceful Restart (GR)
- In-Service Software Upgrade (ISSU)
- Virtual Router Redundancy Protocol (VRRP)
- Link Aggregation (LAG)
- Redundant Trunk Group (RTG)

Enabling any one, or a combination, of the features listed increases the reliability of the network. This chapter first introduces Junos OS-based features such as Routing Engine redundancy, GRES, GR, NSR, NSB, and ISSU that are critical to implementing high availability in the data center. Reliability features such as VRRP, RTG, and LAG are implemented over these key high availability elements.
Routing Engine Redundancy

1. Configure automatic failover from the active to the backup Routing Engine, without any interruption to packet forwarding, at the [edit chassis redundancy] hierarchy level. The triggers are either the detection of a hard disk error or a loss of keepalives from the primary Routing Engine:
[edit chassis redundancy]
failover on-disk-failure;
failover on-loss-of-keepalives;
2. Specify the threshold time interval for the loss of keepalives, after which the backup Routing Engine takes over from the primary Routing Engine. By default, the failover occurs after 300 seconds when Graceful Routing Engine Switchover is not configured.
[edit chassis redundancy] keepalive-time seconds;
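For example, a minimal sketch (the 10-second threshold is an assumed value, not from the validated setup) that combines the failover trigger with a tightened keepalive threshold:

[edit chassis redundancy]
failover on-loss-of-keepalives;
keepalive-time 10;    // assumed threshold; failover occurs after this many seconds without keepalives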
3. Configure automatic switchover to the backup Routing Engine following a software process failure by including the failover other-routing-engine statement at the [edit system processes process-name] hierarchy level:
[edit system processes] <process-name> failover other-routing-engine;
4. The Routing Engine mastership can be manually switched using the following CLI commands:
- request chassis routing-engine master acquire (on the backup Routing Engine)
- request chassis routing-engine master release (on the primary Routing Engine)
- request chassis routing-engine master switch (on either the primary or backup Routing Engine)
It is important to note that graceful Routing Engine switchover only offers Routing Engine redundancy, not router level redundancy. Traffic flows through the router for a short interval during the Routing Engine switchover. However, the traffic is dropped as soon as any of the routing protocol timers expire and the neighbor relationship with the upstream router ends. To avoid this situation, operators must apply graceful Routing Engine switchover in conjunction with Graceful Restart (GR) protocol extensions. NOTE Although graceful Routing Engine switchover is available on many other platforms, with respect to the scope of this handbook, graceful Routing Engine switchover is available only on the MX Series and EX8200 platforms. Figure 8.1 shows a primary and backup Routing Engine exchanging keepalive messages.
Figure 8.1 Primary and Backup Routing Engines Exchanging Keepalive Messages
For details concerning GR, see the Graceful Restart section on page 126.
1. Graceful Routing Engine switchover can be configured at the [edit chassis redundancy] hierarchy level:
[edit chassis redundancy] graceful-switchover;
2. The operational show system switchover command can be used to check the graceful Routing Engine switchover status on the backup Routing Engine:
{backup}
chandra@HE-Routing Engine-1-MX480-194> show system switchover
Graceful switchover: On
Configuration database: Ready
Kernel database: Ready
Peer state: Steady State
Virtual Chassis
Routing Engines are built into the EX Series chassis. In this case, Routing Engine redundancy can be achieved by connecting and configuring two (or up to ten) EX switches as a part of a virtual chassis. This virtual chassis operates as a single network entity and consists of designated primary and backup switches. Routing Engines on each of these two switches then become the master and backup Routing Engines of the virtual chassis, respectively. The rest of the switches of
the virtual chassis assume the role of line cards. The master Routing Engine on the primary switch manages all the other switches that are members of the virtual chassis and has full control of the configuration and processes. It receives and transmits routing information, builds and maintains routing tables, and communicates with interfaces and the forwarding components of the member switches. The backup switch acts as the backup Routing Engine of the virtual chassis and takes over as the master when the primary Routing Engine fails. The virtual chassis uses GRES and NSR to recover from control plane failures. Operators can physically connect individual chassis using either virtual chassis extension cables or 10G/1G Ethernet links.

Using graceful Routing Engine switchover on a virtual chassis enables the interface and kernel states to be synchronized between the primary and backup Routing Engines. This allows the switchover between the primary and backup Routing Engines to occur with minimal disruption to traffic. The graceful Routing Engine switchover behavior on the virtual chassis is similar to the description in the Graceful Routing Engine Switchover section on page 121. When graceful Routing Engine switchover is not enabled, the line card switches of the virtual chassis initialize to the boot-up state before connecting to the backup Routing Engine that takes over as the master when failover occurs. Enabling graceful Routing Engine switchover eliminates the need for the line card switches to re-initialize their state; instead, they resynchronize their state with the new master Routing Engine, ensuring minimal disruption to traffic.

Some of the resiliency features of a virtual chassis include the following:

- A software upgrade either succeeds or fails on all of the switches belonging to the virtual chassis, never on only some of them.
- Virtual chassis fast failover, a hardware mechanism that automatically reroutes traffic and reduces traffic loss when a link failure occurs.
- Virtual chassis split and merge, which causes the virtual chassis configuration to split into two separate virtual chassis when member switches fail or are removed.

Figure 8.2 shows a virtual chassis that consists of three EX4200 switches: EX-6, EX-7, and EX-8. A virtual chassis cable connects the switches to each other, ensuring that the failure of one link does not cause a virtual chassis split.
Figure 8.2 A Virtual Chassis of Three EX4200 Switches: EX-8 (primary), EX-7 (backup), and EX-6 (line card)
The show virtual-chassis CLI command displays the status of a virtual chassis, including its master, backup, and line card members. Here, three EX4200 switches are connected and configured to form a virtual chassis. Each switch has a member ID and sees the other two switches as its neighbors when the virtual chassis is fully functional. The master and backup switches are assigned the same priority (130) to ensure non-revertive behavior after the master recovers.
show virtual-chassis
Virtual Chassis ID: 555c.afba.0405
                                              Mastership          Neighbor List
Member ID  Status  Serial No     Model        priority  Role      ID  Interface
0 (FPC 0)  Prsnt   BQ0208376936  ex4200-48p   128       Linecard   1  vcp-0
                                                                   2  vcp-1
1 (FPC 1)  Prsnt   BQ0208376979  ex4200-48p   130       Backup     2  vcp-0
                                                                   0  vcp-1
2 (FPC 2)  Prsnt   BQ0208376919  ex4200-48p   130       Master*    0  vcp-0
                                                                   1  vcp-1
Member ID for next new member: 0 (FPC 0)
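A minimal sketch of the corresponding mastership-priority configuration (the member numbers are assumed to match the output above):

[edit virtual-chassis]
member 1 {
    mastership-priority 130;    // backup
}
member 2 {
    mastership-priority 130;    // master; equal priorities give non-revertive behavior
}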
Use one of the following operational CLI commands to define the 10G/1G Ethernet ports that are used only for virtual chassis inter-member connectivity:

request virtual-chassis vc-port set pic-slot 1 port 0
request virtual-chassis vc-port set pic-slot 1 port 1
Nonstop Active Routing

Juniper Networks recommends enabling NSR in conjunction with graceful Routing Engine switchover because this maintains the forwarding plane information during the switchover. State information for a protocol that is not supported by NSR is not preserved across a switchover and must be refreshed using the normal recovery mechanism inherent to the protocol. Automatic route distinguishers for multicast can be enabled simultaneously with NSR. It is not necessary to start the primary and backup Routing Engines at the same time: activating a backup Routing Engine at any time automatically synchronizes it with the primary Routing Engine. For further details, refer to the Junos High Availability Guide for the latest Junos software version at www.juniper.net/techpubs/en_US/junos10.1/information-products/topic-collections/swconfig-high-availability/noframes-collapsedTOC.html.

Configuring Nonstop Active Routing

1. Enable graceful Routing Engine switchover under the chassis stanza.
[edit chassis redundancy] graceful-switchover;
2. Enable nonstop active routing at the [edit routing-options] hierarchy level:

[edit routing-options]
nonstop-routing;

3. When operators enable NSR, they must synchronize configuration changes on both Routing Engines.
[edit system] commit synchronize;
4. A switchover to the backup Routing Engine must occur when the routing protocol process (rpd) fails three times consecutively, in rapid intervals. For this to occur, the following statement must be included.
[edit system processes]
routing failover other-routing-engine;
5. Operators must add the following command to achieve synchronization between the Routing Engines after configuration changes.
[edit system] commit synchronize
6. Operators can use the following operational command to verify if NSR is enabled and active.
show task replication
Nonstop Bridging
Nonstop Bridging (NSB) enables a switchover between the primary and backup Routing Engines without losing Layer 2 Control Protocol (L2CP) information. NSB is similar to NSR in that it preserves interface and kernel information. The difference is that NSB saves the Layer 2 control information by running the Layer 2 Control Protocol process (l2cpd) on the backup Routing Engine. For NSB to function, operators must enable graceful Routing Engine switchover. The following Layer 2 control protocols support NSB:

- Spanning Tree Protocol (STP)
- Rapid STP (RSTP)
- Multiple STP (MSTP)

Configuring Nonstop Bridging

1. Enable graceful Routing Engine switchover under the chassis stanza.
[edit chassis redundancy]
graceful-switchover;

2. Explicitly enable NSB:

[edit protocols layer2-control]
nonstop-bridging;
NOTE
It is not necessary to start the primary and backup Routing Engines at the same time. When NSB is enabled, activating a backup Routing Engine at any time automatically synchronizes it with the primary Routing Engine.
Graceful Restart
A service disruption forces the routing protocols on a router to recalculate peering relationships, protocol-specific information, and routing databases. Disruptions due to an unprotected restart of a router can cause route flapping, greater protocol reconvergence times, or forwarding delays, ultimately resulting in dropped packets. Graceful Restart (GR) alleviates this situation, acting as an extension to the routing protocols. A router with GR extensions can take either a restarting or a helper role. These extensions provide the neighboring routers with the status of a router when a failure occurs: when a failure occurs on a router, the GR extensions signal the neighboring routers that a restart is in progress, which prevents the neighbors from sending network updates to the router for the duration of the graceful restart wait interval. A router with GR enabled must negotiate GR support with its neighbors at the start of a routing session. The primary advantages of GR are uninterrupted packet forwarding and temporary suppression of all routing protocol updates.
NOTE
A helper router undergoing a Routing Engine switchover drops any GR wait state that it may be in and propagates the adjacency's state change to the network. GR support is available for routing/MPLS-related protocols and Layer 2 or Layer 3 VPNs. See Table B.3 in Appendix B of this handbook for a list of GR protocols supported on the MX and EX Series platforms.

Configuring Graceful Restart

1. Enable GR either globally or at specific protocol levels. When configuring at the global level, operators must use the routing-options hierarchy. The restart duration specifies the length of the GR period.
NOTE
The GR helper mode is enabled by default, even when GR itself is not enabled. If necessary, the GR helper mode can be disabled on a per-protocol basis. If GR is enabled globally, it can still be disabled for each individual protocol if required.
[edit routing-options]
graceful-restart restart-duration seconds;
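As a minimal sketch (the 180-second duration is an assumed value, not from the validated setup), GR can be enabled globally while helper mode is disabled for an individual protocol, here OSPF:

[edit routing-options]
graceful-restart {
    restart-duration 180;    // assumed value; range and default vary by release
}
[edit protocols ospf]
graceful-restart {
    helper-disable;    // disable GR helper mode for OSPF only
}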
In-Service Software Upgrade

An ISSU can be performed in one of the following ways:

- Upgrading and rebooting both Routing Engines automatically: Both Routing Engines are upgraded to the newer version of software and then rebooted automatically.
- Upgrading both Routing Engines and then manually rebooting the new backup Routing Engine: The original backup Routing Engine is rebooted first after the upgrade to become the new primary Routing Engine. Following this, the original primary Routing Engine must be rebooted manually for the new software to take effect. The original primary Routing Engine then becomes the backup Routing Engine.
- Upgrading and rebooting only one Routing Engine: In this case, the original backup Routing Engine is upgraded and rebooted and becomes the new primary Routing Engine. The former primary Routing Engine must be upgraded and rebooted manually.

MORE For more details on performing an ISSU using the above-listed methods, see Appendix A of this handbook.

Verifying Conditions and Tasks Prior to an ISSU Operation

1. Verify that the primary and backup Routing Engines are running the same software version using the show version invoke-on all-routing-engines CLI command:
{master}
chandra@MX480-131-0> show version invoke-on all-routing-engines
re0:
--------------------------------------------------------------------------
Hostname: MX480-131-0
Model: mx480
JUNOS Base OS boot [10.0R1.8]
JUNOS Base OS Software Suite [10.0R1.8]
JUNOS Kernel Software Suite [10.0R1.8]
JUNOS Crypto Software Suite [10.0R1.8]
JUNOS Packet Forwarding Engine Support (M/T Common) [10.0R1.8]
JUNOS Packet Forwarding Engine Support (MX Common) [10.0R1.8]
JUNOS Online Documentation [10.0R1.8]
JUNOS Voice Services Container package [10.0R1.8]
JUNOS Border Gateway Function package [10.0R1.8]
JUNOS Services AACL Container package [10.0R1.8]
JUNOS Services LL-PDF Container package [10.0R1.8]
JUNOS Services Stateful Firewall [10.0R1.8]
JUNOS AppId Services [10.0R1.8]
JUNOS IDP Services [10.0R1.8]
JUNOS Routing Software Suite [10.0R1.8]
re1:
--------------------------------------------------------------------------
Hostname: MX480-131-1
Model: mx480
JUNOS Base OS boot [10.0R1.8]
JUNOS Base OS Software Suite [10.0R1.8]
JUNOS Kernel Software Suite [10.0R1.8]
JUNOS Crypto Software Suite [10.0R1.8]
JUNOS Packet Forwarding Engine Support (M/T Common) [10.0R1.8]
JUNOS Packet Forwarding Engine Support (MX Common) [10.0R1.8]
JUNOS Online Documentation [10.0R1.8]
JUNOS Voice Services Container package [10.0R1.8]
JUNOS Border Gateway Function package [10.0R1.8]
JUNOS Services AACL Container package [10.0R1.8]
JUNOS Services LL-PDF Container package [10.0R1.8]
JUNOS Services Stateful Firewall [10.0R1.8]
JUNOS AppId Services [10.0R1.8]
JUNOS IDP Services [10.0R1.8]
JUNOS Routing Software Suite [10.0R1.8]
2. Verify that graceful Routing Engine switchover and NSR are enabled using the show system switchover and show task replication commands. 3. BFD timer negotiation can be disabled explicitly during the ISSU activity using the [edit protocols bfd] hierarchy:
[edit protocols bfd] no-issu-timer-negotiation;
4. Perform a software backup on each Routing Engine using the request system snapshot CLI command:
{master}
chandra@MX480-131-0> request system snapshot
Verifying compatibility of destination media partitions...
Running newfs (899MB) on hard-disk media / partition (ad2s1a)...
Running newfs (99MB) on hard-disk media /config partition (ad2s1e)...
Copying /dev/ad0s1a to /dev/ad2s1a .. (this may take a few minutes)
Copying /dev/ad0s1e to /dev/ad2s1e .. (this may take a few minutes)
The following filesystems were archived: / /config
Verifying a Unified ISSU

Execute the show chassis in-service-upgrade command on the primary Routing Engine to verify the status of the FPCs and their corresponding PICs after the most recent ISSU activity.
Virtual Router Redundancy Protocol

Figure 8.3 A VRRP Group with Virtual Address 172.1.1.10/16 Spanning EX8200-1, EX8200-2, and EX4200-0
For VRRP configuration details, refer to the Junos High Availability Guide at www.juniper.net/techpubs/software/junos/junos90/swconfig-high-availability/high-availability-overview.html.

VRRP Configuration Diagram

Figure 8.4 shows a sample VRRP network scenario. In this scenario, two EX4200 devices (EX4200-A and EX4200-B) are configured as part of a VRRP group.
NOTE
Although this VRRP sample scenario uses EX4200 devices, it is possible to configure VRRP groups consisting of other device combinations, such as:

- EX8200 and EX4200
- EX8200 and MX480
- MX480 and MX480
- EX8200 and EX8200

Figure 8.4 shows devices EX8200-A and EX8200-B, and MX480-A and MX480-B, to illustrate the choices of different platforms when configuring VRRP in the network.
Figure 8.4 VRRP Configuration Diagram. Configuration options: MX480-A and MX480-B, MX480-A and EX4200-B, MX480-A and EX8200-B, EX8200-A and EX8200-B, EX8200-A and EX4200-B. The MX480 (ge-5/3/5, 11.22.5.1/24) connects to EX4200-A (ge-0/0/11, 11.22.1.11/24, primary path) and EX4200-B (ge-0/0/31, 11.22.1.31/24, backup path), which connect over trunk ports 18 and 19 to an IBM BladeCenter (MM1/MM2, Eth0, Eth1, SoL) via Cisco ESM/IBM Power5/Power6/IBM x3500 servers
The virtual address assigned to the EX4200 group discussed here is 11.22.1.1. The two devices and the IBM blade servers physically connect to the same broadcast domain. EX4200-A is elected as the primary, so the path from the servers through the Cisco ESM to EX4200-A is the preferred primary path. The link between the Cisco ESM and EX4200-B is the backup path.

NOTE Cisco's ESM, included in the IBM BladeCenter, is a Layer 2 switch that does not support VRRP, but it serves as an access network layer switch connected to routers that use VRRP. Other switch modules for the IBM BladeCenter support Layer 3 functionality but are out of the scope of this book.
Configuring VRRP

To configure VRRP on the sample network, perform the following steps:

1. Create two trunk ports on Cisco's ESM. Assign an internal eth0 port on Blade[x] to the same network as VRRP, for example 11.22.1.x.
2. Add a router with a Layer 3 address that is reachable from the 11.22.1.x network on the BladeCenter. In this case, the MX480 acts as a Layer 3 router that connects to both EX4200-A and EX4200-B through the 11.22.2.x and 11.22.3.x networks, respectively.
3. This Layer 3 MX480 router also terminates the 11.22.5.x network via interface ge-5/3/5 with family inet address 11.22.5.1.
4. Verify that this address is reachable from the blade server by configuring the default gateway to be either 11.22.1.11 (ge-0/0/11) or 11.22.1.31 (ge-0/0/31).
5. Configure VRRP between the two interfaces ge-0/0/11 (EX4200-A) and ge-0/0/31 (EX4200-B). The virtual address (known as the vrrp-id) is 11.22.1.1, with ge-0/0/11 on EX4200-A set to have the higher priority.

Verify operation on the sample network by performing the following steps:

1. Reconfigure the default route on 11.22.1.60 (blade server) to 11.22.1.1 (the VRRP virtual router address).
2. Confirm that 11.22.5.1 is reachable from 11.22.1.60 and vice versa. Perform a traceroute to ensure that the next hop is 11.22.1.11 on EX4200-A.
3. Either lower the priority on EX4200-A or administratively disable the interface ge-0/0/11 to simulate an outage of EX4200-A.
4. Confirm that pings from 11.22.1.60 to 11.22.5.1 still work but use the backup path through EX4200-B.
5. Perform a traceroute to confirm that the backup path is being used.

NOTE The traceroute command can be used for confirmation in both directions, to and from the BladeCenter.

VRRP Configuration Snippet

The following VRRP configuration snippet shows the minimum configuration required on the EX Series to enable a VRRP group.
// Configure the interface ge-0/0/31 on EX4200-B with an IP address of 11.22.1.31/24 on logical unit 0.
// Define a VRRP group with a virtual IP of 11.22.1.1 and a priority of 243.
show configuration interfaces ge-0/0/31
unit 0 {
    family inet {
        address 11.22.1.31/24 {
            vrrp-group 1 {
                virtual-address 11.22.1.1;
                priority 243;
                preempt {
                    hold-time 0;
                }
                accept-data;
            }
        }
    }
}
// Interface ge-0/0/36 to the MX480 with an IP of 11.22.2.36/24
show configuration interfaces ge-0/0/36
unit 0 {
    family inet {
        address 11.22.2.36/24;
    }
}
// Configure the interface ge-0/0/11 on EX4200-A with an IP address of 11.22.1.11/24 on logical unit 0.
// Define a VRRP group with a virtual IP of 11.22.1.1 and a priority of 240.
show configuration interfaces ge-0/0/11
unit 0 {
    family inet {
        address 11.22.1.11/24 {
            vrrp-group 1 {
                virtual-address 11.22.1.1;
                priority 240;
                preempt {
                    hold-time 0;
                }
                accept-data;
            }
        }
    }
}
VRRP Configuration Hierarchy for IPv4

VRRP statements are included at the interface hierarchy level, as follows:
[edit interfaces interface-name unit <unit-number> family inet address address]
vrrp-group group-id {
    (accept-data | no-accept-data);
    advertise-interval seconds;
    authentication-key key;
    authentication-type authentication;
    fast-interval milliseconds;
    (preempt | no-preempt) {
        hold-time seconds;
    }
    priority number;
    track {
        interface interface-name {
            priority-cost priority;
            bandwidth-threshold bits-per-second {
                priority-cost priority;
            }
        }
        priority-hold-time seconds;
        route prefix routing-instance instance-name {
            priority-cost priority;
        }
    }
    virtual-address [ addresses ];
}
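As a sketch of the track option (the tracked interface and the priority-cost value are assumptions for illustration), VRRP can lower a router's priority when its uplink fails, triggering a failover to the backup:

[edit interfaces ge-0/0/11 unit 0 family inet address 11.22.1.11/24]
vrrp-group 1 {
    virtual-address 11.22.1.1;
    priority 240;
    track {
        interface ge-0/0/36 {    // assumed uplink toward the MX480
            priority-cost 50;    // assumed cost; subtracted from the priority on link failure
        }
    }
}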
Configuring VRRP for IPv6 (MX Series Platform Only)

As mentioned earlier, operators can configure VRRP for IPv6 on the MX platform. To configure VRRP for IPv6, include the following statements at this hierarchy level:
[edit interfaces interface-name unit <unit-number> family inet6 address address]
vrrp-inet6-group group-id {
    (accept-data | no-accept-data);
    fast-interval milliseconds;
    inet6-advertise-interval seconds;
    (preempt | no-preempt) {
        hold-time seconds;
    }
    priority number;
    track {
        interface interface-name {
            priority-cost priority;
            bandwidth-threshold bits-per-second {
                priority-cost priority;
            }
        }
        priority-hold-time seconds;
        route prefix routing-instance instance-name {
            priority-cost priority;
        }
    }
    virtual-inet6-address [ addresses ];
    virtual-link-local-address ipv6-address;
}
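A minimal sketch of an IPv6 VRRP group on the MX (all addresses here are illustrative documentation-prefix values, not from the validated setup):

[edit interfaces ge-5/3/5 unit 0 family inet6 address 2001:db8:22:1::2/64]
vrrp-inet6-group 1 {
    virtual-inet6-address 2001:db8:22:1::1;    // assumed virtual address
    virtual-link-local-address fe80::1:1;      // a link-local virtual address is required for IPv6 VRRP
    priority 240;
    accept-data;
}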
Link Aggregation
Link Aggregation (LAG) is a feature that aggregates two or more physical Ethernet links into one logical link to obtain higher bandwidth and to provide redundancy. LAG provides high link availability and capacity, which results in improved performance and availability. Traffic is balanced across all links that are members of an aggregated bundle, so the failure of a member link does not cause traffic disruption; traffic continues over the remaining active links. LAG is defined by the IEEE 802.3ad standard and can be used in conjunction with the Link Aggregation Control Protocol (LACP). Using LACP, multiple physical ports can be bundled together to form a logical channel, and enabling LACP on two peers that participate in a LAG group enables them to exchange LACP packets and negotiate the automatic bundling of links.

NOTE LAG can be enabled on interfaces spread across multiple chassis; this is known as Multichassis LAG (MC-LAG). This means that the member links of a bundle can terminate on multiple chassis instead of a single chassis. Currently, MC-LAG support exists only on the MX platforms.
Some points to note with respect to LAG:

- Ethernet links between two points support LAG.
- A maximum of 16 Ethernet interfaces can be included within a LAG on the MX Series platforms. The LAG can consist of interfaces that reside on different Flexible PIC Concentrator (FPC) cards in the same MX chassis; however, these interface links must be of the same type.
- The EX Series platforms support a maximum of 8 Ethernet interfaces in a LAG. In the case of an EX4200-based virtual chassis, the interfaces that belong to a LAG can be on different member switches of the virtual chassis.

Link Aggregation Configuration Diagram

Figure 8.5 shows a sample link aggregation and load balancing setup. In this configuration, LAG is enabled on the interfaces between the MX480 and the Cisco ESM switch on the IBM BladeCenter, thus bundling the physical connections into one logical link.
Figure 8.5 Link Aggregation Setup: an aggregated Ethernet bundle (ge-5/0/1 and ge-5/0/5 on the EX and MX Series DUT) connects as an EtherChannel to ports 17 and 18 of the Cisco ESM in the IBM BladeCenter; the N2X tester connects to the MX480 on ge-3/4/4
The EX8200 or any of the MX Series devices can be used instead of the MX480, as shown in Figure 8.5.

Link Aggregation Configuration Hierarchy

This section describes the steps involved in configuring and verifying LAG on the test network. A physical interface can be associated with an aggregated Ethernet interface on the EX and MX Series platforms. Enable the aggregated link as follows:

1. At the [edit chassis] hierarchy level, configure the maximum number of aggregated devices available on the system:
aggregated-devices {
    ethernet {
        device-count X;
    }
}
NOTE
Here X refers to the number of aggregated interfaces (0 through 127).

2. At the [edit interfaces interface-name] hierarchy level, include the 802.3ad statement:
[edit interfaces interface-name (fastether-options | gigether-options)] 802.3ad aeX;
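For example (the member interface and bundle name here are assumed for illustration), binding ge-5/0/1 to bundle ae0:

[edit interfaces ge-5/0/1]
gigether-options {
    802.3ad ae0;    // assumed bundle name; ae0 itself must also be defined under [edit interfaces], as noted in step 3
}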
3. A statement defining aeX also must be included at the [edit interfaces] hierarchy level. 4. Some of the physical properties that specifically apply to aggregated Ethernet interfaces also can be configured:
chandra@HE-Routing Engine-1-MX480> show configuration interfaces aeX
aggregated-ether-options {
    minimum-links 1;
    link-speed 1g;
    lacp {
        active;
        periodic fast;
    }
}
unit 0 {
    family bridge {
        interface-mode trunk;
        vlan-id-list 1122;
    }
}
An aggregated Ethernet interface can be deleted from the configuration by issuing the delete interfaces aex command at the [edit] hierarchy level in configuration mode.
[edit] user@host# delete interfaces aeX
NOTE
When an aggregated Ethernet interface is deleted from the configuration, Junos removes the configuration statements related to aeX and sets this interface to the DOWN state. However, the aggregated Ethernet interface is not deleted until the chassis aggregated-devices ethernet device-count configuration statement is deleted.

Forwarding Options in LAG (MX480 Only)

By default, hash-key algorithms use the interface as the default parameter to generate hash keys for load distribution. Forwarding options must be configured to achieve load balancing based on source and destination IP, source and destination MAC, or any other combination of Layer 3 or Layer 4 parameters.
NOTE
Although EX Series Platforms can also perform hash-key based load balancing as of release 9.6R1.13, they do not have the flexibility to configure the criteria for hashing.
hash-key {
    family multiservice {
        source-mac;
        destination-mac;
    }
}
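A sketch of extending the hash key to Layer 3/Layer 4 payload fields (a common MX option; treat the exact statements as an assumption to be verified against your release):

[edit forwarding-options]
hash-key {
    family multiservice {
        source-mac;
        destination-mac;
        payload {
            ip {
                layer-3;    // include source/destination IP addresses in the hash
                layer-4;    // include source/destination ports in the hash
            }
        }
    }
}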
Link Failover Scenarios: LAG with LACP and NSR

Link failover between members of a LAG on the MX480 can occur in conjunction with different combinations of LACP and NSR. Various failure scenarios, such as Routing Engine, FPC, and switch fabric failover, and system upgrade with and without ISSU, are possible for each of the LACP/NSR combinations. The different LACP/NSR combinations on the MX480 include the following:

- LACP Enabled, NSR Enabled
- LACP Enabled, NSR Disabled
- LACP Disabled, NSR Enabled
- LACP Disabled, NSR Disabled

Table B.3 and Table B.4 in Appendix B of this handbook provide detailed LAG testing results based on the scenarios listed above.
The salient test results, listed in Appendix B, are as follows: Enabling LACP provided seamless recovery from Routing Engine failover on the MX480. The Routing Engine took approximately 20 seconds to recover from a failure with LACP disabled, as opposed to no disruption when it was enabled. FPCs with only one LAG interface recovered more quickly (in 1.5 seconds) than FPCs with two interfaces (approximately 55 seconds). The switch fabric recovered immediately after a failure in all the scenarios. A similar validation was performed using the EX4200 instead of the MX480; in this case, enabling or disabling LACP did not make a difference. The following scenarios were validated:

- Routing Engine Failover
- FPC Failover (two LAG links and an interface to the traffic generator)
- Switch Fabric Failover
- System Upgrade (without ISSU or graceful Routing Engine switchover)
- System Upgrade (without ISSU, with graceful Routing Engine switchover)

MORE Table C.1 and Table C.4 in Appendix C of this handbook provide detailed LAG test results using the EX4200 and MX480.
Redundant Trunk Group

Figure 8.6 An EX Series Switch with an Active Link to Switch 1 and a Backup Link to Switch 2
Figure 8.7 shows an EX Series switch that has two links to Switch 1. RTG is configured on the EX Series switch so that one of the links to Switch 1 is active and forwards traffic while the other link acts as the backup. The backup link starts forwarding traffic to Switch 1 when the active link fails.

NOTE In this scenario, it may be more efficient in terms of bandwidth and availability to use LAG instead of RTG. LAG provides better use of bandwidth and faster recovery because there is no flushing and relearning of MAC addresses.
Figure 8.7 An EX Series Switch with an Active Link and a Backup Link to the Same Switch (Switch 1)
Based on these two scenarios, RTG can be used to control the flow of traffic over links from a single switch to one or more destination switches while providing link redundancy. This feature is enabled on a physical interface and is functionally similar to STP. However, RTG and STP are mutually exclusive on a physical port: Junos does not permit the same interface to be part of both an RTG and STP simultaneously. The significance of RTG is local rather than network-wide, since decisions are made locally on the switch. Typically, RTG is implemented on an access switch, or on a virtual chassis, that is connected to two or more devices that do not operate as a virtual chassis or multichassis system and do not use STP. It is configured between the access and core layers in a two-tier data center architecture, or between the access and aggregation layers in a three-tier model. There can be a maximum of 16 RTGs in a standalone switch or in a virtual chassis. Both the RTG active and backup links must be members of the same VLANs.

NOTE Junos does not allow the configuration to take effect if there is a mismatch of VLAN IDs between the links belonging to an RTG.
Figure 8.8 shows a sample two-tier architecture with RTG and LAG enabled between the access-to-core and access-to-server layers. The core consists of two MX Series devices: MX480-A and MX480-B. Two EX4200-based virtual chassis (EX4200 VC-A and EX4200 VC-B) and EX8200-A and EX8200-B form the access layer. There are connections from each of the access layer devices to MX480-A and MX480-B, respectively.
Figure 8.8 Two-Tier Architecture with RTG and LAG: core devices MX480-A and MX480-B; access layer of two EX4200 virtual chassis (members EX-1 through EX-6) connected to the core over LAG bundles (for example, ae1) and RTG links (for example, ae4)
We enable LAG and RTG on these links to ensure redundancy and to control traffic flow. We enable LAG on the access devices for links between the following devices:

- A-ae1 (EX4200 VC-A -> MX480-A)
- A-ae2 (EX4200 VC-A -> MX480-B)
- B-ae1 (EX4200 VC-B -> MX480-A)
- B-ae2 (EX4200 VC-B -> MX480-B)
- EX-A-ae1 (EX8200-A -> MX480-A)
- EX-A-ae2 (EX8200-A -> MX480-B)
- EX-B-ae1 (EX8200-B -> MX480-A)
- EX-B-ae2 (EX8200-B -> MX480-B)

In addition, we configure LAG on EX8200-A and EX8200-B to provide aggregation on links to the IBM PowerVM servers. We enable RTG on EX4200 VC-A and VC-B so that links AL-A and AL-B to MX480-A are active and are used to forward traffic. The set of backup links, RL-A and RL-B, from the virtual chassis to MX480-B takes over the traffic forwarding activity when the active links fail.
Configuration Details
To configure a redundant trunk link, an RTG first must be created. As stated earlier, an RTG can be configured on the access switch that has two links: a primary (active) link and a secondary (backup) link. The secondary link automatically starts forwarding data traffic when the active link fails. Execute the following commands to configure the RTG and to disable RSTP on the EX switches. Define the RTG on the LAG interface ae1:
set ethernet-switching-options redundant-trunk-group group DC_RTG interface ae1
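A fuller sketch of what this typically looks like (the second bundle ae4, the primary designation, and the RSTP disable statements are assumptions based on the topology in Figure 8.8, not a verbatim excerpt from the validated setup):

set ethernet-switching-options redundant-trunk-group group DC_RTG interface ae1 primary
set ethernet-switching-options redundant-trunk-group group DC_RTG interface ae4
set protocols rstp interface ae1 disable
set protocols rstp interface ae4 disable

Marking ae1 primary makes it the preferred active link; disabling RSTP on both bundles is necessary because RTG and STP are mutually exclusive on a port.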
Appendices
Adapter Type, Virtualization, and Configuration Tasks

- Host Ethernet Adapter (HEA), IBM PowerVM: Use the HMC to allocate the physical NIC to a partition. The adapter configuration in the partition depends on the OS, including RHEL, SUSE, and AIX.
- Logical Host Ethernet Adapter (LHEA), IBM PowerVM: Use the HMC to allocate the virtual Ethernet adapter to each partition. The adapter configuration in the partition depends on the OS, including RHEL, SUSE, and AIX.
- Shared Ethernet Adapter (SEA), IBM PowerVM: Use the HMC to allocate the interface to the VIOS, then use VIOS commands to configure the SEA.
- Interfaces in the Ethernet Pass-Through Module, IBM PowerVM: Use the BladeCenter Management Module (GUI) to allocate the interface to the blade server. The interface configuration in the blade server depends on the OS, including RHEL, SUSE, AIX, and Windows.
- Physical NIC, IBM x3500: The physical NIC configuration depends on the OS, including RHEL, SUSE, AIX, and Windows.
NOTE
Some of these commands change IP address settings immediately, while others require a restart of the network service. Not all tools save changes in the configuration database, which means that the changes may not be preserved after a server reboot.
In addition, several other commands can be helpful, as listed in Table A.2.

Table A.2 Commands

ethtool
kudzu
ifconfig
The following is a sample ifconfig command that configures an interface with a fixed IP address:
# ifconfig eth0.5 192.168.1.100 netmask 255.255.255.0 broadcast 192.168.1.255 up
vconfig adds or removes a VLAN interface. When vconfig adds a VLAN interface, a new logical interface is formed from the base interface name and the VLAN ID. Below is a sample vconfig command to add a VLAN 5 interface on the eth0 interface:

#vconfig add eth0 5

The eth0.5 interface configuration file is created in /etc/sysconfig/network-scripts/ifcfg-eth0.5.
- service network restart: restarts networking.
- system-config-network: launches a GUI-based network administration tool for configuring interfaces.
- route: inspects the routing table or adds a static route. A static route added by the route command is not persistent after a system reboot or network service restart.
- netstat: checks network configuration and activity. For instance, netstat -i shows interface statistics; netstat -r shows routing table information.
- ping: checks network connectivity.
- traceroute: traces the route that packets take across an IP network to a given host.

For further details concerning these commands, refer to the Red Hat Linux Reference Guide at www.redhat.com/docs/manuals/linux/RHL-9-Manual/pdf/rhl-rg-en-9.pdf.
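For example, a static route can be added as follows (the network and gateway addresses here are assumed for illustration):

# route add -net 192.168.2.0 netmask 255.255.255.0 gw 192.168.1.1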
For further details concerning the SUSE Linux network configuration commands, refer to Novells Command Line Utilities at www.novell.com/documentation/oes/ tcpipenu/?page=/documentation/oes/tcpipenu/data/ajn67vf.html.
mkvdev

Creates a virtual device, such as the mapping of a physical adapter to a virtual adapter when configuring a Shared Ethernet Adapter.

lsmap

Lists the mapping between virtual adapters and physical resources. For example, use the following lsmap command to list all virtual adapters attached to vhost1:

lsmap -vadapter vhost1

chdev

Changes the attributes of a device. For instance, use the following chdev command to enable jumbo frames on the ent1 device:

chdev -dev ent1 -attr jumbo_frame=yes

chtcpip

Changes the VIOS TCP/IP settings and parameters. For example, use the following command to change the current network address and mask to a new setting:

chtcpip -interface en0 -inetaddr 9.1.1.1 -netmask 255.255.255.0

lstcpip

Displays the VIOS TCP/IP settings and parameters. For example, use the following command to list the current routing table:

lstcpip -routetable

oem_setup_env

Initiates the OEM installation and setup environment so that users can install and set up software in the traditional way. For example, the oem_setup_env command can place a user in a non-restricted UNIX root shell so that the user can use AIX commands to install and set up software and use most of the AIX network commands, including lsdev, rmdev, chdev, netstat, entstat, ping, and traceroute.
For further details concerning VIOS network commands, refer to publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/iphcg/iphcg_network_commands.htm.
For details concerning Windows 2003 network commands, refer to the Windows 2003 product help at technet.microsoft.com/en-us/library/cc780339%28WS.10%29.aspx.
MX480 Link Aggregation Failover Scenarios (approximate traffic outage, in seconds)

Scenario                                              LACP Enabled/  LACP Enabled/  LACP Disabled/    LACP Disabled/
                                                      NSR Enabled    NSR Disabled   NSR Enabled       NSR Disabled
Routing Engine Failover                               0              0              ~20               ~20
FPC Failover (FPC with one link of LAG)               1.5            1.5            10                10
FPC Failover (FPC with 2 links of LAG and
interface to traffic generator)                       53 (53, 53)    -              ~63 (57, 63, 64)  -
Switch Fabric Failover                                Immediate      Immediate      Immediate         Immediate
System Upgrade without ISSU (upgrade backup
first, and then upgrade the primary)                  ~20            -              ~20               -
System Upgrade with ISSU (NSR must be enabled)        0              -              *                 -
EX8200 Link Aggregation Failover Scenarios:

- Routing Engine Failover
- FPC Failover (FPC with LAG and interface to traffic generator)
- Switch Fabric Failover
- System Upgrade (without ISSU, without GRES)
- System Upgrade (without ISSU, with GRES)
NOTE
Refer to Table B.2 when reviewing the following system upgrade steps.

Steps associated with a system upgrade (without ISSU, without GRES):

1. Break GRES between the primary and the backup device.
2. Upgrade the backup device.
3. Upgrade the primary device (observe the approximate outage, in seconds).
4. Re-establish GRES between the primary and the backup device.

Steps associated with a system upgrade (without ISSU, with GRES):

1. Break GRES between the primary and the backup device.
2. Upgrade the backup device.
3. Re-establish GRES between the primary and the backup device.
4. Reverse the roles between the primary and backup devices (the primary device becomes the backup and the backup device becomes the primary). Ignore the warning about the version mismatch.
5. Break GRES between the primary device and the backup device.
6. Upgrade the backup device.
7. Re-establish GRES between the primary and the backup device.
Method 1: Upgrading and Rebooting Both Routing Engines Automatically

1. Download the software package from the Juniper Networks Support Web site.
3. Verify the current software version on both Routing Engines, using the show version invoke-on all-routing-engines command.
4. Issue the request system software in-service-upgrade package-name reboot command on the master Routing Engine:
{master}
user@host> request system software in-service-upgrade /var/tmp/jinstall-9.0-20080114.2-domestic-signed.tgz reboot
ISSU: Validating Image
PIC 0/3 will be offlined (In-Service-Upgrade not supported)
Do you want to continue with these actions being taken ? [yes,no] (no) yes
ISSU: Preparing Backup RE
Pushing bundle to re1
Checking compatibility with configuration
. . .
ISSU: Old Master Upgrade Done
ISSU: IDLE
Shutdown NOW!
. . .
*** FINAL System shutdown message from root@host ***
System going down IMMEDIATELY
Connection to host closed.
5. Log in to the router once the new master (formerly backup Routing Engine) is online. Verify that both Routing Engines have been upgraded:
{backup} user@host> show version invoke-on all-routing-engines
6. To make the backup Routing Engine (former master Routing Engine) the primary Routing Engine, issue the following command:
{backup}
user@host> request chassis routing-engine master acquire
Attempt to become the primary routing engine ? [yes,no] (no) yes
Resolving mastership...
Complete. The local routing engine becomes the master.
{master}
user@host>
7. Issue the request system snapshot command on each of the Routing Engines to back up the system software to the router's hard disk.
Method 2: Upgrading Both Routing Engines and Manually Rebooting the New Backup Routing Engine

1. Issue the request system software in-service-upgrade command.
2. Perform steps 1 through 4 as described in Method 1. 3. Issue the show version invoke-on all-routing-engines command to verify that the new backup Routing Engine (former master) is still running the previous software image, while the new primary Routing Engine (former backup) is running the new software image:
{backup} user@host> show version
4. At this point, a choice between installing newer software or retaining the old version can be made. To retain the older version, execute the request system software delete install command.
5. To ensure that a newer version of software is activated, reboot the new backup Routing Engine, by issuing the following:
{backup}
user@host> request system reboot
Reboot the system ? [yes,no] (no) yes
Shutdown NOW!
. . .
System going down IMMEDIATELY
Connection to host closed by remote host.
6. Log in to the new backup Routing Engine and verify that both Routing Engines have been upgraded:
{backup} user@host> show version invoke-on all-routing-engines
7. To make the new backup Routing Engine the primary, issue the following command:
{backup}
user@host> request chassis routing-engine master acquire
Attempt to become the master routing engine ? [yes,no] (no) yes
8. Issue the request system snapshot command on each of the Routing Engines to back up the system software to the router's hard disk.

Method 3: Upgrading and Rebooting Only One Routing Engine

Use the request system software in-service-upgrade package-name no-old-master-upgrade command on the master Routing Engine.

1. Request an ISSU upgrade:
{master}
user@host> request system software in-service-upgrade /var/tmp/jinstall-9.0-20080116.2-domestic-signed.tgz no-old-master-upgrade
2. To install the new software version on the new backup Routing Engine, issue the request system software add command.

Troubleshooting a Unified ISSU

NOTE The following unified ISSU steps relate only to the Junos 9.6 release.

Perform the following steps if the ISSU procedure stops progressing:

1. Execute the request system software abort in-service-upgrade command on the master Routing Engine.
2. To verify that the upgrade has been aborted, check the existing router session for the following message: ISSU: aborted!
Appendix C: Acronyms
A
AFE: Application Front Ends
apsd: automatic protection switching process

B
BPDU: Bridge Protocol Data Unit
BSR: Bootstrap Router

C
CBT: Core Based Tree
CIST: Common Instance Spanning Tree
CLI: Command Line Interface
CoS: class of service

D
dcd: device control process
DDoS: Distributed Denial of Service
DHCP: Dynamic Host Configuration Protocol
DNS: Domain Name System
DSCP: Diffserv Code Points
DUT: Device Under Test
DVMRP: Distance Vector Multicast Routing Protocol

E
ESM: Ethernet Switch Module; Embedded Syslog Manager

F
FC: Fibre Channel
FCS: frame check sequence
FPC: Flexible PIC Concentrator
FSP: Flexible Service Processor

G
GRES: Graceful Routing Engine Switchover
GSL: global server load balancing

H
HBA: Host Bus Adapter
HEA: Host Ethernet Adapter
HMC: Hardware Management Console

I
IDP: Intrusion Detection and Prevention
IGMP: Internet Group Management Protocol
iSCSI: Internet Small Computer System Interface
ISSU: In-Service Software Upgrade
IVE: Instant Virtual Extranet
IVM: Integrated Virtualization Manager

L
LAG: Link Aggregation
LDAP: Lightweight Directory Access Protocol
LPAR: Logical Partitions
LHEA: Logical Host Ethernet Adapter

M
MAC: Media Access Control
MCS: Multi Core Scaling
mgd: management process
MLD: Multicast Listener Discovery
MM: Management Module
MOSPF: Multicast Open Shortest Path First
MSTI: Multiple Spanning Tree Instance
MSDP: Multicast Source Discovery Protocol
MSTP: Multiple Spanning Tree Protocol
MTA: mail transfer agent
MTTR: mean time to repair
MTU: Maximum Transmission Unit

N
NAT: Network Address Translation
NIC: Network Interface Card
NIST: National Institute of Standards and Technology
NPU: network processing unit
NSB: Nonstop Bridging
NSR: nonstop active routing

O
OEM: Original Equipment Manufacturer
OSS: operation support systems

P
PDM: Power Distribution Module
PIC: Physical Interface Card
PIM: Protocol Independent Multicast
PLP: packet loss priority
PM: Pass-through Module
PoE: Power over Ethernet
PVST: Per-VLAN Spanning Tree

Q
QoS: Quality of Service

R
RED: random early detection
ROI: return on investment
RP: rendezvous point
RPC: remote procedure call
rpd: routing protocol process
RTG: Redundant Trunk Group
RSTP: Rapid Spanning Tree Protocol
RVI: routed VLAN interface

S
SAN: storage area network
SAP: Session Announcement Protocol
SCB: Switch Control Board
SDP: Session Description Protocol
SEA: Shared Ethernet Adapter
SMT: Simultaneous Multithreading
SNMP: Simple Network Management Protocol
snmpd: simple network management protocol process
SOA: Service Oriented Architecture
SOL: Serial over LAN
SPOF: single point of failure
SSH: Secure Shell
SSL: Secure Sockets Layer
SSM: source-specific multicast
STP: Spanning Tree Protocol
syslogd: system logging process

T
TWAMP: Two-Way Active Measurement Protocol

V
VID: VLAN Identifier (IEEE 802.1Q)
VIOS: Virtual I/O Server
VLAN: Virtual LAN
VLC: VideoLAN
VPLS: virtual private LAN service
VRF: Virtual Routing and Forwarding
VRRP: Virtual Router Redundancy Protocol
VSTP: VLAN Spanning Tree Protocol

W
WPAR: Workload-based Partitioning
Appendix D: References
www.juniper.net/techpubs/software/junos/junos90/swconfig-high-availability/swconfig-high-availability.pdf
The Junos High Availability Configuration Guide, Release 9.0, presents an overview of high availability concepts and techniques. By understanding the redundancy features of Juniper Networks routing platforms and the Junos software, a network administrator can enhance the reliability of a network and deliver highly available services to customers.

IEEE 802.3ad link aggregation standard
STP: IEEE 802.1D-1998 specification
RSTP: IEEE 802.1D-2004 specification
MSTP: IEEE 802.1Q-2003 specification

www.nettedautomation.com/standardization/IEEE_802/standards_802/Summary_1999_11.html
Provides access to the IEEE 802 Organization website, with links to all 802 standards.

RFC 3768, Virtual Router Redundancy Protocol
https://datatracker.ietf.org/wg/vrrp/
Provides access to all RFCs associated with the Virtual Router Redundancy Protocol (VRRP).

RFC 2338, Virtual Router Redundancy Protocol for IPv6
https://datatracker.ietf.org/doc/draft-ietf-vrrp-ipv6-spec/
Provides access to the abstract that defines VRRP for IPv6.
"A must-read, practical guide for IT professionals, network architects and engineers, who wish to design and implement a high performance data center infrastructure. This book provides a step-by-step approach, with validated solution scenarios for integrating IBM open system servers and Juniper Networks data center network, including technical concepts and sample configurations."
Scott Stevens, VP Technology, Worldwide Systems Engineering, Juniper Networks

"This book is a valuable resource for anyone interested in designing network infrastructure for next generation data centers... It provides clear, easy to understand descriptions of the unique requirements for data communication in an IBM open systems environment. Highly recommended!"
Dr. Casimer DeCusatis, IBM Distinguished Engineer
7100125-001-EN
June 2010