
Table of Contents

Concepts and Components of the NSX for vSphere Platform
    NSX for vSphere Networking and Security Core Management and Control Components
    Management Plane - NSX for vSphere Manager
    Control Plane - NSX for vSphere Controller
    Data Plane
    NSX for vSphere Networking and Security Logical Switch Networks
    NSX for vSphere Networking and Security Dynamic Routing Capabilities using the NSX for vSphere Edge Appliance
    OSPF Protocol
    IS-IS Protocol
    BGP Protocol
    Distributed Logical Router
    Distributed Logical Router: Logical View
    NSX for vSphere Networking and Security Features of the VMware NSX Edge Services Gateway
    Network Address Translation (NAT)
    Load Balancing
    High Availability
    Virtual Private Networking
    Layer 2 Bridging
    NSX for vSphere Security
    NSX for vSphere Edge Firewall
    NSX for vSphere Distributed Firewall
    Service Composer
Introduction to the NSX Networking and Security Architecture
    Overview of VMware's NSX® Networking and Security Components
    High Level Overview
    The NSX for vSphere Platform Solution
    NSX for vSphere Manager
    NSX for vSphere Edge Appliance
    NSX for vSphere Security Features
    NSX for vSphere Edge Services Gateway VPN Services
    NSX for vSphere SSL VPN-Plus Services
    NSX for vSphere Edge Services Gateway SSL VPN-Plus Secure Management Access Server
    L2 VPN and Stretched Logical Networks
    NSX for vSphere Edge Services Gateway L2 VPN Use Case
    NSX for vSphere Edge Services Gateway L2 VPN Topology
    NSX for vSphere Edge Services Gateway Firewall
    Flow Monitoring
    NSX for vSphere Hardening Guide
Deployment of NSX for vSphere
    Installation of the Management Plane
    Installation of the Control Plane
    Installation of the Data Plane
Edge Services Gateway
    Scalability and Redundancy of the NSX Edge
    Path Determination
    Failure Scenarios of Edge Devices
    Troubleshooting and Visibility
    Useful CLI Commands for Debugging ECMP
    ECMP Deployment Considerations
    ECMP and the NSX Edge Firewall
    NSX Edge and DRS Rules
Edge NAT
    SNAT
    DNAT
    Firewall Rules and SNAT
    Firewall Rules and DNAT
    DNAT Configuration
    SNAT Configuration
    PAT
    NAT Order
Distributed Firewall
    Micro-segmentation
    How Does the Distributed Firewall Operate?
    Alignment of Slots for the Distributed Firewall
    What Happens in Case of VM Mobility?
    2-Way Traffic Introspection
    What Happens in Case of a Failure of the DFW Functionality?
    Configuring Fail-Safe for the Distributed Firewall
    Visibility and Packet Walks
    Rule Priority
    Distributed Firewall - Static Objects
    Distributed Firewall - Dynamic Objects
    Creation of Security Groups
    Security Policy
    "Applied To" Field
    Identity Firewall
    Create Identity-Based Firewall Rules
    Application Level Gateway (ALG)
    Backup and Recovery
    Working with the Distributed Firewall API
    Troubleshooting
Integration of NSX for vSphere with Cisco UCS
    Cisco Unified Computing System Architecture
    Fabric Interconnect
    Chassis
    Fabric Extender
    Blade Servers
    Network Adaptors
    UCS Manager
    Running NSX on UCS
    Hardware Requirements
    Software Requirements
    VLAN and IP Addressing
Distributed Logical Router
    Overview of the Distributed Logical Router (DLR)
    DLR Interface Types
    Logical Interfaces, Virtual MAC and Physical MAC
    DLR Kernel Module and ARP Table
    DLR and Local Routing
    Multiple Route Instances
    Logical Router Port
    Routing Information Control Plane Update Flow
    DLR Control VM Communications
    DLR High Availability
    Protocol Address and Forwarding Address
    DLR Control VM Firewall
    DLR Validation Scenarios
    DLR Control Plane Validation
    Validate Routing in the DLR
    NSX-v Controller
    Verify LIFs Exist on the NSX-v Controller
    ESXi Host
    Validate the DLR LIFs Are Present on the ESXi Host
    Validate the Forwarding Table on the ESXi Host
    Validate ARP Entries in the ESXi Host of the DLR
    Capture Packets on the VDR Port
    Dynamic Routing Verification
What's New in NSX for vSphere Version 6.2
    Introduction
    Cross vCenter Deployments through Egress Optimization
    Increased Support for Cross vCenter vMotion
    Introduction of Roles
    When Is Egress Optimization Possible?
    How Is Egress Optimization Possible?
    NSX for vSphere Networking and Security Universal Objects
    Applying Firewall Rules Using the Universal Section of the Distributed Firewall

About the Authors

Prasenjit Sarkar (@stretchcloud) is a Member of the CTO Ambassador Program & Staff Solutions Architect at VMware and part of Professional Services Center of Excellence Team where his primary role is to work directly with Product Development and Field/Partner Sales & Delivery organizations to incubate and help bring to market the next generation of deep technical architectures, solutions and services.

He is working on advanced product and solution incubation development, which is well in advance of market availability. He provides expert technical architectural support and guidance for cloud in the product incubation phase and during the development of emerging solutions and services.

He also provides training on emerging solutions and services offerings for Field Specialists & Consultants and Partner Sales & Delivery organizations.

He has also worked in the vCloud Air R&D team. He has an extensive background in designing and implementing cloud solutions. He holds several certifications including VCP3/4/5, VCAP-DCA, VCAP-DCD, VCAP-CIA, VCP-NV, and VCIX-NV. He has been awarded the VMware vExpert award for 4 years. He is the author of the blog http://stretch-cloud.info and of 4 other books, including one Amazon Best Seller. He has contributed to many inventions and research papers and has 1 granted and 11 pending patents to his name.

I would like to thank and dedicate this book to my mom and dad. Without their endless and untiring support, this book would not have been possible.

Michael Haines (@michaelahaines) is a Senior Member of Technical Staff who specializes in Cloud Networking and Security at VMware where his primary role is to architect and implement Software-Defined Data Center (SDDC) cloud, networking and security solutions. Michael also has extensive knowledge of VMware's SDKs and APIs, and in particular the NSX for vSphere API, vCloud Networking and Security API and vCloud API.

Michael also authored and co-authored various books and technical papers (over 25), including the NSX for vSphere Hardening Guide, vCloud Director Security Hardening Guide, VMware vCloud Architecture Toolkit (vCAT I and II), and was also the co-author of the first VMware Cloud published book, called Cloud Computing with VMware vCloud Director which provides use cases, design considerations, and technology guidance to answer your questions about cloud computing.

In addition to this, Michael is also a member of the DMTF's Open Cloud Standards incubator, which develops a suite of DMTF informational specifications that deliver architectural semantics to unify the interoperable management of enterprise computing and cloud computing. These DMTF specifications primarily include the Open Virtualization Format (OVF), the vCloud API and Cloud Security. Michael was also a member of the IETF, Internet Engineering Task Force and participated most noticeably in the LDAPv3 Working Group.

Prior to his current role at VMware, Michael was the Chief Architect of the Data Center Optimization and Virtualization Solutions team within the Sun Microsystems Solutions Development Group. His primary role within this group was as the Lead Architect and Engineer. Michael focused on the Data Center infrastructure architecture design, development and implementation of these solutions. In addition to being the Lead Data Center Architect and Engineer, a large part of this role also involved having a comprehensive and detailed understanding of not just Data Center Solutions, but also Virtualization (Sun/Oracle and VMware), Cloud Computing, Cloud Security, Network Security, Solaris and OpenSolaris (including internals) Security, Naming Services and Identity Management.

Michael holds certifications from VMware, Oracle, and Fortinet (Network Security Expert), and blogs on http://stretch-cloud.info.

Roie Ben Haim (@roie9876) is a consultant in VMware PSO who specializes in Networking and Security and who is currently focused on implementing solutions that incorporate VMware's NSX platform, as well as integrating with various cloud platforms on VMware's infrastructure.

Roie works in VMware's Consulting (PSO) team, whose focus is the delivery of network virtualization and security solutions. In this role Roie provides technical leadership in all aspects, including the installation, configuration, and implementation of VMware's products and services. This also includes being involved from the inception of these projects, through the requirements assessment, design, and deployment phases, and then into production, which ensures continuity for VMware's customers.

Roie has over 15 years of experience working on data center technologies and providing solutions for global enterprises, with a primary focus on network and security.

A highly motivated and enthusiastic MSc graduate, Roie holds a wide range of industry-leading certificates, including his most recent, Network Virtualization (VCDX-NV). Roie is not only a strong team member, but is also able to demonstrate his skills and experience working in various fields.

As a well-known and respected blogger, Roie maintains an impressive blog at http://routetocloud.com.

Ajit Sharma is a consultant at VMware Professional Services – India with over 6 years of rich experience in the design, deployment, and support of virtualization and cloud-based solutions using the VMware product stack. His technical experience includes software-defined networking solutions, cloud automation and operations solutions, and disaster recovery solutions. His prior experience includes vulnerability assessment, malware/spyware analysis, security policy assessment and auditing, network support and design, and Wintel engineering.

Ajit currently leads the Network Virtualization practice across India to ensure delivery of complex network virtualization designs and implementations. He holds some of the industry-leading certifications from VMware, such as VCAP-DCD, VCAP-CID, and VCIX-NV. Having served several industries, including telecom, retail, healthcare, and financial services, he has a proven track record of providing impeccable consulting and advisory services to some of the largest customers across India.

Anuj Modi (@vConsultant) is a Unified Computing and Virtualization Consultant in the Cloud & Network Services team at Cisco Systems. He works with Cisco's global clients to strategize, plan, design, implement, and deploy secure, agile, and highly automated data center and cloud computing infrastructure. He has more than 12 years of expertise in professional services/consulting and post- and pre-sales for architecting and designing mission-critical applications and virtualization solutions.

He is passionate about new technology initiatives in data center and cloud automation & orchestration, SDN, NFV, OpenStack, and VMware and Cisco products.

He is an author, reviewer, and blogger (anujmodi1.wordpress.com) for various Cisco and VMware communities, and holds certifications from VMware, Microsoft, Citrix, and EXIN.

He takes great pleasure in traveling and meeting people across the world. He likes cricket, football, and mountain hiking.

I would like to thank and dedicate my first book to my family.

1

Concepts and Components of NSX for vSphere platform

Server virtualization has been widely adopted across the IT industry over the past decade. As a result, we have seen a completely new way of provisioning and managing workloads in the data center. Server virtualization has saved businesses billions of dollars.

However, the way these workloads are connected to the network has not been so agile.

The solution that everyone is now looking at is to virtualize the network, because virtualizing the network layer abstracts the physical network and simplifies the provisioning and consumption of networking going forward. In addition, security services are built in, do not require purpose-built hardware, and can scale as the network expands.

NSX for vSphere removes the operational barrier that the network has become for IT. Programmatic provisioning transforms service delivery times from weeks to a matter of minutes. NSX for vSphere does not change the laws of physics; packets do not move faster through the network. Rather, NSX for vSphere transforms the operational model of networking, which, combined with compute and storage virtualization, delivers IT speed and agility for the business that was never possible before. This is depicted in the diagram below.


As the main premise of network virtualization remains the same, NSX for vSphere can be deployed over any existing physical network. NSX for vSphere reproduces the networking model exactly in software. As a result, your existing application workloads operate unmodified, and existing network monitoring and troubleshooting tools view and process virtual network traffic just as they would in the physical network.

This chapter will describe at a high-level the concepts and components of the NSX for vSphere Networking and Security platform.

Topics covered in this chapter:

- The NSX for vSphere Networking and Security Core Management and Control Components: providing powerful control over the networking and security functionality
- The NSX for vSphere Networking and Security Logical Switch Networks: covering layer 2 networking and the basics of configuring, deploying, and using the logical switch networks
- The NSX for vSphere Networking and Security Dynamic Routing Capabilities using the NSX Edge Appliance: covering the NSX Distributed Logical Router appliance used to establish East-West and North-South traffic
- The NSX for vSphere Networking and Security Features of the VMware NSX Edge Services Gateway: covering all the specific components, including NAT (Network Address Translation), load balancing, high availability, and virtual private networking, which covers Layer 2 VPN, IPsec VPN, SSL VPN-Plus, and VLAN-to-VXLAN bridging
- NSX for vSphere Security: covering the security components, including role-based access control, the NSX Edge firewall, the NSX Distributed Firewall, NSX data endpoint, Flow Monitoring, and Service Composer

NSX for vSphere Networking and Security Core Management and Control Components

NSX for vSphere uses three planes to keep its functionality modularized. It has a Management Plane that is managed by the NSX for vSphere Manager, a Control Plane that is managed by the NSX for vSphere Controller or Controller cluster, and a Data Plane, which runs on the ESXi hosts. This modularity gives us a much greater degree of control and ensures that a problem in one plane has minimal effect on the functions of the other planes. This is depicted in the diagram below.


Let us describe each section of the above diagram.

Management Plane - NSX for vSphere Manager

The NSX for vSphere Manager communicates directly and securely with the vCenter Server and is the northbound interface for the NSX for vSphere REST API and for third-party applications that integrate with NSX for vSphere, such as Palo Alto Networks (PAN), Trend Micro, and many more. The NSX for vSphere Controller instances are deployed by the NSX for vSphere Manager instance through requests to the vCenter Server system to deploy the NSX for vSphere Controller virtual machines from OVA files. In the image below we can see an NSX Manager deployed along with just one NSX Controller; however, as a VMware best practice, three controllers should be deployed.


NSX for vSphere Manager helps configure and manage logical routing services: both East-West routing, which is handled by the Distributed Logical Router, and North-South routing, which is handled by the NSX Edge services router. During the configuration process you have the choice to deploy a distributed or centralized logical router. If the Distributed Logical Router is selected, the NSX Manager instance deploys the Logical Router Control virtual machine and pushes the logical interface configurations to each host through the controller cluster. In essence, the Logical Router Control virtual machine is responsible for managing the routing network interactions and provides the routing table to the NSX for vSphere Manager instance.

In the case of centralized routing, the NSX for vSphere Manager deploys the NSX for vSphere Edge services router virtual machine. The REST API interface for NSX Manager helps automate the deployment and management of these logical routers through a cloud management platform, such as VMware vCloud Automation Center (vCAC), as well as through a third-party solution or custom cloud management platform.

The only thing actually installed in the traditional sense is NSX for vSphere Manager. NSX for vSphere Manager handles all the management tasks, including its initial configuration, such as setting up SSO, NTP and other core services. There is a direct correlation of one vCenter Server system to one NSX for vSphere Manager, so if vCloud Automation Center is present with multiple vCenter Server systems, each of those vCenter Server systems has an NSX for vSphere Manager instance.

NSX for vSphere deploys into vSphere clusters. The NSX for vSphere platform has a few basic requirements. Any server on which you can install ESXi 5.5 can run NSX for vSphere, connected to any physical network. Multicast over the physical infrastructure is an added benefit but not required. After you deploy the NSX for vSphere Manager, you deploy the NSX for vSphere Controller instances and VIBs, and then configure the virtual networks and security services.

When you install the NSX for vSphere Manager, it has the OVA files to deploy the NSX for vSphere Edge gateways, the NSX for vSphere Controller, and the VIBs that get pushed to the ESXi hosts for the distributed switches. The NSX for vSphere Manager also uses the REST API when communicating to and from third-party applications, such as firewalls from Palo Alto Networks and Check Point to name but a few, and other networking and security services. The REST API is used extensively by the broader networking and security services ecosystem, and it is primarily this interface that is used to integrate with the NSX for vSphere platform.
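The following is a minimal sketch, in Python, of how a script or third-party product consumes this REST API. The GET /api/4.0/edges call for listing NSX Edge instances follows the NSX for vSphere API guide; the manager address, credentials, and certificate path are placeholders for illustration.

    import requests

    NSX_MANAGER = "https://nsxmgr.example.com"  # placeholder NSX Manager address

    # NSX Manager uses HTTP basic authentication and returns XML by default.
    session = requests.Session()
    session.auth = ("admin", "secret")       # placeholder credentials
    session.verify = "/path/to/nsx-ca.pem"   # validate the manager's TLS certificate

    # List all deployed NSX Edge instances (Edge Services Gateways and DLR control VMs).
    resp = session.get(f"{NSX_MANAGER}/api/4.0/edges")
    resp.raise_for_status()
    print(resp.text)  # XML document describing each Edge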

Control Plane – NSX for vSphere Controller

Before we start explaining the concept of the control plane and the NSX for vSphere Controller, let us look at the various terms that we will use throughout this section.

- Controller Cluster – A set of virtual units operating in unison to perform distributed controller functions.
- Controller Node – A single virtual unit of the controller cluster; each node can have one or more roles, and each node can be upgraded, powered down, or fail independently.
- Transport Node – A virtual entity outside the controller that performs packet forwarding functions and interacts with the controller for the logical network topology.

The NSX for vSphere Controller is responsible for two key functions. First, it serves as a control plane for all the forwarding elements (transport nodes); second, it exposes REST APIs for the consumption of northbound entities, which could be vCAC or OpenStack for example, or any third-party management and monitoring infrastructure. The NSX for vSphere Controller internally maintains the state of the logical network and its mapping to the physical infrastructure. This logical state is then distributed to the respective transport nodes, thus allowing VMs (and physical hosts) to see logical networks. Northbound entities can then obtain the state and operational visibility of the entire logical network from the NSX for vSphere Controller.

So, in essence, the NSX for vSphere Controller provides the control plane to distribute VXLAN and logical routing network information to ESXi hosts. The controller cluster also provides multi-node high availability, whereby every controller node is Active (as opposed to an Active-Standby model), providing for both scale-out and high availability. It is important to note that if, for example, you lose one of the controller nodes, then once the controller node comes back online it re-syncs the forwarding plane entries (flows) without any data plane outage. Just as important, all the control messages between the Controllers and transport nodes are encrypted.

In addition to this the Controller also provides an Audit Log facility for security compliance.

The major problems that we face today in terms of network virtualization are:

- The need to dynamically distribute workloads across all available cluster nodes
- Redistributing workloads when a new cluster member is added
- The ability to sustain failure of any cluster node with zero impact
- Performing all of the above transparently to the application

The solution to these problems is 'slicing'. The first NSX for vSphere Controller instance deployed requests a password, and all future NSX for vSphere Controller instances deployed use this password.

A user then uses this password to SSH into the NSX for vSphere Manager or NSX for vSphere Controller. The NSX for vSphere Controller must be connected to the same vCenter Server system as the NSX for vSphere Manager. It is recommended that the NSX for vSphere Controller instances be deployed in clusters of three nodes, which is currently the supported maximum. The reason is that the NSX for vSphere Controllers can scale with no trouble to the limit of the vCenter Server, which is currently 10k active VMs and 10k logical switches. So, apart from not being supported, at this point there is no value in, say, adding 5 NSX for vSphere Controllers, and that is why only 3 controllers are currently supported.

Each NSX for vSphere Controller instance in a cluster must be deployed individually. With NSX for vSphere there is currently a requirement to synchronize time (NTP) between the ESXi hosts and the NSX for vSphere Manager for the deployment of the controllers.
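Controller deployment can also be driven through the NSX Manager API rather than the vSphere Web Client. The hedged sketch below is based on the POST /api/2.0/vdn/controller call from the NSX for vSphere API guide; the controllerSpec fields shown and every ID are placeholders that must map to real vCenter objects and an NSX IP pool.

    import requests

    # Placeholder controllerSpec; the IDs must be real vCenter managed-object IDs.
    spec = """<controllerSpec>
      <name>controller-2</name>
      <ipPoolId>ipaddresspool-1</ipPoolId>
      <resourcePoolId>resgroup-10</resourcePoolId>
      <datastoreId>datastore-21</datastoreId>
      <networkId>dvportgroup-35</networkId>
      <password>Secret123!Secret123!</password>
    </controllerSpec>"""

    resp = requests.post(
        "https://nsxmgr.example.com/api/2.0/vdn/controller",
        auth=("admin", "secret"),
        data=spec,
        headers={"Content-Type": "application/xml"},
        verify="/path/to/nsx-ca.pem",
    )
    resp.raise_for_status()
    print(resp.text)  # a job ID that can be polled for deployment status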

Data Plane

The Distributed Switch that is provided by the vSphere platform fundamentally defines the Data plane. The distributed switch only performs layer 2 switching and hosts have to be on the same layer 2 network for virtual machines on each host to communicate with virtual machines on the other host.

NSX for vSphere installs three vSphere Installation Bundles (VIBs) on the host to enable the NSX for vSphere functionality. One VIB enables the layer 2 VXLAN functionality, another enables the distributed router, and the final VIB enables the distributed firewall. After the VIBs are added to a distributed switch, that distributed switch is referred to as an NSX Virtual Switch. On an NSX Virtual Switch, hosts are not restricted to being on the same layer 2 domain for virtual machine to virtual machine communication across hosts.

The NSX for vSphere Edge Services Gateway (ESG) is not distributed, so it does not have a control entity. The NSX for vSphere ESG handles control traffic itself.

NSX for vSphere Networking and Security Logical Switch Networks

Historically we have noticed several challenges in the traditional network that we wanted to eliminate:

- Multi-tenancy in network segments
- L2 stretch for VM mobility across clusters/datacenters
- Network sprawl in a large L2 environment because of STP

So we wanted to eliminate these using logical switch networks that bring VXLAN into the picture.

Let us look at the various VXLAN terms:

- VTEP – A VXLAN Tunnel End Point (VTEP) is an entity that encapsulates an Ethernet frame in a VXLAN frame, or de-encapsulates a VXLAN frame and forwards the inner Ethernet frame.
- VTEP Proxy – A VTEP proxy is a VTEP that forwards VXLAN traffic to its local segment from another VTEP in a remote segment.
- Transport Zone – A transport zone defines the members, or VTEPs, of the VXLAN overlay:
    - It can include ESXi hosts from different vSphere clusters
    - A cluster can be part of multiple transport zones
- VNI – A VXLAN Network Identifier (VNI) is a 24-bit number that gets added to the VXLAN frame:
    - The VNI uniquely identifies the segment to which the inner Ethernet frame belongs
    - Multiple VNIs can exist in the same transport zone
    - VMware NSX for vSphere starts with VNI 5000
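To make the 24-bit VNI concrete, the small Python sketch below packs and unpacks the 8-byte VXLAN header defined in RFC 7348. NSX performs this encapsulation in the ESXi kernel; this toy code only illustrates the frame format and why a 24-bit VNI yields roughly 16 million segments.

    import struct

    VXLAN_FLAGS = 0x08  # "I" bit set: a valid VNI is present

    def encap_header(vni: int) -> bytes:
        """Build the 8-byte VXLAN header for a given VNI (0 .. 2**24 - 1)."""
        assert 0 <= vni < 2**24, "the VNI is a 24-bit value"
        # flags(1) + reserved(3) + vni(3) + reserved(1)
        return struct.pack("!B3s3sB", VXLAN_FLAGS, b"\x00" * 3,
                           vni.to_bytes(3, "big"), 0)

    def decap_vni(header: bytes) -> int:
        """Recover the VNI from a VXLAN header (the inner Ethernet frame follows it)."""
        flags, _, vni_bytes, _ = struct.unpack("!B3s3sB", header[:8])
        assert flags & VXLAN_FLAGS, "VNI-present flag not set"
        return int.from_bytes(vni_bytes, "big")

    hdr = encap_header(5000)   # NSX for vSphere starts numbering at VNI 5000
    print(decap_vni(hdr))      # -> 5000
    print(2**24, "VNIs versus", 4094, "usable VLANs")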

Before we go into some of the high-level details of VXLAN, let's take a brief look at VXLAN's history:

- Pre-VXLAN: DVFilter CHF (VCDNI), BFN, Overlay CHF, vTunnel
- Introduced in vCloud Networking and Security v5.1 (SPOCK) (SP1) – a tech preview for selected service providers in 2012
- vCloud Director v5.1 (T2) – optimization of the full network stack and platform, 2012
- 2013 release – the VXLAN control plane:
    - A separate (reliable and secure) control plane to distribute VXLAN mappings
    - Suppression of broadcast/multicast in the VXLAN network:
        - Replace conventional broadcast/multicast-based control plane protocols
        - Remove the dependency on physical multicast routing/PIM
    - Discover and publish virtual network information
    - Broadcast suppression (replacing bcast/mcast CP protocols): ARP (85% of broadcast traffic), RARP, DHCP, IGMP, etc.
    - Optimized VM multicast: multicast proxy – fully distributed software replication

Virtual eXtensible Local Area Network (VXLAN) is a network overlay that encapsulates layer 2 traffic within layer 3. VXLAN provides the following benefits:

- The unique capability of managing overlapping addresses between multiple tenants
- Support for vMotion of your virtual machines, independent of the physical network
- An unlimited number of virtual networks
- The capability to decouple the network service provided to servers from the technology used in the physical network
- The unique capability to isolate issues such as MAC table size in physical switches
- Up to 16 million virtual networks, in contrast to the 4094 limit of VLANs
- Application agnosticism: all work is performed in the ESXi host

Let us take a look at the high-level physical deployment of VXLAN in the following illustration.


VXLAN has gone through some modifications in NSX for vSphere. Let us look at the table below to understand the changes it has gone through in both the data plane and the control plane.

Data Plane:
- Support for multiple VXLAN vmknics per host to provide additional options for uplink load balancing
- DSCP and COS tags from the internal frame are copied to the external VXLAN encapsulation header
- Dedicated TCP/IP stack for VXLAN
- Ready for VXLAN hardware offloading to network adapters (in the future)

Control Plane:
- A highly available and secure control plane to distribute VXLAN network information to ESXi hosts
- Removes the dependency on multicast routing/PIM in the physical network
- Suppresses broadcast traffic in VXLAN networks

Unlike vCloud Networking and Security Manager, NSX for vSphere uses three different modes of VXLAN traffic replication:

- Multicast
- Unicast
- Hybrid

Replication mode relates to the handling of broadcast, unknown unicast, and multicast (BUM) traffic. Unicast mode has no physical network requirements apart from the MTU; all traffic is replicated by the VTEPs. In the same VXLAN segment this is the source VTEP; in remote VXLAN segments the NSX Controller selects a proxy VTEP. Hybrid mode uses IGMP layer 2 multicast to offload local replication to the physical network; remote replication uses unicast proxies, so there is no need for multicast routing. Hybrid mode is recommended for the majority of deployments. Multicast mode is seen frequently in upgrade scenarios from vCloud Networking and Security v5.x or in environments that already have multicast routing. The following image illustrates these three available options while creating a logical switch.


The logical switch is a distributed port group on the distributed switch. The logical switch can span distributed switches by being associated with a port group in each distributed switch. The creation of the port group is performed by the vCenter Server on behalf of the NSX for vSphere Manager. vSphere vMotion is supported, but only among hosts that are part of the same distributed switch.

NSX for vSphere Networking and Security Dynamic Routing Capabilities using the NSX for vSphere Edge Appliance

The distributed routing capability in NSX for vSphere provides an optimized and scalable way of handling East-West traffic of a data center. The NSX for vSphere Edge services router provides the traditional centralized routing support in the NSX for vSphere platform.

The TCP/IP protocol suite offers different routing protocols that provide a router with methods for building valid routes. This section will discuss three routing protocols:

- OSPF: Open Shortest Path First is a link-state protocol that uses a link-state routing algorithm and is an interior routing protocol.
- IS-IS: Intermediate System to Intermediate System determines the best route for datagrams through a packet-switched network.
- BGP: Border Gateway Protocol is an exterior gateway protocol that is designed to exchange routing information between autonomous systems on the Internet.

OSPF Protocol

OSPF is a link-state protocol, meaning each router maintains a database describing the autonomous system's topology. When you enable OSPF, two areas are created by default: area 0 and area 51. Area 51 can be deleted and replaced with a desired area.

By default, OSPF adjacency negotiations happen in the clear, with no authentication, assuming trust in the segment. If installed in an insecure segment, enabling authentication ensures that a third party cannot corrupt the routing table or hijack connections by injecting a compromised default route.

OSPF maintains a link-state database that describes the autonomous system's topology. Each participating router has an identical database. The router shares this database with the other routers in the autonomous system by a mechanism known as flooding. All routers in the autonomous system run the exact same algorithm to construct the shortest path between themselves and the root. This in turn gives each router the route to each destination in the autonomous system. When multiple equal-cost paths to a destination exist, traffic is distributed equally among those paths.

Sets of networks grouped together are called areas. Areas are a collection of routers, links, and networks that have the same area identification. Multiple independent areas are combined into one logical routing domain through a backbone area, which is given the ID 0 (0.0.0.0). The primary responsibility of the backbone area is to distribute routing information between non-backbone areas.
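As a toy illustration (not NSX code) of the link-state computation just described, the sketch below runs Dijkstra's shortest-path-first algorithm over a small topology database and records equal-cost next hops, which is how traffic ends up distributed equally across equal-cost paths.

    import heapq

    def spf(topology, root):
        """Dijkstra over {node: {neighbor: cost}}; returns costs and ECMP next hops."""
        dist = {root: 0}
        nexthops = {root: set()}
        heap = [(0, root)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue  # stale heap entry
            for v, cost in topology[u].items():
                nh = {v} if u == root else nexthops[u]
                if d + cost < dist.get(v, float("inf")):
                    dist[v], nexthops[v] = d + cost, set(nh)
                    heapq.heappush(heap, (d + cost, v))
                elif d + cost == dist[v]:
                    nexthops[v] |= nh  # equal-cost path: traffic is shared (ECMP)
        return dist, nexthops

    # A square topology: two equal-cost paths from A to D (via B and via C).
    topo = {"A": {"B": 1, "C": 1}, "B": {"A": 1, "D": 1},
            "C": {"A": 1, "D": 1}, "D": {"B": 1, "C": 1}}
    dist, nh = spf(topo, "A")
    print(dist["D"], sorted(nh["D"]))  # 2 ['B', 'C'] -> traffic split over B and C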

IS-IS Protocol

Intermediate System to Intermediate System (IS-IS) is an intra-domain dynamic routing protocol used to support large routing domains. OSPF was designed to support only TCP/IP networks, whereas IS-IS started as an ISO protocol. Both protocols are interior gateway protocols (IGPs), but IS-IS runs over layer 2 and was intended to support multiple routed protocols.

IS-IS uses a two-level hierarchy for managing and scaling large networks. A routing domain is partitioned into areas. Level 1 routers know the topology of their area, including all routers and endpoints in their area. Level 1 routers do not know the identity of routers or destinations outside their area, so they forward all traffic destined outside of their area to a level 2 router in their area.

Level 2 routers know the level 2 topology and know which addresses they can reach by contacting other level 2 routers. A level 2 router does not know the topology of a level 1 area. Level 2 routers can exchange packets or routing information directly with external routers located outside of the routing domain.

Level 1 routers belonging to a level 1 area only form neighbor adjacencies with level 1 routers in the same area and have full visibility of their area. Level 2 routers belonging to a level 2 area can form neighbor adjacencies with any level 2 router, including in other areas and advertise inter-area routes.

Level 1-2 routers belong to both level 1 and level 2 areas at the same time. Similar to OSPF's area border router, a level 1-2 router can form neighbor adjacencies with any other router in any area. A level 1-2 router takes level 1 area routing updates and propagates them to level 2 areas, and vice versa. Only level 2 routers can connect to an external network.

BGP Protocol

The Border Gateway Protocol (BGP) is an inter-autonomous system routing protocol. There are two kinds of BGP: internal and external, known as iBGP and eBGP. External BGP is used when talking to a router that has an AS number different from its own; internal BGP is used with routers within the same local AS. Neighbor-level configuration allows a variety of settings to customize the BGP configuration.

An autonomous system is a set of routers under a single technical administration, using an interior gateway protocol (IGP) and common metrics to determine how to route packets in the AS, and using an inter-AS routing protocol to determine how to route packets to other autonomous systems. Each of these autonomous systems is uniquely identified using an autonomous system number (ASN).

Peers are manually configured to exchange routing information and form TCP connections. There is no discovery in BGP. A peer in a different AS is referred to as an external peer, while a peer in the same AS is referred to as an internal peer.

Distributed Logical Router

For routing between virtual networks, layer 3 is distributed in the hypervisor. The distributed logical router optimizes the routing path and data path, and supports single-tenant or multi-tenant deployments. For example, consider a network that contains two VNIs that have the same IP addressing. This requires deploying two different distributed routers, with one distributed router connecting to tenant A and one to tenant B.

Distributed routing is provided by a logical element called Distributed Logical Router (DLR), which has two main components:

- The DLR control plane is managed through the DLR Control VM, which supports dynamic routing protocols, in our case BGP and OSPF. Its main function is to exchange routing updates with the next layer 3 hop device (usually the NSX Edge), and it communicates with the NSX Manager and the controller cluster. You can also achieve HA for the DLR Control VM through an Active-Standby configuration.

- Kernel modules (VIBs) are installed on the ESXi hosts and form the data plane level. These modules hold routing information that is pushed through the controller cluster, and they perform the route lookup and ARP entry lookup. The kernel modules are equipped with logical interfaces (called LIFs) connecting to the different logical switches. Each LIF is assigned an IP address, representing the default IP gateway for the logical L2 segment it connects to, and a vMAC address. The IP address has to be unique per LIF, whereas the same vMAC can be assigned to all the defined LIFs.

A logical representation of this Distributed Logical Router is depicted below.


Without the distributed router, routing is done in one of two ways. The first method uses a physical appliance, which means all traffic has to go to a physical appliance and come back, regardless of whether or not the virtual machines are on the same host.

The other method performs routing on a virtual router such as the NSX for vSphere Edge gateway. This method uses a virtual machine running on one of the hosts to act as the router.

If virtual machines running on a hypervisor are connected to different subnets, the communication between these virtual machines has to go through a router. This non-optimal traffic flow is sometimes called “hair pinning”.

Distributed Logical Router: Logical View

The distributed logical router routes between VXLAN subnets. If two virtual machines are on the same host and the Web VM on VXLAN 5001 wants to communicate with the App VM on VXLAN 5002, the distributed logical router will route traffic between the two virtual machines on the same host. This logical representation has been depicted in the following illustration.


The distributed logical router can also route between physical and virtual subnets.

The NSX for vSphere Manager configures and manages the routing service. During the configuration process, NSX for vSphere Manager deploys the Logical Router Control VM and pushes the logical interface configurations to each host through the controller cluster.

The Logical Router Control VM is the control plane component of the routing process. It supports the following dynamic routing protocols:

- OSPF
- BGP

There is a kernel module required for the logical router and it is configured as part of the preparation through the NSX for vSphere Manager. You can think of these kernel modules as line cards in a modular chassis supporting layer 3 routing. This kernel module stores the routing information, which is pushed from the controller cluster.

The controller cluster is responsible for distributing the routes learned from the Logical Router Control VM across the hypervisors. Controller nodes in a controller cluster divide the load of distributing route information when multiple distributed logical router instances are deployed.

NSX for vSphere Networking and Security Features of the VMware NSX Edge Services Gateway

VMware NSX for vSphere provides several Edge gateway features:

- NAT (Network Address Translation)
- Load Balancing
- High Availability (HA)
- Virtual Private Network (VPN)
- Layer 2 Bridging
- Domain Name Service (Authoritative DNS)
- Dynamic Host Configuration Protocol (DHCP Relay)
- Firewall
- Syslog

In this section we will take you through the features that are newly introduced in NSX for vSphere 6.1. Other features, such as DNS, DHCP Relay, Firewall, and Syslog, remain the same as they were in vCloud Networking and Security. The firewall feature, however, is explained within the NSX Security section below.

Network Address Translation (NAT)

NSX for vSphere Edge provides network address translation (NAT) service to assign a public address to a computer or group of computers in a private network. Using this technology limits the number of public IP addresses that an organization or company must use, for economy and security purposes.

You must configure NAT rules to provide access to services running on privately addressed virtual machines. The NAT service configuration is separated into source NAT and destination NAT rules.

The NAT service provides two main types of translation:

- Source NAT – Source NAT is used to translate a private internal IP address into a public IP address for outbound traffic.
- Destination NAT – Destination NAT is commonly used to publish a service located in a private network on a publicly accessible IP address.
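As a hedged example of configuring these rules programmatically, the sketch below appends a source NAT rule to an Edge via the NSX for vSphere REST API. The /api/4.0/edges/{edge-id}/nat/config/rules endpoint and the natRule element follow the NSX API guide; the edge ID, addresses, and credentials are placeholders.

    import requests

    # Placeholder rule: translate the private 10.10.10.0/24 network to a
    # public address on the uplink interface (vnic 0).
    rule = """<natRules>
      <natRule>
        <action>snat</action>
        <vnic>0</vnic>
        <originalAddress>10.10.10.0/24</originalAddress>
        <translatedAddress>192.0.2.10</translatedAddress>
        <enabled>true</enabled>
      </natRule>
    </natRules>"""

    resp = requests.post(
        "https://nsxmgr.example.com/api/4.0/edges/edge-1/nat/config/rules",
        auth=("admin", "secret"),
        data=rule,
        headers={"Content-Type": "application/xml"},
        verify="/path/to/nsx-ca.pem",
    )
    resp.raise_for_status()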

Load Balancing

The NSX for vSphere Edge load balancer distributes incoming requests evenly among multiple servers. Load balancing helps in achieving optimal resource utilization, maximizing throughput, minimizing response time, and avoiding overload. NSX for vSphere Edge devices provide load balancing up to layer 7.

The load balancer does not perform global load balancing; it only performs local load balancing. If you have multiple virtual machines providing a Web service, the load balancer, which is a service of the NSX for vSphere Edge device, can provide load balancing across those virtual machines. When one of the load-balanced virtual machines becomes unreachable, or its service becomes unresponsive, the load balancer service detects that condition and removes that Web server from the load-balancing rotation.

Clients do not open a Web browser and go to the IP address of the Web server. Instead, the client points to an IP address that is owned or hosted by the load balancer itself. The load balancer then takes the client traffic and redirects it by changing the destination IP address from the load balancer's IP address to the IP address of the Web server that was selected to establish the session. The IP address that the client used to connect to the Web site is called the virtual IP (vIP).
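The following toy model (not NSX code) captures the behavior just described: clients address the vIP, the load balancer picks a healthy pool member in rotation, and a member that fails its health check drops out of the rotation until it recovers.

    class Pool:
        """Round-robin pool that skips members failing their health checks."""

        def __init__(self, members):
            self.members = list(members)
            self.healthy = set(members)

        def mark_down(self, member):   # health monitor failed the member
            self.healthy.discard(member)

        def mark_up(self, member):     # member passed its health check again
            self.healthy.add(member)

        def pick(self):
            """Return the next healthy member in rotation."""
            candidates = [m for m in self.members if m in self.healthy]
            if not candidates:
                raise RuntimeError("no healthy pool members behind the vIP")
            member = candidates[0]
            # Rotate so the next request lands on the next member.
            self.members.remove(member)
            self.members.append(member)
            return member

    pool = Pool(["web-01", "web-02", "web-03"])
    pool.mark_down("web-02")                 # failed its health check
    print([pool.pick() for _ in range(4)])   # web-02 is never selected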

High Availability

The NSX for vSphere Edge High Availability (HA) feature ensures that an NSX for vSphere Edge appliance is always available by installing an active-standby pair of NSX for vSphere Edge gateways on your virtualized infrastructure. You can enable HA either when installing an NSX for vSphere Edge appliance or on an already deployed NSX for vSphere Edge instance.

The primary Edge appliance is in the active state and the secondary appliance is in the standby state. The primary Edge appliance replicates its configuration to the standby appliance. It is a VMware best practice to deploy the primary and secondary appliances on separate resource pools and datastores, so that the underlying infrastructure does not become a single point of failure for the Edge appliances.

High Availability (HA) ensures that an NSX for vSphere Edge appliance is always available on your virtualized network. NSX Edge High Availability supports two NSX Edge appliances (peers) per cluster, running in active-standby mode.

NSX Manager manages the lifecycle of both peers and pushes user configurations as they are made to both NSX Edge instances simultaneously.

NSX Edge High Availability peers communicate with each other for heartbeat messages as well as runtime state synchronization. Each peer has a designated IP address to communicate with the other peer. The IP addresses are for HA purposes only and cannot be used for any other services. The IP addresses must be allocated on one of the internal interfaces of the NSX Edge.

Heartbeat and data synchronization both use the same internal vNIC. Layer 2 connectivity is through the same port group.

The heartbeat and sync channel requires L2 adjacency. The “internal Portgroup” is a vNic of type internal on the NSX for vSphere Edge, which can be shared or dedicated, but not on a management network.

When a failover occurs, there is a heartbeat deadtime, which is the latency to detect a dead peer. Beyond that, there is extra latency for the standby to completely take over, including reconfiguration and restarting non-hot-standby services like VPN. This extra latency depends on the NSX for vSphere Edge configuration. TCP flows can tolerate the downtime, but there will be packet loss for UDP traffic.

In NSX for vSphere, the default (minimum) deadtime is increased to 15 seconds since there are customer cases where a 6 second deadtime is too short.
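Conceptually, dead-peer detection reduces to the small sketch below: the standby declares the active peer dead only after no heartbeat has arrived within the configured deadtime, and only then begins the (slower) takeover of services. This is an illustration of the mechanism, not the Edge's implementation.

    import time

    DEADTIME = 15.0  # seconds; the NSX for vSphere default (minimum) deadtime

    class HeartbeatMonitor:
        def __init__(self, deadtime: float = DEADTIME):
            self.deadtime = deadtime
            self.last_seen = time.monotonic()

        def heartbeat_received(self):
            """Called whenever a heartbeat arrives on the HA vNIC."""
            self.last_seen = time.monotonic()

        def peer_is_dead(self) -> bool:
            """True once the peer has been silent longer than the deadtime."""
            return time.monotonic() - self.last_seen > self.deadtime

    monitor = HeartbeatMonitor()
    # ... heartbeat_received() is invoked as heartbeat messages arrive ...
    if monitor.peer_is_dead():
        # The standby takes over: it claims the interfaces and restarts
        # non-hot-standby services such as VPN, adding latency beyond the
        # deadtime itself.
        print("failover: standby becoming active")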

Virtual Private Networking

In NSX for vSphere v6.1 you will see the following VPN services:

- SSL VPN-Plus – SSL VPN-Plus is a user-based solution: you can use one Edge and configure SSL VPN on it. The private network behind the Edge then becomes reachable from the end user's machine once connected via the SSL VPN client.
- IPsec VPN – For IPsec VPN you can configure two Edges and create a site-to-site tunnel between them. The networks behind the two Edges then become reachable to each other. This is a layer 3 site-to-site solution and provides the ability to interconnect two different networks.
- L2 VPN – L2 VPN is used to extend a network across boundaries such that the VMs being extended are not aware of, nor require any change in, their routing or MAC addresses. This is a layer 2 site-to-site solution and basically lets you move VMs between two different networks without reconfiguring them. L2 VPN is used by vCloud Connector for Data Center Extension.

Layer 2 Bridging

With layer 2 bridging, you create a bridge between a logical switch and a VLAN. This helps you migrate virtual workloads to physical devices with no impact on IP addresses. A logical network can leverage a physical gateway and access existing physical network and security resources by bridging the logical switch broadcast domain to the VLAN broadcast domain.

Bridging can also be used in a migration strategy where you might be going P2V and you don't want to change subnets.

VXLAN-to-VXLAN bridging and VLAN-to-VLAN bridging are not supported. Bridging between different data centers is also not supported: all of the participants in the VLAN and VXLAN bridge must be in the same data center.

The layer 2 bridge runs on the host that has the NSX Edge logical router virtual machine. The layer 2 bridging path is entirely in the VMkernel. This is where the sink port connection is for the distributed router to connect to the distributed port group. The sink port steers all interesting traffic related to bridging onto the switch. You cannot have routing enabled on the interfaces that you connect to the distributed router.

The distributed router that performs the bridging cannot perform routing on that logical switch. This means the virtual machines on that switch cannot use the distributed router as their default gateway. Because a logical switch cannot have more than one distributed router connected to it, those virtual machines must have a default gateway either outside in the physical network or in an appliance, such as the NSX Edge gateway, connected to the logical switch on the port group.

NSX for vSphere Security

The NSX for vSphere security feature set has several components. In this section we will cover:

- NSX for vSphere Edge Firewall
- NSX for vSphere Distributed Firewall
- Service Composer

NSX for vSphere Edge Firewall

The NSX for vSphere Edge Appliance provides a stateful firewall for North-South and East-West traffic flows.

The NSX for vSphere Edge supports stateful firewalling capabilities, which complement the Distributed Firewall (DFW) enabled in the kernel of the ESXi hosts. While the DFW is mostly used to enforce security policies for communication between workloads connected to logical networks (so-called East-West communication), the firewall on the NSX Edge usually filters communication between the logical space and the external physical network (North-South traffic flows).

NSX for vSphere Edge firewall features:

- Industry-standard UI optimized for NetOps
- Advanced connection tracking
- Attack and malformed packet checks
- Object-based rules
- Multiple objects per rule
- Application definition groups
- VMware dynamic objects
- Cert-compliant logging and structured syslog
- Traffic stats per rule
- Comment and name fields for change management

The NSX for vSphere Edge services gateway is available in several virtual machine form factors:

Size         vCPU   RAM      Total Firewall Connections   Firewall Rules   Comments
Compact      1      512 MB   64,000                       2,000            Suitable for a basic firewall
Large        2      1 GB     1,000,000                    2,000            Suitable for a medium-level firewall
X-Large      6      8 GB     1,000,000                    2,000            Suitable for a high-performance firewall plus load balancer
Quad Large   4      1 GB     1,000,000                    2,000            Suitable for a high-performance firewall

NSX for vSphere Distributed Firewall

The firewall has evolved in recent years. Originally the firewall was a physical device that was placed at the perimeter of the network to inspect traffic entering the data center.

The next stage in the evolution was firewall appliances running in virtual machines. From a hypervisor perspective, one virtual machine was talking to another virtual machine. The virtual machine acting as the firewall had to be the default gateway for the other virtual machines running on that host. In some cases, firewalls also used to run inside the virtual machine to provide an additional layer of security.

The Distributed Firewall is a hypervisor kernel-embedded firewall that provides visibility and control for virtualized workloads and networks. The Distributed Firewall offers multiple sets of configurable rules for network layers 2, 3, and 4. The Distributed Firewall focuses on East-West access. The following image illustrates the logical representation of the Distributed Firewall.


NSX Distributed Firewall provides security-filtering functions on every host, inside the hypervisor and at kernel level.

NSX Distributed Firewall offers centralized configuration (using the vSphere Web Client) with distributed enforcement of policy rules (i.e., applied down to the vNIC level). It integrates SpoofGuard functionality to prevent spoofing in the virtual environment and protects against IP and MAC spoofing.

The Distributed Firewall provides distributed enforcement of policy rules. The Distributed Firewall is configured using the vSphere Web Client. The Distributed Firewall is independent of the distributed router.

The Distributed Firewall is meant for East-West traffic or horizontal traffic. The NSX for vSphere Edge firewall focuses on the North-South traffic enforcement at the tenant or data center perimeter.

The Distributed Firewall policy is independent of where the virtual machine is located. If a virtual machine is migrated to another host using vMotion, the firewall policy will follow the virtual machine.

Distributed firewall rules are enforced at the vNIC layer, before encapsulation and after decapsulation. The distributed firewall policies are independent of whether a virtual machine is connected to a VXLAN or a VLAN, and independent of virtual machine location.

The Distributed Firewall can enforce rules even if the virtual machines are on the same layer 2 segment. Policy rules always follow a virtual machine if the virtual machine is migrated to another host.
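The behavior described above can be pictured with the following toy model (not the DFW implementation): rules are evaluated per vNIC, top down, with first match wins and an implicit block at the bottom, and because the table is keyed to the vNIC it travels with the VM during vMotion. The rule set and addresses are hypothetical.

    from dataclasses import dataclass
    from typing import Optional
    import ipaddress

    @dataclass
    class Rule:
        src: str              # e.g. "10.0.1.0/24" or "any"
        dst: str
        port: Optional[int]   # None matches any port
        action: str           # "allow" or "block"

    def first_match(rules, src, dst, port):
        """Return the action of the first rule matching the flow (top-down)."""
        def hit(pattern, addr):
            return pattern == "any" or \
                ipaddress.ip_address(addr) in ipaddress.ip_network(pattern)
        for r in rules:
            if hit(r.src, src) and hit(r.dst, dst) and r.port in (None, port):
                return r.action
        return "block"        # implicit default rule at the bottom

    vnic_rules = [
        Rule("10.0.1.0/24", "10.0.2.0/24", 443, "allow"),  # web tier -> app tier
        Rule("any", "10.0.2.0/24", None, "block"),         # everything else to app
    ]
    print(first_match(vnic_rules, "10.0.1.5", "10.0.2.9", 443))  # allow
    print(first_match(vnic_rules, "10.0.3.7", "10.0.2.9", 22))   # block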

Service Composer

Service Composer helps you provision and assign network and security services to applications in a virtual infrastructure. This is illustrated in the following diagram below.


You map services to a security group, and the services are applied to the virtual machines in the security group. Define security policies based on service profiles already defined (or blessed) by the security team. Apply these policies to one or more security groups where your workloads are members.

A security policy is a collection of the following service configurations:

- Firewall rules that define the traffic to be allowed to, from, or within the security group; these apply to vNICs.
- Endpoint services: data security or third-party solution provider services, such as antivirus or vulnerability management services, that apply to virtual machines.
- Network introspection services: services that monitor your network, such as intrusion prevention systems, that apply to virtual machines.

If a virtual machine belongs to more than one security group, the services that are applied to the virtual machine depend on the precedence of the security policies mapped to the security groups.
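A small illustration of that precedence behavior, with hypothetical policy names and weights: the effective rule order for a VM in several security groups is the concatenation of the mapped policies' rules, highest-precedence policy first.

    # Hypothetical policies: name -> (precedence weight, rules the policy introduces)
    policies = {
        "quarantine-policy": (4300, ["block any to vm"]),
        "web-tier-policy":   (4200, ["allow 443 to vm"]),
    }

    # Security group membership for one VM: group -> mapped security policy
    vm_groups = {
        "quarantine-group": "quarantine-policy",
        "web-tier-group":   "web-tier-policy",
    }

    # Effective rule order: rules from all mapped policies, highest weight first,
    # so the quarantine rules are evaluated before the web-tier rules.
    for weight, rules in sorted((policies[p] for p in vm_groups.values()),
                                key=lambda pair: pair[0], reverse=True):
        print(weight, rules)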

Some partners have integrated their products into NSX. For these services, traffic leaves the virtual machine and is sent to the integrated partner product before the traffic reaches the network.

Summary

In this chapter we have discussed an overview and high-level representation of network virtualization concepts and where NSX for vSphere fits in. We have discussed the NSX for vSphere core components, logical switching, distributed routing, Edge gateway features, and security from a high level. In the next chapter we will introduce the overview of the NSX for vSphere security architecture and NSX for vSphere networking and security connectivity. We will also discuss the use cases of the NSX for vSphere networking and security features.

2

Introduction to the NSX Networking and Security Architecture

In this chapter we are going to focus on the VMware NSX® for vSphere networking and security platform, which delivers the entire networking and security model in software, decoupled from traditional networking hardware, representing a transformative leap forward in data center networking and security architecture.

Networking and security are fundamentally very real and serious concerns for any organization today, and the thought of a security breach of any type is unimaginable. In this ever-increasing multi-tenant environment, it is simply not viable for data from a virtual machine that exists in one tenant to be accessed by a virtual machine from another tenant. One of the core features of network and security virtualization is the ability to provide extensibility, which can be implemented in various ways. In the case of networking, NSX for vSphere has the capability to extend a network across boundaries such that the virtual machines (VMs) being extended are not aware of, nor require any change in, their routing or MAC addresses. This is a layer 2 network extensibility solution and basically lets you move virtual machines between two different networks without reconfiguring them. While this ability to dynamically provision networks and associated security services across the entire data center is a critical factor for any organization today, it does potentially introduce and open up the data center to a myriad of security risks.

With the introduction of NSX for vSphere in your organization, network and security virtualization is abstracted from the application workload communication and from the physical network hardware and topology, allowing network security to break free from these physical constraints, so that granular network security can be applied based on user, application, and business context.

When implementing any networking and security virtualization technology, such as NSX for vSphere, organizations need to ensure that they can continue to maintain a secure environment, and one that meets their regulatory and compliance obligations. In order to achieve this, organizations are required to evaluate the risks that might affect protected information and mitigate those risks through risk-appropriate standards, processes, and best practices.

VMware considers standards compliance as one of the main motivations behind the management of information security. Compliance requires you by law or rule to abide by specific standards meant to protect specific types of information. Over the past several years, the compliance landscape in IT security has become quite complex. Many governments and industry associations worldwide have put in place regulations and standards that mandate the protection of various types of information. For example, in the United States, there are industry-specific privacy regulations, such as the Health Insurance Portability and Accountability Act (HIPAA) in health care and the Gramm-Leach-Bliley Act (GLBA) in financial services, as well as a patchwork of state regulations that dictate requirements for handling personal information. Companies operating in Europe also have to abide by the European Union's Data Protection Directive and its specific implementations by individual countries. Privacy regulations are also now emerging in other geographies, such as India and China. Globally, credit card data must be protected as per the Payment Card Industry (PCI) Data Security Standard.

With a continuous stream of new regulations and standards on the horizon, such as the Criminal Justice Information Services (CJIS) standard, as well as updates to existing ones, the compliance landscape will continue to be complex.

Overview of VMware’s NSX® Networking and Security Components

We are now going to turn our attention and look at the different components that make up VMware’s NSX for vSphere Software Networking and Security Virtualization Platform. We will primarily focus on the NSX for vSphere Edge, Distributed Firewall, Flow Monitoring, Role-Based Access Control, Service Composer, and the monitoring options available for you and your organization to use. This is by no means a complete inventory of all the network and security features that are available for you to consume in the NSX for vSphere product, merely an appetizer!

High Level Overview

VMware’s NSX for vSphere is undeniably the leading network and security virtualization platform in the industry today. Similar to the concept of virtual machines for compute, virtual networks and security services are programmatically provisioned and managed independently of the underlying hardware. NSX for vSphere reproduces the entire network and security model in software, enabling any network topology, from simple to complex multi-tier networks, to be created and provisioned in a matter of seconds. It provides a library of logical networking elements and services, such as logical switches, routers, firewalls, load balancers, VPN, and workload security. Users can create isolated virtual networks and security services using custom combinations of these features and capabilities.

When you look at VMware’s NSX for vSphere from a high-level view, you can start to picture and imagine the power of network and security virtualization, and the benefits of abstracting these network and security functions into software, which provides a rich and diverse amount of control, scalability, and automation. VMware’s NSX for vSphere product is one of the key building blocks of a software-defined data center (SDDC) approach.

The NSX for vSphere Platform Solution

Before we take a look at some of the individual network and security components of the NSX for vSphere product, let's take a brief look at what it takes to build the NSX for vSphere platform solution, starting with the big NSX for vSphere component picture, which depicts the core technology used to deliver these network and security features.

Diagram 1 – NSX for vSphere Component Overview


NSX for vSphere Manager

It all starts with NSX for vSphere Manager, which as you see in Diagram 1 – NSX for vSphere Component Overview; the NSX manager sits in the management plane and is responsible for various aspects of configuration, such as the Appliance itself, which includes Time Settings, Syslog Server Settings, Network Settings, TLSv1.2/SSL Certificate Settings, Backup and Restore Settings, Upgrading to newer release Settings. In addition to this, there is also the NSX for vSphere Management Service Components, which provides the configuration settings for both the vCenter Server and SSO Lookup Service.

The NSX for vSphere Manager communicates with a single vCenter Server and is the interface that is used for the NSX for vSphere API as well as for third-party applications that integrate with NSX for vSphere. As you can see here, the NSX for vSphere Manager is handling all the management tasks and, as stated above, there is a one-to-one correlation between a vCenter Server and an NSX for vSphere Manager. As an example, if vRealize Automation (formerly vCloud Automation Center) is present with multiple vCenter Servers, each of those vCenter Servers will have an NSX for vSphere Manager instance.

When you initially log in to the NSX for vSphere Manager UI here is what you will see:

Diagram 2 – NSX for vSphere Manager UI


After logging in to the NSX for vSphere Manager, you would go to the Manage “Appliance Settings” section, where you would configure the initial settings as mentioned previously.

The following points highlight the main functions of the NSX for vSphere Manager, which are:

§ A single point of configuration and control for the NSX for vSphere platform solution

§ Packaged as a virtual appliance, containing all components required to run NSX on vSphere

§ Deployed in the vCenter Server to provide the NSX for vSphere functionality

§ Deployed with one instance per vCenter Server

§ The vSphere Web Client UI is a required component

§ Optionally configured with an SSO server to share authentication with the vCenter Server

The following diagram represents the relationship between the different components.

Diagram – NSX for vSphere Manager User Interface


As you can see in the diagram, the NSX for vSphere components require a number of ports to be open for successful communication between the NSX for vSphere Manager, the vCenter Server, and any REST client that may be used:

Port 443 between a REST client and NSX for vSphere Manager.

Port 443 to access the NSX for vSphere Manager Management User Interface.

Port 9443 to access the vSphere Web Client.
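To sanity-check that these ports are reachable before troubleshooting further, a quick connectivity test is often useful. The following is a minimal Python sketch; the host names are placeholders for your own NSX for vSphere Manager and vCenter Server.

import socket

# Quick reachability check for the management ports listed above.
# Host names below are placeholders for your environment.
checks = [("nsxmgr.corp.local", 443), ("vcenter.corp.local", 9443)]
for host, port in checks:
    try:
        with socket.create_connection((host, port), timeout=5):
            print(f"{host}:{port} reachable")
    except OSError as err:
        print(f"{host}:{port} blocked or down: {err}")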

As you read this section and look at Diagram 3, the NSX for vSphere Manager User Interface, you may be asking yourself: what about high availability of the NSX for vSphere Manager? Is that not a single point of failure? You would be right in thinking this, and you need to take the necessary measures to make sure the NSX for vSphere Manager is highly available. A common approach is to use the vSphere HA feature; after all, the NSX for vSphere Manager is a virtual machine (VM).

NSX for vSphere Edge Appliance

In NSX for vSphere there are a number of networking and security capabilities that are available for both user configuration and consumption. They can be accessed by using the NSX for vSphere Manager plugin that gets instantiated in the vSphere Web Client, which we will see later, and by using the NSX for vSphere API, which is a REST API. It is also worth noting that the NSX for vSphere features are also consumed and exposed by other VMware products, such as vRealize Automation (formerly vCloud Automation Center) and vCloud Director. These network and security services are once again provided through the integration with the vSphere Web Client. When you deploy an NSX for vSphere Edge appliance, you will observe that it has two personalities! What that means is that you can configure the NSX for vSphere Edge appliance as either an Edge Services Gateway (ESG) or a Distributed Logical Router (DLR). Please see the following diagram that depicts this.


Let’s take a brief look at the capabilities that are included in the NSX for vSphere Edge Appliance:

§ Firewall – Provides 5-tuple rule configuration with IP addresses, port ranges, grouping objects, and vCenter Server containers

§ Network Address Translation – Source and Destination NAT capabilities

§ DHCP – Configuration of IP pools, gateways, DNS servers, and search domains

§ Routing – Both static and dynamic routing protocols are supported, including OSPF, BGP, and ISIS

§ Site-to-Site VPN – IPsec site-to-site VPN between two Edges or a third-party vendor's VPN terminator, such as those from Juniper, Cisco, and others

§ SSL VPN – Allows remote users to access the private networks behind the Edge Services Gateway (ESG)

§ L2 VPN – Stretch your layer 2 networks across datacenters

§ High Availability – Active-Standby High Availability (HA) capability, which works well with vSphere HA

§ DNS – Allows the configuration of DNS relay

§ Syslog – Allows logging to remote syslog servers

These networking and security features in NSX for vSphere are key to providing multi-tenancy capabilities at the virtual datacenter layer, while still supporting self-service for consumers of these services. This is important in providing the required functionality for enterprises and service providers who are looking to build and offer secure networking and security services.
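Because these Edge services are exposed through the NSX for vSphere REST API as well as the vSphere Web Client, they can also be inventoried programmatically. The sketch below lists the deployed NSX Edges; the manager address and credentials are placeholders, and the /api/4.0/edges path is taken from the NSX for vSphere API guide for this release, so verify it against your environment.

import requests

# List deployed NSX Edges (ESG and DLR instances) over the REST API.
NSX = "https://nsxmgr.corp.local"               # placeholder manager address
resp = requests.get(f"{NSX}/api/4.0/edges",
                    auth=("admin", "password"),  # placeholder credentials
                    verify=False)                # lab only; validate certificates in production
resp.raise_for_status()
print(resp.text)   # XML list of edges with their IDs, e.g. edge-7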

NSX for vSphere Security Features

We are now going to take a look at the NSX for vSphere Edge Appliance security features, namely the IPsec VPN, Layer 2 VPN and SSLVPN, and in particular we are going to describe what features are available, where you would typically deploy them and finally give you an example of how you would configure them.

But before we do, it is important to understand that the VMware NSX for vSphere platform is not only the network virtualization platform but also the security platform for the software-defined data center. The NSX for vSphere platform brings both virtualization and security to your existing network and transforms network and security operations and economics.

The software-defined data center extends the virtualization and security concepts like abstraction, pooling, and automation to all data center resources and services. Components of the software-defined data center can be implemented together, or in a phased approach:

• Compute virtualization, network virtualization, security, and software-defined storage deliver abstraction, pooling, and automation of the compute, network, security, and storage infrastructure services.

• Automated management delivers a framework for policy-based management of data center applications and services.

In the following diagram you can see how security is extended across the software-defined data center.


As you are aware, virtualizing the network abstracts application workload communications from the physical network and hardware topology. This virtualization is critical in allowing network security to break free from the physical constraints. Virtualization enables the network security to be based on user, application, and business context.

In the NSX for vSphere Platform we can think of security that is split into the following categories:

§ NSX for vSphere Edge Services Gateway Firewall (VPN Services)

§ NSX for vSphere Distributed Firewall

§ NSX for vSphere Flow Monitoring

§ NSX for vSphere Role-Based Access Control

§ NSX for vSphere Service Composer

We will cover these in brief, but will spend more time on the NSX for vSphere Edge Services Gateway Firewall (VPN Services).

NSX for vSphere Edge Services Gateway VPN Services

VMware customer experience and independent analyst research demonstrate that it is possible to build a fully virtualized DMZ which is secure, scalable, and cost effective using VMware’s NSX for vSphere platform. In this section we are going to provide some guidance around developing an architecture, and the design and deployment required to both realize the benefits and mitigate the risks. There has been a great deal of confusion about which VPN features actually exist in VMware’s NSX for vSphere platform. With the NSX for vSphere v6.1.x product you will see the following features available. Please see the diagram VPN Services:

§ VPN

o IPSec VPN

o L2 VPN

§ SSL VPN-Plus

Diagram: VPN Services


A summary of these three VPN features is as follows:

§ IPSec VPN - For IPSec VPN you can configure two Edge Services Gateways (ESGs) and create a site-to-site tunnel between them. The networks behind the two Edge Services Gateways are then reachable to each other. This is a Layer 3 site-to-site solution and provides the ability to interconnect two different networks.

§ L2VPN - The L2VPN is used to extend a network across boundaries such that the VMs being extended are not aware of the move and require no change to their routing or MAC addresses. This is a Layer 2 site-to-site solution and basically lets you move VMs between two different networks without reconfiguring them.

§ SSL VPN-Plus - SSL VPN-Plus is a user-based solution; you can use one Edge Services Gateway and configure SSL VPN on it. The private network behind the Edge Services Gateway is then reachable from the end user's machine once connected via the SSL VPN client.

Note that the L2VPN feature is used by vCloud Connector for its Data Center Extensions (DCE).

NSX Edge Services Gateway IPSec VPN

The NSX for vSphere Edge Services Gateway (ESG) supports certificate authentication, pre-shared key mode, and IP unicast traffic; no dynamic routing protocol is supported between the NSX for vSphere Edge Gateway instance and remote VPN routers. Behind each remote VPN router, you can configure multiple subnets to connect to the internal network behind an NSX for vSphere Edge Services Gateway instance through IPsec VPN tunnels. These subnets and the internal network behind an NSX for vSphere Edge Services Gateway instance must have address ranges that do not overlap.

You can deploy an NSX for vSphere Edge Services Gateway (ESG) behind a NAT device. In this deployment, the NAT device translates the VPN address of a NSX for vSphere Edge Services Gateway instance to a publicly accessible address facing the Internet. Remote VPN routers use this public address to access the NSX for vSphere Edge Services Gateway instance. You can also place remote VPN routers behind a NAT device. You must provide the VPN native address and the VPN Gateway ID to set up the tunnel. On both ends, static one-to-one NAT is required for the VPN address. You can have a maximum of 64 tunnels across a maximum of 10 sites.

A common question that arises from the above: the IPsec tunnel limits refer to a maximum of 64 tunnels across a maximum of 10 sites, but what exactly defines a site? What this means is that the NSX for vSphere Edge Services Gateway in the datacenter is limited by a vSphere VM's limitation of 10 interfaces. This limits the number of remote NSX for vSphere Edge Services Gateways that can connect to a single data center Edge (10 being the maximum). Once connectivity is available, the NSX for vSphere Edge Services Gateway can support 64 IPsec tunnels (carrying Logical Switch traffic). So, using a single NSX for vSphere Edge Services Gateway you can connect 10 offices with 64 Logical Switches tunneled.

§ Encapsulating Security Payload (ESP) tunnel mode is used:

Ø 64 tunnels are supported across a maximum of 10 sites.

§ Internet Key Exchange v1

§ Multiple non-overlapping local and peer subnets can be configured.

§ Industry standard IPsec implementation:

Ø Full interoperability with Cisco, Juniper, Sonicwall, and others

§ Supports both the pre-shared key (PSK) and certificate authentication mode.

§ Supported encryption algorithms are AES (default), AES256, and TripleDES.

IPsec Security Protocols – Internet Key Exchange

IPsec is a framework of open standards. Many technical terms appear in the logs of the NSX for vSphere Edge Services Gateway instance and of other VPN appliances, and you can use them to troubleshoot the IPsec VPN. You might encounter some of the following standards.

Internet Security Association and Key Management Protocol (ISAKMP): This protocol is defined by RFC 2408 for establishing Security Associations (SA) and cryptographic keys in an Internet environment. ISAKMP only provides a framework for authentication and key exchange and is designed to be key exchange independent.

Oakley: This protocol is a key agreement protocol that allows authenticated parties to exchange keying material across an insecure connection by using the Diffie-Hellman key exchange algorithm.

Internet Key Exchange (IKE): This protocol is a combination of the ISAKMP framework and Oakley. The NSX for vSphere Edge Services Gateway provides IKEv1.

IKE has two phases. Phase 1 sets up mutual authentication of the peers, negotiates cryptographic parameters, and creates session keys. Phase 2 negotiates an IPsec tunnel by creating keying material for the IPsec tunnel to use. Phase 2 either uses the IKE phase one keys as a base or performs a new key exchange. The following phase 1 parameters are used by NSX for vSphere Edge Services Gateway:

Main mode

3DES or AES (configurable)

SHA-1

MODP group 2 (1024 bits)

Pre-shared secret (configurable)

Security association lifetime of 28800 seconds (eight hours)

ISAKMP aggressive mode disabled

The following IKE phase 2 parameters are supported by NSX for vSphere Edge Services Gateway:

3DES or AES (matches the phase 1 setting)

SHA-1

ESP tunnel mode

MODP group 2 (1024 bits)

Perfect forward secrecy for rekeying

Security association lifetime of 3600 seconds (one hour)

Selectors for all IP protocols, all ports, between the two networks, using IPv4 subnets

The Diffie-Hellman (DH) key exchange protocol is a cryptographic protocol that allows two parties that have no previous knowledge of one another to jointly establish a shared secret key over an insecure communications channel. The NSX for vSphere Edge Services Gateway supports DH group 2 (1024 bits) and group 5 (1536 bits).
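To make the Diffie-Hellman arithmetic concrete, the following toy example uses deliberately tiny numbers; a real IKE exchange uses the 1024-bit or 1536-bit MODP primes mentioned above. This only illustrates the math and is not a usable key exchange.

# Toy Diffie-Hellman exchange (illustration only; real IKE uses
# MODP group 2 or group 5 primes, not single-digit values).
p, g = 23, 5                         # public prime modulus and generator
a, b = 6, 15                         # each party's private value
A = pow(g, a, p)                     # one side sends A = g^a mod p
B = pow(g, b, p)                     # the other sends B = g^b mod p
assert pow(B, a, p) == pow(A, b, p)  # both derive the same shared secret
print(pow(B, a, p))                  # prints 2, the shared secret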

IPsec Security Protocols – Encapsulating Security Payload

Encapsulating Security Payload (ESP) is a member of the IPsec protocol suite. In IPsec, it provides origin authenticity, integrity, and confidentiality protection of packets. ESP in Tunnel Mode encapsulates the entire original IP packet with a new packet header. ESP protects the whole inner IP packet (including the inner header). The outer header remains unprotected. ESP operates directly on IP, using IP protocol number 50.

ESP tunnel mode:

Confidentiality (encryption)

Connectionless integrity

Data origin authentication

Protection against replay attacks

IPsec ESP Tunnel Mode Packet


When Encapsulating Security Payload (ESP) processes a packet in tunnel mode, the entire packet is surrounded by the ESP header, ESP trailer, and ESP authentication data.

§ ESP header: Contains two fields, the SPI and Sequence Number, and comes before the encrypted data.

§ ESP trailer: Placed after the encrypted data. The ESP trailer contains padding that is used to align the encrypted data through a Padding and Pad Length field.

§ ESP authentication data: Contains an integrity check value.

What you are seeing in the diagram above is the original packet that is transmitted and is both encrypted and authenticated.
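Of the three pieces described above, only the fixed 8-byte ESP header (SPI plus sequence number, per RFC 4303) is readable without the tunnel's keys, which is worth remembering when you examine captures of IPsec traffic. A small illustrative parse, using made-up sample bytes rather than real traffic:

import struct

# Parse the fixed ESP header: a 32-bit SPI followed by a 32-bit sequence
# number. Everything after it is encrypted and/or authenticated.
esp = bytes.fromhex("00000101" "0000002a") + b"\x00" * 16  # sample bytes only
spi, seq = struct.unpack("!II", esp[:8])
print(f"SPI=0x{spi:08x} sequence={seq}")   # SPI=0x00000101 sequence=42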

Security is one of the top concerns among organizations evaluating any networking and security product, and the NSX for vSphere platform is no different. Offering customers the ability to implement their own security measures, with a comprehensive set of security tools, is a very attractive proposition. Among these tools, one of the most important to enterprise customers and service providers alike is the ability to securely interconnect physical and virtual datacenters with virtual private networks (VPNs).

Virtual Private Networks are an important feature, as they enable IT organizations to securely connect their own physical, virtual, and cloud environments to virtual datacenters hosted by a service provider, as an example. With connectivity to multiple organizations secured by VPN, organizations can freely move their data securely in and out without having to worry about loss or corruption of data in transit. From the enterprise datacenter’s perspective, a virtual datacenter in the public cloud is simply another subnet in its network topology. For cloud service providers, supporting VPNs makes it easier to attract customers and garner more of their workloads, increasing revenue and strengthening partnerships with customers.

Fortunately, VMware provides a suite of Networking and Security products, whereby one of these features is the NSX for vSphere platform, which has a comprehensive IPsec VPN feature. Using this feature, you can securely interconnect your enterprise datacenters with virtual datacenters. With the NSX for vSphere platform IPsec VPN Edge Services Gateway, customers can secure communication between their datacenters. The result is that customers can treat this feature as providing a seamless extension of their own datacenters, making this inter-connect straightforward and secure.

IPsec VPN Network Topologies

Interconnecting multiple private organizations or sites securely over public networks requires the functionality of the NSX for vSphere platform, and in particular the IPsec VPN on the Edge Services Gateway, making the sites work as if they are extensions of a single datacenter. The network topologies could include the following, for example:


Single Site Topology - NSX for vSphere platform Edge Services Gateway IPsec VPNs can for example connect different virtual datacenters hosted by the same service provider, even hosted in the same vCloud Director instance (See Figure 1 above). This example secures communication between networks hosted on shared infrastructure.

Multi-Site Topology – NSX for vSphere platform Edge Services Gateway IPsec VPNs can connect to multiple deployments. For example, an enterprise private cloud can be securely connected to the organization’s virtual datacenter in a service provider’s public cloud (See Figure 1 above). Similarly, virtual datacenters hosted by multiple service providers can be interconnected. These examples secure communication between clouds over public networks.

Enterprise Site Topology - NSX for vSphere platform Edge Services Gateway VPNs can securely connect enterprises with fixed router or firewall-based VPNs to virtual datacenters hosted by providers of vCloud Powered services. Because NSX for vSphere platform Edge Services Gateway supports industry-standard IPsec-based VPNs, a wide range of devices, including those from Check Point, Cisco, and Juniper, can be used to terminate the VPN at the enterprise location.

Edge Services Gateway IPsec VPN Use Case

The network topologies that enterprise and service provider customers are most likely to encounter involve two use cases for NSX for vSphere platform Edge Services Gateway IPsec VPNs:

Connecting multiple virtual datacenters regardless of location. This single use case supports both multi-site and single-site deployments and connecting enterprise environments with virtual datacenters hosted by service providers.

Connecting enterprise datacenters with virtual datacenters.

From the standpoint of implementing these use cases with NSX for vSphere platform Edge Services Gateway IPsec VPNs, the main difference is the endpoints. In the first case, both endpoints are NSX for vSphere platform Edge Services Gateway appliances located at the perimeter of a virtual datacenter. In the second case, an NSX for vSphere platform Edge Services Gateway appliance establishes a VPN with a physical device located in an enterprise datacenter.

Edge Services Gateway IPsec VPN Prerequisites

In order to establish a site-to-site IPsec VPN, a small number of prerequisites must be fulfilled:

Each VPN appliance, whether an NSX for vSphere platform Edge Services Gateway instance or a physical appliance, must have a fixed IP address that makes the appliances visible to each other. In the case of multi-site VPNs, this requires public IP addresses. In the case of single-site VPNs, private addresses can be used as long as the appliances are on the same network or the addresses are routable.

The NSX for vSphere platform Edge Services Gateway appliance must allow the following protocols to pass: Encapsulating Security Payload (ESP), Internet Key Exchange (IKE), and NAT traversal.

Edge Services Gateway IPsec VPN Protocols

Firstly, there are in fact two versions of IKE, IKE version 1 (IKEv1) and IKE version 2 (IKEv2). The NSX for vSphere platform Edge Services Gateway IPsec VPN feature in v6.1.x only supports IKEv1. IKEv1 is based on the Internet Security Association and Key Management Protocol (ISAKMP) framework.

IKEv1

UDP port 500

IKE has two phases:

Phase 1: Authenticated communication channel for the key exchange

Phase 2: Security Associations for both sides

Auto-plumbing of the firewall for the control channel – the NSX for vSphere platform Edge Services Gateway automatically adds the required firewall rules to accept communication with the peer, both for tunnel control and data traffic.

Encapsulating Security Payload (ESP) Tunnel Mode

Confidentiality (encryption)

Connectionless integrity

Data origin authentication

Protection against replay attacks

Tunnel mode

Edge Services Gateway IPsec VPN Key Characteristics

ESP Tunnel mode is used to provide a VPN between two networks or between a host and a network. With NSX for vSphere platform Edge Services Gateway v6.1.x, 64 tunnels are supported

Multiple non-overlapping local and peer subnets can be configured

Only IKEv1 is supported

NAT Traversal (NAT-T) is used in situations where network address translation is interposed between the two NSX for vSphere platform Edge Services Gateway devices. NAT Traversal overcomes the problems inherent in encrypting IPsec ESP packets that include translated addresses that must be modified in the payload, thus causing checksum errors and other incompatibilities. NAT Traversal and all the other IPsec protocols, including IKE and ESP, only pass between the NSX for vSphere platform Edge Services Gateway devices. The internal virtual machines communicating through the NSX for vSphere platform Edge Services Gateway devices do not need to be aware of the existence of the tunnel

Based on Industry Standard IPsec implementation

Fully interoperable with standard vendors, such as Cisco, Juniper, SonicWall, and others

Support for both pre-shared key (PSK) and certificate authentication modes - in the PSK model, both peers use a pre-shared key for encryption and authentication. These keys are agreed upon and configured on each peer prior to starting the VPN

Supported Encryption Algorithms include, 3DES and AES (128 and 256 bits)

AES-NI support, which was added in NSX for vSphere platform Edge Services Gateway v6.1.x, provides higher performance. AES-NI is Intel’s Advanced Encryption Standard New Instructions, available on the Westmere microarchitecture and later, such as the Xeon E56xx models. No user configuration is required in order to enable AES-NI. The encryption overhead for packet traffic in a VPN application can be high, and the Intel AES-NI feature can substantially reduce the demand on the CPUs of the ESXi hosts. The NSX for vSphere platform Edge Services Gateway offloads the AES encryption of data to the hardware on supported Intel Xeon and second-generation Intel Core processors. It is anticipated that up to a 40 percent performance increase could be achieved by using Intel AES-NI. In addition, from a security perspective, there is no user configuration necessary, as AES-NI support in hardware is auto-detected. The NSX for vSphere platform Edge Services Gateway also supports certificate authentication, pre-shared key mode, and IP unicast traffic.

Diagram: VPN Services using AES-NI
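If you want to confirm that a CPU advertises the AES-NI instruction set, the flag can be read from a Linux guest as in the sketch below. Treat this as a lab sanity check only; ESXi exposes CPU features through its own interfaces, and as noted above, NSX detects and uses AES-NI automatically.

# Check for the "aes" CPU flag from a Linux guest (illustrative only).
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            print("AES-NI present" if "aes" in line.split() else "AES-NI absent")
            break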


In order to make the NSX for vSphere platform Edge Services Gateway IPsec VPN work end-to-end, the following ports need to be opened on both peers and on all firewall devices in between: UDP 500 for IKE, UDP 4500 for NAT-T, and IP protocol 50 for ESP.

Adding an Edge Services Gateway IPsec VPN

Launch your favorite browser, such as Firefox, Chrome, or Internet Explorer, and enter the IP address or name of the NSX for vSphere Manager. Once you have successfully logged in, go to the Networking and Security view -> NSX Edges.


Open the NSX Edges, in this example Edge Id edge-7, and you will see the following:


Now, click on the Add button and you will be presented with the following:


What is important to note here is that you must configure at least one external IP address on the NSX for vSphere Edge Services Gateway to provide the IPsec VPN service.

For the local NSX for vSphere Edge Services Gateway instance, you need to provide an ID, the external IP address, and the CIDR block for the local subnets. You will also be required to enter the same set of information for the peer endpoint.

For the remote NSX for vSphere Edge Services Gateway instance, you will need to provide the same information, but from the remote perspective.

Finally, you will need to select an encryption algorithm, the type of authentication, the Diffie-Hellman Group, and any extension that is required.
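The same site configuration can also be pushed through the REST API instead of the UI. The following is a minimal sketch: the manager address, edge ID, endpoint addresses, subnets, and pre-shared key are all placeholders, and the payload shape is based on the NSX for vSphere 6.1.x API guide, so validate it against the documentation for your exact release.

import requests

# Configure a single IPsec site on an Edge Services Gateway via the REST API.
NSX, EDGE = "https://nsxmgr.corp.local", "edge-7"   # placeholders
ipsec_xml = """<ipsec>
  <enabled>true</enabled>
  <sites>
    <site>
      <name>siteA-to-siteB</name>
      <localId>1.1.1.1</localId>
      <localIp>1.1.1.1</localIp>
      <peerId>2.2.2.2</peerId>
      <peerIp>2.2.2.2</peerIp>
      <localSubnets><subnet>10.10.10.0/24</subnet></localSubnets>
      <peerSubnets><subnet>10.20.20.0/24</subnet></peerSubnets>
      <psk>changeme</psk>
    </site>
  </sites>
</ipsec>"""
resp = requests.put(f"{NSX}/api/4.0/edges/{EDGE}/ipsec/config",
                    data=ipsec_xml,
                    headers={"Content-Type": "application/xml"},
                    auth=("admin", "password"), verify=False)
print(resp.status_code)   # 204 is the expected success code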

NSX for vSphere SSL VPN-Plus Services

Conventional full access SSL VPNs send TCP/IP data in a second TCP/IP stack for encryption over the Internet. The result is that application layer data is encapsulated twice in two separate TCP streams. When packet loss occurs (which happens even under optimal Internet conditions), a performance degradation effect called TCP-over-TCP meltdown occurs. In essence, two TCP instances are correcting a single packet of IP data, undermining network throughput and causing connection timeouts. TCP optimization eliminates this TCP-over-TCP problem, ensuring optimal performance.

The NSX for vSphere SSL VPN-Plus Service fundamentally enables individual remote users to connect securely to private networks behind an NSX for vSphere Edge Services Gateway, and thus allows remote users to access applications and servers on the private networks. This service provides two access modes: Web access mode, where no client is installed, and full network access mode, which requires that a client is installed.

The NSX for vSphere Edge Services Gateway provides users with access to protected resources by establishing an SSL encrypted tunnel between a laptop, such as one running Mac OS X or Windows, and the NSX for vSphere Edge Services Gateway. The NSX for vSphere SSL VPN-Plus Service is intended to be deployed as a substitute for more complicated IPsec client-to-site or jump server solutions. The NSX for vSphere SSL VPN-Plus Service does not support mobile clients, and does not deliver common end-user features such as reverse proxy, custom portal, and SSL offload. The primary use cases and capabilities of the NSX for vSphere SSL VPN-Plus Service are somewhat different from the capabilities provided by Horizon View, which is VMware's comprehensive approach to virtual desktop infrastructure, secure mobility, and end-user remote access.

One of the real advantages of the NSX for vSphere SSL VPN-Plus Service is you can access your corporate LAN by using the Web-access mode or with a downloadable SSL client, but there is no special hardware or software that is required.


An important aspect of the NSX for vSphere SSL VPN-Plus Service is its ability to provide administrative users with full tunnel access to protected resources by establishing an SSL encrypted tunnel between a laptop and the NSX for vSphere Edge Services Gateway, which in turn provides you with a Secure Management Access Server.

NSX for vSphere Edge Services Gateway SSL VPN-Plus Secure Management Access Server


L2 VPN and Stretched Logical Networks

We have covered both the IPSec VPN and SSL VPN-Plus services, but will now take a look at the lesser-known feature of the NSX for vSphere Edge Services Gateway, which is the L2 VPN. The L2 VPN enables you to stretch a Layer 2 subnet over Layer 3, tunnelled through an SSL VPN. The two sites form a Layer 2 adjacency; this could either be within the same datacenter or across datacenter locations. With the current version of the NSX for vSphere 6.1.x platform it is now possible to trunk multiple logical networks, whether that be VLAN to VLAN, VLAN to VXLAN, or VXLAN to VXLAN. It is also possible to deploy a standalone NSX for vSphere Edge Services Gateway on a remote site without that site being “NSX enabled”, that is, connecting to a VLAN. As this is not a particularly well-understood area, it is important we cover two new abstractions.

§ Trunk interface – this allows multiple internal networks, whether VLAN or VXLAN, to be trunked. Interfaces that are configured to support 802.1Q Ethernet frames are called trunk interfaces.

§ Local Egress Optimization enables the NSX for vSphere Edge Services Gateway to route any packets sent towards the Egress Optimization IP address locally, and send everything else over the tunnel. This is important for VM mobility: if the default gateway for the virtual machines belonging to the subnets you are stretching is the same across the two sites, you need this setting to ensure traffic will be locally routed on each site. If you decide you need to migrate a virtual machine (VM) from one site to another, you can do so without touching the guest OS network configuration.

NSX for vSphere Edge Services Gateway L2 VPN Use Case

A very common use case for L2 VPN is cloud bursting, where a Private Cloud service bursts into a Public Cloud when demand requires it. Effectively this is a Hybrid Cloud solution.

The following diagram is from the NSX for vSphere v6.1 Administration Guide. As you can see in this scenario and diagram below, VLAN 10 on site A is stretched to VXLAN 5010 on site B. Similarly, VLAN 11 is stretched to VXLAN 5011 on site B. Again, this is an example where an NSX datacenter is extended to a non-NSX datacenter.


In the diagram and scenario below, this could be used for a Private Cloud to Private Cloud migration or a DR site. As you can see, VXLAN 5010 and 5011 have been stretched to site B and mapped to the same VNI. A VXLAN Network Identifier (VNI) is a 24-bit number that is added to the VXLAN frame and uniquely identifies the segment to which the inner Ethernet frame belongs. Multiple VNIs can exist in the same transport zone. In the NSX for vSphere platform, VNIs start at 5000.
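Because the VNI is a 24-bit field, the size of the available segment space is easy to work out, as the short calculation below shows; the subtraction simply reflects NSX for vSphere allocating VNIs from 5000 upward.

# A VNI is a 24-bit value, so the theoretical segment space is 2^24.
total = 2 ** 24
usable = total - 5000    # NSX for vSphere allocates VNIs from 5000 upward
print(total, usable)     # 16777216 16772216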


NSX for vSphere Edge Services Gateway L2 VPN Topology

The following diagram and topology depicts a VXLAN to VXLAN extension. Here we are stretching VXLAN 5004 at Site B, which is the Branch Web Tier, to VXLAN 5003 at Site A, which is the Web Tier.


This allows an administrator to extend the datacenter to public clouds in such a way that resources (VMs, vApps, templates) in public clouds can be managed as if they were in the local datacenter.

NSX for vSphere Edge Services Gateway Firewall

A logical firewall provides security mechanisms for dynamic virtual data centers, and includes components to address different deployment use cases.

The NSX for vSphere Edge Services Gateway Firewall focuses on the North-South traffic enforcement at the tenant or data center perimeter as shown below. There is also the NSX for vSphere Distributed Firewall, which focuses on East-West traffic, and together, these components address the end-to-end firewall needs of virtual data centers. It is also possible to deploy either or both of these technologies.


The NSX for vSphere Edge Services Gateway firewall provides perimeter security functionality: a stateful firewall for North-South traffic flows. It supports dynamic routing, is virtualization context aware, and configuration and management are performed from one central location, where firewall rules are applied in ascending number order.


The types of firewall rules that the NSX for vSphere Edge Services Gateway firewall supports are as follows:

When you first deploy an NSX for vSphere Edge Services Gateway, a set of default rules is created during the deployment. The following example depicts this:


Internal rules are created by the NSX for vSphere Edge Services Gateway in support of services configured by the user, and user rules are created by the user, as depicted below. The firewall rule type does not affect the application of the rule.


Virtualization context awareness refers to the ability of the NSX for vSphere Edge Services Gateway firewall to filter traffic flows based not only on IP and TCP/UDP header information, but also on virtualization-specific information, such as cluster, logical switch, security group, and virtual machine. Please see the following diagram that depicts this:


Populating the NSX for vSphere Edge Services Gateway firewall rules requires that you enter specific information when creating the firewall rule. For example, you are required to assign the following information:

ü Name – This is just a free-form description.

ü Source

ü Destination – The source and destination rules can include the IP address in the packet or information provided by the VMware vCenter Server, such as virtual machine name or resource pool.

ü Service – A service in the firewall context is a collection of TCP/UDP ports that form the components for successful communication with an end system.

ü Action – The Action option allows the rule to accept or deny the traffic. This can include logging for the rule, Network Address Translation support can be enabled, and the rule can be applied on ingress or egress.

This complete rule is also shown in the diagram below:


This diagram also shows the action option for Network Address Translation, which is not commonly known:


After the NSX for vSphere Edge Services Gateway firewall rule(s) have been created, you are required to publish the changes. Once published, the changes take effect immediately. A common issue is that after completing the creation of the NSX for vSphere Edge Services Gateway firewall rule(s), you forget to publish them. As soon as you make a change to a rule you will see the following dialog:


Also note that the NSX for vSphere Edge Services Gateway firewall rule(s) you create are processed in order. This means that the first rule that matches the traffic being examined is applied and the traffic is passed or dropped.
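One way to confirm exactly what has been published is to read the Edge firewall configuration back over the REST API; the rules are returned in the order in which they are evaluated. A minimal sketch, with the manager address, credentials, and edge ID as placeholders and the path taken from the NSX for vSphere API guide:

import requests

# Fetch the current Edge firewall rule set to verify published rules
# and their top-down ordering (first match wins).
NSX, EDGE = "https://nsxmgr.corp.local", "edge-7"   # placeholders
resp = requests.get(f"{NSX}/api/4.0/edges/{EDGE}/firewall/config",
                    auth=("admin", "password"), verify=False)
resp.raise_for_status()
print(resp.text)   # XML <firewall> document listing the rules in order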

Flow Monitoring

Flow Monitoring is a traffic analysis tool that provides a detailed view of the traffic on virtual machines. The Flow Monitoring output defines which machines are exchanging data and over which application. This data includes the number of sessions, packets, and bytes transmitted per session. Session details include sources, destinations, direction of sessions, applications, and ports being used. Session details can be used to create firewall allow or block rules. You can use this information to audit network traffic, define and refine firewall policies, and identify threats to your network.

The NSX for vSphere Distributed Firewall has visibility of all traffic flows that have taken place in the logical switches. By drilling down into the traffic data, it is possible to evaluate the use of your resources and send session information to the Distributed Firewall to create an allow or block rule at any level. This is a very powerful security feature.

NSX for vSphere Hardening Guide

In this section we look at high-level guidance for securing or evaluating the NSX for vSphere platform and infrastructure. The recommendations expressed herein are not exhaustive; rather, they represent high-value actions one may take toward securing or evaluating the NSX for vSphere platform infrastructure. Additionally, these recommendations are intended to be specific to NSX for vSphere v6.1.x. The recommendations are categorized in the following sections:

1. Common Recommendations

2. Management Plane Recommendations

3. Control Plane Recommendations

4. Data Plane Recommendations

The above sections are defined in the VMware NSX for vSphere Hardening Guide. To obtain the latest version of this guide, please visit https://communities.vmware.com/docs/DOC-28142. If you have questions, comments, or have identified ways to improve this guide, please send comments in the comments area.

This VMware NSX for vSphere Hardening Guide is intended for various roles, including network and security architects, security officers, virtual infrastructure administrators, cloud infrastructure architects, cloud administrators, cloud customers, cloud providers, and auditors. Additionally, individuals and organizations that seek a starting point for the network and security controls to consider when adopting or building a network and security infrastructure may find the recommendations in this section valuable.

VMware engages with various partners to perform security assessments of the NSX for vSphere platform. These reviews provide an assessment as well as a focus on major and newly added features, such as the integration of Software Defined Networking (SDN) and the Software Defined Data Center (SDDC). The VMware NSX for vSphere hardening assessment of the NSX for vSphere platform is primarily focused on networking and security attacks, configuration issues, secure defaults, and the protocols in use. Using a combination of targeted source code review, active testing, fuzzing, and discussions, it is possible to locate and determine whether any significant vulnerabilities exist.

Separately or in concert, many of these issues can result in a complete datacenter compromise.

I think it is fair to say that the NSX for vSphere platform is complex; it is not a platform that you can just drop in, and it requires a great deal of thought before you go building or deploying such an environment.

There is no doubt in anyone’s mind that there are inherent risks in Software Defined Networking; however, software defined networking paired with network and security virtualization promises to offer a myriad of benefits and allows for entire Software Defined Data Centers (SDDC), a key part of VMware's vision for current and future products.

It is also important to know and understand the potential and inherent risks of this new platform in order to help guide you as you work with VMware’s NSX platform technology.

One of the true values of software-defined networking and security is that it allows agile movement of virtual machines and network and security services between physical hosts and datacenters. The dynamic nature of this technology requires that underlying hosts be fully connected at the physical and IP layer. However, with this mass of connectivity comes a myriad of associated risks. All software has flaws, and the reimplementation of core networking protocols, parsers, and switching methods will repeat and likely inherit historic vulnerabilities from the age of physical networking and security.

A Denial-of-Service (DoS) attack becomes a much greater issue now. In the physical networking world, dedicated hardware handles much of the parsing and routing of packets. In a software networking and security world, it is the software component that must parse, reparse, perform table lookups, handle encapsulation and fragmentation, and so on, overall spending much more CPU time deciding how to handle each packet. A software bug in any stage of this packet handling can lead to resource exhaustion, software crashes, and other scenarios that result in a total Denial-of-Service (DoS), and thus potentially a loss of networking and security services for hundreds of hosts, potentially affecting an entire datacenter!

It is also true that software-defined networking and security extends traditional network and security attacks to multiple datacenters. Traditionally local attacks, such as ARP spoofing, can now be conducted across layer 3 networks in geographically diverse locations. Additionally, if any vulnerability in the software network and security stack allows these attacks to leak onto the physical network, physical hosts in multiple datacenters affecting multiple customers can also be compromised.

In a very real sense, with software-defined networking and security as it is currently designed, everything rests in the basket of VM containment. If a virtual machine escape is ever performed, or if an attacker discovers a technique for sending un-encapsulated packets on physical networks, the expected security is lost. As described above, every physical host must be completely connected at the IP and physical layer, exposing an extremely broad attack surface. Once an attacker has a method of sending and receiving data on this physical network, the attacker can move laterally between hosts unabated by firewalls or routers, as these are no longer security-relevant devices. Software-defined networking and security is a powerful technology that is necessary to organizations and companies now and in the long-term future. However, like all software, it can be fragile, and networking and security vulnerabilities have broad ramifications not traditionally realized in physical networking platforms.

Throughout your security assessments, you should look for recurring weaknesses, as these are good candidates for systematic fixes as well as areas that should be subject to additional testing. These can also be considered in secure guidelines and threat modeling.

Here are some security measures to consider.

Much of the NSX for vSphere platform can be protected with TLS/SSL if properly configured, but consistent usage and strong defaults are still elusive. When protecting the NSX for vSphere Manager as well as the management REST APIs, we use TLS v1.2, and the control plane uses TLS in all other communications.
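A simple way to verify which protocol version a client actually negotiates with the NSX for vSphere Manager is to open a TLS connection and inspect it, as in the sketch below. The host name is a placeholder, and certificate verification is disabled here only for lab use.

import socket, ssl

# Report the negotiated TLS protocol version on the manager's port 443.
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE          # lab only; verify certificates in production
with socket.create_connection(("nsxmgr.corp.local", 443), timeout=5) as sock:
    with ctx.wrap_socket(sock, server_hostname="nsxmgr.corp.local") as tls:
        print(tls.version())             # expect TLSv1.2 on a hardened deployment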

Summary

In this chapter we have discussed the overview of the NSX for vSphere security architecture and NSX for vSphere networking and security connectivity. We have also discussed the use cases of the NSX for vSphere networking and security features. In the next chapter we will discuss the deployment process of NSX for vSphere and walk you through the deployment of the various components.

3

Deployment of NSX for vSphere

The deployment of NSX will be divided into three sections:

Installation of Management Plane

Installation of Control Plane

Installation of Data Plane

Each section will be described with a step-by-step procedure, from installing the first component, the NSX Management Server, to a production-ready VM network with L2-L3 services. Integration of L4-L7 services will not be covered in this book; it will be discussed in upcoming books.

Installation of Management Plane

NSX Manager is the management plane and provides administration for vCenter Server through a plugin interface. NSX Manager is available as an appliance from VMware and provides an easy download and deploy methodology. NSX Manager requires an IP address from which it can communicate with the vCenter Server. Once NSX Manager is installed, the plugin becomes active on the vCenter Server after the registration of the vCenter Server's FQDN or IP address with the NSX Manager.

To install the NSX Manager, download the required binary images for VMware NSX from the download section of the VMware portal.

https://my.vmware.com/web/vmware/info/slug/networking_security/vmware_nsx/6_x

This download will require a valid login account with a VMware NSX subscription.

Import the NSX Manager OVF file from your server onto the selected cluster or host in the vCenter Server, accept the extra configuration options and the EULA, and provide the details for the desired destination.


Configure the CLI “admin” user and Privilege Mode password, and the network properties such as hostname, IP address, netmask, gateway, DNS, and NTP settings.


After completing all the steps, the NSX Manager appliance will be deployed in a few minutes; power on the NSX Manager virtual machine in the vCenter Server.

Log in to the NSX Manager IP address using any web browser with the admin user and password defined during installation.


This will open the NSX Manager Virtual Appliance Management page. Click on Manage vCenter Registration to install the Network and Security plugin into vCenter Server.


On the left side under Components, select NSX Management Service and select Edit under vCenter Server. Enter the information required to register the vCenter Server with NSX Manager, such as the vCenter Server name or IP address and the vCenter user name and password, and click OK to proceed.
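The same registration can also be performed over the REST API rather than through the UI, which is useful when automating the build. The following is a minimal sketch; all addresses and credentials are placeholders, and the vcInfo payload follows the NSX for vSphere 6.x API guide, so check it against the documentation for your release.

import requests

# Register vCenter with NSX Manager via the REST API.
NSX = "https://nsxmgr.corp.local"        # placeholder manager address
vc_xml = """<vcInfo>
  <ipAddress>vcenter.corp.local</ipAddress>
  <userName>administrator@vsphere.local</userName>
  <password>changeme</password>
  <assignRoleToUser>true</assignRoleToUser>
</vcInfo>"""
resp = requests.put(f"{NSX}/api/2.0/services/vcconfig",
                    data=vc_xml, headers={"Content-Type": "application/xml"},
                    auth=("admin", "password"), verify=False)
print(resp.status_code)   # 200 is expected when registration succeeds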


Accept the trust certificate and proceed with this certificate. Once registration is completed, the Networking and Security icon will show up in the vCenter Server home section. This icon provides you with all the features to manage NSX.


Installation of Control Plane

NSX Controllers are the control plane for all logical switches, and a cluster of at least three controllers is recommended for any production environment. These controllers maintain a database and information about all the virtual machines, hosts, logical switches, logical routers, and VXLANs. The controller cluster will be distributed across hosts to provide high availability for all the logical switches in case of failure of any host. Define an anti-affinity rule to separate the controllers onto separate ESXi servers.

Click on the Networking & Security plugin in vCenter Server to open the management of NSX. Select Installation on the left side, and click on the green plus symbol under NSX Controller nodes to add a controller.


This will open the Add Controller window; select the relevant Datacenter, Cluster/Resource Pool, Datastore, and Host from the drop-down options.


Connect this controller to the Management Distributed vSwitch port group for network connectivity.


Create the IP pool required for network connectivity. This IP pool will be used by this controller and by subsequently installed controllers. Enter the pool name, gateway, prefix length, primary and secondary DNS, DNS suffix, and static IP pool range.

After this is done we just need to set a management password for the controller.

Once the controller is deployed, repeat the steps for controllers 2 and 3. A minimum of three controllers is recommended in a production environment; however, for testing purposes one or two controllers can be used.
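Controller deployment can also be driven through the REST API, which is convenient when scripting the build of all three nodes. The sketch below deploys a single node; every ID in the payload (IP pool, resource pool, datastore, port group) is an environment-specific placeholder, and the controllerSpec shape follows the NSX for vSphere 6.x API guide.

import requests

# Deploy one NSX Controller node via the REST API; repeat for nodes 2 and 3.
NSX = "https://nsxmgr.corp.local"        # placeholder manager address
spec = """<controllerSpec>
  <name>controller-1</name>
  <ipPoolId>ipaddresspool-1</ipPoolId>
  <resourcePoolId>domain-c7</resourcePoolId>
  <datastoreId>datastore-10</datastoreId>
  <networkId>dvportgroup-20</networkId>
  <password>VMware1!VMware1!</password>
</controllerSpec>"""
resp = requests.post(f"{NSX}/api/2.0/vdn/controller",
                     data=spec, headers={"Content-Type": "application/xml"},
                     auth=("admin", "password"), verify=False)
print(resp.status_code, resp.text)   # returns a job ID to track the deployment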

Installation of Data Plane

After the installation of the controllers, the clusters will be configured for VXLAN transport traffic. The host preparation will install a VXLAN interface on each host and enable it for VXLAN communication. This interface encapsulates the network segment packets from virtual machines to transport them to other hosts, and this allows virtual machines to communicate over L3 boundaries. The VMware Distributed Switch becomes a VMware NSX Switch once the three VIBs below are installed on each host.

VXLAN

Distributed Router

Distributed Firewall

To enable the hosts for VXLAN transport, click on the Networking & Security plugin in vCenter Server to open the management of NSX. Select Installation on the left side, click on the Host Preparation tab, and click on the Install option for each cluster. This will enable the VXLAN feature for that cluster and the hosts inside the cluster.


Click on the Yes option to install on the selected cluster.


The installation will take some time. Sometimes the installation will show a warning message and fail to complete. In this case, refresh the vCenter Server and verify that the Firewall and VXLAN columns are green and enabled for that cluster.


Click on the Logical Network Preparation tab and, under the VXLAN Transport sub-tab, click on Configure to set up the hosts for VXLAN transport. Select the Distributed Switch, enter the desired VLAN and an MTU of 1600, create an IP address pool for the VMKNic, and assign the Static EtherChannel policy for each cluster.


Once each cluster and host has been configured with a VXLAN transport IP address, verify that each host gets an additional VMkernel IP address and is enabled with vxlan in the TCP/IP stack column.


Once all the clusters and hosts are configured with VXLAN transport IP addresses, click on the Segment ID sub-tab. Segment ID pools are used by logical segments for VXLAN Network Identifiers (VNIs). You can enable a multicast address range when configuring the Hybrid or Multicast control plane modes.
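The segment ID pool can likewise be created via the REST API. A minimal sketch, assuming a pool of 5000-5999 and using the segmentRange payload from the NSX for vSphere 6.x API guide; verify the path against your release.

import requests

# Create a segment ID (VNI) pool via the REST API.
NSX = "https://nsxmgr.corp.local"        # placeholder manager address
pool = """<segmentRange>
  <name>Segment-Pool-1</name>
  <begin>5000</begin>
  <end>5999</end>
</segmentRange>"""
resp = requests.post(f"{NSX}/api/2.0/vdn/config/segments",
                     data=pool, headers={"Content-Type": "application/xml"},
                     auth=("admin", "password"), verify=False)
print(resp.status_code)   # 201 is expected on creation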


After adding the Segment ID pool, click on the Transport Zones sub-tab under the Logical Network Preparation tab to create the global transport zone for all the clusters, and click on the green plus sign to add a transport zone.


Enter the name, description, and control plane mode, and select the clusters to add to the transport zone.


After this, the basic setup for VMware NSX is ready for all clusters and hosts. The addition of new hosts and clusters will follow the above process for virtual machine network provisioning.

Next steps are to create the logical switches, edge router and distributed logical routers.

Under the Networking and Security plugin within vCenter, click on the Logical Switches menu and deploy a logical switch by clicking the green plus. Enter the name as Logical-Transport, a description, and the control plane mode as Unicast.


Repeat the steps for the Logical-Web, Logical-App, and Logical-DB switches. Each logical switch will get one Segment ID assigned from the pool created earlier.
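Creating these logical switches can also be scripted against the REST API, which saves repetition when deploying several switches. In the sketch below the transport zone scope ID is a placeholder, and the virtualWireCreateSpec payload follows the NSX for vSphere 6.x API guide.

import requests

# Create a logical switch (virtual wire) in a transport zone via the REST API.
NSX, SCOPE = "https://nsxmgr.corp.local", "vdnscope-1"   # placeholders
spec = """<virtualWireCreateSpec>
  <name>Logical-Web</name>
  <tenantId>default</tenantId>
  <controlPlaneMode>UNICAST_MODE</controlPlaneMode>
</virtualWireCreateSpec>"""
resp = requests.post(f"{NSX}/api/2.0/vdn/scopes/{SCOPE}/virtualwires",
                     data=spec, headers={"Content-Type": "application/xml"},
                     auth=("admin", "password"), verify=False)
print(resp.status_code, resp.text)   # returns the new switch ID, e.g. virtualwire-10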


Attach the Web, App, and DB VMs to the respective logical switches created above. Click on the little plus icon with three blue boxes and attach the virtual machines associated with the web logical switch.


Select vNICs associated to the VM need to be attached to the logical switch.

Log in to the virtual machine's console, verify the IP address of Web VM1 and ping the IP address of Web VM2. The ping will work: these two Web VMs, running on two different hosts, are connected to the same Layer 2 network and communicate over the VXLAN logical switch.

In the vCenter Server, click the Networking and Security plugin. Select NSX Edges and click the green plus. Select Logical (Distributed) Router from the Install Type, fill in the name, hostname and description, and click Next.

Set an administrative username and password. Choose Enable SSH access for access through the CLI. Select the cluster, datastore, host and folder to place this Edge VM, and click OK.

Select the Management port group on the Distributed Switch and click OK.

Click the green plus to add an interface, fill in the primary IP address and subnet prefix length for management traffic, and click OK.

Add the uplink for the DLR: select the type Uplink, connected to the Logical-Transport switch created earlier; click the green plus to add the IP address and subnet prefix length for this interface, and click OK.

Similarly, add the interfaces for the Web, App and DB tier applications with their respective IP addresses.

After adding all the required interfaces, click Finish.

Once the DLR is deployed, in-kernel routing is enabled for the virtual machines. Ping the gateway IP address of the respective Web, App and DB tiers; each should be reachable. To verify the routes on the DLR, execute the show ip route command; the routes for the different tiers should be visible.

The next step is to deploy the Edge Services Gateway (ESG) to connect the application to the corporate or external network. The ESG can provide routing, firewall, load balancing, VPN, Layer 2 bridging and more services; here we will deploy routing services.

In the vCenter Server, click the Networking and Security plugin. Select NSX Edges and click the green plus. Select Edge Services Gateway (ESG) from the Install Type, fill in the name, hostname and description, and click Next.

Fill in the username and password, select Enable SSH access, and click Next.

Select the Datacenter from the dropdown menu and the appliance size per your requirements. Select the Enable auto rule generation option, then click NSX Edge Appliance to place the appliance in vCenter Server. Choose the resource pool, datastore, host and folder for the ESG VM placement, click OK and then Next.

Add the internal and uplink interfaces for the ESG appliance. Click Add NSX Edge interface, fill in the name and select the type Uplink. Click Connected To to choose the distributed port group for the uplink interface. Click the green plus sign, add the IP address and subnet prefix length, and click OK twice. Similarly, configure the internal interface to communicate with the DLR appliance in the same subnet, and the management interface.

Choose the Configure Default Gateway option, select the uplink network interface in the vNIC dropdown, add the gateway details of the SVI interface created on the Nexus switch for communication between the external and internal networks, set the MTU to 1500, and click Next.

In Firewall and HA, choose Configure Firewall default policy and select Accept as the default traffic policy, with logging disabled. If HA has been configured, you can specify the keepalive link and the relevant configuration here; click Next. Verify the details and select Finish to deploy the ESG appliance.

Now that the ESG and DLR appliances are ready, we will configure dynamic routing between the two routers using the Open Shortest Path First (OSPF) protocol between the NSX Edge and the DLR.

Select NSX Edges and double-click the Logical Distributed Router deployed previously. Under the Manage tab, select Routing, then Global Configuration; click Edit next to Default Gateway, enter the default gateway IP address (the address of the NSX Edge in the same subnet), and click Save.

Click Edit next to Dynamic Routing Configuration, select the uplink interface as the Router ID, click Enable OSPF, and click Save. Accept the changes and click Publish Changes.

Select the OSPF tab on the left side and click the Edit button for OSPF Configuration. Select the Enable OSPF option and fill in the Protocol Address and Forwarding Address, then click OK to finish. (On a DLR, the protocol address is the IP the control VM uses to form OSPF adjacencies, while the forwarding address is the interface address used in the data path.)

Enter the Area ID, select Type as Normal and Authentication as None, then click OK.

Next, click the green plus under Area to Interface Mapping. OSPF neighbors need to peer with routers using the same area ID; we defined Area 10 earlier and therefore use it again here. Select the uplink interface and the area defined earlier.

Review the changes and click Publish Changes. This enables OSPF on the logical router.

Click Route Redistribution and verify that OSPF is enabled and that there is a route redistribution rule with action Permit. This allows the OSPF routes to be permitted in the hypervisor kernel and presented to the ESG router.

Similarly, we need to enable OSPF on the Edge Services Gateway (ESG). Double-click the ESG and select the Routing sub-tab under the Manage tab. Notice that the default gateway is already populated from the deployment wizard. Click the Edit button next to Dynamic Routing Configuration, select the Router ID from the dropdown menu, choose the Enable OSPF option, and click Save. Click Publish Changes to save and publish the changes.

Select the OSPF menu on the left side. Click the green plus under Area to Interface Mapping, and select the vNIC and area from the dropdown menus. Keep the advanced parameters at their defaults and click Save. A Publish Changes window pops up after saving; click Publish Changes to propagate the settings to the other routers.

Let's verify that all the routes from the internal network are published on the ESG router. Open a console session to the ESG and execute the show ip route command. You will see all the Web, App and DB tier networks listed on the ESG router, and it will form an OSPF adjacency with the DLR.

Summary

In this chapter, we discussed the minimum requirements for a VMware NSX deployment and walked through the deployment of the management, control and data plane components of VMware NSX. In the next chapter we will discuss the VMware NSX Edge Services Gateway and its different functions, such as NAT, firewall, DHCP and HA.

4

Edge Services Gateway

The VMware NSX Edge services gateway provides services such as routing, perimeter firewall, network address translation (NAT), DHCP, VPN, load balancing, and high availability. The NSX Edge services gateway is a key component in the communication link, providing network function virtualization in an agile, software-defined data center while delivering high-speed throughput.

NSX Edge provides the following benefits:

Near real-time service instantiation

Support for dynamic service differentiation per tenant or application

Scalability and Redundancy NSX Edge

Equal Cost Multi-Path (ECMP) functionality was introduced in VMware NSX release 6.1. ECMP has the potential to offer substantial increases in bandwidth by load-balancing traffic over multiple paths, as well as providing fault tolerance for failed paths. This is a feature long available on physical networks that is now introduced for virtual networking as well. ECMP uses a dynamic routing protocol to learn the next hop towards a final destination and to converge in case of failures.

To keep pace with the growing demand for bandwidth, the data center must meet scale out requirements, which provide the capability for a business or technology to accept increased volume without redesign of the overall infrastructure. The ultimate goal is avoiding the “rip and replace” of the existing physical infrastructure in order to keep up with the growing demands of the applications. Data centers running business critical applications need to achieve near 100 percent uptime. In order to achieve this goal, we need the ability to quickly recover from failures affecting the main core components. Recovery from catastrophic events needs to be transparent to end user experiences.

ECMP in VMware NSX 6.1 allows you to use up to a maximum of eight ECMP paths simultaneously. In a specific VMware NSX deployment, these scalability and resilience improvements are applied to the "on-ramp/off-ramp" routing function offered by the Edge Services Gateway (ESG) functional component, which allows communication between the logical networks and the external physical infrastructure.

External users' traffic arriving from the physical core routers can use up to eight different paths (E1-E8) to reach the virtual servers (Web, App, DB).

In the same way, traffic returning from the virtual servers hits the Distributed Logical Router (DLR), which can choose up to eight different paths to get to the core network.

Path Determination

When a traffic flow needs to be routed, a round-robin algorithm picks one of the links as the path for all traffic of that flow. The algorithm keeps all packets related to the flow in order by sending them through the same path. Once the next hop is selected for a particular source IP and destination IP pair, the route cache stores it, and all subsequent packets of that flow follow the same path.

There is a default IPv4 route cache timeout of 300 seconds. If an entry is inactive for this period of time, it becomes eligible for removal from the route cache. Note that these settings can be tuned for your environment.

Distributed Logical Router (DLR):

The DLR chooses a path based on a hash of the source IP and destination IP.
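A minimal Python sketch of this flow-hashing idea (illustrative only, not NSX code; the next-hop addresses are examples):

import hashlib

next_hops = ["192.168.10.1", "192.168.10.10"]  # e.g. E1 and E2 uplink addresses

def pick_path(src_ip, dst_ip):
    # Hash the source/destination pair so that every packet of a
    # given flow deterministically maps to the same next hop.
    digest = hashlib.md5((src_ip + ">" + dst_ip).encode()).digest()
    return next_hops[digest[0] % len(next_hops)]

print(pick_path("192.168.100.86", "172.16.10.10"))  # always the same edge
# If that edge fails and is removed from next_hops, the same
# flow re-hashes onto one of the surviving edges.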

Failure scenario of Edge Devices

In order to work with ECMP, the requirement is to use a dynamic routing protocol: OSPF or BGP. If we take OSPF as an example, the main factor influencing the traffic outage experienced is the tuning of the OSPF timers. OSPF sends hello messages between neighbours; the Hello interval determines how often an OSPF hello is sent.

Another OSPF timer, the Dead interval, determines how long to wait before considering an OSPF neighbour down, and it is the main factor that influences convergence time. The Dead interval is usually four times the Hello interval (for example, 40 seconds with the default 10-second Hello), but the OSPF (and BGP) timers can be set as low as 1 second for the Hello interval and 3 seconds for the Dead interval to speed up traffic recovery.

ECMP failed Edge

In the example above, the E1 NSX Edge has a failure; the physical routers and the DLR detect E1 as dead when the Dead timer expires and remove their OSPF neighbour relationship with it. As a consequence, the DLR and the physical router remove the routing table entries that originally pointed to the next-hop IP address of the failed ESG.

As a result, all corresponding flows on the affected path are re-hashed through the remaining active units. It is important to emphasize that network traffic forwarded across the non-affected paths remains unaffected.

Troubleshooting and visibility

With ECMP it is important to have introspection and visibility tools in order to troubleshoot potential points of failure. Let's look at the following topology.

Troubleshooting

A user outside our datacenter would like to access the Web Server service inside the datacenter. The user's IP address is 192.168.100.86 and the web server's IP address is 172.16.10.10. The user's traffic hits the physical router (R1), which has established OSPF adjacencies with E1 and E2 (the Edge devices). As a result, R1 learns how to get to the Web server from both E1 and E2, giving it two different active paths towards 172.16.10.10. R1 picks one of the paths to forward the traffic towards the Web server, and advertises the user network subnet 192.168.100.0/24 to both E1 and E2 via OSPF.

E1 and E2 are NSX for vSphere Edge devices that also establish OSPF adjacencies with the DLR, so E1 and E2 learn how to get to the Web server via OSPF control plane communication with the DLR. From the DLR's perspective, it acts as the default gateway for the Web server. The DLR forms OSPF adjacencies with E1 and E2 and has two different OSPF routes to reach the user network. From the DLR we can verify the OSPF adjacency with E1 and E2.

We can use the command: "show ip ospf neighbor"

From this output we can see that the DLR has two Edge neighbours: 192.168.100.3 and 192.168.100.10. The next step is to verify that ECMP is actually working.

We can use the command: "show ip route"

The output from this command shows that the DLR learned the user network 192.168.100.0/24 via two different paths, one via E1 = 192.168.10.1 and the other via E2 = 192.168.10.10. Now we want to display all the packets which were captured by an NSX for vSphere Edge interface.

In the example below and in order to display the traffic passing through interface vNic_1, and which is not OSPF protocol control packets, we need to type this command:

debug packet display interface vNic_1 not_ip_proto_ospf

We can see an example with a ping running from host 192.168.100.86 to host 172.16.10.10.

If we would like to display the captured traffic to a specific IP address, 172.16.10.10, the capture command would look like:

debug packet display interface vNic_1 dst_172.16.10.10

* Note: when using the command "debug packet display interface" we need to add an underscore between the expressions after the interface name.

Useful CLI for Debugging ECMP

To check which ECMP path is chosen for a flow

debug packet display interface IFNAME

To check the ECMP configuration

show configuration routing-global

To check the routing table

show ip route

To check the forwarding table

show ip forwarding

Useful CLI for Dynamic Routing

show ip ospf neighbor

show ip ospf database

show ip ospf interface

show ip bgp neighbors

show ip bgp

ECMP Deployment Consideration

Equal Cost Multipath (ECMP) traffic involves asymmetric routing between the Edges and the DLR, or between the Edges and the physical routers. With asymmetric routing, a packet traverses from source to destination along one path and takes a different path when it returns to the source. Because of this, ECMP currently implies stateless behavior: there is no support for stateful services such as the firewall, load balancing or NAT on the NSX Edge Services Gateway.

Asymmetric routing is not a problem by itself, but it causes problems when more than one NSX Edge is in place and stateful services are inserted in the traffic path.

Problem Statement

A user from outside tries to access a Web VM inside the data center, and the traffic passes through the E1 Edge. From E1 the traffic goes to the DLR, traverses the NSX distributed firewall and reaches the Web VM. When the Web VM responds, the return traffic hits the DLR default gateway. The DLR has two options to route the traffic, E1 or E2. If the DLR chooses E2, the traffic reaches E2 and is dropped!

The reason is that E2 is not aware of the session state created at E1; the reply packets arriving at E2 do not match any existing session on E2. From E2's perspective this is a new session that needs to be validated, and any new TCP session should start with a SYN. Since these packets are not the beginning of a session, E2 drops them.
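To make the drop concrete, here is a minimal Python sketch (illustrative only, not NSX code) of two edges with independent connection tables:

class StatefulEdge:
    def __init__(self, name):
        self.name = name
        self.sessions = set()  # per-edge state, never shared with the peer edge

    def process(self, flow, syn=False):
        if flow in self.sessions:
            return self.name + ": forwarded (existing session)"
        if syn:
            self.sessions.add(flow)
            return self.name + ": forwarded (new session recorded)"
        return self.name + ": DROPPED (no session state and not a SYN)"

e1, e2 = StatefulEdge("E1"), StatefulEdge("E2")
flow = ("192.168.100.86", "172.16.10.10", 80)
print(e1.process(flow, syn=True))  # inbound SYN creates state on E1 only
print(e2.process(flow))            # reply re-hashed to E2 -> dropped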

Asymmetric Routing with Edge Firewall Enabled

Note: the NSX distributed firewall is not part of this problem. The distributed firewall is implemented at the vNIC level, where all traffic enters and leaves through the same vNIC, so there is no asymmetric routing at that level. This is also why, when we vMotion a VM, the firewall rules and connection state move with the VM itself.

ECMP and Edge Firewall NSX

In version 6.1, when we enable ECMP on an NSX Edge, we get the following message and the firewall service is disabled by default:

Starting from 6.1.2, the Edge firewall is not disabled automatically on the ESG when ECMP is enabled; the administrator needs to turn off the firewall when enabling ECMP.

NSX Edge and DRS Rules

The NSX Edge cluster connects the logical world to the physical world and contains the NSX Edge gateways and the DLR Control VMs. In some designs the Edge cluster also contains the NSX Controllers. An important design key point for the edge cluster is to survive the failure of individual hosts or an entire chassis with minimal impact on workload connectivity.

In the figure below we deploy two NSX Edges, E1 and E2, in ECMP mode, running active/active. The DLR Control VM runs active/passive, while both E1 and E2 run a dynamic routing protocol with the active DLR Control VM. When the DLR learns a new route from E1 or E2, it updates this information to the NSX Controller, and the NSX Controller updates the routing tables on each of the ESXi hosts running this DLR instance.

So what happens in this scenario when the ESXi host which contains Edge E1 fails? Let me explain.

The active DLR updates the NSX Controller to remove E1 as a next hop, the NSX Controller updates the ESXi hosts, and as a result the Web VM traffic is routed to Edge E2. The time it takes to re-route the traffic depends on the dynamic routing protocol convergence time.

In the scenario where the ESXi host or chassis containing both Edge E1 and the active DLR fails, we face a longer outage in the forwarded traffic.

The reason is that the active DLR is down and cannot update the Controller about the failure. The ESXi hosts continue to forward traffic to Edge E1 until the passive DLR Control VM becomes active, learns that Edge E1 is down and updates the NSX Controller.

The golden rule is to ensure that when the Edge gateway and the DLR Control VM belong to the same tenant, they do not reside on the same ESXi host. It is better to distribute them between ESXi hosts and reduce the scope of affected functions.

By default, when we deploy an NSX Edge or DLR in active/passive mode, the system takes care of creating a DRS anti-affinity rule, which prevents the active and passive VMs from running on the same ESXi host.

DRS anti-affinity rules

We need to build new DRS rules, as these default rules will not prevent the previous scenario.

The figure below describes my lab's logical view. The topology is built from two different tenants, each represented in a different color and with its own Edge and DLR. I removed the connectivity to the physical world in order to simplify the diagram.

My physical Edge cluster has four ESXi hosts, distributed over two physical chassis:

Chassis A: esxcomp-01a, esxcomp-02a
Chassis B: esxcomp-01b, esxcomp-02b

We need to create a DRS host group for each chassis. We start by creating a container for all the ESXi hosts in Chassis A; this container is configured as a DRS host group.

Edge Cluster -> Manage -> Settings -> DRS Groups

Click the Add button and call this group "Chassis A". The container type needs to be "Host DRS Group"; add the ESXi hosts running in Chassis A, in my lab esxcomp-01a and esxcomp-02a.

Create another host group for Chassis B, in my lab esxcomp-01b and esxcomp-02b.

VM DRS Group for Chassis A:

We need to create a container for the VMs that will run in Chassis A. At this point we just name it after Chassis A; this does not actually place the VMs in Chassis A.

This Container type is “VM DRS Group”:

VM DRS Group for Chassis B:

At this point we have four DRS groups. Now we need to take the DRS objects we created before, "Chassis A" and "VM to Chassis A", and tie them together. The next step is to do the same for "Chassis B" and "VM to Chassis B".

* This configuration needs to be part of “DRS Rules”.

Edge Cluster -> Manage -> Settings -> DRS Rules

Click the Add button in DRS Rules and enter a name such as "VMs Should Run on Chassis A".

In the Type field select "Virtual Machines to Hosts", because we want to bind the VM group to the host group. In the VM group name, choose the "VM to Chassis A" object.

Below the VM group selection we need to select the group & hosts binding enforcement type. We have two different options:

Should run on hosts in group

Must run on hosts in group

If we choose the "Must" option and all the ESXi hosts in this group fail (for example, Chassis A suffers a critical power outage), the other ESXi hosts in the cluster (Chassis B) will not be considered as recovery targets by vSphere HA. The "Should" option allows other ESXi hosts to be used for recovery.

The same applies for Chassis B.

Now the problem with the current DRS rules and the VM placement in this Edge cluster is that an Edge and a DLR can still run on the same ESXi host. We need to create anti-affinity DRS rules.

Anti-Affinity Edge and DLR:

An Edge and a DLR that belong to the same tenant should not run on the same ESXi host.

For the Green tenant:

For the Blue tenant:

The Final Result:

In the case of a failure of one of the ESXi hosts, we no longer face the problem where the Edge and DLR are on the same ESXi host, even in the catastrophic event of a Chassis A or B failure.

Edge NAT

One of the most important NSX Edge features is NAT (Network Address Translation), which allows us to change the source or destination IP address and TCP/UDP port. Combining NAT and firewall rules can lead to confusion when we try to determine the correct IP address against which to apply a firewall rule.

To create the correct rule, we need to understand the packet flow inside the NSX Edge in detail. NSX Edge has two different types of NAT: SNAT (Source NAT) and DNAT (Destination NAT).

SNAT

SNAT allows translating an internal IP address (for example, a private address as described in RFC 1918) to a public external IP address. In the figure below, any VM on VXLAN 5001 that needs outside connectivity to the WAN can be translated to an external IP address configured on the Edge. For example, when VM1 with IP address 172.16.10.11 needs to communicate with the internet, the NSX Edge can translate it to the 192.168.100.50 IP address configured on the Edge external interface. Users on the outside are not aware of the internal private IP address.

DNAT

DNAT allows access to internal, privately addressed servers from the outside world. In the example figure below, users on the WAN need to communicate with the server 172.16.10.11. An NSX Edge DNAT rule is created so that users from outside connect to 192.168.100.51, and the NSX Edge translates this address to 172.16.10.11.

The following figure outlines the packet flow process inside the Edge. The important parts are where the SNAT/DNAT action and the firewall decision are taken. We can see from this process that an ingress packet is evaluated against the firewall rules before the SNAT/DNAT action.

Note: the actual packet flow is more complicated, with more actions and decisions in the Edge flow, but the emphasis here is on NAT and the firewall. The NAT function works only if the firewall service is enabled.

Firewall rules and SNAT

Because of this packet flow, the firewall rule for SNAT traffic needs to be applied to the internal IP address object, not the NAT IP address. For example, when VM1 at 172.16.10.11 needs to communicate with the WAN, the firewall rule needs to be:

Firewall rules and DNAT

Because of this packet flow, the firewall rule for DNAT traffic needs to be applied to the NAT IP address object, not the private IP address behind the NAT. When a user from the WAN sends traffic to 192.168.100.51, the packet is checked against this firewall rule first, and then NAT changes the destination to 172.16.10.11.

DNAT Configuration

A user from outside needs to access an internal web server on a public IP address. The server's internal IP address is 172.16.10.11 and the NAT IP address is 192.168.100.6.

The first step is to create the external IP on the Edge. This IP is a secondary address, because the Edge already has a main IP address; the main IP address is marked with a black dot (192.168.100.3). For this example, the DNAT IP address is 192.168.100.6.

Create the DNAT rule on the Edge.

Now pay attention to the firewall rule that needs to be opened on the Edge. The user coming from outside tries to access 192.168.100.6, so the firewall rule needs to allow this access.
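The same DNAT rule can also be created through the NSX-v REST API. The following is a rough sketch only: the edge ID edge-1 is a placeholder, and the URI and XML element names should be verified against the NSX-v API guide for your version.

POST https://<nsxmgr fqdn or ip>/api/4.0/edges/edge-1/nat/config/rules

Request Body:
<natRules>
  <natRule>
    <action>dnat</action>
    <vnic>0</vnic>
    <originalAddress>192.168.100.6</originalAddress>
    <translatedAddress>172.16.10.11</translatedAddress>
    <protocol>any</protocol>
  </natRule>
</natRules>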

Verification:

There are a few ways to verify NAT. In our example, users from any source address access IP address 192.168.100.6; after the DNAT action, the translated packet is destined to 172.16.10.11.

The output of the command:

show nat

The output of the command:

show firewall flows

We can see that a packet entered the Edge destined to 192.168.100.6, while the return packet comes from a different IP address, 172.16.10.11. This means NAT is happening here.

We can capture the traffic and see the actual packets.

Capture Edge traffic on its outside interface vNic_0; in this example the user's source IP address is 192.168.110.10 and the destination is 192.168.100.6. The command to capture this is:

debug packet display interface vNic_0 port_80_and_src_192.168.110.10

Capture Edge traffic on its internal interface vNic_1, and we can see that the destination IP address has changed to 172.16.10.11 because of DNAT:

SNAT configuration

All servers on VXLAN 172.16.10.0/24 need to be translated on the outside interface of the Edge to IP address 192.168.100.3.

SNAT Configuration:

Edge Firewall Rules: allow 172.16.10.0/24 to go out.

For verification, use the following command:

show nat

PAT

PAT (Port Address Translation) allows changing the Layer 4 TCP/UDP port. For example, we would like to mask our internal SSH server port from all outside users: the exposed port will be TCP/222 instead of the regular SSH port TCP/22. The user connects to the server on port TCP/222, but the NSX Edge changes it to TCP/22.
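Expressed as a DNAT rule, the translation pairs both the addresses and the ports. A rough sketch of the rule fields follows; the element names are based on the NSX-v NAT API, and the exact names and values are assumptions to verify against the API guide for your version.

<natRule>
  <action>dnat</action>
  <originalAddress>192.168.100.6</originalAddress>
  <translatedAddress>172.16.10.11</translatedAddress>
  <protocol>tcp</protocol>
  <originalPort>222</originalPort>
  <translatedPort>22</translatedPort>
</natRule>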

NAT Order

For this scenario, we need to create two different SNAT rules.

SNAT Rule 1:

All of VXLAN 5001 (172.16.10.0/24) needs to be translated to the outside interface of the Edge device, which is 192.168.100.3.

SNAT Rule 2:

Web-SRV-01a is on VXLAN 5001 with IP address 172.16.10.11 and needs to be translated on the outside interface of the Edge device to 192.168.100.4.

In this example, traffic will never hit rule number 4, because 172.16.10.11 is part of subnet 172.16.10.0/24 and is matched by the broader rule first. We need to re-order the NAT rules and put the more specific NAT rule before the wider network rule.
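The first-match semantics are easy to see in a minimal Python sketch (illustrative only, not NSX code):

import ipaddress

# NAT rules in table order: the broad subnet rule sits above the host rule.
rules = [
    ("172.16.10.0/24", "192.168.100.3"),
    ("172.16.10.11/32", "192.168.100.4"),  # never reached until re-ordered
]

def snat_lookup(src_ip):
    for prefix, translated in rules:       # top-down, first match wins
        if ipaddress.ip_address(src_ip) in ipaddress.ip_network(prefix):
            return translated
    return src_ip

print(snat_lookup("172.16.10.11"))  # 192.168.100.3 until the rules are re-ordered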

After re-ordering:

Summary

In this chapter, we discussed the VMware NSX Edge Services Gateway and its different functions, such as NAT, firewall, DHCP and HA, in detail. We also discussed design choices and how they impact the configuration. In the next chapter we will discuss one of the most powerful features of VMware NSX, the Distributed Firewall, along with its various use cases and troubleshooting when things go wrong.

5

Distributed Firewall

With the release of NSX-v, a very strong and agile security technology was introduced, primarily to address and overcome certain limitations inherent in the vCloud Networking and Security App product (also known and referred to as vShield App). One of the most noticeable and eagerly awaited aspects was the new set of firewall capabilities, in particular the ability to deliver close to line-rate throughput thanks to an in-kernel firewalling module, moving away from the previous service-virtual-machine-based security architecture.

The Distributed Firewall now provides a very flexible model, as it is embedded in the VMkernel of the ESXi host and provides a secure juncture between the virtual machines and the physical world.

Micro-segmentation

Fortune 500 enterprises have been losing the war against cyber-criminal forces, according to the 2013 State of Cybercrime Survey from PwC and CSO magazine which included responses from 500 U.S. executives, security experts, and others from both private and public sectors. These organizations are turning to the latest and best tools for self-defense while trying to determine what economic impact fighting cyber-crime will have on their organizations.

This requires you to continuously re-evaluate the security standards and compliance adherence in your datacenter, traditionally with a lot of CAPEX involved to achieve the desired level of security. With the evolution of the NSX-v platform, this has been simplified with the "micro-segmentation" security model.

Micro-segmentation is not new, but conceptually introduces zones in your NSX for vSphere environment. This enables you to manage your workload security without having to introduce hardware firewalls to track sessions for your file servers, RDP or SSH to critical business servers and so on.

Micro-segmentation helps you to critically define zones in your environment to retain workload security with mobility without having to perform tedious reconfiguration when the workload moves from one rack to another or even one datacenter to another. To achieve this functionality, the NSX-v product provides an out-of-the-box Distributed Firewall feature, which can be leveraged to define rules for the “East-West” traffic generated in your datacenter.

Generally, East-West traffic is the traffic that keeps growing in the datacenter, as application and business needs continue to increase at a brisk pace. Due to the growing business demands and concerns about security in modern cyber-security warfare, it has become critical to understand the considerations around security hardening. According to the recent NIST and Forrester research that gave birth to the "zero trust" security model, most security breaches occur due to uncontrolled access or inefficient role-based access control (RBAC), which poses a high security threat if a specific asset is compromised or stolen, and this gives rise to most datacenter network exploits.

To further complicate the situation, most security devices today operate at the datacenter core network, which creates a traffic trombone effect that directly impacts performance. It is the hypervisor-embedded nature of the distributed firewall that delivers close to line-rate throughput, enabling higher workload consolidation on physical servers, and its distributed nature provides a scale-out architecture that extends firewall capacity as additional hosts are added to the data center.

The distributed firewall performs traffic introspection at the virtual machine's vNIC level at close to line-rate performance, which overcomes the constraint described above.

How does the Distributed Firewall Operate?

Legacy security applications for virtual environments, for example vCloud Networking and Security App (formerly vShield App, and before that vShield Zones), leveraged a "service VM" deployed on every single ESXi host to perform the inspection.

vCloud Networking and Security App Firewall actually comprised three entities plus the vCloud Networking and Security Manager (aka vShield Manager), which provided centralized management for all vCloud Networking and Security (aka vShield) products.

vCloud Networking and Security Manager: Within the context of the vCloud Networking and Security App Firewall, the vCloud Networking and Security Manager is a centralized management console which allows users to:

Define firewall policies to control the traffic in/out of the environment.

Define Spoofguard Policies

Define Namespace configuration (also known as realm)

View historical flow data going in/out of the environment.

Lifecycle management of the vCloud Networking and Security Manager App appliance

The three components of the vCloud Networking and Security App Firewall are:

1. vCloud Networking and Security dvFilter module: An ESX Loadable Kernel Module (LKM) that sits in the ESX hypervisor and provides hooks to channelize virtual machine traffic to the vCloud Networking and Security App service VM using VMware dvfilter APIs. The LKM module is also known as fastpath while its counterpart the Service VM is known as slowpath.

2. vCloud Networking and Security App Service VM (aka vSA): a service VM or appliance which performs the network traffic introspection. It reports, allows or denies traffic for a virtual machine based on the policies configured by the end user via the vCloud Networking and Security Manager; the vCloud Networking and Security dvFilter module redirects all traffic for protected VMs to this service VM. The vSA service VM provides control-path and data-path processes. The control-path process, also known as Sysmgr, communicates with the vCloud Networking and Security Manager (a Java process) over a secured, encrypted channel to receive control information, i.e. firewall configuration, Spoofguard configuration, flow configuration and other data.

3. vCloud Networking and Security dvFilter properties on protected virtual machines: Three properties and associated values are added to the vmx configuration file of every vnic of every virtual machine that is on a host where vCloud Networking and Security App has been installed. These values inform the VMs on a given ESXi host that vCloud Networking and Security App is present. All vCloud Networking and Security appliances are excluded from receiving these properties.

NOTE: slowpath and fastpath are just VMware-internal names used by developers. The fastpath is a loadable kernel module (LKM) sitting in the kernel/hypervisor/ESXi that processes traffic in real time and is quite fast, while the slowpath is a service VM that receives its data from the fastpath and then processes the traffic; the extra round trip slows the traffic down.

This again raised concerns about network performance, as throughput was limited to less than 2 Gbps. NSX-v instead deploys an in-kernel firewall module called vSIP (VMware Internetworking Service Insertion Platform) that handles all traffic inspection, sitting at the vNIC level of every virtual machine. The components of the distributed firewall are described below:

Message Bus based on AMQP: Advanced Message Queuing Protocol

Message Bus is used by the NSX-v Manager to reliably and securely transfer firewall rules down to the ESXi host.

AMQP is an open standard application layer protocol for message-oriented middleware. The defining features of AMQP are message orientation, queuing, routing (including point-to-point and publish-and-subscribe), reliability and security.

vSIP: VMware Internetworking Service Insertion Platform

vSIP is the distributed firewall kernel module component.

vSIP receives firewall rules from the NSX-v Manager through the User World Agent (UWA) and pushes them down to each virtual machine vNIC (using vNIC-FW constructs).

Note: the VMware Internetworking Service Insertion Platform is also a framework that provides the ability to dynamically introduce third-party as well as VMware's own virtual and physical security and networking services into VMware's virtual network.

vNIC-FW:

Construct (or memory space) where firewall rules are stored and enforced. This space contains the rules table and the connection tracker table.

The above diagram explains the communication between the components involved in the distributed firewall solution.

The above screenshot shows the NSX-v Manager, running a RabbitMQ server on port 5671, establishing a connection with vsfwd (the user world agent) running the RabbitMQ client on the ESXi host.

The above screenshot shows the VMware Internetworking Service Insertion Platform module installed on an ESXi host.

Before we look at how RabbitMQ fits into the NSX-v and ESXi platform, we will start with a brief introduction to what RabbitMQ actually is.

RabbitMQ provides robust messaging for applications, in particular for products such as NSX-v and vCloud Director for Service Providers, to name but a few. Messaging describes the sending and receiving of data (in the form of messages) between systems. Messages are exchanged between programs or applications, similar to the way people communicate by email, but with selectable guarantees on delivery, speed, security and the absence of spam.

A messaging infrastructure (a.k.a. message-oriented middleware or enterprise service bus) makes it easier for developers to create complex applications by decoupling individual program components. Rather than communicating directly, the messaging infrastructure facilitates the exchange of data between components. The components need know nothing about each other's status, availability or implementation, which allows them to be distributed over heterogeneous platforms and turned off and on as required.

In an NSX-v deployment, the NSX-v platform uses the open standard AMQP protocol to publish messages associated with blocking tasks or notifications. AMQP is the wire protocol natively understood by RabbitMQ and many similar messaging systems, and defines the wire format of messages, as well as specifying the operational details of how messages are published and consumed.

A RabbitMQ server, or broker, runs within the NSX-v network environment, deployed as part of the NSX-v platform's underlying installation as a virtual appliance. Clients belonging to the NSX-v infrastructure itself, as well as other applications interested in notifications, connect to the RabbitMQ broker. Such clients then publish messages to, or consume messages from, the broker.

The RabbitMQ broker is written in the Erlang programming language and runs on the Erlang virtual machine. Notes on Erlang-related security and operational issues are presented later in this chapter.

The following RabbitMQ parameters are created on the ESXi host when prepared from NSX-v Manager to create the message bus. This message bus is responsible for pushing down the rules to every single ESXi host. The NSX-v Manager establishes this connection over port 5671.

When an ESXi host is prepared from the vSphere Web Client using the Networking and Security plugin, the modules, packaged as vSphere Installation Bundles (VIBs), are installed on every ESXi host added to the cluster. These VIBs are hosted on the NSX-v Manager and are deployed to every ESXi host by the ESX Agent Manager (EAM) component of vCenter Server, which comes bundled with the vCenter Management Web Services. EAM acts as a broker between any solution and the ESXi hosts, deploying the bundles and modules that work cohesively to provide services to the virtual machines. For example, vCloud Networking and Security App deploys a virtual machine on every ESXi host and also deploys a kernel dvfilter module (if not already present), and together they work in the fastpath-slowpath model.

The RabbitMQ parameters above are also configured as part of the preparation process and include some critical parameters, for example the NSX-v Manager IP address, the certificate thumbprint and the RabbitMQ port. These parameters help vsfwd, the user world agent running the RabbitMQ client software, connect to the NSX-v Manager and operate with it.

You can easily traverse this set of parameters using either the command line or the GUI. Here is an example of using esxcfg-advcfg to check the value of /UserVars/RmqIpAddress, which shows the IP address of the NSX-v Manager:
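For example (the -g flag reads an advanced configuration value from the ESXi shell):

esxcfg-advcfg -g /UserVars/RmqIpAddress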

Note: Modifying any of these parameters could cause adverse effects on the solution. Hence it is strictly recommended that none of the above parameters should be modified unless advised by VMware Technical Support.

Alignment of slots for the Distributed Firewall

A slot is a point of injection for services or modules, acting as an intermediary between the guest operating system and the physical networking world. A slot resembles a network tap, the difference being that a tap only monitors traffic, whereas a slot has a module attached to it which can redirect or process the traffic as required. This redirection or processing is based on the policies enforced either by NSX or by a third party for advanced services such as deep packet inspection. IO chains are in-kernel buffers which hold packets during processing, before they are handed over to the distributed switch.

The following slots are created in the ESXi kernel to provide services for the Distributed Firewall and redirection capabilities when using NSX for vSphere.

Slot 0: DVFilter (Distributed Virtual Filter). The Distributed Virtual Filter is the vmkernel module between the protected vNIC and its associated Distributed Virtual Switch (DVS) port, instantiated when a virtual machine with a protected virtual NIC is created. It monitors the incoming and outgoing traffic on the protected virtual NIC and performs stateless filtering.

Slot 1: Switch Security (sw-sec). The sw-sec module learns the VM's IP and MAC addresses. It is a critical component that captures DHCP ACK and ARP broadcast messages and forwards this information as unicast to the NSX for vSphere Controller, which maintains the ARP table used to perform the ARP suppression feature. sw-sec is also the layer where NSX IP Spoofguard is implemented.

NOTE: The Distributed Firewall (DFW) currently does not perform any inspection in DHCPv6 packets. The spoofguard feature is limited to MAC and IPv4/IPv6 addresses.

Slot 2: vmware-sfw

This is where the DFW firewall rules are stored and enforced; vmware-sfw contains the rules table and the connections table.

Slot 4: third-party integration

This is where redirection rules are written to send traffic to a third-party firewall for traffic introspection. The point to note here is that traffic is redirected to the third-party firewall (for instance, Palo Alto Networks) for inspection only if it has first been allowed by the DFW at slot 2.

The below screenshot shows the Distributed Firewall module which is attached to slot 2 and switch security module which is attached to slot 1.
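To list the slots and the modules attached to each vNIC from the ESXi shell, the summarize-dvfilter command is commonly used on NSX-prepared hosts; the DFW filter appears with the vmware-sfw agent name (output details vary by version):

summarize-dvfilter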

What happens in case of VM mobility?

The Distributed Firewall comes with a feature referred to as vMotion callback, which moves the security rules with the VM as it migrates within the datacenter, or even across datacenters prepared for NSX. This feature lets you seamlessly migrate virtual machines across racks without having to reconfigure your firewalls and other security devices to maintain compliance. Both the rules table and the connection tracker table follow the VM, so even if the VM is migrated to the farthest rack in the environment, no connections pass uninspected and the security of the datacenter is maintained at all levels.

The diagram below shows the tables maintained by the dvfilter module on an ESXi host. The rules table contains all the rules enforced on a specific object (for instance, a virtual machine), and the connection tracker table holds all the live flows that were allowed on the ESXi host.

Here is a small depiction of how the rules move along with the virtual machine: the machine was initially on a different host, and hence the rules and filters were associated with that specific host.

If you look through the command line, you will see the filter associated with the virtual machine on the ESXi host where the machine is powered on. The filter has a unique name, one per virtual machine running on each host, and represents the complete set of policies (rules, address sets, etc.) defined through the Distributed Firewall pane in the Networking and Security section of the vSphere Web Client. The filter associated with a virtual machine enlists all the rulesets that have to be processed for any traffic originated by or destined to that virtual machine.

The UUID mentioned in the filter shows which virtual machine the filter is associated with. You can retrieve this UUID from the virtual machine's configuration file.

You can also retrieve the different rulesets and address sets associated with a specific filter through the command line, and identify the different rules that are applied to each object.
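On an NSX-prepared ESXi host this is typically done with the vsipioctl utility; the filter name below is a placeholder, which you would first obtain from summarize-dvfilter:

vsipioctl getrules -f nic-XXXXX-eth0-vmware-sfw.2
vsipioctl getaddrsets -f nic-XXXXX-eth0-vmware-sfw.2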

The above screen shot shows the different rules that are applied to the virtual machine.

The screenshot below shows the different address sets associated with the filter and applied to the specific rules above. For instance, rule 1009 in the screenshot above references ip-securitygroup-13, and the IP addresses associated with that security group are shown below.

2-Way Traffic Introspection

Since the Distributed Firewall sits exactly between the guest and the distributed virtual switch and is enforced at the vNIC layer, traffic is inspected at every vNIC it crosses on the way to the guest. For instance, in the figure, a packet sent by the source VM is checked first at the source's vNIC level and then passed to the distributed switch. Before the packet is passed to the destination VM's guest OS, it is checked again at the destination VM's vNIC level. This ensures that all checks happen diligently before the packet is delivered to the appropriate machine. For instance, if a rule RULE1 (using the "applied to" option) is present only at the source VM and a rule RULE2 (using the "applied to" option) is present only at the destination VM, this mechanism ensures that both rules are honoured before the packet is delivered.

What happens in case of a failure of this DFW functionality?

Since this is a module that operates at the kernel level and is loaded as part of the boot image, it is highly unlikely to fail. However, in case of any failure of the distributed firewall functionality (for instance, an ESXi host maxed out on CPU), traffic is blocked by default and packets start dropping for the protected VMs. You can change this default behavior to allow all traffic through in the event of a distributed firewall failure using an API call:

Configuring Fail-Safe for Distributed Firewall:

PUT https://<nsxmgr fqdn or ip>/api/2.1/app/failsafemode

Request Body: FAIL_OPEN
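Invoked with curl, the call would look roughly like this (hypothetical credentials; the request body is plain text):

curl -k -u admin:password -X PUT -d 'FAIL_OPEN' https://<nsxmgr fqdn or ip>/api/2.1/app/failsafemode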

Visibility and Packet Walks

For a fresh session, once the packet is generated by the source, the lookup proceeds in the following order:

Lookup happens against the connection tracker table

If no entry is found, the lookup is then done against the rule table

If a matching rule is found, the action is taken appropriately.

Rule Lookup (first packet)

For an existing session, once a packet is generated by the source, the lookup proceeds in the following order:

Lookup is done against the connection tracker table

Since the connection tracker table already has an entry for the flow, the packet is forwarded.

Rule Lookup (subsequent packets)
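As a minimal illustration of this two-step lookup, here is a Python sketch (illustrative only, not NSX code; the flow is keyed on the 5-tuple described below):

conntrack = set()
rules = [
    ("allow", ("172.16.10.10", 80, "tcp")),  # rule table, top to bottom
    ("deny", "any"),                         # default rule at the bottom
]

def lookup(flow):
    # flow is the 5-tuple: (src_ip, src_port, dst_ip, dst_port, protocol)
    if flow in conntrack:
        return "allow (connection tracker hit)"
    for action, match in rules:              # first match wins
        if match == "any" or match == flow[2:]:
            if action == "allow":
                conntrack.add(flow)          # record state for later packets
            return action
    return "deny"

flow = ("192.168.100.86", 54321, "172.16.10.10", 80, "tcp")
print(lookup(flow))   # first packet: evaluated against the rule table
print(lookup(flow))   # subsequent packets: connection tracker hit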

Rule Priority

Rules are prioritized from top to bottom as placed in the Distributed Firewall section of the Networking and Security plugin. The first match is taken and actioned appropriately (allow or deny). For example, if you have an allow rule for SSH for a VM at the top and a deny-all-traffic rule for the same VM below it, SSH is allowed.

Also, L2-based rules (data link layer attributes) take priority over L3-based rules (network layer attributes). Rule definitions are 5-tuple** based.

And finally, there are other rules generated by third-party solutions (for example, Palo Alto Networks) that are used to provide advanced L7-based firewall rules.

** A 5-tuple refers to the set of five values that identify a Transmission Control Protocol/Internet Protocol (TCP/IP) connection: the source IP address and port number, the destination IP address and port number, and the protocol in use.

The above screenshot shows the L3 section of the Distributed Firewall, which correlates to the network and transport layer parameters that can be applied to inspect traffic.

The above screenshot shows the L2 section of the Distributed Firewall, which correlates to the data link layer parameters that can be applied to a rule to inspect traffic.

If you look carefully through the command line, you will see the segregation of the rules sets.

line, you will see the segregation of the rules sets. The rules are under “ruleset domain-c25_L2

The rules are under “ruleset domain-c25_L2 are Ethernet based rules and takes precedence over the L3 based rules. In this case we applied a block rule on L2 and and allow rule on L3 between 2 VMs. The end-result as expected:


Also, the rules are prioritized in the order they are created or applied in the Distributed Firewall window. The rules are applied in top-down order, and the first matching rule takes precedence over the rules below it. You can move a rule up or down using the icons provided in the Distributed Firewall window.


Move a rule up


Move a rule down

Partner Security Services are intended to be populated with rules created by partner solutions that work in conjunction with NSX. The Distributed Firewall and partner security services work side by side to provide advanced firewall services (for instance, Palo Alto Networks for DPI).

Traffic from the NSX environment is steered to a PAN-based firewall appliance, which performs an advanced level of packet introspection and provides security services across the complete environment. However, the traffic is only steered to the PAN firewall if it is allowed by the Distributed Firewall. That's exactly why I said they work side by side!


Partner Security Services

Distributed Firewall – Static Objects

Firewall rules can be applied at various levels and to different objects that DFW understands. Listed below are some of the objects to which you can apply policies:

1. vCenter containers (clusters, datacenters, port groups, etc.)

2. IP sets (IPv6 compliant) and MAC sets

3. Active Directory Groups


Distributed Firewall – Dynamic Object

NSX comes with a very powerful and dynamic container called a security group, which helps maintain security for growing workloads by applying the same policy to workloads with similar attributes. It is a predefined set of policies that can cater to the needs of future workloads without having to associate policies with those workloads after they are created. For example, if an NSX administrator wants all future Windows workloads to be part of the same security group, with a firewall policy that blocks all connections except RDP, he can do so by defining a “Security Group” with a dynamic membership criterion of “Windows” as the guest OS and associating a “Security Policy” that allows RDP and blocks everything else by default.

If a new VM is created with a Windows OS, it automatically becomes part of this newly created security group and the pre-created firewall policies are applied.


Security Groups

Creation of Security Groups

To create a security group, go to Security Groups under Service Composer and click the plus symbol.

a. Type a name for the security group (it is recommended to follow a specific naming convention for ease of administration, for example, Win2k8-DCs).


b. Select a criterion for “Dynamic membership”. For example, for “Guest OS type”, select Windows Server 2008 R2. This also defines your future membership, so you need to be very careful with the definition.


c. Select static objects to be part of the group. For example, a legacy DC running Windows Server 2003 that is also required to be part of the Domain Controller group.


d. Select the objects to be excluded from the complete list. For example, a domain controller that is undergoing migration and will soon be decommissioned.


The security group membership can be defined by the following formula:

(Expression result + Inclusions result) – Exclusions results
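
In code terms, the membership evaluation behaves like simple set arithmetic. A minimal sketch with hypothetical VM names:

# Illustrative only: security group membership as set arithmetic.
dynamic_match = {"win2k8-dc01", "win2k8-dc02"}   # VMs matching the dynamic criteria
inclusions    = {"win2k3-legacy-dc"}             # statically included objects
exclusions    = {"win2k8-dc02"}                  # objects explicitly excluded

# (Expression result + Inclusions result) - Exclusions result
effective_members = (dynamic_match | inclusions) - exclusions
print(effective_members)   # {'win2k8-dc01', 'win2k3-legacy-dc'}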

Security Policy

With a security policy, we can create templates containing DFW rules approved by security admins; this defines “HOW you want to protect” your environment. We then apply the policy to security groups, which define “WHAT you want to protect”.


We can apply a security policy to more than one security group; for example, below we apply “Web Security Policy” to both “Security Group 1” and “Security Group 2”.


A different, even opposite, option is to apply two different security policies to the same security group. For example, Security Policy 1 and Security Policy 2 to the “WEB Security Group”:

The precedence of a security policy is determined by the “Weight” value configured by security admins.

Here's a small depiction of the above use case:

We have created two security policies, namely:

“Allow ICMP SP” and “Allow WEB SP”, and applied them to the pre-created security group “Web Servers”.

Create “Allow ICMP SP”:


In the “Create Firewall Rules” step, the Action is Allow.

1. The Weight is 4300

2. The source field is: any

3. Destination is: “Policy Security Groups”. This is the interesting part: because this security policy works as a template, we may reuse it for different security groups; the goal is not to tie the template to a specific security group.


On the Ready to Complete screen, click Finish:


In the Firewall tab we can note that, at this point, we have not applied this security policy to any security group, so the policy is not activated yet. It appears as a grayed-out policy in the Firewall tab; another point to notice is that there is no security group in the destination:


Create “Allow WEB SP” the same way:


Pay attention to the Weight field: it is 1300, lower than the previous “Allow ICMP SP” value of 4300. Create the WEB rule with the same concept as before:


The firewall policy order places “Allow ICMP” before “Allow WEB”.
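
The resulting order can be modeled as a simple sort on the weight value, highest first. A small sketch using the two policies from this example:

# Illustrative: Service Composer orders security policies by weight,
# highest weight first, which determines the resulting firewall rule order.
policies = [
    {"name": "Allow WEB SP",  "weight": 1300},
    {"name": "Allow ICMP SP", "weight": 4300},
]

for p in sorted(policies, key=lambda p: p["weight"], reverse=True):
    print(p["name"])
# Allow ICMP SP   (weight 4300, evaluated first)
# Allow WEB SP    (weight 1300)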


Now we apply both security policies to the same security group, using “Apply Policy”:


Choose “Allow ICMP Security Policy”:


Do the same for the second security policy, “Allow WEB SP”. In the Security Policy tab view, we can see the results of this action:


From the Firewall tab we can see that we now have two activated Service Composer security rules.


In the Service Composer Canvas view, we have an excellent summary of the security services applied to the security group:


“Applied To” Field

The “Applied To” field helps narrow down the objects to which a policy applies. By default, if a rule is written without an object selected in the “Applied To” field, the rule is applied everywhere rather than only to the objects referenced in the rule.


For instance, if we create a specific rule on the Distributed Firewall and do not filter the objects to which it applies, it gets applied to all the clusters that have been prepared by NSX Manager to provide firewall services. Here is a small explanation:

I have 4 VMs on the ESXi host, and one of these filters is associated with each VM.


However, I have only created rules related to the last two VMs that you see in the pictures, for which the UUIDs are mentioned, i.e., web-sv-01a and web-sv-02a, as shown in the figure below:


However, when I check the rule sets of VMs that are not part of the rules (as either source or destination), I can see that even their filters contain the rules that refer to these VMs.


Look at rule 1006, which shows it is part of web-sv-02a as well as of another machine that is hosted on the same ESXi host but is not part of the rule. This is where the “Applied To” filter comes into the picture:

After applying the “Applied To” field only for web VMs, this is what it looks like:


Rule 1006 is no longer applied to any VM other than the web-based VMs, in this case web-sv-02a.

Identity Firewall

Identity Firewall is one of the coolest features of NSX, giving you more granular control over a user's traffic through integration with Active Directory. It allows you to control which network resources a user can access.

For example, you may want to allow only Domain Admins to map a shared drive, or only specific users to RDP into a specific set of servers.

This fine-grained control is achieved by integrating your Active Directory with the NSX domain. Here is how you can accomplish the tasks:

Integrate your NSX with Active Directory

Go to NSX Manager and click on the NSX Manager that you wish to integrate with AD.


Under Domains, click on the plus symbol to start adding your NSX manager to a domain.


Provide the Name of the Active Directory Domain


Provide Details and Credentials about AD


Provide the Security Event Log Access Details


Summary of the information provided


Create Identity Based Firewall Rules

Add rules using security groups whose membership comes from AD groups.


Create a Security Group


Define the membership and add a group from AD


Write rules with the created security groups.

Now, you might have a question. How does it write rules in the backend? Here’s the answer:


The address set shows the IP address of the NSX Manager.

Application Level Gateway (ALG)

Application Level Gateway (ALG) is the ability of a firewall or a NAT device to allow or block applications that use dynamic ephemeral ports to communicate. In the absence of an ALG, security and network administrators are left with a nightmarish trade-off between communication and security: a network administrator may suggest opening a large range of ports, which poses a security threat to the network or the given server, while a security administrator may suggest blocking all ports except the well-known ones, which in turn breaks the communication. An ALG reads the network addresses found inside the application payload, opens the respective ports for the subsequent communication, and synchronizes data across multiple sessions on different ports. For example, FTP uses different ports for the session initiation/control connection and the actual data transfers. An ALG would manage the information passed on the control connection as well as the data connection in this case.

NSX-v acts as an ALG for a few protocols, such as FTP, CIFS, Oracle TNS, MS-RPC, and Sun RPC.
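
To illustrate the kind of payload inspection an FTP ALG performs, here is a minimal sketch that parses an active-mode FTP PORT command (RFC 959) to discover the ephemeral data endpoint the firewall must open. This models the concept only, not the NSX implementation:

# An FTP client announces its data endpoint on the control connection as
# "PORT h1,h2,h3,h4,p1,p2"; the data port is p1*256 + p2. An ALG watches
# the control channel for this and opens a pinhole for the matching flow.
def parse_ftp_port(command: str):
    parts = command.strip().split(" ", 1)
    if parts[0].upper() != "PORT":
        return None
    h1, h2, h3, h4, p1, p2 = (int(x) for x in parts[1].split(","))
    return "%d.%d.%d.%d" % (h1, h2, h3, h4), p1 * 256 + p2

ip, port = parse_ftp_port("PORT 10,0,0,5,195,80")
print(ip, port)   # 10.0.0.5 50000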

Backup and Recovery

NSX offers a very agile solution with various recovery options in case of errors or misconfigurations.

You can export your complete working rule set and then make changes at any point in time, with the option to revert to the working state. This gives you a snapshot-oriented operation wherein you can quickly fall back to a working configuration if a rule change causes disruption to services.

To export your current firewall configuration, all you need to do is hit the “Export Configuration” icon under the Distributed Firewall option on the left-hand pane.
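
The export can also be scripted. Below is a minimal sketch that assumes the NSX-v Distributed Firewall configuration endpoint (GET /api/4.0/firewall/globalroot-0/config), as documented for NSX 6.x; the manager address and credentials are placeholders:

# Sketch: pull the current DFW configuration (XML) and save it to disk.
import requests

NSX_MGR = "https://nsxmgr.example.local"   # placeholder
AUTH = ("admin", "password")               # placeholder credentials

resp = requests.get(
    NSX_MGR + "/api/4.0/firewall/globalroot-0/config",
    auth=AUTH,
    verify=False,                          # lab only
)
resp.raise_for_status()

with open("dfw-config-backup.xml", "wb") as f:
    f.write(resp.content)                  # the full rule set as XML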


NSX also saves the complete configuration during modifications or at specific intervals, as shown in the picture.


To use a saved configuration, you need to load it by using the icon.


You can also upload a previously exported firewall configuration to the saved configurations and then load that specific configuration.


However, if during a restore of the firewall configuration there were rules applied by partner security solutions that are not part of the saved configuration, they would be removed from the partner security services. To sync the new set of rules from the partner security solutions, you need to click “Synchronize Firewall Configuration” on the Service Composer tab.


You can also perform this sync operation using the following API:

URL: https://<nsxmgr-ip>/api/2.0/services/policy/serviceprovider/firewall

Request: GET

Request Body:

<keyValues>

<keyValue>

<key>forceSync</key>

</keyValue>

</keyValues>
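
A scripted version of the same call might look like the sketch below. It follows the URL, method, and body exactly as given above, with placeholder credentials; note that a GET with a request body is unusual, so verify the method against the API guide for your NSX version:

# Sketch: trigger the Service Composer firewall force-sync described above.
import requests

NSX_MGR = "https://nsxmgr.example.local"   # placeholder
AUTH = ("admin", "password")               # placeholder credentials

body = """<keyValues>
  <keyValue>
    <key>forceSync</key>
  </keyValue>
</keyValues>"""

resp = requests.request(
    "GET",                                 # method as given in the text above
    NSX_MGR + "/api/2.0/services/policy/serviceprovider/firewall",
    data=body,
    headers={"Content-Type": "application/xml"},
    auth=AUTH,
    verify=False,                          # lab only
)
print(resp.status_code)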

Working with Distributed Firewall API

NSX offers RESTful APIs for Distributed Firewall management. You can customize your operations by integrating these API calls into your homegrown CMP or by using them as custom workflows in the vRealize Automation platform. For instance, if you want a specific rule to be added the moment