
COMMERCIAL INTERNET BILLING:

The SNMP Billing Model

For ISPs and Tier 1, 2, and 3 Networks

Billing is based upon a different model than consumer billing. For consumers, billing is at the transaction / usage level; for carriers and providers it is a BULK BUSINESS.

Internet Interconnect Business Model


At its most fundamental level, internet interconnect is a bulk business. Routing of traffic is by IP and by destination, NOT by traffic type or usage. There is NO WAY to differentiate this traffic, ESPECIALLY at the commercial level. BGP and walled-garden approaches can CHANNEL TRAFFIC to specific routers, but ultimately billing is by what passes through a particular router / port.

Monitoring Methods

Login and Check (manual)

Log in to the device and run performance statistics and other data collection by hand.

Remote Management

Use a monitoring device or counter in the device and poll it from a remote server (SNMP).

Promiscuous Mode

Any host can automatically see what all other hosts are doing by enabling promiscuous mode. However, modern switched network technologies, such as those used on modern Ethernets, provide, in effect, point-to-point links between pairs of devices, so it is hard for other devices to see traffic.

Port Mirroring

Referred to as Switched Port Analyzer (SPAN) by Cisco; some other vendors have other names for it, such as Roving Analysis Port (RAP) on 3Com switches. Available on routers and switches, it is a low-cost alternative to network taps and solves many of the same problems. Not all routers and switches support port mirroring and, on those that do, using port mirroring can affect the performance of the router or switch.

The Internet-Interconnect Architecture

[Diagram labels: ISP point of presence; Internet Service Provider customer cloud; Tier 2 point of presence; boundary routers to other interconnect partners (BGP-4); Tier 2 transit cloud; leased line; boundary access router BRAS (ISP client); boundary access router BRAS (Tier 2).]

Volume-Based Accounting Points

[Diagram labels: boundary routers to other interconnect partners (BGP-4); Internet Service Provider customer cloud; Tier 2 transit cloud; leased line; boundary access router BRAS (ISP client); boundary access router BRAS (Tier 2).]

Differentiated (Destination or Type) Accounting Points (Filtration Requirements)

[Diagram labels: Internet Service Provider customer cloud; Tier 2 transit cloud; leased line; boundary access router BRAS (ISP client); boundary access router BRAS (Tier 2); boundary routers to other interconnect partners (BGP-4).]

Network Traffic Measurement

Principal Methods
- Sniffers
- Deep Packet Inspection
- SNMP or other server/polling methods

Sniffers

A sniffer is a program that monitors and analyzes network traffic, detecting bottlenecks and problems so that a network manager can keep traffic flowing efficiently. A sniffer can also be used, legitimately or illegitimately, to capture data being transmitted on a network. A network router reads every packet of data passed to it, determining whether it is intended for a destination within the router's own network or whether it should be passed further along the Internet. A router with a sniffer, however, may be able to read the data in the packet as well as the source and destination addresses.

What a Sniffer Does


A sniffer program runs on a localized network, typically where many machines are connected. As the different machines request information from the Internet, packets are sent back and forth from those machines. A sniffer analyzes, or "sniffs," those packets. By analyzing the packets, the sniffer can tell what machine is requesting what information, and what information is in that packet. Essentially, someone running a sniffer program is eavesdropping on any other computer on the network. This makes it easy to see what you're doing, whether it's receiving email, using instant messaging, downloading illegal files or checking out your bank account.

A packet analyzer (also known as a network analyzer, protocol analyzer, or sniffer, or for particular types of networks, an Ethernet sniffer or wireless sniffer) is a computer program or a piece of computer hardware that can intercept and log traffic passing over a digital network or part of a network. As data streams flow across the network, the sniffer captures each packet and, if needed, decodes the packet's raw data, showing the values of various fields in the packet, and analyzes its content according to the appropriate RFC or other specifications.

Programs that tap into a computer network with the purpose of intercepting data traveling between two network machines are called sniffing software. This type of software program can be used to intercept and interpret data on a computer or network, including browser passwords, chat programs, user settings and network traffic. Sniffing software can also be known as a packet scanner, a packet analyzer or a network analyzer. Types of sniffing software include Internet protocol (IP) sniffing software, hypertext markup language (HTML) sniffing software, port scanners, and packet sniffing programs. Software used for network sniffing is most often used legitimately by network administrators to identify the source of communication problems among different network machines.

Sniffing can be done through a wireless connection, or it can be performed using software installed on a computer that is part of the wired network. Common programs used for sniffing include Carnivore, snoop, and SkyGrabber. Not all sniffing is done using sniffing software. Network administrators often have hardware scanners that perform network analysis. Hardware that analyzes network data includes Bluetooth-based sniffing devices and analysis hardware that taps straight into a computer port.

On wired broadcast LANs, depending on the network structure (hub or switch), one can capture traffic on all or just part of the network from a single machine within the network. There are some methods to avoid traffic narrowing by switches in order to gain access to traffic from other systems on the network (e.g., ARP spoofing). For network monitoring purposes, it may also be desirable to monitor all data packets in a LAN by using a network switch with a so-called monitoring port, whose purpose is to mirror all packets passing through all ports of the switch when systems (computers) are connected to a switch port. Using a network tap is an even more reliable solution than a monitoring port, since taps are less likely to drop packets during high traffic loads.
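As a concrete illustration of the capture step described above, here is a minimal sketch (not any particular product) that uses Python's standard-library raw sockets on Linux to read frames from an interface and decode the IPv4 source and destination addresses. The interface name is a placeholder and root privileges are assumed.

```python
# Minimal packet-capture sketch (Linux only, run as root).
# Reads raw Ethernet frames and prints IPv4 source/destination addresses.
import socket
import struct

ETH_P_ALL = 0x0003                      # capture frames of all protocols
IFACE = "eth0"                          # placeholder interface name

sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
sock.bind((IFACE, 0))

for _ in range(10):                     # capture a handful of frames
    frame, _addr = sock.recvfrom(65535)
    eth_type = struct.unpack("!H", frame[12:14])[0]
    if eth_type != 0x0800:              # skip anything that is not IPv4
        continue
    ip_header = frame[14:34]            # fixed 20-byte IPv4 header (options ignored)
    src = socket.inet_ntoa(ip_header[12:16])
    dst = socket.inet_ntoa(ip_header[16:20])
    print(f"{src} -> {dst} (protocol {ip_header[9]})")
```

On a switched network this only sees traffic to, from, or flooded to the capturing host, which is exactly why the monitoring-port and network-tap options discussed here matter.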

Network Tap / Monitoring Port

A network tap is a hardware device which provides a way to access the data flowing across a computer network. In many cases, it is desirable for a third party to monitor the traffic between two points in the network. If the network between points A and B consists of a physical cable, a "network tap" may be the best way to accomplish this monitoring. The network tap has (at least) three ports: an A port, a B port, and a monitor port. A tap inserted between A and B passes all traffic through unimpeded, but also copies that same data to its monitor port, enabling a third party to listen. Network taps are commonly used for network intrusion detection systems, VoIP recording, network probes, RMON probes, packet sniffers, and other monitoring and collection devices and software that require access to a network segment. Taps are used in security applications because they are non-obtrusive, are not detectable on the network (having no physical or logical address), can deal with full-duplex and non-shared networks, and will usually pass through traffic even if the tap stops working or loses power.

DPI

Deep Packet Inspection (DPI) equipment is intended to identify the applications being used on the network. Some of these devices can go much further; Narus, for instance, can look inside all traffic from a specific IP address, pick out the HTTP traffic, then drill even further down to capture only traffic headed to and from Gmail, and can even reassemble e-mails as they are typed out by the user.

DPC vs DPI

Packets can either be analyzed in real time, or captured and analyzed later, a practice known as deep packet capture (DPC). Both techniques can reveal a wealth of data about network traffic. Applications may leave telltale signatures or patterns in the packets they generate, allowing for accurate detection of program use across a network in real time. Deep packet inspection is often used in large corporate networks to detect worms, viruses, and trojans that can't be seen by other security software such as firewalls. DPI can also be used to limit or prioritize certain types of network traffic, a practice known as traffic shaping.

ISPs around the world use DPI technology in a variety of ways. Some use it to generate statistical information about the traffic that flows across their network, while others use network appliances (purpose-built hardware that sits on an ISP's network) to perform comprehensive monitoring of user traffic. The most advanced of these network appliances have the ability to act on this data in real time. Some broadband providers, for example, use DPI to block or slow down file-sharing services. Network neutrality advocates fear this could lead to a multi-tiered Internet, a system in which the programs and services a customer is able to use online depend upon how much the customer pays.

By intercepting a large number of packets, ISPs and governments can reconstruct e-mails, listen in on voice over Internet Protocol (VoIP) calls, or even track users across different websites in order to display targeted advertising. Several ISPs in both the U.S. and U.K. have used this more advanced version of deep packet inspection to inject targeted advertising into websites their customers visit. Governments sometimes use DPI for surveillance and censorship purposes on the Internet. For example, China's Golden Shield Project, also known as the "Great Firewall of China," is believed to use DPI. The U.S. National Security Agency has used commercial network appliances with deep packet inspection to monitor e-mails and VoIP calls.

Production DPI: Heuristically Based

Microsoft, Cisco, Check Point, Symantec, Nortel, SonicWall, NAI, Juniper/NetScreen, and others have, in the past eighteen months, started manufacturing firewall appliances that implement Deep Packet Inspection (DPI). In general, the DPI engine scrutinizes each packet (including the data payload) as it traverses the firewall, and rejects or allows the packet based upon a ruleset that is implemented by the firewall administrator. The inspection engine implements the ruleset based upon signature-based comparisons, heuristic, statistical, or anomaly-based techniques, or some combination of these.
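To make the ruleset idea concrete, the sketch below shows the general shape of signature-based payload inspection as described above: each rule pairs a signature with an action, and the engine scans the payload against every rule. The rule names and signatures are invented for illustration and do not come from any vendor's product.

```python
# Illustrative signature-based inspection: scan a payload against a ruleset
# and return the action of the first matching rule (default: allow).
import re

RULESET = [
    # (rule name, compiled signature, action) -- hypothetical examples only
    ("block-shell-string", re.compile(rb"/bin/sh"), "deny"),
    ("block-sql-probe", re.compile(rb"(?i)union\s+select"), "deny"),
    ("log-bittorrent-handshake", re.compile(rb"\x13BitTorrent protocol"), "log"),
]

def inspect(payload: bytes) -> str:
    """Return 'deny', 'log', or 'allow' for a packet payload."""
    for name, signature, action in RULESET:
        if signature.search(payload):
            print(f"rule {name} matched -> {action}")
            return action
    return "allow"

print(inspect(b"GET /index.html HTTP/1.1\r\n"))           # allow
print(inspect(b"id=1 UNION SELECT password FROM users"))  # deny
```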

Heuristics and Statistical Sampling Methods

Heuristic algorithms are often employed because they may be seen to "work" without having been mathematically proven to meet a given set of requirements. One common pitfall in implementing a heuristic method comes when the engineer or designer fails to realize that the current data set does not necessarily represent future system states. While the existing data can be pored over and an algorithm can be devised to successfully handle the current data, it is imperative to ensure that the heuristic method employed is capable of handling future data sets. This means that the engineer or designer must fully understand the rules that generate the data and develop the algorithm to meet those requirements, not just address the current data sets. Statistical analysis should be conducted when employing heuristics to estimate the probability of incorrect outcomes. If one seeks to use a heuristic as a means of solving a search or knapsack problem, then one must be careful to make sure that the heuristic function being used is admissible: a heuristic h(n) is admissible if it never overestimates the true cost of reaching the goal, that is, h(n) <= h*(n) for every state n, where h*(n) is the optimal cost from n.

Statistical anomaly-based and signature-based IDSes: all intrusion detection systems use one of two detection techniques. A statistical anomaly-based IDS determines normal network activity (what sort of bandwidth is generally used, what protocols are used, what ports and devices generally connect to each other) and alerts the administrator or user when traffic is detected which is anomalous (not normal).[2] A signature-based IDS monitors packets on the network and compares them against pre-configured and pre-determined attack patterns known as signatures. The issue is that there will be a lag between a new threat being discovered and its signature being applied in the IDS; during this lag time the IDS will be unable to identify the threat.
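As a toy version of the statistical anomaly-based approach just described, the sketch below learns a baseline mean and standard deviation of per-interval traffic volume and flags intervals that deviate by more than a chosen number of standard deviations. The threshold and the sample values are arbitrary.

```python
# Toy statistical anomaly detector: flag samples far from a learned baseline.
from statistics import mean, stdev

def train_baseline(samples):
    """samples: per-interval byte counts observed during normal operation."""
    return mean(samples), stdev(samples)

def is_anomalous(value, baseline, threshold=3.0):
    mu, sigma = baseline
    return sigma > 0 and abs(value - mu) > threshold * sigma

normal_traffic = [980, 1020, 1005, 995, 1010, 990, 1000, 1015]   # made-up byte counts
baseline = train_baseline(normal_traffic)

for observed in [1002, 998, 5400]:        # the last value simulates a traffic spike
    print(observed, "anomalous" if is_anomalous(observed, baseline) else "normal")
```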

Deep Packet Inspection promises to enhance firewall capabilities by adding the ability to analyze and filter SOAP and other XML messages, dynamically open and close ports for VoIP application traffic, perform in-line AV and spam screening, dynamically proxy IM traffic, eliminate the bevy of attacks against NetBIOS-based services, traffic-shape or do away with the many flavors of P2P traffic (recently shown to account for ~35% of Internet traffic), and perform SSL session inspection. Deep Packet Inspection essentially collapses Intrusion Detection (IDS) functionality into the firewall appliance so that both a firewall and an in-line IDS are implemented on the same device. Many of these products have recently been shown to be vulnerable to exploitation of software defects in their DPI inspection engines, however. These data suggest that the addition of these enhanced functions to firewalls may, in fact, weaken, rather than strengthen, network perimeter security.

Shallow Packet Inspection

Traditionally, firewalls have provided a physical and logical demarcation between the inside and the outside of a network. The first firewalls were basically just gateways between two networks with IP forwarding disabled. Most contemporary firewalls share a common set of characteristics:
- it is a single point between two or more networks where all traffic must pass (choke point);
- it can be configured to allow or deny IP (and other protocol) traffic;
- it provides a logging function for audit purposes;
- it provides a NAT function;
- the operating system is hardened;
- it often serves as a VPN endpoint; and
- it fails closed; that is, if the firewall crashes in some way, no traffic is forwarded between interfaces.

Steven Bellovin classically stated, "Firewalls are barriers between 'us' and 'them' for arbitrary values of 'them.'" One of the first commercial firewalls, the DEC SEAL, comprised three systems. One of these, the Gate, or packet-screening device, relied upon the kernel to pass packet headers to a user-space program, screend, which informed the kernel whether or not to forward the packet. Policy was defined in the screend configuration file and this policy was then implemented by the kernel.

IP packet filtering firewalls all share the same basic mechanism: as an IP packet traverses the firewall, the headers are parsed, and the results are compared to a ruleset defined by a system administrator. The ruleset, commonly based upon source and/or destination IP address, source and/or destination port, or a combination of the two, defines what type of traffic is subsequently allowed or denied. Interestingly, some early (and not particularly popular) packet filtering implementations required that the system administrator define specific byte fields within the packet headers, and the specific byte patterns to match against. The point here is that packet filtering (and the code that performs these tasks) based upon parsing of IP headers has been common for many years.

Stateful Inspection Firewall Technology, a term coined by Check Point Software Technologies (Patent #5,606,668), describes a method for the analysis and tracking of sessions based upon source/destination IP address and source/destination ports. A stateful inspection firewall registers connection data and compiles this information in a kernel-based state table. A stateful firewall examines packet headers and, essentially, remembers something about them (generally source/destination IP address/ports). The firewall then uses this information when processing later packets. Interestingly, Lance Spitzner (http://www.spitzner.net/) showed that, contrary to what one would expect, sequence numbers and other header information are not utilized by Check Point in order to maintain connection state tracking.
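A minimal sketch of the header-based ruleset logic described above: parsed source/destination addresses and ports are compared against an ordered, administrator-defined ruleset. The rule values are placeholders, not a real policy.

```python
# Toy packet filter: match parsed header fields against an ordered ruleset.
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional

@dataclass
class Rule:
    action: str              # "allow" or "deny"
    src: str                 # source CIDR block
    dst: str                 # destination CIDR block
    dst_port: Optional[int]  # None means any destination port

RULES = [
    Rule("allow", "10.0.0.0/8", "192.0.2.0/24", 80),    # internal hosts to web server
    Rule("deny",  "0.0.0.0/0",  "192.0.2.0/24", None),  # everything else to that subnet
]

def filter_packet(src_ip, dst_ip, dst_port, default="deny"):
    for rule in RULES:
        if (ip_address(src_ip) in ip_network(rule.src)
                and ip_address(dst_ip) in ip_network(rule.dst)
                and (rule.dst_port is None or rule.dst_port == dst_port)):
            return rule.action
    return default

print(filter_packet("10.1.2.3", "192.0.2.10", 80))       # allow
print(filter_packet("198.51.100.7", "192.0.2.10", 22))   # deny
```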

Medium Depth Packet Inspection

The DEC SEAL also required a Gatekeeper device, which acted as an application proxy (AP). Application proxies or gateways are a second, common type of firewall mechanism. An AP functions by providing intermediary services for hosts that reside on different networks, while maintaining complete details of the TCP connection state and sequencing. In practice, a client host (running, for example, a web browser application) negotiates a service request with the AP, which acts as a surrogate for the host that provides services (the webserver). Two connections are required for a session to be completed: one between the client and the AP, and one between the AP and the server. No direct connection exists between hosts. Additionally, APs typically possess the ability to do a limited amount of packet filtering based upon rudimentary application-level data parsing. APs are considered by most people to be more secure than packet filtering firewalls, but performance and scalability factors have limited their distribution. Although current stateful firewall technologies provide for tracking the state of a connection, most provide only limited analysis of the application data. Several firewall vendors, including Check Point, Cisco, Symantec, NetScreen, and NAI, have integrated additional application-level data analysis into the firewall. Check Point, for example, initially added application proxies for TELNET, FTP, and HTTP to the FW-1 product. Cisco's PIX fixup protocol initially provided for limited application parsing of FTP, HTTP, H.323, RSH, SMTP, and SQLNET. Both vendors have since added support for additional applications.

Deep Packet Inspection

To address the limitations of packet filtering, application proxies, and stateful inspection, a technology known as Deep Packet Inspection (DPI) was developed. DPI operates at layers 3-7 of the OSI model. DPI engines parse the entire IP packet and make forwarding decisions by means of rule-based logic that is based upon signature or regular expression matching. That is, they compare the data within a packet payload to a database of predefined attack signatures (a string of bytes). Additionally, statistical or historical algorithms may supplement static pattern matching. Analysis of packet headers can be done economically since the locations of packet header fields are restricted by protocol standards. However, the payload contents are, for the most part, unconstrained. Therefore, searching through the payload for multiple string patterns within the datastream is a computationally expensive task. The requirement that these searches be performed at wirespeed adds to the cost. Additionally, because the signature database is dynamic, it must be easily updateable. Promising approaches to these problems include a software-based approach (Snort implementing the Boyer-Moore algorithm) and a hardware-based approach (FPGAs running a Bloom filter algorithm). DPI technology can be effective against buffer overflow attacks, denial of service (DoS) attacks, sophisticated intrusions, and a small percentage of worms that fit within a single packet. However, the complexity and immaturity of these systems have resulted in a number of recent exploits, as will be shown below.

Example Exploits

Snort RPC Preprocessing Vulnerability: Researchers at Internet Security Systems (ISS) discovered a remotely exploitable buffer overflow in the Snort stream4 preprocessor module. When the RPC decoder normalizes fragmented RPC records, it incorrectly checks the lengths of what is being normalized against the current packet size, leading to an overflow condition. The RPC preprocessor is enabled by default. Remote attackers may exploit the buffer overflow condition to run arbitrary code on a Snort sensor with the privileges of the Snort IDS process, which typically runs as the superuser.

Trend Micro InterScan VirusWall Remote Overflow: An implementation flaw in the InterScan VirusWall SMTP gateway allows a remote attacker to execute code with the privileges of the daemon. Due to an implementation fault in VirusWall's handling of a UUencoded file name, it is possible for a remote attacker to specify an arbitrarily long string, overwriting the stack with user-defined data and allowing the attacker to execute arbitrary code.

Microsoft ISA Server 2000 H.323 Filter Remote Buffer Overflow Vulnerability: The H.323 filter used by Microsoft ISA Server 2000 is prone to a remote buffer overflow vulnerability. The condition presents itself due to insufficient boundary checks performed by the Microsoft Firewall Service on specially crafted H.323 traffic. Successful exploitation of this vulnerability may allow a remote attacker to execute arbitrary code in the context of the Microsoft Firewall Service running on ISA Server 2000. This may lead to complete control of the vulnerable system.

Cisco SIP Fixup Denial of Service (DoS): The Cisco PIX Firewall may reset when receiving fragmented SIP INVITE messages.

Cisco H.323 Vulnerabilities: Multiple Cisco products contain vulnerabilities in the processing of H.323 messages, which are typically used in Voice over Internet Protocol (VoIP) or multimedia applications.

Check Point FireWall-1 H.323 Vulnerabilities: FireWall-1 is affected by the recently reported vulnerabilities in various products' H.323 protocol implementations. The vulnerabilities are caused by various errors in the processing of H.225 messages over TCP.

SNMP (Server/Volume Based)

An Introduction to SNMP

- Created in 1988 as a short-term solution to manage elements in the growing Internet and other attached networks
- Achieved widespread acceptance
- Derived from its predecessor SGMP (Simple Gateway Management Protocol)
- Was intended to be replaced by a solution based on the CMIS/CMIP (Common Management Information Service/Protocol) architecture, which never received the widespread acceptance of SNMP

The Internet-Interconnect Architecture

[Diagram repeated from above: ISP point of presence; Internet Service Provider customer cloud; Tier 2 point of presence; boundary routers to other interconnect partners (BGP-4); Tier 2 transit cloud; leased line; boundary access router BRAS (ISP client); boundary access router BRAS (Tier 2).]

Simple Network Management Protocol (SNMP) is an "Internet-standard protocol for managing devices on IP networks. Devices that typically support SNMP include routers, switches, servers, workstations, printers, modem racks, and more."[1] It is used mostly in network management systems to monitor network-attached devices for conditions that warrant administrative attention. SNMP is a component of the Internet Protocol Suite as defined by the Internet Engineering Task Force (IETF). It consists of a set of standards for network management, including an application layer protocol, a database schema, and a set of data objects.[2]

In typical SNMP uses, one or more administrative computers, called managers, have the task of monitoring or managing a group of hosts or devices on a computer network. Each managed system executes, at all times, a software component called an agent which reports information via SNMP to the manager. Essentially, SNMP agents expose management data on the managed systems as variables. The protocol also permits active management tasks, such as modifying and applying a new configuration through remote modification of these variables. The variables accessible via SNMP are organized in hierarchies. These hierarchies, and other metadata (such as type and description of the variable), are described by Management Information Bases (MIBs).

An SNMP-managed network consists of three key components:
- Managed device
- Agent: software which runs on managed devices
- Network management system (NMS): software which runs on the manager

A managed device is a network node that implements an SNMP interface that allows unidirectional (read-only) or bidirectional access to node-specific information. Managed devices exchange node-specific information with the NMSs. Sometimes called network elements, the managed devices can be any type of device, including, but not limited to, routers, access servers, switches, bridges, hubs, IP telephones, IP video cameras, computer hosts, and printers. An agent is a network-management software module that resides on a managed device. An agent has local knowledge of management information and translates that information to or from an SNMP specific form. A network management system (NMS) executes applications that monitor and control managed devices. NMSs provide the bulk of the processing and memory resources required for network management. One or more NMSs may exist on any managed network.

SNMP itself does not define which information (which variables) a managed system should offer. Rather, SNMP uses an extensible design, where the available information is defined by management information bases (MIBs). MIBs describe the structure of the management data of a device subsystem; they use a hierarchical namespace containing object identifiers (OID). Each OID identifies a variable that can be read or set via SNMP. MIBs use the notation defined by ASN.1.

SNMP operates in the Application Layer of the Internet Protocol Suite (Layer 7 of the OSI model). The SNMP agent receives requests on UDP port 161. The manager may send requests from any available source port to port 161 on the agent. The agent response is sent back to the source port on the manager. The manager receives notifications (Traps and InformRequests) on port 162. The agent may generate notifications from any available port. When used with Transport Layer Security or Datagram Transport Layer Security, requests are received on port 10161 and traps are sent to port 10162.[3] SNMPv1 specifies five core protocol data units (PDUs). Two other PDUs, GetBulkRequest and InformRequest, were added in SNMPv2 and carried over to SNMPv3.

The seven SNMP protocol data units (PDUs) are as follows:

GetRequest: A manager-to-agent request to retrieve the value of a variable or list of variables. Desired variables are specified in variable bindings (values are not used). Retrieval of the specified variable values is to be done as an atomic operation by the agent. A Response with current values is returned.

SetRequest: A manager-to-agent request to change the value of a variable or list of variables. Variable bindings are specified in the body of the request. Changes to all specified variables are to be made as an atomic operation by the agent. A Response with (current) new values for the variables is returned.

GetNextRequest: A manager-to-agent request to discover available variables and their values. Returns a Response with the variable binding for the lexicographically next variable in the MIB. The entire MIB of an agent can be walked by iterative application of GetNextRequest starting at OID 0. Rows of a table can be read by specifying column OIDs in the variable bindings of the request.

GetBulkRequest: An optimized version of GetNextRequest; a manager-to-agent request for multiple iterations of GetNextRequest. Returns a Response with multiple variable bindings walked from the variable binding or bindings in the request. PDU-specific non-repeaters and max-repetitions fields are used to control response behavior. GetBulkRequest was introduced in SNMPv2.

Response: Returns variable bindings and acknowledgement from agent to manager for GetRequest, SetRequest, GetNextRequest, GetBulkRequest, and InformRequest. Error reporting is provided by error-status and error-index fields. Although it is used as a response to both gets and sets, this PDU was called GetResponse in SNMPv1.

Trap: Asynchronous notification from agent to manager. Includes the current sysUpTime value, an OID identifying the type of trap, and optional variable bindings. Destination addressing for traps is determined in an application-specific manner, typically through trap configuration variables in the MIB. The format of the trap message was changed in SNMPv2 and the PDU was renamed SNMPv2-Trap.

InformRequest: Acknowledged asynchronous notification from manager to manager. This PDU uses the same format as the SNMPv2 version of Trap. Manager-to-manager notifications were already possible in SNMPv1 (using a Trap), but as SNMP commonly runs over UDP, where delivery is not assured and dropped packets are not reported, delivery of a Trap was not guaranteed. InformRequest fixes this by sending back an acknowledgement on receipt; the receiver replies with a Response parroting all information in the InformRequest. This PDU was introduced in SNMPv2.
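To make the GetRequest/Response exchange concrete, here is a minimal sketch using the pysnmp library (assumed to be installed); the agent address and community string are placeholders, and the numeric OID 1.3.6.1.2.1.2.2.1.10.1 is ifInOctets for interface index 1 in the standard interfaces MIB.

```python
# Minimal SNMPv2c GetRequest sketch with pysnmp (assumed installed).
# Polls ifInOctets.1 (OID 1.3.6.1.2.1.2.2.1.10.1) from an agent.
from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, ObjectType, ObjectIdentity, getCmd)

error_indication, error_status, error_index, var_binds = next(getCmd(
    SnmpEngine(),
    CommunityData('public', mpModel=1),        # community string is a placeholder
    UdpTransportTarget(('192.0.2.1', 161)),    # placeholder agent address, UDP port 161
    ContextData(),
    ObjectType(ObjectIdentity('1.3.6.1.2.1.2.2.1.10.1')),
))

if error_indication:
    print(error_indication)                    # e.g. a timeout (SNMP runs over UDP)
elif error_status:
    print(error_status.prettyPrint(), "at index", error_index)
else:
    for var_bind in var_binds:
        print(" = ".join(x.prettyPrint() for x in var_bind))
```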

The PRTG network traffic logger uses SNMP, packet sniffing, and NetFlow / sFlow to measure network traffic and network throughput. SNMP is suitable for basic port monitoring, or if you just want to know your overall bandwidth performance. Packet sniffing can be used for more advanced network traffic measurement: this method scans all data packets flowing through your network and classifies the data streams by protocol, IP address, or other parameters, allowing for an in-depth bandwidth analysis. If your network devices support NetFlow or sFlow, you can also use this technology to monitor network traffic: PRTG reads the pre-aggregated monitoring data from these devices and shows it in easy-to-read graphs and tables. The level of detail is similar to the data obtained from packet sniffing.

AT&T Labs Research

SNMP: Simple Network Measurements, Please!
Matthew Roughan (+ many others) <roughan@research.att.com>

Outline

Part I: SNMP traffic data
- Simple Network Management Protocol: Why? How? What?
- What can you do? Why not?
- Putting time series and traffic modeling together

Part II: Wavelets

Part III: Modeling
- Traffic modeling deals with stationary processes (typically)
- Time series analysis gives us a way of getting a stationary process
- But the analysis requires an understanding of the traffic model

Part I: SNMP Traffic Data

Data Availability: Traffic Data

Data Availability: Packet Traces
- Limited availability
- Special equipment needed (O&M expensive even if the box is cheap)
- Lower speed interfaces (only recently OC48 available, no OC192)
- Huge amount of data generated

Data Availability: Flow-Level Data
- Not available everywhere
- Historically poor vendor support (from some vendors)
- Large volume of data (1:100 compared to traffic)
- Feature interaction/performance impact

Data Availability: SNMP
- MIB-II traffic data (including ifInOctets/ifOutOctets) is available almost everywhere
- Manageable volume of data
- No significant impact on router performance

SNMP

Advantages (MIB-II: ifInOctets/ifOutOctets)
- Simple, easy, available anywhere that supports SNMP
- Relatively low volume
- Already used by operations (lots of historical data)

Disadvantages
- Data quality: ambiguous, missing data, irregular sampling
- Octets counters only tell you link utilizations

SNMP traffic data

[Diagram: a poller in the management system polls data from the SNMP agent on a router; the SNMP octets counter behaves like an odometer (e.g. 999408), read by successive SNMP polls.]

Irregularly sampled data

Why?
- Missing data (transport over UDP, often in-band)
- Delays in polling (jitter)
- Poller sync: multiple pollers, staggered polls

Why care?
- Time series analysis
- Comparisons between links (did traffic shed from link A go to link B?)
- Calculation of traffic matrices
- Totals (e.g. total traffic to Peer X)

Applications

Capacity planning
- The network at the moment is hand-crafted
- Want to automate processes
- Provisioning for failure scenarios requires adding loads

Traffic engineering
- Even if done by hand (e.g. BGP), you need to see results

Event detection
- Operations are fire-fighters
- Don't care about events if they go away

Part II: Wavelet Analysis
- Multi-scale
- Multi-resolution

Discrete Wavelet Transform
- Replace the sinusoidal basis functions of the FFT with wavelet basis functions
- Implementation in pyramidal filter banks

[Diagram: a cascade of high-pass and low-pass FIR filters, each followed by downsampling by 2, producing detail coefficients d(1,.), d(2,.), d(3,.) and approximation coefficients a(3,.).]
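As an illustration of the pyramidal filter-bank idea on the slide above, here is a one-level (and then multi-level) Haar analysis step in plain Python: the low-pass branch produces averages and the high-pass branch produces differences, each downsampled by 2. It is a teaching sketch, not the filter bank used in the original study.

```python
# Haar wavelet pyramid: each level splits the signal into an approximation
# (low-pass) and a detail (high-pass) sequence, both downsampled by 2.
import math

def haar_step(signal):
    assert len(signal) % 2 == 0, "use an even-length signal"
    approx = [(signal[i] + signal[i + 1]) / math.sqrt(2) for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / math.sqrt(2) for i in range(0, len(signal), 2)]
    return approx, detail

def haar_dwt(signal, levels):
    """Repeatedly split the approximation, keeping the details from each level."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details          # a(levels, .) and [d(1, .), d(2, .), ...]

a3, (d1, d2, d3) = haar_dwt([4, 6, 10, 12, 8, 6, 5, 5], levels=3)
print("a(3,.):", a3)
print("d(1,.):", d1)
```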

Dyadic grid
- No redundancy, no loss of information
- Each frequency/scale examined at a resolution matched to its scale
[Figure: dyadic grid of coefficients, scale versus time.]

Dyadic grid: smoothing
- Zero the fine-scale details and reconstruct

Dyadic grid: compression
- Keep the coefficients above some threshold

What can you do with wavelets?
- Compression
- Smoothing/interpolation
- Anomaly detection/identification (DoS, flash crowds)
- Multiple dimensional analysis of data
- LRD/self-similarity analysis

[Example figures: compression of traffic data (by averaging, Haar, and Daubechies wavelets), wavelet-based interpolation, and wavelet-based anomaly detection.]

Wavelets, wavelets everywhere, and not a ...
- Parameter tuning: how do you know it will work next time?
- The scale of the dyadic grid doesn't match patterns in the data: 5-minute measurements, 24-hour cycle, 7-day cycle, but the dyadic grid is in powers of 2
- The CWT loses many of the advantages of the DWT
- Example (compression): look for parameters/wavelet that don't lose ...

Part III: Modeling
- Putting together theory from time series analysis and traffic theory
- Applied to SNMP data, in particular for backbone traffic

[Figure: total traffic into a city for 2 weeks.]

Model

Traffic data has several components:
- Trend T_t: long-term changes in traffic
- Seasonal (periodic) component S_t: daily and weekly cycles
- Stationary stochastic component W_t: normal variation
- Transient anomalies I_t: DoS, flash crowds, rerouting (BGP, link failures)

There are many ways you could combine these components (standard time series analysis).

A Simple Model (for backbone traffic)
- Based on the Norros model
- Non-stationary mean
- Stochastic component unspecified (for the moment)

x_t = m_t + sqrt(a * m_t) * W_t + I_t, where m_t = T_t * S_t

Why this model?
- Behaves as expected under multiplexing: x = sum_i x_i, with m = sum_i m_i and a*m = sum_i a_i*m_i
- Good model for backbone traffic (lots of multiplexing)

What does a model get you?

Decomposition
- MA for trend (window > period of seasonal component)
- SMA for seasonal component (average at the same time of day/week)
- Several methods for segmenting I_t

Interpolation
- Linear, or wavelet based, for short gaps (<3 hours)
- Model based for long gaps (>3 hours)

Understanding of the effect of multiplexing
- Should be understood, but people still seem to misunderstand
- How smooth is backbone traffic (is it LRD)?

Capacity planning
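Below is a minimal sketch of the decomposition steps listed above, assuming evenly spaced samples and a known seasonal period (for example, a day of 5-minute bins): a centered moving average estimates the trend, and a seasonal moving average (averaging the same time of day) estimates the periodic component. It is illustrative only; the original work handles gaps and anomalies more carefully.

```python
# Toy trend/seasonal decomposition for an evenly sampled traffic series.
def moving_average(x, window):
    """Centered moving average; the window should exceed the seasonal period."""
    half = window // 2
    out = []
    for i in range(len(x)):
        chunk = x[max(0, i - half):i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def seasonal_profile(residual, period):
    """Average the de-trended series at the same phase of the cycle (SMA)."""
    buckets = [[] for _ in range(period)]
    for i, v in enumerate(residual):
        buckets[i % period].append(v)
    return [sum(b) / len(b) if b else 0.0 for b in buckets]

def decompose(x, period, trend_window):
    trend = moving_average(x, trend_window)
    detrended = [xi - ti for xi, ti in zip(x, trend)]
    profile = seasonal_profile(detrended, period)
    seasonal = [profile[i % period] for i in range(len(x))]
    remainder = [xi - ti - si for xi, ti, si in zip(x, trend, seasonal)]
    return trend, seasonal, remainder

# Two synthetic "days" of 12 samples each: a slow trend plus a daily step cycle.
series = [100 + 2 * t + (20 if t % 12 >= 6 else 0) for t in range(24)]
trend, seasonal, remainder = decompose(series, period=12, trend_window=13)
print(trend[:4], seasonal[:4], remainder[:4])
```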

Example: decomposition
[Figure: data decomposed into trend and seasonal components.]

Example: interpolation
[Figure: model-based vs. linear interpolation.]

Conclusion

SNMP is a good data source
- Available everywhere
- You need to do some work to extract useful data
- There is still more info to get (packet traces, flow data, ...)

Wavelets are a flexible tool for extracting info
- Not always obvious how to set parameters
- A framework for other algorithms
- A way to decide what information is important
- A way of seeing how smooth traffic really is

Traffic model gives you a little more
- Effect of multiplexing
- Algorithms are applicable to other traffic data

Network Operations: Time Scales
- Minutes to hours: denial-of-service attacks; router and link failures; serious congestion
- Hours to weeks: time-of-day or day-of-week engineering; outlay of new routers and links; addition/deletion of customers or peers
- Weeks to years: planning of new capacity and topology changes; evaluation of network designs and routing protocols

Collection of Measurement Data
- Need to transport measurement data: produced and consumed in different systems; the usual scenario is a large number of measurement devices and a small number of aggregation points; usually in-band transport of measurement data
- Reliable transport: better data quality, but the device needs to maintain state and be addressable, and measurements may overload a congested link
- Unreliable transport: simpler measurement device, but uncertainty due to lost measurement data, and the loss process might be hard to model


Simple Network Management Protocol (SNMP)
- Definition: router CPU utilization, link utilization, link loss, ...; standardized protocol and naming hierarchy; collected from every router/link every few minutes
- Outline: Management Information Base (MIB); applications of SNMP traffic statistics; limitations of SNMP for network operations

SNMP: Applications of SNMP Traffic Statistics
- Driving the wall-board at the operations center: complete view of every link in the network
- Usage-based billing: tracking customer traffic on a coarse time scale
- Alarming on significant changes: detect high load or high packet loss
- Planning outlay of new capacity: trending analysis to predict requirements
- Inference of the offered traffic matrix (more on this in part 2)

SNMP: Measurement Limitations
- Statistics are hard-coded: no local accumulation of statistics; no customizable alerting functions
- Highly aggregated traffic information: aggregate link statistics (load, loss, etc.); cannot drill down into more detail
- Protocol is simple, but dumb: cannot express complex queries over the MIB

SNMP: Conclusions
- SNMP link statistics: highly aggregated view of every link
- Applications: network-wide view of aggregate traffic; detecting (but not diagnosing) problems
- Advantages: open standard that is universally supported; low volume of data and low overhead on routers
- Disadvantages: coarse-grain view in time (e.g., 1-5 minutes); coarse-grain view in space (e.g., entire link); unreliable transfer of the measurement data

Packet Monitoring
- Definition: passively collect packets on links; record IP, TCP/UDP, or application-layer traces
- Outline: tapping a link and capturing packets; operational applications for packet traces; placement of the packet monitor; practical challenges in collecting the data

Packet Monitoring: Selecting the Traffic
- Filter to focus on a subset of the packets: IP addresses (e.g., to/from specific machines); protocol (e.g., TCP, UDP, or ICMP); port numbers (e.g., HTTP, DNS, BGP, Kazaa)
- Collect the first n bytes of each packet (snap length): medium access control header (if present); IP header (typically 20 bytes); IP+UDP header (typically 28 bytes); IP+TCP header (typically 40 bytes); application-layer message (entire packet)
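The sketch below shows the two selection steps just listed, filtering with a BPF expression and keeping only the first n bytes (snap length) of each packet, using the scapy library (assumed installed) for the capture; the interface name and filter are placeholders and the capture needs elevated privileges.

```python
# Traffic-selection sketch with scapy (assumed installed, run with privileges):
# apply a BPF filter and keep only the first SNAP_LEN bytes of each packet.
from scapy.all import sniff, raw

SNAP_LEN = 40          # roughly IP + TCP headers, as discussed above
records = []

def handle(pkt):
    records.append(raw(pkt)[:SNAP_LEN])   # truncate the raw bytes to the snap length

sniff(filter="tcp port 80", prn=handle, count=5, iface="eth0")
print(f"captured {len(records)} truncated header records")
```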

Packet Monitoring: IP Header Traces
- Source/destination IP addresses: popular Web servers and heavy customers
- Traffic breakdown by protocol: amount of traffic not using congestion control
- Packet delay through the router: identification of typical delays and anomalies
- Distribution of packet sizes: workload models for routers
- Burstiness of link traffic over time: provisioning rules for allocating link capacity
- Throughput between src/dest pairs: detection of performance problems

Packet Monitoring: TCP Header Analysis
- Source and destination port numbers: popular applications (HTTP, Kazaa, DNS); number of parallel connections between source-dest pairs
- Sequence/ACK numbers and timestamps: out-of-order/lost packets; violations of congestion control
- Number of packets/bytes per connection: size of typical Web transfers; frequency of bulk transfers
- SYN flags from client machines: unsuccessful connection requests; denial-of-service attacks

Packet Monitoring: System Constraints
- High data rate: bandwidth limits on CPU, I/O, memory, and disk/tape; could monitor lower-speed links (edge of network)
- High data volume: space limitations in main memory and on disk/tape; could do online analysis to sample, filter, and aggregate
- High processing load: CPU/memory limits for extracting and analyzing; could do offline processing for time-consuming analysis
- General solutions to system constraints: sub-select the traffic (addresses/ports, first n bytes); operating system and interface card support; efficient/robust software and hardware for the monitor

Packet Monitoring: PSAMP IETF Activity
- Goals of the psamp group: minimal functionality for packet-level measurement; tunable trade-offs between overhead and accuracy; measurement data for a variety of important applications
- Basic idea, parallel filter/sample banks: filter on header fields (src/dest, port numbers, protocol); 1-out-of-N sampling (random, periodic, or hash); record key IP and TCP/UDP header fields; send measurement records to a collection system
- References: http://ops.ietf.org/lists/psamp/

Packet Monitoring: Conclusions
- Packet monitoring: detailed, fine-grain view of individual links
- Advantages: finest level of granularity (individual IP packets); primary source of application-level information
- Disadvantages: expensive to build and deploy; large volume of measurement data; difficult to deploy over a large network; hard to collect on high-speed links; hard to reconstruct application-level info

Flow Measurement: Outline
- Definition: passively collect statistics about groups of packets; group packets based on headers and time; essentially a form of aggregation
- Outline: definition of an IP flow; applications of flow measurement; mechanics of collecting flow-level measurements; reducing the measurement overheads

Flow Measurement: Versus Packet Monitoring
- Basic statistics (available from both): traffic mix by IP addresses, port numbers, and protocol; average packet size
- Traffic over time: both give traffic volumes on a medium-to-large time scale; packet monitoring adds burstiness of the traffic on a small time scale
- Statistics per TCP connection: both give the number of packets and bytes transferred over the link; packet monitoring adds the frequency of lost or out-of-order packets and the number of application-level bytes delivered
- Per-packet info (available only from packet traces): TCP seq/ack numbers, receiver window, per-packet flags, ...; probability distribution of packet sizes; application-level header and body (full packet contents)

Flow Measurement: Evicting Cache Entries
- Flow timeout: remove idle flows (e.g., no packet in the last 60 sec); periodic sequencing through the cache
- Cache replacement: remove flow(s) when the cache is full; evict existing flow(s) upon creating a new entry; apply an eviction policy (LRU, random flow, etc.)
- Long-lived flows: remove flow(s) that persist for a long time (e.g., 30 min); otherwise flow statistics don't become available and the byte and packet counters might overflow
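A minimal sketch of a flow cache with the idle-timeout eviction described above: flows are keyed by the 5-tuple, and entries with no packets in the last TIMEOUT seconds are exported and removed. The field names and the timeout value are illustrative, not those of any router implementation.

```python
# Toy flow cache with idle-timeout eviction (flows keyed by the 5-tuple).
import time

TIMEOUT = 60.0   # seconds of idleness before a flow record is exported

class FlowCache:
    def __init__(self):
        self.flows = {}   # 5-tuple -> {"packets", "bytes", "last_seen"}

    def update(self, src, dst, sport, dport, proto, length, now=None):
        now = time.time() if now is None else now
        key = (src, dst, sport, dport, proto)
        flow = self.flows.setdefault(key, {"packets": 0, "bytes": 0, "last_seen": now})
        flow["packets"] += 1
        flow["bytes"] += length
        flow["last_seen"] = now

    def evict_idle(self, now=None):
        """Export and remove flows that have been idle longer than TIMEOUT."""
        now = time.time() if now is None else now
        expired = [k for k, f in self.flows.items() if now - f["last_seen"] > TIMEOUT]
        return [(k, self.flows.pop(k)) for k in expired]

cache = FlowCache()
cache.update("10.0.0.1", "192.0.2.9", 34512, 80, "tcp", 1500, now=0.0)
print(cache.evict_idle(now=120.0))   # the flow is idle, so it is exported
```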

Flow Measurement: Aggregation
- Define flows at a coarser level: ignore TCP/UDP port numbers, ToS bits, etc.; source and destination IP address blocks; source and destination autonomous system (AS)
- Advantages: reduce the size of the flow cache; reduce the number of flow records
- Disadvantages: lost information for basic traffic reporting; impacted by the view of routing (prefixes) at this router

IETF Standards Activity
- Real-Time Traffic Flow Meter (RTFM): past working group on describing and measuring flows; meter with flow table and packet matching/handling; meter readers that transport usage data from the meters; manager for downloading rule sets to the meter; SNMP for downloading rules and reading usage data
- IP Flow Information Export (IPFIX): distinguishing flows (interfaces, IP and transport header fields, ...); metering (reliability, sampling, timestamps, flow timeout); data export (information model, reliability, confidentiality, integrity, anonymization, reporting times)

Flow Measurement: Conclusions
- Flow measurement: medium-grain view of traffic on links
- Advantages: lower measurement volume than packet traces; available on high-end line cards (Cisco NetFlow); control over overhead via aggregation and sampling
- Disadvantages: computation and memory requirements for the flow cache; loss of fine-grain timing and per-packet information; not uniformly supported by router vendors

Path Matrix: Operational Uses
- Congested link. Problem: easy to detect, hard to diagnose. Which traffic is responsible? Which traffic is affected?
- Customer complaint. Problem: the customer has limited visibility to diagnose. How is the traffic of a given customer routed? Where does the traffic experience loss and delay?
- Denial-of-service attack. Problem: spoofed source address, distributed attack. Where is the attack coming from? Who is affected?

Traffic Matrix: Operational Uses
- Short-term congestion and performance problems. Problem: predicting link loads after a routing change; map the traffic matrix onto the new set of routes.
- Long-term congestion and performance problems. Problem: predicting link loads after topology changes; map the traffic matrix onto the routes on the new topology.
- Reliability despite equipment failures.

Populating the Domain-Wide Models
- Inference: assumptions about traffic and routing. Traffic data: byte counts per link (over time). Routing data: path(s) between each pair of nodes.
- Mapping: assumptions about routing. Traffic data: packet/flow statistics at the network edge. Routing data: egress point(s) per destination prefix.
- Direct observation: no assumptions. Traffic data: packet samples at every link. Routing data: none.

Network tomography is the study of a network's internal characteristics using information derived from end point data. The word tomography is used to link the field, in concept, to other processes that infer the internal characteristics of an object from external observation, as is done in magnetic resonance imaging or positron emission tomography (even though the term tomography strictly refers to imaging by slicing). The field is a recent development in electrical engineering and computer science, founded in 1996.[1] Network tomography advocates that it is possible to map the path data takes through the Internet by examining information from "edge nodes," the computers where data is originated and requested from. The field is useful for engineers attempting to develop more efficient computer networks. Data derived from network tomography studies can be used to increase quality of service by limiting link packet loss and increasing routing optimization.

Mapping: Remove Traffic Assumptions
- Assumptions: know the egress point where traffic leaves the domain; know the path from the ingress to the egress point
- Approach: collect fine-grain measurements at ingress points; associate each record with its path and egress point; sum over measurement records with the same path/egress
- Requirements: packet or flow measurement at the ingress points; routing table from each of the egress points

Mapping: Challenges
- Limitations: need for fine-grain data from ingress points; large volume of traffic measurement data; need for forwarding tables from egress points; data inconsistencies across different locations
- Directions for future work: vendor support for packet measurement (psamp); distributed infrastructure for collecting data; online monitoring of topology and routing data

Direct Observation: Overcoming Uncertainty
- Internet traffic: fluctuation over time (burstiness, congestion control); packet loss as traffic flows through the network; inconsistencies in timestamps across routers
- IP routing protocols: changes due to failure and reconfiguration; large state space (high number of links or paths); vendor-specific implementation (e.g., tie-breaking); multicast groups that send to a (dynamic) set of receivers
- Better to observe the traffic directly as it travels

Direct Observation: Straw-Man Approaches
- Path marking: each packet carries the path it has traversed so far. Drawback: excessive overhead.
- Packet or flow measurement on every link: combine records across all links to obtain the paths. Drawback: excessive measurement and CPU overhead.
- Sample the entire path for certain packets: sample and tag a fraction of packets at the ingress point, then sample all of the tagged packets inside the network. Drawback: requires modification to IP (for tagging).

Direct Observation: Trajectory Sampling
- Sample packets at every link without tagging: pseudo-random sampling (e.g., 1-out-of-100); either sample or don't sample at each link; compute a hash over the contents of the packet
- Details of consistent sampling: x is a subset of invariant bits in the packet; hash function h(x) = x mod A; sample if h(x) < r, where r/A is a thinning factor
- Exploit entropy in packet contents to do the sampling
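Here is a small sketch of the consistent, hash-based sampling rule on the slide above, h(x) = x mod A with a packet sampled when h(x) < r; x stands for the invariant bytes of the packet, and A and r are illustrative values chosen so that r/A is the thinning factor.

```python
# Consistent (trajectory) sampling sketch: every observation point applies the same
# hash to the packet's invariant bytes, so the same packets are sampled everywhere.
A = 10_000          # hash modulus
R = 100             # sample if h(x) < R, i.e. a thinning factor of R/A = 1%

def h(invariant_bytes: bytes) -> int:
    return int.from_bytes(invariant_bytes, "big") % A

def sampled(invariant_bytes: bytes) -> bool:
    return h(invariant_bytes) < R

# Two "routers" make the same decision for the same packet contents.
pkt = bytes.fromhex("4500003c1c4640004006b1e6c0a80001c0a800c7")
print(sampled(pkt), sampled(pkt))   # identical decisions at every link
```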

In IP networks today, link load measurements are readily available via the Simple Network Management Protocol (SNMP). SNMP is useful because it is supported by most devices in an IP network. The SNMP data that is available on a device is defined in an abstract data structure known as a Management Information Base (MIB). A Network Management Station (NMS) periodically requests, or polls, the appropriate SNMP MIB data from a router (or other device). The standard MIBs defined on most routers/switches include a cyclic counter of the number of bytes transmitted and received on each of its interfaces. Hence we can obtain basic traffic statistics for the entire network with little additional infrastructure support: all we need is an SNMP poller that periodically records these counters. However, one should note carefully that SNMP counters (on devices) do not count the traffic per interval, but only a running total. In order to compute traffic per interval, we need to send polls at precise times. A typical polling interval for SNMP is five minutes.

SNMP data has many known limitations. Data may be lost in transit (SNMP uses unreliable UDP transport), or by the NMS, for instance, if the NMS crashes or reboots. Data may be incorrect through poor SNMP agent implementations, or because a counter has wrapped multiple times (this is easier than you might expect as old versions of SNMP used 32-bit counters and these could wrap quite quickly on a high-speed link, e.g., in less than 4 seconds on a 10Gbps link), counter resets (say after a router reboot), or because the timing of SNMP polls is somewhat hard to control.
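To make the counter-wrap issue concrete, here is a small sketch that turns two successive polls of a 32-bit octet counter into an average bit rate, assuming at most one wrap between polls; as noted above, on a fast link with 32-bit counters even that assumption can fail, which is exactly the problem described here.

```python
# Convert two successive SNMP octet-counter readings into an average bit rate,
# tolerating a single 32-bit counter wrap between polls.
COUNTER_MAX = 2**32          # 32-bit ifInOctets/ifOutOctets counter

def bits_per_second(prev_count, curr_count, interval_seconds):
    delta = (curr_count - prev_count) % COUNTER_MAX   # modular delta absorbs one wrap
    return delta * 8 / interval_seconds

# Example: the counter wrapped once during a 300-second (5-minute) polling interval.
prev, curr = 4_294_000_000, 5_000_000                 # raw counter values
print(f"{bits_per_second(prev, curr, 300):.0f} bit/s")
```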

This jitter in poll timing arises because (i) NMSs must perform polls to many devices, and cannot perform them all concurrently; (ii) timing on typical commodity hardware is not always very accurate [10]; (iii) SNMP processes on routers and switches are given low priority and may therefore have a delayed response; and (iv) poll packets may take some time to transit the network.

The net effect is that the time at which we aim to conduct a poll and the actual time of the poll are often offset by some jitter. This problem is compounded in some systems that do not even record when the poll was sent/received at the NMS (let alone the actual time the poll was answered by the network device), but only the intended time of the poll in the polling schedule. Obviously, the quality of such measurements varies depending on the NMS system, and the SNMP agent implementation on routers or other network devices. Some systems implicitly perform a crude interpolation when reporting the polling times, whereas other systems may make use of proprietary features of certain network devices to improve the accuracy of the timestamps. Other systems attempt to provide reliable transport of polls through retransmission (though this improves reliability at the expense of increasing delays between the desired and actual polling times). However, even where these facilities exist, the question still remains of how accurate the measured timestamps and values are. One should never simply accept that these will be accurate, given the many difficulties of getting timestamps in non-real-time systems [10] without accurate hardware clocks. Moreover, SNMP implementations are often add-ons, and given little consideration in the original design and architecture of devices, and given low priority in terms of testing and maintenance.

Many network managers assume that the errors in measurements are negligible. However, such assumptions are dangerous because errors can feed into management processes, corrupting the results and leading to congestion or wasted resources. The size and nature of errors in a set of SNMP measurements will depend on the polling software, the network devices in question, and even the traffic on the network. It is important that ongoing calibration of measurements is part of maintaining quality in a network.

Note that what we propose here is different from compliance testing, such as one might conduct on an SNMP agent [11]. Such compliance testing is necessary, but only shows that an SNMP agent correctly responds to polls, and so forth. An agent can respond correctly and still the measurements contain errors such as those due to timing. Likewise, benchmarking and simulation [9] are of little use in this domain because we are interested in the performance of a particular SNMP/NMS system, and the details of a deployment are hard to really capture (e.g., what are the failure rates of the NMS, what are the delays in agent responses for an SNMP agent on a router under realistic traffic and control loads). The difficulty of calibrating SNMP systems in the field is that the major alternative source of data, flow-level data, is unsuitable for the task because the timing of flows is random (not fixed to the granularity of the SNMP measurements) and hence the datasets are incommensurate. The only (currently) practical source of ground truth data would be a packet trace, and few operators are willing to pay the cost of installation and management of the devices necessary to collect such data from high-speed links. The alternative proposed here is to use the redundancy already present in many SNMP datasets to self-calibrate the data. More specifically, many operators would collect SNMP data from the interfaces at either end of a substantial set of links in their network. We exploit this redundancy by performing comparisons between measurements from either end of the link to assess errors.
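A small sketch of the comparison idea in the last sentence above: for each polling interval, compare the octets transmitted at one end of a link with the octets received at the other end and report the mean relative discrepancy as a rough error estimate. The series below are made up.

```python
# Compare per-interval byte counts from the two ends of the same link
# (router A's outbound deltas vs. router B's inbound deltas) to estimate error.
def mean_relative_discrepancy(tx_deltas, rx_deltas):
    errors = []
    for tx, rx in zip(tx_deltas, rx_deltas):
        if max(tx, rx) > 0:
            errors.append(abs(tx - rx) / max(tx, rx))
    return sum(errors) / len(errors) if errors else 0.0

tx = [1_000_000, 1_200_000, 980_000, 1_050_000]   # bytes sent by router A per interval
rx = [1_000_300, 1_180_000, 979_500, 1_200_000]   # bytes received by router B per interval
print(f"mean relative discrepancy: {mean_relative_discrepancy(tx, rx):.2%}")
```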

Overview of Capabilities

Cisco routers and switches contain SNMP agents that can respond to standard SNMP get and set operations. That is, a management station can ask the Cisco device for information via an SNMP get, or it can tell the device to change some setting or take some action via a set operation. The device can also spontaneously originate traps or SNMPv2c inform notifications.

Insert HP OpenView material here.
