Billing is based on a different model than for consumers. For consumers, billing is at the transaction or usage level; for carriers and providers, interconnect is a bulk business.
At its most fundamental level, internet interconnect is a bulk business. Routing of traffic is by IP and by destination, not by traffic type or usage, and there is no way to differentiate this traffic, especially at the commercial level. BGP and walled-garden approaches can channel traffic to specific routers, but ultimately billing is by what passes through a particular router or port.
Monitoring Methods
Remote Management
Log in and run performance statistics and collect other data, or use a monitoring device or counter in the device and monitor it from a remote server (SNMP).

Promiscuous Mode
Any host can automatically see what all other hosts are doing by enabling promiscuous mode. However, modern switched network technologies, such as those used on modern Ethernets, provide, in effect, point-to-point links between pairs of devices, so it is hard for other devices to see traffic.

Port Mirroring
Cisco refers to this as SPAN (Switched Port Analyzer); some other vendors have other names for it, such as Roving Analysis Port (RAP) on 3Com switches. Port mirroring on routers and switches is a low-cost alternative to network taps and solves many of the same problems. Not all routers and switches support port mirroring, and on those that do, using it can affect the performance of the router or switch.
[Diagram: leased lines connect through boundary access routers / BRAS (ISP client and Tier 2), with boundary routers to other interconnect partners (BGP-4)]
Principal Methods
Sniffers
Deep Packet Inspection (DPI)
Sniffers
A sniffer is a program that monitors and analyzes network traffic, detecting bottlenecks and problems so that a network manager can keep traffic flowing efficiently. A sniffer can also be used, legitimately or illegitimately, to capture data being transmitted on a network. A network router reads every packet of data passed to it, determining whether it is intended for a destination within the router's own network or whether it should be passed further along the Internet. A router with a sniffer, however, may be able to read the data in the packet as well as the source and destination addresses.
A sniffer program runs on a localized network, typically where many machines are connected. As the different machines request information from the Internet, packets are sent back and forth from those machines. A sniffer analyzes, or "sniffs," those packets. By analyzing the packets, the sniffer can tell what machine is requesting what information, and what information is in that packet. Essentially, someone running a sniffer program is eavesdropping on any other computer on the network. This makes it easy to see what you're doing, whether it's receiving email, using instant messaging, downloading illegal files or checking out your bank account.
A packet analyzer (also known as a network analyzer, protocol analyzer, or sniffer, or for particular types of networks, an Ethernet sniffer or wireless sniffer) is a computer program or a piece of computer hardware that can intercept and log traffic passing over a digital network or part of a network. As data streams flow across the network, the sniffer captures each packet and, if needed, decodes the packet's raw data, showing the values of various fields in the packet, and analyzes its content according to the appropriate RFC or other specifications.
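As a sketch of the decoding step described above, the following parses the fixed fields of an IPv4 header with Python's struct module (the sample packet bytes are fabricated for the example, not a real capture):

```python
import socket
import struct

def parse_ipv4_header(data: bytes) -> dict:
    """Decode the fixed 20-byte IPv4 header from raw packet bytes."""
    if len(data) < 20:
        raise ValueError("truncated IPv4 header")
    version_ihl, tos, total_len, ident, flags_frag, ttl, proto, checksum, src, dst = \
        struct.unpack("!BBHHHBBH4s4s", data[:20])
    return {
        "version": version_ihl >> 4,
        "header_len": (version_ihl & 0x0F) * 4,   # header length in bytes
        "total_len": total_len,
        "ttl": ttl,
        "protocol": proto,                        # 6 = TCP, 17 = UDP
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }

# Fabricated example: a minimal IPv4 header carrying TCP (protocol 6).
sample = struct.pack("!BBHHHBBH4s4s", (4 << 4) | 5, 0, 40, 1, 0, 64, 6, 0,
                     socket.inet_aton("192.0.2.1"), socket.inet_aton("198.51.100.7"))
fields = parse_ipv4_header(sample)
print(fields["src"], "->", fields["dst"], "proto", fields["protocol"])
```

A real analyzer would read such bytes from a capture interface and then continue into the transport and application layers, per the relevant RFCs.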
Programs that tap into a computer network with the purpose of intercepting data traveling between two network machines are called sniffing software. This type of software program can be used to intercept and interpret data on a computer or network, including browser passwords, chat programs, user settings and network traffic. Sniffing software can also be known as a packet scanner, a packet analyzer or a network analyzer. Types of sniffing software include Internet protocol (IP) sniffing software, hypertext markup language (HTML) sniffing software, port scanners, and packet sniffing programs. Software used for network sniffing is most often used legitimately by network administrators to identify the source of communication problems among different network machines.
Sniffing can be done through a wireless connection, or it can be performed using software installed on a computer that is part of the wired network. Common programs used for sniffing include Carnivore, snoop, and SkyGrabber. Not all sniffing is done using sniffing software; network administrators often have hardware scanners that perform network analysis. Hardware that analyzes network data includes Bluetooth-based sniffing devices and analysis hardware that taps straight into a computer port.
On wired broadcast LANs, depending on the network structure (hub or switch), one can capture traffic on all or just parts of the network from a single machine within the network. There are some methods to avoid traffic narrowing by switches to gain access to traffic from other systems on the network (e.g., ARP spoofing). For network monitoring purposes, it may also be desirable to monitor all data packets in a LAN by using a network switch with a so-called monitoring port, whose purpose is to mirror all packets passing through all ports of the switch. Using a network tap is an even more reliable solution than using a monitoring port, since taps are less likely to drop packets during high traffic loads.
A network tap is a hardware device which provides a way to access the data flowing across a computer network. In many cases, it is desirable for a third party to monitor the traffic between two points in the network. If the network between points A and B consists of a physical cable, a "network tap" may be the best way to accomplish this monitoring. The network tap has (at least) three ports: an A port, a B port, and a monitor port. A tap inserted between A and B passes all traffic through unimpeded, but also copies that same data to its monitor port, enabling a third party to listen. Network taps are commonly used for network intrusion detection systems, VoIP recording, network probes, RMON probes, packet sniffers, and other monitoring and collection devices and software that require access to a network segment. Taps are used in security applications because they are non-obtrusive, are not detectable on the network (having no physical or logical address), can deal with full-duplex and non-shared networks, and will usually pass through traffic even if the tap stops working or loses power.
DPI
Deep Packet Inspection (DPI) equipment is intended to identify the applications being used on the network. Some of these devices can go much further; Narus, for instance, can look inside all traffic from a specific IP address, pick out the HTTP traffic, then drill even further down to capture only traffic headed to and from Gmail, and can even reassemble e-mails as they are typed out by the user.
DPC vs DPI
Packets can either be analyzed in real time or captured and analyzed later, a practice known as deep packet capture (DPC). Both techniques can reveal a wealth of data about network traffic. Applications may leave telltale signatures or patterns in the packets they generate, allowing for accurate detection of program use across a network in real time. Deep packet inspection is often used in large corporate networks to detect worms, viruses, and trojans that can't be seen by other security software such as firewalls. DPI can also be used to limit or prioritize certain types of network traffic, a practice known as traffic shaping.
ISPs around the world use DPI technology in a variety of ways. Some use it to generate statistical information about the traffic that flows across their network, while others use network appliances (purpose-built hardware that sits on an ISP's network) to perform comprehensive monitoring of user traffic. The most advanced of these network appliances can act on this data in real time. Some broadband providers, for example, use DPI to block or slow down file-sharing services. Network neutrality advocates fear this could lead to a multi-tiered Internet, a system in which the programs and services a customer can use online depend on how much the customer pays.
By intercepting a large number of packets, ISPs and governments can reconstruct e-mails, listen in on voice over Internet Protocol (VoIP) calls, or even track users across different websites in order to display targeted advertising. Several ISPs in both the U.S. and U.K. have used this more advanced version of deep packet inspection to inject targeted advertising into websites their customers visit. Governments sometimes use DPI for surveillance and censorship purposes on the Internet. For example, China's Golden Shield Project, also known as the "Great Firewall of China," is believed to use DPI. The U.S. National Security Agency has used commercial network appliances with deep packet inspection to monitor e-mails and VoIP calls.
Microsoft, Cisco, Check Point, Symantec, Nortel, SonicWall, NAI, Juniper/NetScreen, and others have, in the past eighteen months, started manufacturing firewall appliances that implement Deep Packet Inspection (DPI). In general, the DPI engine scrutinizes each packet (including the data payload) as it traverses the firewall, and rejects or allows the packet based upon a ruleset implemented by the firewall administrator. The inspection engine implements the ruleset based upon signature-based comparisons; heuristic, statistical, or anomaly-based techniques; or some combination of these.
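A toy illustration of the signature-matching side of such a ruleset (the signatures here are invented for the example, not drawn from any real product):

```python
# Hypothetical signature ruleset: payload byte patterns that cause a reject.
SIGNATURES = {
    "example-nop-sled": b"\x90\x90\x90\x90",   # a NOP-sled fragment (illustrative)
    "example-bad-cmd": b"SITE EXEC",           # an FTP command abused in old exploits
}

def inspect(payload: bytes):
    """Return ('reject', rule_name) on a signature match, else ('allow', None)."""
    for name, pattern in SIGNATURES.items():
        if pattern in payload:
            return ("reject", name)
    return ("allow", None)

print(inspect(b"USER anonymous"))
print(inspect(b"xx SITE EXEC /bin/sh yy"))
```

A production engine would combine this with per-flow state and the heuristic or statistical techniques mentioned above; this sketch shows only the static pattern-matching core.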
Heuristic algorithms are often employed because they may be seen to "work" without having been mathematically proven to meet a given set of requirements. One common pitfall in implementing a heuristic method is that the engineer or designer fails to realize that the current data set does not necessarily represent future system states. While the existing data can be pored over and an algorithm can be devised to successfully handle the current data, it is imperative to ensure that the heuristic method is capable of handling future data sets. This means that the engineer or designer must fully understand the rules that generate the data and develop the algorithm to meet those requirements, not just address the current data sets. Statistical analysis should be conducted when employing heuristics to estimate the probability of incorrect outcomes. If one seeks to use a heuristic as a means of solving a search or knapsack problem, one must be careful to ensure that the heuristic function is admissible: given a heuristic function h(n) estimating the cost from node n to a goal, admissibility requires that h(n) never overestimate the true cost, i.e., h(n) ≤ h*(n) for all n.
Statistical anomaly and signature-based IDSes
All intrusion detection systems use one of two detection techniques. A statistical anomaly-based IDS determines normal network activity (what sort of bandwidth is generally used, what protocols are used, what ports and devices generally connect to each other) and alerts the administrator or user when traffic is detected which is anomalous (not normal).[2] A signature-based IDS monitors packets on the network and compares them with pre-configured, pre-determined attack patterns known as signatures. The issue is that there will be a lag between a new threat being discovered and its signature being applied in the IDS; during this lag time the IDS will be unable to identify the threat.
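The statistical anomaly-based approach can be sketched as a simple deviation-from-baseline test; the three-sigma threshold below is an illustrative choice, not a prescribed one:

```python
from statistics import mean, stdev

def is_anomalous(history, sample, k=3.0):
    """Flag a traffic sample (e.g., bandwidth in Mb/s) that deviates more than
    k standard deviations from the baseline formed by past observations."""
    mu = mean(history)
    sigma = stdev(history)
    return abs(sample - mu) > k * sigma

# Illustrative baseline: link load samples in Mb/s over past intervals.
baseline = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 103.0, 97.0]
print(is_anomalous(baseline, 101.0))   # within normal variation
print(is_anomalous(baseline, 500.0))   # far outside the baseline
```

Real anomaly-based IDSes model many features at once (ports, protocols, connection patterns), but each feature test reduces to a comparison of this kind.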
Deep Packet Inspection promises to enhance firewall capabilities by adding the ability to analyze and filter SOAP and other XML messages, dynamically open and close ports for VoIP application traffic, perform in-line AV and spam screening, dynamically proxy IM traffic, eliminate the bevy of attacks against NetBIOS-based services, traffic-shape or do away with the many flavors of P2P traffic (recently shown to account for ~35% of internet traffic), and perform SSL session inspection. Deep Packet Inspection essentially collapses Intrusion Detection System (IDS) functionality into the firewall appliance, so that both a firewall and an in-line IDS are implemented on the same device. Many of these products have recently been shown to be vulnerable to exploitation of software defects in their DPI inspection engines, however. These data suggest that the addition of these enhanced functions to firewalls may, in fact, weaken, rather than strengthen, network perimeter security.
Traditionally, firewalls have provided a physical and logical demarcation between the inside and the outside of a network. The first firewalls were basically just gateways between two networks with IP forwarding disabled. Most contemporary firewalls share a common characteristic: each is a single point between two or more networks through which all traffic must pass (a choke point).
The DEC SEAL also required a Gatekeeper device, which acted as an application proxy (AP). Application proxies or gateways are a second, common type of firewall mechanism. An AP functions by providing intermediary services for hosts that reside on different networks, while maintaining complete details of the TCP connection state and sequencing. In practice, a client host (running, for example, a web browser application) negotiates a service request with the AP, which acts as a surrogate for the host that provides services (the webserver). Two connections are required for a session to be completed - one between the client and the AP, and one between the AP and the server. No direct connection exists between hosts. Additionally, APs typically possess the ability to do a limited amount of packet filtering based upon rudimentary application-level data parsing. APs are considered by most people to be more secure than packet filtering firewalls, but performance and scalability factors have limited their distribution. Although current stateful firewall technologies provide for tracking the state of a connection, most provide only limited analysis of the application data. Several firewall vendors, including Check Point, Cisco, Symantec, Netscreen, and NAI have integrated additional application-level data analysis into the firewall. Checkpoint, for example, initially added application proxies for TELNET, FTP, and HTTP to the FW-1 product. Cisco's PIX fixup protocol initially provided for limited application parsing of FTP, HTTP, H.323, RSH, SMTP, and SQLNET. Both vendors have since added support for additional applications
To address the limitations of packet filtering, application proxies, and stateful inspection, a technology known as Deep Packet Inspection (DPI) was developed. DPI operates at layers 3-7 of the OSI model. DPI engines parse the entire IP packet and make forwarding decisions by means of rule-based logic based upon signature or regular-expression matching. That is, they compare the data within a packet payload to a database of predefined attack signatures (strings of bytes). Additionally, statistical or historical algorithms may supplement static pattern matching. Analysis of packet headers can be done economically, since the locations of packet header fields are restricted by protocol standards. The payload contents, however, are for the most part unconstrained, so searching the payload for multiple string patterns within the datastream is a computationally expensive task. The requirement that these searches be performed at wirespeed adds to the cost, and because the signature database is dynamic, it must be easily updateable. Promising approaches to these problems include a software-based approach (Snort implementing the Boyer-Moore algorithm) and a hardware-based approach (FPGAs running a Bloom filter algorithm). DPI technology can be effective against buffer overflow attacks, denial of service (DoS) attacks, sophisticated intrusions, and a small percentage of worms that fit within a single packet. However, the complexity and immaturity of these systems have resulted in a number of recent exploits, as will be shown below.
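The Boyer-Moore approach mentioned above can be illustrated with its simpler Horspool variant, which uses a bad-character table to skip ahead through the payload instead of checking every offset:

```python
def horspool_search(payload: bytes, pattern: bytes) -> int:
    """Boyer-Moore-Horspool search: index of the first match of pattern
    in payload, or -1 if absent."""
    m, n = len(pattern), len(payload)
    if m == 0:
        return 0
    # Bad-character table: how far we may safely shift when the byte
    # aligned with the pattern's last position is `byte`.
    shift = {pattern[i]: m - 1 - i for i in range(m - 1)}
    i = 0
    while i <= n - m:
        if payload[i:i + m] == pattern:
            return i
        # Shift by the table entry for the byte under the pattern's last
        # position; bytes not in the pattern allow a full-length skip.
        i += shift.get(payload[i + m - 1], m)
    return -1

print(horspool_search(b"abcxxsignature", b"signature"))
```

Snort-class engines must run many such searches simultaneously over every packet, which is why multi-pattern variants (e.g., set-wise matching) and hardware assists matter at wirespeed; this sketch shows only the single-pattern core.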
Example Exploits

Snort RPC Preprocessing Vulnerability: Researchers at Internet Security Systems (ISS) discovered a remotely exploitable buffer overflow in the Snort stream4 preprocessor module. When the RPC decoder normalizes fragmented RPC records, it incorrectly checks the lengths of what is being normalized against the current packet size, leading to an overflow condition. The RPC preprocessor is enabled by default. Remote attackers may exploit the buffer overflow condition to run arbitrary code on a Snort sensor with the privileges of the Snort IDS process, which typically runs as the superuser.

Trend Micro InterScan VirusWall Remote Overflow: An implementation flaw in the InterScan VirusWall SMTP gateway allows a remote attacker to execute code with the privileges of the daemon. Due to an implementation fault in VirusWall's handling of a UUencoded file name, it is possible for a remote attacker to specify an arbitrarily long string, overwriting the stack with user-defined data and allowing the attacker to execute arbitrary code.

Microsoft ISA Server 2000 H.323 Filter Remote Buffer Overflow Vulnerability: The H.323 filter used by Microsoft ISA Server 2000 is prone to a remote buffer overflow vulnerability. The condition presents itself due to insufficient boundary checks performed by the Microsoft Firewall Service on specially crafted H.323 traffic. Successful exploitation may allow a remote attacker to execute arbitrary code in the context of the Microsoft Firewall Service running on ISA Server 2000, which may lead to complete control of the vulnerable system.

Cisco SIP Fixup Denial of Service (DoS): The Cisco PIX Firewall may reset when receiving fragmented SIP INVITE messages.

Cisco H.323 Vulnerabilities: Multiple Cisco products contain vulnerabilities in the processing of H.323 messages, which are typically used in Voice over Internet Protocol (VoIP) or multimedia applications.
Check Point FireWall-1 H.323 Vulnerabilities FireWall-1 is affected by the recently reported vulnerabilities in various products' H.323 protocol implementation. The vulnerabilities are caused due to various errors in the processing of H.225 messages over TCP.
An Introduction to SNMP
Created in 1988 as a short-term solution to manage elements in the growing Internet and other attached networks, SNMP achieved widespread acceptance. It was derived from its predecessor SGMP (Simple Gateway Management Protocol) and was intended to be replaced by a solution based on the CMIS/CMIP (Common Management Information Service/Protocol) architecture, which never received the widespread acceptance of SNMP.
Simple Network Management Protocol (SNMP) is an "Internet-standard protocol for managing devices on IP networks. Devices that typically support SNMP include routers, switches, servers, workstations, printers, modem racks, and more."[1] It is used mostly in network management systems to monitor network-attached devices for conditions that warrant administrative attention. SNMP is a component of the Internet Protocol Suite as defined by the Internet Engineering Task Force (IETF). It consists of a set of standards for network management, including an application layer protocol, a database schema, and a set of data objects.[2]
In typical SNMP uses, one or more administrative computers, called managers, have the task of monitoring or managing a group of hosts or devices on a computer network. Each managed system executes, at all times, a software component called an agent which reports information via SNMP to the manager. Essentially, SNMP agents expose management data on the managed systems as variables. The protocol also permits active management tasks, such as modifying and applying a new configuration through remote modification of these variables. The variables accessible via SNMP are organized in hierarchies. These hierarchies, and other metadata (such as type and description of the variable), are described by Management Information Bases (MIBs).
An SNMP-managed network consists of three key components:
a managed device;
agent software, which runs on managed devices; and
network management system (NMS) software, which runs on the manager.
A managed device is a network node that implements an SNMP interface that allows unidirectional (read-only) or bidirectional access to node-specific information. Managed devices exchange node-specific information with the NMSs. Sometimes called network elements, the managed devices can be any type of device, including, but not limited to, routers, access servers, switches, bridges, hubs, IP telephones, IP video cameras, computer hosts, and printers. An agent is a network-management software module that resides on a managed device. An agent has local knowledge of management information and translates that information to or from an SNMP specific form. A network management system (NMS) executes applications that monitor and control managed devices. NMSs provide the bulk of the processing and memory resources required for network management. One or more NMSs may exist on any managed network.
SNMP itself does not define which information (which variables) a managed system should offer. Rather, SNMP uses an extensible design, where the available information is defined by management information bases (MIBs). MIBs describe the structure of the management data of a device subsystem; they use a hierarchical namespace containing object identifiers (OID). Each OID identifies a variable that can be read or set via SNMP. MIBs use the notation defined by ASN.1.
SNMP operates in the Application Layer of the Internet Protocol Suite (Layer 7 of the OSI model). The SNMP agent receives requests on UDP port 161. The manager may send requests from any available source port to port 161 on the agent; the agent's response is sent back to the source port on the manager. The manager receives notifications (Traps and InformRequests) on port 162; the agent may generate notifications from any available port. When used with Transport Layer Security or Datagram Transport Layer Security, requests are received on port 10161 and traps are sent to port 10162.[3] SNMPv1 specifies five core protocol data units (PDUs). Two other PDUs, GetBulkRequest and InformRequest, were added in SNMPv2 and carried over to SNMPv3.
The seven SNMP protocol data units (PDUs) are as follows:

GetRequest: A manager-to-agent request to retrieve the value of a variable or list of variables. Desired variables are specified in variable bindings (values are not used). Retrieval of the specified variable values is to be done as an atomic operation by the agent. A Response with current values is returned.

SetRequest: A manager-to-agent request to change the value of a variable or list of variables. Variable bindings are specified in the body of the request. Changes to all specified variables are to be made as an atomic operation by the agent. A Response with (current) new values for the variables is returned.

GetNextRequest: A manager-to-agent request to discover available variables and their values. Returns a Response with the variable binding for the lexicographically next variable in the MIB. The entire MIB of an agent can be walked by iterative application of GetNextRequest starting at OID 0. Rows of a table can be read by specifying column OIDs in the variable bindings of the request.

GetBulkRequest:
Optimized version of GetNextRequest. A manager-to-agent request for multiple iterations of GetNextRequest. Returns a Response with multiple variable bindings walked from the variable binding or bindings in the request. PDU specific non-repeaters and max-repetitions fields are used to control response behavior. GetBulkRequest was introduced in SNMPv2.
Response: Returns variable bindings and an acknowledgement from agent to manager for GetRequest, SetRequest, GetNextRequest, GetBulkRequest, and InformRequest. Error reporting is provided by error-status and error-index fields. Although it is used as a response to both gets and sets, this PDU was called GetResponse in SNMPv1.

Trap: Asynchronous notification from agent to manager. Includes the current sysUpTime value, an OID identifying the type of trap, and optional variable bindings. Destination addressing for traps is determined in an application-specific manner, typically through trap configuration variables in the MIB. The format of the trap message was changed in SNMPv2 and the PDU was renamed SNMPv2-Trap.

InformRequest: Acknowledged asynchronous notification from manager to manager. This PDU uses the same format as the SNMPv2 version of Trap. Manager-to-manager notifications were already possible in SNMPv1 (using a Trap), but as SNMP commonly runs over UDP, where delivery is not assured and dropped packets are not reported, delivery of a Trap was not guaranteed. InformRequest fixes this by sending back an acknowledgement on receipt: the receiver replies with a Response parroting all information in the InformRequest. This PDU was introduced in SNMPv2.
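To make the GetNextRequest walk concrete, here is a small sketch over an in-memory table standing in for an agent's MIB; the OIDs and values are illustrative, not taken from a real agent:

```python
# Toy in-memory "MIB": OIDs (as tuples) mapped to values.
MIB = {
    (1, 3, 6, 1, 2, 1, 1, 1, 0): "Example router",      # sysDescr.0
    (1, 3, 6, 1, 2, 1, 1, 3, 0): 123456,                # sysUpTime.0
    (1, 3, 6, 1, 2, 1, 2, 2, 1, 10, 1): 987654,         # ifInOctets.1
}

def get_next(oid):
    """Agent side of GetNextRequest: the lexicographically next OID after
    `oid`, with its value, or None past the end of the MIB view."""
    for candidate in sorted(MIB):        # tuple comparison is lexicographic
        if candidate > oid:
            return candidate, MIB[candidate]
    return None

def walk():
    """Manager side: iterate GetNext from OID 0 to enumerate the whole MIB."""
    oid, results = (0,), []
    while (nxt := get_next(oid)) is not None:
        oid, value = nxt
        results.append((oid, value))
    return results

for oid, value in walk():
    print(".".join(map(str, oid)), "=", value)
```

A real walk would wrap each step in a GetNextRequest/Response PDU exchange over UDP; the lexicographic-next logic is the same.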
The PRTG network traffic logger uses SNMP, packet sniffing, and NetFlow/sFlow to measure network traffic and network throughput. SNMP is suitable for basic port monitoring, or if you just want to know your overall bandwidth performance. Packet sniffing can be used for more advanced network traffic measurement: this method scans all data packets flowing through your network and classifies the data streams by protocol, IP address, or other parameters, allowing for an in-depth bandwidth analysis. If your network devices support NetFlow or sFlow, you can also use this technology to monitor network traffic: PRTG reads the pre-aggregated monitoring data from these devices and shows it in easy-to-read graphs and tables. The level of detail is similar to the data obtained from packet sniffing.
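The counters behind this kind of SNMP bandwidth monitoring (e.g., MIB-II ifInOctets) are cumulative, so throughput is computed from the difference between two polls. A minimal sketch, including handling of a single 32-bit counter wrap:

```python
def throughput_bps(prev_octets: int, curr_octets: int, interval_s: float,
                   counter_bits: int = 32) -> float:
    """Bits per second between two polls of a cumulative octet counter,
    tolerating a single wrap of the counter (MIB-II counters are 32-bit)."""
    modulus = 1 << counter_bits
    # Modular subtraction absorbs one wrap: the delta stays non-negative.
    delta = (curr_octets - prev_octets) % modulus
    return delta * 8 / interval_s

# Plain case: 3,000,000 octets in a 5-minute polling interval.
print(throughput_bps(0, 3_000_000, 300))
# Wrapped case: the counter rolled over between polls.
print(throughput_bps(2**32 - 1000, 9000, 300))
```

Note the assumption: at most one wrap per interval. On fast links this forces short polling intervals or 64-bit counters (ifHCInOctets), which is one reason polling frequency matters.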
SNMP
Simple Network Measurements Please!
Matthew Roughan (+many others) <roughan@research.att.com>
Outline
AT&T Labs Research
Traffic modeling typically deals with stationary processes. Time-series analysis gives us a way of getting a stationary process, but the analysis requires an understanding of the traffic model.
Packet traces: limited availability; special equipment needed (O&M expensive even if the box is cheap); lower-speed interfaces (only recently OC48 available, no OC192); huge amount of data generated.
Flow-level data: not available everywhere; historically poor vendor support (from some vendors); large volume of data (1:100 compared to traffic); feature interaction/performance impact.
SNMP traffic data: MIB-II (including ifInOctets/ifOutOctets) is available almost everywhere; manageable volume of data; no significant impact on router performance.
SNMP
Advantages: relatively low volume; it is used by operations already (lots of historical data).
Disadvantages: data quality (ambiguous or missing data).
[Diagram: a management-system poller issues SNMP polls to the agent on each router]
Why? Missing data (transport over UDP, often in-band); delays in polling (jitter); synchronization across multiple pollers.
Why care? Time-series analysis assumes regularly spaced, complete samples.
Applications
Capacity planning: network planning at the moment is hand-crafted; we want to automate these processes; provisioning for failure scenarios requires adding loads.
Traffic engineering (even BGP).
Event detection (operations).
Multi-scale, multi-resolution analysis.
Replace the sinusoidal basis functions of the FFT with wavelet basis functions; the implementation is via pyramidal filter banks.
[Diagram: three-stage pyramidal filter bank; each stage applies a high-pass and a low-pass FIR filter followed by downsampling by 2, yielding detail coefficients d(1,·), d(2,·), d(3,·) and approximation coefficients a(3,·)]
Dyadic grid: no redundancy and no loss of information; each frequency/scale is examined at a resolution matched to its scale.
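As an illustrative stand-in for the filter bank above, the simplest wavelet pair (Haar) can be cascaded in the same pyramidal fashion; this is a sketch, not necessarily the wavelet used in the original analysis:

```python
def haar_step(signal):
    """One filter-bank stage: low-pass (pairwise averages) and high-pass
    (pairwise differences), each implicitly downsampled by 2.
    Assumes len(signal) is even."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_dwt(signal, levels):
    """Cascade haar_step `levels` times, as in the pyramidal filter bank.
    Assumes len(signal) is divisible by 2**levels.
    Returns a(levels,.) and [d(1,.), d(2,.), ...]."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details

a, ds = haar_dwt([1, 1, 1, 1, 5, 5, 5, 5], 3)
print("a(3,.) =", a, " d(3,.) =", ds[2])
```

Each output level sits on one row of the dyadic grid: d(1,·) has half as many coefficients as the input, d(2,·) a quarter, and so on, with no redundancy.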
[Figures: wavelet coefficients on the dyadic grid, plotted as scale (1-4) versus time]
Example: compression
Example: interpolation
[Figures: wavelet-based interpolation of traffic data]
Parameter tuning
How? The measurements are minute-scale and show 24-hour and 7-day cycles, but the dyadic grid is in powers of 2, and the CWT loses many of the advantages of the DWT.
Example: compression. Look for parameters/wavelet that don't lose information.
Application to SNMP data
Model

x_t = m_t + sqrt(a * m_t) * W_t + I_t,  where m_t = T_t * S_t

T_t: trend (long-term changes in traffic)
S_t: seasonal component (daily and weekly cycles)
W_t: stationary stochastic component (normal traffic variation)
I_t: transient component (DoS attacks, anomalies, and other non-standard events)

Based on the Norros model: non-stationary mean; stochastic component unspecified (for the moment).

Effect of multiplexing: m = sum_i m_i,  a = (sum_i a_i * m_i) / (sum_i m_i)
Decomposition
MA for the trend (window > period of the seasonal component).
SMA for the seasonal component (average at the same time of day/week).
Several methods for segmenting I_t.

Interpolation
Linear, or wavelet-based, for short gaps (< 3 hours).
Model-based for long gaps (> 3 hours).
The interpolated segments should be understood, e.g., for capacity planning.
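The MA/SMA decomposition above can be sketched as follows; the toy series has a short "day" of four samples, and the window choices are illustrative:

```python
def moving_average(x, window):
    """Trailing moving average (MA), used here for the slowly varying trend."""
    out = []
    for i in range(len(x)):
        lo = max(0, i - window + 1)
        out.append(sum(x[lo:i + 1]) / (i + 1 - lo))
    return out

def seasonal_component(x, period):
    """Seasonal moving average (SMA): average all samples taken at the same
    phase of the cycle (same time of day/week)."""
    sums = [0.0] * period
    counts = [0] * period
    for i, v in enumerate(x):
        sums[i % period] += v
        counts[i % period] += 1
    return [s / c for s, c in zip(sums, counts)]

# Toy series: three "days" of four samples each, with a repeating shape.
x = [1, 2, 3, 2, 1, 2, 3, 2, 1, 2, 3, 2]
print(seasonal_component(x, period=4))
print(moving_average(x, window=4))
```

The residual x_t minus trend and seasonal parts is then segmented into the stationary component W_t and transients I_t; that step is omitted here.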
Example: decomposition
[Figure: SNMP traffic data decomposed into trend and seasonal components]
Example: interpolation
Conclusion
SNMP is available everywhere, but you need to do some work to extract useful data, and there is still more information to get (packet traces, flow data, ...).
It is not always obvious how to set parameters.
The wavelet approach gives a framework for other algorithms, a way to decide what information is important, and a way of seeing how smooth traffic really is.
Network Operations: Time Scales
Minutes to hours: denial-of-service attacks; router and link failures; serious congestion.
Hours to weeks: time-of-day or day-of-week engineering; outlay of new routers and links; addition/deletion of customers or peers.
Weeks to years: planning of new capacity and topology changes; evaluation of network designs and routing protocols.
Collection of Measurement Data
Need to transport measurement data: it is produced and consumed in different systems; the usual scenario is a large number of measurement devices and a small number of aggregation points; transport is usually in-band.
Reliable transport: better data quality, but the device needs to maintain state and be addressable, and measurements may overload a congested link.
Unreliable transport: simpler measurement device, but uncertainty due to lost measurement data, and the loss process might be hard to model.
Simple Network Management Protocol (SNMP)
Definition: router CPU utilization, link utilization, link loss, ...; standardized protocol and naming hierarchy; collected from every router/link every few minutes.
Outline: Management Information Base (MIB); applications of SNMP traffic statistics; limitations of SNMP for network operations.
SNMP: Applications of SNMP Traffic Statistics
Driving the wall-board at the operations center: complete view of every link in the network.
Usage-based billing: tracking customer traffic on a coarse time scale.
Alarming on significant changes: detect high load or high packet loss.
Planning outlay of new capacity: trending analysis to predict requirements.
Inference of the offered traffic matrix (more on this in part 2).
SNMP: Measurement Limitations
Statistics are hard-coded: no local accumulation of statistics; no customizable alerting functions.
Highly aggregated traffic information: aggregate link statistics (load, loss, etc.); cannot drill down into more detail.
The protocol is simple, but dumb: cannot express complex queries over the MIB.
SNMP: Conclusions
- SNMP link statistics: highly aggregated view of every link
- Applications: network-wide view of aggregate traffic; detecting (but not diagnosing) problems
- Advantages: open standard that is universally supported; low volume of data and low overhead on routers
- Disadvantages: coarse-grain view in time (e.g., 1-5 minutes); coarse-grain view in space (e.g., an entire link); unreliable transfer of the measurement data
Packet Monitoring
- Definition: passively collect packets on links; record IP, TCP/UDP, or application-layer traces
- Outline: tapping a link and capturing packets; operational applications for packet traces; placement of the packet monitor; practical challenges in collecting the data
Packet Monitoring: Selecting the Traffic
- Filter to focus on a subset of the packets: IP addresses (e.g., to/from specific machines); protocol (e.g., TCP, UDP, or ICMP); port numbers (e.g., HTTP, DNS, BGP, Kazaa)
- Collect the first n bytes of each packet (the snap length): medium access control header (if present); IP header (typically 20 bytes); IP+UDP headers (typically 28 bytes); IP+TCP headers (typically 40 bytes); application-layer message (entire packet)
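A minimal sketch of the filtering and snap-length steps above, assuming raw IPv4 packets with no link-layer header; the function name and its defaults are illustrative, not from any particular monitoring tool:

```python
import struct

def capture_record(packet, protocol=6, ports={80, 53}, snap_len=40):
    """Filter a raw IPv4 packet (bytes, no link-layer header) by protocol
    and TCP/UDP port, then truncate it to the first snap_len bytes.
    Returns None if the packet does not match the filter.
    Field offsets follow the IPv4/TCP header layouts (RFC 791/793)."""
    ihl = (packet[0] & 0x0F) * 4            # IP header length in bytes
    proto = packet[9]                       # protocol field (6 = TCP, 17 = UDP)
    if proto != protocol:
        return None
    # Source and destination ports are the first 4 bytes after the IP header
    src_port, dst_port = struct.unpack("!HH", packet[ihl:ihl + 4])
    if src_port not in ports and dst_port not in ports:
        return None
    return packet[:snap_len]                # keep only the first n bytes
```

With the default snap length of 40 bytes, a matching record keeps exactly the IP+TCP headers, as in the list above.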
Packet Monitoring: IP Header Traces
- Source/destination IP addresses: popular Web servers and heavy customers
- Traffic breakdown by protocol: amount of traffic not using congestion control
- Packet delay through the router: identification of typical delays and anomalies
- Distribution of packet sizes: workload models for routers
- Burstiness of link traffic over time: provisioning rules for allocating link capacity
- Throughput between src/dest pairs: detection of performance problems
Packet Monitoring: TCP Header Analysis
- Source and destination port numbers: popular applications (HTTP, Kazaa, DNS); number of parallel connections between source-destination pairs
- Sequence/ACK numbers and timestamps: out-of-order/lost packets; violations of congestion control
- Number of packets/bytes per connection: size of typical Web transfers; frequency of bulk transfers
- SYN flags from client machines: unsuccessful connection requests; denial-of-service attacks
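As a rough sketch of the sequence-number analysis above (the function name is invented, and real traces must also handle 32-bit sequence wraparound and SACK), one can flag out-of-order or retransmitted segments from (sequence number, payload length) pairs in one direction of a connection:

```python
def classify_segments(segments):
    """Flag TCP segments that arrive out of order, given a list of
    (seq, payload_len) tuples in capture order for one direction.
    Returns the sequence numbers of suspicious segments."""
    expected = None
    out_of_order = []
    for seq, length in segments:
        if expected is None or seq == expected:
            expected = seq + length        # next expected sequence number
        elif seq < expected:
            out_of_order.append(seq)       # retransmission or duplicate
        else:
            out_of_order.append(seq)       # gap: an earlier segment is missing
            expected = seq + length
    return out_of_order
```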
Packet Monitoring: System Constraints
- High data rate: bandwidth limits on CPU, I/O, memory, and disk/tape; could monitor lower-speed links (edge of the network)
- High data volume: space limitations in main memory and on disk/tape; could do online analysis to sample, filter, and aggregate
- High processing load: CPU/memory limits for extracting and analyzing; could do offline processing for time-consuming analysis
- General solutions to system constraints: sub-select the traffic (addresses/ports, first n bytes); operating system and interface card support; efficient and robust software and hardware for the monitor
Packet Monitoring: PSAMP IETF Activity
- Goals of the psamp group: minimal functionality for packet-level measurement; tunable trade-offs between overhead and accuracy; measurement data for a variety of important applications
- Basic idea, parallel filter/sample banks: filter on header fields (src/dest, port numbers, protocol); 1-out-of-N sampling (random, periodic, or hash); record key IP and TCP/UDP header fields; send a measurement record to a collection system
- References: http://ops.ietf.org/lists/psamp/
Packet Monitoring: Conclusions
- Packet monitoring: detailed, fine-grain view of individual links
- Advantages: finest level of granularity (individual IP packets); primary source of application-level information
- Disadvantages: expensive to build and deploy; large volume of measurement data; difficult to deploy over a large network; hard to collect on high-speed links; hard to reconstruct application-level information
Flow Measurement: Outline
- Definition: passively collect statistics about groups of packets; group packets based on headers and time; essentially a form of aggregation
- Outline: definition of an IP flow; applications of flow measurement; mechanics of collecting flow-level measurements; reducing the measurement overheads
Flow Measurement: Versus Packet Monitoring
- Basic statistics (available from both): traffic mix by IP addresses, port numbers, and protocol; average packet size
- Traffic over time: both give traffic volumes on a medium-to-large time scale; packet monitoring also gives the burstiness of the traffic on a small time scale
- Statistics per TCP connection: both give the number of packets and bytes transferred over the link; packet monitoring also gives the frequency of lost or out-of-order packets and the number of application-level bytes delivered
- Per-packet information (available only from packet traces): TCP seq/ack numbers, receiver window, per-packet flags; probability distribution of packet sizes; application-level header and body (full packet contents)
Flow Measurement: Evicting Cache Entries
- Flow timeout: remove idle flows (e.g., no packet in the last 60 sec); periodic sequencing through the cache
- Cache replacement: remove flow(s) when the cache is full; evict existing flow(s) upon creating a new entry; apply an eviction policy (LRU, random flow, etc.)
- Long-lived flows: remove flow(s) that persist for a long time (e.g., 30 min); otherwise the flow statistics don't become available, and the byte and packet counters might overflow
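The eviction mechanics above can be sketched as a small cache keyed by the 5-tuple; the 60-second idle and 30-minute active timeouts mirror the examples above, but class and method names here are illustrative:

```python
import time

class FlowCache:
    """Minimal flow cache keyed by the 5-tuple, with idle-timeout and
    long-lived-flow eviction as described above."""
    def __init__(self, idle_timeout=60.0, active_timeout=1800.0):
        self.idle_timeout = idle_timeout
        self.active_timeout = active_timeout
        self.flows = {}   # 5-tuple -> [first_seen, last_seen, packets, bytes]

    def update(self, key, size, now=None):
        """Account one packet of `size` bytes to the flow `key`."""
        now = time.time() if now is None else now
        entry = self.flows.setdefault(key, [now, now, 0, 0])
        entry[1] = now       # refresh last-seen time
        entry[2] += 1        # packet count
        entry[3] += size     # byte count

    def evict(self, now=None):
        """Return and remove flows that are idle or have lived too long,
        so their statistics become available for export."""
        now = time.time() if now is None else now
        expired = [k for k, (first, last, _, _) in self.flows.items()
                   if now - last > self.idle_timeout
                   or now - first > self.active_timeout]
        return [(k, self.flows.pop(k)) for k in expired]
```

A periodic sweep would call `evict()` and ship the returned records to the collection system.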
Flow Measurement: Aggregation
- Define flows at a coarser level: ignore TCP/UDP port numbers, ToS bits, etc.; aggregate to source and destination IP address blocks, or to source and destination autonomous system (AS)
- Advantages: reduces the size of the flow cache; reduces the number of flow records
- Disadvantages: lost information for basic traffic reporting; impacted by the view of routing (prefixes) at this router
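A sketch of prefix-level aggregation using fixed /24 blocks (a simplification: an operator would more likely map addresses to BGP routing prefixes or ASes, and the function name is invented):

```python
from collections import defaultdict
import ipaddress

def aggregate_by_prefix(flow_records, prefix_len=24):
    """Aggregate (src_ip, dst_ip, bytes) flow records to address blocks,
    discarding port numbers and other fine-grain fields entirely."""
    totals = defaultdict(int)
    for src, dst, nbytes in flow_records:
        # Map each address to its containing block of the given length
        src_net = ipaddress.ip_network(f"{src}/{prefix_len}", strict=False)
        dst_net = ipaddress.ip_network(f"{dst}/{prefix_len}", strict=False)
        totals[(str(src_net), str(dst_net))] += nbytes
    return dict(totals)
```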
IETF Standards Activity
- Real-Time Traffic Flow Meter (RTFM): past working group on describing and measuring flows; a meter with a flow table and packet matching/handling; meter readers that transport usage data from the meters; a manager for downloading rule sets to the meter; SNMP for downloading rules and reading usage data
- IP Flow Information eXport (IPFIX): distinguishing flows (interfaces, IP and transport header fields, ...); metering (reliability, sampling, timestamps, flow timeout); data export (information model, reliability, confidentiality, integrity, anonymization, reporting times)
Flow Measurement: Conclusions
- Flow measurement: medium-grain view of traffic on links
- Advantages: lower measurement volume than packet traces; available on high-end line cards (Cisco NetFlow); control over overhead via aggregation and sampling
- Disadvantages: computation and memory requirements for the flow cache; loss of fine-grain timing and per-packet information; not uniformly supported by router vendors
Path Matrix: Operational Uses
- Congested link. Problem: easy to detect, hard to diagnose. Which traffic is responsible? Which traffic is affected?
- Customer complaint. Problem: the customer has limited visibility to diagnose. How is the traffic of a given customer routed? Where does the traffic experience loss and delay?
- Denial-of-service attack. Problem: spoofed source addresses, distributed attack. Where is the attack coming from? Who is affected?
Traffic Matrix: Operational Uses
- Short-term congestion and performance problems. Problem: predicting link loads after a routing change. Map the traffic matrix onto the new set of routes.
- Long-term congestion and performance problems. Problem: predicting link loads after topology changes. Map the traffic matrix onto the routes on the new topology.
- Reliability despite equipment failures.
Populating the Domain-Wide Models
- Inference: assumptions about traffic and routing. Traffic data: byte counts per link (over time). Routing data: path(s) between each pair of nodes.
- Mapping: assumptions about routing. Traffic data: packet/flow statistics at the network edge. Routing data: egress point(s) per destination prefix.
- Direct observation: no assumptions. Traffic data: packet samples at every link. Routing data: none.
Network tomography is the study of a network's internal characteristics using information derived from endpoint data. The word tomography links the field, in concept, to other processes that infer the internal characteristics of an object from external observation, as is done in magnetic resonance imaging or positron emission tomography (even though the term tomography strictly refers to imaging by slicing). The field is a recent development in electrical engineering and computer science, founded in 1996.[1] Network tomography holds that it is possible to map the paths data takes through the Internet by examining information from "edge nodes," the computers where data originates and is requested. The field is useful for engineers attempting to develop more efficient computer networks. Data derived from network tomography studies can be used to increase quality of service by limiting link packet loss and improving routing optimization.
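As a toy illustration of tomographic inference, one can recover origin-destination (OD) demands from SNMP-style link byte counts when the routing matrix happens to be invertible. The topology and traffic volumes below are invented; real routing matrices are typically rank-deficient, so practical tomography needs additional modeling assumptions:

```python
import numpy as np

# Rows = links, columns = OD pairs; A[i, j] = 1 if OD pair j crosses link i.
# This 3-link, 3-OD-pair example is full rank, so least squares recovers
# the demands exactly; real networks have far more OD pairs than links.
A = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 1]], dtype=float)

x_true = np.array([10.0, 20.0, 5.0])   # unknown OD demands (e.g., Mbps)
y = A @ x_true                          # what per-link SNMP counters would show

# Infer the OD demands from the observed link loads
x_est, *_ = np.linalg.lstsq(A, y, rcond=None)
```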
Mapping: Remove Traffic Assumptions
- Assumptions: know the egress point where traffic leaves the domain; know the path from the ingress to the egress point
- Approach: collect fine-grain measurements at ingress points; associate each record with its path and egress point; sum over measurement records with the same path/egress
- Requirements: packet or flow measurement at the ingress points; a routing table from each of the egress points
Mapping: Challenges
- Limitations: need for fine-grain data from ingress points; large volume of traffic measurement data; need for forwarding tables from egress points; data inconsistencies across different locations
- Directions for future work: vendor support for packet measurement (psamp); distributed infrastructure for collecting data; online monitoring of topology and routing data
Direct Observation: Overcoming Uncertainty
- Internet traffic: fluctuation over time (burstiness, congestion control); packet loss as traffic flows through the network; inconsistencies in timestamps across routers
- IP routing protocols: changes due to failure and reconfiguration; large state space (high number of links or paths); vendor-specific implementation (e.g., tie-breaking); multicast groups that send to a (dynamic) set of receivers
- Better to observe the traffic directly as it travels
Direct Observation: Straw-Man Approaches
- Path marking: each packet carries the path it has traversed so far. Drawback: excessive overhead.
- Packet or flow measurement on every link: combine records across all links to obtain the paths. Drawback: excessive measurement and CPU overhead.
- Sample the entire path for certain packets: sample and tag a fraction of packets at the ingress point; sample all of the tagged packets inside the network. Drawback: requires modification to IP (for tagging).
Direct Observation: Trajectory Sampling
- Sample packets at every link without tagging: pseudo-random sampling (e.g., 1-out-of-100); either sample or don't sample at each link; compute a hash over the contents of the packet
- Details of consistent sampling: x is a subset of invariant bits in the packet; hash function h(x) = x mod A; sample if h(x) < r, where r/A is the thinning factor
- Exploits entropy in packet contents to do the sampling
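The consistent-sampling rule above can be written directly; the values of A and r here are illustrative, giving a 1% thinning factor:

```python
def sample_packet(invariant_bits, A=1000, r=10):
    """Consistent trajectory sampling: every router applies the same hash
    h(x) = x mod A to the packet's invariant content x, and samples the
    packet iff h(x) < r. Because the decision depends only on the packet
    contents, a given packet is sampled either at every link it crosses
    or at none, yielding its full trajectory."""
    x = int.from_bytes(invariant_bits, "big")
    return (x % A) < r
```

In practice the hash is computed over fields that do not change hop by hop (so not TTL or checksum), and a second hash produces a compact packet label for matching records across routers.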
In IP networks today, link load measurements are readily available via the Simple Network Management Protocol (SNMP). SNMP is useful because it is supported by most devices in an IP network. The SNMP data available on a device is defined in an abstract data structure known as a Management Information Base (MIB). A Network Management Station (NMS) periodically requests, or polls, the appropriate SNMP MIB data from a router (or other device). The standard MIBs defined on most routers/switches include a cyclic counter of the number of bytes transmitted and received on each interface. Hence we can obtain basic traffic statistics for the entire network with little additional infrastructure support: all we need is an SNMP poller that periodically records these counters. However, one should note carefully that SNMP counters on devices do not record traffic per interval, only a running total. To compute traffic per interval, we need to send polls at precise times. A typical polling interval for SNMP is five minutes.
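A minimal sketch of turning two successive polls of such a cyclic counter into a byte rate, assuming at most one counter wrap per polling interval (the function name is illustrative):

```python
def byte_rate(prev, curr, interval, counter_bits=32):
    """Compute bytes/sec from two successive readings of a cyclic SNMP
    octet counter (e.g., ifInOctets). The modular subtraction compensates
    for a single wrap; multiple wraps within one interval are undetectable,
    which is why fast links need short intervals or 64-bit counters
    (ifHCInOctets)."""
    wrap = 1 << counter_bits
    delta = (curr - prev) % wrap     # correct even if curr wrapped past prev
    return delta / interval
```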
SNMP data has many known limitations. Data may be lost in transit (SNMP uses unreliable UDP transport) or at the NMS, for instance if the NMS crashes or reboots. Data may be incorrect because of poor SNMP agent implementations, because a counter has wrapped multiple times within a polling interval (easier than you might expect: old versions of SNMP used 32-bit counters, which can wrap quickly on a high-speed link, e.g., in less than 4 seconds at 10 Gbps), because of counter resets (say, after a router reboot), or because the timing of SNMP polls is somewhat hard to control.
This jitter in poll timing arises because (i) an NMS must poll many devices and cannot poll them all concurrently; (ii) timing on typical commodity hardware is not always very accurate [10]; (iii) SNMP processes on routers and switches are given low priority and may therefore respond late; and (iv) poll packets may take some time to transit the network.
The net effect is that the time at which we aim to conduct a poll and the actual time of the poll are often offset by some jitter. The problem is compounded in systems that do not even record when the poll was sent or received at the NMS (let alone the time the poll was actually answered by the network device), but only the intended time of the poll in the polling schedule. Obviously, the quality of such measurements varies with the NMS and with the SNMP agent implementation on routers or other network devices. Some systems implicitly perform a crude interpolation when reporting polling times, whereas others exploit proprietary features of certain network devices to improve the accuracy of the timestamps. Still others attempt to provide reliable transport of polls through retransmission (though this improves reliability at the expense of increasing the delay between the desired and actual polling times). However, even where these facilities exist, the question remains of how accurate the measured timestamps and values are. One should never simply assume they are accurate, given the many difficulties of obtaining good timestamps on non-real-time systems [10] without accurate hardware clocks. Moreover, SNMP implementations are often add-ons, given little consideration in the original design and architecture of devices, and given low priority in testing and maintenance.
Many network managers assume that the errors in measurements are negligible. Such assumptions are dangerous because errors feed into management processes, corrupting the results and leading to congestion or wasted resources. The size and nature of the errors in a set of SNMP measurements depend on the polling software, on the network devices in question, and even on the traffic on the network. Ongoing calibration of measurements should therefore be part of maintaining quality in a network.
Note that what we propose here is different from compliance testing, such as one might conduct on an SNMP agent [11]. Such compliance testing is necessary, but it only shows that an SNMP agent responds correctly to polls and so forth; an agent can respond correctly and the measurements can still contain errors, such as those due to timing. Likewise, benchmarking and simulation [9] are of little use in this domain, because we are interested in the performance of a particular SNMP/NMS system, and the details of a deployment are hard to capture realistically (e.g., what are the failure rates of the NMS, and what are the response delays of an SNMP agent on a router under realistic traffic and control loads?).

The difficulty of calibrating SNMP systems in the field is that the major alternative source of data, flow-level data, is unsuitable for the task: the timing of flows is random (not fixed to the granularity of the SNMP measurements), so the datasets are incommensurate. The only currently practical source of ground-truth data would be a packet trace, and few operators are willing to pay the cost of installing and managing the devices needed to collect such data from high-speed links. The alternative proposed here is to use the redundancy already present in many SNMP datasets to self-calibrate the data. More specifically, many operators collect SNMP data from the interfaces at both ends of a substantial set of links in their network. We exploit this redundancy by comparing measurements from the two ends of each link to assess errors.
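This self-calibration check can be sketched in a few lines: the bytes one interface transmits onto a link should match the bytes the far interface receives in the same poll interval, so their relative difference estimates the measurement error. The function name and inputs are hypothetical:

```python
def link_discrepancy(tx_octets, rx_octets):
    """Given per-interval byte counts from the two ends of one link
    (transmit side and receive side), return the relative discrepancy
    in each interval. Persistently large values indicate timing jitter,
    counter problems, or lost polls rather than real traffic behavior."""
    return [abs(tx - rx) / tx if tx else 0.0
            for tx, rx in zip(tx_octets, rx_octets)]
```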
Overview of Capabilities
Cisco routers and switches contain SNMP agents that can respond to standard SNMP get and set operations. That is, a management station can ask the Cisco device for information via an SNMP get, or tell the device to change a setting or take an action via a set operation. The device can also spontaneously originate traps or SNMPv2c inform notifications.