Sei sulla pagina 1di 8

SDN Contribution

REGEX TOY: TESTING REGULAR EXPRESSIONS IN ABAP

Summary
Regular expressions are a powerful tool for processing text-based information effectively and efficiently. The Regex Toy is a small, interactive tool aimed at ABAP developers who want to test their regular expressions quickly. It improves upon similar tools available on the Web by reflecting specifically the semantics of regular expressions in ABAP. Created on: 4 May 2006

Author Bio
Ralph Benzinger joined SAP in 2003. He is currently a developer in the ABAP language core development group at Walldorf, where he is working on the ABAP compiler and run-time environment. Ralph received a Ph.D. in Computer Science from Cornell University in the United States. Prior to joining SAP, he worked as management consultant in the Business Technology Office of McKinsey & Company.

2006 SAP AG

Table of Contents
REGEX TOY: TESTING REGULAR EXPRESSIONS IN ABAP ..................................................... 1 Testing Regular Expressions in ABAP ............................................................................................ 2 The Regex Toy ................................................................................................................................ 3 Summary.......................................................................................................................................... 6 Related Content............................................................................................................................... 7 Copyright.......................................................................................................................................... 8

Testing Regular Expressions in ABAP


Since the introduction of regular expressions (or regexes, for short) into ABAP with NetWeaver Release 2004s, their adoption is steadily picking up as more and more developers discover their power and usefulness in processing text-based data. Information validation, extraction, and transformation often become a simple matter of a carefully crafted regex that would otherwise require many lines of ABAP code. But the many benefits of regular expressions are not my concern here. Instead, I would like to dwell upon the difficulty in writing correct expressions that all users of regexes can testify to. The complexity of some of the more advanced features such as submatching, combined with the relative illegibility of regexes, has more than once lead to unexpected results that kept me scratching my head. In order to track down why my regex does or doesn't match a given text when it really should or shouldn't I either meditate over the regex pattern until inspiration hits me, or I fiddle around with both pattern and text by trial-and-error until I figure out what went wrong. Generally, I choose the latter approach and even wrote my own tool to assist me. There are, in fact, many such regex matching tools freely available on the Web, the most popular ones being The Regex Coach, Visual Regexp, and The Regulator (currently MIA). While all of these tools offer great functionality, they also uniformly suffer from one major defect they do not adequately reflect the way regexes work in ABAP. There are basically two different "schools" when it comes to regular expressions: Perl and POSIX. Both flavors differ significantly in how matches are computed: whereas POSIX returns the leftmost-longest match, Perl computes what I call the leftmost-first match. For example, the statement FIND REGEX '(a+|[ab]+)b' IN 'xxxaaabaabbaxxx' MATCH OFFSET off MATCH LENGTH len. will find the underlined sequence xxxaaabaabbaxxx using POSIX-style matching, but would find the underlined sequence xxxaaabaabbaxxx instead when using Perl-style pattern matching. Simply speaking, POSIX uses backtracking to search exhaustively for the longest match for a given regex, whereas Perl stops as soon as a match is found. The crux is that ABAP uses POSIX-style regexes, while most other tools including the ones mentioned above use Perl-style regexes. Thus, using these tools for ABAP development can be extremely misleading.

2006 SAP AG

The Regex Toy


Fortunately, there is no need to worry here NetWeaver brings along its own regex tool called The Regex Toy, although you may need to upgrade your 2004s installation to a more recent support package (SP7, I believe) in order to use it. To invoke the Regex Toy, follow the link in the ABAP online documentation on regular expressions, or simply start report DEMO_REGEX_TOY. You'll be presented with the following screen:

Really straightforward, isn't it? You enter your regex, some text and an optional replacement string at the top, select among a few options, and voila your matches (or replacements) will be highlighted in red. The bottom part will also show any submatches that your very first match captured with parentheses. Let's look at some examples. Suppose we want to redact all bold text from an HTML document, like this: REPLACE ALL OCCURRENCES OF REGEX '<b>.*</b>' IN htmltext WITH 'X'. Playing around with the Regex Toy quickly shows that this pattern will not do:

2006 SAP AG

We see that the .* after the <b> tag matches greedily anything up to the very last </b> tag clearly, this is not what we had in mind. Fixing this is slightly involved, but regex <b>([^<]|<(?!/b>))*</b> will do:

2006 SAP AG

As an example for extracting information with submatches, suppose that we want to write a regex that separates a file path such as /foo/bar/quux.txt into the directory part /foo/bar, the base name quux, and the extension txt. Our first try would be to match anything up to / as the directory, then anything up to . as the base name, and the remainder as the extension: FIND REGEX '(.*)/(.*)\.(.*)' IN text SUBMATCHES dir base ext. We then use the Regex Toy to test this expression against a number of sample paths. Eventually, we will discover that our regex breaks for paths that have a dot inside the directory path but no extension:

2006 SAP AG

As we can see, everything after the dot is put erroneously into the third submatch, i.e., the extension. This flaw (along with others) is fixed with the improved regex (.*/)?([^.]*)(\..*)?. Readers interested in the rationale for these two mildly complex regexes should keep an open eye for further articles on regular expressions on SDN.

Summary
Of course the Regex Toy is mostly that a toy rather than a tool. It does, however, all that I needed at the time when I wrote it. Let me know of any killer features that you would like to see included in a future update. And I'm positively soliciting creative readers to send me a funny cat limerick to replace my crude poetry!

2006 SAP AG

Related Content
1. ABAP Online Documentation (index search for "regex") 2. Eddy De Clercq: "Express Yourself Regularly", SDN 3. Wikipedia article on Regular Expressions

2006 SAP AG

Copyright
Copyright 2006 SAP AG. All rights reserved. No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG. The information contained herein may be changed without prior notice. Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors. Microsoft, Windows, Outlook, and PowerPoint are registered trademarks of Microsoft Corporation. IBM, DB2, DB2 Universal Database, OS/2, Parallel Sysplex, MVS/ESA, AIX, S/390, AS/400, OS/390, OS/400, iSeries, pSeries, xSeries, zSeries, z/OS, AFP, Intelligent Miner, WebSphere, Netfinity, Tivoli, and Informix are trademarks or registered trademarks of IBM Corporation in the United States and/or other countries. Oracle is a registered trademark of Oracle Corporation. UNIX, X/Open, OSF/1, and Motif are registered trademarks of the Open Group. Citrix, ICA, Program Neighborhood, MetaFrame, WinFrame, VideoFrame, and MultiWin are trademarks or registered trademarks of Citrix Systems, Inc. HTML, XML, XHTML and W3C are trademarks or registered trademarks of W3C, World Wide Web Consortium, Massachusetts Institute of Technology. Java is a registered trademark of Sun Microsystems, Inc. JavaScript is a registered trademark of Sun Microsystems, Inc., used under license for technology invented and implemented by Netscape. MaxDB is a trademark of MySQL AB, Sweden. SAP, R/3, mySAP, mySAP.com, xApps, xApp, SAP NetWeaver, and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and in several other countries all over the world. All other product and service names mentioned are the trademarks of their respective companies. Data contained in this document serves information purposes only. National product specifications may vary. These materials are subject to change without notice. These materials are provided by SAP AG and its affiliated companies ("SAP Group") for informational purposes only, without representation or warranty of any kind, and SAP Group shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP Group products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty. These materials are provided as is without a warranty of any kind, either express or implied, including but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. SAP shall not be liable for damages of any kind including without limitation direct, special, indirect, or consequential damages that may result from the use of these materials. SAP does not warrant the accuracy or completeness of the information, text, graphics, links or other items contained within these materials. SAP has no control over the information that you may access through the use of hot links contained in these materials and does not endorse your use of third party web pages nor provide any warranty whatsoever relating to third party web pages. Any software coding and/or code lines / strings (Code) included in this documentation are only examples and are not intended to be used in a productive system environment. The Code is only intended better explain and visualize the syntax and phrasing rules of certain coding. SAP does not warrant the correctness and completeness of the Code given herein, and SAP shall not be liable for errors or damages caused by the usage of the Code, except if such damages were caused by SAP intentionally or grossly negligent.

2006 SAP AG

Potrebbero piacerti anche