Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Published: November 18, 2009 , Updated: January 23, 2010 , Author: mkyong
Regular expression is an art of the programing, its hard to debug , learn and understand, but the powerful features are still attract many developers to code regular expression. Lets explore the following 10 practical regular expression ~ enjoy :)
7. Time Format Regular Expression Pattern Time in 12-Hour Format Regular Expression Pattern
(1[012]|[1-9]):[0-5][0-9](\\s)?(?i)(am|pm) ( 1[012] | [1-9] ) : [0-5][0-9] (\\s)? (?i) (am|pm) #start of group #1 # start with 10, 11, 12 # or # start with 1,2,...9 #end of group #1 # follow by a semi colon (:) # follow by 0..5 and 0..9, which means 00 to 59 # follow by a white space (optional) # next checking is case insensitive # follow by am or pm
10. HTML links Regular Expression Pattern HTML A tag Regular Expression Pattern
(?i)<a([^>]+)>(.+?)</a> ( #start of group #1
# all checking are case insensive #end of group #1 #start with "<a" # start of group #2 # anything except (">"), at least one character # end of group #2 # follow by ">" # match anything # end with "</a>
Here is a simple Java Link extractor to extract the href value from 1st pattern, and use 2nd pattern to extract the link from 1st pattern value. Of course with some logic as below.
"\\s*(?i)href\\s*=\\s*(\"([^\"]*\")|'[^']*'|([^'\">\\s]+))"; public HTMLLinkExtrator(){ patternTag = Pattern.compile(HTML_A_TAG_PATTERN); patternLink = Pattern.compile(HTML_A_HREF_TAG_PATTERN); } /** * Validate html with regular expression * @param html html content for validation * @return Vector links and link text */ public Vector<HtmlLink> grabHTMLLinks(final String html){ Vector<HtmlLink> result = new Vector<HtmlLink>(); matcherTag = patternTag.matcher(html); while(matcherTag.find()){ String href = matcherTag.group(1); //href String linkText = matcherTag.group(2); //link text matcherLink = patternLink.matcher(href); while(matcherLink.find()){ String link = matcherLink.group(1); //link result.add(new HtmlLink(link, linkText)); } } return result; } class HtmlLink { String link; String linkText; HtmlLink(String link, String linkText){ this.link = link; this.linkText = linkText; } @Override public String toString() { return new StringBuffer("Link : ") .append(this.link) .append(" Link Text : ") .append(this.linkText).toString(); } } }
package com.mkyong.regex; import java.util.Vector; import org.testng.Assert; import org.testng.annotations.*; import com.mkyong.regex.HTMLLinkExtrator.HtmlLink; /** * HTML link extrator Testing * @author mkyong * */ public class HTMLLinkExtratorTest { private HTMLLinkExtrator htmlLinkExtrator; @BeforeClass public void initData(){ htmlLinkExtrator = new HTMLLinkExtrator(); } @DataProvider public Object[][] HTMLContentProvider() { return new Object[][]{ new Object[] {"abc hahaha <a href='http://www.google.com'>google</a>"}, new Object[] {"abc hahaha <a HREF='http://www.google.com'>google</a>"}, new Object[] {"abc hahaha <A HREF='http://www.google.com'>google</A> , " + "abc hahaha <A HREF='http://www.google.com' target='_blank'>google</A>"}, new Object[] {"abc hahaha <A HREF='http://www.google.com' target='_blank'>google</A>"}, new Object[] {"abc hahaha <A target='_blank' HREF='http://www.google.com'>google</A>"}, new Object[] {"abc hahaha <a HREF=http://www.google.com>google</a>"}, }; } @Test(dataProvider = "HTMLContentProvider") public void ValidHTMLLinkTest(String html) { Vector<HtmlLink> links = htmlLinkExtrator.grabHTMLLinks(html); Assert.assertTrue(links.size()!=0); for(int i=0; i<links.size() ; i++){ HtmlLink htmlLinks = links.get(i); System.out.println(htmlLinks); } } }
Link : 'http://www.google.com' Link Text : google Link : 'http://www.google.com' Link Text : google Link : 'http://www.google.com' Link Text : google Link : 'http://www.google.com' Link Text : google Link : http://www.google.com Link Text : google PASSED: ValidHTMLLinkTest("abc hahaha <a href='http://www.google.com'>google</a>") PASSED: ValidHTMLLinkTest("abc hahaha <a HREF='http://www.google.com'>google</a>") PASSED: ValidHTMLLinkTest("abc hahaha <A HREF='http://www.google.com'>google</A> , abc hahaha <A HREF='http://www.google.com' target='_blank'>google</A>") PASSED: ValidHTMLLinkTest("abc hahaha <A HREF='http://www.google.com' target='_blank'>google</A>") PASSED: ValidHTMLLinkTest("abc hahaha <A target='_blank' HREF='http://www.google.com'>google</A>") PASSED: ValidHTMLLinkTest("abc hahaha <a HREF=http://www.google.com>google</a>") =============================================== com.mkyong.regex.HTMLLinkExtratorTest Tests run: 6, Failures: 0, Skips: 0 =============================================== =============================================== mkyong Total tests run: 6, Failures: 0, Skips: 0 ===============================================