Question about a regular-expression error
Time: 2008-06-07 05:35:04
Source: compiled from a forum  Author:  Editor: chinaitzhe
- HTML code
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
    <base href="http://localhost:80/myjsp/">
    <title>My JSP 'index.jsp' starting page</title>
    <meta http-equiv="pragma" content="no-cache">
    <meta http-equiv="cache-control" content="no-cache">
    <meta http-equiv="expires" content="0">
    <meta http-equiv="keywords" content="keyword1,keyword2,keyword3">
    <meta http-equiv="description" content="This is my page">
    <!--
    <link rel="stylesheet" type="text/css" href="styles.css">
    -->
  </head>
  <body>
    <input type="hidden" value="7">
    This is my JSP page. 哈哈<br>
    for Index,index.<br>
    <a href="index.jsp?currentPage=2">1</a>
  </body>
</html>
- Java code
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import com.heaton.bot.HTTPSocket;

public class Experiment {
    public static void main(String args[]) {
        try {
            HTTPSocket http = new HTTPSocket();
            http.send(args[0], null);
            System.out.println(http.getBody());
            String output = getTxtWithoutHTMLElement(http.getBody());
            System.out.println(output);
        } catch (Exception e) {
        }
    }

    public static String getTxtWithoutHTMLElement(String original) {
        if (original == null || "".equals(original.trim())) {
            return original;
        }
        Pattern pattern = Pattern.compile("<.*>", Pattern.DOTALL);
        Matcher matcher = pattern.matcher(original);
        StringBuffer strbuffer = new StringBuffer();
        while (matcher.find()) {
            matcher.appendReplacement(strbuffer, "");
        }
        matcher.appendTail(strbuffer);
        return strbuffer.toString();
    }
}
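Reply: The problem is in the regular expression.
<.*>
matches everything between angle brackets, and .* is greedy. The first character of the whole HTML document is < and the last one is >, so together they form a single <.*> match,
which means your entire document gets matched.
You can change it to
<[^>]*>
which means: only text that sits inside < and > and contains no further > counts as a match.
That also avoids the problem of nested <>.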
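To see the difference described above, here is a minimal sketch that runs both patterns over a short fragment modelled on the page in the question. The class name, method name, and sample string are made up for illustration and are not from the original post.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
final class GreedyVsNegatedClass {
    // Greedy: with DOTALL, ".*" runs on to the LAST '>' in the input.
    static final Pattern GREEDY = Pattern.compile("<.*>", Pattern.DOTALL);
    // Negated class: each match ends at the FIRST '>' after its '<'.
    static final Pattern PER_TAG = Pattern.compile("<[^>]*>");

    // Removes every match of the given pattern from the input.
    static String strip(Pattern pattern, String html) {
        return pattern.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<body>This is my JSP page.<br></body>";
        System.out.println("[" + strip(GREEDY, html) + "]");  // prints "[]"
        System.out.println("[" + strip(PER_TAG, html) + "]"); // prints "[This is my JSP page.]"
    }
}
```

The greedy pattern swallows the whole string in one match, while the negated character class removes the three tags one by one and leaves the text between them.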
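Also, the StringBuffer handling and the while loop at the end can be replaced with matcher.replaceAll; there is no need to write such a cumbersome loop.
I didn't read the logic of that part carefully, so I can't say whether it has any problems. My version is: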
public static String getTxtWithoutHTMLElement(String original)
{
    if (original == null || "".equals(original.trim()))
    {
        return original;
    }
    Pattern pattern = Pattern.compile("<[^>]*>", Pattern.DOTALL);
    Matcher matcher = pattern.matcher(original);
    return matcher.replaceAll("");
    // return strbuffer.toString();  // leftover from the original loop; should be deleted
}
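Reply:
Er, I forgot to delete the final return strbuffer.toString(); I trust the original poster can work that out :P
Also, the Pattern.DOTALL flag can be dropped: the new pattern no longer contains a ., so the flag has no effect.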
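A quick way to check the point about Pattern.DOTALL: the flag only changes what . matches, and a negated class such as [^>] already matches line terminators. The sketch below uses a hypothetical helper and a made-up tag that is split across two lines.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample input are illustrative only.
final class DotallDemo {
    // Strips tags with the negated-class pattern, compiled with the given flags.
    static String strip(String html, int flags) {
        return Pattern.compile("<[^>]*>", flags).matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // The opening tag contains a newline between "a" and "href".
        String html = "<a\n href=\"index.jsp?currentPage=2\">1</a>";
        String withoutFlag = strip(html, 0);
        String withFlag = strip(html, Pattern.DOTALL);
        System.out.println(withoutFlag);                  // prints "1"
        System.out.println(withoutFlag.equals(withFlag)); // prints "true"
    }
}
```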
Reply: I understand now, thanks. But if you want to remove CSS elements, you still need <style.*?</style>
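The reason a separate pattern is needed for CSS: the tag-only pattern removes the <style> and </style> tags but leaves the rules between them behind as text. Below is a minimal sketch, assuming style blocks are removed first with the reluctant .*? and only then the remaining tags. The class name, the CASE_INSENSITIVE flag, and the sample input are assumptions added for illustration. Here DOTALL does matter, because the pattern contains a . and style blocks usually span several lines.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names, flags, and sample input are illustrative only.
final class StyleStripDemo {
    // Reluctant ".*?" stops at the FIRST "</style>"; DOTALL lets '.' cross newlines.
    static final Pattern STYLE =
            Pattern.compile("<style.*?</style>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
    static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Removes whole style blocks first, then any remaining tags.
    static String toText(String html) {
        String withoutStyle = STYLE.matcher(html).replaceAll("");
        return TAG.matcher(withoutStyle).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<style type=\"text/css\">\nb { color: red; }\n</style>"
                + "<b>one</b><style>i{}</style><i>two</i>";
        System.out.println(toText(html)); // prints "onetwo"
    }
}
```

With a greedy .* instead of .*?, the first match would run from the first <style to the last </style> and take <b>one</b> with it.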