DetectHtml

A Java static method to detect text that has been marked up with HTML tags or entities.

I needed to detect self-contained HTML tags or entities in user-supplied data to make formatting determinations. After searching the Internet, I found a few examples written as regular expressions. Most of them failed my initial test cases, and they didn't handle conditions such as text that has no tags but does contain HTML entity escape codes.

I continued to refine the regular expression until I came up with a good meta expression that handled:

Start and end tag combinations in single or multi-line text values
Text marked up with self-closing tags such as <br/> or <hr/>
Text marked up with HTML entity escape sequences like &lt; or &frac12;

I also wanted to make sure that it didn't match common text that could be mistaken for HTML:

Logic expressions such as "If A<B then B>A"
Ampersand usage such as AT&T, D&B, etc.
Malformed or partial HTML such as </body></html>
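
For illustration, here is a minimal sketch of how a combined expression covering the cases above could be assembled. It is not the expression shipped in this repository: the class name HtmlDetectSketch, the method looksLikeHtml, and every sub-pattern are assumptions made for this example. It combines three alternatives (a start tag with a matching end tag, a self-closing tag, and an entity) and uses DOTALL so that a tag pair may span several lines.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; NOT the DetectHtml class from this repository.
final class HtmlDetectSketch {

    // Zero or more attributes: name, optionally followed by ="v", ='v' or =bare.
    private static final String ATTRS =
        "(?:\\s+[\\w:-]+(?:\\s*=\\s*(?:\"[^\"]*\"|'[^']*'|[^\\s\"'>]+))?)*\\s*";

    // <name ...> anything </name>; the back-reference \1 requires the
    // end tag to carry the same name as the start tag.
    private static final String TAG_PAIR = "<(\\w+)" + ATTRS + ">.*?</\\1\\s*>";

    // <name .../>
    private static final String SELF_CLOSING = "<\\w+" + ATTRS + "/>";

    // &name; or &#123; or &#x1F;
    private static final String ENTITY =
        "&(?:[a-zA-Z][a-zA-Z0-9]+|#[0-9]+|#[xX][0-9a-fA-F]+);";

    // DOTALL lets ".*?" cross line breaks in multi-line values.
    private static final Pattern HTML = Pattern.compile(
        TAG_PAIR + "|" + SELF_CLOSING + "|" + ENTITY, Pattern.DOTALL);

    static boolean looksLikeHtml(String s) {
        return s != null && HTML.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeHtml("<a href=\"http://www.example.com/\">\nclick here\n</a>")); // true
        System.out.println(looksLikeHtml("line one<br/>line two")); // true
        System.out.println(looksLikeHtml("1 &lt; 2"));              // true
        System.out.println(looksLikeHtml("If A<B then B>A"));       // false
        System.out.println(looksLikeHtml("AT&T, D&B"));             // false
        System.out.println(looksLikeHtml("</body></html>"));        // false
    }
}
```

In this sketch, "If A<B then B>A" is rejected because "<B then B>" is never followed by a matching "</B>", and "AT&T" is rejected because the ampersand is not followed by a name ending in a semicolon.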

Sample Usage

    String htmlContent="<a href=\"http://www.example.com/\">\nclick here\n</a>";
    if (DetectHtml.isHtml(htmlContent))
      System.out.println("htmlContent is HTML");
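
As an example of the kind of formatting determination the detector can drive, the sketch below passes text through unchanged when it is already HTML and otherwise escapes it and turns line breaks into <br/>. The class FormatSketch and its methods are hypothetical and not part of this project. The detector is passed in as a Predicate so the example stands alone, and passing DetectHtml::isHtml assumes that method takes a String and returns a boolean, as the usage above suggests.

```java
import java.util.function.Predicate;

// Hypothetical helper for illustration; not part of DetectHtml.
final class FormatSketch {

    // Escapes the characters that are significant in HTML text content.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    // Returns text suitable for embedding in a page. The isHtml argument is
    // the detector, for example DetectHtml::isHtml.
    static String toDisplayHtml(String text, Predicate<String> isHtml) {
        if (text == null) {
            return "";
        }
        if (isHtml.test(text)) {
            return text; // already markup; it still needs to be sanitized
        }
        return escape(text).replace("\n", "<br/>");
    }
}
```

A call would look like FormatSketch.toDisplayHtml(userText, DetectHtml::isHtml). Text that is detected as HTML is returned as-is, so sanitizing it remains a separate step.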

Please Note:

This method does not check user-provided HTML for safety in any way. You still need to sanitize your HTML, and I recommend OWASP for that.

No dependencies are required. Just copy the class into your project, refactor the package name to suit, and you're done.

--Dave
