TagSoup

Open source SAX-compliant HTML parser written in Java

TagSoup Ranking & Summary


  • License: Freeware (Apache License, Version 2.0)
  • Price: FREE
  • Publisher Name: TagSoup Team
  • Publisher web site: http://home.ccil.org/~cowan/XML/tagsoup/
  • Operating Systems: Mac OS X
  • File Size: 87 KB


TagSoup Description

TagSoup is an open source, SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. By providing a SAX interface, it allows standard XML tools to be applied to even the worst HTML. TagSoup also includes a command-line processor that reads HTML files and can generate either clean HTML or well-formed XML that is a close approximation to XHTML.

TagSoup is designed as a parser, not a whole application; it isn't intended to permanently clean up bad HTML, as HTML Tidy does, only to parse it on the fly. Therefore, it does not convert presentation HTML to CSS or anything similar. TagSoup does guarantee well-structured results: tags will wind up properly nested, default attributes will appear appropriately, and so on.

NOTE: TagSoup is licensed and distributed under the terms of the Apache License, Version 2.0.
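
To illustrate what a SAX interface makes possible, here is a minimal sketch of a handler that collects link targets from any SAX XMLReader. The names LinkCollector and collect are hypothetical and chosen only for this example. So that the sketch runs without extra libraries, it uses the JDK's built-in parser as a stand-in, which accepts well-formed markup only. With the TagSoup jar on the classpath, an instance of org.ccil.cowan.tagsoup.Parser (which implements XMLReader) could presumably be passed to collect instead, so the same handler would see messy HTML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: gathers href values from <a> elements.
public class LinkCollector extends DefaultHandler {
    private final List<String> links = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Match on the qualified name so the handler works whether or not
        // the reader is namespace-aware.
        if ("a".equalsIgnoreCase(qName)) {
            String href = atts.getValue("href");
            if (href != null) {
                links.add(href);
            }
        }
    }

    public List<String> getLinks() {
        return links;
    }

    // Runs any SAX XMLReader over the markup and returns the collected links.
    public static List<String> collect(XMLReader reader, String markup) throws Exception {
        LinkCollector handler = new LinkCollector();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(markup)));
        return handler.getLinks();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in reader from the JDK; assumption: swap in a TagSoup
        // parser instance here to handle HTML that is not well-formed.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        String markup = "<html><body><a href=\"one.html\">1</a>"
                + "<p><a href=\"two.html\">2</a></p></body></html>";
        System.out.println(collect(reader, markup)); // prints [one.html, two.html]
    }
}
```

Because the handler depends only on the standard org.xml.sax interfaces, it does not need to know which parser feeds it events; that is the design point behind exposing HTML through SAX.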

