Download Jericho HTML Parser for Windows 3.2

Jericho HTML Parser for Windows 3.2
File size: 2.06 MB

General Info

Hits: 250 visitors

Publisher: Martin Jericho

OS Support: Windows All

License: Freeware

Date added: 21 Mar 2011

Last Update: 21 Mar 2011

Downloads: 113


Publisher's description

Jericho HTML Parser is a Java library for analysing and manipulating parts of an HTML document, including server-side tags, while reproducing verbatim any unrecognised or invalid HTML. It also provides high-level HTML form manipulation functions.

It is an open source library released under both the Eclipse Public License (EPL) and GNU Lesser General Public License (LGPL). You are therefore free to use it in commercial applications subject to the terms detailed in either one of these licence documents.

The javadocs provide comprehensive documentation of the entire API, as well as being a very useful reference on aspects of HTML and XML in general.

Features:

The library distinguishes itself from other HTML parsers with the following major features:

* The presence of badly formatted HTML does not interfere with the parsing of the rest of the document, which makes the library ideal for use with "real-world" HTML that chokes other parsers.
* ASP, JSP, PSP, PHP and Mason server tags are explicitly recognised by the parser. This means that normal HTML tags are still parsed properly even if they contain server tags, which is common, for example, when dynamically setting element attributes.
* A new stream-based parsing option using the StreamedSource class allows memory-efficient processing of large files using an event iterator. This is essentially a StAX alternative with the ability to process HTML and non-validating XML, as well as several other features not available in other streaming parsers.
* In its standard form it is neither an event-based nor a tree-based parser, but rather uses a combination of simple text search, efficient tag recognition and a tag position cache. The text of the whole source document is first loaded into memory, and then only the relevant segments are searched for the relevant characters of each search operation.
* Compared to a tree-based parser such as DOM, the memory and resource requirements can be far lower if only small sections of the document need to be parsed or modified. Incorrect or badly formatted HTML can easily be ignored, unlike tree-based parsers, which must identify every node in the document from top to bottom.
* Compared to an event based parser such as SAX, the interface is on a much higher level and more intuitive, and a tree representation of the document element hierarchy is easily created if required.
* The begin and end positions in the source document of all parsed segments are accessible, allowing modification of only selected segments of the document without having to reconstruct the entire document from a tree.
* The row and column number of each position in the source document are easily accessible.
* Provides a simple but comprehensive interface for the analysis and manipulation of HTML form controls, including the extraction and population of initial values, and conversion to read-only or data display modes. Analysis of the form controls also allows data received from the form to be stored and presented in an appropriate manner.
* Custom tag types can be easily defined and registered for recognition by the parser.
* Built-in functionality to extract all text from HTML markup, suitable for feeding into a text search engine such as Apache Lucene.
* Built-in functionality to render HTML markup with simple text formatting.
* Built-in functionality to format HTML source code, indenting elements according to their depth in the document element hierarchy.
* Built-in functionality to compact HTML source code by removing all unnecessary white space.
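
The points above about begin and end positions, and about row and column numbers, can be illustrated with a small sketch in plain Java. It does not use Jericho itself: the class `SegmentEditSketch` and its methods `apply` and `rowColumn` are hypothetical names made up for this illustration. It only shows the general idea of replacing a character range in the source text while copying everything else verbatim (including invalid markup), and of deriving a row and column from a character offset. It assumes lines are separated by `\n` only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustration only; this is NOT part of the Jericho API. */
class SegmentEditSketch {

    /** A replacement for the half-open character range [begin, end) of the source. */
    static final class Edit {
        final int begin;
        final int end;
        final String replacement;

        Edit(int begin, int end, String replacement) {
            this.begin = begin;
            this.end = end;
            this.replacement = replacement;
        }
    }

    /** Applies non-overlapping edits; all text outside the edited ranges is copied unchanged. */
    static String apply(String source, List<Edit> edits) {
        List<Edit> sorted = new ArrayList<>(edits);
        sorted.sort(Comparator.comparingInt((Edit e) -> e.begin));
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (Edit e : sorted) {
            if (e.begin < cursor || e.end < e.begin || e.end > source.length()) {
                throw new IllegalArgumentException("overlapping or out-of-range edit");
            }
            out.append(source, cursor, e.begin); // untouched text before the edit
            out.append(e.replacement);
            cursor = e.end;
        }
        out.append(source, cursor, source.length()); // untouched tail
        return out.toString();
    }

    /** Returns {row, column}, both 1-based, for a character offset (assumes '\n' line breaks). */
    static int[] rowColumn(String source, int pos) {
        int row = 1;
        int lineStart = 0;
        for (int i = 0; i < pos; i++) {
            if (source.charAt(i) == '\n') {
                row++;
                lineStart = i + 1;
            }
        }
        return new int[] { row, pos - lineStart + 1 };
    }

    public static void main(String[] args) {
        // The <b> element is never closed; the sketch leaves it exactly as written.
        String html = "<p>Hello <b>world</p>\n<a href=\"old.html\">link</a>";
        String target = "old.html";
        int begin = html.indexOf(target);
        Edit edit = new Edit(begin, begin + target.length(), "new.html");
        System.out.println(apply(html, Arrays.asList(edit)));
        int[] rc = rowColumn(html, begin);
        System.out.println("row " + rc[0] + ", column " + rc[1]);
    }
}
```

Because only the targeted range is rewritten, the unclosed `<b>` in the sample input survives untouched, which is the behaviour the description attributes to position-based modification as opposed to rebuilding the document from a tree.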


Available Translations: None

Version History

Version 3.2 added on: 21 Mar 2011



Related software downloads:

JAutodoc 1.8.0
JAutodoc is an Eclipse Plugin for automatically adding Javadoc and file headers to your source code.
16 Aug 2010 - Freeware | 225 Downloads

Data Access Components for MySQL 6.10
MyDAC is an enhanced VCL/VCL.NET/CLX library for fast direct access to MySQL from Delphi, C++Builder, and Kylix. Includes full support for all MySQL d
05 May 2011 - Shareware | 116 Downloads

Oracle Data Access Components 7.20
ODAC is a VCL/VCL.NET/CLX component library for fast direct access to Oracle from Delphi, C++Builder, and Kylix. Includes comprehensive support for Or
05 May 2011 - Shareware | 116 Downloads

DTK ANPR SDK 1.2.54
A powerful developer library for vehicle license plate recognition (LPR).
22 Sep 2010 - Shareware | 81 Downloads

iReport 3.7.5
iReport is the most popular visual reporting tool for JasperReports (Java reporting library) and JasperServer (reporting server)
29 Sep 2010 - Freeware | 109 Downloads

SQL Server Data Access Components 5.10
SDAC is a VCL/VCL.NET/CLX component library for fast direct access to SQL Server from Delphi and C++Builder. Supports all SQL Server data types and fu
05 May 2011 - Shareware | 86 Downloads


Download Time

56K: 5m 1s
64K: 4m 24s
128K: 2m 12s
768K: 18s
1.44M: 12s



Copyright (c) 2006-2009 Findsoft. All rights reserved.