Extensible Link Checker Library

Version 1.0
(C) 2002 Thomas Weinbrenner
API documentation

This software is hosted on SourceForge.

HTML Link Checking

This is an open source checker for HTML links, written completely in Java.

Functionality

This link checker works like this:

What's the benefit of using a library?

In my experience, complete link checking sometimes requires a closer look at the HTML tags. I used this library on an intranet built with frames: the left frame held the navigation, the right frame the content. Some links had to change both the navigation and the content. We used a JavaScript function for that, as in
   <a href="javascript:changeNavigationAndContent ('thisIsAUrl.html')">Link</a>
I wrote a website-specific class that parsed the URL out of the JavaScript code and added it to the link checker's list of URLs still to visit. In addition, I derived the URL of the navigation page from that URL and added it as well.
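
The URL extraction described above could look roughly like the following. This is only a minimal sketch: the class name JavascriptLinkExtractor and the method extractUrl are hypothetical and not part of the library, and the regular expression assumes that the function is always called with a single quoted string argument. The calculation of the navigation URL is left out because its rule is site-specific.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of the link checker library. */
final class JavascriptLinkExtractor {

    // Matches e.g.  javascript:changeNavigationAndContent ('thisIsAUrl.html')
    // Group 1 is the quote character, group 2 the URL between the quotes.
    private static final Pattern CALL = Pattern.compile(
        "^\\s*(?i:javascript):\\s*changeNavigationAndContent\\s*"
        + "\\(\\s*(['\"])(.*?)\\1\\s*\\)\\s*;?\\s*$");

    /** Returns the URL argument of the call, or empty if href is a different kind of link. */
    static Optional<String> extractUrl(String href) {
        if (href == null) {
            return Optional.empty();
        }
        Matcher m = CALL.matcher(href);
        return m.matches() ? Optional.of(m.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        String href = "javascript:changeNavigationAndContent ('thisIsAUrl.html')";
        // Prints: thisIsAUrl.html
        System.out.println(extractUrl(href).orElse("(no URL found)"));
    }
}
```

The extracted URL is relative, so it would still have to be resolved against the URL of the page that contains the link before it is added to the list of URLs to visit.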

What you must do to get it working

Java Development Environment

You need a Java development environment, because this is a library, not a complete program.

Web Server

You need a web server to run the link checker, because the HTML is retrieved over HTTP. The advantage is that you can even link-check server pages such as JSP or ASP files without a JSP or ASP code parser.

If you don't have an HTTP server, you can use, for example, Tomcat or Apache from the Apache Software Foundation.

Demo application

It's best to start with the provided demo application. It checks the JDK documentation, which a Java developer usually has.

Class path

Include the libraries
    linkchecker.jar 
    antlr.jar
in your classpath.

How to configure

Website-specific information

Every website has its own configuration. This includes the start URL(s) and the location of the files in the file system. The link checker also needs the mapping between file names and URLs.

All this information is encapsulated in the interface WebSiteInfo. There is a default implementation in the class DefaultWebSiteInfo, which implements a simple one-to-one mapping between URLs and file names.
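
To illustrate what a one-to-one mapping between URLs and file names means, here is a small sketch. It does not show the actual WebSiteInfo interface or DefaultWebSiteInfo; the class OneToOneMapping, its methods, and the example base URL and directory are assumptions made for this example only. URL decoding is deliberately omitted.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical illustration of a one-to-one URL/file mapping; not the library's DefaultWebSiteInfo. */
final class OneToOneMapping {

    private final String baseUrl; // always ends with "/"
    private final Path rootDir;   // directory that the web server serves under baseUrl

    OneToOneMapping(String baseUrl, Path rootDir) {
        this.baseUrl = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        this.rootDir = rootDir.normalize();
    }

    /** Maps a URL below baseUrl to the file that the server would deliver for it. */
    Path toFile(String url) {
        if (!url.startsWith(baseUrl)) {
            throw new IllegalArgumentException("URL is outside this website: " + url);
        }
        String relative = url.substring(baseUrl.length());
        // Query string and fragment do not belong to the file name.
        int cut = relative.length();
        int query = relative.indexOf('?');
        if (query >= 0) cut = Math.min(cut, query);
        int fragment = relative.indexOf('#');
        if (fragment >= 0) cut = Math.min(cut, fragment);
        Path file = rootDir.resolve(relative.substring(0, cut)).normalize();
        if (!file.startsWith(rootDir)) {
            throw new IllegalArgumentException("URL escapes the root directory: " + url);
        }
        return file;
    }

    /** Maps a file below rootDir back to its URL, always using forward slashes. */
    String toUrl(Path file) {
        Path normalized = file.normalize();
        if (!normalized.startsWith(rootDir)) {
            throw new IllegalArgumentException("File is outside the root directory: " + file);
        }
        Path relative = rootDir.relativize(normalized);
        StringBuilder url = new StringBuilder(baseUrl);
        for (int i = 0; i < relative.getNameCount(); i++) {
            if (i > 0) url.append('/');
            url.append(relative.getName(i));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        OneToOneMapping site = new OneToOneMapping("http://localhost:8080/docs", Paths.get("/srv/docs"));
        System.out.println(site.toFile("http://localhost:8080/docs/api/index.html#top"));
        System.out.println(site.toUrl(Paths.get("/srv/docs/api/index.html")));
    }
}
```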

Reporting

All reporting about links goes through the interface Report. The class HtmlReport implements this interface: it keeps all information in memory and, once the link checking process has finished, produces a report in HTML format.
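
The "collect in memory, render at the end" approach can be sketched as follows. This is not the library's Report interface or its HtmlReport class, whose method signatures are not shown here; the class InMemoryHtmlReport and the methods brokenLink and render are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of an in-memory report; not the library's HtmlReport. */
final class InMemoryHtmlReport {

    /** One reported problem: the page it was found on, the link target, and a description. */
    private static final class Entry {
        final String page;
        final String link;
        final String problem;

        Entry(String page, String link, String problem) {
            this.page = page;
            this.link = link;
            this.problem = problem;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    /** Called while checking is in progress; only stores the finding. */
    void brokenLink(String page, String link, String problem) {
        entries.add(new Entry(page, link, problem));
    }

    /** Called once after checking has finished; builds the whole HTML report. */
    String render() {
        StringBuilder html = new StringBuilder();
        html.append("<html><head><title>Link report</title></head><body>\n");
        html.append("<table border=\"1\">\n");
        html.append("<tr><th>Page</th><th>Link</th><th>Problem</th></tr>\n");
        for (Entry e : entries) {
            html.append("<tr><td>").append(escape(e.page))
                .append("</td><td>").append(escape(e.link))
                .append("</td><td>").append(escape(e.problem))
                .append("</td></tr>\n");
        }
        html.append("</table>\n</body></html>\n");
        return html.toString();
    }

    // Escapes the characters that have a special meaning in HTML text.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        InMemoryHtmlReport report = new InMemoryHtmlReport();
        report.brokenLink("http://localhost/a.html", "http://localhost/b.html", "404 Not Found");
        System.out.print(report.render());
    }
}
```

Keeping everything in memory makes it easy to sort or group the findings before rendering, at the cost of memory use on very large sites.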

Acknowledgements

This link checker would not have been written if I had not been curious about that neat lexer/parser tool called ANTLR. So I wrote an HTML lexer, and its first use was a link checker, which I did use in a production intranet environment to check the links.

See ANTLR by Terence Parr. It's really great.

Licence

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
See the GNU General Public License.