DEADLINKCHECK(1)            12/07/99            DEADLINKCHECK(1)
Dead Link Check                          Dead Link Check 0.4.0

NAME
       deadlinkcheck - Dead Link Check (v0.4.0)

SYNOPSIS
       deadlinkcheck [-help] [-verb | -Verb [-indicator]]
                     [-proxy proxy | -Proxy]
                     [[-timeCache value] | [-noCache]]
                     [-Timeout value[:maxvalue]] [-later [percent]]
                     [-userRedirect] [-Content [rule[:...]]]
                     [[-output [filename]] [-splitOutput]]
                     [-rawOutput | -detOutput] [-codeConversion]
                     [-HTMLoutput] [-Dif] urls | filename

WARNING
       deadlinkcheck is still evolving. The current version is
       stable, but may not be fully functional.

BETA STATUS
       The -Content option (introduced in v0.4.0) is considered
       beta.

DESCRIPTION
       Dead Link Check (DLC) is a Perl script designed to check the
       validity of HTTP references. The script can read and write a
       cache file, so that when the user adds entries to an already
       checked list, the network requests for previously checked
       links do not have to be redone.

       The script reads entries from a file (or a list of links
       given on the command line) and writes the results to one or
       more files (or to STDOUT).

       DLC was created as an extension to Public Bookmark Generator
       (PBM), but can be used on its own.

OPTIONS
   POSSIBLE VALUES
       To obtain the possible option values and their default
       values, run

              deadlinkcheck -help

   OPTIONS
       -help  print a description of all the deadlinkcheck options.

       -verb  run the script in verbose mode, printing additional
              information on STDERR.

       -Verb  run the script in maximum verbose mode, printing
              detailed run information on STDERR.

       -indicator
              show a progress indicator (in percent) at the
              beginning of the verbose output.

- 1 - Formatted: November 14, 2024

       -proxy proxy
              set the http and ftp proxy to proxy.

       -Proxy get the proxy information from the environment
              variables (http_proxy, ftp_proxy).

       -timeCache value
              set the number of days for which cached link results
              are trusted to value. A value of 0 deletes every
              entry in the cache.

       -noCache
              check links without reading information from, or
              writing information to, the cache file.

       -Timeout value[:maxvalue]
              set the network timeout for requests to value. When
              a request is retried, the script may extend the
              timeout up to maxvalue.

       -later [percent]
              some requests to ftp links may fail at a given time.
              This option retries those links later in the
              processing. The optional percent argument forces the
              ftp requests to be retried every percent (or so) of
              the processing.

       -userRedirect
              follow user HTTP-EQUIV redirections. This forces a
              GET request to check whether the page author used a
              META tag to force a redirection.

       -Content [rule[:...]]
              parse the content of the checked web page for
              information indicating that the page is an error
              page (some web servers do not always return proper
              HTTP responses) and/or that the page has moved.
              This option is based on rules, separated by colons
              when several are given. Rules are of two types:
              error or move. Use a type name to select all the
              rules of that type. The help rule prints information
              on all the selected rules.

       -output [filename]
              write the results to filename. If no filename is
              given, the results are saved to the option's default
              value (see -help). Without this option, the results
              are printed to STDOUT.

       -splitOutput
              write the results to different files according to
              the first digit of their HTTP return code.

       -rawOutput
              output only the raw HTTP addresses. This mode is
              recommended only when the output is saved to split
              files.

       -detOutput
              output detailed results.

       -codeConversion
              print a text description instead of the return code
              in detailed output mode.

       -HTMLoutput
              output the results as HTML code (making the links
              easier to follow).

       -Dif   do not print the "DLC information Footer" (added at
              the end of HTML outputs).

RESTRICTIONS
   SUPPORTED HTTP REQUESTS
       Currently, only links starting with file:/, ftp://, and
       http:// are supported.
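
       The restriction above amounts to a prefix test on each link.
       A minimal Python sketch (deadlinkcheck itself is Perl); the
       helper name is_supported is hypothetical, and comparing the
       prefix case-insensitively is an assumption, not documented
       behavior.

```python
# Link prefixes accepted by DLC, as listed under SUPPORTED HTTP REQUESTS.
SUPPORTED_PREFIXES = ("file:/", "ftp://", "http://")


def is_supported(link):
    """True if the link starts with one of the supported prefixes.

    Only the test is case-insensitive; the link itself is not modified.
    """
    return link.lower().startswith(SUPPORTED_PREFIXES)


if __name__ == "__main__":
    for link in ("http://dlc.sourceforge.net/", "https://example.org/"):
        print(link, is_supported(link))
```

       Note that https:// links fall outside the accepted prefixes.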
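
       The -proxy and -Proxy options both boil down to a mapping
       from scheme (http, ftp) to proxy URL. A Python sketch of that
       mapping, assuming a single -proxy value is used for both
       schemes and that unset or empty environment variables mean
       "no proxy"; the helper names are hypothetical. It relies on
       urllib.request.ProxyHandler, where an empty mapping disables
       proxying.

```python
import os
import urllib.request


def proxies_from_environment(environ=None):
    """-Proxy: read the proxies from http_proxy and ftp_proxy.

    Unset or empty variables are skipped, meaning "connect directly"
    for that scheme.
    """
    if environ is None:
        environ = os.environ
    proxies = {}
    for scheme in ("http", "ftp"):
        value = environ.get(f"{scheme}_proxy", "").strip()
        if value:
            proxies[scheme] = value
    return proxies


def proxies_from_option(proxy):
    """-proxy proxy: use the same proxy for both http and ftp requests."""
    return {"http": proxy, "ftp": proxy}


def build_opener(proxies):
    """Opener that routes requests through the given proxies.

    An empty mapping explicitly disables proxying (no proxy option).
    """
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
```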
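
       The -Timeout value[:maxvalue] option only states that the
       timeout may be extended up to maxvalue on retries. A Python
       sketch of one plausible policy, assuming the timeout doubles
       on each retry and is capped at maxvalue; the doubling and the
       helper names parse_timeout and retry_timeouts are
       hypothetical, not taken from deadlinkcheck.

```python
def parse_timeout(spec):
    """Split a -Timeout argument "value[:maxvalue]" into (value, maxvalue).

    Without ":maxvalue" the timeout is never extended, so maxvalue == value.
    """
    value_str, sep, max_str = spec.partition(":")
    value = int(value_str)
    max_value = int(max_str) if sep else value
    if value <= 0 or max_value < value:
        raise ValueError(f"invalid -Timeout specification: {spec!r}")
    return value, max_value


def retry_timeouts(value, max_value, retries):
    """Timeouts used for the first attempt and each retry.

    Assumption: the timeout doubles on every retry, capped at max_value.
    """
    timeouts = []
    current = value
    for _ in range(retries + 1):
        timeouts.append(current)
        current = min(current * 2, max_value)
    return timeouts


if __name__ == "__main__":
    value, max_value = parse_timeout("30:120")
    print(retry_timeouts(value, max_value, retries=3))  # [30, 60, 120, 120]
```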
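
       The -splitOutput option groups the results by the first
       digit of their return code. A Python sketch of that
       grouping, assuming results arrive as (url, code) pairs; the
       base.Nxx file naming used by write_split_raw (one raw
       address per line, as with -rawOutput) is hypothetical. Under
       this rule the DLC-specific codes 397, 398 and 399 land in the
       3xx group and 499 in the 4xx group.

```python
from collections import defaultdict


def split_by_code_class(results):
    """Group (url, code) pairs by the first digit of the return code."""
    groups = defaultdict(list)
    for url, code in results:
        groups[str(code)[0]].append(url)
    return dict(groups)


def write_split_raw(base, results):
    """Write one raw address per line into hypothetical files base.1xx .. base.5xx."""
    for digit, urls in split_by_code_class(results).items():
        with open(f"{base}.{digit}xx", "w") as handle:
            handle.write("\n".join(urls) + "\n")
```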

SPECIAL CONCEPTS
   INPUT LINKS
       Links can be given to the script in two ways: on the command
       line, or in an input file. The input file may be written by
       the user or created by Public Bookmark Generator.

       Each line of the input file contains up to two fields, the
       second one being optional. The fields must be separated by a
       tab character. They are HTTPlink, a fully qualified HTTP
       reference, and HTTPname, the name to print for this link.

       In files created by Public Bookmark Generator, the second
       field holds the fully qualified name of the entry in the
       bookmark list (folders are separated by |). Example:

              |Work Informations|Developpements|Public Bookmark Generator

   READING MAXIMUM VERBOSE OUTPUT
       When the maximum verbose option (-Verb) is used, the
       following information is printed, in this order:

       [ xxx.xx % ]
              a progress indicator (if the -indicator option is
              selected).

       *      the script is retrying an ftp site (if the -later
              option is selected).

       (url)  the url being checked.

       @      the script is making a network connection (thanks to
              the cache, some network connections for already
              checked links can be avoided).

       [return code and action]
              the return code for this request and, optionally,
              the action to be taken.

       Example:

              [ 80.00 % ] * (ftp://ftp.redhat.com/) @ [401 -> Retry later]

       means that the script has already processed 80 % of the
       provided urls and is retrying the ftp site
       ftp://ftp.redhat.com/ through a network connection; the
       request returned 401 (Unauthorized), so the action to be
       taken is to retry later.

   LINKS CACHE
       Dead Link Check uses a link cache file to speed up checking
       the same links across multiple runs. The cache file (stored
       in the directory the script is run from) contains time stamp
       information, so that after a certain time (set with
       -timeCache) the cached links are no longer considered valid.
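
       The input file format described under INPUT LINKS can be read
       with a simple tab split. A Python sketch; falling back to the
       link itself when HTTPname is missing, skipping blank lines,
       and the helper names parse_input_lines and pbm_path are
       assumptions, not documented behavior.

```python
def parse_input_lines(lines):
    """Yield (HTTPlink, HTTPname) pairs from DLC-style input lines.

    Fields are tab-separated and HTTPname is optional. Assumption: a
    missing name falls back to the link itself; blank lines are skipped.
    """
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue
        link, _, name = line.partition("\t")
        link = link.strip()
        yield link, name.strip() or link


def pbm_path(name):
    """Split a Public Bookmark Generator name into its |-separated parts."""
    return [part for part in name.split("|") if part]
```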
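
       The maximum verbose line is regular enough to be parsed
       mechanically, for example to post-process STDERR. A Python
       sketch built only from the fields listed under READING
       MAXIMUM VERBOSE OUTPUT and the single example given there;
       the exact spacing and the "code -> action" layout are
       assumptions generalized from that example, and
       parse_verbose_line is a hypothetical name.

```python
import re

VERBOSE_LINE = re.compile(
    r"^(?:\[\s*(?P<percent>\d+(?:\.\d+)?)\s*%\s*\]\s*)?"       # [ xxx.xx % ] (optional)
    r"(?P<retry>\*\s*)?"                                        # * ftp retry (optional)
    r"\((?P<url>.+?)\)\s*"                                      # (url)
    r"(?P<network>@\s*)?"                                       # @ network connection (optional)
    r"\[(?P<code>\d{3})(?:\s*->\s*(?P<action>[^\]]+))?\]\s*$"   # [code -> action]
)


def parse_verbose_line(line):
    """Return the fields of one maximum verbose line, or None if it does not match."""
    m = VERBOSE_LINE.match(line.strip())
    if m is None:
        return None
    return {
        "percent": float(m["percent"]) if m["percent"] else None,
        "retrying_ftp": m["retry"] is not None,
        "url": m["url"],
        "network": m["network"] is not None,
        "code": int(m["code"]),
        "action": m["action"].strip() if m["action"] else None,
    }
```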
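
       The cache file format is not documented here. The Python
       sketch below assumes a hypothetical in-memory layout (url
       mapped to a time stamp in seconds and a return code) and
       shows how the -timeCache number of days could decide which
       entries are still trusted, including the documented rule
       that 0 empties the cache. Links missing from the pruned cache
       are the ones that need a network connection (the @ marker).

```python
import time

SECONDS_PER_DAY = 24 * 60 * 60


def prune_cache(cache, trusted_days, now=None):
    """Keep only the entries younger than trusted_days.

    cache maps url -> (timestamp_in_seconds, return_code); this layout is
    hypothetical. trusted_days == 0 drops every entry, like -timeCache 0.
    """
    if trusted_days <= 0:
        return {}
    if now is None:
        now = time.time()
    limit = trusted_days * SECONDS_PER_DAY
    return {url: (stamp, code) for url, (stamp, code) in cache.items()
            if now - stamp < limit}


def split_cached(urls, cache):
    """Separate links answered from the cache from links needing the network."""
    cached = {url: cache[url][1] for url in urls if url in cache}
    to_fetch = [url for url in urls if url not in cache]
    return cached, to_fetch
```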

   RETURN CODES
       RFC 2616 defines the following return codes:

       Informational 1xx
              100 Continue, 101 Switching Protocols.

       Successful 2xx
              200 OK, 201 Created, 202 Accepted, 203 Non-Authoritative
              Information, 204 No Content, 205 Reset Content, 206
              Partial Content.

       Redirection 3xx
              300 Multiple Choices, 301 Moved Permanently, 302 Found,
              303 See Other, 304 Not Modified, 305 Use Proxy, 307
              Temporary Redirect.

       Client Error 4xx
              400 Bad Request, 401 Unauthorized, 402 Payment
              Required, 403 Forbidden, 404 Not Found, 405 Method Not
              Allowed, 406 Not Acceptable, 407 Proxy Authentication
              Required, 408 Request Time-out, 409 Conflict, 410 Gone,
              411 Length Required, 412 Precondition Failed, 413
              Request Entity Too Large, 414 Request-URI Too Large,
              415 Unsupported Media Type, 416 Requested range not
              satisfiable, 417 Expectation Failed.

       Server Error 5xx
              500 Internal Server Error, 501 Not Implemented, 502 Bad
              Gateway, 503 Service Unavailable, 504 Gateway Time-out,
              505 HTTP Version not supported.

       DLC also uses the following special codes:

       399    the page was moved by its author using HTTP-EQUIV in
              the HTML source code (only when the -userRedirect
              option is set).

       398    an infinite redirection loop was detected.

       397    the web page may have moved (only when the -Content
              option is used with at least one rule of type move).

       499    the web page may not exist (only when the -Content
              option is used with at least one rule of type error).

       Some ftp sites may return 400 or 401 codes even though they
       exist (they may just be unavailable at the time of the
       request). Some http sites may return a 500 code even though
       they exist (the site may not have answered before the
       timeout).
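
       The code tables above are what -codeConversion turns into
       text, and codes outside them end up in a separate section. A
       Python sketch of that lookup; the wording of the DLC-specific
       descriptions and the "unknown" section name are illustrative
       assumptions, while the RFC 2616 reason phrases are taken from
       the specification.

```python
# Reason phrases as listed in RFC 2616, section 6.1.1.
RFC2616_CODES = {
    100: "Continue", 101: "Switching Protocols",
    200: "OK", 201: "Created", 202: "Accepted",
    203: "Non-Authoritative Information", 204: "No Content",
    205: "Reset Content", 206: "Partial Content",
    300: "Multiple Choices", 301: "Moved Permanently", 302: "Found",
    303: "See Other", 304: "Not Modified", 305: "Use Proxy",
    307: "Temporary Redirect",
    400: "Bad Request", 401: "Unauthorized", 402: "Payment Required",
    403: "Forbidden", 404: "Not Found", 405: "Method Not Allowed",
    406: "Not Acceptable", 407: "Proxy Authentication Required",
    408: "Request Time-out", 409: "Conflict", 410: "Gone",
    411: "Length Required", 412: "Precondition Failed",
    413: "Request Entity Too Large", 414: "Request-URI Too Large",
    415: "Unsupported Media Type",
    416: "Requested range not satisfiable", 417: "Expectation Failed",
    500: "Internal Server Error", 501: "Not Implemented",
    502: "Bad Gateway", 503: "Service Unavailable",
    504: "Gateway Time-out", 505: "HTTP Version not supported",
}

# DLC-specific codes; the descriptions are illustrative wording.
DLC_CODES = {
    397: "May have moved (-Content, move rule)",
    398: "Infinite redirection loop",
    399: "User redirection (HTTP-EQUIV)",
    499: "May not exist (-Content, error rule)",
}


def describe(code):
    """Text for a return code, or None if the code is not known."""
    return RFC2616_CODES.get(code) or DLC_CODES.get(code)


def section_for(code):
    """Output section: "1xx" to "5xx" for known codes, "unknown" otherwise."""
    if describe(code) is None:
        return "unknown"
    return f"{code // 100}xx"
```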
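
       Code 398 marks an infinite redirection loop. One simple way
       to detect such a loop, sketched in Python: remember every URL
       visited while following redirects and stop as soon as one
       repeats. The redirects mapping stands in for real Location
       headers, and the max_hops safety limit and the name
       check_redirects are assumptions, not deadlinkcheck's code.

```python
LOOP_CODE = 398  # DLC's special code for infinite redirection loops


def check_redirects(start, redirects, max_hops=20):
    """Follow redirects from start; return (chain, 398 or None).

    redirects maps a url to the url it redirects to. A url seen twice,
    or a chain longer than max_hops, is reported as a loop.
    """
    chain = [start]
    seen = {start}
    url = start
    while url in redirects:
        url = redirects[url]
        if url in seen or len(chain) > max_hops:
            return chain + [url], LOOP_CODE
        chain.append(url)
        seen.add(url)
    return chain, None
```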
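
       Code 399 comes from pages that redirect through a META
       HTTP-EQUIV tag, which -userRedirect detects by fetching the
       page with GET. A Python sketch of the detection step, using
       the standard html.parser module; it assumes the redirection
       takes the common form <meta http-equiv="refresh"
       content="N; url=target">, and the helper names are
       hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

USER_REDIRECT_CODE = 399  # DLC's special code for HTTP-EQUIV redirections

# Matches the "url=..." part of a refresh content value such as "0; URL=page.html".
_REFRESH_URL = re.compile(r"url\s*=\s*['\"]?([^'\"\s>]+)", re.IGNORECASE)


class MetaRefreshFinder(HTMLParser):
    """Records the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta" or self.target is not None:
            return
        attributes = {name: value or "" for name, value in attrs}
        if attributes.get("http-equiv", "").lower() != "refresh":
            return
        match = _REFRESH_URL.search(attributes.get("content", ""))
        if match:
            self.target = match.group(1)


def user_redirect_result(page_url, html):
    """Return (399, absolute target) if the page redirects via META, else None."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    finder.close()
    if finder.target is None:
        return None
    return USER_REDIRECT_CODE, urljoin(page_url, finder.target)
```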
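
       Codes 397 and 499 come from -Content rules, which match
       regular expressions against the page text. A Python sketch of
       how a colon-separated rule list could be expanded (a type
       name selects every rule of that type) and applied. The two
       example rules, their patterns, and all helper names are
       hypothetical; the real rules are written in Perl inside the
       deadlinkcheck source.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    kind: str  # "error" or "move"
    pattern: re.Pattern


CODE_FOR_KIND = {"error": 499, "move": 397}

# Hypothetical example rules.
RULES = [
    Rule("notfound", "error", re.compile(r"\b(404|page not found)\b", re.I)),
    Rule("moved", "move", re.compile(r"\b(has moved|new address)\b", re.I)),
]


def select_rules(spec, rules=RULES):
    """Expand a -Content argument such as "error", "move" or "notfound:moved"."""
    selected, names = [], set()
    for token in spec.split(":"):
        if token in ("", "help"):
            continue  # "help" only asks for a description of the selected rules
        matches = [r for r in rules if token in (r.kind, r.name)]
        if not matches:
            raise ValueError(f"unknown rule: {token!r}")
        for rule in matches:
            if rule.name not in names:
                selected.append(rule)
                names.add(rule.name)
    return selected


def apply_rules(page_text, selected):
    """Return (code, rule name) for the first matching rule, else None."""
    for rule in selected:
        if rule.pattern.search(page_text):
            return CODE_FOR_KIND[rule.kind], rule.name
    return None
```

       Rules are tried in selection order, so the first match
       decides the code; a real rule set may need a priority between
       error and move rules.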

       If the script encounters return codes not defined in RFC
       2616, it outputs those links in a special section.

   PROXY
       A WWW cache server can be used with either the -proxy or the
       -Proxy option. The first reads the proxy server from the
       command line; the second extracts it from the environment
       variables (http_proxy and ftp_proxy). If no proxy option is
       used, the script runs without a proxy.

   RULES
       The rules (the -Content option) are considered beta.

       The rules rely on regular expression matching against the
       content of the checked web page. This slows down DLC and may
       not return a correct result, so it is recommended to check
       the web page manually. Because this is based on text
       processing, DLC can only recognize pages for which it has
       rules, and may well miss some error or moved links.

       Extending the rules is easy if you know how to use regular
       expressions in Perl and are willing to edit the code of
       "deadlinkcheck". The source code contains a section called
       "How to create a rule ?" which should help you create (or
       modify) rules. If you do so, please send a diff (or simply
       the function) by e-mail to martial@users.sourceforge.net.

HOMEPAGE
       The Dead Link Check homepage is at
       http://dlc.sourceforge.net/. You may also want to check
       Public Bookmark Generator, whose web page is
       http://pbm.sourceforge.net/.

BUGS
       To report a bug, please send an e-mail to the author at
       martial@users.sourceforge.net with [DLC] in the subject.

ACKNOWLEDGMENTS
       The author wishes to thank the following people, who helped
       improve this script:

       v0.1   Marc Bednarek and Jimmy Graham, for helping solve tricky
              little response codes and for providing examples and
              information on how to resolve or perform some http
              requests.

       v0.1.1 Wojciech Zwiefka, for reporting a bug on infinite
              redirection loops.

       v0.1.2 Geoffrey Leach, for reporting a bug on the
              Timeout/maxTimeout interaction.

       v0.3   Josha Foust, for reporting an http to ftp redirect bug.

       v0.3.1 Josha Foust, for reporting a redirect bug and a bug on
              url naming.

       v0.3.2 Olivier Galibert, for reporting the lowercased URLs bug.

       v0.4.0 The SourceForge team, for the fantastic job they are
              doing providing Open Source coders with such
              facilities.

LICENSE
       Copyright (C) 1999 Martial MICHEL

       This program is free software; you can redistribute it
       and/or modify it under the terms of the GNU General Public
       License as published by the Free Software Foundation;
       either version 2 of the License, or (at your option) any
       later version.

       This program is distributed in the hope that it will be
       useful, but WITHOUT ANY WARRANTY; without even the implied
       warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
       PURPOSE. See the GNU General Public License for more
       details.

       More license information: http://www.gnu.org/copyleft/gpl.html

RELEASES
       v0.0   : April 1st, 1999
       v0.1   : April 12th, 1999
       v0.1.1 : April 15th, 1999
       v0.1.2 : April 16th, 1999
       v0.2   : May 10th, 1999
       v0.2.1 : May 11th, 1999
       v0.3   : July 21st, 1999
       v0.3.1 : October 4th, 1999
       v0.3.2 : October 6th, 1999
       v0.4.0 : December 7th, 1999

AUTHOR
       Martial MICHEL (martial@users.sourceforge.net)