 AGREP(l)                                                           AGREP(l)
                                Jan 17, 1992



 NAME
      agrep - search a file for a string or regular expression, with
      approximate matching capabilities

 SYNOPSIS
      agrep [ -#cdehiklnpstvwxyBDGIS ] pattern [ -f patternfile ] [
      filename... ]

 DESCRIPTION
      agrep searches the input filenames (standard input is the default, but
      see a warning under BUGS/LIMITATIONS) for records containing strings
      that either exactly or approximately match a pattern.  A record is by
      default a line, but it can be defined differently using the -d option
      (see below).  Normally, each record found is copied to the standard
      output.

      Approximate matching allows finding records that contain the pattern
      with several errors, including substitutions, insertions, and
      deletions.  For example, Massechusets matches Massachusetts with two
      errors (one substitution and one insertion).  Running agrep -2
      Massechusets foo outputs all lines in foo containing any string within
      at most 2 errors of Massechusets.  agrep supports many kinds of
      queries, including arbitrary wild cards, sets of patterns, and, in
      general, regular expressions; see PATTERNS below.  It supports most of
      the options supported by the grep family plus several more (but it is
      not 100% compatible with grep).

      For more information on the algorithms used by agrep, see Wu and
      Manber, "Fast Text Searching With Errors," Technical Report #91-11,
      Department of Computer Science, University of Arizona, June 1991
      (available by anonymous ftp from cs.arizona.edu in agrep/agrep.ps.1),
      and Wu and Manber, "Agrep -- A Fast Approximate Pattern Searching
      Tool," to appear in the USENIX Conference, January 1992 (available by
      anonymous ftp from cs.arizona.edu in agrep/agrep.ps.2).

      As with the rest of the grep family, the characters `$', `^', `*',
      `[', `]', `|', `(', `)', `!', and `\' can cause unexpected results when
      included in the pattern, because these characters are also meaningful
      to the shell.  To avoid these problems, always enclose the entire
      pattern argument in single quotes, i.e., 'pattern'.  Do not use double
      quotes (").

      When agrep is applied to more than one input file, the name of the
      file is displayed before each line that matches the pattern.  The
      filename is not displayed when processing a single file, so if you
      want the filename to appear, use /dev/null as a second file in the
      list.
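      The bit-parallel idea behind the Wu and Manber report cited above can
      be illustrated with a short Python sketch.  It is not agrep's C
      source: approx_contains is a hypothetical name, the pattern is assumed
      to be a plain string (no classes, wild cards, or regular expressions),
      every insertion, deletion, and substitution costs one error as with
      -#, and text stands for a single record.

```python
def approx_contains(pattern: str, text: str, k: int) -> bool:
    """Return True if some substring of text is within k errors of pattern.

    Errors are unit-cost insertions, deletions, and substitutions.  Bit i of
    R[d] is set when pattern[:i + 1] matches a suffix of the text read so far
    with at most d errors (shift-and form of the Wu-Manber recurrence).
    """
    m = len(pattern)
    if m == 0:
        return True
    # Character masks: bit i of masks[c] is set when pattern[i] == c.
    masks: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    full = (1 << m) - 1        # keep only the m meaningful bits
    accept = 1 << (m - 1)      # bit meaning "whole pattern matched"
    # Before any text is read, the first d pattern characters can be
    # matched by deleting them, so R[d] starts with its low d bits set.
    R = [((1 << d) - 1) & full for d in range(k + 1)]
    if R[k] & accept:          # k >= m: the whole pattern can be deleted
        return True
    for c in text:
        mask = masks.get(c, 0)
        old = R[:]
        # "| 1" lets a match start at every text position.
        R[0] = ((old[0] << 1) | 1) & mask
        for d in range(1, k + 1):
            R[d] = (
                (((old[d] << 1) | 1) & mask)  # c matches the next pattern char
                | (old[d - 1] << 1) | 1       # substitution
                | old[d - 1]                  # insertion: c is an extra char
                | (R[d - 1] << 1) | 1         # deletion: pattern char skipped
            ) & full
        if R[k] & accept:
            return True
    return False
```

      For instance, approx_contains("Massechusets", "Massachusetts", 2)
      returns True and the same call with k = 1 returns False, in line with
      the two-error count given above.  Each text character is handled with
      a fixed number of bitwise operations per error level.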

 OPTIONS
      -#   # is a non-negative integer (at most 8) specifying the maximum
           number of errors permitted in finding the approximate matches
           (defaults to zero).  Generally, each insertion, deletion, or
           substitution counts as one error.  It is possible to adjust the
           relative cost of insertions, deletions, and substitutions (see
           the -D, -I, and -S options).

                                    - 1 -      Formatted:  November 14, 2024

      -c   Display only the count of matching records.

      -d 'delim'
           Define delim to be the separator between two records.  The
           default value is '$', i.e., a record is by default a line.
           delim can be a string of at most 8 characters (with possible use
           of ^ and $), but not a regular expression.  Each stretch of text
           between two delims, before the first delim, or after the last
           delim is considered one record.  For example, -d '$$' defines
           paragraphs as records and -d '^From ' defines mail messages as
           records.  agrep matches each record separately.  This option does
           not currently work with regular expressions.

      -e pattern
           Same as a simple pattern argument, but useful when the pattern
           begins with a `-'.

      -f patternfile
           patternfile contains a set of (simple) patterns.  The output is
           all lines that match at least one of the patterns in patternfile.
           Currently, the -f option works only for exact match and for
           simple patterns (any meta symbol is interpreted as a regular
           character); it is compatible only with -c, -h, -i, -l, -s, -v,
           -w, and -x options.  See BUGS/LIMITATIONS for size bounds.

      -h   Do not display filenames.

      -i   Case-insensitive search - e.g., "A" and "a" are considered
           equivalent.

      -k   No symbol in the pattern is treated as a meta character. For
           example, agrep -k 'a(b|c)*d' foo will find the occurrences of
           a(b|c)*d in foo whereas agrep 'a(b|c)*d' foo will find substrings
           in foo that match the regular expression 'a(b|c)*d'.

      -l   List only the files that contain a match.  This option is useful
           for looking for files containing a certain pattern.  For example,
           agrep -l 'wonderful' * will list the names of those files in the
           current directory that contain the word 'wonderful'.

      -n   Each line that is printed is prefixed by its record number in the
           file.

      -p   Find records in the text that contain a supersequence of the
           pattern.  For example, agrep -p DCS foo will match "Department of
           Computer Science."

      -s   Work silently, that is, display nothing except error messages.
           This is useful for checking the error status.

      -t   Output the record starting from the end of delim to (and
           including) the next delim. This is useful for cases where delim
           should come at the end of the record.

      -v   Inverse mode - display only those records that do not contain the
           pattern.

      -w   Search for the pattern as a word, i.e., surrounded by
           non-alphanumeric characters.  The non-alphanumeric characters
           must surround the match; they cannot be counted as errors.  For
           example, agrep -w -1 car will match cars, but not characters.

      -x   The pattern must match the whole line.

      -y   Used with the -B option.  When -y is on, agrep will always output
           the best matches without giving a prompt.

      -B   Best match mode.  When -B is specified and no exact matches are
           found, agrep will continue to search until the closest matches
           (i.e., the ones with the minimum number of errors) are found, at
           which point the following message will be shown: "the best match
           contains x errors, there are y matches, output them? (y/n)".  The
           best match mode is not supported for standard input, e.g.,
           pipeline input.  When the -#, -c, or -l options are specified,
           the -B option is ignored.  In general, -B may be slower than -#,
           but not by very much.

      -Dk  Set the cost of a deletion to k (k is a positive integer).  This
           option does not currently work with regular expressions.

      -G   Output the files that contain a match.

      -Ik  Set the cost of an insertion to k (k is a positive integer).
           This option does not currently work with regular expressions.

      -Sk  Set the cost of a substitution to k (k is a positive integer).
           This option does not currently work with regular expressions.
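      The effect of -Dk, -Ik, and -Sk can be modelled with the textbook
      dynamic program for approximate substring matching with weighted
      costs.  This Python sketch is illustrative only and is not claimed to
      be how agrep computes it: min_match_cost and its parameter names are
      hypothetical, the pattern is a plain string (these options do not work
      with regular expressions), and, following the Massechusets example,
      an insertion is an extra character in the text while a deletion is a
      pattern character missing from the text.

```python
def min_match_cost(pattern: str, text: str,
                   ins: int = 1, dele: int = 1, sub: int = 1) -> int:
    """Smallest weighted edit cost between pattern and any substring of text.

    ins  - cost of an extra character in the text      (cf. -Ik)
    dele - cost of a pattern character missing in text (cf. -Dk)
    sub  - cost of a substituted character             (cf. -Sk)
    """
    m = len(pattern)
    # col[i] = cheapest cost of matching pattern[:i] against a substring that
    # ends at the current text position; matching it against nothing costs
    # i deletions.
    col = [i * dele for i in range(m + 1)]
    best = col[m]
    for c in text:
        new = [0] * (m + 1)  # new[0] = 0: a match may start anywhere
        for i in range(1, m + 1):
            new[i] = min(
                col[i - 1] + (0 if pattern[i - 1] == c else sub),  # match/substitute
                col[i] + ins,                                     # c is an insertion
                new[i - 1] + dele,                                # pattern char deleted
            )
        col = new
        best = min(best, col[m])
    return best
```

      Under this model a record would be reported when the returned cost is
      at most the -# bound.  With ins=1, dele=2, sub=2, a bound of 1
      tolerates a single extra text character but no deletion or
      substitution.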

 PATTERNS
      agrep supports a large variety of patterns, including simple strings,
      strings with classes of characters, sets of strings, wild cards, and
      regular expressions.

      Strings
           Any sequence of characters, including the special symbols `^' for
           beginning of line and `$' for end of line.  The special
           characters listed above (`$', `^', `*', `[', `]', `|', `(', `)',
           `!', and `\') should be preceded by `\' if they are to be matched
           as regular characters.  For example, \^abc\\ corresponds to the
           string ^abc\, whereas ^abc corresponds to the string abc at the
           beginning of a line.

      Classes of characters
           A list of characters inside [] (in order) corresponds to any
           character from the list.  For example, [a-ho-z] is any character
           between a and h or between o and z.  The symbol `^' inside []
           complements the list.  For example, [^i-n] denotes any character
           in the character set except the characters 'i' through 'n'.  The
           symbol `^' thus has two meanings, but this is consistent with
           egrep.  The symbol `.' (don't care) stands for any symbol except
           the newline symbol.

      Boolean operations
           agrep supports an `and' operation `;' and an `or' operation `,',
           but not a combination of both.  For example, 'fast;network'
           searches for all records containing both words.

      Wild cards
           The symbol '#' is used to denote a wild card.  # matches zero or
           more arbitrary characters.  For example, ex#e matches example.
           The symbol # is equivalent to .* in egrep.  In fact, .* will work
           too, because it is a valid regular expression (see below), but
           unless it is part of an actual regular expression, # will work
           faster.

      Combination of exact and approximate matching
           Any part of the pattern inside angle brackets <> must match the
           text exactly, even if the match as a whole is allowed to have
           errors.  For example, <mathemat>ics matches mathematical with one
           error (replacing the last s with an a), but mathe<matics> does not
           match mathematical no matter how many errors we allow.

      Regular expressions
           The syntax of regular expressions in agrep is in general the same
           as that for egrep.  The union operation `|', Kleene closure `*',
           and parentheses () are all supported.  Currently '+' is not
           supported.  Regular expressions are currently limited to
           approximately 30 characters (generally excluding meta
           characters).  Some options (-d, -w, -f, -t, -x, -D, -I, -S) do
           not currently work with regular expressions.  The maximal number
           of errors for regular expressions that use '*' or '|' is 4.
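      The exact-match behaviour of the `;' (and), `,' (or), and `#'
      operations described above can be mimicked in Python.  This is a
      sketch under stated assumptions: simple_to_regex and record_matches
      are hypothetical helpers, matching is exact (no errors allowed), `#'
      is taken to be the only metacharacter, and Python's re module stands
      in for agrep's own matcher.

```python
import re

def simple_to_regex(pattern: str) -> str:
    """Translate a simple pattern whose only wild card is '#' into a
    Python regex; '#' becomes '.*' and everything else is literal."""
    return ".*".join(re.escape(part) for part in pattern.split("#"))

def record_matches(query: str, record: str) -> bool:
    """Exact evaluation of an ';' (and) or ',' (or) query on one record."""
    if ";" in query and "," in query:
        # The two operations cannot be combined in one query.
        raise ValueError("';' and ',' cannot be combined")
    if ";" in query:
        terms, combine = query.split(";"), all
    else:
        terms, combine = query.split(","), any
    return combine(re.search(simple_to_regex(t), record) is not None
                   for t in terms)
```

      For example, record_matches("fast;network", ...) is true only for
      records containing both words, while "fast,network" accepts records
      containing either one.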

 EXAMPLES
      agrep -2 -c ABCDEFG foo
           gives the number of lines in file foo that contain ABCDEFG within
           two errors.

      agrep -1 -D2 -S2 'ABCD#YZ' foo
           outputs the lines containing ABCD followed, within arbitrary
           distance, by YZ, with up to one additional insertion (-D2 and -S2
           make deletions and substitutions too "expensive").

      agrep -5 -p abcdefghij /usr/dict/words
           outputs the list of all words containing at least 5 of the first
           10 letters of the alphabet in order.  (Try it:  any list starting
           with academia and ending with sacrilegious must mean something!)

      agrep -1 'abc[0-9](de|fg)*[x-z]' foo
           outputs the lines containing, within up to one error, the string
           that starts with abc followed by one digit, followed by zero or
           more repetitions of either de or fg, followed by either x, y, or
           z.

      agrep -d '^From ' 'breakdown;internet' mbox
           outputs all mail messages (the pattern '^From ' separates mail
           messages in a mail file) that contain keywords 'breakdown' and
           'internet'.

      agrep -d '$$' -1 '<word1> <word2>' foo
           finds all paragraphs that contain word1 followed by word2 with
           one error in place of the blank. In particular, if word1 is the
           last word in a line and word2 is the first word in the next line,
           then the space will be substituted by a newline symbol and it
           will match.  Thus, this is a way to overcome separation by a
           newline.  Note that -d '$$' (or another delim which spans more
           than one line) is necessary, because otherwise agrep searches
           only one line at a time.

      agrep '^agrep' <this manual>
           outputs all the examples of the use of agrep in this man page.
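      The agrep -5 -p abcdefghij example above can be checked with a
      longest-common-subsequence computation.  This Python sketch assumes,
      as that example's wording suggests, that under -p extra text
      characters are free, so a record matches with k errors when at least
      m - k of the m pattern characters appear in it in order; lcs_length
      and supersequence_match are hypothetical helpers.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                cur.append(prev[j - 1] + 1)          # extend a common run
            else:
                cur.append(max(prev[j], cur[j - 1]))  # skip ch or bj
        prev = cur
    return prev[-1]

def supersequence_match(pattern: str, record: str, k: int) -> bool:
    """Assumed -p semantics: at most k pattern characters may be missing
    from the record; any number of extra record characters is allowed."""
    return len(pattern) - lcs_length(pattern, record) <= k
```

      Both academia and sacrilegious contain exactly 5 of the letters a
      through j in order, so each passes with k = 5, while a word such as
      abacus (only a, b, c) does not.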

 SEE ALSO
      ed(1), ex(1), grep(1V), sh(1), csh(1).

 BUGS/LIMITATIONS
      Any bug reports or comments will be appreciated!  Please mail them to
      sw@cs.arizona.edu or udi@cs.arizona.edu.

      Regular expressions do not support the '+' operator (match 1 or more
      instances of the preceding token).  Such patterns can be searched for
      by using this syntax in the pattern:

           'pattern(pattern)*'

      (search for strings containing one instance of the pattern, followed
      by 0 or more instances of the pattern).

      The following can cause an infinite loop: agrep pattern * >
      output_file.  If the number of matches is high, they may be deposited
      in output_file before it is completely read, leading to more matches
      of the pattern within output_file (the matches are against the whole
      directory).  It is not clear whether this is a "bug" (grep will do
      the same), but be warned.

      The maximum size of the patternfile is 250Kb, and the maximum number
      of patterns is 30,000.

      Standard input is the default if no input file is given.  However, if
      standard input is keyed in directly (as opposed to through a pipe, for
      example), agrep may not work for some non-simple patterns.

      There is no size limit for simple patterns.  More complicated patterns
      are currently limited to approximately 30 characters.  Lines are
      limited to 1024 characters.  Records are limited to 48K and may be
      truncated if they are larger than that.  The limit on record length
      can be changed by modifying the parameter Max_record in agrep.h.
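      The '+' workaround above can be sanity-checked against an engine that
      does support '+'.  In this Python sketch, expand_plus is a
      hypothetical helper and Python's re module serves only as a reference
      engine; the atom is parenthesized so that a `|' inside it (as in
      de|fg) keeps its scope.

```python
import re

def expand_plus(atom: str) -> str:
    """Rewrite '(atom)+' using only '*', as '(atom)(atom)*'."""
    return f"({atom})({atom})*"

# Compare the rewrite against Python's native '+' on a few sample strings.
for s in ["", "de", "fgde", "dfg", "dedefg"]:
    native = re.fullmatch(r"(de|fg)+", s) is not None
    rewritten = re.fullmatch(expand_plus("de|fg"), s) is not None
    assert native == rewritten, s
```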

 DIAGNOSTICS
      Exit status is 0 if any matches are found, 1 if none, 2 for syntax
      errors or inaccessible files.

 AUTHORS
      Sun Wu and Udi Manber, Department of Computer Science, University of
      Arizona, Tucson, AZ 85721.  {sw|udi}@cs.arizona.edu.