NAME
uniq - merge or filter adjacent identical lines
SYNOPSIS
uniq [-c] [-u|-d] [-iz] [-f fields] [-s skip] [-w limit] [from [to]]
uniq [-D|-G|--{all-repeated|group}=none|separate|prepend|append|both] [-iz] [-f fields] [-s skip] [-w limit] [from [to]]
DESCRIPTION
Copies consecutive lines from from (standard input stream if "-", the default) to to (standard output stream if "-", the default; otherwise created with mode a=rw - umask and truncated, equivalent to the shell's >):
- by default, writing only the first line of each equal sequence,
- with -u, writing only locally-unique lines,
- with -d, writing only the first of each sequence of duplicates,
- with -D, writing each duplicate value in a sequence, potentially separated by empty lines,
- with -G, separating equal sequences with empty lines.
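The default filter, -u, and -d can be compared on one small input. This is a minimal sketch, assuming a POSIX shell; the sample data and the variable names are made up for illustration.

```shell
# Sample input (invented for this sketch): "a" twice, "b" once, "a" once more.
input=$(printf '%s\n' a a b a)

# Default: the first line of every equal sequence -> a, b, a
# (the last "a" is not adjacent to the first two, so it starts a new sequence).
default_out=$(printf '%s\n' "$input" | uniq)

# -u: only lines that form a sequence of length 1 -> b, a
unique_out=$(printf '%s\n' "$input" | uniq -u)

# -d: the first line of every sequence longer than 1 -> a
repeated_out=$(printf '%s\n' "$input" | uniq -d)

printf 'default:\n%s\n-u:\n%s\n-d:\n%s\n' "$default_out" "$unique_out" "$repeated_out"
```

Only adjacency matters here: no sorting takes place, so the trailing "a" is reported again by the default filter and counts as unique for -u.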
By default, the entire line is compared; -f slices off fields leading fields (each defined as a maximal series of blanks (spaces or tabs in the C locale) followed by a maximal series of nonblanks), then -s slices off skip leading characters, then -w yields a maximum of limit characters.
The entire line is always written.
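Each slicing option can be tried in isolation. The following is a sketch with invented ASCII-only data, assuming a POSIX shell and a uniq that accepts -w (an extension, per STANDARDS):

```shell
# -f1: the first field is ignored, so "1 apple" and "2 apple" compare equal.
by_field=$(printf '%s\n' '1 apple' '2 apple' '3 pear' | uniq -f1)

# -s2: the first two characters are ignored, so "xxfoo" and "yyfoo" compare equal.
by_skip=$(printf '%s\n' 'xxfoo' 'yyfoo' 'zzbar' | uniq -s2)

# -w3: at most three characters are compared, so "abc1" and "abc2" compare equal.
by_limit=$(printf '%s\n' 'abc1' 'abc2' 'abd3' | uniq -w3)

printf '%s\n' "$by_field" "$by_skip" "$by_limit"
```

In every case the first line of the sequence is written in full, including the part that was excluded from the comparison.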
Unless -i, comparisons are byte-wise;
otherwise, they're case-insensitive across characters in the current locale
(invalid sequences are assumed to have a length of 1
byte and yield the maximum character).
The last of -udDG specified, if any,
applies.
OPTIONS
-c, --count
    Prepend each written line with the number of lines it had coalesced.
-u, --unique
    Only write lines that are non-equal to their neighbours, i.e. are the sole members of a sequence of length 1.
-d, --repeated
    Write only the first line of each equal sequence longer than 1.
-D, --all-repeated, --all-repeated=none
    Write all lines of each equal sequence longer than 1.
--all-repeated=separate
    Likewise, but separate sequences with an empty line.
--all-repeated=prepend
    Likewise, but prefix each sequence with an empty line.
--all-repeated=append
    Likewise, but suffix each sequence with an empty line.
--all-repeated=both
    Likewise, but prefix and suffix the first such sequence, suffixing the subsequent ones.
-G, --group, --group=separate
    Write all lines of all sequences, separating sequences with an empty line.
--group=none
    Likewise, but don't insert empty lines. This is mostly equivalent to cat.
--group=prepend, --group=append, --group=both
    Analogous to --all-repeated=.

All --all-repeated and --group values are prefix-matched (--group=b is equivalent to --group=both, &c.).

-i, --ignore-case
    Compare lines case-insensitively according to the current locale.
-z, --zero-terminated
    Line separator is NUL instead of newline.
-f, --skip-fields=fields
    Skip the first fields maximal series of blanks then nonblanks for comparison.
-s, --skip-chars=skip
    Skip the first skip characters for comparison.
-w, --check-chars=limit
    Compare up to limit characters.
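As an illustration of -c and -z, here is a sketch assuming a POSIX shell and a uniq that accepts -z (an extension, per STANDARDS); the data is invented, and tr is used only to make the NUL-separated output printable:

```shell
# -c: each written line is prefixed with the length of its sequence.
# The exact padding of the count column differs between implementations.
counted=$(printf '%s\n' a a b | uniq -c)
printf '%s\n' "$counted"

# -z: records end in NUL rather than newline; \000 is the octal escape for NUL.
nul_out=$(printf 'a\000a\000b\000' | uniq -z | tr '\000' '\n')
printf '%s\n' "$nul_out"
```

Scripts that parse the -c column are probably best served by splitting on whitespace rather than relying on a fixed width.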
EXIT STATUS
1 if from or to couldn't be opened.
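A script can branch on this status. The path below is hypothetical and is merely assumed not to exist:

```shell
# Hypothetical input path, assumed to be absent on the system running this.
missing=/nonexistent/uniq-example-input

# Capture the status without aborting under "set -e".
status=0
uniq "$missing" 2>/dev/null || status=$?

printf 'uniq exited with status %s\n' "$status"
```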
EXAMPLES
Exercise all slicing/comparison options:
    $ printf '%s\n' 'a 0ąQ' ' b 1ĄWo' | uniq -ci -f1 -s2 -w1
          2 a 0ąQ
SEE ALSO
sort(1) to make equivalent lines adjacent, or its
-u flag, which can uniquify lines based on collation
sequence instead of equality.
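A short sketch of the difference between running uniq alone and after sort, assuming a POSIX shell; the data is invented, and LC_ALL=C is set on sort only to make the ordering predictable:

```shell
# Without sorting, the two "b" lines are not adjacent, so both are kept.
unsorted=$(printf '%s\n' b a b | uniq)

# sort makes equal lines adjacent, letting uniq coalesce them.
sorted=$(printf '%s\n' b a b | LC_ALL=C sort | uniq)

# sort -u yields the same result for this input.
sort_u=$(printf '%s\n' b a b | LC_ALL=C sort -u)

printf 'uniq alone:\n%s\nsort | uniq:\n%s\nsort -u:\n%s\n' "$unsorted" "$sorted" "$sort_u"
```

The two pipelines agree here because, in the C locale, lines that collate equally are also byte-for-byte equal; in other locales they may differ, as noted above.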
STANDARDS
Conforms to IEEE Std 1003.1-2024 (“POSIX.1”), except that 0 is allowed for -fs. The standard allows any (or no) number alignment for the -c column; this implementation matches the GNU system at 7 columns and a space, deviating from the AT&T UNIX of 4 and a space. The input file is specified to be a text file, which must not contain NULs: most other implementations terminate the line at the first NUL.
-Dizw, --group are
extensions, originating from the GNU system; the -G
spelling is an extension; the GNU system forbids
--all-repeated=append,
--all-repeated=both, and
--group=none.
Because -fsw operate on characters, they are not suitable for slicing arbitrary data: set LC_ALL=C (LC_CTYPE, POSIX) to slice by byte (this also replicates the (broken) behaviour of the GNU system; the same applies to -i, questionable though its usefulness in that domain may be).
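Byte-wise slicing under LC_ALL=C can be observed with two-byte UTF-8 characters. This is a sketch assuming a POSIX shell; the data is invented and written with octal escapes so that the script itself stays ASCII-only:

```shell
# "ą" is the bytes 0xC4 0x85 and "ć" is 0xC4 0x87 in UTF-8; each line is followed by "x".
data=$(printf '\304\205x\n\304\207x\n')

# -s1 skips only the shared lead byte 0xC4; the second bytes differ,
# so the lines compare unequal and both are written.
skip1=$(printf '%s\n' "$data" | LC_ALL=C uniq -s1 | wc -l | tr -d ' ')

# -s2 skips both bytes of each character, leaving "x" on either line,
# so the lines compare equal and only the first is written.
skip2=$(printf '%s\n' "$data" | LC_ALL=C uniq -s2 | wc -l | tr -d ' ')

printf 'lines written with -s1: %s, with -s2: %s\n' "$skip1" "$skip2"
```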
HISTORY
Appeared in Version 3 AT&T UNIX as uniq(I):
    NAME
        uniq -- report repeated lines in a file
    SYNOPSIS
        uniq [ -ud ] [ input [ output ] ]
Version 4 AT&T UNIX sees a SYNOPSIS of
-c always applying the default filter, overriding -ud (if specified), the count aligned to 4 columns, followed by a space, -n is equivalent to present-day -f n, and +n to -s n (though, expectedly, byte-wise). The maximal line size is 1000 bytes, unprotected against overflows, and terminating at a NUL.
Version 7 AT&T UNIX exits 1 on failure to open either file and writes the error to the standard error stream.
4.4BSD sees a rewrite, citing IEEE Std 1003.2 (“POSIX.2”), with a SYNOPSIS of
    uniq [-c | -d | -u] [-f fields] [-s chars] [input_file [output_file]]
    usage: uniq [-c | -du] [-f fields] [-s chars] [input [output]]
-c excludes either of -du, and specifying both -du is equivalent to the default output (curiously, this matches all prior manuals, which read
-n and +n options are undocumented beyond a COMPATIBILITY mention, but recognised for compatibility. Fields are separated not by blanks (isblank(): space (0x20), tab (0x09)) but by whitespace (isspace(): also the vertical tab (0x0B), form-feed (0x0C), and carriage return (0x0D)).
X/Open Portability Guide Issue 2
(“XPG2”) includes Version 4 AT&T
UNIX uniq verbatim.
X/Open Portability Guide Issue 3 (“XPG3”) adds APPLICATION USAGE, entirely shaded IN ("Internationalised functionality", defined as optional), of:
    LC_COLLATE environment variable must be equal to the value it had when the input files were sorted.
    If uniq does not support selection of collating sequences via LC_COLLATE, the input files must be sorted according to the collating sequence of the "C" locale (see Volume 3, XSI Supplementary Definitions, Chapter 7, C Program Locale).
uniq that aren't in consort with
sort. Unsurprisingly, no implementation does this.
IEEE Std 1003.2-1992
(“POSIX.2”) sees largely-present-day
uniq with -cdufs, the
-n
+m syntax marked obsolete,
-f defined in terms of blanks from the current
locale and -s in terms of characters, likewise, and
"-"-as-standard-input-stream is allowed for
from, but not for to.
from must be a text file — no embedded NULs,
lines of up to LINE_MAX bytes, and must end in a
newline. No mention is made of collation.
Version 3 of the Single UNIX Specification (“SUSv3”) removes the obsolescent syntax and requires, in ENVIRONMENT VARIABLES:
LC_COLLATE
    Determine the locale for ordering rules.
uniq section.
IEEE Std 1003.1-2008
(“POSIX.1”) allows the obsolete syntax by allowing the
option delimiter to be +, allows
to being "-" to mean the
standard output stream, explicitly discards newlines for comparison
(matching existing practice), removes the LC_COLLATE
mention and clarifies in EXAMPLES the
current guidance that
sort -u
sort | uniq