1.0 uniq
The uniq command filters out repeated lines. It reads its input, collapses each run of identical adjacent lines into a single line, and prints the result. With the -D option it does the inverse and prints only the duplicated lines. Because uniq compares only adjacent lines, the input must be sorted first so that identical lines end up next to each other. For example,
$ cat names
Jame Doe
Jane Doe
John Doe
Erika Mustermann
John Doe
John Doe
Max Mustermann
Richard Roe
Joe Bloggs
Tommy Atkins
John Roe
Jane Doe
John Doe

$ sort names | uniq
Erika Mustermann
Jame Doe
Jane Doe
Joe Bloggs
John Doe
John Roe
Max Mustermann
Richard Roe
Tommy Atkins
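To see why the sort step matters, here is a minimal sketch that runs uniq on a small unsorted input and then on the same input after sorting. The three sample lines are made up for illustration and are not the names file.

```shell
# uniq only compares each line with the line directly before it,
# so the two "John Doe" lines survive when they are not adjacent.
unsorted=$(printf 'John Doe\nJane Doe\nJohn Doe\n' | uniq)

# After sorting, identical lines are adjacent and collapse into one.
sorted=$(printf 'John Doe\nJane Doe\nJohn Doe\n' | sort | uniq)

# sort -u gives the same result as sort | uniq when no other uniq options are needed.
sort_u=$(printf 'John Doe\nJane Doe\nJohn Doe\n' | sort -u)

printf 'uniq alone:\n%s\n\nsort | uniq:\n%s\n\nsort -u:\n%s\n' "$unsorted" "$sorted" "$sort_u"
```

The first result still contains John Doe twice; the other two contain each name once.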
2.0 Print duplicates
We can find the duplicates with the -D and -d options. The -D option prints every instance of each duplicated line, whereas the -d option prints each duplicated line only once.
$ sort names | uniq -D
Jane Doe
Jane Doe
John Doe
John Doe
John Doe
John Doe

$ sort names | uniq -d
Jane Doe
John Doe
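The difference between the two options is easiest to see on a tiny input. The sketch below uses made-up lines and assumes GNU uniq, since -D is not part of POSIX uniq and may be missing elsewhere.

```shell
# Sample input (hypothetical): Jane Doe twice, John Doe three times, John Roe once.
input=$(printf 'Jane Doe\nJohn Doe\nJohn Roe\nJane Doe\nJohn Doe\nJohn Doe\n')

# -d: one line per duplicated group.
once_each=$(printf '%s\n' "$input" | sort | uniq -d)

# -D (assumes GNU uniq): every copy of every duplicated line.
every_copy=$(printf '%s\n' "$input" | sort | uniq -D)

printf 'uniq -d:\n%s\n\nuniq -D:\n%s\n' "$once_each" "$every_copy"
```

John Roe appears in neither result because it occurs only once.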
3.0 Print unique lines only
The -u option drops every line that has a duplicate and prints only the lines that occur exactly once in the file.
$ sort names | uniq -u
Erika Mustermann
Jame Doe
Joe Bloggs
John Roe
Max Mustermann
Richard Roe
Tommy Atkins
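One common use of -u and -d is comparing two lists. The sketch below is an illustration only: the file names monday and tuesday and their contents are hypothetical, and it assumes neither file contains internal duplicates (otherwise a line repeated within one file would be miscounted).

```shell
# Create two small sample lists in a temporary directory.
tmpdir=$(mktemp -d)
printf 'Jane Doe\nJohn Doe\nRichard Roe\n' > "$tmpdir/monday"
printf 'John Doe\nRichard Roe\nTommy Atkins\n' > "$tmpdir/tuesday"

# Lines that occur once in the combined input appear in only one file.
only_one_day=$(sort "$tmpdir/monday" "$tmpdir/tuesday" | uniq -u)

# Lines that occur twice in the combined input appear in both files.
both_days=$(sort "$tmpdir/monday" "$tmpdir/tuesday" | uniq -d)

printf 'in one list only:\n%s\n\nin both lists:\n%s\n' "$only_one_day" "$both_days"

# Clean up the temporary files.
rm -r "$tmpdir"
```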
4.0 Print count of occurrences
The -c option prefixes each output line with the number of times that line occurs in the file.
$ sort names | uniq -c
1 Erika Mustermann
1 Jame Doe
2 Jane Doe
1 Joe Bloggs
4 John Doe
1 John Roe
1 Max Mustermann
1 Richard Roe
1 Tommy Atkins
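The counts become more useful when they are sorted. As a sketch, piping the output of uniq -c through sort -rn ranks the lines from most to least frequent; the sample input here is made up.

```shell
# Count each distinct line, then sort numerically by count, largest first.
ranked=$(printf 'John Doe\nJane Doe\nJohn Doe\nTommy Atkins\nJane Doe\nJohn Doe\n' \
    | sort | uniq -c | sort -rn)
printf '%s\n' "$ranked"

# uniq -c pads the count with leading spaces; awk trims them
# so the top entry can be used as a plain string.
top=$(printf '%s\n' "$ranked" | head -n 1 | awk '{ print $1, $2, $3 }')
printf 'most frequent: %s\n' "$top"
```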
5.0 Ignore case
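With the -i option, uniq ignores differences in case when comparing lines. The input should then be sorted with sort -f, which also ignores case, so that lines differing only in case end up next to each other. The output below appears to come from a version of names in which one Jane Doe entry is spelled Jane doe; with the file exactly as listed earlier, -i would not change the result.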
$ sort -f names | uniq -ic
1 Erika Mustermann
1 Jame Doe
2 Jane doe
1 Joe Bloggs
4 John Doe
1 John Roe
1 Max Mustermann
1 Richard Roe
1 Tommy Atkins

$ sort -f names | uniq -icd
2 Jane doe
4 John Doe
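The following sketch shows why sort -f goes together with uniq -i. The mixed-case input is made up, and LC_ALL=C is set only to make the sort order predictable for this illustration.

```shell
# Plain sort in the C locale puts uppercase before lowercase, so "jane doe"
# lands after "John Roe" and is not adjacent to the other two spellings.
plain_groups=$(printf 'jane doe\nJohn Roe\nJane Doe\nJANE DOE\n' \
    | LC_ALL=C sort | uniq -i | awk 'END { print NR }')

# sort -f ignores case while sorting, so all three spellings are adjacent
# and uniq -i can merge them into one group.
folded_groups=$(printf 'jane doe\nJohn Roe\nJane Doe\nJANE DOE\n' \
    | LC_ALL=C sort -f | uniq -i | awk 'END { print NR }')

printf 'groups after sort:    %s\n' "$plain_groups"
printf 'groups after sort -f: %s\n' "$folded_groups"
```

Which spelling uniq -i prints for a merged group is simply the first one it reads, so it depends on the sort order.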
6.0 Skip fields, characters
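We can ask uniq to skip a number of leading fields before comparing lines with the -f option, where fields are separated by blanks. The following command sorts on the second field and skips the first field when comparing, so it prints one line per last name.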
$ sort -k2,2 names | uniq -f1
Tommy Atkins
Joe Bloggs
Jame Doe
Erika Mustermann
John Roe
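Skipping fields is also handy when each line starts with a value that always differs, such as a timestamp. The sketch below uses made-up log lines and deliberately leaves them unsorted, so only consecutive repeats are collapsed.

```shell
# -f1 skips the first field (the timestamp) and compares the rest of the line.
events=$(printf '10:01 disk full\n10:02 disk full\n10:05 cpu high\n10:07 disk full\n' \
    | uniq -f1)
printf '%s\n' "$events"
```

The 10:02 line is dropped as a repeat of 10:01, while the 10:07 line is kept because it does not directly follow another disk full line.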
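Similarly, we can skip a number of characters at the beginning of each line with the -s option. In the following example, -s5 skips the four-character code and the space after it, so lines are compared by name only.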
$ cat class
AC12 John Doe
AC13 John Doe
RA11 Jane Doe
RA12 Jane Doe
AP12 John Roe
AL14 Richard Roe
AL15 Richard Roe
YM17 Tommy Atkins
AS12 Max Mustermann
PT14 Erika Mustermann
DE12 Joe Bloggs

$ sort -k2 class | uniq -s5
PT14 Erika Mustermann
RA11 Jane Doe
DE12 Joe Bloggs
AC12 John Doe
AP12 John Roe
AS12 Max Mustermann
AL14 Richard Roe
YM17 Tommy Atkins
We can limit the comparison to the first N characters of each line with the -w option. In the following output, lines are compared on their first four characters only, so John Roe is treated as a duplicate of John Doe and dropped.
$ sort names | uniq -w4
Erika Mustermann
Jame Doe
Jane Doe
Joe Bloggs
John Doe
Max Mustermann
Richard Roe
Tommy Atkins
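Combining -w with -c shows how many lines share each prefix. This is a sketch on made-up input, and it assumes GNU uniq, because -w is not part of POSIX uniq.

```shell
# Compare only the first four characters, and count the lines in each group.
# awk's $1 = $1 rebuilds each line, which removes the padding added by -c.
prefix_counts=$(printf 'John Doe\nJohn Doe\nJohn Roe\nMax Mustermann\n' \
    | uniq -c -w4 | awk '{ $1 = $1; print }')
printf '%s\n' "$prefix_counts"
```

All three lines beginning with John are counted as one group, and the line printed for the group is the first one read.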