Basic bash concepts VI Text Processing Hackerrank

Before this post you have learnt:
1. Basic bash hackerrank
2. Shell beginner
3. Bash II chapter
4. Bash III chapter
5. Bash IV chapter
6. Bash V chapter

In this post we will discuss the ‘uniq’ command, which is used for text processing.

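What is the uniq command in Linux/Unix?

The uniq command removes or detects duplicate entries in a file. In other words, it reports or filters out repeated lines. It only compares adjacent lines, so repeats that are not next to each other are treated as separate entries unless the input is sorted first.

Basic Syntax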

uniq [options] [input_file [output_file]]

For example, if we have:

cat abc.txt
sd
ss
dd
sd
sd
ss
ds
ss
dd

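and we run

uniq abc.txt

then only the two consecutive sd lines are collapsed, because uniq compares each line with the one just before it. The output is:

sd
ss
dd
sd
ss
ds
ss
dd

To remove every duplicate, sort the file first so that identical lines sit next to each other:

sort abc.txt | uniq
dd
ds
sd
ss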
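
Since uniq looks only at neighbouring lines, it helps to run it with and without sort on the same data. Here is a minimal sketch; the temp file and the variable names are only for illustration and hold the same nine lines as abc.txt.

```shell
# Recreate the sample data in a throwaway file (path chosen by mktemp).
tmp=$(mktemp)
printf '%s\n' sd ss dd sd sd ss ds ss dd > "$tmp"

# uniq alone: only the consecutive "sd sd" pair collapses.
adjacent_only=$(uniq "$tmp")

# sort first: identical lines become neighbours, so every repeat is removed.
fully_deduped=$(sort "$tmp" | uniq)

printf 'uniq alone:\n%s\n\nsort | uniq:\n%s\n' "$adjacent_only" "$fully_deduped"
rm -f "$tmp"
```

The first result still has eight lines, the second has four.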

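To count how many times each line occurs, use -c. Sort first so that the counts cover the whole file and not just consecutive runs:

 sort abc.txt | uniq -c
      2 dd
      1 ds
      3 sd
      3 ss

The count is right-aligned, so each output line starts with padding spaces.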

To find out which lines are duplicated, use -d. It prints one copy of each repeated line and leaves out the lines that occur only once:

 sort abc.txt | uniq -d
dd
sd
ss

To print every copy of each duplicated line, use -D (GNU uniq):

 sort abc.txt | uniq -D
dd
dd
sd
sd
sd
ss
ss
ss

To print only the lines that are never repeated, use -u:

 sort abc.txt | uniq -u
ds
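
Because uniq groups only adjacent lines, -d and -u report on consecutive runs, not on the file as a whole. A small sketch on the same nine lines, unsorted and then sorted (the temp file and variable names are illustrative):

```shell
tmp=$(mktemp)
printf '%s\n' sd ss dd sd sd ss ds ss dd > "$tmp"

# Unsorted: the only run longer than one line is the consecutive "sd sd".
dups_unsorted=$(uniq -d "$tmp")
singles_unsorted=$(uniq -u "$tmp")

# Sorted: every repeated value forms a single run.
dups_sorted=$(sort "$tmp" | uniq -d)
singles_sorted=$(sort "$tmp" | uniq -u)

printf 'unsorted -d: %s\n' "$dups_unsorted"
printf 'sorted   -d: %s\n' "$(printf '%s' "$dups_sorted" | tr '\n' ' ')"
printf 'sorted   -u: %s\n' "$singles_sorted"
rm -f "$tmp"
```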

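There are more options that control how two lines are compared:

 uniq -w 2 abc.txt   /// compare only the first 2 characters of each line (GNU uniq)
 uniq -s 2 abc.txt   /// skip the first 2 characters before comparing
 uniq -f 2 abc.txt   /// skip the first 2 whitespace-separated fields before comparing

 uniq -i abc.txt     /// ignore case when comparing (case insensitive removal)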
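
The comparison options are easier to follow on tiny inputs. In the sketch below the sample lines are made up for illustration, and -w is a GNU extension that may be missing from other uniq implementations.

```shell
# -w N (GNU): compare only the first N characters.
# "abX" and "abY" share the prefix "ab", so they count as one run.
first_two=$(printf '%s\n' abX abY cdZ | uniq -w 2)

# -s N: skip the first N characters, then compare the rest.
# "01apple" and "02apple" both reduce to "apple".
skip_chars=$(printf '%s\n' 01apple 02apple 03pear | uniq -s 2)

# -f N: skip the first N blank-separated fields, then compare the rest.
# "mon error" and "tue error" differ only in the skipped field.
skip_field=$(printf '%s\n' 'mon error' 'tue error' 'wed ok' | uniq -f 1)

# -i: ignore case; the first line of each run is the one printed.
ignore_case=$(printf '%s\n' aa AA Aa bb | uniq -i)

printf '%s\n' "$first_two" "$skip_chars" "$skip_field" "$ignore_case"
```

In each case uniq prints the first line of a run, so the survivors are abX, 01apple, "mon error" and aa.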

Now let’s practice with the HackerRank uniq challenges.


Q1.
If this is the file test.txt:

00
00
01
01
00
00
02
02

Given a text file, remove the consecutive repetitions of any line. The expected output is:

00
01
00
02

The second run of 00 is kept because it is not adjacent to the first one.


uniq
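
A quick way to check this is to feed the sample lines to uniq on standard input, the way the challenge does. In this sketch the variable name is illustrative:

```shell
# Pipe the Q1 sample into uniq; each consecutive run collapses to one line.
q1_output=$(printf '%s\n' 00 00 01 01 00 00 02 02 | uniq)
printf '%s\n' "$q1_output"
```

Each run shrinks to a single line, and a value that shows up again later (00 here) is printed again.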

Q2. Given a text file, count the number of times each line repeats itself. Only consider consecutive repetitions. Display the space separated count and line, respectively. There shouldn’t be any leading or trailing spaces. Please note that the uniq -c command by itself will generate the output in a different format than the one expected here.
Sample Input

00
00
01
01
00
00
02
02
03
aa
aa
aa
Sample Output

2 00
2 01
2 00
2 02
1 03
3 aa
Explanation

00 is repeated twice
01 is repeated twice
00 is repeated twice
02 is repeated twice
03 occurs once
aa is repeated thrice

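uniq -c alone is almost the answer, but it pads the count with leading spaces, so the padding has to be stripped.
Best solution

uniq -c | cut -c7-

This relies on the count column of GNU uniq being 7 characters wide, so it only works while every count is below 10. The same is true of:

uniq -c | colrm 1 6

Alternatives that do not depend on the column width:

uniq -c | tr -s ' ' | cut -d ' ' -f2-

or

uniq -c | sed 's/^[[:space:]]*//'

or (xargs also interprets quotes and backslashes, so use it only on simple input)

uniq -c | xargs -L 1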
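
To see the difference between trimming by column and trimming by pattern, here is a small sketch with a run longer than nine lines. The input is made up for illustration, and the cut result described in the comment assumes GNU uniq's 7-character count column.

```shell
# Twelve consecutive "aa" lines followed by a single "bb".
input=$(for i in 1 2 3 4 5 6 7 8 9 10 11 12; do echo aa; done; echo bb)

# Pattern-based: strip whatever leading blanks uniq -c emits.
with_sed=$(printf '%s\n' "$input" | uniq -c | sed 's/^[[:space:]]*//')

# Column-based: keeps column 7 onward, which with GNU uniq
# drops the first digit of a two-digit count.
with_cut=$(printf '%s\n' "$input" | uniq -c | cut -c7-)

printf 'sed:\n%s\n\ncut -c7-:\n%s\n' "$with_sed" "$with_cut"
```

The sed version keeps the full count whatever its width, which is why it is the safer choice when the input size is unknown.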

Q3.
Given a text file, count the number of times each line repeats itself (only consider consecutive repetitions). Display the count and the line, separated by a space. There shouldn’t be leading or trailing spaces. Please note that the uniq -c command by itself will generate the output in a different format.

This time, compare consecutive lines in a case insensitive manner. So, if a line X is followed by case variants, the output should count all of them as the same (but display only the form X in the second column).

So, as you might observe in the case below: aa, AA and Aa are all counted as instances of ‘aa’.
Sample Input

00
00
01
01
00
00
02
02
03
aa
AA
Aa
Sample Output

2 00
2 01
2 00
2 02
1 03
3 aa
Explanation

00 is repeated twice
01 is repeated twice
00 is repeated twice
02 is repeated twice
03 occurs once
aa is repeated thrice (if we ignore case – AA, Aa are the same as ‘aa’)


uniq -c -i | cut -c7-

(cut -c7- assumes GNU uniq's 7-character count column and counts below 10.)

or

uniq -c -i | tr -s ' ' | cut -d ' ' -f2-
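
As a check against the sample, the sketch below runs the Q3 input through uniq -c -i and strips the padding with sed. The variable name is illustrative, and sed is used here only because it does not depend on the width of the count column.

```shell
# aa, AA and Aa form one case-insensitive run; uniq prints the first form.
q3_output=$(printf '%s\n' 00 00 01 01 00 00 02 02 03 aa AA Aa \
  | uniq -c -i \
  | sed 's/^[[:space:]]*//')
printf '%s\n' "$q3_output"
```

The last line comes out as "3 aa", because uniq displays the first line of each run.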



Q4.

Given a text file, display only those lines which are not followed or preceded by identical replications.

Sample Input

A00
a00
01
01
00
00
02
02
A00
03
aa
aa
aa
Sample Output

A00
a00
A00
03
Explanation

The comparison is case sensitive, so the first instance of “A00” and “a00” are considered different, hence unique.
The next instance of A00 is succeeded and preceded by different lines, so that is also included in the output.
The same holds true for 03 – it is succeeded and preceded by different lines, so that is also included in the output.


uniq -u

or
when you forget the u option 😀
 uniq -c | tr -s ' ' | cut -d ' ' -f2- | grep '^1 ' | cut -d ' ' -f2-
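
Both routes can be checked against the Q4 sample. In this sketch the variable names are illustrative. The long pipeline anchors grep on '^1 ' with a trailing space, so that counts such as 10 or 12 are not mistaken for 1.

```shell
sample=$(printf '%s\n' A00 a00 01 01 00 00 02 02 A00 03 aa aa aa)

# Direct: -u keeps only the lines that form a run of length one.
direct=$(printf '%s\n' "$sample" | uniq -u)

# Long way: count the runs, trim the padding, keep count == 1, drop the count.
long_way=$(printf '%s\n' "$sample" \
  | uniq -c \
  | sed 's/^[[:space:]]*//' \
  | grep '^1 ' \
  | cut -d ' ' -f2-)

printf '%s\n' "$direct"
```

Both variables end up holding the same four lines: A00, a00, A00 and 03.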

All rights reserved. No part of this Post may be copied, distributed, or transmitted in any form or by any means, without the prior written permission of the website admin, except in the case of brief quotations embodied in critical reviews and certain other noncommercial uses permitted by copyright law. For permission requests, write to the owner, addressed “Attention: Permissions Coordinator,” to the admin @coderinme

A web developer(Front end and Back end), and DBA at csdamu.com. Currently working as Salesforce Developer @ Tech Matrix IT Consulting Private Limited. Check me @about.me/s.saifi
