Sed, Grep and Awk

Sed, Grep and Awk are true *nix tools, known for their awkward names and equally awkward syntax. They offer the most immediate access to regular expressions (REs), which are themselves worth learning. Even Perl, their attempted replacement, is known for producing useful yet unreadable code. Awkward as they are, their usefulness cannot be ignored, and learning how to use each will aid you in your ascension to line-processing supremacy. Each is best used in the following manner:

  • Grep: Matching
  • Sed: Replacing and Line Manipulation
  • Awk: Advanced Line Processing
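
As a rough illustration of that division of labour, the sketch below runs all three tools over the same made-up input. The `name:role` records and the variable names are assumptions for the example, not taken from any real system.

```shell
# Hypothetical sample data: one "name:role" record per line.
sample='alice:admin
bob:user
carol:admin'

# grep: keep only the lines that match a pattern.
admins=$(printf '%s\n' "$sample" | grep ':admin$')

# sed: rewrite text on each line.
renamed=$(printf '%s\n' "$sample" | sed 's/:user$/:guest/')

# awk: split each line into fields and compute over them.
count=$(printf '%s\n' "$sample" | awk -F: '$2 == "admin" { n++ } END { print n }')

printf '%s\n' "$admins" "$renamed" "$count"
```
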
# Insert 'Beginning' at the start of a file, and 'Ending' at the end
sed "1s/\(.*\)/Beginning\n\1/;\$a\\Ending"
 
# Escape shell metacharacters active within double quotes
# ('#' is the delimiter so the '/' inside the bracket expression cannot end the pattern early)
sed 's#\([\\/`"$^.+{}]\)#\\\1#g'
 
#Replace all literal newlines with their representation '\n'
sed -e :a -e '$!N;s/\n/\\n/;ta'
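
The one-liner above is a small loop: `:a` sets a label, `$!N` appends the next input line to the pattern space on every line but the last, the substitution rewrites the embedded newline as the two characters `\n`, and `ta` jumps back to the label whenever a substitution happened. A quick check on made-up three-line input (the `joined` variable is only for illustration):

```shell
# Join three lines into one, with a literal backslash-n between them.
joined=$(printf 'one\ntwo\nthree\n' | sed -e :a -e '$!N;s/\n/\\n/;ta')
printf '%s\n' "$joined"   # one\ntwo\nthree
```
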
 
# Filter out URL parameters
sed 's_=[^&]*\(&\|$\)_=\1_g'
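
To see what the URL parameter filter does, here is a sketch on a hypothetical query string. It relies on `\|` alternation, which is a GNU sed extension to basic regular expressions, so other sed implementations may behave differently.

```shell
# Hypothetical query string; the parameter names are made up.
query='user=alice&token=s3cret&page=2'

# Keep each parameter name but drop its value.
stripped=$(printf '%s\n' "$query" | sed 's_=[^&]*\(&\|$\)_=\1_g')
printf '%s\n' "$stripped"   # user=&token=&page=
```
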
 
# Escape regular expression metacharacters in a variable
sed 's:[]\[\^\$\.\*\/]:\\&:g'
 
#Replace last comma(,) in each line with 'and'
sed 's#\(.*\),\([^,]*\)#\1 and\2#'
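
The greedy `\(.*\)` is what makes this pick the last comma: it swallows as much of the line as it can while still leaving one comma for the rest of the pattern. A minimal check on a made-up list:

```shell
# Hypothetical input line.
list='red, green, blue'

joined_list=$(printf '%s\n' "$list" | sed 's#\(.*\),\([^,]*\)#\1 and\2#')
printf '%s\n' "$joined_list"   # red, green and blue
```
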
 
#Match phone numbers with area code in any given format and output in format: (nnn) nnn-nnnn
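# Note: POSIX sed has no shorthand character classes such as \d \D \s \S \w \W
# (GNU sed accepts \s \S \w \W as extensions, but not \d), so spell them out as bracket expressions like [0-9]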
sed -e 's#[^0-9]*\([0-9]\{3\}\)[^0-9]*\([0-9]\{3\}\)[^0-9]*\([0-9]\{4\}\)#(\1) \2-\3#'
grep -o '(\?[0-9]\{3\})\? \?[0-9]\{3\}-\?[0-9]\{4\}'
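
A small sketch of the sed normaliser on a few made-up numbers in different formats. The sample numbers are assumptions for the example, and each should come out in the same `(nnn) nnn-nnnn` shape:

```shell
# Three hypothetical spellings of the same number, one per line.
formatted=$(printf '%s\n' '555.123.4567' '555 123 4567' '(555)123-4567' |
  sed -e 's#[^0-9]*\([0-9]\{3\}\)[^0-9]*\([0-9]\{3\}\)[^0-9]*\([0-9]\{4\}\)#(\1) \2-\3#')
printf '%s\n' "$formatted"
```
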
 
# Match CVE numbers (the sequence part has four or more digits)
grep -o 'CVE-[0-9]\{4\}-[0-9]\{4,\}'
 
# Match input fields with a hidden input type in an HTML file
grep -io '<input[^>]*hidden[^>]*>' hidden.csv | sed 's#""#"#g;s#value="[^"]*"#value=""#g' | sort -u | less
 
#Parse IIS Logs for a certain IP ADDRESS (127.0.0.1)
grep 127.0.0.1 *.log | grep -v -e ".gif" -e ".jpg" -e ".ico" -e ".css" -e ".pdf" -e "404" | cut -d' ' -f2,4,5,6,10 | awk '{printf "%s %-04s http://site.com%s?%s  Ref:(%s)\n",$1,$2,$3,$4,$5}' | tr -d '-' | sed 's/Ref:()//g' | sed 's/\? //g' | awk '{printf "%s %-04s %-70s\t%s\n",$1,$2,$3,$4}' 
 
#Find all links in a file
grep -E -IRo '(((http(s)?|ftp|telnet|news|gopher)://|mailto:)[^\(\)<\"'\''[:space:]]+)' 
 
#Pretty printing fields with awk
awk -F':' '{printf "%-16s %-16s\n",$1,$2}'
 
# 'uniq' the file using only the first field
awk '!x[$1]++'
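
This works because an unset awk array entry evaluates to 0: `!x[$1]++` is true the first time a key appears, so the line is printed, and the post-increment makes it false for every later line with the same first field. A minimal sketch on made-up input:

```shell
# Hypothetical input: the first field repeats, the second differs.
deduped=$(printf '%s\n' 'a 1' 'b 2' 'a 3' 'b 4' | awk '!x[$1]++')
printf '%s\n' "$deduped"
# a 1
# b 2
```

Unlike `uniq`, this does not need sorted input, and it keeps the first occurrence of each key in its original position.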
 
# uniq 3rd field in a file
awk '{ if (! third_col[$3]) print $0;  third_col[$3]++; }'
 
# Lists directories where the tree contains one or more files:
find ./ -type f | awk -F/ '{$NF=""} d[$0]++==0' OFS=/
 
# How many lines in a file that do not start with # and are not empty would fit in a tweet (140 characters)?
grep -v '^#\|^$' shell1liners.sh | awk '{if (length<141) {print "Tweet("length"): " $0;}}'
grep -v '^#\|^$' shell1liners.sh | awk '{if (length>140) {print "No Tweet("length"): " $0;}}'
Posted by The Shell Shakespear
