Shell Scripting (Part 1)

Do we know the shell very well, or do we barely know it? Consider: sed 's@\([A-Za-z]*\)[.:#,]@\1@g'; what do "$$", "$?", "$!" and "$-" mean? Why doesn't the command ( cd /etc ) change my current directory to /etc? content=$(<file1); print $content; exec <input_file ...

The good news is that most SAs don't need to know the shell very well to do most daily tasks, because UNIX offers both smart and awkward ways to get things done. However, hardcore UNIX skills are essential for working efficiently on UNIX, and they make the fundamental difference (the glass wall) between junior and senior SAs.

What we know about UNIX, even after 10 years of working with it, is still like a drop of water from the sea... I'm still learning something new about UNIX every day. The pleasure of acquiring new skills has inspired me and makes me feel proud of being a UNIX administrator... Working and living a life is one thing; working with passion and living a happy, productive life is another... I'm very happy, and hopefully productive, every day.


Command Substitution

· The standard output from a command list enclosed in parentheses preceded by a dollar sign ( $( list ) ), or in a brace group preceded by a dollar sign ( ${ list ;} , ksh93 only), or in a pair of grave accents ( ` ` ) (the obsolete form) may be used as part or all of a word; trailing newlines are removed.
· The command substitution $( cat file ) can be replaced by the equivalent but faster $( <file ) in ksh and bash.
· In ksh93, the command substitution $( n <# ) expands to the current byte offset for file descriptor n.
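As a rough illustration of the forms above, here is a minimal sketch assuming bash or ksh. The sample file created with mktemp and all variable names are made up for the example.

```shell
# Create a small sample file that ends in several newlines.
sample=$(mktemp)
printf 'hello\n\n\n' > "$sample"

# Modern form: trailing newlines are stripped from the captured output.
modern=$(cat "$sample")

# Obsolete grave-accent form, same result.
legacy=`cat "$sample"`

# $(<file) reads the file without starting cat (ksh and bash only).
fast=$(<"$sample")

# The $( ) form nests without any extra escaping.
name=$(basename "$(dirname /usr/local/bin/tool)")

echo "modern=[$modern] legacy=[$legacy] fast=[$fast] name=[$name]"
rm -f "$sample"
```

The nesting on the last substitution is the main practical reason to prefer $( ) over grave accents, which need backslashes when nested.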


Example 1, count the word frequencies in an English text file. There are many ways to implement this; I'm using a lazy way with command substitution. A classic way of doing this, from Brian Kernighan's book, is listed at the end of this example.

1. Turn the text file into one word per line:
$ for i in $( <my_txtfile )
do
echo "$i"
done | sort > txt.1

2. Count each word's frequency:
$ for i in $(uniq txt.1)
do
echo "$i: $(grep -c "\b$i\b" txt.1)"
done

(note: we need word boundaries in grep's pattern to avoid counting words like 'them' or 'theory' as 'the'. The pattern must be quoted so that the shell passes the backslashes through to grep; \b is understood by GNU grep.)

output:
succeed.: 2
succeeded.: 1
symbol: 2
symbol.: 1
that: 1
the: 36
The: 2
them:: 1
those: 1
to: 17

Things to improve in the above example:
1. Count "symbol." the same as "symbol" - hint: sed 's/\([A-Za-z]*\)[.:#,]/\1/'.
2. Sort the output by word frequency.
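One possible way to make both improvements, offered as a sketch and not as the only answer: strip the punctuation with the hinted sed expression, then let sort and uniq -c do the counting. The sample text stands in for my_txtfile and is invented for the example; LC_ALL=C is set only to make the sort order predictable.

```shell
# Sample text standing in for my_txtfile (hypothetical content).
txt=$(mktemp)
printf 'the symbol. The symbol, the end\n' > "$txt"

# One word per line, drop a trailing punctuation mark with the hinted sed,
# count duplicates, then sort by count (highest first) and by word.
freq=$(tr -s ' \t' '\n\n' < "$txt" |
    sed 's/\([A-Za-z]*\)[.:#,]/\1/' |
    LC_ALL=C sort |
    uniq -c |
    LC_ALL=C sort -k1,1nr -k2 |
    awk '{ print $2 ": " $1 }')

echo "$freq"
rm -f "$txt"
```

Note that "the" and "The" are still counted separately here; folding case would need one more step such as tr A-Z a-z.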

BTW, here is the classic solution to this problem. It is on page 107 of Brian Kernighan's book.

cat $* |
tr -sc A-Za-z '\012' |
sort |
uniq -c |
sort -nr

output:
36 the
21 command
18 of
17 to
12 a
8 is
8 file
5 shell

In this classic solution, two key commands are used.

The first is tr -sc A-Za-z '\012', which turns every non-letter into a newline ('\012') and squeezes each run of newlines into a single one. The -c option complements (negates) the set of characters in the expression 'A-Za-z', and -s squeezes repeated output characters.

The second is 'uniq -c', which prefixes each line with its number of occurrences.
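To see the two key commands at work on something small, here is a minimal sketch using a made-up sentence instead of a file. The awk step is only there to strip the padding that uniq -c puts in front of each count.

```shell
# -c complements the set (everything that is not a letter) and
# -s squeezes each run of the resulting newlines into one.
words=$(printf 'Hello, world! Hello.\n' | tr -sc 'A-Za-z' '\012')

# uniq -c needs sorted input; it prefixes each line with its count.
counts=$(printf '%s\n' "$words" | LC_ALL=C sort | uniq -c | sort -nr |
    awk '{ print $1, $2 }')

echo "$words"
echo "$counts"
```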

Also, '\012' is the octal value of the special character 'newline', or '\n'. In case we want to know the octal values of other common special characters, here is a list:

Character          Octal Value
Bell               7
Backspace          10
Tab                11
Newline            12
Linefeed           12
Formfeed           14
Carriage Return    15
Escape             33
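These octal values can be used directly in printf format strings. The following is a small sketch with invented field names, assuming a POSIX printf and cut.

```shell
# \011 is Tab: build a two-field record and split it with cut,
# whose default field delimiter is Tab.
record=$(printf 'name\011value')
field2=$(printf '%s\n' "$record" | cut -f2)

# \012 is Newline: two escapes produce two lines.
lines=$(printf 'a\012b\012' | wc -l)

echo "field2=$field2 lines=$lines"
```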






Example 2, read a whole file into a variable and print it (print is a ksh built-in):
$ content=$( <web.py ); print $content

#! /usr/bin/env python import sys, webbrowser def main(): args = sys.argv[1:] if not args: print "Usage: %s querystring" % sys.argv[0] return list = [] for arg in args: if '+' in arg: arg = arg.replace('+', '%2B') if ' ' in arg: arg = '"%s"' % arg arg = arg.replace(' ', '+') list.append(arg) s = '+'.join(list) url = "http://www.google.com/search?q=%s" % s webbrowser.open(url) if __name__ == '__main__': main()

Try $(<$content) in the shell and see what happens.
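The script in Example 2 comes out on a single line because $content is not quoted: the shell splits the value into words and the words are printed separated by single spaces. Here is a minimal sketch of the difference, using a made-up two-line file and cat and echo so that it also runs in shells without $(<file) or print.

```shell
# Sample two-line file (hypothetical content).
tmp=$(mktemp)
printf 'line one\nline two\n' > "$tmp"
content=$(cat "$tmp")

# Unquoted: newlines are treated as word separators and are lost.
unquoted=$(echo $content)

# Quoted: the value is passed as one word, newlines intact.
quoted=$(echo "$content")

echo "unquoted: $unquoted"
echo "quoted:"
echo "$quoted"
rm -f "$tmp"
```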


Shell Special Characters

Here is the meaning of some of them:

$$ The pid of the current shell (the shell running the script)
$? The exit status of the last command executed in the foreground
$! The pid of the last command sent to the background
$- The current shell options in effect (see the set manpage)

$# The number of positional parameters passed to the command
$* Expands to all positional parameters passed to the command
$@ Expands to all positional parameters passed to the command, but individually quoted when "$@" is used.
$0 The name of the shell or the shell script being executed.
$1 The value of the first positional parameter passed to the command. $2 is the second positional parameter, etc. up to $9; later parameters are written ${10}, ${11}, and so on.
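The difference between "$@", "$*" and an unquoted $* is easiest to see by counting arguments. The following is a small sketch; count_args and demo are hypothetical helper names, not standard commands.

```shell
# Helper (hypothetical name) that reports how many arguments it received.
count_args() { echo "$#"; }

demo() {
    n_at=$(count_args "$@")     # each parameter stays one word
    n_star=$(count_args "$*")   # all parameters joined into one word
    n_bare=$(count_args $*)     # unquoted: split again on whitespace
    first=$1
}
demo "one two" three

# $? holds the exit status of the last foreground command.
false || status=$?

# $! holds the pid of the last background job; $$ is this shell's pid.
sleep 0 &
bg_pid=$!
wait "$bg_pid"

echo "at=$n_at star=$n_star bare=$n_bare first=[$first] status=$status"
echo "shell pid=$$ background pid=$bg_pid"
```

In scripts, "$@" is almost always the form to use when passing arguments along, because it is the only one that keeps "one two" as a single argument.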


( ) enclose command(s) to be launched in a separate shell (subshell). E.g. ( dir ).
{ } enclose a group of commands to be launched by the current shell. E.g. { dir; }. It needs the spaces, and the last command must end with a semicolon or a newline.
&& is an "AND" connecting two commands. command1 && command2 will execute command2 only if command1 exits with the exit status 0 (no error). For example: cat file1 && cat file2 will display file2 only if displaying file1 succeeded.
|| is an "OR" connecting two commands. command1 || command2 will execute command2 only if command1 exits with a non-zero exit status (an error). For example: cat file1 || cat file2 will display file2 only if displaying file1 didn't succeed.
\ ' " and ` are used for quoting.
< and > are used for input and output redirection.
| pipes the output of the command to the left of the pipe symbol "|" to the input of the command on the right of the pipe symbol.
; separates multiple commands written on a single line.
& causes the preceding command to execute in the background (i.e., asynchronously, as its own separate process) so that the next command does not wait for its completion.
* when a filename is expected, it matches any filename except those starting with a dot (or any part of a filename, except the initial dot).
? when a filename is expected, it matches any single character.
[ ] when a filename is expected, it matches any single character enclosed inside the pair of [ ].
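The subshell rule is what answers the question from the introduction about ( cd /etc ). Here is a minimal sketch of the difference between ( ) and { }, plus the && and || behaviour described above. It uses / as the target directory only because it exists everywhere.

```shell
start=$PWD

# Parentheses run in a subshell, so the cd is lost when the subshell exits.
( cd / )
after_subshell=$PWD

# Braces run in the current shell, so the cd sticks.
{ cd /; }
after_group=$PWD
cd "$start"

# && runs the right side only on success, || only on failure.
true  && and_result="ran"
false || or_result="ran"
false && skipped="ran"

echo "after ( ): $after_subshell"
echo "after { }: $after_group"
echo "and=$and_result or=$or_result skipped=${skipped:-no}"
```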

Shell's I/O

>& : The notation >& specifies output redirection to the file associated with the file descriptor that follows. echo "Invalid number of arguments" >&2 writes to the file behind file descriptor 2, which is standard error.

">&-" closes standard output. If preceded by a file descriptor, the associated file is closed instead. E.g. the output of "ls >&-" goes nowhere, since standard output is closed by the shell before ls is executed.

"<&-" does the same for standard input.

Use exec to redirect I/O:

exec <input_file

causes all subsequent commands that read from standard input to read from "input_file" instead. Use "exec </dev/tty" to switch standard input back to the terminal.

exec > /tmp/output

causes all subsequent commands that write to standard output to write to /tmp/output.

exec 2> /tmp/errors

causes all subsequent commands that write to standard error to write to /tmp/errors.
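A common companion to exec redirection is saving the original descriptor first so it can be restored later. The following is a sketch under a few assumptions: file descriptors 3 and 4 are free, the temporary files come from mktemp, and the input file is opened on a spare descriptor instead of replacing standard input itself.

```shell
outfile=$(mktemp)
infile=$(mktemp)
printf 'first line\nsecond line\n' > "$infile"

# Save the current stdout on fd 3, then send stdout to a file.
exec 3>&1
exec > "$outfile"
echo "captured line"
# Restore stdout from fd 3 and close the spare descriptor.
exec 1>&3 3>&-

# Open the input file on fd 4, read one line from it, then close it.
exec 4< "$infile"
read -r first <&4
exec 4<&-

# >&2 sends a message to standard error; capture it to show where it went.
err=$( { echo "Invalid number of arguments" >&2; } 2>&1 )

captured=$(cat "$outfile")
echo "captured=[$captured] first=[$first] err=[$err]"
rm -f "$outfile" "$infile"
```

Saving and restoring through a spare descriptor works even when the script is not attached to a terminal, where /dev/tty would not be available.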

In-line Input Redirection (here document) <<

command <<word

The shell will use the lines that follow as the standard input for command, up until a line that contains just word is found.

e.g.

wc -l <<EOF
>a
>b
>c
>d
>EOF
4
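One detail worth knowing is that variables inside a here document are expanded unless the delimiter word is quoted. Here is a minimal sketch; the variable name and the text are invented for the example.

```shell
name=world

# Unquoted delimiter: $name is expanded inside the here document.
expanded=$(cat <<EOF
hello $name
EOF
)

# Quoted delimiter: the text is taken literally.
literal=$(cat <<'EOF'
hello $name
EOF
)

# The wc -l example from above, captured into a variable.
count=$(wc -l <<EOF
a
b
c
d
EOF
)

echo "expanded=[$expanded] literal=[$literal] count=$count"
```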

Aliasing
(The following aliases are compiled into the Korn shell but can be unset or redefined)
autoload='typeset -fu'
command='command '
compound='typeset -C'
fc=hist
float='typeset -lE'
functions='typeset -f'
hash='alias -t - -'
history='hist -l'
integer='typeset -li'
nameref='typeset -n'
nohup='nohup '
r='hist -s'
redirect='command exec'
source='command .'
stop='kill -s STOP'
suspend='kill -s STOP $$'
times='{ { time;} 2>&1;}'
type='whence -v'

Shell's (ksh and bash) Pattern-Matching Operators


Operator                Meaning
${variable#pattern}     If the pattern matches the beginning of the variable's value, delete the shortest part that matches and return the rest.
${variable##pattern}    If the pattern matches the beginning of the variable's value, delete the longest part that matches and return the rest.
${variable%pattern}     If the pattern matches the end of the variable's value, delete the shortest part that matches and return the rest.
${variable%%pattern}    If the pattern matches the end of the variable's value, delete the longest part that matches and return the rest.

These can be hard to remember, so here's a handy mnemonic device: # matches the front because number signs precede numbers; % matches the rear because percent signs follow numbers. Another mnemonic comes from the typical placement (in the U.S.A., anyway) of the # and % keys on the keyboard. Relative to each other, the # is on the left, and the % is on the right.

The classic use for pattern-matching operators is in stripping components from pathnames, such as directory prefixes and filename suffixes. With that in mind, here is an example that shows how all of the operators work. Assume that the variable path has the value /home/billr/mem/long.file.name; then:

Expression Result
${path##/*/} long.file.name
${path#/*/} billr/mem/long.file.name
$path /home/billr/mem/long.file.name
${path%.*} /home/billr/mem/long.file
${path%%.*} /home/billr/mem/long

The two patterns used here are /*/, which matches anything between two slashes, and .*, which matches a dot followed by anything.
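The table can be checked directly in the shell. The sketch below repeats the four expressions and adds a common idiom for splitting a path without calling basename or dirname; the variable names other than path are made up.

```shell
path=/home/billr/mem/long.file.name

longest_front=${path##/*/}     # long.file.name
shortest_front=${path#/*/}     # billr/mem/long.file.name
shortest_back=${path%.*}       # /home/billr/mem/long.file
longest_back=${path%%.*}       # /home/billr/mem/long

# Common idiom: split a path into its pieces.
file=${path##*/}               # long.file.name
dir=${path%/*}                 # /home/billr/mem
ext=${file##*.}                # name

echo "$longest_front | $shortest_front"
echo "$shortest_back | $longest_back"
echo "dir=$dir file=$file ext=$ext"
```

Because these expansions are done by the shell itself, they avoid starting an extra process for every path, which matters in loops over many files.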

#!/bin/bash

#echo ${.sh.version}
var='A regular expressions test'

echo "1> //e/#"
echo ${var//e/#}
echo "2> //[^e]/#"
echo ${var//[^e]/#}

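The above script uses the ${variable//pattern/replacement} form, which replaces every match of the pattern in the variable's value. It gives the following output: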

1> //e/#
A r#gular #xpr#ssions t#st
2> //[^e]/#
###e######e###e########e##