Level: Intermediate
Michael Stutz, Author, Consultant
12 Dec 2006
Adopt 10 good habits that improve your UNIX® command-line efficiency, and break away from bad usage patterns in the process. This article takes you step by step through several good, but too often neglected, techniques for command-line operations. Learn about common errors and how to overcome them, and see exactly why these UNIX habits are worth picking up.
When you use a system often, you tend to fall into set usage patterns. Sometimes, you never develop the habit of doing things in the best possible way. Sometimes, you even pick up bad practices that lead to clutter and clumsiness. One of the best ways to correct such inadequacies is to consciously pick up good habits that counteract them. This article suggests 10 UNIX command-line habits worth picking up: good habits that help you break many common usage foibles and make you more productive at the command line in the process. Each habit is described in more detail following the list of good habits.
Ten good habits to adopt are:
- Make directory trees in a single swipe.
- Change the path; do not move the archive.
- Combine your commands with control operators.
- Quote variables with caution.
- Use escape sequences to manage long input.
- Group your commands together in a list.
- Use xargs outside of find.
- Know when grep should do the counting, and when it should step aside.
- Match certain fields in output, not just lines.
- Stop piping cats.
Listing 1 illustrates one of the most common bad UNIX habits around: defining directory trees one at a time.
~ $ mkdir tmp
~ $ cd tmp
~/tmp $ mkdir a
~/tmp $ cd a
~/tmp/a $ mkdir b
~/tmp/a $ cd b
~/tmp/a/b/ $ mkdir c
~/tmp/a/b/ $ cd c
~/tmp/a/b/c $
It is much quicker to use the -p option to mkdir and make all parent directories along with their children in a single command. But even administrators who know about this option are still caught stepping through the subdirectories as they make them on the command line. It is worth your time to consciously pick up the good habit:
~ $ mkdir -p tmp/a/b/c
You can use this option to make entire complex directory trees, not just simple hierarchies, which makes it especially useful inside scripts. For example:
~ $ mkdir -p project/{lib/ext,bin,src,doc/{html,info,pdf},demo/stat/a}
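The braces in this listing are brace expansion. Shells such as bash, ksh, and zsh perform it before mkdir ever runs, so mkdir -p itself only receives a list of paths. The following is a minimal sketch that assumes a shell without brace expansion (such as a strict POSIX sh) and a mktemp that supports -d. It creates the same tree by spelling out the seven paths the brace expression expands to:

```shell
# Build the tree in a throwaway directory so nothing existing is touched.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# project/{lib/ext,bin,src,doc/{html,info,pdf},demo/stat/a} expands to
# these seven paths; spelling them out works with or without brace expansion.
mkdir -p project/lib/ext project/bin project/src \
    project/doc/html project/doc/info project/doc/pdf \
    project/demo/stat/a

# List every directory that was created, intermediate parents included.
find project -type d | sort
```

In bash, running echo on the brace expression prints the same seven paths. That is a quick way to check a complex expression before handing it to mkdir.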
In the past, the only excuse to define directories individually was that your mkdir implementation did not support this option, but this is no longer true on most systems. IBM AIX® mkdir, GNU mkdir, and others that conform to the Single UNIX Specification now have this option.
For the few systems that still lack the capability, use the mkdirhier script (see Resources), which is a wrapper for mkdir that performs the same function:
~ $ mkdirhier project/{lib/ext,bin,src,doc/{html,info,pdf},demo/stat/a}
Another bad usage pattern is moving a .tar archive file to a certain
directory because it happens to be the directory you want to extract it
in. You never need to
do this. You can unpack any .tar archive file into any directory you
like -- that is what the -C option is for. Use the -C option when unpacking an archive file to specify the directory to unpack it in:
~ $ tar xvzf newarc.tar.gz -C tmp/a/b/c
Making a habit of using -C is preferable to moving the archive file to where you want to unpack it, changing to that directory, and only then extracting its contents, especially if the archive file belongs somewhere else. Keep in mind that the directory named by -C must already exist.
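The following is a minimal sketch of the -C habit that can be run safely in a scratch directory. It assumes a tar that supports -z for gzip compression (GNU tar and BSD tar both do) and a mktemp that supports -d. The file names are hypothetical.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Build a small gzip-compressed archive to practice on (hypothetical names).
mkdir src
echo hello > src/greeting.txt
tar -czf newarc.tar.gz src

# The target directory must exist; -C switches into it before extracting.
mkdir -p tmp/a/b/c
tar -xzvf newarc.tar.gz -C tmp/a/b/c

# The archive never moved, and its contents landed under tmp/a/b/c.
ls newarc.tar.gz tmp/a/b/c/src/greeting.txt
```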
You probably already know that in most shells, you can combine
commands on a single command line by placing a semicolon (;) between
them. The semicolon is a shell control operator, and while it
is useful for stringing together multiple discrete commands on a single
command line, it does not work for everything. For example, suppose you
use a semicolon to combine two commands in which the proper execution
of the second command depends entirely upon the successful completion
of the first. If the first command does not exit as you expected, the second command still runs, and fails. Instead, use more appropriate control operators (two of them are described in this section). As long as your shell supports them, it is worth getting into the habit of using them.
Use the && control operator to combine two commands so that the second is run only
if the first command returns a
zero exit status. In other words, if the first command runs
successfully, the second command runs. If the first command fails, the
second command does not run at all. For example:
~ $ cd tmp/a/b/c && tar xvf ~/archive.tar
In this example, the contents of the archive are extracted into the
~/tmp/a/b/c directory unless that directory does not exist. If the
directory does not exist, the
tar command does not run, so nothing is extracted.
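To see the difference between ; and && without touching any archives, the following sketch uses echo in place of tar. The directory name /nonexistent-demo-dir is hypothetical and is assumed not to exist.

```shell
# With ';' the second command runs even though cd failed.
cd /nonexistent-demo-dir 2>/dev/null; echo "ran anyway, still in $PWD"

# With '&&' the second command is skipped because cd returned non-zero.
cd /nonexistent-demo-dir 2>/dev/null && echo "never printed"

# $? holds the status of that last list: non-zero, from the failed cd.
echo "status: $?"
```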
Similarly, the ||
control operator separates two commands and runs the second command
only if the first command returns a non-zero exit status. In other
words, if the first command is successful, the second command does not run. If the first command fails, the second command does run. This operator is often used to test whether a given directory exists and to create it if it does not:
~ $ cd tmp/a/b/c || mkdir -p tmp/a/b/c
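You can also combine the control operators described in this section. The && and || operators have equal precedence and are evaluated from left to right, so each one acts on the exit status of the command that ran most recently:
~ $ cd tmp/a/b/c || mkdir -p tmp/a/b/c && tar xvf ~/archive.tar -C ~/tmp/a/b/c
Because the cd might succeed and change the working directory before tar runs, the -C argument here is given as the absolute path ~/tmp/a/b/c.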
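Because && and || have equal precedence and group from left to right, a || b && c behaves as (a || b) && c. The following sketch substitutes the true and false utilities for cd and mkdir and shows which branches run in each case:

```shell
# a || b && c is parsed as (a || b) && c; each operator tests the
# exit status of whatever ran most recently.

# true succeeds: the || branch is skipped, and && sees status 0.
true || echo "fallback ran" && echo "ran after true"

# false fails: the || branch runs, and && then sees the echo's status 0.
false || echo "fallback ran" && echo "ran after fallback"

# Both fail: && sees the failed fallback, so its command is skipped.
false || false && echo "never printed"
echo "status of that last list: $?"
```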
Always be careful with shell expansion and variable names. It is generally a good idea to enclose variable references in double quotation marks, unless you have a good reason not to. Similarly, if you are directly following a variable name with alphanumeric text, be sure also to enclose the variable name in curly braces ({}) to distinguish it from the surrounding text. Otherwise, the shell interprets the trailing text as part of your variable name, and most likely expands it to an empty string because no variable by that longer name is set. Listing 8 provides examples of quoted and unquoted variables and their effects.
~ $ ls tmp/
a b
~ $ VAR="tmp/*"
~ $ echo $VAR
tmp/a tmp/b
~ $ echo "$VAR"
tmp/*
~ $ echo $VARa

~ $ echo "$VARa"

~ $ echo "${VAR}a"
tmp/*a
~ $ echo ${VAR}a
tmp/a
~ $
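The following sketch reproduces Listing 8 in a scratch directory and captures each expansion in a variable so the results can be compared. The listing assumes that no variable named VARa is set, and the script makes sure of that with unset. The other variable names are only illustrative.

```shell
# Recreate tmp/a and tmp/b from Listing 8 in a throwaway directory.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
mkdir -p tmp/a tmp/b
unset VARa
VAR="tmp/*"

unquoted=$(echo $VAR)       # pathname expansion runs: tmp/a tmp/b
quoted=$(echo "$VAR")       # no expansion inside quotes: tmp/*
wrong=$(echo "$VARa")       # shell reads a variable named VARa: empty
braced=$(echo "${VAR}a")    # braces end the name, quotes keep it literal: tmp/*a
braced_glob=$(echo ${VAR}a) # unquoted pattern tmp/*a matches only tmp/a

printf '%s\n' "$unquoted" "$quoted" "[$wrong]" "$braced" "$braced_glob"
```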
You have probably seen code examples in which a backslash (\) continues a long line over to the next line, and you know that most shells treat what you type over successive lines joined by a backslash as one long line. However, you might not take advantage of this function on the command line as often as you could. The backslash is especially handy if your terminal does not handle multi-line wrapping properly or when less room than usual is left on your command line (such as when you have a long path in the prompt). The backslash is also useful for making sense of long input lines as you type them, as in the following example:
~ $ cd tmp/a/b/c || \
> mkdir -p tmp/a/b/c && \
> tar xvf ~/archive.tar -C ~/tmp/a/b/c
Alternatively, the following configuration also works:
~ $ cd tmp/a/b/c \
> || \
> mkdir -p tmp/a/b/c \
> && \
> tar xvf ~/archive.tar -C ~/tmp/a/b/c
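However you divide an input line over multiple lines, the shell always treats it as one continuous line, because it removes each backslash-newline pair before it parses the command. For this to work, the backslash must be the very last character on the line; a trailing space after it breaks the continuation.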
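One quick way to confirm that backslash-newline pairs vanish before parsing is to write the same command on one line and on several, then compare the results. This is a small sketch using printf, and the variable names are arbitrary:

```shell
# One command on a single line.
a=$(printf '%s,' one two three)

# The same command continued over several lines; each backslash is the
# last character on its line, so the shell joins the lines before parsing.
b=$(printf '%s,' \
    one \
    two \
    three)

[ "$a" = "$b" ] && echo "identical: $a"
```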
Note: In most shells, when you press the up arrow key, the entire multi-line entry is redrawn on a single, long input line.
Most shells have ways to group a set of commands together in a list so that you can pass their sum-total output down a pipeline or otherwise redirect any or all of their streams to the same place. You can generally do this by running the list in a subshell or by running it in the current shell.
Run a list of commands in a subshell
Use parentheses to enclose a list of commands in a single group. Doing so runs the commands in a new subshell and allows you to redirect or otherwise collect the output of the whole, as in the following example:
~ $ ( mkdir -p tmp/a/b/c && cd tmp/a/b/c && \
> VAR=$PWD && cd ~ && tar xvf archive.tar -C "$VAR" ) 2>&1 \
> | mailx -s "Archive contents" admin
In this example, the content of the archive is extracted into the tmp/a/b/c/ directory, while the output of the grouped commands, including a list of extracted files, is mailed to the admin address. The use of a subshell is preferable when you are redefining environment variables or changing directories in your list of commands and you do not want those changes to apply to your current shell.
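The isolation that makes a subshell useful is easy to check. In this minimal sketch, the variable and directory changes made inside the parentheses do not survive once the group finishes. The names VAR and start are only illustrative.

```shell
VAR=outer
start=$PWD

# Inside the parentheses: a new value for VAR and a different directory.
( VAR=inner; cd / && echo "inside:  VAR=$VAR PWD=$PWD" )

# Back in the current shell, neither change is visible.
echo "outside: VAR=$VAR PWD=$PWD"
```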
Run a list of commands in the current shell
Use curly braces ({}) to enclose a list of commands to run in the current
shell. Make sure you include spaces between the braces and the actual
commands, or the shell might not interpret the braces correctly. Also,
make sure that the final command in your list ends with a semicolon, as
in the following example:
Listing 12. Another example of good habit #6: Running a list of commands in the current shell
~ $ { cp -R "${VAR}a" . && chown -R guest:guest a && \
> tar cvf newarchive.tar a; } 2>&1 | mailx -s "New archive" admin
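By contrast, a brace group runs in the current shell, so assignments inside it persist. A single redirection after the closing brace collects the output of every command in the group. One caveat: when a group sits on the left side of a pipe, as in Listing 12, shells generally run it in a subshell anyway. If the assignments need to survive, use a redirection rather than a pipe. The sketch below assumes a mktemp with -d and writes to a hypothetical output file, group.out.

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

count=0
# Note the spaces after '{' and the ';' before '}'.
{ count=1; echo first; echo second; } > group.out

echo "count is now $count"   # 1: the group ran in the current shell
cat group.out                # both lines, captured by one redirection
```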
Use the xargs tool as a filter for making good use of output culled from the find command. The general precept is that a find run provides a list of files that match some criteria. This list is passed on to xargs, which then runs some other useful command with that list of files as arguments, as in the following example:
~ $ find some-file-path some-file-criteria | \
> xargs some-great-command-that-needs-filename-arguments
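Here is a concrete sketch of that pattern, with hypothetical log files standing in for real data. Plain find | xargs splits names on whitespace. The second pipeline therefore uses find -print0 with xargs -0, which GNU and BSD find and xargs support, so that file names containing spaces or newlines are passed intact.

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Hypothetical log files to search, one with a space in its name.
mkdir logs
for n in 1 2 3; do
    echo "line $n" > "logs/app$n.log"
done
echo "spaced" > "logs/app with space.log"

# find supplies the names; xargs turns them into arguments for wc -l.
find logs -name 'app[0-9].log' | xargs wc -l

# NUL-separated names survive spaces and newlines (GNU/BSD extension).
find logs -name '*.log' -print0 | xargs -0 wc -l
```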
However, do not think of xargs as just a helper for find; it is one of those underutilized tools that, once you get into the habit of using it, you want to try on everything, including the following uses.
In its simplest invocation, xargs
is like a filter that takes as input a list (with each member on a
single line). The tool puts those members on a single space-delimited
line:
~ $ xargs
a
b
c
Control-D
a b c
~ $
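Typing Control-D is awkward in a script, so the same behavior can be shown non-interactively with printf. The -n option, which is part of the POSIX xargs specification, limits how many arguments are passed to each run of the command:

```shell
# No command given: xargs runs echo with all input words on one line.
printf 'a\nb\nc\n' | xargs            # a b c

# -n 2: at most two arguments per echo, so echo runs twice.
printf 'a\nb\nc\n' | xargs -n 2 echo  # a b, then c
```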
You can send the output of any tool that outputs file names through xargs to get a list of arguments for some other tool that takes file names as arguments, as in the following example:
~/tmp $ ls -1 | xargs
December_Report.pdf README a archive.tar mkdirhier.sh
~/tmp $ ls -1 | xargs file
December_Report.pdf: PDF document, version 1.3
README: ASCII text
a: directory
archive.tar: POSIX tar archive
mkdirhier.sh: Bourne shell script text executable
~/tmp $
The xargs command is useful for more than passing file names. Use it any time you need to filter text into a single line:
~/tmp $ ls -l | xargs
-rw-r--r-- 7 joe joe 12043 Jan 27 20:36 December_Report.pdf -rw-r--r-- 1 \
root root 238 Dec 03 08:19 README drwxr-xr-x 38 joe joe 354082 Nov 02 \
16:07 a -rw-r--r-- 3 joe joe 5096 Dec 14 14:26 archive.tar -rwxr-xr-x 1 \
joe joe 3239 Sep 30 12:40 mkdirhier.sh
~/tmp $
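The same trick is handy inside scripts for flattening multi-line output into a single line. xargs also trims surrounding blanks. It treats quotes and backslashes in its input specially, so text containing an apostrophe can make it fail. This sketch therefore uses input without quotes:

```shell
# Three lines, one with extra blanks, collapse into a single line.
flat=$(printf 'alpha\n  beta  \ngamma\n' | xargs)
echo "$flat"   # alpha beta gamma
```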
Technically, there is a rare situation in which you could get into trouble using xargs. On some implementations, the default end-of-file string is an underscore (_); if that character is sent as a single input argument, everything after it is ignored. As a precaution against this, use the -e flag, which, without arguments, turns off the end-of-file string completely.
Avoid piping a grep to wc -l in order to count the number of lines of output. The -c option to grep gives a count of lines that match the specified pattern and is generally faster than a pipe to wc, as in the following example:
~ $ time grep and tmp/a/longfile.txt | wc -l
2811

real 0m0.097s
user 0m0.006s
sys 0m0.032s
~ $ time grep -c and tmp/a/longfile.txt
2811

real 0m0.013s
user 0m0.006s
sys 0m0.005s
~ $
In addition to the speed factor, the -c option is also a better way to do the counting. With multiple files, grep -c returns a separate count for each file, one per line and prefixed with the file name, whereas a pipe to wc gives a total count for all files combined.
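A small, reproducible version of this comparison follows. The two sample files and their contents are made up for the illustration, and no timing is measured.

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
printf 'and one\nnothing here\nand and two\n' > longfile.txt
printf 'and again\n' > other.txt

grep -c and longfile.txt                  # 2, in a single process
grep and longfile.txt | wc -l             # also 2, with an extra wc process
grep -c and longfile.txt other.txt        # longfile.txt:2 and other.txt:1
grep and longfile.txt other.txt | wc -l   # 3, one combined total
```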
However, regardless of speed considerations, this example showcases
another common error to avoid. These counting methods only give counts
of the number of lines containing matched patterns -- and if
that is what you are looking for, that is great. But in cases where
lines can have multiple instances of a particular pattern, these
methods do not give you a true count of the actual number of instances matched. To count the number of instances, use wc to count, after all. First, run a grep command with the -o option, if your version supports it. This option outputs only the matched pattern, one on each line, and not the line itself. Combining it with -c does not help, because -c still counts matching lines rather than matches, so use wc -l to count the lines, as in the following example:
~ $ grep -o and tmp/a/longfile.txt | wc -l
3402
~ $
In this case, a call to wc is slightly faster than a second call to grep that uses -c with a dummy pattern put in to match and count every line.
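The following sketch uses made-up sample text to show the difference between counting lines and counting matches. Note that grep -o is supported by GNU and BSD grep but is not part of POSIX.

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
printf 'and one\nnothing here\nand and two\n' > sample.txt

grep -c and sample.txt           # 2: lines that contain "and"
grep -o and sample.txt | wc -l   # 3: every individual "and"
```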
A tool like awk is preferable to grep when you want to match the pattern in only a specific field in the lines of output and not just anywhere in the lines.
The following simplified example shows how to list only those files modified in December:
~/tmp $ ls -l | grep Dec
-rw-r--r-- 7 joe joe 12043 Jan 27 20:36 December_Report.pdf
-rw-r--r-- 1 root root 238 Dec 03 08:19 README
-rw-r--r-- 3 joe joe 5096 Dec 14 14:26 archive.tar
~/tmp $
In this example, grep filters the lines, outputting all files with Dec in their modification dates as well as in their names. Therefore, a file such as December_Report.pdf is matched, even though it was last modified in January. This probably is not what you want. To match a pattern in a particular field, it is better to use awk, where a relational operator matches the exact field, as in the following example:
~/tmp $ ls -l | awk '$6 == "Dec"'
-rw-r--r-- 1 root root 238 Dec 03 08:19 README
-rw-r--r-- 3 joe joe 5096 Dec 14 14:26 archive.tar
~/tmp $
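Real ls -l output varies between systems and locales. The following sketch therefore feeds the three lines from the grep listing to both tools through printf. It assumes the common ls -l layout in which the month is field 6. Some systems, options, or locales shift the columns, so check the field number before relying on it.

```shell
# Synthetic ls -l style lines, copied from the grep listing above.
listing='-rw-r--r-- 7 joe joe 12043 Jan 27 20:36 December_Report.pdf
-rw-r--r-- 1 root root 238 Dec 03 08:19 README
-rw-r--r-- 3 joe joe 5096 Dec 14 14:26 archive.tar'

# grep matches "Dec" anywhere on the line: all three, including the January file.
printf '%s\n' "$listing" | grep Dec

# awk compares only field 6 (the month): README and archive.tar.
printf '%s\n' "$listing" | awk '$6 == "Dec"'
```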
See Resources for more details about how to use awk.
A basic-but-common grep usage error involves piping the output of cat to grep to search the contents of a single file. This is absolutely unnecessary and a waste of time, because tools such as grep take file names as arguments. You simply do not need to use cat in this situation at all, as in the following example:
~ $ time cat tmp/a/longfile.txt | grep -c and
2811

real 0m0.015s
user 0m0.003s
sys 0m0.013s
~ $ time grep -c and tmp/a/longfile.txt
2811

real 0m0.010s
user 0m0.006s
sys 0m0.004s
~ $
This mistake applies to many tools. Because most tools accept a hyphen (-) as a file operand that stands for standard input, even the argument for using cat to intersperse stdin with multiple files is often not valid. It is really only necessary to concatenate first before a pipe when you use cat with one of its several filtering options.
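The sketch below shows both points with made-up files. grep reads a file directly without cat. A hyphen among the file operands stands for standard input, so piped text can be searched alongside ordinary files.

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
printf 'and one\nnone\n' > first.txt
printf 'and two\n' > second.txt

# No cat needed: grep opens the file itself.
grep -c and first.txt

# '-' marks where standard input goes among the file operands.
printf 'and from stdin\n' | grep and first.txt - second.txt
```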
It is good to examine your command-line habits for any bad usage
patterns. Bad habits slow you down and often lead to unexpected errors.
This article presents 10 new habits that can help you break away from
many of the most common usage errors. Picking up these good habits is a
positive step toward sharpening your UNIX command-line skills.
Resources
Get products and technologies
- To obtain a copy of mkdirhier, you can download a version from the .