[Nemeth10] 2.4. Perl programming - stevens0102's ChinaUnix blog

Category: System administration

2011-02-17 08:57:06

2.4. Perl programming

Perl, created by Larry Wall, was the first of the truly great scripting languages. It offers vastly more power than bash, and well-written Perl code is quite easy to read. On the other hand, Perl does not impose much stylistic discipline on developers, so Perl code written without regard for readability can be cryptic. Perl has been accused of being a write-only language.

Here we describe Perl 5, the version that has been standard for the last decade. Perl 6 is a major revision that’s still in development. See the Perl 6 project’s web site for details.

Either Perl or Python (discussed later in this chapter) is a better choice for system administration work than traditional programming languages such as C, C++, C#, and Java. They can do more, in fewer lines of code, with less painful debugging, and without the hassle of compilation.

Language choice usually comes down to personal preference or to standards forced upon you by an employer. Both Perl and Python offer libraries of community-written modules and language extensions. Perl has been around longer, so its offerings extend further into the long tail of possibilities. For common system administration tasks, however, the support libraries are roughly equivalent.

Perl’s catch phrase is that “there’s more than one way to do it.” So keep in mind that there are other ways of doing most of what you read in this section.

Perl statements are separated by semicolons. (Since semicolons are separators and not terminators, the last one in a block is optional.) Comments start with a hash mark (#) and continue to the end of the line. Blocks of statements are enclosed in curly braces. Here’s a simple “hello, world!” program:

#!/usr/bin/perl
print "Hello, world!\n";

As with bash programs, you must either chmod +x the executable file or invoke the Perl interpreter directly.

$ chmod +x helloworld
$ ./helloworld
Hello, world!

Lines in a Perl script are not shell commands; they’re Perl code. Unlike bash, which lets you assemble a series of commands and call it a script, Perl does not look outside itself unless you tell it to. That said, Perl provides many of the same conventions as bash, such as the use of back-ticks to capture the output from a command.
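Here is a minimal sketch of capturing command output with back-ticks. It assumes a Unix-like system whose /bin/sh provides echo and printf, and the commands themselves are arbitrary placeholders. The sketch shows scalar context, list context, and the exit status in $?.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Scalar context: back-ticks return the command's entire standard output.
my $greeting = `echo hello from the shell`;
chomp $greeting;    # drop the trailing newline, as bash's $(...) does
print "Captured: $greeting\n";

# List context: one element per line of output, newlines included.
my @lines = `printf 'a\\nb\\nc\\n'`;
print "Got ", scalar(@lines), " lines\n";

# As in bash, the exit status is kept afterward; Perl stores it in $?,
# with the command's exit code in the high byte.
my $ignored = `sh -c 'exit 3'`;
my $status  = $? >> 8;
print "Exit code: $status\n";
```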

Variables and arrays

Perl has three fundamental data types: scalars (that is, unitary values such as numbers and strings), arrays, and hashes. Hashes are also known as associative arrays. The type of a variable is always obvious because it’s built into the variable name: scalar variables start with $, array variables start with @, and hash variables start with %.

In Perl, the terms “list” and “array” are often used interchangeably, but it’s perhaps more accurate to say that a list is a series of values and an array is a variable that can hold such a list. The individual elements of an array are scalars, so like ordinary scalar variables, their names begin with $. Array subscripting begins at zero, and the index of the highest element in array @a is $#a. Add 1 to that to get the array’s size.

The array @ARGV contains the script’s command-line arguments. You can refer to it just like any other array.
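The following short sketch handles @ARGV. Its sample arguments are hypothetical. The script fills them in only when it is run without arguments, so it produces output either way.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# For illustration only: supply sample arguments when none were given.
@ARGV = ("alpha", "beta", "gamma") unless @ARGV;

# $#ARGV is the highest index; adding 1 gives the number of elements.
my $count = $#ARGV + 1;
printf "Got %d arguments (highest index %d)\n", $count, $#ARGV;

for my $i (0 .. $#ARGV) {
    print "Argument $i: $ARGV[$i]\n";    # each element is a scalar, hence $
}
```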

The following script demonstrates the use of arrays:

#!/usr/bin/perl

@items = ("socks", "shoes", "shorts");
printf "There are %d articles of clothing.\n", $#items + 1;
print "Put on ${items[2]} first, then ", join(" and ", @items[0,1]), ".\n";

The output:

$ perl clothes
There are 3 articles of clothing.
Put on shorts first, then socks and shoes.

There’s a lot to see in just these few lines. At the risk of blurring our laser-like focus, we include several common idioms in each of our Perl examples. We explain the tricky parts in the text following each example. If you read the examples carefully (don’t be a wimp, they’re short!), you’ll have a working knowledge of the most common Perl forms by the end of this chapter.

Array and string literals

In this example, notice first that (...) creates a literal list. Individual elements of the list are strings, and they’re separated by commas. Once the list has been created, it is assigned to the variable @items.

Perl does not strictly require that all strings be quoted. In this particular case, the initial assignment of @items works just as well without the quotes.

@items = (socks, shoes, shorts);

Perl calls these unquoted strings “barewords,” and they’re an interpretation of last resort. If something doesn’t make sense in any other way, Perl tries to interpret it as a string. In a few limited circumstances, this makes sense and keeps the code clean. However, this is probably not one of those cases. Even if you prefer to quote strings consistently, be prepared to decode other people’s quoteless code.

The more Perly way to initialize this array is with the qw (quote words) operator. It is in fact a form of string quotation, and like most quoted entities in Perl, you can choose your own delimiters. The form

@items = qw(socks shoes shorts);

is the most traditional, but it’s a bit misleading since the part after the qw is no longer a list. It is in fact a string to be split at whitespace to form a list. The version

@items = qw[socks shoes shorts];

works, too, and is perhaps a bit truer to the spirit of what’s going on. Note that the commas are gone since their function has been subsumed by qw.
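As a quick check, the following confirms that the three spellings build the same list. We enable use strict here. Under its “strict subs” rule, the unquoted bareword version shown earlier would be rejected at compile time, which is one more reason to prefer quotes or qw.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @quoted = ("socks", "shoes", "shorts");
my @parens = qw(socks shoes shorts);    # one string, split at whitespace
my @square = qw[socks shoes shorts];    # same thing, different delimiters

# An array interpolated into a double-quoted string is joined with spaces,
# which makes a quick way to compare short lists.
my $same = ("@quoted" eq "@parens" && "@parens" eq "@square");
my $msg  = $same ? "All three lists match\n" : "Lists differ\n";
print $msg;
```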

Function calls

Both print and printf accept an arbitrary number of arguments, and the arguments are separated by commas. But then there’s that join(...) thing that looks like some kind of function call; how is it different from print and printf?

In fact, it’s not; print, printf, and join are all plain-vanilla functions. Perl allows you to omit the parentheses in function calls when this does not cause ambiguity, so both forms are common. In the print line above, the parenthesized form distinguishes the arguments to join from those that go to print.

We can tell that the expression @items[0,1] must evaluate to some kind of list since it starts with @. This is in fact an “array slice” or subarray, and the 0,1 subscript lists the indexes of the elements to be included in the slice. Perl accepts a range of values here, too, as in the equivalent expression @items[0..1]. A single numeric subscript would be acceptable here as well: @items[0] is a list containing one scalar, the string “socks”. In this case, it’s equivalent to the literal ("socks").

Arrays are automatically expanded in function calls, so in the expression

join(" and ", @items[0,1])

join receives three string arguments: “ and ”, “socks”, and “shoes”. It concatenates its second and subsequent arguments, inserting a copy of the first argument between each pair. The result is “socks and shoes”.
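This sketch exercises slices and join on the same @items array. The reversed slice is our own addition; it shows that the indexes need not be in order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @items = ("socks", "shoes", "shorts");

my @by_list  = @items[0,1];     # slice with an explicit index list
my @by_range = @items[0..1];    # the same slice, written as a range
my @reversed = @items[2,1,0];   # indexes may appear in any order

# join inserts its first argument between each pair of the others.
my $phrase = join(" and ", @by_list);
print "$phrase\n";                    # socks and shoes
print join(", ", @reversed), "\n";    # shorts, shoes, socks
```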

Type conversions in expressions

In the printf line, $#items + 1 evaluates to the number 3. As it happens, $#items is a numeric value, but that’s not why the expression is evaluated arithmetically; "2" + 1 works just as well. The magic is in the + operator, which always implies arithmetic. It converts its arguments to numbers and produces a numeric result. Similarly, the dot operator (.), which concatenates strings, converts its operands as needed: "2" . (12 ** 2) yields “2144”.
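The sketch below runs a few conversions in each direction on arbitrary values. Its second line reproduces the 2144 example from the text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sum    = "2" + 1;            # + converts to numbers: 3
my $joined = "2" . (12 ** 2);    # . converts to strings: "2144"
my $added  = "10" + "5";         # two strings, numeric result: 15
my $concat = 10 . 5;             # two numbers, string result: "105"
print "$sum $joined $added $concat\n";
```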

String expansions and disambiguation of variable references

As in bash, double-quoted strings are subject to variable expansion. Also as in bash, you can surround variable names with curly braces to disambiguate them if necessary, as with ${items[2]}. (Here, the braces are used only for illustration; they are not needed.) The $ clues you in that the expression is going to evaluate to a scalar. @items is the array, but any individual element is itself a scalar, and the naming conventions reflect this fact.

Hashes

A hash (also known as an associative array) represents a set of key/value pairs. You can think of a hash as an array whose subscripts (keys) are arbitrary scalar values; they do not have to be numbers. But in practice, numbers and strings are the usual keys.

Hash variables have % as their first character (e.g., %myhash), but as in the case of arrays, individual values are scalar and so begin with a $. Subscripting is indicated with curly braces rather than square brackets, e.g., $myhash{'ron'}.
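This minimal sketch walks through everyday hash operations on hypothetical data. It uses =>, a comma that also quotes the word to its left. It also calls the built-ins keys, exists, and delete, none of which appear in the text above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data; => works like a comma but also quotes the word on its left.
my %myhash = (ron => "Weasley", harry => "Potter");

$myhash{'hermione'} = "Granger";    # add a key/value pair
print "Surname for ron: $myhash{'ron'}\n";

# Hash order is not defined, so sort the keys for repeatable output.
for my $key (sort keys %myhash) {
    print "$key => $myhash{$key}\n";
}

print "No entry for ginny\n" unless exists $myhash{'ginny'};
delete $myhash{'harry'};            # remove a pair entirely
my $count = keys %myhash;           # keys in scalar context counts pairs
print "$count entries remain\n";
```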

Hashes are an important tool for system administrators. Nearly every script you write will use them. In the code below, we read in the contents of a file, parse it according to the rules for /etc/passwd, and build a hash of the entries called %names_by_uid. The value of each entry in the hash is the username associated with that UID.

#!/usr/bin/perl

while ($_ = <>) {
($name, $pw, $uid, $gid, $gecos, $path, $sh) = split /:/;
$names_by_uid{$uid} = $name;
}
%uids_by_name = reverse %names_by_uid;

print "\$names_by_uid{0} is $names_by_uid{0}\n";
print "\$uids_by_name{'root'} is $uids_by_name{'root'}\n";

As in the previous script example, we’ve packed a couple of new ideas into these lines. Before we go over each of these nuances, here’s the output of the script:

$ perl hashexample /etc/passwd
$names_by_uid{0} is root
$uids_by_name{'root'} is 0

The while ($_ = <>) reads input one line at a time and assigns it to the variable named $_; the value of the entire assignment statement is the value of the righthand side, just as in C. When you reach the end of the input, the <> returns a false value and the loop terminates.

To interpret <>, Perl checks the command line to see if you named any files there. If you did, it opens each file in sequence and runs the file’s contents through the loop. If you didn’t name any files on the command line, Perl takes the input to the loop from standard input.

Within the loop, a series of variables receive the values returned by split, a function that chops up its input string by using the regular expression passed to it as the field separator. Here, the regex is delimited by slashes; this is just another form of quoting, one that’s specialized for regular expressions but similar to the interpretation of double quotes. We could just as easily have written split ':' or split ":".

The string that split is to divide at colons is never explicitly specified. When split’s second argument is missing, Perl assumes you want to split the value of $_. Clean! Truth be told, even the pattern is optional; the default is to split at whitespace but ignore any leading whitespace.

But wait, there’s more. Even the original assignment of $_, back at the top of the loop, is unnecessary. If you simply say

while (<>) {

Perl automatically stores each line in $_. You can process lines without ever making an explicit reference to the variable in which they’re stored. Using $_ as a default operand is common, and Perl allows it more or less wherever it makes sense.
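Here is a sketch of these defaults in isolation. It assigns to $_ by hand instead of using a <> loop, and its input strings are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A hypothetical passwd-style line stands in for one line of input.
$_ = "root:x:0:0:root:/root:/bin/bash";
my @fields = split /:/;    # no string given, so split works on $_
print "User $fields[0] has UID $fields[2]\n";

# With no pattern either, split breaks $_ at runs of whitespace and
# ignores leading whitespace.
$_ = "   alpha  beta\tgamma";
my @words = split;
print scalar(@words), " words: @words\n";    # 3 words: alpha beta gamma

# Many other built-ins, such as length, also default to $_.
my $len = length;
print "The line is $len characters long\n";    # 20
```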

In the multiple assignment that captures the contents of each passwd field,

($name, $pw, $uid, $gid, $gecos, $path, $sh) = split /:/;

the presence of a list on the left-hand side creates a “list context” for split that tells it to return a list of all fields as its result. If the assignment were to a scalar variable, for example,

$n_fields = split /:/;

split would run in “scalar context” and return only the number of fields that it found. Functions you write can distinguish between scalar and list contexts, too, by using the wantarray function. It returns a true value in list context, a false value in scalar context, and an undefined value in void context.
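The following sketches a context-sensitive function, loosely modeled on split’s behavior. The function names context_name and fields are hypothetical; only wantarray itself is a Perl built-in.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# wantarray is true in list context, false but defined in scalar
# context, and undef in void context.
sub context_name {
    my $want = wantarray;
    return !defined($want) ? "void" : $want ? "list" : "scalar";
}

# Hypothetical helper: a list of fields in list context, a count otherwise.
sub fields {
    my @parts = split /:/, shift;
    return wantarray ? @parts : scalar(@parts);
}

my ($in_list) = context_name();     # parentheses on the left: list context
my $in_scalar = context_name();     # plain scalar: scalar context

my @f = fields("a:b:c");            # ("a", "b", "c")
my $n = fields("a:b:c");            # 3
print "$in_list $in_scalar [@f] $n\n";
```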

The line

%uids_by_name = reverse %names_by_uid;

has some hidden depths, too. A hash in list context (here, as an argument to the reverse function) evaluates to a list of the form (key1, value1, key2, value2, ...). The reverse function reverses the order of the list, yielding (valueN, keyN, ..., value1, key1). Finally, the assignment to the hash variable %uids_by_name converts this list as if it were (key1, value1, ...), thereby producing a permuted index.
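This sketch uses hypothetical data to show the catch in inverting a hash. Keys must be unique, so when two keys share a value, only one of them survives the reverse, and which one survives is not predictable.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data: UIDs 0 and 1000 both map to "root" on purpose.
my %names_by_uid = (0 => "root", 1 => "daemon", 1000 => "root");

my %uids_by_name = reverse %names_by_uid;

print "daemon has UID $uids_by_name{'daemon'}\n";    # 1

# Only one "root" entry survives; which UID it keeps depends on the
# hash's internal ordering, so code must not rely on it.
my $root_uid = $uids_by_name{'root'};
my $names    = keys %uids_by_name;
print "root maps to UID $root_uid; $names names in all\n";
```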

References and autovivification

These are advanced topics, but we’d be remiss if we didn’t at least mention them. Here’s the executive summary. Arrays and hashes can only hold scalar values, but you will often want to store other arrays and hashes within them. For example, returning to our previous example of parsing the /etc/passwd file, you might want to store all the fields of each passwd line in a hash indexed by UID.

You can’t store arrays and hashes, but you can store references (that is, pointers) to arrays and hashes, which are themselves scalars. To create a reference to an array or hash, you precede the variable name with a backslash (e.g., \@array) or use reference-to-array or reference-to-hash literal syntax. For example, our passwd-parsing loop would become something like this:

while (<>) {
$array_ref = [ split /:/ ];
$passwd_by_uid{$array_ref->[2]} = $array_ref;
}

The square brackets return a reference to an array containing the results of the split. The notation $array_ref->[2] refers to the UID field, the third member of the array referenced by $array_ref.

$array_ref[2] won’t work here because we haven’t defined an @array_ref array; $array_ref and @array_ref are different variables. Furthermore, you won’t receive an error message if you mistakenly use $array_ref[2] here because @array_ref is a perfectly legitimate name for an array; you just haven’t assigned it any values.

This lack of warnings may seem like a problem, but it’s arguably one of Perl’s nicest features, a feature known as “autovivification.” Because variable names and referencing syntax always make clear the structure of the data you are trying to access, you need never create any intermediate data structures by hand. Simply make an assignment at the lowest possible level, and the intervening structures materialize automatically. For example, you can create a hash of references to arrays whose contents are references to hashes with a single assignment.
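The following self-contained sketch demonstrates both ideas. Two hypothetical passwd lines replace the <> input of the loop above, and %inventory is an arbitrary name for the autovivification demo. The last lines show a well-known side effect: testing a deeply nested key with exists can autovivify the levels above it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two hypothetical passwd lines stand in for the <> input.
my @input = (
    "root:x:0:0:root:/root:/bin/bash\n",
    "daemon:x:1:1:daemon:/usr/sbin:/bin/sh\n",
);

my %passwd_by_uid;
for (@input) {
    chomp;                              # drop the newline before splitting
    my $array_ref = [ split /:/ ];      # reference to an anonymous array
    $passwd_by_uid{ $array_ref->[2] } = $array_ref;
}

# -> follows a reference; between two subscripts the arrow is optional.
print "UID 0 is $passwd_by_uid{0}->[0]\n";    # root
print "UID 1 uses $passwd_by_uid{1}[6]\n";    # /bin/sh

# Autovivification: one assignment creates a hash entry holding an array
# reference whose first element is a hash reference.
my %inventory;
$inventory{fruit}[0]{color} = "red";
print ref($inventory{fruit}), " of ", ref($inventory{fruit}[0]), "\n";    # ARRAY of HASH

# It also happens on some reads: testing a nested key creates the
# intermediate levels, even though the final key stays absent.
my $found = exists $inventory{veg}[0]{color};
print "veg ", (exists $inventory{veg} ? "now exists" : "is absent"), "\n";
```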

Regular expressions in Perl

You use regular expressions in Perl by “binding” strings to regex operations with the =~ operator. For example, the line

if ($text =~ m/ab+c/) {

checks to see whether the string stored in $text matches the regular expression ab+c. To operate on the default string, $_, you can simply omit the variable name and binding operator. In fact, you can omit the m, too, since the operation defaults to matching:

if (/ab+c/) {

Substitutions work similarly:

$text =~ s/etc\./and so on/g; # Substitute text in $text, OR
s/etc\./and so on/g; # Apply to $_

We sneaked in a g option to replace all instances of “etc.” with “and so on”, rather than just replacing the first instance. Other common options are i to ignore case, s to make dot (.) match newlines, and m to make the ^ and $ tokens match at the beginning and end of individual lines rather than only at the beginning and end of the search text.
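This sketch applies each option to made-up strings. The count relies on a common idiom: assign a //g match to an empty list and take the result in scalar context.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "Apples, pears, etc. and plums, etc.";

# g: replace every occurrence rather than just the first.
(my $expanded = $text) =~ s/etc\./and so on/g;
print "$expanded\n";

# i: ignore case. A //g match in list context returns every match;
# assigning that to () in scalar context yields the count.
my $count = () = "Foo foo FOO" =~ /foo/gi;
print "Matched $count times\n";    # 3

# m: ^ and $ match at each line, not only at the ends of the string.
my $lines = "first line\nsecond line\n";
my @first_words = $lines =~ /^(\w+)/mg;
print "@first_words\n";            # first second

# s: let . match a newline too.
my $spans = ("start\nend" =~ /start.end/s) ? "yes" : "no";
print "Dot crossed the newline: $spans\n";
```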

A couple of additional points are illustrated in the following script:

#!/usr/bin/perl

$names = "huey dewey louie";
$regex = '(\w+)\s+(\w+)\s+(\w+)';

if ($names =~ m/$regex/) {
print "1st name is $1.\n2nd name is $2.\n3rd name is $3.\n";
$names =~ s/$regex/$2 $1/;
print "New names are \"${names}\".\n";
} else {
print qq{"$names" did not match "$regex".\n};
}

The output:

$ perl testregex
1st name is huey.
2nd name is dewey.
3rd name is louie.
New names are "dewey huey".

This example shows that variables expand in // quoting, so the regular expression need not be a fixed string. qq is another name for the double-quote operator.

After a match or substitution, the variables $1, $2, and so on contain the text matched by the corresponding capturing parentheses in the regular expression. These variables are also available in the replacement part of a substitution. Perl also accepts the sed-style \1, \2, etc. there, but with warnings enabled it suggests $1 instead; within the pattern itself, \1 is the correct way to refer back to an earlier group.
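The sketch below contrasts $1 in the replacement with \1 inside the pattern itself. The doubled-word string is a made-up example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $names = "huey dewey louie";

# After a successful match, $1 and $2 hold the two captured words.
if ($names =~ /^(\w+)\s+(\w+)/) {
    print "First two: $1 and $2\n";
}

# In the replacement part of s///, the captures are written $1, $2, ...
(my $swapped = $names) =~ s/^(\w+)\s+(\w+)/$2 $1/;
print "$swapped\n";    # dewey huey louie

# Inside the pattern itself, \1 is a backreference to group 1; here it
# finds a word that is immediately repeated.
my $doubled = ("this is is a typo" =~ /\b(\w+)\s+\1\b/) ? $1 : "none";
print "Doubled word: $doubled\n";    # is
```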

Input and output

When you open a file for reading or writing, you define a “filehandle” to identify the channel. In the example below, INFILE is the filehandle for /etc/passwd and OUTFILE is the filehandle associated with /tmp/passwd. The while loop condition is <INFILE>, which is similar to the <> we have seen before but specific to a particular filehandle. It reads lines from the filehandle INFILE until the end of file, at which time the while loop ends. Each line is placed in the variable $_.

#!/usr/bin/perl

open(INFILE, "</etc/passwd") or die "Couldn't open /etc/passwd";
open(OUTFILE, ">/tmp/passwd") or die "Couldn't open /tmp/passwd";

while (<INFILE>) {
($name, $pw, $uid, $gid, $gecos, $path, $sh) = split /:/;
print OUTFILE "$uid\t$name\n";
}

open returns a true value if the file is successfully opened, short-circuiting (rendering unnecessary) the evaluation of the die clauses. Perl’s or operator is similar to || (which Perl also has), but at lower precedence. or is generally a better choice when you want to emphasize that everything on the left will be fully evaluated before Perl turns its attention to the consequences of failure.

Perl’s syntax for specifying how you want to use each file (read? write? append?) mirrors that of the shell. You can also use “filenames” such as "/bin/df|" to open pipes to and from shell commands.
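Modern Perl code usually favors lexical filehandles and the three-argument form of open, which keeps the mode separate from the filename. The sketch below shows both reading a file and reading from a pipe. It assumes a Unix-like system with echo on the PATH; the temporary file and its contents are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A throwaway input file with hypothetical passwd-style contents.
my ($tmp, $path) = tempfile(UNLINK => 1);
print $tmp "root:x:0:0:root:/root:/bin/bash\n";
print $tmp "daemon:x:1:1:daemon:/usr/sbin:/bin/sh\n";
close $tmp;

# Three-argument open with a lexical filehandle; $! explains any failure.
open(my $in, '<', $path) or die "Couldn't open $path: $!";
my @uid_lines;
while (<$in>) {
    my ($name, $pw, $uid) = split /:/;
    push @uid_lines, "$uid\t$name";
}
close $in;
print "$_\n" for @uid_lines;

# '-|' reads a command's output through a pipe; passing the command as a
# list runs it directly, without a shell.
open(my $pipe, '-|', 'echo', 'hello') or die "Couldn't run echo: $!";
my $reply = <$pipe>;
close $pipe;
chomp $reply;
print "echo said: $reply\n";
```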

Control flow

The example below is a Perl version of our earlier bash script that validated its command-line arguments. You might want to compare it with the bash version earlier in this chapter. Note that Perl’s if construct has no then keyword or terminating word, just a block of statements enclosed in curly braces.

You can also add a postfix if clause (or its negated version, unless) to an individual statement to make that statement’s execution conditional.

#!/usr/bin/perl

sub show_usage {
print shift, "\n" if scalar(@_);
print "Usage: $0 source_dir dest_dir\n";
exit(scalar(@_) ? shift : 1);
}
if (@ARGV != 2) {
show_usage;
} else { # There are two arguments
($source_dir, $dest_dir) = @ARGV;
show_usage "Invalid source directory" unless -d $source_dir;
-d $dest_dir or show_usage "Invalid destination directory";
}

Here, the two lines that use Perl’s unary -d operator to validate the directory-ness of $source_dir and $dest_dir are equivalent. The second form (with -d at the start of the line) has the advantage of putting the actual assertion at the beginning of the line, where it’s most noticeable. However, the use of or to mean “otherwise” is a bit tortured; some readers of the code may find it confusing.

Evaluating an array variable in scalar context (specified by the scalar operator in this example) returns the number of elements in the array. This is 1 more than the value of $#array; as always in Perl, there’s more than one way to do it.

Perl functions receive their arguments in the array named @_. It’s common practice to access them with the shift operator, which removes the first element of the argument array and returns its value.

This version of the show_usage function accepts an optional error message to be printed. If you provide an error message, you can also provide a specific exit code. The trinary ?: operator evaluates its first argument; if the result is true, the result of the entire expression is the second argument; otherwise, the third.

As in bash, Perl has a dedicated “else if” condition, but its keyword is elsif rather than elif. (For those of you who use both languages, these fun, minute differences either keep you mentally nimble or drive you insane.)

As Table 2.5 shows, Perl’s comparison operators are the opposite of bash’s: strings use textual operators, and numbers use traditional algebraic notation. Compare with the table of bash comparison operators earlier in this chapter.

Table 2.5. Elementary Perl comparison operators
String	Numeric	True if
x eq y	x == y	x is equal to y
x ne y	x != y	x is not equal to y
x lt y	x < y	x is less than y
x le y	x <= y	x is less than or equal to y
x gt y	x > y	x is greater than y
x ge y	x >= y	x is greater than or equal to y
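The following sketch shows why the two families matter: the same pair of values compares differently as numbers and as strings. It also uses <=> and cmp, the three-way numeric and string comparisons that sort relies on, which the table does not list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The same two values compare differently as numbers and as strings.
my $num_less = (10 < 9)      ? "yes" : "no";    # numerically, 10 is larger
my $str_less = ("10" lt "9") ? "yes" : "no";    # as text, "1" sorts before "9"
print "10 < 9: $num_less; \"10\" lt \"9\": $str_less\n";

# <=> and cmp are the three-way versions (-1, 0, 1), as used by sort.
my @values  = (10, 9, 100);
my @numeric = sort { $a <=> $b } @values;    # 9 10 100
my @textual = sort { $a cmp $b } @values;    # 10 100 9
print "numeric: @numeric\ntextual: @textual\n";
```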

In Perl, you get all the file-testing operators shown in the bash file evaluation operators table earlier in this chapter except for -nt and -ot, which are available in bash only.
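The sketch below tries a few file tests against a scratch directory created with File::Temp. Because -nt is unavailable, a hypothetical helper, newer_than, compares the modification times returned by stat. utime backdates one file so the result is predictable.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory and files, created only for this demonstration.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/sample.txt";
open(my $fh, '>', $file) or die "Couldn't create $file: $!";
print $fh "data\n";
close $fh;

my $is_dir  = (-d $dir)  ? 1 : 0;           # directory?
my $is_file = (-f $file) ? 1 : 0;           # plain file?
my $size    = -s $file;                     # size in bytes: 5
my $missing = (-e "$dir/missing") ? 1 : 0;  # exists at all?

# Hypothetical stand-in for bash's -nt: compare modification times,
# which stat returns as element 9.
sub newer_than {
    my ($x, $y) = @_;
    return (stat $x)[9] > (stat $y)[9];
}

my $older = "$dir/older.txt";
open(my $touch, '>', $older) or die "Couldn't create $older: $!";
close $touch;
utime(1000, 1000, $older) or die "utime failed: $!";    # backdate to 1970

my $newer = newer_than($file, $older) ? "yes" : "no";
print "dir=$is_dir file=$is_file size=$size missing=$missing newer=$newer\n";
```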

Like bash, Perl has two types of for loops. The more common form iterates through an explicit list of arguments. For example, the code below iterates through a list of animals, printing one per line.

@animals = qw(lions tigers bears);
foreach $animal (@animals) {
print "$animal\n";
}

The more traditional C-style for loop is also available:

for ($counter=1; $counter <= 10; $counter++) {
print "$counter ";
}

We’ve shown these with the traditional for and foreach labels, but those are in fact the same keyword in Perl and you can use whichever form you prefer.

Versions of Perl before 5.10 (2007) have no explicit case or switch statement, but there are several ways to accomplish the same thing. In addition to the obvious-but-clunky option of cascading if statements, another possibility is to use a for statement to set the value of $_ and provide a context from which last can escape:

for ($ARGV[0]) {

m/^websphere/ && do { print "Install for websphere\n"; last; };
m/^tomcat/ && do { print "Install for tomcat\n" ; last; };
m/^geronimo/ && do { print "Install for geronimo\n"; last; };

print "Invalid option supplied.\n"; exit 1;
}

The regular expressions are compared with the argument stored in $_. Unsuccessful matches short-circuit the && and fall through to the next test case. Once a regex matches, its corresponding do block is executed. The last statements escape from the for block immediately.

Accepting and validating input

The script below combines many of the Perl constructs we’ve reviewed over the last few pages, including a subroutine, some postfix if statements, and a for loop. The program itself is merely a wrapper around the main function get_string, a generic input validation routine. This routine prompts for a string, removes any trailing newline, and verifies that the string is not null. Null strings cause the prompt to be repeated up to three times, after which the script gives up.

#!/usr/bin/perl

$maxatt = 3; # Maximum tries to supply valid input

sub get_string {
my ($prompt, $response) = shift;
# Try to read input up to $maxatt times
for (my $attempts = 0; $attempts < $maxatt; $attempts++) {
print "Please try again.\n" if $attempts;
print "$prompt: ";
$response = readline(*STDIN);
chomp($response);
return $response if $response;
}
die "Too many failed input attempts";
}
# Get names with get_string and convert to uppercase
$fname = uc get_string "First name";
$lname = uc get_string "Last name";
print "Whole name: $fname $lname\n";

The output:

$ perl validate
First name: John Ball
Last name: Park
Whole name: JOHN BALL PARK

The get_string function and the for loop both illustrate the use of the my operator to create variables of local scope. By default, all variables are global in Perl.

The list of local variables for get_string is initialized with a single scalar drawn from the routine’s argument array. Variables in the initialization list that have no corresponding value (here, $response) remain undefined.

The *STDIN passed to the readline function is a “typeglob,” a festering wart of language design. It’s best not to inquire too deeply into what it really means, lest one’s head explode. The short explanation is that Perl filehandles are not first-class data types, so you must generally put a star in front of their names to pass them as arguments to functions.

In the assignments for $fname and $lname, the uc (convert to uppercase) and get_string functions are both called without parentheses. Since there is no possibility of ambiguity given the single argument, this works fine.
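The following variant of get_string is a sketch, not a replacement; it takes the filehandle as an argument. Passing a lexical filehandle sidesteps the typeglob. An in-memory filehandle stands in for the keyboard, so the retry logic can be exercised without typing. The variant also tests length instead of truth, so an answer of 0 is accepted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Variant of get_string: the filehandle is a parameter, so callers can pass
# \*STDIN for the keyboard or any other handle.
sub get_string {
    my ($prompt, $fh, $maxatt) = @_;
    for my $attempt (0 .. $maxatt - 1) {
        print "Please try again.\n" if $attempt;
        print "$prompt: ";
        my $response = readline($fh);
        last unless defined $response;         # end of input: stop asking
        chomp $response;
        return $response if length $response;  # "0" counts as an answer
    }
    die "Too many failed input attempts\n";
}

# Simulated input: two empty lines, then a real answer.
my $typed = "\n\nJohn Ball\n";
open(my $input, '<', \$typed) or die "Couldn't open in-memory input: $!";

my $fname = uc get_string("First name", $input, 3);
print "\nWhole name: $fname\n";
```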

Perl as a filter

You can use Perl without a script by putting isolated expressions on the command line. This is a great way to do quick text transformations and one that largely obsoletes older filter programs such as sed, awk, and tr.

Use the -pe command-line options to loop through the input (the files named on the command line, or STDIN if there are none), run a simple expression on each line, and print the result. For example, the command

ubuntu$ perl -pe 's#/bin/sh$#/bin/bash#' /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/bash
...

replaces /bin/sh at the end of lines in /etc/passwd with /bin/bash, emitting the transformed passwd file to STDOUT. You may be more accustomed to seeing the text substitution operator with slashes as delimiters (e.g., s/foo/bar/), but Perl allows any character. Here, the search text and replacement text both contain slashes, so it’s simpler to use # as the delimiter. If you use paired delimiters, you must use four of them instead of the normal three, e.g., s(foo)(bar).
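Inside a script, the same substitution can be written three ways, shown here on a hypothetical passwd line: slashes with escaping, # as the delimiter, and a paired delimiter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $entry = "root:x:0:0:root:/root:/bin/sh";    # hypothetical passwd line

(my $with_slashes = $entry) =~ s/\/bin\/sh$/\/bin\/bash/;   # escaped slashes
(my $with_hashes  = $entry) =~ s#/bin/sh$#/bin/bash#;       # # as delimiter
(my $with_braces  = $entry) =~ s{/bin/sh$}{/bin/bash};      # paired: four delimiters

print "$with_hashes\n";
```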

Perl’s -a option turns on autosplit mode, which separates input lines into fields that are stored in the array named @F. Whitespace is the default field separator, but you can set another separator pattern with the -F option.

Autosplit is handy to use in conjunction with -p or its nonautoprinting variant, -n. For example, the commands below use perl -ane to slice and dice the output from two variations of df. The third line then runs join to combine the two sets of fields on the Filesystem field, producing a composite table that includes fields drawn from both versions of the df output.

suse$ df -h | perl -ane 'print join("\t", @F[0..4]), "\n"' > tmp1
suse$ df -i | perl -ane 'print join("\t", @F[0,1,4]), "\n"' > tmp2
suse$ join tmp1 tmp2
Filesystem Size Used Avail Use% Inodes IUse%
/dev/hda3 3.0G 1.9G 931M 68% 393216 27%
udev 126M 172K 126M 1% 32086 2%
/dev/hda1 92M 26M 61M 30% 24096 1%
/dev/hda6 479M 8.1M 446M 2% 126976 1%
...

A script version with no temporary files would look something like this:

#!/usr/bin/perl

for (split(/\n/, `df -h`)) {
@F = split;
$h_part{$F[0]} = [ @F[0..4] ];
}

for (split(/\n/, `df -i`)) {
@F = split;
print join("\t", @{$h_part{$F[0]}}, $F[1], $F[4]), "\n";
}

The truly intrepid can use -i in conjunction with -pe to edit files in place; Perl reads the files in, presents their lines for editing, and saves the results out to the original files. You can supply a pattern to -i that tells Perl how to back up the original version of each file. For example, -i.bak backs up passwd as passwd.bak. Beware—if you don’t supply a backup pattern, you don’t get backups at all. Note that there’s no space between the -i and the suffix.
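Inside a script, the special variable $^I plays the role of -i. Setting it makes the <> loop write its output back to the files in @ARGV, with its value as the backup suffix. This sketch works on a scratch copy of a passwd-style file with hypothetical contents, so no real system file is touched.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch copy of a passwd-style file (hypothetical contents).
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/passwd";
open(my $out, '>', $file) or die "Couldn't create $file: $!";
print $out "root:x:0:0:root:/root:/bin/sh\n";
close $out;

{
    # $^I is the in-script equivalent of -i; '.bak' is the backup suffix.
    local $^I   = '.bak';
    local @ARGV = ($file);
    while (<>) {
        s#/bin/sh$#/bin/bash#;
        print;    # with $^I set, print writes to the replacement file
    }
}

open(my $in, '<', $file) or die "Couldn't reopen $file: $!";
my $edited = <$in>;
close $in;
print "Edited: $edited";
print "Backup kept as $file.bak\n" if -e "$file.bak";
```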

Add-on modules for Perl

CPAN, the Comprehensive Perl Archive Network, is the warehouse for user-contributed Perl libraries. Installation of new modules is greatly facilitated by the cpan command, which acts much like a yum or APT package manager dedicated to Perl modules. If you’re on a Linux system, check whether your distribution packages the module you’re looking for as a standard feature; it’s much easier to install the system-level package once and then let the system take care of updating itself over time.

On systems that don’t have a cpan command, try running perl -MCPAN -e shell as an alternate route to the same feature:

$ sudo perl -MCPAN -e shell

cpan shell -- CPAN exploration and modules installation (v1.9205)
ReadLine support available (maybe install Bundle::CPAN or Bundle::CPANxxl?)

cpan[1]> install Class::Date
CPAN: Storable loaded ok (v2.18)
CPAN: LWP::UserAgent loaded ok (v5.819)
CPAN: Time::HiRes loaded ok (v1.9711)
... several more pages of status updates ...

It’s possible for users to install Perl modules in their home directories for personal use, but the process isn’t necessarily straightforward. We recommend a liberal policy regarding system-wide installation of third-party modules from CPAN; the community provides a central point of distribution, the code is open to inspection, and module contributors are identified by name. Perl modules are no more dangerous than any other open source software.

Many Perl modules use components written in C for better performance. Installation involves compiling these segments, so you need a complete development environment including the C compiler and a full set of libraries.

As with most languages, the most common error found in Perl programs is the reimplementation of features that are already provided by community-written modules. Get in the habit of visiting CPAN as the first step in tackling any Perl problem. It saves development and debugging time.

Tom Christiansen commented, “That wouldn’t be my own first choice, but it is a good one. My nominee for the most common error in programs is that they are usually never rewritten. When you take English composition, you are often asked to turn in an initial draft and then a final revision, separately. This process is just as important in programming. You’ve heard the adage ‘Never ship the prototype.’ Well, that’s what’s happening: people hack things out and never rewrite them for clarity and efficiency.”
