Saturday, June 23rd, 2007
Perl’s sort
function is very useful in its simplest form, but it can become very powerful quickly if you learn how to use the extra functionality that it provides. Here we look at the simplest cases of sorting, then we go on to examine ways to perform custom sorting, and finally we look at an example that uses sort
on a more complex data structrure.
Prerequisites Different parts of this article assume different levels of expertise, but in order to understand everything, you need to know about arrays and references. Also, it would be nice (but not essential) to have encountered the join
function before.
See also The official documentation for sort
.
The Simplest Case
The simplest case of sort
would be something like that:
@theSortedList = sort @theList;
print join(‘, ‘, @theSortedList).“\n“;
Which produces the following output:
1, 2, 2, 4, 6, 65, 93
Note that Perl will sort a list in the “natural order” of the data, so if you have a list that contains both numbers and strings, it will be sorted in alphanumeric order:
@theSortedList = sort @theList;
print join(‘, ‘, @theSortedList).“\n“;
3, 7, apple, four, pear
A common misconception about sort has to do with the fact that the function does not have any effect on the original list. sort
does not sort the original list, instead it takes the elements of a list, sorts them and returns them as a new list. So, the following does nothing:
…and you probably meant to say that:
Custom Sorting
The default behaviour of sort is to sort numbers and strings in ascending order. This not always desirable, sometimes we would like to order things in descending order. Also, we might want to override the default behaviour of sorting in the most ‘natural’ way: we may want to sort a numerical list in an aplhanumerical manner (to look at the numbers as text).
The behaviour of sort
can be modified by providing a, usually short, block of Perl code before the list. This is how a descening sort would look like:
@theSortedList = sort {$b <=> $a} @theList;
print join(‘, ‘, @theSortedList).“\n“;
93, 65, 6, 4, 2, 2, 1
The block of code that customises the sorting is {$b <=> $a}
. The two variables $a
and $b
are not declared elsewhere in our program, they are both internal variables declared by sort
and they have a special meaning: they represent the two elements from list @theList
that are compared in pairs by sort
to determine how to order the list. The order of the two variables in this block of code is significant: $b
and $a
comes after and that causes the sorting to happen in descending order. The <=>
operator causes the comparison to happen numerically: $a
and $b
are compared as numbers.
Here is an example of descending aplhanumerical (string) sorting. The string comparison cmp
operator sould be used instead:
@theSortedList = sort {$b cmp $a} @theList;
print join(‘, ‘, @theSortedList).“\n“;
pear, four, apple, 7, 3
@theSortedList = sort {$a cmp $b} @theList;
print join(‘, ‘, @theSortedList).“\n“;
1, 100, 2, 2, 2000, 205, 4, 6, 65, 93
Yes, it looks crazy, but aplhanumerically, the string ‘205′ comes before the string ‘4′ because it starts with the character ‘2′ which is a ’smaller’ than character ‘4′.
Also, you can process the passed elements $a
and $b
before sorting them. This would prove useful for sorting a list of strings by ignoring the case of the strings involved by converting to lowercase (lc()
) before the comparison:
Finally, $a
and $b
can be used as lookups to another data structure. Here is an example of sorting the keys of a hash according to the values these keys are pointing to:
‘a‘ => 40,
‘b‘ => 45,
‘c‘ => 7,
‘d‘ => 100,
‘e‘ => 2
};
@sortedKeys = sort { $h{$a} <=> $h{$b} } keys %h;
print join(‘, ‘, @sortedKeys).“\n“;
e, c, a, b, d
@theList = qw( z g b aa c );
@theSortedList = sort backwards @theList;
print join(‘, ‘, @sortedKeys).“\n“;
z, g, c, b, aa
Dissecting Custom Sort
Before we continue, we have to take a closer look at the mechanism of customising sort
. The block of code that is used to compare the elements, is a Perl expression that is required to evaluate to -1, 0 or 1. This is very similar to how the cmp
and <=>
operators work. According to the Perl manual:
Binary “cmp” returns -1, 0, or 1 depending on whether the left argument is stringwise less than, equal to, or greater than the right argument.
and:
Binary “<=>” returns -1, 0, or 1 depending on whether the left argument is numerically less than, equal to, or greater than the right argument.
This essentially means that (10 <=> 2)
evaluates to 1, (4 <=> 4)
evaluates to 0, and (5 <=> 7)
evaluates to -1. The same applies to strings.
This inherent compatibility of sort
and the cmp
and <=>
operators means that usually the code block that customises sort
is just one comparison using one of the two operators, but actually the block can be any Perl expression that evaluates to -1, 0 or 1.
For example, suppose that we have a list where the elements are made up of a letter followed by a digit, and we would like to sort by digit, and in the cases where the digits are equal we would like to sort by the prefixing letter:
- @theList = qw( f2 b1 o9 a1 c2 g9 d8 );
- @theSortedList = sort {
- substr($a, 1, 1) <=> substr($b, 1, 1)
- ||
- substr($a, 0, 1) cmp substr($b, 0, 1)
- } @theList;
- print join(‘, ‘, @theSortedList).“\n“;
a1, b1, c2, f2, d8, g9, o9
In this example, the comparison logic is implemented in lines 3, 4 and 5. On line 3, the second characters (the numbers) of each element are extracted using the substr
function, and they are compared numerically. If the two numbers are found to be different, line 3 and in fact the whole block of code evaluates to -1 or 1, and the comparison is complete. Because of the or (||
) operator on line 4, the comparison continues only if the two numbers are equal, in which case line 3 evaluates to 0. In this case, line 5 is evaluated, which extracts the first characters of the each of the two elements and proceeds to compare them alphanumerically. This example demostrates that any sorting logic can be implemented as long as the result of the comparison is returned as -1, 0 or 1.
Sorting an array of arrays
Suppose you have the following array or arrays:[“aa“, “cc“, “b“],
[“cc“, “dd“, “ye“],
[“dd“, “cc“, “hoho“],
[“dd“, “aa“, “tq“],
[“aa“, “cc“, “a“]
];
aa | cc | b |
cc | dd | ye |
dd | cc | hoho |
dd | aa | tq |
aa | cc | a |
We would like to sort the array of arrays according to all the columns of the above table, giving priority to the first column, then looking at the second column and finally the third. This kind of sorting can be achieved with the following script:
- #!/usr/bin/perl -w
- use strict;
- use Data::Dumper;
- my $table = [
- [“aa“, “cc“, “b“],
- [“cc“, “dd“, “ye“],
- [“dd“, “cc“, “hoho“],
- [“dd“, “aa“, “tq“],
- [“aa“, “cc“, “a“]
- ];
- my @sorted_table = sort tableSorter @$table;
- print Dumper(@sorted_table);
- sub tableSorter ($$) {
- my($row1, $row2) = @_;
- my $column1comparison = $row1->[0] cmp $row2->[0];
- if($column1comparison != 0){
- return $column1comparison;
- }
- my $column2comparison = $row1->[1] cmp $row2->[1];
- if($column2comparison != 0){
- return $column2comparison;
- }
- return $row1->[2] cmp $row2->[2];
- }
On line 14 we sort using the tableSorter
subroutine which implements our comparison logic. Note that the prototype of this function has to be $$
(line 18). The two elements are passed to this subroutine by sort
on line 19, and these are no others than $a
and $b
, the special variables we discussed before. In this particular case the elements are references to the rows of our table. The first comparison is performed on lines 21 to 24 and if the sub-elements from column 1, the subroutine proceeds to use columnn 2 and then column 3 for subsequent comparisons.
The module Data::Dumper
is being used here (line 16) to produce a summary of the resulting sorted data structure, and it produces the following output:
$VAR1 = [ 'aa', 'cc', 'a' ]; $VAR2 = [ 'aa', 'cc', 'b' ]; $VAR3 = [ 'cc', 'dd', 'ye' ]; $VAR4 = [ 'dd', 'aa', 'tq' ]; $VAR5 = [ 'dd', 'cc', 'hoho' ];
Which is conceptually equivalent to the following correctly sorted table:
aa | cc | a |
aa | cc | b |
cc | dd | ye |
dd | aa | tq |
dd | cc | hoho |
- my $column1comparison = $row1->[0] cmp $row2->[0];
- if($column1comparison != 0){
- return $column1comparison;
- }
- my $column2comparison = $row1->[1] cmp $row2->[1];
- if($column2comparison != 0){
- return $column2comparison;
- }
- return $row1->[2] cmp $row2->[2];
$row1->[0] cmp $row2->[0]
|| $row1->[1] cmp $row2->[1]
|| $row1->[2] cmp $row2->[2];
- my @sorted_table = sort tableSorter @$table;
$a->[0] cmp $b->[0]
|| $a->[1] cmp $b->[1]
|| $a->[2] cmp $b->[2];
} @$table;
Where is the return statement?
One final clarification about sorter subroutines: Why does the above tableSorter
subroutine need a return
statement, while the anonymous sorter block of code (also a subroutine) does not require a return
? They both must return a value, so why the different syntax?
The syntax is in fact different, but it means exactly the same. There are two ways to return a value from a subroutine in Perl: you can either do it explicitly using a return statement, or, if there is no return statement, the value of the last statement of the subroutine is used as the return value. So the following subroutines are exactly equivalent:
my($a, $b) = @_;
return $a + $b;
}
my($a, $b) = @_;
$a + $b;
}
The last statement of the second subroutine is $a + $b
and this is what is returned, irrespective of the presence of a return statement. The following sort statement uses a custom subroutine which returns the value of $b <=> $a
, since $b <=> $a
is the last (and only) statement of the anonymous subroutine:
The following statement is exactly equivalent (but longer):
So the answer is that the syntax is different, but semantically the same: it makes no difference at all. We just tend to use the shorthand that omits the return
statement in the case of the anonymous sorter subroutine because it makes the sort line shorter and it fits the overall brevity. On the other hand, when you write full named subroutines, the most common way to return a value is by using the return statement.
Conclusion
As with almost every aspect of Perl, the sort
function can be used in various degrees of complexity. With sort
you can start small by using it in its most simple form, customise it a bit to suit your needs (for descending sorts etc), or go all the way to implement a sorting logic for more complex data structures.
Some points to remember: Don’t forget to do something with the sorted list (e.g. store it somewhere), because sort
itself does not affect the original list. Lists are sorted in the “natural” order of data, unless this behaviour is overridden by you. Custom sorter subroutines have to have the $$
prototype and they have to return -1, 0 or 1.
Good luck with sorting!
This is the most useful site for perl sort function I have come across. I will keep visiting
Your blog is interesting!
Keep up the good work!
very useful article for me
10x