ArrayList and LinkedList are two
Collections classes used for storing lists of object references. For
example, you could have an ArrayList of Strings, or a LinkedList of
Integers. This tip compares the performance of ArrayList and
LinkedList, and offers some suggestions about which of these classes is
the right choice in a given situation.
The first key point is
that an ArrayList is backed by a plain Object array. Because of
that, an ArrayList is much faster than a LinkedList for random access,
that is, when accessing arbitrary list elements using the get method.
Note that the get method is implemented for LinkedLists, but it
requires a sequential scan from the front or back of the list. This
scan is very slow. For a LinkedList, there's no fast way to access the
Nth element of the list.
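As a rough illustration of why this matters, the sketch below sums a list in two ways. The class and method names (TraversalSketch, sumByIndex, sumByIterator) are made up for this example, and it assumes a JDK with generics (Java 5 or later). On a LinkedList the indexed version rescans the links on every call to get, while the iterator version follows one link per step.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

class TraversalSketch {
    // Indexed traversal: each get(i) on a LinkedList walks the links
    // from one end, so the whole loop does quadratic work.
    static long sumByIndex(List<Integer> lst) {
        long sum = 0;
        for (int i = 0; i < lst.size(); i++) {
            sum += lst.get(i);
        }
        return sum;
    }

    // Iterator traversal: each step follows a single link, so the
    // loop does linear work for both ArrayList and LinkedList.
    static long sumByIterator(List<Integer> lst) {
        long sum = 0;
        for (Iterator<Integer> it = lst.iterator(); it.hasNext();) {
            sum += it.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> lst = new LinkedList<Integer>();
        for (int i = 1; i <= 1000; i++) {
            lst.add(i);
        }
        // Both traversals compute the same sum; only the cost differs.
        System.out.println(sumByIndex(lst) == sumByIterator(lst));
    }
}
```

If a method only needs to visit every element in order, the iterator form is the safer default because it does not depend on which List implementation is passed in.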
Consider the following example.
Suppose you have a large list of sorted elements, either an ArrayList
or a LinkedList. Suppose too that you do a binary search on the list.
The standard binary search algorithm starts by checking the search key
against the value in the middle of the list. If the middle value is too
high, then the upper half of the list is eliminated. However, if the
middle value is too low, then the lower half of the list is ignored.
This process continues until the key is found in the list, or until the
lower bound of the search becomes greater than the upper bound.
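The steps just described can be sketched by hand as follows. This is only an illustration of the algorithm, not the library's implementation: BinarySearchSketch and search are hypothetical names, the code assumes Java 5 or later, and it returns -1 for a missing key, whereas Collections.binarySearch encodes the insertion point in its negative result.

```java
import java.util.Arrays;
import java.util.List;

class BinarySearchSketch {
    // Returns the index of key in the sorted list, or -1 if absent.
    static int search(List<Integer> sorted, int key) {
        int low = 0;
        int high = sorted.size() - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            int midVal = sorted.get(mid);   // one random access per step
            if (midVal > key) {
                high = mid - 1;             // too high: drop the upper half
            } else if (midVal < key) {
                low = mid + 1;              // too low: drop the lower half
            } else {
                return mid;                 // key found
            }
        }
        return -1;                          // bounds crossed: key not present
    }

    public static void main(String[] args) {
        List<Integer> sorted = Arrays.asList(1, 4, 9, 16, 25);
        System.out.println(search(sorted, 9));
        System.out.println(search(sorted, 10));
    }
}
```

Every pass through the loop calls get with an arbitrary index, which is why the cost of random access dominates the running time.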
Here's a program that does a binary search on all the elements in an ArrayList or a LinkedList:
import java.util.*;

public class ListDemo1 {
    static final int N = 10000;
    static List values;

    // make List of increasing Integer values
    static {
        Integer vals[] = new Integer[N];
        Random rn = new Random();
        for (int i = 0, currval = 0; i < N; i++) {
            vals[i] = new Integer(currval);
            currval += rn.nextInt(100) + 1;
        }
        values = Arrays.asList(vals);
    }

    // iterate across a list and look up every
    // value in the list using binary search
    static long timeList(List lst) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < N; i++) {
            // look up a value in the list
            // using binary search
            int indx = Collections.binarySearch(
                lst, values.get(i));
            // sanity check for result
            // of binary search
            if (indx != i) {
                System.out.println(
                    "*** error ***\n");
            }
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String args[]) {
        // do lookups in an ArrayList
        System.out.println("time for ArrayList = " +
            timeList(new ArrayList(values)));
        // do lookups in a LinkedList
        System.out.println(
            "time for LinkedList = " +
            timeList(new LinkedList(values)));
    }
}
The
ListDemo1 program sets up a List of sorted Integer values. It then adds
the values to an ArrayList or a LinkedList. Then
Collections.binarySearch is used to search for each value in the list.
When you run this program, you should see a result that looks something like this:
time for ArrayList = 31
time for LinkedList = 4640
ArrayList
is about 150 times faster than LinkedList. (Your results might differ
depending on your machine characteristics, but you should see a
distinct difference in the result for ArrayList as compared to that for
LinkedList. The same is true for the other programs in this tip.)
Clearly, LinkedList is a bad choice in this situation. The binary
search algorithm inherently uses random access, and LinkedList does not
support fast random access. The time to do a random access in a
LinkedList is proportional to the size of the list. By comparison,
random access in an ArrayList has a fixed time.
You can use the RandomAccess marker interface to check whether a List supports fast random access:
void f(List lst) {
    if (lst instanceof RandomAccess) {
        // supports fast random access
    }
}
ArrayList
implements the RandomAccess interface, and LinkedList does not. Note
that Collections.binarySearch does take advantage of the RandomAccess
property to optimize searches.
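One way your own code might use the same marker is sketched below. The helper name forRepeatedLookups is hypothetical, and the code assumes Java 5 or later. The idea is to pay for a single copy into an ArrayList when a list that lacks fast random access is about to be searched many times.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

class RandomAccessSketch {
    // Returns lst itself if it supports fast random access,
    // otherwise a one-time ArrayList copy of its elements.
    static <T> List<T> forRepeatedLookups(List<T> lst) {
        if (lst instanceof RandomAccess) {
            return lst;
        }
        return new ArrayList<T>(lst);
    }

    public static void main(String[] args) {
        List<Integer> linked = new LinkedList<Integer>();
        linked.add(1);
        linked.add(2);
        List<Integer> fast = forRepeatedLookups(linked);
        System.out.println(fast instanceof RandomAccess);
    }
}
```

Whether the copy pays off depends on how many lookups follow; for a single search it is probably not worth it.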
Do these results prove that
ArrayList is always a better choice? Not necessarily. There are many
cases where LinkedList does better. Also note that there are many
situations where an algorithm can be implemented efficiently for
LinkedList. An example is reversing a LinkedList using
Collections.reverse. The internal algorithm does this, and gets
reasonable performance, by using forward and backward iterators.
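The two-iterator idea can be sketched as follows. This illustrates the technique and is not a copy of the library source; ReverseSketch is a hypothetical name and the code assumes Java 5 or later. One ListIterator starts at the front, the other at the back, and they swap elements as they move toward the middle, so get is never called.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

class ReverseSketch {
    // Reverses lst in place using one forward and one backward iterator.
    static <T> void reverse(List<T> lst) {
        ListIterator<T> fwd = lst.listIterator();
        ListIterator<T> rev = lst.listIterator(lst.size());
        for (int i = 0, mid = lst.size() / 2; i < mid; i++) {
            T front = fwd.next();       // element near the front
            T back = rev.previous();    // matching element near the back
            fwd.set(back);              // set() replaces the element just returned
            rev.set(front);
        }
    }

    public static void main(String[] args) {
        List<Integer> lst = new LinkedList<Integer>(Arrays.asList(1, 2, 3, 4, 5));
        reverse(lst);
        System.out.println(lst);
    }
}
```

Because set does not change the structure of the list, both iterators stay valid while the swap proceeds.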
Let's
look at another example. Suppose you have a list of elements, and you
do a lot of element inserting and deleting to the list. In this case,
LinkedList is the better choice. To demonstrate that, consider the
following "worst case" scenario. In this demo, a program repeatedly
inserts elements at the beginning of a list. The code looks like this:
import java.util.*;

public class ListDemo2 {
    static final int N = 50000;

    // time how long it takes to add
    // N objects to a list
    static long timeList(List lst) {
        long start = System.currentTimeMillis();
        Object obj = new Object();
        for (int i = 0; i < N; i++) {
            lst.add(0, obj);
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String args[]) {
        // do timing for ArrayList
        System.out.println(
            "time for ArrayList = " +
            timeList(new ArrayList()));
        // do timing for LinkedList
        System.out.println(
            "time for LinkedList = " +
            timeList(new LinkedList()));
    }
}
When you run this program, the result should look something like this:
time for ArrayList = 4859
time for LinkedList = 125
These results are pretty much the reverse of the previous example.
When
an element is added to the beginning of an ArrayList, all of the
existing elements must be pushed back, which means a lot of expensive
data movement and copying. By contrast, adding an element to the
beginning of a LinkedList simply means allocating an internal record
for the element and then adjusting a couple of links. Adding to the
beginning of a LinkedList has fixed cost, but adding to the beginning
of an ArrayList has a cost that's proportional to the list size.
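The same link adjustment is available at any position in a LinkedList once an iterator is already standing there. The sketch below, with the hypothetical names InsertSketch and insertBetween and assuming Java 5 or later, walks a list once and inserts a separator between neighboring elements through ListIterator.add.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

class InsertSketch {
    // Inserts sep between every pair of adjacent elements in one pass.
    static void insertBetween(List<String> lst, String sep) {
        ListIterator<String> it = lst.listIterator();
        while (it.hasNext()) {
            it.next();
            if (it.hasNext()) {
                // add() places sep just before the element that the
                // next call to next() will return
                it.add(sep);
            }
        }
    }

    public static void main(String[] args) {
        List<String> lst = new LinkedList<String>(Arrays.asList("a", "b", "c"));
        insertBetween(lst, "-");
        System.out.println(lst);
    }
}
```

The method also works on an ArrayList, but there each add shifts the tail of the backing array, so the pass as a whole does much more copying.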
So
far, this tip has looked at speed issues, but what about space? Let's
look at some internal details of how ArrayList and LinkedList are
implemented in Java 2 SDK, Standard Edition v 1.4. These details are
not part of the external specification of these classes, but are
illustrative of how such classes work internally.
The LinkedList class has a private internal class defined like this:
private static class Entry {
    Object element;
    Entry next;
    Entry previous;
}
Each
Entry object references a list element, along with the next and
previous elements in the LinkedList -- in other words, a doubly-linked
list. A LinkedList of 1000 elements will have 1000 Entry objects linked
together, referencing the actual list elements. There is significant
space overhead in a LinkedList structure, given all these Entry
objects.
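To make the role of the Entry objects concrete, here is a minimal doubly-linked list built around the same three fields. TinyLinkedList is a hypothetical class written for this illustration. It keeps plain head and tail references, which is a simplification and not a claim about how the library class is organized internally.

```java
class TinyLinkedList {
    private static class Entry {
        Object element;
        Entry next;
        Entry previous;

        Entry(Object element) {
            this.element = element;
        }
    }

    private Entry head;
    private Entry tail;
    private int size;

    // Fixed cost: allocate one Entry and adjust a couple of links.
    void addFirst(Object o) {
        Entry e = new Entry(o);
        e.next = head;
        if (head != null) {
            head.previous = e;
        } else {
            tail = e;           // first element is also the last
        }
        head = e;
        size++;
    }

    // Cost proportional to the distance from the nearer end.
    Object get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        Entry e;
        if (index < size / 2) {
            e = head;
            for (int i = 0; i < index; i++) {
                e = e.next;
            }
        } else {
            e = tail;
            for (int i = size - 1; i > index; i--) {
                e = e.previous;
            }
        }
        return e.element;
    }

    int size() {
        return size;
    }
}
```

Each stored element costs one extra object holding three references, which is the per-element space overhead discussed above.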
An ArrayList has a backing Object array to store the
elements. This array starts with a capacity of 10. When the array needs
to grow, the new capacity is computed as:
newCapacity = (oldCapacity * 3) / 2 + 1;
Notice
that the array capacity grows each time by about 50%. This means that
if you have an ArrayList with a large number of elements, there can be
a significant amount of space wasted at the end: right after a
reallocation, roughly a third of the array is unused. This waste is
intrinsic to the way ArrayList works. If there were no spare capacity,
the array would have to be reallocated for each new element, and
performance would suffer dramatically. Changing the growth strategy to
be more aggressive (such as doubling the size at each reallocation)
would result in slightly better performance, but it would waste more
space.
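The effect of the growth formula can be worked through with a few lines of arithmetic. The sketch below uses the starting capacity of 10 and the formula quoted above, both of which the tip describes as implementation details of one release. GrowthSketch and its methods are hypothetical names, and integer overflow for very large capacities is ignored.

```java
class GrowthSketch {
    // The growth formula quoted in the text.
    static int nextCapacity(int oldCapacity) {
        return (oldCapacity * 3) / 2 + 1;
    }

    // Counts how many times the array is reallocated while growing
    // from the initial capacity of 10 to hold the given element count.
    static int reallocationsFor(int elements) {
        int capacity = 10;
        int count = 0;
        while (capacity < elements) {
            capacity = nextCapacity(capacity);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Successive capacities: 10, 16, 25, 38, 58, 88, 133, ...
        int capacity = 10;
        for (int i = 0; i < 7; i++) {
            System.out.print(capacity + " ");
            capacity = nextCapacity(capacity);
        }
        System.out.println();
        System.out.println(reallocationsFor(100));
    }
}
```

Because the capacity grows geometrically, the number of reallocations grows only with the logarithm of the element count, which is what keeps the average cost of appending fixed.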
If you know how many elements will be in an ArrayList,
you can specify the capacity to the constructor. You can also call the
trimToSize method after the fact to reallocate the internal array. This
gets rid of the wasted space.
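Both options might look like this in practice. CapacitySketch, build, and buildUnknownSize are hypothetical names, and the code assumes Java 5 or later; only the ArrayList(int) constructor and trimToSize come from the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;

class CapacitySketch {
    // Size known in advance: allocate the backing array once.
    static ArrayList<Integer> build(int n) {
        ArrayList<Integer> lst = new ArrayList<Integer>(n);
        for (int i = 0; i < n; i++) {
            lst.add(i);
        }
        return lst;
    }

    // Size not known in advance: let the list grow as needed, then
    // release the reserve capacity once no more elements are expected.
    static ArrayList<Integer> buildUnknownSize(Iterator<Integer> source) {
        ArrayList<Integer> lst = new ArrayList<Integer>();
        while (source.hasNext()) {
            lst.add(source.next());
        }
        lst.trimToSize();
        return lst;
    }

    public static void main(String[] args) {
        System.out.println(build(5));
        System.out.println(buildUnknownSize(Arrays.asList(7, 8, 9).iterator()));
    }
}
```

Calling trimToSize itself reallocates the array, so it is best reserved for lists that will live for a long time without further growth.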
So far, this discussion has
assumed that either an ArrayList or a LinkedList is "right" for a given
application. But sometimes, other choices make more sense. For example,
consider the very common situation where you have a list of key/value
pairs, and you would like to retrieve a value for a given key.
You
could store the pairs in an N x 2 Object array. To find the right pair,
you could do a sequential search on the key values. This approach
works, and is a useful choice for very small lists (say 10 elements or
less), but it doesn't scale to big lists.
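A minimal sketch of this array approach is shown below, with PairArraySketch and lookup as hypothetical names. Column 0 of each row holds the key and column 1 holds the value; keys are assumed to be non-null.

```java
class PairArraySketch {
    // Sequential search: compares the key against each row in turn.
    // Returns the matching value, or null if the key is not present.
    static Object lookup(Object[][] pairs, Object key) {
        for (int i = 0; i < pairs.length; i++) {
            if (pairs[i][0].equals(key)) {
                return pairs[i][1];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Object[][] pairs = {
            { "one", Integer.valueOf(1) },
            { "two", Integer.valueOf(2) },
        };
        System.out.println(lookup(pairs, "two"));
    }
}
```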
Another approach is
to sort the key/value pairs by ascending key value, store the result in
a pair of ArrayLists, and then do a binary search on the keys list.
This approach also works, and is very fast. Yet another approach is to
not use a list structure at all, but instead use a map structure (hash
table), in the form of a HashMap.
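The sorted-lists approach could be sketched like this. SortedPairsSketch and lookup are hypothetical names, the code assumes Java 5 or later, and the two lists are assumed to be parallel: the value for keys.get(i) is stored at values.get(i).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SortedPairsSketch {
    // Binary search on the sorted keys list, then fetch the value
    // stored at the same index. Returns null if the key is absent.
    static String lookup(List<Integer> keys, List<String> values, Integer key) {
        int indx = Collections.binarySearch(keys, key);
        return indx >= 0 ? values.get(indx) : null;
    }

    public static void main(String[] args) {
        List<Integer> keys = new ArrayList<Integer>(Arrays.asList(10, 20, 30));
        List<String> values = new ArrayList<String>(Arrays.asList("ten", "twenty", "thirty"));
        System.out.println(lookup(keys, values, 20));
    }
}
```

This layout suits data that is loaded once and then only read, because every later insertion has to keep both lists sorted and aligned.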
Which is faster, a binary search on an ArrayList, or a HashMap? Here's a final example that compares these two:
import java.util.*;

public class ListDemo3 {
    static final int N = 500000;

    // Lists of keys and values
    static List keys;
    static List values;

    // fill the keys list with ascending order key
    // values and fill the values list with
    // corresponding values (-key)
    static {
        Integer keyvec[] = new Integer[N];
        Integer valuevec[] = new Integer[N];
        Random rn = new Random();
        for (int i = 0, currval = 0; i < N; i++) {
            keyvec[i] = new Integer(currval);
            valuevec[i] = new Integer(-currval);
            currval += rn.nextInt(100) + 1;
        }
        keys = Arrays.asList(keyvec);
        values = Arrays.asList(valuevec);
    }

    // fill a Map with key/value pairs
    static Map map = new HashMap();
    static {
        for (int i = 0; i < N; i++) {
            map.put(keys.get(i), values.get(i));
        }
    }

    // do binary search lookup of all keys
    static long timeList() {
        long start = System.currentTimeMillis();
        for (int i = 0; i < N; i++) {
            int indx = Collections.binarySearch(
                keys, keys.get(i));
            // sanity check of returned value
            // from binary search
            if (indx != i) {
                System.out.println(
                    "*** error ***\n");
            }
        }
        return System.currentTimeMillis() - start;
    }

    // do Map lookup of all keys
    static long timeMap() {
        long start = System.currentTimeMillis();
        for (int i = 0; i < N; i++) {
            Integer value = (Integer)map.get(
                keys.get(i));
            // sanity check of value returned
            // from map lookup (compare by value,
            // not by reference)
            if (!values.get(i).equals(value)) {
                System.out.println(
                    "*** error ***\n");
            }
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String args[]) {
        // do timing for List implementation
        System.out.println("List time = " +
            timeList());
        // do timing for Map implementation
        System.out.println("Map time = " +
            timeMap());
    }
}
The
program sets up Lists of keys and values, and then uses two different
techniques to map keys to values. One approach uses a binary search on
a list, the other a hash table.
When you run the ListDemo3 program, you should get a result that looks something like this:
List time = 1000
Map time = 281
In
this example, N has a value of 500000. Approximately, log2(N) - 1
comparisons are required in an average successful binary search, so
each binary search lookup in the ArrayList will take about 18
comparisons. By contrast, a properly implemented hash table typically
requires only 1-3 comparisons. So you should expect the hash table to
be faster in this case.
However, binary search is still
useful. For example, you might want to do a lookup in a sorted list and
then find keys that are close in value to the key used for the lookup.
Doing this is easy with binary search, but a hash table offers no
efficient way to do it, because its keys are stored in apparently
random order. Also, if you are concerned with worst-case performance,
the binary search algorithm offers a much stronger performance
guarantee than a hash table scheme. You might also consider using
TreeMap for doing lookups in sorted collections of key/value pairs.
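Finding nearby keys relies on the documented return value of Collections.binarySearch: when the key is absent, the result is -(insertion point) - 1. The sketch below uses that to find the closest keys on either side of a target. NeighborSketch, ceiling, and floor are hypothetical names, and the code assumes Java 5 or later.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class NeighborSketch {
    // Smallest key greater than or equal to target, or null if none.
    static Integer ceiling(List<Integer> sortedKeys, Integer target) {
        int indx = Collections.binarySearch(sortedKeys, target);
        if (indx < 0) {
            indx = -indx - 1;   // recover the insertion point
        }
        return indx < sortedKeys.size() ? sortedKeys.get(indx) : null;
    }

    // Largest key less than or equal to target, or null if none.
    static Integer floor(List<Integer> sortedKeys, Integer target) {
        int indx = Collections.binarySearch(sortedKeys, target);
        if (indx < 0) {
            indx = -indx - 2;   // element just before the insertion point
        }
        return indx >= 0 ? sortedKeys.get(indx) : null;
    }

    public static void main(String[] args) {
        List<Integer> keys = Arrays.asList(10, 20, 30);
        System.out.println(floor(keys, 25) + " " + ceiling(keys, 25));
    }
}
```

A hash table has no counterpart to the insertion point, so a similar query there would mean examining every key.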
Let's summarize the key points presented in this tip:
Appending
elements to the end of a list has a fixed average (amortized) cost for
both ArrayList and LinkedList. For ArrayList, appending typically involves
setting an internal array location to the element reference, but
occasionally results in the array being reallocated. For LinkedList,
the cost is uniform and involves allocating an internal Entry object.
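Inserting
or deleting elements in the middle of an ArrayList implies that the
rest of the list must be moved. Inserting or deleting elements in the
middle of a LinkedList has fixed cost once the position has been
reached, for example through a ListIterator; reaching a position by
index still requires a sequential scan.
A LinkedList does not support efficient random access.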
An
ArrayList has space overhead in the form of reserve capacity at the end
of the list. A LinkedList has significant space overhead per element.
Sometimes a Map structure is a better choice than a List.