Category: Python/Ruby
2008-06-30 15:28:19
Suppose you have a list in Python that contains duplicates, and you want to remove the duplicates so that each value appears only once.
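The original example list didn't survive extraction; as a stand-in (borrowing the list a commenter uses further down), the goal looks like this:

```python
lst = [1, 1, 3, 4, 4, 5, 6, 7, 6]

# desired result, duplicates removed with the original order kept:
result = [1, 3, 4, 5, 6, 7]
```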
How do you do that? ...the fastest way? I wrote a couple of alternative implementations and did a quick benchmark loop on the various implementations to find out which way was the fastest. (I haven't looked at memory usage). The slowest function was 78 times slower than the fastest function.
However, there's one very important difference between the various functions: some are order preserving and some are not. In an order preserving function, apart from the duplicates being dropped, the order of the output is guaranteed to match the order of the input, e.g. uniqify([1, 2, 2, 3]) == [1, 2, 3].
Here are the functions:
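The function listing was lost when this page was scraped. As a reconstruction (an assumption based on how the names are used below, not necessarily the author's exact code), the implementations referred to later as f2, f4 and f6 might look like:

```python
def f2(seq):
    # Order preserving, but O(n^2): the membership test scans a list.
    checked = []
    for e in seq:
        if e not in checked:
            checked.append(e)
    return checked

def f4(seq):
    # Order preserving, also O(n^2): count() scans the whole result list.
    no_dupes = []
    for i in seq:
        if not no_dupes.count(i):
            no_dupes.append(i)
    return no_dupes

def f6(seq):
    # Not order preserving: hand the work to the built-in set type.
    return list(set(seq))
```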
And what you've all been waiting for (if you're still reading). Here are the results:
Clearly f5 is the "best" solution. Not only is it really, really fast; it's also order preserving and supports an optional transform function, which makes it possible to do this:
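The snippet that followed here was stripped in extraction. A sketch of what the transform hook enables, with f5 reconstructed as an assumption (order preserving, taking an optional idfun that computes each item's duplicate-detection key):

```python
def f5(seq, idfun=None):
    # Order preserving; idfun maps each item to the key used for
    # duplicate detection (identity by default).
    if idfun is None:
        def idfun(x):
            return x
    seen = {}
    result = []
    for item in seq:
        marker = idfun(item)
        if marker in seen:
            continue
        seen[marker] = 1
        result.append(item)
    return result

# e.g. case-insensitive uniqify: keeps the first spelling seen
words = ['Apple', 'apple', 'APPLE', 'Banana', 'banana']
print(f5(words, lambda x: x.lower()))  # ['Apple', 'Banana']
```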
UPDATE
From the comments I've now added a couple more functions to the benchmark. Some of them can't uniqify a list of unhashable objects unless a special hashing function is passed in. See the full set of functions.
Here are the new results:
f2 and f4 were too slow for this test data.
Keep in mind you can also use:
>>> lst = [1, 1, 3, 4, 4, 5, 6, 7, 6]
>>> set(lst)
set([1, 3, 4, 5, 6, 7])
Isn't that what f6 does, apart from the final conversion to a list again?
Right. I totally missed f6().