Category: Python/Ruby
2009-04-11 07:58:37
>>> long
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'long' is not defined
>>> 5L
5L
>>> 5L
  File "<stdin>", line 1
    5L
     ^
SyntaxError: invalid syntax
>>> x = 5 ** 88
>>> type(x)
<type 'long'>
>>> x
32311742677852643549664402033982923967414535582065582275390625L
>>> x = 5 ** 88
>>> type(x)
<class 'int'>
>>> x
32311742677852643549664402033982923967414535582065582275390625
>>> int('000111', 2)
7
>>> int('000111', 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: int() base must be >= 2 and <= 36
>>> int('000111', 36)
1333
TypeError: long() can't convert non-string with explicit base
>>> long('555555555555555555555555555555555555555', 6)
2227915756473955677973140996095L
>>> int('0001111', 2)
15
>>> int('5', 36)
5
>>> int('5', 37)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: int() arg 2 must be >= 2 and <= 36
>>> 010
8
>>> 010 + 8
16
>>> 0xa
10
>>> 0xa + 010 + 2
20
>>> oct(20)
'024'
>>> hex(20)
'0x14'
>>> 0b10
2
>>> 0o10
8
>>> 0x10
16
>>> bin(5)
'0b101'
>>> bin(0x10)
'0b10000'
>>> bin(0o10)
'0b1000'
>>> oct(12)
'0o14'
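The literal prefixes and conversion functions shown above can be tied together in a short Python 3 sketch. The helper names to_bases() and from_literal() are made up for this illustration; the only library behavior relied on is that int() with base 0 reads the 0b/0o/0x prefix from the string itself.

```python
def to_bases(n):
    """Return the binary, octal, and hexadecimal spellings of an integer."""
    return {"bin": bin(n), "oct": oct(n), "hex": hex(n)}


def from_literal(text):
    """Parse a prefixed literal; base 0 makes int() honor the 0b/0o/0x prefix."""
    return int(text, 0)


forms = to_bases(20)
print(forms)  # {'bin': '0b10100', 'oct': '0o24', 'hex': '0x14'}

# Every spelling parses back to the same integer.
print([from_literal(s) for s in forms.values()])  # [20, 20, 20]
```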
>>> 5 / 2
2
>>> -5 / 2
-3
>>> 5.0 / 2
2.5
>>> -5 / 2.0
-2.5
>>> complex(5, 0) / 2
(2.5+0j)
def average(*numbers):
    return sum(numbers) / len(numbers)
>>> average(1,4)
2
>>> x=-0.0
>>> x
-0.0
>>> x + 0.0
0.0
def average(*numbers):
    return sum([n * 1.0 for n in numbers]) / len(numbers)
>>> average(1,4)
2.5
>>> 5 / 2
2.5
>>> -5 / 2
-2.5
>>> 4 / 2
2.0
>>> complex(5, 0) / 2
(2.5+0j)
>>> 5 // 2
2
>>> 5.0 // 2
2.0
>>> complex(5,0) // 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't take floor of a complex number.
from __future__ import division
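As a rough illustration of the two division operators under Python 3 semantics (where the __future__ import is no longer needed), the sketch below defines average() from the earlier example alongside a hypothetical floor_average() helper added here only for contrast.

```python
def average(*numbers):
    """Arithmetic mean; '/' is true division, so the result is a float."""
    return sum(numbers) / len(numbers)


def floor_average(*numbers):
    """Same computation with '//', which rounds toward negative infinity."""
    return sum(numbers) // len(numbers)


print(average(1, 4))        # 2.5
print(floor_average(1, 4))  # 2
print(-5 // 2)              # -3, not -2: floor division rounds down
```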
The float() function that turns strings into floating point numbers now understands nan, +inf (or inf), and -inf and turns them into the Not A Number, Positive and Negative Infinity IEEE 754 values. (Case doesn't matter, so NaN, INF, etc., are valid too.)
The math module now has the functions isnan() and isinf(). The isinf() function doesn't distinguish between inf, +inf and -inf. Here are some examples:
>>> float('nan')
nan
>>> float('NaN') # Any case works
nan
>>> float('+inf')
inf
>>> float('-inf')
-inf
>>> float('INF')
inf
>>> float('nan') + float('inf')
nan
>>> float('inf') + float('-inf')
nan
>>> float('inf') - float('-inf')
inf
>>> import math
>>> math.isnan(float('nan'))
True
>>> math.isinf(float('inf'))
True
>>> math.isinf(float('-inf'))
True
>>> math.isinf(float('nan'))
False
>>> math.isnan(float('-inf'))
False
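One practical reason to use math.isnan() is that NaN never compares equal to anything, including itself, so an equality test cannot detect it. The following sketch shows this; is_finite() is a hypothetical helper written for this example, built only from the two functions described above.

```python
import math

nan = float("nan")
inf = float("inf")


def is_finite(x):
    """True when x is neither NaN nor an infinity (illustrative helper)."""
    return not (math.isnan(x) or math.isinf(x))


print(nan == nan)         # False: NaN is not equal to itself
print(math.isnan(nan))    # True
print(is_finite(1.5), is_finite(inf), is_finite(-inf), is_finite(nan))
```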
The math module now also has a copysign(x, y) function that returns the absolute value of x with the sign of y. I don't understand why this function exists instead of a simple sign() function that returns -1 or 1, or a couple of ispositive() and isnegative() functions. The documentation is very succinct:
>>> help(math.copysign)
Help on built-in function copysign in module math:
copysign(...)
copysign(x,y)
Return x with the sign of y.
However, copysign works as advertised except for NaN. If you try to copy the sign of NaN you get inconsistent results—a negative sign on Mac OS X and a positive sign on Windows. A says this behavior is OK. I disagree. NaN is not a number, and as such has no sign. Trying to copy the sign of NaN is like trying to copy the sign of any other non-number value (string, list, object) and should result in an exception.
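For what it's worth, one thing copysign() can do that an ordinary comparison cannot is tell negative zero from positive zero, since -0.0 == 0.0 is true. The sketch below builds a sign() helper on top of it; the helper is hypothetical and is not part of the math module.

```python
import math


def sign(x):
    """Return -1.0 or 1.0 according to the sign bit of x (illustrative helper)."""
    return math.copysign(1.0, x)


print(-0.0 == 0.0)   # True: comparison cannot see the sign of zero
print(sign(-0.0))    # -1.0
print(sign(0.0))     # 1.0
print(sign(-7.5))    # -1.0
```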
Some other functions related to floating point numbers were added to the math module too. math.fsum() adds up the stream of numbers from an iterable, and is careful to avoid loss of precision by using partial sums (unlike the built-in sum() function). If any of the numbers are NaN, the result is NaN. If the partial sum reaches +inf or -inf, the sum() function returns that as the result. The math.fsum() function raises an OverflowError exception, which is more in the spirit of IEEE 754:
>>> sum([1e308, 1, -1e308])
0.0
>>> math.fsum([1e308, 1, -1e308])
1.0
>>> sum([1e100, 1, -1e100, -1])
-1.0
>>> math.fsum([1e100, 1, -1e100, -1])
0.0
>>> x = [1e308, 1e308, -1e308]
>>> sum(x)
inf
>>> math.fsum(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: intermediate overflow in fsum
>>> sum([float('nan'), 3.3])
nan
>>> math.fsum([float('nan'), -float('nan')])
nan
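The precision difference can also be seen with small everyday values. This sketch compares math.fsum() with an explicit left-to-right loop; naive_sum() is a hypothetical helper used instead of the built-in sum(), because newer CPython releases may sum floats more accurately than the versions discussed here.

```python
import math


def naive_sum(values):
    """Plain left-to-right float addition with no error compensation."""
    total = 0.0
    for v in values:
        total += v
    return total


tenths = [0.1] * 10
print(naive_sum(tenths) == 1.0)   # False: rounding errors accumulate
print(math.fsum(tenths) == 1.0)   # True: fsum tracks exact partial sums

mixed = [1e100, 1, -1e100, -1]
print(naive_sum(mixed))   # -1.0
print(math.fsum(mixed))   # 0.0
```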
The functions acosh(), asinh(), and atanh() compute inverse hyperbolic functions. The log1p() function returns the natural logarithm of 1+x (base e). The trunc() function rounds a number toward zero, discarding the fractional part and returning an integer:
>>> math.acosh(30)
4.0940666686320855
>>> math.acosh(1)
0.0
>>> math.asinh(1)
0.88137358701954305
>>> math.asinh(0)
0.0
>>> math.atanh(0.5)
0.54930614433405489
>>> math.log1p(2)
1.0986122886681098
>>> math.trunc(-1.1)
-1
>>> math.trunc(-1.9)
-1
>>> math.trunc(1.1)
1
>>> math.trunc(1.9)
1
>>> math.trunc(3.0)
3
You can convert floating-point numbers to or from hexadecimal strings. The conversion functions convert floats to and from a string representation without introducing rounding errors from the conversion between decimal and binary (if there are enough digits to represent the number fully). Floats have a hex() method that returns a string representation, while the float.fromhex() method converts a string back into a number (as accurately as possible):
>>> x = 4.2
>>> x.hex()
'0x1.0cccccccccccdp+2'
>>> float.fromhex('0x1.0cccccccccccdp+2')
4.2000000000000002
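A quick way to convince yourself that the hexadecimal form is lossless is to round-trip a few values and compare them for exact equality. The sample values below are arbitrary choices for this sketch.

```python
samples = [4.2, 0.1, 1e-300, 123456.789]

for value in samples:
    text = value.hex()                 # exact textual form of the float
    restored = float.fromhex(text)     # parse it back
    print(text, restored == value)     # equality is exact, not approximate

round_trips = [float.fromhex(v.hex()) == v for v in samples]
```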
The decimal module was updated to version 1.66 of the General Decimal Specification. New features include some methods for some basic mathematical functions such as exp() and log10():
>>> Decimal(1).exp()
Decimal("2.718281828459045235360287471")
>>> Decimal("2.7182818").ln()
Decimal("0.9999999895305022877376682436")
>>> Decimal(1000).log10()
Decimal("3")
The as_tuple() method of Decimal objects now returns a named tuple (more on named tuples in a future article) with sign, digits, and exponent fields:
>>> Decimal('-3.3').as_tuple()
DecimalTuple(sign=1, digits=(3, 3), exponent=-1)
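Because the result is a named tuple, the fields can be read by name, and the Decimal constructor accepts the same (sign, digits, exponent) triple back. A small sketch of that round trip:

```python
from decimal import Decimal

parts = Decimal('-3.3').as_tuple()

# Fields are available by name; sign is 1 for negative numbers, 0 otherwise.
print(parts.sign, parts.digits, parts.exponent)   # 1 (3, 3) -1

# The constructor accepts the tuple form, so the value can be rebuilt.
rebuilt = Decimal(parts)
print(rebuilt)   # -3.3
```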
A new variable in the sys module, float_info, is an object that contains information derived from the float.h file about the platform's floating-point support:
>>> sys.float_info
sys.floatinfo(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308,
min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15,
mant_dig=53, epsilon=2.2204460492503131e-16, radix=2, rounds=1)
Overall, Python has definitely elevated its level of support for floating point numbers—but it is not perfect yet. The external package is still the best tool for serious number
>>> issubclass(int, numbers.Integral)
True
>>> issubclass(float, numbers.Real)
True
>>> issubclass(complex, numbers.Complex)
True
>>> issubclass(complex, numbers.Real)
False
The Rational ABC is implemented by the fractions.Fraction type from the new fractions module. This module also has a gcd() function for finding the greatest common divisor, and Fraction has a couple of conversion methods: from_float() and from_decimal(). It would have been cleaner and more consistent to add a rational numeric type to the language, but that ended up as a class in a module:
>>> issubclass(fractions.Fraction, numbers.Rational)
True
>>> fractions.gcd(Fraction(1,3), Fraction(1,2))
Fraction(1, 6)
>>> fractions.gcd(Fraction(2,6), Fraction(2,3))
Fraction(1, 3)
>>> fractions.gcd(6,9)
3
>>> Fraction.from_float(0.5)
Fraction(1, 2)
>>> import math
>>> Fraction.from_float(math.pi)
Fraction(884279719003555, 281474976710656)
>>> Fraction.from_decimal(decimal.Decimal('3.5'))
Fraction(7, 2)
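The sketch below shows why an exact rational type is handy and why from_float() results can look surprising: the float 0.1 is not exactly one tenth, so its exact fraction has a huge denominator. It assumes a current Python 3, where gcd() is found in the math module (later releases dropped fractions.gcd()), and it uses the limit_denominator() method to recover the "intended" fraction.

```python
import math
from fractions import Fraction

# Rational arithmetic is exact: 1/3 + 1/6 is precisely 1/2.
total = Fraction(1, 3) + Fraction(1, 6)
print(total)   # 1/2

# The float 0.1 is only an approximation of one tenth.
exact = Fraction.from_float(0.1)
print(exact == Fraction(1, 10))   # False

# limit_denominator() finds the closest fraction with a small denominator.
print(exact.limit_denominator(1000))   # 1/10

print(math.gcd(6, 9))   # 3
```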
The Decimal class from the decimal module for fixed point and floating point arithmetic (introduced in Python 2.4) doesn't participate in the party, and doesn't implement any of the number ABCs. PEP 3141 mentions that, after consulting the authors of the decimal module, it was decided it was better not to integrate it at this time.
def square(x):
    return x * x
The preceding code will fail to return a positive result if you call it with a complex number:
>>> square(complex(0,1))
(-1+0j)
You could try an explicit check to ensure that the argument is an int or float, but that's somewhat clunky, especially if you want to support rational numbers too:
>>> def square(x):
...     assert type(x) == int or \
...         type(x) == float or \
...         type(x) == Fraction
...     return x * x
...
>>> from fractions import Fraction
>>> square(5)
25
>>> square(4.4)
19.360000000000003
>>> square(Fraction(1,3))
Fraction(1, 9)
>>> square(complex(0,1))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in square
AssertionError
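With the number ABCs, the same check can be written once against numbers.Real instead of listing concrete types. The following is a minimal sketch of that idea; raising TypeError rather than using assert is a choice made for this example, not something prescribed by the numbers module.

```python
import numbers
from fractions import Fraction


def square(x):
    """Square any real number; reject complex and non-numeric arguments."""
    if not isinstance(x, numbers.Real):
        raise TypeError('square() expects a real number, got %s'
                        % type(x).__name__)
    return x * x


print(square(5))                # 25
print(square(Fraction(1, 3)))   # 1/9

try:
    square(complex(0, 1))
except TypeError as e:
    print(e)   # square() expects a real number, got complex
```

Any type that registers with or subclasses numbers.Real passes the check without square() having to know about it.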
You may also come up with your own numeric types by either subclassing one of the existing types, or one of the ABCs from the numbers module. Then you'll be able to pass your new numbers to code that expects those specific types, and checks its arguments based on an ABC. This can be useful, for example, when you work with integers in a limited domain, or with real numbers that have fixed point semantics but limited precision. I won't give a full-fledged example here, because you need to implement a lot of methods to comply with any of the ABCs.
>>> s = 'hello'
>>> u = u'\u05e9\u05dc\u05d5\u05dd'
>>> type(s)
<type 'str'>
>>> type(u)
<type 'unicode'>
Both str and unicode were derived from a common base class called basestring:
>>> unicode.__bases__
(<type 'basestring'>,)
>>> unicode.__base__
<type 'basestring'>
>>> str.__bases__
(<type 'basestring'>,)
In Python 3.0, all strings are Unicode. The str type has the same semantics as unicode in Python 2.x, and there is no separate unicode type. The basestring base class is gone as well:
>>> s = '\u05e9\u05dc\u05d5\u05dd'
>>> type(s)
<class 'str'>
Instead of Python 2.x's byte string there are now two types: bytes, which is immutable, and bytearray, which is mutable. The bytes type supports a large number of string-like methods, as shown below:
>>> dir(bytes)
['__add__', '__class__', '__contains__',
'__delattr__', '__doc__', '__eq__', '__format__',
'__ge__', '__getattribute__', '__getitem__',
'__getnewargs__', '__gt__', '__hash__', '__init__',
'__iter__', '__le__', '__len__', '__lt__',
'__mul__', '__ne__', '__new__', '__reduce__',
'__reduce_ex__', '__repr__', '__rmul__',
'__setattr__', '__sizeof__', '__str__', '__subclasshook__',
'capitalize', 'center', 'count', 'decode', 'endswith',
'expandtabs', 'find', 'fromhex', 'index', 'isalnum',
'isalpha', 'isdigit', 'islower', 'isspace', 'istitle',
'isupper', 'join', 'ljust', 'lower', 'lstrip',
'partition', 'replace', 'rfind', 'rindex', 'rjust',
'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines',
'startswith', 'strip', 'swapcase', 'title',
'translate', 'upper', 'zfill']
The bytearray type also has the following mutating methods: extend(), insert(), append(), reverse(), pop(), and remove().
They also support the + and * operators (using the same semantics as strings) and the bytearray type also supports += and *=.
You can't convert to or from str without explicit encoding, because neither bytes nor bytearray know about encoding, and str objects must have an encoding. If you try to pass a bytes or bytearray object directly to str() you will get a result of repr(). To convert you must use the decode() method:
>>> a = bytearray(range(48, 58))
>>> a
bytearray(b'0123456789')
>>> s = str(a)
>>> s
"bytearray(b'0123456789')"
>>> s = a.decode()
>>> s
'0123456789'
To convert from a string to bytes or bytearray you must use the string's encode() method or provide an encoding to the constructor of the bytes or bytearray object:
>>> s = '1234'
>>> s
'1234'
>>> b = bytes(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: string argument without an encoding
>>> b = s.encode()
>>> b
b'1234'
>>> b = bytes(s, 'ascii')
>>> b
b'1234'
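The need for an explicit encoding becomes more obvious with non-ASCII text. This sketch reuses the Hebrew string from the earlier example and assumes UTF-8 as the encoding; it also shows that a bytearray can be modified in place while bytes cannot.

```python
text = '\u05e9\u05dc\u05d5\u05dd'   # four Hebrew characters

data = text.encode('utf-8')          # str -> bytes
print(len(text), len(data))          # 4 8: each character takes two bytes
print(data.decode('utf-8') == text)  # True: bytes -> str round-trips

try:
    text.encode('ascii')             # ASCII cannot represent these characters
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
print(ascii_failed)                  # True

buf = bytearray(b'1234')
buf[0] = ord('9')                    # in-place change works on bytearray
print(buf)                           # bytearray(b'9234')
```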
The string representation has changed too. In Python 2.x the return type of repr() was str, which was an ASCII-based string. In Python 3.0 the return type is still str, but it's now a Unicode string. The default encoding of the string representation is determined by the output device.
PEP 3101: Advanced String Formatting
Python 3.0 brings a powerful new way to format strings that's based on Microsoft's .NET composite formatting (an excellent choice). I have used the string formatting facilities of many programming languages, but the C# formatting (which uses .NET composite formatting) experience was the best by far. It was powerful, flexible, consistent, and well documented.
In Python 2.x you can format strings using the % operator or using string.Template. The % operator is convenient; when you want to format only a single argument, you can pass it as is:
>>> import time
>>> time.localtime()
(2008, 12, 31, 10, 32, 16, 2, 366, 0)
>>> 'The current year is %d' % time.localtime()[0]
'The current year is 2008'
To format multiple arguments, you must pack them in a tuple:
>>> t = time.localtime()
>>> 'Day: %d, Month: %d, Year: %d' % (t[2], t[1], t[0])
'Day: 31, Month: 12, Year: 2008'
With the tuple approach you must specify the arguments in the exact order they will be formatted. Also, if you want the same value to appear multiple times you must pass it multiple times:
>>> s = 'The solution to the square of %d is: %d * %d = %d'
>>> s % (5, 5, 5, 5 * 5)
'The solution to the square of 5 is: 5 * 5 = 25'
Alternatively, you can pass a dictionary and specify the dictionary keys in the format string:
>>> d = dict(n=5, result=5 * 5)
>>> s = 'The solution to the square of %(n)d is: %(n)d * %(n)d = %(result)d'
>>> s % d
'The solution to the square of 5 is: 5 * 5 = 25'
As you can see, the dictionary approach lets you specify repeating values just once, but at a price: the format string is more complicated, and you must prepare a dict rather than simply pass the values.
Finally, there is also the string.Template class. You use this to prepare compiled templates that you can apply multiple times to different values efficiently, because the format string itself must be parsed only once. This is especially important for use cases such as templated web pages or code generation scenarios, where the text results can be large, and parsing the format string can be expensive. The format string is a little different. Named values are preceded by a $ sign and optionally enclosed in curly braces to distinguish them from the surrounding text:
>>> s = 'The solution to the square of ${n} is: ${n} * ${n} = ${result}'
>>> t = string.Template(s)
>>> for i in range(1, 7):
...     d = dict(n=i, result=i * i)
...     print t.substitute(d)
...
The solution to the square of 1 is: 1 * 1 = 1
The solution to the square of 2 is: 2 * 2 = 4
The solution to the square of 3 is: 3 * 3 = 9
The solution to the square of 4 is: 4 * 4 = 16
The solution to the square of 5 is: 5 * 5 = 25
The solution to the square of 6 is: 6 * 6 = 36
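Two details of string.Template are easy to miss: a literal dollar sign is written as $$, and safe_substitute() leaves unknown placeholders in place instead of raising KeyError. A short sketch, with a made-up template string:

```python
import string

t = string.Template('Hello, ${name}! You owe $$${amount}.')

# All placeholders supplied; '$$' becomes a single literal '$'.
print(t.substitute(name='Ann', amount=5))
# Hello, Ann! You owe $5.

# substitute() raises KeyError when a value is missing ...
try:
    t.substitute(name='Ann')
    missing_raised = False
except KeyError:
    missing_raised = True

# ... while safe_substitute() leaves the placeholder untouched.
print(t.safe_substitute(name='Ann'))
# Hello, Ann! You owe $${amount}.
```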
Python 3.0 added a new formatting method called format to the string class. It is intended to replace the % formatting of short format strings and not the string.Template formatting, because it doesn't compile its format string. The format() method understands both positional and keyword arguments within a single format string. You enclose substitution fields in the format string in curly braces. You can reuse the same positional argument multiple times in different fields:
>>> s = 'Addition is commutative. For example: {0} + {1} = {1} + {0}'
>>> s.format(5, 7)
'Addition is commutative. For example: 5 + 7 = 7 + 5'
>>> s = '{0} multiplied by {1} is {result}'
>>> s.format(4, 3, result=3 * 4)
'4 multiplied by 3 is 12'
You can escape curly braces by doubling them:
>>> '{0} "{{", {1} "}}"'.format('open curly:', 'closed curly:')
'open curly: "{", closed curly: "}"'
The format() method supports both simple fields, which are either strings or base-10 integers, and compound fields. Compound fields are quite useful because they allow you to access object attributes or elements of arrays:
>>> import fractions
>>> r = fractions.Fraction(5, 4)
>>> '{0.numerator} / {0.denominator}'.format(r)
'5 / 4'
>>> 'Day: {0[2]}, Month: {0[1]}, Year: {0[0]}'.format(time.localtime())
'Day: 31, Month: 12, Year: 2008'
The ability to access attributes and array elements simplifies their use because a developer needs to provide only the object or tuple/list/array, not break it up and arrange the parts in the right order. Compare the preceding example to the Python 2.x version presented earlier.
Unlike some templating languages, you may not use arbitrary Python expressions in the format strings. The Python 3.0 format string is limited to objects, attributes, and indexing into tuples/arrays/lists.
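Here is a compact sketch of compound fields that mixes attribute access, positional indexing, and dictionary lookup. The Point type and the settings dictionary are made up for the example; note that dictionary keys inside the braces are written without quotes.

```python
from collections import namedtuple

Point = namedtuple('Point', 'x y')   # hypothetical type for illustration
p = Point(3, 4)
settings = {'user': 'gigi'}

by_attr = '({0.x}, {0.y})'.format(p)        # attribute access
by_index = '({0[0]}, {0[1]})'.format(p)     # positional indexing
mixed = '{pt.x} belongs to {cfg[user]}'.format(pt=p, cfg=settings)

print(by_attr)    # (3, 4)
print(by_index)   # (3, 4)
print(mixed)      # 3 belongs to gigi
```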
The format() method supports a wide array of format specifiers for fine-tuning the display of formatted fields. You separate format specifiers from the field name with a colon (:) character:
>>> 'The "{0:10}" is right padded to 10 characters'.format('field')
'The "field     " is right padded to 10 characters'
Objects may define and accept their own format specifiers in the __format__ method (see below), but Python also has a large selection of standard specifiers that apply to every object. The general form of a standard format specifier is:
[[fill]align][sign][#][0][minimumwidth][.precision][type]
There are many fine details and constraints. Some format specifiers make sense only for numeric types, or only if other specifiers exist. There are many display options for integers and real numbers, for example:
>>> '{0:@^8.4}'.format(1 / 3)
'@0.3333@'
OK, what happened here? The at sign (@) is the fill character. The alignment is centered (^). The precision is 4 and the minimum width is 8, so the number was formatted to have four significant digits (0.3333). The zero and the decimal dot took two other characters, so two additional @ characters were added as padding to get a centered display of eight characters. All this is similar to Python 2.x's % formatting, but much more flexible and powerful.
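A few more standard specifiers, one per part of the general form, may make the pieces easier to recognize. The sketch uses the built-in format() function with the spec alone, which is equivalent to writing '{0:spec}'.format(value); the sample values are arbitrary.

```python
examples = [
    ('>8', 'ab'),          # right-align in 8 columns
    ('*<8', 'ab'),         # left-align, fill with '*'
    ('^7', 'mid'),         # center in 7 columns
    ('+.2f', 3.14159),     # always show sign, 2 digits after the point
    ('08.3f', 3.14159),    # zero-pad to width 8, 3 digits after the point
    ('#x', 255),           # hexadecimal with the 0x prefix
    ('b', 5),              # binary without prefix
]

results = [format(value, spec) for spec, value in examples]
for (spec, value), text in zip(examples, results):
    print('%-6s %-8r -> %r' % (spec, value, text))
```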
The real power of the new string formatting becomes evident for custom formatting, which you define by implementing the __format__() method. The signature is:
def __format__(self, format_spec):
    ...
Suppose you want to have a ColorString class that can format itself to be displayed in different colors. To print colored text (and much more) to the screen in Python you can use ANSI escape sequences on Linux and Mac OS X. On 32-bit Windows you need to use the SetConsoleTextAttribute() API.
Author's Note: The code presented here will not work properly on Windows; it will just print junk characters around the original text instead of changing the colors.
So to print some red text type:
print('\033[31mRed Text\033[0m')
The escape sequence starts with ESC+[ (also known as the Control Sequence Introducer). The ESC character is non-printable, and can also be written as chr(27) or \x1b (hex notation). Note that 033 is octal notation for 27. The 31m following the \033[ is the incantation used to change the text color to red. The actual text (Red Text) is next, and finally, another incantation restores the colors to their default (\033[0m). Although Python itself has switched its octal notation for integer literals from 0{number} to 0o{number}, octal escapes inside string literals, such as \033, still use the old backslash-and-digits form.
You can do a lot with the escape sequences, such as change text and background color, move the cursor around the screen (to print in a specific location), erase parts of the screen, hide/show the cursor, and scroll the screen buffer. The examples here focus on changing colors only.
Here's a little module containing a function called colorize() that accepts three arguments: a string, a text color, and a background color. It then wraps the string with the appropriate ANSI escape sequence. First, it prepares a small global dictionary containing all the colors and background colors mapped from a string to their ANSI escape code. The function itself checks whether a color and/or background color were provided by name such as red or green, finds the corresponding codes in the dictionary, and prepares a proper escape sequence to change the colors to the requested colors. Finally, it resets everything to normal. The code shown here has no error checking, so if you request a color name that doesn't exist you will get a KeyError exception:
colors = ['black', 'red', 'green', 'orange', 'blue', 'magenta', 'cyan', 'white']
color_dict = {}
for i, c in enumerate(colors):
    color_dict[c] = (i + 30, i + 40)

def colorize(text, color=None, bgcolor=None):
    c = None
    bg = None
    if color is not None:
        c = color_dict[color][0]
    if bgcolor is not None:
        bg = color_dict[bgcolor][1]
    s = ''
    if c is not None:
        s = '\033[%dm' % c
    if bg is not None:
        s += '\033[%dm' % bg
    return '%s%s\033[0m' % (s, text)
You can experiment with this to print various colored strings on colored backgrounds. Here's an example that prints white text on a magenta background:
print(colorize('White on Magenta', 'white', 'magenta'))
This code and the colorize module work in both Python 2.x and 3.0.
With the colorize() function under your belt you can create the ColorString class that formats itself in color. The basic idea is to subclass the built in str class and add a __format__() method that takes the format_spec and passes it as the text color to the colorize() function, which returns the wrapped string:
class ColorString(str):
    def __format__(self, format_spec):
        s = colorize(self, format_spec)
        return s
This implementation lets you change only the text color and not the background, but it makes the format very simple (you just supply the color name). Here is ColorString in action. First, the example prepares a list of ColorString words by splitting a simple sentence ("Yeah, it works!") and then prints each word in a different color, by specifying a format string with the colors red, green, and blue:
words = [ColorString(x) for x in 'Yeah, it works!'.split()]
print('{0:red} {1:green} {2:blue}'.format(*words))
Python 3.0 also has a global format() function used to format single objects. It simply calls the object's __format__() method. Here it is at work with ColorString:
>>> format(ColorString('Gigi'), 'red')
'\x1b[31mGigi\x1b[0m'
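The __format__() hook is not limited to colors or to str subclasses. As a terminal-independent sketch, here is a hypothetical Temperature class (not from the article's code) that interprets the format specifier as a unit: 'f' for Fahrenheit, anything else for Celsius.

```python
class Temperature:
    """Hypothetical example type that understands 'c' and 'f' specifiers."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, format_spec):
        if format_spec == 'f':
            return '%.1f F' % (self.celsius * 9 / 5 + 32)
        # An empty or unknown specifier falls back to Celsius.
        return '%.1f C' % self.celsius


boiling = Temperature(100)
print(format(boiling, 'f'))                          # 212.0 F
print('{0:c} is {0:f}'.format(Temperature(0)))       # 0.0 C is 32.0 F
```

Both the global format() function and str.format() end up calling the same method, so one implementation serves both.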
This subclassing scheme works fine, but it feels a bit cumbersome to create a special class with a __format__() method whenever you want some custom formatting. In addition, the subclassing scheme requires developers to construct special objects such as ColorString to take advantage of the formatting. Fortunately, you can go even further by implementing your own formatter classes and using them to format any type. For example, it would be convenient to just be able to print text in any color you want. The next example shows a class called ColorFormatter, which subclasses the string.Formatter class and overrides the format_field method. The override colorizes the field if it finds the format_spec in the colors list, or just applies the default formatting by calling Formatter.format_field():
from string import Formatter
class ColorFormatter(Formatter):
    def format_field(self, value, format_spec):
        if format_spec in colors:
            return colorize(value, format_spec)
        else:
            return Formatter.format_field(self, value, format_spec)
To use a custom formatter you need to instantiate it and then call its format() method to get the formatted string. To make it even more streamlined I assigned the bound format method to a variable named f, so it's easier to use:
formatter = ColorFormatter()
f = formatter.format
print(f('{0:cyan} works very {1:orange}.', 'ColorFormatter', 'well'))
If you have a list of field values or dictionary with named fields you can use the vformat() method, which takes a list for positional arguments and a dictionary for keyword arguments:
formatter = ColorFormatter()
f = formatter.vformat
args = ['The', 'vformat()']
kwargs = dict(m='method', t='too')
print(f('{0:red} {1:blue} {m:green} works {t:magenta}', args, kwargs))
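The same override pattern works for any custom specifier, which makes it easy to try out without a color terminal. The following sketch defines a hypothetical CaseFormatter that handles 'upper' and 'lower' itself and hands every other specifier to the default implementation.

```python
from string import Formatter


class CaseFormatter(Formatter):
    """Illustrative formatter: adds 'upper' and 'lower' format specifiers."""

    def format_field(self, value, format_spec):
        if format_spec == 'upper':
            return str(value).upper()
        if format_spec == 'lower':
            return str(value).lower()
        # Anything else gets the standard treatment.
        return Formatter.format_field(self, value, format_spec)


fmt = CaseFormatter()
print(fmt.format('{0:upper} and {1:>5}', 'abc', 'x'))    # ABC and     x
print(fmt.vformat('{a:lower}-{0:03d}', [7], {'a': 'XY'}))  # xy-007
```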
If you are a Windows developer looking for a little Python 3.0 homework a good exercise would be to implement the color ANSI escape codes for Windows by replacing the new print function. Your replacement print function should scan the text to print looking for ANSI escape sequences, parse them, and apply the proper color setting using the SetConsoleTextAttribute() API.
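As a possible starting point for that exercise, the scanning and parsing step could look something like the sketch below. It only recognizes single-number color sequences of the kind colorize() produces, the names SGR_PATTERN and split_ansi() are made up here, and it deliberately stops short of calling any Windows API.

```python
import re

# Matches ESC [ <number> m, e.g. '\033[31m' (hypothetical helper pattern).
SGR_PATTERN = re.compile(r'\x1b\[(\d+)m')


def split_ansi(text):
    """Split text into (code, chunk) pairs.

    code is the number from the most recent escape sequence, or None
    for text that appears before any escape sequence.
    """
    segments = []
    code = None
    pos = 0
    for match in SGR_PATTERN.finditer(text):
        if match.start() > pos:
            segments.append((code, text[pos:match.start()]))
        code = int(match.group(1))
        pos = match.end()
    if pos < len(text):
        segments.append((code, text[pos:]))
    return segments


print(split_ansi('\033[31mRed\033[0m plain'))
# [(31, 'Red'), (0, ' plain')]
```

A replacement print function could then walk the pairs, map each code to a console attribute, and write the chunk.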
This article showed a wide range of examples detailing how the deep changes in Python 3.0 affect data types, math operations, and string formatting. Beyond these, Python 3.0 also made significant changes to the standard library, which you'll explore in the next article in this series.