Category: Python/Ruby

2013-03-11 16:23:46

Original article: python不得不说的string (roughly, "What has to be said about Python strings"), author: kinfinger


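The previous article explained that everything in Python is an object. In real-world development the type you use most is the string, so how do strings behave?
We know that C has two types, characters and strings. Let's first look at the difference between them.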


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]) {
	/* definition of variables */
	int i;
	char ch, *chp;
	char cha[] = {'X', 'X', 'X', 'X', '\0'};	/* array filled element by element */
	char cha2[] = "XXXX";	/* array holding a writable copy of the literal */
	char *chp3 = (char *) malloc(sizeof(char) * 5);	/* heap buffer */
	char *chp2;

	/* assignment of variables and actions */
	ch = 'a';
	chp = "XXXX";	/* points at the string literal itself */
	chp2 = strdup(chp);	/* heap copy of the literal (POSIX strdup) */
	for (i = 0; i < 4; i++)
		chp3[i] = i + 65;	/* 'A' .. 'D' */
	chp3[4] = '\0';
	printf("************************before change****************************\n");
	printf("ch\t%c\ncha\t%s\ncha2\t%s\nchp\t%s\nchp2\t%s\nchp3\t%s\n", ch, cha, cha2, chp, chp2, chp3);
	printf("************************after change****************************\n");
	ch = 'y';
	cha[3] = 'y';
	cha2[3] = 'y';
	/*	chp[3]='3'; */	/* writing to a string literal fails at run time */
	chp2[3] = 'y';
	chp3[3] = 'y';
	printf("ch\t%c\ncha\t%s\ncha2\t%s\nchp\t%s\nchp2\t%s\nchp3\t%s\n", ch, cha, cha2, chp, chp2, chp3);
	/*********add ********************/
	/* strlen and sizeof yield size_t, so print them with %zu */
	printf("notice of the difference between  cha and  cha2 : %zu,%zu\n", strlen(cha), strlen(cha2));
	printf("notice of the difference between  cha and  cha2 : %zu,%zu\n", sizeof(cha), sizeof(cha2));
	printf("****************************************************\n");
	free(chp2);
	free(chp3);
	return 0;
}

Compile and run the program:
************************before change****************************
ch      a
cha     XXXX
cha2    XXXX
chp     XXXX
chp2    XXXX
chp3    ABCD
************************after change****************************
ch      y
cha     XXXy
cha2    XXXy
chp     XXXX
chp2    XXXy
chp3    ABCy
notice of the difference between  cha and  cha2 : 4,4
notice of the difference between  cha and  cha2 : 5,5
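****************************************************
If you uncomment the line chp[3]='3', the program fails at run time. From this we can infer that the string literal chp points to cannot be modified. If you are familiar with the memory layout of a C program, you will know that string literals are placed in a read-only data section (.rdata), which is why they cannot be written to.
So in C, apart from statically initialized string literals, character data can be modified.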

What about Python? Let's try similar definitions (Python 2 syntax):

ch = 'a'
string = 'XXXX'
cha = ['x', 'x', 'x', 'x']
print ch, string, cha, type(ch), type(string), type(cha),
print hex(id(cha)), hex(id(string)), hex(id(cha[3]))

# modify
ch = 'b'            # rebinds the name ch to another string object
# string[3] = 'b'   # uncommenting this raises TypeError: str has no item assignment
cha[3] = 'b'        # a list element can be replaced in place
print ch, string, cha, type(ch), type(string), type(cha),
print hex(id(cha)), hex(id(string)), hex(id(cha[3]))

# string manipulation
# iteration over a string
for letter in string:
    print "".join(letter),

for letter in string:
    print ord(letter),
print

# string slice and index access
# (named sub rather than str so the built-in str type is not shadowed)
sub = string[:2]
print sub, string[2], len(sub), type(sub)

Program output:

a XXXX ['x', 'x', 'x', 'x'] <type 'str'> <type 'str'> <type 'list'> 0x1671f30 0x1ec7400 0x15c4830
b XXXX ['x', 'x', 'x', 'b'] <type 'str'> <type 'str'> <type 'list'> 0x1671f30 0x1ec7400 0x15c4e18
X X X X 88 88 88 88
XX X 2 <type 'str'>
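From the above we can see that Python has no character variables and no character-array variables.
The closest counterparts to the C types are the string and the list.
ch, string, and cha are all objects (everything in Python is an object), and a string is immutable: once created, its contents cannot be changed.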
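To make the immutability point concrete, here is a minimal sketch. It is written for Python 3 so it runs on a current interpreter (the listings in this article are Python 2), and the variable names are only illustrative.

```python
# Python 3 sketch: item assignment works on a list but not on a string.
text = "XXXX"
chars = ["x", "x", "x", "x"]

chars[3] = "b"              # lists support item assignment

try:
    text[3] = "b"           # strings do not
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

# "Changing" a string really means building a new string object.
changed = text[:3] + "b"

print(chars, error_name, text, changed)
```

The original string is left untouched; only the new name `changed` refers to the modified text.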
Check the documentation with help(str):
Help on class str in module __builtin__:
class str(basestring)
 |  str(object) -> string
 |  
 |  Return a nice string representation of the object.
 |  If the argument is a string, the return value is the same object.
 |  
 |  Method resolution order:
 |      str
 |      basestring
 |      object
------------------------------

 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
Now let's look at the help text for list:
 
help(list)
Help on class list in module __builtin__:
class list(object)
 |  list() -> new empty list
 |  list(iterable) -> new list initialized from iterable's items
 |  
Looking further down:
 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
 |  __setitem__(...)
 |      x.__setitem__(i, y) <==> x[i]=y
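This shows that the elements of a list can be modified (list defines __setitem__), while a str object cannot be (str has no __setitem__). If you are not sure whether the elements of a type can be modified, the help text gives a quick answer.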
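The same check can be done in code instead of reading the help text. The sketch below assumes Python 3, and `supports_item_assignment` is a hypothetical helper name, not something from the standard library.

```python
def supports_item_assignment(cls):
    """Return True if the type defines __setitem__ (illustrative helper)."""
    return hasattr(cls, "__setitem__")

# Ask a few built-in types whether x[i] = y is supported.
results = {
    cls.__name__: supports_item_assignment(cls)
    for cls in (str, list, tuple, dict, bytearray)
}
print(results)
```

This is only a quick heuristic: it tells you the method exists, not what a particular call will accept.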
The slice operation on str is also interesting. From the help text:
 |  __getslice__(...)
 |      x.__getslice__(i, j) <==> x[i:j]
 |      
 |      Use of negative indices is not supported.
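The slice covers a half-open interval that is closed on the left and open on the right: i is the start position (included) and j is the end position (excluded).
The operation resembles the range function, except that range produces a list of ints.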
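A small sketch of the relationship between a slice and range, assuming Python 3. There, range returns a lazy range object rather than a list, so list() is used to make the indices visible.

```python
s = "ABCDEF"
i, j = 1, 4

piece = s[i:j]                    # items at indices 1, 2, 3
indices = list(range(i, j))       # the same indices, produced by range
rebuilt = "".join(s[k] for k in indices)

print(piece, indices, rebuilt)    # BCD [1, 2, 3] BCD
```

In both cases the start index is included, the end index is excluded, and the result has j - i items.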
ord and len in the code above look simple, but they are still worth a look:
help(ord)
Help on built-in function ord in module __builtin__:
ord(...)
    ord(c) -> integer
    
    Return the integer ordinal of a one-character string.
help(len)
Help on built-in function len in module __builtin__:
len(...)
    len(object) -> integer
    
    Return the number of items of a sequence or mapping.
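As a quick illustration of the two help texts, here is a Python 3 sketch; the variable names are made up for the example.

```python
word = "XXXX"

codes = [ord(c) for c in word]          # one integer per character
back = "".join(chr(n) for n in codes)   # chr() is the inverse of ord()
length = len(word)                      # number of items in the sequence

# ord() only accepts a string of exactly one character.
try:
    ord("XX")
    ord_error = None
except TypeError as exc:
    ord_error = type(exc).__name__

print(codes, back, length, ord_error)
```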
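So ord and len are functions. That brings us to functional programming in Python: in Python, functions are objects too.
A function shows up in three main forms:
Function definition: def fun():
Function object: f = fun
Function call: f(), which is equivalent to fun()
Note in particular that f and f() are different things.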
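The difference between f and f() can be seen in a few lines. This is a Python 3 sketch that reuses the names fun and f from the list above.

```python
def fun():
    return "called"

f = fun          # binds a second name to the same function object; nothing runs
result = f()     # the parentheses perform the call

print(f is fun)       # True
print(callable(f))    # True
print(result)         # called
```

f is the function object itself, while f() is whatever the function returns.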
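Function definitions need no explanation, since most readers are familiar with them.
Function objects, on the other hand, differ a lot from what many other languages offer and are a hallmark of Python (though perhaps I just haven't seen enough).
Speaking of objects, every object has three properties: type, value, and id.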
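type(f)
<type 'function'>
So the type is function (or method). I won't go into id and value here. The documentation says that functions and methods are callable. Besides user-defined functions and methods, the following are also callable: built-in functions and methods, generator functions, class types, classic classes, and class instances (when the class defines __call__). These will be covered in detail later; for now, let's discuss user-defined functions.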
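The built-in callable() gives a quick way to test the kinds of objects listed above. This sketch assumes Python 3, where classic classes no longer exist, and all class and function names in it are illustrative.

```python
def user_function():
    return 1

def generator_function():
    yield 1

class Plain:
    pass

class WithCall:
    def __call__(self):
        return "instance called"

checks = {
    "user function": callable(user_function),
    "built-in function": callable(len),
    "built-in method": callable([].append),
    "generator function": callable(generator_function),
    "class": callable(Plain),
    "instance without __call__": callable(Plain()),
    "instance with __call__": callable(WithCall()),
}

for kind, is_callable in checks.items():
    print(kind, is_callable)
```

Only the instance whose class lacks __call__ reports False.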
Knowing that a function is an object is not enough: as an object, a function also has attributes. Consider the following statements:

def myfun():
    """ this is document"""
    # global fun='this is global area'
    return 'mymodule'

x = myfun
print x.__doc__
print x.__name__
# x.__module__='module name '
print '%s' % x.func_doc
print '%s' % x.__module__
print x.func_code
print x.func_globals
print x.func_closure
print x.func_dict
print x()
y = ord
print y.__module__
Program output:
 this is document
myfun
 this is document
__main__

{'__builtins__': , '__file__': 'h:\\python\\myfun.py', 'myfun': , 'x': , '__name__': '__main__', '__doc__': None}
None
{}
mymodule
__builtin__
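As you can see, the module name of a user-defined function differs from that of a built-in function. Some function attributes are read-only and others are writable.
Two function attributes are particularly important. func_code is the code object that represents the compiled function body.
func_globals is a reference to the dictionary that holds the function's global variables, that is, the global namespace of the module in which the function was defined:
y=x.func_globals
print type(y)
Result:
<type 'dict'>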
The discussion above mentioned generator functions. The documentation says:

yield_stmt ::= yield_expression

The yield statement is only used when defining a generator function, and is only used in the body of the generator function. Using a yield statement in a function definition is sufficient to cause that definition to create a generator function instead of a normal function.
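A minimal generator function, as a sketch of what the quoted passage describes. It assumes Python 3, where the iterator is advanced with the built-in next() (the Python 2 documentation quoted in this article speaks of the iterator's next() method). The name count_up_to is invented for the example.

```python
def count_up_to(limit):
    """Generator function: the yield in the body is what makes it one."""
    n = 1
    while n <= limit:
        yield n
        n += 1

gen = count_up_to(3)      # calling it runs no body code yet; it returns an iterator
first = next(gen)         # runs the body up to the first yield
rest = list(gen)          # drains the remaining values

try:
    next(gen)             # the body has finished, so StopIteration is raised
    exhausted = False
except StopIteration:
    exhausted = True

print(first, rest, exhausted)   # 1 [2, 3] True
```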
I'll stop here for now. I wrote things down as they came to mind, so the organization is a bit loose.
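P.S. For reference, the relevant excerpts from the Python 2 language reference (data model chapter) follow.
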
 Sequences
These represent finite ordered sets indexed by 
non-negative numbers. The built-in function len() returns the number of items of a sequence. When the length of a sequence is n, the index set contains the numbers 0, 1, ..., n-1. Item i of sequence a is selected by a[i].
Sequences also support slicing: a[i:j] selects all items 
with index k such that i <= k < j. When used as an expression, a slice is a 
sequence of the same type. This implies that the index set is renumbered so that 
it starts at 0.
Some sequences also support “extended slicing” with a third 
“step” parameter: a[i:j:k] selects all items of a with index x where x = i + n*k, n >= 0 and i <= x < j.
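The extended-slicing rule can be checked directly against its definition. This is a Python 3 sketch for a positive step k; the names mirror the symbols in the paragraph above.

```python
a = list(range(10))
i, j, k = 1, 8, 3

sliced = a[i:j:k]

# Spell out the rule: every index x = i + n*k with n >= 0 and i <= x < j.
by_rule = [a[i + n * k] for n in range(len(a)) if i <= i + n * k < j]

print(sliced, by_rule)    # [1, 4, 7] [1, 4, 7]
```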
Sequences are distinguished according to their mutability:
Immutable sequences
An object of an immutable sequence type cannot change 
once it is created. (If the object contains references to other objects, these 
other objects may be mutable and may be changed; however, the collection of 
objects directly referenced by an immutable object cannot change.)
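The parenthetical remark is easy to misread, so here is a small Python 3 sketch of it: a tuple cannot be changed, but a list it refers to can. The names are illustrative.

```python
inner = [1, 2]
t = (inner, "fixed")

inner.append(3)           # allowed: the referenced list is itself mutable

try:
    t[0] = [9]            # not allowed: the tuple's own references are fixed
    rebind_error = None
except TypeError as exc:
    rebind_error = type(exc).__name__

print(t, rebind_error)
```

The tuple still refers to the same list object throughout; only that list's contents changed.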
The following types are immutable sequences:
Strings
The items of a string are characters. There is no 
separate character type; a character is represented by a string of one item. 
Characters represent (at least) 8-bit bytes. The built-in functions chr() and ord() convert between characters and nonnegative integers representing the byte 
values. Bytes with the values 0-127 usually represent the corresponding ASCII 
values, but the interpretation of values is up to the program. The string data 
type is also used to represent arrays of bytes, e.g., to hold data read from a 
file.
(On systems whose native character set is not ASCII, 
strings may use EBCDIC in their internal representation, provided the functions chr() and ord() implement a mapping between ASCII and EBCDIC, and string comparison preserves 
the ASCII order. Or perhaps someone can propose a better rule?)
Unicode
The items of a Unicode object are Unicode code 
units. A Unicode code unit is represented by a Unicode object of one item and 
can hold either a 16-bit or 32-bit value representing a Unicode ordinal (the 
maximum value for the ordinal is given in sys.maxunicode, and depends on how Python is configured at 
compile time). Surrogate pairs may be present in the Unicode object, and will be 
reported as two separate items. The built-in functions unichr() and ord() convert between code units and nonnegative integers representing the Unicode 
ordinals as defined in the Unicode Standard 3.0. Conversion from and to other 
encodings are possible through the Unicode method encode() and the built-in function unicode().
Tuples
The items of a tuple are arbitrary Python 
objects. Tuples of two or more items are formed by comma-separated lists of 
expressions. A tuple of one item (a ‘singleton’) can be formed by affixing a 
comma to an expression (an expression by itself does not create a tuple, since 
parentheses must be usable for grouping of expressions). An empty tuple can be 
formed by an empty pair of parentheses.
Mutable sequences
Mutable sequences can be changed after they are 
created. The subscription and slicing notations can be used as the target of 
assignment and del (delete) statements.
There are currently two intrinsic mutable sequence types:
Lists
The items of a list are arbitrary Python 
objects. Lists are formed by placing a comma-separated list of expressions in 
square brackets. (Note that there are no special cases needed to form lists of 
length 0 or 1.)
Byte Arrays
A bytearray object is a mutable array. They 
are created by the built-in bytearray() constructor. Aside from being mutable (and 
hence unhashable), byte arrays otherwise provide the same interface and 
functionality as immutable bytes objects.
The extension module array provides an additional example of a mutable sequence type.
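For comparison with the C example at the start of the article, a bytearray is the closest thing Python has to a writable char buffer. A Python 3 sketch with illustrative names:

```python
buf = bytearray(b"XXXX")
buf[3] = ord("y")             # in-place change, like cha[3] = 'y' in C

frozen = bytes(buf)           # immutable snapshot of the current contents

try:
    hash(buf)                 # mutable, hence unhashable
    hash_error = None
except TypeError as exc:
    hash_error = type(exc).__name__

print(buf, frozen, hash_error)
```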
 
 Callable types
These are the types to which the function call 
operation (see section Calls) can be applied:
User-defined functions
A user-defined function object is created by a 
function definition (see section Function definitions). It 
should be called with an argument list containing the same number of items as 
the function’s formal parameter list.
Special attributes:
Attribute | Meaning | Access
func_doc | The function’s documentation string, or None if unavailable | Writable
__doc__ | Another way of spelling func_doc | Writable
func_name | The function’s name | Writable
__name__ | Another way of spelling func_name | Writable
__module__ | The name of the module the function was defined in, or None if unavailable | Writable
func_defaults | A tuple containing default argument values for those arguments that have defaults, or None if no arguments have a default value | Writable
func_code | The code object representing the compiled function body | Writable
func_globals | A reference to the dictionary that holds the function’s global variables (the global namespace of the module in which the function was defined) | Read-only
func_dict | The namespace supporting arbitrary function attributes | Writable
func_closure | None or a tuple of cells that contain bindings for the function’s free variables | Read-only
Most of the attributes labelled “Writable” check the type of the assigned value.
Changed in version 2.4: func_name is now writable.
Function objects also support getting and setting arbitrary attributes, which 
can be used, for example, to attach metadata to functions. Regular attribute 
dot-notation is used to get and set such attributes. Note that the current 
implementation only supports function attributes on user-defined functions. 
Function attributes on built-in functions may be supported in the 
future. 
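Attaching metadata to a function looks like this. The sketch assumes Python 3, where the function's attribute namespace is reached as __dict__ (func_dict in the Python 2 text above); the attribute names author and calls are invented for the example.

```python
def greet(name):
    return "hello " + name

# Arbitrary attributes can be set with ordinary dot notation.
greet.author = "example"
greet.calls = 0

stored = dict(greet.__dict__)    # copy of the function's attribute namespace

# Built-in functions do not accept arbitrary attributes.
try:
    len.author = "nobody"
    builtin_error = None
except AttributeError as exc:
    builtin_error = type(exc).__name__

print(stored, builtin_error)
```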
Additional information about a function’s definition can be 
retrieved from its code object; see the description of internal types below.
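Here is a short look at the code object and the globals dictionary, the two attributes singled out earlier in the article. This is a Python 3 sketch: there the attributes are spelled __code__ and __globals__ instead of func_code and func_globals.

```python
def myfun():
    """ this is document"""
    return 'mymodule'

code = myfun.__code__            # func_code in Python 2
namespace = myfun.__globals__    # func_globals in Python 2

print(type(namespace).__name__)  # dict
print(code.co_name)              # myfun
print(code.co_argcount)          # 0
```

The code object carries details about the compiled body, such as the function name and the number of positional parameters shown here.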
User-defined methods
A user-defined method object combines a class, a 
class instance (or None) and any callable object (normally a user-defined 
function).
Special read-only attributes: im_self is 
the class instance object, im_func is the function object; im_class is 
the class of im_self for bound methods or the class that asked for the 
method for unbound methods; __doc__ is the method’s documentation (same as im_func.__doc__); __name__ is 
the method name (same as im_func.__name__); __module__ is the name of the module the method was defined in, or None if 
unavailable.
Changed in version 2.2: im_self used to refer to the class that defined the 
method.
Changed in version 2.6: For 3.0 forward-compatibility, im_func is 
also available as __func__, and im_self as __self__.
Methods also support accessing (but not setting) the arbitrary 
function attributes on the underlying function object.
User-defined method objects may be created when getting an attribute of a 
class (perhaps via an instance of that class), if that attribute is a 
user-defined function object, an unbound user-defined method object, or a class 
method object. When the attribute is a user-defined method object, a new method 
object is only created if the class from which it is being retrieved is the same 
as, or a derived class of, the class stored in the original method object; 
otherwise, the original method object is used as it is.
When a user-defined method object is created by retrieving a 
user-defined function object from a class, its im_self attribute is None and the method object is said to be unbound. When one is created by retrieving a 
user-defined function object from a class via one of its instances, its im_self attribute is the instance, and the method object is said to be bound. In either 
case, the new method’s im_class attribute is the class from which the retrieval 
takes place, and its im_func attribute is the original function object.
When a user-defined method object is created by retrieving 
another method object from a class or instance, the behaviour is the same as for 
a function object, except that the im_func attribute of the new instance is not the original method object but its im_func attribute.
When a user-defined method object is created by retrieving a 
class method object from a class or instance, its im_self attribute is the class itself (the same as the im_class attribute), and its im_func attribute is the function object underlying the 
class method.
When an unbound user-defined method object is called, the underlying function 
(im_func) is called, with the restriction that the first 
argument must be an instance of the proper class (im_class) 
or of a derived class thereof.
When a bound user-defined method object is called, the underlying function 
(im_func) is called, inserting the class instance (im_self) in 
front of the argument list. For instance, when C is a 
class which contains a definition for a function f(), and x is an instance of C, calling x.f(1) is equivalent to 
calling C.f(x, 1).
When a user-defined method object is derived from a class method object, the 
“class instance” stored in im_self will actually be the class itself, so that calling 
either x.f(1) or C.f(1) is equivalent to 
calling f(C,1) where f is the underlying 
function.
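The two equivalences described above can be checked as follows. The sketch assumes Python 3, which has no unbound method objects and spells im_self and im_func as __self__ and __func__. The class C and the methods f and g follow the naming used in the text.

```python
class C:
    def f(self, n):
        return ("f", self, n)

    @classmethod
    def g(cls, n):
        return ("g", cls, n)

x = C()
bound = x.f           # bound method: remembers the instance
class_bound = x.g     # class method: remembers the class instead

via_instance = x.f(1)
via_class = C.f(x, 1)

print(via_instance == via_class)      # True
print(bound.__self__ is x)            # True
print(class_bound.__self__ is C)      # True
print(x.g(1) == C.g(1))               # True
```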
Note that the transformation from function object to (unbound or 
bound) method object happens each time the attribute is retrieved from the class 
or instance. In some cases, a fruitful optimization is to assign the attribute 
to a local variable and call that local variable. Also notice that this 
transformation only happens for user-defined functions; other callable objects 
(and all non-callable objects) are retrieved without transformation. It is also 
important to note that user-defined functions which are attributes of a class 
instance are not converted to bound methods; this only happens when the 
function is an attribute of the class.
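The last point, that a function stored on an instance is not turned into a bound method, is sketched below in Python 3. All names are hypothetical.

```python
class C:
    pass

def show_args(*args):
    # Returns whatever positional arguments it actually received.
    return args

x = C()
x.on_instance = show_args     # function stored on the instance
C.on_class = show_args        # function stored on the class

instance_result = x.on_instance()   # no implicit argument is inserted
class_result = x.on_class()         # bound method: x is inserted first

print(instance_result)              # ()
print(class_result == (x,))         # True
```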
Generator functions
A function or method which uses the yield statement (see section The yield 
statement) is called a generator function. Such a 
function, when called, always returns an iterator object which can be used to 
execute the body of the function: calling the iterator’s next() method will cause the function to execute until it provides a value using the yield statement. When the function executes a return statement or falls off the end, a StopIteration exception is raised and the iterator 
will have reached the end of the set of values to be returned.
Built-in functions
A built-in function object is a wrapper around 
a C function. Examples of built-in functions are len() and math.sin() (math is 
a standard built-in module). The number and type of the arguments are determined 
by the C function. Special read-only attributes: __doc__ is 
the function’s documentation string, or None if unavailable; __name__ is 
the function’s name; __self__ is set to None (but see the next item); __module__ is the name of the module the function was defined in or None if 
unavailable.
Built-in methods
This is really a different disguise of a 
built-in function, this time containing an object passed to the C function as an 
implicit extra argument. An example of a built-in method is alist.append(), assuming alist is a list object. In this case, the special read-only attribute __self__ is set to the object denoted by alist.
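A quick check of the __self__ attribute on a built-in method, as a Python 3 sketch that uses the alist name from the text:

```python
alist = [1, 2]
append = alist.append     # built-in method object, not yet called

append(3)                 # same effect as alist.append(3)

print(append.__self__ is alist)   # True
print(append.__name__)            # append
print(alist)                      # [1, 2, 3]
```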
Class Types
Class types, or “new-style classes,” are callable. These objects normally act as factories for new instances of themselves, but variations are possible for class types that override __new__(). The arguments of the call are passed to __new__() and, in the typical case, to __init__() to initialize the new instance.
Classic Classes
Class objects are described below. When a class object is called, a new class instance (also described below) is created and returned. This implies a call to the class’s __init__() method if it has one. Any arguments are passed on to the __init__() method. If there is no __init__() method, the class must be called without arguments.
Class instances
Class instances are described below. Class instances are callable only when the class has a __call__() method; x(arguments) is a shorthand for x.__call__(arguments).
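A callable instance in practice, as a Python 3 sketch; the Multiplier class is invented for the example.

```python
class Multiplier:
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, value):
        return value * self.factor

double = Multiplier(2)            # calling the class creates an instance

via_call = double(21)             # shorthand form
via_method = double.__call__(21)  # explicit form

print(via_call, via_method)       # 42 42
```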
 


