Python strings: what has to be said - cengku - ChinaUnix blog

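Python strings: what has to be said

Category: Python/Ruby

2013-03-11 16:23:46

Original article: "Python strings: what has to be said", by kinfinger

The previous article explained that everything in Python is an object. The type used most in day-to-day development is the string, so how do strings behave?
C has two related types, the character and the string. Let's look at how they differ: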

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]) {
	/* definition of var */
	int i;
	char ch, *chp;
	char cha[] = {'X', 'X', 'X', 'X', '\0'};	/* writable array, explicit terminator */
	char cha2[] = "XXXX";				/* writable array initialized from a literal */
	char *chp3 = (char *) malloc(sizeof(char) * 5);
	char *chp2;

	/* assignment of var and action */
	ch = 'a';
	chp = "XXXX";		/* points at the string literal itself */
	chp2 = strdup(chp);	/* heap copy of the literal (POSIX strdup) */
	for (i = 0; i < 4; i++)
		chp3[i] = i + 65;	/* 'A', 'B', 'C', 'D' */
	chp3[4] = '\0';
	printf("************************before change****************************\n");
	printf("ch\t%c\ncha\t%s\ncha2\t%s\nchp\t%s\nchp2\t%s\nchp3\t%s\n", ch, cha, cha2, chp, chp2, chp3);
	printf("************************after change****************************\n");
	ch = 'y';
	cha[3] = 'y';
	cha2[3] = 'y';
	/*	chp[3]='3'; */	/* would write into the string literal */
	chp2[3] = 'y';
	chp3[3] = 'y';
	printf("ch\t%c\ncha\t%s\ncha2\t%s\nchp\t%s\nchp2\t%s\nchp3\t%s\n", ch, cha, cha2, chp, chp2, chp3);
	/*********add ********************/
	/* strlen and sizeof both yield size_t, so print them with %zu */
	printf("notice of the difference between  cha and  cha2 : %zu,%zu\n", strlen(cha), strlen(cha2));
	printf("notice of the difference between  cha and  cha2 : %zu,%zu\n", sizeof(cha), sizeof(cha2));
	printf("****************************************************\n");
	free(chp2);
	free(chp3);
	return 0;
}

Compile and run the program:
************************before change****************************
ch a
cha XXXX
cha2 XXXX
chp XXXX
chp2 XXXX
chp3 ABCD
************************after change****************************
ch y
cha XXXy
cha2 XXXy
chp XXXX
chp2 XXXy
chp3 ABCy
notice of the difference between cha and cha2 : 4,4
notice of the difference between cha and cha2 : 5,5
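****************************************************
If you uncomment the chp[3]='3' line, the program typically crashes at run time, which shows that the string constant chp points to cannot be modified. If you are familiar with the memory layout of a C program, you will know that string constants are placed in a read-only data section (.rdata), so they cannot be modified.
So in C, apart from string constants, character data can be modified.

What about Python? Let's try similar definitions: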

# Python 2
ch = 'a'
string = 'XXXX'
cha = ['x', 'x', 'x', 'x']
print ch, string, cha, type(ch), type(string), type(cha),
print hex(id(cha)), hex(id(string)), hex(id(cha[3]))

# modify
ch = 'b'
# string[3] = 'b'    # raises TypeError: str does not support item assignment
cha[3] = 'b'
print ch, string, cha, type(ch), type(string), type(cha),
print hex(id(cha)), hex(id(string)), hex(id(cha[3]))

# string manipulation
# iteration over a string
for letter in string:
    print "".join(letter),

for letter in string:
    print ord(letter),
# end the current output line
print
# string index access and slicing
sub = string[:2]    # renamed from str so the built-in type is not shadowed
print sub, string[2], len(sub), type(sub)

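Program output:

a XXXX ['x', 'x', 'x', 'x'] <type 'str'> <type 'str'> <type 'list'> 0x1671f30 0x1ec7400 0x15c4830
b XXXX ['x', 'x', 'x', 'b'] <type 'str'> <type 'str'> <type 'list'> 0x1671f30 0x1ec7400 0x15c4e18
X X X X 88 88 88 88
XX X 2 <type 'str'>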
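From the above we can see that Python has no separate character type and no character-array type.
The closest analogues to the C types are the string (str) and the list.
ch, string and cha are all objects (everything in Python is an object), and a string is immutable: once created, it cannot be changed.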
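A minimal sketch of that immutability, written for Python 3 (the article's own code is Python 2); the variable names are illustrative only. Assigning to an item of a str fails, while the list counterpart accepts the change and can be joined into a new string.

```python
# Python 3 sketch; names are illustrative, not taken from the article.
s = "XXXX"
chars = list(s)              # a list is the mutable counterpart

try:
    s[3] = "y"               # item assignment on str is rejected
except TypeError as exc:
    error = str(exc)

chars[3] = "y"               # lists support item assignment
modified = "".join(chars)    # builds a new string; s itself is untouched

print(error)
print(s, modified)
```

The original string keeps its value; "changing" a string always means building another one.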
Look at the documentation with help(str):
Help on class str in module __builtin__:
class str(basestring)
| str(object) -> string
|
| Return a nice string representation of the object.
| If the argument is a string, the return value is the same object.
|
| Method resolution order:
| str
| basestring
| object
------------------------------

| __getitem__(...)

| x.__getitem__(y) <==> x[y]
Now look at the help text for list:

help(list)
Help on class list in module __builtin__:
class list(object)
| list() -> new empty list
| list(iterable) -> new list initialized from iterable's items
|
Reading further down:
| __getitem__(...)
| x.__getitem__(y) <==> x[y]
| __setitem__(...)
| x.__setitem__(i, y) <==> x[i]=y
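This shows that the items of a list can be modified while a str object cannot: list defines __setitem__ and str does not. If you are unsure whether a type's items can be modified, the help text gives a quick answer.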
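The same check can be done in code rather than by reading help output. This is a small Python 3 sketch; supports_item_assignment is a hypothetical helper name, and it only tests for the presence of __setitem__ on the type.

```python
# Python 3 sketch: a type whose instances allow x[i] = y defines __setitem__.
def supports_item_assignment(tp):
    """Return True if the type defines __setitem__ (hypothetical helper)."""
    return hasattr(tp, "__setitem__")

results = {tp.__name__: supports_item_assignment(tp)
           for tp in (str, tuple, list, bytearray, dict)}
print(results)
```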
Slicing on str is also worth a look. From the help text:
| __getslice__(...)
| x.__getslice__(i, j) <==> x[i:j]
|
| Use of negative indices is not supported.
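The slice covers a half-open range, closed on the left and open on the right: i is the start position and is included, j is the end position and is excluded.
This resembles the range function, except that range produces a list of ints.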
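A short Python 3 sketch of the half-open rule and its relation to range; the sample string and indices are arbitrary choices for illustration.

```python
s = "python"
i, j = 1, 4

# s[i:j] takes indices i, i+1, ..., j-1: i is included, j is excluded.
by_slice = s[i:j]
by_range = "".join(s[k] for k in range(i, j))

print(by_slice, by_range)
print(len(by_slice) == j - i)

# In Python 3, range() returns a lazy range object; wrap it in list()
# to get the list of ints that Python 2's range() returned directly.
print(list(range(i, j)))
```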
ord and len in the code above look simple, but they deserve a few words. Check their help text:
help(ord)
Help on built-in function ord in module __builtin__:
ord(...)
ord(c) -> integer

Return the integer ordinal of a one-character string.

help(len)
Help on built-in function len in module __builtin__:
len(...)
len(object) -> integer

Return the number of items of a sequence or mapping.
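As you can see, ord and len are functions. This brings us to something Python shares with functional programming: in Python, functions are objects too.
A function appears in three forms:
Function definition: def fun():
Function object: f = fun
Function call: f(), which is the same as fun()
Note in particular that f and f() are different things: f is the function object itself, while f() is the result of calling it.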
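The three forms can be sketched as follows in Python 3; fun and its return value are made up for the example.

```python
def fun():
    """Hypothetical example function."""
    return 42

f = fun            # binds another name to the same function object; nothing is called
result = f()       # calls the function through the new name

print(f is fun)        # both names refer to one object
print(callable(f))     # function objects are callable
print(result)          # the value returned by the call
```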
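I won't explain function definitions, since most readers already know them.
Function objects are another matter: they differ considerably from some other languages and are a characteristic of Python (though I may simply be unaware of other languages that have them).
Speaking of objects, every object has three properties: type, value and id.
type(f)
<type 'function'>
So the type is function (or method). I won't go into id and value here. The documentation says that functions and methods are both callable. Besides user-defined functions and methods, built-in functions and methods, generator functions, class types, classic classes and class instances are callable as well. These will be covered in detail later; for now we look at user-defined functions.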
Knowing that functions are objects is not enough. As objects, functions also have attributes. Consider the following:

# Python 2
def myfun():
    """ this is document"""
    return 'mymodule'

x = myfun
print x.__doc__
print x.__name__
# x.__module__ = 'module name '    # __module__ is writable
print '%s' % x.func_doc
print '%s' % x.__module__
print x.func_code
print x.func_globals
print x.func_closure
print x.func_dict
print x()
y = ord
print y.__module__

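Program output:
this is document
myfun
this is document
__main__
<code object myfun at 0x..., file "...", line ...>
{'__builtins__': <module '__builtin__' (built-in)>, '__file__': 'h:\\python\\myfun.py', 'myfun': <function myfun at 0x...>, 'x': <function myfun at 0x...>, '__name__': '__main__', '__doc__': None}
None
{}
mymodule
__builtin__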

So the module name of a user-defined function differs from that of a built-in function. Some function attributes are read-only and some are writable.

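Two function attributes are particularly important. One is func_code, which represents the compiled function body.

The other is func_globals, a reference to the dictionary that holds the function's global variables, that is, the global namespace of the module in which the function was defined. The attribute returns a dict object:

y = x.func_globals

print type(y)

Result:

<type 'dict'>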

The discussion above mentioned generator functions. The documentation has this to say:

yield_stmt ::= yield_expression

The yield statement is only used when defining a generator function, and is only used in the body of the generator function. Using a yield statement in a function definition is sufficient to cause that definition to create a generator function instead of a normal function.
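A minimal Python 3 sketch of what that means in practice; countdown is a hypothetical generator function, and next(gen) is the Python 3 spelling of the iterator's next() method.

```python
import inspect

def countdown(n):
    """Hypothetical generator function: yields n, n-1, ..., 1."""
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)        # calling it runs no body code yet; it returns an iterator
first = next(gen)         # runs the body up to the first yield
rest = list(gen)          # drains the remaining values

try:
    next(gen)             # the generator is exhausted
    exhausted = False
except StopIteration:
    exhausted = True

print(inspect.isgeneratorfunction(countdown), first, rest, exhausted)
```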

That is all for now. I wrote this down as the thoughts came, so the structure is a little disorganized.

P.S.

Sequences

These represent finite ordered sets indexed by non-negative numbers. The built-in function len() returns the number of items of a sequence. When the length of a sequence is n, the index set contains the numbers 0, 1, ..., n-1. Item i of sequence a is selected by a[i].

Sequences also support slicing: a[i:j] selects all items with index k such that i <= k < j. When used as an expression, a slice is a sequence of the same type. This implies that the index set is renumbered so that it starts at 0.

Some sequences also support “extended slicing” with a third “step” parameter: a[i:j:k] selects all items of a with index x where x = i + n*k, n >= 0 and i <= x < j.
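The extended-slicing rule can be checked directly. This Python 3 sketch uses arbitrary values for i, j and k, with a positive step as the rule above assumes.

```python
a = list(range(10))
i, j, k = 1, 8, 3

by_slice = a[i:j:k]
# The same selection written out from the rule x = i + n*k, n >= 0, i <= x < j
by_rule = [a[x] for x in range(i, j, k)]

print(by_slice, by_rule)
```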

Sequences are distinguished according to their mutability:

Immutable sequences

An object of an immutable sequence type cannot change once it is created. (If the object contains references to other objects, these other objects may be mutable and may be changed; however, the collection of objects directly referenced by an immutable object cannot change.)
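The parenthetical is easiest to see with a tuple that holds a list. A Python 3 sketch with made-up values:

```python
inner = [1, 2]
t = (inner, "fixed")

inner.append(3)              # the referenced list is mutable and can change

try:
    t[0] = [9, 9]            # the tuple cannot be made to reference another object
except TypeError:
    rebinding_allowed = False
else:
    rebinding_allowed = True

print(t, rebinding_allowed)
```

The tuple still references the same list object; only the contents of that list changed.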

The following types are immutable sequences:

Strings

The items of a string are characters. There is no separate character type; a character is represented by a string of one item. Characters represent (at least) 8-bit bytes. The built-in functions chr() and ord() convert between characters and nonnegative integers representing the byte values. Bytes with the values 0-127 usually represent the corresponding ASCII values, but the interpretation of values is up to the program. The string data type is also used to represent arrays of bytes, e.g., to hold data read from a file.

(On systems whose native character set is not ASCII, strings may use EBCDIC in their internal representation, provided the functions chr() and ord() implement a mapping between ASCII and EBCDIC, and string comparison preserves the ASCII order. Or perhaps someone can propose a better rule?)

Unicode

The items of a Unicode object are Unicode code units. A Unicode code unit is represented by a Unicode object of one item and can hold either a 16-bit or 32-bit value representing a Unicode ordinal (the maximum value for the ordinal is given in sys.maxunicode, and depends on how Python is configured at compile time). Surrogate pairs may be present in the Unicode object, and will be reported as two separate items. The built-in functions unichr() and ord() convert between code units and nonnegative integers representing the Unicode ordinals as defined in the Unicode Standard 3.0. Conversion from and to other encodings are possible through the Unicode method encode() and the built-in function unicode().
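A quick round trip through ord() and chr(), sketched for Python 3, where str holds Unicode code points and chr() covers what unichr() did in Python 2. The character 'X' is chosen to match the 88 printed earlier in the article.

```python
ch = "X"
code = ord(ch)          # one-character string -> integer
back = chr(code)        # integer -> one-character string

print(code, back)
print(type(back).__name__, len(back))   # still a str, of length one
```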

Tuples

The items of a tuple are arbitrary Python objects. Tuples of two or more items are formed by comma-separated lists of expressions. A tuple of one item (a ‘singleton’) can be formed by affixing a comma to an expression (an expression by itself does not create a tuple, since parentheses must be usable for grouping of expressions). An empty tuple can be formed by an empty pair of parentheses.
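The tuple-forming rules above, as a small Python 3 sketch with illustrative names:

```python
empty = ()            # empty pair of parentheses
single = 1,           # the trailing comma makes a one-item tuple
not_a_tuple = (1)     # parentheses alone only group the expression
pair = 1, 2           # comma-separated expressions

print(type(empty).__name__, type(single).__name__,
      type(not_a_tuple).__name__, type(pair).__name__)
```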

Mutable sequences

Mutable sequences can be changed after they are created. The subscription and slicing notations can be used as the target of assignment and del (delete) statements.
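Each of the four combinations (subscription or slicing, as an assignment or del target) is shown once in this Python 3 sketch; the list contents are arbitrary.

```python
items = [0, 1, 2, 3, 4, 5]

items[0] = "a"          # subscription as an assignment target
items[1:3] = ["b"]      # slice assignment may change the length
del items[-1]           # subscription as a del target
del items[2:4]          # slicing as a del target

print(items)
```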

There are currently two intrinsic mutable sequence types:

Lists

The items of a list are arbitrary Python objects. Lists are formed by placing a comma-separated list of expressions in square brackets. (Note that there are no special cases needed to form lists of length 0 or 1.)

Byte Arrays

A bytearray object is a mutable array. They are created by the built-in bytearray() constructor. Aside from being mutable (and hence unhashable), byte arrays otherwise provide the same interface and functionality as immutable bytes objects.

The extension module array provides an additional example of a mutable sequence type.
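A Python 3 sketch of the bytearray properties described above: item assignment works, hashing does not, and bytes() gives the immutable counterpart. The sample data mirrors the "XXXX" strings used earlier.

```python
buf = bytearray(b"XXXX")
buf[3] = ord("y")            # items are ints in range(256)

try:
    hash(buf)
    hashable = True
except TypeError:
    hashable = False

frozen = bytes(buf)          # immutable, hashable copy
print(buf, frozen, hashable)
```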

 

Callable types

These are the types to which the function call operation (see section Calls) can be applied:

User-defined functions

A user-defined function object is created by a function definition (see section Function definitions). It should be called with an argument list containing the same number of items as the function’s formal parameter list.

Special attributes:

Attribute	Meaning	Access
func_doc	The function’s documentation string, or None if unavailable	Writable
__doc__	Another way of spelling func_doc	Writable
func_name	The function’s name	Writable
__name__	Another way of spelling func_name	Writable
__module__	The name of the module the function was defined in, or None if unavailable.	Writable
func_defaults	A tuple containing default argument values for those arguments that have defaults, or None if no arguments have a default value	Writable
func_code	The code object representing the compiled function body.	Writable
func_globals	A reference to the dictionary that holds the function’s global variables: the global namespace of the module in which the function was defined.	Read-only
func_dict	The namespace supporting arbitrary function attributes.	Writable
func_closure	None or a tuple of cells that contain bindings for the function’s free variables.	Read-only

Most of the attributes labelled “Writable” check the type of the assigned value.

Changed in version 2.4: func_name is now writable.

Function objects also support getting and setting arbitrary attributes, which can be used, for example, to attach metadata to functions. Regular attribute dot-notation is used to get and set such attributes. Note that the current implementation only supports function attributes on user-defined functions. Function attributes on built-in functions may be supported in the future.

Additional information about a function’s definition can be retrieved from its code object; see the description of internal types below.
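Attaching metadata to a function looks like this in Python 3. The function greet and the attribute names are hypothetical; the attempt on len shows that built-in functions still reject arbitrary attributes.

```python
def greet(name):
    """Hypothetical function used to carry metadata."""
    return "hello, " + name

greet.author = "kinfinger"     # arbitrary attribute set with dot notation
greet.reviewed = True

print(greet.__dict__)          # the attributes live in the function's namespace

try:
    len.author = "nobody"      # built-in functions reject arbitrary attributes
    builtin_ok = True
except AttributeError:
    builtin_ok = False

print(builtin_ok)
```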

User-defined methods

A user-defined method object combines a class, a class instance (or None) and any callable object (normally a user-defined function).

Special read-only attributes: im_self is the class instance object, im_func is the function object; im_class is the class of im_self for bound methods or the class that asked for the method for unbound methods; __doc__ is the method’s documentation (same as im_func.__doc__); __name__ is the method name (same as im_func.__name__); __module__ is the name of the module the method was defined in, or None if unavailable.

Changed in version 2.2: im_self used to refer to the class that defined the method.

Changed in version 2.6: For 3.0 forward-compatibility, im_func is also available as __func__, and im_self as __self__.

Methods also support accessing (but not setting) the arbitrary function attributes on the underlying function object.

User-defined method objects may be created when getting an attribute of a class (perhaps via an instance of that class), if that attribute is a user-defined function object, an unbound user-defined method object, or a class method object. When the attribute is a user-defined method object, a new method object is only created if the class from which it is being retrieved is the same as, or a derived class of, the class stored in the original method object; otherwise, the original method object is used as it is.

When a user-defined method object is created by retrieving a user-defined function object from a class, its im_self attribute is None and the method object is said to be unbound. When one is created by retrieving a user-defined function object from a class via one of its instances, its im_self attribute is the instance, and the method object is said to be bound. In either case, the new method’s im_class attribute is the class from which the retrieval takes place, and its im_func attribute is the original function object.

When a user-defined method object is created by retrieving another method object from a class or instance, the behaviour is the same as for a function object, except that the im_func attribute of the new instance is not the original method object but its im_func attribute.

When a user-defined method object is created by retrieving a class method object from a class or instance, its im_self attribute is the class itself (the same as the im_class attribute), and its im_func attribute is the function object underlying the class method.

When an unbound user-defined method object is called, the underlying function (im_func) is called, with the restriction that the first argument must be an instance of the proper class (im_class) or of a derived class thereof.

When a bound user-defined method object is called, the underlying function (im_func) is called, inserting the class instance (im_self) in front of the argument list. For instance, when C is a class which contains a definition for a function f(), and x is an instance of C, calling x.f(1) is equivalent to calling C.f(x, 1).

When a user-defined method object is derived from a class method object, the “class instance” stored in im_self will actually be the class itself, so that calling either x.f(1) or C.f(1) is equivalent to calling f(C,1) where f is the underlying function.
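The call equivalences above can be sketched in Python 3, with two caveats: Python 3 has no unbound method objects (C.f is just the plain function), and the attributes are spelled __self__ and __func__ rather than im_self and im_func. The class C and its methods are illustrative.

```python
class C:
    def f(self, n):
        return ("f", self, n)

    @classmethod
    def g(cls, n):
        return ("g", cls, n)

x = C()
bound = x.f

print(bound.__self__ is x)                  # the instance stored in the bound method
print(bound.__func__ is C.__dict__["f"])    # the underlying function
print(x.f(1) == C.f(x, 1))                  # instance inserted in front of the arguments

# For a class method the stored "instance" is the class itself.
underlying = C.__dict__["g"].__func__
print(x.g(1) == C.g(1) == underlying(C, 1))
```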

Note that the transformation from function object to (unbound or bound) method object happens each time the attribute is retrieved from the class or instance. In some cases, a fruitful optimization is to assign the attribute to a local variable and call that local variable. Also notice that this transformation only happens for user-defined functions; other callable objects (and all non-callable objects) are retrieved without transformation. It is also important to note that user-defined functions which are attributes of a class instance are not converted to bound methods; this only happens when the function is an attribute of the class.
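The last point, that a function stored on an instance is not bound, is easy to miss. A Python 3 sketch with hypothetical names:

```python
class C:
    def f(self):
        return "called with " + type(self).__name__

def standalone(*args):
    return len(args)          # reports how many arguments actually arrived

x = C()
x.h = standalone              # function stored on the instance, not on the class

print(type(x.f).__name__)     # retrieved through the class, so it is bound
print(type(x.h).__name__)     # retrieved as-is, no binding
print(x.f())
print(x.h())                  # no instance is inserted in front of the arguments
```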

Generator functions

A function or method which uses the yield statement (see section The yield statement) is called a generator function. Such a function, when called, always returns an iterator object which can be used to execute the body of the function: calling the iterator’s next() method will cause the function to execute until it provides a value using the yield statement. When the function executes a return statement or falls off the end, a StopIteration exception is raised and the iterator will have reached the end of the set of values to be returned.

Built-in functions

A built-in function object is a wrapper around a C function. Examples of built-in functions are len() and math.sin() (math is a standard built-in module). The number and type of the arguments are determined by the C function. Special read-only attributes: __doc__ is the function’s documentation string, or None if unavailable; __name__ is the function’s name; __self__ is set to None (but see the next item); __module__ is the name of the module the function was defined in or None if unavailable.

Built-in methods

This is really a different disguise of a built-in function, this time containing an object passed to the C function as an implicit extra argument. An example of a built-in method is alist.append(), assuming alist is a list object. In this case, the special read-only attribute __self__ is set to the object denoted by alist.
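The alist.append example can be tried directly. This Python 3 sketch only inspects attributes the quoted text names (__self__ and __name__).

```python
import math

alist = [1, 2]
append = alist.append          # built-in method: carries alist as the implicit argument

print(append.__self__ is alist)
print(len.__name__, math.sin.__name__)

append(3)                      # calling it later still operates on alist
print(alist)
```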

Class Types

Class types, or “new-style classes,” are callable. These objects normally act as factories for new instances of themselves, but variations are possible for class types that override __new__(). The arguments of the call are passed to __new__() and, in the typical case, to __init__() to initialize the new instance.
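One such variation, sketched in Python 3: a class whose __new__ always hands back the same instance. Singleton is a hypothetical example, not something the article defines; note that __init__ still runs on every call with the same arguments.

```python
class Singleton:
    """Hypothetical class whose __new__ returns one shared instance."""
    _instance = None

    def __new__(cls, value):
        # The call arguments arrive here first.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self, value):
        # Runs after __new__ on every call, since an instance of cls was returned.
        self.value = value

a = Singleton(1)
b = Singleton(2)
print(a is b, a.value)
```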









Classic Classes

Class objects are described below. When a class object is called, a new class instance (also described below) is created and returned. This implies a call to the class’s __init__() method if it has one. Any arguments are passed on to the __init__() method. If there is no __init__() method, the class must be called without arguments.

Class instances

Class instances are described below. Class instances are callable only when the class has a __call__() method; x(arguments) is a shorthand for x.__call__(arguments).
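A Python 3 sketch of callable instances; Adder and Plain are hypothetical classes used only to contrast a class that defines __call__ with one that does not.

```python
class Adder:
    def __init__(self, base):
        self.base = base

    def __call__(self, n):
        return self.base + n

class Plain:
    pass

add5 = Adder(5)
print(add5(3), add5.__call__(3))            # the two spellings are equivalent
print(callable(add5), callable(Plain()))    # only Adder instances are callable
```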


            
            
              
			  
			  
				

				
			  