A bored person: nothing but technology, and more technology, you know
Category: Python/Ruby
2013-01-28 10:38:54
Dear readers, have a look at the following code:

for i in xrange(10):                    # integer
    print i,
print
for c in 'abcdef':                      # sequence type
    print c,
print
for item in ['a','b','c','d','e','e']:  # list iterator
    print item,
print
for (key,value) in {"a":"bc","b":"bc","c":"cd","d":"de"}.items():  # dict iterator
    print '%s=%s' % (key,value),
print
for item in ('a','b','c','d','e'):      # tuple
    print item,
print
with open(r'h:\\python\\xyy.py') as f:  # file
    for line in f:
        print line,

class stu:
    def __init__(self):
        self.info=['a','b','c','d','e']
        self.index=len(self.info)
        self.i=0
    def __iter__(self):
        self.i=0
        return self
    def next(self):
        if self.i == self.index:        # reach the end of list
            raise StopIteration
        else:
            m=self.info[self.i]
            self.i=self.i+1
            return m

s=stu()
for message in s:
    print 'message '+message,
# the above equals :
it=s.__iter__()
while 1:
    try:
        print it.next(),
    except StopIteration:
        break
print
s='abcedf'
it1=iter(s)
it2=iter(s)
print hex(id(s))
print hex(id(it1))
print hex(id(it2)),
print
while 1:
    try:
        print it1.next(),
    except StopIteration:
        break
print
while 1:
    try:
        print it2.next(),
    except StopIteration:
        break
print

Output of the first five loops:

0 1 2 3 4 5 6 7 8 9
a b c d e f
a b c d e e
a=bc c=cd b=bc d=de
a b c d e

Output from the stu class onward:

message a message b message c message d message e a b c d e
0x1ed7460
0x1ed74f0
0x1ed7510
a b c e d f
a b c e d f
Below is the definition given in the help documentation, together with how it is implemented.
An object representing a stream of data. Repeated calls to the iterator’s next() method return successive items in the stream. When no more data are available a StopIteration exception is raised instead. At this point, the iterator object is exhausted and any further calls to its next() method just raise StopIteration again. Iterators are required to have an __iter__() method that returns the iterator object itself so every iterator is also iterable and may be used in most places where other iterables are accepted. One notable exception is code which attempts multiple iteration passes. A container object (such as a list) produces a fresh new iterator each time you pass it to the iter() function or use it in a for loop. Attempting this with an iterator will just return the same exhausted iterator object used in the previous iteration pass, making it appear like an empty container.
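The difference the definition draws between a container and an iterator can be seen in a few lines. This is a minimal sketch with illustrative variable names; it uses the built-in next() function (available since Python 2.6) rather than the .next() method so that it runs on both Python 2 and Python 3.

```python
letters = ['a', 'b', 'c']          # a container: iterable, but not an iterator

# Each call to iter() on a container yields a fresh, independent iterator.
first_pass = iter(letters)
second_pass = iter(letters)
assert first_pass is not second_pass

# Drain the first iterator completely.
consumed = list(first_pass)

# An iterator returns itself from iter(), so a second pass sees nothing.
assert iter(first_pass) is first_pass
leftover = list(first_pass)

# Further next() calls keep raising StopIteration.
try:
    next(first_pass)
    still_exhausted = False
except StopIteration:
    still_exhausted = True

# The other iterator is unaffected by all of this.
untouched = list(second_pass)
```

Here leftover ends up empty, which is the "appears like an empty container" effect the definition warns about.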
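From the examples above we can see that iterators are almost everywhere in Python. The concise syntax and powerful behaviour are hard to put down, but they also raise a question: how does Python implement them? The code above already gives an equivalent of the for loop:

s=stu()
for message in s:
    print 'message '+message,
# the above equals :
it=s.__iter__()
while 1:
    try:
        print it.next(),
    except StopIteration:
        break

Is it easier to understand this way, from the point of view of syntax? And how exactly is it implemented internally?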
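The same equivalence can be packaged as a small helper. The names for_each and Countdown below are hypothetical and exist only for this sketch; the class defines both __next__ (the Python 3 name) and next (the Python 2 name) so that it works on either version.

```python
def for_each(iterable, body):
    """Hand-written equivalent of: for item in iterable: body(item)."""
    iterator = iter(iterable)          # step 1: ask the object for an iterator
    while True:
        try:
            item = next(iterator)      # step 2: fetch the next item
        except StopIteration:          # the iterator signals exhaustion
            break
        body(item)


class Countdown(object):
    """Hypothetical iterator that counts from start down to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):                # Python 3 spelling
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    next = __next__                    # Python 2 spelling


seen = []
for_each(Countdown(3), seen.append)
```

After the call, seen holds the values in the order a real for loop would have produced them.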
From the book "Python Source Code Analysis" (python 源码分析) we know that every object in Python has the following structure:
[OBJECT.H]
#ifdef Py_TRACE_REFS
/* Define pointers to support a doubly-linked list of all live heap objects. */
#define _PyObject_HEAD_EXTRA            \
    struct _object *_ob_next;           \
    struct _object *_ob_prev;

#define _PyObject_EXTRA_INIT 0, 0,

#else
#define _PyObject_HEAD_EXTRA
#define _PyObject_EXTRA_INIT
#endif

/* PyObject_HEAD defines the initial segment of every PyObject. */
#define PyObject_HEAD                   \
    _PyObject_HEAD_EXTRA                \
    Py_ssize_t ob_refcnt;               \
    struct _typeobject *ob_type;

#define PyObject_HEAD_INIT(type)        \
    _PyObject_EXTRA_INIT                \
    1, type,

#define PyVarObject_HEAD_INIT(type, size)       \
    PyObject_HEAD_INIT(type) size,

typedef struct _object {
    PyObject_HEAD
} PyObject;

#define Py_REFCNT(ob)           (((PyObject*)(ob))->ob_refcnt)
#define Py_TYPE(ob)             (((PyObject*)(ob))->ob_type)
#define Py_SIZE(ob)             (((PyVarObject*)(ob))->ob_size)
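The three macros get an object's reference count, its type, and its size respectively. We know that different objects occupy different amounts of memory on the heap, so how does a Python object know how much memory it should be allocated? The whole secret is hidden in struct _typeobject *ob_type;. From this field we can tell what type the object is, and therefore determine its memory size. In other words, every object holds a pointer to its own type. So how is _typeobject defined? Look at the source: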
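These header fields can also be observed from Python itself. The sketch below assumes CPython (other implementations may report different numbers or lack reference counts entirely); sys.getrefcount, __basicsize__ and __itemsize__ are standard attributes, while the variable names are only illustrative.

```python
import sys

data = ['a', 'b', 'c']

# Py_TYPE(ob): every object carries a pointer to its type object.
kind = type(data)

# Py_REFCNT(ob): sys.getrefcount reports the reference count, so binding
# one more name to the object raises the reported value by one.
before = sys.getrefcount(data)
alias = data
after = sys.getrefcount(data)

# tp_basicsize and tp_itemsize are exposed on the type object.
fixed_part = kind.__basicsize__        # bytes for the fixed part of a list
per_item = tuple.__itemsize__          # bytes per slot in a tuple's tail
```

The absolute numbers depend on the platform and interpreter build, so only the relationships between them are meaningful.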
typedef struct _typeobject {
    PyObject_VAR_HEAD
    const char *tp_name; /* For printing, in format "<module>.<name>" */
    Py_ssize_t tp_basicsize, tp_itemsize; /* For allocation */

    /* Methods to implement standard operations */

    destructor tp_dealloc;
    printfunc tp_print;
    getattrfunc tp_getattr;
    setattrfunc tp_setattr;
    cmpfunc tp_compare;
    reprfunc tp_repr;

    /* Method suites for standard classes */

    PyNumberMethods *tp_as_number;
    PySequenceMethods *tp_as_sequence;
    PyMappingMethods *tp_as_mapping;

    /* More standard operations (here for binary compatibility) */

    hashfunc tp_hash;
    ternaryfunc tp_call;
    reprfunc tp_str;
    getattrofunc tp_getattro;
    setattrofunc tp_setattro;

    /* Functions to access object as input/output buffer */
    PyBufferProcs *tp_as_buffer;

    /* Flags to define presence of optional/expanded features */
    long tp_flags;

    const char *tp_doc; /* Documentation string */

    /* Assigned meaning in release 2.0 */
    /* call function for all accessible objects */
    traverseproc tp_traverse;

    /* delete references to contained objects */
    inquiry tp_clear;

    /* Assigned meaning in release 2.1 */
    /* rich comparisons */
    richcmpfunc tp_richcompare;

    /* weak reference enabler */
    Py_ssize_t tp_weaklistoffset;

    /* Added in release 2.2 */
    /* Iterators */
    getiterfunc tp_iter;
    iternextfunc tp_iternext;

    /* Attribute descriptor and subclassing stuff */
    struct PyMethodDef *tp_methods;
    struct PyMemberDef *tp_members;
    struct PyGetSetDef *tp_getset;
    struct _typeobject *tp_base;
    PyObject *tp_dict;
    descrgetfunc tp_descr_get;
    descrsetfunc tp_descr_set;
    Py_ssize_t tp_dictoffset;
    initproc tp_init;
    allocfunc tp_alloc;
    newfunc tp_new;
    freefunc tp_free; /* Low-level free-memory routine */
    inquiry tp_is_gc; /* For PyObject_IS_GC */
    PyObject *tp_bases;
    PyObject *tp_mro; /* method resolution order */
    PyObject *tp_cache;
    PyObject *tp_subclasses;
    PyObject *tp_weaklist;
    destructor tp_del;

    /* Type attribute cache version tag. Added in version 2.6 */
    unsigned int tp_version_tag;

#ifdef COUNT_ALLOCS
    /* these must be last and never explicitly initialized */
    Py_ssize_t tp_allocs;
    Py_ssize_t tp_frees;
    Py_ssize_t tp_maxalloc;
    struct _typeobject *tp_prev;
    struct _typeobject *tp_next;
#endif
} PyTypeObject;

See? This code defines all the information about the type an object belongs to. For example, tp_name gives the name of the object's type.
The first line inside the type struct needs some explanation: PyObject_VAR_HEAD is the header macro shared by all variable-length objects.

#define PyObject_VAR_HEAD               \
    PyObject_HEAD                       \
    Py_ssize_t ob_size; /* Number of items in variable part */
[PYPORT.H]
/* Py_ssize_t is a signed integral type such that sizeof(Py_ssize_t) ==
 * sizeof(size_t).  C99 doesn't define such a thing directly (size_t is an
 * unsigned integral type).  See PEP 353 for details.
 */
#ifdef HAVE_SSIZE_T
typedef ssize_t         Py_ssize_t;
#elif SIZEOF_VOID_P == SIZEOF_SIZE_T
typedef Py_intptr_t     Py_ssize_t;
#else
#   error "Python needs a typedef for Py_ssize_t in pyport.h."
#endif
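From this macro it is easy to see that, compared with a fixed-length object, a variable-length object adds just one field, which records how many items it contains.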
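The effect of the extra ob_size field and of tp_itemsize can be checked from Python. This is a sketch that assumes CPython's memory layout for tuples; sys.getsizeof and tuple.__itemsize__ are real, the variable names are illustrative.

```python
import sys

# ob_size is what len() reports for variable-length objects such as tuples.
small = (1, 2)
large = (1, 2, 3, 4, 5)

# tp_itemsize: bytes added per element in the variable-length tail.
per_item = tuple.__itemsize__

# The measured size should grow by exactly per_item for each extra element.
growth = sys.getsizeof(large) - sys.getsizeof(small)
extra_items = len(large) - len(small)
```

Under that assumption, growth equals per_item multiplied by extra_items, because the fixed part of the two tuples is identical.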
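What follows are the member declarations. The struct definition is full of function pointers and nested structs, so reading it may take some effort.
For example: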
typedef void (*destructor)(PyObject *);
typedef PyObject *(*getiterfunc) (PyObject *);
typedef PyObject *(*iternextfunc) (PyObject *);

struct PyMethodDef {
    const char  *ml_name;   /* The name of the built-in function/method */
    PyCFunction  ml_meth;   /* The C function that implements it */
    int          ml_flags;  /* Combination of METH_xxx flags, which mostly
                               describe the args expected by the C func */
    const char  *ml_doc;    /* The __doc__ attribute, or NULL */
};
typedef struct PyMethodDef PyMethodDef;

destructor tp_dealloc;
const char *tp_doc; /* Documentation string */

/* Added in release 2.2 */
/* Iterators */
getiterfunc tp_iter;
iternextfunc tp_iternext;

struct PyMethodDef *tp_methods;
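The first typedef is the destructor; the second and third are the subject of this chapter. Evidently the roots of the iterator go fairly deep, which also shows how complex the Python type struct is.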
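Because these function pointers live in the type object, iteration support is a property of the type rather than of an individual instance. The sketch below illustrates this with a hypothetical new-style class named Bag; it is an observation about behaviour, not a look inside the C struct.

```python
class Bag(object):
    """Hypothetical container with no __iter__ or __getitem__ of its own."""


bag = Bag()

# Attaching __iter__ to the *instance* does not fill the type's tp_iter slot,
# so iter() still refuses the object.
bag.__iter__ = lambda: iter([1, 2, 3])
try:
    iter(bag)
    instance_hook_used = True
except TypeError:
    instance_hook_used = False

# Attaching it to the *type* does fill the slot.
Bag.__iter__ = lambda self: iter([1, 2, 3])
items = list(bag)
```

Only the assignment on the class makes the object iterable, which matches the idea that tp_iter belongs to the type.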
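Below we focus on these function pointers. A direct search, however, turns up no definition of the function, so we change direction and start from the official documentation, where the C API section has what we need: the abstract objects layer.
As the beginning of that chapter puts it:
The functions in this chapter interact with Python objects regardless of their type, or with wide classes of object types. When used on object types for which they do not apply, they will raise a Python exception.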
Object Protocol
Number Protocol
Sequence Protocol
Mapping Protocol
Iterator Protocol
Old Buffer Protocol
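The functions of the abstract objects layer abstract over the operations on all objects. They can be applied to any object type, and if the object does not support the operation, the call raises an exception.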
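The "raises an exception when it does not apply" behaviour is visible from Python through the built-ins iter() and next(). In this sketch, error_name is a hypothetical helper written only to capture the exception type.

```python
def error_name(func, arg):
    """Return the name of the exception func(arg) raises, or None."""
    try:
        func(arg)
    except Exception as exc:
        return type(exc).__name__
    return None


not_iterable = error_name(iter, 42)        # int has no iteration support
not_iterator = error_name(next, [1, 2])    # a list is iterable, not an iterator
works = error_name(iter, [1, 2])           # lists do support iter()
```

Both failing calls report TypeError, while the call on a list succeeds.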
So we now know that iterators are implemented in the abstract layer, and sure enough, its code contains:

/* Iterators */

PyAPI_FUNC(PyObject *) PyObject_GetIter(PyObject *);
/* Takes an object and returns an iterator for it.
   This is typically a new iterator but if the argument
   is an iterator, this returns itself. */

#define PyIter_Check(obj) \
    (PyType_HasFeature((obj)->ob_type, Py_TPFLAGS_HAVE_ITER) && \
     (obj)->ob_type->tp_iternext != NULL && \
     (obj)->ob_type->tp_iternext != &_PyObject_NextNotImplemented)

PyAPI_FUNC(PyObject *) PyIter_Next(PyObject *);
/* Takes an iterator object and calls its tp_iternext slot,
   returning the next value.  If the iterator is exhausted,
   this returns NULL without setting an exception.
   NULL with an exception means an error occurred. */
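PyObject_GetIter : obtains an iterator for an object. This is typically a new iterator, but when the argument is already an iterator, the iterator itself is returned (not a copy).
PyIter_Check : checks whether the argument is an iterator, that is, whether its type provides a usable tp_iternext.
PyIter_Next : takes an iterator and returns the next value; when the iterator is exhausted it returns NULL without setting an exception.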
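At the Python level, the two-argument form of the built-in next() gives a similar "return a marker instead of raising" loop shape. The sketch below is only a rough analogue of the C calls; _DONE and drain are hypothetical names.

```python
_DONE = object()                 # unique sentinel, playing the role of NULL


def drain(iterable):
    """Collect items using the 'fetch until sentinel' loop shape."""
    iterator = iter(iterable)    # rough analogue of PyObject_GetIter
    collected = []
    while True:
        item = next(iterator, _DONE)   # rough analogue of PyIter_Next
        if item is _DONE:              # exhausted: no exception to handle
            break
        collected.append(item)
    return collected


result = drain('abc')
```

A fresh object() is used as the sentinel so that it can never be confused with a real item, including None.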
For example:

a='abcdef'
it1=iter(a)
print it1.next()    # expect a
it2=iter(a)
it3=iter(it1)
print it3.next()    # expect b
print hex(id(it1))
print hex(id(it2))
print hex(id(it3))
#define PyIter_Check(obj) \
    (PyType_HasFeature((obj)->ob_type, Py_TPFLAGS_HAVE_ITER) && \
     (obj)->ob_type->tp_iternext != NULL && \
     (obj)->ob_type->tp_iternext != &_PyObject_NextNotImplemented)

Whether an object is an iterator is checked by this macro. In the type definition, the type object has two members:

/* Iterators */
getiterfunc tp_iter;
iternextfunc tp_iternext;

The macro tests tp_iternext, while tp_iter is the slot that PyObject_GetIter calls. The second line of the macro also uses

int PyType_HasFeature(PyObject *o, int feature)
    Return true if the type object o sets the feature feature. Type features are denoted by single bit flags.

This function determines whether a type object has a given feature.
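A rough Python-level approximation of this check is to ask whether the object's type provides a "next" method. The helper looks_like_iterator below is hypothetical and deliberately simplistic; it does not inspect tp_flags and is not how CPython itself performs the test.

```python
def looks_like_iterator(obj):
    """Approximate PyIter_Check: does the type provide a 'next' method?"""
    cls = type(obj)
    # Python 3 names the method __next__, Python 2 names it next.
    return hasattr(cls, '__next__') or hasattr(cls, 'next')


container = [1, 2, 3]
iterator = iter(container)

checks = (
    looks_like_iterator(container),   # a list is only iterable
    looks_like_iterator(iterator),    # a list iterator
    looks_like_iterator(iter('ab')),  # a string iterator
)
```

The lookup goes through type(obj) rather than obj on purpose, mirroring the way the macro reads the slot from ob_type.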
Below we use list to illustrate the iterator. We ignore everything else and look only at the iterator part:

/*********************** List Iterator **************************/

typedef struct {
    PyObject_HEAD
    long it_index;
    PyListObject *it_seq; /* Set to NULL when iterator is exhausted */
} listiterobject;

static PyObject *list_iter(PyObject *);
static void listiter_dealloc(listiterobject *);
static int listiter_traverse(listiterobject *, visitproc, void *);
static PyObject *listiter_next(listiterobject *);
static PyObject *listiter_len(listiterobject *);

PyDoc_STRVAR(length_hint_doc, "Private method returning an estimate of len(list(it)).");

static PyMethodDef listiter_methods[] = {
    {"__length_hint__", (PyCFunction)listiter_len, METH_NOARGS, length_hint_doc},
    {NULL,              NULL}           /* sentinel */
};

PyTypeObject PyListIter_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "listiterator",                             /* tp_name */
    sizeof(listiterobject),                     /* tp_basicsize */
    0,                                          /* tp_itemsize */
    /* methods */
    (destructor)listiter_dealloc,               /* tp_dealloc */
    0,                                          /* tp_print */
    0,                                          /* tp_getattr */
    0,                                          /* tp_setattr */
    0,                                          /* tp_compare */
    0,                                          /* tp_repr */
    0,                                          /* tp_as_number */
    0,                                          /* tp_as_sequence */
    0,                                          /* tp_as_mapping */
    0,                                          /* tp_hash */
    0,                                          /* tp_call */
    0,                                          /* tp_str */
    PyObject_GenericGetAttr,                    /* tp_getattro */
    0,                                          /* tp_setattro */
    0,                                          /* tp_as_buffer */
    Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC,    /* tp_flags */
    0,                                          /* tp_doc */
    (traverseproc)listiter_traverse,            /* tp_traverse */
    0,                                          /* tp_clear */
    0,                                          /* tp_richcompare */
    0,                                          /* tp_weaklistoffset */
    PyObject_SelfIter,                          /* tp_iter */
    (iternextfunc)listiter_next,                /* tp_iternext */
    listiter_methods,                           /* tp_methods */
    0,                                          /* tp_members */
};

static PyObject *
list_iter(PyObject *seq)
{
    listiterobject *it;

    if (!PyList_Check(seq)) {
        PyErr_BadInternalCall();
        return NULL;
    }
    it = PyObject_GC_New(listiterobject, &PyListIter_Type); /* ALLOC A OBJECT */
    if (it == NULL)
        return NULL;
    it->it_index = 0;
    Py_INCREF(seq);
    it->it_seq = (PyListObject *)seq;
    _PyObject_GC_TRACK(it); /* Gc TRACK IT */
    return (PyObject *)it;
}

static void
listiter_dealloc(listiterobject *it)
{
    _PyObject_GC_UNTRACK(it);
    Py_XDECREF(it->it_seq);
    PyObject_GC_Del(it);
}

static int
listiter_traverse(listiterobject *it, visitproc visit, void *arg)
{
    Py_VISIT(it->it_seq);
    return 0;
}

static PyObject *
listiter_next(listiterobject *it)
{
    PyListObject *seq;
    PyObject *item;

    assert(it != NULL);
    seq = it->it_seq;
    if (seq == NULL)
        return NULL;
    assert(PyList_Check(seq));

    if (it->it_index < PyList_GET_SIZE(seq)) {
        item = PyList_GET_ITEM(seq, it->it_index);
        ++it->it_index;
        Py_INCREF(item);
        return item;
    }

    Py_DECREF(seq);
    it->it_seq = NULL;
    return NULL;
}

static PyObject *
listiter_len(listiterobject *it)
{
    Py_ssize_t len;
    if (it->it_seq) {
        len = PyList_GET_SIZE(it->it_seq) - it->it_index;
        if (len >= 0)
            return PyInt_FromSsize_t(len);
    }
    return PyInt_FromLong(0);
}
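Note the two lines that were highlighted in blue in the original post (presumably the tp_iter and tp_iternext entries); they are part of the object's static initialization. The implementation of the function they reference is given below.

[OBJECT.C]

PyObject *
PyObject_SelfIter(PyObject *obj)
{
    Py_INCREF(obj);
    return obj;
}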
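The behaviour of the list iterator can be modelled in pure Python to make the control flow easier to follow. ListIterModel and probe are hypothetical names for this sketch; the class is only a model of listiterobject, not CPython's code, and it is compared against the real list iterator returned by iter().

```python
class ListIterModel(object):
    """Pure-Python model of listiterobject (illustrative, not CPython code)."""

    def __init__(self, seq):
        self.it_index = 0
        self.it_seq = seq              # set to None once exhausted

    def __iter__(self):                # mirrors PyObject_SelfIter
        return self

    def __next__(self):                # mirrors listiter_next
        seq = self.it_seq
        if seq is None:
            raise StopIteration
        if self.it_index < len(seq):
            item = seq[self.it_index]
            self.it_index += 1
            return item
        self.it_seq = None             # drop the list; stay exhausted
        raise StopIteration

    next = __next__                    # Python 2 spelling

    def __length_hint__(self):         # mirrors listiter_len
        if self.it_seq is not None:
            remaining = len(self.it_seq) - self.it_index
            if remaining >= 0:
                return remaining
        return 0


def probe(make_iterator):
    """Exercise an iterator factory and record what it does."""
    data = ['a', 'b']
    it = make_iterator(data)
    log = [next(it), it.__length_hint__()]
    data.append('c')                   # growth is visible before exhaustion
    log.extend(it)
    data.append('d')                   # but not after exhaustion
    log.extend(it)
    return log


model_log = probe(ListIterModel)
real_log = probe(iter)
```

Both logs should match: an item appended before the iterator runs out is still delivered, while one appended afterwards is not, because it_seq has already been cleared.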
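Thinking back carefully, we started from Python syntax and explored step by step: from the base object struct to the type struct, to the abstract object layer, then to the iterator protocol, and finally to the list iterator implementation. Does this give you an intuitive feel for it?
There is too much material and my thoughts are a bit scattered; I will tidy this up carefully when I have time.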
To write a loop which iterates over an iterator, the C code should look something like this:

PyObject *iterator = PyObject_GetIter(obj);
PyObject *item;

if (iterator == NULL) {
    /* propagate error */
}

while (item = PyIter_Next(iterator)) {
    /* do something with item */
    ...
    /* release reference when done */
    Py_DECREF(item);
}

Py_DECREF(iterator);

if (PyErr_Occurred()) {
    /* propagate error */
}
else {
    /* continue doing useful work */
}
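The final PyErr_Occurred() check exists because exhaustion and failure are two different ways for the loop to end. The same distinction shows up in Python: StopIteration ends a for loop quietly, while any other exception propagates. The generator flaky below is a hypothetical example source.

```python
def flaky():
    """Hypothetical generator that fails after two items."""
    yield 1
    yield 2
    raise ValueError('broken data source')


collected = []
outcome = 'finished'
try:
    for item in flaky():          # StopIteration would end the loop quietly
        collected.append(item)
except ValueError as exc:         # any other exception propagates
    outcome = 'error: %s' % exc
```

The items produced before the failure are kept, and the error reaches the caller instead of being mistaken for the end of the data.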
A friendly reminder:
Applying iter() to a dictionary always loops over the keys, but dictionaries have methods that return other iterators. If you want to iterate over keys, values, or key/value pairs, you can explicitly call the iterkeys(), itervalues(), or iteritems() methods to get an appropriate iterator.
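A short sketch of the point about dictionaries. The iterkeys(), itervalues() and iteritems() methods belong to Python 2; Python 3 removed them in favour of keys(), values() and items(), so the code below picks whichever exists. The variable names are illustrative, and the results are sorted so they do not depend on dictionary ordering.

```python
table = {"a": "bc", "b": "bc", "c": "cd", "d": "de"}

# iter() on a dictionary yields its keys.
keys = sorted(iter(table))

# Use iteritems() on Python 2 and fall back to items() on Python 3.
iter_items = getattr(table, 'iteritems', table.items)
pairs = sorted('%s=%s' % (key, value) for key, value in iter_items())
```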
First, the iterator protocol - when you write
for x in mylist:
    ...loop body...
Python performs the following two steps:
1. Gets an iterator for mylist:
   Call iter(mylist) -> this returns an object with a next() method.
   [This is the step most people forget to tell you about]
2. Uses the iterator to loop over items:
   Keep calling the next() method on the iterator returned from step 1. The return value from next() is assigned to x and the loop body is executed. If an exception StopIteration is raised from within next(), it means there are no more values in the iterator and the loop is exited.
The truth is Python performs the above two steps anytime it wants to loop over the contents of an object - so it could be a for loop
The yield keyword reduced to 2 simple facts:
Thus, in the unlikely event that you are failing to do something like this...

> x = myRange(5)
> list(x)
[0, 1, 2, 3, 4]
> list(x)
[]

... then remember that a generator is an iterator; that is, it is one-time-use. If you want to reuse it, you should call myRange(...) again. Those who absolutely need to clone a generator (e.g. who are doing terrifyingly hackish metaprogramming) can use itertools.tee if absolutely necessary, since the copyable iterator Python PEP standards proposal has been deferred.
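The session above can be reproduced with a small generator. The definition of myRange is not shown in this post, so my_range below is a hypothetical stand-in with the same observable behaviour; itertools.tee is the standard library function mentioned above.

```python
import itertools


def my_range(n):
    """Hypothetical stand-in for the myRange generator used above."""
    i = 0
    while i < n:
        yield i
        i += 1


x = my_range(5)
first = list(x)                   # consumes the generator
second = list(x)                  # nothing left: the generator is spent

# Calling the generator function again gives a fresh iterator.
again = list(my_range(5))

# itertools.tee splits one iterator into independent copies.
left, right = itertools.tee(my_range(3))
left_items = list(left)
right_items = list(right)
```

Note that after tee() the original iterator should no longer be used directly, since advancing it would make the copies miss items.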
python manual