Category: Python/Ruby

2013-01-28 10:38:54

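Dear readers, please take a look at the following code. The output of its first five loops is shown first:

0 1 2 3 4 5 6 7 8 9
a b c d e f
a b c d e e
a=bc c=cd b=bc d=de
a b c d e

The code itself:
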
for i in xrange(10):        # integers produced by xrange
  print i,
print 
for c in 'abcdef':          # sequence type (str)
  print c,
print 
for item in ['a','b','c','d','e','e']:   # list iterator
  print item,
print 
for (key,value) in {"a":"bc","b":"bc","c":"cd","d":"de"}.items(): # dict items
  print '%s=%s' % (key,value),
print 
for item in ('a','b','c','d','e'):          # tuple
  print item,
print 
with open(r'h:\python\xyy.py') as f:         # file object, iterated line by line
  for line in f:
    print line, 
class stu:
    def __init__(self):
        self.info=['a','b','c','d','e']
        self.index=len(self.info)   # number of items, used as the end marker
        self.i=0                    # current position
    def __iter__(self):
        self.i=0                    # restart from the beginning on every iter()
        return self
    def next(self):
        if self.i == self.index:    # reached the end of the list
            raise StopIteration
        else:
            m=self.info[self.i]
            self.i=self.i+1
            return m
s=stu()
for message in s:
    print 'message '+message,
# the for loop above is equivalent to:
it=s.__iter__()
while  1:
    try:
        print it.next(),
    except StopIteration:
        break
print 
s='abcedf'
it1=iter(s)
it2=iter(s)
print hex(id(s))
print hex(id(it1))
print hex(id(it2)),
print 
while  1:
    try:
        print it1.next(),
    except StopIteration:
        break
print 
while  1:
    try:
        print it2.next(),
    except StopIteration:
        break
print

Output:

message a message b message c message d message e a b c d e
0x1ed7460
0x1ed74f0
0x1ed7510
a b c e d f
a b c e d f



Below is the definition given in the help documentation, followed by a look at how it is implemented.

An object representing a stream of data. Repeated calls to the iterator’s next() method return successive items in the stream. When no more data are available a StopIteration exception is raised instead. At this point, the iterator object is exhausted and any further calls to its next() method just raise StopIteration again. Iterators are required to have an __iter__() method that returns the iterator object itself so every iterator is also iterable and may be used in most places where other iterables are accepted. One notable exception is code which attempts multiple iteration passes. A container object (such as a list) produces a fresh new iterator each time you pass it to the iter() function or use it in a for loop. Attempting this with an iterator will just return the same exhausted iterator object used in the previous iteration pass, making it appear like an empty container.
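The "multiple iteration passes" exception in the definition above can be seen in a few lines. This is a minimal sketch written in Python 3 syntax (an assumption; the post itself uses Python 2, where the method is spelled next() instead of __next__()). The variable names are made up for illustration.

```python
letters = ['a', 'b', 'c']          # a container: iterable, but not an iterator

# A container hands out a fresh iterator for every pass.
first_pass = [x for x in letters]
second_pass = [x for x in letters]

# An iterator returns itself from iter(), so a second pass finds it exhausted.
it = iter(letters)
pass_one = list(it)
pass_two = list(it)

print(first_pass, second_pass)     # both passes see all three items
print(pass_one, pass_two)          # the second pass over the iterator is empty
print(iter(it) is it)              # an iterator is its own iterator
```

The empty second pass is exactly the "appears like an empty container" behaviour the definition warns about.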

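From the examples above we can also see that iterators are almost everywhere in Python. The concise syntax and powerful functionality make them a joy to use, but one cannot help asking: how does Python implement them? The code above already gave an equivalent expansion of the for loop: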


s=stu()
for message in s:
    print 'message '+message,
# the for loop above is equivalent to:
it=s.__iter__()
while  1:
    try:
        print it.next(),
    except StopIteration:
        break
Is it easier to understand at the syntax level when seen this way? But how is it actually implemented internally?


From the book on Python source code analysis, we know that every object in Python has the following structure:


[OBJECT.H]
#ifdef Py_TRACE_REFS
/* Define pointers to support a doubly-linked list of all live heap objects. */
#define _PyObject_HEAD_EXTRA            \
    struct _object *_ob_next;           \
    struct _object *_ob_prev;

#define _PyObject_EXTRA_INIT 0, 0,

#else
#define _PyObject_HEAD_EXTRA
#define _PyObject_EXTRA_INIT
#endif

/* PyObject_HEAD defines the initial segment of every PyObject. */
#define PyObject_HEAD                   \
    _PyObject_HEAD_EXTRA                \
    Py_ssize_t ob_refcnt;               \
    struct _typeobject *ob_type;

#define PyObject_HEAD_INIT(type)        \
    _PyObject_EXTRA_INIT                \
    1, type,

#define PyVarObject_HEAD_INIT(type, size)       \
    PyObject_HEAD_INIT(type) size,
typedef struct _object {
    PyObject_HEAD
} PyObject;
#define Py_REFCNT(ob)           (((PyObject*)(ob))->ob_refcnt)
#define Py_TYPE(ob)             (((PyObject*)(ob))->ob_type)
#define Py_SIZE(ob)             (((PyVarObject*)(ob))->ob_size)



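The macros above define PyObject, the base "class" of all fixed-size objects. As the comments in that source file say, the object itself defines almost nothing, but any pointer to a Python object can be cast to (PyObject *), which is what makes operations such as polymorphism possible. Right after the definition, macros provide the most commonly used accessors for the object: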


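one gets the object's reference count, one gets its type, and one gets its size. We know that different objects occupy different amounts of memory on the heap, so how does a Python object know how much memory it should be allocated? The whole secret lies in struct _typeobject *ob_type;. From that member we can tell what type the object is, and from the type we get the object's memory size. In other words, every object holds a pointer to its own type. So how is _typeobject defined? See the source: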

typedef struct _typeobject {
    PyObject_VAR_HEAD
    const char *tp_name; /* For printing, in format "<module>.<name>" */
    Py_ssize_t tp_basicsize, tp_itemsize; /* For allocation */

    /* Methods to implement standard operations */

    destructor tp_dealloc;
    printfunc tp_print;
    getattrfunc tp_getattr;
    setattrfunc tp_setattr;
    cmpfunc tp_compare;
    reprfunc tp_repr;

    /* Method suites for standard classes */

    PyNumberMethods *tp_as_number;
    PySequenceMethods *tp_as_sequence;
    PyMappingMethods *tp_as_mapping;

    /* More standard operations (here for binary compatibility) */

    hashfunc tp_hash;
    ternaryfunc tp_call;
    reprfunc tp_str;
    getattrofunc tp_getattro;
    setattrofunc tp_setattro;

    /* Functions to access object as input/output buffer */
    PyBufferProcs *tp_as_buffer;

    /* Flags to define presence of optional/expanded features */
    long tp_flags;

    const char *tp_doc; /* Documentation string */

    /* Assigned meaning in release 2.0 */
    /* call function for all accessible objects */
    traverseproc tp_traverse;

    /* delete references to contained objects */
    inquiry tp_clear;

    /* Assigned meaning in release 2.1 */
    /* rich comparisons */
    richcmpfunc tp_richcompare;

    /* weak reference enabler */
    Py_ssize_t tp_weaklistoffset;

    /* Added in release 2.2 */
    /* Iterators */
    getiterfunc tp_iter;
    iternextfunc tp_iternext;

    /* Attribute descriptor and subclassing stuff */
    struct PyMethodDef *tp_methods;
    struct PyMemberDef *tp_members;
    struct PyGetSetDef *tp_getset;
    struct _typeobject *tp_base;
    PyObject *tp_dict;
    descrgetfunc tp_descr_get;
    descrsetfunc tp_descr_set;
    Py_ssize_t tp_dictoffset;
    initproc tp_init;
    allocfunc tp_alloc;
    newfunc tp_new;
    freefunc tp_free; /* Low-level free-memory routine */
    inquiry tp_is_gc; /* For PyObject_IS_GC */
    PyObject *tp_bases;
    PyObject *tp_mro; /* method resolution order */
    PyObject *tp_cache;
    PyObject *tp_subclasses;
    PyObject *tp_weaklist;
    destructor tp_del;

    /* Type attribute cache version tag. Added in version 2.6 */
    unsigned int tp_version_tag;

#ifdef COUNT_ALLOCS
    /* these must be last and never explicitly initialized */
    Py_ssize_t tp_allocs;
    Py_ssize_t tp_frees;
    Py_ssize_t tp_maxalloc;
    struct _typeobject *tp_prev;
    struct _typeobject *tp_next;
#endif
} PyTypeObject;
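See? This struct holds all the information about the type an object belongs to; for example, tp_name gives the name of the type.

The first item in the type struct needs some explanation. PyObject_VAR_HEAD is the header macro shared by all variable-length objects: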

#define PyObject_VAR_HEAD               \
    PyObject_HEAD                       \
    Py_ssize_t ob_size; /* Number of items in variable part */
[PYPORT.H]
/* Py_ssize_t is a signed integral type such that sizeof(Py_ssize_t) ==
 * sizeof(size_t).  C99 doesn't define such a thing directly (size_t is an
 * unsigned integral type).  See PEP 353 for details.
 */
#ifdef HAVE_SSIZE_T
typedef ssize_t         Py_ssize_t;
#elif SIZEOF_VOID_P == SIZEOF_SIZE_T
typedef Py_intptr_t     Py_ssize_t;
#else
#   error "Python needs a typedef for Py_ssize_t in pyport.h."
#endif



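It is easy to see from this macro that, compared with a fixed-size object, a variable-length object adds just one field, ob_size, which records how many items it contains.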
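Some of these C fields are visible from Python itself. The sketch below assumes CPython and Python 3 syntax: type(obj) reflects ob_type, and tp_basicsize and tp_itemsize are exposed as the __basicsize__ and __itemsize__ attributes of a type. The exact byte counts depend on the platform and interpreter version, so none are hard-coded here.

```python
import sys

# type(obj) is the Python-level view of the ob_type pointer.
print(type([1, 2, 3]) is list)

# tp_basicsize and tp_itemsize of the tuple type.
print(tuple.__basicsize__, tuple.__itemsize__)

# A tuple is a variable-length object: in CPython each extra item adds
# tp_itemsize bytes (one pointer) on top of the fixed part.
two = (1, 2)
three = (1, 2, 3)
growth = sys.getsizeof(three) - sys.getsizeof(two)
print(growth == tuple.__itemsize__)
```

This is the Python-side reflection of "basic size plus ob_size times item size"; other implementations of Python may report different numbers.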

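What follows are the member declarations. The struct definition is full of function pointers and nested structs, so reading it takes some effort. The relevant typedefs and members are: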

typedef void (*destructor)(PyObject *);
typedef PyObject *(*getiterfunc) (PyObject *);
typedef PyObject *(*iternextfunc) (PyObject *);
struct PyMethodDef {
    const char	*ml_name;	/* The name of the built-in function/method */
    PyCFunction  ml_meth;	/* The C function that implements it */
    int		 ml_flags;	/* Combination of METH_xxx flags, which mostly
				   describe the args expected by the C func */
    const char	*ml_doc;	/* The __doc__ attribute, or NULL */
};
typedef struct PyMethodDef PyMethodDef;
/* Members of PyTypeObject that use these typedefs: */
    destructor tp_dealloc;
    const char *tp_doc; /* Documentation string */

    /* Added in release 2.2 */
    /* Iterators */
    getiterfunc tp_iter;
    iternextfunc tp_iternext;
    struct PyMethodDef *tp_methods;


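The first typedef is the destructor; the second and third (getiterfunc and iternextfunc) are the topic of this post. Clearly iterators are rooted quite deep in the object model, and this also shows how complex the Python type struct is.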

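Next we focus on these function pointers. A direct search does not turn up a definition of the functions themselves, so we change direction and start from the official documentation, where the C API part has the information we need, namely the abstract objects layer.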

As the beginning of that chapter says:

The functions in this chapter interact with Python objects regardless of their type, or with wide classes of object types. When used on object types for which they do not apply, they will raise a Python exception.

The chapter covers the following protocols:

Object Protocol
Number Protocol
Sequence Protocol
Mapping Protocol
Iterator Protocol
Old Buffer Protocol


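The functions of the abstract objects layer abstract over the operations of all objects. They can be applied to any object type, and if the object does not support the operation, the call raises an exception.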

So we now know that iterators are implemented in the abstract layer, and sure enough, its header contains:

/* Iterators */

     PyAPI_FUNC(PyObject *) PyObject_GetIter(PyObject *);
     /* Takes an object and returns an iterator for it.
    This is typically a new iterator but if the argument
    is an iterator, this returns itself. */

#define PyIter_Check(obj) \
    (PyType_HasFeature((obj)->ob_type, Py_TPFLAGS_HAVE_ITER) && \
     (obj)->ob_type->tp_iternext != NULL && \
     (obj)->ob_type->tp_iternext != &_PyObject_NextNotImplemented)

  PyAPI_FUNC(PyObject *) PyIter_Next(PyObject *);
     /* Takes an iterator object and calls its tp_iternext slot,
    returning the next value.  If the iterator is exhausted,
    this returns NULL without setting an exception.
    NULL with an exception means an error occurred. */

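PyObject_GetIter: returns an iterator for the object. For a container this is typically a new iterator on each call; when the argument is already an iterator, the iterator itself is returned (the same object, not a copy).
PyIter_Check: checks whether the argument is an iterator (not merely whether it is iterable).
PyIter_Next: takes an iterator and returns its next value. When the iterator is exhausted it returns NULL without setting an exception; NULL with an exception set means an error occurred.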

For example:

a='abcdef'
it1=iter(a)
print it1.next() # expect a
it2=iter(a)
it3=iter(it1)
print it3.next()  # expect b
print hex(id(it1))
print hex(id(it2))
print hex(id(it3))

Output:

a
b
0x1e28410
0x1e28450
0x1e28410



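For a container such as the string here (or a list), every call to iter() returns a different object. When the argument is itself an iterator, what comes back is that very same iterator, not a copy: it3 has the same id as it1 and continues from where it1 left off.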

#define PyIter_Check(obj) \
    (PyType_HasFeature((obj)->ob_type, Py_TPFLAGS_HAVE_ITER) && \
     (obj)->ob_type->tp_iternext != NULL && \
     (obj)->ob_type->tp_iternext != &_PyObject_NextNotImplemented)
Whether an object is an iterator is checked with a macro. In the type definition we saw that the type object has two iterator-related members:
   /* Iterators */
    getiterfunc tp_iter;
    iternextfunc tp_iternext;
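Of these, tp_iternext is what the macro tests (tp_iter is the slot that PyObject_GetIter calls). The second line of the macro also uses: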

int PyType_HasFeature(PyObject *o, int feature)
Return true if the type object o sets the feature feature. Type features are denoted by single bit flags.

This function tests whether a type object has a given feature.
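At the Python level, a rough analogue of PyIter_Check is to ask whether the object's type provides the "next" method. The sketch below is written for Python 3, where the method is named __next__ (an assumption; in Python 2 it is next). The helper name looks_like_iterator is hypothetical and is not part of any library.

```python
def looks_like_iterator(obj):
    """Rough Python-side counterpart of PyIter_Check (hypothetical helper).

    It looks the method up on the type, not the instance, which mirrors
    how the C macro inspects the tp_iternext slot of ob_type.
    """
    return hasattr(type(obj), '__next__')


def countdown(n):
    # A small generator used only as a test subject.
    while n > 0:
        yield n
        n -= 1


print(looks_like_iterator([1, 2, 3]))        # a list is iterable, not an iterator
print(looks_like_iterator(iter([1, 2, 3])))  # a list iterator is an iterator
print(looks_like_iterator(countdown(3)))     # a generator is an iterator
```

The distinction matters in practice: a list passes the "iterable" test but fails this one, which is why iter() has to create a separate iterator object for it.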
Next we use list to illustrate the iterator. We ignore everything else and look only at the iterator part:

/*********************** List Iterator **************************/

typedef struct {
    PyObject_HEAD
    long it_index;
    PyListObject *it_seq; /* Set to NULL when iterator is exhausted */
} listiterobject;

static PyObject *list_iter(PyObject *);
static void listiter_dealloc(listiterobject *);
static int listiter_traverse(listiterobject *, visitproc, void *);
static PyObject *listiter_next(listiterobject *);
static PyObject *listiter_len(listiterobject *);

PyDoc_STRVAR(length_hint_doc, "Private method returning an estimate of len(list(it)).");

static PyMethodDef listiter_methods[] = {
    {"__length_hint__", (PyCFunction)listiter_len, METH_NOARGS, length_hint_doc},
    {NULL,              NULL}           /* sentinel */
};

PyTypeObject PyListIter_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "listiterator",                             /* tp_name */
    sizeof(listiterobject),                     /* tp_basicsize */
    0,                                          /* tp_itemsize */
    /* methods */
    (destructor)listiter_dealloc,               /* tp_dealloc */
    0,                                          /* tp_print */
    0,                                          /* tp_getattr */
    0,                                          /* tp_setattr */
    0,                                          /* tp_compare */
    0,                                          /* tp_repr */
    0,                                          /* tp_as_number */
    0,                                          /* tp_as_sequence */
    0,                                          /* tp_as_mapping */
    0,                                          /* tp_hash */
    0,                                          /* tp_call */
    0,                                          /* tp_str */
    PyObject_GenericGetAttr,                    /* tp_getattro */
    0,                                          /* tp_setattro */
    0,                                          /* tp_as_buffer */
    Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC,/* tp_flags */
    0,                                          /* tp_doc */
    (traverseproc)listiter_traverse,            /* tp_traverse */
    0,                                          /* tp_clear */
    0,                                          /* tp_richcompare */
    0,                                          /* tp_weaklistoffset */
    PyObject_SelfIter,                          /* tp_iter */
    (iternextfunc)listiter_next,                /* tp_iternext */
    listiter_methods,                           /* tp_methods */
    0,                                          /* tp_members */
};


static PyObject *
list_iter(PyObject *seq)
{
    listiterobject *it;

    if (!PyList_Check(seq)) {
        PyErr_BadInternalCall();
        return NULL;
    }
    it = PyObject_GC_New(listiterobject, &PyListIter_Type); /*ALLOC A OBJECT */
    if (it == NULL)
        return NULL;
    it->it_index = 0;
    Py_INCREF(seq);
    it->it_seq = (PyListObject *)seq;
    _PyObject_GC_TRACK(it);   /* Gc TRACK IT */
    return (PyObject *)it;
}

static void
listiter_dealloc(listiterobject *it)
{
    _PyObject_GC_UNTRACK(it);
    Py_XDECREF(it->it_seq);
    PyObject_GC_Del(it);
}

static int
listiter_traverse(listiterobject *it, visitproc visit, void *arg)
{
    Py_VISIT(it->it_seq);
    return 0;
}

static PyObject *
listiter_next(listiterobject *it)
{
    PyListObject *seq;
    PyObject *item;

    assert(it != NULL);
    seq = it->it_seq;
    if (seq == NULL)
        return NULL;
    assert(PyList_Check(seq));

    if (it->it_index < PyList_GET_SIZE(seq)) {
        item = PyList_GET_ITEM(seq, it->it_index);
        ++it->it_index;
        Py_INCREF(item);
        return item;
    }

    Py_DECREF(seq);
    it->it_seq = NULL;
    return NULL;
}

static PyObject *
listiter_len(listiterobject *it)
{
    Py_ssize_t len;
    if (it->it_seq) {
        len = PyList_GET_SIZE(it->it_seq) - it->it_index;
        if (len >= 0)
            return PyInt_FromSsize_t(len);
    }
    return PyInt_FromLong(0);
}

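Note the two lines highlighted in blue in the original post (evidently the tp_iter and tp_iternext entries of PyListIter_Type): they are the static initialization of the type object. The implementations of the functions they refer to are listiter_next above and PyObject_SelfIter below.

[object.c]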

PyObject *
PyObject_SelfIter(PyObject *obj)
{
    Py_INCREF(obj);
    return obj;
}
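To tie the C code back to Python, here is a sketch of the same list iterator written as a Python class. The class name ListIter is hypothetical; the field names it_index and it_seq are borrowed from listiterobject. It is written for Python 3, with a next alias so the same class would also satisfy the Python 2 protocol described in this post.

```python
class ListIter(object):
    """Python-level sketch of CPython's listiterobject (illustration only)."""

    def __init__(self, seq):
        self.it_index = 0
        self.it_seq = seq              # set to None once the iterator is exhausted

    def __iter__(self):                # counterpart of PyObject_SelfIter
        return self

    def __next__(self):                # counterpart of listiter_next
        seq = self.it_seq
        if seq is None:
            raise StopIteration
        if self.it_index < len(seq):
            item = seq[self.it_index]
            self.it_index += 1
            return item
        self.it_seq = None             # drop the reference to the list
        raise StopIteration

    next = __next__                    # Python 2 spelling of the same method

    def __length_hint__(self):         # counterpart of listiter_len
        if self.it_seq is not None:
            remaining = len(self.it_seq) - self.it_index
            if remaining >= 0:
                return remaining
        return 0


numbers = [10, 20, 30]
it = ListIter(numbers)
print(next(it))                # first item
print(it.__length_hint__())    # items still to come
print(list(it))                # the rest
print(list(it))                # exhausted: nothing left
```

Where the C version returns NULL without setting an exception, the Python version raises StopIteration; the interpreter performs that translation when the tp_iternext slot is called from Python code.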

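Looking back, we started from Python syntax and explored step by step: from the base object (PyObject), to the type object (PyTypeObject), to the abstract objects layer, to the iterator protocol, and finally to the list iterator implementation. Does this give you an intuitive feel for how the pieces fit together?


There is a lot of material here and my train of thought is a bit messy; I will tidy it up carefully when I have time.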

To write a loop which iterates over an iterator, the C code should look something like this:

PyObject *iterator = PyObject_GetIter(obj);
PyObject *item;

if (iterator == NULL) {
    /* propagate error */
}

while (item = PyIter_Next(iterator)) {
    /* do something with item */
    ...
    /* release reference when done */
    Py_DECREF(item);
}

Py_DECREF(iterator);

if (PyErr_Occurred()) {
    /* propagate error */
}
else {
    /* continue doing useful work */
}

A friendly tip:

Applying iter() to a dictionary always loops over the keys, but dictionaries have methods that return other iterators. If you want to iterate over keys, values, or key/value pairs, you can explicitly call the iterkeys(), itervalues(), or iteritems() methods to get an appropriate iterator.
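A short sketch of that tip, written for Python 3 as an assumption: there, keys(), values() and items() play the role of the Python 2 methods iterkeys(), itervalues() and iteritems() named in the quote. Results are sorted only to make the printed order predictable.

```python
d = {"a": "bc", "b": "bc", "c": "cd", "d": "de"}

# iter() on a dictionary walks over its keys.
keys_via_iter = sorted(iter(d))

# Explicit methods select what to iterate over.
keys = sorted(d.keys())
values = sorted(d.values())
pairs = sorted(d.items())

print(keys_via_iter == keys)
print(values)
print(pairs)
```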

Don't confuse your Iterables, Iterators and Generators

First, the iterator protocol - when you write

for x in mylist: ...loop body...

Python performs the following two steps:

  1. Gets an iterator for mylist:

    Call iter(mylist) -> this returns an object with a next() method.

    [This is the step most people forget to tell you about]

  2. Uses the iterator to loop over items:

    Keep calling the next() method on the iterator returned from step 1. The return value from next() is assigned to x and the loop body is executed. If an exception StopIteration is raised from within next(), it means there are no more values in the iterator and the loop is exited.

The truth is Python performs the above two steps anytime it wants to loop over the contents of an object - so it could be a for loop


The yield keyword reduced to 2 simple facts:

  1. If the compiler detects the yield keyword anywhere inside a function, that function no longer returns via the return statement. INSTEAD, it immediately returns a lazy "pending list" object called a generator
  2. A generator is iterable. What is an iterable?--It's anything like a list or set or range or dict-view, with a built-in protocol for visiting each element in a certain order.
In python-speak, an iterable is any object which "understands the concept of a for-loop" like a list [1,2,3], and an iterator is a specific instance of the requested for-loop like [1,2,3].__iter__(). A generator is exactly the same as any iterator, except for the way it was written (with function syntax).



Thus, in the unlikely event that you are failing to do something like this...

> x = myRange(5)
> list(x)
[0, 1, 2, 3, 4]
> list(x)
[]

... then remember that a generator is an iterator; that is, it is one-time-use. If you want to reuse it, you should call myRange(...) again. Those who absolutely need to clone a generator (e.g. who are doing terrifyingly hackish metaprogramming) can use itertools.tee if absolutely necessary, since the copyable iterator python PEP standards proposal has been deferred.
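The quoted text never shows myRange itself, so the definition below is a hypothetical stand-in: a plain generator function that counts from 0 up to n. The sketch, in Python 3 syntax, reproduces the one-time-use behaviour and then shows itertools.tee, which splits one iterator into independent ones.

```python
import itertools


def myRange(n):
    # Hypothetical definition; the quoted text only shows how it is used.
    i = 0
    while i < n:
        yield i
        i += 1


x = myRange(5)
first = list(x)
second = list(x)
print(first)      # all five values
print(second)     # empty: the generator is already exhausted

# tee() returns independent iterators that each replay the same values.
a, b = itertools.tee(myRange(3))
print(list(a), list(b))
```

Once tee() has been applied, the original generator should no longer be advanced directly, since the copies would miss those values.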


REF:


python manual


