Category: Python/Ruby
2007-08-08 08:09:33
Thread State and the Global Interpreter Lock
The Python interpreter is not fully thread safe. To support multi-threaded Python programs, there is a global lock that the current thread must hold before it can safely access Python objects. Without the lock, even the simplest operations could cause problems in a multi-threaded program: for example, when two threads simultaneously increment the reference count of the same object, the reference count could end up being incremented only once instead of twice.
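The lost update described above can be replayed deterministically. The sketch below is only an illustration: toy_object and both function names are hypothetical, not part of the Python/C API, and the "threads" are simulated by writing out the read and write steps of each increment in a fixed order.

```c
#include <assert.h>

/* Hypothetical stand-in for an object header; not CPython's PyObject. */
typedef struct {
    long refcount;
} toy_object;

/* Two increments that run one after the other, as they do under the lock. */
long serialized_increments(toy_object *obj)
{
    long a = obj->refcount;   /* thread A reads */
    obj->refcount = a + 1;    /* thread A writes */
    long b = obj->refcount;   /* thread B reads the updated value */
    obj->refcount = b + 1;    /* thread B writes */
    return obj->refcount;
}

/* The same two increments with their read and write steps interleaved. */
long interleaved_increments(toy_object *obj)
{
    long a = obj->refcount;   /* thread A reads */
    long b = obj->refcount;   /* thread B reads the same stale value */
    obj->refcount = a + 1;    /* thread A writes */
    obj->refcount = b + 1;    /* thread B overwrites A's update */
    return obj->refcount;
}
```

Starting from a count of 1, the serialized version ends at 3 while the interleaved version ends at 2, which is the "incremented only once" outcome the lock exists to prevent.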
Therefore, the rule exists that only the thread that has acquired the global interpreter lock may operate on Python objects or call Python/C API functions. To let other threads run, the interpreter regularly releases and reacquires the lock: by default, every 100 bytecode instructions (this can be changed with sys.setcheckinterval()). The lock is also released and reacquired around potentially blocking I/O operations like reading or writing a file, so that other threads can run while the thread that requests the I/O is waiting for the I/O operation to complete.
The Python interpreter needs to keep some bookkeeping information separate per thread; for this it uses a data structure called PyThreadState. There is one global variable, however: the pointer to the current PyThreadState structure. While most thread packages have a way to store "per-thread global data", Python's internal platform-independent thread abstraction doesn't support this yet. Therefore, the current thread state must be manipulated explicitly.
This is easy enough in most cases. Most code manipulating the global interpreter lock has the following simple structure:
    Save the thread state in a local variable.
    Release the interpreter lock.
    ...Do some blocking I/O operation...
    Reacquire the interpreter lock.
    Restore the thread state from the local variable.
This is so common that a pair of macros exists to simplify it:
    Py_BEGIN_ALLOW_THREADS
    ...Do some blocking I/O operation...
    Py_END_ALLOW_THREADS
The Py_BEGIN_ALLOW_THREADS macro opens a new block and declares a hidden local variable; the Py_END_ALLOW_THREADS macro closes the block. Another advantage of using these two macros is that when Python is compiled without thread support, they are defined empty, which avoids the thread state and lock manipulations altogether.
When thread support is enabled, the block above expands to the following code:
    PyThreadState *_save;

    _save = PyEval_SaveThread();
    ...Do some blocking I/O operation...
    PyEval_RestoreThread(_save);
Using even lower-level primitives, we can get roughly the same effect as follows:
    PyThreadState *_save;

    _save = PyThreadState_Swap(NULL);
    PyEval_ReleaseLock();
    ...Do some blocking I/O operation...
    PyEval_AcquireLock();
    PyThreadState_Swap(_save);
There are some subtle differences; in particular, PyEval_RestoreThread() saves and restores the value of the global variable errno, since the lock manipulation does not guarantee that errno is left alone. Also, when thread support is disabled, PyEval_SaveThread() and PyEval_RestoreThread() don't manipulate the lock; in this case, PyEval_ReleaseLock() and PyEval_AcquireLock() are not available. This is done so that dynamically loaded extensions compiled with thread support enabled can be loaded by an interpreter that was compiled with thread support disabled.
The global interpreter lock is used to protect the pointer to the current thread state. When releasing the lock and saving the thread state, the current thread state pointer must be retrieved before the lock is released (since another thread could immediately acquire the lock and store its own thread state in the global variable). Conversely, when acquiring the lock and restoring the thread state, the lock must be acquired before storing the thread state pointer.
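The ordering rule in the last paragraph can be made concrete with a small single-threaded model. This is a sketch under stated assumptions, not the interpreter's code: toy_tstate, toy_current, toy_lock_held and the toy_* functions are hypothetical names, and the lock is modelled as a plain flag so the order of operations can be checked with assertions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-thread state structure. */
typedef struct { int id; } toy_tstate;

toy_tstate *toy_current = NULL;  /* models the global "current thread state" pointer */
int toy_lock_held = 0;           /* models the interpreter lock: 1 = held */

void toy_acquire_lock(void) { assert(!toy_lock_held); toy_lock_held = 1; }
void toy_release_lock(void) { assert(toy_lock_held);  toy_lock_held = 0; }

/* Save: read the pointer first, release the lock second. */
toy_tstate *toy_save_thread(void)
{
    assert(toy_lock_held);            /* the pointer is only ours while we hold the lock */
    toy_tstate *saved = toy_current;  /* retrieve the pointer before releasing */
    toy_current = NULL;
    toy_release_lock();               /* from here on another thread may overwrite toy_current */
    return saved;
}

/* Restore: acquire the lock first, store the pointer second. */
void toy_restore_thread(toy_tstate *saved)
{
    assert(saved != NULL);
    toy_acquire_lock();               /* take the lock before touching the global */
    toy_current = saved;
}
```

Swapping the two steps in either function would touch toy_current at a moment when the model says some other thread is entitled to change it, which is exactly the hazard the text describes.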
Why am I going on with so much detail about this? Because when threads are created from C, they don't have the global interpreter lock, nor is there a thread state data structure for them. Such threads must bootstrap themselves into existence, by first creating a thread state data structure, then acquiring the lock, and finally storing their thread state pointer, before they can start using the Python/C API. When they are done, they should reset the thread state pointer, release the lock, and finally free their thread state data structure.
Beginning with version 2.3, threads can take advantage of the PyGILState_*() functions to do all of the above automatically. The typical idiom for calling into Python from a C thread is now:
    PyGILState_STATE gstate;
    gstate = PyGILState_Ensure();

    /* Perform Python actions here. */
    result = CallSomeFunction();
    /* evaluate result */

    /* Release the thread. No Python API allowed beyond this point. */
    PyGILState_Release(gstate);
Note that the PyGILState_*() functions assume there is only one global interpreter (created automatically by Py_Initialize()). Python still supports the creation of additional interpreters (using Py_NewInterpreter()), but mixing multiple interpreters and the PyGILState_*() API is unsupported.
PyInterpreterState: This data structure represents the state shared by a number of cooperating threads. Threads belonging to the same interpreter share their module administration and a few other internal items. There are no public members in this structure.
Threads belonging to different interpreters initially share nothing, except process state like available memory, open file descriptors and such. The global interpreter lock is also shared by all threads, regardless of which interpreter they belong to.
PyThreadState: This data structure represents the state of a single thread. The only public data member is PyInterpreterState *interp, which points to this thread's interpreter state.
PyEval_InitThreads(): Initialize and acquire the global interpreter lock. It should be called in the main thread before creating a second thread or engaging in any other thread operations such as PyEval_ReleaseLock() or PyEval_ReleaseThread(tstate). It is not needed before calling PyEval_SaveThread() or PyEval_RestoreThread().
This is a no-op when called for a second time. It is safe to call this function before calling Py_Initialize().
When only the main thread exists, no lock operations are needed. This is a common situation (most Python programs do not use threads), and the lock operations slow the interpreter down a bit. Therefore, the lock is not created initially. This situation is equivalent to having acquired the lock: when there is only a single thread, all object accesses are safe. Therefore, when this function initializes the lock, it also acquires it. Before the Python thread module creates a new thread, knowing that either it has the lock or the lock hasn't been created yet, it calls PyEval_InitThreads(). When this call returns, it is guaranteed that the lock has been created and that the calling thread has acquired it.
It is not safe to call this function when it is unknown which thread (if any) currently has the global interpreter lock.
This function is not available when thread support is disabled at compile time.
PyEval_ThreadsInitialized(): Returns a non-zero value if PyEval_InitThreads() has been called. This function can be called without holding the lock, and therefore can be used to avoid calls to the locking API when running single-threaded. This function is not available when thread support is disabled at compile time. New in version 2.4.
PyEval_AcquireLock(): Acquire the global interpreter lock. The lock must have been created earlier. If this thread already has the lock, a deadlock ensues. This function is not available when thread support is disabled at compile time.
PyEval_ReleaseLock(): Release the global interpreter lock. The lock must have been created earlier. This function is not available when thread support is disabled at compile time.
PyEval_AcquireThread(): Acquire the global interpreter lock and set the current thread state to tstate, which should not be NULL. The lock must have been created earlier. If this thread already has the lock, a deadlock ensues. This function is not available when thread support is disabled at compile time.
PyEval_ReleaseThread(): Reset the current thread state to NULL and release the global interpreter lock. The lock must have been created earlier and must be held by the current thread. The tstate argument, which must not be NULL, is only used to check that it represents the current thread state; if it doesn't, a fatal error is reported. This function is not available when thread support is disabled at compile time.
PyEval_SaveThread(): Release the interpreter lock (if it has been created and thread support is enabled) and reset the thread state to NULL, returning the previous thread state (which is not NULL). If the lock has been created, the current thread must have acquired it. (This function is available even when thread support is disabled at compile time.)
PyEval_RestoreThread(): Acquire the interpreter lock (if it has been created and thread support is enabled) and set the thread state to tstate, which must not be NULL. If the lock has been created, the current thread must not have acquired it, otherwise a deadlock ensues. (This function is available even when thread support is disabled at compile time.)
The following macros are normally used without a trailing semicolon; look for example usage in the Python source distribution.
Py_BEGIN_ALLOW_THREADS: This macro expands to "{ PyThreadState *_save; _save = PyEval_SaveThread();". Note that it contains an opening brace; it must be matched with a following Py_END_ALLOW_THREADS macro. See above for further discussion of this macro. It is a no-op when thread support is disabled at compile time.
Py_END_ALLOW_THREADS: This macro expands to "PyEval_RestoreThread(_save); }". Note that it contains a closing brace; it must be matched with an earlier Py_BEGIN_ALLOW_THREADS macro. See above for further discussion of this macro. It is a no-op when thread support is disabled at compile time.
Py_BLOCK_THREADS: This macro expands to "PyEval_RestoreThread(_save);": it is equivalent to Py_END_ALLOW_THREADS without the closing brace. It is a no-op when thread support is disabled at compile time.
Py_UNBLOCK_THREADS: This macro expands to "_save = PyEval_SaveThread();": it is equivalent to Py_BEGIN_ALLOW_THREADS without the opening brace and variable declaration. It is a no-op when thread support is disabled at compile time.
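To see how the four macros pair up, here is a minimal sketch that mirrors their brace structure with toy equivalents. Everything prefixed toy_ or TOY_ is hypothetical and stands in for the real names; the lock is a plain flag, and toy_read_chunks() is an invented example of a loop that mostly runs without the lock but briefly takes it back in the middle.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int id; } toy_tstate;

toy_tstate *toy_current = NULL;  /* models the current thread state pointer */
int toy_lock_held = 0;           /* models the interpreter lock: 1 = held */

toy_tstate *toy_save_thread(void)
{
    toy_tstate *saved = toy_current;
    assert(toy_lock_held && saved != NULL);
    toy_current = NULL;
    toy_lock_held = 0;
    return saved;
}

void toy_restore_thread(toy_tstate *saved)
{
    assert(!toy_lock_held && saved != NULL);
    toy_lock_held = 1;
    toy_current = saved;
}

/* Toy macros with the same shape as the expansions quoted above. */
#define TOY_BEGIN_ALLOW_THREADS { toy_tstate *_save; _save = toy_save_thread();
#define TOY_BLOCK_THREADS       toy_restore_thread(_save);
#define TOY_UNBLOCK_THREADS     _save = toy_save_thread();
#define TOY_END_ALLOW_THREADS   toy_restore_thread(_save); }

/* Returns how many times the lock was taken back inside the block. */
int toy_read_chunks(int nchunks)
{
    int i;
    int reacquired = 0;

    TOY_BEGIN_ALLOW_THREADS
    for (i = 0; i < nchunks; i++) {
        /* ...pretend to block on I/O here; the lock is not held... */
        if (i % 2 == 1) {            /* pretend odd chunks need the interpreter */
            TOY_BLOCK_THREADS        /* lock held again, _save still in scope */
            assert(toy_lock_held);
            reacquired++;
            TOY_UNBLOCK_THREADS      /* give the lock back and continue */
        }
    }
    TOY_END_ALLOW_THREADS
    return reacquired;
}
```

The block and unblock forms only work between a begin/end pair, because they rely on the _save variable that the opening macro declares.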
All of the following functions are only available when thread support is enabled at compile time, and must be called only when the interpreter lock has been created.
PyInterpreterState_New(): Create a new interpreter state object. The interpreter lock need not be held, but may be held if it is necessary to serialize calls to this function.
PyInterpreterState_Clear(): Reset all information in an interpreter state object. The interpreter lock must be held.
PyInterpreterState_Delete(): Destroy an interpreter state object. The interpreter lock need not be held. The interpreter state must have been reset with a previous call to PyInterpreterState_Clear().
PyThreadState_New(): Create a new thread state object belonging to the given interpreter object. The interpreter lock need not be held, but may be held if it is necessary to serialize calls to this function.
PyThreadState_Clear(): Reset all information in a thread state object. The interpreter lock must be held.
PyThreadState_Delete(): Destroy a thread state object. The interpreter lock need not be held. The thread state must have been reset with a previous call to PyThreadState_Clear().
PyThreadState_Get(): Return the current thread state. The interpreter lock must be held. When the current thread state is NULL, this issues a fatal error (so that the caller needn't check for NULL).
PyThreadState_Swap(): Swap the current thread state with the thread state given by the argument tstate, which may be NULL. The interpreter lock must be held.
PyThreadState_GetDict(): Return value: Borrowed reference.
Return a dictionary in which extensions can store thread-specific state information. Each extension should use a unique key to store its state in the dictionary. It is okay to call this function when no current thread state is available. If this function returns NULL, no exception has been raised and the caller should assume no current thread state is available. Changed in version 2.3: Previously this could only be called when a current thread is active, and NULL meant that an exception was raised.
PyThreadState_SetAsyncExc(): Asynchronously raise an exception in a thread. The id argument is the thread id of the target thread; exc is the exception object to be raised. This function does not steal any references to exc. To prevent naive misuse, you must write your own C extension to call this. Must be called with the GIL held. Returns the number of thread states modified; this is normally one, but will be zero if the thread id isn't found. If exc is NULL, the pending exception (if any) for the thread is cleared. This raises no exceptions. New in version 2.3.
PyGILState_Ensure(): Ensure that the current thread is ready to call the Python C API regardless of the current state of Python, or of its thread lock. This may be called as many times as desired by a thread as long as each call is matched with a call to PyGILState_Release(). In general, other thread-related APIs may be used between PyGILState_Ensure() and PyGILState_Release() calls as long as the thread state is restored to its previous state before the PyGILState_Release() call. For example, normal usage of the Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS macros is acceptable.
The return value is an opaque "handle" to the thread state when PyGILState_Ensure() was called, and must be passed to PyGILState_Release() to ensure Python is left in the same state. Even though recursive calls are allowed, these handles cannot be shared: each unique call to PyGILState_Ensure() must save the handle for its call to PyGILState_Release().
When the function returns, the current thread will hold the GIL. Failure is a fatal error. New in version 2.3.
PyGILState_Release(): Release any resources previously acquired. After this call, Python's state will be the same as it was prior to the corresponding PyGILState_Ensure() call (but generally this state will be unknown to the caller, hence the use of the GILState API).
Every call to PyGILState_Ensure() must be matched by a call to PyGILState_Release() on the same thread. New in version 2.3.
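Why each call must keep its own handle becomes clearer with a toy model of nested calls. The sketch below is an illustration only: the toy_* names are hypothetical, the lock is a flag seen from a single thread, and the handle simply records whether the lock was already held when the call was made.

```c
#include <assert.h>

/* Hypothetical handle: what the lock state was before the ensure call. */
typedef enum { TOY_LOCKED, TOY_UNLOCKED } toy_gilstate;

int toy_lock_held = 0;  /* does the calling thread hold the lock? */
int toy_depth = 0;      /* number of ensure calls not yet released */

toy_gilstate toy_ensure(void)
{
    toy_gilstate previous = toy_lock_held ? TOY_LOCKED : TOY_UNLOCKED;
    if (!toy_lock_held)
        toy_lock_held = 1;   /* a real program would block here until the lock is free */
    toy_depth++;
    return previous;         /* the caller must hand this back to toy_release() */
}

void toy_release(toy_gilstate previous)
{
    assert(toy_depth > 0);   /* every release needs a matching ensure */
    toy_depth--;
    if (previous == TOY_UNLOCKED)
        toy_lock_held = 0;   /* restore the state seen before the matching ensure */
}
```

An outer call made without the lock gets TOY_UNLOCKED back, a nested call gets TOY_LOCKED, and only the outer release actually drops the lock. Passing the inner handle to the outer release (or the reverse) would leave the lock in the wrong state, which is why handles cannot be shared between calls.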