Category: Python/Ruby
2008-09-11 11:39:13
Published September 2007
Among the core principles of Python's way of doing things is a preference for high-level interfaces to APIs. The Database API (in this case the Oracle API) is one example. Using the cx_Oracle Python module from Computronix, you can take command of the Oracle query model while maintaining compatibility with the Python Database API Specification v2.0.
The model of querying databases using DB API 2.0 remains consistent for all client libraries conforming to the specification. On top of this, Anthony Tuininga, the principal developer of cx_Oracle, has added a wide set of properties and methods that expose Oracle-specific features to developers. It is absolutely possible to use only the standard methods and forget about the "extra" ones, but in this installment you won't be doing that. The concept of universal database wrappers might work in some cases but at the same time, you lose all the optimizations that the RDBMS offers.
The Python Database API Specification v2.0 is a community effort to unify the model of accessing different database systems. Having a relatively small set of methods and properties, it is easy to learn and remains consistent when switching database vendors. It doesn't map database objects to Python structures in any way. Users are still required to write SQL by hand. After changing to another database, this SQL would probably need to be rewritten. Nevertheless it solves Python-database connectivity issues in an elegant and clean manner.
The specification defines parts of the API such as the module interface, connection objects, cursor objects, type objects and constructors, optional extensions to the DB API and optional error handling mechanisms.
The gateway between the database and Python language is the Connection object. It contains all the ingredients for cooking database-driven applications, not only adhering to the DB API 2.0 but being a superset of the specification methods and attributes. In multi-threaded programs, modules as well as connections can be shared across threads; sharing cursors is not supported. This limitation is usually acceptable because shareable cursors can carry the risk of deadlocks.
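As a quick illustration of that threading rule, here is a minimal sketch, assuming the HR account and connect data used in the examples below: one connection, opened with threaded=True so it can be shared safely, and a separate cursor per worker thread.
import threading
import cx_Oracle

# threaded=True asks for an OCI environment that is safe to share between threads
db = cx_Oracle.connect('hr', 'hrpwd', 'localhost:1521/XE', threaded=True)

def count_rows(table_name):
    cursor = db.cursor()                       # every thread gets its own cursor
    cursor.execute('SELECT COUNT(*) FROM ' + table_name)
    print table_name, cursor.fetchone()[0]
    cursor.close()

workers = [threading.Thread(target=count_rows, args=(table,))
           for table in ('jobs', 'employees')]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()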
Python makes extensive use of the exception model, and the DB API defines a set of standard exceptions that can be very helpful when debugging problems in an application: Warning, Error, the Error subclasses InterfaceError and DatabaseError, and the DatabaseError subclasses DataError, OperationalError, IntegrityError, InternalError, ProgrammingError and NotSupportedError, each signalling a different class of failure. cx_Oracle exposes all of them as module-level attributes.
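A minimal sketch of catching these exceptions, assuming the HR schema and the connect data used in the examples below; the duplicate insert exists only to provoke an IntegrityError.
import cx_Oracle

db = cx_Oracle.connect('hr', 'hrpwd', 'localhost:1521/XE')
cursor = db.cursor()
try:
    # 'AD_PRES' already exists in HR.JOBS, so the primary key is violated
    cursor.execute("INSERT INTO jobs (job_id, job_title) VALUES ('AD_PRES', 'President')")
except cx_Oracle.IntegrityError, exc:
    print "Constraint violated:", exc
except cx_Oracle.DatabaseError, exc:
    print "Some other database problem:", exc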
The connection process begins with the Connection object, which is the base for creating Cursor objects. Besides cursor operations, the Connection object also manages transactions through its commit() and rollback() methods. The process of executing SQL queries, issuing DML/DCL statements and fetching results is controlled entirely by cursors.
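A short sketch of that transaction pattern, again assuming the HR schema used throughout this article; the salary update itself is hypothetical.
cursor = db.cursor()
try:
    cursor.execute("UPDATE employees SET salary = salary * 1.05 "
                   "WHERE department_id = :dept", dept=50)
    db.commit()        # make the change permanent
except cx_Oracle.DatabaseError:
    db.rollback()      # undo everything done in this transaction
    raise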
cx_Oracle extends the standard DB API 2.0 specification mostly in its implementation of the Cursor and Connection classes. All such extensions will be clearly marked in the text where relevant.
Within the scope of a Connection object (such as the one assigned to the db variable below) you can get the database version by querying the version attribute (an extension to DB API 2.0). This can be used to make Python programs Oracle-version dependent. Likewise, you can get the connect string for the connection by querying the dsn attribute.
>>> import cx_Oracle
>>> db = cx_Oracle.connect('hr', 'hrpwd', 'localhost:1521/XE')
>>> db1 = cx_Oracle.connect('hr/hrpwd@localhost:1521/XE')
>>> dsn_tns = cx_Oracle.makedsn('localhost', 1521, 'XE')
>>> print dsn_tns
(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=1521)))
(CONNECT_DATA=(SID=XE)))
>>> db2 = cx_Oracle.connect('hr', 'hrpwd', dsn_tns)
>>> print db.version
10.2.0.1.0
>>> versioning = db.version.split('.')
>>> print versioning
['10', '2', '0', '1', '0']
>>> if versioning[0]=='10':
...     print "Running 10g"
... elif versioning[0]=='9':
...     print "Running 9i"
...
Running 10g
>>> print db.dsn
localhost:1521/XE
Application logic often requires clearly distinguishing the stages of processing a statement issued against the database. This helps in understanding performance bottlenecks and in writing faster, optimized code. The three stages of processing a statement are parse, execute and fetch; a minimal sketch of how they map onto cx_Oracle calls follows.
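The query against the HR jobs table below is only an illustration; prepare() and bind variables are covered in more detail later in this article.
cursor = db.cursor()
cursor.prepare('SELECT job_title FROM jobs WHERE min_salary > :min')   # parse
cursor.execute(None, {'min': 4000})                                    # execute the prepared statement
rows = cursor.fetchall()                                               # fetch
print len(rows)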
>>> cursor = db.cursor()
Before going forward with cursor examples please welcome the pprint function from the pprint module. It outputs Python data structures in a clean, readable form.
cx_Oracle cursors are iterators. These powerful Python structures let you iterate over sequences in a natural way that fetches subsequent items only on demand. Costly database select operations naturally fit this idea because the data only gets fetched when needed. Instead of creating or fetching the whole result set, you iterate until the desired value is found or another condition is fulfilled.
>>> from pprint import pprint
>>> cursor.execute('SELECT * FROM jobs')
[<cx_Oracle.STRING with value None>, <cx_Oracle.STRING with value None>,
<cx_Oracle.NUMBER with value None>, <cx_Oracle.NUMBER with value None>]
>>> pprint(cursor.fetchall())
[('AD_PRES', 'President', 20000, 40000),
('AD_VP', 'Administration Vice President', 15000, 30000),
('AD_ASST', 'Administration Assistant', 3000, 6000),
('FI_MGR', 'Finance Manager', 8200, 16000),
('FI_ACCOUNT', 'Accountant', 4200, 9000),
...
('PR_REP', 'Public Relations Representative', 4500, 10500)]
Just after an execute, list(cursor) does the same job as cursor.fetchall(), because the built-in list() function iterates until the given iterator is exhausted.
>>> cursor = db.cursor()
>>> cursor.execute('SELECT * FROM jobs')
[<cx_Oracle.STRING with value None>, <cx_Oracle.STRING with value None>,
<cx_Oracle.NUMBER with value None>, <cx_Oracle.NUMBER with value None>]
>>> for row in cursor: ## notice that this is plain English!
... print row
...
('AD_VP', 'Administration Vice President', 15000, 30000)
('AD_ASST', 'Administration Assistant', 3000, 6000)
('FI_MGR', 'Finance Manager', 8200, 16000)
('FI_ACCOUNT', 'Accountant', 4200, 9000)
('AC_MGR', 'Accounting Manager', 8200, 16000)
...
('PR_REP', 'Public Relations Representative', 4500, 10500)
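Early termination is where the iterator behaviour really pays off. A hedged sketch, separate from the session above: stop as soon as the first job with a minimum salary of at least 10000 shows up, so no further rows are requested from the database.
cursor.execute('SELECT job_id, job_title, min_salary FROM jobs')
for job_id, job_title, min_salary in cursor:
    if min_salary >= 10000:
        print "First well-paid job found:", job_title
        break        # stop iterating; remaining rows are not requested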
During the fetch stage, basic Oracle data types get mapped into their Python equivalents. cx_Oracle maintains a separate set of data types that helps in this transition. The Oracle - cx_Oracle - Python mappings are:
Oracle                    | cx_Oracle            | Python
VARCHAR2, NVARCHAR2, LONG | cx_Oracle.STRING     | str
CHAR                      | cx_Oracle.FIXED_CHAR | str
NUMBER                    | cx_Oracle.NUMBER     | int
FLOAT                     | cx_Oracle.NUMBER     | float
DATE                      | cx_Oracle.DATETIME   | datetime.datetime
TIMESTAMP                 | cx_Oracle.TIMESTAMP  | datetime.datetime
CLOB                      | cx_Oracle.CLOB       | cx_Oracle.LOB
BLOB                      | cx_Oracle.BLOB       | cx_Oracle.LOB
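To see the DATE mapping in action, a small sketch against the HR employees table; the exact values depend on your data, but the fetched column should arrive as a datetime.datetime instance.
cursor.execute('SELECT hire_date FROM employees WHERE rownum = 1')
hire_date = cursor.fetchone()[0]
print type(hire_date)              # expected: <type 'datetime.datetime'>
print hire_date.year, hire_date.month, hire_date.day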
Other data types that are not yet handled by cx_Oracle include XMLTYPE and all complex types. All queries involving columns of unsupported types will currently fail with a NotSupportedError exception. You need to remove them from queries or cast to a supported data type.
For example, consider the following table for storing aggregated RSS feeds:
When trying to query this table with Python, some additional steps need to be performed. In the example below XMLType.GetClobVal() is used to return XML from the table as CLOB values.
CREATE TABLE rss_feeds (
feed_id NUMBER PRIMARY KEY,
feed_url VARCHAR2(250) NOT NULL,
feed_xml XMLTYPE
);
>>> cursor.execute('SELECT * FROM rss_feeds')
Traceback (most recent call last):
File "", line 1, in
cursor.execute('SELECT * FROM rss_feeds')
NotSupportedError: Variable_TypeByOracleDataType: unhandled data type 108
>>> cursor.execute('SELECT feed_id, feed_url, XMLType.GetClobVal(feed_xml) FROM rss_feeds')
[<cx_Oracle.NUMBER with value None>, <cx_Oracle.STRING with value None>,
<cx_Oracle.CLOB with value None>]
You might have already noticed that the cx_Oracle.Cursor.execute* family of methods returns column data types for queries. These are lists of Variable objects (an extension to DB API 2.0), which hold the value None before the fetch phase and the proper data values after the fetch. Detailed information about the data types is available through the description attribute of cursor objects. The description is a list of 7-item tuples where each tuple consists of the column name, column type, display size, internal size, precision, scale and whether null is possible. Note that column information is only accessible for SQL statements that are queries.
>>> column_data_types = cursor.execute('SELECT * FROM employees')
>>> print column_data_types
[<cx_Oracle.NUMBER with value None>, <cx_Oracle.STRING with value None>,
<cx_Oracle.STRING with value None>, <cx_Oracle.STRING with value None>,
<cx_Oracle.STRING with value None>, <cx_Oracle.DATETIME with value None>,
<cx_Oracle.STRING with value None>, <cx_Oracle.NUMBER with value None>,
<cx_Oracle.NUMBER with value None>, <cx_Oracle.NUMBER with value None>,
<cx_Oracle.NUMBER with value None>]
>>> pprint(cursor.description)
[('EMPLOYEE_ID', <type 'cx_Oracle.NUMBER'>, 7, 22, 6, 0, 0),
 ('FIRST_NAME', <type 'cx_Oracle.STRING'>, 20, 20, 0, 0, 1),
 ('LAST_NAME', <type 'cx_Oracle.STRING'>, 25, 25, 0, 0, 0),
 ('EMAIL', <type 'cx_Oracle.STRING'>, 25, 25, 0, 0, 0),
 ('PHONE_NUMBER', <type 'cx_Oracle.STRING'>, 20, 20, 0, 0, 1),
 ('HIRE_DATE', <type 'cx_Oracle.DATETIME'>, 23, 7, 0, 0, 0),
 ('JOB_ID', <type 'cx_Oracle.STRING'>, 10, 10, 0, 0, 0),
 ('SALARY', <type 'cx_Oracle.NUMBER'>, 12, 22, 8, 2, 1),
 ('COMMISSION_PCT', <type 'cx_Oracle.NUMBER'>, 6, 22, 2, 2, 1),
 ('MANAGER_ID', <type 'cx_Oracle.NUMBER'>, 7, 22, 6, 0, 1),
 ('DEPARTMENT_ID', <type 'cx_Oracle.NUMBER'>, 5, 22, 4, 0, 1)]
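The description attribute also makes it easy to label rows with their column names. A short sketch, with an arbitrary three-column query chosen purely for illustration:
cursor.execute('SELECT employee_id, last_name, salary FROM employees WHERE rownum <= 3')
column_names = [col[0] for col in cursor.description]
for row in cursor:
    print dict(zip(column_names, row))      # maps column names to row values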
Consider a set of queries that differ only in a literal value:
SELECT * FROM emp_details_view WHERE department_id=50
SELECT * FROM emp_details_view WHERE department_id=60
SELECT * FROM emp_details_view WHERE department_id=90
SELECT * FROM emp_details_view WHERE department_id=110
When run one by one, each of these queries needs to be parsed separately, which adds extra overhead to your application. By using bind variables you can tell Oracle to parse a query only once. cx_Oracle supports binding variables by name or by position.
Passing bind variables by name requires the parameters argument of the execute method to be a dictionary or a set of keyword arguments. query1 and query2 below are equivalent:
>>> named_params = {'dept_id':50, 'sal':1000}
>>> query1 = cursor.execute('SELECT * FROM employees
WHERE department_id=:dept_id AND salary>:sal', named_params)
>>> query2 = cursor.execute('SELECT * FROM employees
WHERE department_id=:dept_id AND salary>:sal', dept_id=50, sal=1000)
When using named bind variables you can check the currently assigned ones using the bindnames() method of the cursor:
>>> print cursor.bindnames()
['DEPT_ID', 'SAL']
Passing by position is similar but you need to be careful about naming. Variable names are arbitrary so it's easy to mess up queries this way. In the example below, all three queries r1, r2, and r3 are equivalent. The parameters variable must be given as a sequence.
>>> r1 = cursor.execute('SELECT * FROM locations
WHERE country_id=:1 AND city=:2', ('US', 'Seattle'))
>>> r2 = cursor.execute('SELECT * FROM locations
WHERE country_id=:9 AND city=:4', ('US', 'Seattle'))
>>> r3 = cursor.execute('SELECT * FROM locations
WHERE country_id=:m AND city=:0', ('US', 'Seattle'))
When binding, you can also prepare the statement first and then call execute() with None as the statement text, changing only the parameters. Oracle handles this as in the case above, governed by the rule that one parse is enough once variables are bound; the prepared statement can then be executed any number of times.
>>> cursor.prepare('SELECT * FROM jobs WHERE min_salary>:min')
>>> r = cursor.execute(None, {'min':1000})
>>> print len(cursor.fetchall())
19
You have already limited the number of parses. In the next paragraph we'll also eliminate unnecessary executions, which matters most for expensive bulk inserts.
Let's create a table for a Python module list, this time directly from Python. You will drop it later.
>>> create_table = """
CREATE TABLE python_modules (
module_name VARCHAR2(50) NOT NULL,
file_path VARCHAR2(300) NOT NULL
)
"""
>>> from sys import modules
>>> cursor.execute(create_table)
>>> M = []
>>> for m_name, m_info in modules.items():
... try:
... M.append((m_name, m_info.__file__))
... except AttributeError:
... pass
...
>>> len(M)
76
>>> cursor.prepare("INSERT INTO python_modules(module_name, file_path) VALUES (:1, :2)")
>>> cursor.executemany(None, M)
>>> db.commit()
>>> r = cursor.execute("SELECT COUNT(*) FROM python_modules")
>>> print cursor.fetchone()
(76,)
Only one execute has been issued to the database to insert all 76 module names, which is a huge performance boost for large insert operations. Notice two small quirks here: cursor.execute(create_table) doesn't produce any output since it is a DDL statement, and (76,) is a tuple with a single element; (76) without the comma would simply be the integer 76.
>>> cursor.execute("DROP TABLE python_modules PURGE")
You have learned about the three stages that SQL statements go through and how to minimize the number of steps the Oracle Database needs to perform. Bind variables are an essential part of database application development, and Python enables binding them by name or by position.
You have also been introduced to the smooth transition between Oracle and Python datatypes and the natural way of handling database data in the context of handling cursors as iterators. All these features boost productivity and enable focusing on the data, which is what it's all about.