狂甩酷拽吊炸天
Category: Python/Ruby
2017-06-21 12:59:15
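1. Install python-docx with pip
The installation fails. Judging from the error output, the failure occurs while installing lxml. The error message: make sure the development packages of libxml2 and libxslt are installed
Fix: http://blog.csdn.net/azhao_dn/article/details/7501432
The cause was three missing packages: libxml2, libxslt, and libxslt-devel. Installing them with yum solves the problem.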
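After installing the system packages and rerunning pip, a quick check that both modules can be found may save a debugging round. This is only a minimal sketch using the standard library and written for Python 3; the helper name `missing_modules` is made up for illustration. It relies on python-docx being importable as `docx` and lxml as `lxml`.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # python-docx installs the importable package "docx" and depends on "lxml".
    missing = missing_modules(["lxml", "docx"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("lxml and python-docx are importable")
```

If `lxml` shows up as missing, the libxml2/libxslt development packages are the first thing to recheck.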
2. Basic usage
Official site:
GitHub:
Reference: http://blog.csdn.net/qianchenglenger/article/details/51582005
#coding=utf-8
from docx import Document
from docx.shared import Pt
from docx.shared import Inches
from docx.oxml.ns import qn

# Create a new document
document = Document()

# Add headings at different levels
document.add_heading(u'MS WORD写入测试', 0)
document.add_heading(u'一级标题', 1)
document.add_heading(u'二级标题', 2)

# Add a paragraph of text
paragraph = document.add_paragraph(u'我们在做文本测试!')

# Set the font size
run = paragraph.add_run(u'设置字号、')
run.font.size = Pt(24)

# Set the font
run = paragraph.add_run('Set Font,')
run.font.name = 'Consolas'

# Set a Chinese font (the East Asian font must be set on the XML element as well)
run = paragraph.add_run(u'设置中文字体、')
run.font.name = u'宋体'
r = run._element
r.rPr.rFonts.set(qn('w:eastAsia'), u'宋体')

# Italic
run = paragraph.add_run(u'斜体、')
run.italic = True

# Bold
run = paragraph.add_run(u'粗体')
run.bold = True

# Add a quotation
document.add_paragraph('Intense quote', style='Intense Quote')

# Add an unordered list
document.add_paragraph(u'无序列表元素1', style='List Bullet')
document.add_paragraph(u'无序列表元素2', style='List Bullet')

# Add an ordered list
document.add_paragraph(u'有序列表元素1', style='List Number')
document.add_paragraph(u'有序列表元素2', style='List Number')

# Add an image (this uses image.bmp; put it in the script's directory yourself)
document.add_picture('image.bmp', width=Inches(1.25))

# Add a table with a header row
table = document.add_table(rows=1, cols=3)
hdr_cells = table.rows[0].cells
hdr_cells[0].text = 'Name'
hdr_cells[1].text = 'Id'
hdr_cells[2].text = 'Desc'

# Add three more rows to the table
for i in range(3):
    row_cells = table.add_row().cells
    row_cells[0].text = 'test' + str(i)
    row_cells[1].text = str(i)
    row_cells[2].text = 'desc' + str(i)

# Add a page break
document.add_page_break()

# Save the file
document.save(u'测试.docx')
Reading a Word document and storing it in a database:
# -*- coding:utf-8 -*-
# Python 2 script (print statement, MySQLdb)
import MySQLdb
import docx

conn = MySQLdb.connect(
    host='1xx.2xx.6x.2xx',
    port=3306,
    user='django',
    passwd='django',
    db='django',
    charset='utf8'
)
curs = conn.cursor()
# curs.execute("show databases;")
# print curs.fetchall()

document = docx.Document(u'D:/ppp2.docx')

# The first 8 paragraphs alternate: title, answer, title, answer, ...
count = 0
for paragraph in document.paragraphs:
    if count == 8:
        break
    if count % 2 != 0:
        # Odd paragraphs hold the answer for the preceding title
        answer = paragraph.text.encode('utf-8')
        if answer:
            print "answer:", answer
            sqls = "UPDATE timu SET answer = %s WHERE id = %s;"
            curs.execute(sqls, (answer, count // 2))
            conn.commit()
    else:
        # Even paragraphs hold the title
        title = paragraph.text.encode('utf-8')
        if title:
            print "title:", title
            sqls = "INSERT INTO timu(id, title) VALUES(%s,%s);"
            curs.execute(sqls, (count // 2, title))
            conn.commit()
    count += 1

conn.commit()
curs.close()
conn.close()
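The loop above treats the first eight paragraphs as alternating title/answer lines and uses the pair index as the row id. The same logic can be tried without a MySQL server. The sketch below swaps in the standard-library `sqlite3` module (so placeholders are `?` rather than `%s`), takes plain strings instead of python-docx paragraphs, and is written for Python 3. The names `store_pairs` and `limit` and the column types of `timu` are assumptions made for illustration.

```python
import sqlite3


def store_pairs(conn, texts, limit=8):
    """Insert alternating title/answer lines into the timu table.

    Even positions are titles, odd positions are answers; both share
    the id position // 2. Empty lines are skipped but still counted.
    """
    cur = conn.cursor()
    for position, text in enumerate(texts[:limit]):
        if not text:
            continue
        row_id = position // 2
        if position % 2 == 0:
            cur.execute("INSERT INTO timu(id, title) VALUES(?, ?)", (row_id, text))
        else:
            cur.execute("UPDATE timu SET answer = ? WHERE id = ?", (text, row_id))
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Assumed schema: only the column names come from the SQL in the post.
    conn.execute("CREATE TABLE timu(id INTEGER PRIMARY KEY, title TEXT, answer TEXT)")
    # With python-docx these strings would come from
    # [p.text for p in docx.Document(path).paragraphs].
    store_pairs(conn, ["Question 1", "Answer 1", "Question 2", "Answer 2"])
    print(conn.execute("SELECT id, title, answer FROM timu ORDER BY id").fetchall())
    conn.close()
```

Keeping the pairing logic in a function that accepts plain strings makes it easy to test before pointing it at a real document and database.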
3. When I parsed my own .doc file, its format caused this error:
docx.opc.exceptions.PackageNotFoundError: Package not found
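It looked as if something was wrong with the format of my Word document, so it could not be recognized.
I opened the file in Word on Windows and saved it as a Word .docx document, which solved the problem. Apparently python-docx can only parse .docx files; the Word 2003 .doc format is not supported.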
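A file extension can be misleading, so it may help to look at the file content before handing it to python-docx. A .docx file is a ZIP package, while a legacy Word .doc file is an OLE2 compound file with a fixed 8-byte header. The sketch below is written for Python 3 and uses only the standard library; the function name `sniff_word_format` is hypothetical, and checking for `word/document.xml` assumes the usual location of the main document part.

```python
import zipfile

# Header of OLE2 compound files (legacy .doc, .xls, .ppt)
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def sniff_word_format(path):
    """Guess the container format of a Word file from its content."""
    with open(path, "rb") as handle:
        header = handle.read(len(OLE2_MAGIC))
    if header == OLE2_MAGIC:
        return "doc"
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as package:
            # Assumption: the main document part sits at its usual path.
            if "word/document.xml" in package.namelist():
                return "docx"
    return "unknown"
```

A result of "doc" means the file needs to be converted first, for example by opening it in Word and saving it as .docx as described above.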