idn

2009-11-26 00:19:09

 

Copy Paste HTML From MS Word: IE's DHTML Editing Control (in a .NET WinApp)

When you copy and paste from MS Word, the HTML it generates is really messy and can't be used verbatim. This has been a pain for me and for many others. I've found that many third-party controls, and some client-side blogging tools (like BlogJet), have a miraculous way of converting messy MS Word HTML into something that works well (it displays correctly, though it is still bloated). The question is: how?

Deducing a Solution
Try opening this little "Hello World" MS Word doc, select all (Ctrl-A), copy, and view the clipboard contents (C# source). Notice the "HTML Format" contents? Now paste into one of several retail editing tools, or your own blogging tool. Switch to HTML view and notice that the HTML has been nicely transformed! Each of these retail tools transforms it into practically the same result.

This led me to believe that these tools all use the same base control. Sure enough, IE 5.0 introduced an editing control (the DHTML Editing Control of this post's title) that does the work.
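To make the "HTML Format" clipboard entry mentioned above more concrete, here is a minimal sketch that pulls the pasted markup out of that entry. It assumes the producer wrapped the selection in the `<!--StartFragment-->` and `<!--EndFragment-->` comments that the HTML clipboard format describes; the class and method names (`HtmlFragment`, `Extract`) are made up for illustration and are not part of the original sample apps.

```csharp
using System;

// Hypothetical helper: extracts the copied markup from a raw
// "HTML Format" clipboard string.
public static class HtmlFragment
{
    private const string StartMarker = "<!--StartFragment-->";
    private const string EndMarker = "<!--EndFragment-->";

    // Returns the markup between the fragment comments,
    // or null if either marker is missing.
    public static string Extract(string htmlFormat)
    {
        if (htmlFormat == null) return null;

        int start = htmlFormat.IndexOf(StartMarker, StringComparison.OrdinalIgnoreCase);
        if (start < 0) return null;
        start += StartMarker.Length;

        int end = htmlFormat.IndexOf(EndMarker, start, StringComparison.OrdinalIgnoreCase);
        if (end < 0) return null;

        return htmlFormat.Substring(start, end - start).Trim();
    }
}
```

In a WinForms app running on an STA thread, the raw string could be obtained with `Clipboard.GetText(TextDataFormat.Html)` and passed to `HtmlFragment.Extract`. What comes back for a Word selection is the messy markup, before any editing control has cleaned it up.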

Use it Yourself
So, how can you use this in your own app? It is actually quite easy. You can drop a .NET 2.0 WebBrowser control into your app and put it into Design Mode. Here is an app that demonstrates this: IE DHTML Editing Control Example (C# source). Repeat the experiment above with the sample app and you'll see it's the same control.

The underlying browser and document objects are extensive COM objects, and only a subset of their members is wrapped by the .NET 2.0 control, so you need to get to the underlying ActiveX control.

  1. Put a WebBrowser control on a form and call it "web"
  2. Add a project reference to the COM library "Microsoft HTML Object Library"
  3. Use the code below to initialize into design mode.

// Load the MSHTML component
web.Navigate("about:blank");

// Release control to the system
Application.DoEvents();

// Turn ON Design Mode
((mshtml.HTMLDocument)web.Document.DomDocument).designMode = "On";
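As a variation on the snippet above, the sketch below waits for the `DocumentCompleted` event instead of calling `Application.DoEvents()`, then pastes the clipboard into the editable document and reads back the converted markup. It is a sketch under assumptions, not the sample app's code: `PasteCleanerForm` and `PasteAndGetHtml` are hypothetical names, the project is assumed to reference the "Microsoft HTML Object Library" as in step 2, and the paste is assumed to run on the UI (STA) thread after design mode has taken effect.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical form hosting a WebBrowser in design mode.
public class PasteCleanerForm : Form
{
    private readonly WebBrowser web = new WebBrowser();

    public PasteCleanerForm()
    {
        web.Dock = DockStyle.Fill;
        Controls.Add(web);

        // Wait until about:blank has loaded before touching the document.
        web.DocumentCompleted += OnDocumentCompleted;
        web.Navigate("about:blank");
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // Only the first load matters; unsubscribe to avoid re-entering.
        web.DocumentCompleted -= OnDocumentCompleted;

        // Same cast as in the original snippet.
        ((mshtml.HTMLDocument)web.Document.DomDocument).designMode = "On";
    }

    // Pastes the clipboard into the editor and returns the resulting HTML,
    // or null if the document body is not available yet.
    public string PasteAndGetHtml()
    {
        if (web.Document == null) return null;

        web.Document.ExecCommand("Paste", false, null);

        HtmlElement body = web.Document.Body;
        return body == null ? null : body.InnerHtml;
    }
}
```

Calling `PasteAndGetHtml()` from a button handler, after copying from Word, should return markup comparable to what the HTML view of the sample app shows. The null checks are there because the document may not be ready immediately after design mode is switched on.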

Fixing Word's HTML
This technique converts almost any HTML block on the clipboard (from IE, Word, Excel, PowerPoint, etc.). It does not save embedded images. IE apparently takes the style sheets that may be defined in a

 
