Using wget and html2text - kangle000


2010-04-01 09:32:39

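I have been learning shell scripting lately, using html2text to convert web pages downloaded with wget into plain text. I found that the home pages of major websites are not well suited to html2text conversion: some of their content, such as dynamically loaded news items, is not downloaded by wget at all. Also, when a URL passed to wget contains special characters such as &, they must be escaped with '\'.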
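The escaping rule above can be sketched as follows. This is only an illustration: the URL is made up, and `fetch_page` is a hypothetical helper name. Besides the backslash form, wrapping the whole URL in single quotes has the same effect, because the shell then no longer treats `&` as "run in the background".

```shell
#!/bin/sh
# The URL below is a made-up example, not one from the original post.

# Form 1: escape each '&' with a backslash, as described above.
url_escaped=http://example.com/news.php?id=1\&page=2

# Form 2: wrap the whole URL in single quotes so nothing needs escaping.
url_quoted='http://example.com/news.php?id=1&page=2'

# Both forms give the shell the same string.
printf '%s\n' "$url_escaped" "$url_quoted"

# fetch_page URL FILE: download URL into FILE with wget (hypothetical helper).
# -q suppresses progress output, -O names the output file.
fetch_page() {
    wget -q -O "$2" "$1"
}

# Example call (needs network access, so it is left commented out):
# fetch_page "$url_quoted" page.html
```

Without either form, the shell would cut the command at the first `&` and wget would receive only the part of the URL before it.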

The usage information for html2text is shown below.

Code:

This is html2text, version 1.3.2a

Usage:
  html2text -help
  html2text -version
  html2text [ -unparse | -check ] [ -debug-scanner ] [ -debug-parser ] \
     [ -rcfile <file> ] [ -style ( compact | pretty ) ] [ -width <w> ] \
     [ -o <file> ] [ -nobs ] [ -ascii ] [ <input-url> ] ...

Formats HTML document(s) read from <input-url> or STDIN and generates ASCII
text.

  -help          Print this text and exit
  -version       Print program version and copyright notice
  -unparse       Generate HTML instead of ASCII output
  -check         Do syntax checking only
  -debug-scanner Report parsed tokens on STDERR (debugging)
  -debug-parser  Report parser activity on STDERR (debugging)
  -rcfile <file> Read <file> instead of "$HOME/.html2textrc"
  -style compact Create a "compact" output format (default)
  -style pretty  Insert some vertical space for nicer output
  -width <w>     Optimize for screen widths other than 79
  -o <file>      Redirect output into <file>
  -nobs          Do not use backspaces for boldface and underlining
                 (This option is worth using. Without it, the converted
                 file contains a lot of useless characters.)
  -ascii         Use plain ASCII for output instead of ISO-8859-1
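
As a rough sketch of how these options fit together, the script below wraps a conversion call that uses only flags from the help text above. The names `to_text` and `strip_overstrike` are hypothetical, and the width of 72 is an arbitrary choice. The second function is an assumption about cleanup rather than something from the original post: it shows what the "useless characters" are (a character, a backspace, then the character again) and removes them from text that was already converted without -nobs.

```shell
#!/bin/sh
# to_text INPUT OUTPUT: convert an HTML file to plain text (hypothetical helper).
# -nobs avoids backspace overstriking, -ascii forces plain ASCII output,
# -width sets the target line width, -o names the output file.
to_text() {
    html2text -nobs -ascii -width 72 -o "$2" "$1"
}

# strip_overstrike: filter stdin, deleting every "character + backspace" pair.
# What remains is the last character written to each position.
strip_overstrike() {
    bs=$(printf '\b')
    sed "s/.$bs//g"
}

# Bold "Hi" in overstrike form: each letter, a backspace, the letter again.
sample=$(printf 'H\bHi\bi' | strip_overstrike)
printf '%s\n' "$sample"    # prints: Hi

# Example call (requires html2text to be installed):
# to_text page.html page.txt
```

Passing -nobs at conversion time is the simpler route. The filter is only useful for files that have already been converted.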
