Category: System Operations and Maintenance

2008-03-26 21:47:26


Google Desktop for Linux With Apache2 On LAN

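Preface:
  Two years ago I first tried combining Google Desktop with Apache to provide file search on a LAN. The original write-up, "My first original post: building an enterprise search server with Google Desktop Search", is here: http://blog.chinaunix.net/u/13472/showart.php?id=73880
  At that time Google Desktop for Linux had not yet been released, so my Samba file server had no integrated search service, and I waited for it eagerly. When the Linux version finally appeared, it turned out not to search Microsoft's proprietary document formats, which was a disappointment for quite a while. At long last Google Desktop for Linux v1.1.1.0075 arrived with indexing support for DOC, XLS and PPT, so I set about installing it on my Samba server, so that the machine would offer a simple search service alongside Samba.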
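Main text:
  The principle is the same as in the earlier article: Apache acts as a proxy for Google Desktop. The earlier article said that a port mapper was needed. Later searching showed that the real cause was a missing reverse-proxy setting: adding a ProxyPassReverse directive after ProxyPass avoids the problem. So Google Desktop combined with Apache no longer needs any configuration on the client. A browser is enough, and a file browser can play that role.
  If this Apache instance has no other purpose, then, as in the earlier article, assign the server a second IP address dedicated to the Google Desktop proxy. A simple configuration file follows: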


Code:
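
The listing from the original post is not available here. Purely as an illustration, a virtual host of this kind could look like the sketch below. The address 192.168.0.2 (standing in for the second IP) and port 12345 (standing in for the local port Google Desktop listens on) are assumed placeholders, not values from the post, and mod_proxy and mod_proxy_http are assumed to be loaded.

```apache
# Sketch only: 192.168.0.2 (second IP) and 12345 (Google Desktop's
# local port) are assumed placeholder values, not taken from the post.
# Apache must already be listening on this address (Listen directive),
# and some distributions additionally require a <Proxy> access block.
<VirtualHost 192.168.0.2:80>
    # Act as a reverse proxy only, never as an open forward proxy.
    ProxyRequests Off

    # Forward every request to the local Google Desktop web interface.
    ProxyPass        / http://127.0.0.1:12345/
    # Rewrite redirect headers from the backend so they point at the proxy.
    ProxyPassReverse / http://127.0.0.1:12345/
</VirtualHost>
```

ProxyPassReverse is the directive the text credits with removing the need for a port mapper: it adjusts the Location-style headers in backend responses so that redirects point at the proxy rather than at 127.0.0.1.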



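  Before restarting Apache, you also need to change the user Apache runs as to the user that runs Google Desktop. Google Desktop's index files are readable only by the single Linux user who owns them and by nobody else, so an Apache started under a different user cannot read Google Desktop's data and therefore cannot proxy it.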
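
In a stock httpd.conf the run-as account is set with the User and Group directives. A minimal sketch, assuming a hypothetical account named gduser that runs Google Desktop (Debian-style layouts set the same values through APACHE_RUN_USER and APACHE_RUN_GROUP in the envvars file instead):

```apache
# "gduser" is a hypothetical account name: replace it with the Linux
# user that actually runs Google Desktop on the server. The Group line
# assumes a group of the same name exists.
User gduser
Group gduser
```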
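  Once all of this is done and Apache has been restarted, the on-LAN Google Desktop can be reached through the proxy address (the part left out after it is Google Desktop's start address, which is different for every Linux desktop user).
  Next, I tried to integrate this on-LAN Google Desktop into the file server. Remembering that long address suffix is hard, so it makes sense to store the home page as a file on the file server. After reaching the file through the file server, clicking it opens the search proxy server.
  Note that a home page that has simply been saved to a file cannot run searches when it is opened from the file server, because its addresses are relative. So I made the following change: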


Code:



Every "" string in the listing above is something I added. A home page file opened this way can trigger the search proxy server.

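  My basic requirement was met, but my expectations were not. The reason I run Apache on this server in the first place is to provide web access to the Samba file server across network segments, so the home page file above can also be reached through that original Apache, yet it cannot provide search there (I have a limited number of IP addresses and cannot map every internal address to the outside). So the next step is to adapt the setup above to make it suitable for use over the Internet.
  Obviously it can no longer be proxied at the root, because the root has to serve as the file server's home page. So I proxied it under /googlesearch, and the proxy section became: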


Code:
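
The original listing is missing here as well, so the following is only a sketch of its general shape. The port 12345 is an assumed placeholder for Google Desktop's local port; /googlesearch is the prefix named in the text.

```apache
# Sketch only: 12345 is an assumed placeholder for Google Desktop's
# local port. Trailing slashes are kept consistent on both sides so the
# path after the prefix maps cleanly onto the backend.
ProxyRequests Off
ProxyPass        /googlesearch/ http://127.0.0.1:12345/
ProxyPassReverse /googlesearch/ http://127.0.0.1:12345/
```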



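  A proxy set up this way can open the home page but cannot run a search at all, because the URLs Google Desktop uses when it starts a search all begin at the root, /search. URL rewriting is therefore needed, as follows: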


Code:
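
One way such a rewrite could be written with mod_rewrite is sketched below. This is not the author's rule, which is missing from this copy. The port 12345 is an assumed placeholder, and the [P] flag relies on mod_proxy being loaded.

```apache
# Sketch only: 12345 is an assumed placeholder for Google Desktop's
# local port.
RewriteEngine On
# Requests that Google Desktop's pages send to the root-level /search
# path are handed to the backend through mod_proxy ([P] flag).
RewriteRule ^/search(.*)$ http://127.0.0.1:12345/search$1 [P]
```

In server or virtual-host context the pattern is matched against the full URL path, and the original query string is carried over to the proxied request unchanged.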



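  At last, Google Desktop was integrated into Apache. The final step is to modify the home page file and save it as index.html in the /Fileserver/文件搜索/ directory, so that Apache opens the home page file directly when that directory is requested.
  The change to the home page file is simple: replace every string added above with "/googlesearch".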

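Endnote:
  The remaining problem is opening the files that a search returns. Everything above simply blocks that. Achieving an effect like DNKA's requires output rewriting, so I am simply pasting the mod_sar documentation here.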

NAME
mod_sar - apache2 module which works as an output filter. Its
      purpose is to Search And Replace strings found in web
      content before it is sent to the client.


COMPILE
mod_sar can be compiled with apxs(8) or manually by hand.

1. Using apxs for compilation:
apxs -c mod_sar.c

If everything goes fine, you will find mod_sar.so under .libs in your
current directory.

2. Compiling mod_sar manually:
gcc -pthread -I/usr/include/httpd -c mod_sar.c
gcc -shared mod_sar.o -Wl,-soname -Wl,mod_sar.so -o mod_sar.so

If needed, modify path to your httpd include directory and if everything
goes fine, you will find mod_sar.so in your current directory.


INSTALL
mod_sar can be installed with apxs(8) or manually by hand.

1. Using apxs for installation:
This command will compile and install your mod_sar module.
apxs -i -a -c mod_sar.c

Restart apache by first stopping it and then starting it:
apachectl stop
apachectl start

2. Installing mod_sar manually:
cp mod_sar.so /usr/lib/httpd/modules
chown root: /usr/lib/httpd/modules/mod_sar.so
chmod 755 /usr/lib/httpd/modules/mod_sar.so

If needed, modify path to your httpd modules directory.
Now, you have to modify your httpd.conf file. Find the bunch of
LoadModule directives and append your own line under them:
LoadModule sar_module modules/mod_sar.so

Restart apache by first stopping it and then starting it:
apachectl stop
apachectl start


DESCRIPTION
mod_sar ("sar" stands for Search And Replace) is an apache2 module which
works as an output filter. Its purpose is to search and replace strings
found in web content before it is sent to the client.
Search performed can be case sensitive or case insensitive, depending
on configuration.
A perfect example of common usage of this module is a reverse proxy.

A reverse proxy is a proxy in front of the local server, which can be
accessed from the Internet only through that proxy. In some cases such a
configuration can be used effectively to prevent worms and other
unwanted guests but most commonly it just presents a false layer of
security for those who do not understand server - client communication.

Whatever reason you have, for a usable reverse proxy you will have to
solve two problems: modification of headers and modification of
content before it is sent to the client.

1. Header modification
Header modification is not a problem at all. It can be achieved in two
ways.
You can use mod_proxy_http:
 
   
        Order deny,allow
        Allow from all
   

    ProxyRequests On
    ProxyPass /
    ProxyPassReverse /
    ProxyErrorOverride On
 

Or, you can use mod_rewrite:
 
    RewriteEngine on
    RewriteRule ^/(.*) [P]
    RewriteOptions inherit
 


2. Content modification
Header modification will make all relative links look like they are
coming from the external domain some-domain.com instead of the real, local
domain some-domain.local. But if the server behind the reverse proxy
serves pages with absolute links, we will have to modify the content of
those pages on the fly, using the apache2 output filter mechanism.

There are three choices: mod_proxy_html, mod_ext_filter and mod_sar.
The first uses libxml2 and because of that, it is not good for a
purpose such as reverse proxy. For example, libxml2 will seriously
corrupt HTML code in case of minor errors in HTML such as a missing
quote. mod_proxy_html inherits that nasty habit from libxml2 but
if you want to try it on your own, you can find that module at

The second one is not a third party module, it comes with apache2
and it can suit the needs of a reverse proxy but it is not good for heavily
loaded sites because an external command is executed for every request.
Here is an example of mod_ext_filter usage:
 
    ExtFilterDefine fixtext mode=output intype=text/html \
        cmd="/bin/sed s/some-domain\.local/some-domain\.com/g"
   
        SetOutputFilter fixtext
   

 

And the third one is the one you are just looking at: mod_sar.
See the DIRECTIVES and EXAMPLES sections for usage information.
mod_sar will do one simple thing. It will replace one string
with another, depending on configuration. It can perform case
insensitive search if needed. It has been tested under heavy load
without performance impact.


DIRECTIVES
SarStrings
    This directive requires two parameters, search string and
    replace string enclosed with double quotes.
    It can be used in server config and virtual host context.

SarCaseInsensitive
    If set to On, case insensitive search will be performed instead
    of exact string match.
    Default is Off.
    It can be used in server config and virtual host context.

SarVerbose
    If set to On, every time mod_sar is used as filter, message is
    printed into apache error logs.
    Default is Off.
    It can be used in server config and virtual host context.


EXAMPLES
   
      AddOutputFilterByType sar_filter text/html
      SarStrings "" "http://some-domain.com"
      SarCaseInsensitive Off
      SarVerbose Off
   



REQUIREMENTS
Apache-2.0.


COMPATIBILITY
It has been tested on Linux but there is no obvious reason why it
wouldn't work on other unix platforms supported by apache2.
        OS: Linux
    compiler: gcc-2.9x, gcc-3.x
      apache: apache-2.0.x


BUGS
Current version of mod_sar does not contain known bugs.


SEE ALSO
apxs(8),


AUTHOR
Josip Deanovic <>

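  Because the output URL rules of the new Google Desktop version are fairly complex, rewriting them is difficult. On top of that, Linux file system permissions mean that Apache is not allowed into many directories, so I did not bother to go any further. After all, the search output already shows the full location of each file, and finding it through the file server is easy enough.
  Finally, I hope that an expert who reads this article can help write the mod_sar output rules and help me complete this Google Desktop For Linux with Apache2 On LAN setup. Thank you.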

