Perversely Fun Code for Text Generation - March.Liu - ChinaUnix Blog

Perversely Fun Code for Text Generation

2008-03-14 22:34:11

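This all started with an email that my friend wayhome sent to the Python Chinese community mailing list:

Today I came across the following Ruby program in a book and wanted to implement it in Python. It takes one string and generates a series of fragment strings from it.

--------- my insertion line --------------

CU's code highlighting doesn't even include Ruby!

--------- my drifting-away line -------------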

#!/usr/bin/ruby

sentence = "refractory anemia with excess blasts in"
sentence_array = sentence.split
length = sentence_array.size
length.times do
  (1..sentence_array.size).each do |place_length|
    puts sentence_array.slice(0, place_length).join(" ")
  end
  sentence_array.shift
end
exit

As shown above, the input string is "refractory anemia with excess blasts in",
and the printed result is:
refractory
refractory anemia
refractory anemia with
refractory anemia with excess
refractory anemia with excess blasts
refractory anemia with excess blasts in
anemia
anemia with
anemia with excess
anemia with excess blasts
anemia with excess blasts in
with
with excess
with excess blasts
with excess blasts in
excess
excess blasts
excess blasts in
blasts
blasts in
in

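I tried rewriting it in Python and found that my version came out much longer, and I even had to write a separate function. Maybe my Python just isn't good enough yet. Can anyone implement this in Python as simply as the Ruby version?

------------------ dividing line: answering the challenge ----------------------

Samuel Chi promptly answered the call and wrote the following code: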

sentence = "refractory anemia with excess blasts in"
array = sentence.split()
length = len(array)
for i in range(length):
    for j in range(length - i):
        print ' '.join(array[i:i+j+1])

Simply brilliant... wayhome also posted his own improved implementation:

sentence = "refractory anemia with excess blasts in"
sentence_array = sentence.split()
while len(sentence_array):
    for i in range(len(sentence_array)):
        print ' '.join(sentence_array[0:i+1])
    del sentence_array[0]

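Lee quickly wrote a rather shocking piece of code. If you read the email, you will see that he wrote it in the interactive shell; the prompts are still visible: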

foo = lambda m, x:[m and x[i:] or x[:i+1] for i in range(len(x))]
s = "refractory anemia with excess blasts in"
print "\n".join(map(lambda k:"\n".join(map(lambda i:" ".join(i), k)), map(lambda x:foo(0, x), foo(1, s.split())[::-1])[::-1]))

Probably afraid of scaring the general public, Lee then wrote a slightly less shocking, pseudo-reformed version:

s = "refractory anemia with excess blasts in"
foo = lambda m, x:[m and x[i:] or x[:i+1] for i in range(len(x))]
print '\n'.join(["\n".join([" ".join(i) for i in k]) for k in [foo(0, x) for x in foo(1, s.split())[::-1]]][::-1])
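
For anyone who finds the foo lambda hard to parse: with a truthy m it returns every suffix of the list, and with a falsy m it returns every prefix. Below is a rough unrolling of the same idea, written in Python 3 syntax. The names suffixes, prefixes and fragments are made up for this illustration and do not come from Lee's email.

```python
def suffixes(items):
    """Every non-empty tail of items, longest first (what foo(1, x) returns)."""
    return [items[i:] for i in range(len(items))]


def prefixes(items):
    """Every non-empty head of items, shortest first (what foo(0, x) returns)."""
    return [items[:i + 1] for i in range(len(items))]


def fragments(sentence):
    """All runs of consecutive words: each suffix, expanded into its prefixes."""
    return [" ".join(head)
            for tail in suffixes(sentence.split())
            for head in prefixes(tail)]


print("\n".join(fragments("refractory anemia with excess blasts in")))
```

Because the suffixes already come out longest first, this sketch does not need the two [::-1] reversals, which cancel each other out in the original.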

Then George Lu posted a version with thunder rolling all over it:

# s is the sentence string defined in the snippets above
print '\n'.join(['\n'.join([' '.join(s.split()[i:][:j+1]) for j,x in enumerate(s.split()[i:])]) for i,t in enumerate(s.split())])

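Three to five more versions followed, and none of them broke away from the idea of assembling the strings by index plus slicing. I thought that if regular expressions could be used, it might come out even simpler. But for the past two days I have been busy upgrading the hard drive in my company-issued work machine, everything was a mess, and I got nothing done. In fact, this is just a simple combination in the mathematical sense. Without regular expressions, and without even relying on indices, we can still write it very clearly. Of course, it is not as short (why should a man chase after being short? XD):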

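# The text goes here
sentence = "refractory anemia with excess blasts in"
# First, split it into a list of words
words = sentence.split()
# What we really need is each element together with the part after it,
# so we pop the words off one by one; remember to pop from the front :)
while words:
    # Start from an empty list
    left = []
    # Take the word popped from words as the starting point
    left.append(words.pop(0))
    # Print our starting point
    print left[0]
    # Make a copy of whatever remains
    right = words[:]
    # Push the remainder onto the left side one word at a time
    while right:
        left.append(right.pop(0))
        print ' '.join(left)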

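Of course, this code is only meant to show the process as clearly as possible. Its distinguishing feature is that it never uses the length of the list at all; everything is done by iteration. If you prefer a more shocking effect, feel free to rewrite it in a shorter form.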
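
Taking the "simple combination" remark literally, every fragment is fixed by choosing two of the cut points around the words: a start and an end. Here is one possible shorter form as a sketch in Python 3 syntax, using itertools.combinations from the standard library. The function name fragments and the variable cuts are made up for this illustration, and unlike the loop above it does go back to len() and slicing.

```python
from itertools import combinations


def fragments(sentence):
    words = sentence.split()
    # Cut points 0..len(words) sit before, between and after the words.
    # Picking two of them (start < end) selects one run of consecutive words.
    cuts = range(len(words) + 1)
    return [" ".join(words[start:end]) for start, end in combinations(cuts, 2)]


for line in fragments("refractory anemia with excess blasts in"):
    print(line)
```

combinations yields its pairs in lexicographic order, so the fragments come out grouped by starting word, in the same order as the Ruby program's output. Six words give seven cut points, hence 7 * 6 / 2 = 21 fragments.
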
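Today, while reading Mastering Regular Expressions, I picked up a few things. First, Perl regular expressions can embed code with (?{...}). Second, (?!) tells (tricks?) the regex engine that the current match has failed, so it backtracks and tries the next possibility. Finally, the /x modifier turns on free-spacing mode, in which whitespace and comments inside the pattern are ignored. So in Perl this program needs only a single line: a regular expression with the printing code mixed in... Perl truly deserves to be called the king of text processing!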

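my $sentence = "refractory anemia with excess blasts in";
# \b makes each attempt start at a word boundary, so no fragment begins mid-word;
# (?{...}) prints the text matched so far, then (?!) forces a failure so the engine keeps trying
$sentence =~ m/\b(?:\w+(?:\s|$))+?(?{print "$&\n";})(?!)/x;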
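
Python's standard re module has no counterpart to the (?{...}) code block, so the Perl trick does not carry over directly. As a rough approximation in Python 3 syntax, one can let the regex locate where words start and end, and then pair the offsets up. This is only a sketch under the assumption that a word is a run of \w characters; the name fragments is made up for this illustration.

```python
import re


def fragments(sentence):
    # Offsets where a word begins, and offsets just past where a word ends.
    starts = [m.start() for m in re.finditer(r"\b\w", sentence)]
    ends = [m.end() for m in re.finditer(r"\w\b", sentence)]
    # Pair each word start with every word end that lies after it.
    return [sentence[a:b] for a in starts for b in ends if b > a]


for line in fragments("refractory anemia with excess blasts in"):
    print(line)
```

Since the fragments are cut straight out of the original string, whatever spacing sits between the words is kept as it is.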