Scraping Baidu MP3 download URLs ^_^ - OReadonly - ChinaUnix Blog

Scraping Baidu MP3 download URLs ^_^

Category: Java

2013-03-01 10:54:46

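The other day a friend sent me a txt file listing the songs he likes and asked me to write a program that finds download URLs for them. I thought of HtmlUnit, and after a few minutes of daydreaming I wrote this hello world: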


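import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.commons.io.FileUtils;

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.DomNodeList;
import com.gargoylesoftware.htmlunit.html.HtmlElement;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class BaiduMp3UrlFetcher {

    // Markers that surround the encoded download URL in the download page.
    private static final String START_MARKER = "/j?j=2&url=";
    private static final String END_MARKER = "return startDownLoad()";

    public static void main(String[] args) throws IOException {
        // writeStringToFile creates missing parent directories, so no mkdirs() is needed.
        File writeFile = new File("/Users/jferson/Desktop/rs.txt");
        // Each input line has the form "song_singer".
        List<String> songInfos = FileUtils.readLines(new File(
                "/Users/jferson/Desktop/500.txt"));
        // The search URL prefix is blank in the post as published; set it before running.
        String mp3UrlPrefix = "";
        String mp3UrlSuffix = "&lm=-1&f=ms&tn=baidump3&ct=134217728&lf=&rn=";
        if (songInfos == null || songInfos.isEmpty()) {
            return;
        }
        StringBuilder sb = new StringBuilder();
        // One client is reused for every song instead of creating one per line.
        WebClient webClient = new WebClient();
        // HtmlUnit releases after 2.11 expose these two switches on webClient.getOptions().
        webClient.setThrowExceptionOnScriptError(false);
        webClient.setThrowExceptionOnFailingStatusCode(false);
        try {
            for (String songInfo : songInfos) {
                String[] splits = songInfo.split("_");
                if (splits.length != 2) {
                    continue;
                }
                String song = splits[0];
                String singer = splits[1].trim();
                String mp3UrlPath = mp3UrlPrefix + song.replace(" ", "+")
                        + mp3UrlSuffix;
                HtmlPage resultPage = webClient.getPage(mp3UrlPath);
                if (resultPage == null
                        || resultPage.getElementById("songResults") == null) {
                    continue;
                }
                DomNodeList<HtmlElement> trElements = resultPage
                        .getElementById("songResults")
                        .getElementsByTagName("tr");
                int i = 1; // number of the next download URL to record
                sb.append("Song: ").append(song).append("\r\n");
                for (HtmlElement trElement : trElements) {
                    if (i > 2) {
                        break; // keep at most two URLs per song
                    }
                    DomNodeList<HtmlElement> tdElements = trElement
                            .getElementsByTagName("td");
                    // Cell 2 holds the singer and cell 6 the download link,
                    // so skip rows with fewer than seven cells.
                    if (tdElements == null || tdElements.size() < 7) {
                        continue;
                    }
                    HtmlElement thirdElement = tdElements.get(2);
                    HtmlElement downElement = tdElements.get(6);
                    boolean bingo = thirdElement.getAttribute("class")
                            .equalsIgnoreCase("third")
                            && thirdElement.asText().trim()
                                    .equalsIgnoreCase(singer);
                    if (!bingo || !downElement.getAttribute("class")
                            .equalsIgnoreCase("down")) {
                        continue;
                    }
                    DomNodeList<HtmlElement> links = downElement
                            .getElementsByTagName("a");
                    if (links.isEmpty()) {
                        continue;
                    }
                    HtmlPage p = links.get(0).click();
                    String pageHtml = p.getWebResponse().getContentAsString();
                    int start = pageHtml.lastIndexOf(START_MARKER);
                    int end = pageHtml.indexOf(END_MARKER);
                    if (start == -1 || end == -1) {
                        continue;
                    }
                    start += START_MARKER.length();
                    end -= 11; // drop the 11 characters that precede the end marker
                    if (end <= start) {
                        continue;
                    }
                    // Undo the percent-encoding of the characters that matter here.
                    String downloadPath = pageHtml.substring(start, end)
                            .replace("%2F", "/")
                            .replace("%3A", ":")
                            .replace("%3F", "?")
                            .replace("%3D", "=");
                    sb.append("Download URL " + i + ": ")
                            .append(downloadPath).append("\r\n");
                    i++;
                }
                if (i == 1) {
                    sb.append("No matching download URL found!\r\n");
                }
            }
        } finally {
            // Releases the client's windows and threads (HtmlUnit 2.x API).
            webClient.closeAllWindows();
        }
        FileUtils.writeStringToFile(writeFile, sb.toString());
    }
}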
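
The URL-extraction step in the listing above can be tried on its own, without HtmlUnit or network access. The following is only a sketch: the class name `DownloadUrlExtractor`, the method `extract`, and the sample HTML string are hypothetical and made up for illustration; the two markers and the 11-character offset are taken from the listing. It also swaps the chained replacements for the JDK's `URLDecoder`, which decodes every percent-escape (and, unlike the original, turns `+` into a space), so it is a variation rather than the exact behavior of the code above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class DownloadUrlExtractor {
    // Markers copied from the listing above.
    private static final String START_MARKER = "/j?j=2&url=";
    private static final String END_MARKER = "return startDownLoad()";
    // The listing drops 11 characters before the end marker; that this
    // matches the real page markup is an assumption.
    private static final int TRAILING_CHARS = 11;

    /** Returns the decoded URL, or null when the markers are missing or out of order. */
    static String extract(String pageHtml) {
        int start = pageHtml.lastIndexOf(START_MARKER);
        int end = pageHtml.indexOf(END_MARKER);
        if (start < 0 || end < 0) {
            return null;
        }
        start += START_MARKER.length();
        end -= TRAILING_CHARS;
        if (end <= start) {
            return null;
        }
        try {
            return URLDecoder.decode(pageHtml.substring(start, end), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be supported by every JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical markup, shaped so that 11 characters separate the URL
        // from the end marker.
        String sample = "<a href=\"/j?j=2&url=http%3A%2F%2Fexample.com%2Fa.mp3%3Fk%3Dv\""
                + " onclick=\"return startDownLoad()\">";
        System.out.println(extract(sample));
    }
}
```

Returning `null` instead of throwing lets the caller skip a row when the download page does not have the expected shape, which is how the listing treats every other missing element.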
