2011-11-20 20:11:40

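  Someone has already written a subtitle downloader, but it is only available as an exe, so I wrote a Python script on Linux that does the same thing.
  Originally I only wanted to download the subtitles for the machine learning course, but it turned out that a small change to the download URL is enough to fetch the Database course subtitles as well.
Python code: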
#!/usr/bin/env python
# encoding: utf-8

# Author: Liu Dan
# Downloads the Stanford 2011 Machine Learning subtitles.
# Prerequisite: the videos have already been downloaded into one directory,
#               and that directory is passed as this program's first
#               parameter. The 'wget' downloader is also required.

import os
import sys

def download_sub(path, files):
    for video in files:
        filename = os.path.splitext(video)[0] + '-subtitles.xml'
        #database for ''
        link = '' \
               + filename
        cmd = "wget -c " + link + " -P " + path
        os.system(cmd)

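def time_add(begin, duration):
    # Add a duration to a start time; both look like 'HH:MM:SS,ff'.
    # The carry at 100 assumes a two-digit fraction of a second.
    begin = begin.replace(',', ':').split(':')
    duration = duration.replace(',', ':').split(':')
    subtime = int(begin[3]) + int(duration[3])
    carry = 0
    if subtime >= 100:
        subtime -= 100
        carry = 1
    # Zero-pad so the result keeps the fixed-width timestamp layout.
    begin[3] = '%02d' % subtime

    # Walk back through seconds, minutes and hours, propagating the carry.
    for j in (2, 1, 0):
        subtime = int(begin[j]) + int(duration[j]) + carry
        carry = 0
        # Only seconds and minutes wrap at 60; hours just accumulate.
        if j > 0 and subtime >= 60:
            subtime -= 60
            carry = 1
        begin[j] = '%02d' % subtime
    # Rejoin as 'HH:MM:SS,ff'.
    return ':'.join(begin[:3]) + ',' + begin[3]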

def convert_xml(path, files):
    for xml in files:
        filename = os.path.splitext(xml)[0] + '-subtitles.xml'
        lines = []
        num = 1
        fdr = open(os.path.join(path, filename))
        for line in fdr:
            if line[:2] == '<p':
                betime = line.replace('.', ',').split('"')
                endtime = time_add(betime[1], betime[3])
                lines.append( str(num) + '\n')
                subtime = betime[1] + ' --> ' + endtime
                lines.append(subtime + '\n')
            elif line[-4:-1] == '/p>':
                lines.append(line.split('<')[0] + '\n\n')
                num += 1
            else:
                pass
        fdr.close()
        subtitlename = os.path.splitext(xml)[0] + '.srt'
        fdw = open(os.path.join(path, subtitlename), 'w')
        fdw.writelines(lines)
        fdw.close()


if __name__ == '__main__':
    if len(sys.argv) < 2:
        print("Please give the 'Path' argument!")
        sys.exit(1)
    # Skip sys.argv[0], which is the script itself.
    for path in sys.argv[1:]:
        if os.path.isdir(path):
            # Ignore subtitle files already downloaded on a previous run.
            files = [f for f in os.listdir(path)
                     if '-subtitles.xml' not in f]
            print(files)
            download_sub(path, files)
            convert_xml(path, files)
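
The carry handling in time_add can also be written by turning each timestamp into a single integer and formatting it back at the end. This is only a rough sketch, not part of the original script: the names to_ms, from_ms and add_duration are made up for illustration, and it assumes the fraction after the comma (or dot) is a decimal fraction of a second, which may not match the real subtitle files.

```python
def to_ms(stamp):
    """Convert 'HH:MM:SS,fff' (comma or dot before the fraction) to milliseconds."""
    clock, frac = stamp.replace('.', ',').split(',')
    hours, minutes, seconds = (int(part) for part in clock.split(':'))
    # Assumption: the fraction is decimal, so '5', '50' and '500'
    # all mean half a second. Pad or cut it to exactly three digits.
    millis = int(frac.ljust(3, '0')[:3])
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def from_ms(total):
    """Format a millisecond count as an SRT timestamp 'HH:MM:SS,mmm'."""
    seconds, millis = divmod(total, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return '%02d:%02d:%02d,%03d' % (hours, minutes, seconds, millis)


def add_duration(begin, duration):
    """Return the end time of a subtitle given its start and duration."""
    return from_ms(to_ms(begin) + to_ms(duration))
```

For example, add_duration('00:00:58,700', '00:00:02,500') returns '00:01:01,200'. Working in one unit means the carries between fields come out of divmod, and the zero padding is handled in one place.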
