2011-11-20 20:11:40

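  Someone has already written a subtitle downloader, but only as an .exe, so I wrote a Python script that does the same job on Linux.
  At first I only wanted the Machine Learning subtitles. Then I found that a small change to the download address also fetches the Database course subtitles.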
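The subtitle file name comes from the video name, and only the address prefix depends on the course. Switching courses therefore means swapping one string. The following is a minimal sketch of that idea. The course keys and the `example.com` prefixes are hypothetical placeholders, because the real addresses are not preserved in this copy of the post.

```python
import os

# Hypothetical placeholders: the real subtitle URL prefixes were not preserved.
COURSE_PREFIXES = {
    'ml': 'https://example.com/ml/subtitles/',
    'db': 'https://example.com/db/subtitles/',
}


def subtitle_urls(course, video_files):
    """Map each video file name to its subtitle URL for the given course."""
    prefix = COURSE_PREFIXES[course]
    return {
        video: prefix + os.path.splitext(video)[0] + '-subtitles.xml'
        for video in video_files
    }


if __name__ == '__main__':
    for video, url in subtitle_urls('db', ['01-intro.mp4']).items():
        print(video, '->', url)
```

Keeping the prefixes in one table means adding a course touches a single entry instead of the download code.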
Python code:
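#!/usr/bin/env python3
# encoding: utf-8

# Author: Liu Dan
# Downloads the Stanford 2011 Machine Learning subtitles and converts
# them to .srt files.
# Prerequisites: the videos have already been downloaded into one directory,
#                and that directory is passed as the program's argument.
#                The 'wget' downloader must also be installed.

import os
import subprocess
import sys

def download_sub(path, files):
    """Fetch '<video name>-subtitles.xml' for every video into path."""
    for video in files:
        filename = os.path.splitext(video)[0] + '-subtitles.xml'
        # The subtitle URL prefix was lost from this copy of the post; put the
        # course's prefix inside the quotes (the Database course uses its own).
        link = '' + filename
        # An argument list keeps directory names containing spaces intact.
        subprocess.call(['wget', '-c', link, '-P', path])

def to_ms(stamp):
    """Convert 'HH:MM:SS.fff' (',' also accepted) to integer milliseconds."""
    hours, minutes, seconds = stamp.replace(',', '.').split(':')
    return int(round((int(hours) * 3600 + int(minutes) * 60
                      + float(seconds)) * 1000))

def format_srt_time(ms):
    """Format integer milliseconds as the SRT timestamp 'HH:MM:SS,mmm'."""
    seconds, ms = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return '%02d:%02d:%02d,%03d' % (hours, minutes, seconds, ms)

def time_add(begin, duration):
    """Return begin + duration as an SRT timestamp.

    Working in milliseconds handles fractions of any width and zero-pads
    every field, as the SRT format expects.
    """
    return format_srt_time(to_ms(begin) + to_ms(duration))

def convert_xml(path, files):
    """Convert each downloaded '-subtitles.xml' file into a '.srt' file."""
    for video in files:
        base = os.path.splitext(video)[0]
        xml_path = os.path.join(path, base + '-subtitles.xml')
        if not os.path.exists(xml_path):
            continue  # the download failed or this file has no subtitles
        lines = []
        num = 1
        with open(xml_path, encoding='utf-8') as fdr:
            for line in fdr:
                if line.lstrip().startswith('<p'):
                    # A cue opens with <p begin="..." dur="...">; splitting on
                    # '"' puts the start time in field 1, the duration in 3.
                    betime = line.split('"')
                    lines.append(str(num) + '\n')
                    lines.append(format_srt_time(to_ms(betime[1])) + ' --> '
                                 + time_add(betime[1], betime[3]) + '\n')
                elif line.rstrip().endswith('</p>'):
                    # The cue text precedes the closing '</p>' tag.
                    lines.append(line.split('<')[0] + '\n\n')
                    num += 1
        srt_path = os.path.join(path, base + '.srt')
        with open(srt_path, 'w', encoding='utf-8') as fdw:
            fdw.write(''.join(lines))


if __name__ == '__main__':
    if len(sys.argv) < 2:
        print("Please give the 'Path' argument!")
        sys.exit(1)
    # sys.argv[0] is the script itself, so only the remaining arguments are paths.
    for path in sys.argv[1:]:
        if os.path.isdir(path):
            # Skip subtitle files produced by an earlier run.
            files = [f for f in os.listdir(path)
                     if '-subtitles.xml' not in f and not f.endswith('.srt')]
            print(files)
            download_sub(path, files)
            convert_xml(path, files)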
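The script reads the subtitle XML line by line. That only works if each `<p>` tag and its text fall on separate lines exactly as the script expects. A more tolerant alternative is sketched below. It is swapped in here, not taken from the post, and uses the standard library's `xml.etree.ElementTree`.

The sketch assumes the cues are `<p>` elements carrying `begin` and `dur` attributes in `HH:MM:SS.fff` form. That layout is inferred from how the script splits each `<p` line. The sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET


def to_ms(stamp):
    """'HH:MM:SS.fff' (',' also accepted) -> integer milliseconds."""
    hours, minutes, seconds = stamp.replace(',', '.').split(':')
    return round((int(hours) * 3600 + int(minutes) * 60 + float(seconds)) * 1000)


def srt_time(ms):
    """Integer milliseconds -> SRT timestamp 'HH:MM:SS,mmm'."""
    seconds, ms = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return '%02d:%02d:%02d,%03d' % (hours, minutes, seconds, ms)


def xml_to_srt(xml_text):
    """Turn <p begin=".." dur=".."> cues into the text of an .srt file."""
    root = ET.fromstring(xml_text)
    cues = []
    for el in root.iter():
        # Compare the local tag name so a TTML namespace prefix does not matter.
        if el.tag.split('}')[-1] != 'p' or 'begin' not in el.attrib:
            continue
        start = to_ms(el.get('begin'))
        end = start + to_ms(el.get('dur', '00:00:00.000'))
        text = ''.join(el.itertext()).strip()
        cues.append('%d\n%s --> %s\n%s\n'
                    % (len(cues) + 1, srt_time(start), srt_time(end), text))
    # Cues end with '\n'; joining with '\n' leaves a blank line between them.
    return '\n'.join(cues)


# Made-up sample; the second cue checks the carry from seconds into minutes.
sample = '''<tt><body><div>
<p begin="00:00:01.500" dur="00:00:02.750">Welcome to machine learning.</p>
<p begin="00:00:59.900" dur="00:00:00.200">Next cue</p>
</div></body></tt>'''

if __name__ == '__main__':
    print(xml_to_srt(sample))
```

Under those assumptions, writing the returned string to `<video name>.srt` with UTF-8 encoding could replace the line-matching loop inside `convert_xml`.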
