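Chapter 2 ("MapReduce") of Hadoop: The Definitive Guide analyzes weather data with a Unix script. I took some time to download the weather data, put it on a server, and test it with the script from the book. In practice I found a mistake in the book. I have read few books over the years, and this is the first time I have spotted an error in one. It is circled in red:
I downloaded the files as described above from ftp://ftp.ncdc.noaa.gov/pub/data/noaa, uploaded them to the server over SFTP with FlashFXP, took 10 years of data, and then wrote a script to test: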
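[yangkai@localhost ~]$ cat max_temperatuer.sh
#!/bin/bash
# Print the maximum valid air temperature for each year directory under raw/.
for year in raw/*
#for year in all/*
do
    # Year label followed by a tab, without a trailing newline
    printf '%s\t' "$(basename "$year")"
    #echo -ne ${year}"\n"
    # Decompress every station file for the year and scan all records
    gunzip -c "$year"/* | \
    awk '{ temp = substr($0, 88, 5) + 0;   # air temperature, tenths of a degree Celsius
           q = substr($0, 93, 1);          # quality code
           if (temp != 9999 && q ~ /[01459]/ && temp > max) max = temp }
         END { print max }'
done
exit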
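One caveat with the awk program above: `max` starts out unset, which awk treats as 0 in a numeric comparison, so a year whose valid readings are all below zero would print an empty line. The following is only a sketch of one way to guard against that, not the book's method. The names `max_temp` and `seen` and the `NA` placeholder are hypothetical, and the test records are synthetic (87 filler zeros followed by the temperature and quality code).

```shell
# Hypothetical helper: read NCDC records on stdin, print the maximum valid temperature.
max_temp() {
    awk '{
        temp = substr($0, 88, 5) + 0      # tenths of a degree Celsius
        q    = substr($0, 93, 1)          # quality code
        # Accept the first valid reading unconditionally, then keep the larger one
        if (temp != 9999 && q ~ /[01459]/ && (!seen || temp > max)) {
            max  = temp
            seen = 1
        }
    }
    END { if (seen) print max; else print "NA" }'
}

# Synthetic records: two sub-zero readings and one missing value (+9999)
{
    printf '%087d%s%s\n' 0 '-0206' '1'
    printf '%087d%s%s\n' 0 '-0078' '1'
    printf '%087d%s%s\n' 0 '+9999' '9'
} | max_temp
```

For the ten years tested here every maximum is positive, so this variant would print the same numbers as the original script.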
[yangkai@localhost ~]$ sh max_temperatuer.sh
1901 317
1902 244
1903 289
1904 256
1905 283
1906 294
1907 283
1908 289
1909 278
1910 294
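The second column is in tenths of a degree Celsius, which is how the NCDC records store air temperature, so 317 means 31.7°C. A minimal sketch of the conversion follows. `to_celsius` is a hypothetical helper name, and it assumes the whitespace-separated "year value" lines that the script prints.

```shell
# Hypothetical helper: convert "year tenths" lines to "year<TAB>degrees Celsius".
to_celsius() {
    awk '{ printf "%s\t%.1f\n", $1, $2 / 10 }'
}

# Two lines taken from the run above
printf '1901 317\n1902 244\n' | to_celsius
```

The script's output could be piped straight into it, for example `sh max_temperatuer.sh | to_celsius`.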
[yangkai@localhost ~]$ ls
029070-99999-1901 for.sh max_temperatuer.sh raw yjdmdp.tar.gz
[yangkai@localhost ~]$ ll raw/
total 40
drwxr-xr-x 2 yangkai root 4096 May 19 10:50 1901
drwxr-xr-x 2 yangkai root 4096 May 19 09:46 1902
drwxr-xr-x 2 yangkai root 4096 May 19 09:46 1903
drwxr-xr-x 2 yangkai root 4096 May 19 09:46 1904
drwxr-xr-x 2 yangkai root 4096 May 19 09:46 1905
drwxr-xr-x 2 yangkai root 4096 May 19 09:46 1906
drwxr-xr-x 2 yangkai root 4096 May 19 09:46 1907
drwxr-xr-x 2 yangkai root 4096 May 19 09:46 1908
drwxr-xr-x 2 yangkai root 4096 May 19 09:46 1909
drwxr-xr-x 2 yangkai root 4096 May 19 09:46 1910
[yangkai@localhost ~]$ ll raw/1901/
total 72
-rw-r--r-- 1 yangkai root 11445 Nov 23 2004 029070-99999-1901.gz
-rw-r--r-- 1 yangkai root 11210 Nov 23 2004 029500-99999-1901.gz
-rw-r--r-- 1 yangkai root 11647 Nov 23 2004 029600-99999-1901.gz
-rw-r--r-- 1 yangkai root 10998 Nov 23 2004 029720-99999-1901.gz
-rw-r--r-- 1 yangkai root 11999 Nov 23 2004 029810-99999-1901.gz
-rw-r--r-- 1 yangkai root 11132 Nov 23 2004 227070-99999-1901.gz
[yangkai@localhost ~]$
[yangkai@localhost ~]$ gunzip -c ./029720-99999-1901.gz |head
0029029720999991901010106004+60450+022267FM-12+001499999V0209991C000019999999N0000001N9-02061+99999102601ADDGF108991999999999999999999
0029029720999991901010113004+60450+022267FM-12+001499999V0202001N001019999999N0000001N9-01561+99999102621ADDGF108991999999999999999999
0029029720999991901010120004+60450+022267FM-12+001499999V0201801N001019999999N0000001N9-01391+99999102461ADDGF108991999999999999999999
0029029720999991901010206004+60450+022267FM-12+001499999V0202301N009319999999N0000001N9-00781+99999102311ADDGF108991999999999999999999
0029029720999991901010213004+60450+022267FM-12+001499999V0202301N012319999999N0000001N9-00391+99999102321ADDGF108991999999999999999999
0029029720999991901010220004+60450+022267FM-12+001499999V0202501N012319999999N0000001N9-00331+99999102241ADDGF108991999999999999999999
0029029720999991901010306004+60450+022267FM-12+001499999V0202701N015419999999N0000001N9-00391+99999102391ADDGF108991999999999999999999
0029029720999991901010313004+60450+022267FM-12+001499999V0202301N015419999999N0000001N9-00331+99999102301ADDGF108991999999999999999999
0029029720999991901010320004+60450+022267FM-12+001499999V0202701N015419999999N0000001N9-00391+99999102161ADDGF108991999999999999999999
0029029720999991901010406004+60450+022267FM-12+001499999V0202301N002619999999N0000001N9-00331+99999102191ADDGF108991999999999999999999
[yangkai@localhost ~]$
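To see how the `substr` offsets in the script line up with these fixed-width records, here is a small sketch that parses the first record shown above. The variable names `record` and `parsed` are only for illustration. The year offset (columns 16-19) is read off the sample record, and the other two offsets are the ones the script uses.

```shell
# First record from the head output above, copied verbatim
record='0029029720999991901010106004+60450+022267FM-12+001499999V0209991C000019999999N0000001N9-02061+99999102601ADDGF108991999999999999999999'

parsed=$(printf '%s\n' "$record" | awk '{
    year = substr($0, 16, 4)          # observation year
    temp = substr($0, 88, 5) + 0      # signed temperature; "+ 0" forces a number
    q    = substr($0, 93, 1)          # quality code for the temperature
    print year, temp, q
}')
echo "$parsed"    # prints: 1901 -206 1
```

So this reading is -20.6°C with quality code 1, which passes the `[01459]` filter.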
The end.