2013-03-25 10:42:10

Original article: An Analysis of How tar Builds an Archive, by T-bagwell

 

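Execution starts in main.
It first records program_name, the file name of the program being run,
then sets up the locale and message domain and initializes the exit status: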

1566 program_name = argv[0];
  1567 setlocale (LC_ALL, "");
  1568 textdomain (PACKAGE);
  1570 exit_status = TAREXIT_SUCCESS;

The following function is tar's wrapper around malloc, which adds error checking:

74 /* Allocate N bytes of memory dynamically, with error checking. */
   75
   76 VOID *
   77 xmalloc (n)
   78 size_t n;
   79 {
   80 VOID *p;
   81
   82 p = malloc (n);
   83 if (p == 0)
   84 p = fixup_null_alloc (n);
   85 return p;
=> 86 }

decode_options (argc, argv); is called from main to parse the options tar was invoked with. Inside it we find a getopt call (getopt_long, to be precise):


1072 /*----------------------------.
  1073 | Parse the options for tar. |
  1074 `----------------------------*/

  1075
  1076 #define OPTION_STRING \
  1077 "-01234567ABC:F:GK:L:MN:OPRST:V:WX:Zb:cdf:g:hiklmoprstuvwxz"
  1078
  1079 #define SET_COMMAND_MODE(Mode) \
  1080 (command_mode = command_mode == COMMAND_NONE ? (Mode) : COMMAND_TOO_MANY)
  1081
  1082 static void
  1083 decode_options (int argc, char *const *argv)
  1084 {
  1085 int optchar; /* option letter */
  1086
  1087 /* Set some default option values. */
  1088
=>1089 blocking = DEFAULT_BLOCKING;
  1090 flag_rsh_command = NULL;
  1091

The following loop then retrieves the options from the command line one at a time:


1141 /* Parse all options and non-options as they appear. */
  1142
  1143 while (optchar = getopt_long (argc, argv, OPTION_STRING, long_options, NULL),
  1144 optchar != EOF)
=>1145 switch (optchar)

Let's look at optchar:

(gdb) print optchar
$68 = 99

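The value is 99. Looking it up in the ASCII table (run man ascii in a terminal) turns up this line:
143   99    63    c

The first column is octal, the second decimal, the third hexadecimal, and the fourth the character itself.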
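The same lookup can be reproduced in C. This is a minimal sketch; the helper name format_ascii_row is made up for illustration. It formats a character code in the same octal, decimal, hexadecimal, character order as the man ascii line above:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format `code` as "octal decimal hex char",
   the column order used by the `man ascii` table.
   Returns the number of characters written, as snprintf does. */
int format_ascii_row(int code, char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    return snprintf(out, out_size, "%o %d %X %c", code, code, code, code);
}
```

Called with 99, the buffer holds "143 99 63 c", which is why gdb's 99 corresponds to the c option.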
All of this happens inside a while loop. I invoked tar with the create option only and no compression option:

(gdb) p argv[0]
$63 = 0xbffff758 "/home/liuqi/dvntar/dvntar"
(gdb) p argv[1]
$64 = 0xbffff772 "-cf"
(gdb) p argv[2]
$65 = 0xbffff776 "example.tar"
(gdb) p argv[3]
$66 = 0xbffff782 "example"
(gdb) p argv[4]
$67 = 0x0

Those are the command-line arguments. The command I ran was:

/home/liuqi/dvntar/dvntar -cf example.tar example

Stepping further, we reach:

1315 case 'c':
=>1316 SET_COMMAND_MODE (COMMAND_CREATE);
  1317 break;

Keep this in mind for later: the command mode is now set to COMMAND_CREATE. main checks this value afterwards, and the archiving work is dispatched from there.
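The effect of SET_COMMAND_MODE is easier to see when the macro's expression is written as a function. In the sketch below, the enum layout and the COMMAND_EXTRACT member are assumptions added for illustration; only COMMAND_NONE, COMMAND_TOO_MANY and COMMAND_CREATE appear in the listings.

```c
#include <assert.h>

/* Mode names follow the listing; the enum layout itself is an assumption,
   and COMMAND_EXTRACT is included only to have a second mode to test with. */
enum command_mode {
    COMMAND_NONE,
    COMMAND_TOO_MANY,
    COMMAND_CREATE,
    COMMAND_EXTRACT
};

/* Same expression as SET_COMMAND_MODE, written as a function: the first
   request is accepted, any further request is flagged as a conflict. */
enum command_mode set_command_mode(enum command_mode current,
                                   enum command_mode requested)
{
    assert(requested != COMMAND_NONE);
    return current == COMMAND_NONE ? requested : COMMAND_TOO_MANY;
}
```

With this logic a single -c yields COMMAND_CREATE, while a second mode option on the same command line turns the state into COMMAND_TOO_MANY, which the caller can then report as an error.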

On the next pass through the loop, getopt_long returns the f option,
and execution enters:


1331 case 'f':
=>1332 if (archive_names == allocated_archive_names)
  1333 {
  1334 allocated_archive_names *= 2;
  1335 archive_name_array =(const char **)xrealloc(archive_name_array,sizeof (const char *) * allocated_archive_names);
  1336 }
  1337 archive_name_array[archive_names++] = optarg;
  1338 break;
  1339

(gdb) print archive_names
$70 = 0
(gdb) print allocated_archive_names
$71 = 10

Because the two values differ, the body of the if is skipped and execution goes straight to archive_name_array[archive_names++] = optarg;

Back at the top of the loop, the next call to getopt_long returns:

(gdb) print optchar
$72 = 1

which takes us to:

1157 case 1:
  1158 /* File name or non-parsed option, because of RETURN_IN_ORDER
  1159 ordering triggerred by the leading dash in OPTION_STRING. */

  1160
=>1161 name_add (optarg);
  1162 break;

name_add appends the file name to the end of name_array:

109 /*--------------------------------------------------------------.
   110 | Add NAME at end of name_array, reallocating it as necessary. |
   111 `--------------------------------------------------------------*/

   112
   113 static void
   114 name_add (const char *name)
   115 {
=> 116 if (names == allocated_names)
   117 {
   118 allocated_names *= 2;
   119 name_array = (const char **)xrealloc (name_array, sizeof (const char *) * allocated_names);
   120 }
   121 name_array[names++] = name;
   122 }
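Both the f case and name_add use the same pattern: an array of pointers that doubles in size whenever it is full. Below is a minimal, self-contained sketch of that pattern. The struct name_list type and the function name are hypothetical stand-ins for tar's globals, and the starting capacity of 10 is borrowed from the allocated_archive_names value printed by gdb above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for tar's name_array / names / allocated_names. */
struct name_list {
    const char **items;
    size_t count;      /* entries in use */
    size_t capacity;   /* entries allocated */
};

/* Append `name`, doubling the capacity when the array is full.
   Returns 0 on success, -1 if memory cannot be obtained. The listing
   uses xrealloc instead, which is assumed to handle failure itself. */
int name_list_add(struct name_list *list, const char *name)
{
    assert(list != NULL);
    if (list->count == list->capacity) {
        size_t new_capacity = list->capacity ? list->capacity * 2 : 10;
        const char **grown = realloc(list->items, new_capacity * sizeof *grown);
        if (grown == NULL)
            return -1;
        list->items = grown;
        list->capacity = new_capacity;
    }
    list->items[list->count++] = name;   /* stores the pointer, not a copy */
    return 0;
}
```

Doubling keeps the number of reallocations logarithmic in the number of names. Like name_add, the sketch stores only the pointer, so the strings themselves (here, the argv entries) must outlive the list.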

gdb shows the value of name:

(gdb) step
name_add (name=0xbffff782 "example") at /home/liuqi/dvntar/src/tar.c:116

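Because the command line contains no compression option (z, or Z in this version's OPTION_STRING), neither flag_compressprog nor flag_compress_block has been set when execution reaches:
  1551   if (flag_compress_block && !flag_compressprog)

Had a compression option been given, tar would start an additional process and send the archive through a pipe to be compressed with gzip. This is visible in the code, and with gdb you can follow execution to the point where fork starts the child process.
When decode_options finishes, control returns to main, which takes the next step:
=>1590   if (!names_argv)
  1591     name_init (argc, argv);
Printing names_argv in gdb gives:
(gdb) print names_argv
$73 = (char * const *) 0x0

So name_init (argc, argv); is called.
Let's step into it: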

Reading the code and checking with gdb:

(gdb) print flag_namefile
$74 = 0

Since flag_namefile is 0, the function simply stores its arguments in the global variables names_argc and names_argv.

Next comes the dispatch on the command mode that was set earlier:

=>1598 switch (command_mode)
  1599 {

The command mode was set when the -c option was parsed; this is where it is used:

1619 case COMMAND_CREATE:
=>1620 create_archive ();
  1621 if (flag_totals)
  1622 fprintf (stderr, _("Total bytes written: %d\n"), tot_written);
  1623 break;

At this point execution enters create_archive ();
This is where most of the archiving work happens, so let's step in:

610 void
   611 create_archive (void)
   612 {
   613 register char *p;
   614
=> 615 open_archive (0); /* open for writing */

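The first call is open_archive. Up to this point the archive file has not been created; it is created inside this function, and once the function returns the archive file exists on disk. (To verify this, set a breakpoint here.)
Stepping into the function, we see the following code: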

532 current_file_name = NULL;
   533 current_link_name = NULL;
   534 save_name = NULL;
   535
   536 if (flag_multivol)
   537 {
   538 ar_block
   539 = (union record *) valloc ((unsigned) (blocksize + (2 * RECORDSIZE)));
   540 if (ar_block)
   541 ar_block += 2;
   542 }
   543 else
=> 544 ar_block = (union record *) valloc ((unsigned) blocksize);

Because flag_multivol was never set, the else branch runs and allocates the block buffer.

=> 556 if (flag_compressprog)
   557 {
   558 if (reading == 2 || flag_verify)
   559 ERROR ((TAREXIT_FAILURE, 0,
   560 _("Cannot update or verify compressed archives")));
   561 if (flag_multivol)
   562 ERROR ((TAREXIT_FAILURE, 0,
   563 _("Cannot use multi-volume compressed archives")));
   564 child_open ();
   565 if (!reading && strcmp (archive_name_array[0], "-") == 0)
   566 stdlis = stderr;
   567 #if 0
   568 child_open (rem_host, rem_file);
   569 #endif
   570 }

Since no compression option was given, the body of the if is skipped and execution goes to:

596 else
=> 597 archive = rmtcreat (archive_name_array[0], 0666, flag_rsh_command);

This call creates the archive file: once it has executed, example.tar exists. Control then returns to create_archive:

610 void
   611 create_archive (void)
   612 {
   613 register char *p;
   614
   615 open_archive (0); /* open for writing */
   616
=> 617 if (flag_gnudump)
   618 {

According to gdb:

(gdb) print flag_gnudump
$80 = 0

so the body of the if is skipped and the else branch runs:

653 else
   654 {
=> 655 while (p = name_next (1), p)
   656 dump_file (p, -1, 1);
   657 }

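This while loop calls name_next to fetch the next name into p, and runs dump_file on it for as long as a name is returned. Let's step into name_next: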

197 /*-------------------------------------------------------------------------.
   198 | Get the next name from argv or the name file. Result is in static |
   199 | storage and can't be relied upon across two calls. |
   200 | |
   201 | If CHANGE_DIRS is non-zero, treat a filename of the form "-C" as meaning |
   202 | that the next filename is the name of a directory to change to. If |
   203 | `filename_terminator' is '\0', CHANGE_DIRS is effectively always 0. |
   204 `-------------------------------------------------------------------------*/

   205
   206 char *
   207 name_next (int change_dirs)
   208 {
   209 const char *source;
   210 char *cursor;
   211 int chdir_flag = 0;
   212
   213 if (filename_terminator == '\0')
   214 change_dirs = 0;
   215
=> 216 if (name_file)

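The comment explains this function well. First, let's see which files the example directory contains:

[root@1jjk dvntar]# ls example
aaa bbb
[root@1jjk dvntar]#

The directory contains two files, aaa and bbb. Continuing, we can print name_file: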

(gdb) print name_file
$81 = (FILE *) 0x0

The result tells us that this branch is not taken; execution goes instead to:

250 else
   251 {
   252
   253 /* Read from argv, after options. */
   254
   255 while (1)
   256 {
   257 if (name_index < names)
   258 source = name_array[name_index++];
   259 else if (optind < names_argc)
   260 source = names_argv[optind++];
   261 else
   262 break;
   263
=> 264 if (strlen (source) > name_buffer_length)
   265 {
   266 free (name_buffer);
   267 name_buffer_length = strlen (source);
   268 name_buffer = xmalloc (name_buffer_length + 2);
   269 }
gdb shows that execution actually takes this branch:
259 else if (optind < names_argc)
   260 source = names_argv[optind++];

This is where the directory name is obtained:

(gdb) print source
$86 = 0xbffff782 "example"

With that, we can continue:

270 strcpy (name_buffer, source);
   271
   272 /* Zap trailing slashes. */
   273
=> 274 cursor = name_buffer + strlen (name_buffer) - 1;

Printing name_buffer shows that it holds the address of the copied string:

(gdb) print name_buffer
$88 = 0x806e888 "example"

cursor is a pointer into that buffer; think of it as a cursor positioned on the last character:

(gdb) print cursor
$89 = 0x806e88e "e"
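The listing stops at the line that positions cursor, and the loop that actually removes the slashes is not shown in this article. The following is a sketch of what such a loop typically looks like, not a copy of tar's code; the function name zap_trailing_slashes is made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Remove trailing '/' characters in place, but never shrink the string
   to nothing, so "/" stays "/". Returns its argument for convenience. */
char *zap_trailing_slashes(char *name)
{
    assert(name != NULL);
    size_t len = strlen(name);
    if (len == 0)
        return name;
    char *cursor = name + len - 1;      /* points at the last character */
    while (cursor > name && *cursor == '/')
        *cursor-- = '\0';               /* overwrite the slash, step back */
    return name;
}
```

For the name in this session, "example", the last character is 'e', so the loop body never runs and the buffer is left unchanged.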

Continuing:

278 if (chdir_flag)
   279 {
   280 if (chdir (name_buffer) < 0)
   281 ERROR ((TAREXIT_FAILURE, errno,
   282 _("Cannot chdir to %s"), name_buffer));
   283 chdir_flag = 0;
   284 }
   285 else if (change_dirs && strcmp (name_buffer, "-C") == 0)
   286 chdir_flag = 1;
   287 else
=> 291 return un_quote_string (name_buffer);

Since chdir_flag is not set (as gdb confirms), execution reaches un_quote_string (name_buffer);

(gdb) step
un_quote_string (string=0x806e888 "example") at /home/liuqi/dvntar/src/port.c:730

Inside, we see the following code:

714 /*-----------------------------------------------------------------------.
  715 | Un_quote_string takes a quoted c-string (like those produced by |
  716 | quote_string or quote_copy_string and turns it back into the un-quoted |
  717 | original. This is done in place. |
  718 `-----------------------------------------------------------------------*/

  719
  720 /* There is no un-quote-copy-string. Write it yourself */
  721
  722 char *
  723 un_quote_string (char *string)
  724 {
  725 char *ret;
  726 char *from_here;
  727 char *to_there;
  728 int tmp;
  729
  730 ret = string;
  731 to_there = string;
  732 from_here = string;
=>733 while (*from_here)
  734 {
  735 if (*from_here != '\\')
  736 {
  737 if (from_here != to_there)
  738 *to_there++ = *from_here++;
  739 else
  740 from_here++, to_there++;
  741 continue;
  742 }

From the code, execution enters this loop; once the loop is done, the function returns:

805 if (*to_there)
  806 *to_there++ = '\0';
=>807 return ret;
  808 }

Control then returns to create_archive, which calls dump_file (p, -1, 1);
Let's step in:

666 /*-------------------------------------------------------------------------.
   667 | Dump a single file. If it's a directory, recurse. Result is 1 for |
   668 | success, 0 for failure. Sets global "hstat" to stat() output for this |
   669 | file. P is file name to dump. CURDEV is device our parent dir was on. |
   670 | TOPLEVEL tells wether we are a toplevel call. |
   671 `-------------------------------------------------------------------------*/

   672
   673 void
   674 dump_file (char *p, int curdev, int toplevel)
   675 {
   676 union record *header;
   677 char type;
   678 union record *exhdr;
   679 char save_linkflag;
=> 680 int critical_error = 0;
   681 struct utimbuf restore_times;
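As the comment says, this function dumps a single file and recurses when it is given a directory. The directory case looks like this: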
  1151 else if (S_ISDIR (hstat.st_mode))
  1152 {
  1153 register DIR *dirp;
  1154 register struct dirent *d;
  1155 char *namebuf;
  1156 int buflen;
  1157 register int len;
  1158 int our_device = hstat.st_dev;
  1159
  1160 /* Build new prototype name. */
  1161
  1162 len = strlen (p);
  1163 buflen = len + NAMSIZ;
  1164 namebuf = xmalloc ((size_t) (buflen + 1));
  1165 strncpy (namebuf, p, (size_t) buflen);
  1166 while (len >= 1 && namebuf[len - 1] == '/')
  1167 len--; /* delete trailing slashes */
=>1168 namebuf[len++] = '/'; /* now add exactly one back */
  1169 namebuf[len] = '\0'; /* make sure null-terminated */

When the name refers to a directory, any trailing slashes are removed and exactly one '/' is appended.
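The normalization can be tried in isolation. In the sketch below, dir_prefix is a hypothetical helper; its extra parameter reserves room for an entry name to be appended later, which is the role NAMSIZ plays in the listing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of `path` that ends in exactly one '/'.
   `extra` bytes are reserved beyond the prefix for a name appended later.
   The caller frees the result. Returns NULL if allocation fails. */
char *dir_prefix(const char *path, size_t extra)
{
    assert(path != NULL);
    size_t len = strlen(path);
    char *namebuf = malloc(len + extra + 2);   /* '/' plus terminating NUL */
    if (namebuf == NULL)
        return NULL;
    memcpy(namebuf, path, len);
    while (len >= 1 && namebuf[len - 1] == '/')
        len--;                                 /* drop trailing slashes */
    namebuf[len++] = '/';                      /* add exactly one back */
    namebuf[len] = '\0';                       /* make sure it is terminated */
    return namebuf;
}
```

Both "example" and "example///" produce "example/", so the entry names appended afterwards always follow a single separator.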

1175 if (!flag_oldarch)
  1176 {
  1177 hstat.st_size = 0; /* force 0 size on dir */
  1178
  1179 /* If people could really read standard archives, this should
  1180 be: (FIXME)
  1181
  1182 header = start_header (flag_standard ? p : namebuf, &hstat);
  1183
  1184 but since they'd interpret LF_DIR records as regular files,
  1185 we'd better put the / on the name. */

  1186
=>1187 header = start_header (namebuf, &hstat);

This calls start_header, which builds a header block.

185 /* Header handling. */
   186
   187 /*---------------------------------------------------------------------.
   188 | Make a header block for the file name whose stat info is st. Return |
   189 | header pointer for success, NULL if the name is too long. |
   190 `---------------------------------------------------------------------*/

   191
   192 static union record *
   193 start_header (const char *name, register struct stat *st)
   194 {
   195 register union record *header;
   196
=> 197 if (strlen (name) >= (size_t) NAMSIZ)
   198 write_long (name, LF_LONGNAME);
   199
   200 header = (union record *) findrec ();
   201 memset (header->charptr, 0, sizeof (*header)); /* FIXME: speed up */
   202

Once the header has been filled in, finish_header completes it:

286 /*-------------------------------------------------------------------------.
   287 | Finish off a filled-in header block and write it out. We also print the |
   288 | file name and/or full info if verbose is on. |
   289 `-------------------------------------------------------------------------*/

   290
   291 void
   292 finish_header (register union record *header)
   293 {
   294 register int i, sum;
   295 register char *p;
   296
=> 297 memcpy (header->header.chksum, CHKBLANKS, sizeof (header->header.chksum));
   298
   299 sum = 0;
   300 p = header->charptr;
   301 for (i = sizeof (*header); --i >= 0; )
   302 /* We can't use unsigned char here because of old compilers, e.g. V7. */
   303 sum += 0xFF & *p++;
   304
   305 /* Fill in the checksum field. It's formatted differently from the
   306 other fields: it has [6] digits, a null, then a space -- rather than
   307 digits, a space, then a null. We use to_oct then write the null in
   308 over to_oct's space. The final space is already there, from
   309 checksumming, and to_oct doesn't modify it.
   310
   311 This is a fast way to do:
   312
   313 sprintf(header->header.chksum, "%6o", sum); */

   314
=> 315 to_oct ((long) sum, 8, header->header.chksum);
   316 header->header.chksum[6] = '\0'; /* zap the space */
The space that to_oct leaves after the digits is then overwritten with '\0'.
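The checksum step can be reproduced on a bare 512-byte block. This is a sketch under stated assumptions: the field offset 148 and width 8 come from the standard ustar header layout rather than from the listing, the function name is made up, and the digits are zero-padded with %06o, whereas the comment in the listing describes %6o.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define RECORD_SIZE   512   /* size of one tar header block */
#define CHKSUM_OFFSET 148   /* checksum field position in the ustar layout */
#define CHKSUM_LENGTH 8

/* Compute and store the header checksum: sum every byte of the block as an
   unsigned value, with the checksum field itself counted as eight spaces. */
unsigned int fill_checksum(unsigned char *header)
{
    assert(header != NULL);
    unsigned int sum = 0;
    char digits[CHKSUM_LENGTH];

    memset(header + CHKSUM_OFFSET, ' ', CHKSUM_LENGTH);
    for (size_t i = 0; i < RECORD_SIZE; i++)
        sum += header[i];

    /* Six octal digits, then NUL; the final space is left from the memset. */
    snprintf(digits, sizeof digits, "%06o", sum);
    memcpy(header + CHKSUM_OFFSET, digits, 7);
    return sum;
}
```

For an otherwise all-zero block, the sum is just the eight spaces, 8 * 32 = 256, stored as "000400" followed by a NUL and a space. A reader validates a header by repeating the same sum with the field blanked out.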
Next, execution enters:
   152 /*----------------------------------------------------------------------.
   153 | Indicate that we have used all records up thru the argument. (should |
   154 | the arg have an off-by-1? FIXME) |
   155 `----------------------------------------------------------------------*/

   156
   157 void
   158 userec (union record *rec)
   159 {
=> 160 while (rec >= ar_record)
   161 ar_record++;
   162
   163 /* Do *not* flush the archive here. If we do, the same argument to
   164 userec() could mean the next record (if the input block is exactly
   165 one record long), which is not what is intended. */

   166
   167 if (ar_record > ar_last)
   168 abort ();
   169 }

Once that returns, execution moves on to the next entry:

(gdb) bt
#0 dump_file (p=0x806e888 "example", curdev=-1, toplevel=1) at /home/liuqi/dvntar/src/create.c:1299
#1 0x0805c0dd in create_archive () at /home/liuqi/dvntar/src/create.c:656
#2 0x08065f68 in main (argc=4, argv=0xbffff614) at /home/liuqi/dvntar/src/tar.c:1620

We are now inside the example directory, and readdir returns an entry:

(gdb) p d->d_name
$120 =

The directory contains bbb and aaa.

1291 while (d = readdir (dirp), d)
  1292 {
  1293
  1294 /* Skip `.' and `..'. */
  1295
  1296 if (is_dot_or_dotdot (d->d_name))
  1297 continue;
  1298
  1299 if ((int) NAMLEN (d) + len >= buflen)
  1300 {
  1301 buflen = len + NAMLEN (d);
  1302 namebuf = (char *) xrealloc (namebuf, (size_t) (buflen + 1));
  1308 }
  1309 strcpy (namebuf + len, d->d_name);

At this point namebuf is:

(gdb) print namebuf
$121 = 0x806e8f8 "example/bbb"


Every entry in the directory is visited in turn this way.
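The traversal loop can be exercised on its own with the POSIX directory API. The name is_dot_or_dotdot appears in the listing, but its body is not shown in this article, so the implementation below is an assumption; count_entries is a hypothetical helper that only counts entries instead of dumping them.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>

/* Nonzero for the "." and ".." entries that readdir may return. */
int is_dot_or_dotdot(const char *name)
{
    assert(name != NULL);
    return strcmp(name, ".") == 0 || strcmp(name, "..") == 0;
}

/* Count the entries of `path`, skipping "." and "..".
   Returns -1 if the directory cannot be opened. */
int count_entries(const char *path)
{
    DIR *dirp = opendir(path);
    if (dirp == NULL)
        return -1;
    int count = 0;
    struct dirent *d;
    while ((d = readdir(dirp)) != NULL) {
        if (is_dot_or_dotdot(d->d_name))
            continue;               /* same skip as in the listing */
        count++;
    }
    closedir(dirp);
    return count;
}
```

readdir returns entries in no particular order, which is consistent with bbb showing up before aaa in this session.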
The archive format is then written out. Comparing with diff shows that the tar archive contains exactly the same content as our files:

[root@1jjk dvntar]# ls
aaa config.h cscope.file cscope.in.out cscope.out cscope.po.out dvntar example example.tar.gz lib lib.tar.gz Makefile sad sad.tar src tags
[root@1jjk dvntar]# diff example/aaa sad.tar
1c1,3
< sdajkfhskdjhfksjdhfkjsdfwerw2346543dhjkdsjf
---
> example/ 40755 0 0 0 11235565534 10524 5 ustar root root example/bbb 100644 0 0 100 11235554210 11225 0 ustar root root ksuyfdokhyusdifhusdifhsdkfhskdhfkdsfkshfdkhdsfkhsdkfhskdhjkdsjf
> example/aaa 100644 0 0 54 11235565534 11225 0 ustar root root sdajkfhskdjhfksjdhfkjsdfwerw2346543dhjkdsjf
>
\ No newline at end of file
[root@1jjk dvntar]# diff example/bbb sad.tar
1c1,3
< ksuyfdokhyusdifhusdifhsdkfhskdhfkdsfkshfdkhdsfkhsdkfhskdhjkdsjf
---
> example/ 40755 0 0 0 11235565534 10524 5 ustar root root example/bbb 100644 0 0 100 11235554210 11225 0 ustar root root ksuyfdokhyusdifhusdifhsdkfhskdhfkdsfkshfdkhdsfkhsdkfhskdhjkdsjf
> example/aaa 100644 0 0 54 11235565534 11225 0 ustar root root sdajkfhskdjhfksjdhfkjsdfwerw2346543dhjkdsjf
>
\ No newline at end of file
[root@1jjk dvntar]#

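The key point in all of the steps above is that each file's contents, its attributes, and the owner information are stored in the archive, each in a region of a prescribed size, and that is what gives the archive its format. All the steps have been covered above; that is all for today.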
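The numbers that follow each name in the diff output look like header fields written in octal. For example/aaa the size field reads 54, and 54 octal is 44, which matches the 43 characters of the file plus a newline; likewise 100 octal is 64 for example/bbb. That reading is an inference from the output above. A minimal sketch of decoding such a field, with a made-up function name:

```c
#include <assert.h>
#include <stddef.h>

/* Parse a tar-style octal field: optional leading spaces, then octal
   digits, stopping at the first non-octal character or the field end. */
unsigned long parse_octal_field(const char *field, size_t width)
{
    assert(field != NULL);
    unsigned long value = 0;
    size_t i = 0;

    while (i < width && field[i] == ' ')
        i++;                                    /* skip padding */
    while (i < width && field[i] >= '0' && field[i] <= '7')
        value = value * 8 + (unsigned long)(field[i++] - '0');
    return value;
}
```

The same function decodes the mode field: 100644 octal is the familiar rw-r--r-- permission set on a regular file.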