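fork, vfork, and clone are all Linux system calls used to create a child process (strictly speaking, what vfork creates is a thread).
First, the four things every process must have:
a. A program for the process to run, just as a play needs a script. The program can be shared by several processes, just as many performances can use one script.
b. A minimum of private property: a system stack reserved for the process.
c. A "household registration", which is what operating systems call the process control block; in Linux it is implemented as task_struct.
d. An independent address space.
When a process lacks item d, we call it a thread.
1. fork: the child created by fork copies the parent's resources, including the contents of memory and of task_struct (the two processes have different pids). The resources themselves are copied, not just pointers to them, as the following example shows.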
[root@liumengli program]# cat testFork.c
#include <stdio.h>
#include <unistd.h>   /* fork, getpid */
int main() {
    int count = 1;
    int child;
    if (!(child = fork())) { // create the child; fork returns 0 in the child
        printf("This is son, his count is: %d. and his pid is: %d\n", ++count, getpid()); // runs in the child
    } else {
        printf("This is father, his count is: %d, his pid is: %d\n", count, getpid());
    }
    return 0;
}
[root@liumengli program]# gcc testFork.c -o testFork
[root@liumengli program]# ./testFork
This is son, his count is: 2. and his pid is: 3019
This is father, his count is: 1, his pid is: 3018
[root@liumengli program]#
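The output shows that the two pids differ and that count was copied by value: the child changed count, while the parent's count stayed the same. Some people think that copying this much data must make fork slow. In fact, what fork copies is the parent's task_struct, system stack, and page tables. This means that in the program above, before ++count runs, the child's count and the parent's count still refer to the same memory. Only when the child writes to such a variable does the kernel use copy-on-write to create a new copy of the affected page. So it is only when ++count executes that the child gets a new page holding a copy of the original page's contents. Copying the basic resources is necessary, and it is efficient. Taken as a whole, it looks as though the parent's independent address space had been copied as well.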
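Next, notice that the child and the parent do not interfere with each other; their resources are clearly independent. Consider the following program.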
[root@liumengli program]# cat testFork.c
#include <stdio.h>
#include <unistd.h>   /* fork, getpid */
int main() {
    int count = 1;
    int child;
    if (!(child = fork())) {
        int i;
        for (i = 0; i < 200; i++) {
            printf("This is son, his count is: %d. and his pid is: %d\n", i, getpid());
        }
    } else {
        printf("This is father, his count is: %d, his pid is: %d\n", count, getpid());
    }
    return 0;
}
[root@liumengli program]# gcc testFork.c -o testFork
[root@liumengli program]# ./testFork
...
This is son, his count is: 46. and his pid is: 4092
This is son, his count is: 47. and his pid is: 4092
This is son, his count is: 48. and his pid is: 4092
This is son, his count is: 49. and his pid is: 4092
This is son, his count is: 50. and his pid is: 4092
This is father, his count is: 1, his pid is: 4091
[root@liumengli program]# This is son, his count is: 51. and his pid is: 4092
This is son, his count is: 52. and his pid is: 4092
...
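(I was unlucky: it took about 200 iterations before the effect showed up.) The output shows that parent and child run concurrently: the parent printed its line and returned to the shell while the child was still printing. vfork, covered in section 2, behaves differently.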
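2. vfork: what vfork creates is not a process in the true sense but a thread, because it lacks the fourth of the process essentials listed above, an independent address space. Consider the following program.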
[root@liumengli program]# cat testVfork.c
#include <stdio.h>
#include <unistd.h>   /* vfork, getpid, _exit */
int main() {
    int count = 1;
    int child;
    printf("Before create son, the father's count is:%d\n", count);
    if (!(child = vfork())) {
        printf("This is son, his pid is: %d and the count is: %d\n", getpid(), ++count);
        _exit(1); // _exit, so the child does not run exit handlers inside the shared address space
    } else {
        printf("After son, This is father, his pid is: %d and the count is: %d, and the child is: %d\n", getpid(), count, child);
    }
    return 0;
}
[root@liumengli program]# gcc testVfork.c -o testVfork
[root@liumengli program]# ./testVfork
Before create son, the father's count is:1
This is son, his pid is: 4185 and the count is: 2
After son, This is father, his pid is: 4184 and the count is: 2, and the child is: 4185
[root@liumengli program]#
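The output shows that the child process (thread) created by vfork shares the parent's count variable. This time it is the pointer that is copied: both refer to the same memory, so when the child modifies count, the parent's count changes as well.
In addition, a child created by vfork causes the parent to be suspended; the parent is woken only when the child exits or calls execve. Consider the following program: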
[root@liumengli program]# cat testVfork.c
#include <stdio.h>
#include <unistd.h>   /* vfork, getpid, _exit */
int main() {
    int count = 1;
    int child;
    printf("Before create son, the father's count is:%d\n", count);
    if (!(child = vfork())) {
        int i;
        for (i = 0; i < 100; i++) {
            printf("This is son, The i is: %d\n", i);
            if (i == 70)
                _exit(1); // the child leaves here, before ++count below is reached
        }
        printf("This is son, his pid is: %d and the count is: %d\n", getpid(), ++count);
        _exit(1);
    } else {
        printf("After son, This is father, his pid is: %d and the count is: %d, and the child is: %d\n", getpid(), count, child);
    }
    return 0;
}
[root@liumengli program]# gcc testVfork.c -o testVfork
[root@liumengli program]# ./testVfork
...
This is son, The i is: 68
This is son, The i is: 69
This is son, The i is: 70
After son, This is father, his pid is: 4433 and the count is: 1, and the child is: 4434
[root@liumengli program]#
This shows that the parent always resumes only after the child has finished.
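3. clone: clone is powerful and takes many parameters, so creating a process with it is more involved than with the previous two calls. clone lets you choose which of the parent's resources the child inherits. You can share the parent's virtual address space, as vfork does, so that what you create is a thread; you can choose not to share it; you can even make the new process a sibling of its creator rather than its child. First, the prototype of the function: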
int clone(int (*fn)(void *), void *child_stack, int flags, void *arg);
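Here fn is a function pointer. Recalling the four essentials of a process, this is the pointer to the program, the "script". child_stack points to the stack the child will run on; the caller allocates it, and because the stack grows downward on x86, the address passed is the top of the block. (The kernel's own per-process system stack is separate: under Linux it is 2 pages, i.e. 8 KB, and the low addresses of that block hold the process control block, task_struct.) flags describes which resources to inherit from the parent, and arg is the argument passed to the child. The values flags can take are listed below.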
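Flag            Meaning
CLONE_PARENT    The new process's parent is the caller's parent, so the new process is a "sibling" of its creator rather than its child
CLONE_FS        Child and parent share the same filesystem information: root, current directory, umask
CLONE_FILES     Child and parent share the same file descriptor table
CLONE_NEWNS     Start the child in a new namespace; the namespace describes the process's file hierarchy
CLONE_SIGHAND   Child and parent share the same signal handler table
CLONE_PTRACE    If the parent is being traced, the child is traced as well
CLONE_VFORK     The parent is suspended until the child releases its virtual memory resources
CLONE_VM        Child and parent run in the same memory space
CLONE_PID       The child is created with the same PID as the parent
CLONE_THREAD    Added in Linux 2.4 to support the POSIX threads standard; child and parent share the same thread group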
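The following example creates a thread (the child shares the parent's virtual address space; without an address space of its own it cannot be called a process). The parent is suspended and resumes only after the child thread has released its virtual memory resources.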
[root@liumengli program]# cat test_clone.c
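#define _GNU_SOURCE   /* needed for clone() in <sched.h> */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sched.h>
#include <signal.h>
#define FIBER_STACK 8192
int a;
void *stack;
int do_something(void *arg) {
    printf("This is son, the pid is:%d, the a is: %d\n", getpid(), ++a);
    // Do not free(stack) here: the child is still running on it. The block lives in the
    // shared address space, so it is not released when the child dies; the parent frees it.
    _exit(1);
}
int main() {
    a = 1;
    stack = malloc(FIBER_STACK); // allocate the stack for the child
    if (!stack) {
        printf("The stack failed\n");
        exit(0);
    }
    printf("creating son thread!!!\n");
    clone(&do_something, (char *)stack + FIBER_STACK, CLONE_VM|CLONE_VFORK, 0); // create the child thread
    printf("This is father, my pid is: %d, the a is: %d\n", getpid(), a);
    free(stack); // safe: CLONE_VFORK kept the parent suspended until the child exited
    exit(1);
}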
[root@liumengli program]# gcc test_clone.c -o test_clone
[root@liumengli program]# ./test_clone
creating son thread!!!
This is son, the pid is:7326, the a is: 2
This is father, my pid is: 7325, the a is: 2
[root@liumengli program]#
Readers can try the other ways of inheriting resources.
Implementation of fork, vfork, and clone
The kernel code that implements them is as follows:
asmlinkage int sys_fork(struct pt_regs regs)
{
    return do_fork(SIGCHLD, regs.esp, &regs, 0);
}
asmlinkage int sys_clone(struct pt_regs regs)
{
    unsigned long clone_flags;
    unsigned long newsp;
    clone_flags = regs.ebx;
    newsp = regs.ecx;
    if (!newsp)
        newsp = regs.esp;
    return do_fork(clone_flags, newsp, &regs, 0);
}
asmlinkage int sys_vfork(struct pt_regs regs)
{
    return do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, regs.esp, &regs, 0);
}
All three entry points, sys_fork, sys_clone, and sys_vfork, simply call do_fork with different arguments. The do_fork function (it is long) follows:
int do_fork(unsigned long clone_flags, unsigned long stack_start, struct pt_regs * regs, unsigned long stack_size) {
    int retval = -ENOMEM;
    struct task_struct *p;
    DECLARE_MUTEX_LOCKED(sem);

    if (clone_flags & CLONE_PID)
    {
        /* only process 0 may create a child with its own pid */
        if (current->pid)
            return -EPERM;
    }

    current->vfork_sem = &sem;

    p = alloc_task_struct();
    if (!p)
        goto fork_out;

    *p = *current;    /* start from a copy of the parent's task_struct */

    retval = -EAGAIN;
    if (atomic_read(&p->user->processes) >= p->rlim[RLIMIT_NPROC].rlim_cur)
        goto bad_fork_free;
    atomic_inc(&p->user->__count);
    atomic_inc(&p->user->processes);

    if (nr_threads >= max_threads)
        goto bad_fork_cleanup_count;

    get_exec_domain(p->exec_domain);

    if (p->binfmt && p->binfmt->module)
        __MOD_INC_USE_COUNT(p->binfmt->module);

    p->did_exec = 0;
    p->swappable = 0;
    p->state = TASK_UNINTERRUPTIBLE;

    copy_flags(clone_flags, p);
    p->pid = get_pid(clone_flags);

    p->run_list.next = NULL;
    p->run_list.prev = NULL;

    if ((clone_flags & CLONE_VFORK) || !(clone_flags & CLONE_PARENT))
    {
        p->p_opptr = current;
        if (!(p->ptrace & PT_PTRACED))
            p->p_pptr = current;
    }
    p->p_cptr = NULL;
    init_waitqueue_head(&p->wait_chldexit);
    p->vfork_sem = NULL;
    spin_lock_init(&p->alloc_lock);

    p->sigpending = 0;
    init_sigpending(&p->pending);

    p->it_real_value = p->it_virt_value = p->it_prof_value = 0;
    p->it_real_incr = p->it_virt_incr = p->it_prof_incr = 0;
    init_timer(&p->real_timer);
    p->real_timer.data = (unsigned long)p;

    p->leader = 0;
    p->tty_old_pgrp = 0;
    p->times.tms_utime = p->times.tms_stime = 0;
    p->times.tms_cutime = p->times.tms_cstime = 0;
#ifdef CONFIG_SMP
    {
        int i;
        p->has_cpu = 0;
        p->processor = current->processor;

        for (i = 0; i < smp_num_cpus; i++)
            p->per_cpu_utime[i] = p->per_cpu_stime[i] = 0;
        spin_lock_init(&p->sigmask_lock);
    }
#endif /* SMP-specific */
    p->lock_depth = -1;
    p->start_time = jiffies;

    retval = -ENOMEM;
    /* copy or share each resource, depending on clone_flags */
    if (copy_files(clone_flags, p))
        goto bad_fork_cleanup;
    if (copy_fs(clone_flags, p))
        goto bad_fork_cleanup_files;
    if (copy_sighand(clone_flags, p))
        goto bad_fork_cleanup_fs;
    if (copy_mm(clone_flags, p))
        goto bad_fork_cleanup_sighand;
    retval = copy_thread(0, clone_flags, stack_start, stack_size, p, regs);
    if (retval)
        goto bad_fork_cleanup_sighand;
    p->semundo = NULL;

    p->parent_exec_id = p->self_exec_id;

    p->swappable = 1;
    p->exit_signal = clone_flags & CSIGNAL;
    p->pdeath_signal = 0;
    /* split the parent's remaining time slice between parent and child */
    p->counter = (current->counter + 1) >> 1;
    current->counter >>= 1;
    if (!current->counter)
        current->need_resched = 1;

    retval = p->pid;
    p->tgid = retval;
    INIT_LIST_HEAD(&p->thread_group);
    write_lock_irq(&tasklist_lock);
    if (clone_flags & CLONE_THREAD) {
        p->tgid = current->tgid;
        list_add(&p->thread_group, &current->thread_group);
    }
    SET_LINKS(p);
    hash_pid(p);
    nr_threads++;
    write_unlock_irq(&tasklist_lock);
    if (p->ptrace & PT_PTRACED)
        send_sig(SIGSTOP, p, 1);
    wake_up_process(p); /* wake the child; at this point its creation is complete */
    ++total_forks;

    /* the bad_fork_* error-cleanup labels are not shown in this excerpt */
fork_out:
    /* for vfork, the parent sleeps here until the child releases the address space */
    if ((clone_flags & CLONE_VFORK) && (retval > 0))
        down(&sem);
    return retval;
}
struct user_struct {
    atomic_t __count;
    atomic_t processes;
    atomic_t files;

    struct user_struct *next, **pprev;
    uid_t uid;
};
struct exec_domain
{
    const char *name;
    handler_t handler;
    unsigned char pers_low;
    unsigned char pers_high;
    unsigned long *signal_map;
    unsigned long *signal_invmap;
    struct map_segment *err_map;
    struct map_segment *socktype_map;
    struct map_segment *sockopt_map;
    struct map_segment *af_map;
    struct module *module;
    struct exec_domain *next;
};
static int copy_mm(unsigned long clone_flags, struct task_struct * tsk)
{
    struct mm_struct * mm, *oldmm;
    int retval;

    tsk->min_flt = tsk->maj_flt = 0;
    tsk->cmin_flt = tsk->cmaj_flt = 0;
    tsk->nswap = tsk->cnswap = 0;

    tsk->mm = NULL;
    tsk->active_mm = NULL;

    oldmm = current->mm;
    if (!oldmm)
        return 0;    /* kernel thread: no user address space to copy */

    if (clone_flags & CLONE_VM) {
        /* share the parent's mm: only the reference count changes */
        atomic_inc(&oldmm->mm_users);
        mm = oldmm;
        goto good_mm;
    }

    retval = -ENOMEM;
    mm = allocate_mm();
    if (!mm)
        goto fail_nomem;

    memcpy(mm, oldmm, sizeof(*mm));
    if (!mm_init(mm))
        goto fail_nomem;

    down(&oldmm->mmap_sem);
    retval = dup_mmap(mm);
    up(&oldmm->mmap_sem);

    if (retval)
        goto free_pt;

    copy_segments(tsk, mm);

    if (init_new_context(tsk, mm))
        goto free_pt;

good_mm:
    tsk->mm = mm;
    tsk->active_mm = mm;
    return 0;
free_pt:
    mmput(mm);
fail_nomem:
    return retval;
}
static inline int dup_mmap(struct mm_struct * mm) {
    struct vm_area_struct * mpnt, * tmp, **pprev;
    int retval;

    flush_cache_mm(current->mm);
    mm->locked_vm = 0;
    mm->mmap = NULL;
    mm->mmap_avl = NULL;
    mm->mmap_cache = NULL;
    mm->map_count = 0;
    mm->cpu_vm_mask = 0;
    mm->swap_cnt = 0;
    mm->swap_address = 0;
    pprev = &mm->mmap;

    /* walk the parent's list of virtual memory areas */
    for (mpnt = current->mm->mmap; mpnt; mpnt = mpnt->vm_next) {
        struct file * file;

        retval = -ENOMEM;
        if (mpnt->vm_flags & VM_DONTCOPY)
            continue;
        tmp = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL);
        if (!tmp)
            goto fail_nomem;
        *tmp = *mpnt;
        tmp->vm_flags &= ~VM_LOCKED;
        tmp->vm_mm = mm;
        mm->map_count++;
        tmp->vm_next = NULL;
        file = tmp->vm_file;
        if (file) {
            struct inode *inode = file->f_dentry->d_inode;
            get_file(file);
            if (tmp->vm_flags & VM_DENYWRITE)
                atomic_dec(&inode->i_writecount);

            /* insert tmp into the share list, right after mpnt */
            spin_lock(&inode->i_mapping->i_shared_lock);
            if ((tmp->vm_next_share = mpnt->vm_next_share) != NULL)
                mpnt->vm_next_share->vm_pprev_share = &tmp->vm_next_share;
            mpnt->vm_next_share = tmp;
            tmp->vm_pprev_share = &mpnt->vm_next_share;
            spin_unlock(&inode->i_mapping->i_shared_lock);
        }

        /* copy the page table entries for this area */
        retval = copy_page_range(mm, current->mm, tmp);
        if (!retval && tmp->vm_ops && tmp->vm_ops->open)
            tmp->vm_ops->open(tmp);

        *pprev = tmp;
        pprev = &tmp->vm_next;

        if (retval)
            goto fail_nomem;
    }
    retval = 0;
    if (mm->map_count >= AVL_MIN_MAP_COUNT)
        build_mmap_avl(mm);
fail_nomem:
    flush_tlb_mm(current->mm);
    return retval;
}
int copy_page_range(struct mm_struct * dst, struct mm_struct * src, struct vm_area_struct * vma) {
    pgd_t * src_pgd, * dst_pgd;
    unsigned long address = vma->vm_start;
    unsigned long end = vma->vm_end;
    /* private, writable mappings are the ones handled by copy-on-write */
    unsigned long cow = (vma->vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;

    src_pgd = pgd_offset(src, address) - 1;
    dst_pgd = pgd_offset(dst, address) - 1;

    for (;;) {
        pmd_t * src_pmd, * dst_pmd;

        src_pgd++;
        dst_pgd++;

        if (pgd_none(*src_pgd))
            goto skip_copy_pmd_range;
        if (pgd_bad(*src_pgd)) {
            pgd_ERROR(*src_pgd);
            pgd_clear(src_pgd);
skip_copy_pmd_range:
            address = (address + PGDIR_SIZE) & PGDIR_MASK;
            if (!address || (address >= end))
                goto out;
            continue;
        }

        if (pgd_none(*dst_pgd)) {
            if (!pmd_alloc(dst_pgd, 0))
                goto nomem;
        }

        src_pmd = pmd_offset(src_pgd, address);
        dst_pmd = pmd_offset(dst_pgd, address);

        do {
            pte_t * src_pte, * dst_pte;

            if (pmd_none(*src_pmd))
                goto skip_copy_pte_range;
            if (pmd_bad(*src_pmd)) {
                pmd_ERROR(*src_pmd);
                pmd_clear(src_pmd);
skip_copy_pte_range:
                address = (address + PMD_SIZE) & PMD_MASK;
                if (address >= end)
                    goto out;
                goto cont_copy_pmd_range;
            }
            if (pmd_none(*dst_pmd))
            {
                if (!pte_alloc(dst_pmd, 0))
                    goto nomem;
            }

            src_pte = pte_offset(src_pmd, address);
            dst_pte = pte_offset(dst_pmd, address);

            do {
                pte_t pte = *src_pte;
                struct page * ptepage;

                if (pte_none(pte))
                    goto cont_copy_pte_range_noset;
                if (!pte_present(pte)) {
                    swap_duplicate(pte_to_swp_entry(pte));
                    goto cont_copy_pte_range;
                }
                ptepage = pte_page(pte);
                if ((!VALID_PAGE(ptepage)) || PageReserved(ptepage))
                    goto cont_copy_pte_range;

                /* copy-on-write: write-protect the page in parent and child */
                if (cow) {
                    ptep_set_wrprotect(src_pte);
                    pte = *src_pte;
                }

                /* a shared mapping is marked clean in the child */
                if (vma->vm_flags & VM_SHARED)
                    pte = pte_mkclean(pte);
                pte = pte_mkold(pte);
                get_page(ptepage);
cont_copy_pte_range:
                set_pte(dst_pte, pte);
cont_copy_pte_range_noset:
                address += PAGE_SIZE;
                if (address >= end)
                    goto out;
                src_pte++;
                dst_pte++;
            } while ((unsigned long)src_pte & PTE_TABLE_MASK);
cont_copy_pmd_range:
            src_pmd++;
            dst_pmd++;
        } while ((unsigned long)src_pmd & PMD_TABLE_MASK);
    }
out:
    return 0;
nomem:
    return -ENOMEM;
}
int copy_thread(int nr, unsigned long clone_flags, unsigned long esp,
                unsigned long unused,
                struct task_struct * p, struct pt_regs * regs)
{
    struct pt_regs * childregs;

    childregs = ((struct pt_regs *) (THREAD_SIZE + (unsigned long) p)) - 1;
    struct_cpy(childregs, regs);
    childregs->eax = 0;
    childregs->esp = esp;

    p->thread.esp = (unsigned long) childregs;
    p->thread.esp0 = (unsigned long) (childregs+1);

    p->thread.eip = (unsigned long) ret_from_fork;

    savesegment(fs,p->thread.fs);
    savesegment(gs,p->thread.gs);

    unlazy_fpu(current);
    struct_cpy(&p->thread.i387, &current->thread.i387);

    return 0;
}