Category: LINUX

2010-11-26 14:42:57

Q: What architectures support CPU hotplug?

A: As of 2.6.14, the following architectures support CPU hotplug.

i386 (Intel), ppc, ppc64, parisc, s390, ia64 and x86_64

Q: How do I test whether hotplug is supported on a newly built kernel?
A: You should now notice an entry in sysfs.

You should see entries for all present CPUs. The following is an example
on an 8-way system.

         # pwd
         /sys/devices/system/cpu
         # ls -l
         total 0
         drwxr-xr-x   10 root root 0 Sep 19 07:44 .
         drwxr-xr-x   13 root root 0 Sep 19 07:45 ..
         drwxr-xr-x   3 root root 0 Sep 19 07:44 cpu0
         drwxr-xr-x   3 root root 0 Sep 19 07:44 cpu1
         drwxr-xr-x   3 root root 0 Sep 19 07:44 cpu2
         drwxr-xr-x   3 root root 0 Sep 19 07:44 cpu3
         drwxr-xr-x   3 root root 0 Sep 19 07:44 cpu4
         drwxr-xr-x   3 root root 0 Sep 19 07:44 cpu5
         drwxr-xr-x   3 root root 0 Sep 19 07:44 cpu6
         drwxr-xr-x   3 root root 0 Sep 19 07:48 cpu7

Under each directory you will find an "online" file, which is the control
file used to logically online or offline a processor.
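
As a rough sketch of how the per-CPU control files could be inspected from a script, the function below walks the cpuN directories and prints the content of each "online" file. The function name list_cpu_state and its optional ROOT argument are made up for illustration; ROOT only exists so the function can be tried against a copy of the tree instead of the real sysfs.

```shell
#!/bin/sh
# list_cpu_state [ROOT] (hypothetical helper): print every cpuN directory
# under ROOT together with the value of its "online" control file.
# ROOT defaults to the sysfs directory shown in the listing above.
list_cpu_state() {
    root=${1:-/sys/devices/system/cpu}
    for dir in "$root"/cpu[0-9]*; do
        # Skip the unexpanded pattern when no cpuN directory exists.
        [ -d "$dir" ] || continue
        name=${dir##*/}
        if [ -r "$dir/online" ]; then
            echo "$name online=$(cat "$dir/online")"
        else
            # Some CPUs expose no control file at all.
            echo "$name online file missing"
        fi
    done
}
```

Calling list_cpu_state with no argument reads the live system. Directories that are not named cpu followed by a number are ignored by the glob.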

Q: Does hot-add/hot-remove refer to physical add/remove of CPUs?
A: The terms hot-add/remove are not used very consistently in the code.
CONFIG_HOTPLUG_CPU enables logical online/offline capability in the kernel.
To support physical addition/removal, one would need some BIOS hooks, and
the platform should have something like the attention button used in PCI hotplug.
CONFIG_ACPI_HOTPLUG_CPU enables ACPI support for physical add/remove of CPUs.

Q: How do I logically offline a CPU?
A: Do the following.

         # echo 0 > /sys/devices/system/cpu/cpuX/online

Once the logical offline is successful, check

         # cat /proc/interrupts

You should no longer see the CPU that you removed. The online file also reports
the state as 0 when a CPU is offline and 1 when it is online.

         # To display the current cpu state.
         # cat /sys/devices/system/cpu/cpuX/online
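
The /proc/interrupts check can also be scripted. The sketch below relies on the header line of that file naming one column per online CPU (CPU0, CPU1, ...). The function name cpu_in_interrupts and its optional FILE argument are hypothetical; FILE is there only so a saved copy can be checked.

```shell
#!/bin/sh
# cpu_in_interrupts CPU [FILE] (hypothetical helper): succeed when the
# header line of FILE contains a column for the given CPU number.
# FILE defaults to /proc/interrupts.
cpu_in_interrupts() {
    cpu=$1
    file=${2:-/proc/interrupts}
    # Only the first line is needed; it lists the per-CPU columns.
    read -r header < "$file" || return 2
    for col in $header; do
        [ "$col" = "CPU$cpu" ] && return 0
    done
    return 1
}
```

For example, after offlining cpu3, "cpu_in_interrupts 3" would be expected to fail, while it succeeds for CPUs that are still online.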

Q: Why can't I remove CPU0 on some systems?
A: Some architectures may have some special dependency on a certain CPU.

For example, on IA64 platforms the platform can send interrupts to the OS,
known as Corrected Platform Error Interrupts (CPEI). Current ACPI
specifications provide no way to change the target CPU. Hence, if the
current ACPI version doesn't support such redirection, we make that CPU
non-removable.

In such cases you will also notice that the online file is missing under cpu0.

Q: How do I find out if a particular CPU is not removable?
A: Depending on the implementation, some architectures may show this by the
absence of the "online" file. This is done if it can be determined ahead of
time that this CPU cannot be removed.

In some situations, this can be a run time check, i.e. if you try to remove the
last CPU, this will not be permitted. You can find such failures by
investigating the return value of the "echo" command.
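
A minimal sketch of both checks, assuming a refused offline request makes the write to the "online" file fail so that echo exits non-zero. The function name try_offline, its optional ROOT argument and the exit codes 1 and 2 are choices made here for illustration.

```shell
#!/bin/sh
# try_offline CPU [ROOT] (hypothetical helper): attempt a logical offline
# and report what happened. ROOT defaults to /sys/devices/system/cpu.
# Returns 0 on success, 1 when the write is refused, 2 when there is no
# "online" control file for this CPU.
try_offline() {
    cpu=$1
    file=${2:-/sys/devices/system/cpu}/cpu$cpu/online
    if [ ! -e "$file" ]; then
        echo "cpu$cpu: no online file, treated as not removable"
        return 2
    fi
    # The run time check: look at the exit status of echo.
    if echo 0 > "$file"; then
        echo "cpu$cpu: offline request accepted"
    else
        echo "cpu$cpu: offline request refused"
        return 1
    fi
}
```

Writing to the real control file needs root privileges, so on a live system the function has to be run as root.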

Q: What happens when a CPU is being logically offlined?
A: The following happen, listed in no particular order :-)

- A notification is sent to in-kernel registered modules by sending an event
   CPU_DOWN_PREPARE or CPU_DOWN_PREPARE_FROZEN, depending on whether or not the
   CPU is being offlined while tasks are frozen due to a suspend operation in
   progress.
- All processes are migrated away from this outgoing CPU to new CPUs.
   The new CPU is chosen from each process' current cpuset, which may be
   a subset of all online CPUs.
- All interrupts targeted to this CPU are migrated to a new CPU.
- Timers, bottom halves and tasklets are also migrated to a new CPU.
- Once all services are migrated, the kernel calls an arch specific routine
   __cpu_disable() to perform arch specific cleanup.
- Once this is successful, an event for successful cleanup is sent:
   CPU_DEAD (or CPU_DEAD_FROZEN if tasks are frozen due to a suspend while the
   CPU is being offlined).

   "It is expected that each service cleans up when the CPU_DOWN_PREPARE
   notifier is called; when CPU_DEAD is called, it is expected that there is
   nothing running on behalf of the CPU that was offlined."

Q: If I have some kernel code that needs to be aware of CPU arrival and
   departure, how do I arrange for proper notification?
A: This is what you would need in your kernel code to receive notifications.

         #include <linux/cpu.h>
         static int __cpuinit foobar_cpu_callback(struct notifier_block *nfb,
                                                   unsigned long action, void *hcpu)
         {
                   unsigned int cpu = (unsigned long)hcpu;

                   switch (action) {
                   case CPU_ONLINE:
                   case CPU_ONLINE_FROZEN:
                             foobar_online_action(cpu);
                             break;
                   case CPU_DEAD:
                   case CPU_DEAD_FROZEN:
                             foobar_dead_action(cpu);
                             break;
                   }
                   return NOTIFY_OK;
         }

         static struct notifier_block __cpuinitdata foobar_cpu_notifier =
         {
             .notifier_call = foobar_cpu_callback,
         };

You need to call register_cpu_notifier() from your init function.
Init functions could be of two types:
1. early init (init function called when only the boot processor is online).
2. late init (init function called _after_ all the CPUs are online).

For the first case, you should add the following to your init function

         register_cpu_notifier(&foobar_cpu_notifier);

For the second case, you should add the following to your init function

         register_hotcpu_notifier(&foobar_cpu_notifier);

You can fail PREPARE notifiers if something goes wrong while preparing resources.
This stops the activity and sends a subsequent CANCELED event back.

CPU_DEAD should not be failed; it is just a goodness indication, but bad
things will happen if a notifier in the path sends a BAD notify code.

Q: I don't see my action being called for all CPUs already up and running?
A: Yes, CPU notifiers are called only when new CPUs are onlined or offlined.
   If you need to perform some action for each CPU already in the system,
   invoke the callback yourself for each online CPU:

         for_each_online_cpu(i) {
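                   /* The callback takes the CPU number packed into a void *. */
                   foobar_cpu_callback(&foobar_cpu_notifier, CPU_UP_PREPARE,
                                       (void *)(unsigned long)i);
                   foobar_cpu_callback(&foobar_cpu_notifier, CPU_ONLINE,
                                       (void *)(unsigned long)i);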
         }

Q: If I would like to develop CPU hotplug support for a new architecture,
   what do I need at a minimum?
A: The following are required for the CPU hotplug infrastructure to work
   correctly.

     - Make sure you have an entry in Kconfig to enable CONFIG_HOTPLUG_CPU
     - __cpu_up()         - Arch interface to bring up a CPU
     - __cpu_disable()   - Arch interface to shut down a CPU; no more interrupts
                           can be handled by the kernel after the routine
                           returns. Local APIC timers etc. are shut down
                           as well.
     - __cpu_die()       - This is supposed to ensure the death of the CPU.
                           Look at some example code in other architectures
                           that implement CPU hotplug. The processor is taken
                           down from the idle() loop for that specific
                           architecture. __cpu_die() typically waits for some
                           per_cpu state to be set, to be positively sure that
                           the processor dead routine has been called.

Q: How do I ensure that a particular CPU is not removed while some
   work specific to this CPU is in progress?
A: First switch the current thread context to the preferred CPU.

         int my_func_on_cpu(int cpu)
         {
                   cpumask_t saved_mask, new_mask = CPU_MASK_NONE;
                   int curr_cpu, err = 0;

                   saved_mask = current->cpus_allowed;
                   cpu_set(cpu, new_mask);
                   err = set_cpus_allowed(current, new_mask);

                   if (err)
                             return err;

                   /*
                     * If we got scheduled out just after the return from
                     * set_cpus_allowed() before running the work, this ensures
                     * we stay locked.
                     */
                   curr_cpu = get_cpu();

                   if (curr_cpu != cpu) {
                             err = -EAGAIN;
                             goto ret;
                   } else {
                             /*
                              * Do work: but can't sleep, since get_cpu() disables preempt
                              */
                   }
         ret:
                   put_cpu();
                   set_cpus_allowed(current, saved_mask);
                   return err;
         }


Q: How do we determine how many CPUs are available for hotplug?
A: ACPI does not currently define a clear way to obtain that
   information. Based on some input from Natalie of Unisys,
   the ACPI MADT (Multiple APIC Description Table) marks those possible
   CPUs in a system with disabled status.

   Andi implemented some simple heuristics that count the number of disabled
   CPUs in MADT as hotpluggable CPUs. In the case there are no disabled CPUs
   we assume 1/2 the number of CPUs currently present can be hotplugged.

   Caveat: Today's ACPI MADT can only provide 256 entries since the apicid field
   in MADT is only 8 bits.
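
The heuristic described above is simple arithmetic and can be sketched as follows. The function name estimate_hotpluggable is made up, both counts are supplied by the caller rather than read from the MADT, and rounding the half down is an assumption, since the text does not say how odd counts are handled.

```shell
#!/bin/sh
# estimate_hotpluggable PRESENT DISABLED (hypothetical helper): apply the
# heuristic from the text to two counts supplied by the caller.
#   PRESENT  - number of CPUs currently present
#   DISABLED - number of CPUs marked disabled in the MADT
estimate_hotpluggable() {
    present=$1
    disabled=$2
    if [ "$disabled" -gt 0 ]; then
        # Disabled MADT entries are counted as hotpluggable CPUs.
        echo "$disabled"
    else
        # No disabled entries: assume half of the present CPUs.
        # Integer division rounds down (an assumption made here).
        echo $((present / 2))
    fi
}
```

With 8 present CPUs and 4 disabled entries the estimate is the 4 disabled entries; with 8 present CPUs and no disabled entries it is half of 8.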

User Space Notification

Hotplug support for devices is common in Linux today. It is being used today to
support automatic configuration of network, usb and pci devices. A hotplug
event can be used to invoke an agent script to perform the configuration task.

You can add /etc/hotplug/cpu.agent to handle hotplug notifications with user
space scripts.

         #!/bin/bash
         # $Id: cpu.agent
         # Kernel hotplug params include:
         #ACTION=%s [online or offline]
         #DEVPATH=%s
         #
         cd /etc/hotplug
         . ./hotplug.functions

         case $ACTION in
                   online)
                             echo `date` ":cpu.agent" add cpu >> /tmp/hotplug.txt
                             ;;
                   offline)
                             echo `date` ":cpu.agent" remove cpu >>/tmp/hotplug.txt
                             ;;
                   *)
                             debug_mesg CPU $ACTION event not supported
                             exit 1
                             ;;
         esac