Chinaunix首页 | 论坛 | 博客
  • 博客访问: 1315449
  • 博文数量: 554
  • 博客积分: 10425
  • 博客等级: 上将
  • 技术积分: 7555
  • 用 户 组: 普通用户
  • 注册时间: 2006-11-09 09:49
文章分类

全部博文(554)

文章存档

2012年(1)

2011年(1)

2009年(8)

2008年(544)

分类:

2008-04-07 18:59:31

问题

在 Solaris 系统上,我通常使用 Sun Studio dbx 工具解决 JVM 中的崩溃和挂起等问题。对于大多数此类问题,我们从客户那里接收到一个磁心存储文件。该文件使用由 JVM 构建的产品所生成。由于 JVM 构建的产品并不包含任何调试信息,所以 dbx 就不能显示 JVM 的对象细节,这使得调试和解密地址非常困难。

解决方案

dbx 调试器有一个非常有用的功能。使用这项功能我们可以“加载”和“使用”一个交替共享对象,而不是在进程运行中所使用的对象。

(dbx) loadobject -load alternate-lib

(dbx) loadobject -use alternate-lib

这项功能对于调试没有调试信息的进程或磁心存储文件非常有用。使用 -g 标志对我们所希望调试的库进行重新编译,然后将库加载到 dbx 调试会话中,这是可能的。

调试 JVM 挂起

在发生挂起的情况下,在装有 JDK 版本 1.5.0_08 的 Solaris x86 系统上客户面临一个硬挂起,甚至进程不响应 kill -3。客户向我们发送一个挂起进程的磁心存储文件。

对于具有调试信息的 dbx,我们使用 -g 标志重新构建 JDK 版本 1.5.0_08 的产品版本 libjvm.so,并将之加载,然后使用带有 dbx 的 .so。做了这样的更改,dbx 就能够获得使用 libjvm.so 生成的原本不带任何调试信息的磁心存储文件的所有调试信息。

在磁心存储文件上的 pstack 显示,VMThread 正试图获取 Threads_lock,主线程在 Threads_lock 上正处于等待状态,另一个线程 3992 也试图攫取 Threads_lock。那么这个锁看起来会引起死锁。下面给出了这个磁心存储文件。

-----------------  lwp# 1 / thread# 1  --------------------
 fef4009b lwp_park (0, 0, 0)
 fef3a909 cond_wait_queue (807c540, 807c528, 0, 0) + 3b
 fef3ae02 _cond_wait (807c540, 807c528) + 66
 fef3ae44 cond_wait (807c540, 807c528) + 21
 fe92fc64 __1cHMonitorEwait6Mil_i_ (807c4d0, 0, 0) + 2a4
 fecf526b __1cHThreadsKdestroy_vm6F_i_ (8073fb9, 0, 806dc14, fee4d33c, fee4a2a0, 10004) + 7b
 feb66ea3 jni_DestroyJavaVM (fee43560) + 123
 08051f8e main     (2, 807017c, 80432b4) + bfb
 080512fa ???????? (86, 8043468, 804346d, 8043494, 80434a8, 80434c6)
-----------------  lwp# 11 / thread# 11  --------------------
 fef4009b lwp_park (0, 0, 0)
 fef399bb slow_lock (fbc92000, 807c528, 0) + 3d
 fef39ab1 mutex_lock_impl (807c528, 0) + ec
 fef39bbd mutex_lock (807c528) + 1a
 fe8c9a00 __1cFMutexElock6MpnGThread__v_ (807c4d0, 8261480) + c0
 fe8cc1fe __1cFMutexElock6M_v_ (807c4d0) + 4e
 fe996295 __1cUSafepointSynchronizeFbegin6F_v_ (fee493fc, 8261480, fedee000, f8199b74, 3, f8199bbc) + 85
 fea659c4 __1cIVMThreadEloop6M_v_ (8261480) + 134
 fea66478 __1cIVMThreadDrun6M_v_ (8261480) + 88
 fecbe0f2 __1cG_start6Fpv_0_ (8261480) + d2
 fef3fd52 _thr_setup (fbc92000) + 4e
 fef40040 _lwp_start (fbc92000, 0, 0, f8199ff8, fef40040, fbc92000)
-----------------  lwp# 3992 / thread# 3992  --------------------
 fef4009b lwp_park (0, 0, 0)
 fef399bb slow_lock (6c173c00, 807c528, 0) + 3d
 fef39ab1 mutex_lock_impl (807c528, 0) + ec
 fef39bbd mutex_lock (807c528) + 1a
 fe8ea95e __1cFMutexbClock_without_safepoint_check6M_v_ (807c4d0) + 7e
 fe95c56e __1cUSafepointSynchronizeFblock6FpnKJavaThread__v_ (a27f780) + ee
 fecb6825 __1cGParkerEpark6Mix_v_ (a27f908, 0, d964b198, 45) + 1e5
 fed0963d Unsafe_Park (a27f840, 6807aba4, 0, d964b198, 45, 9480bcb8) + 1dd
 f876f2f9 ???????? (0, 0, a27c330, 0, 9ae69a8, 0)
 00000000 ???????? (fecf6330, fea09d40, fea03c80, fe964f40, fe91cc40, fe939c50)
 00000000 ???????? (fffffd2c, 3badd9be, f468fe16, e78b7254, fe1dbbcc, 0)
 00000000 ???????? ()

 

现在让我们使用 dbx 看一看 Threads_lock

(dbx) print -fx Threads_lock
Threads_lock = 0x807c4d0

(dbx) print -r *((Monitor*)0x807c4d0)
*((class Monitor *) 0x807c4d0) = {
    Monitor::Mutex::_lock_count      = 0
    Monitor::Mutex::_lock_event      = 0x807c500
    Monitor::Mutex::_suppress_signal = 0
    Monitor::Mutex::_owner           = 0x97b0978   <----------------
    Monitor::Mutex::_name            = 0xfed97a0f "Threads_lock"
    Monitor::Mutex::INVALID_THREAD   = 0
    Monitor::_event   = (nil)
    Monitor::_counter = 0
    Monitor::_waiters = 1
    Monitor::_tickets = 0
}

(dbx) print -r *((Thread*)0x97b0978)
*((class Thread *) 0x97b0978) = {
    Thread::ThreadShadow::_pending_exception = (nil)
    Thread::_exception_file                       = (nil)
    Thread::_exception_line                       = 0
    Thread::_real_malloc_address                  = 0x97b0d28
    Thread::_rawmonitor_list                      = (nil)
    Thread::_SR_lock                              = 0x9b7b890
    Thread::_vm_suspend_depth                     = 0
    Thread::_suspend_flags                        = 0
    Thread::_active_handles                       = 0x97b0b48
    Thread::_free_handle_block                    = (nil)
    Thread::_last_handle_mark                     = 0x65c92618
    Thread::_oops_do_parity                       = 0
    Thread::_highest_lock                         = 0x65c92cb8 "x^I{^I\xe8^K{^I\xb0\xd6\x82\xf4\xa0H\xbeu"
    Thread::_unboosted_priority                   = NoPriority
    Thread::_tlab                                 = {
        Thread::ThreadLocalAllocBuffer::_start               = 0x762476d8
        Thread::ThreadLocalAllocBuffer::_top                 = 0x7624c750
        Thread::ThreadLocalAllocBuffer::_end                 = 0x76257308
        Thread::ThreadLocalAllocBuffer::_size                = 14078U
        Thread::ThreadLocalAllocBuffer::_refill_waste_limit  = 219U
        Thread::ThreadLocalAllocBuffer::_number_of_refills   = 6U
        Thread::ThreadLocalAllocBuffer::_fast_refill_waste   = 0
        Thread::ThreadLocalAllocBuffer::_slow_refill_waste   = 202U
        Thread::ThreadLocalAllocBuffer::_gc_waste            = 0
        Thread::ThreadLocalAllocBuffer::_slow_allocations    = 0
        Thread::ThreadLocalAllocBuffer::_allocation_fraction = {
            Thread::ThreadLocalAllocBuffer::AdaptiveWeightedAverage::_average      = 0.006992618
            Thread::ThreadLocalAllocBuffer::AdaptiveWeightedAverage::_sample_count = 1U
            Thread::ThreadLocalAllocBuffer::AdaptiveWeightedAverage::_weight       = 35U
            Thread::ThreadLocalAllocBuffer::AdaptiveWeightedAverage::_last_sample  = 0.006992618
        }
        Thread::ThreadLocalAllocBuffer::_target_refills      = 50
        Thread::ThreadLocalAllocBuffer::_global_stats        = 136222936
    }
    Thread::_vm_operation_started_count           = 1
    Thread::_vm_operation_completed_count         = 1
    Thread::_current_pending_monitor              = (nil)
    Thread::_current_pending_monitor_is_from_java = 1
    Thread::_current_waiting_monitor              = (nil)
    Thread::_osthread                             = 0x97b0b00     <-------------
    Thread::_resource_area                        = 0x8c55f50
    Thread::_handle_area                          = 0x98dafc0
    Thread::_stack_base                           = 0x65c93000 ""
    Thread::_stack_size                           = 262144U
    Thread::_self_raw_id                          = 1731617792U
}

(dbx) print -r *((OSThread*)0x97b0b00)
*((class OSThread *) 0x97b0b00) = {
    OSThread::_start_proc                   = (nil)
    OSThread::_start_parm                   = (nil)
    OSThread::_state                        = RUNNABLE
    OSThread::_interrupted                  = 0
    OSThread::_thread_id                    = 11982U
    OSThread::_lwp_id                       = 11982U
    OSThread::_caller_sigmask               = {
        OSThread::__sigbits = (4U, 0, 0, 0)
    }
    OSThread::_vm_created_thread            = 1
    OSThread::_current_callback             = (nil)
    OSThread::_current_callback_lock        = (nil)
    OSThread::_interrupt_event              = 0x97b07d8
    OSThread::_saved_interrupt_thread_state = _thread_new
}

 

在前面的的文件显示,Threads_lock 的所有者是具有 ID 11982 的线程。现在我们看一下这个线程的堆栈跟踪。

(dbx) thread t@11982
t@11982 (l@11982) stopped in __lwp_park at 0xfef4009b
0xfef4009b: __lwp_park+0x000b:  jae      __lwp_park+0x19        [ 0xfef400a9, .+0xe ]
(dbx) where
current thread: t@11982
=>[1] __lwp_park(0x0, 0x0), at 0xfef4009b
  [2] mutex_lock_queue(0x67366400, 0x0, 0x88a0598, 0x0), at 0xfef3911f
  [3] slow_lock(0x67366400, 0x88a0598, 0x0), at 0xfef399bb
  [4] mutex_lock_impl(0x88a0598, 0x0), at 0xfef39ab1
  [5] __mutex_lock(0x88a0598), at 0xfef39bbd
  [6] Mutex::wait_for_lock_blocking_implementation(0xa27f908, 0x97b0978), at 0xfe9d93ba
  [7] Mutex::lock(0xa27f908, 0x97b0978), at 0xfe8c99ea
  [8] Mutex::lock(0xa27f908), at 0xfe8cc1fe
  [9] Parker::unpark(0xa27f908), at 0xfecb69bb
  [10] os::interrupt(0xa27f780), at 0xfecbf6bb
  [11] Thread::interrupt(0xa27f780), at 0xfecf2ed9
  [12] JVM_Interrupt(0x97b0a38, 0x65c923fc, 0x65c923d0, 0x0, 0x65c923fc, 0xf4873020, 0x0, 0xf482f600, 0x65c923fc, 0x65c92424,
 0xf8202b6b, 0x0, 0xf8206399, 0x9593fdc0, 0x1, 0x0, 0x65c92400, 0xf482db6b, 0x65c92438, 0xf4873020), at 0xfeb92255
  [13] 0xf820843a(0x0, 0xf8206399, 0x9593fdc0, 0x1, 0x0, 0x65c92400), at 0xf820843a
  [14] 0xf8202b6b(0x0, 0x0, 0x97f59c18, 0x9593fdc0, 0x65c9243c, 0xf5746e95), at 0xf8202b6b
  [15] 0xf8202b6b(0x967a0c40, 0x9, 0x968f6eb8, 0x65c92464, 0xf57472be, 0x65c92498), at 0xf8202b6b
  [16] 0xf8202b6b(0x0, 0x968f6eb8, 0x967a0c40, 0x65c9249c, 0xf5845a0c, 0x65c924c4), at 0xf8202b6b
  [17] 0xf8202b6b(0x1b58, 0x967a0c40, 0x65c924c8, 0xf5845674, 0x65c92524, 0xf58a1f78), at 0xf8202b6b
  [18] 0xf8202b6b(0x7624c168, 0x7624c138, 0x7624c108, 0x7624c0d8, 0x7624bee8, 0x7624bd00), at 0xf8202b6b

 

线程正试图获取 Monitor 0xa27f908。

现在,让我们看一下 Monitor。

(dbx) print -r *((Monitor*)0xa27f908)
*((class Monitor *) 0xa27f908) = {
    Monitor::Mutex::_lock_count      = 0
    Monitor::Mutex::_lock_event      = 0x88a0570  <-----------
    Monitor::Mutex::_suppress_signal = 0
    Monitor::Mutex::_owner           = (nil)
    Monitor::Mutex::_name            = 0xfed97c99 "Park Semaphore"
    Monitor::Mutex::INVALID_THREAD   = 0
    Monitor::_event   = (nil)
    Monitor::_counter = 0
    Monitor::_waiters = 0
    Monitor::_tickets = 0
}
(dbx) print -r *((Event*)0x88a0570)
More than one identifier 'Event'.
Select one of the following:
 0) Cancel
 1) class os::Solaris::Event
 2) class Solaris::Event
> 1
*((class os::Solaris::Event *) 0x88a0570) = {
    Event::_count   = 0
    Event::_nParked = 0
    Event::CachePad = (-6.7529205832009e-205, -6.6220952168053e-205, -1.2786650622506e-204, -1.9668306807887e-204)
    Event::_mutex   = (
    {
        Event::_lwp_mutex::flags = {
            Event::_lwp_mutex::flag1        = 4U
            Event::_lwp_mutex::flag2        = '\0'
            Event::_lwp_mutex::ceiling      = '\0'
            Event::_lwp_mutex::mbcp_type_un = {
                Event::_lwp_mutex::bcptype      = 0
                Event::_lwp_mutex::mtype_rcount = {
                    Event::_lwp_mutex::count_type1 = '\0'
                    Event::_lwp_mutex::count_type2 = '\0'
                }
            }
            Event::_lwp_mutex::magic        = 19800U
        }
        Event::_lwp_mutex::lock  = {
            Event::_lwp_mutex::lock64  = {
                Event::_lwp_mutex::pad = ""
            }
            Event::_lwp_mutex::lock32  = {
                Event::_lwp_mutex::ownerpid = 0
                Event::_lwp_mutex::lockword = 16842752U
            }
            Event::_lwp_mutex::owner64 = 72339069014638592ULL
        }
        Event::_lwp_mutex::data  = 1813462016ULL
    }

)
    Event::_cond    = (
    {
        Event::_lwp_cond::flags = {
            Event::_lwp_cond::flag  = ""
            Event::_lwp_cond::type  = 0
            Event::_lwp_cond::magic = 17238U
        }
        Event::_lwp_cond::data  = 0
    }

)
    Event::FreeNext = (nil)
    Event::Immortal = 0
}

 

这里,_lwp_mutex->data 是 0x6C173C00。它是 ulwp_t) 线程的所有者。

接下来,让我们看一下该线程。

(dbx) x 0x6C173C00/30X
0x6c173c00:      0x6c173c00 0x00000000 0x00000000 0x00000000
0x6c173c10:      0x00000000 0x00000000 0x00000000 0x00000000
0x6c173c20:      0x00000000 0x00000000 0x00000000 0xfef6b740
0x6c173c30:      0x00000000 0x00000000 0x6c174000 0x6c173800
0x6c173c40:      0x00000000 0x00000000 0x00000000 0x6803b000
0x6c173c50:      0x00040000 0x00000000 0x6807b000 0x00040000
0x6c173c60:      0x6803b000 0x00040000 0x00000000 0x00000398
0x6c173c70:      0x00000f98 0x0000007f

 

线程 0x00000f98 ( 是 3992) 是持有 Event::mutex 的线程 ID。

现在,让我们看一下线程 3992。

(dbx) thread t@3992
t@3992 (l@3992) stopped in __lwp_park at 0xfef4009b
0xfef4009b: __lwp_park+0x000b:  jae      __lwp_park+0x19        [ 0xfef400a9, .+0xe ]
(dbx) where
current thread: t@3992
=>[1] __lwp_park(0x0, 0x0), at 0xfef4009b
  [2] mutex_lock_queue(0x6c173c00, 0x0, 0x807c528, 0x0), at 0xfef3911f
  [3] slow_lock(0x6c173c00, 0x807c528, 0x0), at 0xfef399bb
  [4] mutex_lock_impl(0x807c528, 0x0), at 0xfef39ab1
  [5] __mutex_lock(0x807c528), at 0xfef39bbd
  [6] Mutex::lock_without_safepoint_check(0x807c4d0), at 0xfe8ea95e
  [7] SafepointSynchronize::block(0xa27f780), at 0xfe95c56e
  [8] Parker::park(0xa27f908, 0x0, 0xd964b198, 0x45), at 0xfecb6825
  [9] Unsafe_Park(0xa27f840, 0x6807aba4, 0x0, 0xd964b198, 0x45, 0x9480bcb8), at 0xfed0963d

 

它正等待锁定 Threads_lock

因此线程 11982 和 3992 发生了死锁:

  • 线程 11982 持有 Thread_lock 并等待获取 Monitor 0xa27f908。
  • 线程 3992 持有在 0xa27f908 上的锁并等待获取 Threads_lock

这个问题在 Java SE 5.0 更新 15 和 Java SE 6 中使用 CR 6601725 已经得到了修复。

以上文章转自于 : http://developers.sun.com.cn/

 

阅读(690) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~