
[#WT-2608] update-checkpoint-btree hangs in WT-2544

https://jira.mongodb.org/si/jira.issueviews:issue-html/WT-260...

[WT-2608] update-checkpoint-btree hangs in WT-2544
Created: 03/May/16  Updated: 26/May/16  Resolved: 26/May/16

Status: Resolved
Project: WiredTiger
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: David Hows
Assignee: Sue LoVerso
Resolution: Gone away
Votes:
Labels: None

Issue Links:
Related
is related to WT-2544 Fix eviction statistics when clear is...

Operating System: ALL
# Replies: 10
Participants: David Hows, Michael Cahill, Sue LoVerso
Days since reply: 24 weeks ago
Date of 1st Reply: 03/May/16 3:14 PM
Last commenter: Sue LoVerso
Last comment by Customer: true

Resolved

Description
The wiredtiger-perf-checkpoint job is currently hung on the update-checkpoint-btree task.
Job is here
The trace is below:

(gdb) thread apply all bt

Thread 9 (Thread 0x7f52ce7ff700 (LWP 16225)):


#0 0x00007f52cf768f4d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f52cf764d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2 0x00007f52cf764c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x000000000041fde4 in __wt_spin_lock (session=0x7f52ce89bd00, t=<optimized out>) at ../src/include/mutex.i:159
#4 __evict_clear_walk (session=0x7f52ce89bd00, session=0x7f52ce89bd00, is_locked=false) at ../src/evict/evict_lru.c:731
#5 __evict_clear_walks (session=session@entry=0x7f52ce89bd00) at ../src/evict/evict_lru.c:759
#6 0x00000000004215f3 in __evict_pass (is_server=true, session=0x7f52ce89bd00) at ../src/evict/evict_lru.c:590
#7 __evict_server (arg=0x7f52ce89bd00) at ../src/evict/evict_lru.c:198
#8 0x00007f52cf762dc5 in start_thread () from /lib64/libpthread.so.0
#9 0x00007f52cf48fc9d in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f52cdffe700 (LWP 16226)):


#0 0x00007f52cf766a82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000435c09 in __wt_cond_wait_signal (session=session@entry=0x7f52ce89c040, cond=0x7f52ce80f080, usecs=<optimized
#2 0x000000000041844c in __wt_cond_wait (usecs=<optimized out>, cond=<optimized out>, session=0x7f52ce89c040) at ../src/inc
#3 __sweep_server (arg=0x7f52ce89c040) at ../src/conn/conn_sweep.c:272
#4 0x00007f52cf762dc5 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f52cf48fc9d in clone () from /lib64/libc.so.6

Comments
Comment by Sue LoVerso [ 03/May/16 ]
FYI, this hang is from the wt-2544 branch, not on develop. I will look into it.


Comment by Sue LoVerso [ 03/May/16 ]


I have found and pushed a fix to the branch for this deadlock. I aborted all the existing Jenkins perf tests and restarted them on the branch so that they run with the fix.
Comment by David Hows [ 03/May/16 ]
Thanks Sue.
Assuming the currently active run of perf passes, I'll close this.
Comment by David Hows [ 05/May/16 ]

It looks like this is also still hanging in the mongo-ycsb-develop job. That task is currently on WT-2544 and on the correct hashes, which should include your fix if I understand c
Mongo Hash c8b9c6bcd9c0fd74c37570727516956656eb1b65
WT Hash 5482e7014c6bed8995bd821db52fb5ac5abfa6e9
Comment by David Hows [ 05/May/16 ]
(gdb) thread apply all bt

Thread 29 (Thread 0x7f970029d700 (LWP 16311)):


#0 0x00007f970066ee91 in sigwait () from /lib64/libpthread.so.0
#1 0x0000000001516b92 in mongo::(anonymous namespace)::signalProcessingThread () at src/mongo/util/signal_handlers.cpp:170
#2 0x0000000001e275b0 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimized out>) at ../../../../../
#3 0x00007f9700667dc5 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f9700394c9d in clone () from /lib64/libc.so.6

Thread 28 (Thread 0x7f96ffa9c700 (LWP 16312)):


#0 0x00007f970066ba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00000000014b5379 in __gthread_cond_timedwait (__abs_timeout=0x7f96ffa9b930, __mutex=<optimized out>, __cond=0x33b9d68)
#2 __wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (__atime=..., __lock=<synthetic pointer>,
#3 wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (__atime=..., __lock=<synthetic pointer>, this=0x
#4 wait_for<long, std::ratio<1l, 1000l> > (__rtime=..., __lock=<synthetic pointer>, this=0x33b9d68) at /opt/mongodbtoolchai
#5 operator() (__closure=0x32b0108) at src/mongo/util/background_thread_clock_source.cpp:73
#6 _M_invoke<> (this=0x32b0108) at /opt/mongodbtoolchain/v2/include/c++/5.3.0/functional:1531
#7 operator() (this=0x32b0108) at /opt/mongodbtoolchain/v2/include/c++/5.3.0/functional:1520
#8 std::thread::_Impl<std::_Bind_simple<mongo::BackgroundThreadClockSource::_startTimerThread()::<lambda()>()> >::_M_run(vo

Comment by David Hows [ 05/May/16 ]


Thread 14 (Thread 0x7f96f8a8e700 (LWP 16326)):
#0 0x00007f970066ba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000d090f6 in __gthread_cond_timedwait (__abs_timeout=0x7f96f8a8d840, __mutex=<optimized out>, __cond=0x32b6310)
#2 __wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (__atime=..., __lock=<synthetic pointer>,
#3 wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (__atime=..., __lock=<synthetic pointer>, this=0x
#4 mongo::FTDCController::doLoop (this=0x32b62c0) at src/mongo/db/ftdc/controller.cpp:175
#5 0x0000000001e275b0 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimized out>) at ../../../../../
#6 0x00007f9700667dc5 in start_thread () from /lib64/libpthread.so.0
#7 0x00007f9700394c9d in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7f97013b1700 (LWP 16362)):


#0 0x00007f970066e5bb in recv () from /lib64/libpthread.so.0
#1 0x00000000014da9cc in mongo::Socket::_recv (this=this@entry=0x318e500, buf=buf@entry=0x7f97013b0230 "`\002;\001\227\177"
#2 0x00000000014daa11 in mongo::Socket::unsafe_recv (this=this@entry=0x318e500, buf=buf@entry=0x7f97013b0230 "`\002;\001\22
#3 0x00000000014daa6d in mongo::Socket::recv (this=0x318e500, buf=buf@entry=0x7f97013b0230 "`\002;\001\227\177", len=len@en
#4 0x00000000014d0739 in mongo::MessagingPort::recv (this=0x3166800, m=...) at src/mongo/util/net/message_port.cpp:142
#5 0x00000000014d331e in mongo::PortMessageServer::handleIncomingMsg (arg=<optimized out>) at src/mongo/util/net/message_se
#6 0x00007f9700667dc5 in start_thread () from /lib64/libpthread.so.0
#7 0x00007f9700394c9d in clone () from /lib64/libc.so.6

Comment by Sue LoVerso [ 05/May/16 ]

David Hows, thank you for those stacks! They point to a different deadlock than the one I fixed the other day. I see where it is coming from and will work on a fix tom
Comment by Michael Cahill [ 05/May/16 ]


A little more detail based on the call with Sue LoVerso:


This thread is holding the evict_pass lock and waiting for the data handle lock:
select,__wt_sleep,__evict_walk,__evict_lru_walk,__evict_pass,__evict_worker,start_thread,clone
This thread is holding the data handle lock and waiting on the evict_pass lock:

__lll_lock_wait,_L_lock_791,pthread_mutex_lock,__wt_spin_lock,__evict_clear_walk,__evict_clear_all_walks,__evict_server,star
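
The two stacks above describe a classic lock-order inversion: each thread holds one lock and blocks waiting for the other. The following is a minimal, generic sketch of one common way to break such a cycle (try-lock with backoff). It is not WiredTiger code and is not the fix that was applied on the branch; the names pass_lock, dhandle_lock, lock_pair_with_backoff and unlock_pair are hypothetical stand-ins chosen for illustration.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical stand-ins for the two locks named in the comment above. */
static pthread_mutex_t pass_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t dhandle_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Acquire `first`, then try to acquire `second` without blocking.
 * If `second` is busy, release `first`, yield, and retry, so this
 * thread never sleeps while holding one lock and waiting for the other.
 * Returns 0 with both locks held, or an error (EBUSY after
 * max_attempts failed tries) with neither lock held.
 */
static int
lock_pair_with_backoff(
    pthread_mutex_t *first, pthread_mutex_t *second, int max_attempts)
{
    int attempt, ret;

    for (attempt = 0; attempt < max_attempts; attempt++) {
        if ((ret = pthread_mutex_lock(first)) != 0)
            return (ret);
        if ((ret = pthread_mutex_trylock(second)) == 0)
            return (0);                 /* Both locks are now held. */
        (void)pthread_mutex_unlock(first);
        if (ret != EBUSY)
            return (ret);               /* Unexpected error. */
        (void)sched_yield();            /* Let the other holder progress. */
    }
    return (EBUSY);
}

/* Release both locks in the reverse order of acquisition. */
static void
unlock_pair(pthread_mutex_t *first, pthread_mutex_t *second)
{
    (void)pthread_mutex_unlock(second);
    (void)pthread_mutex_unlock(first);
}
```

In this sketch, a thread in the worker's position would call lock_pair_with_backoff(&pass_lock, &dhandle_lock, n) and a thread in the server's position would call it with the arguments reversed. Because neither thread blocks while holding a lock, the wait cycle cannot form, although callers must handle the EBUSY case. The other standard remedy is to make every code path take the two locks in the same order.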
Comment by Sue LoVerso [ 05/May/16 ]

I have been thinking about this issue and walking through code all morning. I believe that this deadlock indicates that the simplistic approach in the current branch will not w
the clear code) that both assumes it is running as the server and that there is no populate going on at the time. I'll put more in the WT-2544 ticket.
Comment by Sue LoVerso [ 24/May/16 ]
I believe I have fixed the deadlock. I am going to point the perf tests to the branch to verify.
Generated at Wed Nov 09 19:40:04 UTC 2016 using JIRA 6.4.14#64029-sha1:ae256fe0fbb912241490ff1cecfb323ea0905ca5.
