Psi 4.0b5 #457

Merged
boegel merged 21 commits into easybuilders:develop from wpoely86:psi
Oct 11, 2013

@wpoely86
Member

@wpoely86 wpoely86 commented Oct 6, 2013

This is the one to rule them all.

This PR replaces #443, #434, #439 and #433.

Tested on raichu, and everything works.

Depends on easybuilders/easybuild-easyblocks#270

boegel and others added 9 commits October 10, 2013 17:58
…into psi

* 'psi' of https://github.com/boegel/easybuild-easyconfigs:
  remove respecifying default max_fail_ratio of 0.5 in NWChem easyconfig
  really fix typo in patch file
  fix typo in patch file
  complete mpi patch file for PSI-4.0b5
  align PSI easyconfigs, correct filename of -mt PSI goalf easyconfig, add mpi PSI goalf easyconfig
@wpoely86
Member Author

All merged, ready to go! (At last!)

@boegel
Member

boegel commented Oct 11, 2013

I retested all of this on gengar, with PSI being installed on gengar scratch, which worked fine.
I also tested this on delcatty, with PSI being installed on gengar scratch (NFS mounted). In that case, I'm still seeing PSI tests occasionally hanging...

I don't think the problem is in the build however, so I won't hold this PR any longer.

Just to clarify things: when you tested on raichu, on which filesystem was the target installation dir for PSI?

boegel added a commit that referenced this pull request Oct 11, 2013
@boegel boegel merged commit 8a31b2d into easybuilders:develop Oct 11, 2013
@wpoely86 wpoely86 deleted the psi branch October 13, 2013 14:09
@wpoely86
Member Author

@boegel, I've tested on raichu, both on /tmp and on the scratch. In both cases everything worked fine. If you can reproduce those hangs, I'm still interested in the backtraces.

@boegel
Member

boegel commented Oct 18, 2013

I'm seeing the hang again, this time with PSI 4.0b4 on delcatty using delcatty's scratch (so not related to using NFS or not, indeed). This might be because the thread-pool patch is not applied to PSI 4.0b4 though, can you confirm?

It's hanging on:

user  35362  0.0  0.0   4396  1080 ?        S    Oct17   0:00                  \_ make tests TESTFLAGS=-u -q
user  35363  0.0  0.0   9356  1372 ?        S    Oct17   0:00                      \_ /bin/sh -c (cd tests; echo Running test suite...; make) || exit 1;
user  35364  0.0  0.0   9360   936 ?        S    Oct17   0:00                          \_ /bin/sh -c (cd tests; echo Running test suite...; make) || exit 1;
user  35365  0.0  0.0   4664  1348 ?        S    Oct17   0:00                              \_ make
user  42653  0.0  0.0   9356  1396 ?        S    Oct17   0:00                                  \_ /bin/sh -c make -C cisd-h2o-clpse; true
user  42654  0.0  0.0   4396  1092 ?        S    Oct17   0:00                                      \_ make -C cisd-h2o-clpse
user  42663  0.0  0.0  25900  2384 ?        S    Oct17   0:00                                          \_ perl ../runtest.pl /tmp/me/easybuild_build/PSI/4.0b4/ictce-4.1.13/psi4.0b4/tests/cisd-h2o-clpse/input.dat cisd-h2o-clpse.test false
user  42664  0.0  0.1 2671244 95908 ?       Sl   Oct17   0:11                                              \_ ../../bin/psi4 /tmp/me/easybuild_build/PSI/4.0b4/ictce-4.1.13/psi4.0b4/tests/cisd-h2o-clpse/input.dat output.dat

Here's the backtrace:

(gdb) bt
#0  0x00002aaaab46243c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000e6833d in psi::detci::tpool_queue_close(psi::detci::tpool*, int) ()
#2  0x0000000000e6330e in psi::detci::s3_block_vdiag(psi::detci::stringwr*, psi::detci::stringwr*, double**, double**, double*, int, int, int, int, int, int, int, int, double**, double*, double*, double*, int*, int*) ()
#3  0x0000000000e4403d in psi::detci::sigma_block(psi::detci::stringwr**, psi::detci::stringwr**, double**, double**, double*, double*, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int) ()
#4  0x0000000000e44400 in psi::detci::sigma_b(psi::detci::stringwr**, psi::detci::stringwr**, psi::detci::CIvect&, psi::detci::CIvect&, double*, double*, int, int) ()
#5  0x0000000000e42ce2 in psi::detci::sigma(psi::detci::stringwr**, psi::detci::stringwr**, psi::detci::CIvect&, psi::detci::CIvect&, double*, double*, int, int) ()
#6  0x0000000000e3ebb9 in psi::detci::sem_iter(psi::detci::CIvect&, psi::detci::stringwr**, psi::detci::stringwr**, double*, double, double, double, double, int, int, int, _IO_FILE*, int) ()
#7  0x0000000000e227dc in psi::detci::diag_h(psi::detci::stringwr**, psi::detci::stringwr**) ()
#8  0x0000000000e1e89e in psi::detci::detci(psi::Options&) ()
#9  0x0000000000748cbd in py_psi_detci() ()
#10 0x00000000007691fe in boost::python::objects::caller_py_function_impl<boost::python::detail::caller<double (*)(), boost::python::default_call_policies, boost::mpl::vector1<double> > >::operator()(_object*, _object*) ()
#11 0x00002aaaac8d5f71 in boost::python::objects::function::call(_object*, _object*) const ()
   from /user/scratchdelcatty/replica/vsc400/vsc40023/easybuild_REGTEST/SL6/sandybridge/software/Boost/1.53.0-ictce-4.1.13-Python-2.7.3/lib/libboost_python.so.1.53.0
#12 0x00002aaaac8d5c85 in boost::detail::function::void_function_ref_invoker0<boost::python::objects::(anonymous namespace)::bind_return, void>::invoke(boost::detail::function::function_buffer&) () from /user/scratchdelcatty/replica/vsc400/vsc40023/easybuild_REGTEST/SL6/sandybridge/software/Boost/1.53.0-ictce-4.1.13-Python-2.7.3/lib/libboost_python.so.1.53.0
#13 0x00002aaaac8e0cfb in boost::python::handle_exception_impl(boost::function0<void>) ()
   from /user/scratchdelcatty/replica/vsc400/vsc40023/easybuild_REGTEST/SL6/sandybridge/software/Boost/1.53.0-ictce-4.1.13-Python-2.7.3/lib/libboost_python.so.1.53.0
#14 0x00002aaaac8d7837 in function_call ()
   from /user/scratchdelcatty/replica/vsc400/vsc40023/easybuild_REGTEST/SL6/sandybridge/software/Boost/1.53.0-ictce-4.1.13-Python-2.7.3/lib/libboost_python.so.1.53.0
#15 0x00002aaaad9f64eb in PyObject_Call (func=0x1244480c, arg=0x80, kw=0x5b) at Objects/abstract.c:2529
#16 0x00002aaaadaccac6 in call_function (pp_stack=0x1244480c, oparg=128) at Python/ceval.c:4044
#17 0x00002aaaadac4e14 in PyEval_EvalFrameEx (f=0x1244480c, throwflag=128) at Python/ceval.c:2666
#18 0x00002aaaadacb36d in PyEval_EvalCodeEx (co=0x1244480c, globals=0x80, locals=0x5b, args=0xffffffffffffffff, argcount=0, kws=0x2d, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at Python/ceval.c:3253
#19 0x00002aaaada2fea0 in function_call (func=0x1244480c, arg=0x80, kw=0x5b) at Objects/funcobject.c:526
#20 0x00002aaaad9f64eb in PyObject_Call (func=0x1244480c, arg=0x80, kw=0x5b) at Objects/abstract.c:2529
#21 0x00002aaaadacc105 in ext_do_call (func=0x1244480c, pp_stack=0x80, flags=91, na=-1, nk=0) at Python/ceval.c:4334
#22 0x00002aaaadac49c1 in PyEval_EvalFrameEx (f=0x1244480c, throwflag=128) at Python/ceval.c:2705
#23 0x00002aaaadacb36d in PyEval_EvalCodeEx (co=0x1244480c, globals=0x80, locals=0x5b, args=0xffffffffffffffff, argcount=0, kws=0x2d, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at Python/ceval.c:3253
#24 0x00002aaaadaccde2 in call_function (pp_stack=0x1244480c, oparg=128) at Python/ceval.c:4042
#25 0x00002aaaadac4e14 in PyEval_EvalFrameEx (f=0x1244480c, throwflag=128) at Python/ceval.c:2666
#26 0x00002aaaadacb36d in PyEval_EvalCodeEx (co=0x1244480c, globals=0x80, locals=0x5b, args=0xffffffffffffffff, argcount=0, kws=0x2d, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at Python/ceval.c:3253
#27 0x00002aaaadacac29 in PyEval_EvalCode (co=0x1244480c, globals=0x80, locals=0x5b) at Python/ceval.c:667
#28 0x00002aaaadaff833 in PyRun_StringFlags (str=0x1244480c "[", start=128, globals=0x5b, locals=0xffffffffffffffff, flags=0x0) at Python/pythonrun.c:1316
#29 0x00002aaaac8e5192 in boost::python::exec(boost::python::str, boost::python::api::object, boost::python::api::object) ()
   from /user/scratchdelcatty/replica/vsc400/vsc40023/easybuild_REGTEST/SL6/sandybridge/software/Boost/1.53.0-ictce-4.1.13-Python-2.7.3/lib/libboost_python.so.1.53.0
#30 0x00000000007534aa in psi::Python::run(_IO_FILE*) ()
#31 0x000000000071dbb4 in main ()

@wpoely86
Member Author

Yeah, this is with 99% certainty the same bug. Apply PSI-4.0b5-thread-pool.patch and everything should be fine.
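
As a rough illustration of what applying the patch could look like, here is a fragment of an easyconfig for PSI 4.0b4 with the ictce 4.1.13 toolchain (both taken from the build path above). This is only a sketch: whether the 4.0b5 patch applies unchanged to the 4.0b4 sources is an assumption, and the other parameters a real easyconfig needs (homepage, description, sources, dependencies) are left out.

```python
# Sketch of an easyconfig fragment (easyconfigs use Python syntax).
# Only the parameters relevant to applying the patch are shown.
name = 'PSI'
version = '4.0b4'

toolchain = {'name': 'ictce', 'version': '4.1.13'}

# Assumption: the thread-pool patch written for 4.0b5 also applies to 4.0b4.
patches = ['PSI-4.0b5-thread-pool.patch']
```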

I'm puzzled that you can trigger this bug so easily. The threads have to follow a certain pattern for it to trigger: the only way I could reproduce it was on an NFS share, but I don't understand the correlation between NFS and the scheduler. Once loaded into memory, NFS should not make any difference.
Anyway, as far as I can see, this bug has been in PSI for a long time.
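
For readers unfamiliar with this class of bug: the backtrace shows the main thread blocked in pthread_cond_wait inside tpool_queue_close, and a wait like that only returns if some worker signals the condition after the waiter has started waiting, which is why such hangs depend on thread timing. The sketch below is not PSI's code and makes no claim about what the actual patch changes. It is a minimal, hypothetical thread pool (TinyPool and all its names are made up for illustration) showing the usual defensive pattern: update the shared state and wait on it under the same lock, and re-check the predicate in a loop.

```python
import threading
from collections import deque


class TinyPool:
    """Hypothetical stand-in for a thread pool; not PSI's detci tpool."""

    def __init__(self, n_workers):
        self._lock = threading.Lock()
        # Both conditions share one lock, so state checks and waits are atomic.
        self._not_empty = threading.Condition(self._lock)
        self._all_done = threading.Condition(self._lock)
        self._tasks = deque()
        self._pending = 0      # tasks that are queued or still running
        self._closed = False
        self._workers = [threading.Thread(target=self._work)
                         for _ in range(n_workers)]
        for worker in self._workers:
            worker.start()

    def add(self, func, *args):
        with self._lock:
            self._tasks.append((func, args))
            self._pending += 1
            self._not_empty.notify()

    def _work(self):
        while True:
            with self._lock:
                # Predicate loop: re-check state after every wakeup.
                while not self._tasks and not self._closed:
                    self._not_empty.wait()
                if not self._tasks:
                    return  # pool is closed and the queue is drained
                func, args = self._tasks.popleft()
            func(*args)  # run the task without holding the lock
            with self._lock:
                self._pending -= 1
                if self._pending == 0:
                    self._all_done.notify_all()

    def close(self):
        with self._lock:
            # The counter is checked under the lock, so a notification sent
            # before we start waiting cannot be missed.
            while self._pending > 0:
                self._all_done.wait()
            self._closed = True
            self._not_empty.notify_all()
        for worker in self._workers:
            worker.join()
```

If close() instead checked the counter without holding the lock, a worker could finish the last task and send its notification between the check and the wait, and close() would then block forever. That is one common way such a hang arises; whether it matches the PSI bug is not established here.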

@boegel
Member

boegel commented Oct 18, 2013

@wpoely86: OK, fixed in #471
