21 July 2025: This instance at RAL is read-only. Please do not try submitting new workflows for now.
Jobsub ID 418831.62@justin-prod-sched01.dune.hep.ac.uk
Jobsub ID | 418831.62@justin-prod-sched01.dune.hep.ac.uk | |
Workflow ID | 7809 | |
Stage ID | 1 | |
User name | calcuttj@fnal.gov | |
HTCondor Group | group_dune.prod_mcsim | |
Requested | Processors | 1 |
GPU | No | |
RSS bytes | 4193255424 (3999 MiB) | |
Wall seconds limit | 80000 (22 hours) | |
Submitted time | 2025-06-20 14:35:56 | |
Site | US_FNAL-FermiGrid | |
Entry | FNAL_GPGrid_ce04_mcore_op_duneonly | |
Last heartbeat | 2025-06-20 15:30:54 | |
From worker node | Hostname | dunegli-5841960-0-fnpc22014.fnal.gov |
cpuinfo | AMD EPYC 7543 32-Core Processor | |
OS release | Scientific Linux release 7.9 (Nitrogen) | |
Processors | 1 | |
RSS bytes | 4193255424 (3999 MiB) | |
Wall seconds limit | 172800 (48 hours) | |
GPU | ||
Inner Apptainer? | True | |
Job state | jobscript_error | |
Allocator name | justin-allocator-pro.dune.hep.ac.uk | |
Started | 2025-06-20 14:40:52 | |
Input files | monte-carlo-007809-000058 | |
Jobscript | Exit code | 1 |
Real time | 0m (0s) | |
CPU time | 0m (0s = 0%) | |
Max RSS bytes | 0 (0 MiB) | |
Outputting started | ||
Output files | ||
Finished | 2025-06-20 15:30:48 | |
Saved logs | justin-logs:418831.62-justin-prod-sched01.dune.hep.ac.uk.logs.tgz | |
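The table reports each byte count and wall-time limit with a rounded figure in parentheses. A minimal sketch of that arithmetic follows. The helper names `to_mib` and `to_hours` are hypothetical, and whole-number (floor) division is an assumption that happens to reproduce the values shown above, not a statement about how the dashboard computes them.

```python
def to_mib(n_bytes: int) -> int:
    """Convert bytes to whole mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes // (1024 * 1024)


def to_hours(seconds: int) -> int:
    """Convert seconds to whole hours, discarding the remainder."""
    return seconds // 3600


# Values copied from the table above.
rss_mib = to_mib(4193255424)        # requested RSS bytes
requested_hours = to_hours(80000)   # requested wall seconds limit
worker_hours = to_hours(172800)     # wall seconds limit reported by the worker node
```

With these inputs the requested wall limit (22 hours) is shorter than the limit reported from the worker node (48 hours), which matches the two separate rows in the table.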
Jobscript log (last 10,000 characters)
34b_-1GeV_-27.7_008501_418831_62_20250620T144059.root:/VirtualDetector
Error in <TNetXNGFile::ReadBuffer>: [ERROR] Server responded with an error: [3011] Item not found.
Error in <TKey::ReadFile>: Failed to read data.
hadd Target path: H4_v34b_-1GeV_-27.7_008501_418831_62_20250620T144059.root:/Detector
Error in <TNetXNGFile::TNetXNGFile>: The remote file is not open
Error in <TKey::ReadFile>: Failed to read data.
hadd Target path: H4_v34b_-1GeV_-27.7_008501_418831_62_20250620T144059.root:/NTuples
Error in <TNetXNGFile::TNetXNGFile>: The remote file is not open
Error in <TKey::ReadFile>: Failed to read data.
Error in <TNetXNGFile::Close>: [ERROR] Server responded with an error: [3011] Item not found.

*** Break *** segmentation violation

===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
Thread 6 (Thread 0x154c889fe700 (LWP 969) "hadd"):
#0 0x0000154cade0db3b in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1 0x0000154cade0dbcf in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2 0x0000154cade0dc6b in sem_wait GLIBC_2.2.5 () from /lib64/libpthread.so.0
#3 0x0000154c8bd84566 in XrdSysSemaphore::Wait (this=0x33cd4e0) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/./XrdSys/XrdSysPthread.hh:509
#4 XrdCl::SyncQueue<XrdCl::JobManager::JobHelper>::Get (this=0x33c1cf8) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/./XrdCl/XrdClSyncQueue.hh:66
#5 XrdCl::JobManager::RunJobs (this=0x33c1ce0) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/XrdCl/XrdClJobManager.cc:151
#6 0x0000154c8bd84619 in RunRunnerThread (arg=<optimized out>) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/XrdCl/XrdClJobManager.cc:34
#7 0x0000154cade07ea5 in start_thread () from /lib64/libpthread.so.0
#8 0x0000154caccfeb0d in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x154c887fd700 (LWP 968) "hadd"):
#0 0x0000154cade0db3b in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1 0x0000154cade0dbcf in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2 0x0000154cade0dc6b in sem_wait GLIBC_2.2.5 () from /lib64/libpthread.so.0
#3 0x0000154c8bd84566 in XrdSysSemaphore::Wait (this=0x33cd4e0) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/./XrdSys/XrdSysPthread.hh:509
#4 XrdCl::SyncQueue<XrdCl::JobManager::JobHelper>::Get (this=0x33c1cf8) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/./XrdCl/XrdClSyncQueue.hh:66
#5 XrdCl::JobManager::RunJobs (this=0x33c1ce0) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/XrdCl/XrdClJobManager.cc:151
#6 0x0000154c8bd84619 in RunRunnerThread (arg=<optimized out>) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/XrdCl/XrdClJobManager.cc:34
#7 0x0000154cade07ea5 in start_thread () from /lib64/libpthread.so.0
#8 0x0000154caccfeb0d in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x154c885fc700 (LWP 967) "hadd"):
#0 0x0000154cade0db3b in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1 0x0000154cade0dbcf in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2 0x0000154cade0dc6b in sem_wait GLIBC_2.2.5 () from /lib64/libpthread.so.0
#3 0x0000154c8bd84566 in XrdSysSemaphore::Wait (this=0x33cd4e0) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/./XrdSys/XrdSysPthread.hh:509
#4 XrdCl::SyncQueue<XrdCl::JobManager::JobHelper>::Get (this=0x33c1cf8) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/./XrdCl/XrdClSyncQueue.hh:66
#5 XrdCl::JobManager::RunJobs (this=0x33c1ce0) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/XrdCl/XrdClJobManager.cc:151
#6 0x0000154c8bd84619 in RunRunnerThread (arg=<optimized out>) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/XrdCl/XrdClJobManager.cc:34
#7 0x0000154cade07ea5 in start_thread () from /lib64/libpthread.so.0
#8 0x0000154caccfeb0d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x154c88bff700 (LWP 966) "hadd"):
#0 0x0000154cade0ee9d in nanosleep () from /lib64/libpthread.so.0
#1 0x0000154c8c25646d in XrdSysTimer::Wait (mills=<optimized out>) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/XrdSys/XrdSysTimer.cc:239
#2 0x0000154c8bd0497d in XrdCl::TaskManager::RunTasks (this=0x33ac780) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/XrdCl/XrdClTaskManager.cc:246
#3 0x0000154c8bd04a89 in RunRunnerThread (arg=<optimized out>) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/XrdCl/XrdClTaskManager.cc:38
#4 0x0000154cade07ea5 in start_thread () from /lib64/libpthread.so.0
#5 0x0000154caccfeb0d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x154c98fff700 (LWP 965) "hadd"):
#0 0x0000154caccff0e3 in epoll_wait () from /lib64/libc.so.6
#1 0x0000154c8c2507d7 in XrdSys::IOEvents::PollE::Begin (this=0x5b69310, syncsem=<optimized out>, retcode=<optimized out>, eTxt=<optimized out>) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/./XrdSys/XrdSysIOEventsPollE.icc:212
#2 0x0000154c8c24c905 in XrdSys::IOEvents::BootStrap::Start (parg=0x7ffcde310420) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/XrdSys/XrdSysIOEvents.cc:149
#3 0x0000154c8c255b28 in XrdSysThread_Xeq (myargs=0x33fcaa0) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/XrdSys/XrdSysPthread.cc:86
#4 0x0000154cade07ea5 in start_thread () from /lib64/libpthread.so.0
#5 0x0000154caccfeb0d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x154cb28c0cc0 (LWP 943) "hadd"):
#0 0x0000154caccc5659 in waitpid () from /lib64/libc.so.6
#1 0x0000154cacc42f62 in do_system () from /lib64/libc.so.6
#2 0x0000154cacc43311 in system () from /lib64/libc.so.6
#3 0x0000154cae4e5ecc in TUnixSystem::Exec (shellcmd=<optimized out>, this=0xb81620) at /scratch/workspace/critic-slf/BUILDTYPE/prof/QUAL/e26/label1/swarm/label2/SLF7/build/root/v6_28_12/source/root-6.28.12/core/unix/src/TUnixSystem.cxx:2104
#4 TUnixSystem::StackTrace (this=0xb81620) at /scratch/workspace/critic-slf/BUILDTYPE/prof/QUAL/e26/label1/swarm/label2/SLF7/build/root/v6_28_12/source/root-6.28.12/core/unix/src/TUnixSystem.cxx:2395
#5 0x0000154cae4e5894 in TUnixSystem::DispatchSignals (this=0xb81620, sig=kSigSegmentationViolation) at /scratch/workspace/critic-slf/BUILDTYPE/prof/QUAL/e26/label1/swarm/label2/SLF7/build/root/v6_28_12/source/root-6.28.12/core/unix/src/TUnixSystem.cxx:3615
#6 <signal handler called>
#7 0x0000000000000000 in ?? ()
#8 0x0000154cae39723d in (anonymous namespace)::R__ListSlowClose (files=0xb87020) at /scratch/workspace/critic-slf/BUILDTYPE/prof/QUAL/e26/label1/swarm/label2/SLF7/build/root/v6_28_12/source/root-6.28.12/core/base/src/TROOT.cxx:1095
#9 0x0000154cae397f84 in TROOT::CloseFiles (this=0x154cae84b240 <ROOT::Internal::GetROOT1()::alloc>) at /scratch/workspace/critic-slf/BUILDTYPE/prof/QUAL/e26/label1/swarm/label2/SLF7/build/root/v6_28_12/source/root-6.28.12/core/base/src/TROOT.cxx:1145
#10 0x0000154cacc39ce9 in __run_exit_handlers () from /lib64/libc.so.6
#11 0x0000154cacc39d37 in exit () from /lib64/libc.so.6
#12 0x0000154cacc2255c in __libc_start_main () from /lib64/libc.so.6
#13 0x0000000000406be9 in _start ()
===========================================================

The lines below might hint at the cause of the crash. If you see question marks as part of the stack trace, try to recompile with debugging information enabled and export CLING_DEBUG=1 environment variable before running.
You may get help by asking at the ROOT forum https://root.cern/forum
Only if you are really convinced it is a bug in ROOT then please submit a report at https://root.cern/bugs Please post the ENTIRE stack trace from above as an attachment in addition to anything else that might help us fixing this issue.
===========================================================
#7 0x0000000000000000 in ?? ()
#8 0x0000154cae39723d in (anonymous namespace)::R__ListSlowClose (files=0xb87020) at /scratch/workspace/critic-slf/BUILDTYPE/prof/QUAL/e26/label1/swarm/label2/SLF7/build/root/v6_28_12/source/root-6.28.12/core/base/src/TROOT.cxx:1095
#9 0x0000154cae397f84 in TROOT::CloseFiles (this=0x154cae84b240 <ROOT::Internal::GetROOT1()::alloc>) at /scratch/workspace/critic-slf/BUILDTYPE/prof/QUAL/e26/label1/swarm/label2/SLF7/build/root/v6_28_12/source/root-6.28.12/core/base/src/TROOT.cxx:1145
#10 0x0000154cacc39ce9 in __run_exit_handlers () from /lib64/libc.so.6
#11 0x0000154cacc39d37 in exit () from /lib64/libc.so.6
#12 0x0000154cacc2255c in __libc_start_main () from /lib64/libc.so.6
#13 0x0000000000406be9 in _start ()
===========================================================

Traceback (most recent call last):
  File "/cvmfs/fifeuser4.opensciencegrid.org/sw/dune/5a837a2f9ce0b916d8725ae4ed0b18872c84fe1f//merge_g4bl.py", line 403, in <module>
    do_merge(args)
  File "/cvmfs/fifeuser4.opensciencegrid.org/sw/dune/5a837a2f9ce0b916d8725ae4ed0b18872c84fe1f//merge_g4bl.py", line 119, in do_merge
    raise Exception('Error in hadd')
Exception: Error in hadd
Exiting with error
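
The log above contains three distinct failure signatures: XRootD server errors with a numeric code, ROOT's segmentation-violation marker, and the Python exception raised by the merge script. A minimal sketch of how such a log could be scanned for them follows. The function name `summarize_log` and the returned dictionary layout are hypothetical and are not part of justIN or `merge_g4bl.py`; the patterns are taken only from the strings visible in this log.

```python
import re

# Matches lines such as:
#   [ERROR] Server responded with an error: [3011] Item not found.
XROOTD_ERROR = re.compile(
    r"\[ERROR\] Server responded with an error: \[(\d+)\] ([^.\n]+)\."
)
SEGFAULT_MARKER = "*** Break *** segmentation violation"
HADD_EXCEPTION = "Exception: Error in hadd"


def summarize_log(log_text: str) -> dict:
    """Collect the failure signatures found in a jobscript log."""
    errors = [(int(code), message) for code, message in XROOTD_ERROR.findall(log_text)]
    return {
        "xrootd_errors": errors,                      # list of (code, message) pairs
        "segfault": SEGFAULT_MARKER in log_text,      # ROOT crash marker present?
        "hadd_failed": HADD_EXCEPTION in log_text,    # merge script exception present?
    }


# A short excerpt built from lines that appear in the log above.
sample = (
    "Error in <TNetXNGFile::ReadBuffer>: [ERROR] Server responded with an error: "
    "[3011] Item not found.\n"
    "Error in <TKey::ReadFile>: Failed to read data.\n"
    "*** Break *** segmentation violation\n"
    "Exception: Error in hadd\n"
)
summary = summarize_log(sample)
```

The patterns use plain substring and regex searches rather than line anchors, so the scan behaves the same whether or not the log text retains its original line breaks.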