
21 July 2025: This instance at RAL is read-only. Please do not try submitting new workflows for now.

Jobsub ID 226827.24@justin-prod-sched02.dune.hep.ac.uk

Jobsub ID: 226827.24@justin-prod-sched02.dune.hep.ac.uk
Workflow ID: 7822
Stage ID: 1
User name: calcuttj@fnal.gov
HTCondor Group: group_dune.prod_mcsim
Requested:
  Processors: 1
  GPU: No
  RSS bytes: 4193255424 (3999 MiB)
  Wall seconds limit: 80000 (22 hours)
Submitted time: 2025-06-20 20:21:17
Site: UK_RAL-PPD
Entry: CMSHTPC_T2_UK_SGrid_RALPP_hep207
Last heartbeat: 2025-06-21 01:31:27
From worker node:
  Hostname: heplnc164.pp.rl.ac.uk
  cpuinfo: AMD EPYC 7763 64-Core Processor
  OS release: Scientific Linux release 7.9 (Nitrogen)
  Processors: 1
  RSS bytes: 4193255424 (3999 MiB)
  Wall seconds limit: 257400 (71 hours)
  GPU:
  Inner Apptainer?: True
Job state: jobscript_error
Allocator name: justin-allocator-pro.dune.hep.ac.uk
Started: 2025-06-20 20:41:58
Input files: monte-carlo-007822-000022
Jobscript:
  Exit code: 1
  Real time: 0m (0s)
  CPU time: 0m (0s = 0%)
  Max RSS bytes: 0 (0 MiB)
Outputting started:
Output files:
Finished: 2025-06-21 01:31:27
Saved logs: justin-logs:226827.24-justin-prod-sched02.dune.hep.ac.uk.logs.tgz
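
The parenthesised values in the table above can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming the page converts bytes to MiB by dividing by 1024 * 1024 and seconds to hours by dividing by 3600, truncating to a whole number in both cases. The helper names are hypothetical and are not part of justIN.

```python
MIB = 1024 * 1024  # bytes per mebibyte


def bytes_to_mib(n_bytes: int) -> int:
    """Convert a byte count to whole MiB (assumed truncating division)."""
    return n_bytes // MIB


def seconds_to_hours(seconds: int) -> int:
    """Convert a wall-time limit in seconds to whole hours (assumed truncation)."""
    return seconds // 3600


if __name__ == "__main__":
    # Values copied from the job table.
    print(bytes_to_mib(4193255424))   # RSS bytes
    print(seconds_to_hours(80000))    # requested wall seconds limit
    print(seconds_to_hours(257400))   # worker node wall seconds limit
```

Under these assumptions the three values come out as 3999, 22 and 71, matching the table. Note that 257400 seconds is 71.5 hours, so the displayed 71 is consistent with truncation rather than rounding.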

Jobscript log (last 10,000 characters)

rror in <TNetXNGFile::ReadBuffer>: [ERROR] Server responded with an error: [3012] Too many DFS read attempts; operation terminated
Error in <TKey::ReadFile>: Failed to read data.
Error in <TNetXNGFile::TNetXNGFile>: The remote file is not open
hadd Target path: H4_v34b_-1GeV_-27.7_002401_226827_24_20250620T204205.root:/Detector
Error in <TNetXNGFile::TNetXNGFile>: The remote file is not open
Error in <TKey::ReadFile>: Failed to read data.
hadd Target path: H4_v34b_-1GeV_-27.7_002401_226827_24_20250620T204205.root:/NTuples
Error in <TNetXNGFile::TNetXNGFile>: The remote file is not open
Error in <TKey::ReadFile>: Failed to read data.
Error in <TNetXNGFile::Close>: [ERROR] Server responded with an error: [3012] Too many DFS read attempts; operation terminated

 *** Break *** segmentation violation



===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================

Thread 6 (Thread 0x1530a3bfe700 (LWP 975) "hadd"):
#0  0x00001530c900db3b in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1  0x00001530c900dbcf in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x00001530c900dc6b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#3  0x00001530a6f84566 in XrdSysSemaphore::Wait (this=0x3f04050) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/./XrdSys/XrdSysPthread.hh:509
#4  XrdCl::SyncQueue<XrdCl::JobManager::JobHelper>::Get (this=0x3f26788) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/./XrdCl/XrdClSyncQueue.hh:66
#5  XrdCl::JobManager::RunJobs (this=0x3f26770) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/XrdCl/XrdClJobManager.cc:151
#6  0x00001530a6f84619 in RunRunnerThread (arg=<optimized out>) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/XrdCl/XrdClJobManager.cc:34
#7  0x00001530c9007ea5 in start_thread () from /lib64/libpthread.so.0
#8  0x00001530c7efeb0d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x1530a39fd700 (LWP 974) "hadd"):
#0  0x00001530c900db3b in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1  0x00001530c900dbcf in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x00001530c900dc6b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#3  0x00001530a6f84566 in XrdSysSemaphore::Wait (this=0x3f04050) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/./XrdSys/XrdSysPthread.hh:509
#4  XrdCl::SyncQueue<XrdCl::JobManager::JobHelper>::Get (this=0x3f26788) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/./XrdCl/XrdClSyncQueue.hh:66
#5  XrdCl::JobManager::RunJobs (this=0x3f26770) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/XrdCl/XrdClJobManager.cc:151
#6  0x00001530a6f84619 in RunRunnerThread (arg=<optimized out>) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/XrdCl/XrdClJobManager.cc:34
#7  0x00001530c9007ea5 in start_thread () from /lib64/libpthread.so.0
#8  0x00001530c7efeb0d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x1530a37fc700 (LWP 973) "hadd"):
#0  0x00001530c900db3b in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1  0x00001530c900dbcf in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x00001530c900dc6b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0
#3  0x00001530a6f84566 in XrdSysSemaphore::Wait (this=0x3f04050) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/./XrdSys/XrdSysPthread.hh:509
#4  XrdCl::SyncQueue<XrdCl::JobManager::JobHelper>::Get (this=0x3f26788) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/./XrdCl/XrdClSyncQueue.hh:66
#5  XrdCl::JobManager::RunJobs (this=0x3f26770) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/XrdCl/XrdClJobManager.cc:151
#6  0x00001530a6f84619 in RunRunnerThread (arg=<optimized out>) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/XrdCl/XrdClJobManager.cc:34
#7  0x00001530c9007ea5 in start_thread () from /lib64/libpthread.so.0
#8  0x00001530c7efeb0d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x1530a3dff700 (LWP 972) "hadd"):
#0  0x00001530c900ee9d in nanosleep () from /lib64/libpthread.so.0
#1  0x00001530a745646d in XrdSysTimer::Wait (mills=<optimized out>) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/XrdSys/XrdSysTimer.cc:239
#2  0x00001530a6f0497d in XrdCl::TaskManager::RunTasks (this=0x37359d0) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/XrdCl/XrdClTaskManager.cc:246
#3  0x00001530a6f04a89 in RunRunnerThread (arg=<optimized out>) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/XrdCl/XrdClTaskManager.cc:38
#4  0x00001530c9007ea5 in start_thread () from /lib64/libpthread.so.0
#5  0x00001530c7efeb0d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x1530b41ff700 (LWP 971) "hadd"):
#0  0x00001530c7eff0e3 in epoll_wait () from /lib64/libc.so.6
#1  0x00001530a74507d7 in XrdSys::IOEvents::PollE::Begin (this=0x65f50d0, syncsem=<optimized out>, retcode=<optimized out>, eTxt=<optimized out>) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/./XrdSys/XrdSysIOEventsPollE.icc:212
#2  0x00001530a744c905 in XrdSys::IOEvents::BootStrap::Start (parg=0x7ffdc5b97720) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/XrdSys/XrdSysIOEvents.cc:149
#3  0x00001530a7455b28 in XrdSysThread_Xeq (myargs=0x65fd250) at /scratch/workspace/canvas-products/label1/swarm/label2/SLF7/build/xrootd/v5_5_5a/source/xrootd-5.5.5/src/XrdSys/XrdSysPthread.cc:86
#4  0x00001530c9007ea5 in start_thread () from /lib64/libpthread.so.0
#5  0x00001530c7efeb0d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x1530cdafdcc0 (LWP 949) "hadd"):
#0  0x00001530c7ec5659 in waitpid () from /lib64/libc.so.6
#1  0x00001530c7e42f62 in do_system () from /lib64/libc.so.6
#2  0x00001530c7e43311 in system () from /lib64/libc.so.6
#3  0x00001530c96e5ecc in TUnixSystem::Exec (shellcmd=<optimized out>, this=0x168c620) at /scratch/workspace/critic-slf/BUILDTYPE/prof/QUAL/e26/label1/swarm/label2/SLF7/build/root/v6_28_12/source/root-6.28.12/core/unix/src/TUnixSystem.cxx:2104
#4  TUnixSystem::StackTrace (this=0x168c620) at /scratch/workspace/critic-slf/BUILDTYPE/prof/QUAL/e26/label1/swarm/label2/SLF7/build/root/v6_28_12/source/root-6.28.12/core/unix/src/TUnixSystem.cxx:2395
#5  0x00001530c96e5894 in TUnixSystem::DispatchSignals (this=0x168c620, sig=kSigSegmentationViolation) at /scratch/workspace/critic-slf/BUILDTYPE/prof/QUAL/e26/label1/swarm/label2/SLF7/build/root/v6_28_12/source/root-6.28.12/core/unix/src/TUnixSystem.cxx:3615
#6  <signal handler called>
#7  0x00001530c9597237 in (anonymous namespace)::R__ListSlowClose (files=0x1692020) at /scratch/workspace/critic-slf/BUILDTYPE/prof/QUAL/e26/label1/swarm/label2/SLF7/build/root/v6_28_12/source/root-6.28.12/core/base/src/TROOT.cxx:1095
#8  0x00001530c9597f84 in TROOT::CloseFiles (this=0x1530c9a4b240 <ROOT::Internal::GetROOT1()::alloc>) at /scratch/workspace/critic-slf/BUILDTYPE/prof/QUAL/e26/label1/swarm/label2/SLF7/build/root/v6_28_12/source/root-6.28.12/core/base/src/TROOT.cxx:1145
#9  0x00001530c7e39ce9 in __run_exit_handlers () from /lib64/libc.so.6
#10 0x00001530c7e39d37 in exit () from /lib64/libc.so.6
#11 0x00001530c7e2255c in __libc_start_main () from /lib64/libc.so.6
#12 0x0000000000406be9 in _start ()
===========================================================


The lines below might hint at the cause of the crash. If you see question
marks as part of the stack trace, try to recompile with debugging information
enabled and export CLING_DEBUG=1 environment variable before running.
You may get help by asking at the ROOT forum https://root.cern/forum
Only if you are really convinced it is a bug in ROOT then please submit a
report at https://root.cern/bugs Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#7  0x00001530c9597237 in (anonymous namespace)::R__ListSlowClose (files=0x1692020) at /scratch/workspace/critic-slf/BUILDTYPE/prof/QUAL/e26/label1/swarm/label2/SLF7/build/root/v6_28_12/source/root-6.28.12/core/base/src/TROOT.cxx:1095
#8  0x00001530c9597f84 in TROOT::CloseFiles (this=0x1530c9a4b240 <ROOT::Internal::GetROOT1()::alloc>) at /scratch/workspace/critic-slf/BUILDTYPE/prof/QUAL/e26/label1/swarm/label2/SLF7/build/root/v6_28_12/source/root-6.28.12/core/base/src/TROOT.cxx:1145
#9  0x00001530c7e39ce9 in __run_exit_handlers () from /lib64/libc.so.6
#10 0x00001530c7e39d37 in exit () from /lib64/libc.so.6
#11 0x00001530c7e2255c in __libc_start_main () from /lib64/libc.so.6
#12 0x0000000000406be9 in _start ()
===========================================================



Traceback (most recent call last):
  File "/cvmfs/fifeuser4.opensciencegrid.org/sw/dune/5a837a2f9ce0b916d8725ae4ed0b18872c84fe1f//merge_g4bl.py", line 403, in <module>
    do_merge(args)
  File "/cvmfs/fifeuser4.opensciencegrid.org/sw/dune/5a837a2f9ce0b916d8725ae4ed0b18872c84fe1f//merge_g4bl.py", line 119, in do_merge
    raise Exception('Error in hadd')
Exception: Error in hadd
Exiting with error
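
The traceback shows only that merge_g4bl.py raises a generic exception once hadd has failed; the script's own code is not included in this log. A minimal sketch of that pattern follows, assuming the merge tool is run as a subprocess and its exit status is checked. The function `run_merge` and the file names in the usage note are hypothetical, and the real script may work differently.

```python
import subprocess
import sys


def run_merge(cmd):
    """Run a merge command and raise if it exits with a non-zero status.

    `cmd` is the full argument list, for example an hadd invocation.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Pass the tool's own error output through so the cause
        # (here, the XRootD read errors) stays visible in the job log.
        sys.stderr.write(result.stderr)
        raise RuntimeError(f"Error in hadd (exit code {result.returncode})")
    return result.stdout
```

A call might look like `run_merge(["hadd", "-f", "merged.root", "in1.root", "in2.root"])`, with illustrative file names. Including the exit code in the exception message is a design choice of this sketch, made so that the failure mode is easier to tell apart in the log.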
justIN time: 2025-08-15 03:00:48 UTC       justIN version: 01.03.02