
Jobsub ID 168046.1@justin-prod-sched02.dune.hep.ac.uk

Jobsub ID: 168046.1@justin-prod-sched02.dune.hep.ac.uk
Workflow ID: 5815
Stage ID: 1
User name: mhandley@fnal.gov
HTCondor Group: group_dune
Requested:
  Processors: 1
  GPU: No
  RSS bytes: 4194304000 (4000 MiB)
  Wall seconds limit: 80000 (22 hours)
Submitted time: 2025-04-01 00:51:21
Site: UK_QMUL
Entry: DUNE_UK_London_QMUL_arcce02
Last heartbeat: 2025-04-01 01:30:48
From worker node:
  Hostname: cn019.htc.esc.qmul
  cpuinfo: Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz
  OS release: Scientific Linux release 7.9 (Nitrogen)
  Processors: 1
  RSS bytes: 4194304000 (4000 MiB)
  Wall seconds limit: 171000 (47 hours)
  GPU:
  Inner Apptainer?: True
Job state: jobscript_error
Allocator name: justin-allocator-pro.dune.hep.ac.uk
Started: 2025-04-01 00:52:05
Input files: justin-tutorial:tut_np02bde_307160012_np02_bde_coldbox_run012352_0053_20211216T000148.hdf5
Jobscript:
  Exit code: 1
  Real time: 0m (0s)
  CPU time: 0m (0s = 0%)
  Max RSS bytes: 0 (0 MiB)
Outputting started:
Output files:
Finished: 2025-04-01 01:30:48
Saved logs: justin-logs:168046.1-justin-prod-sched02.dune.hep.ac.uk.logs.tgz

Jobscript log (last 10,000 characters)

e 1196 in H5B_iterate(): B-tree iteration failed
    major: B-Tree node
    minor: Iteration failed
  #007: /scratch/workspace/build-single/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/hdf5/v1_12_2a/source/hdf5-1.12.2/src/H5B.c line 1155 in H5B__iterate_helper(): B-tree iteration failed
    major: B-Tree node
    minor: Iteration failed
  #008: /scratch/workspace/build-single/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/hdf5/v1_12_2a/source/hdf5-1.12.2/src/H5B.c line 1155 in H5B__iterate_helper(): B-tree iteration failed
    major: B-Tree node
    minor: Iteration failed
  #009: /scratch/workspace/build-single/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/hdf5/v1_12_2a/source/hdf5-1.12.2/src/H5Gnode.c line 1018 in H5G__node_sumup(): unable to load symbol table node
    major: Symbol table
    minor: Unable to load metadata into cache
  #010: /scratch/workspace/build-single/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/hdf5/v1_12_2a/source/hdf5-1.12.2/src/H5AC.c line 1426 in H5AC_protect(): H5C_protect() failed
    major: Object cache
    minor: Unable to protect metadata
  #011: /scratch/workspace/build-single/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/hdf5/v1_12_2a/source/hdf5-1.12.2/src/H5C.c line 2370 in H5C_protect(): can't load entry
    major: Object cache
    minor: Unable to load metadata into cache
  #012: /scratch/workspace/build-single/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/hdf5/v1_12_2a/source/hdf5-1.12.2/src/H5C.c line 7209 in H5C__load_entry(): Can't read image*
    major: Object cache
    minor: Read failed
  #013: /scratch/workspace/build-single/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/hdf5/v1_12_2a/source/hdf5-1.12.2/src/H5Fio.c line 148 in H5F_block_read(): read through page buffer failed
    major: Low-level I/O
    minor: Read failed
  #014: /scratch/workspace/build-single/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/hdf5/v1_12_2a/source/hdf5-1.12.2/src/H5PB.c line 721 in H5PB_read(): read through metadata accumulator failed
    major: Page Buffering
    minor: Read failed
  #015: /scratch/workspace/build-single/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/hdf5/v1_12_2a/source/hdf5-1.12.2/src/H5Faccum.c line 202 in H5F__accum_read(): driver read request failed
    major: Low-level I/O
    minor: Read failed
  #016: /scratch/workspace/build-single/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/hdf5/v1_12_2a/source/hdf5-1.12.2/src/H5FDint.c line 189 in H5FD_read(): driver read request failed
    major: Virtual File Layer
    minor: Read failed
  #017: /scratch/workspace/build-single/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/hdf5/v1_12_2a/source/hdf5-1.12.2/src/H5FDsec2.c line 755 in H5FD__sec2_read(): file read failed: time = Tue Apr  1 02:27:07 2025
, filename = 'root://meitner.tier2.hep.manchester.ac.uk:1094//cephfs/experiments/dune/RSE/justin-tutorial/53/85/tut_np02bde_307160012_np02_bde_coldbox_run012352_0053_20211216T000148.hdf5', file descriptor = 26, errno = 116, error message = 'Stale file handle', buf = 0xdb981f0, total read size = 328, bytes this sub-read = 328, bytes actually read = 18446744073709551615, offset = 0
    major: Low-level I/O
    minor: Read failed
HDF5-DIAG: Error detected in HDF5 (1.12.2) thread 0:
  #000: /scratch/workspace/build-single/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/hdf5/v1_12_2a/source/hdf5-1.12.2/src/H5F.c line 711 in H5Fclose(): decrementing file ID failed
    major: File accessibility
    minor: Unable to close file
  #001: /scratch/workspace/build-single/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/hdf5/v1_12_2a/source/hdf5-1.12.2/src/H5Iint.c line 1018 in H5I_dec_app_ref(): can't decrement ID ref count
    major: Object atom
    minor: Unable to decrement reference count
  #002: /scratch/workspace/build-single/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/hdf5/v1_12_2a/source/hdf5-1.12.2/src/H5Fint.c line 251 in H5F__close_cb(): unable to close file
    major: File accessibility
    minor: Unable to close file
  #003: /scratch/workspace/build-single/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/hdf5/v1_12_2a/source/hdf5-1.12.2/src/H5VLcallback.c line 3983 in H5VL_file_close(): file close failed
    major: Virtual Object Layer
    minor: Unable to close file
  #004: /scratch/workspace/build-single/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/hdf5/v1_12_2a/source/hdf5-1.12.2/src/H5VLcallback.c line 3952 in H5VL__file_close(): file close failed
    major: Virtual Object Layer
    minor: Unable to close file
  #005: /scratch/workspace/build-single/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/hdf5/v1_12_2a/source/hdf5-1.12.2/src/H5VLnative_file.c line 838 in H5VL__native_file_close(): can't close file
    major: File accessibility
    minor: Unable to decrement reference count
  #006: /scratch/workspace/build-single/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/hdf5/v1_12_2a/source/hdf5-1.12.2/src/H5Fint.c line 2349 in H5F__close(): can't close file
    major: File accessibility
    minor: Unable to close file
  #007: /scratch/workspace/build-single/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/hdf5/v1_12_2a/source/hdf5-1.12.2/src/H5Fint.c line 2522 in H5F_try_close(): problems closing file
    major: File accessibility
    minor: Unable to close file
  #008: /scratch/workspace/build-single/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/hdf5/v1_12_2a/source/hdf5-1.12.2/src/H5Fint.c line 1605 in H5F__dest(): unable to close file
    major: File accessibility
    minor: Unable to close file
  #009: /scratch/workspace/build-single/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/hdf5/v1_12_2a/source/hdf5-1.12.2/src/H5FD.c line 830 in H5FD_close(): close failed
    major: Virtual File Layer
    minor: Unable to close file
  #010: /scratch/workspace/build-single/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/hdf5/v1_12_2a/source/hdf5-1.12.2/src/H5FDsec2.c line 456 in H5FD__sec2_close(): unable to close file, errno = 110, error message = 'Connection timed out'
    major: Low-level I/O
    minor: Unable to close file
DataPrepByApaModule::endJob: # events processed: 0
DataPrepByApaModule::endJob:   # events skipped: 0

====================================================================================================================
TimeTracker printout (sec)            Min           Avg           Max         Median          RMS         nEvts   
====================================================================================================================
[ No processed events ]
====================================================================================================================

====================================================================================================
MemoryTracker summary (base-10 MB units used)

  Peak virtual memory usage (VmPeak)  : 2728.59 MB
  Peak resident set size usage (VmHWM): 1717.85 MB
  Details saved in: 'mem.db'
====================================================================================================
PandoraMonitoring, only able to use default TApplication (limited functionality).
PandoraMonitoring::SaveTree, error: No tree with name 'Validation' exists.
ToolBasedRawDigitPrepService:dtor: Event count: 0
ToolBasedRawDigitPrepService:dtor:  Call count: 0
ToolBasedRawDigitPrepService:dtor: Time report for 7 tools.
ToolBasedRawDigitPrepService:dtor: digitReader                   :0.00    sec
ToolBasedRawDigitPrepService:dtor: vdcb_adcChannelRawRmsFiller   :0.00    sec
ToolBasedRawDigitPrepService:dtor: adcSampleFiller               :0.00    sec
ToolBasedRawDigitPrepService:dtor: vdbcb_adcScaleAdcToKe         :0.00    sec
ToolBasedRawDigitPrepService:dtor: vdbcb_cnrw                    :0.00    sec
ToolBasedRawDigitPrepService:dtor: adcKeepAllSignalFinder        :0.00    sec
ToolBasedRawDigitPrepService:dtor: vdbcb_adcScaleKeToAdc         :0.00    sec
=== End last 100 lines of lar log file ===
lar exit code 0
Traceback (most recent call last):
  File "/cvmfs/dune.opensciencegrid.org/products/dune/duneutil/v09_75_00d00/bin/extractor_prod.py", line 434, in <module>
    main()
  File "/cvmfs/dune.opensciencegrid.org/products/dune/duneutil/v09_75_00d00/bin/extractor_prod.py", line 373, in main
    mddict = expSpecificMetadata.getmetadata()
  File "/cvmfs/dune.opensciencegrid.org/products/dune/duneutil/v09_75_00d00/bin/extractor_prod.py", line 344, in getmetadata
    jobt = self.get_job(proc)
  File "/cvmfs/dune.opensciencegrid.org/products/dune/duneutil/v09_75_00d00/bin/extractor_prod.py", line 69, in get_job
    raise RuntimeError('sam_metadata_dumper returned nonzero exit status {}.'.format(rc))
RuntimeError: sam_metadata_dumper returned nonzero exit status 1.
extractor_prod.py exit code 1
Error reading metadata from file: Expecting value: line 1 column 1 (char 0)
pdjson2metadata exit code 1
.:
total 108
-rw-r--r-- 1 pildune22 pildune 36864 Apr  1 02:28 mem.db
-rw-r--r-- 1 pildune22 pildune 33250 Apr  1 02:28 tut_np02bde_307160012_np02_bde_coldbox_run012352_0053_20211216T000148_reco_2025-04-01T_005212Z.log
-rw-r--r-- 1 pildune22 pildune 16384 Apr  1 01:56 time.db
-rw-r--r-- 1 pildune22 pildune  9829 Apr  1 02:28 jobscript.log
-rw-r--r-- 1 pildune22 pildune   519 Apr  1 02:28 tut_np02bde_307160012_np02_bde_coldbox_run012352_0053_20211216T000148_reco_hist.root
-rw-r--r-- 1 pildune22 pildune   182 Apr  1 01:52 all-input-dids.txt
-rw-r--r-- 1 pildune22 pildune     0 Apr  1 01:57 Pandora_Events.pndr
-rw-r--r-- 1 pildune22 pildune     0 Apr  1 01:54 debugprod.log
-rw-r--r-- 1 pildune22 pildune     0 Apr  1 02:28 tut_np02bde_307160012_np02_bde_coldbox_run012352_0053_20211216T000148_reco_data_2025-04-01T_005212Z.root.ext.json
-rw-r--r-- 1 pildune22 pildune     0 Apr  1 02:28 tut_np02bde_307160012_np02_bde_coldbox_run012352_0053_20211216T000148_reco_data_2025-04-01T_005212Z.root.json
justIN time: 2025-04-03 08:15:35 UTC       justIN version: 01.03.00
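
Note on the numbers above: the "bytes actually read = 18446744073709551615" value in the H5FD__sec2_read() error is what -1 looks like when printed as an unsigned 64-bit integer. That is consistent with the underlying read returning -1 on failure, though the log itself does not say so, so treat it as an interpretation. The sketch below checks that arithmetic along with the byte and wall-time conversions shown in the job summary. The helper names are hypothetical and are not part of justIN or HDF5.

```python
# Hypothetical helpers for sanity-checking figures on this page.
# None of these names come from justIN, HDF5, or the job's software.

def as_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as signed two's complement."""
    return value - (1 << 64) if value >= (1 << 63) else value


def bytes_to_mib(n_bytes: int) -> float:
    """Convert a byte count to MiB (1 MiB = 2**20 bytes)."""
    return n_bytes / (1 << 20)


def seconds_to_hours(seconds: int) -> float:
    """Convert a duration in seconds to hours."""
    return seconds / 3600


if __name__ == "__main__":
    # "bytes actually read" from the H5FD__sec2_read() error line
    print(as_signed_64(18446744073709551615))
    # RSS limit from the job summary
    print(bytes_to_mib(4194304000))
    # Requested and worker-node wall-time limits
    print(round(seconds_to_hours(80000), 1), round(seconds_to_hours(171000), 1))
```

The page reports the wall-time limits as 22 and 47 hours; the exact values are slightly larger, which suggests the page truncates to whole hours.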