justIN           Dashboard       Workflows       Jobs       AWT       Sites       Storages       Docs       Login

Jobsub ID 289859.1@justin-prod-sched01.dune.hep.ac.uk

Jobsub ID289859.1@justin-prod-sched01.dune.hep.ac.uk
Workflow ID4103
Stage ID1
User nameamoor@fnal.gov
HTCondor Groupgroup_dune
RequestedProcessors1
RSS bytes8388608000 (8000 MiB)
Wall seconds limit80000 (22 hours)
Submitted time2024-11-14 20:30:07
SiteUK_Edinburgh
EntryDUNE_UK_SGridECDF_ce1_multicore
Last heartbeat2024-11-14 21:29:06
From worker nodeHostnamenode2A06.ecdf.ed.ac.uk
cpuinfoIntel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
OS releaseScientific Linux release 7.9 (Nitrogen)
Processors1
RSS bytes8388608000 (8000 MiB)
Wall seconds limit171000 (47 hours)
Inner Apptainer?True
Job statejobscript_error
Allocator namejustin-allocator-pro.dune.hep.ac.uk
Started2024-11-14 20:31:52
Input filesusertests:000790_reco_data_2024-11-14T_093429Z.root
JobscriptExit code1
Real time0m (0s)
CPU time0m (0s = 0%)
Outputting started 
Output files
Finished2024-11-14 21:29:06
Saved logsjustin-logs:289859.1-justin-prod-sched01.dune.hep.ac.uk.logs.tgz
List job events     Wrapper job log

Jobscript log (last 10,000 characters)

:08.198786 +0000][Debug  ][PostMaster        ][ 1126] [ccxrootdegee.in2p3.fr:1094] Stream TTL elapsed, disconnecting...
[2024-11-14 20:53:08.199192 +0000][Debug  ][AsyncSock         ][ 1126] [ccxrootdegee.in2p3.fr:1094.0] Closing the socket
[2024-11-14 20:53:08.199293 +0000][Debug  ][Poller            ][ 1126] <[::ffff:172.16.43.7]:38182><--><[::ffff:134.158.209.218]:1094> Removing socket from the poller
[2024-11-14 20:53:08.199522 +0000][Error  ][PostMaster        ][ 1126] [ccxrootdegee.in2p3.fr:1094] Forcing error on disconnect: [ERROR] Operation interrupted.
[2024-11-14 20:53:08.199546 +0000][Debug  ][PostMaster        ][ 1126] [ccxrootdegee.in2p3.fr:1094] Reporting disconnection to queued message handlers.
[2024-11-14 20:53:08.199693 +0000][Debug  ][TaskMgr           ][ 1126] Requesting unregistration of: "TickGeneratorTask for: root://ccxrootdegee.in2p3.fr:1094"
[2024-11-14 20:53:08.199749 +0000][Debug  ][AsyncSock         ][ 1126] [ccxrootdegee.in2p3.fr:1094.0] Closing the socket
[2024-11-14 20:53:08.199772 +0000][Debug  ][PostMaster        ][ 1126] [ccxrootdegee.in2p3.fr:1094] Destroying stream
[2024-11-14 20:53:08.199826 +0000][Debug  ][AsyncSock         ][ 1126] [ccxrootdegee.in2p3.fr:1094.0] Closing the socket
[2024-11-14 20:53:09.173319 +0000][Debug  ][TaskMgr           ][ 1126] Removing task: "TickGeneratorTask for: root://ccxrootdegee.in2p3.fr:1094"
IonAndScint endJob.
[2024-11-14 21:28:05.534728 +0000][Debug  ][File              ][ 1126] [0xa37c980@root://ccxrootdegee.in2p3.fr:1094/pnfs/in2p3.fr/data/dune/disk/usertests/80/cf/000790_reco_data_2024-11-14T_093429Z.root?xrdcl.requuid=306743bb-24ef-4692-a53d-68c4992b2a01] Sending a close command for handle 0x0 to ccdcacli487.in2p3.fr:30288
[2024-11-14 21:28:05.534785 +0000][Debug  ][ExDbgMsg          ][ 1126] [ccdcacli487.in2p3.fr:30288] MsgHandler created: 0xad4abc0 (message: kXR_close (handle: 0x00000000) ).
[2024-11-14 21:28:05.534892 +0000][Debug  ][ExDbgMsg          ][ 1126] [ccdcacli487.in2p3.fr:30288] Moving MsgHandler: 0xad4abc0 (message: kXR_close (handle: 0x00000000) ) from out-queu to in-queue.
[2024-11-14 21:28:05.563907 +0000][Debug  ][ExDbgMsg          ][ 1126] [msg: 0x1958bd50] Assigned MsgHandler: 0xad4abc0.
[2024-11-14 21:28:05.563964 +0000][Debug  ][ExDbgMsg          ][ 1126] [handler: 0xad4abc0] Removed MsgHandler: 0xad4abc0 from the in-queue.
[2024-11-14 21:28:05.564138 +0000][Debug  ][ExDbgMsg          ][ 1126] [ccdcacli487.in2p3.fr:30288] Calling MsgHandler: 0xad4abc0 (message: kXR_close (handle: 0x00000000) ) with status: [SUCCESS] .
[2024-11-14 21:28:05.564249 +0000][Debug  ][File              ][ 1126] [0xa37c980@root://ccxrootdegee.in2p3.fr:1094/pnfs/in2p3.fr/data/dune/disk/usertests/80/cf/000790_reco_data_2024-11-14T_093429Z.root?xrdcl.requuid=306743bb-24ef-4692-a53d-68c4992b2a01] Close returned from ccdcacli487.in2p3.fr:30288 with: [SUCCESS] 
[2024-11-14 21:28:05.564308 +0000][Debug  ][ExDbgMsg          ][ 1126] [ccdcacli487.in2p3.fr:30288] Destroying MsgHandler: 0xad4abc0.
14-Nov-2024 21:28:05 GMT  Closed input file "root://ccxrootdegee.in2p3.fr:1094/pnfs/in2p3.fr/data/dune/disk/usertests/80/cf/000790_reco_data_2024-11-14T_093429Z.root"

================================================================================================================================
TimeTracker printout (sec)                        Min           Avg           Max         Median          RMS         nEvts   
================================================================================================================================
Full event                                     0.013778       42.4802       3093.14      0.264302       349.96         77     
--------------------------------------------------------------------------------------------------------------------------------
source:RootInput(read)                        0.00010713    0.00244185     0.143079     0.000134789    0.0164899       77     
simulate:rns:RandomNumberSaver                1.9534e-05    3.60529e-05   0.000470729   2.8422e-05    5.05447e-05      77     
simulate:largeant:larg4Main                    0.0119716     0.785623       34.867       0.207887        3.956         77     
simulate:IonAndScint:IonAndScint              0.000106653    0.0406385      2.99057     0.000123045     0.3384         77     
simulate:elecDrift:SimDriftElectrons          3.2105e-05     0.430655       31.7285     4.4805e-05      3.59033        77     
simulate:PDFastSim:PDFastSimPAR               5.1476e-05      41.1371       3023.55     6.5449e-05      342.13         77     
[art]:TriggerResults:TriggerResultInserter     9.208e-06    1.21482e-05    3.747e-05    1.0268e-05    4.49607e-06      77     
end_path:out1:RootOutput                       2.671e-06    3.50203e-06   1.8454e-05     3.122e-06    1.90951e-06      77     
end_path:out1:RootOutput(write)               0.000780986    0.0842361     0.753352     0.00333373     0.168548        76     
================================================================================================================================
%MSG-i NuRandomService:  RootOutput:out1@EndJob 14-Nov-2024 21:28:05 GMT  ModuleEndJob

Summary of seeds computed by the NuRandomService
Random policy: 'perEvent'
  algorithm version: EventTimestamp_v1
   Configured value          Last value   ModuleLabel.InstanceName
        (per event)            42050063   IonAndScint.ISCalcAlg
        (per event)           566897688   PDFastSim.photon
        (per event)           212160827   PDFastSim.scinttime
        (per event)           489545914   elecDrift
        (per event)           721211580   largeant

%MSG

====================================================================================================
MemoryTracker summary (base-10 MB units used)

  Peak virtual memory usage (VmPeak)  : 10517.2 MB
  Peak resident set size usage (VmHWM): 8057.83 MB
====================================================================================================

TrigReport ---------- Event summary -------------
TrigReport Events total = 77 passed = 77 failed = 0

TrigReport ---------- Modules in End-path ----------
TrigReport        Run    Success      Error Name
TrigReport         77         77          0 out1

TimeReport ---------- Time summary [sec] -------
TimeReport CPU = 3282.693086 Real = 3296.457006

MemReport  ---------- Memory summary [base-10 MB] ------
MemReport  VmPeak = 10517.2 VmHWM = 8057.83

%MSG-s ArtException:  PostEndJob 14-Nov-2024 21:28:40 GMT ModuleEndJob
---- EventProcessorFailure BEGIN
  EventProcessor: an exception occurred during current event processing
  ---- FatalRootError BEGIN
    Fatal Root Error: TBufferFile::WriteByteCount
    bytecount too large (more than 1073741822)
    ROOT severity: 3000
  ---- FatalRootError END
---- EventProcessorFailure END
---- FatalRootError BEGIN
  Fatal Root Error: TTree::SetEntries
  Tree branches have different numbers of entries, eg EventAuxiliary has 76 entries while art::TriggerResults_TriggerResults__MUSUNGen. has 1000 entries.
  ROOT severity: 2000
---- FatalRootError END
%MSG
Art has completed and will exit with status 1.
[2024-11-14 21:28:40.896464 +0000][Debug  ][JobMgr            ][ 1126] Stopping the job manager...
[2024-11-14 21:28:40.897944 +0000][Debug  ][JobMgr            ][ 1126] Job manager stopped
[2024-11-14 21:28:40.898518 +0000][Debug  ][TaskMgr           ][ 1126] Stopping the task manager...
[2024-11-14 21:28:40.898725 +0000][Debug  ][TaskMgr           ][ 1126] Task manager stopped
[2024-11-14 21:28:40.898737 +0000][Debug  ][Poller            ][ 1126] Stopping the poller...
[2024-11-14 21:28:40.898873 +0000][Debug  ][TaskMgr           ][ 1126] Requesting unregistration of: "TickGeneratorTask for: root://ccdcacli487.in2p3.fr:30288"
[2024-11-14 21:28:40.898894 +0000][Debug  ][AsyncSock         ][ 1126] [ccdcacli487.in2p3.fr:30288.0] Closing the socket
[2024-11-14 21:28:40.898974 +0000][Debug  ][Poller            ][ 1126] <[::ffff:172.16.43.7]:43686><--><[::ffff:134.158.209.235]:30288> Removing socket from the poller
[2024-11-14 21:28:40.899052 +0000][Debug  ][PostMaster        ][ 1126] [ccdcacli487.in2p3.fr:30288] Destroying stream
[2024-11-14 21:28:40.899072 +0000][Debug  ][AsyncSock         ][ 1126] [ccdcacli487.in2p3.fr:30288.0] Closing the socket
=== End last 100 lines of lar log file ===
lar exit code 0
Traceback (most recent call last):
  File "/cvmfs/dune.opensciencegrid.org/products/dune/duneutil/v09_75_03d00/bin/extractor_prod.py", line 434, in <module>
    main()
  File "/cvmfs/dune.opensciencegrid.org/products/dune/duneutil/v09_75_03d00/bin/extractor_prod.py", line 373, in main
    mddict = expSpecificMetadata.getmetadata()
  File "/cvmfs/dune.opensciencegrid.org/products/dune/duneutil/v09_75_03d00/bin/extractor_prod.py", line 344, in getmetadata
    jobt = self.get_job(proc)
  File "/cvmfs/dune.opensciencegrid.org/products/dune/duneutil/v09_75_03d00/bin/extractor_prod.py", line 69, in get_job
    raise RuntimeError('sam_metadata_dumper returned nonzero exit status {}.'.format(rc))
RuntimeError: sam_metadata_dumper returned nonzero exit status 1.
extractor_prod.py exit code 1
Error reading metadata from file: Expecting value: line 1 column 1 (char 0)
pdjson2metadata exit code 1
.:
total 829800
-rw-r--r-- 1 gl05pi6 eddie_users 849067035 Nov 14 21:28 RootOutput-61ba-af72-7d37-826a.root
-rw-r--r-- 1 gl05pi6 eddie_users    617919 Nov 14 21:28 000790_reco_data_2024-11-14T_093429Z_reco_2024-11-14T_203158Z.log
-rw-r--r-- 1 gl05pi6 eddie_users      9962 Nov 14 21:28 jobscript.log
-rw-r--r-- 1 gl05pi6 eddie_users       519 Nov 14 21:28 g4_hist.root
-rw-r--r-- 1 gl05pi6 eddie_users       104 Nov 14 20:31 all-input-dids.txt
-rw-r--r-- 1 gl05pi6 eddie_users         0 Nov 14 21:28 000790_reco_data_2024-11-14T_093429Z_reco_data_2024-11-14T_203158Z.root.ext.json
-rw-r--r-- 1 gl05pi6 eddie_users         0 Nov 14 21:28 000790_reco_data_2024-11-14T_093429Z_reco_data_2024-11-14T_203158Z.root.json
justIN time: 2024-11-22 08:55:27 UTC       justIN version: 01.01.09