Observed with CMSSW_16_1_0_pre4:
If I add a simple kernel in RecoTracker/LST/plugins/alpaka/LSTPixelHitsKernels.dev.cc and a module calling it, RecoTracker/LST/plugins/alpaka/LSTPixelHitsFromSoAProducer.cc, to the existing packages,
we end up with duplicate registration symbols __cudaRegisterLinkedBinary_*LSTEvent_dev_cc*:
one in pluginRecoTrackerLSTPluginsPortableCudaAsync.so and one in libRecoTrackerLSTCoreCudaAsync.so.
The former also has the new kernel registration in __cudaRegisterLinkedBinary_*_LSTPixelHitsKernels_dev_cc_*.
Because of the LSTEvent_dev_cc symbol duplication, the kernel function registration for _LSTPixelHitsKernels_dev_cc_ ends up being missed.
This leads to a runtime error, CUDA_ERROR_INVALID_HANDLE.
To reproduce, this branch can be used:
HSF-India-Pune-2026-LST:CMSSW_16_1_0_pre4_LST_pixelDirect-task5-pixsoa2soa
https://github.com/HSF-India-Pune-2026-LST/cmssw/pull/17/changes
Running the code is not required to see the problem; after compilation, the following already indicates it:
nm -C -l tmp/el9_amd64_gcc13/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreCudaAsync/RecoTrackerLSTCoreCudaAsync_cudadlink.o | grep _LSTEvent_dev_cc_
0000000000000010 T __cudaRegisterLinkedBinary_fe01fecd_15_LSTEvent_dev_cc_75f6ad43_4100264 RecoTrackerLSTCoreCudaAsync_cudadlink.reg.c:2
nm -C -l tmp/el9_amd64_gcc13/src/RecoTracker/LST/plugins/RecoTrackerLSTPluginsPortableCudaAsync/RecoTrackerLSTPluginsPortableCudaAsync_cudadlink.o | grep _LSTEvent_dev_cc_
00000000000000e0 T __cudaRegisterLinkedBinary_fe01fecd_15_LSTEvent_dev_cc_75f6ad43_4100264 RecoTrackerLSTPluginsPortableCudaAsync_cudadlink.reg.c:3
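The same check can be scripted. The following is a minimal sketch, assuming nm output in the format shown above (address, type letter, symbol name, optional file:line). The function name `find_duplicate_registrations` and the labels used for the two objects are illustrative only and are not part of CMSSW or scram.

```python
import re
from collections import defaultdict

# Matches a defined text symbol ("T" or "t") whose name is a CUDA
# linked-binary registration function, as printed by `nm -C -l`.
_REG = re.compile(r"^[0-9a-fA-F]+\s+[Tt]\s+(__cudaRegisterLinkedBinary_\S+)")


def find_duplicate_registrations(nm_outputs):
    """Return registration symbols defined in more than one object.

    nm_outputs maps a label (hypothetical, chosen by the caller) to the text
    printed by nm for that object or library.
    The result maps each duplicated symbol to the sorted list of labels.
    """
    owners = defaultdict(set)
    for label, text in nm_outputs.items():
        for line in text.splitlines():
            match = _REG.match(line.strip())
            if match:
                owners[match.group(1)].add(label)
    return {sym: sorted(labels) for sym, labels in owners.items() if len(labels) > 1}


if __name__ == "__main__":
    # The two nm output lines quoted in this report.
    core = (
        "0000000000000010 T __cudaRegisterLinkedBinary_fe01fecd_15_"
        "LSTEvent_dev_cc_75f6ad43_4100264 RecoTrackerLSTCoreCudaAsync_cudadlink.reg.c:2"
    )
    plugin = (
        "00000000000000e0 T __cudaRegisterLinkedBinary_fe01fecd_15_"
        "LSTEvent_dev_cc_75f6ad43_4100264 RecoTrackerLSTPluginsPortableCudaAsync_cudadlink.reg.c:3"
    )
    dups = find_duplicate_registrations({"LSTCore": core, "LSTPlugins": plugin})
    for sym, labels in dups.items():
        print(sym, "defined in", ", ".join(labels))
```

In practice the text for each label would come from running nm on the corresponding `_cudadlink.o` file, without the grep filter, so that every registration symbol is checked rather than only the LSTEvent one.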
One solution I could come up with is to drop the present dependency on LSTEvent_dev_cc in the RecoTracker/LST/plugins/alpaka dlink step. To test this, I manually edited the command nvcc -dlink ... -lRecoTrackerLSTCoreCudaAsync_nv ...LSTPixelHitsKernels.dev.cc.o -o ...RecoTrackerLSTPluginsPortableCudaAsync_cudadlink.o to drop the -lRecoTrackerLSTCoreCudaAsync_nv.
The first direct evidence that this worked is that __cudaRegisterLinkedBinary_*LSTEvent_dev_cc* no longer appears in RecoTrackerLSTPluginsPortableCudaAsync_cudadlink.o.
The code also runs correctly at runtime.
The question is: does the dependent cudadlink.o really need to know about the dev.cc/fatbins in another library? If not, these extra -lRecoTrackerLSTCoreCudaAsync_nv dependencies in the dlink step could be safely removed.
More notes are on Mattermost.