Hi @andrew and @robertoKin,
I have 2 possible causes in mind:
- There is a bug in the
- There is a bottleneck in the data input pipeline, so the GPU computation is not intensive enough to be picked up by the System Metrics collector (which samples only every 60 seconds).
You can sanity-check real-time GPU utilization and memory usage with these steps:
1. Launch a Jupyter Job.
2. Open two terminals. In the first one, launch the darknet training; in the second one, run
nvidia-smi -l 2
which refreshes the output every 2 seconds, so you can see whether your code is actually using the GPU.
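If you would rather log the numbers than watch the terminal, here is a minimal Python sketch of the same check. It polls nvidia-smi through its `--query-gpu` CSV output. The function names (`parse_gpu_line`, `sample_gpus`, `monitor`) and the defaults are my own choices for illustration, not part of any platform tooling, and it assumes nvidia-smi is on the PATH inside the job.

```python
import subprocess
import time

# Fields requested from nvidia-smi; with "nounits", memory is reported in MiB.
QUERY = "utilization.gpu,memory.used,memory.total"


def parse_gpu_line(line):
    """Parse one CSV line such as '87, 10240, 16384' into a dict.

    Assumes all three fields are numeric; some GPUs report '[N/A]',
    which this sketch does not handle.
    """
    util, mem_used, mem_total = (int(field.strip()) for field in line.split(","))
    return {"util_pct": util, "mem_used_mib": mem_used, "mem_total_mib": mem_total}


def sample_gpus():
    """Run nvidia-smi once and return one dict per visible GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_gpu_line(line) for line in out.splitlines() if line.strip()]


def monitor(interval_s=2, samples=30):
    """Print utilization and memory every interval_s seconds, samples times."""
    for _ in range(samples):
        for idx, gpu in enumerate(sample_gpus()):
            print(f"GPU {idx}: {gpu['util_pct']}% util, "
                  f"{gpu['mem_used_mib']}/{gpu['mem_total_mib']} MiB")
        time.sleep(interval_s)


# monitor()  # uncomment to start polling while the training runs
```

If utilization stays near 0% while the training is running, the code is probably not using the GPU. If it jumps up and then drops back repeatedly, that points more toward the input pipeline bottleneck.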
Hope it helps.