Hi Evan,
The training stopped about halfway through its expected run time, raising a GPU-related error.
This doesn't seem to be a machine or GPU fault. It looks more like a bug in how the code you are running handles multi-threading. Did you check whether this error has been reported in the GitHub repo?
Hope that helps.