PCI_E1 is registered as GPU4 on smOS, and whichever GPU I put in that slot seems to run hotter than the others and eventually stops hashing. After a few minutes at 0.00 Mh/s on that GPU, the system reboots and works as normal... then it stops hashing again and reboots. Sometimes it works for 10 minutes, sometimes 10 hours.
I've opened all my GPUs and reapplied the thermal paste, all ports are clear, and I'm using the latest 008s risers. My room temp is 10°C right now but it's still happening. Any ideas?
Comments
then it's probably smOS settings or something
there's a bit of info you didn't give, so just in case:
did you swap the risers around as well as the GPU? everything: cable, card, and 1x PCIe
possibly swapped the PCIe cables and molex from the PSU too?
I swapped risers, 1x PCIe and cables. I didn't swap molex from the PSU because 1 GPU is powered by 1 PSU and the other is powered by another PSU, so I ruled out a power supply issue.
One thing I've seen online is everyone saying you must allocate 16GB of virtual memory for mining. In Ubuntu that would be 16GB of 'swap', and I've just checked: my swap allocation is 0GB. Do you think it could be this?
swap in linux is almost completely different from a page file
and it's not what's causing the heat tho
all swap will do is possibly give you more hashrate if for some reason you're running out of physical memory
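if you want to double check the swap number, here's a rough Python sketch that reads it from /proc/meminfo. It assumes a standard Linux /proc/meminfo with a `SwapTotal` line in kB; the function names (`parse_meminfo`, `swap_total_gib`) are just made up for this example and aren't part of smOS or any miner.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_total_gib(text):
    """Return configured swap in GiB, assuming SwapTotal is reported in kB."""
    return parse_meminfo(text).get("SwapTotal", 0) / (1024 * 1024)


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            print(f"swap configured: {swap_total_gib(f.read()):.2f} GiB")
    except OSError:
        print("could not read /proc/meminfo (not a Linux box?)")
```

a result of 0.00 GiB just confirms there's no swap set up. It says nothing about whether that slot problem is related.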
i would assume at this point it's smOS
as there's not much a slot can do by itself to overheat a gpu on a riser
is there a way to change drivers on smOS?
like maybe even for that specific slot?
It's not heat related either. I had the room temp at 10°C and left the cards off overnight, switched on in the morning, and it happened within 10 minutes of starting to hash, even though temps were under 78°C and these cards can run a lot hotter.
I've read online it's due to a 'memory leak' issue, but I still haven't found any way of resolving it.
if you're running claymore, check the logs for the 511C temp error
otherwise your mem OC might be too high
try to dial that back a little and see if you gain stability
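for the log check, something like this Python sketch would do. Big assumption: i don't know exactly how your miner writes the reading, so the pattern just looks for "511C" (or "511 C") anywhere in a line. `find_511c_lines` is a made-up helper name, and you'd adjust the regex to whatever your log actually shows.

```python
import re

# Assumed pattern: a "511C" or "511 C" reading somewhere in a log line.
# The real log format may differ, so treat this regex as a starting point.
TEMP_511 = re.compile(r"\b511\s*C\b", re.IGNORECASE)


def find_511c_lines(lines):
    """Return (line_number, text) pairs for lines that mention a 511C reading."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if TEMP_511.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

you'd open your miner's log file and pass the file object in, e.g. `find_511c_lines(open(path))` where `path` is wherever your log lives. If it returns hits for that one card, that points at the card dropping out rather than at real heat.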
force rebooting your rig with -r is just a band-aid and doesn't solve any problems
and it could potentially let a bad problem get worse
all 10 of my rigs run for 300+ hours before i manually restart them