I have a 13-GPU rig that has worked well for months in my home office. Rig specs:
7 x EVGA GTX 1070 Ti SC mining ZEC. Core OC +200 MHz, memory OC +700 MHz
6 x Sapphire Nitro+ RX 580 8GB mining ETH. BIOS mod.
2 x EVGA 1000W Gold-rated PSU
2 x EVGA 750W Gold-rated PSU
ASUS Mining Expert motherboard with 512 GB SSD and 8 GB DDR4
Windows 10
I have two APC UPS units connected to this rig, each rated for up to 865W. The PSUs are paired in two groups: a 1000W + 750W pair powers the Nvidia cards and another 1000W + 750W pair powers the AMD cards. The Nvidia cards plus the system draw about 865 watts, and the AMD cards also draw about 865 watts.
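As a rough check of these numbers, here is a minimal Python sketch that compares each group's draw with its combined PSU rating and with its UPS watt rating. It assumes the ~865 W figures are wall-side draw and treats them as PSU load, which slightly overstates the PSU side (Gold units lose roughly 10% as heat). The `PowerGroup` class and its fields are hypothetical names for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PowerGroup:
    """One PSU pair and the UPS feeding it (hypothetical model for illustration)."""
    name: str
    psu_ratings_w: tuple[int, ...]  # nameplate rating of each PSU in the group
    load_w: float                   # measured draw of this group, in watts
    ups_rating_w: float             # UPS real-power rating, in watts

    def psu_load_fraction(self) -> float:
        # Share of the combined PSU capacity in use (ignores PSU efficiency losses).
        return self.load_w / sum(self.psu_ratings_w)

    def ups_load_fraction(self) -> float:
        # Share of the UPS watt rating in use; 1.0 means no headroom at all.
        return self.load_w / self.ups_rating_w


# Figures taken from the post: two 1000W + 750W pairs, ~865 W each, 865 W UPS units.
groups = [
    PowerGroup("Nvidia + system", (1000, 750), 865, 865),
    PowerGroup("AMD", (1000, 750), 865, 865),
]

for g in groups:
    print(f"{g.name}: PSUs at {g.psu_load_fraction():.0%}, UPS at {g.ups_load_fraction():.0%}")
```

With these inputs each PSU pair sits at roughly half of its combined rating while each UPS is at 100%, so the UPS units, not the PSUs, are the components with no headroom.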
Earlier today I noticed the LED on the UPS powering the Nvidia cards come on, with a timer below it showing how many minutes of battery were left, indicating a power loss. The minutes kept increasing, and I realized the cards were drawing less power. I could not RDP into the rig, and Claymore Manager confirmed that the AMD cards were also offline. The fans on the AMD cards were still spinning, but the fans on the Nvidia cards had stopped. All case fans and CPU fans were spinning. After about a minute the rig shut down on its own.
I turned the rig back on, and it mined for about 2 hours before the same thing happened again. I turned it on again and this time reduced the Nvidia cards' core OC to 150 MHz and the memory OC to 650 MHz, although, as mentioned above, the rig had been working fine for months with the higher overclocks. Reducing the values did not affect performance much in the ZEC miner, so I might do the same on my other rigs to reduce stress on the cards.
Another thing I did this time was disable ECO mode on all the power supplies (a stab in the dark; I'm not sure that is the culprit). Each PSU has an ECO switch. When it is on, the PSU fan does not spin until the PSU reaches a certain temperature, I believe 55 C.
So far the system has been running fine for about half an hour since these changes, but only time will tell.
Does anyone have an idea why the rig is behaving this way? I looked at System Events but could not figure out the reason for the crash and eventual shutdown, since no monitor is hooked up to the system. Temperatures in the room appear fine and the cards are not running hot.
Comments
I was sitting next to the rig when suddenly both UPS units kicked in, as if there had been a power outage. Each UPS is on a different 15A circuit: one is grounded (1070 Ti cards) and the other is not grounded (RX580 cards). This part of the house has older 15A-rated wiring, so the rig is powered from two circuits to reduce the load on each. I have a Zero Surge device plugged into the ungrounded circuit to provide surge protection; without it the UPS could not provide surge protection, since that requires a ground.
https://zerosurge.com/plug-in-products-solutions/
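To put the two 15A circuits in perspective, here is a minimal sketch that estimates how much of a branch circuit one ~865 W group uses. It assumes a nominal 120 V supply and the common practice of keeping continuous loads at or below 80% of the breaker rating. The power factor defaults to 1.0 as a simplification (active-PFC Gold PSUs typically run close to that), and UPS overhead is not modeled. Function and constant names are hypothetical; a 20A circuit is included for comparison.

```python
NOMINAL_VOLTS = 120.0    # assumed nominal North American branch-circuit voltage
CONTINUOUS_LIMIT = 0.8   # keep continuous loads at or below 80% of the breaker rating


def continuous_budget_w(breaker_amps: float, volts: float = NOMINAL_VOLTS) -> float:
    """Watts a circuit can carry continuously under the 80% guideline."""
    return breaker_amps * volts * CONTINUOUS_LIMIT


def circuit_utilization(load_w: float, breaker_amps: float,
                        volts: float = NOMINAL_VOLTS, power_factor: float = 1.0) -> float:
    """Fraction of the continuous budget used by a load drawing load_w real watts."""
    amps = load_w / (volts * power_factor)   # I = P / (V * PF)
    return amps / (breaker_amps * CONTINUOUS_LIMIT)


for breaker in (15, 20):
    print(f"{breaker}A circuit: {continuous_budget_w(breaker):.0f} W continuous budget, "
          f"865 W uses {circuit_utilization(865, breaker):.0%}")
```

Under these assumptions one group uses about 60% of a 15A circuit's 1440 W continuous budget, while both groups together (about 1730 W) would exceed it, which is consistent with splitting the rig across two circuits.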
The rig stayed on, but the fans started spinning very fast and were loud. I think it was the RX580 fans, but I'm not positive because HWiNFO did not report higher RPMs. Maybe it was the PSU fans. I logged on to the rig and Claymore was still mining on the RX580 cards.
The 1070 Ti cards were also mining ZEC with the DSTM miner; however, three of the cards failed in the miner, and when I tried to check their stats in HWiNFO they were not listed. They did show up in Device Manager and ASUS Mining Manager. In Afterburner, those three cards would not report any fan speed, power, or memory settings, and they were unlinked from the other four cards. I rebooted the machine; they still appeared in Device Manager, but Afterburner and HWiNFO still would not recognize their sensors.
I decided to download the latest Nvidia driver and did a clean installation, which first removed everything; after a reboot, none of the Nvidia cards were listed in Device Manager. I did another clean installation, and after it rebooted everything was working properly again.
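A symptom like this (cards listed in Device Manager but reporting no fan, power, or temperature readings) could be checked from a script instead of opening HWiNFO and Afterburner. The sketch below assumes `nvidia-smi` (installed with the Nvidia driver) is on the PATH and uses its `--query-gpu` CSV output. The helper names and the choice of fields are illustrative, and the placeholder text nvidia-smi prints for a missing reading (such as `[N/A]`) may vary by driver, so any non-numeric value is treated as missing.

```python
from __future__ import annotations

import subprocess

# Fields requested from nvidia-smi; the last three are sensor readings.
QUERY_FIELDS = ["index", "pci.bus_id", "temperature.gpu", "fan.speed", "power.draw"]
SENSOR_FIELDS = QUERY_FIELDS[2:]


def parse_query_csv(text: str) -> list[dict[str, str]]:
    """Turn `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        rows.append(dict(zip(QUERY_FIELDS, values)))
    return rows


def _is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def cards_missing_sensors(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """GPUs where any sensor field is absent or non-numeric (e.g. "[N/A]")."""
    return [row for row in rows
            if not all(_is_number(row.get(field, "")) for field in SENSOR_FIELDS)]


def query_gpus() -> list[dict[str, str]]:
    """Run nvidia-smi once and parse its output (requires an Nvidia driver)."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(QUERY_FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_query_csv(result.stdout)
```

On the rig, `cards_missing_sensors(query_gpus())` would list the affected cards by index and PCI bus ID, and comparing `len(query_gpus())` with the expected 7 would catch a card that dropped off entirely. Because of `check=True`, a failing nvidia-smi call raises an exception instead of returning partial data.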
I really don't know what caused this latest incident. The UPS units are maxed out in terms of load, since each can only support up to 865 watts and I am using every bit of that. According to the APC monitoring software, the rig can stay powered on for 3 minutes while mining, which is enough to ride out brief outages lasting a split second or a few seconds.
Also, I would never run a non-grounded power supply, especially alongside a properly grounded one. Your ground should be the reference for all of your zeros and also a drain-wire contingency. If two sources have different potentials to ground, you will typically see a constant current on your ground wire, which is not good.
Clean power is the basis for any well operating machine!
This thread proves that you have issues.
And maybe the cause is a faulty PSU.
Start by checking all power cable connectors where they connect to the risers and PSUs. Some of the 12V rails can be damaged by overheating. Be very careful: EVGA cables are all black, so it is often not easy to spot a melted or damaged wire at first sight. If you are using SATA-to-Molex cables, check them too. The yellow wire is the 12V rail.
I have experienced a similar issue in the past and simply removed all the cables and reattached them. After that, everything worked fine.
If you have a good PSU, it will tend to shut down when there is a bad connection, which is good.
First step, and this will be a pain, remove all the cables and reattach them.
So far it is working out though and the rig has not shut down again.
For the two rigs in the basement, I installed two 20A grounded circuits in surface-mounted conduit a few months ago. These have GFCI breakers (a code requirement for unfinished basements). They have also worked fine; however, high humidity will trip the breakers, so I have dehumidifiers on the side where the rigs are.
I have six of these UPS units connected to three rigs:
http://www.apc.com/shop/us/en/products/APC-Power-Saving-Back-UPS-Pro-1500/P-BR1500G
Two days ago, the two power supplies that power the RX580 cards for one basement rig (on a grounded 20A circuit) were off, and so were the cards. For that rig, these two power supplies are plugged into an APC 1500 VA UPS and a slightly smaller-capacity CyberPower UPS. On the lower level of the rig I have seven 1070 Ti cards powered by two more power supplies connected to another 1500 VA APC UPS. These cards were still on. I rebooted the rig, and all rigs have been working fine since.
I shut it off, went to the electrical panel, and flipped the two GFCI breakers for the two basement rigs. The UPS units for the rig that was still running kicked in; however, the one powering that rig's RX580 cards was humming and would not stop even after I flipped the breaker back on. After a minute, the entire rig also shut down, leaving both basement rigs off.
Since the RX580 cards draw around 865 watts, which is the maximum for the APC units, the backups cannot really be effective during a short power outage. So I may add a few more UPS units to each rig to spread the load. A Tesla Powerwall would be great but too expensive.
We don't get enough power outages here to justify going crazy with batteries, generators, etc.
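Spreading a group across more UPS units can be sized with simple arithmetic. This sketch assumes a target of at most 80% load per UPS (a hypothetical headroom figure, not an APC requirement) and an even split; in practice the split follows whichever PSU is plugged into which UPS, so the real shares depend on how the cards are divided between PSUs. Function names are illustrative.

```python
import math


def ups_units_needed(load_w: float, ups_rating_w: float, max_fraction: float = 0.8) -> int:
    """Smallest number of UPS units that keeps each below max_fraction of its rating,
    assuming the load can be split evenly between them."""
    usable_per_unit = ups_rating_w * max_fraction
    return math.ceil(load_w / usable_per_unit)


def per_unit_fraction(load_w: float, ups_rating_w: float, units: int) -> float:
    """Load on each UPS, as a fraction of its rating, for an even split."""
    return (load_w / units) / ups_rating_w


load = 865.0    # RX580 group draw from the post, in watts
rating = 865.0  # watt rating of the APC units in the post
n = ups_units_needed(load, rating)
print(n, f"{per_unit_fraction(load, rating, n):.0%}")
```

With the post's figures this gives two units per group at about 50% load each. Since each group is already a 1000W + 750W PSU pair, plugging each PSU into its own UPS is one natural way to split it, though the shares would likely be uneven. A lighter load per unit also lengthens runtime, since each battery serves a smaller draw.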
The rigs are all running at the moment without issues.