REALLY FRUSTRATING! - GPU 7 hangs in OpenCL call

eFiJy Member Posts: 7
edited July 2018 in Mining
Hi. I have a rig with 2 Corsair HX1200i PSUs, 1 ASRock H110 Pro BTC+ and 8 Asus Strix Vega 64s.
I keep having trouble with it, specifically with one card, again and again and again.
I followed the Vega mining guides for the most part, skipping only the automatic miner restart on power drop and the blockchain drivers.
On Monero or any other CryptoNight coin, using xmr-stak, I have no problem whatsoever, but I want to mine Ethereum, and that is where the problem shows up.

My last fresh Win10 install was yesterday.
I saw that newer drivers were released on the 19th, so I used them, version 18.7.1.
But it doesn't matter what I do, because after a while the GPU still hangs.
The rig continues to mine, as I set it up that way, but I don't want to lose hashrate, so I need to solve this somehow.

I am attaching my latest log file. You can check it from 04:36:19:089, as that was when the problems began, about 13 hours after I started the miner. The last run was about 37 hours long.
You can also find attached my current overdrive settings, both for the normal cards and for the problematic GPU. I tried using the same settings on the problematic card, but with those it lasted only about 1 hour before the issue came back.
Also, when this happens, the fans stay at the set RPM, but the GPU is doing nothing, and its temperature drops from 56-60 degrees to 25-30 degrees.

I changed 4 or 5 risers and checked their connections thoroughly, so this is not the issue.
I keep the cards at 50 degrees Celsius, and the one with the problems at 56, as I thought the fans were the issue and didn't want to let them throttle too much.
I changed 2 or 3 power cables and 2 or 3 molex cables.
I installed all kinds of drivers, running DDU on the cards before each one.
I thought one of my PSUs might not be able to run 4 cards, so I disabled the other 3 and let only this one card run. After 1h45min, the hang recurred.
I changed the overdrive settings a gazillion times, but still no result.

It is worth mentioning that one time when the rig was on, not mining, just powered on, I unplugged the power cables from the GPU and then plugged them back in. After that, the RGB lighting stopped working: it stays solid white even if I turn the other cards' RGB lighting off or change the colors.

I imagine it is probably a faulty gpu. They are all new, but don't have any warranty, except for 2, which I bought from a retailer. The other 6 were bought from two people.

Could you tell me what could be the problem, as this is getting really frustrating?

Comments

  • asusrig Member Posts: 141
    I have a rig with 13 GPUs: seven 1070 Ti and six RX580 cards. Out of the blue the rig started crashing, and I was able to trace the OpenCL error to GPU2 of the RX580 cards. This GPU2 was actually the 5th RX580 card installed in the upper chassis, but for some reason it was identified as GPU2 in Claymore / GPU-Z / Trixx. I was able to determine which card it was by monitoring the fan speeds: the card with 0 fan speed was the problematic one.

    Guess how I fixed it? In Claymore I isolated the other five RX580 cards via the -di switch. So the .bat script would have this flag: -di02345

    Since GPU2 was giving an OpenCL error, I wanted to see if it could mine on its own without pulling down the whole rig. So I created a second .bat script, in which GPU2 is selected with the following flag: -di1

    And it seems to work! GPU2 is happily mining on its own and the other 5 cards are mining as a group. For this problematic rig I have three instances of Claymore running:

    Instance 1: Seven 1070 Ti cards
    Instance 2: Five RX580 cards
    Instance 3: One RX580 card

    I have no idea why the rig started doing this. It could be a Windows update, a riser card issue, or who knows what. I'm just glad it works, and you might want to try it.
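
The split described in this comment (one miner instance for the healthy cards, a second instance for the problem card alone) can be sketched as a small helper that builds the two `-di` values. This is only an illustration, not part of Claymore: the function name `split_di_flags` is hypothetical, the indexes are assumed to be zero-based single digits (0-9), and the flags are written with a space after `-di`, which is an assumption about the miner's expected syntax and differs from how they are typed in the comment.

```python
def split_di_flags(gpu_count, problem_index):
    """Build two -di flag strings: one for the healthy group, one for the problem card.

    Hypothetical helper for illustration only. Assumes zero-based,
    single-digit GPU indexes, so it refuses rigs with more than 10 cards.
    """
    if gpu_count > 10:
        raise ValueError("this sketch only handles single-digit indexes (0-9)")
    if not 0 <= problem_index < gpu_count:
        raise ValueError("problem_index must be a valid GPU index")

    # Every index except the problematic one, concatenated as digits.
    others = "".join(str(i) for i in range(gpu_count) if i != problem_index)
    return f"-di {others}", f"-di {problem_index}"


# Six cards with the card at index 1 isolated, matching the flags in the comment.
group_flag, solo_flag = split_di_flags(gpu_count=6, problem_index=1)
print(group_flag)  # -di 02345
print(solo_flag)   # -di 1
```

Each returned string would go into its own .bat file, so the two miner instances never try to open the same card.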