I've run out of ideas - freezing on watchdog restart

Sequoia93 Posts: 129 Member
I've been having an issue with my 480 rig: after 1 to 12 hours of mining, Claymore's watchdog restarts the miner, but on that restart the whole computer freezes. I'm left with two logs; the first one ends just before the restart:

02:29:20:913 11e4 GPU0 t=81C fan=93%, GPU1 t=49C fan=25%, GPU2 t=77C fan=47%, GPU3 t=77C fan=43%, GPU4 t=76C fan=35%, GPU5 t=76C fan=33%
02:29:20:913 11e4 em hbt: 0, dm hbt: 0, fm hbt: 47,
02:29:20:913 11e4 watchdog - thread 0, hb time 79719
02:29:20:929 11e4 WATCHDOG: GPU 0 hangs in OpenCL call, exit
02:29:20:929 11e4 watchdog - thread 1, hb time 79609
02:29:20:929 11e4 WATCHDOG: GPU 0 hangs in OpenCL call, exit
02:29:20:944 11e4 watchdog - thread 2, hb time 31
02:29:20:944 11e4 watchdog - thread 3, hb time 141
02:29:20:944 11e4 watchdog - thread 4, hb time 203
02:29:20:944 11e4 watchdog - thread 5, hb time 78
02:29:20:960 11e4 watchdog - thread 6, hb time 141
02:29:20:960 11e4 watchdog - thread 7, hb time 16
02:29:20:960 11e4 watchdog - thread 8, hb time 31
02:29:20:960 11e4 watchdog - thread 9, hb time 141
02:29:20:976 11e4 watchdog - thread 10, hb time 78
02:29:20:976 11e4 watchdog - thread 11, hb time 203
02:29:21:991 11e4 Restarting OK, exit...


Then the next log begins before freezing:

02:29:24:679 fbc args: -epool eth-us.dwarfpool.com:8008 -ewal 0x7d76e2b7885Cb543f151931b439CBe83b171B620 -epsw x -dpool sia-us-east1.nanopool.org:7777 -dwal ae47465a42f37af677c9c136690ba8e9978356a4b87958134b9c6093793d8fd0cdb170756652 -dcoin sc -dpsw x -dcri 22
02:29:24:679 fbc
02:29:24:679 fbc ╔════════════════════════════════════════════════════════════════╗
02:29:24:742 fbc ║ Claymore's Dual ETH + DCR/SC GPU Miner v6.4 Beta ║
02:29:24:742 fbc ╚════════════════════════════════════════════════════════════════╝
02:29:24:742 fbc
02:29:24:960 fbc ETH: 1 pool is specified
02:29:25:132 fbc Main Ethereum pool is eth-us.dwarfpool.com:8008
02:29:25:132 fbc SC: 1 pool is specified
02:29:25:148 fbc Main Siacoin pool is sia-us-east1.nanopool.org:7777
02:29:33:816 fbc OpenCL platform: AMD Accelerated Parallel Processing
02:29:33:869 fbc OpenCL initializing...

02:29:33:892 fbc AMD Cards available: 6
02:29:33:909 fbc GPU #0: Ellesmere, 8192 MB available, 36 compute units
02:29:33:913 fbc GPU #0 recognized as Radeon RX 480
02:29:33:917 fbc GPU #1: Ellesmere, 8192 MB available, 36 compute units
02:29:33:920 fbc GPU #1 recognized as Radeon RX 480
02:29:33:924 fbc GPU #2: Ellesmere, 8192 MB available, 36 compute units
02:29:33:928 fbc GPU #2 recognized as Radeon RX 480
02:29:33:932 fbc GPU #3: Hawaii, 4096 MB available, 40 compute units
02:29:33:936 fbc GPU #3 recognized as Radeon 290
02:29:33:940 fbc GPU #4: Ellesmere, 8192 MB available, 36 compute units
02:29:33:944 fbc GPU #4 recognized as Radeon RX 480
02:29:33:948 fbc GPU #5: Ellesmere, 8192 MB available, 36 compute units
02:29:33:973 fbc GPU #5 recognized as Radeon RX 480
02:29:33:978 fbc POOL/SOLO version
02:29:33:981 fbc b214
02:29:34:015 fbc start building OpenCL program...


Now, I've tried Claymore 6.4, 7.0, and 7.1. I've used drivers 16.9.1 and 16.9.2. I've set all my clocks/voltages to safe values. And I've completely reinstalled Windows 10. But it keeps happening! Then, very oddly, last night my 370 rig, which had been running nonstop for the past week, ran into the same issue! I've never had that issue with it before. It might be totally unrelated, but it's like the watchdog is f@#*ing haunting me. Has anyone else had/solved this issue? I'm at a loss here...

Comments

  • Jotun70 Posts: 107 Member ✭✭
    edited October 2016
    Set watchdog to 0 and see if you are dropping a card... If you are, figure out why that card is dropping (OC, bad cable/connection/slot, etc).

    EDIT: FYI, with Watchdog set to 0, your rig will keep running even if you drop a card. In my experience it has made it easy to identify certain issues.
  • Marvell9 Posts: 592 Member ✭✭✭
    Jotun70 said:

    Set watchdog to 0 and see if you are dropping a card... If you are, figure out why that card is dropping (OC, bad cable/connection/slot, etc).

    EDIT: FYI, with Watchdog set to 0, your rig will keep running even if you drop a card. In my experience it has made it easy to identify certain issues.

    How do you do that on the command line?
  • Jotun70 Posts: 107 Member ✭✭
    Marvell9 said:

    How do you do that on the command line?
    Check out the readme file... Pretty sure it's just:

    -wd 0
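    If you generate the start line from a script, the change is one extra option pair appended to the miner arguments. A minimal Python sketch, assuming space-separated options with no quoted values (true of the OP's argument line above); `set_option` is a hypothetical helper and `0xYOURWALLET` is a placeholder, not a real address:

```python
def set_option(args: str, flag: str, value: str) -> str:
    """Return `args` with `flag` set to `value` (hypothetical helper).

    Any existing `flag <value>` pair is removed and the new pair is
    appended. Assumes space-separated options with no quoted spaces.
    """
    out = []
    skip_next = False
    for tok in args.split():
        if skip_next:          # drop the old value of `flag`
            skip_next = False
            continue
        if tok == flag:        # drop the old flag itself
            skip_next = True
            continue
        out.append(tok)
    out += [flag, value]
    return " ".join(out)


line = "-epool eth-us.dwarfpool.com:8008 -ewal 0xYOURWALLET -epsw x"
print(set_option(line, "-wd", "0"))
# -epool eth-us.dwarfpool.com:8008 -ewal 0xYOURWALLET -epsw x -wd 0
```

    Replacing rather than appending blindly avoids ending up with two conflicting `-wd` values on the same line.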
  • Calivet Posts: 194 Member ✭✭
    It says it right there in your log: GPU 0 hangs in OpenCL. Look at your temperature for GPU 0 too: it's at 81°C with the fan at 93%. You might want to try the command-line options -gser 2 and -eres. These two options should help with stability.
  • Jotun70 Posts: 107 Member ✭✭
    Calivet said:

    It says it right there in your log: GPU 0 hangs in OpenCL. Look at your temperature for GPU 0 too: it's at 81°C with the fan at 93%. You might want to try the command-line options -gser 2 and -eres. These two options should help with stability.

    That's a good point. 81°C with 93% fan speed is too high. You should be chillin' those cards in the 70s... I have all of mine running at 70°C or below now. Lower temps will help with stability. Looks like some undervolting/underclocking is needed here.
  • Sequoia93 Posts: 129 Member
    @Calivet @Jotun70 That is not actually GPU0. That is the single 290 card I have in the rig. It shows as GPU0 in Claymore's temp/fan report but is actually GPU3. The 290 runs at 78-83°C with no problem. Also, the problem is not that a card hangs (that is its own problem); the problem is that when a card does hang, Claymore freezes up after restarting itself.

    I have also tried -gser and -eres without luck. I'm having trouble remembering whether this issue was present when I was running driver 16.8.2, so I just installed that and am giving it a go. But if it is an issue with 16.9.1/16.9.2, I'm sure many of us would know about it.
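    Since the index in the temperature/fan line doesn't necessarily match the card list (as the OP explains for the 290), it can help to pull the WATCHDOG lines out of a long log on their own. A minimal Python sketch, assuming "hb time" is in milliseconds (79719 would then be about 80 s, past the 1-minute limit described in Claymore's readme); `summarize_watchdog` and the 60 000 ms threshold are illustrative choices, not part of Claymore:

```python
import re

# Patterns for the two kinds of watchdog lines seen in the log at the top.
HB_RE = re.compile(r"watchdog - thread (\d+), hb time (\d+)")
HANG_RE = re.compile(r"WATCHDOG: GPU (\d+) hangs in OpenCL call")

STALL_MS = 60_000  # assumption: hb time is in ms; readme limit is 1 minute


def summarize_watchdog(lines):
    """Return (stalled thread ids, GPU indices reported as hung)."""
    stalled = []
    hung = set()
    for line in lines:
        m = HB_RE.search(line)
        if m and int(m.group(2)) > STALL_MS:
            stalled.append(int(m.group(1)))
        m = HANG_RE.search(line)
        if m:
            hung.add(int(m.group(1)))
    return stalled, hung


log = """\
02:29:20:913 11e4 watchdog - thread 0, hb time 79719
02:29:20:929 11e4 WATCHDOG: GPU 0 hangs in OpenCL call, exit
02:29:20:929 11e4 watchdog - thread 1, hb time 79609
02:29:20:929 11e4 WATCHDOG: GPU 0 hangs in OpenCL call, exit
02:29:20:944 11e4 watchdog - thread 2, hb time 31
""".splitlines()

print(summarize_watchdog(log))  # ([0, 1], {0})
```

    The index this reports is the one in the WATCHDOG line; matching it to a physical card is still a manual step (e.g. against the "GPU #N recognized as ..." lines at startup).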
  • Calivet Posts: 194 Member ✭✭
    When it says GPU hangs in OpenCL, that is what's causing the watchdog to restart your mining program. I would suggest a clean uninstall of all your drivers with DDU (in safe mode) and then a reinstall with all cards connected. You will usually have this issue when you run a rig with different card types. Might be a stupid question, but have you set the virtual memory to at least 16 GB? When you try the -gser option, is it with -gser 2? Your logs will definitely give you clues about what is going on.
  • Jotun70 Posts: 107 Member ✭✭
    Have you tried setting up the watchdog so that it runs a batch file to restart the whole rig when there is an issue? If you're just restarting the miner, it will freeze every time.

    Also, just a heads up, I haven't had stability issues in ages... but when I did, I HAD to use -wd 0. Restarting meant a freeze even if I restarted the whole rig. I'd rather have one card drop and the thing keep running than have nothing running. It was just easier that way... I'd manually restart it later when I got home.
  • Sequoia93 Posts: 129 Member
    @Calivet I have reinstalled drivers several times (using DDU), my virtual memory is set to 20 GB, and I've tried -gser 1 and -gser 2.

    @Jotun70 Mind giving me an example of such a batch file? That sounds like a great idea.

    And I don't have the time right now, but I'll set -wd 0 tonight.

    Thanks for all the help!
  • Jotun70 Posts: 107 Member ✭✭
    edited October 2016
    I don't have the batch file anymore, but it's all there in the readme! :) ...

    -wd watchdog option. Default value is "-wd 1", it enables watchdog, miner will be closed (or restarted, see "-r" option) if any thread is not responding for 1 minute or OpenCL call failed.
    Specify "-wd 0" to disable watchdog.

    -r Restart miner mode. "-r 0" (default) - restart miner if something wrong with GPU. "-r -1" - disable automatic restarting. -r >20 - restart miner if something wrong with GPU or by timer. For example, "-r 60" - restart miner every hour or when some GPU failed.
    "-r 1" closes miner and execute "reboot.bat" file ("reboot.bash" or "reboot.sh" for Linux version) in the miner directory (if exists) if some GPU failed.
    So you can create "reboot.bat" file and perform some actions, for example, reboot system if you put this line there: "shutdown /r /t 5 /f".
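    The -r values in that excerpt map to four distinct behaviours. A small Python lookup makes the mapping explicit; it is a sketch based only on the readme text quoted here. The excerpt says nothing about values 2 through 20 (or below -1), so the function reports those as undocumented rather than guessing, and it reads the timer as minutes from the "-r 60 - every hour" example. `restart_mode` is a hypothetical name:

```python
def restart_mode(r: int) -> str:
    """Describe a Claymore -r value using only the readme text quoted above."""
    if r == -1:
        return "never restart automatically"
    if r == 0:
        return "restart the miner when a GPU fails (default)"
    if r == 1:
        return "close the miner and run reboot.bat when a GPU fails"
    if r > 20:
        # "-r 60" is described as "every hour", so the value reads as minutes
        return f"restart the miner when a GPU fails, or every {r} minutes"
    return "not described in the quoted readme text"


for r in (0, 1, 60, 10):
    print(r, "->", restart_mode(r))
```

    Only `-r 1` takes the whole machine down; `-r 0` and `-r >20` restart just the miner process, which is the path that freezes in this thread.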


    EDIT: Basically, create a reboot.bat file (in the same folder as the miner) and put the example command shown above in it... Make sure -r 1 is set in your Claymore start .bat.
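    A minimal Python sketch that writes that reboot.bat into the miner folder. `write_reboot_bat` and the folder path are hypothetical; the shutdown switches are the standard Windows ones (/r restart, /f force running applications to close, /t delay in seconds), and the default 5-second delay mirrors the readme example:

```python
from pathlib import Path


def write_reboot_bat(miner_dir: str, delay_s: int = 5) -> Path:
    """Write reboot.bat into the miner folder (hypothetical helper).

    Per the readme, Claymore runs this file when started with -r 1
    and a GPU fails.
    """
    path = Path(miner_dir) / "reboot.bat"
    # /r = full restart, /f = force running apps to close, /t = delay in seconds
    path.write_text(f"shutdown /r /f /t {delay_s}\n")
    return path
```

    Usage: `write_reboot_bat(r"C:\miner", delay_s=0)` with your own miner folder; pass `delay_s=0` for an immediate reboot.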
  • muzzy124 Posts: 78 Member
    Use -r 1, and put "shutdown /r /f /t 0" in reboot.bat for an immediate reboot. I saw that behaviour on one rig when I rebooted with a short delay (e.g. "/t 5").
  • Calivet Posts: 194 Member ✭✭
    It seems like an OpenCL issue. If you're on Windows, what version? Do you have all the Visual C++ redistributable packages, like the 2015 one from Microsoft? My best guess is that when it freezes up on startup, it freezes while it's initializing OpenCL to start mining. Either one, you don't have enough virtual memory set up, or two, you need to add -eres with -gser 2, so that when your mining program starts up, it makes sure you have the latest epoch before it starts mining.

    Jotun70 makes a valid point too. Is your mining program rebooting your whole rig, or just restarting the mining program itself? If you read the -wd and -r options he quoted, he has given you what you need to create a batch file that reboots your whole rig. Then you can put your miner batch file in the Startup folder so it starts mining on startup. All the information is here; just search the threads.

  • Sequoia93 Posts: 129 Member
    @Calivet Not enough virtual memory? I set it to 20 GB; the recommended amount is 16 GB.

    I'm running Win10 Pro.

    As I said before, -eres and -gser 2 did nothing to help the issue.

    I will have to check on the C++ packages. As of right now, though, it's been running fine since I lowered the memory clock on two of the 480s from 2250 to 2225 MHz, even though both ran for days at 2250 in a different computer. It hasn't seemed to affect hashrate, so I think I'm good.

    It was just restarting the miner, not the whole PC.

    Regardless, I'll set up a reboot.bat. Thanks for the advice.
  • cviper Posts: 132 Member ✭✭
    I had a similar problem with one of my rigs with 6 RX 480s. 30 seconds after starting Claymore 6.3, one of the GPUs would start hashing at 0 MH/s and the watchdog would stop the miner with the "hangs in OpenCL" error. I reinstalled Windows from scratch, and then it turned out to be a loose cable from a riser to an x1 slot on the motherboard. It had moved and was slightly disconnected from the board, so I pressed it in more firmly and then did a CMOS reset. Everything is working fine now.
  • can Posts: 5 Member
    In my experience, when you run into these kinds of GPU problems, it is almost always due to the risers. The second most common cause is that your GPU can't handle the overclock. It is a good idea to keep a few spare risers.
  • yodi Posts: 168 Member
    edited November 2016
    Hi, did anyone find a solution? I have the same problem: sometimes it runs 24 hours with no crash, sometimes only a few hours. But why does Claymore crash when it tries to restart after losing a GPU?

    I tried the auto-reboot script, but it can't close the previous .exe when a GPU fails, because the process is not responding.
    (with RX 480 Nitro cards)
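    One way around a miner process that can no longer be closed is to watch it from outside and force a full reboot, since `shutdown /f` forces running applications to close instead of waiting for them (whether that gets past a process stuck inside the driver isn't guaranteed). A minimal Python sketch, assuming the miner keeps appending to a log file in its own folder, as in the logs quoted at the top, and that the newest file matching `*log*.txt` is the current one; the pattern, the 5-minute threshold and both function names are hypothetical. It won't help when the whole OS freezes, as in the OP's case:

```python
import glob
import os
import time


def newest_log(miner_dir, pattern="*log*.txt"):
    """Most recently modified file matching `pattern` (assumed naming), or None."""
    logs = glob.glob(os.path.join(miner_dir, pattern))
    return max(logs, key=os.path.getmtime) if logs else None


def miner_looks_hung(miner_dir, max_age_s=300.0):
    """True if no matching log has been written for `max_age_s` seconds."""
    log = newest_log(miner_dir)
    if log is None:
        return True  # no log at all: treat the miner as not running
    return time.time() - os.path.getmtime(log) > max_age_s
```

    When `miner_looks_hung` returns True, the watcher could run the same `shutdown /r /f /t 0` command used in reboot.bat.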
  • Sequoia93 Posts: 129 Member
    I turned the watchdog off (-wd 0). This means that if a GPU fails, the miner just continues mining without it. That's less than ideal, but I check the pool often enough that I'd catch it within a few hours. As far as I can tell, the issue is just a bug in Claymore.
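    With -wd 0, a failed card doesn't stop the miner; it just sits at 0 MH/s (as cviper describes above), so comparing per-GPU rates can catch a drop sooner than checking the pool. A minimal Python sketch, assuming you already have per-GPU rates in MH/s from the miner's console or log; `dropped_gpus`, the 50% threshold and the sample numbers are hypothetical. Comparing against the median keeps one faster card (like the 290 in the OP's mixed rig) from skewing the baseline:

```python
from statistics import median


def dropped_gpus(rates_mhs, fraction=0.5):
    """Indices of GPUs hashing below `fraction` of the rig's median rate."""
    mid = median(rates_mhs)
    return [i for i, r in enumerate(rates_mhs) if r < fraction * mid]


# Hypothetical per-GPU readings in MH/s; GPU 1 has dropped to 0.
rates = [24.1, 0.0, 23.9, 29.5, 24.0, 24.2]
print(dropped_gpus(rates))  # [1]
```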
  • yodi Posts: 168 Member
    edited November 2016
    The best way I've found....

    Trixx can't load the settings for all cards (I only use an undervolt, -84 mV): it loads the setting for one card, and you have to apply the others manually at every reboot. So I switched to MSI Afterburner, which works well for all cards (that was my problem with rebooting when a GPU failed with Claymore).

    Now I use -r 1 (with a reboot.bat), because with the default miner restart it always freezes when you lose a GPU.

    Then set up auto-login (via regedit or otherwise) and auto-start (MSI Afterburner and my miner's start.bat). That's the only way to run 24/7 (with an SSD the reboot is fast).

    For me, the Crimson Edition 16.9.2 driver is better than 16.11; the hashrate is more consistent.
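    The auto-start part can be scripted as well. A minimal Python sketch, assuming the miner is launched by a start.bat in its own folder and using the per-user Startup folder under %APPDATA%; `install_startup_launcher` and `start_miner.bat` are hypothetical names, and auto-login and MSI Afterburner's own startup still have to be set up separately:

```python
import os
from pathlib import Path
from typing import Optional


def install_startup_launcher(miner_dir: str, appdata: Optional[str] = None) -> Path:
    """Write a launcher into the per-user Startup folder (hypothetical helper)."""
    appdata = appdata or os.environ["APPDATA"]  # usually ...\AppData\Roaming
    startup = Path(appdata, "Microsoft", "Windows", "Start Menu", "Programs", "Startup")
    startup.mkdir(parents=True, exist_ok=True)
    launcher = startup / "start_miner.bat"
    # cd /d also switches drive, so start.bat runs from the miner folder
    launcher.write_text(f'@echo off\ncd /d "{miner_dir}"\ncall start.bat\n')
    return launcher
```

    Changing to the miner folder first matters because the readme says reboot.bat is looked up in the miner directory.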
  • kemo6600 Posts: 62 Member
    @Sequoia93
    Since day one, Claymore has always failed for me when the miner restarts after a GPU error.
    Later I started using a reboot file in case of error, and it always works except in one case: when the error is on the main GPU, the one connected to the monitor.
    Try lowering the clocks and raising the voltage on the main GPU, and use a PC reboot file instead of a miner restart.
  • adaseb Posts: 1,043 Member ✭✭✭
    This happens because of crappy AMD drivers. With the older drivers last year, when a GPU crashed all you had to do was restart the program and it worked fine.
  • yodi Posts: 168 Member
    edited November 2016
    kemo6600 said:

    @Sequoia93
    Since day one, Claymore has always failed for me when the miner restarts after a GPU error.
    Later I started using a reboot file in case of error, and it always works except in one case: when the error is on the main GPU, the one connected to the monitor.
    Try lowering the clocks and raising the voltage on the main GPU, and use a PC reboot file instead of a miner restart.

    What happens when this GPU fails? Does Windows freeze, or something else?
    For the Windows 10 freeze, I'm currently testing turning off the moderate power saving for PCI Express:
    > Control Panel > System and Security > Power Options > Change plan settings > Change advanced power settings
    Scroll to PCI Express, click +, then set Link State Power Management to Off.
    It's probably one cause, among others, of freezes or GPU failures.
    Enjoy
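    The same PCI Express setting can be changed from the command line, which is handy when you have several rigs. A minimal Python sketch that builds the powercfg commands, assuming the SUB_PCIEXPRESS and ASPM aliases (check them with `powercfg /aliases`) and that index 0 is Off as in the Power Options list; you may need an elevated prompt on Windows to apply them:

```python
import subprocess

# Link State Power Management indices as listed in Power Options:
# 0 = Off, 1 = Moderate power savings, 2 = Maximum power savings
ASPM_OFF = 0


def pcie_aspm_off_commands():
    """powercfg commands that set PCIe Link State Power Management to Off
    for the active plan (on AC power), then re-apply that plan."""
    return [
        ["powercfg", "/setacvalueindex", "SCHEME_CURRENT",
         "SUB_PCIEXPRESS", "ASPM", str(ASPM_OFF)],
        ["powercfg", "/setactive", "SCHEME_CURRENT"],
    ]


def apply_commands(commands):
    """Run each command on Windows; raises CalledProcessError on failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

    Usage on the rig: `apply_commands(pcie_aspm_off_commands())`. Keeping the command list separate from running it makes it easy to print and double-check before applying.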
  • work Posts: 2,075 Member ✭✭✭✭
    This is really simple. A GPU hangs because of too high an overclock, too low a voltage, poor power, or damaged hardware. Claymore's watchdog restarts the miner. Claymore then hangs in the driver when it tries to initialize the failed GPU.

    The real problem is your GPU hanging, not Claymore's miner. Fix your hardware. Period.
  • yodi Posts: 168 Member
    edited November 2016
    I added one more card and everything has gone crazy; this card is in the PCIe x16 slot... Before that I had good stability, and now it crashes every 30 minutes to an hour -_- this makes no sense.
    I moved the new card onto a riser, same shit :s

    Edit: I'm going to test Windows 7 Pro with AMD driver 16.9.
    Edit 2: Man, going to Win7 seems really better, same hashrate etc. I'll tell you about long-term stability later.
  • momo Posts: 15 Member
    Hi guys,
    any real progress on the watchdog issue?

    Here is what I've tried so far:

    I’m currently running:

    MB – AsRock H81 PRO BTC v2
    CPU – Intel Celeron G1840
    RAM – Patriot 4GB @ 1600
    SSD – SiliconPower 120GB
    PSU – 2 x ThermalTake 730W
    GPU - 6 x Sapphire NITRO+ RX 480 8GB – (11260-07-20G) - STOCK BIOS
    PCIe – 6 x Risers x16 to x1 (USB)

    So I set it up and am running Windows 10 and Claymore Dual miner v7.4.

    PROBLEM:

    After 5-10 minutes of running in either single or dual mode, I would get: WATCHDOG: GPU (x) hangs in OpenCL call. The miner would "restart" and then would not start again.

    TRIED SO FAR:

    - Reinstalling Windows 10
    - Running both single and dual mining
    - Running on stock drivers, then 16.9.1 all the way to the latest
    - Removing one or more cards
    - Switching PSUs to run one to four cards, back and forth
    - Switching risers
    - Running cards in both “OC” and silent mode (BIOS switch)
    - Virtual Memory set to 16GB+

    I've tried running the system with only 4 cards and then changing the PSU, risers, and PCIe placement, and ALWAYS get the same result.
    All brand-new cards and hardware. No BIOS mods, memory timing changes and stuff...

    I think I listed everything.
    So is there any solution to this issue? I'm pretty much out of ideas right now.
  • cidmo Posts: 192 Member ✭✭
    edited February 12
    I noticed it's happening to me when the dev fee starts.
    Sometimes it generates a new DAG, and I have offsets set in WattTool, which causes the GPUs that are on the bleeding edge to hang.
    I'm too lazy, but I know people who get similar performance out of sgminer; sgminer just kinda burnt me out during the quark/qubit/dash/lyra wars.
  • yodi Posts: 168 Member
    edited February 13

    I'm using the 16.11.5 driver with the jukebox downvolt BIOS (vdrop+).

    Then use MSI Afterburner. I had to lower 2 cards to 1110 MHz core / 2035 MHz memory (-84 mV core voltage); my GPU failures were always on the same cards.
    Now it only crashes occasionally (after running 21 to 50 hours).
    You have to create a reboot.bat to restart your computer, then add a shortcut to start.bat in the Startup programs and set Windows to log in automatically (or disable the login). That's because Claymore Dual can't restart the miner after a driver crash or GPU loss.

    So test by lowering all cards to 1110 / 2020, just to check whether it's stable; if it isn't, it may be a hardware problem.

    - What risers do you have?

    - Have you set the PCIe generation to Gen 1 or Gen 2 in the motherboard BIOS?


    I'm running 4 x 11260-07-20G and 2 x 11260-01-20G.
  • momo Posts: 15 Member
    I'm now running 5 cards (adding the 6th tomorrow), and I added a reboot.bat to keep the system running after a freeze.

    I'm using x16 to x1 risers with USB.
    In the BIOS I tried setting PCIe to Gen 1 and Gen 2, with the same results.

    AtiWinFlash is not running for some reason; it says "Cannot find decent AMD card"???
    Any thoughts on that? I also tried running it from CMD (as administrator), but no luck.

    I will try the 16.11.5 driver tomorrow and let you know the results.

    But for the BIOS mod, no luck until I'm able to run AtiWinFlash...
  • aguscahyo Posts: 6 Member
    I have the exact same problem. What's curious is that it hangs when it restarts, yet if I close it and restart it manually, it works. I wonder why.