CUDA miner

GenoilGenoil ✭✭✭0xeb9310b185455f863f526dab3d245809f6854b4dMember Posts: 769 ✭✭✭
Hey,

I've been working on a CUDA miner. I've just got all bugs resolved. First hashrate measured on a GTX780 is a meagre 0.7MH/s, but it's a start! More updates here when I make improvements...
  D  17:47:23|main  #00004000
Benchmarking on platform: { "platform": "CUDA 7.0", "device": "GeForce GTX 780",
 "version": "Compute 3.5" }
Preparing DAG...
  D  17:47:24|main  pause took 0
  i  17:47:24|main  Spawning cudaminer0
  i  17:47:24|cudaminer0  Entering work loop...
  D  17:47:24|main  kickOff took 0.002
Using device: GeForce GTX 780(3.5)

Warming up...
Trial 1... 611669
Trial 2... 699050
Trial 3... 699050
Trial 4... 699050
Trial 5... 611669
  i  17:47:46|main  Stopping cudaminer0
  i  17:47:46|cudaminer0  Finishing up worker thread...
  i  17:47:46|main  Terminating cudaminer0
min/mean/max: 611669/664097/699050 H/s
inner mean: 699050 H/s
Phoning home to find world ranking...
Error phoning home. ET is sad.
Press any key to continue . . .
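The min/mean/max line in the log above can be cross-checked against the five trial values. This is a minimal Python sketch, not the miner's own code: the function name `summarize` is made up for illustration, and truncating the mean to an integer is an assumption that happens to reproduce the logged figure.

```python
def summarize(trials):
    """Return (min, mean, max) of per-trial hashrates in H/s.

    The mean is truncated to an integer; that is an assumption which
    reproduces the value printed in the benchmark log.
    """
    return min(trials), sum(trials) // len(trials), max(trials)


# Trial values copied from the benchmark output above.
trials = [611669, 699050, 699050, 699050, 611669]
lo, mean, hi = summarize(trials)
print(f"min/mean/max: {lo}/{mean}/{hi} H/s")
```

The "inner mean" line is deliberately left out, since the log does not show how the benchmark trims the trials before averaging.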

Comments

  • Michael_AMichael_A LondonMember Posts: 61
    Well done, Genius... :D lol, well done Genoil.
    Does this mean all Nvidia users need to update and upgrade their ethminer?
  • GenoilGenoil ✭✭✭ 0xeb9310b185455f863f526dab3d245809f6854b4dMember Posts: 769 ✭✭✭
    No, not as long as the OpenCL hashrate on NVidia is higher than that of my CUDA version. And even then, it's still built on a relatively ancient build of eth.
  • GenoilGenoil ✭✭✭ 0xeb9310b185455f863f526dab3d245809f6854b4dMember Posts: 769 ✭✭✭
    After a bit of tweaking I'm now at 1.7MH/s, still a long way to go...
  • jzenjzen Member Posts: 49
    @Genoil Nearly a 2.5x performance improvement, way to go!
  • GenoilGenoil ✭✭✭ 0xeb9310b185455f863f526dab3d245809f6854b4dMember Posts: 769 ✭✭✭
    @jzen yes but I'm still way behind the opencl performance. I think I'm missing an important clue somewhere, because ultimately, OpenCL on NVidia runs on CUDA too. I may actually try and apply the changes I made to the CUDA kernel to the OpenCL kernel and see how that affects the OpenCL kernel speed on NVidia.
  • GenoilGenoil ✭✭✭ 0xeb9310b185455f863f526dab3d245809f6854b4dMember Posts: 769 ✭✭✭
    edited June 2015
    8.1MH/s now. The missing clue was to put the cuda compiler in release mode, which gave me a 4x speed bump. Only 3 short to match opencl performance.
  • jzenjzen Member Posts: 49
    @Genoil, good going! Where do you think the 3 short are hiding?
  • GenoilGenoil ✭✭✭ 0xeb9310b185455f863f526dab3d245809f6854b4dMember Posts: 769 ✭✭✭
    @jzen, not sure yet. Before I discovered the debug/release thing I had already improved the speed 3x from the first working kernel. So it's just a matter of more optimization, although at this point I really don't know where to look for any.
  • Brillopad12Brillopad12 Member Posts: 17
    @Genoil Looking awesome man! Keep up the great work
  • GenoilGenoil ✭✭✭ 0xeb9310b185455f863f526dab3d245809f6854b4dMember Posts: 769 ✭✭✭
    edited June 2015
    10.8MH/s on CUDA vs 12.5MH/s on OpenCL now.
  • ConradJohnsonConradJohnson ✭✭ Member Posts: 130 ✭✭
    Bad ass @genoil just in time for me to switch back to the nvidia machine.
  • GenoilGenoil ✭✭✭ 0xeb9310b185455f863f526dab3d245809f6854b4dMember Posts: 769 ✭✭✭
    edited June 2015
    I think this is it for now, can't get past 10.8, 11 on occasion. I have also tried porting some of the cuda optimizations over to the opencl kernel, but I'll leave that to somebody actually owning an AMD card, as debugging opencl is impossible on NVidia hardware.

    I've attached a win64 release binary for those of you who want to test the difference. On different hardware, the speed difference may well differ from what I see on the 780. Users on other operating systems can grab and build the source from https://github.com/Genoil/cpp-ethereum/tree/cudaminer.

    Use the -U flag instead of -G
  • GenoilGenoil ✭✭✭ 0xeb9310b185455f863f526dab3d245809f6854b4dMember Posts: 769 ✭✭✭
    edited June 2015
    I've got 12.4MH/s on CUDA now vs. 12.4MH/s on OpenCL. I'm almost done syncing the blockchain, so in a few hours I'm going to see if it's actually able to mine any blocks.
  • PascalVerstPascalVerst Member Posts: 6
    I'll test along with you, I am getting 17 600 000 H/s with GTX 970 on Windows 7.
  • lefterislefteris Member Posts: 7
    Very nice work Genoil. I will see what can be ported back upstream. I am also using Nvidia (GTX 770), but on Linux with OpenCL.
  • GenoilGenoil ✭✭✭ 0xeb9310b185455f863f526dab3d245809f6854b4dMember Posts: 769 ✭✭✭
    @lefteris, thanks, I could really use some help setting up the CMake scripts. For now, for each build I have to manually set the VC++ build customisation in MSVC (which cmake doesn't support apparently), add the .cu and .cuh files and modify NVCC properties.

    Before porting anything upstream, I really want to have mined a block on the testnet :). During development, I made sure the CUDA miner kept reporting the same results as the opencl miner, but you never know :)
  • lefterislefteris Member Posts: 7
    I am curious to see if CUDA can also be used directly on Linux instead of as an OpenCL backend. Will have to try that. Not officially supported by us of course, but the Nvidia guys could still use a bump in hashrate if that's done.
  • GenoilGenoil ✭✭✭ 0xeb9310b185455f863f526dab3d245809f6854b4dMember Posts: 769 ✭✭✭
    For now I just have equal hashrate on CUDA vs OpenCL, so in a way it's quite pointless :). I still have some optimization left to do, but it isn't going to do much, I think.

    I'm now mining with CUDA on the blockchain, but with the current hashrate I don't think I'll be finding any block anytime soon.
  • GenoilGenoil ✭✭✭ 0xeb9310b185455f863f526dab3d245809f6854b4dMember Posts: 769 ✭✭✭
    edited June 2015
    I can now mine a tiny bit faster than OpenCL, at 12.6MH/s. I'm down to replacing C with inline PTX assembly now. With some luck I should be able to push a bit harder...

  • jzenjzen Member Posts: 49
    @Genoil Did you mine one yet?
  • GenoilGenoil ✭✭✭ 0xeb9310b185455f863f526dab3d245809f6854b4dMember Posts: 769 ✭✭✭
    @jzen I don't know. The only machine I can reliably mine on is behind a firewall, so I'm connecting to geth on EC2 with rpc. It's now the second time in a row I can't reach my EC2 node after having run it for a longer period. I hope I'll be able to recover it.

    @PascalVerst The build I attached was for the Kepler architecture (Compute 3.5). I'll post one that supports Maxwell better; I would expect it to do a little better on your card.
  • GenoilGenoil ✭✭✭ 0xeb9310b185455f863f526dab3d245809f6854b4dMember Posts: 769 ✭✭✭
    edited June 2015
    Got back into the EC2 node. 0 mined blocks, while statistically speaking, I should have mined about 10. But my balance is 1.6875 ether, while it was 0 when I started. Don't know where I got that from. Anyone?

    https://explorer.etherapps.info/address/0x99b645a86bc157ec695f7db8dbe5751260c788ea
  • klaoklao Member Posts: 3
    Genoil said:

    Got back into the EC2 node. 0 mined blocks, while statistically speaking, I should have mined about 10. But my balance is 1.6875 ether, while it was 0 when I started. Don't know where I got that from. Anyone?

    1.7 ether looks like exactly one uncle block. (Meaning, that you have mined a block, but someone was a little bit faster. So, your block was not incorporated in the chain, but it was mentioned in the next block, and you've got a little reward for that.)

    As for why you only got that and not more, I've no idea. But yeah, with that hashrate you should have got more...
  • GenoilGenoil ✭✭✭ 0xeb9310b185455f863f526dab3d245809f6854b4dMember Posts: 769 ✭✭✭
    @klao thanks for clearing that up. I'm happy with the uncle block. I'll try again overnight.
  • GenoilGenoil ✭✭✭ 0xeb9310b185455f863f526dab3d245809f6854b4dMember Posts: 769 ✭✭✭
    Seems I tried to over-optimise here and there. Reverted some things to how they were originally, and now I'm suddenly doing 14MH/s.
  • GenoilGenoil ✭✭✭ 0xeb9310b185455f863f526dab3d245809f6854b4dMember Posts: 769 ✭✭✭
    edited June 2015
    Latest build for Windows. Requires at least a Compute 3.5 device (GTX 780, Titan). Compute 2.0 and 3.0 shouldn't be an issue in principle, but the code doesn't compile for them after the asm optimizations.

    Now does 17.8MH/s on the GTX780.
  • TheoCoyneTheoCoyne IowaMember Posts: 6
    @Genoil I'm running a GTX 980. If you throw up a quick-and-dirty guide on how to get started with your CUDA miner, I'd love to do some testing.
  • o0ragman0oo0ragman0o mod Member, Moderator Posts: 1,291 mod
    Benchmarking on platform: { "platform": "CUDA 7.0", "device": "GeForce GTX 750", "version": "Compute 5.0" }
    Preparing DAG...
    i 19:02:33|cudaminer0 workLoop 0 #00000000 #00000000
    Warming up...
    i 19:02:33|cudaminer0 Initialising miner...
    Using device: GeForce GTX 750(5.0)
    Trial 1... 0
    Trial 2... 0
    Trial 3... 0
    Trial 4... 0
    Trial 5... 0
    Not sure what's going on here with the benchmark. It also hangs before phoning home.

    I'm running GTX750 TI on Win 7.
    Normal mining seems to work but at only 1MH/s. Given that stats put the card about 1/3 as powerful as 780 I was thinking it would have been around 5MH/s.
    i  19:01:49|main  Mining on PoWhash #1b119561 : 1096476 H/s = 13369344 hashes / 12.193 s
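The figure in that last log line is simply the number of hashes divided by the elapsed time. Here is a minimal Python sketch of the arithmetic; the helper name `hashrate` is hypothetical, and truncating to an integer is an assumption that matches the logged value.

```python
def hashrate(hashes, seconds):
    """Average hashrate in H/s, truncated to an integer like the log line."""
    return int(hashes / seconds)


# Values copied from the "Mining on PoWhash" line above.
rate = hashrate(13369344, 12.193)
print(f"{rate} H/s = {rate / 1e6:.2f} MH/s")
```

Dividing by 1e6 gives the MH/s figures used elsewhere in the thread.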
  • GenoilGenoil ✭✭✭ 0xeb9310b185455f863f526dab3d245809f6854b4dMember Posts: 769 ✭✭✭
    edited June 2015
    @o0ragman0o thanks for reporting. It's not good, indeed. I have a 750 Ti in another system as well. Depending on the machine it is installed in, it performs very differently. The other machine has a Celeron G1840 with 4GB of RAM, where it hashes at around 500KH/s on OpenCL. Once I took it out and installed it side by side with the 780 in a Xeon E5, it suddenly hashed 8MH/s. After the weekend I'll upgrade the Celeron to an i5 and double the RAM. Then I can look into tuning performance for the 750 Ti, because eventually that's a more interesting card for mining than a 780.
  • o0ragman0oo0ragman0o mod Member, Moderator Posts: 1,291 mod
    corrections:
    GTX 750 OC (not TI, but no real difference)
    Did phone home on benchmark after a while.

    The card's power consumption is only 30% when mining, so there are still a lot of legs in there somewhere.
    i 09:54:29|cudaminer0 workLoop 1 #c6320e3c #c6320e3c
    i 09:54:29|main Mining on PoWhash #336a7d6f : 1048576 H/s = 524288 hashes / 0.5 s
    i 09:54:30|main Got work package:
    i 09:54:31|main Header-hash: c9b987cc71afeff02082550720d55ae121031cee0199e07a7ff564cfe3e2fdb9
    i 09:54:32|main Seedhash: c6320e3c1c3456002aa6ccc5b60cf5bf054d7e9392712f8b7764869abe630035
    i 09:54:34|main Target: 000000001ddf0fe4a8ee03c49bcf8b5057537dfcd8f569923dc16481c969c7b8
    ! 09:54:39|main pause 3.001 s
    Also, there is a 3 second pause warning after every work loop. Is that intentional? Sorry I can't be more constructive, I think I'm going to have to knuckle into the material about how mining actually works...
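As background on how the Target line in the log above relates to mining: in Ethash a nonce is accepted when the resulting hash does not exceed the target, and the target is 2^256 divided by the difficulty. The Python sketch below is an illustration, not code from the miner. It derives the difficulty implied by that target and the average time a given hashrate would need per solution. The function names are hypothetical.

```python
# Target copied from the work package in the log above.
TARGET_HEX = "000000001ddf0fe4a8ee03c49bcf8b5057537dfcd8f569923dc16481c969c7b8"


def difficulty_from_target(target_hex):
    """Difficulty implied by a 256-bit target: 2**256 // target."""
    return (1 << 256) // int(target_hex, 16)


def expected_seconds(difficulty, hashes_per_second):
    """Average time to find one solution; actual times vary randomly."""
    return difficulty / hashes_per_second


difficulty = difficulty_from_target(TARGET_HEX)
seconds = expected_seconds(difficulty, 1048576)  # hashrate from the log above
print(f"difficulty ~ {difficulty:.3e}")
print(f"expected time per solution ~ {seconds / 3600:.1f} h at 1048576 H/s")
```

This is only an average: finding a solution is a random process, so any single run can take much less or much more time.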