@TheoCoyne 18.7 is the best I've seen as of yet, but I would expect a 980 to do a little better. I'll try to add some tuning options soon. How are your OpenCL scores?
My miner did mine a full block on the chain yesterday, but the yield is still way too low. I suspect it doesn't work perfectly yet; it needs more testing.
@eth001 unfortunately the current cmakefile doesn't compile the CUDA kernel. I have to do a bit of manual setup in MSVC after running cmake. Hope to find some time to work on that this week. Also GRID K520 isn't supported at the moment, but that's an easy fix.
@eth001 and all others wanting to build from source: I've created a CMakeLists file that should allow building from scratch without any post-config tweaking. Only tested on Windows/MSVC.
Prerequisites:
- Grab the source here.
- The NVidia CUDA SDK. Get it here.
Use this command to build ethminer:
cmake -DBUNDLE=miner -DETHASHCU=1 [plus any other options i.e. -G ... that you're used to] ..
To speed up compilation, you can add the COMPUTE flag to target your hardware specifically. You can check your compute capabilities here. For compute "x.y", use the format "xy", e.g. for a GTX770 (Compute 3.0):
cmake -DBUNDLE=miner -DETHASHCU=1 -DCOMPUTE=30 [plus any other options i.e. -G ... that you're used to] ..
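As a quick illustration of the "x.y" to "xy" rule, here is a small Python sketch. The helper name `compute_flag` is made up for this example and is not part of the build scripts; it only formats the string that gets passed to cmake.

```python
def compute_flag(capability: str) -> str:
    """Turn a compute capability such as "3.0" into the -DCOMPUTE=30 form."""
    # Hypothetical helper: split "x.y" and glue the two parts back together.
    major, minor = capability.split(".")
    return f"-DCOMPUTE={major}{minor}"


print(compute_flag("3.0"))  # GTX770
print(compute_flag("5.2"))  # GTX980
```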
I've also already added a "tuning parameter" that has to be set at compile time: MAXREGCOUNT. By default it is already set to what gave the best performance (a limit of 72 registers on compute>3.0), but your results may vary. E.g.:
cmake -DBUNDLE=miner -DETHASHCU=1 -DCOMPUTE=52 -DMAXREGCOUNT=255 [plus any other options i.e. -G ... that you're used to] ..
Unrestricted, the current kernel uses 73 registers. But 72 yields a better occupancy.
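A rough sketch of why a single register can matter for occupancy. It assumes 65536 registers per multiprocessor, per-thread register counts rounded up to a multiple of 8, at most 64 resident warps, and 128 threads per block. Those figures are assumptions for Kepler/Maxwell-class cards, not values taken from the miner source, and the model ignores shared memory and every other limit.

```python
def resident_warps(regs_per_thread, threads_per_block=128,
                   regs_per_sm=65536, reg_granularity=8,
                   max_warps_per_sm=64, warp_size=32):
    """Estimate resident warps per multiprocessor when registers are the only limit."""
    # Round the per-thread register count up to the (assumed) allocation granularity.
    allocated = -(-regs_per_thread // reg_granularity) * reg_granularity
    regs_per_block = allocated * threads_per_block
    # Only whole blocks can be resident.
    blocks = regs_per_sm // regs_per_block
    warps = blocks * (threads_per_block // warp_size)
    return min(warps, max_warps_per_sm)


for regs in (72, 73, 255):
    warps = resident_warps(regs)
    print(regs, warps, f"{warps / 64:.1%}")
```

Under these assumptions 72 registers allow 28 resident warps, while 73 gets rounded up to 80 and only 24 fit, which is consistent with 72 giving the better occupancy.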
The current kernel is preset to use 32KB of shared memory while using 8K (128 threads/block). This might be an issue for some older and midrange cards. The idea is to make that all tunable, but that's for later.
@Brips hmm that looks like some generic error in the opencl code (which is still part of the code). Have you successfully built ethminer before from the official source?
I have updated my ethminer with the tuning parameters. They don't just work for CUDA, but also for OpenCL. With the right numbers, a performance increase of about 30% with OpenCL and 40% with CUDA compared to the official ethminer can be achieved. That is, on my GTX780. Your mileage may vary.
The parameters (with defaults shown as examples):
--gpu-mining-buffers 2 : the number of buffers used for storing search results. On OpenCL, 2 seems the right value, but on CUDA (where I'm using streams) I got better results with 4 buffers on the GTX780.
--gpu-batch-size 18 : the number of nonces tried in a single kernel execution, as a power of 2. So with 18, the batch size is 2^18 = 262144. For my GTX780, I use 19 on CUDA and 21 on OpenCL.
--gpu-workgroup-size 64 : On OpenCL, this is simply the workgroup size, which translates to the number of threads per block on CUDA. Powers of 2 generally give the best results. Currently limited to 512 on CUDA, but that should be more than sufficient. For my GTX780, I use 128 on CUDA and 64 on OpenCL.
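To make the batch-size arithmetic concrete, here is a minimal Python sketch. The function name `batch_info` is hypothetical, and the work-group count assumes one nonce per thread, which is an assumption about the kernel launch rather than something stated above.

```python
def batch_info(batch_size_exp: int, workgroup_size: int):
    """Return (nonces per kernel execution, work-groups or blocks per execution)."""
    nonces = 1 << batch_size_exp          # batch size is given as a power of 2
    groups = nonces // workgroup_size     # assumes one nonce per thread
    return nonces, groups


print(batch_info(18, 64))    # the defaults
print(batch_info(19, 128))   # the CUDA settings quoted for the GTX780
```

Note that both settings end up with the same number of groups; the CUDA setting simply doubles the threads per block and the nonces per launch.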
When on Compute 5.0 or higher (GTX750ti and newer), better use this build. Found out that Compute 5.0 didn't like the MAXREGCOUNT of 72. Curious which build works best for Compute 5.2 (GTX9xx series) and 3.0 devices.
@Brips hmm that looks like some generic error in the opencl code (which is still part of the code). Have you successfully built ethminer before from the official source?
Yes, I successfully compiled from the official git ^_^ I see a little difference in the build, though: on the official git, I make a build directory and run "cmake ..", but with your version I must call "cmake" in the root directory, not in build.
At the end of the cmake run, with your version I see:
-- Configuring done
-- Generating done
-- Build files have been written to: /mnt/lvm/git/cuda/cpp-ethereum
With the official repository, I see:
-- Build files have been written to: /mnt/lvm/git/cuda/cpp-ethereum/build
@Brips I'm not familiar with building on Linux, but on Windows we have to create the build directory, go there and from there call cmake on the parent dir. So effectively the build files always end up in cpp-ethereum/build. Didn't you simply forget that when building my fork?
@o0ragman0o I think the default kernel parameters are a bit too tight for a 750. You might want to try lowering them until it runs, e.g. ethminer.exe -U -M --gpu-mining-buffers 1 --gpu-workgroup-size 8 --gpu-batch-size 13. Then increase until it crashes.
You may also want to update your graphics driver. You're still on OpenCL 1.1; version 1.2 will give you a slightly better hashrate.
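The "start low, then step up" approach can be sketched as a small Python generator that only prints candidate command lines; it does not run the miner or detect crashes. The value ranges are arbitrary picks for illustration, and the helper names `command` and `sweep` are hypothetical. Only the flags themselves come from the posts above.

```python
from itertools import product


def command(buffers, workgroup, batch_exp, exe="ethminer.exe"):
    """Build one benchmark command line from the three tuning parameters."""
    return (f"{exe} -U -M --gpu-mining-buffers {buffers} "
            f"--gpu-workgroup-size {workgroup} --gpu-batch-size {batch_exp}")


def sweep(buffers=(1, 2, 4), workgroups=(8, 16, 32, 64, 128),
          batch_exps=range(13, 20)):
    """Yield candidate commands, starting from the smallest settings."""
    for b, w, e in product(buffers, workgroups, batch_exps):
        yield command(b, w, e)


for line in list(sweep())[:3]:
    print(line)
```

Running each candidate by hand and noting the reported hashrate (or the crash) is then a matter of working down the list.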
@o0ragman0o there's always MSI Afterburner. But close to 7MH/s is normal for a GTX750. The 750Ti has an extra SMX unit (128 CUDA cores), which just pushes it over 8MH/s.
btw 8 buffers is overkill. I bet it runs as fast with 3 or 4 buffers. using multiple buffers (CUDA streams) allows for a little bit of kernel and memcpy concurrency.
@Genoil, yes, I was surprised about the buffers. I tried everything up to 20, after which my drivers crashed. I found fairly incremental gains up to 8. After that the trial readings oscillated high and low, so I figure there was some time-dependent lag with kernels closing out. It then strangely stabilised again at 20 buffers. But yeah, 8 gave the highest, most stable trials. Anything lower was back down in the 6.74 range.
@Brips I'm not familiar with building on Linux, but on Windows we have to create the build directory, go there and from there call cmake on the parent dir. So effectively the build files always end up in cpp-ethereum/build. Didn't you simply forget that when building my fork?
hum no it's working like that in the official git repo, but not in yours. but this isn't the big problem for compiling ^^
@Brips yeah I know, tried to build on Linux yesterday. Latest version in my repo compiles but you have to disable -Wall. Couldn't run it because of driver issues on the g2 AWS instance I built it on..
You might also need to add another flag to the nvcc call in the make txt of the CUDA lib, -c++11. I'm not sure I committed that
@ConradJohnson it doesn't build without errors yet, unless you apply some hacks to a few of the cmake files. I'm planning to resolve those issues first and not commit them to the source tree. You could try compiling without -Wall, add -c++11 to the nvcc parameters and then build with -DETHASHCU=1 -DBUNDLE=miner -DCOMPUTE=50. The latter is strictly for the 750.
@ConradJohnson you should update to the latest source; then the only remaining issues are a crapload of gcc warnings ("style of line directive is a gcc extension") that I don't quite understand but suspect have something to do with Windows line endings. For now, remove the string "-Werror" on line 6 from cpp-ethereum/cmake/EthCompilerSettings.cmake
Comments
Preparing DAG...
Warming up...
i 23:07:20|cudaminer0 workLoop 0 #00000000… #00000000…
i 23:07:20|cudaminer0 Initialising miner...
Using device: GeForce GTX 980(5.2)
Trial 1... 18699605
Trial 2... 18699605
Trial 3... 18612224
Trial 4... 18699605
Trial 5... 18699605
min/mean/max: 18612224/18682128/18699605 H/s
inner mean: 6233201 H/s
Phoning home to find world ranking...
Error phoning home. ET is sad.
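For reference, the min/mean/max line can be reproduced from the five trial values in the log above with a few lines of Python. The truncating integer division is an assumption that happens to match the logged mean; the conversion to MH/s is added here for readability.

```python
# Trial results copied from the GTX 980 log above, in H/s.
trials = [18699605, 18699605, 18612224, 18699605, 18699605]

lowest = min(trials)
highest = max(trials)
mean = sum(trials) // len(trials)  # truncating division, assumed to match the log

print(f"min/mean/max: {lowest}/{mean}/{highest} H/s")
print(f"mean in MH/s: {mean / 1e6:.2f}")
```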
For opencl cpp version:
Trial 1... 7165269
Trial 2... 6903125
Trial 3... 7165269
Trial 4... 7077888
Trial 5... 7165269
⚡ 11:24:14|gpuminer0 Worker stopping 0.143587 s
min/mean/max: 6903125/7095364/7165269 H/s
inner mean: 4747719 H/s
For Cuda version:
Trial 1... 7689557
Trial 2... 7776938
Trial 3... 7689557
Trial 4... 7602176
Trial 5... 7602176
min/mean/max: 7602176/7672080/7776938 H/s
inner mean: 5155498 H/s
I tried to run this on Ubuntu 14.04, but there is some bug in my OS.
the result is
Got 18.1MH/s with these settings on the ol' 780.
Example: gnl-ethminer.exe -U -M --gpu-mining-buffers 4 --gpu-workgroup-size 128 --gpu-batch-size 19
I'll try with your update
Thanks @Genoil !
Using device: GeForce GTX 750(OpenCL 1.1 CUDA)
Trial 1... 6618918
Trial 2... 6531827
Trial 3... 6618918
Trial 4... 6618918
Trial 5... 6627725
min/mean/max: 6531827/6603261/6627725 H/s
inner mean: 6589887 H/s
OpenCL only. Switch -U crashes miner (Running Windows 7).
Hitting around 6.8MH/s now on OpenCL and 6.7 on CUDA, but I haven't played with the buffers on that yet.
6.72MHs Default CUDA
6.97MHs ethminer.exe -M -U --gpu-batch-size 19 --gpu-mining-buffers 8 --gpu-workgroup-size 128
Not a huge margin but the best I could find. Couldn't find anything better than default 6.86 on OpenCL.
Either way, just couldn't quite crack 7.
Where would I do this: "You could try compiling without -Wall, add -c++11 to the nvcc parameters" ?