@ConradJohnson I created support for multi-GPU mining today. It's in the latest develop, but with a bug left that is fixed in the attached Win64 binary (or here). I think it works, but I went on the cheap and bought a GT720 card to verify. Bad idea: it crashes when mining. Can you, or anybody else with a multi-GPU setup, check it out and see if there's a significant hashrate difference between eth.exe -G -M and eth.exe -G -M --opencl-device=0? Thanks!
Thanks @ConradJohnson, I hope you can throw a 5-card rig at it. It should run fine on Linux. Meanwhile, I went ahead and took my GTX750Ti to work to test alongside my GTX780. Works great! Some findings:
- My earlier measurement of the GTX750Ti was severely bottlenecked by the CPU. On the Xeon E5 it does 8 Mh/s as opposed to 0.5 Mh/s on the Celeron G1840.
- Both the GTX 780 and the GTX 750Ti benchmark around 60% TDP.
- The memory controller load on the 780 is only 48%, whereas on the 750Ti it is 92%.
- The combined hashrate is 20 Mh/s. So Mh/s-per-watt wise, the GTX750Ti is a much better card. Really curious how it compares to its AMD counterparts.
- The GTX750Ti needs a longer warmup period (--benchmark-warmup). Perhaps because copying the DAG to it is much slower.
- It also heated up pretty badly (95 degrees Celsius), up to the point that the driver told the card to slow down. It's a fairly basic single-fan Asus GTX750Ti, and its location in the case was pretty bad as well.
- The GTX780 (MSI dual fan) stays much cooler, around 70 degrees.
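A rough efficiency comparison can be sketched from those readings. This is a back-of-envelope estimate only: the stock TDP figures (250 W for the GTX 780, 60 W for the GTX 750 Ti) are assumptions, as is attributing 12 of the combined 20 Mh/s to the 780 (the 750 Ti alone measured 8 Mh/s).

```python
# Back-of-envelope Mh/s-per-watt estimate from the findings above.
# Assumptions: stock TDPs of 250 W (GTX 780) and 60 W (GTX 750 Ti),
# both cards running at ~60% TDP, and 12 of the combined 20 Mh/s
# coming from the 780.
cards = {
    "GTX 780":    {"tdp_w": 250, "load": 0.60, "mhs": 12.0},
    "GTX 750 Ti": {"tdp_w": 60,  "load": 0.60, "mhs": 8.0},
}

for name, c in cards.items():
    draw = c["tdp_w"] * c["load"]   # estimated power draw in watts
    eff = c["mhs"] / draw           # Mh/s per watt
    print(f"{name}: ~{draw:.0f} W, {eff:.3f} Mh/s/W")
```

Under those assumptions the 750 Ti comes out well ahead per watt, which matches the observation above.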
Dang... only have a Celeron G1840 on the rig I have, plus the one I'm building. I'll give readings for that, and I'll put the E3 on order (only 1 CPU slot in the rig's motherboard).
I would first test it before ordering any CPUs. #1: you're on Linux; #2: I've only got 4GB RAM in that machine; #3: I got to 15Mh/s using the R9 270 (which I just sold) in that build, using the benchmark of the ethash repo.
@Genoil - Alright, building successfully now (make) on the multiple-gpu branch. Should have a look at the hash rate on this Nvidia machine pretty quickly. Noticed the CUDA version in one of your branches also.
Started piecing together the AMD machine also, so a couple of days and I should have that one up and running. I'll also be able to swap the 4 core CPU on both machines to see if that makes a difference.
That's odd. Anyway, I would always recommend staying with the main source tree. The multi-GPU stuff has been merged into cpp-ethereum:develop for a week already. God knows what bug snuck into my fork. As for the CUDA branch: it's almost done host-side; GPU-side there's still lots to do.
Can anyone in the dev group shed some light on what I would need to run to try and detect the OpenCL implementation in my Ubuntu (14.04) environment? The cmake and make were successful after I added a symbolic link to the OpenCL implementation from Nvidia CUDA: ln -s /usr/local/cuda-7.0/targets/x86_64-linux/include/CL /usr/include
But still just getting CPU mining with the arguments: ./eth -G -M and ./eth -G -M --opencl-device 0
Interesting indeed @ConradJohnson. It really seems like each GPU miner needs its own (logical) core. I've got one more interesting test that I can't really do for lack of hardware, using the -t [n] flag (in addition to -G -M). Start with n=1 and increase until you hit 6. The -t flag limits the number of miner threads to [n]. I wouldn't be surprised if you get a better score with t=3 or t=4 on the E3 than with t=6. If that turns out to be true, AMD hexacores (FX-6300) might be the best choice for multi-GPU mining.
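That sweep can be sketched as a tiny helper that builds the benchmark command line for each thread count. The helper itself is hypothetical; the flags (-G, -M, -t, --benchmark-warmup) are the ones from this thread, and actually running the commands requires a machine with eth built.

```python
# Sketch of the suggested -t sweep: build the eth benchmark command
# for each miner-thread cap from 1 to 6, using the flags discussed
# in this thread.
def benchmark_cmd(threads, warmup=6):
    """Benchmark invocation capping miner threads at `threads`."""
    return ["./eth", "-G", "-M", "-t", str(threads),
            "--benchmark-warmup", str(warmup)]

for t in range(1, 7):
    print(" ".join(benchmark_cmd(t)))
```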
On the other hand, there's probably lots of headroom for optimization in this area.
Oh, and BTW, you also want to add --benchmark-warmup 6 (or 5, or more) to your startup script. As you can see, your first trial is always lower than the rest. This goes away with the extra "warmup" time. The GTX750Ti seems to need that. (I think it's because of the narrow bus, causing the DAG upload to take a bit longer.)
@Genoil - So it looks like -t 2 is the limit for this build and hardware setup: 8 MH/s for 1 and 16 for 2. Anything higher than that gets 0s on the trials and is killed.
@ConradJohnson with the E3 or the G1840? Hmm, I would have expected better from a quad core. I just noticed that the latest develop also has an "ethminer" executable, which can only mine and drops all the other functionality of "eth". It has the same parameters and might be just a tiny bit easier on CPU requirements. But I doubt it, really.
-- edit--
oh, the E3 actually has 8 logical cores, so in theory it should have given you the full 40Mh/s. I really wonder what the limiting factor is, then.
@ConradJohnson How much RAM is in that rig? Might be the cause. I didn't look at it in much detail when I had the two GPUs in, but it could be that it (temporarily) takes up ~1GB of RAM per card. If you could monitor RAM usage with -t 1 vs. -t 2 vs. -t 3, that would be great. If it stays at 1GB for any config, you could try to run it with gdb to see if you can get any useful information as to why it crashes. In that case, cmake with -DCMAKE_BUILD_TYPE=Debug, and then gdb --args ethminer -G -M. Or something like that; I've never really used gdb.
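One way to do that monitoring on Linux is a small sketch that reads a process's resident memory (VmRSS) from /proc; point it at the pid of the eth/ethminer process started with each -t value. This is an illustrative helper, not part of the miner.

```python
# Minimal Linux sketch for the suggested RAM monitoring: read VmRSS
# (resident set size) for a given pid from /proc/<pid>/status.
import os

def rss_kib(pid):
    """Return VmRSS of `pid` in KiB, or None if it can't be read."""
    try:
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])  # field is in kB
    except OSError:
        return None
    return None

# Demo on this script's own pid; substitute the miner's pid instead.
print(f"this process: {rss_kib(os.getpid())} KiB resident")
```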
Perhaps I'll try to cram 3 GPU's in my machine, but it would be messy...
Roger that, @Genoil. I think I have 4GB, but I'll double check. I've got 8GB total I can throw in there. I'll try to run those tests tonight, but might not get to it.
It looks like my guess was right. The application takes 1GB of RAM for the DAG, and then allocates an additional GB per card. I have to read up on OpenCL memory mapping for a bit to see if and how that can be optimized.
@ConradJohnson you can use this as a rule of thumb: ram required = (1+ num cards) * 1GB. That's on top of the RAM already in use by your system. So with 8GB, I think 5 cards is the max. When I find some time (and a clear head), I'm going to try to size those requirements down, drastically.
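That rule of thumb as a quick sketch; the 2GB baseline for "RAM already in use by your system" is an assumption, chosen so the numbers line up with the 8GB / 5-card example above.

```python
# Rule of thumb from the post above: host RAM required is roughly
# (1 + num_cards) * 1GB, on top of what the system itself uses.
# The 2GB system baseline here is an assumption, not a measurement.
def ram_needed_gb(num_cards, system_gb=2):
    return (1 + num_cards) * 1 + system_gb

for n in (1, 3, 5):
    print(f"{n} card(s): ~{ram_needed_gb(n)} GB total")
```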
Hey @ConradJohnson, you should be able to use as many cards as you want with just 4GB if you use this version. I also wouldn't be surprised if the G1840 would suffice at that.
Comments
GTX750Ti owners need not worry; turns out there was a SATA or USB header cable stuck to the fan, causing the GPU to overheat.
Stand by and I'll have some #'s shortly.
benchmarking on platform: 2-thread CPU
Can't seem to get the GPU mining going.
Tried this: ./eth -G -M
and this: ./eth -G -M --opencl-device 0
Benchmarking on platform: { "platform": "NVIDIA CUDA", "device": "GeForce GTX 750 Ti", "version": "OpenCL 1.1 CUDA" }
Preparing DAG...
Any way to know if it is running this on all of my cards (6) or just the one device?
I'll let you know what the hashrate is once DAG is built.
Trial 1... 0
Trial 2... 0
Trial 3... 0
Trial 4... 0
Trial 5... 0
ℹ 16:23:53|eth stopWorking for thread gpuminer0
ℹ 16:23:53|eth Stopping gpuminer0
ℹ 16:23:53|eth Waiting until Stopped...
Killed
The system became unusable, like you mentioned before (on the Celeron G1840).
I never got a successful stoppage, and the process was eventually killed, as you can see above.
2nd run with command (./eth -G -M):
Trial 1... 0
Trial 2... 4582208
Trial 3... 15813202
Trial 4... 13655800
Trial 5... 16154776
ℹ 16:28:16|eth stopWorking for thread gpuminer0
ℹ 16:28:17|eth Stopping gpuminer0
ℹ 16:28:17|eth Waiting until Stopped...
Killed
Again, the system became unusable and I never got a successful stoppage, so the process was killed automatically.
Looks like it recognized all 6 of my cards.
I'll run with the other command (./eth -G -M --opencl-device 0) and post; then I'll swap CPUs.
Using this command: ./eth -G -M --opencl-device 0
It limited the test to only 1 card and I got 8 MH/s out of it.
Using platform: NVIDIA CUDA
Using device: GeForce GTX 750 Ti(OpenCL 1.1 CUDA)
Trial 1... 6728362
Trial 2... 8126464
Trial 3... 8126464
Trial 4... 8126464
Trial 5... 8126464
ℹ 16:46:51|eth stopWorking for thread gpuminer0
ℹ 16:46:51|eth Stopping gpuminer0
ℹ 16:46:51|eth Waiting until Stopped...
ℹ 16:46:51|gpuminer0 Finishing up worker thread...
ℹ 16:46:51|gpuminer0 State: Stopped: Thread was 2
ℹ 16:46:51|gpuminer0 Waiting until not Stopped...
ℹ 16:46:51|eth Terminating gpuminer0
min/mean/max: 6728362/7846843/8126464 H/s
inner mean: 8126464 H/s
Phoning home to find world ranking...
Ranked: 27 of all benchmarks.
Swapped out the CPU for the 4 core CPU (E3-1230)
without the --opencl-device 0 arg, here's what I got:
Using device: GeForce GTX 750 Ti(OpenCL 1.1 CUDA)
Using device: GeForce GTX 750 Ti(OpenCL 1.1 CUDA)
Using device: GeForce GTX 750 Ti(OpenCL 1.1 CUDA)
Using device: GeForce GTX 750 Ti(OpenCL 1.1 CUDA)
Using device: GeForce GTX 750 Ti(OpenCL 1.1 CUDA)
Trial 1... 14728581
Trial 2... 16153828
Trial 3... 16160159
Trial 4... 16252928
Trial 5... 16165546
ℹ 18:40:23|gpuminer0 Finishing up worker thread...
ℹ 18:40:23|eth stopWorking for thread gpuminer0
ℹ 18:40:23|eth Stopping gpuminer0
ℹ 18:40:23|eth Waiting until Stopped...
ℹ 18:40:23|gpuminer0 State: Stopped: Thread was 2
ℹ 18:40:23|gpuminer0 Waiting until not Stopped...
ℹ 18:40:23|eth Terminating gpuminer0
ℹ 18:40:26|eth stopWorking for thread gpuminer1
ℹ 18:40:26|eth Stopping gpuminer1
ℹ 18:40:26|eth Waiting until Stopped...
ℹ 18:41:32|gpuminer1 Finishing up worker thread...
ℹ 18:41:32|gpuminer1 State: Stopped: Thread was 2
ℹ 18:41:32|gpuminer1 Waiting until not Stopped...
ℹ 18:41:32|eth Terminating gpuminer1
ℹ 18:41:36|eth stopWorking for thread gpuminer2
ℹ 18:41:37|eth Stopping gpuminer2
ℹ 18:41:37|eth Waiting until Stopped...
ℹ 18:42:42|gpuminer2 Finishing up worker thread...
ℹ 18:42:42|gpuminer2 State: Stopped: Thread was 2
ℹ 18:42:42|gpuminer2 Waiting until not Stopped...
ℹ 18:42:46|eth Terminating gpuminer2
ℹ 18:42:49|gpuminer3 Finishing up worker thread...
ℹ 18:42:49|gpuminer3 State: Stopped: Thread was 1
ℹ 18:42:49|gpuminer3 Waiting until not Stopped...
ℹ 18:42:49|eth stopWorking for thread gpuminer3
ℹ 18:42:49|eth Stopping gpuminer3
ℹ 18:42:49|eth Waiting until Stopped...
ℹ 18:42:49|eth Terminating gpuminer3
ℹ 18:42:52|eth stopWorking for thread gpuminer4
ℹ 18:42:52|gpuminer4 Finishing up worker thread...
ℹ 18:42:52|eth Stopping gpuminer4
ℹ 18:42:52|gpuminer4 State: Stopped: Thread was 1
ℹ 18:42:52|gpuminer4 Waiting until not Stopped...
ℹ 18:42:52|eth Waiting until Stopped...
ℹ 18:42:52|eth Terminating gpuminer4
ℹ 18:42:55|eth stopWorking for thread gpuminer5
ℹ 18:42:55|gpuminer5 Finishing up worker thread...
ℹ 18:42:55|eth Stopping gpuminer5
ℹ 18:42:55|gpuminer5 State: Stopped: Thread was 1
ℹ 18:42:55|eth Waiting until Stopped...
ℹ 18:42:55|gpuminer5 Waiting until not Stopped...
ℹ 18:42:55|eth Terminating gpuminer5
min/mean/max: 14728581/15892208/16252928 H/s
inner mean: 16188971 H/s
Phoning home to find world ranking...
Ranked: 14 of all benchmarks.
So it doubled the hashrate of one GPU, but on a 6-card machine I was expecting more. At least it successfully closed all threads this time.
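For reference, the scaling efficiency implied by the two inner means above (one card vs. six cards):

```python
# Scaling numbers from the two benchmark runs above: one 750 Ti did
# ~8.13 Mh/s (inner mean 8126464 H/s); six cards together did
# ~16.19 Mh/s (inner mean 16188971 H/s).
single_hs = 8126464
six_hs = 16188971

speedup = six_hs / single_hs   # roughly 2x instead of the ideal 6x
efficiency = speedup / 6       # per-card scaling efficiency
print(f"speedup: {speedup:.2f}x, per-card efficiency: {efficiency:.0%}")
```

So the six-card run delivers only about a third of each card's standalone rate, consistent with the CPU being the bottleneck.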
Let me know if I can get you any measurements.