GPU key specs for high hashrate

peoples Member Posts: 29 ✭✭
edited November 2015 in Mining
Hey,

as far as I can see, the R9 280X is still one of the best mining cards.
But when I compare its specs with, for example, an R9 Nano's, the Nano should theoretically clearly outperform the 280X. Instead it also does only around 25 MH/s, just like the 280X?

R9 280x Specs:
http://www.techpowerup.com/gpudb/2398/radeon-r9-280x.html

R9 Nano Specs:
http://www.techpowerup.com/gpudb/2735/radeon-r9-nano.html

My question is: why? What are the key factors for fast Ether mining? Cache bandwidth? Clock rate? GFLOPS?

Thanks
regards
Alex

Comments

  • o0ragman0o Member, Moderator Posts: 1,291 mod
    This was a very disappointing realisation indeed. Based on the memory bandwidth specs of the HBM and the pre-market hype, I was expecting the Nanos and Furys to come in at about double the hashrate of a 280X and four times the efficiency.
    The reasons for the poor performance aren't well understood, but they perhaps have something to do with the size of the DAG itself. @Genoil's work on the CUDA miner revealed some very disturbing behaviour on the Nvidia GPUs, especially under Windows. As the DAG size grows beyond 1 GB, hashrate drops to nothing. He suggests memory paging is hammering the bandwidth.
    I've noticed degrading performance on my HD 7950s after each new epoch as well, so maybe there is similar behaviour there too.
  • o0ragman0o Member, Moderator Posts: 1,291 mod
    Sorry, to answer your question....

    The Dagger/Hashimoto algo is 'memory hard', so performance is bound by memory bandwidth. GPU cores can be underclocked with little effect on hashrate, to yield better efficiency.

    Overall, GPU specs have not been a particularly good predictor of mining performance. I did up this spreadsheet during Olympic to try and get my head around it, but really just had to wait until benchmarks came in. In the end, second-hand R9s and HD 79xx cards came out as the better GPUs for mining.
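To make the bandwidth bound concrete: ethash performs 64 reads of 128-byte DAG chunks per hash (as Genoil notes further down), so memory bandwidth alone puts a hard ceiling on hashrate. A minimal sketch, using spec-sheet bandwidth figures; real hashrates fall below this ceiling due to TLB and coalescing effects:

```python
# Rough upper bound on ethash hashrate from memory bandwidth alone.
# Each hash reads 64 chunks of 128 bytes from the DAG, so a card can
# never hash faster than bandwidth / (64 * 128).

BYTES_PER_HASH = 64 * 128  # 8 KiB of DAG traffic per hash

def max_hashrate_mhs(bandwidth_gbs: float) -> float:
    """Theoretical ceiling in MH/s given memory bandwidth in GB/s."""
    return bandwidth_gbs * 1e9 / BYTES_PER_HASH / 1e6

# Spec-sheet bandwidths; a 280X's real ~25 MH/s sits below its ~35 MH/s ceiling.
for card, bw in [("R9 280X", 288.0), ("R9 270X", 179.2), ("R9 Nano (HBM)", 512.0)]:
    print(f"{card}: <= {max_hashrate_mhs(bw):.1f} MH/s")
```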
  • Genoil 0xeb9310b185455f863f526dab3d245809f6854b4d Member Posts: 769 ✭✭✭
    edited November 2015


    o0ragman0o said:

    @Genoil's work on the CUDA miner revealed some very disturbing behaviour on the Nvidia GPUs, especially under Windows. As the DAG size grows beyond 1 GB, hashrate drops to nothing. He suggests memory paging is hammering the bandwidth.
    I've noticed degrading performance on my HD 7950s after each new epoch as well, so maybe there is similar behaviour there too.

    To be more specific, it's likely the TLB (translation lookaside buffer) that gets swamped. It seems logical that AMD cards will be impacted by this as well.

    While the algo was designed to scale with memory bandwidth, in practice it seems tuned to run nicely on GCN 1.0 cards and doesn't scale that well on modern GPUs.

    But...you never know who cooks up some alternative implementation...
  • peoples Member Posts: 29 ✭✭
    Thanks for this.
    Brand new: there is a new Radeon card -> http://www.techpowerup.com/gpudb/2758/radeon-r9-380x.html
    The stats are the same as the old R9 270X's -> that can be interesting in terms of ROI because of the low price. Has anyone already got their hands on one?
  • o0ragman0o Member, Moderator Posts: 1,291 mod
    Looks like it will better the 280X on power efficiency. Also interested to see how its hashrate compares. The price point is very nice, but in considering ROI you pretty much have to assume you'll be selling it again in 12 months' time to have any hope of realising a profit from mining.
  • Genoil 0xeb9310b185455f863f526dab3d245809f6854b4d Member Posts: 769 ✭✭✭
    The R9 380X is Antigua XT, formerly known as Tonga XT (R9 285). The 'problem' with this design is that, like Nvidia with Maxwell, AMD has sacrificed bus width and makes that up with texture compression, leading to (above-)equal performance in games. Combined with better power efficiency, it is a good design for gaming. For ethash, however, it leads to less performance, as it can't make up for the lost bus width with compression (there is nothing to compress on 64 sequential loads of 128-byte DAG chunks).

    Using this logic, one would assume that AMD's HBM cards and the upcoming Nvidia Pascal would blow ethash through the roof with their 4096-bit wide bus. But due to the much lower memory clocks, the effective bandwidth is not that much higher than on the top-of-the-line GDDR5 cards. Again, fine for gaming, but ethash simply works better with a narrower bus accessed at higher speeds. I'm still trying to understand how this works; it's pretty difficult to figure out exactly what's going on under the hood.
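As a back-of-the-envelope on that point: peak bandwidth is bus width times the effective data rate, so first-gen HBM's 4096-bit bus at a low clock ends up less than 2x a fast 384-bit GDDR5 bus. The data rates below are approximate spec-sheet values, used only for illustration:

```python
# Peak memory bandwidth = bus width (in bytes) * effective data rate.
# GDDR5 runs a narrow bus very fast; first-gen HBM runs a very wide bus
# slowly, so the headline 4096-bit bus gives < 2x the usable bandwidth.

def bandwidth_gbs(bus_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s from bus width (bits) and data rate (Gbps/pin)."""
    return bus_bits / 8 * data_rate_gbps

print(bandwidth_gbs(384, 6.0))   # R9 280X: 384-bit GDDR5 at 6 Gbps -> 288 GB/s
print(bandwidth_gbs(4096, 1.0))  # R9 Nano: 4096-bit HBM at 1 Gbps  -> 512 GB/s
```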
  • peoples Member Posts: 29 ✭✭
    @Genoil I see that the 280X has 288 GB/s memory bandwidth compared to the 179 GB/s of the 270X. Maybe that is one reason (or the reason) why the 280X does up to 7 MH/s more than the 270X. But from an efficiency point of view the 270X is also interesting. So I hope that, because of the almost identical specs of the 270X and 380X and the low price, the new 380X is as efficient as the 270X. Let's see :)
  • farware Member Posts: 116
    One problem is that the newer cards are not optimized for running ethminer. Unless they are optimized, they will not perform as well as the older cards. The 270, 390 etc. are all not worth it. The best cards are still the 7950 and R9 280X.
  • Genoil 0xeb9310b185455f863f526dab3d245809f6854b4d Member Posts: 769 ✭✭✭
    @farware to my knowledge (I've only been working with CUDA/OpenCL for half a year) it is difficult to optimize for the newer cards. In my current version of the CUDA miner, I can run more than double the amount of ALU ops without slowing down the kernel. This basically means that the kernel sits idle for long periods waiting for data to be returned from GPU memory. This is called a memory-bound kernel.

    But I've done extensive research on the Dagger algorithm and its supposed scaling with GPU bandwidth. Unfortunately this is not fully the case. While bandwidth does play a big part, the GPU's TLB (translation lookaside buffer) has a huge impact on the achieved hashrate. It's basically a table that holds copies of recently done translations from virtual to physical memory address ranges (pages). The problem with Dagger and its huge slab of GPU RAM that is pseudorandomly accessed is that the TLB fills up quite quickly and then has to redo many of those translations. With the growing DAG size, this has already led to the GTX 750 Ti becoming useless to mine ETH on. For other modern Nvidia cards, the DAG has to grow to 2 GB before the hashrate will plummet.

    AMD cards seem to have 'better' TLBs (and generally wider memory buses, which also helps), so they seem to be less affected. I still have to write a test case in OpenCL to see to what extent they will suffer from growing DAG size. But I've read here on the forum that with each new epoch, hashrate is already slowly dropping on AMD cards, too. Hopefully for you guys it's a linear decrease, not hyperbolic like on Nvidia :). That said, if it happens, it will happen for everyone, so it's not a big deal. But if newer GPUs get bigger TLBs, the old cards are doomed.

    Another 'problem' holding performance back is that the algo only requires 128 sequential bytes of GPU RAM per iteration of the Dagger loop, while the high bandwidth of modern GPUs can only be achieved when loading larger sequential (coalesced) chunks. On Nvidia Kepler/Maxwell this is 512 bytes (a warp of 32 threads * 16 bytes); on AMD GCN it is 256 bytes. I suspect that because on AMD it is only double 128, it can utilize the available bandwidth more efficiently, but that is somewhat speculative. Another theory I'm exploring involves the number of available memory channels/banks. Nvidia Kepler and Maxwell and GCN 1.2 have 8, while GCN 1.0 has 12 and GCN 1.1 has 16. With pseudorandom access, this would generally lead to more bank conflicts on cards with fewer memory controllers. Perhaps some optimization is possible in that area. But I don't own any AMD cards :)
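The TLB effect described above can be illustrated with a toy model: pseudorandom 128-byte reads over a growing DAG touch more distinct pages than a small translation cache can hold, so the miss rate jumps once the DAG outgrows what the TLB can map. The page size and TLB capacity below are invented for illustration, not real GPU values:

```python
import random

# Toy model of TLB pressure under pseudorandom DAG access.
# Page size and TLB capacity are invented for illustration only.
PAGE_SIZE = 2 * 1024 * 1024   # hypothetical 2 MiB pages
TLB_ENTRIES = 512             # hypothetical TLB capacity (maps 1 GiB)

def tlb_miss_rate(dag_bytes: int, accesses: int = 10_000, seed: int = 1) -> float:
    """Fraction of random 128-byte reads whose page is not in the TLB."""
    rng = random.Random(seed)
    tlb = {}                  # page -> last-use time (crude LRU)
    misses = 0
    for t in range(accesses):
        addr = rng.randrange(dag_bytes // 128) * 128
        page = addr // PAGE_SIZE
        if page not in tlb:
            misses += 1
            if len(tlb) >= TLB_ENTRIES:          # evict least recently used
                tlb.pop(min(tlb, key=tlb.get))
        tlb[page] = t
    return misses / accesses

# Once the DAG exceeds TLB_ENTRIES * PAGE_SIZE, most accesses miss.
for gb in (1, 2, 4):
    print(f"{gb} GiB DAG: miss rate {tlb_miss_rate(gb * 1024**3):.2f}")
```

With these made-up numbers, a 1 GiB DAG fits entirely in the modeled TLB (near-zero misses), while 2 GiB and 4 GiB DAGs miss on roughly half and three quarters of accesses, mirroring the hashrate cliff Genoil observed.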
  • farware Member Posts: 116
    @Genoil very interesting information. What never made sense to me was why cards with 8 GB wouldn't perform much better than, say, 4 GB cards.

    If the DAG grows so large that even 4GB cards have issues then they need to modify the algorithm. While they're at it, they could add a few more flags to make it easier to mine with certain cards.

    Hopefully they can manage to prevent ASICs from coming in and ruining it for the rest of us.
  • farware Member Posts: 116
    OK, so memory is certainly affecting hashrate more than the clock rate alone. Clock rate matters, of course, but anything above 1k is fine; with the memory set at 1600 versus 1300, though, you will notice a significant gain.
  • MrYukonC Member Posts: 627 ✭✭✭
    farware said:

    ...but with the memory set at 1600 versus 1300 you will notice a significant gain.

    This has not been my observation on the R9 280X. When dropping the memory clock from 1500 to 1350, the hashrate actually increases by a non-trivial amount.
  • o0ragman0o Member, Moderator Posts: 1,291 mod
    farware said:

    @Genoil very interesting information. What never made sense to me was why cards with 8GB wouldn't perform much better than say 4GB.

    If the DAG grows so large that even 4GB cards have issues then they need to modify the algorithm. While they're at it, they could add a few more flags to make it easier to mine with certain cards.

    Hopefully they can manage to prevent ASICs to come in and ruin it for the rest of us.

    I think it's very unlikely they'll change the algo, for the reason that it is 'temporary by design'. Mining is only there to give an initial supply of ETH until the POS algo (Casper) is finalised. It's designed to run with exponentially diminishing returns up to the planned 'hard fork of doom' in about another 12 months' time. The hard fork is to force everyone to upgrade to the POS client. Until then, it looks like the diminishing returns will have an extra TLB-inefficiency coefficient.

    Nvidia is already sunk. AMD cards look like they might have winners and losers yet to be seen. All in all, Ethereum mining has turned out to be something of a leaky boat race... no real finish line, just whoever stays afloat the longest...
  • Genoil 0xeb9310b185455f863f526dab3d245809f6854b4d Member Posts: 769 ✭✭✭
    If (some types of) AMD cards also start suffering from TLB thrashing, the only thing you'll see is the global hashrate plummeting. The difficulty will have to adjust to this, but it does that by design. Overall it shouldn't really make a difference.

    I'll see if I can free up some time to port my "dagger simulator" from CUDA to OpenCL to quickly assess how much AMD cards are affected by increasing DAG size.
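The difficulty adjustment Genoil mentions can be sketched with a toy retarget loop. The 0.5%-per-block nudge and 15 s target below are illustrative, not Ethereum's exact adjustment formula: if the global hashrate halves, difficulty drifts down until block times return to the target.

```python
# Toy difficulty retarget: difficulty is nudged toward a target block time.
# The 0.5% step and 15 s target are illustrative, not Ethereum's exact rule.
TARGET = 15.0  # target seconds per block

def retarget(hashrate: float, difficulty: float, blocks: int = 300) -> float:
    """Return the expected block time after `blocks` adjustments."""
    for _ in range(blocks):
        block_time = difficulty / hashrate   # expected time at this difficulty
        difficulty *= 1.005 if block_time < TARGET else 0.995
    return difficulty / hashrate

# Hashrate halves (50 -> 25 H/s) while difficulty starts at the old equilibrium:
# block time starts at 30 s, then retargeting pulls it back toward 15 s.
print(retarget(hashrate=25.0, difficulty=50.0 * TARGET))
```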
  • peoples Member Posts: 29 ✭✭
    Stats for R9 380X

    min/mean/max: 19835562/19957896/20010325 H/s
    inner mean: 13311089 H/s
  • o0ragman0o Member, Moderator Posts: 1,291 mod
    Thanks. Still not beating the 79xx's for mining/cost/capital efficiency, but certainly not a bad choice with maybe only 300-350 days of effective mining left. Certainly the best miner of the brand-new cards.
  • peoples Member Posts: 29 ✭✭
    @o0ragman0o why only 300-350 days?
  • Genoil 0xeb9310b185455f863f526dab3d245809f6854b4d Member Posts: 769 ✭✭✭
    Have you seen my dagger simulator yet? I'm really interested to get some figures on GCN 1.2.
  • peoples Member Posts: 29 ✭✭
    @Genoil I'm running on Ubuntu 15.04; maybe you can advise me how to use your simulator on Linux?
  • o0ragman0o Member, Moderator Posts: 1,291 mod
    @peoples, the POW phase is only temporary and Ethereum will transition to POS. This was targeted for about 480 days after Frontier launched, but the way they are doing it is to make the POW difficulty rise exponentially. This won't be noticed so much in the first 12 months, but after that the effect picks up, blocks get longer and mining returns dwindle. There will come a crossover point where upgrading to the forthcoming POS client will be of greater benefit.
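The exponential rise comes from the 'difficulty bomb' term added to each block's difficulty. A sketch, assuming the commonly cited form 2^(block_number // 100000 - 2); treat the exact constants as an assumption:

```python
# Sketch of the Ethereum "difficulty bomb" (ice age) term, assuming the
# commonly cited form: bomb = 2 ** (block_number // 100_000 - 2).
# It doubles every 100k blocks, so it is negligible at first and later
# dominates total difficulty, stretching block times out.

def bomb_term(block_number: int) -> int:
    return 2 ** (block_number // 100_000 - 2)

for n in (200_000, 1_000_000, 2_000_000, 4_000_000):
    print(f"block {n:>9,}: bomb term {bomb_term(n):,}")
```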
  • peoples Member Posts: 29 ✭✭
    @o0ragman0o interesting, I didn't know that. Is there any official statement that Ethereum will switch? I just found an interview with Zamfir who talks about that.
  • o0ragman0o Member, Moderator Posts: 1,291 mod
  • peepeedog Member Posts: 32

    o0ragman0o said:

    @peoples, the POW phase is only temporary and Ethereum will transition to POS. This was targeted for about 480 days after Frontier launched, but the way they are doing it is to make the POW difficulty rise exponentially. This won't be noticed so much in the first 12 months, but after that the effect picks up, blocks get longer and mining returns dwindle. There will come a crossover point where upgrading to the forthcoming POS client will be of greater benefit.

    This is 100% true. New miners have to be aware of the risks because in the future with Casper, GPU mining is essentially out. Proof of Work (GPU mining) will be replaced by Proof of Stake (Validator betting). We can then throw all our GPUs away :)

    However, the thing I'm curious about is the POS client and validator betting - there does not seem to be a clear explanation of how one can become a validator in a pool and make bets against the protocol to earn ether.

    If anyone can enlighten me that would be helpful. Appreciate it!

    Cheers
    Shaun
  • tuppydog Member Posts: 26

    o0ragman0o said:

    Sorry, to answer your question....

    Dagger/Hashimoto algo is 'memory hard' and so performance is bound to memory bandwidth. GPU cores can be under clocked with little effect on hashrate to yield better efficiency.

    Overall, GPU specs have not been a particularly good predictor of mining performance. I did up this spreadsheet during Olympic to try and get my head around it, but really just had to wait until benchmarks came in. In the end, second hand R9's and HD 79xx came out the better GPU's for mining.

    Great job on that spreadsheet. It makes a lot of sense!!
  • wilkas Member Posts: 7
    @o0ragman0o, are the values in your spreadsheet still relevant and close to the true ones? Are there any other GPUs that are no longer usable for mining?
  • o0ragman0o Member, Moderator Posts: 1,291 mod
    @wilkas The values were originally geth benchmark values from the benchmarking thread. Geth benchmarks against a 1 GB DAG, and as the DAG grew, the benchmark thread got forgotten about, as did the fact that different GPUs' performance degrades under the growing DAG in often vastly different ways (e.g. the GTX 750 Ti). The hashrate updates became a weak effort on my part, particularly for the newer cards, which I haven't really been following.

    I had a slack attempt the other day to find some stats on 480s/580s, but I haven't really updated the chart for quite a while, as I've moved more into Solidity development, which is taking up all my time.

    I'd encourage anyone who still finds the chart useful (and there are always a few people online whenever I go to it) to simply save a copy and update it according to their own situation.