DAGger simulator

Genoil 0xeb9310b185455f863f526dab3d245809f6854b4d Member Posts: 769 ✭✭✭
edited November 2015 in Mining
As some of you may know, I've been doing a bit of research into the effects of increasing DAG file size on hashrate. The reason is the dramatic drop in hashrate on the GTX750Ti on Windows as the DAG file size increased. So far, I have concluded that other modern Nvidia cards are relatively safe until the DAG size hits 2GB, but I don't have any data on AMD cards yet.

Today, I've ported my CUDA DAG simulator to OpenCL, but as I don't own any AMD cards, I still don't know what the results are. Attached is a (64-bit) Windows binary that can show you the approximate ethminer hashrate with a given DAG size. I would be very grateful if some of you could run some tests for me.

[This post has been heavily updated following a major update to the test program.]

By default, the program will generate a pseudo DAG file with a size matching your GPU RAM (or a max of 4GB) and test bandwidth/hashrate with incremental steps from 128MB up until the maximum possible.

dagSimCL.exe

The results will be written to a tab-delimited csv file called results.csv.

Using a spreadsheet program like Excel, you can easily turn this into a graph like this one for my GTX780:

[graph: GTX780 hashrate vs. DAG size]

The x-axis is DAG size in MB, the y-axis is hashrate in MH/s. Because the test uses a simplified dagger loop without the SHA3/Keccak stages, you may see slightly better results than with the real ethminer.
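For anyone who would rather script this than use a spreadsheet, here is a minimal Python sketch. It assumes results.csv has one header row followed by three tab-separated columns (DAG size in MB, bandwidth, hashrate in MH/s); that layout is an assumption, so check it against your own file. The function names and the "half of the median" threshold are made up for illustration, and the sample rows are invented numbers, not measurements.

```python
import csv
import statistics


def parse_results(lines):
    """Parse tab-delimited rows of (DAG size MB, bandwidth, hashrate MH/s).

    Assumes one header row and three numeric columns (an assumption about
    the results.csv layout, not something taken from the dagSimCL source).
    """
    reader = csv.reader(lines, delimiter="\t")
    next(reader, None)  # skip the assumed header row
    rows = []
    for fields in reader:
        if len(fields) < 3:
            continue  # ignore blank or truncated lines
        # Strip thousands separators in case a spreadsheet added them.
        size_mb, bandwidth, hashrate = (
            float(f.replace(",", "")) for f in fields[:3]
        )
        rows.append((int(size_mb), bandwidth, hashrate))
    return rows


def first_collapse(rows, fraction=0.5):
    """Return the first DAG size whose hashrate is below fraction * median.

    Returns None if no row falls below the threshold.
    """
    if not rows:
        return None
    threshold = fraction * statistics.median(h for _, _, h in rows)
    for size_mb, _, hashrate in rows:
        if hashrate < threshold:
            return size_mb
    return None


# Invented sample data, only to show the expected shape of the input.
sample = [
    "DAG size (MB)\tBandwidth (GB/s)\tHashrate (MH/s)",
    "128\t160.0\t21.0",
    "256\t158.0\t20.7",
    "384\t155.0\t20.3",
    "512\t30.0\t3.9",
]
print(first_collapse(parse_results(sample)))  # prints 512
```

To run it on a real file, pass an open file object instead of the sample list, e.g. parse_results(open("results.csv", newline="")). The median is used rather than the peak so that a single unusually high reading does not move the threshold.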

If you have less system RAM than your GPU RAM, you can set the maximum using the 1st command line parameter:

dagSimCL.exe 2048

This will test DAG sizes up to 2048MB (= 2GB).

I think most AMD cards will be fine, but you never know. If you have more than one GPU in your system, you can use the 2nd command line parameter to select a different card:

dagSimCL.exe 4096 1

This will measure on the second card (the first card is 0). Because the command line parsing is not very clever, you will have to set the first parameter to your GPU RAM size or higher, as shown here.

If you have problems with multiple OpenCL platforms installed, you may add a third command line argument to select the right platform:

dagSimCL.exe 4096 0 1

This will measure on the first card on the second platform.

Source code and most recent binaries here: https://github.com/Genoil/dagSimCL. Haven't tested on Linux. Also no Linux makefile available yet.
Most recent win-64 binaries also in the attached ZIP.

Please post your results with the results file attached. Thank you!

Comments

  • Genoil 0xeb9310b185455f863f526dab3d245809f6854b4d Member Posts: 769 ✭✭✭
    Got some results in via other sources. The first conclusion for GCN 1.0 miners on Win10 is that you're already on the downslope. I'm still awaiting some Win 7 results to compare with, but I'm expecting much better performance there, like I saw with the GTX750Ti.

    http://gathering.tweakers.net/forum/list_message/45343419 (in Dutch, but the graphs speak for themselves)

    BTW the hashrates displayed there are far above normal ethash, at least for the Radeons. That's likely because of the missing SHA3/Keccak stages, which put enormous register pressure on the kernel (== fewer wavefronts in flight).
  • Eastwind Member Posts: 107
    I'm using Windows 8.1, Catalyst 15.7, and a 280X. Your program only runs up to 1280MB. When it gets to 1408MB, it shows OpenCl Error: 'clEnqueueWriteBuffer(queue,dag,CL_TRUE,0,buffer_size,buffer,NULL,NULL,NULL)' returned -4!
  • Genoil 0xeb9310b185455f863f526dab3d245809f6854b4d Member Posts: 769 ✭✭✭
    edited November 2015
    Yes, I have noticed that. I have added an amd.bat to the repository that may solve that issue:
    set GPU_MAX_HEAP_SIZE=100
    set GPU_MAX_ALLOC_PERCENT=100
    set GPU_USE_SYNC_OBJECTS=1
    dagSimCL.exe
    I haven't tested it (as I don't own any AMD cards), but it is similar to what many AMD Ethereum miners use to get ethminer running properly. We are actually not that far from the point where this becomes mandatory for ethminer: about 70 days at a block time of 17 seconds, to be more precise.
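The "about 70 days" estimate can be reproduced with a rough back-of-the-envelope sketch. It uses the nominal Ethash parameters (a 1024MB DAG at epoch 0, growing by 8MB every 30000 blocks) and ignores the small downward adjustment the real algorithm makes to each size. The current block number of 600000 below is a hypothetical placeholder, not a figure from this thread, and the function names are invented for illustration.

```python
EPOCH_LENGTH_BLOCKS = 30000  # Ethash epoch length in blocks
DAG_INITIAL_MB = 1024        # nominal DAG size at epoch 0
DAG_GROWTH_MB = 8            # nominal DAG growth per epoch


def epoch_for_dag_size(target_mb):
    """First epoch whose nominal DAG size reaches target_mb."""
    if target_mb <= DAG_INITIAL_MB:
        return 0
    # Ceiling division without importing math.
    return -(-(target_mb - DAG_INITIAL_MB) // DAG_GROWTH_MB)


def days_until(target_mb, current_block, block_time_s=17.0):
    """Days until the DAG nominally reaches target_mb at a fixed block time."""
    target_block = epoch_for_dag_size(target_mb) * EPOCH_LENGTH_BLOCKS
    blocks_left = max(target_block - current_block, 0)
    return blocks_left * block_time_s / 86400


print(epoch_for_dag_size(1280))             # prints 32
print(round(days_until(1280, 600_000), 1))  # prints 70.8 (hypothetical block)
```

So a 1280MB DAG nominally arrives at epoch 32, i.e. block 960000; how many days away that is depends on the block you start counting from and on the actual average block time.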
  • Eastwind Member Posts: 107
    I used your bat file. Same effect: it stopped at 1408MB.
  • Genoil 0xeb9310b185455f863f526dab3d245809f6854b4d Member Posts: 769 ✭✭✭
    I just edited it to add set GPU_MAX_HEAP_SIZE=100. Is that in the bat file you used?
  • Eastwind Member Posts: 107
    Yes. I used all 4 lines above.
  • Eastwind Member Posts: 107
    My 280X has 3GB of memory, so it should not fail at 1408MB.
  • Genoil 0xeb9310b185455f863f526dab3d245809f6854b4d Member Posts: 769 ✭✭✭
    Nope, it shouldn't. I think there is a way around it, which should soon be implemented in ethminer as well, or else a lot of AMD miners will have a big issue in about 70 days, when the DAG hits 1280MB.

    Strangely enough, I got some results from people with 79x0 devices on Win10 that got to 2048MB without issues. Still not enough, but much better than 1280MB.
  • Eastwind Member Posts: 107
    One guy said in the thread: http://gathering.tweakers.net/forum/list_message/45343419 (in Dutch, but the graphs speak for themselves)

    7950 Boost Windows 7 here, and I'd like to help but I get a crash (see here). Am I doing something wrong? Edit: it happens every time at 1408MB. And is the idea that we enter "set GPU_MAX_ALLOC_PERCENT 100" in an administrator cmd prompt before we start dagSimCL.exe?
  • Genoil 0xeb9310b185455f863f526dab3d245809f6854b4d Member Posts: 769 ✭✭✭
    Yes I'm talking to this guy as well. Currently trying to get a version up that allocates GPU RAM in chunks.
  • Eastwind Member Posts: 107
    Does the official ethminer also use the same method to allocate memory? If that is the case, 90% of the hashing will disappear in 70 days.
  • Genoil 0xeb9310b185455f863f526dab3d245809f6854b4d Member Posts: 769 ✭✭✭
    Yes. There is some disabled code in the source that divides the DAG into 4 chunks, which was created because older AMD cards already had issues with a 1024MB DAG. It was disabled because apparently it doesn't work well.

    I almost have a "chunked" version of this test working on GTX780, but I'm having some difficulties allocating up to the full 3GB.
  • Eastwind Member Posts: 107
    You have an OpenCL miner that works for both Nvidia and AMD cards. Does that miner use the same code?
  • Genoil 0xeb9310b185455f863f526dab3d245809f6854b4d Member Posts: 769 ✭✭✭
    edited November 2015
    Yes. At some point, I created a modded version of the OpenCL kernel for Nvidia that used Nvidia-specific assembly, but as it wasn't faster than my native CUDA mining kernel, I stopped working on it.

    I just finished a version of the test that splits up the allocation in 256MB chunks.

    -- edit -- Temporarily removed the 256MB chunks version. It contains a bug.

    -- edit2 --

    The attached binary supports chunks of different sizes. The chunk size is now the first command line parameter. I've found 256MB to be the best for the GTX780 (although CUDA OpenCL prefers no chunking at all).
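To illustrate what chunking means here, the sketch below shows the bookkeeping involved: instead of one large buffer, the DAG is split into several smaller ones, and every flat element index has to be translated into a chunk number plus an index inside that chunk. This is not the dagSimCL code, just a Python illustration of the idea; the 128-byte element size and the function names are assumptions.

```python
MB = 1024 * 1024
NODE_BYTES = 128  # assumed size of one DAG element read by the inner loop


def chunk_sizes(dag_bytes, chunk_bytes):
    """Sizes of the buffers needed to hold dag_bytes in chunk_bytes pieces."""
    full, rest = divmod(dag_bytes, chunk_bytes)
    # All chunks are full-sized except possibly a smaller last one.
    return [chunk_bytes] * full + ([rest] if rest else [])


def locate(node_index, chunk_bytes):
    """Map a flat DAG element index to (chunk number, index within chunk)."""
    nodes_per_chunk = chunk_bytes // NODE_BYTES
    return divmod(node_index, nodes_per_chunk)


# A 1408MB DAG in 256MB chunks needs five full buffers plus one of 128MB.
print([s // MB for s in chunk_sizes(1408 * MB, 256 * MB)])
# An element just past the fifth chunk boundary lands in chunk 5.
print(locate(5 * (256 * MB // NODE_BYTES) + 7, 256 * MB))  # prints (5, 7)
```

Each individual allocation then stays at or below the chunk size, at the cost of an extra division (or a shift and mask, for power-of-two chunk sizes) on every lookup.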
  • StrikerBee Member Posts: 1
    -Quote:
    Yes I'm talking to this guy as well. Currently trying to get a version up that allocates GPU RAM in chunks.
    -

    Much appreciated. ;)
    Also, I posted results here:
    https://bitcointalk.org/index.php?topic=1268355.msg13110655
    because codetags.
  • Eastwind Member Posts: 107
    edited November 2015
    Using the 256MB chunks, Windows 8.1, Catalyst 15.7, 280X.
    There is no drop until 2688MB.

    DAG size (MB)   Bandwidth (GB/s)   Hashrate (MH/s)
    128             166.728            21.8533
    256             165.537            21.6973
    384             110.105            14.4317
    512             164.8              21.6006
    640             229.946            30.1394
    768             120.987            15.858
    896             362.78             47.5503
    1024            138.054            18.0951
    1152            232.697            30.5001
    1280            139.576            18.2945
    1408            166.527            21.827
    1536            138.565            18.1621
    1664            154.867            20.2988
    1792            127.143            16.6649
    1920            146.892            19.2535
    2048            141.562            18.5548
    2176            151.261            19.826
    2304            162.762            21.3336
    2432            192.039            25.1709
    2560            184.887            24.2335
    2688            29.7203            3.8955
    2816            21.8517            2.86414
    2944            9.08344            1.19059
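A side note on how the two result columns relate. The numbers above are consistent with the hashrate simply being the measured bandwidth divided by 8192 bytes per hash, which matches Ethash's 64 DAG reads of 128 bytes each, provided the bandwidth column is read as GiB/s (2^30 bytes). This is inferred from the posted figures, not taken from the dagSimCL source, and the function name is made up for illustration.

```python
ACCESSES = 64    # Ethash DAG reads per hash
MIX_BYTES = 128  # bytes fetched per read
BYTES_PER_HASH = ACCESSES * MIX_BYTES  # 8192


def hashrate_mhs(bandwidth_gib_s):
    """Convert DAG read bandwidth (GiB/s, assumed unit) to hashrate in MH/s."""
    return bandwidth_gib_s * 2**30 / BYTES_PER_HASH / 1e6


# Compare against two rows of the table; they agree to within rounding.
print(hashrate_mhs(166.728))  # table says 21.8533
print(hashrate_mhs(29.7203))  # table says 3.8955
```

In other words, the simulator is effectively a memory bandwidth test expressed in hashes, which fits the earlier remark that the SHA3/Keccak stages are left out.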
  • Eastwind Member Posts: 107
    edited November 2015
    My 280X (actually a 7990) is underclocked to 830/1000MHz.

    The hashrate does not drop monotonically.
  • Genoil 0xeb9310b185455f863f526dab3d245809f6854b4d Member Posts: 769 ✭✭✭
    Interesting that it shows this much variation, unlike NVIDIA. This opens up a possible ethminer optimization.
  • dont12 Member Posts: 60
    I've noticed that if you type ethminer -G -M --benchmark-trials 200, the hashrate will start high and steadily drop off. Granted, it seems to level off and become stable.
    When I say drop, I mean it starts at 49 MH/s and drops to 46-47 MH/s.
  • chinhdn Member Posts: 11
    7950 result (core/mem = 800/1000, 3GB Video memory, 8GB System RAM, 15.11, Win7 x64)

    256MB chunk

    => Can run up to 2944MB DAG size, but the results are unstable and not trustworthy; the hashrate seems to drop from 2688MB DAG size onward.
    ================================================
    512MB chunk

    => Can run up to 2816MB DAG size, the result looks good from 128 to 512MB DAG, unstable for the rest
    ================================================
    768MB chunk

    => Can run up to 2560MB DAG size, the result looks good from 128 to 768MB DAG, unstable for the rest
    ================================================
    1024MB chunk

    => Can run up to 1152MB, the results look fine except the last one.
    ================================================
  • chinhdn Member Posts: 11
    edited December 2015
    The DAG is exactly 1200MB at the moment (07 Dec 2015). Could you make a hashrate test simulator at this size?

    After some tests, I see that DAG size is a problem. I tried the ETH benchmark (ethminer -G -M) and got 19.5MH/s; the DAG size in this test is 1024MB. But in real mining (1200MB DAG size) the speed drops to 17.4MH/s. Do you know a solution for this issue?
  • davidrentao Member Posts: 60
    @Genoil, I've been checking the results from the other guys. So the bigger the DAG, the lower the hashrate? Is it the same for AMD cards and Nvidia? What does it mean?
  • Genoil 0xeb9310b185455f863f526dab3d245809f6854b4d Member Posts: 769 ✭✭✭
    @chinhdn The solution may be chunks, like in dagSimCL. I still have to check.
    @davidrentao Yes, same issue as with the GTX750Ti: TLB thrashing.
  • ordoe tehran Member Posts: 132 ✭✭
    Just came across this most interesting thread.
    Eastwind said:

    Does the official ethminer also use the same method to allocate memory? If that is the case, 90% of the hashing will disappear in 70 days.

    Could you summarize why this happens? What is the exact threshold, 2GB DAG size?
  • Genoil 0xeb9310b185455f863f526dab3d245809f6854b4d Member Posts: 769 ✭✭✭
    ordoe said:

    Just came across this most interesting thread.

    Eastwind said:

    Does the official ethminer also use the same method to allocate memory? If that is the case, 90% of the hashing will disappear in 70 days.

    Could you summarize why this happens? What is the exact threshold, 2GB DAG size?
    Looks like at about 1280MB. Not very far from now. I haven't done enough testing to assess the issue in detail (mainly because of not owning AMD hardware), but if I'm right and most AMD cards are affected, we will have DAGpocalypse in about 65 days. Only NVidia Kepler and Maxwell 5.2 will survive ;)
  • ordoe tehran Member Posts: 132 ✭✭
    @Genoil Could you test this patch?

    https://gist.github.com/anonymous/83237aa42d4292f13ceb

    It adds a makefile for Unix systems and fixes two includes. This way I was able to compile on Linux. I have AMD cards, an R9 390X (8GB) and an R9 270X (2GB), available for testing. If you are interested, I will let you know.
  • Genoil 0xeb9310b185455f863f526dab3d245809f6854b4d Member Posts: 769 ✭✭✭
    @ordoe Can you issue a pull request on https://github.com/Genoil/dagSimCL ? I'll blindly merge it as I don't have any Linux hardware. Another forum user has already kindly given me access to a Windows-based AMD box, but I'm too busy with another project at the moment to look into this further.
  • ordoe tehran Member Posts: 132 ✭✭
    Pull request pending.