memory errors

boysie Member Posts: 591 ✭✭✭
It looks like I selected the wrong delivery type, so my new 480 Devils are not coming until Monday.

So I started looking at memory errors.

I noticed the WattTool info says that if you are overclocking memory, you should check for memory errors using HWiNFO.

I had not used this before, so I downloaded it and fired it up. I'm only interested in the sensor info, so I ticked the sensors-only box.

So I'm currently running my 31.25 mod across my 480s. How are they doing?

Well, it seems a bit of a mix.

1 card is producing almost no memory errors at 2250/920
1 card is showing a very big number and climbing a lot.
the others are climbing but showing around 1mill errors
The number is a running total, so it climbs with every tick.
On some cards, changing the settings and applying them in WattTool reset the number to 0, which was handy; on others it didn't, which was odd and inconsistent.

Now, being new to this info, I'm not actually sure what number is a good number, but I'm going with the idea that errors in memory can be corrected, though possibly not at the very high rate I was seeing on one card.

So seeing a few per second, say 30-50, was probably going to be OK, but cards that are climbing faster than that are probably not doing the work you want them to do, despite the hash rate showing in Claymore.

So a small adjustment of the mem speed, on one card to as low as 2190, now gives me error rates similar to or lower than 30-50 a second. I found some cards wanted a lower speed and others just a tiny amount more power.
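
Because the counter is a running total, the figure that matters is the rate between two readings. A minimal sketch of that arithmetic in Python, assuming the readings are noted by hand or taken from a log along with a timestamp. The function name and the sample numbers are made up for illustration, and nothing here is provided by HWiNFO or WattTool:

```python
def errors_per_second(samples):
    """Return error rates between consecutive (time_s, total_errors) samples.

    A drop in the total is treated as a counter reset (for example after
    applying new settings), so that interval is skipped instead of being
    reported as a negative rate.
    """
    rates = []
    for (t0, e0), (t1, e1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0 or e1 < e0:
            continue  # bad timestamp or counter reset: no usable rate
        rates.append((e1 - e0) / dt)
    return rates


# Hypothetical readings: (seconds since start, cumulative error count).
# The last reading simulates the counter being reset to near zero.
readings = [(0, 1_000_000), (10, 1_000_400), (20, 1_000_800), (30, 120)]
rates = errors_per_second(readings)
print(rates)
```

Comparing each rate against the rough 30-50 per second guess above is then a one-line check; that threshold is a guess from this post, not a documented limit.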

My expectation here is that if a GPU is producing hashes alongside 1 million errors per sec, it's unlikely to be doing the work you think it should be, and the absence of errors logged at the app layer doesn't mean the hashes are correct. I'm not really sure this is accurate, but without much info to go on it seems a reasonable assumption.

I'll keep an eye on the pool side and see whether the hash rate there comes more in line with what's expected after ironing out the memory errors, and whether the odd GPU hang, which seems to happen with the 480s, goes away or lessens.
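
One way to make the pool-side comparison concrete is to work out the hashrate implied by accepted shares and set it against what the miner reports. A rough sketch, assuming the pool pays on fixed-difficulty shares; the share difficulty value, the share count, and the function name are placeholders for illustration, so check your own pool for the real numbers:

```python
def effective_hashrate(accepted_shares, share_difficulty, seconds):
    """Estimate the hashrate (hashes/s) implied by accepted shares.

    On average one share of difficulty D takes D hashes to find, so
    N accepted shares over T seconds imply roughly N * D / T hashes/s.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return accepted_shares * share_difficulty / seconds


# Hypothetical numbers, not taken from any real pool or rig.
SHARE_DIFFICULTY = 4e9   # assumed placeholder: hashes per share
reported_mhs = 31.25     # what the miner shows for one card
shares_24h = 640         # accepted shares counted over 24 hours

effective_mhs = effective_hashrate(shares_24h, SHARE_DIFFICULTY, 24 * 3600) / 1e6
ratio = effective_mhs / reported_mhs
print(f"effective {effective_mhs:.2f} MH/s, {ratio:.0%} of reported")
```

Finding shares is a matter of luck, so a short window can swing a few percent either way without anything being wrong; a 24-hour count or longer gives a steadier comparison.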

Boysie

Comments

  • Zilj Member Posts: 61
    I found your 31.25 works fine on 1-2 cards for me. I am running 6 and couldn't get them all to run at 920mv, so I've had to up them to around 940-950ish. I can also get 30.5ish at around 1140 core/2200 mem.

    I'm running an ASRock H81 Pro BTC mainboard, with both molex connections to the motherboard, 6x powered USB risers, and 3 sets of SATA power cables coming from the power supply, with 2 cards on each set.

    Current Power supply is - CoolerMaster Vanguard 1200W 80+ Platinum Full Modul, +3.3/+5v are 25a, 125w output power, +12v 100a/1200w, -12v .5a/6w, +5Vsb 3a/15w

    The next power supply I will test is also a Platinum; I feel this will give better stability.
    +3.3/+5v 30a, 180w output, +12v 100a/1200w, -12v .8a/9.6w, +5Vsb 3.5a/17.5w
  • boysie Member Posts: 591 ✭✭✭
    @Zilj
    So the ROM is not the point of the post; did you check your cards for memory errors?
  • adaseb Member Posts: 1,043 ✭✭✭
    My 470s 4GB are:

    1: 5 memory errors at 2000 MHz
    2: 496 memory errors at 1950 MHz
    3: 291,383,451 memory errors at 2000 MHz

    2 are msi and 1 is gigabyte
  • boysie Member Posts: 591 ✭✭✭
    So by that I would say card 3 is having lots of problems at that speed. Either try more power or less speed, or both.
  • adaseb Member Posts: 1,043 ✭✭✭
    Yeah, but there are no incorrect shares according to Claymore, so I just leave it. Also, the shares submitted over a 24-hour period exactly match the hashrate, so I don't think it's an issue.
  • mmx Montreal, QC Member Posts: 110 ✭✭
    Individuals without a deep understanding of GPU memory timings really need to stop distributing their work to the public. The fact that you're only realizing the existence of GPU memory errors makes me sigh in disbelief.

    There is a reason why The Stilt is known for his work (source).
  • adaseb Member Posts: 1,043 ✭✭✭
    Either way the cards are all under warranty for at least 2 years....
  • Truthchanter Member Posts: 549 ✭✭✭
    Very interesting, I never knew anything about this. I may check for errors on mine.
  • boysie Member Posts: 591 ✭✭✭
    edited September 2016
    mmx said:

    Individuals without a deep understanding of GPU memory timings really need to stop distributing their work to the public. The fact that you're only realizing the existence of GPU memory errors makes me sigh in disbelief.

    There is a reason why The Stilt is known for his work (source).

    so what stilt says is:

    Once you have a significant amount of errors which get through, you'll get visible artifacts

    and before that its just a loss of perf.

    so this explains why most people don't see a perf increase after 2200: the error rate probably increases massively and the gain is very little.

    glad we are on a forum where people can openly discuss things they don't fully understand.

    thanks for the link.

    Boysie

  • Zorg33 Member Posts: 220 ✭✭
    This also explains the unstable hashrate after mem OC.
    It's kind of like HW errors on BTC ASICs: under 5% they do not affect the hashrate much.

    I will check it today for my cards.
  • Truthchanter Member Posts: 549 ✭✭✭
    Odd, I checked mine and 1 of my cards got to a few million errors within a few minutes. I didn't change anything, then reset, and the same card got 0 errors.
  • boysie Member Posts: 591 ✭✭✭
    reset what?
  • mjaaay Member Posts: 97
    I had 140 million memory errors on one card...I gave it a bit more power and underclocked it a bit, and now it's at 500ish errors.

    Not sure if losing a tiny bit of hash rate is worth shaving off the 140mill errors though.
  • mmx Montreal, QC Member Posts: 110 ✭✭
    mjaaay said:

    I had 140 million memory errors on one card...I gave it a bit more power and underclocked it a bit, and now it's at 500ish errors.

    Not sure if losing a tiny bit of hash rate is worth shaving off the 140mill errors though.

    How about increasing the lifespan of the GPU? :|
  • RavinderDhillon Member Posts: 74 ✭✭
    boysie said:


    So started looking at memory errors

    Thank you for bringing this to our attention. I was using the 1375 straps, and after checking, 13 of my 24 cards had memory errors, 7 of them reporting a very high number of errors. I was wondering why I was getting a low effective hash rate poolside; this could be the culprit.
    I have since changed back to 1500 mem straps and there are 0 memory errors on any of the cards at my conservative 1050/870 core, 1870/870 mem settings.
    Will report back if this improves my poolside hash.
    mmx said:

    Individuals without a deep understanding of GPU memory timings really need to stop distributing their work to the public. The fact that you're only realizing the existence of GPU memory errors makes me sigh in disbelief.

    There is a reason why The Stilt is known for his work (source).

    I would any day take people sharing the experimental settings they use over peeps with deep understanding who share nothing on the forum. You live, you learn.
  • Truthchanter Member Posts: 549 ✭✭✭
    boysie said:

    reset what?

    restarted my computer
  • Coreol Member Posts: 30 ✭✭
    edited September 2016
    I've flashed 6 x XFX 480 Ref with @boysie's 29MH ROM and changed the MEM V from 1000 to 940 manually on each one.
    5 cards report 0 errors while one is picking up errors at a rate of 100k per 2-3 seconds. I've tried resetting the MEM to 1000 but the errors still build up.

    As for reported vs. calc hashrate - they are pretty much identical. Occasionally the pool reports +/- 3MH
  • RavinderDhillon Member Posts: 74 ✭✭
    What is the likelihood that memory errors could lead to a reduction of card life? I mean, this is underclocked memory with a slight (less than 7%) overclock.
    For now I'm back to my highest hash settings, ignoring the memory errors, since neither Claymore nor the pool reports any difference. If cards begin failing I'll switch back to more stable settings, but I don't really care if I'm reducing card life from 5 years to 2. These are going to be resold in 10 months tops anyway.
  • Truthchanter Member Posts: 549 ✭✭✭
    I'm seeing that with my 470s: I have to significantly reduce the mem clock, and therefore MH/s, to reduce or eliminate memory errors. My effective hashrate poolside has been about equal to my reported hashrate, so I may just end up completely ignoring these memory errors as well.
  • RavinderDhillon Member Posts: 74 ✭✭
    On the 1375 strap my cards have been stable for almost a week of 24/7 operation. Hashrates at the pool were consistently 10% lower at least so that made me curious. To test this thoroughly I tried gaming on these cards with the 1375 strap and while game benches did not crash, I could see frame drops everywhere and stuttering telling me that the memory was struggling to keep up, and having to possibly rework, which could explain my consistently 10% lower pool hashrate.

    I have decided to go back to the 1500 straps and they have been rock solid at ETH 26.8/ SIA 241 dual mining with the same clocks 1050/870 1870/870 and I am going to be sticking to these settings.
    For a few cards you can perhaps check whether the 1375 strap is stable for you and check in HWiNFO for memory errors. If your cards pass, that memory strap will yield godlike performance/watt. But since I run about 30 cards, it's hard to track and manage which cards run at what strap and settings, so I just go with the minimum stable settings across the board.
  • ursul0 Member Posts: 54
    edited October 2016
    mjaaay said:

    mmx said:

    mjaaay said:

    I had 140 million memory errors on one card...I gave it a bit more power and underclocked it a bit, and now it's at 500ish errors.

    Not sure if losing a tiny bit of hash rate is worth shaving off the 140mill errors though.

    How about increasing the lifespan of the GPU? :|
    :| I plan to throw my cards at vitalik when POS comes
    Regarding the lifespan: it depends on the rate at which heat and cold destroy the structure of the silicon. If, to reduce mem errors, you increase voltage and your card indeed consumes LESS, then it's worth it... otherwise it could be just the opposite: low voltage and a lot of mem errors may be easier on the card than running "stable" but consuming more power.

    And regarding the "effective average" hashrate at the pool... well that I don't know, but intend to verify. I'd appreciate If anyone would enlighten me on the matter, 'cause I'm having trillions of them...

    Does anyone have 0% diff of average effective vs actual hashrate on ethermine?

    My current thoughts are: hash is a hash is a hash. And "effective average" is network issues and pool/setup fuckups, not related to the card performance.

    And I do see 10% lower than the actual on ethermine.



    EDIT: The number of errors on the image is for 20+hours of runtime

  • Zorg33 Member Posts: 220 ✭✭
    A valid hash is a hash, everything else is waste.
    Your 100% memcontroller usage tells nothing about mem errors, because correcting mem errors also takes effort from the memcontroller.
    So the question is what portion of the work produces valid shares.
    If it's 90%, then the pool shows 90%.
  • ursul0 Member Posts: 54
    edited October 2016
    the image of nanopool is my failover pool.
    The 10% of missing hashrate is seemingly mainly down to network issues...
    The workers are also in a VLAN, and I've seen some weird behavior once the VPN server went down.
    The current DoS is also to be considered.

    Bottom line: I'm not yet convinced that mem errors are a factor of any sort. Unless someone can point out that with no errors at higher voltage (I run all at .850) a 470 consumes less...
    I think I'm slightly below 150 on average. Total 9 cards (2*480 + 7*470 = [email protected] & [email protected]) on 2 platforms consume below 1400W off the wall.

    The pool 'lagging behind the actual' issue I plan to investigate, but now is not the time to run solo for my taste.

    ...also even in case I'm wrong I care none, since the cards out of the box give 22MH and consume more and I've got them all at 28.0-28.7 dual mining. EDIT: 480 does 29.1 and drives display.
  • ursul0 Member Posts: 54
    boysie said:

    mmx said:

    Individuals without a deep understanding of GPU memory timings really need to stop distributing their work to the public. The fact that you're only realizing the existence of GPU memory errors makes me sigh in disbelief.

    There is a reason why The Stilt is known for his work (source).

    so what stilt says is:

    Once you have a significant amount of errors which get through, you'll get visible artifacts

    and before that its just a loss of perf.

    so this explains why most people don't see a perf increase after 2200: the error rate probably increases massively and the gain is very little.

    glad we are on a forum where people can openly discuss things they don't fully understand;

    thanks for the link.

    Boysie

    @mmx Old timer heh?

    @Boysie Totally support this kind of approach to be applied to the real world as well :)
  • ursul0 Member Posts: 54
    btw in the link there was a piece of interesting info about showing all errors, corrected or not.
  • ursul0 Member Posts: 54
    Zorg33 said:

    A valid hash is a hash, everything else is waste.
    Your 100% memcontroller usage tells nothing about mem errors, because correcting mem errors also takes effort from the memcontroller.
    So the question is what portion of the work produces valid shares.
    If it's 90%, then the pool shows 90%.

    We may ask Claymore how he counts the hashrate on every round, but I suspect if it does not produce "invalid shares" then it is the actual result of the run and the rest is on a different level: driver/VLSI or just pure magic:)
  • Truthchanter Member Posts: 549 ✭✭✭
    I have not adjusted anything much (or at all) for memory errors, although I still get them.

    No one is getting 100% of their reported hashrate as average 24h hashrate right? I get about 95%

    What are you guys getting?
  • ursul0 Member Posts: 54
    ...and ohhh... think about leaking electrons.... how many atoms thick do you think you have a wall between transistors? 0.11 without error correction?
    I'm expecting a silicon wall at 0.6 :)