Blockchain Size

What is keeping the blockchain from becoming ridiculously enormous, to the point where only a handful of organizations in the world can actually keep it? Could someone point me to a blog post or whitepaper section describing that?

Thanks

Comments

  • StephanTual (London, England), Member, Moderator, Posts: 1,282
    Horrible oversimplification follows:

    Imagine two contracts that share 10 lines of the same code.
    The blockchain will only store the code once, but 'pointers' to the code in question twice.

    That's deduplication at work. Vitalik posted an article on the subject at:
    http://blog.ethereum.org/2014/02/03/introducing-ethereum-script-2-0/
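    To make the idea concrete, here is a minimal Python sketch of content-addressed deduplication, assuming code is split into chunks and each chunk is keyed by its SHA-256 hash. The chunking, the hash function, and the `CodeStore` name are illustrative choices, not how Ethereum or the linked article actually does it.

```python
import hashlib


class CodeStore:
    """Content-addressed store: each distinct chunk of code is kept once."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}  # hash -> code chunk

    def put(self, code: bytes) -> str:
        key = hashlib.sha256(code).hexdigest()
        self.blobs.setdefault(key, code)  # stored only the first time it is seen
        return key  # the 'pointer' a contract keeps instead of the code itself


store = CodeStore()
shared = b"the same 10 lines of code"
contract_a = [store.put(shared), store.put(b"code only contract A uses")]
contract_b = [store.put(shared), store.put(b"code only contract B uses")]
```

    Both contracts hold the same pointer for the shared chunk, so the store ends up with three chunks rather than four.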
  • Klaus, Member, Posts: 2
    Duplicated code is a tiny fraction of the block chain.
  • onecorp, Member, Posts: 5
    What are the current ideas on handling state changes? Having 100k+ contracts, each with its own state change in a block, seems like it's gonna be a spicy meatball!
  • Flavien, Member, Posts: 7
    From the answers I got on this, it seems they are still figuring it out.
  • salhadaar, Member, Posts: 26
    One thing I see happening is that there will be centralised scripts that perform the same task over and over again and will accept any number of "customers".

    For instance, let's say you've got a "withdraw 1% a day" contract set up. If thousands of people wanted to all have that functionality, you wouldn't need 1,000 different contracts in the network.

    Instead you could have one contract that can accommodate any number of deposits and keep track of them all: which addresses are allowed to withdraw, how much, and how often.

    In fact you could have a website that just features a load of these contracts. As long as they're trusted you could create a pretty impressive collection of free financial services.
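    A rough Python sketch of such a shared contract, modelled as an ordinary class rather than real contract code. The one-withdrawal-per-day rule, integer balances, and the `DailyLimitVault` name are assumptions for illustration.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Account:
    balance: int          # in the smallest currency unit
    last_withdrawal: int  # timestamp in seconds


class DailyLimitVault:
    """One shared contract serving many customers, each limited to 1% of balance per day."""

    def __init__(self) -> None:
        self.accounts: dict[str, Account] = {}

    def deposit(self, owner: str, amount: int, now: int) -> None:
        # New customers may withdraw immediately, so start the clock a day back.
        acct = self.accounts.setdefault(owner, Account(0, now - SECONDS_PER_DAY))
        acct.balance += amount

    def withdraw(self, owner: str, amount: int, now: int) -> None:
        acct = self.accounts[owner]
        if now - acct.last_withdrawal < SECONDS_PER_DAY:
            raise ValueError("only one withdrawal per day")
        if amount > acct.balance // 100:
            raise ValueError("exceeds 1% of balance")
        acct.balance -= amount
        acct.last_withdrawal = now
```

    Adding another customer is just another entry in `accounts`, not another contract on the network.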
  • cyberfax, Member, Posts: 11 ✭✭
    Now, I don't think this will be a huge issue for a while, but what if there were millions of contracts?

    Maybe it's time to think about ways to create a distributed block chain, where each user only has to keep a small part of the actual chain (unless they want to keep the entire chain), and the system wouldn't depend on any single user keeping the entire chain. Users needing high performance on parts of the chain could have some way to select which parts of the block chain to download, perhaps only the parts that deal with certain contracts. This might also avoid excessively punishing those who want to store large amounts of data: if those who are not interested in this data don't have to download it, it's not such a big deal.

    Maybe distributed hash table technology could be useful? Maybe MaidSafe (youtube.com/watch?v=Wtb6L7Bg3zY)? Maybe individual contracts could have the option of having their own separate block chain, but being linked to the main block chain somehow?
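    Consistent hashing, a common building block of DHTs, is one way to spread chain data so that no single user has to keep all of it. This Python sketch assigns each block number to `replicas` nodes on a hash ring; the replication factor, the `block:<n>` key format, and the `Ring` class are hypothetical, and it is not a description of how MaidSafe works.

```python
import bisect
import hashlib


def h(value: str) -> int:
    """Map a string to a position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest(), "big")


class Ring:
    """Consistent-hash ring: each block is stored by the next `replicas` nodes clockwise."""

    def __init__(self, node_ids, replicas: int = 3) -> None:
        self.replicas = replicas
        self.points = sorted((h(n), n) for n in node_ids)
        self.keys = [p for p, _ in self.points]

    def holders(self, block_number: int) -> list[str]:
        start = bisect.bisect(self.keys, h(f"block:{block_number}"))
        count = min(self.replicas, len(self.points))
        # Walk clockwise from the block's position, wrapping around the ring.
        return [self.points[(start + i) % len(self.points)][1] for i in range(count)]


ring = Ring([f"node{i}" for i in range(10)], replicas=3)
```

    Each block lives on three nodes, so losing any single node loses nothing, yet no node has to store the whole chain.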
  • Jam10o, Member, Posts: 61 ✭✭
    Arbitrary data, in Tries, built on top of a distributed DHT-based Database, and suddenly we lose the simplicity that makes Ethereum special right now :)
    Somehow, I think the current implementation means that although we might have fewer full nodes, they will not have significant power over the network (spv, pools being on-blockchain, crazy things we haven't thought of yet...).
    The whitepaper does say that the network is made to scale with our technological capabilities, after all. Just as 1TB (pretty standard nowadays...) would have seemed big to someone in 1996, 1PB seems big to us but may be trivial in a few years, so let's not worry about blockchain size? Mobile phones can use SPV clients (like Mycelium for bitcoin).
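    An SPV client keeps only block headers and checks that a transaction is in a block using a Merkle branch of about log2(n) hashes. Below is a simplified Python sketch; it uses single SHA-256 and ignores Bitcoin's double-SHA-256 and byte-order conventions, so it illustrates the idea rather than being Bitcoin-compatible.

```python
import hashlib


def sha(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Root of a non-empty list of transactions."""
    level = [sha(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels, Bitcoin-style
        level = [sha(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag is True when the sibling is on the left."""
    level = [sha(x) for x in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = [sha(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof


def verify(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """What the light client runs: it needs only the leaf, the branch, and the header's root."""
    node = sha(leaf)
    for sibling, sibling_is_left in proof:
        node = sha(sibling + node) if sibling_is_left else sha(node + sibling)
    return node == root
```

    The full node builds the proof; the phone only stores the root from each header and runs `verify`.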
  • chris613, Member, Posts: 93 ✭✭
    I think Klaus' point is important: intuition says that only a tiny fraction of the blockchain is likely to be duplicated, since transactions are composed mostly of unique-looking strings. I would expect (maybe incorrectly?) that contracts would be a significantly smaller portion of volume in terms of bandwidth, since they only get sent once, and in general many more people will transact in the usual sense than create contracts. On the other hand, I don't recall reading any limit on contracts' storage usage. The whitepaper says 2^256 storage entries, which seems like... let's call it... a LOT, to me. I assume that is for all contracts and that the address space will be sparsely allocated via some hash-based addressing, with the assumption that the space is too huge for any two contracts' storage base address + offset to collide. Clearly, though, the barrier to entry for a full node can't be storing as much data as the deepest pockets can pay for. There need to be limits of various sorts that nodes can tune in terms of what they are willing to do.
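    Under the assumption above (not a statement of how Ethereum actually lays out storage), here's a Python sketch of hash-based addressing: a contract's base slot is the hash of its address, its entries sit at base + offset modulo 2^256, and a dict keeps the space sparse. SHA-256 and the function names are illustrative choices.

```python
import hashlib

SPACE = 2 ** 256  # size of the shared storage address space


def storage_key(contract_address: bytes, offset: int) -> int:
    """Hypothetical global slot: hash of the contract address as base, plus offset."""
    base = int.from_bytes(hashlib.sha256(contract_address).digest(), "big")
    return (base + offset) % SPACE


def overlap_probability_bound(slots_a: int, slots_b: int) -> float:
    """Upper bound on the chance two contracts' slot ranges overlap,
    given uniformly random bases: (slots_a + slots_b) / 2^256."""
    return (slots_a + slots_b) / SPACE


storage: dict[int, int] = {}  # sparse: only slots actually written take memory
storage[storage_key(b"contract-A", 0)] = 42
storage[storage_key(b"contract-B", 0)] = 7
```

    Even two contracts each using 2^40 slots overlap with probability at most 2^41 / 2^256, so the worry is disk space rather than collisions.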

    I was thinking about the CPU usage implications of millions of contracts earlier today and thought of something that sounds like cyberfax's suggestion: "select which parts of the block chain to download, perhaps only the parts that deal with certain contracts". Bitcoin's BIP 37 describes using a bloom filter to whitelist traffic you are interested in, with varying degrees of privacy. I was imagining that even "full nodes" are unlikely to want to be "full" full nodes after some scaling takes place, for all the reasons above. I'm not sure it's possible to truly validate the blockchain without having "all" of it, though (more educated opinions welcome), but conceivably only a few major entities could really support that at scale. Is this a problem?
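    BIP 37 lets a light client hand its peers a Bloom filter so they only relay matching transactions; the false-positive rate the client chooses is what buys some privacy. Here's a generic Python Bloom filter sketch. BIP 37 itself specifies MurmurHash3 with a per-filter tweak, so the SHA-256-based indexing and the default sizes here are stand-ins.

```python
import hashlib


class BloomFilter:
    """Set membership with possible false positives but no false negatives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 5) -> None:
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive k bit positions by hashing the item with k different prefixes.
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))
```

    A node would add the contract addresses it cares about and only process transactions for which `might_contain` returns True; a smaller filter means more false positives and therefore more cover traffic.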

    The CEO of Invictus asked this question the other day: "if all you care about is names[...] one smart contract [... or] gambling, why do you have to process all of the transactions of the new york stock exchange?" (http://vimeo.com/user24356268/review/86370915/8dfc523724 @ 0:37). This really got me wondering about the eventual real-world cost of running contracts. The basefee seems like it would climb at least linearly with adoption, but perhaps even more drastically than that. So many machines all running my little bit of code: they will all want a piece of me, and the more there are, the more electricity, maintenance, and capital amortization they will all want to charge me for. Can this really scale?

    TL;DR: is it possible to filter which txs and contracts you are willing to process and still contribute to the security of the network? If not, won't Ethereum full nodes need to be truly massive to accommodate those willing to pay very dearly for their massive contracts?
  • oliverkx, Member, Posts: 85
    I am relatively new to all this, so I am probably completely off the mark, but please bear with me...

    Would it be possible to renew the block chain on a regular basis? What if, say, every January 1st, the nodes would create a brand new block chain. The block chain would be seeded with one transaction per wallet, which would simply copy the current balance of that wallet from the older block chain to the new one. So everyone would have their balances preserved, but the old chain would no longer be needed, except for researching past transactions. Most nodes would be able to delete the old chain after some time (say a few months, or a year) and move on.

    This simple device would not solve bandwidth issues (transactions processed per second), but it would effectively keep the chain from growing indefinitely.
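    A minimal Python sketch of the yearly reset, assuming a toy chain where each block is a dict holding a list of `from`/`to`/`amount` transfers (`from` of `None` meaning newly minted coins). Committing the new genesis to a hash of the old chain's last block is an extra assumption, added so archival nodes could still link the two. This only carries over balances; in Ethereum, contract code and storage would have to be carried over as well.

```python
import hashlib
import json
from collections import defaultdict


def balances_from_chain(blocks):
    """Replay every transfer in the old chain to get the final balance of each wallet."""
    balances = defaultdict(int)
    for block in blocks:
        for tx in block["txs"]:
            if tx["from"] is not None:  # None = minted, nobody is debited
                balances[tx["from"]] -= tx["amount"]
            balances[tx["to"]] += tx["amount"]
    return balances


def new_genesis(old_blocks, year):
    """Seed a fresh chain with one transaction per wallet carrying its balance."""
    balances = balances_from_chain(old_blocks)
    seed_txs = [{"from": None, "to": addr, "amount": bal}
                for addr, bal in sorted(balances.items()) if bal > 0]
    # Hypothetical link back to the old chain so its history stays verifiable.
    old_tip = hashlib.sha256(json.dumps(old_blocks[-1], sort_keys=True).encode()).hexdigest()
    return {"year": year, "prev_chain_tip": old_tip, "txs": seed_txs}
```

    Replaying the new genesis alone yields the same balances as replaying the whole old chain, which is what lets most nodes drop the old chain.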

    Thank you for your time!

    Any thoughts?
  • StephanTual (London, England), Member, Moderator, Posts: 1,282
    This new post by Vitalik should go a long way in answering questions on the subject: http://blog.ethereum.org/2014/02/18/ethereum-scalability-and-decentralization-updates/
  • oliverkx, Member, Posts: 85
    Vitalik's post talks about light clients. That's a useful concept for allowing people to use the network, but unless I completely misunderstand, it doesn't address the issue of the forever-increasing amount of data that all full nodes must contend with.

    Periodically truncating the block chain in the way I described above would limit growth of the "active" part of the block chain to growth driven by increased usage. This active part would not be burdened by decades of historical data with very little practical use.

    In a world with a truncated block chain, you would essentially have three types of nodes instead of two:
    - light clients that can use the network but don't help maintain it.
    - full nodes that maintain the active part of the block chain.
    - archival nodes (a minority) that would also keep the full history of the block chain, back to its inception.

    If this model is flawed, or a better solution already exists, please explain...

    Thanks again!
  • oliverkx, Member, Posts: 85
    edited February 2014
    Regarding the bandwidth issue (i.e. handling an increasing number of transactions per second), could this be addressed by splitting the block-chain into multiple parallel active block-chains?

    Think of the current state of affairs with the multiple alt-coins out there. A transaction in the LTC chain only affects the LTC chain, and a transaction in the VTC chain only affects the VTC chain. But a user who owns LTC can change some LTC into VTC in order to pay another user in VTC.

    Now imagine two distinct block-chains, but both denominated in the same currency (e.g. ETH). Some wallets would reside in one block chain, and some wallets in the other. Transactions between two wallets in the same block-chain would only affect that block chain. Only transactions between wallets in different block-chains would affect both chains. In this case, each chain would also act as a special wallet, and the transaction would have to cross the chain boundary using these special wallets:

    User 1 wallet -> chain A special wallet -> chain B special wallet -> user 2 wallet.
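    A toy Python model of that flow, with each chain as a plain ledger. Letting a chain's special wallet go negative (to stand for funds that arrived from other chains) and checking chain A's log directly, in place of a real light-client proof, are both simplifying assumptions; nothing here stops the same log entry from being reused twice.

```python
BRIDGE = "special-wallet"  # hypothetical name for each chain's special wallet


class Chain:
    """Toy ledger for one block chain; all chains use the same currency."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.balances: dict[str, int] = {}
        self.log: list[tuple[str, str, int]] = []  # transfers recorded on this chain

    def transfer(self, src: str, dst: str, amount: int) -> None:
        # The special wallet may go negative: it represents value that entered from another chain.
        if src != BRIDGE and self.balances.get(src, 0) < amount:
            raise ValueError(f"insufficient funds on chain {self.name}")
        self.balances[src] = self.balances.get(src, 0) - amount
        self.balances[dst] = self.balances.get(dst, 0) + amount
        self.log.append((src, dst, amount))


def cross_chain_pay(a: Chain, b: Chain, sender: str, receiver: str, amount: int) -> None:
    # User 1 wallet -> chain A special wallet: recorded on chain A only.
    a.transfer(sender, BRIDGE, amount)
    # Chain B checks chain A (in reality via a light client) before paying out.
    if (sender, BRIDGE, amount) not in a.log:
        raise RuntimeError("transfer not found on chain A")
    # Chain B special wallet -> user 2 wallet: recorded on chain B only.
    b.transfer(BRIDGE, receiver, amount)
```

    Across both chains the total stays constant: chain A's special wallet holds exactly what chain B's special wallet paid out.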

    Such a system would offer the biggest improvement if a majority of transactions stay within a given block-chain. This could be encouraged by organizing block-chains regionally (e.g. N. America, Europe, Asia...), or maybe by transaction size...

    But even if every transaction was forced to cross a block-chain boundary, the overall network would still be more scalable than with a single chain. This isn't readily apparent in my two-chains example above. But imagine 100 parallel chains. Each transaction would involve at most two chains, which represents 2% of the network, instead of 100% with a single block-chain.
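    The arithmetic can be written down directly. This Python sketch assumes transactions are spread evenly over the chains, with a local transaction touching one chain and a cross-chain one touching two.

```python
from fractions import Fraction


def share_of_network_touched(num_chains: int, chains_per_tx: int = 2) -> Fraction:
    """Fraction of all chains that a single transaction involves."""
    return Fraction(min(chains_per_tx, num_chains), num_chains)


def per_chain_load(total_tx: int, num_chains: int, cross_fraction: float) -> float:
    """Expected transactions each chain must process, assuming an even spread."""
    chain_touches = total_tx * ((1 - cross_fraction) * 1 + cross_fraction * 2)
    return chain_touches / num_chains
```

    With every transaction crossing chains, two chains each still process all 10,000 of 10,000 transactions, which is why the two-chain example doesn't show the benefit; with 100 chains each processes 200.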

    Each miner would be a full-node on one (or more) of the block-chains, and also a light-client on each of the remaining block-chains. This way, it would have the ability to verify cross-chain transactions in the other block-chains before processing them for its primary chain.

    Light clients would only need to participate in a single block-chain.

    Sounds crazy - but could it work?
  • Jasper (Eindhoven, the Netherlands), Member, Posts: 514 ✭✭✭
    As I understand it, blockchains can be designed so that ancient history becomes irrelevant. Miners would have to be forced to repeat, now and then, information that would otherwise be lost.

    'Plain' PoS variants have an issue in that it is 'too easy' to mine: there is no disincentive against trying incorrect blocks, or blocks on forks. [The Slasher blog post](http://blog.ethereum.org/2014/01/15/slasher-a-punitive-proof-of-stake-algorithm/) is about this.

    Anyway, about multiple blockchains, I have an idea, but it seems too damn simple to be true. The above has another side: you can just as easily mine multiple block chains, but in that case it is actually useful. As far as I can tell there are no hitches, and it works even with the Slasher method.

    Note that I shouldn't have written the below, but I can't help myself, haha. The principal question is what is wrong with the above.

    *Presumably* nodes can be 'full nodes' for some of the chains and only partial on others. PoS mining could work in several ways as well: always being able to mine everything, only mining when you have the respective chain, or needing to be a full node on everything. The latter two have the potential issue that they promote keeping your coins in some external, trusted place (like a web wallet) where there is a full chain, so they can properly play along with the system. The first doesn't promote having a full node; however, other mechanisms, like the proposed blockchain mining, could fill that role. To mine, as little as the checksum and the random value `n` (as in Slasher) of each potential block might be needed. (Maybe even `X=checksum(append(n,checksum_of_block))` is enough, the score being `checksum(append(X,wallet_address))`.)
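    Reading `checksum` as SHA-256 and `append` as byte concatenation (both assumptions), the formulas translate to the Python below. The post doesn't say how scores are compared; picking the lowest score is a hypothetical rule for illustration. The point is that a miner only needs the 32-byte `X`, not the block itself, to compute its score.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    # Assumption: "checksum" read as SHA-256.
    return hashlib.sha256(data).digest()


def block_seed(n: bytes, block_bytes: bytes) -> bytes:
    # X = checksum(append(n, checksum_of_block))
    return checksum(n + checksum(block_bytes))


def score(x: bytes, wallet_address: bytes) -> int:
    # score = checksum(append(X, wallet_address)), read as a big integer
    return int.from_bytes(checksum(x + wallet_address), "big")


def best_wallet(x: bytes, wallets: list[bytes]) -> bytes:
    # Hypothetical rule: the wallet with the lowest score gets to sign.
    return min(wallets, key=lambda w: score(x, w))
```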
  • Jasper (Eindhoven, the Netherlands), Member, Posts: 514 ✭✭✭
    @oliverkx The issue with multiple PoW blockchains is that an attacker can just dump everything on one of the chains and then do all the 51% attacks.

    Disproportionately increasing the reward based on difficulty would make non-attacking miners pile on too, protecting the 'overloaded' block chain. But that leaves all the other blockchains in danger, because the attack might be to divert mining power away from them!

    Sorry, I should have said that in the previous post!
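    A back-of-the-envelope Python version of both points above, assuming honest hash power is split evenly across chains and that defenders leave the other chains evenly. The 1,000 GH/s and 10-chain figures are just example inputs.

```python
def honest_per_chain(total_honest: float, num_chains: int) -> float:
    """Honest hash rate on each chain when it is split evenly."""
    return total_honest / num_chains


def can_attack(attacker: float, total_honest: float, num_chains: int) -> bool:
    """An attacker focusing on one chain wins if it out-mines that chain's honest miners."""
    return attacker > honest_per_chain(total_honest, num_chains)


def weakest_other_chain(total_honest: float, num_chains: int, defenders: float) -> float:
    """Honest power left on each other chain after `defenders` hash rate
    leaves them (evenly) to reinforce the attacked chain."""
    others = total_honest - honest_per_chain(total_honest, num_chains)
    return (others - defenders) / (num_chains - 1)
```

    With 1,000 GH/s of honest power on a single chain, an attacker needs more than 1,000 GH/s; split over 10 chains, anything above 100 GH/s takes over one chain. If 450 GH/s rushes to defend it, each remaining chain is left with 50 GH/s.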
  • oliverkx, Member, Posts: 85
    I still don't understand block chains well enough to clearly see what might work and what can't. Regarding the 51% attack issue, my thought was that the number of block-chains would start out small and increase gradually based on the total load on the system, or the total number of miners. If the load or the number of miners per chain is kept high enough, then a 51% attack should be no easier than with a single block chain with the same hash-rate. For example, if DOGE is currently running at 100 GH/s, and the Ethereum network was running at 1,000 GH/s, then it should be able to accommodate 10 parallel block chains with about the same level of security.

    As an alternative, maybe it would be possible to force the multiple chains to stay in sync, by preventing any one chain from getting more than one block ahead of the slowest chain in the group. Such a system would require some miners to mine multiple chains, in order to load-balance the overall mining capacity.
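    Both ideas can be written down as simple rules. This Python sketch assumes a fixed minimum hash rate per chain that counts as 'secure enough' (a hypothetical parameter) and reads the sync rule as: a chain may only add a block if that leaves it at most one block ahead of the slowest chain.

```python
def chains_supported(total_hash: float, min_secure_hash_per_chain: float) -> int:
    """How many parallel chains the network can run at the target security level."""
    return max(1, int(total_hash // min_secure_hash_per_chain))


def may_extend(heights: list[int], chain: int) -> bool:
    """Lockstep rule: after adding a block, this chain must be at most
    one block ahead of the slowest chain."""
    return heights[chain] + 1 - min(heights) <= 1
```

    With the DOGE-level figure of 100 GH/s as the minimum and 1,000 GH/s in total, `chains_supported` gives 10. Under `may_extend`, a chain that is already one block ahead has to wait, so miners would need to move to the lagging chains.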