I admit I'm no expert on this, but why can't clients store the Ethereum block files in a highly compressed form (zip, rar, etc.) instead of plain text? Thanks.
Jasper, Eindhoven, the Netherlands
This is something that wouldn't necessarily need a hard fork, even after Ethereum has launched, as long as the wire protocol has a way to communicate 'this client doesn't have that feature'. Clients also have free rein over storage: as long as the "Merkle root" of the Patricia trees is constructed correctly, it doesn't matter how the data is stored locally.
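To make the "it doesn't matter how it is stored" point concrete, here is a minimal Python sketch under stated assumptions: the node bytes are a placeholder (a real client would hash the RLP encoding of a trie node), SHA-256 stands in for Keccak-256, and zlib is just one possible on-disk format. The point it shows is that the hash is computed over the canonical encoding, so two clients with different storage formats still agree.

```python
import hashlib
import zlib

# Placeholder for the canonical (serialized) form of some trie node.
# Assumption: a real client would hash the RLP encoding of the node instead.
canonical = bytes(range(32)) * 8


def commitment(data: bytes) -> str:
    # SHA-256 is a stand-in: Ethereum uses Keccak-256, which is not
    # guaranteed to be available in Python's standard library.
    return hashlib.sha256(data).hexdigest()


# Client A keeps the canonical bytes as-is.
client_a_disk = canonical
# Client B keeps a zlib-compressed copy to save space.
client_b_disk = zlib.compress(canonical, 9)

# Both clients hash the canonical bytes (B decompresses first),
# so they agree on the commitment regardless of the on-disk format.
hash_a = commitment(client_a_disk)
hash_b = commitment(zlib.decompress(client_b_disk))

print(hash_a == hash_b)  # True
print(len(client_a_disk), len(client_b_disk))
```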
Do note that the data is accessed as a database, which has its own storage mechanism. Also, the code itself already has some compression; here is a perhaps outdated blog post on it. As with PNG, JPEG, and video formats, whose compression is 'topical' (tailored to the data), data that is already compressed this way is hard to beat with a 'generalist' compressor. I wouldn't say I'm sure further compression wouldn't work, though. ('Generalist' compression is just for us low-entropy people.)
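To illustrate the point about 'generalist' compressors, here is a small Python sketch using the standard library's zlib as the general-purpose compressor (our choice for illustration, not something a client is known to use). It shrinks redundant data a lot but cannot shrink data that is already densely encoded. Random bytes from `os.urandom` stand in for already-compressed data; that is an assumption, not a measurement of real blockchain data.

```python
import os
import zlib

# Highly redundant data: a general-purpose compressor does well here.
repetitive = b"ethereum block data " * 200
once = zlib.compress(repetitive, 9)

# Random bytes stand in for data that is already densely encoded,
# e.g. by a format-specific ("topical") compressor.
dense = os.urandom(4096)
dense_compressed = zlib.compress(dense, 9)

print(len(repetitive), len(once))         # large reduction
print(len(dense), len(dense_compressed))  # no reduction; zlib adds a few bytes of framing
```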
Here is the conversation:
Guest
To reduce the growth of the blockchain, have you considered using some kind of compression when storing the Ethereum blockchain files on the client? Thanks.
Stephan Tual Mod • 4 days ago
You can't compress the whole database (the blockchain is ultimately held in LevelDB) because you need rapid access to that information: compressing the whole database as one unit would mean decompressing the whole thing every time you need to read one element of it. Not a good idea.
That said (based on a comment by Vitalik), we could compress EVM code by approximately 40%, because it's stored (or could be stored) in a different DB. It's basically about compressing what we can, keeping the overheads in mind, and we're of course working towards that.
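The contrast between compressing the whole database and compressing individual entries can be sketched in Python. This is a minimal illustration under stated assumptions: `CompressedCodeStore` is a hypothetical class, a plain dict stands in for LevelDB, zlib stands in for whatever compressor a client might choose, and the bytecode is placeholder data, so it does not reproduce the ~40% figure.

```python
import zlib


class CompressedCodeStore:
    """Hypothetical key-value store that compresses each value on its own.

    A plain dict stands in for LevelDB. Because values are compressed
    individually, reading one entry decompresses only that entry, never
    the whole database.
    """

    def __init__(self) -> None:
        self._db = {}  # key (bytes) -> zlib-compressed value (bytes)

    def put(self, key: bytes, code: bytes) -> None:
        self._db[key] = zlib.compress(code, 9)

    def get(self, key: bytes) -> bytes:
        return zlib.decompress(self._db[key])

    def stored_size(self) -> int:
        """Total bytes held after compression."""
        return sum(len(value) for value in self._db.values())


# Placeholder bytecode; real contract code will compress by a different amount.
code_1 = bytes.fromhex("6060604052") * 40
code_2 = bytes.fromhex("6080604052") * 40

store = CompressedCodeStore()
store.put(b"contract-1", code_1)
store.put(b"contract-2", code_2)

# Reading contract-2 touches only its own compressed value.
print(store.get(b"contract-2") == code_2)  # True
print(len(code_1) + len(code_2), store.stored_size())
```

The overheads mentioned above show up even in this sketch: every read pays a decompression cost, and very small values may not shrink at all, because zlib adds a header and checksum to each compressed value.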
Vitalik Buterin • 4 days ago
> we could compress evm code by approx 40%, because it's stored/could be stored in a different db.
No, we can compress EVM code by ~40% because its encoding is redundant: it has a suboptimal entropy content, carrying less information per byte than it could. To see how, imagine a language with only two characters, A and B, where A appears 90% of the time, so strings look like AAAAAABABAAAAAAAAAAAAAABAAAA. To build an alternative representation, we can invent two new letters, C and D, with the rule CC = AAAAAAAAAA, CD = AAAA, DC = A, DD = B. The string above then becomes CDDCDCDDDCDDCCCDDDCD, a substantial saving. EVM code is less extreme, but similar. The idea is that instead of storing the EVM code itself in a DB, we would store a compressed version and decompress it upon retrieval.
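The toy A/B example can be checked with a short Python sketch. It implements only the four-codeword table from the comment, not any real encoding for EVM bytecode. The function names `encode`, `decode`, and `entropy_bits` are hypothetical, and splitting each run of A's greedily into 10s, then 4s, then 1s is our own choice, since the comment gives only the codeword table. The entropy calculation assumes each symbol is independent.

```python
import math

# The toy code from the example: two-letter codewords over the new letters C and D.
CODEWORDS = {"CC": "A" * 10, "CD": "A" * 4, "DC": "A", "DD": "B"}


def encode(text: str) -> str:
    """Encode an A/B string, splitting each run of A's greedily into 10s, 4s and 1s."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "B":
            out.append("DD")
            i += 1
        elif text[i] == "A":
            j = i
            while j < len(text) and text[j] == "A":
                j += 1
            tens, rest = divmod(j - i, 10)
            fours, ones = divmod(rest, 4)
            out.append("CC" * tens + "CD" * fours + "DC" * ones)
            i = j
        else:
            raise ValueError(f"unexpected symbol {text[i]!r}")
    return "".join(out)


def decode(code: str) -> str:
    """Every codeword is exactly two letters, so decoding reads the input in pairs."""
    if len(code) % 2:
        raise ValueError("encoded text must have an even length")
    return "".join(CODEWORDS[code[k:k + 2]] for k in range(0, len(code), 2))


def entropy_bits(p: float) -> float:
    """Shannon entropy, in bits per symbol, of a source emitting A with probability p."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


sample = "AAAAAABABAAAAAAAAAAAAAABAAAA"
encoded = encode(sample)
print(encoded, len(sample), len(encoded))
assert decode(encoded) == sample

print(f"{entropy_bits(0.9):.3f} bits per symbol")  # 0.469
```

Both alphabets have two letters, so each character costs one bit and comparing lengths compares bits directly. For a 90/10 source, about 0.469 bits per symbol is the lower bound any lossless code can approach, versus 1 bit per symbol for plain A/B. The simple C/D code captures part of that gap, which is the sense in which the original representation has suboptimal entropy content.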