Medium-Size Problems in Ethereum

mode80 Member Posts: 64 ✭✭

Medium Problems in Ethereum

Vitalik gave a great talk on Hard Problems in Cryptocurrency http://vitalik.ca/files/problems.pdf (which also apply to Ethereum).

I've been working with Ethereum smart contracts on an intimate level for several weeks now while creating EtherScripter. I've been taking note of (what I see as) Ethereum's "medium size" problems. These are problems with approachable solutions that Ethereum still has time to address. My notes:

Byte arrays are yucky
. Starting in POC-4 each contract must expose a unique non-standard API defined in terms of bytes
. This is a rather low level and complex way for contracts to define their inputs
. Arbitrary byte arrays are unfriendly to high-level programmers and impossible to transparently expose as a simple list by high-level languages
. The simple list of 32-byte inputs in POC-3 was much easier to work with for a contract developer. However there was some memory overhead for the "unused bytes" per item and it didn't offer an elegant way to send a chunk of data > 32 bytes.
. Suggest: provide a standard mechanism for contracts to define their expected inputs and return values as part of the contract itself (like a "function signature"). Then calling contracts and contract UIs can deal with these as a list of parameters rather than a mess of bytes.
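
To make the suggestion concrete, here is a minimal Python sketch. It is not anything Ethereum defines: the `Signature` class, its field names, and the fixed 32-byte-per-parameter encoding are all assumptions made for illustration. The point is only that a declared parameter list lets callers work with named values instead of hand-packed bytes.

```python
from dataclasses import dataclass
from typing import Dict, List

WORD = 32  # bytes per parameter (assumed, mirroring POC-3's 32-byte items)


@dataclass(frozen=True)
class Signature:
    """Hypothetical declaration a contract could publish alongside its code."""
    name: str
    params: List[str]  # parameter names, in call order

    def encode(self, **values: int) -> bytes:
        """Pack named integer arguments into the byte array the contract receives."""
        missing = [p for p in self.params if p not in values]
        if missing:
            raise ValueError(f"missing arguments: {missing}")
        return b"".join(values[p].to_bytes(WORD, "big") for p in self.params)

    def decode(self, data: bytes) -> Dict[str, int]:
        """Turn a received byte array back into a name -> value mapping."""
        if len(data) != WORD * len(self.params):
            raise ValueError("input length does not match signature")
        return {
            p: int.from_bytes(data[i * WORD:(i + 1) * WORD], "big")
            for i, p in enumerate(self.params)
        }


# A made-up contract function with two parameters.
transfer = Signature("transfer", ["to", "amount"])
payload = transfer.encode(to=0xABCDEF, amount=500)
args = transfer.decode(payload)
```

With something like this published per contract, a UI or a calling contract could build `payload` from a form or a parameter list without knowing the byte layout.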

Large values in contracts are ambiguous
. Each contract must separately decide if large values represent big numbers or negatives.
. (If a contract has stored my "balance" as 2^256 - 1, I don't know if I'm really rich or "owe 1")
. A key benefit of smart contracts is they (should) unambiguously define the agreement
. Suggest: Define upper range numbers as being negative for all contracts and remove unsigned operators from EVM to enforce consistency
. (the largest positive number under that scheme, 2^255 - 1, is still super big and the very rare applications that require bigger numbers [universe simulation??] can always internally define their own 'bignum' using multiple values)
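
The ambiguity is easy to see in a few lines of Python. This sketch assumes the signed reading is two's complement over 256 bits; the helper names `as_unsigned` and `as_signed` are made up for illustration.

```python
BITS = 256
MOD = 1 << BITS  # 2^256, the number of distinct 32-byte words


def as_unsigned(word: int) -> int:
    """Read a 256-bit word as a non-negative number."""
    return word % MOD


def as_signed(word: int) -> int:
    """Read the same word as two's complement: the upper half is negative."""
    word %= MOD
    return word - MOD if word >= (1 << (BITS - 1)) else word


stored = MOD - 1  # a word with all 256 bits set

rich = as_unsigned(stored)  # 2^256 - 1: enormously rich
owed = as_signed(stored)    # -1: owes one unit
```

The stored bits are identical in both cases. Only the convention the contract (and its reader) picks decides which balance it is.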

Market pricing of "compute" is not granular enough
. In POC-4 the cost of contract execution is driven by market forces (via "gas") but...
. The various components of compute that gas buys (processor, storage, memory) are hard coded to arbitrary ratios of a single gas price.
. Currently a storage step costs 100x that of a compute step(?). This probably grossly underestimates the cost difference since storage must persist forever and a compute step is only scarce for an instant.
. In the real world storage and memory and compute cycles will vary dramatically against each other and a long term solution would accommodate fluctuating market values of these resources against each other -- otherwise weird incentives develop for miners.
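
A small Python sketch of the difference between one gas price with fixed ratios and separately priced resources. All numbers are illustrative: the 100:1 storage ratio is the uncertain figure quoted above, and the memory ratio, the usage counts, and the per-resource prices are invented.

```python
from typing import Dict

# Gas units charged per operation under fixed ratios (illustrative values).
GAS_PER_OP: Dict[str, int] = {"compute": 1, "memory": 1, "storage": 100}


def fee_single_price(usage: Dict[str, int], gas_price: float) -> float:
    """Current model: every resource is a fixed multiple of one market price."""
    gas = sum(GAS_PER_OP[resource] * count for resource, count in usage.items())
    return gas * gas_price


def fee_per_resource(usage: Dict[str, int], prices: Dict[str, float]) -> float:
    """Alternative: each resource has its own market-driven price."""
    return sum(prices[resource] * count for resource, count in usage.items())


usage = {"compute": 1000, "memory": 50, "storage": 10}

fixed = fee_single_price(usage, gas_price=2.0)

# Suppose the market valued a stored word at 1000x a compute step instead of 100x.
market = fee_per_resource(
    usage, {"compute": 2.0, "memory": 2.0, "storage": 2000.0}
)
```

Under the fixed ratios the storage writes account for roughly half the fee. Under the hypothetical market prices they dominate it, which is the kind of gap a single gas price cannot express.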

Cost of consensus is always linear but its benefit plateaus
. When there are just a few nodes, the benefit of another node is large, but when there are a million, the additional benefit of another node is negligible. Yet the millionth node imposes the same additional cost on use of the network as the 2nd one.
. Currently there is no way for a smart contract author to select for the amount of consensus that is appropriate for their particular application.
. A single blockchain will inevitably be a size that's too expensive for many applications and supports insufficient consensus for many others. Also, it's not clear that current incentives drive a single blockchain toward a size that represents an ideal balance.
. Suggest: consider a sharded blockchain approach where a contract author has a choice among different levels of consensus. Example: contract author can choose a network of 1, 10, 100, 1000, 100000 or 1000000+ nodes (to the extent they are available). Costs would of course be proportional and this would effectively make Ethereum useful for a much broader range of uses.

The market cost for contract storage is infinity
. The market cost for 32-bytes of storage is theoretically Infinity.
. That's because the miner (or the mining network collectively) is expected to store this value forever.
. Looking at the problem another way, imagine the Ethereum blockchain is 10 TB after a year. Under the current model, a prospective miner must invest in a 10 TB hard drive before he can even begin... and he receives no fees for storing this data... those fees were earned by previous miners when the contract ran and stored the data initially. Imagine 3 years later, it's 100 TB, 6 years 1000 TB. There's increasing cost for mining participation that is not directly incentivized by any revenue under the current model. This will eventually lead to miners dropping out of the network until none are left.
. Suggest: Contract storage should cost the contract some fee --every single block-- for proper market incentivization.
. This is the only way to ensure balance, but will entail huge costs for contracts unless some mechanism is introduced to let storage exist on a subset of nodes (for correspondingly less ongoing cost)
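
As a rough sketch of what per-block storage fees would mean for a contract, the Python below charges a flat rent per stored 32-byte word per block. The rate, the balance, and the function names are hypothetical; nothing here reflects an actual Ethereum fee schedule.

```python
def rent_per_block(words_stored: int, rate_per_word: int) -> int:
    """Fee owed each block for keeping `words_stored` 32-byte words alive."""
    return words_stored * rate_per_word


def balance_after(balance: int, words_stored: int, rate_per_word: int,
                  blocks: int) -> int:
    """Contract balance after paying rent for `blocks` consecutive blocks."""
    return balance - rent_per_block(words_stored, rate_per_word) * blocks


def blocks_until_broke(balance: int, words_stored: int, rate_per_word: int) -> int:
    """How many full blocks of rent the current balance can cover."""
    rent = rent_per_block(words_stored, rate_per_word)
    if rent == 0:
        raise ValueError("no rent is due, so the balance never runs out")
    return balance // rent


# Made-up numbers: 100 stored words, 2 units of rent per word per block.
remaining = balance_after(1_000_000, words_stored=100, rate_per_word=2, blocks=1000)
lifetime = blocks_until_broke(1_000_000, words_stored=100, rate_per_word=2)
```

Even at this tiny rate the contract runs dry after a few thousand blocks, which shows why the ongoing cost is large unless storage can live on a subset of nodes.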

Comments

  • chris613 Member Posts: 93 ✭✭
    @mode80, this is great. I hope that the team will see this and engage in this discussion.

    I agree outright with your suggestions about "function interfaces" (signature is a reserved word in crypto ;), signed values, and storage fees. The current models for these are inadequate IMO, and your proposals make sense.

    The cost of consensus issue has been on my mind a lot. Having every node run every contract and store every word of storage just seems so wasteful and ultimately, expensive. I haven't come up with anything resembling a solution, but would be interested in exploring it further.
  • StephanTual London, England Member, Moderator Posts: 1,282 mod
    Brilliant feedback, @mode80, thank you very much for taking the time to write this up. I'll make sure it gets the team's attention :)
  • mode80 Member Posts: 64 ✭✭
    edited April 2014
    Thanks Chris. I did get some feedback from Gavin. In the interest of good netiquette, I'll let him respond here if he chooses. I would summarize though by saying these probably won't be addressed for a 1.0.
  • VitalikButerin Administrator Posts: 84 admin
    > Byte arrays are yucky

    That's why you use Serpent :) No ugly byte-arrays there.

    The main reason why I went for the byte-array switch is that I was looking at how cryptographic functions like custom hashes and number theory would be implemented, and I realized that there really is no neat way to do that in the context of a hashmap of 32-byte values. Byte arrays are the much more elegant solution. Gavin likes byte arrays because a byte array-based VM is easier to compile down to.

    > Suggest: Define upper range numbers as being negative for all contracts and remove unsigned operators from EVM to enforce consistency

    The problem with that is 32-byte values used in cryptography (eg. ECC). Sometimes you do want the full 0...2^256-1 range. But if it turns out negative numbers are way more useful, then we will rename SDIV to DIV and DIV to UDIV and make / and % map to the signed operators in Serpent.

    > Suggest: provide a standard mechanism for contracts to define their expected inputs and return values as part of the contract itself (like a "function signature").

    If you're going to be calling a contract, you need to know what that contract does, so it's reasonable to expect people to get the return signature from the same source. But we are already standardizing things to some extent.

    > Market pricing of "compute" is not granular enough

    Agreed. Problem is, too much granularity and it leads to excessive complexity. The naive approach of adding more granularity, having ten types of gas, would make it nightmarishly complex to send a message. If we can find some clever middle ground (eg. have an EMA-targeted value for storage price) that would be optimal though.

    > Suggest: consider a sharded blockchain approach

    ETH2.0 will have either sharding or SCIP, we pretty much settled that. But we don't want to do all that for 1.0 when people are waiting for the tech to launch even as is.

    > Suggest: Contract storage should cost the contract some fee --every single block-- for proper market incentivization.

    The problem with that is that you would need to run through the entire blockchain subtracting fees from every contract, or at least deleting contracts whose balance dropped below zero, and that would be even worse for scalability. There are a few clever strategies to deal with that which we're thinking of; more on that in the next few weeks.
  • mode80 Member Posts: 64 ✭✭
    > That's why you use Serpent :) No ugly byte-arrays there.

    Unfortunately Serpent (and EtherScripter) must expose the ugly byte arrays to the user because contracts use them as inputs. It's what you called the "annoying chore" of packaging inputs in your guide. The utility you offer assumes that a contract takes a list of 32-byte inputs. But many won't, so the annoying chore will then be more annoying.

    C is a low level language, but it allows function inputs to be defined as a list of parameters. It'd be nice if Ethereum's high-level languages could offer at least as much (eventually).

    > Agreed. Problem is, too much granularity and it leads to excessive complexity.

    I don't like the added complexity either. But with storage in particular the problem will be acute if it doesn't "pay its weight".

    Glad to know you're thinking on it. That was my main hope.