Is there a data race risk here?

araybo, Member, Posts: 17
In 'Introducing Ethereum Script 2.0' (http://blog.ethereum.org/2014/02/03/introducing-ethereum-script-2-0/) it says "Another modification is that code should be immutable, and thus separate from data; if multiple contracts rely on the same code, the contract that originally controls that code should not have the ability to sneak in changes later on. The pointer to which code a running contract should start with, however, should be mutable."

It is not clear to me whether the capability in the last sentence would enable a race condition that an attacker could exploit, by changing the code underlying a contract after a victim has chosen to enter it, but before that action gets added to the blockchain. Could an attacker create something attractive, such as a high-yield bond, and then flip the contract's code pointer to code that steals incoming funds, at least of any in-flight contract-entering transactions?
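The scenario in the question can be made concrete with a toy model. This is only a sketch under stated assumptions: `ToyContract`, `code_pointer`, `set_code`, `honest_bond`, and `stealing_code` are hypothetical names invented for illustration, and nothing here describes how Ethereum Script 2.0 actually behaves. It assumes the worst-case reading of the quoted sentence, namely that the contract's owner can repoint the contract at different code at any time.

```python
# Toy model of the attack posited above. All names are hypothetical; this
# does NOT describe real Ethereum behaviour, only the worst-case reading of
# "the pointer to which code a running contract should start with is mutable".

def honest_bond(balances, sender, amount):
    """Code the victim inspected: the deposit is credited to the sender."""
    balances[sender] = balances.get(sender, 0) + amount

def stealing_code(balances, sender, amount):
    """Code swapped in later: the deposit is credited to the attacker."""
    balances["attacker"] = balances.get("attacker", 0) + amount

class ToyContract:
    def __init__(self, owner, code):
        self.owner = owner
        self.code_pointer = code      # assumed mutable by the owner
        self.balances = {}

    def set_code(self, caller, new_code):
        if caller != self.owner:
            raise PermissionError("only the owner may repoint the code")
        self.code_pointer = new_code

    def enter(self, sender, amount):
        # Whatever code the pointer names at inclusion time is what runs.
        self.code_pointer(self.balances, sender, amount)

contract = ToyContract(owner="attacker", code=honest_bond)

# Time of check: the victim inspects the code and signs a transaction.
code_seen_by_victim = contract.code_pointer
pending = [("victim", 100)]           # broadcast, not yet in a block

# Before the victim's transaction is included, the owner flips the pointer.
contract.set_code("attacker", stealing_code)

# Time of use: the pending transaction is finally processed.
for sender, amount in pending:
    contract.enter(sender, amount)

print(contract.balances)
```

In this model the victim's deposit ends up credited to the attacker even though the code inspected at signing time was honest. Whether anything like `set_code` exists outside the contract's own published code is exactly what the question asks.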

Answers

  • Jasper (Eindhoven, the Netherlands), Member, Posts: 514
    edited May 2014
    I think this immutability in your code is in part there because then you work with objects rather than changing code. So to change the code you change an address to an object, which is a contract with code.

    Generally, contracts are only trustless insofar as you have checked that the code does as claimed!

    I mean, the feature mentioned here might be for more efficient running of the code, where it determines where to start running the bytecode next time. I think that is the idea of SETROOT. Really, that would be equivalent to having a state parameter and using if to select different cases, but more efficient.

    That said, I still see JUMP and JUMPI, and don't see some of the suggestions in the code. I think it (edit: the ES-2) hasn't been done yet.

    I am skeptical about the idea of libraries. I think considering the contracts as closures sounds like way more fun. :) (edit: note some edits to clarify were made by me very late... proofread better)
  • araybo, Member, Posts: 17
    edited May 2014
    @jasper : I am not sure whether your reply answers my question, so perhaps you can help me work through it.

    To avoid ambiguity, I will use the term 'blockchain address' to refer to an address in the Bitcoin sense, meaning an identifier of some collection of assets (or, in Ethereum's case, it could be a contract's code), and 'program address' to refer to the location of some datum or instruction within a contract's code.

    I am assuming that where you wrote 'trustless' in your second paragraph, you meant 'trustworthy'. Even after that change, it is not sufficient for trustworthiness that you have examined the code and found it to be satisfactory. You also need to know that if you choose to enter into a contract, it will be governed by the same code that you examined. As far as I can see, by far the simplest and clearest way to ensure that is to make the code immutable: once a contract is bound to a blockchain address and made public by being added to the blockchain, it can no longer be modified by anyone.

    Note that this necessarily means there can be no possibility of self-modifying code or data execution, something that version 1 of the script apparently permits. It also means that the entry point or a jump target cannot be changed, even to some other program address within the original code, as that would open the door to 'return-oriented programming' attacks. This does not mean that the contract must be inflexible, but it does mean that the scope of any such flexibility must be spelled out in the original code, and not introduced by some later unilateral action (note that this includes revocation: if a contract is revocable, the mechanism for revoking it must be within its original code.)

    Both the first two sentences that I quote from the 'Introducing Ethereum Script 2.0' document, and your first paragraph (assuming that 'address' means what I have called 'blockchain address') seem to promise that, but then, in both cases, what follows seems to suggest (at least to me) that there might be ways around the strict level of immutability that I have described in my previous paragraph.

    Therefore, let me rephrase my original question to focus on this concern: will Ethereum Script 2.0 guarantee that contracts will be strictly immutable, so that you can be sure that a contract at any given blockchain address will be executed exactly as written at the time it was published to the blockchain?

    EDIT: I disambiguated 'address' in the last sentence to conform to my own rules.
  • Jasper (Eindhoven, the Netherlands), Member, Posts: 514
    Trustless is a term we use for a case where trust is not necessary. In a sense it is better than trustworthy, because it can't involve trust being misplaced. Of course, trustlessness can also be misplaced, by misreading contract code. However, trust in code (making it trustless) can easily be transferred to using the same code in dealing with different people.

    See how vending machines and Bitcoin are trustless? The idea of Ethereum is to extend that much further. Mind though, we don't aim to eliminate trust; the real world already uses damage mitigation with (non-Ethereum) contracts and laws. Ethereum can do much more in some ways and less in others, and more cheaply.

    So the code could be rewritten in earlier POC versions, yes. But only the contract itself could do it. It would have a condition under which it could do it, which could include stakeholders arbitrarily. Reviewing the code would include reviewing the mechanism that allows change of the code.

    I think this doesn't quite answer your question yet. It could also be about potential bugs/exploits in Ethereum and hard-to-find ones in contracts. I'll look at the problem some more.
  • araybo, Member, Posts: 17
    I see that 'trustless' is being used to describe a state in which the participants in an agreement need not trust one another because the mechanism of the contract's execution does not permit fraudulent or even accidental violation of the terms. It is the issue that I am concerned with here.

    The formal verification of the correctness of computer programs is difficult, even for simple programs, and it is something that a majority of professional programmers do not know how to do. Immutability simplifies the problem, and it is one of the main arguments made for purely functional programming. Conversely, self-modifying code and data execution are the mechanism behind many of the vulnerabilities that are constantly being found in software. If Ethereum contracts are not, in practice, verifiable, much of its potential will go unrealized.
  • Jasper (Eindhoven, the Netherlands), Member, Posts: 514
    Giving the original question another shot. In principle the order of execution is simply set by the order the transactions come in. So until they add multithreading, race conditions aren't an issue.

    If you deal with a contract, you have to trust that the code of that contract does as promised. Any other messages sent to the contract should not make it break its promise. Such messages should only change the contract as part of the promise.

    Mutability of code execution is simply part of that. However, it used to be that contract.storage[index] could also access and change the contract code; the low indices would overlap the code. ES2 ended that practice, probably because it was too easy to make a mistake. That is what was meant by your quote from the blog:
    "Another modification is that code should be immutable, and thus separate from data; if multiple contracts rely on the same code, the contract that originally controls that code should not have the ability to sneak in changes later on. The pointer to which code a running contract should start with, however, should be mutable."
    (It may also have to do with making the fee system easier.) Anyway, contracts cannot change their code anymore, although they can create new contracts and read their own code now.

    Really, though, even if there was a command MUTATE_CODE(memory_index), analysing that it doesn't happen at the wrong time is no more difficult than doing the same for suicide(OWNER), or any command whatsoever. That includes commands sending ether to any address.

    If you do not trust that it won't send ether to some address at the wrong time, you don't trust that it won't do it twice either. Meaning: you don't trust that all the ether won't be drained to that address. I.e. either you do not trust contracts, or you are fine with stuff like suicide and MUTATE_CODE, which are commands that need to be reached by the control flow.

    What it is really about is checking the promise of what it does, and how well you can check it. Letting index in contract.storage[index] stumble onto the code is something that makes checking harder, not the bare fact of whether or not you can change code. Changing code can be fine, for instance if there is just one party to the contract, or if the code has to be accepted by a group for the MUTATE_CODE function to be called.

    I don't know about the stack smashing attacks and stuff. The virtual machine checks the size of the stack, and invalidates the transaction, with all gas burned, if it ever runs out of stack. JMP is probably important to get the inputs right (for instance, a constant input).
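The difference between the older layout described in this post, where low storage indices ran over the code, and a layout that keeps code apart from data can be sketched as follows. This is a toy illustration only: the class names, the instruction names, and the array layout are hypothetical and are not taken from any Ethereum proof-of-concept.

```python
# Toy illustration only: the layout, opcodes, and class names are hypothetical.

class SharedSpaceContract:
    """Code and data live in one array; code occupies the low indices."""
    def __init__(self, code):
        self.storage = list(code) + [0] * 4   # code first, then data slots
        self.code_len = len(code)

    def store(self, index, value):
        self.storage[index] = value           # nothing stops index < code_len

    def code(self):
        return self.storage[:self.code_len]

class SeparatedContract:
    """Code is an immutable tuple; storage is a separate mapping."""
    def __init__(self, code):
        self._code = tuple(code)
        self.storage = {}

    def store(self, index, value):
        self.storage[index] = value           # can never reach the code

    def code(self):
        return self._code

program = ["CHECK_OWNER", "PAY_SENDER", "STOP"]

old_style = SharedSpaceContract(program)
old_style.store(1, "PAY_ATTACKER")            # a stray or hostile index hits code

new_style = SeparatedContract(program)
new_style.store(1, "PAY_ATTACKER")            # only data is touched

print(old_style.code())
print(list(new_style.code()))
```

With the shared layout, a reviewer has to prove that no reachable `store` ever uses an index below `code_len`; with the separated layout that class of question does not arise.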
  • araybo, Member, Posts: 17
    edited May 2014
    @jasper : The argument in your first paragraph does not apply because the data race that I am concerned with would occur between transactions. Transaction processing in any Bitcoin-like system is unavoidably concurrent, and distributed as well. In analyzing its security, you have to think outside the box, where in this case, the box is the script language by itself.

    In your third paragraph, you start writing about "mutability of code execution", whatever that is. From what follows, it seems to include the ability to perform conditional branches, which is not in question here. It is an invalid argument, but an easy trap to fall into, to expand the meaning of a word or term beyond its original usage, and then persuade oneself that the original arguments are wrong because they do not work when you substitute this expanded definition of the term.

    As to whether mutability complicates verification, it most definitely does. The fewer ways an algorithm can change, the fewer the cases that have to be considered, and this is an argument that works with any reasonable definition of 'mutable'.

    In your eighth paragraph, you offer two examples where you claim changing the code does not matter. Well, there is no such thing as a contract with just one party, and the second one raises the question of how that particular promise, that a group consensus is required before a change is made, is kept. It brings us right back to the original question.
  • Jasper (Eindhoven, the Netherlands), Member, Posts: 514
    It is totally not concurrent; I don't think Bitcoin is either. In both, every full node runs all the contracts in the same order as the transactions in the block. (Note that transactions are sent by pubkeys and contain messages to other addresses, including contracts, and contracts may send more messages as a result of their execution.) It is not like a computing network, it is a decentralized consensus network.

    I heard some word of trying to add multithreading to this, but that sounds really hard. It would have to ensure that contracts run as if they were run in the order of the transactions, and you cannot really tell who is going to be called by a contract.

    What I meant is that if you cannot trust 'X only when conditions are reached', it doesn't matter whether X is 'mutate code' or 'send some ether to some dude': you can't tell if it does that N times, so the entire balance of the contract can be stolen. It quickly becomes as bad as mutating code.

    ...just because the interpreter can do mutating code doesn't mean it actually happens in a contract. Apparently I can't answer your question.

    In Ethereum there is such a thing as a contract with just one party. For instance, you can try to secure your funds by having a contract pay out slowly to you, and have a lot of public keys that together can change the address it pays out to, or drain it. Or let's say there is a NameReg without a feature to sell a name. You can use an 'asset' contract, and code that to buy the domain. The asset contract has your address in it, so only you can control it. However, it also has a feature where you can enter another address and an amount, and then that address can buy the name if enough is sent.
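The claim at the start of this post, that every full node runs the transactions of a block one by one in the same order, can be sketched as a deterministic replay. This is a minimal sketch with a hypothetical data model (balances in a dict, transactions as `(sender, recipient, amount)` tuples); it is not Ethereum's real state or transaction format.

```python
# Hypothetical sketch: state is a dict of balances, a transaction is a
# (sender, recipient, amount) tuple. Not Ethereum's real data model.

def apply_block(state, block):
    """Apply transactions strictly one at a time, in block order."""
    new_state = dict(state)
    for sender, recipient, amount in block:
        if new_state.get(sender, 0) < amount:
            continue                          # invalid transaction: skipped
        new_state[sender] -= amount
        new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state

genesis = {"alice": 10, "bob": 0, "carol": 0}
block = [("alice", "bob", 7), ("bob", "carol", 5), ("alice", "carol", 4)]

# Three independent "full nodes" replay the same block.
node_states = [apply_block(genesis, block) for _ in range(3)]
print(node_states[0])

# The same transactions in a different order give a different final state.
print(apply_block(genesis, list(reversed(block))))
```

Every node that replays the same block reaches the same state, so nothing races inside the replay. What the sketch leaves open is who decides the order in the first place, which is where the rest of the thread's disagreement lies.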
  • araybo, Member, Posts: 17
    The concurrency occurs because, at any given time, there will be multiple transactions in progress, being worked on by a multiplicity of miners. The transactions' outcomes, and the order in which they are taken to have occurred, are provisional and probabilistic, becoming more certain the deeper they are buried in the blockchain, and the more widely they become distributed. Any Bitcoin-like, blockchain-based, cryptographic infocurrency is an inherently concurrent, globally distributed computing network, with eventual consistency. This does not mean it does have race conditions, merely that it has the necessary preconditions for them - after all, there are a large number of concurrent systems that are race-condition-free (other than possibly from undiscovered bugs.)

    I take your point that in Ethereum terminology, where 'contract' is the term for any script bound to a blockchain address, there can be a single-sided contract, but this doesn't get us any closer to an answer, as scammers don't try to scam themselves. More generally, no non-exhaustive list of special cases in which race conditions pose no risk can rule out there being cases where they do.

    You seem to think I am constantly moving the goal posts, but I am actually trying to keep the discussion from wandering off-topic. You were close to the topic in an earlier post, when you wrote "any other messages sent to the contract should not make it break its promise" (but left the 'other than what?' question unanswered - I don't know in what circumstances any sort of message could reasonably be allowed to break a contract's promise). Where do we find the contractually-binding expression of that promise? It exists in the code, and possibly also in some of the data, e.g. if the contract contains a table that determines which of several possible actions are taken, depending on the values of some run-time data.

    Clearly, changing the code or data that express the promise could break that promise. If the possible changes are restricted to the point that someone inspecting the contract before participating can determine all possible outcomes, then the promise implicit in the contract's code is unbreakable (whether it conforms to what the contract's author says it promises is another matter.) On the other hand, if the code or data that express the promise could be changed in ways that could not be fully predicted in advance, the possibility of promise-breaking exists. For example, if it were possible to store arbitrary data and then cause the code execution to jump to an address in that data, then the original promise could be replaced with an entirely different set of rules and outcomes (or you could look at it from the point of view that the contract is unable to make any promises).

    The sentences that I quoted in my original question almost close the door on the possibility of promises being broken through the sort of exploit that I posited, but the apparent ambiguity of the final sentence seems to keep the possibility open. When I posed the question, I expected someone would quickly resolve the issue by explaining that last sentence, but that did not happen. I appreciate your help in trying to settle this matter, but it is clear that I must look into the specifics of the new version of the scripting language in order to answer my question to my satisfaction.


  • Jasper (Eindhoven, the Netherlands), Member, Posts: 514
    You are completely off on how it works.

    Miners don't handle transactions. All full nodes, whether they mine or not, handle all transactions. This is the No. 1 problem in cryptocurrencies, but we can live with it because computers are so damn fast and have so damn much storage (and the stuff we want consensus about is small).

    Ideally there is only one true blockchain, but in the case of a split, the losing side backtracks and redoes the computation to get in line with the winning side. It is unlikely that a split lasts longer than 6 blocks; this is why 6 confirmations is a thing. (An innocuous reason for a split is that, if the network latency gets too high, people might not know about the new block and find a block on top of an old one; there are ways to mitigate this.)

    The full nodes run concurrently, but they each execute all transactions one by one. So no concurrency in contracts.
    Quoting araybo: "Where do we find the contractually-binding expression of that promise?"
    The code is what a contract does. By 'promise' I basically mean what you say, in human language, that it does. That isn't binding (not for the Ethereum system, at least); you have to check that the code actually does what the guy that made it says it does. Still, I reckon for contracts you should demand a statement that is as clear as possible. You don't want plausible deniability with regard to contract behavior.

    Of course, you can also do tests, like fuzz tests, around it and look at how the state changes in simulation. This does narrow it down too. However, Gavin Wood seemed a bit negative about it; he suggested the B-Method might be used.
    Quoting araybo: "Clearly, changing the code or data that express the promise could break that promise."
    Not if the promise is 'under these circumstances the code may be changed', and it in fact only happens under those circumstances. This could include review of the code, so it is unpredictable the first time you deal with the contract, but it is visible and the change is blockable. (Anyway, changing the code has been removed, and with stuff like DOUG, attaching other contracts is more likely, and better anyway. In fact, there is a recent video of Andreas Olofsson doing this.)
    Quoting araybo: "You seem to think I am constantly moving the goal posts, but I am actually trying to keep the discussion from wandering off-topic."
    Not really, it is just communication not going very well. Maybe you should start the (updated) question in a new thread and I'll ask someone on Skype to try to answer it instead.
  • araybo, Member, Posts: 17
    Your claim that I don't know what I am writing about might look a little bit more plausible if you did not contradict yourself in consecutive sentences while making that claim: "miners dont handle transactions. All full nodes, whether they mine or not, handle all transactions." As miners are nodes too, the second sentence contradicts the first - that's just basic logic.

    This is just another case of where you believe that if a thing can be described one way, that automatically invalidates any other description. Take, for example, your claim that "It [Ethereum] is not like a computing network, it is a decentralized consensus network." Really? It is not just like a computing network, it is a computing network! That is regardless as to whether it is, more specifically, a decentralized consensus network.

    Did it not occur to you, when you came to that absurd conclusion, that perhaps there was something wrong in your reasoning? The mistake here seems to be that you are arguing about the type of a thing as if it were its identity.

    After some irrelevant discursion, you finally concede something that you never should have disputed in the first place: there is concurrency in Ethereum. You then claim that this is irrelevant to my question, because each individual node processes transactions serially. There are at least two things wrong with this argument: it is inadequate as a general argument for the absence of concurrency-related problems, and regardless of that, it is not relevant to the specific concern I raised.

    The first point is made by the canonical Dining Philosophers' problem: each node executes the same sequential algorithm, yet in some interleavings, everyone gets to eat, while in others, they deadlock.

    As to the second point, the putative race occurs between a scammer and his victims in the processing to put transactions on to the blockchain, not in the post-insertion processing that all nodes do, and you know what type of node is needed to put things on the blockchain. This situation is very similar to the TOCTOU race vulnerabilities that are well-known in computer security literature (e.g. http://cwe.mitre.org/data/definitions/367.html), and that, in fact, is what prompted the question.

    You appear to be using what might be called word-association arguments: You see 'race condition' and associate it with 'multithreading'. You see that each node is single-threaded, and think that proves that there can be no race. An actual proof would have to be based on cause and effect, showing that the specific sequence of causes and effects posited in the question are not possible.

    One amusing example of word-association reasoning going wrong is given here: http://xkcd.com/221/

    You then go on to show that while you use the word 'trustless', you haven't really grasped its significance. You write "I reckon for contracts you should demand a statement that is as clear as possible. You dont want plausible deniability with regard to contract behavior." A scammer would be thrilled to know that you are thinking in terms of verbal promises and plausible deniability - in what court do you think deniability, plausible or not, would make a difference? Once a scammer has sent your funds to an address he controls, it is game over, nothing he promised beforehand is worth a wei, and he has no need to pretend it happened by accident. That is why effective prior verification of the exact semantics of a contract (which means a great deal more than "demand[ing] a statement that is as clear as possible") is essential to the success of Ethereum, and is why I asked the question.

    I see that you still don't get the point that it is the code, not anything said by its author, that determines what is actually being promised in a contract, that if a participant tacitly permits (or cannot prevent) arbitrary changes, then he is wide open to being scammed, and therefore he needs to be able to verify that there is no mechanism that could be used to sneak in such a change.

    Towards the end, you mention in passing that in the new script language, changing the code is removed. Two things about this: firstly, after all your convoluted arguments that mutability is necessary, you simply drop all those claims (and thereby implicitly recognize their vacuity) now that Ethereum's architects have blessed immutability. Secondly, you do not seem to have recognized, despite the fact that I made a point of it in my previous post, that if this is so, and cannot be subverted by the mechanism hinted at in the quote I started with, then my question is immediately answered in the negative. After all your circumlocution, we are back to precisely the question I originally asked, once again.
  • StephanTual (London, England), Member, Moderator, Posts: 1,282
    We've gone 5 months without arguments on this forum, I won't let that thread degenerate. Just a friendly reminder from your local mod :)
  • StephanTual (London, England), Member, Moderator, Posts: 1,282
    As for concurrency/deadlocks.

    I've asked that question to Gav when I joined the project in Jan (18 years in high TPS environment does that to a man).

    My exact question was: "so what if 2 tx come in with the exact same timestamp. One credits an account in a 'bank type' contract, the other debits. Who goes first."

    Reply was (apologies, chatlog format):
    "race conditions come about when, depending on the order of two asynchronous operations, the system can become in an invalid state. ethereum is automatically atomic, and so nothing is asynchronous and so state is always perfectly well preserved. i.e. transactions are processed one at a time.

    as always, compare to bitcoin (since that's the single-app simple example)

    what if two transactions come in at the "same time" (a meaningless concept given the distributed asymmetrical nature of transaction processing) each draining the same account but to different destinations. One must inevitably be found invalid.

    the other should go through.

    which one is chosen depends on the circumstances of the miner
    "


    I hope the above helps understand things a bit more.
    What you however *could* have are contracts that are written in a naive way, for example a gambling app: "First one to send TRX wins the jackpot" (thanks Joris for the example).

    In that case you could imagine a situation in which the miner participates in the gambling exercise, gets lucky and mines the block, reshuffles the order of transactions in the mined block and makes himself win. That's not a guaranteed win (he still has to mine the block) but it does statistically increase his chances.

    Reply from Gavin:
    "you would never structure a contract like that because of precisely that issue

    a problem to be reported on the block chain must be one that you would reasonably expect 60 seconds reporting sooner or later would not make a difference to the winner

    if it's really so tight, you wouldn't use the blockchain
    "


    I hope this helps.
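The naive gambling contract mentioned above can be sketched to show why transaction order inside a block matters. This is a hedged toy model: `run_jackpot` and `miner_orders` are hypothetical names, transactions are reduced to sender names, and the sketch assumes the miner of a block is free to choose the order of the transactions it includes.

```python
# Hypothetical sketch of "First one to send TRX wins the jackpot".

def run_jackpot(block):
    """Process a block in order; the first entry seen takes the jackpot."""
    winner = None
    for sender in block:
        if winner is None:
            winner = sender
    return winner

def miner_orders(pending, miner):
    """A self-interested miner puts its own entry first, if it has one."""
    own = [tx for tx in pending if tx == miner]
    others = [tx for tx in pending if tx != miner]
    return own + others

pending = ["alice", "bob", "miner_m"]         # arrival order on the network

print(run_jackpot(pending))                   # arrival order respected
print(run_jackpot(miner_orders(pending, "miner_m")))
```

Each run is still processed one transaction at a time, so the state is never inconsistent; the outcome simply depends on an ordering that one participant controls. That is the design flaw Gavin's reply warns against.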
  • Jasper (Eindhoven, the Netherlands), Member, Posts: 514
    Quoting araybo: "This is just another case of where you believe that if a thing can be described one way, that automatically invalidates any other description. Take, for example, your claim that 'It [Ethereum] is not like a computing network, it is a decentralized consensus network.' Really? It is not just like a computing network, it is a computing network! That is regardless as to whether it is, more specifically, a decentralized consensus network."
    Computing network: multiply the number of nodes by 1000 and you get 1000× more computing power. Ethereum: 1× the computing power (no change). That is what makes it not a network for computation.
    Quoting araybo: "Your claim that I don't know what I am writing about might look a little bit more plausible if you did not contradict yourself in consecutive sentences while making that claim: 'miners dont handle transactions. All full nodes, whether they mine or not, handle all transactions.' As miners are nodes too, the second sentence contradicts the first - that's just basic logic."
    That is only technically true; talking is for communication. If you say 'miners handle transactions', lacking other information, people are going to assume that implies other nodes don't. If you say all full nodes handle all transactions, it is clear a lot of work is duplicated. Afterwards you have to explain how that is not a problem (in the near term), but ah well.
  • araybo, Member, Posts: 17
    @Stephan_tual: I ask you to read my original question and disregard all the irrelevant and often self-contradictory baggage that has been piled on to it in the responses. Regardless of how it may have been presented to you, this is not a general question about race conditions in Ethereum transaction processing, but a question about whether a specific type of attack would be possible in the interval it takes for transactions to make it on to the blockchain and be confirmed. As I pointed out earlier, this type of attack is well-known in the literature on computer security as the Time-Of-Check-Time-Of-Use (TOCTOU) Race Condition ( http://cwe.mitre.org/data/definitions/367.html , http://en.wikipedia.org/wiki/Time_of_check_to_time_of_use ).

    Furthermore, even within that scope, my question is very constrained, and concerns the meaning of two sentences. The first sentence, taken on its own, would imply that the specific attack I posited could not be performed, but the second sentence explicitly creates an exception to that rule, so I wonder if it opens the door for this type of attack.

    Anyone can, of course, choose not to participate in any contract that he cannot verify as being free of any potential for scamming, but to perform that verification, every aspect of the script language's semantics must be completely understood, so I think this is a reasonable question. As there is no external authority to appeal to, Ethereum stands or falls on its ability to avoid getting the reputation of being unduly risky.
  • Bluefin, Member, Posts: 47
    edited May 2014
    @araybo, I have been following this topic of yours and your exchange of words above, which actually went off topic.

    I believe your question begets the answer that there is no "race condition" per se. The scenario you have painted is that the "attacker" is out to cheat. And in a ploy to cheat, anyone can do anything to make that happen, no matter how immutable it is, unless the code and transaction data are written with immutability as a pre-condition of the contract.

    Otherwise, I would think that there is a "race condition" here if the attacker wishes it so. So, the onus is on the attacker and the onus is on the designer of the code, whether she wants to make it a "race condition" or not. Really, a race condition is a function of whether we want it to happen or not. TOCTOU addresses a software bug issue, i.e., a design flaw. If it is intentional, it is called cheating TOCTOU. In your question, it is intentional and therefore TOCTOU with malice.

    Depending on how it is designed, one can make it a non-issue. Therefore, IMHO, the answer is actually in your question, i.e., that the attacker is out to create a "race condition" in the first place when he can choose not to.

    Even if the originator of the code did not have the intention to cheat, the computation may have an erroneous consequence and a genuine race condition may occur. And I would be inclined to think that this is a bad piece of code. And the bad news is, Ethereum has no room for bad coding, as contracts are immutable. One will just need to scrap the contract and start over again and face any consequences as a result.

    If Ethereum were to not allow a "race condition", then code flexibility and creativity will also be impacted as this would mean a rigid framework without any room for the slightest variation, i.e., absolute immutability. Then I should say, Ethereum will be a useless platform with nothing much coming out of it.

    In conclusion, a race condition will appear if the code is badly written or if someone has an ultimate intention to cheat. Rhetoric to say the least!
  • giact (Milan, Italy), Member, Posts: 5
    @araybo, if I had to guess, I think that "the pointer to which code a running contract should start with, however, should be mutable" does not mean "the creator of the contract can change the entry point" but "the caller of the contract can decide the entry point".

    That would be in line with the idea of code reuse: a single contract can expose different functions that can be called by picking the right entry point.
  • araybo, Member, Posts: 17
    @giact : I agree that it looks that way, though note that the creator of a contract is also free to direct subsequent transactions to it. In any case, the question becomes: how much choice does the caller have over the entry point?

    At the one extreme, the caller can only choose between a set of entry points that are within immutable code, and are all the beginnings of routines. That would essentially be an 'if' or 'case' statement, and present no additional problem in verifying the contract. The only reason I think there may be more to it than this is that it hardly seems worth making a special case over what would be a redundant feature.

    I included the restriction 'and are all the beginnings of routines' because it is surprising what you can do even if you are restricted to only jumping to arbitrary addresses within immutable code and with data execution prevention in place - if you are not familiar with it, check out 'return-oriented programming' http://en.wikipedia.org/wiki/Return-oriented_programming for an explanation (things may be different in Ethereum's stack-based VM.)

    On the other extreme, if the caller could start at an arbitrary address within the contract's address space, then it seems possible, depending on what other features the contract has, that he could execute a piece of malicious code that he had stored as data in a previous transaction. Alternatively, perhaps he could execute a piece of code that the creator of the contract (who might be himself) had placed in what was ostensibly data - disguised as seeds for a random process, for example. Once data execution is achieved, the attacker has full control over the contract and all the assets it controls.

    There are other variations on the theme: if this capability allows for selecting a routine from a library, then can it be used to run a routine that did not exist when the contract was created, or has been modified since then? If so, then I don't think there is any way that an honest participant in a contract can be sure it cannot be subverted to scam him.

    The time-at-risk in these scenarios goes beyond the particular race that I posited in my question, and would exist for the entire life of the contract, but they are all TOCTOU-like in that the question is 'how can I be sure that the contract will be executed according to the rules that I verified?' A person can, of course, choose not to participate in any contract that he has not been able to verify, but the most extreme of the above scenarios would raise the question of whether any contract could be verified. A more restricted scenario might make verification possible, but extremely difficult. Verification of code in Turing-complete languages is difficult anyway: each patch in the steady stream of security patches that we are all familiar with is an example of where some piece of software was less secure than its supplier thought.
  • giactgiact Milan, ItalyMember Posts: 5
    "At the one extreme, the caller can only choose between a set of entry points that are within immutable code, and are all the beginnings of routines. That would essentially be an 'if' or 'case' statement, and present no additional problem in verifying the contract. The only reason I think there may be more to it than this is that it hardly seems worth making a special case over what would be a redundant feature."
    I partially agree.
    It would be slightly redundant. But with the added benefit that all contracts now have a common way of exposing and calling sub-contracts, instead of having to reinvent the wheel each time.
    But, well, since I really have no idea what Vitalik actually meant in that blog post, I was just going for the explanation that made the most sense to me.
    "I included the restriction 'and are all the beginnings of routines' because it is surprising what you can do even if you are restricted to only jumping to arbitrary addresses within immutable code and with data execution prevention in place"
    And I agree 100%.
    "On the other extreme, if the caller could start at an arbitrary address within the contract's address space, then it seems possible, depending on what other features the contract has, that he could execute a piece of malicious code that he had stored as data in a previous transaction."
    I fail to see how this could be possible under the premise that the code of a contract is immutable after creation.
    What can be modified by the caller is what can be modified by the contract, and what can be modified by the contract cannot be executed directly.

    This scenario is still worrisome in a slightly different way, though: if a malicious caller could choose to jump execution anywhere within the "code ROM" of the contract, he could jump, for example, into the middle of an immutable piece of data that is part of the contract code but was never meant to be interpreted as an instruction (for example, the argument of a PUSH-to-stack opcode), which could have side effects unintended by the creator of the contract.
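
    To make that concrete, here is a minimal Python sketch of why an arbitrary entry point into the code ROM is risky. It assumes the opcode numbering of the later published EVM (PUSH1 to PUSH32 at 0x60 to 0x7f, 0xff as the self-destruct opcode), which may not match the VM version discussed in this thread; the helper names are hypothetical.

```python
# Assumed opcode values (later published EVM numbering, not necessarily
# the numbering of the VM version discussed in this thread).
STOP = 0x00
PUSH1 = 0x60
PUSH32 = 0x7F
SELFDESTRUCT = 0xFF  # named SUICIDE in early specifications


def instruction_starts(code: bytes) -> set:
    """Offsets at which a linear decode from offset 0 finds an opcode."""
    starts = set()
    pc = 0
    while pc < len(code):
        starts.add(pc)
        op = code[pc]
        if PUSH1 <= op <= PUSH32:
            # PUSHn is followed by n immediate data bytes; skip over them.
            pc += 1 + (op - PUSH1 + 1)
        else:
            pc += 1
    return starts


def is_safe_entry(code: bytes, target: int) -> bool:
    """Hypothetical check: accept only entry points on instruction boundaries."""
    return target in instruction_starts(code)


# Made-up program: PUSH2 0x00ff ; STOP
code = bytes([0x61, 0x00, 0xFF, STOP])

for offset in range(len(code)):
    print(offset, hex(code[offset]), is_safe_entry(code, offset))
```

    In this made-up four-byte program, offset 2 holds the low byte of the PUSH2 argument, yet a jump straight to it would decode that byte as a different opcode. A VM could rule this out by only accepting entry points that fall on instruction boundaries.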
    "There are other variations on the theme: if this capability allows for selecting a routine from a library, then can it be used to run a routine that did not exist when the contract was created, or has been modified since then?"
    Again, I fail to see how this could be possible under the premise that the code of a contract is immutable after creation. Sorry.
    It's quite possible I am misunderstanding what you mean: could you please explain it with a hypothetical scenario?




    Anyway, given the way Ethereum is currently implemented, none of your worries apply: contract code is immutable, and the entry point is always fixed.

    I am pretty sure Vitalik did not mean to suggest that in the future the entry point can be arbitrarily decided by either the creator or the caller, which is why I suggested that perhaps he intended a set of multiple entry points that a contract might expose for the caller to select from.

    The only way to "close this case" is to wait for his response :D
  • arayboaraybo Member Posts: 17
    @giact : In response to your questions:

    "On the other extreme, if the caller could start at an arbitrary address within the contract's address space, then it seems possible, depending on what other features the contract has, that he could execute a piece of malicious code that he had stored as data in a previous transaction."
    "I fail to see how this could be possible under the premise that the code of a contract is immutable after creation."
    I can't see where you might think this involves modifying the contract's code. Instead, it involves ignoring the contract's code, and executing the code that the attacker has put into the contract's data (code is also data.)

    This could be blocked by data execution prevention, which is a different security mechanism than immutable code.

    The second of the attacks I described avoids even data execution prevention, because in this version, the code that gets executed is in one or more literals within the 'code ROM', just as it is the case in the scenario you present. The only difference between these scenarios and the one that you question is in the location of the data that gets treated as code, and how it got there; none of them require mutable contract code.

    "There are other variations on the theme: if this capability allows for selecting a routine from a library, then can it be used to run a routine that did not exist when the contract was created, or has been modified since then?"
    "Again, I fail to see how this could be possible under the premise that the code of a contract is immutable after creation."
    I am thinking of a dynamically-linked library, rather than a statically-linked one (I think there has been some discussion of this, e.g. for cryptographic functions.) Suppose each function is, like a contract, bound to a blockchain address, and also that its code is immutable. Now suppose the start pointer mechanism allows us to choose a library function with which to start the contract execution. This could be done several ways. One way might be that you pass a number, which is used as an index to select a function address from a predefined vector within the contract's immutable code. This is like an 'If' or 'case' statement, and presents no particular problem. If, on the other hand, the mechanism takes a blockchain address and starts with the function bound to that address, and that address is not checked against a list of acceptable addresses, then an attacker may be able to run arbitrary code that he had previously put on the blockchain as a library function. A third version might use a name server to associate function names with their implementations, and here you have to consider if that association could be changed to point to malicious code. None of this would require mutable code.
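
    A minimal Python sketch of the first two variants. Everything in it is hypothetical: the addresses, the LIBRARY registry, and both function names are made up for illustration and are not part of any Ethereum specification.

```python
# Hypothetical registry: blockchain address -> immutable library function.
# Anyone can publish a function, so the registry also holds an attacker's.
LIBRARY = {
    "0xaaa": lambda x: x + 1,   # routine the participants verified
    "0xbbb": lambda x: x * 2,   # routine the participants verified
    "0xevil": lambda x: -x,     # published later by an attacker
}

# Fixed vector of acceptable addresses inside the contract's immutable code.
ALLOWED = ("0xaaa", "0xbbb")


def start_by_index(index: int, arg: int) -> int:
    """Variant 1: the caller passes an index into the predefined vector."""
    if not 0 <= index < len(ALLOWED):
        raise ValueError("unknown entry point")
    return LIBRARY[ALLOWED[index]](arg)


def start_by_address(address: str, arg: int) -> int:
    """Variant 2: the caller passes an address and nothing checks it."""
    return LIBRARY[address](arg)


print(start_by_index(1, 10))            # runs a verified routine
print(start_by_address("0xevil", 10))   # runs the attacker's routine
```

    Neither variant modifies any code. The only difference is whether the set of reachable routines was fixed when the contract was verified.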

    A sort of hybrid attack might also be possible, in which an attack like the one you describe is used to call a malicious function from a library created by the attacker. That would minimize the amount of malicious code that would have to be hidden in the contract.

    I haven't even touched on the prospect of contracts running contracts...

    I agree that it is unlikely that these concerns will not be covered in Ether Script 2, though I was surprised to find that ES1 has mutable code, and possibly data execution. When I posted this question, I expected a short, definitive answer explaining how these problems will be avoided in ES2. I have asked essentially the same question in the comments of the paper in question, so far without a response. The publication of a specification for ES2 should settle the question.


  • giactgiact Milan, ItalyMember Posts: 5
    @araybo, you seem to completely ignore how the Ethereum Virtual Machine is supposed to work.

    Data passed to a contract CANNOT be executed directly by that contract.

    In the current implementation, an instance of a running contract consists of:

    * The code ROM: an immutable array of bytes (stored in the blockchain), together with an instruction pointer (aka PC, "Program Counter") into the code ROM.
    * The stack: a LIFO queue of 256-bit words.
    * The temp memory: an array of bytes.
    * The storage: a 256-bit-word-addressable array of 256-bit words.
    * The "calldata": an array of bytes, which is the data passed to the contract.

    These data structures are completely separate, and do not overlap.

    If the PC happens to point to a value greater than the size of the code ROM, the Virtual Machine is required to read a bytecode 0x00 which is the STOP opcode.
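
    As a rough illustration of the list above, here is a minimal Python sketch of such a running instance. The class and field names are hypothetical and this is not the actual VM code; it only shows that the code ROM is a separate, read-only region and that fetching at or past its end yields STOP.

```python
from dataclasses import dataclass, field

STOP = 0x00  # opcode read when the PC is outside the code ROM


@dataclass
class Frame:
    """Hypothetical model of one running contract instance."""
    code: bytes                                            # code ROM (bytes is immutable)
    calldata: bytes = b""                                  # data passed by the caller
    pc: int = 0                                            # instruction pointer into code
    stack: list = field(default_factory=list)              # 256-bit words, LIFO
    memory: bytearray = field(default_factory=bytearray)   # temp memory
    storage: dict = field(default_factory=dict)            # word -> word

    def fetch(self) -> int:
        """Return the opcode at pc; outside the ROM this reads as STOP."""
        if self.pc >= len(self.code):
            return STOP
        return self.code[self.pc]


frame = Frame(code=bytes([0x60, 0x01]), calldata=bytes([0xFF]))
frame.memory.extend(frame.calldata)  # calldata can be copied into memory...
frame.pc = 5                         # ...but the PC only ever indexes the ROM
print(frame.fetch())
```

    Because fetch() only ever reads from code, bytes that arrive as calldata or sit in memory or storage never reach the instruction decoder in this model.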

    Right now, the only way for a contract to be able to execute bytecodes passed to it as calldata would be to create a new contract using that data OR to reimplement the Ethereum Virtual Machine within itself.
    But nobody in their right mind would create such a contract and expect it to be trusted.
  • BluefinBluefin Member Posts: 47
    @giact, I believe @araybo is asking whether, and when, selectable entry points in a contract could make things effectively "mutable" and compromise the use of the contract.
  • arayboaraybo Member Posts: 17
    @giact : The scenarios presented earlier are all conditional on the premise that the mechanism in question might create an exception to those rules, and so are not claims that they are now or will necessarily be possible. I will admit that I haven't paid a lot of attention to what ES1 (or whatever we are currently on) does, because it is going to be replaced. Gavin Wood has an example that checks for messages that could lead to code being overwritten, but that was in PoC3, which was not necessarily fully compliant with the current specification.

    Furthermore, a jump to an instruction that is disguised as a datum within the code does not appear to be ruled out by these rules.

    As for creating a contract and then calling it, that opens up a much wider scope for abuse than the specific scenario in this question. Of course no-one would knowingly create an opportunity for himself to be scammed, but a scammer would certainly create an opportunity to rip off another participant if he thought he could persuade someone to take that role. The verification of even relatively simple programs in Turing-complete languages is difficult, and mistakes are being discovered all the time. For example, a person might mistakenly think that immutable code was all the protection he needed, and therefore overlook the possibility of a jump to an instruction that is disguised as a datum within the immutable code.