Risks of buffer overflows in contracts

I understand that the first few storage slots contain the byte code of the contract, and the space after this can be used for data.

I would suggest separating the two. Otherwise writing contracts becomes harder, and you are essentially making it very easy to have buffer overflows in contracts.
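
A minimal sketch of the concern, assuming a toy model in which code and data share one flat slot array. The opcode names, the array size, and the helper functions are hypothetical and do not reflect the real storage layout:

```python
# Toy model only: contract code and data share a single slot array.
CODE = ["PUSH", "1", "ADD", "STOP"]   # hypothetical opcodes in slots 0..3

def make_storage(size=16):
    """Return a flat storage array with the contract code at the front."""
    storage = [0] * size
    storage[:len(CODE)] = CODE
    return storage

def naive_store(storage, index, value):
    """Write data with no check on whether the slot holds code."""
    storage[index] = value

storage = make_storage()
naive_store(storage, 0, 42)            # meant as "first data item"

# The write landed on the first code slot, so the code no longer matches.
code_intact = storage[:len(CODE)] == CODE
print(code_intact)
```

In this model nothing distinguishes a data write from a code write, so an off-by-one or an attacker-controlled index is enough to change what the contract executes.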


  • Coiner Member Posts: 17
    I think the main point is to allow modification of the contract code under certain conditions.
    Once one does, it is simple to have some conventions on which segment is code and which is data.
  • yoyo Member Posts: 34 ✭✭✭
    > I think the main point is to allow modification of the contract code under certain conditions.

    Giving program code special treatment would lose generality. Being able to treat its own code like any other data is a very interesting property for a contract to have.
  • ChristianPeel Member Posts: 26
    I tried and could not immediately think of a contract which I'd trust to modify its own code.

    On the other hand, for all the contracts I could think of (sub-currencies, exchanges, derivatives, etc.), I would not want self-modifying code. I don't want my exchange to modify itself and suddenly start sending a percentage of the currency I'm trying to exchange to the contract author.

    I'd love to see an example (or have an example described in words) of a self-modifying contract which does something useful and can be trusted.
  • arcke Member Posts: 34
    Not a very concrete example, but imagine a DAC which uses genetic algorithms to select efficient trading algorithms or other high-level strategies implemented in the contract. Instead of launching v2, v3 etc. of the contract, it would automatically keep itself up to date with the current market.

    There is the notion however that anything which can be achieved with code as data can also be achieved by a program running code and data separately. I do hope this is not a feature there for malicious developers to obfuscate parts of contracts.

    Trusting a contract could be done based on the code. If the code's operation is unclear due to hidden features, this would make it possible to fool the majority by sneaking in uses of 'code as data'.
  • yoyo Member Posts: 34 ✭✭✭
    edited January 2014
    For me it's all about leaving possibilities open and not putting artificial barriers because of fears from a different abstraction level.
    This type of data/code separation can be implemented in the higher level language constructs if needed, but the assembly should try to stay as general as possible, and as dumb as possible.

    Now if we project into a future with autonomous agents: if I were a contract, I'd like to analyze and optimize my own code. You don't have to trust it right away. If I messed up I can revert; if it works better then it's now more efficient and you'll like it. It could also spawn variations of itself to try out experiments, observe relative efficiency and merge working ones into its own code.

  • Flavien Member Posts: 7
    I don't think you understood the point I'm trying to make. I'm not suggesting a contract should not be able to modify its own code. I'm saying the byte code of the contract should be in a separate array from its data. By trying to modify the data, I may inadvertently modify the code of the contract. Or worse, a mistake in my code may allow an attacker to modify it.

    If you take the parallel with an operating system running a program, every modern OS today makes the distinction between executable memory and non-executable memory in a process, precisely to avoid this kind of issue.

    > Once one does, it is simple to have some conventions on which segment is code and which is data.

    People make mistakes. I'd rather have a system that's secure by design, especially when you're dealing with money.
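
One way to picture the separation being asked for, as a sketch under stated assumptions: the `Contract` class, its `store` and `set_code` methods, and the opcode names are hypothetical and only illustrate the idea of two arrays.

```python
class Contract:
    """Hypothetical contract with code and data held in separate arrays."""

    def __init__(self, code):
        self.code = list(code)   # code array
        self.data = {}           # data array, indexed independently

    def store(self, index, value):
        """Data writes only ever touch the data array, whatever the index."""
        self.data[index] = value

    def set_code(self, index, op):
        """Changing code needs this explicit call, so it cannot happen by accident."""
        self.code[index] = op

contract = Contract(["PUSH", "1", "ADD", "STOP"])
contract.store(0, 42)            # index 0 of the data array, not of the code
print(contract.code)
print(contract.data)
```

Self-modification stays possible through `set_code`, but a stray data index can no longer reach the code.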
  • micers Member Posts: 3
    Ok, allowing the contract to self-modify (which it could if it has access to its own code) is a *REALLY* bad idea. Allowing the contract recipient access to the contract space is an equally bad idea. Self-modifying code was a great idea when we all wrote machine language and the total memory footprint of a program was 4k bytes... No, nobody needs access to the code space except through the user interfaces.
  • micers Member Posts: 3
    yoyo, with respect to optimizing my own code: I can do that without access to the code space. One can instrument one's own code with whatever code is necessary to monitor its performance and subsequently store and reread the optimization variables. This is what genetic algorithms do. They evaluate their own performance and store that data in a "gene pool" which they then use to tweak their choices. Allowing code modification in any way, whether self-modifying or through client access, is an idea destined to end in fraud... or worse.
  • yoyo Member Posts: 34 ✭✭✭
    @micers, this would only work if you knew in advance what variables may be optimized and what algorithms may be subject to strategy change etc. What if you want to optimize the performance monitoring code? What if you want to change the fitness function entirely?

    I'm not saying it should be used by all contracts, but the possibility needs to be there so as not to lose generality.
    And I still think you are only considering the short-term case of humans writing contracts, without considering the longer term of contracts writing contract code. It's not about what we used to do back in the day, it's about leaving possibilities open.

    Being able to read other contracts' persistent storage is not only desirable, it is probably required to build complex autonomous agents. Also see my post about read-only contracts storing lookup tables for the benefit of all.

    "every modern OS today makes the distinction between executable memory and non-executable memory in a process"

    My point exactly. That happens at the OS level, not at the machine code level. It might be possible to enforce that separation in the compiled code if the compiler is designed to do so (e.g. only use addresses above 2^64 to store data). But I don't see any pressing need to put that sort of intelligence at that low level.
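
A rough sketch of what that compiler-level convention could look like. The names `DATA_BASE`, `data_address` and `compiled_store` are made up for illustration, and the base is taken as exactly 2^64 (so data lives at or above it), which is an assumption:

```python
DATA_BASE = 2 ** 64   # assumed boundary: everything below is reserved for code

storage = {}          # sparse stand-in for the contract's storage

def data_address(index):
    """Translate a programmer-visible data index into a real storage address."""
    if index < 0:
        raise ValueError("data index must be non-negative")
    return DATA_BASE + index

def compiled_store(index, value):
    """What a compiler following the convention would emit for data[index] = value."""
    storage[data_address(index)] = value

compiled_store(0, 7)      # programmer writes to "slot 0"
compiled_store(1, 9)
print(min(storage) >= DATA_BASE)
```

The low-level storage stays one undifferentiated space; the separation lives entirely in the address translation the compiler inserts.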

  • superfreakaholic Member Posts: 6
    Self-modifying contracts are an interesting idea, but I think they are more likely to be used for evil than for good... Maybe it should be a phase II feature.
  • mlacorte Los Angeles Member Posts: 27 ✭✭
    I agree with @Flavien that there is no reason storage data should be kept in the same array as program data. That's just a ticking time bomb. Programmers are going to want to do what they normally do, which is to store data at the start of an array. It's non-intuitive and entirely unnecessary. I imagine that a very high number of developers will learn about this by watching their program self-destruct and wondering what went wrong. More importantly, it introduces a whole new set of potential bugs and vulnerabilities. Just store the program data in a separate field. It's simple and takes nothing away.
  • XertroV Member Posts: 10
    Everyone should remember that storage has 2**256 slots, each able to contain a 256-bit int. Self-modifying contracts are important as they allow consensus changes to organisations or DACs. Furthermore, the address space is so huge that if you're colliding, you're not picking points far enough apart. A good way to do this is SHA3(input), because you just won't get collisions. Then you use magic constants to pick up the slack for state-like variables.

    To store program data in separate memory you'd need another set of opcodes, and ES2.0 has just removed MSTORE / MLOAD in favour of the stack. Having another storage unit just complicates things further, even though one might prefer the separation. Furthermore, it is by design, currently, that contracts can access other contracts' scripts, so we'd need one more opcode to allow C1 to see the code for C2.
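
The hashed-index idea from the first paragraph could be sketched roughly as follows. Python's `hashlib.sha3_256` only stands in for the platform's SHA3 operation and may not produce the same digests; the `balance:` prefix, the `OWNER_SLOT` constant and the account names are hypothetical:

```python
import hashlib

def slot_for(key: bytes) -> int:
    """Map an arbitrary key to a 256-bit storage index by hashing it."""
    return int.from_bytes(hashlib.sha3_256(key).digest(), "big")

BALANCE_PREFIX = b"balance:"   # hypothetical namespace tag for per-account data
OWNER_SLOT = 2 ** 256 - 1      # hypothetical "magic constant" for a state-like variable

storage = {}                   # sparse stand-in for the 2**256 slots
storage[slot_for(BALANCE_PREFIX + b"alice")] = 100
storage[slot_for(BALANCE_PREFIX + b"bob")] = 50
storage[OWNER_SLOT] = "alice"

print(len(storage))
```

Hashed indices are spread across the whole address space, so an accidental overlap with code stored in the first slots is extremely unlikely, although it rests on convention rather than on enforcement.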
  • ChristianPeel Member Posts: 26
    From the ES 2.0 blog post: "Another modification is that code should be immutable, and thus separate from data; if multiple contracts rely on the same code, the contract that originally controls that code should not have the ability to sneak in changes later on. The pointer to which code a running contract should start with, however, should be mutable."

    This sounds to me like ES 2.0 will not allow self-modifying contracts.
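
One possible reading of that quote, as a sketch only: the content-addressed `CODE_STORE`, the `publish` helper and the `Contract` class are assumptions for illustration, not anything described in the blog post.

```python
import hashlib

CODE_STORE = {}   # hypothetical shared store: code hash -> immutable code

def publish(code):
    """Store code immutably under the hash of its contents and return the key."""
    blob = tuple(code)
    key = hashlib.sha3_256(repr(blob).encode()).hexdigest()
    CODE_STORE[key] = blob
    return key

class Contract:
    def __init__(self, code_key):
        self.code_pointer = code_key   # mutable pointer to immutable code
        self.data = {}

    def code(self):
        return CODE_STORE[self.code_pointer]

v1 = publish(["PUSH", "1", "STOP"])
v2 = publish(["PUSH", "2", "STOP"])

c1 = Contract(v1)
c2 = Contract(v1)          # relies on the same code as c1
c1.code_pointer = v2       # c1 moves to new code
print(c2.code())           # c2 still sees the code it pointed to
```

Under this reading a contract can still change which code it runs, but it cannot alter code that other contracts depend on.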