I'm going through the practical implications of smart contracts.
Take the white paper "Crop insurance" example: "One can easily make a financial derivatives contract but using a data feed of the weather instead of any price index. If a farmer in Iowa purchases a derivative that pays out inversely based on the precipitation in Iowa, then if there is a drought the farmer will automatically receive money and if there is enough rain the farmer will be happy because their crops would do well."
The key word here is 'automatically'. If you're not going to hold collateral on everything, you'll need trusted data feeds. I have two concerns with data feeds:
1) Inaccuracy. I used to head the publishing of APIs for major retailers and have seen discrepancies despite enterprise-level security and numerous audits.
2) Collusion. It would be relatively straightforward for a 'small-time' provider to manipulate contracts by altering the data feed, never mind insider manipulation for short-term gains.
Please see the realitykeys.com concept for reference: Freebase is editable by anyone, yet they seem to 'trust' it.
What would make a data feed genuinely trusted?
Comments
Dreaming:
A blockchain could be used as a peer-review mechanism: if the statement "precipitation in Iowa in January 2014 was 0.98 inches" gets published, a random meteorological institution can win a proof-of-reputation lottery and verify the statement. After six verifications the insurance contract gets redeemed.
To be clear, we do not "trust" Freebase. We don't trust any of our data sources, not even the European Central Bank. And even if we did trust them, we wouldn't trust our ability to pull data from them without somebody messing with it in transit. We actually think that trusting them completely would be unfair to them, because they aren't designed and secured with this purpose in mind, and we'd be incentivizing people to try to mess with their data.
In our design the data sources are only the first method we use to get a result. The second and final method is a human with an internet connection and Google, able to compare multiple independent data sources and make an intelligent decision about which one to believe.
Human confirmation is expensive, so our model is designed to avoid it where possible. We pull from the data source, then we publish the result, then we wait for someone to object. The objection procedure consists of paying the objection fee. There's no need to pay the fee if the data source was correct, because it will just return the same result again. The automated part would help reduce the average cost of getting to the answer even if the data source consisted of a coin flip: the coin is right half the time, so you only have to pay for a human check half the time, reducing the average cost of resolving the contract correctly by 50%. (Or more, see below.)
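The coin-flip cost argument above can be made concrete. A minimal sketch, with made-up prices, where a human check is only paid for in the fraction of cases where the automated feed is wrong and someone objects:

```python
def expected_resolution_cost(p_correct, feed_cost, human_cost):
    """Expected cost of resolving one contract when the paid human
    check is only needed in the (1 - p_correct) fraction of cases
    where the automated data source returned the wrong answer."""
    return feed_cost + (1.0 - p_correct) * human_cost

# A coin-flip "feed" (p = 0.5) already halves the human-check cost:
print(expected_resolution_cost(0.5, 0.0, 10.0))   # 5.0
# A feed that's right 95% of the time cuts it by 95%:
print(expected_resolution_cost(0.95, 0.0, 10.0))  # 0.5
```

The prices here are illustrative; the point is only that the automated first pass scales the expensive human step down by the feed's accuracy.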
In reality the inaccuracy rate of Freebase for our purposes will probably be fairly low. You could try to tinker with the data sources that feed into Freebase to game our result, or you could try to hack one of the other sources or feed misleading information to the ECB, but the other party could pay the fee and get the right answer anyway. There may be some situations where the value of the contract is lower than the fee, so bad data goes uncorrected, but these will be of low value (by definition) and the counter-party may object anyway out of general human cussedness. (Humans mostly seem to be Rule Utilitarians, and will tend to punish dishonest behaviour even when the short-term rational response would be to let it slide. I'm not sure how our future AI customers will behave...) Also many parties can share the same keys for the same events, so the cost of fixing the data can be split among many different people.
Allowing for partially unreliable data sources, we think the need to call on our humans will be fairly rare. Applications using our keys can reduce the cost per contract even further if they have some kind of community or reputation system: if the data source is wrong, the fact that the "real" winner can pay the fee and stop the "wrong" winner from winning makes it worth the "wrong" winner's while to be seen doing the right thing and settling with the "real" winner directly (assuming the transaction is structured in a way that lets them do that), rather than forcing them to spend the money to correct our information and get the right key from us.
HTH, keep up the good work...
Imagine a peer-to-peer network of smart devices posting weather-related data points that could be used to build financial models and show how they affect the markets. This would be pretty slick.
Great idea regarding the network of sensors! I wonder what else we could integrate with, given the current surge in the 'internet of things'.
The middle way is that you have a bunch of competing services like Reality Keys out there providing data feeds designed to be trustworthy enough for these purposes, and the nodes and/or miners make a collective consensus decision, probably partly human-driven, about which ones can really be relied on. Each node might have a list of public keys of data providers like us that they trust to provide good data; we sign data with our private keys, and having got the public keys, the nodes/miners can verify that transactions using our data are good for no more expense than they currently spend verifying a regular transaction. You could also require a quorum of data providers, to avoid excessive reliance on any single one in a given case.

If a data service starts providing dodgy data, or gets hacked, or just fails to act with the transparency the community expects, people would start dropping it from their list of trusted providers. This requires a certain amount of attentiveness on the part of the people running nodes/miners, but as long as the community is paying a little bit of attention, the consensus should tend towards good data sources, and people who fail to keep up with the consensus will get punished by having their blocks orphaned.
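The quorum idea above could be sketched as follows. This is a toy model with invented provider names: each provider "signs" a statement, and a node only accepts the fact if enough providers on its trusted list agree. HMAC with a shared secret stands in for the real public-key signatures a node would actually verify.

```python
import hmac, hashlib

# Toy stand-in: a provider "signs" a statement with its secret key.
# A real node would verify public-key signatures instead.
def sign(secret: bytes, statement: str) -> str:
    return hmac.new(secret, statement.encode(), hashlib.sha256).hexdigest()

def quorum_ok(statement, signatures, trusted_secrets, quorum):
    """Accept a fact only if at least `quorum` providers on the node's
    trusted list produced a valid signature over the same statement."""
    valid = sum(
        1 for name, sig in signatures.items()
        if name in trusted_secrets
        and hmac.compare_digest(sig, sign(trusted_secrets[name], statement))
    )
    return valid >= quorum

# Hypothetical trusted-provider list held by one node:
trusted = {"realitykeys": b"rk-secret", "weatherco": b"wc-secret", "oracor": b"oc-secret"}
stmt = "rained-in-tokyo-2014-01-30:yes"
sigs = {name: sign(secret, stmt) for name, secret in trusted.items()}
print(quorum_ok(stmt, sigs, trusted, quorum=2))  # True
```

Dropping a misbehaving provider is then just deleting its entry from `trusted`, which is the "attentiveness" cost described above.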
I'm a consumer of vast amounts of healthcare data in my cost containment business. The sources are varied: government, 3rd party proprietary, client data and in-house.
We typically extract, transform and load... then throw out the outliers. What's left is somewhat 'statistically accurate' (an oxymoron).
Granted, this is not an ideally accurate oracle/trusted source but it does greatly mitigate collusion (assuming you trust your statistical methods and they are made public).
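The extract-then-trim step described above could look like this. A sketch using a simple IQR (Tukey's fence) rule, which is one common published method; real pipelines would document whichever rule they actually use:

```python
import statistics

def trim_outliers(values, k=1.5):
    """Drop points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fence).
    Publishing k and the rule itself is what makes the method
    auditable, per the point above about public statistical methods."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lo <= v <= hi]

readings = [0.95, 0.98, 1.02, 0.99, 9.7, 1.01]  # one manipulated feed
kept = trim_outliers(readings)
print(kept)                       # the 9.7 outlier is gone
print(statistics.median(kept))    # consensus value from what remains
```

A single colluding feed has to move the consensus past the fence to matter, which mitigates (though does not eliminate) the collusion concern.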
I see this very challenge in the critical path of most autonomous contracts. This thread might be the precursor to developing "best practices" and standards for what constitutes an oracle.
Cheers
So a regular Bitcoin transaction looks like (slightly simplified):
T1) [Some input] | This output is spendable if you signed with the private key to public key XYZ.
T2) Here's a signature using the private key to XYZ! | [ Some output]
If miners knew how to surf the internet and check facts rather than just hashing things and checking signatures you could do:
T1) [Some input] | This output is spendable if you signed with the private key to public key XYZ and it rained in Tokyo on Thursday.
T2) Here's a signature using the private key to XYZ! And it rained in Tokyo on Thursday, go ahead and check if you don't believe me! | [ Some output]
Is it right to assume that contracts can get data from two sources?
(1) human input (via sending a transaction)
(2) from another contract
The Reality Keys solution for current purposes is that we issue a public key representing "It rained in Tokyo on Thursday", and only release the private key to it if it actually rains in Tokyo on Thursday. (*)
That way you can get the equivalent of:
T1) [Some input] | This output is spendable if you signed with the private key to public key XYZ and it rained in Tokyo on Thursday.
T2) Here's a signature using the private key to XYZ! And it rained in Tokyo on Thursday, go ahead and check if you don't believe me! | [ Some output]
...by doing:
T1) [Some input] | This output is spendable if you signed with the private key to public key XYZ and the private key to public key RK123.
T2) Here's a signature using the private key to XYZ! And another one using the private key to public key RK123! | [ Some output]
This will work on the current bitcoin system, and you'll be able to do even more with it on Ethereum. But the interesting question from this thread is whether there's a way to design us out of existence.
(*) We don't actually do weather data yet, but we're working on it.
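The two-key pattern in T1/T2 above can be sketched in miniature. This is a toy model: HMAC with a symmetric secret stands in for the real ECDSA signatures, and the key names are illustrative. The point is the shape of the scheme: the output needs signatures from both the holder's key and the event key, and the oracle only releases the event key if the event actually happened.

```python
import hmac, hashlib

# Toy signature: "signing" a transaction is an HMAC over it with the
# private key; verification recomputes it (symmetric stand-in for ECDSA).
def sign(privkey: bytes, tx: str) -> str:
    return hmac.new(privkey, tx.encode(), hashlib.sha256).hexdigest()

def can_spend(tx, sig_holder, sig_event, holder_key, event_key):
    """The output condition: valid only with signatures from BOTH the
    holder's key (XYZ in the example) and the event key (RK123)."""
    return (hmac.compare_digest(sig_holder, sign(holder_key, tx))
            and hmac.compare_digest(sig_event, sign(event_key, tx)))

holder_key = b"xyz-private-key"
event_key = b"rk123-rained-in-tokyo"  # released by the oracle only if it rained

tx = "pay-farmer-100"
# With both keys (the event happened, so the oracle released event_key):
print(can_spend(tx, sign(holder_key, tx), sign(event_key, tx), holder_key, event_key))  # True
# Without the event key, no valid second signature can be produced:
print(can_spend(tx, sign(holder_key, tx), "0" * 64, holder_key, event_key))  # False
```

This is why no script change is needed on Bitcoin: "it rained in Tokyo" is reduced to "you possess the private key the oracle released", which existing multi-key scripts already understand.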
Escrow-based contracts with a settlement protocol can achieve the same ends.
Settlement Protocol
=====================
Party A sends a claim to the contract with a refundable truth bond.
Party B can challenge the claim within a certain timeframe.
Challenging the claim involves paying a refundable truth bond.
An authorised human investigates the claims and makes a ruling; the losing party forfeits their truth bond.
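The protocol above could be sketched as a small state machine. All names and amounts are illustrative, and the "authorised human" is reduced to a boolean ruling:

```python
# Minimal sketch of the truth-bond settlement protocol described above.
class SettlementContract:
    def __init__(self, bond=10):
        self.bond = bond          # refundable truth bond, illustrative amount
        self.claim = None
        self.challenger = None

    def submit_claim(self, party, statement):
        # Party A posts a claim plus a refundable truth bond.
        self.claim = (party, statement)

    def challenge(self, party):
        # Party B pays an equal bond to dispute within the window.
        self.challenger = party

    def rule(self, claim_was_true):
        # The authorised human rules; the losing party forfeits their bond.
        claimant = self.claim[0]
        if self.challenger is None or claim_was_true:
            return {"winner": claimant, "forfeits": self.challenger}
        return {"winner": self.challenger, "forfeits": claimant}

c = SettlementContract()
c.submit_claim("A", "precipitation in Iowa Jan 2014 was 0.98 in")
c.challenge("B")
print(c.rule(claim_was_true=False))  # {'winner': 'B', 'forfeits': 'A'}
```

The symmetry of the bonds is the incentive mechanism: lying costs the liar a bond whether they are the claimant or the challenger.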
This is another approach; it depends on the nature of the contract. Some might be cost-sensitive (storage fee for the public key). Presumably the contract would include decryption logic as well (a crypto fee). The concept of truth bonds provides the incentive for counterparties to input accurate data into contracts.
Is it right to assume that contracts can get data from three sources?
(1) human input (via sending a transaction)
(2) from another contract
(3) by reading the persistent storage of another contract (see the EXTRO assembly instruction)
I think this whole topic is what Szabo calls "verifiability", one of the main desirable characteristics he defines for smart contracts. See the seminal paper, for example, for insight into the tradeoffs with privacy.
Three things. First, Mastercoin uses keys to implement their distributed exchange, which is not contracts as we understand them in the Ethereum context.
http://blog.mastercoin.org/2013/11/02/tutorial-test-msc-btc-distributed-exchange-transactions/
I have not come across any information indicating that Mastercoin has implemented contracts. I am not really following what they are doing. If they have, please post a link.
Second, the paper is seminal, but it is not Ethereum-aware in several ways.
Third, I have absolutely no problem with what you have suggested. Nothing stops the contract developer from using both approaches, or even others we have not thought of. I think the Reality Keys service is cool, and that has not changed. Perhaps the service needs to change in the manner you have suggested to work within the constraints imposed by the Ethereum smart contract architecture. Developers will invariably use the approaches that work best for them, based on the nature of the contract.
The incentive is to bet on the consensus, and what would that be other than some focal point, namely what the correct/best answer to Q really is. Specificity in the formulation of Q could help fix the focal point (e.g. "if the answer on http://www.ecb.europa.eu/…. differs from Reuters ticker XXXX, take the average").
If more than 50% provide the same answer, some randomisation will be needed to break ties. In the limit where everyone answers the same, the game is a simple lottery.
It could still be manipulated by a party broadcasting/promoting a false answer as a focal point, but arguably reality would still win. And there is no trusted authority; the majority vote is the authority.
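The scheme above can be sketched as a plurality vote with randomised tie-breaking. A toy model with invented reporter names; a real scheme would also handle stake, rewards and the lottery element:

```python
import random
from collections import Counter

def schelling_settle(answers, seed=0):
    """Reward the reporters who match the plurality answer; break ties
    among equally popular answers at random, as suggested above."""
    counts = Counter(answers.values())
    top = max(counts.values())
    leaders = [a for a, n in counts.items() if n == top]
    # Seeded here for reproducibility; a real scheme would need an
    # unbiasable on-chain randomness source for the tie-break.
    winner = random.Random(seed).choice(sorted(leaders))
    rewarded = [who for who, a in answers.items() if a == winner]
    return winner, rewarded

answers = {"alice": "0.98", "bob": "0.98", "carol": "0.98", "mallory": "7.00"}
print(schelling_settle(answers))  # ('0.98', ['alice', 'bob', 'carol'])
```

Mallory's lone false answer loses precisely because the reality-based answer is the natural focal point for everyone else.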
I posted my comments on this here:
http://forum.ethereum.org/discussion/260/mistakes-blunders-hacking-coercion-and-war
It seems that most of the comments suggest that it will be necessary to rely on some trusted authority. Even if these authorities are somehow verified, and their conclusions are signed, that still doesn't prevent abuse. This is very similar to how SSL works. Certificate Authorities can be corrupted, hacked or forced to sign things they shouldn't. In fact, all of this has happened. We've had shady CAs, CAs that were hacked, and I believe it has been reported that the NSA forced some CAs to sign whatever they wanted. Even worse, when you have a bad CA, it is *very* hard to revoke it, and many client apps will never see and/or honour the revocation. And as far as I know, no one has revoked Verisign yet.
In the case of a trusted authority providing data, when the stakes are high enough, any coercion is likely to be more subtle than blunt. Just the right data may be changed, just enough, at just the right time, to generate either one big impact, or a large impact over time in aggregate. You already have people hacking twitter accounts or chatrooms to post false information in order to try to manipulate a stock price. It rarely works, but sometimes one gets lucky and a false tweet goes viral, and profit is made. Raise the stakes 1000-fold and imagine what might happen then.
WobblyBit5, good points. Although I think both Goldman Sachs and HSBC know the CPI (and the US Dept of Labor statistics in general) are BS, but so long as they have consensus about the BS, everyone is happy. Yes, real money is changing hands, and it's a lot of money. But these are the big guys, playing by their own rules. They can't try to steal a trillion dollars from one another. That would invalidate the system and stop them from stealing billions, millions and thousands from lots of other sources, using lots of different means.
In fact, the level of security used by the big banks is actually really, really bad. I used to work as an advisor with a major Canadian bank on information security and controls, and you would not believe the meetings I had. The level of thought, reflection, examination and testing used by the Bitcoin developers, and others in this space, is far beyond what the banks do. Have you ever looked at the SWIFT protocol? It's impossibly antiquated and horrible. But it survives because you can only get on the network if you are big enough, trustworthy enough and, frankly, ideologically pure enough that they will take you. The big thieves don't steal from each other (beyond minor skirmishes); they steal from everyone else.
So far there has not been any authoritative answer to the following questions:
(1) Can ethereum contracts access data feeds?
(2) If so, how?
If contracts cannot access data feeds, what is the point of spending time discussing the sources and quality of data feeds? The question of how contracts access data is a huge security issue which needs to be explained by the Ethereum development team.
There is a lot of Mastercoin thinking here. Data feeds are used by Mastercoin for an entirely different purpose. The guys at Mastercoin think contracts are a bad idea because of security concerns.
http://blog.mastercoin.org/2014/02/04/should-the-master-protocol-do-scripting/
Now, I am not saying it is not possible to build a secure Ethereum scripting engine. Some authoritative information on Ethereum's contract security architecture would be welcome.
>be able to access data from external sources due to the
>massive security issues.
agreed....
>That being said, contracts have the ability to access data
>from other contracts, so data feeds could be implemented
>in the form of a publisher pushing data to a contract that only
>they have access to.
You could do this to remove one attack vector from the mix. An interesting thread
on ethereum's contract security can be followed here...
https://bitcointalk.org/index.php?topic=431513.0
In the short term, we need to see a contract security architecture paper to understand the development team's approach. My sense is that the testnet will run for quite some time.
This is quite a high-risk proposition: a hacked financial platform will have zero value.
It remains my firm conviction that the choice of trust model for a fact is up to the individual contract. A better trust model will not be baked into the base protocol. The answer to "what would a trusted datafeed look like?" is in turn, "What is your threat model? After you answer that, I can build you a script that validates against that model."
Saw this on the Ethereum team's reddit AMA:
Q: how do you pull in external data? (eg. weather, EURUSD cross, etc.)
A: The data feed source needs to release a signed data feed, and then the contracts will verify the signature with the data feed's public key in script code.
(Which, btw, is consistent with my earlier thought: "trust" is external to Ethereum; "proof" is done cryptographically in a contract.)
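The AMA answer could be sketched like this. A toy model only: the stdlib has no public-key crypto, so HMAC with a shared secret stands in for the ECDSA signature a contract would actually verify against the feed's public key, and the feed name and key are invented:

```python
import hmac, hashlib

FEED_KEY = b"eurusd-feed-key"  # hypothetical feed publisher's key

def publish(key, name, value):
    """Publisher side: release the datum together with a signature over it."""
    msg = f"{name}={value}"
    return msg, hmac.new(key, msg.encode(), hashlib.sha256).hexdigest()

def contract_accepts(key, msg, sig):
    """Contract side: recompute and compare, standing in for the
    signature check the AMA says happens in script code."""
    expected = hmac.new(key, msg.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)

msg, sig = publish(FEED_KEY, "EURUSD", "1.3542")
print(contract_accepts(FEED_KEY, msg, sig))        # True
print(contract_accepts(FEED_KEY, msg + "0", sig))  # False (tampered)
```

Which is exactly the split claimed above: the chain proves the datum came from the feed's key, while whether that key deserves trust stays external to Ethereum.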