Skip to content

txoutsbyaddress index (take 2)#8660

Closed
djpnewton wants to merge 11 commits intobitcoin:masterfrom
djpnewton:cozz8
Closed

txoutsbyaddress index (take 2)#8660
djpnewton wants to merge 11 commits intobitcoin:masterfrom
djpnewton:cozz8

Conversation

@djpnewton
Copy link
Contributor

This is an attempt to revive PR #5048

I have moved the index to its own database and also added a gettxoutsbyaddress REST command

I still need to think about the layout of binary input/output for the REST command, or is bin encoding necessary?

Also still missing tests for the new REST stuff

@dcousens
Copy link
Contributor

dcousens commented Sep 4, 2016

It's not overly complicated to maintain this index outside of bitcoind, IMHO, that is where it should be. Perhaps indexd or some other service.
Concept NACK

edit: See #8660 (comment)

@luke-jr
Copy link
Member

luke-jr commented Sep 4, 2016

NACK. Txouts are not associated with addresses at all.

@paveljanik
Copy link
Contributor

@luke-jr They are, but this association is not useful for 99.9% bitcoind users. And it slows down the normal work used by 100% of bitcoind users. So NACK.

If people need this, please contribute some description/work using the external indexing solution into contrib/.

@btcdrak
Copy link
Contributor

btcdrak commented Sep 4, 2016

Concept ACK from me.
@paveljanik The feature is off by default so it shouldn't slow down anything for normal users.
OT: If indexes are really considered so bad then -txindex should be removed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use bash anymore in /rpc-tests

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please look at the existing tests in qa/rpc-tests, use the Python framework. Also add the test to qa/pull-tester/rpc-tests.py so it gets executed by default.

@djpnewton
Copy link
Contributor Author

The inspiration for this PR is to try and make it easier to build a fully validating wallet that does not use significantly more resources then a full node itself.

It seems that most (all?) current solutions for building a wallet either involve added trust (SPV or centralized API services) or added resource requirements (bitcore.io needs 8GB RAM and 200GB disk space)

Thats the rationale (which I probably should have stated at the outset) anyway. The last time I was trying to make a demo it was easier just to use the blockcypher API, or maybe there are projects that I missed that fill this gap?

@luke-jr
Copy link
Member

luke-jr commented Sep 5, 2016

If the RPC was by script (as the internals seem to be) instead of by address, it might make sense.

@dcousens
Copy link
Contributor

dcousens commented Sep 5, 2016

If people need this, please contribute some description/work using the external indexing solution into contrib/.

OT: If indexes are really considered so bad then -txindex should be removed.

IMHO, that is where txindex should be, at least then it might easy to extend to things like this without adding 1000 SLOC.

@djpnewton
Copy link
Contributor Author

@luke-jr the RPC accepts both hex encoded script or addresses

@jonasschnelli
Copy link
Contributor

The inspiration for this PR is to try and make it easier to build a fully validating wallet that does not use significantly more resources then a full node itself.

It seems that most (all?) current solutions for building a wallet either involve added trust (SPV or centralized API services) or added resource requirements (bitcore.io needs 8GB RAM and 200GB disk space)

These are good point!

Some thoughts:

  1. If we would add something like this, it should be more modular. Have a look at https://github.com/bitcoin/bitcoin/pull/8501/files, there you can see how just a handful of lines in init.cpp can add core features over a flexible modular approach. It also simplifies rebases.

  2. Was it considered to add the index outside of bitcoin-core? I guess using ZMQ notifications together with REST (or RPC) would work. What would be the downsides or advantages of the "external" approach?

@dcousens
Copy link
Contributor

dcousens commented Sep 5, 2016

Was it considered to add the index outside of bitcoin-core? I guess using ZMQ notifications together with REST (or RPC) would work. What would be the downsides or advantages of the "external" approach?

As someone who does this approach, the only difficulties is ensuring that you handle orphans and forks properly.
They do happen enough and cause issues, so you need to be prepared for the floor to fall out between RPC calls.
Simple enough code, but battle testing it can take a while and would probably be important to have examples for in contrib/ for beginners.
As far as the db requirements go, you just need to be able to rollback data for re-orgs.

@btcdrak
Copy link
Contributor

btcdrak commented Sep 5, 2016

Indexes are desperately needed, there are quite a few users of my addrindex fork https://github.com/btcdrak/bitcoin/releases that demonstrate this need and there is a steady flow of people finding out about it. This PR is a more lightweight option and I think it should be part of Bitcoin Core in an easily accessible way (i.e. just turn on with a switch+reindex). While I could say add it to my fork, it is does reduce the visibility of it.

@sipa
Copy link
Member

sipa commented Sep 5, 2016

-txindex was only added for backward compatibility with the pre-0.8 getrawtransaction RPC. I think people who need it should build an external solution. It's inefficient, requires updates all over the place, and doesn't even guarantee consistency when there are reorgs, and is not needed for any of the functions Bitcoin Core provides. The addrindex hack is even worse - it encourages people to build an external wallet in an easy, unscalable, and inadvisable way.

Indexing the UTXO set like this PR does is less bad - at least it doesn't prevent pruning - but what function would it provide apart from RPCs? The wallet can't actually use it (as it would miss historical transactions), and it won't be available in SPV mode.

@jonasschnelli
Copy link
Contributor

Agree with @sipa.

A maybe more scalable way how "wallet" (or lets say something like the BWS [bitpay wallet service]) could be built on top of bitcoin core would be:

  • add multiwallet support (something similar like Multiwallet qt #2184)
  • add support for pubkey only wallets (should be relatively trivial, partially solved with the current support for "watchonly")
  • external wallet could use RPC in order to create "wallets", use fundrawtransaction in a pubkey-only multiwallet setup and
  • transactions can and should be signed in the "external wallet space"

"Wallets" would then be reduces to...

  • A set of addresses to keep track of (track/store transactions)
  • Coinselection and utxo per set

Restoring backups is the only feature I can think off that would require a addr-index (otherwise rescanning will be very slow).

@djpnewton
Copy link
Contributor Author

The entire point of running a node is to provide validation for a wallet somewhere. That is what makes a node significant.

Is it actually simple to provide this service outside of bitcoin-core? Are there any projects that manage to do it without duplicating the entire blockchain into their own database? It seems a real shame if we go to the trouble of assembling the UTXO set and cant provide ways for external wallets to leverage it.

@sipa are you sure it would miss historical transactions? Any transactions adding funds to the wallet will show up in the UTXO set and any transactions spending funds from the wallet can only be created by the wallet itself (I guess they can be malleated).

@jonasschnelli I think I like this idea. It might not be as flexible in the kinds of external wallets it allows though.

@dcousens
Copy link
Contributor

dcousens commented Sep 6, 2016

Are there any projects that manage to do it without duplicating the entire blockchain into their own database?

Of course, but this index is not small by any means.

@laanwj
Copy link
Member

laanwj commented Sep 13, 2016

I think this functionality is very useful. E.g. to get the current balance for an arbitrary address/script.
Missed this a few times. I think external wallets such as coinjoin may find this useful as well.

  • Sure, you cannot use this with SPV but the primary reason for Bitcoin Core's existence is to be a full node. Storing the UTXO set comes with that. To be clear SPV mode is by no means the future of this project, at most it would be optional.
  • Unlike a full tx index a UTXO index can be used with pruning, and takes a lot less disk space.

Concept ACK.

The wallet can't actually use it (as it would miss historical transactions),

There may be use for a quick-rescan that ignores historical transactions, which can be used with pruning. (out of scope for this pull, but it has been brought up before - see #8497)

@laanwj
Copy link
Member

laanwj commented Sep 13, 2016

is bin encoding necessary?

No it's not necessary - only if there is a straightforward binary representation like for transactions and blocks. You don't have to make up one.

@dcousens
Copy link
Contributor

dcousens commented Sep 13, 2016

I wasn't aware this was just for UTXOs, not txouts at any point in history (aka, a full txout index, similar to the addrindex fork).

light concept ACK

@dexX7
Copy link
Contributor

dexX7 commented Sep 13, 2016

Concept ACK.

Several projects I know of use the addrindex stuff from years ago to derive information about UTXO state, and this is much more elegant.

src/init.cpp Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My personal preference would be something shorter like "txoutsindex", especially because it really indexes scripts and not addresses.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I dont see a problem with that

Copy link
Member

@maflcko maflcko Sep 15, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or just txoutindex (or utxoutindex or utxoindex)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

argh, paradox of choice!

Ok, how about if nobody has any objection I will change the name to "txoutindex" (command line parameter, RPC, REST etc)?

@djpnewton
Copy link
Contributor Author

Ok I converted the RPC tests to python and added some REST tests

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: You can put something like the following up a few lines, so the scope is not limited to setup_network.

self.extra_args = [["-txoutsbyaddressindex"]] * 3

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think this is not required

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.nodes = start_nodes(3, self.options.tmpdir, self.extra_args)

should work as well. (You can put self.extra_args = [['-txoutindex'], []. []] in __init__())

Copy link
Member

@maflcko maflcko Sep 15, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: you can move the connect_*s up into setup_network(). I think all other tests do this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

node 2 is stopped so that later on it can be restarted and then orphan a block created by node 1 (to test txoutindex handles the reorg)

basically its stopped so I will not build on a block created by node 1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably mention this in a comment.

@djpnewton
Copy link
Contributor Author

addressed review nits by @MarcoFalke (thank you)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing +x permission

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@btcdrak
Copy link
Contributor

btcdrak commented Dec 16, 2016

needs rebase

@gmaxwell
Copy link
Contributor

gmaxwell commented Jan 2, 2017

Concept ACK. Comment, can others provide feeback: I think the argument should be changed to unspentoutputindex or utxoindex or something like that, otherwise it sounds like it indexes everything. Similar for the rpc names.

@droark
Copy link
Contributor

droark commented Jan 3, 2017

Hmmm. If @djpnewton doesn't pick this up again in the next few days, I'll seriously consider keeping it going.

}
};

typedef std::map<uint160, CCoinsByScript> CCoinsMapByScript; // uint160 = hash of script
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe replace uint160 with CScriptID to make the map type more obvious.

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed most of this PR, but not the whole thing. Mostly minor comments, except for a possible bug in CCoinsViewByScript::FetchCoinsByScript.

// Return a modifiable reference to a CCoinsByScript.
CCoinsByScript &GetCoinsByScript(const CScript &script, bool fRequireExisting = true);

static uint160 getKey(const CScript &script); // we use the hash of the script as key in the database
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could remove if using CScriptID as the map key type.

uint256 hashSerialized;
CAmount nTotalAmount;

CCoinsStats() : nHeight(0), nTransactions(0), nTransactionOutputs(0), nAddresses(0), nAddressesOutputs(0), nSerializedSize(0), nTotalAmount(0) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could assign these values inline above with c++11.

if (fRequireExisting)
return cacheCoinsByScript.end();
}
CCoinsMapByScript::iterator ret = cacheCoinsByScript.insert(it, std::make_pair(key, CCoinsByScript()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be inserting tmp into the map, not an empty CCoinsByScript() set?

Also,

  • It would be nice if this code had some basic c++ unit tests similar to coins_tests.cpp, even if they aren't as comprehensive.
  • Don't really need the other temporary ret variable here, could just return insert().second.
  • Could replace insert with emplace or emplace_hint here to avoid needing to call make_pair.

CCoinsMapByScript::const_iterator it = mapCoinsByScript.find(CCoinsViewByScript::getKey(script));
if (it != mapCoinsByScript.end())
{
BOOST_FOREACH(const COutPoint &outpoint, it->second.setCoins)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could avoid the loop with

coinsByScript.setCoins.insert(it->second.setCoins.begin(), it->second.setCoins.end());

i++;
pcursor->Next();
} catch (std::exception &e) {
return 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this exception being caught instead of passed on to the caller? I'd think it'd be better to remove the catch and have the error propogate up, or add a comment here which explains the cases where std::exception is expected and why they should return 0.

BOOST_AUTO_TEST_CASE(MempoolIndexingTest)
{
CTxMemPool pool(CFeeRate(0));
CTxMemPool pool(CFeeRate(0), false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CTxMemPool constructor saves a const reference to the second argument, so instead of passing literal false, you should pass a variable with a lifetime at least as long as CTxMemPool. Might be better to have the CTxMemPool class just store a bool instead of a reference to a bool to avoid the need to do this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably mention this in a comment.

self.sync_all()
assert_equal(self.nodes[0].getbalance(), 5090)
assert_equal(self.nodes[1].getbalance(), 10)
txouts = self.nodes[0].gettxoutsbyaddress(1, (address,))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe check other txouts fields here as well.

if (!hashBlock.IsNull())
batch.Write(DB_BEST_BLOCK, hashBlock);

LogPrint("coindb", "Committing %u coin address indexes to coin database...\n", (unsigned int)count);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use %z instead of (unsigned int) cast, I think.

@paveljanik
Copy link
Contributor

Needs rebase.

@djpnewton
Copy link
Contributor Author

rebased

I will start addressing review by @ryanofsky (cheers mate)

Agree with @gmaxwell that naming is confusing, I am thinking about unifying it around "-utxoindex"

@maflcko
Copy link
Member

maflcko commented Jan 5, 2017

I am thinking about unifying it around "-utxoindex"

Sounds good.

@btcdrak
Copy link
Contributor

btcdrak commented Jan 18, 2017

needs rebase again.

@ryanofsky
Copy link
Contributor

ryanofsky commented Feb 13, 2017

@djpnewton or @droark have you done more work on this? I'd be happy to help continue the effort on this feature.

@droark
Copy link
Contributor

droark commented Feb 19, 2017

@ryanofsky - I hadn't been working on this but I'm happy to run with the ball since the OP seems to have disappeared again. I'll post a new PR once it's up and running.

Thanks.

@fanquake
Copy link
Member

Closing this for now then. @Droak or @ryanofsky can open a new PR once they've restarted the work.

@fanquake fanquake closed this Feb 19, 2017
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.