Skip to content

100k peers project#5849

Closed
rustyrussell wants to merge 5 commits intoElementsProject:masterfrom
rustyrussell:100k-channels-project
Closed

100k peers project#5849
rustyrussell wants to merge 5 commits intoElementsProject:masterfrom
rustyrussell:100k-channels-project

Conversation

@rustyrussell
Copy link
Contributor

@rustyrussell rustyrussell commented Dec 23, 2022

(Based on #5868 which changes htables to need to be tal'ocated, simply to avoid merge hell!)

I hacked the code to ignore signatures so I could quickly open 100k peers with one channel each, and looked at how the node responded. Two simple fixes (linked lists were the culprit) and such a node is manageable (though not exactly speedy!).

Before this, reconnecting to all the peers took longer and longer (noticeable before we even had 10k peers). And startup was extremely slow (not sure how bad it was, I gave up!).

It still takes 9 seconds to listpeers, and almost a minute to startup, and database is about 185MB.

Changelog-None Not user visible.

@whitslack
Copy link
Collaborator

reconnecting to all the peers took longer and longer

Could this be why my lightning_connectd eats my CPU for breakfast, lunch, and dinner?

@rustyrussell
Copy link
Contributor Author

Could this be why my lightning_connectd eats my CPU for breakfast, lunch, and dinner?

Hmm, I don't think so: this was lightningd. connectd does do all the crypto now, so perhaps it's a bottleneck on large nodes? The best way to know for sure is:

perf record --call-graph dwarf $(pidof lightning_channeld)

And let it run for 30 seconds, then ^C and do perf report?

@rustyrussell rustyrussell force-pushed the 100k-channels-project branch 2 times, most recently from 2070043 to 872aa9a Compare January 14, 2023 04:55
wallet should really be a subdir of lightningd.

Signed-off-by: Rusty Russell <[email protected]>
They're used in several places, and we're about to add more.

Signed-off-by: Rusty Russell <[email protected]>
@rustyrussell rustyrussell force-pushed the 100k-channels-project branch from 872aa9a to 10ce921 Compare January 17, 2023 03:44
@rustyrussell
Copy link
Contributor Author

Rebased, fixed another test which assumed peer order.

@cdecker
Copy link
Member

cdecker commented Jan 17, 2023

The python linter seems not to be happy with this one:

tests/test_pay.py:2419:9: F405 'results' may be undefined, or defined from star imports: fixtures

Hash table coming up makes it random!

Signed-off-by: Rusty Russell <[email protected]>
After connecting 100,000 peers with one channel each (not all at
once!), we see various places where we exhibit O(N^2) behaviour.

Fix these by keeping a map of id->peer instead of a simple
linked-list.

Signed-off-by: Rusty Russell <[email protected]>
Otherwise, loading up when we have 100k peers is *painful*!

Signed-off-by: Rusty Russell <[email protected]>
@rustyrussell rustyrussell force-pushed the 100k-channels-project branch from 10ce921 to 3672dcd Compare January 18, 2023 05:04
@rustyrussell
Copy link
Contributor Author

"This time for sure!"

Copy link
Member

@cdecker cdecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's quite the improvement, nice :-)

ACK 3672dcd

}

/* We need to define a hashing function. siphash24 is a fast yet
* cryptographic hash in ccan/crypto/siphash24; we might be able to get away
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to dig into this a bit, it seems the hash function is collision resistant if keyed with a random key, which we do here. It is not generally cryptographically secure, and loses collision resistance if the attacker knows the key.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed!

Comment on lines +2418 to +2419
if result['channels'][0]['peer_id'] == l3.info['id']:
result['channels'] = [result['channels'][1], result['channels'][0]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result['channels'] = sorted(result['channels'], key=lambda c: c['peer_id'])

Comment on lines -117 to -118
/* FIXME: We should notify onchaind about NORMAL fee change in case
* it's going to generate more txs. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment no longer necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically still true, but my branch which does zero-fee anchors fixes this (onchaind no longer makes its own txs). Overzealous removal I guess.

@rustyrussell
Copy link
Contributor Author

Weird, this was already merged by @endothermicdev . Not sure why GH didn't notice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants