[graph mig 2]: graph/db: migrate graph channels and policies from kvdb to SQL #10050

Merged
guggero merged 4 commits into lightningnetwork:master from ellemouton:graphMig2-channels
Jul 9, 2025

@ellemouton
Collaborator

This expands the graph migration code and tests to cover channels and channel policies.

Please see #10025 for the final result we are aiming for here.
Part of #9795
Depends on #10036

@ellemouton ellemouton self-assigned this Jul 8, 2025

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @ellemouton, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request continues the migration of LND's graph database from a key-value store to a SQL database, focusing on the transfer of channel and channel policy data. My changes ensure that this critical data is accurately moved while handling potential inconsistencies from the legacy database gracefully. I've also added comprehensive testing to validate the integrity and completeness of the migrated data.

Highlights

  • Graph Database Migration: I've expanded the ongoing graph database migration to specifically include channels and channel policies, moving them from the existing KVDB to the new SQL database. This is the second part of a multi-stage migration effort.
  • Data Validation and Handling: I've implemented robust validation for ExtraOpaqueData within channels and policies during the migration process. This ensures that entries with invalid TLV streams are gracefully skipped and logged with warnings, preventing the migration from halting due to historical data inconsistencies.
  • Refactored KVDB Interaction: I've adjusted the kv_store.go logic to perform TLV validation at the public UpdateEdgePolicy entry point. This change allows internal test helpers to bypass this validation, enabling the recreation and testing of scenarios involving pre-existing invalid data.
  • Comprehensive Testing: I've introduced extensive new test cases and helper functions (makeTestChannel, makeTestPolicy, fetchAllChannelsAndPolicies) to thoroughly verify the correctness and integrity of channel and policy migration. This includes specific tests for various edge cases and data configurations, such as invalid TLV data and missing fields.
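
The skip-and-warn behaviour described in the highlights can be sketched roughly as follows. This is only an illustration and not the PR's code: the `channel` type, `errInvalidTLV`, `validateExtraData` and `migrateChannels` are hypothetical names, and the validator uses a simplified encoding with one-byte type and length fields, whereas real Lightning TLV streams use variable-length integers.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errInvalidTLV is a hypothetical sentinel error for this sketch.
var errInvalidTLV = errors.New("invalid TLV stream")

// channel is a pared-down, hypothetical stand-in for a graph channel record.
type channel struct {
	SCID      uint64
	ExtraData []byte
}

// validateExtraData checks a simplified TLV encoding in which both the type
// and the length are single bytes. This is an assumption made for brevity.
func validateExtraData(b []byte) error {
	for len(b) > 0 {
		if len(b) < 2 {
			return fmt.Errorf("%w: truncated header", errInvalidTLV)
		}
		length := int(b[1])
		if len(b) < 2+length {
			return fmt.Errorf("%w: truncated value", errInvalidTLV)
		}
		b = b[2+length:]
	}
	return nil
}

// migrateChannels copies valid channels and skips invalid ones, logging a
// warning instead of aborting the whole migration.
func migrateChannels(src []channel) ([]channel, int, error) {
	var (
		dst     []channel
		skipped int
	)
	for _, c := range src {
		err := validateExtraData(c.ExtraData)
		switch {
		case errors.Is(err, errInvalidTLV):
			log.Printf("WARN: skipping channel %d: %v", c.SCID, err)
			skipped++
			continue
		case err != nil:
			return nil, 0, err
		}
		dst = append(dst, c)
	}
	return dst, skipped, nil
}

func main() {
	src := []channel{
		{SCID: 1, ExtraData: []byte{0x01, 0x02, 0xaa, 0xbb}}, // valid
		{SCID: 2, ExtraData: []byte{0x01, 0x05, 0xaa}},       // truncated value
		{SCID: 3}, // no extra data, valid
	}
	dst, skipped, err := migrateChannels(src)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(len(dst), skipped) // prints "2 1"
}
```

Only the sentinel error triggers a skip here; any other error still aborts, so unrelated failures are not silently swallowed.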

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request effectively expands the graph migration to include channels and their policies. The code is well-structured, and the addition of comprehensive tests, including those for edge cases with invalid data, is commendable. My feedback focuses on minor improvements to enhance code clarity and maintainability, such as correcting a typo in a test helper, refining error messages for better debugging, and ensuring proper error handling for invalid TLV data.

Comment on lines +367 to +369
			return fmt.Errorf("unable to marshal extra opaque "+
				"data for channel %d: %w %v", scid, err,
				channel.ExtraOpaqueData)

medium

Similar to a previous comment, the error wrapping here can be improved. The format string ...: %w %v is unconventional as it's generally better to have %w at the end of the format string. The channel.ExtraOpaqueData is already logged in the Warnf call, so it might be redundant to include it in the error message that gets propagated up.

			return fmt.Errorf("unable to marshal extra opaque data for channel %d: %w", scid, err)
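
For context on the suggestion above, here is a small self-contained sketch of `%w` wrapping. The sentinel `errParsingTLV` and the function `marshalChannel` are hypothetical and exist only for this example. Note that `fmt.Errorf` wraps the error wherever `%w` appears in the format string; placing it last is a readability convention rather than a functional requirement.

```go
package main

import (
	"errors"
	"fmt"
)

// errParsingTLV is a hypothetical sentinel used only in this sketch.
var errParsingTLV = errors.New("error parsing extra TLV bytes")

// marshalChannel pretends that marshalling failed and wraps the cause with
// %w as the last verb, so the sentinel stays reachable via errors.Is.
func marshalChannel(scid uint64) error {
	return fmt.Errorf("unable to marshal extra opaque data for "+
		"channel %d: %w", scid, errParsingTLV)
}

func main() {
	err := marshalChannel(42)
	fmt.Println(err)
	fmt.Println(errors.Is(err, errParsingTLV)) // prints "true"
}
```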

@ellemouton ellemouton force-pushed the graphMig2-channels branch 2 times, most recently from ef5f7ed to 3867d13 Compare July 8, 2025 09:44
@ellemouton ellemouton requested a review from bhandras July 8, 2025 11:25
@ellemouton
Collaborator Author

(will fix the release notes link on next push)

Collaborator

@bhandras bhandras left a comment

LGTM, very nice! 🎉

Here, we move TLV validation for the KVStore out of `updateEdgePolicy`
so that we can re-use `updateEdgePolicy` in our tests to write policies
with invalid TLV (since that was possible before the recently added TLV
sanity check) so that we can test that our SQL migration behaves
correctly for these cases.

Here we factor out some of the crud code in TestEdgePolicyMissingMaxHtcl
so that we can re-use it later on.
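
The first commit message above describes moving validation out of the internal helper and into the public entry point. A minimal sketch of that split, assuming a hypothetical in-memory `store` and a placeholder validator (neither is lnd's actual code), might look like this:

```go
package main

import (
	"errors"
	"fmt"
)

// errInvalidTLV is a hypothetical sentinel error for this sketch.
var errInvalidTLV = errors.New("invalid TLV stream")

// policy is a pared-down, hypothetical edge policy.
type policy struct {
	ChannelID uint64
	ExtraData []byte
}

// store is a hypothetical in-memory stand-in for the KV store.
type store struct {
	policies map[uint64]policy
}

// validateExtraData is a placeholder check: a lone byte cannot hold both a
// type and a length, so it is treated as malformed.
func validateExtraData(b []byte) error {
	if len(b) == 1 {
		return errInvalidTLV
	}
	return nil
}

// UpdateEdgePolicy is the public entry point. Validation lives here, so
// regular callers cannot persist malformed extra data.
func (s *store) UpdateEdgePolicy(p policy) error {
	if err := validateExtraData(p.ExtraData); err != nil {
		return err
	}
	s.updateEdgePolicy(p)
	return nil
}

// updateEdgePolicy persists without validating, which lets tests recreate
// legacy records that predate the validation.
func (s *store) updateEdgePolicy(p policy) {
	s.policies[p.ChannelID] = p
}

func main() {
	s := &store{policies: make(map[uint64]policy)}
	bad := policy{ChannelID: 1, ExtraData: []byte{0x01}}

	fmt.Println(s.UpdateEdgePolicy(bad)) // rejected by the public API
	s.updateEdgePolicy(bad)              // a test can still write it
	fmt.Println(len(s.policies))         // prints "1"
}
```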
@ellemouton ellemouton changed the base branch from elle-graph-mig-1 to master July 8, 2025 12:42
@ellemouton ellemouton force-pushed the graphMig2-channels branch from 3867d13 to e840cf8 Compare July 8, 2025 12:42
@ellemouton
Collaborator Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The code expands the graph migration code and tests to cover channels and channel policies. There is a critical bug in the migration verification logic that could cause the migration to fail for users with certain invalid data in their database.

Comment on lines +479 to +492
	checkPolicy := func(expPolicy,
		migPolicy *models.ChannelEdgePolicy) error {

		switch {
		// Both policies are nil, nothing to compare.
		case expPolicy == nil && migPolicy == nil:
			return nil

		// One of the policies is nil, but the other is not.
		case expPolicy == nil || migPolicy == nil:
			return fmt.Errorf("expected both policies to be "+
				"non-nil. Got expPolicy: %v, "+
				"migPolicy: %v", expPolicy, migPolicy)

		// Both policies are non-nil, we can compare them.
		default:
		}

		if len(expPolicy.ExtraOpaqueData) == 0 {
			expPolicy.ExtraOpaqueData = nil
		}
		if len(migPolicy.ExtraOpaqueData) == 0 {
			migPolicy.ExtraOpaqueData = nil
		}

		return sqldb.CompareRecords(
			*expPolicy, *migPolicy, "channel policy",
		)
	}

critical

The current logic in checkPolicy does not correctly handle cases where a policy is intentionally skipped during migration due to invalid TLV data. If an expected policy (expPolicy) is non-nil but has invalid data, it will be skipped by migChanPolicy, resulting in a nil migrated policy (migPolicy). The current check will incorrectly fail this as an error, preventing the migration from completing successfully for users with such data. The logic should be updated to account for this case: if expPolicy has invalid TLV, it should assert that migPolicy is nil.

		// If we don't expect a policy, we shouldn't have a migrated one.
		if expPolicy == nil {
			if migPolicy != nil {
				return fmt.Errorf("expected nil policy, but got one")
			}
			return nil
		}

		// We expect a policy. Check if it should have been migrated.
		_, err := marshalExtraOpaqueData(expPolicy.ExtraOpaqueData)
		isInvalid := errors.Is(err, ErrParsingExtraTLVBytes)

		// If the policy from kvdb is invalid, we expect it to be skipped.
		if isInvalid {
			if migPolicy != nil {
				return fmt.Errorf("policy with invalid TLV was migrated")
			}
			return nil
		}

		// The policy from kvdb is valid, so we expect it to be migrated.
		if migPolicy == nil {
			return fmt.Errorf("valid policy was not migrated")
		}

		if len(expPolicy.ExtraOpaqueData) == 0 {
			expPolicy.ExtraOpaqueData = nil
		}
		if len(migPolicy.ExtraOpaqueData) == 0 {
			migPolicy.ExtraOpaqueData = nil
		}

		return sqldb.CompareRecords(
			*expPolicy, *migPolicy, "channel policy",
		)
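
Both versions of checkPolicy reset an empty ExtraOpaqueData to nil before comparing. A likely reason, assuming the comparison behaves like a deep-equality check such as `reflect.DeepEqual` (this is an assumption about `sqldb.CompareRecords`, not something stated in this thread), is that a nil slice and an empty non-nil slice are not deeply equal. The `policy` type and helpers below are hypothetical and only illustrate the effect:

```go
package main

import (
	"fmt"
	"reflect"
)

// policy is a pared-down, hypothetical stand-in for a channel policy.
type policy struct {
	ExtraOpaqueData []byte
}

// normalize maps an empty slice to nil so that "no data" has a single
// representation on both sides of the comparison.
func normalize(p *policy) {
	if len(p.ExtraOpaqueData) == 0 {
		p.ExtraOpaqueData = nil
	}
}

// equalPolicies compares two policies with reflect.DeepEqual.
func equalPolicies(a, b policy) bool {
	return reflect.DeepEqual(a, b)
}

func main() {
	a := policy{ExtraOpaqueData: nil}
	b := policy{ExtraOpaqueData: []byte{}}

	fmt.Println(equalPolicies(a, b)) // prints "false"

	normalize(&a)
	normalize(&b)
	fmt.Println(equalPolicies(a, b)) // prints "true"
}
```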

Collaborator Author

hmmm indeed.
wonder why this wasn't caught by the unit test... looking

Collaborator Author

ah, ok it is handled, just in an embedded way:

policies are loaded from kvdb using getChannelMap which has this case:

		// We don't want a single policy with bad TLV data to stop us
		// from loading the rest of the data, so we just skip this
		// policy. This is for backwards compatibility since we did not
		// use to validate TLV data in the past before persisting it.
		case errors.Is(err, ErrParsingExtraTLVBytes):
			return nil

so when loading the channel policy from kvdb, it is already skipped if the extra TLV bytes are not valid. So we actually won't have the case where we need to deal with a channel policy with invalid TLV bytes in the migration.
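
To make that data flow concrete, here is a rough sketch with hypothetical names (`loadPolicy`, `parseExtraData`, `errParsingTLV` and the simplified `policy` type are all assumptions for illustration, not the code in this PR). Because the loader already turns an invalid policy into nil, the check only ever sees the nil/nil case for such records:

```go
package main

import (
	"errors"
	"fmt"
)

// errParsingTLV is a hypothetical stand-in for the real parsing error.
var errParsingTLV = errors.New("error parsing extra TLV bytes")

// policy is a pared-down, hypothetical channel policy.
type policy struct {
	ExtraData []byte
}

// parseExtraData is a placeholder validator: a lone byte cannot hold both a
// type and a length, so it is treated as malformed.
func parseExtraData(b []byte) error {
	if len(b) == 1 {
		return errParsingTLV
	}
	return nil
}

// loadPolicy mimics a loader that drops policies with unparseable extra
// data by returning nil instead of an error.
func loadPolicy(raw []byte) (*policy, error) {
	err := parseExtraData(raw)
	switch {
	case errors.Is(err, errParsingTLV):
		return nil, nil
	case err != nil:
		return nil, err
	}
	return &policy{ExtraData: raw}, nil
}

// checkPolicy needs no special case for invalid TLV data, because such a
// policy never reaches it as a non-nil value.
func checkPolicy(exp, mig *policy) error {
	switch {
	case exp == nil && mig == nil:
		return nil
	case exp == nil || mig == nil:
		return errors.New("expected both policies to be non-nil")
	}
	if string(exp.ExtraData) != string(mig.ExtraData) {
		return errors.New("extra data mismatch")
	}
	return nil
}

func main() {
	// Both the expected and the migrated side originate from the loader.
	exp, _ := loadPolicy([]byte{0x01})
	mig, _ := loadPolicy([]byte{0x01})
	fmt.Println(exp == nil, mig == nil, checkPolicy(exp, mig))
}
```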

Collaborator Author

I'll add a comment to make this clear.

Thank you for the detailed explanation, @ellemouton. I understand now that the forEachChannel function already filters out channel policies with invalid TLV data, ensuring that nil policies are passed to migChanPolicy (and subsequently checkPolicy) in those specific cases. This means the scenario I highlighted, where a non-nil expPolicy would have invalid TLV and a nil migPolicy, is indeed prevented by the upstream logic.

Your plan to add a clarifying comment to make this explicit is a good approach, as it will help future readers understand this crucial data flow and why the checkPolicy function doesn't need to handle that specific invalid TLV case directly.

Collaborator Author

cc @bhandras - I haven't re-requested review since no logic has changed here. But just want to flag in case

Collaborator

Great catch by gemini, but I agree that the existing solution whereby we ignore the policy from the original record upon loading is sufficient.

In this commit, the `MigrateGraphToSQL` function is expanded to migrate
the channel and channel policy data. Both of these have the special case
where the kvdb store records may contain invalid TLV. If we encounter a
channel with invalid TLV, we skip it and its policies. If we encounter a
policy with invalid TLV, we skip it.

The `TestMigrateGraphToSQL` and `TestMigrationWithChannelDB` tests are
updated accordingly.
@ellemouton ellemouton force-pushed the graphMig2-channels branch from e840cf8 to 640caef Compare July 9, 2025 08:19
@ellemouton
Collaborator Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request expands the graph migration from KVDB to SQL to include channels and channel policies. The changes are well-structured and include comprehensive tests for both normal operation and various edge cases, such as handling of invalid data. The migration logic includes verification steps to ensure data integrity.

@ellemouton ellemouton requested a review from guggero July 9, 2025 08:29
Collaborator

@guggero guggero left a comment

Very nice, LGTM 🎉

@guggero guggero merged commit ea32aac into lightningnetwork:master Jul 9, 2025
33 of 39 checks passed
@ellemouton ellemouton deleted the graphMig2-channels branch July 9, 2025 11:10

3 participants