Optimize purview search logic #564
Merged
Yuqing-cat merged 5 commits into feathr-ai:main on Aug 15, 2022
xiaoyongzhu (Member) reviewed on Aug 10, 2022
Thanks @enya0405. Can you update the PR description to explain what previously caused the slowness and how the new approach improves on it? Also, can you put some numbers on the updated logic (for example, how long it takes to load the projects)?
YihuiGuo reviewed on Aug 10, 2022
xiaoyongzhu approved these changes on Aug 15, 2022
Yuqing-cat approved these changes on Aug 15, 2022
ahlag pushed a commit to ahlag/feathr that referenced this pull request on Aug 26, 2022
* optimize purview search logic Co-authored-by: Enya-Yx <[email protected]>
Description
Improves the query logic of 'get_project' and similar methods to reduce the time they take.
The main cause of the slowness was that we called Purview APIs (such as 'AtlasClient.get_entity') many times, especially inside recursive calls. This produced a large number of HTTP requests, so most of the time was spent waiting for their responses. I found that calling the API 'AtlasClient.get_entity_lineage' returns all the information we need. We then only have to build the entities and edges, including their relationships and attributes, from the returned result, which takes much less time than waiting for many responses from Purview.
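The difference between the two access patterns can be illustrated with a minimal sketch. It does not use the real Purview client: `FakeCatalog`, `load_recursive`, `load_from_lineage`, and the sample data are all hypothetical, and the `guidEntityMap` / `relations` response shape is an assumption loosely modeled on an Atlas lineage result. The stub only counts simulated round trips.

```python
from dataclasses import dataclass


@dataclass
class FakeCatalog:
    """Hypothetical in-memory stand-in for a catalog client; counts simulated HTTP calls."""
    entities: dict  # guid -> {"name": str, "children": [guid, ...]}
    calls: int = 0

    def get_entity(self, guid):
        # One simulated round trip per entity.
        self.calls += 1
        return self.entities[guid]

    def get_entity_lineage(self, guid):
        # One simulated round trip returning every reachable entity and relation.
        self.calls += 1
        seen, stack, relations = {}, [guid], []
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen[current] = self.entities[current]
            for child in self.entities[current]["children"]:
                relations.append({"fromEntityId": current, "toEntityId": child})
                stack.append(child)
        return {"guidEntityMap": seen, "relations": relations}


def load_recursive(client, guid, out=None):
    """Old pattern: one request per entity, issued during recursion."""
    out = {} if out is None else out
    if guid in out:
        return out
    entity = client.get_entity(guid)
    out[guid] = entity["name"]
    for child in entity["children"]:
        load_recursive(client, child, out)
    return out


def load_from_lineage(client, guid):
    """New pattern: one request, then build entities and edges locally."""
    result = client.get_entity_lineage(guid)
    names = {g: e["name"] for g, e in result["guidEntityMap"].items()}
    edges = [(r["fromEntityId"], r["toEntityId"]) for r in result["relations"]]
    return names, edges


# Made-up sample graph: a project with one anchor (two features) and one source.
DATA = {
    "p": {"name": "project", "children": ["a", "s"]},
    "a": {"name": "anchor", "children": ["f1", "f2"]},
    "s": {"name": "source", "children": []},
    "f1": {"name": "feature_1", "children": []},
    "f2": {"name": "feature_2", "children": []},
}

old_client = FakeCatalog(DATA)
old_names = load_recursive(old_client, "p")

new_client = FakeCatalog(DATA)
new_names, new_edges = load_from_lineage(new_client, "p")

print("requests (recursive):", old_client.calls)
print("requests (lineage):  ", new_client.calls)
```

With this sample graph the recursive version issues one request per entity, while the lineage version issues a single request and does the rest in memory. The gap grows with the number of entities in a project.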
Test data: (tested from backend)
import time

registry = PurviewRegistry(azure_purview_name="feathrazuretest3-purview1")
entity_id_by_name = registry.get_entity_id("enya_test_registry")

# Previous 'get_project' API
start = time.time()
project_pre = registry.get_project_origin(entity_id_by_name)
print("duration1: ", time.time() - start)

# Current 'get_project' API
start = time.time()
project_curr = registry.get_project(entity_id_by_name)
print("duration2: ", time.time() - start)
duration1: 33.04559826850891
duration2: 2.3997888565063477
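For reference, the two durations above work out to a speedup of about 13.8 times. The snippet below is a small sketch of how such a comparison could be computed and, if desired, repeated to reduce noise from a single run. `time_call` and `speedup` are hypothetical helper names, not part of the registry code.

```python
import time


def time_call(fn, *args, repeats=3):
    """Return the best wall-clock duration in seconds over `repeats` runs of fn(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def speedup(before_seconds, after_seconds):
    """How many times faster the new call is than the old one."""
    return before_seconds / after_seconds


# Durations reported above for the previous and current 'get_project'.
ratio = speedup(33.04559826850891, 2.3997888565063477)
print(f"speedup: {ratio:.1f}x")
```

Against a live registry, `time_call(registry.get_project, entity_id_by_name)` would give a duration comparable to `duration2`, assuming the same test setup as above.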
How was this patch tested?
Does this PR introduce any user-facing changes?
Dependencies
No. You can skip the rest of this section.
Yes. Make sure to list all the dependencies and licenses.