Accounts in Database + Account Manager #2424
neskk wants to merge 29 commits into RocketMap:develop
Conversation
```python
# Make a GMO request before the encounter.
if using_db_account:
    response = gmo(hlvl_api, hlvl_account, scan_location)
```
Don't do this, at least not without a flag. Currently people are teleporting accounts all over the place and it works for the encounter; adding a GMO will make the encounter take more time and will not give any interesting info.
Last night @Alderon86 explained this to me... I thought it would be a red flag for accounts to encounter mons without even doing a GMO first — I would have thought such blatant misbehavior would be quickly picked up by the game.
I will revert this change, or at least add an option to enable it by choice.
```python
search_thread.start()
else:
    # Check if we are able to solve captchas.
    if args.captcha_solving and not args.hash_key:
```
OS can do captcha solving now?
Hummm, this is still a WIP. I'm thinking it should be possible to have an --only-server instance dedicated to captcha solving that would pick up captcha'd accounts from the database and just listen for tokens (manual captcha solving). Right now this check makes little sense because I disabled AccountManager when launching in --only-server.
I think that having a flag telling AccountManager to immediately release captcha'd accounts (if automatic captcha solving is not enabled), and then allowing -os instances that have --captcha-solving enabled to pick these accounts up into their captcha queue, would work. Then we could have a single instance querying the database for tokens, with its --hostname set as the --manual-captcha-domain. This only makes sense in multi-instance setups, but it's a bit hard to provide so many configuration/behavioral options in a single piece of software.
```python
self.args = args
self.dbq = db_queue
self.whq = wh_queue
```
It is often used to push updates to the Account model and to be available to captcha solvers.
@friscoMad I've seen your PR #2422 and I was thinking that maybe AccountManager could hold the key_scheduler instead of it being defined as a global variable in apiRequests. I was a bit tired of having to pass the db/wh update queues to every function. args is also kept for convenience; there are some spots where I managed to avoid passing the variable to the function by having this local reference in the AccountManager object. This way, if you pass the account_manager around, you have everything you need. Originally I had put AccountManager in models.py and took advantage of args being declared as a global variable. When I moved it out of models.py — because it's not really a database model; AM isn't saved in the database — I considered avoiding a local reference to args, but it's tremendously useful to have, making the function calls a bit more concise and readable.
```python
class AccountManager(object):
    def __init__(self, args, db_queue, wh_queue, high_level=30):
        self.args = args
```
Avoid using args; expand the needed parameters.
As I've explained in the comment above, I don't feel it's that bad to save a reference... It probably has low impact on memory/performance, and it's very useful to have it available. There are too many variables accessed from args; maybe if, down the road, the scope of args usage inside AccountManager is reduced to half a dozen variables, I'll expand them all. For the time being, and if I'm not wrong, accessing a pointer through a pointer isn't an uncommon code pattern and does not compromise performance.
```python
cycle = 0
time.sleep(60)
while True:
    # Run once every 15 seconds.
```
This will not run every 15 secs; it will run every 11 min 15 sec, if the cycle logic is right.
Hum... you probably didn't read with enough attention; this is the same logic I used in #2383. You've probably misunderstood how it works.
The sleep timer is always 15 seconds. Sometimes it only runs the first bit (hence the "Run once every 15 sec"), every 4 cycles it calls some more functions and then sleeps 15 secs (hence the "Run once every minute" — 4 x 15 secs), and, finally, every 40 cycles (40 x 15 secs = 600 secs = 10 min) it calls the account monitoring routine.
This allows a single thread to run different routines at different intervals. One of the reasons I brought the captcha code into the AccountManager was to avoid having an extra thread just to monitor the captcha queue.
The Account Manager thread runs the account recycler, the captcha overseer and the account monitor routines. Perhaps I went too far with premature optimization...
Yup, my bad — it works, but it's a bit convoluted. Use something like this; it's a lot clearer:
```python
counter = 0
while True:
    counter += 1
    captcha_manager()
    if counter % 4 == 0:
        account_keeper()
        account_recycler()
    if counter % 40 == 0:
        monitor()
        counter = 0
    time.sleep(15)
```
Some thoughts as I haven't done a proper review:
@friscoMad
I know some of this has been stated over Discord, but I want to make it clear going forward. You have already been removed from the #pr channel in Discord twice for issues with our processes, decisions and methods of collaboration. I previously told you (#1819) to remain professional on GitHub or you would be blocked from the repository. You had not said anything else unprofessional on GitHub, so I had no problem allowing you to open PRs and work on the GitHub repository.

Fast forward to now: the plan was for #2376 to be merged first, as it had a tested and proven database model, one we had been discussing inside #pr for a little while. You would rebase any work on behavior off of that PR; that way, we'd have the database model in place, solidified and tested, and we could build the behavior around it and test it separately. You were told this by Seb. Instead, this PR was opened in direct competition with #2376, completely disregarding the original plan. This is completely against what we aim to do here at RM, not only in terms of behavior on the repository, but our testing preferences as well.

You recommended skipping #2376 because "their starting points are pretty much the same and I got account rotation to work correctly using the database. I'm not saying PR #2424 is absolutely ready and flawless but I think it's already more stable and reliable than this one, for instance". You previously mentioned that you had talked with Alderon about reworking some old code from Arengo. All this begs the question: why the hell would you decide to open a PR in competition with #2376 instead of collaborating with Alderon on it? If they really have the same starting points, it should have been no problem.

In no particular order, a few replies:
"Nice way of saying thanks to me for my work because it seems really not important to you guys."
"Plus, collaboration doesn't not have it's credits properly accounted for as I told you last week when you merge spawn-spawn rework which had 90% of the code done and debugged by me."
"Sometimes I wonder why I keep spending my time on this project since all you do @sebastienvercammen is hold a grudge towards me... but it's fine, I forgive you :)"

You claim that you talked to Alderon for a bit, and that it "immediately became clear that it would be easier just to go to 2424 and I think Alderon realized that as well". If this were the case, why would Alderon not have come to us and expressed his desire to continue with 2424 and abandon 2376? (I'm not blaming Alderon here; I'm just saying that if this were truly the case, and you were in collaboration with Alderon, that is what should have happened.)

Finally, you seem to be confused as to why we prefer to separate things in the first place. It's a very simple guideline, and one we've tried to follow for a while now. When you stare at a 500-line PR that adds, say, two new features, it's a large blob of changes to look at and test, especially when multiple features are mixed in. But if you split it into a couple of PRs, each for a specific feature, each PR's code can be reviewed separately and then tested separately. It keeps the procedure smoother for our review team: we can simply hand them a list of PRs (which, in a perfect world, would each be a single feature), and if there's an issue with feature A, they can just say "PR #x has x issue", and it can be commented on that PR; meanwhile, feature B (PR 2) might be perfectly fine and can already be merged into the main repository while A (PR 1) is being patched. Whereas one PR for both A and B means reviewers need to say "Feature A in PR #x is broken", and that delays the entire PR until the issue is fixed. This is not a standpoint unique to RM; it's standard project management.

And finally, the driving point behind this whole wall of text: you have received multiple second chances and warnings to change your behavior. Apparently, they are not being taken as they should be. Therefore, I am here to extend to you your final warning. This warning is NOT from Seb; this warning is not from anyone else on the RM team. This warning is from myself. Stop causing drama, stop causing issues, stop exhibiting anti-collaborative behavior, or I will not hesitate to block you from collaborating on the repository. We normally allow anyone to collaborate, but continued incidents are going to result in blocks. This is your choice. If you want to continue to help and collaborate with us, change your attitude and work with the other members, not against them. I'm not going to allow more of our team's time to be wasted on issues like this, so should they arise in the future, immediate action will be taken.
My reply was to explain why, as of now, I feel it is necessary to give you a final warning. I could have just left you the last sentence, but I prefer to give context. Therefore, I am done here and any further comments will be related to code.

Perhaps if everyone looked more at the code, this whole discussion would be avoided.
```python
    return device_info


def generate_instance_id(args):
```
I believe using the process id breaks restarts under systemd.
Restarting rocketmap.service doesn't change it, and I got stuck on "duplicate instance detected" until I manually cleared it.
That message tells you that an instance with that same instance_id was running 30 seconds ago.
- If you're restarting the instance, allow 30 seconds of cool-off between shutdown and startup and you won't get that message.
- Maybe they are indeed conflicting instances, in which case you shouldn't be manually clearing them up.

`generate_instance_id` should return a different `instance_id` for each one of your running instances. It hashes location, worker count, status name and several other configuration parameters to make sure we get a unique value.
Either you didn't wait the 30 secs between shutdown and launch, or you're trying to launch an instance at the same location, with the same step size, worker count and scanning parameters.
The status name is ignored in `generate_instance_id` if it's not set. Only after generating the instance_id will it be updated to match the process ID. This is done so that you don't get a different instance_id each time you launch your instance without a defined status-name.
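The hashing described above can be sketched roughly like this — a minimal illustration with a hypothetical subset of parameters, not the actual RocketMap implementation:

```python
import hashlib

def generate_instance_id(location, workers, step_limit, status_name=None):
    # Hash stable configuration parameters so the same setup always
    # produces the same instance_id across restarts.
    parts = [str(location), str(workers), str(step_limit)]
    if status_name:  # ignored when not set, keeping the id deterministic
        parts.append(status_name)
    return hashlib.md5('|'.join(parts).encode('utf-8')).hexdigest()

# Same configuration -> same id; different worker count -> different id.
a = generate_instance_id('40.7,-74.0', 10, 5)
b = generate_instance_id('40.7,-74.0', 10, 5)
c = generate_instance_id('40.7,-74.0', 12, 5)
print(a == b, a == c)  # True False
```

Because the unset status name is skipped rather than hashed as the process ID, restarting without `--status-name` yields the same id and the duplicate-instance check can actually trigger.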
tyvm
To anyone running this on systemd with `Restart=always`: you should now add
`RestartSec=30`
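For reference, that suggestion corresponds to a unit fragment along these lines (an illustrative sketch; the service name and the rest of the unit file are not from this PR):

```ini
# Hypothetical fragment of rocketmap.service
[Service]
Restart=always
# Wait before restarting so the previous instance_id's
# duplicate-instance cool-off period can expire.
RestartSec=30
```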
Adjusted this cool-down timer to 10 secs.
I'm rewriting account allocation; the current method is imperfect and I think this will solve some weird issues.
```python
# Keep track of different types of bans using a single field.
class AccountBanned:
```
I feel like Enum is a bit of overkill for this. I've thought about it and feel it's unnecessary for the time being; maybe later, if we have a more generic way of managing bans, we'll change this.
Enum doesn't "let" me do things like `if account['banned'] <= AccountBanned.Shadowban`, which are useful. I tried to change the code, and all the database queries would have to use `AccountBanned.Clear.value`, which is also annoying. For the time being, and as I discussed with Seb some time ago, I think I will let this stay a simple class, which is more flexible than Enum.
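A minimal sketch of the plain-class approach being discussed, using the values listed in the changelog (0 Clear, 1 Shadowban, 2 Tempban, 3 Permban) — not the exact RocketMap code:

```python
# Plain class instead of Enum: members are raw ints, so ordered
# comparisons and raw database queries work without .value unwrapping.
class AccountBanned:
    Clear = 0
    Shadowban = 1
    Tempban = 2
    Permban = 3

account = {'banned': AccountBanned.Shadowban}

# Ordered comparison that plain Enum members would not allow directly:
can_still_scan = account['banned'] <= AccountBanned.Shadowban
print(can_still_scan)  # True
```

(Python's `IntEnum` would also allow ordered comparisons, but raw class attributes keep database queries free of `.value`.)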
```diff
-def spin_pokestop(api, account, args, fort, step_location):
+def spin_pokestop(args, account_manager, status, api, account, fort):
     if not can_spin(account, args.account_max_spins):
```
I still have not done a full review, but I have some ideas: I see you did a lot of work on captcha handling. Still, captchas should be handled in apiRequest.py, since they can appear at any time (captcha'd accounts get the message during the login process, for example), and after a captcha all requests should be retried (if the captcha was solved). That may need more changes on top of yours; I am not saying you should add that right now in this PR, but take it into account, if possible, to make the evolution as easy as possible.
@friscoMad I totally agree with what you said. I tried to leave room for changes in the future, and I think the account manager can accommodate what you're thinking.
Starting point: removed shadow-ban related things and minimized deviations from the develop branch. Need to create an account manager - using AccountSet as inspiration. Review how accounts are loaded into the database - query all, filter input files (if any) and only insert unique - set username as unique in the Account model? Many annotations; we need to address a lot of stuff to make this pretty. Would be cool if someone started working on a front-end page. Merge branch 'F-HealthyAccountDB' of https://github.com/Arengo/RocketMap into pr-accounts-db. Rewrote accounts-csv parser. Adjusted the Account database model to make things more streamlined. Accounts-csv now supports a syntax with an extra column for account level - removed hlvl-accounts. Instead of two separate functions to check new accounts and insert them, reworked the code to do both operations in one function (Account.insert_new). Worker count is now required since this defines how many accounts we'll be using.
encounter_pokemon accounts can now solve captcha on the spot. spin_pokestop uses location from account, no need to pass one more argument.
Removed 'shadowban' field in favor of a 'banned' field with multiple statuses. AccountBanned: 0 (Clear), 1 (Shadowban), 2 (Tempban) and 3 (Permban). Improved account monitoring/cleanup routine. Fixed high-level encounter logic; all situations are now correctly handled by the account manager, in particular: - When accounts are wrongly inserted as high-lvl, they now fail and, after resting, are returned to the appropriate queue. - Account information wasn't being updated upon account release. Added altitude in some places where we were missing it. I don't think it's worth keeping this field in the Account database model.
Reverted some odd changes in schedulers.py parse_accounts_csv will now filter duplicate accounts in files and only declares default account once.
If args.hlvl_workers is set to "0" but encounters are enabled, the Account Manager won't keep the accounts allocated. This means that the queue of high-level accounts won't hold the account after it is used. If args.hlvl_kph is set to "0", the speed limit won't be checked and we can maximize the number of encounters an account can do.
Added parameter "--encounter-retries" that controls the maximum number of attempts that an account will retry an encounter. Some methods inside Account Manager should never be used from "outside" the class - making them "private".
Bug fix in AccountManager::run_manager(). Logging in account manager fixed, no more "duplicate" messages - old debug code. Account manager thread startup moved outside of the class init function. This fixes an issue when starting the scanner without any accounts already in the database: it would only fetch the inserted (new) accounts after a minute (first loop of the account manager). Refactored a bit of code in _allocate_accounts to avoid duplication. Fixed indentation of all actions that require an account manager - runserver.py. Small bugfixes.
Changed type of Account fields from 'CharField' to 'Utf8mb4CharField'. Added 'instance_id' to MainWorker and WorkerStatus A bit of clean up on worker status updates. Added a check at startup to avoid running duplicate instances. Fixed some bugs in account.py - reset_account()
When --hlvl-workers is set to "0" it will try to allocate accounts on-the-fly. There was a bug that caused the instance to allocate 5 accounts because it wouldn't break on success. The overseer stats message now includes more information about the account pool. Status page "fixed": uses instance_id instead of worker_name to compute the MainWorker hash. WorkerStatus model updated to include a "norares" count. Status page and status printer thread display the number of scans without rare pokemon (no-rares).
Instance ID generation now takes into account "no-gyms", "no-pokemon", "no-pokestops" and "no-raids" flags - This will help avoid conflicting instance IDs. Old captcha logic was a bit fuzzy and there was a lot of duplicate code. I think I'm happy with this solution, still needs proper testing. Reworked logic to unallocate high-level accounts after usage. "--hlvl-workers-holding-time" is used when "--hlvl-workers" is set to 0 (on-the-fly allocation). Encounter retry cycles fixed: we don't want to wait for nothing. Database schema version updated and added schema "migration".
Re-organization of how the account manager keeps track of allocated, active and spare accounts. AccountRecycler: after failing and waiting out its holding time in the failed queue, banned accounts are released and deallocated from the instance. AccountKeeper: releases excess accounts that aren't being used (-asi << -ari). Cleaned up handle_captcha() code to be more understandable. Added separate locks for each account pool and rewrote the functions that move accounts between pools - performance improvement. Moved the captcha code block to the end of the file - keeping accountManager.py organized. Reduced timeout to 10 secs on the startup duplicate-instance launch check. Log message updated to avoid doubts. MainWorker should be updated every 3 secs, so I lowered the cooldown period from 30 secs to 10 secs, which should be enough. AccountManager internal methods now all start with '_', e.g. _account_keeper(); methods that are supposed to be called from outside don't have this '_' prefix. Tested encounters: with hlvl_kph set to 0, one account can perform several encounters in seconds. Captcha solver now uses account.py setup_api() - tested. Many, many, too many changes...
Rebased. handle_captcha now handles "False" responses from failed requests.
Still have some issues with account allocation in a setup with multiple concurrent instances. Added a double check that validates which accounts were successfully allocated in case something went wrong with the update query. Removed the "replenish_count" counters; I used them previously to replace the active pool, but in the end I figured it's best to keep an OrderedDict with the active account pool. Refactored and moved all account allocation code to _allocate_accounts and _fetch_accounts. Reduced the time an account has to wait before retrying to get a spare account. Maybe this can even be lowered further to allow lower excess account holding times.
Turns out account allocation was fine and a bug in WorkerStatus was giving inconsistent results on the status page. WorkerStatus acts as a temporary record keeper for account stats (no-rares and captcha count). Other than that, there's no more useful information to fetch from the previous status, since the remaining information (latitude, longitude, last_scan, last_modified) now comes from Account data. Renamed WorkerStatus field 'last_scan_date' to 'last_scan' to match the Account model. Updated and made adjustments where 'last_scan_date' was being used (schedulers.py). WorkerStatus (a.k.a. 'status') field 'last_scan' can now be set to 'None'. This change will make it easier in the future to merge/remove duplicate information from WorkerStatus and use latitude, longitude and last_scan directly from the Account model. Small adjustments in Speed-Scan and Spawnpoint-Scan to accommodate the 'last_scan' changes - I think the code is more readable now and everything works the same way. Renamed WorkerStatus field 'norares' to 'no_rares' to match 'no_items'. Updated all references to this field. Created a method that returns the default WorkerStatus dictionary with its stats reset - reduces duplicate code and separates local/runtime data from persistent/database data. Reordered WorkerStatus fields: Skips > No Items > No Rares > Captchas. Status page will now only display WorkerStatus/MainStatus that were active in the last 30 minutes. Improves consistency of the status page for those who don't enable database cleanup. Account stats displayed in the manual captcha solving page will also only include stats from MainWorkers that have been recently active. Small improvements in some logging messages and formatting in search.py.
Since accounts are shared between instances, we needed a way to keep track of the number of failed login attempts. The solution we had was local/runtime only and wouldn't work for "on-the-fly" account allocation. Renamed Account model field 'fail' to 'failed' and changed its type to SmallInt in order to have a persistent counter of login failures. The 'fail' field was only used to signify that an account was in the failed account pool; it had almost no purpose, but now it's an important field. Added '--account-max-failures' to control the number of consecutive failed login attempts it takes to flag the account as perm-banned. The default is 5, meaning the account will have to fail its login retries on 5 different occasions to be considered banned. Once it fails the first time, the account will be put to sleep for a small period of time and will attempt to login again later. Reverted some changes in '--hlvl-workers-max'; it is only used when '--hlvl-workers' is 0 (on-the-fly allocation).
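The failure-counting behavior described above can be sketched like this — hypothetical helper names; in the PR the counter is the persistent `failed` field on the Account model:

```python
# Mirrors the described behavior of --account-max-failures (default 5).
ACCOUNT_MAX_FAILURES = 5

def on_login_failure(account, max_failures=ACCOUNT_MAX_FAILURES):
    # 'failed' is persisted in the database, so the counter survives
    # across instances sharing the same account pool.
    account['failed'] += 1
    if account['failed'] >= max_failures:
        account['banned'] = 3  # AccountBanned.Permban
        return 'banned'
    return 'rest'  # sleep for a while, then retry the login later

account = {'failed': 0, 'banned': 0}
outcomes = [on_login_failure(account) for _ in range(5)]
print(outcomes)  # ['rest', 'rest', 'rest', 'rest', 'banned']
```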
Improved overall robustness of critical code blocks and database operations. Added back some safety checks to avoid having the same account being used by multiple search workers. Added database().execution_context() to _allocate_accounts() to make sure account allocation is made within a transaction. Improved thread safety locks, seemed like Account Manager was losing track of some accounts. Added a try..except to _account_keeper() to pick up critical errors in account management.
No longer uses a cycle counter, instead using timers to allow more flexibility and to clarify code's logic.
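The timer-based approach can be sketched like this — hypothetical routine names and intervals, a simplification of the actual manager loop (`cycles`, `now` and `sleep` are injected only so the loop can be exercised; the real thread would loop forever):

```python
import time

def run_manager(captcha_manager, account_keeper, monitor,
                cycles=None, now=time.time, sleep=time.sleep):
    # Each routine keeps its own next-run timestamp instead of sharing
    # one cycle counter, so intervals can be tuned independently.
    next_keeper = now() + 60     # account keeper: once a minute
    next_monitor = now() + 600   # account monitor: every 10 minutes
    ran = 0
    while cycles is None or ran < cycles:
        captcha_manager()        # runs every iteration (~15 seconds)
        t = now()
        if t >= next_keeper:
            account_keeper()
            next_keeper = t + 60
        if t >= next_monitor:
            monitor()
            next_monitor = t + 600
        sleep(15)
        ran += 1
```

Unlike the modulo-counter version, changing one routine's interval here doesn't require the others to be multiples of the base tick.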
Changed check_login() to output more info and handle exceptions. This function now returns a boolean according to login success. Removed exceptions no longer used (TooManyLoginAttempts and LoginSequence). All information is still kept for debugging. Login failures now trigger a timeout afterwards; this is used to prevent prematurely marking all accounts as perma-banned just because something was wrong for some hours (e.g. PoGo login servers offline).
Fixed duplicate 'only-server' test in runserver.py. Account model now holds the methods that insert and clear accounts in database. Small changes/additions to code comments.
Save remote configuration assets and templates times in the database (Account model). Only perform assets requests if database times don't match current remote configuration. This change should save a lot of RPM and make login times shorter. Update database schema: ``` ALTER TABLE `<database>`.`account` ADD COLUMN `time_assets` INT(4) UNSIGNED NOT NULL DEFAULT 0 AFTER `level`, ADD COLUMN `time_templates` INT(4) UNSIGNED NOT NULL DEFAULT 0 AFTER `time_assets`; ```
This fixes the situation where newly inserted accounts would be picked up instead of older ones due to having a more recent last_modified value.
[Beta/Safe Stage] - Mostly tested, need more feedback.
Reworked pretty much every place where accounts are used and created a single entity responsible for handling all operations: an account manager.
Description
Accounts are now saved in the database, their status is kept updated.
AccountManager handles all the queues and is responsible for maintaining account information updated.
When an error/captcha is encountered accounts will remain allocated to current instance - preserving old behavior.
- `--shadow-ban-scan: true` allows these accounts to continue scanning. Note: detection is never disabled and will record `no-rares` in thread status.
- `--hlvl-workers` controls the number of spare high-level accounts that the account manager will keep allocated. If you had a CSV with 3 accounts, you can achieve the same effect with `-hw 3`.
- `--hlvl-workers-holding-time` controls how long high-level accounts can remain idle. After this time they will be deallocated and can be picked up by other instances. Defaults to 120 seconds.
- With `--hlvl-workers 0` (or `-hw 0`) and `--hlvl-workers-holding-time 0`, the account manager will immediately release the account after it's used; the account is released back to the spare account pool.
- With `--hlvl-workers-holding-time` set to 0, this would be inefficient for setups with a large volume of encounters, because it would end up recreating the API object every single time, not "respecting" `--no-api-store` (repeating the whole login simulation process = many requests).
- `--hlvl-workers-max` is set by default to 2 and controls the maximum number of high-level accounts that can be allocated to the current instance at any given time. This setting is only used when `--hlvl-workers` is 0 ("on-the-fly" allocation), to prevent a burst of encounters from suddenly allocating all the accounts in the database.

Motivation and Context
Managing CSV files with accounts is kinda boring.
Account handling code was scattered across the project, and multiple queues and variables were required; now it should be much simpler to control accounts, allowing more complex logic.
TODOs:
`device_info`, `proxy_url` and perhaps `asset_download_status` (future work: cache downloaded assets in JSON) could be kept in the database to maximize consistency of usage.

How Has This Been Tested?
Production environment with MySQL and 40 instances sharing a 10k account pool.
Screenshots (if appropriate):
Types of changes
Checklist: