The Gmail Archiver tool is a Python utility designed for downloading and archiving emails from a specified Gmail account. This tool organizes emails by year and month, compresses them into .zip files for each month, and optionally deletes archived emails from the Gmail account to manage inbox space.
- Download and Archive Emails: Downloads emails before a specified cutoff date, organizing them by year and month.
- Automatic Compression: Archives are saved in
.zipformat, structured asyear/month.zip. - Inbox Management: Optionally delete emails from Gmail after archiving.
- Configurable Parameters: Allows setting parameters like the archive path, cutoff date, and maximum parallelization for efficient downloading.
- Multi-threaded Execution: Supports configurable parallelization to speed up the email download process.
- Python 3.7+
- Google API Client Library for Python (
google-auth,google-auth-oauthlib,google-auth-httplib2,google-api-python-client)
-
Clone the repository:
git clone https://github.com/vikrantb/gmailarchiver.git cd gmail_archiver -
Install dependencies:
pip install -r requirements.txt
- Go to the Google Cloud Console Project Creation page.
- If you already have a project, skip to Step 3.3. Otherwise, create a new project by:
- Entering a Project Name.
- Selecting No organization (if applicable).
- Click Create to finish creating your project.
- Navigate to the Enable API and Services page for the Gmail API by clicking here.
- Ensure the correct project is selected in the top navigation bar.
- Click Enable to activate the Gmail API for your project.
- Go to the OAuth consent screen setup for your project.
- If no app exists, create one by entering the necessary details, such as your app name and email.
- Ensure the following OAuth Scopes are listed and have access:
- Gmail API:
https://mail.google.com/- Read, compose, send, and permanently delete all your email from Gmail. - Gmail API:
https://www.googleapis.com/auth/gmail.modify- Read, compose, and send emails from your Gmail account.
- Gmail API:
- If you are just using the app for yourself (not for public use), set the Publishing Status of the app to Testing on the first screen. For broader access, you'll need to submit the app for verification, which may take up to a day.
- Go to the API Credentials page in Google Cloud Console.
- Select your project from the drop-down menu if it’s not already selected.
- If there are no existing credentials, create new ones:
- Click Create Credentials and select OAuth client ID.
- For Application type, choose Desktop app.
- Name your credentials (e.g., "Gmail Archiver Credentials") and click Create.
- A dialog will appear with the OAuth Client ID. Click Download JSON to save the
credentials.jsonfile.
- Download the
credentials.jsonfile to your local machine. - Move the downloaded file to the directory where the Gmail Archiver script is located (e.g.,
gmail_archiver/). - Rename the file to
credentials.json.
- Open your terminal or command prompt and run the Gmail Archiver script.
- During the first run, the script will prompt you to authenticate. Follow the on-screen instructions to complete the OAuth authentication.
Note: Keep
credentials.jsonsecure, as it contains sensitive information. Never share this file or check it into source control.
Run the gmail_archiver.py script with the following command:
python gmail_archiver.pyUpon running, the tool will prompt you for the following information:
- Base Path for Archive: The path where the emails will be saved. If not specified, it defaults to
./gmail_archives. - Cutoff Date: Specify the cutoff date (in
MM-DD-YYYYformat) to download emails older than this date. - Delete Emails After Archiving: Confirm if you'd like to delete the emails from Gmail after archiving. Type "yes" to confirm.
Example 1: Basic Archiving
Scenario: You want to keep a copy of all your emails for future reference, but you’re not ready to delete anything yet.
Use:
python gmail_archiver.py --base-path ./gmail_archives
What it does: Archives all emails in your account to the ./gmail_archives directory, organizing them by year and month. No emails will be deleted, and you’ll have a clean backup for safekeeping.
Example 2: Goodbye, GmailArchiver Corp!
Scenario: You just left your job at GmailArchiver Corp, and all their emails are labeled GmailArchiver. Time to clean up your inbox!
Use:
python gmail_archiver.py --label GmailArchiver --delete --base-path ./gmail_archives
What it does: Archives all emails tagged with the GmailArchiver label, saves them locally, and deletes them from Gmail. Say goodbye to clutter and hello to a fresh start!
Example 3: Remember 2022
Scenario: You want to archive emails from a specific date range, like all the correspondence from the eventful year of 2022.
Use:
python gmail_archiver.py --start-date 01-01-2022 --end-date 12-31-2022 --base-path ./gmail_archives_2022
What it does: Archives all emails received between January 1, 2022, and December 31, 2022, into ./gmail_archives_2022. No emails are deleted, so you can relive those memories anytime!
Example 4: Invoice Hunters
Scenario: You’ve been searching for invoices scattered across your inbox. Time to organize them all in one place.
Use:
python gmail_archiver.py --query "invoice" --base-path ./invoice_archives
What it does: Finds all emails with the word “invoice” and archives them into the ./invoice_archives folder. Your financial trail is now neatly saved!
Example 5: Work and Meetings
Scenario: You need to archive emails labeled Work that also contain the word “meeting.” Perfect for keeping your professional life organized.
Use:
python gmail_archiver.py --label Work --query "meeting" --base-path ./work_meeting_archives
What it does: Archives all work-related meeting emails, combining label and query filters, and saves them in ./work_meeting_archives.
Example 6: Out with the Old
Scenario: Time to declutter! Archive and delete all emails you received before 2021.
Use:
python gmail_archiver.py --end-date 12-31-2020 --delete --base-path ./old_emails
What it does: Archives emails up to December 31, 2020, stores them in ./old_emails, and deletes them from Gmail. Clean slate, anyone?
Example 7: Starting Fresh
Scenario: You’re only interested in recent emails. Archive everything from January 1, 2023, onward.
Use:
python gmail_archiver.py --start-date 01-01-2023 --base-path ./recent_emails
What it does: Archives emails received from January 1, 2023, onward into the ./recent_emails folder. No old baggage here! You still might want to delete older emails.
Example 8: Keep archiving important stuff
Scenario: Your starred emails are important! Archive them for safekeeping.
Use:
python gmail_archiver.py --label STARRED --base-path ./starred_archives
What it does: Archives all starred emails to ./starred_archives. Your top-priority emails are now securely saved.
Example 9: The Exit Strategy
Scenario: You’re leaving your job and want to archive all emails labeled Work before deleting them.
Use:
python gmail_archiver.py --label Work --base-path ./job_backup --delete
What it does: Archives emails labeled Work, stores them in ./job_backup, and deletes them from Gmail. A tidy way to exit!
Example 10: Big Attachments, Big Savings
Scenario: Your Gmail is running out of space! Archive emails with attachments larger than 10 MB.
Use:
python gmail_archiver.py --query "size:10485760" --base-path ./large_attachments
What it does: Finds all emails with attachments larger than 10 MB, archives them into ./large_attachments, and retains them in Gmail.
Example 11: Project Skunk works
Scenario: You’re working on a project named “Skunk works” and need all related emails labeled Work, starting from January 1, 2022, and mentioning “Skunk works”.
Use:
python gmail_archiver.py --start-date 01-01-2022 --label Work --query "Skunk works" --base-path ./project_emails
What it does: Archives all emails labeled Work, received after January 1, 2022, and containing “Skunk works,” into ./project_emails.
Example 12: Full Backup Mode
Scenario: No filters, no queries. You want to back up everything, everywhere, all at once.
Use:
python gmail_archiver.py --base-path ./all_emails
What it does: Archives all emails in your Gmail account into ./all_emails. No emails are deleted, and no filter is applied — just a clean, comprehensive backup.