A professional-grade web scraping solution that automates LinkedIn profile data collection using Selenium WebDriver and a GitHub Actions CI/CD pipeline.
Features • Demo • Installation • Usage • Documentation
This project demonstrates advanced web scraping techniques combined with modern DevOps practices. Built with Python and Selenium, it intelligently extracts LinkedIn profile data from Google search results while implementing anti-detection measures and running autonomously in the cloud.
Perfect for:
- Recruitment & HR professionals seeking candidate data
- Market researchers analyzing professional demographics
- Lead generation and business development teams
- Career coaches building industry insights
- Developers learning automation & web scraping
- **Intelligent Search** - Automated Google search with customizable queries
- **Anti-Detection** - Randomized user agents, delays, and stealth mode
- **Data Export** - Clean CSV output with timestamps
- **Fresh Data** - Each run fetches new results, not cached data
- **Efficient Scraping** - Optimized selectors and error handling
- **GitHub Actions Integration** - Serverless execution in the cloud
- **Manual Trigger** - On-demand workflow execution
- **Artifact Storage** - Automatic result archiving (30 days)
- **Docker Support** - Containerized for easy deployment
- **Secure** - No credentials stored, environment-based configuration
- **Unique File Naming** - Timestamped files for each run
- **Customizable Queries** - Search any LinkedIn profile type
- **Error Recovery** - Graceful handling of failures
- **Scalable Architecture** - Easy to extend and modify
- **Production Ready** - Robust error handling and logging
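The anti-detection idea above (randomized user agents plus human-like pauses) can be sketched as follows. `USER_AGENTS` and `human_delay` are illustrative names for this sketch, not necessarily the identifiers used in `bot.py`:

```python
import random
import time

# Illustrative user-agent pool; bot.py may ship its own, larger list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0 Safari/537.36",
]

def random_user_agent() -> str:
    """Pick a fresh user agent for each browser session."""
    return random.choice(USER_AGENTS)

def human_delay(low: float = 2.0, high: float = 6.0) -> float:
    """Sleep a random, human-looking interval between requests."""
    pause = random.uniform(low, high)
    time.sleep(pause)
    return pause
```

With Selenium, the chosen agent would typically be applied via `chrome_options.add_argument(f"user-agent={random_user_agent()}")` before launching the driver.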
```
Actions → LinkedIn Scraper Bot → Run workflow
├── Custom Search Query: "site:linkedin.com/in/ data scientist"
├── Use Timestamp: true/false
└── Run workflow
```
```csv
title,link,scraped_at
"John Doe - Senior Software Engineer",https://linkedin.com/in/johndoe,2026-01-19T14:30:25
"Jane Smith - Data Scientist at Google",https://linkedin.com/in/janesmith,2026-01-19T14:30:27
"Michael Johnson - Full Stack Developer",https://linkedin.com/in/michaelj,2026-01-19T14:30:29
```

```
linkedin-scraper-bot/
├── bot.py            # Main scraper with anti-detection
├── requirements.txt  # Python dependencies
├── Dockerfile        # Container configuration
├── runtime.txt       # Python version specification
├── README.md         # This file
├── .gitignore        # Git ignore rules
└── .github/
    └── workflows/
        └── scraper.yml  # GitHub Actions workflow
```
- Python 3.11+
- Google Chrome or Chromium
- Git
1. Clone the repository

   ```bash
   git clone https://github.com/EHTISHAM-AI-ENTHUSIAST/linkedin-scraper-bot.git
   cd linkedin-scraper-bot
   ```

2. Install dependencies

   ```bash
   pip install -r requirements.txt
   ```

3. Run the bot

   ```bash
   # Default search query
   python bot.py

   # Custom search query (Windows; use export on Linux/macOS)
   set SEARCH_QUERY=site:linkedin.com/in/ python developer
   python bot.py

   # With visible browser (for debugging)
   set HEADLESS=false
   python bot.py
   ```
```bash
# Build the image
docker build -t linkedin-scraper .

# Run the container
docker run -e SEARCH_QUERY="site:linkedin.com/in/ AI engineer" linkedin-scraper
```

1. Navigate to Actions
   - Go to your repository on GitHub
   - Click the Actions tab
   - Select the LinkedIn Scraper Bot workflow
2. Run Workflow
   - Click the Run workflow button
   - Search Query: enter your custom search (e.g., `site:linkedin.com/in/ UX designer`)
   - Use Timestamp: choose `true` for unique files, `false` for overwriting
   - Click Run workflow
3. Download Results
   - Wait for the workflow to complete (~2-5 minutes)
   - Scroll down to the Artifacts section
   - Download `linkedin-profiles-{run_number}.zip`
   - Extract and open the CSV file
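Once extracted, the CSV can be inspected with Python's standard library. The column names below match the sample output shown earlier; for a real run you would pass the downloaded file to `open()` instead of the inline sample:

```python
import csv
import io

# Inline sample in the scraper's output format (see the CSV example above).
sample = """title,link,scraped_at
"John Doe - Senior Software Engineer",https://linkedin.com/in/johndoe,2026-01-19T14:30:25
"Jane Smith - Data Scientist at Google",https://linkedin.com/in/janesmith,2026-01-19T14:30:27
"""

# For real data: rows = list(csv.DictReader(open("linkedin_profiles.csv", newline="")))
rows = list(csv.DictReader(io.StringIO(sample)))

for row in rows:
    print(row["title"], "->", row["link"])
```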
```bash
# Set environment variables
export SEARCH_QUERY="site:linkedin.com/in/ machine learning engineer"
export HEADLESS=true
export USE_TIMESTAMP=true

# Run the scraper
python bot.py
```

| Variable | Default | Description |
|---|---|---|
| `SEARCH_QUERY` | `site:linkedin.com/in/ software engineer` | Google search query string |
| `HEADLESS` | `true` | Run browser without GUI (`true`/`false`) |
| `USE_TIMESTAMP` | `false` | Add timestamp to filename (`true`/`false`) |
| `OUTPUT_FILE` | `linkedin_profiles.csv` | Output filename |
| `MAX_RESULTS` | `30` | Maximum profiles to scrape |
| `CHROME_BIN` | Auto-detect | Chrome binary path (for GitHub Actions) |
```bash
# Find Python developers
SEARCH_QUERY="site:linkedin.com/in/ python developer"

# Find people at a specific company
SEARCH_QUERY="site:linkedin.com/in/ Google software engineer"

# Find by location
SEARCH_QUERY="site:linkedin.com/in/ designer San Francisco"

# Find by title and skills
SEARCH_QUERY="site:linkedin.com/in/ DevOps AWS kubernetes"
```

| Technology | Purpose |
|---|---|
| Python 3.11 | Core programming language |
| Selenium WebDriver | Browser automation & scraping |
| Chrome/Chromium | Headless browser engine |
| GitHub Actions | CI/CD & cloud execution |
| Docker | Containerization |
| CSV | Data export format |
```mermaid
graph LR
    A[Manual Trigger] --> B[GitHub Actions]
    B --> C[Setup Python 3.11]
    C --> D[Install Chrome]
    D --> E[Install Dependencies]
    E --> F[Run Scraper Bot]
    F --> G{Success?}
    G -->|Yes| H[Save CSV]
    G -->|No| H
    H --> I[Upload Artifact]
    I --> J[Commit to Repo]
    J --> K[Complete]
```
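Inside the "Run Scraper Bot" step, the core extraction comes down to filtering search-result links for public LinkedIn profile URLs. A hedged sketch of that filtering step (the helper name is illustrative, not taken from `bot.py`):

```python
from urllib.parse import urlparse

def is_profile_link(url: str) -> bool:
    """True for public LinkedIn profile URLs (linkedin.com/in/...)."""
    parsed = urlparse(url)
    return (
        parsed.netloc in ("linkedin.com", "www.linkedin.com")
        and parsed.path.startswith("/in/")
    )

# Example: hrefs collected from a page of Google results.
links = [
    "https://linkedin.com/in/johndoe",
    "https://www.google.com/search?q=next+page",
    "https://www.linkedin.com/in/janesmith",
]
profiles = [u for u in links if is_profile_link(u)]
```

Matching on the exact host rather than a substring avoids false positives from look-alike domains.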
- **No credentials required** - Uses public Google search only
- **Rate limiting** - Built-in delays to avoid detection
- **Error handling** - Graceful failures, no data corruption
- **Anti-detection** - Randomized user agents & human-like behavior
- **GitHub Secrets ready** - Easy integration with private APIs
- **Public data only** - Collects only publicly visible search results; no login or private data is accessed
- Proxy rotation support
- Email notifications on completion
- Database integration (PostgreSQL/MongoDB)
- REST API wrapper
- Mobile app integration
- AI-powered profile analysis
- Dashboard & analytics
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
I offer professional web scraping and automation services:

- **Custom Scraper Development** - Tailored solutions for your needs
- **Cloud Automation** - GitHub Actions, AWS Lambda, Azure Functions
- **Data Pipeline Development** - ETL processes and integrations
- **Bot Development** - Telegram, Discord, WhatsApp bots
- **Data Analysis** - Python, Pandas, visualization
Available for freelance projects and consulting.
If this project helped you, please consider giving it a star on GitHub!

Built with ❤️ by EHTISHAM-AI-ENTHUSIAST
Specializing in Web Scraping, Automation, and AI Solutions