
🤖 LinkedIn Scraper Bot

Intelligent Web Scraping Solution with Cloud Automation


A professional-grade web scraping solution that automates LinkedIn profile data collection using Selenium WebDriver and GitHub Actions CI/CD pipeline.

Features • Demo • Installation • Usage • Documentation


📋 Overview

This project demonstrates advanced web scraping techniques combined with modern DevOps practices. Built with Python and Selenium, it intelligently extracts LinkedIn profile data from Google search results while implementing anti-detection measures and running autonomously in the cloud.

Perfect for:

  • 🎯 Recruitment & HR professionals seeking candidate data
  • 📊 Market researchers analyzing professional demographics
  • 🔍 Lead generation and business development
  • 💼 Career coaches building industry insights
  • 🚀 Developers learning automation & web scraping

✨ Features

Core Functionality

  • πŸ” Intelligent Search - Automated Google search with customizable queries
  • 🎭 Anti-Detection - Randomized user agents, delays, and stealth mode
  • πŸ“Š Data Export - Clean CSV output with timestamps
  • πŸ”„ Fresh Data - Each run fetches new results, not cached data
  • ⚑ Efficient Scraping - Optimized selectors and error handling
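The anti-detection ideas above (randomized user agents plus human-like delays) can be sketched in a few lines. This is an illustrative sketch only: the function names and the agent pool are assumptions, not necessarily how bot.py implements them.

```python
import random
import time

# Illustrative pool; bot.py's actual list may differ.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
]

def random_user_agent() -> str:
    """Pick a user agent at random so consecutive runs look different."""
    return random.choice(USER_AGENTS)

def human_delay(base: float = 2.0, jitter: float = 2.0) -> None:
    """Sleep a randomized interval between actions to mimic human browsing pace."""
    time.sleep(base + random.uniform(0, jitter))
```

Calling `human_delay()` between page loads and passing `random_user_agent()` into the Chrome options is the usual pattern for this kind of stealth.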

Cloud & DevOps

  • ☁️ GitHub Actions Integration - Serverless execution in the cloud
  • 🔄 Manual Trigger - On-demand workflow execution
  • 📦 Artifact Storage - Automatic result archiving (30 days)
  • 🐳 Docker Support - Containerized for easy deployment
  • 🔒 Secure - No credentials stored, environment-based configuration

Professional Features

  • πŸ“ Unique File Naming - Timestamped files for each run
  • 🎯 Customizable Queries - Search any LinkedIn profile type
  • πŸ›‘οΈ Error Recovery - Graceful handling of failures
  • πŸ“ˆ Scalable Architecture - Easy to extend and modify
  • πŸ”§ Production Ready - Robust error handling and logging
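The unique-file-naming feature can be sketched with a small helper (the function name and timestamp format are illustrative assumptions, not necessarily bot.py's exact code):

```python
from datetime import datetime

def output_filename(base: str = "linkedin_profiles.csv", use_timestamp: bool = False) -> str:
    """Return the base name unchanged, or a timestamped variant for unique per-run files."""
    if not use_timestamp:
        return base
    stem, ext = base.rsplit(".", 1)
    return f"{stem}_{datetime.now().strftime('%Y%m%d_%H%M%S')}.{ext}"
```

With `use_timestamp=True` this yields names like `linkedin_profiles_20260119_143025.csv`, so repeated runs never overwrite each other.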

🎬 Demo

Workflow Interface

```text
Actions → LinkedIn Scraper Bot → Run workflow
├── 🔍 Custom Search Query: "site:linkedin.com/in/ data scientist"
├── ⏰ Use Timestamp: true/false
└── ▶️ Run workflow
```

Output Preview

```csv
title,link,scraped_at
"John Doe - Senior Software Engineer",https://linkedin.com/in/johndoe,2026-01-19T14:30:25
"Jane Smith - Data Scientist at Google",https://linkedin.com/in/janesmith,2026-01-19T14:30:27
"Michael Johnson - Full Stack Developer",https://linkedin.com/in/michaelj,2026-01-19T14:30:29
```
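The export step behind this preview can be sketched with the standard csv module (a sketch; `write_profiles` and its signature are illustrative, not necessarily how bot.py does it):

```python
import csv
from datetime import datetime

def write_profiles(rows, path="linkedin_profiles.csv"):
    """Write (title, link) pairs with an ISO scraped_at timestamp, matching the preview above."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["title", "link", "scraped_at"])
        for title, link in rows:
            writer.writerow([title, link, datetime.now().isoformat(timespec="seconds")])
```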

πŸ“ Project Architecture

```text
linkedin-scraper-bot/
├── 🐍 bot.py                      # Main scraper with anti-detection
├── 📦 requirements.txt            # Python dependencies
├── 🐳 Dockerfile                  # Container configuration
├── 🔧 runtime.txt                 # Python version specification
├── 📄 README.md                   # This file
├── 🚫 .gitignore                  # Git ignore rules
└── 📂 .github/
    └── workflows/
        └── ⚙️ scraper.yml         # GitHub Actions workflow
```

🚀 Installation

Prerequisites

  • Python 3.11+
  • Google Chrome or Chromium
  • Git

Local Setup

  1. Clone the repository

     ```bash
     git clone https://github.com/EHTISHAM-AI-ENTHUSIAST/linkedin-scraper-bot.git
     cd linkedin-scraper-bot
     ```

  2. Install dependencies

     ```bash
     pip install -r requirements.txt
     ```

  3. Run the bot

     ```bash
     # Default search query
     python bot.py

     # Custom search query (Windows cmd syntax; use export on macOS/Linux)
     set SEARCH_QUERY=site:linkedin.com/in/ python developer
     python bot.py

     # With visible browser (for debugging)
     set HEADLESS=false
     python bot.py
     ```

Docker Deployment

```bash
# Build the image
docker build -t linkedin-scraper .

# Run the container
docker run -e SEARCH_QUERY="site:linkedin.com/in/ AI engineer" linkedin-scraper
```

🎯 Usage

GitHub Actions (Recommended)

  1. Navigate to Actions

    • Go to your repository on GitHub
    • Click the Actions tab
    • Select LinkedIn Scraper Bot workflow
  2. Run Workflow

    • Click Run workflow button
    • Search Query: Enter your custom search (e.g., site:linkedin.com/in/ UX designer)
    • Use Timestamp: Choose true for unique files, false for overwriting
    • Click Run workflow
  3. Download Results

    • Wait for workflow to complete (~2-5 minutes)
    • Scroll down to Artifacts section
    • Download linkedin-profiles-{run_number}.zip
    • Extract and open CSV file

Command Line

```bash
# Set environment variables
export SEARCH_QUERY="site:linkedin.com/in/ machine learning engineer"
export HEADLESS=true
export USE_TIMESTAMP=true

# Run the scraper
python bot.py
```

βš™οΈ Configuration

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `SEARCH_QUERY` | `site:linkedin.com/in/ software engineer` | Google search query string |
| `HEADLESS` | `true` | Run browser without GUI (`true`/`false`) |
| `USE_TIMESTAMP` | `false` | Add timestamp to filename (`true`/`false`) |
| `OUTPUT_FILE` | `linkedin_profiles.csv` | Output filename |
| `MAX_RESULTS` | `30` | Maximum profiles to scrape |
| `CHROME_BIN` | Auto-detect | Chrome binary path (for GitHub Actions) |
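One way the table above could map onto code (a sketch; the `env_bool` helper is an assumption, and bot.py's actual parsing may differ):

```python
import os

def env_bool(name: str, default: bool) -> bool:
    """Interpret common truthy strings for true/false environment variables."""
    return os.getenv(name, str(default)).strip().lower() in ("1", "true", "yes")

SEARCH_QUERY = os.getenv("SEARCH_QUERY", "site:linkedin.com/in/ software engineer")
HEADLESS = env_bool("HEADLESS", True)
USE_TIMESTAMP = env_bool("USE_TIMESTAMP", False)
OUTPUT_FILE = os.getenv("OUTPUT_FILE", "linkedin_profiles.csv")
MAX_RESULTS = int(os.getenv("MAX_RESULTS", "30"))
CHROME_BIN = os.getenv("CHROME_BIN")  # None -> let Selenium auto-detect the browser
```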

Custom Search Examples

```bash
# Find Python developers
SEARCH_QUERY="site:linkedin.com/in/ python developer"

# Find people at specific company
SEARCH_QUERY="site:linkedin.com/in/ Google software engineer"

# Find by location
SEARCH_QUERY="site:linkedin.com/in/ designer San Francisco"

# Find by title and skills
SEARCH_QUERY="site:linkedin.com/in/ DevOps AWS kubernetes"
```

πŸ› οΈ Technical Stack

| Technology | Purpose |
|------------|---------|
| Python 3.11 | Core programming language |
| Selenium WebDriver | Browser automation & scraping |
| Chrome/Chromium | Headless browser engine |
| GitHub Actions | CI/CD & cloud execution |
| Docker | Containerization |
| CSV | Data export format |

📊 Workflow Architecture

```mermaid
graph LR
    A[Manual Trigger] --> B[GitHub Actions]
    B --> C[Setup Python 3.11]
    C --> D[Install Chrome]
    D --> E[Install Dependencies]
    E --> F[Run Scraper Bot]
    F --> G{Success?}
    G -->|Yes| H[Save CSV]
    G -->|No| H
    H --> I[Upload Artifact]
    I --> J[Commit to Repo]
    J --> K[Complete]
```
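A trimmed sketch of what `.github/workflows/scraper.yml` might contain (the input names, step choices, and action versions here are assumptions based on the demo above, not the repository's actual file):

```yaml
name: LinkedIn Scraper Bot

on:
  workflow_dispatch:
    inputs:
      search_query:
        description: "Custom search query"
        default: "site:linkedin.com/in/ software engineer"
      use_timestamp:
        description: "Add timestamp to output filename"
        default: "false"

jobs:
  scrape:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run scraper
        env:
          SEARCH_QUERY: ${{ github.event.inputs.search_query }}
          USE_TIMESTAMP: ${{ github.event.inputs.use_timestamp }}
        run: python bot.py
      - uses: actions/upload-artifact@v4
        if: always()   # upload whatever CSV exists, even on failure
        with:
          name: linkedin-profiles-${{ github.run_number }}
          path: "*.csv"
          retention-days: 30
```

The `if: always()` on the upload step matches the diagram's behavior of saving the CSV on both the success and failure branches.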

πŸ” Security & Best Practices

  • ✅ No credentials required - Uses public Google search
  • ✅ Rate limiting - Built-in delays to avoid detection
  • ✅ Error handling - Graceful failures, no data corruption
  • ✅ Anti-detection - Randomized user agents & human-like behavior
  • ✅ GitHub secrets ready - Easy integration with private APIs
  • ✅ Public data only - Collects only publicly indexed search results; verify compliance with privacy laws such as GDPR for your use case

📈 Roadmap & Future Enhancements

  • 🔄 Proxy rotation support
  • 📧 Email notifications on completion
  • 💾 Database integration (PostgreSQL/MongoDB)
  • 🌐 REST API wrapper
  • 📱 Mobile app integration
  • 🤖 AI-powered profile analysis
  • 📊 Dashboard & analytics

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.


πŸ™‹β€β™‚οΈ Support & Contact

Need help or want to hire me for a project?

GitHub Email LinkedIn


💼 Professional Services

I offer professional web scraping and automation services:

  • 🎯 Custom Scraper Development - Tailored solutions for your needs
  • ☁️ Cloud Automation - GitHub Actions, AWS Lambda, Azure Functions
  • 🔄 Data Pipeline Development - ETL processes and integrations
  • 🤖 Bot Development - Telegram, Discord, WhatsApp bots
  • 📊 Data Analysis - Python, Pandas, visualization

Available for freelance projects and consulting.


⭐ Show Your Support

If this project helped you, please consider giving it a ⭐️ on GitHub!


Built with ❤️ by EHTISHAM-AI-ENTHUSIAST

Specializing in Web Scraping, Automation, and AI Solutions

