Ben Wooding: Portfolio
https://woodingben.com/

Plug-in: Google Scholar Orderer – Smarter Academic Search with Venue Rankings
https://woodingben.com/plug-in-google-scholar-orderer/
Tue, 20 Jan 2026 17:08:10 +0000

Stop guessing about publication quality. Start seeing it at a glance.

If you’ve ever spent hours scrolling through Google Scholar results trying to figure out which papers come from reputable venues, this browser extension is for you.

The Problem

Google Scholar is incredibly powerful for finding academic literature, but it has one major limitation: it tells you nothing about venue quality. A paper from a top-tier conference like NeurIPS appears the same as one from an unknown workshop. You’re left to manually check each venue’s reputation—a tedious process that slows down literature reviews.

The Solution

Google Scholar Orderer is a free browser extension that enhances Google Scholar with:

🏆 Venue Ranking Badges

See color-coded badges instantly showing venue quality from multiple ranking systems:

  • CORE Rankings (A*, A, B, C) — The gold standard for computing research venues
  • SJR Quartiles (Q1-Q4) — Scimago Journal Rankings based on Scopus data
  • JCR Quartiles (Q1-Q4) — Journal Citation Reports impact metrics
  • h5-index — Google Scholar’s own 5-year citation metric

📊 Sort by Citations

Reorder any search results by citation count with a single click. Find the most influential papers first, or discover hidden gems with fewer citations.
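
As a rough illustration of the reordering idea (not the extension's actual code, which runs in the browser), the PHP sketch below sorts a list of results by citation count. The sample data, the `cites` key, and the `sortByCitations` helper are all assumptions made for this example.

```php
<?php
// Made-up search results; "cites" is kept as text, as it would be when scraped.
$results = [
    ['title' => 'Paper A', 'cites' => '12'],
    ['title' => 'Paper B', 'cites' => ''],    // no citation count shown
    ['title' => 'Paper C', 'cites' => '340'],
];

/** Sort results by citation count; an empty count is treated as zero. */
function sortByCitations(array $results, bool $descending = true): array
{
    usort($results, function (array $a, array $b) use ($descending): int {
        $order = (int) $a['cites'] <=> (int) $b['cites'];
        return $descending ? -$order : $order;
    });
    return $results;
}

foreach (sortByCitations($results) as $result) {
    echo $result['title'], ' (', (int) $result['cites'], ")\n";
}
```

Passing `false` as the second argument gives the ascending order, which is one way to surface the less-cited papers first.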

👤 Author Profile Analysis

Visit any researcher’s profile page and instantly see a visual breakdown of their publication quality distribution. Great for evaluating potential collaborators, reviewers, or candidates.

🔍 Smart Venue Matching

Google Scholar often truncates long venue names with “…”. The extension intelligently matches these partial names against its database. When there’s ambiguity, a simple “?” button lets you fetch the complete venue name with one click.

How It Works

The extension works entirely in your browser. When you search on Google Scholar:

  1. It extracts venue names from each search result
  2. Matches them against a local database of 500+ ranked venues
  3. Displays badges showing all available ranking information
  4. Provides rich tooltips with full venue details on hover

Privacy-focused: All matching happens locally. No data is sent to external servers.
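
The matching step can be sketched as follows. This is a minimal illustration in PHP rather than the extension's own code: the venue names, ranks, and the `matchVenue` helper are hypothetical, and the real database and matching rules may well differ. A displayed name ending in “…” is treated as a prefix; more than one hit corresponds to the ambiguous case where the “?” button would be needed.

```php
<?php
// Hypothetical local ranking table; names and ranks are illustrative only
// and are not taken from the extension's database.
$venues = [
    'Example Conference on Learning Systems' => ['core' => 'A*'],
    'Example Conference on Logic'            => ['core' => 'B'],
    'Example Journal of Control'             => ['sjr'  => 'Q1'],
];

/**
 * Return every venue matching the name shown in a search result.
 * A name ending in "…" is matched as a prefix; any other name must
 * match in full. Comparison is case-insensitive.
 */
function matchVenue(string $shown, array $venues): array
{
    $ellipsis  = '…';
    $shown     = trim($shown);
    $truncated = substr($shown, -strlen($ellipsis)) === $ellipsis;
    $needle    = $truncated ? rtrim(substr($shown, 0, -strlen($ellipsis))) : $shown;

    $matches = [];
    foreach ($venues as $name => $ranks) {
        $hit = $truncated
            ? strncasecmp($name, $needle, strlen($needle)) === 0
            : strcasecmp($name, $needle) === 0;
        if ($hit) {
            $matches[$name] = $ranks;
        }
    }
    return $matches;
}

// A truncated name that fits two venues: the ambiguous case.
$hits = matchVenue('Example Conference on…', $venues);
echo count($hits) === 1 ? 'unique match' : count($hits) . ' candidates', "\n";
```

Because the table lives in a local array, nothing in this sketch touches the network, which mirrors the local-only matching described above.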

Who Is This For?

  • PhD students conducting literature reviews
  • Researchers evaluating where to submit their work
  • Hiring committees assessing candidate publication records
  • Anyone who wants to quickly gauge publication quality

Get Started

The extension is free and open source. It works with Chrome, Edge, Firefox, Brave, and Opera.

GitHub: https://github.com/Kiguli/Google-Scholar-Orderer

Installation takes under a minute—just download, enable developer mode in your browser, and load the extension.


Have feedback or found a venue that should be included? Open an issue on GitHub or leave a comment below.


Scraping Google Scholar
https://woodingben.com/scraping-google-scholar/
Fri, 26 Aug 2022 23:13:52 +0000

In this post I demo how to scrape information from Google Scholar.

Scraping Total Citations

The following snippet of code retrieves the total number of citations from a user's profile. Replace [USERNAME] with the profile username (found in the Scholar page URL) and [LANGUAGE] with the language you wish to use (en = English). The value can then be embedded in your website, for example on a button or inside the text. WordPress snippets make this easy: copy the code below into a new snippet and select the option “only display when inserted into a post or page”. A small shortcode will then be available to embed in your webpage that looks something like {code_snippet id=[id] name = [name] php format} (note: replace the curly brackets with square brackets; I used curly ones to stop WordPress trying to run the snippet!).

				
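<?php

// Profile URL: replace [USERNAME] and [LANGUAGE] before use.
$profile = "https://scholar.google.com/citations?user=[USERNAME]&hl=[LANGUAGE]&oi=ao";

// Download the profile page (false if the request fails).
$contents = file_get_contents($profile);

// First row, second cell of the statistics table: total citations.
$citations_xpath = '//*[@id="gsc_rsb_st"]/tbody/tr[1]/td[2]';

$value = '';
if ($contents !== false && $contents !== '') {
    $dom = new DOMDocument();
    // Suppress warnings about markup the parser does not recognise.
    @$dom->loadHTML($contents);

    $xpath     = new DOMXPath($dom);
    $citations = $xpath->query($citations_xpath);

    // Read the value only if the cell was found.
    if ($citations !== false && $citations->length > 0) {
        $value = $citations->item(0)->nodeValue;
    }
}

echo $value;

?>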
				
			
Scraping H-Index

As before, the following code snippet acquires the h-index from the profile. Both [USERNAME] and [LANGUAGE] should be replaced.

				
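<?php

// Profile URL: replace [USERNAME] and [LANGUAGE] before use.
$profile = "https://scholar.google.com/citations?user=[USERNAME]&hl=[LANGUAGE]&oi=ao";

// Download the profile page (false if the request fails).
$contents = file_get_contents($profile);

// Second row, second cell of the statistics table: h-index.
$hindex_xpath = '//*[@id="gsc_rsb_st"]/tbody/tr[2]/td[2]';

$value = '';
if ($contents !== false && $contents !== '') {
    $dom = new DOMDocument();
    // Suppress warnings about markup the parser does not recognise.
    @$dom->loadHTML($contents);

    $xpath  = new DOMXPath($dom);
    $hindex = $xpath->query($hindex_xpath);

    // Read the value only if the cell was found.
    if ($hindex !== false && $hindex->length > 0) {
        $value = $hindex->item(0)->nodeValue;
    }
}

echo $value;

?>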
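
Since both snippets read the same statistics table, one possible refinement is to parse both values from a single download. The sketch below is only an illustration: the `parseScholarStats` helper and the sample HTML (with made-up numbers) are assumptions for this example, and it reuses the same XPath expressions as the snippets above, so it depends on the page keeping that structure.

```php
<?php
/**
 * Extract total citations and h-index from the HTML of a profile page.
 * A value stays null when its table cell cannot be found.
 */
function parseScholarStats(string $html): array
{
    $stats = ['citations' => null, 'hindex' => null];
    if ($html === '') {
        return $stats;
    }

    // Collect parser warnings quietly instead of using the @ operator.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath   = new DOMXPath($dom);
    $queries = [
        'citations' => '//*[@id="gsc_rsb_st"]/tbody/tr[1]/td[2]',
        'hindex'    => '//*[@id="gsc_rsb_st"]/tbody/tr[2]/td[2]',
    ];
    foreach ($queries as $key => $query) {
        $nodes = $xpath->query($query);
        if ($nodes !== false && $nodes->length > 0) {
            $stats[$key] = trim($nodes->item(0)->nodeValue);
        }
    }
    return $stats;
}

// Tiny stand-in for the statistics table, with made-up numbers.
$sample = '<html><body><table id="gsc_rsb_st"><tbody>'
        . '<tr><td>Citations</td><td>120</td><td>80</td></tr>'
        . '<tr><td>h-index</td><td>6</td><td>5</td></tr>'
        . '</tbody></table></body></html>';

$stats = parseScholarStats($sample);
echo $stats['citations'], ' citations, h-index ', $stats['hindex'], "\n";
```

With a live profile, the helper could be fed the downloaded page instead of the sample, for instance `parseScholarStats(file_get_contents($profile) ?: '')`.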
				
			
Scraping All Article Information

The following block of code extracts all the information required to construct the table that can be found in my portfolio: recent articles, authors, citations, etc. Replace [USERNAME] and [LANGUAGE] as before. Additionally, [SORT] can be either pubdate for the most recent papers or citedby for the most cited papers. Putting the key author's name in [KEY AUTHOR] makes that name bold wherever it appears, e.g. for me it would be: $keyAuthor = "B Wooding".

				
					<?php

$baseUrl = "https://scholar.google.com";
$profile = "/citations?user=[USERNAME]&view_op=list_works&hl=[LANGUAGE]&sortby=[SORT]";

$contents        = file_get_contents($baseUrl.$profile);
$citations_xpath = '//*[@id="gsc_a_b"]';

$dom = new DOMDocument();
@$dom->loadHTML($contents);

$xpath = new DOMXPath($dom);

$table = $xpath->query($citations_xpath);

// Name to highlight in the author lists (replace the placeholder before use).
$keyAuthor     = "[KEY AUTHOR]";
$keyAuthorHtml = htmlspecialchars($keyAuthor);

$records = [];
foreach ($table as $row) {
    $trs = $row->getElementsByTagName('tr');

    foreach ($trs as $tr) {
        $tds  = $tr->getElementsByTagName('td');
        $td   = $tds->item(0);
        $link = $td ? $td->getElementsByTagName('a')->item(0) : null;

        // Skip rows without a title link or without the cites/year cells.
        if ($link === null || $tds->length < 3) {
            continue;
        }

        $divs = $td->getElementsByTagName('div');

        $title      = $link->nodeValue;
        $titleHref  = $link->getAttribute('href');
        $authors    = $divs->item(0) ? $divs->item(0)->nodeValue : '';
        $conference = $divs->item(1) ? $divs->item(1)->nodeValue : '';
        $cites      = $tds->item(1)->nodeValue;
        $year       = $tds->item(2)->nodeValue;

        // Escape the scraped author list, then wrap the key author in <strong>.
        $authors = htmlspecialchars($authors);
        $authors = str_replace($keyAuthorHtml, "<strong>$keyAuthorHtml</strong>", $authors);

        $records[] = [
            'title'      => $title,
            'titleHref'  => $baseUrl.$titleHref,
            'authors'    => $authors,
            'conference' => $conference,
            'cites'      => $cites,
            'year'       => $year,
        ];
    }
}

?>

<style>
  #php-table {
    font-family: sans-serif;
    color: #18181b;
    border-collapse: collapse;
    width: 100%;
  }

  #php-table thead th {
    padding: 1rem 0.8rem;
  }

  #php-table thead tr th:first-child {
    text-align: center;
  }

  #php-table th,
  #php-table td {
    border: 1px solid #ccc;
    text-align: left;
    padding: 0.6rem 0.8rem;
  }

  #php-table tr:nth-child(even) {
    background-color: #f6f6f6;
  }

  #php-table td > p {
    font-size: smaller;
    color: #777;
    margin: 0.4rem 0 0;
  }
</style>

<table id="php-table">
    <thead>
    <tr>
        <th>Title</th>
        <th>Cited&nbsp;by</th>
        <th>Year</th>
    </tr>
    </thead>
    <tbody>
    <?php
    foreach ($records as $record): ?>
        <tr>
            <td>
                <a target="_blank" rel="noopener" href="<?= htmlspecialchars($record['titleHref']) ?>"><?= htmlspecialchars($record['title']) ?></a>
                <p><?= $record['authors'] ?></p>
                <p><?= htmlspecialchars($record['conference']) ?></p>
            </td>
            <td><?= $record['cites'] ?></td>
            <td><?= $record['year'] ?></td>
        </tr>
    <?php
    endforeach; ?>
    </tbody>
</table>
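
The [USERNAME], [LANGUAGE] and [SORT] placeholders can also be filled in programmatically. The sketch below shows one way to do this while rejecting sort values other than the two described above; the `buildProfileUrl` helper and the `EXAMPLE_ID` value are hypothetical and exist only for this illustration.

```php
<?php
/**
 * Build the publication-list URL for a profile.
 * $sort must be "pubdate" (most recent first) or "citedby" (most cited first).
 */
function buildProfileUrl(string $user, string $language = 'en', string $sort = 'pubdate'): string
{
    if (!in_array($sort, ['pubdate', 'citedby'], true)) {
        throw new InvalidArgumentException("Unsupported sort order: $sort");
    }

    // http_build_query URL-encodes each value; '&' is passed explicitly so the
    // result does not depend on the arg_separator.output ini setting.
    $query = http_build_query([
        'user'    => $user,
        'view_op' => 'list_works',
        'hl'      => $language,
        'sortby'  => $sort,
    ], '', '&');

    return 'https://scholar.google.com/citations?' . $query;
}

// "EXAMPLE_ID" is a placeholder, not a real profile identifier.
echo buildProfileUrl('EXAMPLE_ID', 'en', 'citedby'), "\n";
```

The returned string can be passed straight to `file_get_contents` in place of `$baseUrl.$profile`.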
				
			
