Happy birthday, Charles…

On 14 Feb 2026 by Frederico Mestre in Opinion

A baby was born 217 years ago in Shrewsbury. This child, Charles, would grow into a reluctant revolutionary. At 22, he set out on a voyage that would change both his life and the course of science. Later, he laid out his ideas in a book. Books have a remarkable power: they let their authors whisper their ideas directly into our minds, and they may revolutionise the way we think.

Charles Darwin (1809–1882) travelled around the world aboard the HMS Beagle, collecting observations and specimens, and later withdrew to Down House, where he spent most of his life developing his ideas and writing “On the Origin of Species by Means of Natural Selection”.

Darwin, in 1854, before publishing “On The Origin…”.

A few years ago, while visiting my brother in Cambridge, I went to an exhibition showing Darwin’s famous notebook, the one in which he first sketched his idea, his “Eureka moment”: the Tree of Life, as he conceived it.

This notebook has an interesting story of its own. It went missing back in 2001, and after years of searching, the library’s curators concluded in 2020 that this and another notebook had most likely been stolen. The local police and Interpol were informed. The Director of Library Services at Cambridge appealed for their safe return: “This public appeal could be critical in seeing the notebooks safely return, for the benefit of all, and I would ask anyone who thinks they may be able to help to get in touch. We would be hugely grateful to hear from any staff, past or present, members of the book trade, researchers, or the public at large, with information that might assist in the recovery of the notebooks. Someone, somewhere, may have knowledge or insight that can help us return these notebooks to their proper place at the heart of the UK’s cultural and scientific heritage.”

Then, in March 2022, the notebooks were left anonymously at the library, in a pink gift bag, inside a brown envelope. Whoever returned the two notebooks (this one and another) was kind enough to write “Librarian, Happy Easter X”. A polite thief!

I can’t tell you how excited I was to see his notebook. I’ve always been like this: standing before a painting by a great artist, I imagine them there, brush in hand, shaping the colours and thoughts on the canvas. Walking through a room Churchill once entered, or passing through a castle door that D. Afonso Henriques, the first king of Portugal, once crossed, I kind of imagine their presence. These objects and spaces have a strange power to connect us to the people we admire. And there I was, in the presence of the notebook where Darwin first dared to write down his revolutionary idea.

Way before the internet, Darwin was also innovative in building a global network of correspondents. He exchanged observations, publications, and specimens with naturalists around the world. Some countries were hubs of scientific activity; others, like Portugal (my country), contributed more sparingly. In fact, Darwin’s correspondence with Portuguese naturalists was rare, with only one confirmed correspondent: Francisco Arruda Furtado (1854–1887), a naturalist based in the Azores (their correspondence is available online).

This reminds us that revolutionary ideas spread across oceans, languages, and cultures, often carried by a single letter, a book or a curious mind. Scientific revolutions are inherently exciting, yet I struggle to judge whether they are becoming rarer or more frequent. On one hand, new discoveries seem to appear every day (mostly technological breakthroughs); on the other, transformative ideas like Natural Selection or Relativity remain extraordinarily rare… but this is beside the point…

Let’s celebrate Darwin… a few days past his birthday!

Beautiful Models, Ugly Policies

On 12 Jan 2026 by Frederico Mestre in Opinion

Ecology in 2025 talked big about transformation, but mostly just optimised business‑as‑usual conservation. Knowledge moved faster, models got smarter (AI moved from experimental applications to widespread use in ecological research), and our figures became painfully beautiful (I love that!), yet the underlying conservation machinery barely flinched. We celebrated marginal gains in our models’ performance and ever‑finer global forecasts, while leaving mostly untouched the slow, political, and unequal systems that decide who actually benefits from that knowledge and how quickly it enters real decision-making.

We devised ever more elaborate ways of predicting where species might be and how ecosystems might respond, with Species Distribution Models (SDM) (e.g. Poggiato et al. 2025), ensemble pipelines, sophisticated methods to predict biotic interactions (e.g. Habedank et al. 2025), and AI workflows promising transformative gains (e.g. Rafiq et al. 2025). Every week, papers are published that bring some incremental improvement to SDM, to the inference of biotic interactions or to how ecological networks respond to every type of disturbance (the areas to which I pay particular attention). Nevertheless, most of these advances amounted to smoother code, slightly higher performance scores, and prettier global layers sitting atop fundamentally unchanged institutions, incentives, and timelines for action, leaving the underlying distribution of vulnerability and influence in conservation largely intact.

Meanwhile, a lag remains between publishing a risk map and anyone with power having to respond to it. In 2025, we became more efficient at producing knowledge about unequal vulnerabilities than at changing the structures that perpetuate those vulnerabilities, turning high-resolution forecasts into yet another layer of documentation on problems we still struggle to politically confront.

Concerningly, rising political instability, trade conflicts, and democratic backsliding diverted attention and bargaining power away from climate and biodiversity commitments, even as evidence accumulated that climate change and biodiversity loss are closely linked and rapidly worsening (Keck et al. 2025). The current political scene and the lack of foresight of political leaders (who treat climate change as a hoax, allow the exploitation of protected areas, or ignore biodiversity altogether in public policy) are pushing climate change and biodiversity loss to the back of everyone’s minds.

While we are incrementally using the vast technological capabilities we have at our disposal to create better ecological models and predictions, the world is gradually paying less attention. However, climate change and biodiversity loss do not care about our political divisions and beliefs (reality has the crazy habit of existing whether you believe in it or not).

The question we have to ask ourselves is: what good are beautiful models and amazing results if almost everyone in executive power is looking away?

A good example, the recent recovery of the Iberian Lynx (Picture by David Osta – Pixabay).

Diversity and Extinction Risk of Amphibians in the Colombian Andes

On 28 Nov 2025 by Vinicius Bastazini in Publications & Posters

The Andean mountain range in South America is characterized by remarkable topographic diversity, with steep altitudinal gradients occurring over a relatively small spatial extent. This rapid shift in elevation creates a wide range of environments, microclimates and habitats, fostering high levels of species diversity and endemism. As a result, this “compressed” vertical landscape contributes to high levels of beta diversity, as communities change significantly over short distances, making the Andes one of the most biologically rich and ecologically and evolutionarily dynamic regions on the planet.

Amphibians are a striking example of this extraordinary biodiversity, with the Colombian portion of the Andes forming the heart of this amphibian hotspot (Figure 1). The country hosts 892 amphibian species, making it the second richest country in the world in the number of amphibian species (Figure 1), while also (sadly!) ranking first in the number of threatened species. The distribution of amphibian species is structured across four distinct elevational life zones in the Colombian Andes, each reflecting the region’s broad range of ecosystems: Tropical (100–1,000 m), Sub-Andean (1,000–2,350 m), Andean (2,350–3,500 m), and Páramo (above 3,500 m).
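To make the four elevational bands concrete, here is a minimal sketch that assigns an elevation, or a species’ elevational range, to life zones. The band limits come from the text above. The helper names (`life_zone`, `zones_spanned`) are hypothetical, and treating each lower limit as inclusive is our own assumption, not a rule stated in the study.

```python
from bisect import bisect_right
from typing import List, Optional

# Lower elevation limit (m a.s.l.) of each life zone, as listed in the text.
ZONE_BREAKS = [100, 1000, 2350, 3500]
ZONE_NAMES = ["Tropical", "Sub-Andean", "Andean", "Páramo"]


def life_zone(elevation_m: float) -> Optional[str]:
    """Return the life zone for one elevation, or None below 100 m.

    Assumption: each lower limit is inclusive (1,000 m counts as Sub-Andean).
    """
    idx = bisect_right(ZONE_BREAKS, elevation_m) - 1
    return ZONE_NAMES[idx] if idx >= 0 else None


def zones_spanned(low_m: float, high_m: float) -> List[str]:
    """Return every life zone overlapped by an elevational range [low_m, high_m]."""
    uppers = ZONE_BREAKS[1:] + [float("inf")]
    return [
        name
        for name, lower, upper in zip(ZONE_NAMES, ZONE_BREAKS, uppers)
        if low_m < upper and high_m >= lower
    ]


print(life_zone(1000))           # Sub-Andean
# A hypothetical species recorded between 800 m and 2,500 m.
print(zones_spanned(800, 2500))  # ['Tropical', 'Sub-Andean', 'Andean']
```

Working with ranges rather than single elevations matters, because many species straddle two or more zones and would then be counted in each of them.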

Figure 1. A sample of amphibian species illustrating the richness of the Colombian Andes. Photos by Pamela González-del-Pliego.

However, despite its ecological importance, this mountainous ecosystem has been severely impacted by habitat degradation since the sixteenth century, with rates of transformation accelerating in recent decades. Today, the Colombian Andes are home to nearly 70% of the country’s human population, placing immense anthropogenic pressure on local ecosystems. As in other parts of the planet, agricultural expansion, cattle grazing, and infrastructure development have led to severe habitat loss and fragmentation. These pressures are directly linked to alarming declines in amphibian populations. These high levels of biological diversity and endemism, coupled with intense anthropogenic pressure, make the Colombian Andes a critical hotspot for both biodiversity and conservation challenges.

In a new study, we explored changes in amphibian diversity and extinction risk across the different life zones of the Colombian Andes, providing novel insights into biodiversity patterns, shifts in species composition across spatial and altitudinal gradients, threat status, and the potential responses of amphibian assemblages to ongoing human impacts in the region.

Our results show that amphibian species richness in the Colombian Andes is highest in the Sub-Andean region, followed by the Andean, Tropical, and Páramo regions. Overall, species richness peaks at intermediate elevations, reflecting the strong influence of elevation on community composition. Although we found clear differences in species composition among life zones, taxonomic beta-diversity across regions is largely dominated by the nestedness component: species in less diverse sites are subsets of those in more species-rich sites. The Páramo, despite having lower richness, contributes the most to total beta-diversity and to the nestedness component.
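The split between nestedness and turnover is commonly computed with Baselga’s (2010) partition of Sørensen dissimilarity. The sketch below shows the pairwise version for two presence lists. We don’t know whether the chapter used this pairwise form, its multiple-site extension, or another index, so treat it only as an illustration of the idea. The species names are made up.

```python
from typing import Set, Tuple


def sorensen_partition(site_a: Set[str], site_b: Set[str]) -> Tuple[float, float, float]:
    """Pairwise partition of Sørensen dissimilarity (after Baselga 2010).

    Returns (beta_sor, beta_sim, beta_sne): total dissimilarity, its
    turnover (Simpson) component, and its nestedness-resultant component.
    """
    if not site_a or not site_b:
        raise ValueError("both sites need at least one species")
    a = len(site_a & site_b)  # species shared by both sites
    b = len(site_a - site_b)  # species only in site A
    c = len(site_b - site_a)  # species only in site B
    beta_sor = (b + c) / (2 * a + b + c)
    beta_sim = min(b, c) / (a + min(b, c))
    beta_sne = beta_sor - beta_sim
    return beta_sor, beta_sim, beta_sne


# Hypothetical assemblages: the poorer one is a strict subset of the richer one,
# so all dissimilarity is due to nestedness and none to turnover.
richer = {"sp1", "sp2", "sp3", "sp4"}
poorer = {"sp1", "sp2"}
print(sorensen_partition(richer, poorer))  # total ≈ 0.333, turnover 0.0, nestedness ≈ 0.333
```

Two sites with no species in common would instead give pure turnover (total = turnover = 1, nestedness = 0).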

We also found that threat levels among amphibian species increase with elevation. In the Tropical region, approximately 30% of species are threatened, whereas in the Sub-Andean, Andean, and Páramo regions the percentages rise to 55%, 66%, and 60%, respectively. This pattern parallels human density and the proportion of pasture within species’ ranges, both of which also increase with elevation. Species in the Tropical region tend to have significantly larger ranges than species from higher-elevation zones. Consequently, regions with higher human density and greater agricultural intensification tend to harbor a disproportionately high share of threatened species relative to non-threatened ones, underscoring the strong link between anthropogenic pressures and extinction risk in Andean amphibians.
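Percentages like these come down to simple bookkeeping: for each zone, divide the number of threatened species by the total number of species present. The sketch below assumes the IUCN convention that “threatened” means Vulnerable (VU), Endangered (EN) or Critically Endangered (CR). It counts a species in every zone it occurs in and keeps Data Deficient species in the denominator. Those counting rules are our assumptions, and the study may have handled them differently. All species and categories are invented.

```python
from collections import Counter
from typing import Dict, List, Tuple

THREATENED = {"VU", "EN", "CR"}  # IUCN threatened categories


def threatened_share(species: Dict[str, Tuple[List[str], str]]) -> Dict[str, float]:
    """Share of threatened species per life zone.

    `species` maps a species name to (zones it occurs in, IUCN category).
    A species occurring in several zones is counted in each of them.
    """
    totals: Counter = Counter()
    threatened: Counter = Counter()
    for zones, category in species.values():
        for zone in zones:
            totals[zone] += 1
            if category in THREATENED:
                threatened[zone] += 1
    return {zone: threatened[zone] / totals[zone] for zone in totals}


# Entirely hypothetical species, zones, and categories.
toy = {
    "sp_a": (["Tropical"], "LC"),
    "sp_b": (["Tropical", "Sub-Andean"], "VU"),
    "sp_c": (["Sub-Andean", "Andean"], "EN"),
    "sp_d": (["Andean", "Páramo"], "CR"),
    "sp_e": (["Páramo"], "NT"),
}
print(threatened_share(toy))
```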

Our study advances the fundamental understanding of amphibian distribution patterns by highlighting differences in species richness, beta-diversity, and the proportion of threatened species across the four life regions, as well as the effects of human impacts on each region and its vulnerable species. These insights can guide conservation strategies and the spatial arrangement of protected areas to safeguard the long-term persistence of this unique biodiversity hotspot.

To find out more, read our new study “Diversity and extinction risk of Colombian Andean amphibians across life regions”, just published in the book Andean herpetofauna – explorations of diversity, ecology, and conservation, which offers a comprehensive and multidisciplinary perspective on the region’s amphibians and reptiles. The book is organized into 18 chapters covering a wide range of themes, including diversity and distribution patterns, evolutionary histories and life strategies, and ecological and physiological adaptations to high-altitude environments. It also addresses pressing conservation challenges, emerging threats, and the critical knowledge gaps that must still be filled to design effective conservation policies and strategies for the Andean herpetofauna.

Scientific Dialogues

On 15 Oct 2025 by Frederico Mestre in Opinion

This list focuses on (what I’m calling) scientific dialogues—in which researchers respond to each other’s work using published papers as messengers. Often, a paper sparks a reply, which may, in itself, prompt a response, creating a back-and-forth between research teams. What emerges is a public conversation carried out in the pages of scientific journals.

It is important to clarify that this list reflects my personal scientific perspective and research interests.

1. Classical metapopulations: are they real?

I haven’t worked directly with metapopulations in a few years, but my PhD had a strong component related to the ecological concept of “metapopulation”. But what is a metapopulation? It is neither a collection of isolated populations nor a single large population. It is something in between: a set of “populations” connected by occasional dispersal events. What does this mean in practical terms? Imagine a species, such as the one I studied in my PhD (Microtus cabrerae), that depends on a specific type of habitat patch with a specific set of plant species. Most individuals remain within these patches, but a few travel between them through the surrounding “habitat matrix”. But do metapopulations exist in reality?

I think that, like so many other things in Science, and in Ecology in particular, the “metapopulation” concept is a useful simplification of reality, one that makes the complexity of Nature easier to understand. This is the debate sparked by these two papers.

The classical metapopulation theory and the real, natural world: a critical appraisal – Baguette (2004)

This paper critically evaluates classical metapopulation theory, arguing that it has limited applicability to real-world populations. The author shows that most empirical examples are actually non-equilibrium systems found at the margins of species’ distributions with small, declining populations. Through butterfly case studies, Baguette demonstrates that metapopulational dynamics occur only when local populations are very small, while more stable systems exist where most habitat patches remain occupied. He concludes that classical metapopulation theory should not be universally applied to conservation planning without first verifying that real populations match the theory’s assumptions about high turnover rates and equilibrium states.
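The “equilibrium” and “turnover” assumptions Baguette wants checked are easiest to see in the textbook classical metapopulation model of Levins (1969). That model is not taken from Baguette’s paper. It describes the fraction p of occupied patches with dp/dt = c·p·(1 − p) − e·p, where c is the colonization rate and e the extinction rate. At equilibrium p* = 1 − e/c, yet individual patches keep winking in and out. The rate values below are hypothetical.

```python
def levins_equilibrium(c: float, e: float) -> float:
    """Equilibrium fraction of occupied patches, p* = 1 - e/c (0 if e >= c)."""
    return max(0.0, 1.0 - e / c)


def levins_simulate(c: float, e: float, p0: float, dt: float = 0.01, steps: int = 5000) -> float:
    """Integrate dp/dt = c*p*(1 - p) - e*p with simple Euler steps."""
    p = p0
    for _ in range(steps):
        p += dt * (c * p * (1.0 - p) - e * p)
    return p


c, e = 0.5, 0.1  # hypothetical colonization and extinction rates
p_star = levins_equilibrium(c, e)
p_end = levins_simulate(c, e, p0=0.1)
# At equilibrium, patches keep turning over: extinctions per unit time (e * p*)
# are exactly balanced by colonizations (c * p* * (1 - p*)).
print(round(p_star, 3), round(p_end, 3), round(e * p_star, 3))
```

If real local populations are large and rarely go extinct (e close to 0), there is little turnover to speak of, which is essentially Baguette’s point about stable systems.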

Metapopulation theory, its use and misuse – Hanski (2004)

The response by Hanski, one of the main proponents of metapopulations, contains a line that truly makes you pause and say, ‘wow!’: “I start with a personal statement. Though Baguette (2004) does not say so clearly, his criticism is rather squarely focused on my work.” Hanski critically engages with Baguette’s appraisal of metapopulation theory, defending patch-occupancy models as powerful tools for understanding species persistence in highly fragmented landscapes. He emphasizes that classical metapopulation models, especially those using the incidence function approach (a mathematical depiction of the concept), are most useful for systems where many habitat patches exist and substantial population turnover occurs, and that they have made notable contributions to clarifying extinction thresholds and transient dynamics. He concludes that while classical metapopulation models may have limited value for small patch networks, they remain essential for studying the dynamics and conservation of species in fragmented and specialized habitats.
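For readers who have never seen the incidence function approach, here is a sketch of the basic incidence function model as we understand it from Hanski (1994), without a rescue effect. Patch i’s connectivity is S_i = Σ exp(−α·d_ij)·A_j^b over occupied patches j. Its colonization probability is C_i = S_i² / (S_i² + y²), its extinction probability is E_i = min(1, e / A_i^x), and its long-run incidence is J_i = C_i / (C_i + E_i − C_i·E_i). The parameter values are arbitrary placeholders, not estimates for any real system.

```python
import math
from typing import List, Sequence, Tuple


def incidence(
    areas: Sequence[float],
    coords: Sequence[Tuple[float, float]],
    occupied: Sequence[bool],
    alpha: float = 1.0,  # inverse of mean dispersal distance (placeholder)
    b: float = 0.5,      # scaling of emigration with source-patch area (placeholder)
    y: float = 1.0,      # colonization parameter (placeholder)
    e: float = 0.5,      # extinction parameter (placeholder)
    x: float = 1.0,      # scaling of extinction risk with patch area (placeholder)
) -> List[float]:
    """Long-run occupancy probability J_i of each patch (basic IFM, no rescue effect)."""
    result = []
    for i, (area_i, xy_i) in enumerate(zip(areas, coords)):
        # Connectivity: occupied neighbours weighted by their size and distance.
        s = sum(
            math.exp(-alpha * math.dist(xy_i, xy_j)) * area_j ** b
            for j, (area_j, xy_j) in enumerate(zip(areas, coords))
            if j != i and occupied[j]
        )
        colonization = s ** 2 / (s ** 2 + y ** 2)
        extinction = min(1.0, e / area_i ** x)
        result.append(colonization / (colonization + extinction - colonization * extinction))
    return result


# Hypothetical network: two nearby patches and one isolated patch.
probs = incidence(areas=[4.0, 1.0, 1.0], coords=[(0, 0), (1, 0), (5, 0)], occupied=[True, True, True])
print([round(v, 3) for v in probs])
```

The isolated patch ends up with a near-zero incidence. This is the kind of patch-level prediction Hanski argues is useful in large, fragmented networks.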

2. Climate change and species extinction

I’ve also worked with species range shifts as a result of climate change. In fact, my PhD thesis was precisely about this issue. This paper, by Thomas et al. (2004), is widely regarded as a landmark study addressing the effects of climate change on species distributions. It elicited a few responses in Nature (I’m not sure if there are more; these were the ones I could find!).

Extinction risk from climate change – Thomas et al. (2004)

This foundational study estimates extinction risks for over a thousand species under future climate scenarios by modeling species’ climatic niches and projecting range shifts by 2050. Using species–area relationships with different dispersal assumptions, the authors predict that 15–37% of species in their sample regions could be committed to extinction, with risks increasing under more severe warming. The work highlights climate change as a major threat comparable to habitat loss and underlines the importance of rapidly reducing greenhouse gas emissions.
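The species–area logic behind these numbers is E = 1 − (A_new / A_orig)^z, with z = 0.25. As we understand it, Thomas et al. applied this in three ways: to the summed ranges of all species, to the mean proportional range change, and per species before averaging. The sketch below implements those three variants on invented range sizes. It is an illustration, not a reproduction of their analysis.

```python
from typing import Sequence, Tuple


def extinction_estimates(
    area_now: Sequence[float], area_future: Sequence[float], z: float = 0.25
) -> Tuple[float, float, float]:
    """Three species-area based extinction estimates, E = 1 - (A_new / A_orig) ** z."""
    n = len(area_now)
    ratios = [new / old for old, new in zip(area_now, area_future)]
    e1 = 1 - (sum(area_future) / sum(area_now)) ** z  # summed ranges of all species
    e2 = 1 - (sum(ratios) / n) ** z                    # mean proportional range change
    e3 = sum(1 - r ** z for r in ratios) / n           # mean of per-species estimates
    return e1, e2, e3


# Hypothetical ranges (km²): one widespread species, two narrow-ranged ones.
now = [1000.0, 100.0, 10.0]
future = [900.0, 50.0, 1.0]
print([round(v, 3) for v in extinction_estimates(now, future)])  # [0.038, 0.159, 0.208]
```

The three variants can diverge a lot on the same data. Here the summed-range version is dominated by the single widespread species and gives by far the lowest estimate.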

Effects of changes in climate and land use – Buckley & Roughgarden (2004)

This response challenges Thomas et al.’s methodology, arguing that summing individual species’ range areas introduces circularity and overestimates extinction risk. They caution that extinction predictions based on aggregated range sums do not correctly reflect species-area relationships and call for more nuanced approaches to assessing habitat loss and climate impacts. They acknowledge that climate change remains a valid extinction threat despite these methodological concerns.

Climate change and extinction risk – Harte et al. (2004)

Harte and colleagues point out that Thomas et al. may overestimate extinction risks by assuming species respond uniformly across their range and ignoring population-level genetic adaptation to climate. They discuss how species-level climate envelopes may mask important local adaptations, which could buffer some impacts. They advocate for integrating population genetics and refined spatial methods to improve extinction projections under climate change.

Uncertainty in predictions of extinction risk – Thuiller et al. (2004)

This paper emphasises the large uncertainties that arise from varying species distribution modeling techniques used to predict climate impacts on species ranges and extinctions. It illustrates how different modeling approaches may lead to notably different extinction estimates and flags sensitivity to key parameters like the species–area relationship exponent. The authors call for caution in interpreting extinction risk results and for more research to quantify model uncertainties.
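The sensitivity to the species–area exponent is easy to demonstrate. Values of z between roughly 0.15 and 0.35 are commonly quoted, and Thomas et al. used 0.25. The sketch below keeps the projected range changes fixed and varies only z. The range ratios are invented.

```python
from typing import Sequence


def mean_extinction_risk(range_ratios: Sequence[float], z: float) -> float:
    """Average per-species risk 1 - (A_new / A_orig) ** z."""
    return sum(1 - r ** z for r in range_ratios) / len(range_ratios)


# Hypothetical proportions of current range retained under a future climate.
ratios = [0.9, 0.6, 0.3, 0.1]
for z in (0.15, 0.25, 0.35):
    print(f"z = {z:.2f}: mean extinction risk = {mean_extinction_risk(ratios, z):.3f}")
```

For any species that loses range (ratio below 1), a larger z always yields a larger risk. This is why the choice of exponent, together with the choice of distribution model, propagates directly into the headline extinction estimate.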

Uncertainty in predictions of extinction risk/Effects of changes in climate and land use/Climate change and extinction risk (reply) – Thomas et al. (2004)

In their reply, Thomas et al. defend their original extinction risk estimates, reaffirming that climate change poses a major threat to biodiversity. While they acknowledge the uncertainties highlighted by their critics, they argue these do not undermine the overall risk signal. Their response underscores the urgent need for mitigation and adaptation actions to prevent large-scale extinctions.

3. Are we modelling species niches or their distributions?

This discussion revolves around whether correlative models in Ecology should be referred to as Ecological Niche Models (ENMs) or Species Distribution Models (SDMs), and what assumptions, implications, and conceptual clarity each term brings to ecological modeling. A great deal of my work in Ecology has involved projecting species’ potential future distributions using these modelling approaches. I tend to agree with Warren’s argument, preferring ENM to SDM. What we model is, in fact, the portion of a species’ ecological niche that its occurrences allow us to see, even though parts of the niche may remain unoccupied.
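A deliberately crude, BIOCLIM-style rectangular envelope shows the point about the niche we can see. The “model” is simply the range of each environmental variable observed at occurrence records. Any condition the species tolerates but was never recorded in falls outside the envelope. This is not Warren’s method or any particular package’s implementation. The variable names and values are hypothetical.

```python
from typing import Dict, List, Tuple

Envelope = Dict[str, Tuple[float, float]]


def fit_envelope(occurrences: List[Dict[str, float]]) -> Envelope:
    """Min-max range of each environmental variable across occurrence records."""
    variables = occurrences[0].keys()
    return {
        v: (min(o[v] for o in occurrences), max(o[v] for o in occurrences))
        for v in variables
    }


def inside_envelope(envelope: Envelope, site: Dict[str, float]) -> bool:
    """True if every variable at the site falls within the observed range."""
    return all(lo <= site[v] <= hi for v, (lo, hi) in envelope.items())


# Hypothetical records: mean annual temperature (°C) and annual precipitation (mm).
records = [
    {"temp": 14.0, "precip": 600.0},
    {"temp": 16.5, "precip": 750.0},
    {"temp": 15.2, "precip": 520.0},
]
env = fit_envelope(records)
print(env)                                                    # {'temp': (14.0, 16.5), 'precip': (520.0, 750.0)}
print(inside_envelope(env, {"temp": 15.0, "precip": 700.0}))  # True
print(inside_envelope(env, {"temp": 18.0, "precip": 700.0}))  # False
```

The 18 °C site is rejected only because no occurrence was recorded there, not because the species is known to be unable to live there. Whether one calls this estimate a “niche” or a “distribution” is exactly what the exchange below is about.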

In defense of ‘niche modeling’ – Warren (2012)

The author argues for retaining the term “niche modeling,” claiming that most applications of such models inherently assume they estimate some subset of the conditions under which species can survive and reproduce—that is, the niche. He contends that using only “distribution modeling” obscures the necessary niche assumption and may hinder progress by hiding the underlying conceptual framework. Warren acknowledges methodological and data limitations, but insists that explicit recognition of the niche assumption is crucial, especially when the models are used for evolutionary and conservation studies.

‘Niche’ or ‘distribution’ modelling? A response to Warren – McInerny & Etienne (2013)

In their response, McInerny and Etienne suggest that “niche” is confounding and impedes conceptual progress; instead, they advocate for the greater clarity offered by “species distribution modeling” (SDM). They assert that “niche” holds multiple, historically loaded meanings which introduce ambiguity about what the models are estimating. They further argue that SDM is better positioned to drive methodological and theoretical innovation, and that robust science calls for transparent acknowledgment of the data’s limitations and the many processes—abiotic, historical, dispersal, biotic interactions—that shape real species distributions. Ultimately, they see SDM as a more neutral, integrative term, challenging users to clarify model purposes and limitations.

‘Niche modeling’: that uncomfortable sensation means it’s working. A reply to McInerny and Etienne – Warren (2013)

Warren’s reply reasserts that the “niche” perspective is vital precisely because it provokes discomfort and skepticism, which in turn foster greater caution and transparency in model construction and interpretation. He challenges the argument that terminology should be exclusively dictated by the input data, pointing out that the aim of SDM/ENM is to model a biological phenomenon, not just data patterns. He cautions that “distribution” terminology may cloak conceptual pitfalls, whereas “niche” makes underlying assumptions visible and open to scrutiny—an outcome he sees as beneficial for the field.

4. Are inferred food webs a reliable method to address the effects of land-use at a continental scale?

In recent years, I’ve been increasingly involved in studies addressing the effects of human-induced disturbances on ecological interactions, mostly food webs. I have already referred to this particular question in a previous post.

In a nutshell, the study by Botella et al. (2024) found that land-use intensification reduces predator presence and alters food web structure. This approach, however, was critiqued by Brimacombe et al. (2024) for relying on assumptions and lacking empirical validation. Botella and colleagues responded by acknowledging the limitations and stressing the difficulty of collecting fine-scale data. The exchange reveals key challenges in ecological network inference and the importance of robust methods.

Land-use intensity influences European tetrapod food webs – Botella et al. (2024)

Botella et al. present a macroecological study reconstructing local meta-food webs for more than 67,000 European tetrapod communities using species presence data and a metaweb of known interactions. They conclude that increasing land-use intensity generally reduces the proportions of apex predators and basal species. However, some contexts show divergent responses (e.g., Mediterranean forests and Atlantic croplands), indicating strong context dependence. The method reconstructs food webs by assuming that a known interaction occurs whenever both species co-occur in a cell, focusing on spatial comparison rather than real-time interaction sampling (this relates to the section below).
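The core inference step can be sketched in a few lines. Keep every metaweb link whose predator and prey both occur in the same cell, then summarize the resulting local web. Here we take “basal” to mean a species with no prey among the species present and “apex” a species with no predator among them. That is our simplification, and Botella et al.’s definitions and pipeline may differ. The metaweb and cells are hypothetical.

```python
from typing import Set, Tuple

Link = Tuple[str, str]  # (predator, prey)


def local_web(metaweb: Set[Link], present: Set[str]) -> Set[Link]:
    """Keep every metaweb link whose predator and prey both occur in the cell."""
    return {(pred, prey) for pred, prey in metaweb if pred in present and prey in present}


def basal_and_apex_share(metaweb: Set[Link], present: Set[str]) -> Tuple[float, float]:
    """Proportion of present species with no local prey (basal) and no local predator (apex)."""
    links = local_web(metaweb, present)
    predators = {pred for pred, _ in links}
    prey = {pr for _, pr in links}
    n = len(present)
    basal = sum(1 for s in present if s not in predators) / n
    apex = sum(1 for s in present if s not in prey) / n
    return basal, apex


# Hypothetical metaweb and two hypothetical grid cells.
metaweb = {
    ("fox", "rabbit"), ("fox", "vole"), ("owl", "vole"),
    ("eagle", "fox"), ("eagle", "rabbit"),
}
cell_a = {"fox", "rabbit", "vole", "owl", "eagle"}
cell_b = {"fox", "rabbit", "vole"}
print(basal_and_apex_share(metaweb, cell_a))  # (0.4, 0.4)
print(basal_and_apex_share(metaweb, cell_b))  # basal 2/3, apex 1/3
```

Note that `local_web` encodes precisely the assumption that co-occurrence implies interaction, with no information on when or how often the species actually meet.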

Applying a method before its proof of concept: A cautionary tale using inferred food webs – Brimacombe et al. (2024)

Brimacombe et al. challenge the validity and practical utility of inferred meta-food web approaches. Their main point concerns the assumption that species interactions will always occur wherever two species co-occur (even years apart), which, according to them, lacks ecological realism. Furthermore, they question the strength of the statistical effect of land-use intensity, concluding that Botella et al. interpret their results, and frame their title, too boldly.

Don’t bite the hand that feeds you: Meta food webs help in the face of the Eltonian shortfall – Botella et al. (2024)

Botella et al. respond by defending local meta-food web approaches. They argue this method is widely used in Ecology when empirical interactions are hard to sample, and comparisons across space are needed. They acknowledge the limitation of vague temporal resolution but note that empirical food web sampling is also fraught with shortfalls (scarcity, bias, limited temporal/spatial window). They contend their statistical approach focuses more on effect sizes relevant to context than mere significance, and that context-dependent findings are ecologically valid. They call for future validation with empirical food webs but maintain that current meta-food web approaches are a reasonable, scalable starting point for macroecology.

5. Is species co-occurrence a good proxy for inferring biotic interactions?

This is closely related to the previous section. Through my work on the biogeography of food webs, I’ve resorted to two sources to obtain local networks for addressing the effects of human disturbances: empirical food webs downloaded from databases, and local food webs inferred from co-occurrence and information on trophic interactions.

There’s a long and rich history of debate on this topic, but here I’ll just refer to a special issue of the journal Biodiversity Informatics titled “Debate: Ecological Interactions and Geographic Co-occurrence”. This issue brings together five key papers, two from each side of the debate plus a concluding commentary by a third author, offering an excellent overview of the main arguments. While many other papers have contributed to this controversy, they fall outside the scope we’ve chosen to highlight: direct responses between authors.

Generally, the argument for co-occurrence is that “co-occurrence is a necessary but not sufficient condition for a biotic interaction to occur”, and this is precisely the argument made by Stephens et al. in their response to Peterson. I must say, I’ve implicitly (and explicitly) used this argument to infer interactions at large (continental) spatial scales, and I think it is a fair argument at these spatial scales.

That’s it!

Many other exchanges between researchers could be included here, I’m sure. These are the ones that felt most relevant to my work at different stages of my research. If you recall others that might be of interest in Ecology, feel free to share them in the comments below!

Climate Change Impacts on the Biodiversity of the Brazilian Cerrado: A Synthesis

On 2 Oct 2025 by Vinicius Bastazini in Publications & Posters

The Brazilian Cerrado is a vast tropical savanna and one of the most important biodiversity hotspots on the planet. Spanning nearly 2 million km² (Fig. 1), it encompasses a dynamic mosaic of landscapes that sustains thousands of plant and invertebrate species, along with hundreds of amphibians, reptiles, birds, and mammals (Fig. 2), many of them found nowhere else in the world. Some estimates put its total species richness above 320,000 (Parron et al. 2008). Beyond its extraordinary biological richness, the Cerrado is also known as the “cradle of waters” (“berço das águas” in Portuguese), playing a crucial role in maintaining South America’s hydrological cycle, as it is home to the headwaters of major river basins and important underground aquifers (Latrubesse et al. 2019).

Despite its ecological and hydrological importance, less than 20% of the Cerrado’s native vegetation remains somewhat undisturbed, and only around 7.5% of its area is protected within public conservation areas (Strassburg et al. 2017). Over the past decades, the Cerrado has become Brazil’s main agricultural frontier and a global center for commodity production (World Economic Forum 2024), which has intensified the anthropogenic pressures in the region. Climate change has reinforced these pressures, with recent evidence showing that the Cerrado is becoming hotter and drier, experiencing longer and more severe dry seasons (Hofmann et al. 2021, 2023), which threaten its unique biodiversity and compromise essential ecosystem functions and services.

Fig. 1. Distribution of the Cerrado (in green), a vast savanna and one of the world’s richest biodiversity hotspots.

In a recently published study, we provide a comprehensive synthesis of the current knowledge on climate change trends in the Cerrado, examining their effects on terrestrial biodiversity across multiple levels of biological organization and on key ecological processes.

Fig. 2. Examples of Cerrado Flora and Fauna. Photos from Brasília and Chapada dos Veadeiros National Parks.

Our review shows that climate change is significantly impacting the Cerrado, with its effects amplified by anthropogenic pressures, particularly land-use conversion, which directly alters fire regimes and the balance of carbon stocks and fluxes in the region. The Cerrado is becoming hotter and drier due to large-scale climate shifts. These include the expansion of the Hadley cells and of the South Atlantic Subtropical Anticyclone, which alter atmospheric circulation and divert moisture, and the warming of the northern tropical Atlantic, which reduces moisture transport into South America. These processes are intensified by the widespread loss of native vegetation driven by the expansion of Brazil’s agricultural frontier, which reduces evapotranspiration and contributes to local warming. The most evident regional changes include rising temperatures, higher vapor pressure deficits, decreased rainfall, and a longer, hotter, and more arid dry season.
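Why do hotter and drier conditions push up the vapor pressure deficit (VPD) so strongly? Saturation vapor pressure rises roughly exponentially with temperature, so VPD grows even if relative humidity stays constant, and falling humidity compounds the effect. The sketch uses the Tetens-type formula given in FAO-56. That choice is ours, and the studies we reviewed may use other formulations. The temperatures and humidities are made up.

```python
import math


def saturation_vapor_pressure(temp_c: float) -> float:
    """Saturation vapor pressure in kPa (Tetens-type formula used in FAO-56)."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def vapor_pressure_deficit(temp_c: float, rel_humidity_pct: float) -> float:
    """VPD in kPa: how far the air is from saturation at a given temperature."""
    return saturation_vapor_pressure(temp_c) * (1 - rel_humidity_pct / 100)


# Hypothetical dry-season conditions: 2 °C warmer and 10 percentage points drier.
before = vapor_pressure_deficit(25.0, 50.0)
after = vapor_pressure_deficit(27.0, 40.0)
print(f"VPD: {before:.2f} kPa -> {after:.2f} kPa")
```

A higher VPD means the air pulls water more strongly from leaves, soils and animals. This is one mechanistic link between the climate trends above and the physiological stress described below.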

We also show that the footprints of climate change are increasingly noticeable in the Cerrado, with observed impacts and model projections indicating effects across multiple levels of ecological organization. Populations are struggling to cope with rising temperatures: their reproductive success falls as water becomes scarce, and shifts in their daily activity and seasonal timing alter selection pressures, ultimately driving species losses and/or range shifts. At the community level, species that once co-occurred are falling out of sync in space and time, disrupting biotic interactions and reducing species diversity, which tends to leave communities poorer and more homogenized. At the ecosystem scale, primary productivity and biomass are declining, decomposition rates are shifting, and essential biogeochemical cycles, such as carbon and nutrient flows, are being disrupted. And these are just a few of the ecological changes we were able to identify in the Cerrado.

Safeguarding this world-renowned biodiversity hotspot and preventing further irreversible losses will require strong climate-mitigation and adaptation measures, especially a major shift in agricultural practices, large-scale ecosystem restoration, and expansion of the network of protected areas.

To find out more, read our new paper “Climate change in the Brazilian Cerrado: A looming threat to terrestrial biodiversity” just published in WIREs Climate Change.

References

Hofmann, G. S., Cardoso, M. F., Alves, R. J., Weber, E. J., Barbosa, A. A., de Toledo, P. M., … & de Oliveira, L. F. (2021). The Brazilian Cerrado is becoming hotter and drier. Global Change Biology, 27(17), 4060-4073.

Hofmann, G. S., Silva, R. C., Weber, E. J., Barbosa, A. A., Oliveira, L. F. B., Alves, R. J. V., … & Cardoso, M. F. (2023). Changes in atmospheric circulation and evapotranspiration are reducing rainfall in the Brazilian Cerrado. Scientific Reports, 13(1), 11236.

Latrubesse, E. M., Arima, E., Ferreira, M. E., Nogueira, S. H., Wittmann, F., Dias, M. S., … & Bayer, M. (2019). Fostering water resource governance and conservation in the Brazilian Cerrado biome. Conservation Science and Practice, 1(9), e77.

Parron, L. M., Aguiar, L. M. de S., Duboc, E., Oliveira-Filho, E. C., Camargo, A. J. A. de, & Aquino, F. de G. (2008). Cerrado: desafios e oportunidades para o desenvolvimento sustentável. Embrapa.

Strassburg, B. B., Brooks, T., Feltran-Barbieri, R., Iribarrem, A., Crouzeilles, R., Loyola, R., … & Balmford, A. (2017). Moment of truth for the Cerrado hotspot. Nature Ecology & Evolution, 1(4), 0099.

World Economic Forum. (2024). The Cerrado: Production and protection. 41p.

Scaling traits and functions: How habitat area shapes the multidimensional nature of functional diversity

On 18 Sep 2025 (updated 22 Sep 2025) by Vinicius Bastazini in Publications & Posters

For over a hundred years, the Species–Area Relationship (SAR) has served as a fundamental concept for understanding changes in species diversity across spatial scales. Often described as the closest thing to a “law” in ecology, the SAR captures the pervasive pattern that larger islands, or habitat patches, support more species than smaller ones. This principle has played a central role in ecological theory, helping researchers not only to describe patterns of species diversity in oceanic archipelagos and patchy landscapes, but also to elucidate the spatial processes that shape ecological community dynamics. Because the pattern is so widespread, appearing across ecosystems and taxa worldwide, it has heavily influenced conservation practice: it informs the identification of biodiversity hotspots, the prediction of species loss under habitat degradation, and the design of protected areas that prioritize larger habitat patches within fragmented environments. For an excellent overview of the SAR’s history, theory, and broader applications, see Matthews et al. (2021).

Despite this long history, ecologists have only recently started to extend the SAR framework to other facets of diversity, such as phylogenetic and functional diversity, in what is now often referred to as the Diversity–Area Relationship (DAR; e.g., Dias et al. 2020), as well as to higher levels of ecological complexity that emerge from the spatial scaling of networks of interacting species, known as the Network–Area Relationship (NAR; e.g., Galiana et al. 2022).

To make matters even more complicated, the distinct facets of biodiversity are themselves multidimensional. That is, each facet can be described along several dimensions that reflect not only different ways of measuring biological diversity, but also the ecological processes and “meanings” behind them. For example, functional diversity doesn’t just capture the presence or absence of species traits in an ecological community; it also reflects how many different ecological roles species perform in the ecosystem, how evenly those roles are distributed across species, and how distinct those roles are from one another. This means that when scaling biodiversity across space, we are not only counting how many species or functions exist, but also measuring how the distribution of organisms’ traits and ecological roles varies, overlaps, and responds to environmental change, shaping ecosystem structure and functioning. As a result, fully understanding how biodiversity scales with area requires ecologists to integrate the multiple facets and dimensions of biodiversity.

In a recently published study, we evaluated how different dimensions of functional diversity scale with area, the Functional Diversity–Area Relationship (FDAR). Specifically, we examined three dimensions: Functional Richness, defined as the sum of trait differences among taxa; Functional Divergence, the average trait difference between taxa; and Functional Regularity, which captures how evenly these trait differences are distributed. To explore the spatial scaling of functional diversity, we applied these metrics to bird assemblages (Fig. 1) inhabiting natural small-scale habitat “archipelagos” in southern Brazil: wet grassland patches in swales and depressions surrounded by drier grasslands, bulrush stands on the coastal plain, and washouts on sandy marine beaches (Fig. 2).
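To make the three definitions concrete, here is a minimal base-R sketch. It assumes Euclidean distances on traits standardized across the regional pool, and it uses Pielou-style evenness of the pairwise distances as the regularity index. These are simplified stand-ins chosen for illustration, not necessarily the exact formulas used in the paper, and the species and trait values are made up.

```r
# Hypothetical trait data for a regional pool of five bird species (made-up values)
traits <- data.frame(
  body_mass   = c(12, 35, 40, 150, 300),  # grams
  bill_length = c(10, 14, 15, 30, 45),    # millimetres
  row.names   = paste0("sp", 1:5)
)

# Standardize traits once across the whole pool so that distances are
# comparable between patches, then compute pairwise Euclidean distances
pool_dist <- as.matrix(dist(scale(traits)))

# Simplified proxies for the three dimensions (assumes at least 3 species)
fd_dimensions <- function(present, dmat) {
  sub <- dmat[present, present]
  d <- sub[lower.tri(sub)]      # one trait distance per species pair
  p <- d[d > 0] / sum(d)        # share of the total trait difference per pair
  c(richness   = sum(d),                             # sum of trait differences
    divergence = mean(d),                            # average trait difference
    regularity = -sum(p * log(p)) / log(length(d)))  # evenness of differences, 0 to 1
}

small_patch <- fd_dimensions(c("sp1", "sp2", "sp3"), pool_dist)
large_patch <- fd_dimensions(rownames(traits), pool_dist)
rbind(small_patch, large_patch)
```

Because richness is a sum over species pairs, it grows almost automatically as species are added, whereas divergence is an average and need not increase at all. This is one intuition for why the dimensions can scale so differently with area.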

Fig. 1. A potpourri of birds from our habitat islands: Clockwise from top-left – Wren-like Rushbird (Raphael Kurz), Plumbeous Rail (Caio Belleza), Chestnut-capped Blackbird, South American Tern, American Oystercatcher, Southern Lapwing, Grassland Yellow-Finch, Straight-billed Reedhaunter, and Black-and-white Monjita (all others by Rafael A. Dias).

To understand how functional diversity scales with area, we compared a wide range of statistical models commonly used in SAR studies, using an information-theoretic approach in arithmetic space. Modeling FDARs this way preserves the original scale of the data, which allowed us to interpret the ecological meaning of the relationship between each dimension of functional diversity and area. For example, some curves rise quickly in small patches and then level off, indicating that additional area adds little new functional diversity beyond a certain patch size. Others increase steadily across all patch sizes, showing that diversity keeps accumulating as habitats get larger. In sigmoidal curves, gains in functional diversity are concentrated in intermediate-sized patches, while very small and very large patches contribute relatively little to increases in functional diversity. This multimodel approach therefore allowed us to capture differences in how diversity accumulates with area and to compare patterns consistently across functional dimensions and habitat types.
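The general idea of multimodel comparison in arithmetic space can be sketched in base R with `lm()`, `nls()` and `AIC()`. This is a sketch under stated assumptions: the patch areas and diversity values are simulated (hypothetical), and only four candidate shapes are included (linear, logarithmic, power, and an asymptotic negative-exponential model via the self-starting `SSasympOrig`), while the study compared a larger set. Plain AIC is used here; small samples often call for AICc instead.

```r
set.seed(1)
# Hypothetical data: 30 habitat patches (area in hectares) and a simulated
# functional diversity value that follows a power law plus noise
area <- runif(30, 1, 100)
fd   <- 4 * area^0.35 + rnorm(30, sd = 0.5)
dat  <- data.frame(area, fd)

# Starting values for the power model taken from a log-log regression
start_pow <- coef(lm(log(fd) ~ log(area), data = dat))

# Candidate models, all fitted to the untransformed response (arithmetic space)
candidates <- list(
  linear      = function() lm(fd ~ area, data = dat),
  logarithmic = function() lm(fd ~ log(area), data = dat),
  power       = function() nls(fd ~ k * area^z, data = dat,
                               start = list(k = exp(start_pow[[1]]), z = start_pow[[2]])),
  neg_exp     = function() nls(fd ~ SSasympOrig(area, Asym, lrc), data = dat)
)

# Fit each model; any model that fails to converge is dropped
fits <- lapply(candidates, function(f) tryCatch(f(), error = function(e) NULL))
fits <- Filter(Negate(is.null), fits)

# Rank by AIC and compute Akaike weights (relative support, summing to 1)
aic      <- sapply(fits, AIC)
delta    <- aic - min(aic)
akaike_w <- exp(-delta / 2) / sum(exp(-delta / 2))
ranking  <- data.frame(AIC = aic, dAIC = delta, weight = round(akaike_w, 3))
ranking[order(ranking$AIC), ]
```

Fitting everything to the raw response, rather than log-transforming both axes, keeps the likelihoods comparable across models and keeps the fitted curves on the scale where "levels off" or "keeps accumulating" can be read directly.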

Fig. 2. Examples of the three habitat-island types sampled in southern Brazil. From left to right: wet-grassland patches occurring in swales and depressions within drier grasslands, bulrush (Schoenoplectus californicus) stands on the coastal plain, and washouts on sandy marine beaches. Photos by Rafael A. Dias.

We show that functional diversity scales differently depending on the dimension under consideration. Functional Richness tended to rise sharply with patch size, as larger patches supported more species and, consequently, more ecological functions. In contrast, Functional Divergence increased more gradually, suggesting substantial overlap among species’ traits, while Functional Regularity often declined, reflecting that certain functional roles were overrepresented or absent in habitat patches. Our results also suggest that species richness is the main driver of FDARs, although other ecological factors may also influence which functional roles are filled; presence–absence models produced steeper slopes than abundance-weighted ones. Interestingly, some small and medium patches contained unique combinations of traits missing from the larger patches, highlighting that smaller habitats, though often neglected in conservation planning, can contribute rare and irreplaceable functional roles.

Finally, we found limited evidence of trait clustering (i.e., species exhibiting greater similarity in traits than expected by chance) or trait overdispersion (i.e., species exhibiting greater dissimilarity in traits than expected by chance), indicating that stochastic processes influenced trait-driven community assembly. The few cases of clustering or overdispersion that we detected occurred in patches of all sizes, suggesting that patch area does not affect these patterns.
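"Than expected by chance" is usually assessed with a null model. Below is a minimal sketch, assuming mean pairwise trait distance (MPD) as the test statistic and a null model that draws random assemblages of the same richness from a hypothetical species pool; the study's exact statistic and null model may differ.

```r
set.seed(42)
# Hypothetical regional pool: 40 species described by two standardized traits
pool <- matrix(rnorm(80), ncol = 2,
               dimnames = list(paste0("sp", 1:40), c("trait1", "trait2")))
pool_dmat <- as.matrix(dist(pool))

# Mean pairwise trait distance (MPD) of an assemblage
mpd <- function(spp) {
  sub <- pool_dmat[spp, spp]
  mean(sub[lower.tri(sub)])
}

# Hypothetical observed assemblage of 8 species in one patch
observed <- sample(rownames(pool), 8)
obs_mpd  <- mpd(observed)

# Null model: 999 random assemblages of the same richness drawn from the pool
null_mpd <- replicate(999, mpd(sample(rownames(pool), length(observed))))

# Standardized effect size: negative = clustering, positive = overdispersion
ses <- (obs_mpd - mean(null_mpd)) / sd(null_mpd)

# One-sided rank-based p-values for each alternative
p_clustered     <- (sum(null_mpd <= obs_mpd) + 1) / (length(null_mpd) + 1)
p_overdispersed <- (sum(null_mpd >= obs_mpd) + 1) / (length(null_mpd) + 1)
c(SES = ses, p_clustered = p_clustered, p_overdispersed = p_overdispersed)
```

An SES close to zero, as expected here because the "observed" assemblage is itself a random draw, is the signature of trait-random assembly; repeating the test patch by patch is what allows asking whether clustering or overdispersion depends on patch area.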

Taken together, our results emphasize the need for a multidimensional perspective on FDARs, both to advance our understanding of community assembly in patchy environments and to guide conservation strategies: while larger habitat patches are crucial for maintaining species and functions, smaller patches may play a complementary and fundamental role by safeguarding functional uniqueness.

To find out more, read our new paper “The spatial scaling of multiple dimensions of functional diversity in habitat islands” just published in Functional Ecology.

References

Dias, R. A., Bastazini, V. A. G., Knopp, B. D. C., Bonow, F. C., Gonçalves, M. S. S., & Gianuca, A. T. (2020). Species richness and patterns of overdispersion, clustering and randomness shape phylogenetic and functional diversity–area relationships in habitat islands. Journal of Biogeography, 47(8), 1638-1648.

Galiana, N., Lurgi, M., Bastazini, V. A., Bosch, J., Cagnolo, L., Cazelles, K., … & Montoya, J. M. (2022). Ecological network complexity scales with area. Nature Ecology & Evolution, 6(3), 307-314.

Matthews, T. J., Triantis, K. A., & Whittaker, R. J. (Eds.). (2021). The species–area relationship: theory and application. Cambridge University Press.

Global conflict between shipping and marine biodiversity

On 12 Aug 2025 by Frederico Mestre in Extended Abstracts, Publications & Posters

Humans dominate the Earth: we inhabit it, we disrupt it, and we traverse it. Ecologists have studied the impact of roads on terrestrial systems since the end of the 20th century. Road Ecology, a term coined by the ecologist Richard T.T. Forman in the late 1990s, is the branch of Ecology that studies the impacts of terrestrial transportation networks on biodiversity. Its objectives include mapping roadkill hotspots and identifying their drivers, prioritising sites for mitigation structures, assessing the impacts of traffic noise and light on species’ behaviour, reproduction, and habitat use, and developing guidelines to integrate ecological considerations into policy and planning. Recently, we also assessed the effects of road networks across Europe on biotic interactions (interactions between species), thus evaluating the effects on whole biological communities.

However, all of this concerns the land, while most of the planet is Ocean, and global commerce relies heavily on maritime transportation. Understanding the impacts of marine transportation on ecosystems is more complex, given the dynamic nature of oceans, and this difficulty is understandable for several reasons. Roads on land are physical, fixed features: we know exactly where they are. In contrast, “marine roads” are diffuse, defined by patterns of traffic density. Moreover, wildlife casualties resulting from interactions with vessels are rarely observable at the site of impact, making it far more challenging to identify mortality hotspots. As a result, as far as I’m aware, no study had addressed the conflicts between marine traffic and biodiversity at a global scale and for a wide variety of marine species. There were, of course, some studies, mostly local or taxonomically restricted (e.g. Marçalo et al., 2025; Ritter & Panigada, 2019; Welsh & Whiterington, 2023). One study in particular caught my attention (Nisi et al. 2024). It addresses the conflict between shipping and four whale species: blue (Balaenoptera musculus), fin (Balaenoptera physalus), humpback (Megaptera novaeangliae), and sperm whales (Physeter macrocephalus). By compiling a huge dataset of whale locations to model the global distributions of these four species and combining these with vessel positions, the authors derived a global estimate of collision risk. Another study took a similar approach to assess the conflict with whale sharks (Rhincodon typus) (Womersley et al., 2024). But these works had one drawback: they addressed only a handful of species, namely four large whales and the whale shark.

In our recent study, we compiled a large dataset of species ranges for sea turtles (7 species), cetaceans (85 species), and pinnipeds (32 species), sourced from AquaMaps, and for seabirds (370 species), sourced from Carneiro et al. (2024).

We present a global analysis of shipless areas and quantify the overlap between shipping density and the distributions of marine taxa known to be vulnerable to vessel activity: cetaceans, sea turtles, pinnipeds, and seabirds. We identify locations where high biodiversity coincides with either low or intense shipping activity, designating them Priority Preservation Areas (PPAs; low conflict) and Priority Mitigation Areas (PMAs; high conflict), respectively. We also evaluate the coverage of these zones within Marine Protected Areas (MPAs), Exclusive Economic Zones, and the High Seas.
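The logic of this classification can be illustrated with a toy grid. This is only a sketch under loudly stated assumptions: the cells, richness values, vessel densities and MPA membership are all simulated, the grid is assumed to be equal-area, and "high" biodiversity and "low"/"intense" shipping are defined by quartile thresholds chosen purely for illustration. The thresholds and data used in the paper differ.

```r
set.seed(7)
n_cells <- 1000
# Hypothetical equal-area grid: species richness, vessel density and MPA membership per cell
cells <- data.frame(
  richness = rpois(n_cells, 20),
  shipping = rexp(n_cells, rate = 1 / 5),
  in_mpa   = runif(n_cells) < 0.15
)
cells$shipping[sample(n_cells, 100)] <- 0   # some cells with no recorded traffic

# Assumed thresholds: high biodiversity = top quartile of richness;
# low / intense shipping = bottom / top quartile of the non-zero densities
rich_hi  <- quantile(cells$richness, 0.75)
ship_pos <- cells$shipping[cells$shipping > 0]
ship_lo  <- quantile(ship_pos, 0.25)
ship_hi  <- quantile(ship_pos, 0.75)

# Classify cells (here, PPAs exclude shipless cells by assumption)
cells$zone <- "other"
cells$zone[cells$shipping == 0] <- "shipless"
cells$zone[cells$richness >= rich_hi & cells$shipping > 0 & cells$shipping <= ship_lo] <- "PPA"
cells$zone[cells$richness >= rich_hi & cells$shipping >= ship_hi] <- "PMA"

# Coverage: percentage of each zone's cells that fall inside MPAs
coverage <- tapply(cells$in_mpa, cells$zone, mean) * 100
round(coverage, 1)
```

With real data, the same coverage calculation would be repeated for no-take MPAs, Exclusive Economic Zones and the High Seas simply by swapping the membership column.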

Our results show that MPAs currently encompass 12.1% of shipless areas, 15.2% of PPAs, and 16.2% of PMAs, while no-take MPAs cover 6.8%, 9.5%, and 5.6% of these zones, respectively. Shipless areas are largely confined to polar and remote oceanic regions; PPAs are predominantly found at high southern latitudes, whereas PMAs cluster along coasts, especially in the mid-Pacific, southern Indian Ocean, and South Atlantic.

We highlight the importance of preserving low-conflict zones and applying targeted mitigation measures—such as traffic rerouting and speed reductions—in high-conflict areas. This framework directly supports global marine conservation priorities, including the 30 × 30 biodiversity target.

This is a simple, but necessary, first step: identify conflicts and areas to protect. Far more challenging steps lie ahead!

Our research:

Mestre F; D’Amico M; Bastazini VAG; Assis J; Jacinto D; Marçalo A & Ascensão F. (2025). Mapping global shipless areas and conflict zones between shipping and large marine vertebrates. Biological Conservation. DOI: https://doi.org/10.1016/j.biocon.2025.111431.

Reading lists in Ecology – Short Essays

On 25 Jul 2025 by Frederico Mestre in Opinion, Tips & Tricks

Research papers are the “bricks” that build the wall we call “science.” But sometimes we need to look at this “wall” and reflect upon it. This reflection pushes us forward. This is a curated reading list of ecological essays that have stood the test of time or that we consider to be historically relevant for the science of Ecology. Many were written decades ago, yet they remain relevant for understanding the foundations of our field. In an era of rapid scientific output, it’s essential to pause, take a breath, and revisit the ideas that have shaped ecological thought. These essays offer not just knowledge, but perspective—a refreshing dive into the history of Ecology.

Of course, these are my choices, and I’m sure I’m missing/forgetting a few…

What initially sparked my interest in this type of writing was a series of essays, A View from the Park, published in Oikos and preceded by Thoughts from the Tropics.

1. Dan Janzen’s thoughts from the tropics

This is a collection of ten essays published in Oikos from 1985 to 1988, encapsulating the pioneering ecological insights and conservation philosophy of Daniel Janzen, an American evolutionary ecologist renowned for his extensive work in tropical ecosystems, particularly in Costa Rica. The full list is at the end of this post.

2. A View from the Park (Oikos Virtual Issues)

A series of 27 essays written between 1990 and 1999 by John Lawton and originally published in Oikos. These are now available as a virtual special issue. These are short, beautifully written and provide some great perspectives on Ecological thought. A few years later, the same author wrote a retrospective essay. I provide the full list at the end of this post, but these are just a few that I find particularly interesting:

Are species useful? – The author discusses what makes a good argument for conserving a species. Is it the fact that it might be useful to us? I’m going to give a spoiler here: the answer is “no”.

Patterns in ecology – Lawton argues that large-scale patterns are essential to ecology, yet underappreciated compared to small-scale experimental manipulation.

3. Other classic essays

Concluding Remarks – Hutchinson (1957) – These were supposed to be just the concluding remarks of a scientific symposium. However, in it, Hutchinson formalised a precursor of the modern concept of the “ecological niche” as a multidimensional space of environmental variables.

Homage to Santa Rosalia or Why Are There So Many Kinds of Animals? – Hutchinson (1959) – Here, Hutchinson (again!) explores the diversity of species and the complexity of ecological niches. And the title of this essay is just wonderful.

A Note on Trophic Complexity and Community Stability – This essay by R. T. Paine challenges the widely held ecological axiom that greater species diversity and trophic complexity inherently lead to more stable communities. Paine argues that there is little empirical evidence to support this belief and instead highlights the critical role of specific “keystone species” in maintaining or disrupting community structure, regardless of overall complexity. Through case studies from marine ecosystems, he demonstrates how the removal or overpopulation of such species can cause significant ecological changes, suggesting that stability depends more on the influence of key species than on general trophic diversity. The essay calls for a reevaluation of traditional assumptions about community stability.

Why philosophers should be interested in otters, and why otters should be interested in philosophy – by van Liere (2004) – This “viewpoint” explores the ethical debate between ecocentric and biocentric philosophies in the context of otter reintroduction in the Netherlands. It highlights the moral complexities of valuing ecosystems versus individual animals and questions the justification of conservation actions from both philosophical perspectives.

Journals such as Trends in Ecology and Evolution (TREE) also publish short essays and comments. A good example is TREE’s Discussions; some really good ones are the discussions on The value of biodiversity (1997) and Ecology’s oldest pattern? (2001).

The first one discusses the fundamental link between biodiversity and ecosystem function, placing particular emphasis on how species richness underpins the effective delivery of ecosystem services, while the second revisits what may be the earliest recognised pattern in ecological science: the latitudinal diversity gradient, in which species richness increases from the poles toward the tropics.

4. Blogs

Finally, blogs are an excellent outlet for this type of scientific text. Additionally, blogs allow for an open discussion of Ecological topics. Here are three blogs we find particularly interesting (we are purposely excluding those associated with scientific journals and including only those that are still active). We also highlight a few posts we find particularly interesting.

Dynamic Ecology

Dynamic Ecology is authored by the ecologists Jeremy Fox, Brian McGill, and Meghan Duffy. It offers a blend of ideas, opinions, commentary, advice, and humor aimed at fellow academic ecologists and ecology students. The blog serves as a platform for professional conversations, complementing traditional scientific communication methods like peer-reviewed papers and conferences. Here are three great posts in Dynamic Ecology:

Zombie ideas in ecology – In this 2011 blog post Jeremy Fox critiques persistent ecological theories that continue to influence thinking despite being theoretically flawed or empirically unsupported. He focuses on the Intermediate Disturbance Hypothesis (IDH), which suggests that species diversity peaks at intermediate levels of disturbance due to a balance between competition and colonization.

Is macroecology like astronomy? – In this 2012 blog post Jeremy Fox examines the analogy between macroecology and astronomy, a comparison often drawn to defend macroecology’s reliance on observational data. He acknowledges that astronomy is a successful observational science, but questions whether macroecology can emulate its success.

The five roads to generality in ecology (UPDATED) – Here the author explores how ecologists can derive broad, unifying insights from the inherently diverse and context-dependent nature of ecological systems. He outlines five distinct approaches: 1. meta-analysis and statistical description; 2. focusing on key processes via simplified models; 3. identifying statistical attractors; 4. developing unifying theoretical frameworks, and 5. embracing context-dependent generalizations.

Ecological Rants

Ecological Rants is a thought-provoking blog by Charles Krebs and Judy Myers, offering candid insights into pressing ecological issues. Aiming to foster open dialogue beyond academic journals, the blog addresses topics that affect scientists, policymakers, and the public alike. Krebs and Myers use this platform to challenge assumptions, discuss methodological concerns, and advocate for evidence-based ecological science. Their posts invite constructive debate and encourage readers to engage critically with the complexities of ecological research and its real-world implications. Here are three great posts from this blog:

The Correlation Coefficient as an Enemy of Science – Here, Charles Krebs critiques the prevalent misuse of correlation coefficients in ecological research. He argues that an overreliance on correlation without proper consideration of causation can lead to misleading conclusions. Krebs emphasizes that while correlation can indicate a relationship between variables, it does not establish a cause-and-effect link.

How Can We Best Advance Ecological Research – Ecologist Charles Krebs identifies four major challenges hindering progress in ecological science: 1. overabundance of ecological problems; 2. competing research paradigms (descriptive, long-term studies of ecosystems, and mechanistic, experimental approaches); 3. individualistic research culture; 4. inconsistent long-term funding.

Biodiversity Science – Here ecologist Charles Krebs critically examines the objectives and challenges of biodiversity science, focusing on two primary goals: 1. cataloging all species and 2. preventing species extinctions.

Conservation Bytes

ConservationBytes.com, created by ecologist Corey Bradshaw, is dedicated to highlighting, discussing, and critiquing conservation science. The blog has a very nice section on professional and writing tips for researchers and another on classic works. Here are two posts we find particularly interesting:

The state of global biodiversity — it’s worse than you probably think – Here the authors write about the fact that we are “without a doubt well within a sixth mass extinction event”.

More is better – This post is about the ‘diversity-productivity relationship’. This relationship predicts that higher plant species diversity should lead to higher net productivity. But the evidence is not clear and the author writes about a paper addressing this point.

5. Links

Dan Janzen’s thoughts from the tropics

1. On Ecological Fitting

2. Seeds as Products

3. Lost Plants

4. Science is Forever

5. Blurry Catastrophes

6. Habitat Sharpening

7. Oh, I Forgot about Zoos

8. When, and When Not to Leave

9. There are Differences between Tropical and Extra-Tropical National Parks

10. Buy Costa Rican beef

A View from the Park

1. Selwood.

2. Warbling in different ways.

3. Ecology as she is done, and could be done.

4. Are species useful?

5. There are not 10 million kinds of population dynamics.

6. Snakes, water and density dependent freedoms.

7. (Modest) advice for graduate students.

8. Eat Caribbean bananas.

9. On the behavior of autoecologists and the crisis of extinction.

10. Something new under the sun?

11. Peer review, co-evolution and tortoises.

12. What will you give up?

13. Webbing and WIWACS.

14. Ecology of the afterlife.

15. Patterns in ecology.

16. Corncrake pie and prediction in ecology.

17. Bog-plodging and sustainable development.

18. Nessiteras rhombopteryx.

19. Words: when is a fish not a fish?

20. The science and non-science of conservation biology.

21. Small is beautiful, and very strange.

22. Green tourism and nature’s services.

23. Small earthquakes in Chile and climate change.

24. Pigeons, peregrines and people.

25. Size matters.

26. http://www.worries.

27. Time to reflect: Gilbert White and environmental change.

28. John Lawton’s View from the Park 28: a retrospective.

The Dynamics of the “Gentle Way”: Exploring Judo Attack Combinations as Networks in R

On 27 May 2025 (updated 28 May 2025) by Vinicius Bastazini in R Code

As the Judo World Championship draws near this June in Budapest, it feels like the perfect time to bring together my passion for Judo (and Brazilian Jiu-Jitsu) with my gusto for complex network analyses — a fusion that’s been a long time in the making! While my posts typically focus on biodiversity-related topics and statistical modeling, I’ve long considered sharing some thoughts on one of my most cherished interests: Judo. This martial art, with its rich history and intricate techniques, has fascinated me since my childhood. Judo, which means the gentle way in Japanese, is at its competitive heart a dynamic “chess match” of throws, holds, submission techniques, and strategic combinations. While individual techniques (called waza) are foundational, the real artistry lies in how they are chained together — through renraku-waza (combination techniques) and renzoku-waza (continuous combination techniques). Thus, in a Judo match, these individual techniques usually unfold as sequences of moves, often building toward a decisive action that is likely to result in a score.

But how can we objectively analyze which attack combinations work best together? Which techniques serve as crucial setups, and which are reliable “killer moves”? This is where network analysis can offer us some insights.

In this post, we’ll explore how to model combinations of Judo throwing techniques as a network using R, trying to uncover hidden patterns in attacking strategies. We will treat each individual throwing technique as a node in a network, with a directed edge (or link) from one node to another when the first technique naturally sets up or transitions into the second as part of an attack sequence. In our Judo attack combination network we should be able to detect: i) which techniques are most frequently used to initiate successful combinations; ii) which techniques are common finishers, or “killer moves”; and iii) which techniques are the most “influential”, or important, in the overall attacking system.

It is important to note that this post is not intended as a comprehensive review of Judo attack combinations; rather, it draws on some classic literature (Kashiwazaki and Nakanishi 1995, Kawaishi 1963, van Haesendonck 1968). I will focus exclusively on two-move attack sequences, using only techniques currently recognized by the Kodokan, the temple of Judo. A more exhaustive analysis, particularly relevant for high-performance athletes, would require empirical data from competitions, a broader inclusion of technical variations, and so on.

In any case, I believe this post offers insights that will resonate with both martial arts enthusiasts and scientists interested in network analysis. For our ecologist readers, do not worry! I will provide links to some examples of ecological applications (EA) of these network analyses. You can also find more info in older network-related posts on this blog.

Let’s start by loading the necessary packages, building the network from the compiled data, and visualizing it as an interactive network, where you can choose one node, i.e., technique, and see all its relationships.

## Packages (library() stops with an error if a package is missing, unlike require())
library(igraph)
library(ggplot2)
library(dplyr)
library(tidyr)
library(RColorBrewer)
library(bipartite)
library(bbmle)
library(influential)
library(visNetwork)

# -----------------------------
# 1. Build the Graph (directed)
# -----------------------------
cat("\n--- 1. Creating Directed Graph of Judo Attack Combinations ---\n")

# Define attack transitions between judo techniques
# Each line indicates a valid transition in a combination (directed from left to right)
# graph_from_literal() replaces the deprecated graph.formula(); "-+" draws a directed edge
attack_combinations_igraph <- graph_from_literal(
  Seoi.nage-+Seoi.otoshi,
  Seoi.nage-+O.uchi.gari,
  Seoi.nage-+Ko.uchi.gari,
  Ippon.seoi.nage-+Seoi.otoshi,
  Ippon.seoi.nage-+Ko.uchi.gari,
  Ippon.seoi.nage-+Osoto.gari,
  Harai.goshi-+Osoto.gari,
  Harai.goshi-+Uchi.mata,
  Harai.goshi-+Soto.makikomi,
  Uchi.mata-+O.uchi.gari,
  Uchi.mata-+Ko.uchi.gari,
  O.goshi-+O.uchi.gari,
  O.goshi-+Ko.uchi.gari,
  O.goshi-+Harai.goshi,
  O.uchi.gari-+Uchi.mata,
  O.uchi.gari-+Ko.uchi.gari,
  O.uchi.gari-+Osoto.gari,
  O.uchi.gari-+Tai.otoshi,
  O.uchi.gari-+Harai.goshi,
  Ko.uchi.gari-+O.uchi.gari,
  Ko.uchi.gari-+Seoi.nage,
  Ko.uchi.gari-+Ippon.seoi.nage,
  Ko.uchi.gari-+Hane.goshi,
  Osoto.gari-+Harai.goshi,
  Osoto.gari-+O.uchi.gari,
  Osoto.gari-+Ko.soto.gake,
  Osoto.gari-+Sasae.tsurikomi.ashi,
  Osoto.gari-+Okuri.ashi.harai,
  Osoto.gari-+Hiza.guruma,
  Ko.soto.gari-+Osoto.gari,
  Ko.soto.gari-+Tai.otoshi,
  Ko.soto.gari-+Harai.goshi,
  Hiza.guruma-+Harai.goshi,
  Hiza.guruma-+Sasae.tsurikomi.ashi,
  Hiza.guruma-+Osoto.gari,
  Hiza.guruma-+De.ashi.harai,
  Okuri.ashi.harai-+Sode.tsuri.komi.goshi,
  Okuri.ashi.harai-+Tai.otoshi,
  Okuri.ashi.harai-+Harai.goshi,
  Okuri.ashi.harai-+Ippon.seoi.nage,
  Okuri.ashi.harai-+Seoi.nage,
  Tai.otoshi-+Ko.uchi.gari,
  Tai.otoshi-+O.uchi.gari,
  Hikikomi.gaeshi-+O.uchi.gari,
  Hikikomi.gaeshi-+Ko.uchi.gari,
  Hikikomi.gaeshi-+Harai.goshi,
  Hikikomi.gaeshi-+Ko.soto.gari,
  Hikikomi.gaeshi-+Sukui.nage,
  Tsuri.komi.goshi-+O.uchi.gari,
  Tsuri.komi.goshi-+Sode.tsuri.komi.goshi,
  Hane.goshi-+O.uchi.gari,
  Sasae.tsurikomi.ashi-+Uchi.mata,
  Sasae.tsurikomi.ashi-+Tai.otoshi,
  De.ashi.harai-+Tai.otoshi,
  De.ashi.harai-+Yoko.gake,
  Hiza.guruma-+Ko.soto.gake,
  Hiza.guruma-+Hane.goshi,
  Ko.soto.gake-+Hane.goshi,
  Ko.soto.gake-+Ko.uchi.gari,
  Ko.uchi.gari-+Ko.uchi.makikomi,
  Uki.goshi-+O.uchi.gari,
  Uki.goshi-+Tsuri.goshi,
  Tsuri.goshi-+O.uchi.gari,
  Koshi.guruma-+Ashi.guruma,
  Harai.goshi-+O.uchi.gari,
  Hane.goshi-+Harai.goshi,
  Hane.goshi-+Hane.makikomi,
  Ushiro.goshi-+Tai.otoshi,
  Ushiro.goshi-+Ura.nage,
  Tsuri.komi.goshi-+Harai.goshi,
  Tsuri.komi.goshi-+Ko.uchi.gari,
  Uchi.mata-+Harai.goshi,
  Tai.otoshi-+Seoi.otoshi,
  Uki.otoshi-+O.uchi.gari,
  Uki.otoshi-+Tomoe.nage
  # Koshi.guruma-+Kani.basami is excluded: Kani-basami is a prohibited technique in competition
)

# We will now manually add the self-loops, that is, moves that can follow themselves
self_loops <- c(
  "Osoto.gari", "Osoto.gari",
  "Ippon.seoi.nage", "Ippon.seoi.nage",
  "Ko.soto.gari", "Ko.soto.gari",
  "Hiza.guruma", "Hiza.guruma",
  "Tai.otoshi", "Tai.otoshi",
  "Tsuri.komi.goshi", "Tsuri.komi.goshi"
)
attack_combinations_igraph <- add_edges(attack_combinations_igraph, self_loops)

# Create a node data frame for use with visNetwork
nodes <- data.frame(
  id = V(attack_combinations_igraph)$name,
  label = V(attack_combinations_igraph)$name,
  value = 15,  # All nodes same visual size
  color = "lightblue",
  font = list(color = "black")
)

# Extract edge list from the igraph object
edges <- igraph::as_data_frame(attack_combinations_igraph, what = "edges")

# -----------------------------
# 2. Interactive Network Plot
# -----------------------------
cat("\n--- 2. Interactive Network Plot using visNetwork ---\n")

# Build interactive graph with directional arrows and physics layout
visNetwork(nodes, edges, main = "🥋 Judo Attack Combination Network") %>%
  visNodes(font = list(size = 18)) %>%
  visEdges(
    arrows = "to",  # Show arrowheads
    color = list(color = "black", highlight = "black", hover = "black")
  ) %>%
  visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE) %>%
  visPhysics(
    solver = "forceAtlas2Based",
    forceAtlas2Based = list(gravitationalConstant = -50),
    stabilization = TRUE
  )


We will start our analyses by looking at degree centrality (EA1, EA2), a node-level metric that quantifies a technique’s direct importance within attack combinations by counting how many other techniques it either sets up (out-degree) or is preceded by (in-degree) (EA3, EA4). Specifically, in-degree quantifies how many different techniques commonly lead into a particular attack, marking it as a frequent follow-up move, a “finisher” attack. On the other hand, out-degree measures how many subsequent techniques a technique typically sets up, highlighting its versatility as an initial move.
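Before running this on the full network, it may help to see what in- and out-degree mean on a tiny example. The sketch below uses base R and an adjacency matrix (row technique sets up column technique) built from a handful of transitions taken from the list above; the subset itself is an arbitrary choice for illustration.

```r
# Toy subset of the network: A[i, j] = 1 if technique i sets up technique j
techs <- c("Seoi.nage", "Ko.uchi.gari", "O.uchi.gari", "Osoto.gari")
A <- matrix(0L, 4, 4, dimnames = list(techs, techs))
edges_toy <- rbind(
  c("Seoi.nage",    "O.uchi.gari"),
  c("Seoi.nage",    "Ko.uchi.gari"),
  c("Ko.uchi.gari", "O.uchi.gari"),
  c("Ko.uchi.gari", "Seoi.nage"),
  c("O.uchi.gari",  "Ko.uchi.gari"),
  c("O.uchi.gari",  "Osoto.gari"),
  c("Osoto.gari",   "O.uchi.gari"),
  c("Osoto.gari",   "Osoto.gari")   # self-loop: a repeated Osoto-gari attack
)
A[edges_toy] <- 1L   # character matrix indexing matches rows/columns by name

out_deg   <- rowSums(A)          # how many techniques each one sets up (opener versatility)
in_deg    <- colSums(A)          # how many techniques lead into each one (finisher role)
total_deg <- in_deg + out_deg    # a self-loop counts twice, as in igraph's mode = "all"
data.frame(InDegree = in_deg, OutDegree = out_deg, TotalDegree = total_deg)
```

In this subset O.uchi.gari has the highest in-degree (3), so it plays the "finisher" role, while Seoi.nage sets up two techniques but is reached from only one.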


# -----------------------------
# 3. Degree Calculation
# -----------------------------
cat("\n--- 3. Calculating In, Out, and Total Degrees for Techniques ---\n")

# Compute degrees for each node
node_names <- V(attack_combinations_igraph)$name
in_degree <- degree(attack_combinations_igraph, mode = "in")
out_degree <- degree(attack_combinations_igraph, mode = "out")
total_degree <- degree(attack_combinations_igraph, mode = "all")

# Create data frame summarizing degrees
degree_df <- data.frame(
  Technique = node_names,
  InDegree = in_degree,
  OutDegree = out_degree,
  TotalDegree = total_degree
)

# Output summary
print("Degree Data for each Technique:")
print(degree_df)

# -----------------------------
# 4. Plot Network - Node Size Proportional to Its Degree
# -----------------------------
cat("\n--- 4. Plotting Networks Based on Degree ---\n")

# Set layout to be consistent across plots
set.seed(42)
layout_fr <- layout_with_fr(attack_combinations_igraph)

# Shared vertex and edge styling for both panels
V(attack_combinations_igraph)$label <- V(attack_combinations_igraph)$name
V(attack_combinations_igraph)$label.cex <- 0.7
V(attack_combinations_igraph)$label.color <- "black"
V(attack_combinations_igraph)$color <- "lightblue"
E(attack_combinations_igraph)$arrow.size <- 0.4
E(attack_combinations_igraph)$color <- "gray30"

# Panel 1: node size proportional to in-degree.
# Sizes are passed via vertex.size at plot time; storing them in V()$size
# would let the out-degree sizes overwrite them before plot1() is called.
plot1 <- function() {
  plot(attack_combinations_igraph, layout = layout_fr,
       vertex.size = in_degree * 2 + 5, main = "In-Degree Network")
  legend("topleft", legend = c("Low Degree", "Medium", "High"),
         pt.cex = c(6, 10, 14) / 5, pch = 21, pt.bg = "lightblue", col = "black",
         bty = "n", title = "Degree Scale")
}

# Panel 2: node size proportional to out-degree
plot2 <- function() {
  plot(attack_combinations_igraph, layout = layout_fr,
       vertex.size = out_degree * 2 + 5, main = "Out-Degree Network")
}

# Combined panel plot
par(mfrow = c(1, 2), mar = c(1, 1, 4, 1))
plot1()
plot2()

To compare techniques more easily, we can also plot in- and out-degree as a grouped bar chart:

# -----------------------------
# 5. Bar Plot of Degrees
# -----------------------------
cat("\n--- 5. Plotting Degree Bar Charts ---\n")

# Transform data to long format for ggplot
degree_df_long <- degree_df %>%
  pivot_longer(cols = c(InDegree, OutDegree),
               names_to = "DegreeType",
               values_to = "DegreeValue") %>%
  mutate(DegreeType = factor(DegreeType, levels = c("InDegree", "OutDegree")))

# Create grouped bar chart
degree_plot <- ggplot(degree_df_long, aes(x = reorder(Technique, -DegreeValue), y = DegreeValue, fill = DegreeType)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
  scale_fill_brewer(palette = "Set2", name = "Degree Type", labels = c("In-Degree", "Out-Degree")) +
  labs(
    title = "Judo Technique Combination Degrees",
    subtitle = "In-degree: follow-up attack\nOut-degree: initiation attack",
    x = "Judo Technique",
    y = "Degree Count"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    axis.text.x = element_text(angle = 65, hjust = 1, vjust = 1, size = 9),
    plot.title = element_text(hjust = 0.5, face = "bold", size = 16),
    plot.subtitle = element_text(hjust = 0.5, size = 10),
    legend.position = "top",
    panel.grid.major.x = element_blank(),
    panel.grid.minor.y = element_blank()
  ) +
  geom_text(aes(label = ifelse(DegreeValue > 0, DegreeValue, "")),
            position = position_dodge(width = 0.9), vjust = -0.25, size = 2.5)

print(degree_plot)

So we see that Ouchi-gari has the highest total degree and the highest in-degree. This indicates that it is preceded by the widest range of other techniques, making it a common follow-up (finishing) attack, and that it is the most connected technique overall. Meanwhile, Hiza-guruma and O-soto-gari have the highest out-degree, suggesting that they are often used as initiating moves in combination attacks.

Now, we will look at the cumulative degree distribution, which illustrates the broader pattern of these connections (EA3, EA5, EA6). It shows whether a few key techniques act as central hubs or whether connections are spread more evenly across many techniques in combination sequences. We will fit three basic statistical models that could describe the degree distribution: the power law, the truncated power law, and the exponential model. A power-law distribution suggests that a small number of techniques are highly connected and participate in many combinations, while most techniques are only sparsely connected. This pattern is often associated with a “rich-get-richer” dynamic, where techniques that already have many connections are more likely to gain additional ones. Such behavior is common in scale-free networks (see, for instance, EA5). A truncated power law follows a similar pattern but imposes a natural threshold, limiting the number of connections even for the most connected techniques. This implies that while preferential attachment may be present, constraints cap the dominance of any single technique. In contrast, an exponential distribution indicates that high-degree throwing moves are rare and that the probability of a technique being highly connected declines rapidly with degree. Compared with a power law, this model suggests a more uniform structure, in which connections are distributed more evenly and there is no strong hub dominance. We will identify the “best-fitting” model using the second-order (small-sample corrected) Akaike Information Criterion (AICc).
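Before fitting anything, it may help to see what P(K ≥ k) contains. The minimal sketch below computes the cumulative distribution by hand on a small star graph and compares it with igraph's degree_distribution(cumulative = TRUE). The star graph is only an illustrative stand-in for the judo network:

```r
library(igraph)

# Star graph: one hub of degree 4 and four leaves of degree 1
g <- make_star(5, mode = "undirected")
k <- degree(g)

# P(K >= x) for x = 0, 1, ..., max degree: share of nodes with degree of at least x
by_hand <- sapply(0:max(k), function(x) mean(k >= x))
from_igraph <- degree_distribution(g, cumulative = TRUE)

print(rbind(degree = 0:max(k), by_hand = by_hand, igraph = from_igraph))
all.equal(by_hand, from_igraph)
```

The first entry (degree 0) is always 1, and a power-law term such as Degree^(-b) is infinite at degree 0. This is why the analysis drops that row before calling nls(). On log-log axes, a pure power law appears as a straight line and an exponential curves downward. A truncated power law starts roughly straight and then bends down.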

# -----------------------------
# 6. Degree Distribution & Model Fitting
# -----------------------------
cat("\n--- 6. Degree Distribution & Model Comparison ---\n")

# Compute cumulative degree distribution
deg <- degree_distribution(attack_combinations_igraph, cumulative = TRUE, mode = "all")
deg_table <- data.frame(Degree = 0:(length(deg) - 1), CumulativeProbability = deg)

# Drop degree 0 (where Degree^(-b) is infinite) and zero probabilities (undefined on log axes)
deg_table <- deg_table[deg_table$Degree > 0 & deg_table$CumulativeProbability > 0, ]

# Power-law
pl_model <- nls(CumulativeProbability ~ a * Degree^(-b),
                data = deg_table, start = list(a = 1, b = 2),
                control = nls.control(warnOnly = TRUE))

# Exponential model
exp_model <- nls(CumulativeProbability ~ a * exp(-b * Degree),
                 data = deg_table, start = list(a = 1, b = 0.1),
                 control = nls.control(warnOnly = TRUE))

# Truncated power-law
tpl_model <- nls(CumulativeProbability ~ a * Degree^(-b) * exp(-c * Degree),
                 data = deg_table, start = list(a = 1, b = 1.5, c = 0.05),
                 control = nls.control(warnOnly = TRUE))

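# Compare models using the second-order Akaike information criterion (AICc; AICctab() is from the bbmle package).
# nobs is the number of points actually used in the fits (the degree-0 row was dropped above)
AICctab(pl_model, exp_model, tpl_model, nobs = nrow(deg_table), weights = TRUE, delta = TRUE, base = TRUE)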

# Plot with fitted models
plot(deg_table$Degree, deg_table$CumulativeProbability, log = "xy", pch = 16,
     xlab = "Degree", ylab = "P(K ≥ k)", main = "Cumulative Degree Distribution")

curve(coef(pl_model)[1] * x^(-coef(pl_model)[2]), add = TRUE, col = "blue", lwd = 2)
curve(coef(exp_model)[1] * exp(-coef(exp_model)[2] * x), add = TRUE, col = "red", lwd = 2)
curve(coef(tpl_model)[1] * x^(-coef(tpl_model)[2]) * exp(-coef(tpl_model)[3] * x),
      add = TRUE, col = "darkgreen", lwd = 2)

legend("bottomleft", legend = c("Power-law", "Exponential", "Truncated PL"),
       col = c("blue", "red", "darkgreen"), lwd = 2)

Based on model selection using AICc, the truncated power law provided the best fit for the degree distribution, suggesting that while a few techniques are highly connected within attack sequences, there’s a natural limit to how dominant any single technique can be.
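The AICc scores and Akaike weights summarized by AICctab() can be reproduced by hand. The sketch below is illustrative. It uses two lm() fits on R's built-in cars data rather than the degree-distribution models, an assumption made only so the example runs on its own. The same aicc() helper (a hypothetical name) should also accept nls fits, because logLik() and nobs() both have methods for them. Note that n must be the number of data points actually used in the fit:

```r
# Second-order AIC: AICc = -2 logLik + 2k + 2k(k + 1) / (n - k - 1)
aicc <- function(model) {
  ll <- logLik(model)
  k  <- attr(ll, "df")  # number of estimated parameters (incl. residual variance)
  n  <- nobs(model)     # number of data points used in the fit
  -2 * as.numeric(ll) + 2 * k + 2 * k * (k + 1) / (n - k - 1)
}

# Two competing models on R's built-in cars data (illustration only)
m1 <- lm(dist ~ speed, data = cars)
m2 <- lm(dist ~ speed + I(speed^2), data = cars)

scores  <- c(linear = aicc(m1), quadratic = aicc(m2))
delta   <- scores - min(scores)                    # AICc difference to the best model
weights <- exp(-delta / 2) / sum(exp(-delta / 2))  # Akaike weights (sum to 1)
round(cbind(AICc = scores, dAICc = delta, weight = weights), 3)
```

The extra term 2k(k + 1)/(n - k - 1) penalizes parameter-rich models most when n is small, as it is for a degree distribution with only a handful of points.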

Next, we will refine our understanding of which techniques are most critical by calculating the Integrated Value of Influence (IVI) (EA7, EA8) for each technique. The IVI algorithm estimates the importance of nodes in a network (in our case, specific Judo throwing techniques) by combining the most important topological features of the network formed by these techniques. The IVI is the synergistic product of local, semi-local, and global network centrality measures, and it can identify the most “regulatory” or pivotal techniques within the attack combination network. Thus, the most “influential” techniques (those with the highest IVI values) are those that simultaneously exhibit high levels of “hubness” and a strong potential to direct or facilitate the flow of successful attack combinations.

# -----------------------------
# 7. IVI (Integrated Value of Influence)
# -----------------------------
cat("\n--- 7. Calculating IVI Centrality ---\n")

Graph_IVI = ivi(attack_combinations_igraph, mode = "all")

cent_network.vis(
  graph = attack_combinations_igraph,
  cent.metric = Graph_IVI,
  legend.title = "IVI",
  plot.title = "Attack Combination Network – IVI",
  layout = "kk",
  dist.power = 1.5,
  legend.position = "right",
  boxed.legend = TRUE,
  show.labels = TRUE
)

Based on the integrated value of influence, Ouchi-gari has the highest influence in the network, followed by Harai-goshi and Hiza-guruma, while most of the other moves have very low IVI values. This suggests that these few techniques, mostly foot techniques (Ashi-waza) such as Ouchi-gari and Hiza-guruma, play a central role in shaping the overall structure of attack combination strategies.

Finally, we will search for underlying strategic groupings within our Judo network, using the Walktrap algorithm for community detection (EA9, EA10). This algorithm simulates short random walks starting from different techniques (nodes) in the network. The underlying idea is that these random walks are more likely to get “trapped” within densely connected groups of techniques, our “communities” of nodes. The algorithm then merges nodes hierarchically, based on how similar their random-walk profiles are, to identify clusters of techniques that are more frequently used together, potentially representing distinct strategic approaches.
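To see the “trapping” idea in a setting where the answer is obvious, here is a minimal sketch on a hypothetical graph. It consists of two fully connected groups of four techniques, joined by a single bridging edge, and cluster_walktrap() should recover the two groups:

```r
library(igraph)

# Two hypothetical groups of four techniques, densely linked inside each group
# and joined by a single bridging edge (vertices 4 and 5)
g <- disjoint_union(make_full_graph(4), make_full_graph(4))
g <- add_edges(g, c(4, 5))

wt <- cluster_walktrap(g, steps = 4)  # steps = 4 is igraph's default walk length
membership(wt)                        # vertices 1-4 share one label, 5-8 another
length(wt)                            # number of communities found
```

Random walks started inside either group rarely cross the single bridge within four steps, so the two groups end up in separate communities. The igraph documentation states that edge directions are ignored by this function, so a directed network such as the judo one is effectively clustered as undirected.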

# -----------------------------
# 8. Community Detection
# -----------------------------
cat("\n--- 8. Community Detection using Walktrap ---\n")

ceb = cluster_walktrap(attack_combinations_igraph)
V(attack_combinations_igraph)$color <- rainbow(length(ceb))[membership(ceb)]
V(attack_combinations_igraph)$label <- V(attack_combinations_igraph)$name
V(attack_combinations_igraph)$label.cex <- 0.7
V(attack_combinations_igraph)$label.color <- "black"
V(attack_combinations_igraph)$size <- 12
E(attack_combinations_igraph)$arrow.size <- 0.4
E(attack_combinations_igraph)$color <- "gray40"

par(mar = c(0.5, 0.5, 0.5, 0.5))
set.seed(42)  # same seed as before, for a reproducible layout
layout_fr <- layout_with_fr(attack_combinations_igraph)

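# Pass the cluster colors explicitly so the vertices match the legend
# (plot() on a communities object otherwise applies its own default colors)
plot(ceb, attack_combinations_igraph, layout = layout_fr,
     col = rainbow(length(ceb))[membership(ceb)],
     main = "Technique Clusters")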
legend("topright", legend = paste("Cluster", 1:length(ceb)),
       col = rainbow(length(ceb)), pch = 21, pt.bg = rainbow(length(ceb)),
       bty = "n", cex = 0.8)

We identified 9 clusters of techniques within our attack combination network. Most clusters were small, but two stood out as larger, more cohesive groups. The first, which includes Seoi-nage, Tai-otoshi, and Seoi-otoshi, is dominated by hand techniques (Te-waza). The second, which includes O-uchi-gari, Osoto-gari, Uchi-mata, and Hiza-guruma, is dominated by powerful foot techniques (Ashi-waza). A more detailed analysis could help identify the factors behind these groupings, such as biomechanical similarities, preferred grips, or common tactical setups.

References

Kashiwazaki, K., & Nakanishi, H. (1995). Attacking Judo: A Guide to Combinations and Counters. Ippon Books.

Kawaishi, M. (1963). Standing Judo: The Combinations and Counter-Attacks. Budoworks.

van Haesendonck, F.M. (1968). Judo: Encyclopédie par l’Image. Éditions Erasme.

Mapping research landscapes and dynamics: Some basic bibliometric analyses with R

On 6 May 2025 (updated 25 May 2025) by Vinicius Bastazini, in R Code, Tutorial

Understanding how scientific knowledge develops requires more than merely counting papers and citations. It requires a careful evaluation of how research topics and themes interconnect and transform over time. This is where bibliometric analysis becomes essential. As the volume of scientific journals and papers continues to grow exponentially, bibliometric analyses become indispensable for mapping and synthesizing an increasingly complex information landscape.

Through the analysis of publication information and citation patterns, bibliometric analyses allow us not only to assess scholarly productivity and impact but, more importantly, to quantify scientific communication processes. They also let us build indicators that reveal the dynamics and evolution of scientific information within specific disciplines, research programs, organizations, research teams, or geographical regions. Such tools are especially valuable for gaining a clearer understanding of research dynamics, which is essential when conducting literature reviews or shaping research strategies.

In this post, I’ll share some R code I developed for a recent bibliometric analysis project I have been involved with. It is not as comprehensive or user-friendly as established R packages such as Bibliometrix, which offers a rich suite of tools and an easy-to-use interface. Even so, this custom approach gave me the flexibility and control I needed for more tailored data handling and visualization. We’ll walk through a handful of simple bibliometric analysis and visualization techniques in R that reveal key patterns in research data, focusing on keywords and publication trends. Of course, this approach can also be extended to other important data fields, such as words in titles or abstracts. More specifically, we will look at:

1. Word Cloud

We’ll start by building a basic word cloud that highlights the most frequently used keywords across the bibliographic dataset. Here, each word’s size scales with its frequency, offering a fast, intuitive snapshot of the field’s dominant terminology.

2. Keyword Co-Occurrence Network

Next, we’ll construct a keyword co-occurrence network, a visual map that shows how often keywords appear together in academic papers. Each keyword is represented as a node, and an edge (or link) is drawn between two keywords when they co-occur in the same study. The size of a node reflects how frequently a keyword appears, while the thickness of an edge indicates how strongly two keywords are associated, that is, how many papers they share. We’ll also apply a community detection algorithm, the Louvain method, to identify clusters within the network: groups of keywords that frequently appear together across documents. These clusters represent thematic groupings or potential research subfields. They reveal the conceptual structure of the literature, showing which themes are more central, which are less developed, and how different areas of research are interconnected.

3. Thematic Map

Based on the co-occurrence network, we’ll generate a thematic map using Callon’s centrality and density metrics. In this map, each cluster from the co-occurrence network is represented as a bubble, labeled with the cluster’s most frequent keyword and sized by the number of keywords it contains. The X-axis represents the centrality of the cluster, that is, the degree of interaction with other clusters in the network, which measures the importance of a research topic. The Y-axis represents density, a measure of the internal strength of the cluster’s links and of the development of the topic. When mapping the themes in this plot, we can identify:

Motor themes (top right corner): Themes in this quadrant have high centrality and density, indicating that the themes are well-developed and crucial for structuring the research field.
Niche themes (top left corner): Themes that are highly specialized and well-developed in terms of internal research but have more limited interaction with other themes.
Peripheral themes (bottom left corner): Themes with low centrality and low density, suggesting that they are underdeveloped and marginal; these are themes that are either emerging or declining in the literature.
Basic themes (bottom right corner): These themes have high centrality but low density. They are often essential for transdisciplinary research, since they may serve as foundational topics that cross the boundaries of multiple themes, even though their internal connections are weak.
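As a minimal sketch of how themes fall into these four quadrants, the hypothetical helper below splits centrality and density at their medians. This is the same split the thematic map code later in this post uses for its dashed lines. The cluster values are made up, and assigning ties (values exactly at the median) to the upper or right quadrant is an arbitrary choice:

```r
# Hypothetical helper: classify themes by a median split of centrality and density
classify_theme <- function(centrality, density) {
  hi_c <- centrality >= median(centrality)
  hi_d <- density >= median(density)
  ifelse(hi_c & hi_d, "Motor",
    ifelse(!hi_c & hi_d, "Niche",
      ifelse(!hi_c & !hi_d, "Peripheral (emerging/declining)", "Basic")))
}

# Made-up values for four clusters
themes <- data.frame(
  label      = c("A", "B", "C", "D"),
  centrality = c(40, 5, 3, 30),
  density    = c(25, 20, 2, 4)
)
themes$quadrant <- classify_theme(themes$centrality, themes$density)
themes
```

Here the medians are 17.5 (centrality) and 12 (density). Cluster A lands among the motor themes, B is a niche theme, C is peripheral, and D is a basic theme.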

4. Yearly Keyword Trends

To understand the temporal dynamics of the research field, we’ll build a yearly keyword trend diagram using a Sankey plot. This diagram maps the flow of the most frequent keywords over time, revealing how interest in specific topics rises or fades. Here, I use a cutoff of the ten most common keywords, but this could be done for as many keywords as necessary.

5. Decade-Based Keyword Evolution

Finally, we’ll look at how the research field changes through time by aggregating keyword data by decade (or whatever time frame one might want to look at). This decade-based evolution diagram shows the progression of top keywords from one decade to the next, capturing long-term shifts and the persistence or disappearance of major research themes. Once again, I use a cutoff of the ten most common keywords per decade.
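Grouping years into decades amounts to a floor division, followed by a per-decade ranking. Below is a minimal dplyr sketch on a tiny hypothetical table with the same long format as the keywords_long data built below (paper_id, year, keyword). With real data, n = 1 would become the chosen cutoff, such as num_top_keywords_per_decade:

```r
library(dplyr)

# Tiny hypothetical long-format table: one row per keyword per paper
kw <- tibble(
  paper_id = c(1, 1, 2, 2, 3, 3, 4),
  year     = c(2003, 2003, 2008, 2008, 2014, 2014, 2019),
  keyword  = c("coevolution", "mutualism", "coevolution", "phylogeny",
               "mutualism", "phylogeny", "mutualism")
)

top_per_decade <- kw %>%
  mutate(decade = floor(year / 10) * 10) %>%  # e.g. 2003 -> 2000, 2014 -> 2010
  count(decade, keyword, name = "n") %>%      # papers using each keyword per decade
  group_by(decade) %>%
  slice_max(n, n = 1, with_ties = FALSE) %>%  # keep the top keyword in each decade
  ungroup()
top_per_decade
```

Coevolution leads in the 2000s (two papers) and mutualism in the 2010s (two papers). A Sankey diagram then links these per-decade rankings from one decade to the next.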

To start off, we’ll simulate a simple bibliographic dataset to work with. This will consist of a data frame containing 500 publications, each tagged with a publication year between 2000 and 2025 and a set of keywords. For the purpose of this basic tutorial, I’ve created a list of keywords that might typically appear in an evolutionary ecology or eco-evolutionary research paper, so let’s pretend that we are conducting a review on something like “coevolutionary dynamics in ecological communities”. In a real-world application, you’ll be working with your own bibliographic data, which will likely include additional fields such as authors, titles, abstracts, journals, and citation counts. That’s perfectly fine: the analyses here use the columns named "Year" and "Author_Keywords" in the simulated data frame, and you can easily adapt the code by changing these column names to match the structure of your dataset. And, as I mentioned before, some of these analyses can be applied to other bibliographic information besides keywords. It also goes without saying that real-world data are messy and inconsistent. You’ll likely need to do a lot of data cleaning, such as combining similar keywords and handling typos and linguistic variations, so keep that in mind as you build your solution.

So, let’s start by loading the necessary packages, “creating” and organizing our dataset, and defining parameters for the analysis:

#### 1. Load packages
library(dplyr)
library(tidyr)
library(stringr)
library(igraph)
library(ggplot2)
library(ggrepel)
library(RColorBrewer) 
library(wordcloud)  
library(networkD3)
library(htmlwidgets)

### 2. Parameters
cat("--- 2. Setting Parameters ---\n")

# Simulation Parameters
n_studies = 500             # Number of simulated studies
start_year = 2000           # Start year for publications
end_year = 2025             # End year for publications
keywords_per_study = 5      # Number of keywords per study

# Keyword Pool
keyword_pool <- c(
  "coevolution", "arms race", "mutualism", "antagonism", "host-parasite",
  "plant-pollinator", "Red Queen hypothesis", "selection pressure", "adaptation",
  "phylogeny", "gene flow", "speciation", "ecological interaction", "community structure","community assembly","community disassembly",
  "predator-prey", "ecophylogenetics", "co-speciation", "evolutionary dynamics", "co-phylogenetic analysis",
  "adaptive dynamics", "local adaptation", "trait evolution", "phylogenetic signal",
  "functional traits", "ecological networks", "niche differentiation", "coevolutionary networks"
)

# Analysis Parameters
keyword_column_sim = "Author_Keywords" # Name for the keyword column in simulated data
year_column_sim = "Year"               # Name for the year column in simulated data
min_cooccurrence = 3                 # Min times two keywords must appear together
min_keyword_freq_network = 3           # Min degree (no. of linked keywords after the co-occurrence filter) to keep a keyword in the network plot
max_words_cloud = 75                   # Max words to display in the word cloud

# Sankey Parameters
num_top_keywords_yearly = 10           # Number of top keywords for the YEARLY Sankey diagram
num_top_keywords_per_decade = 10       # Number of top keywords PER DECADE for the DECADE Sankey diagram

### 3. Simulate Bibliographic Data
cat("--- 3. Simulating Study Data ---\n")
set.seed(123)  # make the simulated data reproducible

simulated_studies <- tibble(
  paper_id = 1:n_studies,
  Year = sample(start_year:end_year, n_studies, replace = TRUE)
)

# Generate keywords for each study
keywords_list <- lapply(1:n_studies, function(i) {
  sample(keyword_pool, keywords_per_study, replace = FALSE)
})

# Combine into a long format data frame (one row per keyword per paper)
keywords_long_sim <- simulated_studies %>%
  mutate(keywords = keywords_list) %>%
  unnest(keywords) %>%
  rename(!!sym(keyword_column_sim) := keywords,
         !!sym(year_column_sim) := Year) %>%
  select(paper_id, !!sym(year_column_sim), !!sym(keyword_column_sim)) %>%
  mutate(keyword = str_trim(tolower(!!sym(keyword_column_sim)))) %>%
  select(paper_id, year = !!sym(year_column_sim), keyword) %>%
  distinct(paper_id, year, keyword) # Ensure unique keyword per paper/year instance

cat("Generated", n_studies, "studies (", start_year, "-", end_year, ") with",
    nrow(keywords_long_sim), "unique keyword instances per paper/year.\n")
cat("Example simulated data:\n")
print(head(keywords_long_sim))

# Use this simulated data for the rest of the analysis
keywords_long <- keywords_long_sim

#### 4. Calculate Overall Keyword Frequencies
cat("\n--- 4. Calculating Overall Keyword Frequencies ---\n")
keyword_total_freq <- keywords_long %>%
  # Count unique keywords per paper first, then sum across papers
  distinct(paper_id, keyword) %>%
  count(keyword, name = "total_freq", sort = TRUE)

cat("Top overall keywords (based on number of papers):\n")
print(head(keyword_total_freq))

Now that we have our “dataset”, let’s create a visual representation of the word data, building a simple word cloud:

### 5. Word Cloud 
cat("\n--- 5. Generating Word Cloud ---\n")

if (exists("keyword_total_freq") && nrow(keyword_total_freq) > 0) {
  cat("   - Creating Word Cloud (Check RStudio Plots Pane)...\n")
  tryCatch({
    
    wordcloud(words = keyword_total_freq$keyword,
              freq = keyword_total_freq$total_freq,
              min.freq = 2, # Show words appearing in at least 2 papers
              max.words = max_words_cloud,
              random.order = FALSE, # Plot most frequent words first
              rot.per = 0.30,      # Proportion of words rotated 90 degrees
              colors = brewer.pal(8, "Dark2")) # Color palette
    title(main = "", line = -1) # If you want, add a title near the top
  }, error = function(e) {
    cat("     > Error generating word cloud:", conditionMessage(e), "\n")
  })
  
} else {
  cat("   - Skipping word cloud (no keyword frequency data available).\n")
}

Now, let’s build the co-occurrence network and identify clusters:

### 6. Keyword Co-occurrence Network Analysis
cat("\n--- 6. Building Keyword Co-occurrence Network ---\n")
# (Uses keywords_long which contains year info, but pairs are per paper regardless of year)

# 6a. Generate keyword pairs within each paper
cat("   - Generating keyword pairs...\n")
keyword_pairs_unnested <- keywords_long %>%
  group_by(paper_id) %>%
  filter(n() >= 2) %>%
  summarise(pairs = list(combn(keyword, 2, simplify = FALSE)), .groups = 'drop') %>%
  unnest(pairs) %>%
  mutate(
    keyword1 = sapply(pairs, `[`, 1),
    keyword2 = sapply(pairs, `[`, 2)
  ) %>%
  select(keyword1, keyword2)

# 6b. Standardize & Count Pairs
cat("   - Counting co-occurrences (min =", min_cooccurrence, ")...\n")
keyword_pair_counts <- keyword_pairs_unnested %>%
  mutate(
    temp_kw1 = pmin(keyword1, keyword2),
    temp_kw2 = pmax(keyword1, keyword2)
  ) %>%
  select(keyword1 = temp_kw1, keyword2 = temp_kw2) %>%
  count(keyword1, keyword2, name = "weight") %>%
  filter(weight >= min_cooccurrence)

# 6c. Create and Filter Graph
graph_plot_obj <- NULL
communities <- NULL

if(nrow(keyword_pair_counts) > 0) {
  cat("   - Creating graph object...\n")
  graph_obj <- graph_from_data_frame(keyword_pair_counts, directed = FALSE)
  
  cat("   - Filtering graph (min degree =", min_keyword_freq_network, ") & detecting communities...\n")
  node_degrees <- degree(graph_obj, mode = "all")
  nodes_to_keep <- names(node_degrees[node_degrees >= min_keyword_freq_network])
  
  if(length(nodes_to_keep) > 0){
    graph_filtered <- induced_subgraph(graph_obj, V(graph_obj)$name %in% nodes_to_keep)
    graph_filtered <- delete_vertices(graph_filtered, degree(graph_filtered) == 0)  # delete.vertices() is deprecated
    
    if (vcount(graph_filtered) > 0 && ecount(graph_filtered) > 0) {
      communities <- cluster_louvain(graph_filtered)
      num_communities <- length(unique(membership(communities)))
      cat("     > Detected", num_communities, "communities (Louvain).\n")
      
      node_data <- tibble(name = V(graph_filtered)$name) %>%
        left_join(keyword_total_freq, by = c("name" = "keyword")) %>%
        mutate(total_freq = ifelse(is.na(total_freq), 1, total_freq))
      
      V(graph_filtered)$size <- log1p(node_data$total_freq) * 2.5
      V(graph_filtered)$label <- V(graph_filtered)$name
      V(graph_filtered)$community <- membership(communities)
      V(graph_filtered)$total_freq <- node_data$total_freq
      
      # Assign colors based on community
      if (num_communities > 0) {
        num_colors_needed = length(unique(V(graph_filtered)$community))
        if (num_colors_needed > 8) {
          community_colors <- colorRampPalette(brewer.pal(8, "Set2"))(num_colors_needed)
        } else if (num_colors_needed > 2) {
          community_colors <- brewer.pal(max(3, num_colors_needed), "Set2")[1:num_colors_needed]
        } else if (num_colors_needed == 2) {
          community_colors <- brewer.pal(3, "Set2")[1:2]
        } else { # num_colors_needed == 1
          community_colors <- brewer.pal(3, "Set2")[1]
        }
        community_map <- setNames(community_colors, sort(unique(V(graph_filtered)$community)))
        V(graph_filtered)$color <- community_map[as.character(V(graph_filtered)$community)]
      } else {
        V(graph_filtered)$color <- "grey"
        community_map <- NULL
      }
      
      graph_plot_obj <- graph_filtered
      cat("     > Filtered graph ready:", vcount(graph_plot_obj), "nodes,", ecount(graph_plot_obj), "edges.\n")
      
    } else {
      cat("     > Warning: Graph empty after filtering.\n")
      graph_plot_obj <- NULL
      communities <- NULL
    }
  } else {
    cat("     > Warning: No nodes met minimum degree requirement.\n")
    graph_plot_obj <- NULL
    communities <- NULL
  }
} else {
  cat("   - Warning: No keyword pairs met the minimum co-occurrence threshold.\n")
  graph_plot_obj <- NULL
  communities <- NULL
}

# 6d. Visualize Network
if (!is.null(graph_plot_obj)) {
  cat("   - Plotting Co-occurrence Network (Check RStudio Plots Pane)...\n")
  tryCatch({
    par(mar=c(1, 1, 3, 1))
    plot(graph_plot_obj,
         layout = layout_nicely(graph_plot_obj),
         vertex.frame.color = "grey40", vertex.label.color = "black",
         vertex.label.cex = 0.7, vertex.label.dist = 0.4,
         edge.color = rgb(0.5, 0.5, 0.5, alpha = 0.4), edge.curved = 0.1,
         edge.width = scales::rescale(E(graph_plot_obj)$weight, to = c(0.3, 3.0)),
         main = "Keyword Co-occurrence Network (Simulated Data)",
         sub = paste("Nodes sized by log(# Papers), Min Degree >=", min_keyword_freq_network)
    )
    if (!is.null(community_map) && length(community_map) <= 12 && length(community_map) > 1) {
      legend("bottomleft", legend = paste("Cluster", names(community_map)),
             fill = community_map, bty = "n", cex = 0.7, title="Communities")
    }
    par(mar=c(5.1, 4.1, 4.1, 2.1)) # Reset margins
  }, error = function(e){
    cat("     > Error plotting network:", conditionMessage(e), "\n")
    par(mar=c(5.1, 4.1, 4.1, 2.1)) # Reset margins on error
  })
} else {
  cat("   - Skipping network plot (no valid graph).\n")
}
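The pair counting in steps 6a and 6b can also be expressed with a paper-by-keyword incidence matrix. Multiplying its transpose by itself gives, for every pair of keywords, the number of papers in which both appear. Below is a minimal sketch on hypothetical data, before any min_cooccurrence filtering. It is an alternative formulation, not the approach used in the code above:

```r
library(igraph)

# Hypothetical long-format data: one row per (paper, keyword)
kw <- data.frame(
  paper_id = c(1, 1, 1, 2, 2, 3, 3),
  keyword  = c("coevolution", "mutualism", "phylogeny",
               "coevolution", "mutualism",
               "mutualism", "phylogeny")
)

# Paper x keyword incidence matrix (1 = keyword used in that paper)
incidence <- unclass(table(kw$paper_id, kw$keyword))
incidence[incidence > 1] <- 1  # guard against duplicated rows

# Keyword x keyword matrix: entry [i, j] = number of papers using both i and j
cooc <- crossprod(incidence)
diag(cooc) <- 0  # the diagonal held each keyword's own paper count
print(cooc)

# Weighted undirected network; edge weights are the co-occurrence counts
g <- graph_from_adjacency_matrix(cooc, mode = "undirected", weighted = TRUE)
igraph::as_data_frame(g, what = "edges")
```

Both routes give the same weights. The matrix version is compact but builds a dense keyword-by-keyword matrix, whereas the pairwise approach used above only stores pairs that actually occur.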

In this example, the Louvain algorithm identified three distinct clusters, that is, groups of words that frequently co-occur across papers (with simulated data, the exact number depends on the random draw). Based on these clusters, we will create a thematic map in which each cluster is represented as a bubble, illustrating the relationships and centrality of research themes within the broader keyword network. This map will help us better understand the underlying structure of the field and how different research topics are interconnected.
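The sketch below shows the two quantities computed by calculate_callon_metrics() on a hypothetical five-keyword network with two clusters. Following that helper, density is the summed weight of links inside a cluster, and centrality is the summed weight of links to other clusters. Other tools may normalize these sums (for example, by the number of keywords in the cluster), so absolute values are not directly comparable across implementations:

```r
library(igraph)

# Hypothetical weighted keyword network with two clusters:
# {a, b, c} and {d, e}, joined by one edge (c-d, weight 2)
edges <- data.frame(
  from   = c("a", "a", "b", "d", "c"),
  to     = c("b", "c", "c", "e", "d"),
  weight = c(5,   3,   4,   6,   2)
)
g <- graph_from_data_frame(edges, directed = FALSE)
cluster <- c(a = 1, b = 1, c = 1, d = 2, e = 2)  # hypothetical cluster labels

# Endpoints of every edge, as vertex names
ends_names <- ends(g, E(g))
inside  <- cluster[ends_names[, 1]] == 1 & cluster[ends_names[, 2]] == 1
touches <- cluster[ends_names[, 1]] == 1 | cluster[ends_names[, 2]] == 1

density_1    <- sum(E(g)$weight[inside])             # internal links: 5 + 3 + 4
centrality_1 <- sum(E(g)$weight[touches & !inside])  # links to other clusters: c-d
c(centrality = centrality_1, density = density_1)
```

Cluster 1 is tightly knit (high density) but has only one weak link to the rest of the network (low centrality). In a larger map, that combination would tend to place it among the niche themes.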

### 7. Thematic Map Analysis
cat("\n--- 7. Generating Thematic Map (Callon's Metrics) ---\n")

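# Helper: Callon's metrics for one cluster, as unnormalized sums of edge weights
#   density    = total weight of edges inside the cluster
#   centrality = total weight of edges linking the cluster to other clusters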
calculate_callon_metrics <- function(graph, communities_object, cluster_id) {
  if (is.null(graph) || !is.igraph(graph) || is.null(communities_object)) {
    return(list(centrality = 0, density = 0, n_keywords = 0))
  }
  cluster_nodes_indices <- which(membership(communities_object) == cluster_id)
  if (length(cluster_nodes_indices) == 0) {
    return(list(centrality = 0, density = 0, n_keywords = 0))
  }
  n_nodes_in_cluster <- length(cluster_nodes_indices)
  subgraph <- induced_subgraph(graph, cluster_nodes_indices)
  internal_weight_sum <- if (ecount(subgraph) > 0) sum(E(subgraph)$weight, na.rm = TRUE) else 0
  density <- internal_weight_sum
  external_weight_sum <- 0
  all_incident_edges_indices <- E(graph)[.inc(cluster_nodes_indices)]
  if (length(all_incident_edges_indices) > 0) {
    all_incident_edges <- E(graph)[all_incident_edges_indices]
    ends_matrix <- ends(graph, all_incident_edges, names = FALSE)
    mem <- membership(communities_object)
    is_external <- (mem[ends_matrix[,1]] != cluster_id) | (mem[ends_matrix[,2]] != cluster_id)
    external_edges <- all_incident_edges[is_external]
    if (length(external_edges) > 0) {
      external_weight_sum <- sum(external_edges$weight, na.rm = TRUE)
    }
  }
  centrality <- external_weight_sum
  return(list(centrality = centrality, density = density, n_keywords = n_nodes_in_cluster))
}


thematic_plot_obj <- NULL

if (!is.null(graph_plot_obj) && !is.null(communities) && length(unique(membership(communities))) > 0) {
  cat("   - Calculating Centrality and Density for communities...\n")
  
  community_ids <- unique(membership(communities))
  thematic_metrics <- lapply(community_ids, function(comm_id) {
    metrics <- calculate_callon_metrics(graph_plot_obj, communities, comm_id)
    nodes_in_comm_indices <- which(membership(communities) == comm_id)
    community_node_names <- V(graph_plot_obj)$name[nodes_in_comm_indices]
    community_node_freqs <- V(graph_plot_obj)$total_freq[nodes_in_comm_indices]
    
    if(length(community_node_names) > 0 && length(community_node_freqs) > 0 && !all(is.na(community_node_freqs))){
      most_frequent_keyword <- community_node_names[which.max(community_node_freqs)]
      community_label <- str_trunc(most_frequent_keyword, 30)
    } else {
      community_label <- paste("Cluster", comm_id)
    }
    
    return(tibble(
      community_id = comm_id, label = community_label,
      Centrality = metrics$centrality, Density = metrics$density,
      n_keywords = metrics$n_keywords
    ))
  })
  
  thematic_data <- bind_rows(thematic_metrics) %>%
    mutate(Centrality = as.numeric(Centrality), Density = as.numeric(Density)) %>%
    filter(!is.na(community_id), n_keywords > 0, is.finite(Centrality), is.finite(Density))
  
  if(nrow(thematic_data) > 0) {
    cat("   - Creating Thematic Map plot object...\n")
    median_centrality <- median(thematic_data$Centrality, na.rm = TRUE)
    median_density <- median(thematic_data$Density, na.rm = TRUE)
    median_centrality <- if (is.finite(median_centrality)) median_centrality else 0
    median_density <- if (is.finite(median_density)) median_density else 0
    cat("     > Quadrant thresholds (Medians): Centrality=", round(median_centrality,2), ", Density=", round(median_density,2), "\n")
    
    thematic_plot_obj <- ggplot(thematic_data, aes(x = Centrality, y = Density)) +
      geom_hline(yintercept = median_density, linetype = "dashed", color = "grey50") +
      geom_vline(xintercept = median_centrality, linetype = "dashed", color = "grey50") +
      geom_point(aes(size = n_keywords), alpha = 0.7, color = "steelblue") +
      geom_text_repel(aes(label = label), size = 3.0, max.overlaps = 15,
                      box.padding = 0.4, point.padding = 0.6) +
      scale_size_continuous(range = c(4, 12), name = "# Keywords\nin Theme") +
      # Label each quadrant in its own corner: right half = high centrality (Motor above, Basic below)
      ggplot2::annotate("text", x = Inf, y = Inf, label = "Motor Themes", hjust = 1.1, vjust = 1.5, size = 3.5, color = "grey40", fontface="bold") +
      ggplot2::annotate("text", x = -Inf, y = Inf, label = "Niche Themes", hjust = -0.1, vjust = 1.5, size = 3.5, color = "grey40", fontface="bold") +
      ggplot2::annotate("text", x = -Inf, y = -Inf, label = "Emerging/\nDeclining", hjust = -0.1, vjust = -0.5, size = 3.5, color = "grey40", fontface="bold") +
      ggplot2::annotate("text", x = Inf, y = -Inf, label = "Basic Themes", hjust = 1.1, vjust = -0.5, size = 3.5, color = "grey40", fontface="bold") +
      labs(
        title = "Thematic Map (Callon's Centrality & Density)",
        subtitle = "Keyword Clusters from Co-occurrence Network (Simulated Data)",
        x = "Centrality (Links to other themes)",
        y = "Density (Internal theme links)"
      ) +
      theme_minimal(base_size = 12) +
      theme(
        plot.title = element_text(hjust = 0.5, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5),
        plot.margin = margin(20, 20, 20, 20)
      )
    
    # Visualize Thematic Map (Print to RStudio Plots Pane)
    cat("   - Plotting Thematic Map (Check RStudio Plots Pane)...\n")
    print(thematic_plot_obj)
    
  } else {
    cat("   - Warning: No valid thematic data to plot.\n")
  }
} else {
  cat("   - Skipping Thematic Map (network or communities missing).\n")
}
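To make the centrality/density logic in `calculate_callon_metrics()` easier to follow, here is a minimal base-R sketch on a tiny, made-up weighted network. It is not the function used above: the edge list, the `cluster_of` vector and the `callon_metrics()` helper are hypothetical, and plain data frames stand in for igraph. Centrality is the summed weight of edges with exactly one end inside the cluster, as in the code above; for density I assume the total internal edge weight divided by the number of keywords (an unscaled variant of Callon's measure), which may differ from how density is computed earlier in the script.

```r
# Hypothetical weighted keyword co-occurrence network, stored as an edge list
edges <- data.frame(
  from   = c("a", "a", "b", "c", "d", "b"),
  to     = c("b", "c", "c", "d", "e", "e"),
  weight = c(3, 2, 4, 1, 5, 2),
  stringsAsFactors = FALSE
)
# Hypothetical cluster assignment for each keyword
cluster_of <- c(a = 1, b = 1, c = 1, d = 2, e = 2)

# Centrality: summed weight of edges with exactly one end in the cluster.
# Density (assumed, unscaled): summed internal edge weight per keyword.
callon_metrics <- function(edges, cluster_of, cluster_id) {
  from_in  <- cluster_of[edges$from] == cluster_id
  to_in    <- cluster_of[edges$to] == cluster_id
  internal <- from_in & to_in       # both ends inside the cluster
  external <- xor(from_in, to_in)   # exactly one end inside the cluster
  n_nodes  <- sum(cluster_of == cluster_id)
  list(
    centrality = sum(edges$weight[external]),
    density    = sum(edges$weight[internal]) / n_nodes,
    n_keywords = n_nodes
  )
}

m1 <- callon_metrics(edges, cluster_of, 1)
m2 <- callon_metrics(edges, cluster_of, 2)
```

For cluster 1 (keywords a, b and c), the external edges are c-d and b-e, so its centrality is 1 + 2 = 3; its internal weight is 3 + 2 + 4 = 9, spread over three keywords, giving a density of 3.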

In this simulated example, “ecological networks” was positioned at the center of the plot, indicating its central role within the research landscape, while “ecophylogenetics” was classified as a motor theme, reflecting a theme that is both strongly connected to other themes (high centrality) and internally well developed (high density). On the other hand, “evolutionary dynamics” appeared as a peripheral theme, suggesting that it is underdeveloped or marginal in the current body of research.
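The quadrant names on the map follow from a median split of the two axes. As a rough sketch (hypothetical themes and numbers, base R, and my own choice to assign values lying exactly on a median to the "high" side), the classification could be written as:

```r
# Median-split quadrant rule; values exactly at the median count as "high" (assumption)
classify_quadrant <- function(centrality, density,
                              c_thr = median(centrality),
                              d_thr = median(density)) {
  high_c <- centrality >= c_thr
  high_d <- density >= d_thr
  ifelse(high_c & high_d, "Motor",
         ifelse(high_d, "Niche",               # low centrality, high density
                ifelse(high_c, "Basic",        # high centrality, low density
                       "Emerging/Declining"))) # low on both axes
}

# Hypothetical themes with made-up Centrality/Density values
themes <- data.frame(
  label      = c("theme_A", "theme_B", "theme_C", "theme_D"),
  Centrality = c(10, 2, 8, 1),
  Density    = c(9, 7, 2, 1),
  stringsAsFactors = FALSE
)
themes$quadrant <- classify_quadrant(themes$Centrality, themes$Density)
print(themes)
```

Here the medians are 5 (centrality) and 4.5 (density), so theme_A lands in the motor quadrant, theme_B is a niche theme, theme_C a basic theme and theme_D an emerging/declining one.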

Now, let’s look at how the research field has evolved, starting with a Sankey diagram that links each year to the most frequently used keywords in our simulated example:

### 8. Yearly Keyword Trend Analysis
cat("\n--- 8. Generating Yearly Keyword Trend Sankey Diagram ---\n")

sankey_plot_obj_yearly <- NULL

# 8a. Count Keywords per Year
cat("   - Counting keyword frequency per year (using unique paper/keyword counts)...\n")
keyword_yearly_counts <- keywords_long %>%
  distinct(paper_id, year, keyword) %>% # Count keyword once per paper per year
  count(year, keyword, name = "yearly_count") %>%
  filter(yearly_count > 0)

# 8b. Identify Top Keywords Overall (Using paper frequency calculated earlier)
cat("   - Identifying top", num_top_keywords_yearly, "keywords overall for yearly Sankey...\n")
if(exists("keyword_total_freq") && inherits(keyword_total_freq, "data.frame") && nrow(keyword_total_freq) > 0) {
  top_keywords_df_yearly <- keyword_total_freq # assumed to be sorted by descending frequency already (slice_head() below relies on it)
} else {
  cat("     > Warning: 'keyword_total_freq' not found. Recalculating based on yearly counts (less accurate representation of 'overall').\n")
  top_keywords_df_yearly <- keyword_yearly_counts %>%
    group_by(keyword) %>%
    summarise(total_freq = sum(yearly_count)) %>%
    arrange(desc(total_freq))
}

top_keywords_yearly <- top_keywords_df_yearly %>%
  slice_head(n = num_top_keywords_yearly) %>%
  pull(keyword)

if(length(top_keywords_yearly) > 0){
  cat("     > Top keywords for Yearly Sankey:", paste(top_keywords_yearly, collapse = ", "), "\n")
  
  # 8c. Prepare Data for Yearly Sankey
  sankey_data_yearly <- keyword_yearly_counts %>%
    filter(keyword %in% top_keywords_yearly)
  
  if(nrow(sankey_data_yearly) == 0) {
    cat("   - Warning: No yearly counts found for the top keywords. Skipping Yearly Sankey.\n")
  } else {
    cat("   - Preparing data for Yearly Sankey diagram...\n")
    year_nodes_chr_yr <- as.character(sort(unique(sankey_data_yearly$year)))
    keyword_nodes_sankey_yr <- unique(sankey_data_yearly$keyword)
    # Prefix years to distinguish from keywords if necessary
    all_node_names_yr <- c(paste0("Y:", year_nodes_chr_yr), keyword_nodes_sankey_yr)
    
    nodes_df_yr <- data.frame(name = all_node_names_yr, stringsAsFactors = FALSE) %>%
      mutate(id = row_number() - 1)
    
    links_df_yr <- sankey_data_yearly %>%
      mutate(
        source_name = paste0("Y:", as.character(year)),
        target_name = keyword
      ) %>%
      left_join(nodes_df_yr %>% select(name, source_id = id), by = c("source_name" = "name")) %>%
      left_join(nodes_df_yr %>% select(name, target_id = id), by = c("target_name" = "name")) %>%
      filter(!is.na(source_id), !is.na(target_id)) %>%
      transmute(
        source = source_id, target = target_id,
        value = yearly_count, group = target_name # Color links by target keyword
      ) %>%
      filter(value > 0)
    
    if(nrow(links_df_yr) == 0) {
      cat("   - Warning: Failed to create valid links for Yearly Sankey diagram. Skipping.\n")
    } else {
      # 8d. Generate Yearly Sankey Diagram Object
      cat("   - Creating Yearly Sankey plot object...\n")
      sankey_plot_obj_yearly <- sankeyNetwork(
        Links = links_df_yr, Nodes = nodes_df_yr, Source = "source",
        Target = "target", Value = "value", NodeID = "name",
        NodeGroup = NULL, LinkGroup = "group", units = "Papers",
        fontSize = 11, nodeWidth = 30, nodePadding = 15, sinksRight = TRUE,
        margin = list(top=5, bottom=5, left=5, right=5)
      )
      
      # 8e. Visualize Yearly Sankey (Print to RStudio Viewer Pane)
      if (!is.null(sankey_plot_obj_yearly)) {
        cat("   - Plotting Yearly Sankey Diagram (Check RStudio Viewer Pane)...\n")
        sankey_title_yr <- paste0("Flow of Top ", num_top_keywords_yearly, " Keywords Over Time (Yearly, Simulated)")
        sankey_plot_obj_yr_title <- htmlwidgets::prependContent(sankey_plot_obj_yearly,
                                                                htmltools::h3(sankey_title_yr, style = "text-align:center;"))
        print(sankey_plot_obj_yr_title)
      } else {
        cat("   - Warning: Yearly Sankey plot object could not be created.\n")
      }
    }
  }
} else {
  cat("   - Warning: No top keywords identified for yearly Sankey. Skipping.\n")
}
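For readers less familiar with networkD3, the two key steps above are counting each keyword once per paper and year, and turning year and keyword names into zero-based node indices (networkD3 expects zero-based `source`/`target` values). Here is a minimal base-R sketch of those steps on a tiny, made-up table; `kw`, `uniq`, `links` and `nodes` are illustrative names, and `unique()` plus `aggregate()` stand in for dplyr's `distinct()` and `count()`:

```r
# Hypothetical long-format keyword table (one row per paper/keyword mention)
kw <- data.frame(
  paper_id = c(1, 1, 1, 2, 2, 3),
  year     = c(2020, 2020, 2020, 2020, 2020, 2021),
  keyword  = c("x", "x", "y", "x", "z", "y"),
  stringsAsFactors = FALSE
)

# Count each keyword once per paper (paper 1 mentions "x" twice)
uniq  <- unique(kw[, c("paper_id", "year", "keyword")])
links <- aggregate(paper_id ~ year + keyword, data = uniq, FUN = length)
names(links)[names(links) == "paper_id"] <- "value"

# Year nodes first (prefixed "Y:"), then keyword nodes
years <- sort(unique(links$year))
nodes <- data.frame(
  name = c(paste0("Y:", years), sort(unique(links$keyword))),
  stringsAsFactors = FALSE
)

# Zero-based indices into `nodes`, as networkD3 expects
links$source <- match(paste0("Y:", links$year), nodes$name) - 1L
links$target <- match(links$keyword, nodes$name) - 1L
print(links)
```

The duplicated mention of "x" in paper 1 is counted once, so the 2020 link to "x" has a width of two papers rather than three.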

Finally, let’s see how the importance of the most frequently used keywords in our simulated example has shifted over the decades. To do this, we’ll build another Sankey diagram that shows how these keywords flow from one decade to the next and how their importance changes over time. Please keep in mind that, although I set the number of top keywords per decade to 10, the Sankey diagram may display more than 10 keywords overall: it includes the top 10 from each decade, so if different keywords dominate in different decades, the combined set of unique keywords across the whole timeline grows larger.
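A tiny worked example shows why the union of per-decade top lists can be larger than the cutoff. With hypothetical counts and a cutoff of 2 per decade, keyword "a" is only in the 1990s top 2 and keyword "d" only in the 2000s top 2, so three keywords end up being tracked. The `top_n_per_group()` helper is my own base-R stand-in for `group_by(decade) %>% slice_max(..., with_ties = FALSE)`:

```r
# Hypothetical per-decade keyword counts
counts <- data.frame(
  decade  = c(1990, 1990, 1990, 2000, 2000, 2000),
  keyword = c("a", "b", "c", "b", "c", "d"),
  n       = c(5, 4, 1, 6, 2, 7),
  stringsAsFactors = FALSE
)

# Keep the n_top highest counts within each decade
top_n_per_group <- function(df, n_top) {
  do.call(rbind, lapply(split(df, df$decade), function(d) {
    head(d[order(-d$n), ], n_top)
  }))
}

top2    <- top_n_per_group(counts, 2)
tracked <- unique(top2$keyword)  # union across decades
print(tracked)
```

Two keywords per decade over two decades gives four rows, but only three unique keywords, because "b" is in both top lists; with more decades and more turnover, the tracked set grows accordingly.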

### 9. Decade-Based Keyword Evolution
cat("\n--- 9. Generating Decade-Based Keyword Evolution Sankey ---\n")

sankey_plot_obj_decades <- NULL

# 9a. Aggregate by Decade and Calculate Frequencies
cat("   - Aggregating by Decade and Calculating Frequencies (using unique paper/keyword counts per decade)...\n")
keywords_decades <- keywords_long %>%
  filter(!is.na(year)) %>%
  mutate(decade = floor(year / 10) * 10) %>% # Calculate decade
  select(paper_id, decade, keyword) %>%
  distinct() # Count each keyword only once per paper within a decade

keyword_decade_counts <- keywords_decades %>%
  count(decade, keyword, name = "count") %>%
  arrange(decade, desc(count))

if(nrow(keyword_decade_counts) == 0){
  cat("   - Warning: No keyword counts per decade found. Skipping Decade Sankey.\n")
} else {
  cat("   - Decade counts calculated.\n")
  
  # 9b. Identify Top Keywords for Each Decade
  cat("   - Identifying top", num_top_keywords_per_decade, "keywords per decade...\n")
  top_keywords_per_decade <- keyword_decade_counts %>%
    group_by(decade) %>%
    slice_max(order_by = count, n = num_top_keywords_per_decade, with_ties = FALSE) %>%
    ungroup()
  
  keywords_to_track <- unique(top_keywords_per_decade$keyword)
  
  if(length(keywords_to_track) == 0) {
    cat("   - Warning: No top keywords identified across decades to track. Skipping Decade Sankey.\n")
  } else {
    cat("     > Total unique keywords to track (top", num_top_keywords_per_decade, "in any decade):", length(keywords_to_track), "\n")
    
    # Filter the counts to only include these keywords
    sankey_base_data_dec <- keyword_decade_counts %>%
      filter(keyword %in% keywords_to_track)
    
    if(nrow(sankey_base_data_dec) == 0){
      cat("   - Warning: No counts found for selected keywords to track. Skipping Decade Sankey.\n")
    } else {
      
      # 9c. Prepare Nodes and Links for Decade Sankey
      cat("   - Preparing nodes and links for Decade Sankey...\n")
      nodes_df_dec <- sankey_base_data_dec %>%
        mutate(name = paste0(decade, "s: ", keyword)) %>% # Node label: "2000s: coevolution"
        select(name) %>%
        distinct() %>%
        mutate(id = row_number() - 1)
      
      decade_list <- sort(unique(sankey_base_data_dec$decade))
      links_list_dec <- list()
      
      if (length(decade_list) > 1) {
        for (i in 1:(length(decade_list) - 1)) {
          current_decade <- decade_list[i]
          next_decade <- decade_list[i+1]
          
          current_decade_data <- sankey_base_data_dec %>% filter(decade == current_decade)
          next_decade_data <- sankey_base_data_dec %>% filter(decade == next_decade)
          common_keywords <- intersect(current_decade_data$keyword, next_decade_data$keyword)
          
          if (length(common_keywords) > 0) {
            temp_links_dec <- tibble(keyword = common_keywords) %>%
              mutate(source_name = paste0(current_decade, "s: ", keyword)) %>%
              left_join(nodes_df_dec %>% select(name, source_id = id), by = c("source_name" = "name")) %>%
              mutate(target_name = paste0(next_decade, "s: ", keyword)) %>%
              left_join(nodes_df_dec %>% select(name, target_id = id), by = c("target_name" = "name")) %>%
              # Link value is the count in the *target* decade (flow into that decade)
              left_join(next_decade_data %>% select(keyword, value = count), by = "keyword") %>%
              filter(!is.na(source_id), !is.na(target_id), !is.na(value), value > 0) %>%
              select(source = source_id, target = target_id, value = value, group = keyword)
            
            if(nrow(temp_links_dec) > 0){
              links_list_dec[[as.character(current_decade)]] <- temp_links_dec
            }
          }
        } # End for loop
      } # End if length(decade_list) > 1
      
      if (length(links_list_dec) > 0) {
        links_df_dec <- bind_rows(links_list_dec)
      } else {
        links_df_dec <- tibble(source = integer(), target = integer(), value = numeric(), group = character()) # Empty tibble
      }
      
      if (nrow(nodes_df_dec) == 0 || nrow(links_df_dec) == 0) {
        cat("   - Warning: Could not create valid nodes or links for Decade Sankey. Skipping.\n")
      } else {
        cat("     > Decade Nodes:", nrow(nodes_df_dec), "; Decade Links:", nrow(links_df_dec), "created.\n")
        
        # 9d. Generate Decade Sankey Diagram
        cat("   - Creating Decade Sankey plot object...\n")
        num_groups_dec <- length(unique(links_df_dec$group))
        if (num_groups_dec <= 12 && num_groups_dec > 0) {
          color_palette_dec <- RColorBrewer::brewer.pal(max(3, num_groups_dec), "Paired")[1:num_groups_dec]
          color_scale_js_dec <- paste0('d3.scaleOrdinal(["', paste(color_palette_dec, collapse = '","'), '"]);')
        } else if (num_groups_dec > 12) {
          color_scale_js_dec <- 'd3.scaleOrdinal(d3.schemeCategory10);'
          cat("     > Warning: >12 keyword groups for Decade Sankey, colors may repeat.\n")
        } else {
          color_scale_js_dec <- 'd3.scaleOrdinal(["#cccccc"]);' # Default grey
        }
        
        
        sankey_plot_obj_decades <- sankeyNetwork(
          Links = links_df_dec, Nodes = nodes_df_dec, Source = "source",
          Target = "target", Value = "value", NodeID = "name",
          LinkGroup = "group", NodeGroup = NULL, units = "Papers",
          fontSize = 10, nodeWidth = 35, nodePadding = 10, # Adjusted node width/padding
          sinksRight = FALSE, # Keep temporal flow L->R
          colourScale = JS(color_scale_js_dec),
          margin = list(top=5, bottom=5, left=5, right=5)
        )
        
        # 9e. Visualize Decade Sankey (Print to RStudio Viewer Pane)
        if(!is.null(sankey_plot_obj_decades)){
          cat("   - Plotting Decade Sankey Diagram (Check RStudio Viewer Pane)...\n")
          sankey_title_dec <- paste0("Evolution of Top ", num_top_keywords_per_decade, " Keywords by Decade (Simulated)")
          sankey_plot_obj_dec_title <- htmlwidgets::prependContent(sankey_plot_obj_decades,
                                                                   htmltools::h3(sankey_title_dec, style = "text-align:center;"))
          print(sankey_plot_obj_dec_title)
        } else {
          cat("     > Warning: Decade Sankey plot object is NULL.\n")
        }
      } 
    } 
  } 
}
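The loop above only links the same keyword in consecutive decades, with the link width taken from that keyword's count in the later decade. A compact base-R sketch of that rule, on made-up counts (`dc`, `node_id()` and the other names are hypothetical), looks like this:

```r
# Hypothetical decade counts for the tracked keywords
dc <- data.frame(
  decade  = c(1990, 1990, 2000, 2000, 2010),
  keyword = c("a", "b", "a", "d", "d"),
  count   = c(5, 4, 3, 7, 2),
  stringsAsFactors = FALSE
)
dc$name <- paste0(dc$decade, "s: ", dc$keyword)   # e.g. "1990s: a"
nodes   <- data.frame(name = unique(dc$name), stringsAsFactors = FALSE)
node_id <- function(nm) match(nm, nodes$name) - 1L  # zero-based for networkD3

decades <- sort(unique(dc$decade))
links <- do.call(rbind, lapply(seq_len(length(decades) - 1), function(i) {
  cur <- dc[dc$decade == decades[i], ]
  nxt <- dc[dc$decade == decades[i + 1], ]
  common <- intersect(cur$keyword, nxt$keyword)   # keywords present in both decades
  if (length(common) == 0) return(NULL)           # rbind() skips NULL entries
  data.frame(
    source = node_id(paste0(decades[i], "s: ", common)),
    target = node_id(paste0(decades[i + 1], "s: ", common)),
    value  = nxt$count[match(common, nxt$keyword)], # width = count in the later decade
    group  = common,
    stringsAsFactors = FALSE
  )
}))
print(links)
```

Here "a" flows from the 1990s to the 2000s (width 3) and "d" from the 2000s to the 2010s (width 2), while "b", which is absent from the 2000s, gets no outgoing link. As in the loop above, a keyword that skips a decade simply breaks its flow rather than being carried across the gap.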