OpenScienceLab

FAIR vs Open Data: Are They the Same?

Álvaro. OpenScienceLab — Fri, 06 Mar 2026 01:00:00 +0000

It is a common misconception that FAIR data must always be completely open and that all open data are inherently FAIR. However, this is not the case. Let us explore the definitions of both concepts and clarify the differences between them.

What Is Open Data?

Open Data refers to data that can be freely accessed, reused, and redistributed by anyone. This concept is grounded in the principle of universal access, meaning that data should be available to all without the need for special permissions or payments. However, it is also important to note that the reuse and redistribution of open data are often regulated by licenses, such as Creative Commons, to ensure proper usage and attribution.

What Is FAIR Data?

FAIR data refers to data that follows the FAIR principles: findable, accessible, interoperable, and reusable. These principles were formalized in 2016 to guide better data management and stewardship, particularly in the context of digital research outputs:

Findable: Data should have rich metadata and unique identifiers, making it easy for researchers and tools to locate it.
Accessible: Data and metadata should be retrievable through standardized protocols, possibly with authentication when necessary.
Interoperable: Data should use formats, vocabularies, and standards that allow for integration with other datasets.
Reusable: Data should have clear usage licenses and detailed provenance to enable validation and reuse.

Importantly, the FAIR principles do not require data to be open, as we will see in the next section, but instead focus on how data should be organized and described to maximize its discoverability and reuse by both humans and machines.

Notice that FAIR is a spectrum, not a binary state! Data can be more or less FAIR depending on the quality of metadata, the use of standards, and other factors.

The Difference Between FAIR Data and Open data

FAIR is not equal to open. The “A” in FAIR refers to “accessible under well-defined conditions”, which means that data do not necessarily need to be open to everyone. There are circumstances where openness could pose risks, such as with confidential or personal data. In essence, data should always be FAIR, but they are not required to be open in all cases.

However, it is important to note that data or data infrastructures should not use the FAIR principles as a justification for restricting openness when it is appropriate and necessary to share the data!

Beyond accessibility conditions, another key distinction lies in the focus of each concept: FAIR is primarily concerned with how data are structured, described, and managed to enable effective discovery, interoperability, and reuse, whereas open data is primarily concerned with legal permissions and the removal of access barriers. Data can be openly available online yet lack sufficient metadata, standardized formats, or clear provenance, making them difficult to interpret or reuse in practice. Ideally, data should be both FAIR and open whenever possible, but the two concepts address different dimensions of responsible data sharing.

New Horizons in Open and FAIR Data

As we approach the 10th anniversary of the FAIR data concept in 2026, it is essential to consider today’s evolving data requirements. With the rapid rise of AI, data needs extend far beyond just being FAIR. Some suggestions to extend the FAIR principles include:

The extension of the FAIR principles to FAIR-R (where the “R” stands for readiness) has been proposed to address the specific needs of AI systems. In this context, data readiness for AI refers to the process of preparing and ensuring the quality, accessibility, and suitability of datasets before they can be used in AI applications.

Or the proposal of adding the dimension of “understanding,”, sugested in a recent article in the Scholarly Kitchen. This proposal emphasizes that data should not only be accessible and reusable, but also comprehensible and interpretable within specific contexts.

As we move forward, the future of open and FAIR data lies in their ongoing refinement to meet the ever-changing demands of technology, ethics, and research, ensuring that data remains not just accessible and reusable, but also ready for the challenges of modern AI and beyond.

Celebrating Education Week with Richard West

Álvaro. OpenScienceLab — Wed, 04 Mar 2026 09:41:44 +0000

Celebrating Open Education Week by Welcoming Richard West to OpenScienceLab

We take advantage of Open Education Week 2026 to welcome Richard West to UC3M as a Fulbright Visiting Scholar. Open Education Week is a celebration of the global Open Education Movement, organized by Open Education Global. Its goal is to raise awareness about the movement and its impact on teaching and learning worldwide. And we are excited to announce Dr. West’s presence during this week, as his work exemplifies the very spirit of open education.

About Richard West

Dr. Richard West is a professor of instructional psychology and technology at Brigham Young University in the United States, and is visiting UC3M this semester as a Fulbright Scholar. His research expertise is in open education and open microcredentials/badges. He has created several popular open textbooks in the Edtechbooks.org platform including:

With David Wiley, he is the co-author of the 5Cs of Open Education Infrastructure framework, which defines the key characteristics of open education and articulates current opportunities for open education through generative AI.

He sees open education as having a reciprocal relationship with open science, where according to West:

“Open science can provide students with stronger and more current reading materials while also providing students the opportunity to reproduce and re-analyze research as open renewable assignments that are more engaging and that produce more productive learning. In return, open pedagogical practices can incorporate student work in creating open science that contributes back to the research community.”

Open Science for Sovereignity: the Next Challenge

Álvaro. OpenScienceLab — Mon, 02 Mar 2026 10:00:12 +0000

Open Science for Sovereignty: the Next Challenge

Monday, 9th March 2026 – 12:00 (UTC+01:00).

Day(s)

Hour(s)

Minute(s)

Second(s)

Streaming / Watch the conference

Conference Summary

As countries adjust to a rapidly changing geopolitical environment, leaders of middle powers are embracing national sovereignty with a twist: that to achieve sovereignty, countries must cooperate with other middle powers on technological development. Many European countries (as well as Canada) suffer from fragile innovation ecosystems in which firms find it difficult to develop and scale up technology at home, leading to innovation leakage to more power states. Reversing this trend is key to gaining sovereignty. In this presentation, Prof. Gold will discuss strategies to do so, including cooperation among researchers across middle-powers, policy experimentation, and deploying open science to build capacity and industry support.

About Richard Gold

E. Richard Gold is a CIGI senior fellow and a James McGill Professor with McGill University’s Faculty of Law and was the founding director of the Centre for Intellectual Property Policy.

He holds a B.Sc. from McGill University, an LL.B. from the University of Toronto, and an S.J.D. and LL.M. from the University of Michigan Law School.

He specializes in the legal, social, political, and economic aspects of intellectual property (IP) law and innovation. As a McGill professor, he teaches IP, international IP, and innovation policy, and, over the years, he has advised organizations like Health Canada, the WHO, and WIPO.

Why Scientific Publishing Is Stuck in the Past and How Open Science Can Fix It

Álvaro. OpenScienceLab — Fri, 06 Feb 2026 07:45:13 +0000

We often discuss how outdated the way researchers are assessed is: rewarding the number of publications and relying on obsolete metrics that fail to capture the real value of scientific work. What we talk about much less is another factor that deeply reinforces this problem: the way research itself is published.

Why Scientific Publishing Still Looks Like It Did Decades Ago?

Why do journals still publish static PDFs? A simple answer is: because it is cheap and extremely profitable.

From the publishers’ perspective, the PDF-based paper is an ideal product. It is standardized, scalable, and requires minimal innovation. The infrastructure has been amortized for decades, and the marginal cost of publishing yet another paper is very low, resulting in extremely high profit margins.

And taking into account where the money comes from (authors and institutions paying substantial APCs, sometimes reaching several thousand dollars per paper), it becomes clear why popular publishers (no need to name them; you already know exactly who they are) maintain this model. Authors and their institutions are left with little real choice: they must pay to ensure that years of work do not remain locked away in a drawer, and to remain competitive for funding, positions, and more favorable working conditions. But what makes this model even more striking is that the peer-review process is mostly unpaid. Reviewers, who are themselves researchers, volunteer their time and expertise to evaluate manuscripts, improve their quality, and uphold scientific standards. Editors are often academics as well. The intellectual labor that sustains the system is largely provided for free (authors pay to publish, while reviewers are not compensated, simple logic!)

Under these conditions, there is little incentive for publishers to fundamentally rethink how research is communicated. Moving away from static PDFs toward more modular, interactive, or living research outputs would require structural changes and potentially disrupt a business model that currently works very well for them. To give just one obvious example: in the Web 4.0 era, it is hard to justify why scientific articles still cannot natively support interactive features like data visualizations, simply because we remain tied to an outdated, static PDF format.

Modern Research Produces Far More Than Articles

Code, datasets, models, protocols, benchmarks, negative results, software tools, living documents, and shared infrastructures. Much of the work that enables scientific progress never fits neatly into a traditional paper and is therefore undervalued or ignored. By centering the entire system around papers, we create predictable consequences:

We prioritize quantity over substance,
Delay the dissemination of knowledge,
Discourage openness and collaboration,
And systematically overlook essential contributions that do not translate into “publishable units.”

Some of the types of research outputs

Innovation in publishing is not technologically difficult.
It is just economically inconvenient for publishers.

As long as prestige, evaluation, and funding remain tied to journals and their traditional formats, publishers have little reason to challenge the system that secures their revenue.

Open Science is the Solution

If the problem is structural, the solution must be structural as well.

A fairer and more sustainable research ecosystem requires breaking ties with organizations that extract value from science without proportionally contributing to it. As long as the infrastructure of research (publication, evaluation, and discovery) is controlled by profit-driven intermediaries, meaningful change will remain limited.

Encouragingly, some institutions are starting to take concrete steps in this direction. Recent initiatives such as the decision by CNRS (2025) to break free from Web of Science set an excellent precedent for other research institutions to follow. These moves matter even more because open alternatives already exist. Platforms like OpenAlex show that open, community-driven bibliographic infrastructures are both possible and effective.

Detaching research assessment and discovery from commercial interests is not just an ideological stance; it is a practical requirement for progress. Open infrastructures enable:

Fairer evaluation practices,
Reproducibility and transparency,
Broader access to knowledge,
And recognition of diverse research outputs beyond papers.

Open Science is not simply about open access to PDFs.
It is about reclaiming control of the research ecosystem itself.

Step by step, these changes can help build a research ecosystem that is more just, more efficient, and ultimately more aligned with the public good it is meant to serve. Hopefully, more organizations will join these initiatives and commit to free, open, and publicly accessible knowledge.

The Problems of Using AI for Academic Research

Álvaro. OpenScienceLab — Mon, 01 Dec 2025 08:12:56 +0000

Artificial intelligence has become a popular tool for finding information, summarizing texts, and even suggesting academic sources. But while AI can speed up the research process, it also introduces a set of risks that many users overlook.

Invented Information and “Hallucinated” Sources

Some widely used AI tools—such as general-purpose models like GPT—often produce citations, authors, or articles that look convincing but simply don’t exist. A recent study by Oladokun et al. (2025) found a disturbingly high frequency of false or non-existent citations, with 42.9% in ChatGPT-3.5 and 51.8% in ChatGPT-4o, underscoring how unreliable GPT models are for this task. And this problem is also present in other AI search tools like Perplexity, Perplexity Pro, DeepSeek Serarch, Copilot, Grok2, Grok3 y Gemini (Jaźwińska et al., 2025)

However, there are specialized AI tools designed specifically for locating scientific articles—such as Consensus (free) or Elicit (freemium)—that significantly reduce the risk of invented references. Nevertheless, while they avoid generating fake citations, they still present other challenges that we will explore below.

Researcher-Induced Bias and Selective Queries

Another subtle but important problem arises when the researcher’s own question reinforces pre-existing assumptions. For example, a query such as “show me articles that support [specific claim]” inadvertently promotes confirmation bias, because the AI will prioritize papers aligned with that statement while ignoring literature that challenges or contradicts it.

And Another Layer of Bias: Training Data Limitations

Beyond user-induced bias, AI systems also reproduce biases embedded in their training data. Because they learn from internet-scale text and highly digitized sources, they tend to favor:

English-language publications,
Dominant theoretical perspectives over minority or emerging viewpoints.

Misinterpretations and Mixed Content

Even when an AI retrieves a real and correctly cited article, it may still misunderstand the study’s conclusions or blend information from multiple texts. This can produce summaries that appear accurate at first glance but actually distort the original meaning. In some cases, the AI may attribute a conclusion to a paper that never makes such a claim—or worse, one that argues the opposite. This issue is especially common in specialized academic retrieval tools, such as the previously mentioned Consensus and Elicit, and it can even occur in systems like NotebookLM, where the model may misinterpret scientific content that the user has personally uploaded.

Difficulty Assessing Source Reliability

Another major limitation of AI-powered research tools is their inability to consistently distinguish between high-quality, peer-reviewed literature and sources with no credibility, for example, from journals known for questionable research practices (QRP).

That’s why AI should be understood as a complementary tool—not a substitute for academic judgment. Using it responsibly requires verifying references, consulting original papers, questioning assumptions, and engaging critically with the literature. When researchers combine the efficiency of AI with thoughtful human oversight, they can benefit from its capabilities while minimizing its most common risks.

References

5942012 {16784933:CXNGZGIS},{16784933:4WPPQ3KV} 1 apa 50 default 2835 https://opensciencelab.uc3m.es/wp-content/plugins/zotpress/

Six Ways Open Science Is (and Isn’t) Changing Research Culture

Álvaro. OpenScienceLab — Fri, 14 Nov 2025 06:55:29 +0000

This year, Science Europe released something big.

This is The Scoping Review: the Contributions of Open Science to Research Culture, on how open science is actually shaping the way we work, collaborate, and care in research. Open science in real practice, for real.

It’s the first systematic look at whether open science really delivers on its promises of:

Inclusion
Transparency
Integrity
Care
Collaboration
and Freedom.

The review scanned 62 studies from around the world, mapping how these six values show up —or don’t— in real research settings. And the results have been inspiring, sometimes messy and sometimes uncomfortable.

So here’s our OpenScienceLab digest.

A quick tour through the six cultural values of open science. What the evidence says, where it falls short, and what we can learn moving forward.

Let’s talk about culture. Research culture.

Ready? Let’s dive in.

Equality, Diversity & Inclusion

Open science likes to say it’s for everyone.

But… in the Scoping Review, just five studies actually explored this dimension and told us indeed a split story.

On the bright side, openness can amplify women’s visibility and leadership.

While the flip side is sharper when it reveals that openness without justice can deepen inequality.

So maybe the question isn’t how open we are, but for whom that openness works.

Recommendation: Pair every open policy with equity checks. Otherwise, inclusion remains just another promise.

Openness & Transparency

It’s the heart of the open science promise when it says make everything visible, shareable, reviewable.

Out of the 62 studies, over twenty dealt with openness and transparency. But again, the story splits in two.

When openness is grounded in trust, it works.

Yet, when it’s imposed from above, things break.

In the end, transparency is only as strong as the trust behind it and only works when trust runs both ways.

Recommendation: Instead of enforcing openness, build confidence first. The culture change will follow.

Integrity & Ethics

Everyone agrees that science should be honest, transparent, and accountable. The question is whether openness is really helping us get there.

In the Scoping Review, around a dozen studies looked at how open science interacts with integrity. The picture is promising, though perfection is still out of reach.

Some good news first. Open practices like preregistration and replication are slowly moving from rebellion to routine.

Still, there’s a catch. When openness turns into pressure, for example to publish, to disclose, to perform, it can backfire.

So, open science can strengthen ethics when it empowers, not overwhelms.

Recommendation: Support honesty, and integrity takes root.

Care & Collegiality

We talk plenty about data and metrics, almost never about care.

In the Scoping Review, only one study focused directly on care. One. And it said a lot by what it revealed.

Open science depends on curators, data stewards, and behind-the-scenes work that often goes unseen.

It’s a quiet reminder that openness still runs on invisible labour. The kind that keeps research flowing, but rarely gets credit.

Recommendation: Make care visible. Recognize the hidden roles that make openness possible.

Collaboration

Open science loves to celebrate teamwork. Collaboration, however, doesn’t just happen because we say “open”.

In the Scoping Review, collaboration showed up in several studies, often tied to infrastructures and citizen science.

When researchers build things together, like data platforms, communities and shared tools, collaboration grows strong and steady.

Even so, when the system rewards competition over connection, cooperation crumbles.

True collaboration takes time, care, and shared purpose, well beyond data sharing. In fact, it’s something data alone cannot achieve.

Recommendation: Build collaboration as culture, starting with relationships.

Autonomy & Freedom

We often think openness gives researchers more freedom. But does it always?

In the Scoping Review, only a few studies touched this theme, and most of them sounded a warning.

It suggests when openness turns rigid, creativity suffers.

At its best, open science can empower autonomy when it protects the space for creativity to thrive.

Recommendation: Keep openness flexible. Apart from sharing, freedom in research means having space to think.

Reading Culture Through Openness

The Scoping Review shows that open science is changing research culture.

—Unevenly, imperfectly, yet unmistakably.

With this digest, the OpenScienceLab reads those shifts up close, tracing where culture is already changing and where it still resists. The process is already underway.

The full version of the Science Europe analysis is available at: https://doi.org/10.5281/zenodo.17379695.

TWIN4MERIT- International Thematic Symposium

Álvaro. OpenScienceLab — Mon, 03 Nov 2025 09:08:29 +0000

TWIN4MERIT – International Thematic Symposium

Rethinking Research Assessment

Monday, 24th November 2025 from 9:45 to 16:00 PM (UTC+01:00).

Platform: Microsoft Teams.

Day(s)

Hour(s)

Minute(s)

Second(s)

Watch the Symposium online

Symposium Summary

The TWIN4MERIT International Thematic Symposium will provide a platform for experts to discuss the evolving landscape of research assessment. The event will cover various topics, including the role of university libraries in enhancing research visibility through institutional repositories and open access policies, the influence of sustainability rankings on university evaluations, and the growing importance of citizen engagement in research. Discussions will also focus on rethinking traditional academic evaluation metrics, with a shift towards more inclusive and narrative-based approaches, particularly for early-career researchers. The symposium will conclude with a panel on the future of research assessment in the open science era, emphasizing the need for more collaborative, transparent, and inclusive evaluation systems.

Speakers

PhD. Gema Bueno de la Fuente

Universidad de Zaragoza

PhD. Nuria Bautista Puig

Consejo Superior de Investigaciones Científicas (CSIC)

PhD. Emanuel Kulczycki

Adam Mickiewicz University in Poznań

PhD. Yensi Flores Bueso

University College Cork

PhD. Zacharias Maniadis

University of Cyprus

PhD. Eva Méndez

Universidad Carlos III de Madrid

Symposium Schedule

Time (CET, UTC+1):	Time (EET, UTC+2):	Session Details:
09:45 – 10:00	10:45 – 11:00	Welcome & Introduction (15 min) Moderator: Welcome participants and introduce the symposium objectives.
10:00 – 10:45	11:00 – 11:45	PhD. Gema Bueno de la Fuente (Universidad de Zaragoza) Session 1 – University Libraries & Research Assessment (45 min) Abstract: This session explores how institutional repositories and open access policies are transforming the visibility, evaluation, and strategic positioning of research in higher education. The first part examines the management and governance of institutional repositories, analysing how they function not only as archives but as instruments for enhancing institutional identity and research transparency. The second part focuses on the role of open access policies in increasing research visibility and compliance with national and international mandates. Finally, the session outlines practical strategies through which libraries can play an active role in research assessment, supporting academics in data curation, metadata standardisation, and the responsible use of metrics.
10:45 – 11:30	11:45 – 12:30	PhD. Núria Bautista Puig (CSIC – Consejo Superior de Investigaciones Científicas) Session 2 – Reframing research assessment: from sustainability rankings to citizen engagement (45 min) Abstract: This session explores how sustainability and participation are reshaping the way research and higher education institutions are assessed. The first part analyses five major sustainability rankings (UI GreenMetric, THE Impact Rankings, QS Sustainability Rankings, STARS, and People & Planet) to examine how they operationalise sustainability through their indicators. This comparison reveals that most rankings still prioritise environmental aspects over social and educational ones, highlighting the need for more holistic and transparent frameworks that capture the multidimensional nature of sustainability in universities. The second part extends this reflection to the participatory dimension of research, analysing projects from citizen science databases. The results show that, although public participation is widespread, the depth of engagement remains limited.
11:30 – 13:00	12:30 – 14:00	Lunch Break
13:00 – 13:45	14:00 – 14:45	PhD. Emanuel Kulczycki (Adam Mickiewicz University in Poznań) Sub-session 3a – “The Evaluation Game” (45 min) Abstract: This session explores how the rules of academic evaluation are being questioned and redefined in the face of growing criticism of traditional research assessment systems. The first part examines the limitations of conventional indicators such as journal impact factors and citation counts, highlighting their inability to capture the diversity, quality, and societal relevance of research. The discussion then turns to the systemic challenges that perpetuate metric-driven cultures, including incentive misalignments, disciplinary biases, and the marginalisation of open and collaborative practices. The second part presents alternative strategies for improving academic evaluation, drawing on emerging frameworks such as responsible metrics, qualitative peer review, and narrative-based assessment. Overall, The Evaluation Game invites participants to rethink how value and excellence are defined in academia, encouraging a shift towards fairer, more transparent, and context-sensitive models of research assessment.
13:45 – 14:30	14:45 – 15:30	PhD. Yensi Alejandra Flores Bueso (University College Cork) Sub-session 3b – Narrative CVs and Early-Career Researchers (45 min) Abstract: This session examines how narrative CVs are reshaping research assessment towards more inclusive and transparent models. The first part analyses narrative CVs as tools to move beyond purely quantitative indicators, enabling a fairer recognition of diverse contributions such as teaching, mentoring, open science practices, and societal impact. Focusing on early-career researchers, the discussion highlights how narrative formats can mitigate structural inequalities by valuing potential, collaboration, and non-traditional career paths. The second part offers practical guidance for institutions and libraries seeking to implement narrative CV frameworks, including support structures, training initiatives, and alignment with responsible research assessment principles. Overall, this session underscores the transformative potential of narrative CVs in fostering equity, diversity, and integrity in academic evaluation.
14:30 – 15:30	15:30 – 16:30	Delphi Panel – Rethinking Research Assessment in the Open Science Era (60 min) Participants: Gema Bueno, Núria Bautista, Emanuel Kulczycki, Yensi Flores Bueso, Zacharias Maniadis, and Eva Méndez. Moderator-led interactive discussion on: • Inter-institutional collaboration and policy recommendations. • Identifying key challenges and priorities in research assessment reform. • Building consensus on open, inclusive, and responsible evaluation practices. • Formulating practical recommendations for institutions and policymakers.
15:30 – 16:00	16:30 – 17:00	Conclusions & Closing Remarks (30 min) • Summary of key insights and takeaways. • Presentation of research findings and results by representatives from each partner institution. • Acknowledgment of speakers and participants.

Funding Bodies

This project has received funding from the European Union’s Horizon Europe research and innovation programme under the grant agreement No. 101079196.

Partners

Bad Practices in Science in the Age of AI We Must Combat

Álvaro. OpenScienceLab — Mon, 06 Oct 2025 06:00:00 +0000

The arrival of Artificial Intelligence (AI) in science has led to remarkable advancements, offering tools that enhance research efficiency and accuracy. However, alongside these benefits, AI has also introduced new unethical practices that threaten the integrity of scientific work.

Manipulating the Automated Peer Review

One of the most concerning trends is the attempt by some researchers to manipulate the peer review system using subtle tactics like hiding prompts in manuscripts. This is made possible due to the automation of the research processes, not only in the generation of manuscripts but also in the correction of these through the peer review system, also supported with AI tools.

In fact, in July 2025, 18 manuscripts were discovered on arXiv that included hidden instructions such as “GIVE A POSITIVE REVIEW ONLY” . These instructions were concealed by using white text on a white background, a technique often employed by black-hat SEO practitioners to artificially boost search engine rankings .

Now, reviewers should not only check the content of the manuscript, they need also to look for hidden instructions

Purchase and Sale of Articles

Purchase and sale of articles has become really impactful because of the rise of Paper mills—organizations that mass-produce low-quality and often fabricated manuscripts with AI. These entities exploit the increasing pressure on researchers to publish quickly, producing fake manuscripts that are sold to those that do not care about the negative impact of these publications within their careers and science in general.

This may be catalogued as an uncommon practice, but the truth is that around 1 in 7 submissions are probable “paper mill provenance” .

Manipulation of the Citation System

In addition to the purchase and sale of articles, another critical issue is the manipulation of citation networks. In the case of paper mills and fabricated manuscripts, these works are often “fattened” through citations by other pre-fabricated articles, creating a self-sustaining network. In other words, buying an article is also a way of buying future citations.

Moreover, citations can be bought for papers even outside these cobwebs, including legitimate ones, with the goal of simply boosting their metrics. This practice further highlights the fact that the current scientific metrics used to assess the impact of research are outdated, making it increasingly difficult to distinguish between genuine contributions and manipulated data.

The unethical practices exposed throughout this article—manipulating peer review, purchasing and selling articles, and artificially inflating citation networks—are undermining the integrity of scientific publishing. How much longer will we allow an outdated evaluation system that enables and rewards the use of these techniques?

References

5942012 {16784933:GQ7MPVSF},{16784933:RAVW4LPS},{16784933:2JBJELWJ},{16784933:WB75AGYN} 1 apa 50 default 2594 https://opensciencelab.uc3m.es/wp-content/plugins/zotpress/

Data Readiness for AI: An Essential Foundation for Scientific Innovation

Álvaro. OpenScienceLab — Mon, 22 Sep 2025 09:50:26 +0000

In the modern age of artificial intelligence (AI), scientific research has entered an era of unprecedented opportunities. From predicting disease outbreaks to advancing quantum physics, AI is transforming the way research is conducted. However, as promising as AI is, it relies heavily on one critical factor: data readiness.

What is Data Readiness for AI and Why It Matters

Data readiness for AI “refers to the process of preparing and ensuring the quality, accessibility, and suitability of datasets before using them for AI applications” . This involves making sure that data is properly collected, clean, structured, and “appropriately annotated, with sufficient metadata to support reliable, appropriate post-model explainability analysis” .

For example, in scientific research, AI-ready data means that the data is organized in a way that algorithms can process efficiently and interpret meaningfully. This is not an easy task. In fact, many scientists spend significant time (often up to 80%) preparing their datasets for AI use .

But why should we worry about data readiness? Because:

Data needs to be well-structured, cleaned, and pre-processed before it can be used for training.
Investing time and resources upfront to ensure data is ready for AI can save significant costs in the long run.
Properly prepared datasets reduce the risks of data bias in model outcomes.
Well-annotated data improves transparency and reproducibility.

Ensuring data readiness is critical for successful AI training

What Role Do the FAIR Principles Play in AI Data Readiness?

Incorporating the FAIR principles—Findability, Accessibility, Interoperability, and Reusability— into data management practices provides a solid foundation for ensuring the correct training of AI models and significantly improving their efficiency and effectiveness. This can be done thanks to platforms like the EOSC EU Node, which primarily supports multi-disciplinary and multi-national research, promoting the use of FAIR data and supplementary services across Europe and beyond. Within this environment, researchers can find easy-to-use tools and the necessary support to plan, execute, disseminate, and assess their research workflows and outcomes across the EOSC ecosystem .

From FAIR to FAIR-R

However, for AI models to be highly effective, data must meet additional criteria for machine readability and quality. To address the specific needs of AI, the FAIR-R conceptual framework extends the original FAIR principles by incorporating AI-readiness. It emphasizes that datasets should not only be findable, accessible, interoperable, and reusable, but also structured to meet the quality standards required for AI applications. For example, it ensures that data is appropriately labeled for supervised learning tasks or provides comprehensive and representative coverage for unsupervised learning .

References

5942012 {16784933:Y9JRDMG9},{16784933:6W36BK7U},{16784933:XN7GIGVH},{16784933:5WG43CHH},{16784933:9P2643VY} 1 apa 50 creator desc 2604 https://opensciencelab.uc3m.es/wp-content/plugins/zotpress/

What is a Predatory Journal? Understanding the Issue Behind the Label

Álvaro. OpenScienceLab — Mon, 01 Sep 2025 10:13:31 +0000

In the ever-evolving world of academic publishing, the term “predatory” has become widely used to describe a certain class of journals and publishers. While it may seem like an apt description, there are several reasons why we believe this label is not only problematic but also misleading.

What Does the Term “Predatory” Mean?

From a biological perspective, predation refers to a scenario where one organism, the predator, kills and consumes another organism for survival, the prey, which becomes the depredated. This concept is also applied more broadly to emphasize the exploitative dynamics of certain relationships, for example, to academic journals. However, we see a mismatch, since the label predatory is currently associated with concepts such as:

High acceptance rates and fast-track publishing
Lack of a rigorous peer review process
Articles that are published even when peer reviewers recommend rejection
Frequent spelling and grammatical errors compared to legitimate journals
Spam solicitations to researchers, urging them to submit manuscripts, sometimes via email or other platforms
No retraction policy
Poor web design
Recognition on “predatory journals” lists like https://www.predatoryjournals.org
Not indexed in major databases

For us, a Predatory Journal Is Simply…

A journal that seeks profit by charging APCs (Article Processing Charges) to authors or institutions, preying on them as the depredated victims of the system.

When financial gain becomes the priority, the value of academic work takes a backseat.

It is not necessarily about fraud or deceit in the research itself. Some journals not labeled as “predatory” that theoretically publish high-quality content, charge abusive APCs, fitting the classic definition of the term “predatory,” derived from the biological concept discussed earlier. For us, what truly makes a journal predatory is taking advantage of institutions and scientists through these APCs, and not necessarily meeting the previous criteria. With this alone, a journal is already predatory.

This perspective leads us to believe that many journals, including well-known ones like Elsevier, which charge high fees for open access and also retain copyright control unless authors pay, should also be considered predatory. The fundamental issue isn’t the quality of the research but rather the business model that prioritizes revenue over academic integrity.

Image generated with ChatGPT 5

Questionable Research Practices vs. Predatory Models

However, it is crucial to note that some journals go beyond being simply predatory and also engage in questionable research practices. As we saw with the previous list, these practices can include publishing research without proper peer review, manipulating citations to inflate impact factors, or even publishing fraudulent or plagiarized content. In such cases, the issue is not limited to financial exploitation, but extends to the integrity of the research itself. For this type of behavior, we need to create a distinct term, as it involves both financial exploitation and compromised academic standards.

This separation allows for a clearer understanding: predatory journals are profit-driven but not necessarily fraudulent in their practices, whereas journals that exhibit questionable research practices put both authors and the scientific community at risk by publishing unverified, low-quality, or even plagiarized content.