Science Integrity Digest

The Camel’s Camel

eliesbik — Fri, 13 Mar 2026 03:37:46 +0000

On some days, after hours of image scanning, I feel like doing something different. Today, I searched for some scientific papers containing “tortured phrases”. This term was first coined by Guillaume Cabanac et al. in a 2021 preprint.

Tortured phrases are bizarrely synonymized versions of standard scientific terms produced when authors run copied text through paraphrasing or translation software to disguise plagiarism. This can lead to funny sounding word combinations, such as “bosom malignancy” instead of “breast cancer“.

Today, I found a beautiful example of synonymized plagiarism involving the microbiome of a camel’s udder.

A camel’s camel – image drawn by ChatGPT

The paper in question is called Molecular and Microbial Studies Survey for Camel Udder Microbiota, published in Advances in Environmental Biology (2019), DOI: 10.22587/aeb.2019.13.2.2 [PubPeer].

This appears to be a very l0w-quality journal, published by The American-Eurasian Network for Scientific Information (AENSI), which states that “AENSI Journal exists to publish results of research in the all area of science and Technology.” The AENSI publisher has been listed among potential predatory publishers, e.g. in Beall’s List.

But I digress. You are here for the camel’s camel. It’s right there in the abstract, which reads:

The definition and definition of microbes on the camel’s camel is very important so as not to induce contamination of raw milk, which is important for producers and consumers of milk for several reasons, the most important of which is the preservation of human health from diseases resulting from contaminated milk or diseases that affect the breast of the camel or camels Both the Food and Drug Administration (FDA) and the Food and Drug Administration.

Traditional detection methods are known to be many steps, time-consuming, or costly. On the other hand, molecular detection methods are faster and more sensitive, such as gene sequencing of the gene for the 16S gene.

I felt out of breath after reading that first sentence! But yes, let’s learn more about the definition and definition of microbes on the camel’s camel! Such an exciting topic.

Screenshot of the abstract – showing the same text as the quote above.

Tortured phrases

The paper systematically replaces common microbiology terms with bizarre synonyms and paraphrases. Here are some nice examples:

Paper wording	Normal wording
creature assortment	genus
beast swarm	animal hosts
cubicle	bacterial cell
biological specialties	ecological niches
destructiveness parts	virulence factors
saw sorts	recognized species
minuscule life forms	bacteria
mentality heart imbuement	Brain Heart Infusion medium

Some other sentences in this paper that your brain might have some trouble with processing:

“Current genomic approaches are nisus to discover the premise of various bacterial phenotypes by predicting the result of flexible changes occurring in the midst of the improvement of mortal masses.”
“Because of this physical cell increment, reducing its monetary esteem.”
“Anticapsular antibodies were appeared to give defensive invulnerability in a creature demonstrate, so particular intrigue has been set on the investigation of the container and its potential as an objective for immunization against GBS”
“By differentiate, a dN/dS < 1 proposes the nearness of sanitizing determination and the protection of protein work.”

Plagiarized from a PhD thesis

Usually, such tortured phrases are signs of copied-and-pasted text from another source, with the text synonymized to avoid detection by a plagiarism checker. Today, one can simply use a generative AI model to rewrite such text, of course, but this paper is from 2019, when such tools were not yet commonplace.

It’s not always easy to translate sentences with those synonymized terms back into regular English and find the original text. But it did not take me too long to find the most likely source for the camel udder paper: A 2017 PhD thesis by Alexandre Almeida at the Université Pierre et Marie Curie in Paris.

Here are some side-by-side screenshots. On the left, the 2017 PhD thesis (the original text), and on the right, the 2019 camel udder paper.

Left: 2017 PhD thesis. Right: The 2019 camel udder microbiome paper.

A stolen figure with non-udders

Figure 1 from the paper appears to be a screenshot (with caption and all) from Omer et al., Biomedical Research (2017), no DOI, available here: https://www.alliedacademies.org/articles/prevalence-and-molecular-detection-of-sarcocystis-spp-infection-in-the-dromedary-camel-camelus-dromedarius-in-riyadh-city-saudi-ar.html

No udder is to be seen here, although we can appreciate the completely irrelevant photos of an infected camel’s heart and tongue. At least it is better than a camel toe (NSFW)!

But wait, the authors wrote another camel udder microbiome paper!

Other than the introduction, the whole camel udder microbiome paper is not about the camel udder microbiome – how dumb of me to expect that – but about virulence factors and genetics of Streptococcus agalactiae (Group B Streptococcus or GBS).

That is funny, because if you search for the first author’s name, Google’s AI claims that “His research, such as the 2019 study published in Advances in Environmental Biology, contributes to understanding bacterial communities in camels.”

But perhaps that is because three of the four authors did indeed publish another paper about the camel udder microbiota, called Molecular Identification for Camel Udder Microbiota, also published in Advances in Environmental Biology, one month later, in March 2019.

This second paper also has some funny-sounding phrases, that might be the result of synonymized text:

“Camel drain is considered a standout amongst the essential nourishment sources…”
“Camel udder microbiota is critical because It can survive and produce meat and milk.”
“Mastitis stays to be the most critical monetary ailment impediment amid dairy cows around the world.”
“In any case, camels are kept in framework and asset poor peripheral territories.”

Editorial fail

How a paper containing sentences like these passed peer review is hard to understand. The journal Advances in Environmental Biology is published by AENSI, a publisher that has appeared on lists of potential predatory publishers and operates a cluster of similarly named journals. When editorial oversight is weak and peer review is non-existent, sentences about “definition and definition of microbes on the camel’s camel” can end up in the scientific literature. And once published, such papers can be indexed, cited, and even summarized by AI tools as legitimate scientific contributions.

In the end, the most memorable scientific contribution of this paper may not be its findings on the camel microbiota, but the unforgettable discovery that microbes live on “the camel’s camel.”

UnEDXpected Peaks

eliesbik — Tue, 10 Feb 2026 07:46:36 +0000

Over the past couple of days, I have been reviewing a series of Materials Science papers, all co-authored by the same group of researchers from the Universities of Lahore, Chakwal, and Sargodha in Pakistan. While reviewing them, one analytical technique kept standing out for unusual reasons.

Materials Science

Materials science studies the composition and structure of materials and how these determine their properties. It uses a range of different techniques, such as Scanning Electron Microscopy (SEM) or Transmission Electron Microscopy (TEM) to look at the compounds, X-ray Diffraction (XRD) to identify crystal structures, and calorimetry to study phase transitions. Many of these techniques sit at the intersection of physics and chemistry.

Coming from a molecular biology/microbiology background, I am not very familiar with these techniques or with what the expected results should look like. I am much more used to looking at photos of blots, gels, or tissues, so in my past searches for science integrity concerns, I have not really focused on materials science papers.

That changed thanks to the efforts of Reese Richardson and several other science sleuths, who created the Collection of Open Science Integrity Guides (COSIG). The collection currently contains dozens of guides explaining how to examine scientific papers for problems in general, but also on how to find problems in specific methods or fields, including those in materials science or statistics. As one of their slogans says: “Anyone can do forensic metascience“. These guides have helped me identify problems across a much broader range of analytical methods.

One technique covered in the COSIG guides is one that often appears in materials science papers: Energy-dispersive X-ray spectroscopy. And, as we will see below, some papers will claim to have produced some unbelievable results.

Energy-dispersive X-ray spectroscopy

Energy-dispersive X-ray spectroscopy (EDX or EDS) is a technique used to determine which elements are present in a material. When a sample is struck by an electron beam, it emits X-rays with energies characteristic of specific elements, enabling researchers to identify and quantify the material’s composition. The resulting spectrum shows peaks at energies corresponding to those elements.

Each element emits X-rays at specific, known energies, so the peaks that appear in an EDX spectrum occur at predictable positions. For example, carbon (C) will have a peak at 0.28 keV, nitrogen (N) at 0.39 keV, and oxygen (O) at 0.52 keV. Some elements might have multiple peaks, such as iron (Fe), with peaks around 0.7, 6.4, and 7.1 keV.

Below is an EDX spectrum of the mineral crust from a shrimp, taken from Wikimedia. We see the expected C peak around 0.3 keV, the O peak at 0.5 keV, and the Fe peaks at the expected positions. The Ka and Kb values behind the element labels indicate the electron shell (K, L, or M). The a/alpha denote the drop of an electron from the adjacent shell (e.g., L to K), while b/beta is an electron dropping two shells (e.g., M to K).

Elemental Energy dispersive X-Ray microanalyses of the mineral crust of Rimicaris exoculata. Source: Wikimedia. Taken from: Corbari L et al., Biogeosciences (2008), DOI: 10.5194/bg-5-1295-2008

Common concerns with EDX plots

Several issues in EDX papers in scientific papers might indicate data alteration or fabrication.

As explained in the COSIG guide on EDX, the easiest problem to spot is the presence of peaks at unexpected positions. There is a useful look-up table from the Lawrence Berkeley National Laboratory. Even just remembering that the O peak should be around 0.5 keV and that C, N, and O, should appear in that order, is enough to find many problems. Also, the elements hydrogen (H) and helium (He) do not produce peaks in an EDX spectrum, so if you see those peaks in a published paper, that is another sign that the data might be made up.

A problematic set of papers from Pakistani researchers

Following up on a lead provided by another sleuth, I found more than 30 problematic papers [spreadsheet] by collaborating materials scientists from the Universities of Lahore, Chakwal, and Sargodha, published from 2016 to 2025. The papers have one author in common, Asif Mahmood, who held several Assistant and Associate Professor positions at the Universities of Lahore, Chakwal, and Rasul.

Professor Mahmood is currently listed at a Chairperson and Associate Professor at the Department of Pharmacy at Rasul University. While his Google Scholar profile still works, his ORCID account [pdf] appears to have been deactivated, and his ResearchGate page leads to a dead link.

His frequent co-authors are Rai Muhammad Sarfraz from the University of Sargodha, Hira Ijaz at Riphah International University, Nadiah Zafar at the University of Lahore, and Umaira Rehman at the University of Sargodha.

At least four papers by Asif Mahmood have been retracted [PubPeer posts here, here, here, and here], mostly for overlapping images. One of these retractions was covered by Retraction Watch.

Most of Mahmood et al.’s papers follow the same structure: synthesizing hydrogel nanocomposites for the oral delivery of certain drugs, then testing their physical and chemical features. The concerns in these papers vary from SEM photos found in multiple papers representing different studies, repetitive noise patterns in XRD plots, bar plots with identical error bars, to unrealistic EDX plots.

UnEDXpected EDX plots

Several EDX plots in these papers contain peaks at unexpected positions.

Here is a figure from Zafar et al., Saudi Pharmaceutical Journal (2023), where the O peak deviates from the expected 0.52 keV position, or even shows two peaks!

Source: Zafar et al., Saudi Pharmaceutical Journal (2023), DOI: 10.1016/j.jsps.2023.06.004 [PubPeer]

In Mahmood et al., J. Drug Delivery Sci Techn (2016), the peaks do not follow the expected C-N-O order. While the O peak is at the expected 0.52 keV, the C peak jumps around from the correct position at 0.28 keV to an impossible 1.8 keV in the bottom plot. Hydrogen (H) does not produce an X-ray peak, so the H peaks indicated at 2.6 keV in panel A or at 0 keV in panels B or C are quite unexpected.

Source: Mahmood et al., J. Drug Del Sci Techn (2016), DOI: 10.1016/j.jddst.2016.09.005, [PubPeer]

This Hussain et al. paper published in Int J Biol Macromolecules (2022), has similar problems in its EDX plots, as shown below. The right plot shows C, N, and O peaks, but in the wrong order, the O peaks bounce around from 0.5 to 0.9 keV, and the peaks have a strange blocky appearance.

Source: Hussain et al., Int J Biol Macromol (2022), DOI: 10.1016/j.ijbiomac.2022.01.064, [PubPeer]

Six papers by Mahmood et al. describing various hydrogel materials for drug release unexpectedly contain very similar, wonky EDX plots. Note the incorrect order of C-N-O, overhanging peaks, and the same wavy lines on the right of the left and middle plots.

EDX plots from six different papers display unexpected similar features. DOI: Batool et al. (2022) DOI: 10.3390/gels8030190 [PubPeer] – Batool et al. (2023), RETRACTED, DOI: 10.3390/gels9010060 [PubPeer] – Ayesha Mahmood et al., Polymer Bulletin (2023), DOI: 10.1007/s00289-022-04401-0 [PubPeer] – Saba Arshad et al., Polymer Bulletin (2024), DOI: 10.1007/s00289-024-05167-3 [PubPeer] – Kanza Shafiq et al., Pharmaceuticals 2022, DOI: 10.3390/ph15121527 [PubPeer] – Fatima Noor et al., Journal of Drug Delivery Science and Technology (2023), DOI: 10.1016/j.jddst.2023.104924 [PubPeer]

The EDX plots in this paper by Malatani et al., Gels (2023), DOI: 10.3390/gels9030187 have another disorderly set of C-N-O peaks, the O is found at the incorrect values, and peaks have strange serrated ‘shoulders’.

Source: Malatani et al., Gels (2023), DOI: 10.3390/gels9030187 [PubPeer]

Even stranger EDX plots in this set were from Shabir et al., Polymer Bulletin (2025), DOI: 10.1007/s00289-025-05917-x [PubPeer]. The peaks in these EDX spectra look unexpectedly wobbly and serrated, with some peaks appearing to fall to the left. Some elemental peaks are at unexpected positions. For example, several peaks appear to have <0 (negative) values, which is unexpected. For example, the element Ca has expected peaks at 0.34, 3.7, and 4 keV, but these plots show peaks at >6 keV. Perhaps related to the incorrect peak positions is the unclear labeling of the X-axis, where the numbers 2, 4, and 6 do not always appear to correspond to a tick.

Souce: Shabir et al., Polymer Bulletin (2025), DOI: 10.1007/s00289-025-05917-x [PubPeer]

Perhaps the wonkiest EDX plot was found in Farya Shabir et al., Pharmaceutics (2023), DOI: 10.3390/pharmaceutics15010062. Although the peaks appear to be at the correct positions, they look like they were hand drawn after a couple too many Gin and Tonics.

Source: Farya Shabir et al., Pharmaceutics (2023), DOI: 10.3390/pharmaceutics15010062. [PubPeer]

Peer Review Congress Chicago – Day 3

eliesbik — Wed, 17 Sep 2025 23:43:25 +0000

Good morning from Chicago! We will start Day 3 (last day) of the 10th International Congress on Peer Review and Scientific Publication. @peerreviewcongress.bsky.social / peerreviewcongress.org/peer-review-… #PRC10 <— This hashtag will give you all the posts! You can also click on this feed, created by @retropz.bsky.social: bsky.app/profile/did:…

[Day 1] and [Day 2].

Opening

It’s 8 am and we will kick off the day with the opening lecture by Zak Kohane, ‘A Singular Disruption of Scientific Publishing—AI Proliferation and Blurred Responsibilities of Authors, Reviewers, and Editors‘ – The current model of peer review is not dead, but on the operating table.

ZK: Massive increase of scientific publications. No one can read all of these. On top of that, more papers are retracted. Surgisphere retractions of two papers showing amazing results – but data was made up. (see: www.the-scientist.com/the-surgisph…)

ZK: The missing author is AI. Is AI use acceptable in writing scientific papers? Yes. It helps non-English speakers write better. And science papers are not literary competitions. AI can help us make our message clear. It can help us make better decisions in medicine.

ZK: It is easy to use AI to manipulate or make up data. But often, this is done when humans ask it to. There are many other applications where AI is helpful or better. We need to certify data through an analysis chain of provenance with public / cryptographically certified toolkits.

ZK: AI can help humans to peer review a paper. At NEJM AI, we just reviewed a paper using human editors, two AI models (GPT-5/Gemini) and discussion with human editors. In my opinion, the AI reviews were very high quality, equally good as humans, noticing issues human had missed.

Discussion: * Worry about humanizing AI, eg. to include it as an author. * AI gets things wrong and can be manipulated – we still need thoughtful humans to review. * Some AI models will take what you input in it – We ask authors if they are willing to have their manuscript reviewed by AI.

Use of AI to Assess Quality and Reporting

Our next session is: ‘Use of AI to Assess Quality and Reporting’, in which we will have four talks. First, ‘Natural Language Processing to Assess the Role of Preprints in COVID-19 Policy Guidance‘ by Nicholas Evans. Preprints formed the basis of policy decisions during the COVID-19 pandemic.

NE: What were the consequences of using preprints during this time to policies in: 1. DHS master question list 2. NIOSH Q&A about COVID-19 in the workplace. We compared preprint with postprint abstracts, looking for changes in text, such as simplifications, additions, using NLP and coding.

We found several changes relevant to the two policy documents, and checked if the policies were updated. Example: transmission from school children to family members at home in preprint (no association) was reanalyzed in paper to a significant association.

Preprints get it wrong – policy document authors should be aware of updates. Policies have not been good in updating their guidance once peer-reviewed papers have reached different conclusions. What gets caught in peer review matters. Discussion: US policy changes not might have been useful.

Next: Leveraging Large Language Models for Assessing the Adherence of Randomized Controlled Trial Publications to Reporting Guidelines by Lan Jiang. Previous paper: www.nature.com/articles/s41…: Annotated dataset of 100 RCT protocol comparing protocol to publication.

LJ: Use generative models to assess whether a RCT manuscript includes recommended detailed information. 119 questions to scan for SPIRIT/CONSORT elements. For 24% of questions, >80% of articles did not report relevant item. We compared human- vs model-annotated text. Model did well.

LJ: * The questions help pinpoint missing RCT characteristics in greater detail * LLMs hold promise for evaluating RCT adherence to reporting standards. * Future: explore how to efficiently use responses to these questions in peer review workflows. * Do editors/peer reviewers want this?

Discussion: * We do not rely on the order of the text in the manuscript * Are you going to look into e.g. STROBE for observational studies? Yes * Yes, we would love to incorporate this in our editorial process * Were there any items that were rarely reported over the whole set? Yes

Next: Fangwen Zhou: ‘Understanding How a Language Model Assesses the Quality of Randomized Controlled Trials: Applying SHapley Additive exPlanations to Encoder Transformer Classification Models‘ PLUS: Premium Literature Service helps with clinical literature overload by selecting articles.

FZ: Many literature extractors are a black box, we do not know how it makes decisions. We trained..?? 18K articles, 49K unique words. Some words have negative impact such as ‘non-randomized’, or positive impact such as ‘noninferiority’. (this talk is over my head, lots of technical lingo)

* SHAP matches manual criteria (EB: of what?) * Highlights can aid manual appraisal * Flags potential overfitting [I have no idea what this talk was about – what was the goal? What was developed? It was full of technical lingo]

Discussion: [the people who are asking questions appear to have understood the talk, so I guess I am just dumb, hahaha] * Can we imagine fraudsters using this? * Something about hierarchical clustering * Would this be better than reading the studies yourself?

Next: Using GPT to Identify Changes in Clinical Trial Outcomes Registered on ClinicalTrials.gov by Xiangji Ying * Comparing changes between different versions of registered outcomes in clinical trials: * 225 trials with 3,424 outcomes using AI * Outcomes may be added, changed, or removed.

XY: We downloaded prospective and last registrations from Clinical Trials gov – Defined elements, matched outcomes, and detected changes. LLM-based approached achieved high accuracy in defining and identifying changes in outcomes. GPT feedback supports decision review and interpretation.

XY: Our approach performed with high accuracy. It detected changes across versions. Provides transparent, rationale-backed assessments. It is a scalable, low-cost tool that can assist trialists and registries to improve outcome registration quality. Helps flag potential reporting biases

Discussion: * The model gives bullet-pointed train of thought. * How much should the LLM do, or should humans do better in structuring the metadata in their clinical trials? * It is highly accurate, but where is the LLM making mistakes?

On to the coffee break! The conference is playing nice 80s/90s songs, by the way. Even the transcription screen is happy! I will be dancing to the coffee pot.

Open Science, Availability of Protocols, and Registration

After the coffee break, we continue with the session ‘Open Science, Availability of Protocols, and Registration’, with five talks. First talk: Lukas Hughes-Noehrer with ‘Perceived Risks and Barriers to Open Research Practices in UK Higher Education‘

LHN: Open Research is evidently recognized as integral to science. But uptake has been challenging. We surveyed opinions and practices in open and transparent research at 15 HEIs in the UK (2023). We used NVivo14 to organize. Some responses were hilarious, others nasty.

We received 2,567 submissions – mostly in medicine, biology, engineering, mostly from stage II (research associate) and phase 1 (junior) researchers. Main risk and barriers: ethical, fear of misattributions or theft of ideas. Institutional barriers: no infrastructure/training.

LHN: Many respondents disliked ‘go figure it out yourself’ mentality, so they did not provide access to the data. Do not just sign up for ‘membership’ and then ignore open science practices. We need practical solutions, more training, and rewards! Link: www.nature.com/articles/s41…

Discussion: * Tickbox exercise – should we not check at the start of a study, where we review the methods, as opposed to an afterthought once study has been done? * Is it a perceived lack of support? or is it real? Speaker handle: @lhughesnoehrer.bsky.social

Next: ‘Use of an Open Science Checklist and Reproducibility of Findings: A Randomized Controlled Trial‘ by Ayu Putu Madri Dewi. This project is part of the osiris4r.eu project. BMJ is one of the first journals to require authors to share analytic codes from all studies.

APMD: Open science checklist covers 13 open science items, including if code is open and available, if paper is open access, and if preprint is reported. Primary outcome is reproducibility (EB: not sure between what). Secondary: availability of data/code. journals.plos.org/plosbiology/…

APMD: we randomized 402 papers, and check for reproducibility Conclusion: adding checklists is doable within editorial workflows. (EB: I am not sure which manuscripts were checked and what they were checked for, and what intervention was – does reproducibility mean if data was open???)

Discussion: * Did authors try to improve their manuscript after they received the manuscript? – we did not check for this * Did you count ‘data is available upon request’ as open data? No.

Next: ‘Nonregistration, Discontinuation, and Nonpublication of Randomized Trials in Switzerland, the UK, Germany, and Canada: An Updated Meta-Research Study‘ What proportion of RCTs are registered, discontinued, published? [EB: not sure who the speaker is – multiple names are listed]

Methods for DISCO II and III (no idea what that means) Comparing 2012 and 2016. Industry-sponsored RCTs had 98% registered – doing really well. Non-industry sponsored did good, 90%. Completion status: Here we found lots of dicontinuation, usually because of poor recruitment.

Availability of study results: Industry does much better there. * 90% of trials were registered * 30% of trials were discontinued * 20% of trials did not share results * 20% of trials without result publication were not registered at all Just published: jamanetwork.com/journals/jam…

Discussion: * Did you look into reasons why studies were not published? No, but would be interesting – perhaps results were negative * Why is there more adherence to registration in industry? For academic trials, there is not the same pressure.

Next: ‘Factors Associated With Improper Clinical Trial Registration, Registration Deficiencies, and Publication Status of Submissions to The BMJ‘ by David Blanco ICMJE mandated in 2005 that all clinical trials should be preregistered. The impact has been significant.

DB: However, many trials are improperly registered, because, e.g., authors might not believe their study is a clinical trial. We looked at 239 improperly and 239 properly registered trials. We extracted variables on study and author characteristics. How do they differ?

DB: Trials with 10 authors (vs. 1 author) / with higher number of participants (vs. lower) / with CONSORT mention / with declared funding had lower odds of improper preregistration. Authors from Asia (vs Europe) had higher odds. Retrospective registration is common.

DB: Journals should require manuscripts to report several features, such as enrollment dates. Editors should verify registration status. Journals endorsing ICMJE policy should reject improperly registered trials. Authors should be aware that registration is not the same as ethics approval!

Discussion: * How many trials had one author? That seems low. – We had a couple * Are the same authors failing to register their trials repeatedly? We do not have that data, but good suggestion. * Do you see patterns in affiliation, gender? We did not study this.

Next: Eunhye Lee with ‘Registered Clinical Trial Trends in East Asia and the United States, 2014 to 2025‘ Asia has become a major hub in clinical trials. We extracted data from ICTRP (clinicalTrials + 19 others) and classifed if they were RCTs, country, etc. China is on the rise.

EL: Chinese trials are most commonly registered on their ‘local’ registry, ChiCTR. Japanese trials mostly on local registry JPRN. Neoplastic, cardiovascular, metabolic diseases dominate the RCT registrations in all five countries. China surpassed US / Japan in total clinical trials and RCTs.

EL: * China and Japan predominantly rely on their own local registries * Problems such as missing data, inconsistent formatting, and data entry errors were prevalent. * Many trials still remain unregistered * Standardization and integration of trial registries are essential

EL: * Traditional Chinese Medicine now has its own trial registry * Are there incentives in China for registration of trials? Not sure but in South Korea it is required for publication and much pressure to publish.

Open Science and Data Sharing

After the lunch break, the next session will be ‘Open Science and Data Sharing’ The first of three talks will be by Vincent Yuan, with ‘Researcher Adherence to Journal Data Sharing Policies: A Meta-Research Study‘ VY: Science runs on trust – but we need evidence.

VY: Sharing data is beneficial, and journals are introducing policies that either recommend or require data sharing. Study on top-5-in their field journals with a data sharing policy that published original research. We looked at papers with COVID-19 relevance for data sharing statement.

VY: 134 journals, 27 required data sharing (rest recommended). We included 1868 interventional and 10k observational studies. About half of them actually had statement intending to share their data, mostly in journals that required data sharing.

VY: Aggregate data (instead of individual results) was perceived as sufficient – journals need to be more precise in their statements. Researchers’ intentions to share data rarely align with best practice. Many say ‘upon request’, but requests are frequently ignored. Trust needs evidence.

Discussion: * Did you check if the data was actually available? No * How do you share data if it involves patient data? Legal restrictions to protect personal information – Data sharing statements need to be reasonable * Do journals mention FAIR principles? www.go-fair.org/fair-princip…

Next: ‘A Funder-Led Intervention to Increase the Sharing of Data, Code, Protocols, and Key Laboratory Materials‘ by Robert Thibault – The Aligning Science Across Parkinson (ASAP) research initiative has 98% preprint depositions and lots of resources in our database.

RT: ASAP’s Open Science Policy, parkinsonsroadmap.org/open-science…, includes immediate open access etc. Our Compliance Workflow includes revising manuscripts to match open science policy. We require Key Resources Tables describing key chemicals and antibodies in detail.

RT: Results: 102 publications in last year. Most manuscripts do good job in sharing data, but large increase in sharing from manuscript stage to publication. Research record should not just include vetted claim, but stands upon peer review, analysis, raw data, procedures, and lab materials.

Discussion: * Do you check if requirements are actually there? We check if links help, but quality is hard to check. We recommend a ReadMe file. * Does published article link to the preprint? Not always. * Did you meet resistance? Yes, not everyone is used to sharing code or images.

Discussion: * Every funder should do this! Much better to require this work from the start, not have it checked for by the journal at the end of the process. * What do you do if someone does not comply? We have not had that happen so far.

Next: Kyobin Hwang with ‘Medical Journal Policies on Requirements for Clinical Trial Registration, Reporting Guidelines, and Data Sharing: A Systematic Review‘ – Transparent reporting in a clinical trial is so important. Trial registration, adherence, and datasharing helps with this.

KH: We extracted clinical trials from 380 journals, extracting journal name, number of trials, language, open access model, trial registration, reporting guidelines, data sharing plan etc. We looked for must/need vs. encouraged/preferred. Link to journals: drive.google.com/file/d/1Yhh4…

KH: Most journals were specialty journals, most were mixed open access. Most journals required trial registration, 66% required reporting guidelines, but the other policies were less stringent, where policies were just recommended. OA and high-impact-factor journals were more stringent.

KH: We found substantial variation across journals, few policies on trial protocol and data sharing. We need to develop and implement policies for trial registration, reporting guidelines and data sharing. This needs to be enforced during peer review. This will improve transparency.

Discussion: * Surprising that not 100% of journals required pre-registration. * Journals are not always enforcing their own policies. * Sometimes these policies are long, not everyone reads them. How can we do better at breaking down these or other barriers?

We will now have a break, followed by the second poster session. Back in about 75 min.

AI for Detecting Problems and Assessing Quality in Peer Review

We start our last session of this three-day congress, “AI for Detecting Problems and Assessing Quality in Peer Review”, with four talks. First: ‘Leveraging Large Language Models for Detecting Citation Quotation Errors in Medical Literature’ by M. Janina Sarol.

We define those errors as citations that do NOT support the quoted statement. One in six citations might be incorrect. See: researchintegrityjournal.biomedcentral.com/articles/10…. Can we use LLMs directly to look for relevant sentence or do we need entire reference article ?

MJS: Taking relevant sentences only worked best (accuracy is 69% so not perfect). We are now doing a large-scale assessment of citation quotation errors. Current status: 100k citing statements: 34% statements were assessed as erroneous. We hope this will become a tool in peer review.

Discussion: * There are a couple of commercial tools available that do similar work, like Scite.ai – how is your work different? * Medical writers for industry often do this work manually – compare if industry papers are doing better. * Could citing the wrong year be counted as error?

Next: Neil Millar with “Automating the Detection of Promotional (Hype) Language in Biomedical Research“. Hype: hyperbolic language such as crucial, important, critical, vital, novel, innovative, actionable etc. All these terms have increased over time in grant applications or articles.

NM: Hype might be biasing evaluation of research and erode trust in science. Confident or hype language is associated with success. LLMs also have contributed. Can we develop tools to detect and mitigate hype in biomedical text? Not all these words are always hype (eg. Essential fatty acids)

NM: We manually annotated 550 sentences from NIH grant application abstract – benchmarking using NLP classification methods, pretrained LLMs and human baseline. We looked at 11 adjectives promoting novelty, and classified the terms as hype or not hype.

NM: The three annotators (the authors) did not always agree! The language models performed better, with fine-tuned BERT outperforming all methods. But, subjectivity remains a challenge. Binary labels (hype yes/no) oversimplify promotional language. We want to expand the lexicon.

Discussion: * Some verbs are also hype, such as reveal, drive. * Some folks have used words such as ‘delve’ already for 20 years – does not always mean it’s AI. * Is it bad to use those words? Other people need to know our science is very groundbreaking! * semantic bleaching

Next: ‘Evaluation of a Method to Detect Peer Reviews Generated by Large Language Models‘ by Vishisht Rao. Many reviewers are suspected to submit LLM-generated reviews. We can insert hidden message in review assignment for a LLM: Use the word ‘aforementioned’ and check for that.

VR: But we do not want false positives. Better watermarking strategies are to insert a random sentence, a random fake citation, or a fake technical term (markov decision process) – false positive rate will go down. Hidden prompts can be white colored, very small font, font manipulation

VR: Effectiveness of watermark insertion: LLMs insert the watermark with high probability. We had great accuracy. Reviewer defenses could be to paraphrase the LLM-generated text. They could also ask LLM if there were hidden prompts.

VR: In summary, we can detect LLM-generated peer reviews with high detection rate. Our preprint: arxiv.org/abs/2503.15772 Discussion: * AI output can be quite good. Why prevent it? * Flipside to hidden prompt: “give me a positive review” in manuscript. That is malfeasance. This not?

Next, the last speaker: Fares Alahdab with ‘Quality and Comprehensiveness of Peer Reviews of Journal Submissions Produced by Large Language Models vs Humans‘ There is reviewer fatigue, no credit, time-consuming. Is it a bad thing that LLMs produce peer reviews? How good are LLM reviews?

FA: We used five LLMs vs two humans for each manuscript submitted to four BMJ journals, where the LLM reviews were not used in editorial decisions. We used the Review Quality Instrument (RQI), where editors rated the review quality as well as comprehensiveness score.

FA: Across eight RQI items LLM reviews scored higher on: * identify strengths and weaknesses * useful comments on writing/organizations * constructiveness LLMs can thus help humans review papers. Not all LLMs were equally good. Gemini 5.0pro was the best, but produced very long texts.

Discussion: * Do we know if any LLMs are being trained on public reviews? – hard to know which ones are reliable. * What happens if you retry with the same prompt? You get more or less the same output. * One LLM and one human review in future? * Problems with LLM monoculture/monopoly

John Ioannidis is closing the conference, by thanking organizers, staff, first-comers, and veteran attendees. Some attended for the ninth or tenth time! Safe travels everyone!

I hope y’all enjoyed the live posts! It was my pleasure to provide access to this well-organized congress to those who could not attend.

[Day 1] and [Day 2] are separate blog posts.

Peer Review Congress Chicago – Day 2

eliesbik — Wed, 17 Sep 2025 22:02:39 +0000

[Day 1] [Day 3]

Good morning from Chicago, where we are getting ready for Day 2 of the 10th International Congress on Peer Review and Scientific Publication. peerreviewcongress.org / @peerreviewcongress.bsky.social / #PRC10

The Douglas G. Altman Lecture

We will start with the Douglas G. Altman Lecture, by Malcolm MacLeod (online presentation) –@maclomaclee.bsky.social‬, with “Does the Journal Article Have a Future?” MM is presenting online, because hard to enter US and out of protest to a hostile environment to scientists.

MM: Scientific publishing: The denuding of public assets to augment private or corporate wealth. Some scientific publishers make lots of profits. They are supposed to add value, but do they? [I have trouble hearing what the speaker is saying].

MM: It can take a long time between submission and publication; some of what gets published is biased (positive results) or fabricated (paper mills). Metaresearch papers often describe problems but do not offer solutions.

MM: Actions: * Require all research to have institutional sponsor, guarantee work was done as described * To be supported by public protocol * Conduct in-house statistics / methodological review * Facilitate, not obstruct, evidence synthesis efforts (and he will run the Berlin Marathon!)

Discussion: * More likely to review to/for non-profit publishers * With the current political situation in the US, where NIH researchers will now have to pay more to make their research open, is the time for a journal article over?

Bias, Study Outcomes, and Reporting Concerns

After today’s opening lecture, we will continue with the next session: “Bias, Study Outcomes, and Reporting Concerns“, moderated by Steven Goodman @stevengoodman.bsky.social‬ – who is proud to say the Stanford Metrics institute is not making a lot of money

First: Jae Il Shin with ‘Bias, Study Outcomes, and Reporting Concerns Immortal Time Bias Prevalence and Effects on Estimates in Systematic Reviews and Meta-Analyses‘ Immortal time bias = ? (I cannot follow this talk because the introduction slide was shown too briefly).

Sorry, folks, if I can not grasp the topic, I cannot summarize this talk in posts here. It seems this paper might be helpful to introduce the topic: jamanetwork.com/journals/jam… “cognizant of ITB”? Anyway, here is the conclusion slide of the speaker, for those who are interested.

Some of the questions from the audience are so long….. more like stories, and when they have finally got to the point where they reach the question, they interrupt themselves with more comments.

Next: Yiwen Jiang (online) with ‘Effect Estimates for the Same Outcomes Designated as Primary vs Secondary in Randomized Clinical Trials: A Meta-Research Study‘ [I do not even understand the title – how can the same outcome be both primary and secondary?? – this session is clearly over my head]

This talk again is full of terms that I do not understand enough to put it into a post here. But here is the last slide of Dr. Jiang’s talk:

I skipped some of the morning talks to see this: Chicago as seen from the Chicago River, on a wonderful boat tour by the Chicago Architecture Center. Shout out to volunteer Bill, who did a great job making us look at all these tall buildings and their different styles, from a different perspective.

While we were doing the boat tour, we missed several talks, including a very spectacular, Star Wars-themed talk by Nihar Shah.

See this BlueSky post by Allegra Torres:

Nihar Shah absolutely winning at presentations with a musical fanfare intro and a costume change to discuss anonymizing reviewers to each other in peer review discussions (I have some bias as a Star Wars fan!) #PRC10

and this one by Jay Patel:

But did you get a photo?

Editorial and Publishing Processes and Models

Back from lunch, we continue #PRC10 with ‘Editorial and Publishing Processes and Models‘, starting with a talk by Christos Kotanidis, with ‘Changes to Research Article Abstracts Between Submission and Publication‘ – CK: we looked at all original research articles submitted to @nejm.org in 2022.

CK: we matched them with papers not submitted to NEJM. We scored changes in RCTs based on Trial design, Primary outcome, adverse events, and conclusions, resulting in a TPAC score (+4 better published version; -4 better submission). High JIF journals had lower scores (0.6) than others (0.75)

CK: Most often, the abstract conclusions were changed. While on average, papers improved, that number was lowest among lower-impact journals. Read more here: www.acpjournals.org/doi/10.7326/…

Discussion: * Perhaps high impact factor journals have more resources to make editorial changes. * Your paper is behind a paywall!

Next: Aaron Clauset with ‘Manuscript Characteristics Associated With Editorial Review and Peer Review Outcomes at Science and Science Advances‘ – Elite journals have profound influence in science discourse/careers – a lot of submissions get desk-rejected.

Anonymous data set 110k submissions covering 2015-2020 –> What author / manuscript characteristics correlate with editorial / review success? Editorial review is a much stronger filter than peer review. Science sends only 17% of ms to review, and only 6% are accepted.

High institutional prestige, geography, topic, and large team size are strongly correlated with success. Small associations for author gender. Editors have stronger correlations than authors. Science (professional editors) and Science Advances (academic editors) have similar profiles.

Discussion: * Can you measure quality? Peer review might be the best measure. * Can authors appeal decisions? Yes, but that data is messy and we did not use it for this dataset. * Would blinding review for authors/institutions make a difference?

Next up: Nicola Adamson (who talks very fast) with ‘Investigating Changes in Common Vocabulary Terms in eLife Assessments Across Versions in a Publish, Review, Curate Model‘ – In Oct 2022 eLIfe announced a new model “Publish, Review, Curate”, papers sent out to peer review require a preprint.

NA: Manuscripts are assessed for significance and strength. using qualifying terms (landmark – useful). We looked at distributions of terms in first/final versions. Weaker papers received higher scores after revisions. (Even the transcription screen cannot keep up with this speaker).

NA: Conclusions: * Authors revise to improve work in most cases, particularly where it was first rated as incomplete or inadequate. * Most versions of record are declared after single round of revision. * Significance of findings terms change less often than strength of evidence terms.

Discussion: * Do papers still improve after the first round of revision? * The weakest score term was ‘useful’ – were there no papers that were not useful? – not useful papers have already been filtered out by an editor * Can you correlate metrics with scores of e.g. citations? Not yet.

There will now be a coffee break and a poster session of about an hour.

Coffee break and poster viewing

There is just one coffee pot (the other two are decaf and hot water) and TEN milk pots. This is not a good ratio

Posters!

Coffee has been ratioed even more

Peer Review Times and Payment Incentives

Late afternoon session, moderated by Kirsten Bibbins-Domingo
@kbibbinsdomingo.bsky.social: “Peer Review Times and Payment Incentives” with Emilie Gunn @emiliemgunn.bsky.social ‬about “Results of Testing the Gold Standard 2-Week Reviewer Deadline” Editors at JCO Oncology Practice struggle to find reviewers (like all editors!).

EG: Would changing the review time to three weeks help? Offering the 21 day deadline actually reduced the reviewer conversion. Reviewers typically submit within 1-2 days of the deadline regardless of total time given. Should we shorten the deadline time then? Add EiC name?

Next, ‘Analysis of Decisions and Lead-Time in Ethical Review Boards in Sweden‘ from Emmanuel Zavalis. 20K applications 2021-2023: increasing over time, half of the submissions are amendments, mainly from healthcare /universities. 90% of applications are accepted (some need revisions).

EZ: Many applications are not approved in the required 60 days – delays have effects for postdocs with a 2-year project. They might switch to a different topic. We want to collect further data from other countries to compare. Open to suggestions.

Next: David Maslove with ‘Monetary Incentives for Peer Review at a Medical Journal: A Quasi-Randomized Experimental Study‘. Should we pay peer reviewers? We sent 2 letters, one offered USD $250 to review for Critical Care Medicine. Study: journals.lww.com/ccmjournal/a…

DM: Incentivized reviewers had slightly higher positive response, and were a bit faster in sending in reviews. The incentivized manuscript surprisingly had a slightly longer time-to-acceptance. The peer review time, however, is just a small fraction of the total editorial time.

Discussion: * Were those US or Canadian dollars? US * Should we then pay peer reviewers? Does not seem to have a lot of effect. * Effect on rejection rate? Not real. * Did you look at quality of review? Should we pay for a horrible review? * Might create jealousy of folks not paid.

Next: ‘Exploring Views on Remuneration for Review: A Survey of BMJ’s Patient and Public Reviewers‘ with Sara Schroter @bmj.com: All our reviews are open and pubilshed – we integrate patients into all the work we do. Patients are asked to review as well – get same subscription rewards.

SS: In Nov 2024, we offered a choice of 50 pounds or a 12-month subscription. Survey: Would you be more likely to review if we offered 50 pounds or a subscription? Patients like this idea and found it important that they were recognized, but did not think it would change their review.

SS: But negative comments as well: it would increase tax return work, might take away their benefits, or concerns about ethics or influence on quality. Conclusions: Mixed views – but important to provide flexible options.

Discussion: * We all are or will be patients. Would it be problematic if we offer different incentives to academic reviewers vs patients?

Journal Prestige Can and Should Be Earned

Last talk: Simine Vazire @simine.com, with ‘Journal Prestige Can and Should Be Earned‘. SV: I wear many hats. As Editor in Chief of Psychological Science I would love my journal name to mean anything, but on the other hand, should we value one journal over another?

SV: Does a journal’s prestige track its quality? – probably. But among top tier journals, it might be more messy. And a lot of peer review is a black box. We do not know what journals are doing in terms of quality. There should be more transparency about this process.

SV: The time has come to ask more of journals. Journals should state their goals, give us info to evaluate, catch and correct errors, biases, corruption. Nullius in Verba. Journals can do more: publish peer review history, require open data, declare CoI, policy for appeals, conduct audits.

SV: We can scrutinize journals’ claims of superiority. When journals fail us, they should lose prestige. Currently, there is no punishment for journals’ impact factor if they publish irreproducible or fraudulent papers. Errors are not a problem, but preventable errors are.

SV: Which journals should get prestige? Basic requirements: * Transparent submissions so reviewers can evaluate claims * Peer review checks accuracy * Peer review should be transparent so community can evaluate * Rigor, reproducibility, replicability, innovation, impact, novelty, etc.

SV: Which journals should get prestige? Those that meet our standards. Open data is on the rise, but still low. journals.sagepub.com/doi/full/10…. Reprodubility increases with author involvement. royalsocietypublishing.org/doi/full/10….

SV: Transparency is now the default at Psychological Science. We hired our best critics as our editors. New policies: publish peer review history, require open data, declare CoI if editors publish as editors, and conduct reproducibility checks. journals.sagepub.com/doi/full/10….

SV: I would like to ask the scientific community to expect more of journals. We should expect them to be transparent and accountable, just as they expect us to be transparent and accountable. We can choose to review for journals that align with our vision.

Discussion: * With so many papers now being published, how can we focus on reproducibility? It’s extra effort. * Some ‘top’ journals make lots of profit – we should expect them to do more. * Top journals don’t have to do anything. We will still want to publish there.

Discussion: * One person’s (or journal’s, society’s) decision can sometimes change something meaningful. Ripple effect. * How many big / risky studies turn out to be the truth? Can we label studies as risky? Follow up years later? * How much does this cost? Lots of volunteer work!

Discussion: * If we can show it is important and works, it might become sustainable in the future.

That ends today’s congress. See you tomorrow!

[Day 1] [Day 3]

Peer Review Congress Chicago – Day 1

eliesbik — Wed, 17 Sep 2025 05:26:18 +0000

It’s Peer Review Week! A perfect time to post my notes from the 10th International Congress on Peer Review and Scientific Publication, which was held at the Swissôtel in Chicago, two weeks ago, September 3-5, 2025.

This was my first time attending this congress. I tried to live-post all the talks on BlueSky [except for one session where I sneaked out].

You can find most posts about this conference under the hashtag #PRC10 on BlueSky or X. Andrew Porter @retropz.bsky.social, Research Integrity and Training Adviser at the Cancer Research UK Manchester Institute, created a BlueSky feed as well.

This post will contain (lightly edited) notes from Day 1. Click here to see the posts from [Day 2] and [Day 3].

Opening and welcome session

Good morning, everyone! Live from the Swissôtel in Chicago, I will be posting from the Tenth International Congress on Peer Review and Scientific Publication. The room was empty 30 min ago, but is quickly filling up. #PRC10 peerreviewcongress.org

John Ioannidis is opening the congress by welcoming everyone online and in the room, and giving some housekeeping rules.

Then, Ana Marušić takes the floor with the Drummond Rennie Lecture ‘Forward to the Past—Making Contributors Accountable‘. How do you hold all authors (sometimes 100s or 1000s!) accountable for the contents of a paper? Authors might want to remain anonymous, group authorship, AI as an author.

AM: new problems: Fake affiliations (‘octopus affiliations’), authors without ORCID. We need better trustmarkers CRediT addresses some issues: www.elsevier.com But people are always going to lie about their contributions! jamanetwork.com

AM: We should define authorship contributions at the start of the study, not at the end. Kiemer et al: journals.plos.org We should continue doing research on authorship and move to an organizational and community culture of responsible authorship.

Author and Reviewer Use of AI

After a Q&A session, with questions from in-room and online participants, we will proceed to the first session of the day: ‘Author and Reviewer Use of AI’, featuring four presentations. AI = artificial intelligence

Isamme AlFayyad with: ‘Authors Self-disclosed Use of Artificial Intelligence in Research: Submissions to 49 BMJ Group Biomedical Journals‘. 2023: BMJ group policy on AI use by authors. Authors are asked: Was AI used for the creation of this manuscript? We studied the responses of 25k manuscripts.

IA: Only 5.7% of the authors answered yes to question about use of AI. Much lower % than found in self-reporting studies. Most used ChatGPT, and most used it to improve the quality of their writing (associated with European and Asian authors). We will test with AI-detection tools

Discussion/Q&A: * We should distinguish between AI use to improve language and AI use in the methods. * Which AI-detection tools are reliable? Most tools are not great. * Do journal editors respond differently if authors disclose the use of AI?

Next (program change): Mario Malički with ‘Comparison of Content in Published and Unpublished Peer Review Reports‘: We compared 140K open (Springer Nature) vs 117k (Elsevier) unpublished peer review from 233 medical journals (2016-2021, before LLMs). Language was analyzed by AI models.

Open peer reviews were longer, had some more praise, were a bit more informative, had more suggestions for improvement, and were more similar to each other. Women and reviewers with Western-affiliations wrote longer reports. Open peer review may help make reports less negative.

Discussion: * Some fields (eg. AI) have even more open processes, where reviews are conducted openly during peer review. Medical field is more hesitant (also fewer preprints!). * How about consent? Is it ethical to study reviews, in particular those that were unpublished?

Roy Perlis @royperlis.bsky.social with ‘Factors Associated With Author and Reviewer Declared Use of AI in Medical Journals‘, presenting the JAMA view. In 2023, JAMA added questions about use of AI to create/edit manuscripts – increasing 2.5 -> 4% in May 2025. jamanetwork.com

RP: Most authors used AI for language and editing. Those who used AI were over-represented in Letters to the Editor. JAMA also asks reviewers to disclose use of AI (while stating it is not allowed). This is less than 1%. Reviews that used AI had slightly better review ratings.

Discussion: * How about use of AI in revisions? * Might authors answer truthfully or give desired answer? * Correlation with industry-sponsored affiliations? * Is JAMA using AI for editorial process? * Why was % of authors reporting AI so low? – will likely increase.

Next: Daniel Evanko with ‘Quantifying and Assessing the Use of Generative AI by Authors and Reviewers in the Cancer Research Field‘ We use Pangram text classifier arxiv.org We studied 46.5k manuscript submitted to @theaacr.bsky.social‬ AACR journals.

DE: We detected linear increase of AI use after Nov 2022, launch of ChatGTP. Non-native English speaking authors used about 2x more AI, no clear associations with journal characteristics. AI-generated text in abstract: less likely to be sent out for peer review (confounded w/ lower quality)

DE @evanko.bsky.social‬: Disclosure of AI use is not reliable. Less than 25% of authors who used AI (as detected by tool) actually disclosed it. All AI-written Letters to the Editor were rejected without consideration until April 2025.

Discussion (which is cutting into the break time, with very long questions and answers) * We are increasingly use AI – is that getting more usual and acceptable? * How reliable are AI-classifiers? Pangram is very good. * Language is evolving? Or not? * Sending notifications to authors.

Authorship and Integrity Issues

After the break, we will continue with the next session: “Authorship and Integrity Issues“, introduced and moderated by Lex Bouter @lexbouter.bsky.social‬.

The first speaker (program change) is Tim Kersjes with ‘Paper Mill Use of Fake Personas to Manipulate the Peer Review Process‘

TK: The story started by a reader flagging two nearly identical math papers in 2 different journals. The math was wrong. Both papers were submitted at the same time. Why was this not detected during peer review? We discovered a paper mill: non-existing authors, fake reviewers.

TK: We identified 55 published articles in this paper mill, with 26 confirmed fake personas claiming real institutional affiliations with a wide geographical spread. Four journals affected (Open Access), suggested fake peer reviewers with non-institutional emails.

TK: Authors did not reply or replied that they did not care if the paper would be retracted – they just needed it to graduate.
Retaliation: we received lots of spam after retracting papers.
We learned that paper mills can abuse peer review system. Do not rely on author-suggested reviewers.

EB: I really applaud publishers, in this [email protected], sharing these stories, so that we all can learn and put up safeguards.
Discussion: It is increasingly more difficult to detect paper mills.
Should we do better in checking the identity of authors and reviewers?

Discussion: * Institutional email addresses might not work, not everyone has them and ECR move to a different institution. But we do need better in verifying identity. * Should we have a list of bad actors? This is tricky (legal issues) – perhaps hold institutions accountable.

Next: Ana-Catarina Pinho-Gomes with ‘Comparison of Reasons for Retraction of Biomedical Articles by Women and Men Authors‘ (virtual presentation): We investigated gender differences in authorship of retracted papers using @retractionwatch.com database (65K retracted papers).

ACPG: Retractions for misconduct or ethical/legal reasons had an underrepresentation of women. Gender equality might enhance research integrity. Limitations: assigning gender based on names. Discussion: are women more clever, and not caught? Paper: journals.plos.org/plosone/arti…

Next: Laura Wilson at ‪@tandfresearch.bsky.social‬: ‘Authorship Changes as an Indicator of Research Integrity Concerns in Submissions to Academic Journals‘: Paper mills: authorship for sale. Are authorship changes during editorial process (after submission) an indicator for fraud?

LW: We studied all requests across 1321 TF journals to change 3 or more authors after initial submission. * 81% of authorship changes were denied. * Requests were 32x more likely to be investigated by ethics team for other concerns * Is this more likely to happen in certain journals?

Discussion: * Reasons for authorship changes? Sometimes because additional work was done after peer review. * Why not also include requests to include 1 or 2 authors? * Dropping authors during peer review is also suspicious.

(also, this room is soooo cold)

Next: Nicholas DeVito with ‘Notifying Authors That They Have Cited a Retracted Article and Future Citations of Retracted Articles: The RetractoBot Randomized Controlled Trial‘. Hypothesis: Notifying authors who cited retracted papers will decrease their citations in the future.

ND: We notified 250k authors who cited retracted papers (two people in the room got one of those emails). It did not work! After 1 year, we found no significant difference in citing those retracted papers. But it was received mostly positively although some negative replies. www.retracted.net

ND: The email system worked well, and we hope this will at least prevent some retracted papers to be cited in the future. Discussion: * Crossmark better than Retraction Watch database? * Not all citations of retracted works are out of ignorance * Better to cite retraction notice.

Next: Leslie McIntosh @mcintold.bsky.social ‬ with: ‘How a Questionable Research Network Manipulated Scholarly Publishing‘: Authorship and transparency are both trust markers in science. Emails/institutions/funders can be used as trust markers

LM: 232 organizations in 40 countries were affiliated with authors of a suspicious set of papers; 76 institutions were not matched to our database, such as the “Novel Global Community Educational Foundation”; many of these had addresses that were residential homes. arxiv.org/abs/2401.04022

We are now having a lunch break. Back in about 1.5 h!

Diversity and Research Environment

After lunch, we will continue with presentations in the session “Diversity and Research Environment”, starting with Michael Mensah ‪@drmichaelmensah.bsky.social‬ with “An Analysis of Equity, Diversity, and Inclusion Concerns From JAMA Network Peer Reviewers“

MM: JAMA @jama.com‬ journals now have an EDI Checkbox, where peer reviewers can flag manuscripts for concerns about diversity and equity etc. We analyzed a set of those flagged papers, using JAMA Pediatrics as a case study (had most flagged papers). These comments are confidential.

MM: We cannot present specific cases here because of the confidentiality of the comments. But in general, some manuscripts used non-standardized questions about race identification, or they hypothesized certain results solely to race or gender, without considering other factors.

Discussion: * Should journals rely on reviewers to pick up on these issues? Should this not be the role of editors? Redundancy might be best. * JAMA has guidelines on how to report on sex, gender.

Next, a virtual presentation by Clare Ardern with ‘Assessment of an Intervention to Equalize the Proportion of Funded Grant Applications for Underrepresented Groups at the Canadian Institutes of Health Research‘ – Equalisation was introduced at CHIHR in 2016, and expanded in 2021.

CA: We looked at Early Career Researcher and principal investigator applicants in CIHR grant competition – 60 committees with 2500 applications twice a year. The funding for equalised grants comes from a different ‘pot’ – it does not displace other winning applicants.

CA: The funding success rates for ECRs, women, and French researchers rose after the intervention. Discussion: * In US, affirmative action is no longer allowed in certain fields, how is that in Canada? * Do the number of applications match the number of PIs in each group?

Next: Noémie Aubert Bonn with: ‘Extracting Research Environment Indicators From the UK Research Excellence Framework 2021 Statements‘ REF assesses research institutions for research outputs, impact, and environment. The output % is going down, more appreciation for human factor.

NAB: Research environment considers institutional statements about unit context, people, infrastructure, collaboration. Qualitative indicators and narratives are important to add context, e.g. statements about sabbatical leave. EDI statements most on gender. (QR code, but not sharable).

NAB: Discussions ongoing to include missing voices (e.g. support staff, people who left academia) The QR code leads to this link, which for me is just a Microsoft sign in page: livemanchesterac-my.sharepoint.com/personal/noe… Perhaps @naubertbonn.bsky.social can help?

Discussion: * Self-reporting statements by universities can be misleading (‘tear-jerking’) – are these environment statements validated at all? – this is valid criticism – we are looking into better criteria.

Research Misconduct and Integrity

After the tea break, we continue with the “Research Misconduct and Integrity” session (Woohoo!). We start off with ‘Retractions and Democracy Index Scores Across 167 countries‘ by Ahmad Sofi-Mahmudi (online presentation): We studied relationship between democratic status and retractions – sensitive topic.

ASM: Higher democracy scores were associated with lower retraction rates (per 10,000 publications). GDP per capita was positively associated with retraction rates, suggesting better detection mechanisms in wealthier, more effectively governed countries.

ASM: Link to report: prct.xeradb.com/democracy-an… Discussion: * What led you to do this study? Wondering what drives cheating culture – system’s impact on individuals. * Many authoritarian regimes have Excellence programs, rewarding high output and thus increasing misconduct chance.

Next: Daniel Moreira with ‘Characterizing Problematic Images in Retracted Scientific Articles‘: We focus on developing software tools. We started with @retractionwatch.com data (56K entries), 8K of which involved images, and coupled them to @pubpeer.com discussions –> 2,078 articles.

DM: Most of the retractions had problematic Western blots. Taxonomy based on Paul Brookes’ work – most manipulations were replicating (reuse of images). Problems can be within figures, between figures, between documents, or paper mills (large scale, unrelated authors).

DM: Most problems fall in between-paper-replication group. We found a lack of standardized information in retraction notices. We are now developing software for copy-move annotation and challenges. See: www.nature.com/articles/s41…

EB: Here is Paul Brookes’ paper about detecting image problems and taxonomy: pmc.ncbi.nlm.nih.gov/articles/PMC…

Next: Reese Richardson with ‘Misidentification of Scanning Electron Microscope Instruments in the Peer-Reviewed Materials Science and Engineering Literature‘ Paper: journals.plos.org/plosone/arti… We looked at SEM images, which usually have a metadata banner with crucial data.

RR: Some @PubPeer comments led us to analyze mismatches between instrument make in Methods description and instrument make in metadata banner. In a set of 11K articles, our automated screen found 2,400 articles with incorrectly identified SEM. Only 13 were flagged on PubPeer.

RR: @reeserichardson.bsky.social:‬ Authors from Iran and Iraq were over-represented in this set, certain authors did this repeatedly. Several had formulaic titles without common authorship, suggestive of paper mills – involved journals frequently featured in paper mill advertisements.

Discussion: * How many papers do not have the banner? (not many, at least not in high resolution). Publishers might require banner, in high resolution! * Could a smart fraudster not cut off the banner? Absolutely! * Are there other types of data where we can apply the same analysis?

Next: John Ioannidis with: ‘Indicators of Small-Scale and Large-Scale Citation Concentration Patterns‘: H-index is a popular citation index, but can be easily gamed. Can we identify citation concentration? We looked at 1.5M authors from Scopus and focused on 1% extremes of 3 indicators.

JI: outliers in: 1: citations/h^2 2: A50%C – citing authors who cumulatively explain 50% of citations 3: A50: # co-authors with shared co-authorship exceeding 50 papers. Exclude Physics and Astronomy papers. Preprint: arxiv.org/abs/2406.19219 Paper: journals.plos.org/plosone/arti…

JI: Retractions were overrepresented, in particular in group 1. These extreme concentration patterns distort citation-based metrics. Not necessarily misconduct, but outliers that need to be considered in interpreting metrics. We need better metrics to probe gaming of metrics.

Discussion: * Do your results indicate that the problem is not so bad? Or was the threshold too stringent? * Should we intervene? * Do these indicators predict retractions? Yes, indicator 1 can. * Differences between fields? Yes, each of the 3 indicators has their own set of authors.

Next up: Reese Richardson @reeserichardson.bsky.social (again!) with ‘Scale and Resilience in Organizations Enabling Systematic Scientific Fraud’ – in which we tried to evaluate how large the paper mill problem is. Our dense paper: www.pnas.org/doi/abs/10.1…

RR: #1: Editors are inroads – some editors are much more associated with PubPeer-flagged or retracted papers #2: Through shared images, we found 2,000 articles that are connected to many others, suggestive for paper mills.

RR: #3: using the Internet archive we looked into a publication ‘broker’ that avoids and adapts to integrity measures. They replace one journal with another if they got caught. Integrity measures (deindexing a journal) are very infrequently applied. #4 skipped

RR: #6: paper mill products already outpace the integrity measures used to contain them. Widespread defection. Systemic fraud is growing and a viable object of study for metascience. We need more publication metadata from publishers. Also: plug for @COSIG – cosig.net

Discussion: * What type of data from publishers should be available for this type of research? Need to couple such tools to metadata availability! [Applause] * Retraction notice is not reliable for paper mill identification. * It took a lot of resistance and pushback to get this published.

* Should we worry about mass retractions, which are mostly about papers that might not be important? Is number of single-paper retractions going down? Should we worry more about that? – Perhaps those paper mills have a wider bad impact in terms of teaching people that misconduct is ok.

Next: ‘Patterns of Paper Mill Papers and Retraction Challenges‘ by Anna Abalkina @abalkina.bsky.social: We identified over 60 suspicious email domains associated with Tanu.pro – and identified 1,517 papers in 380 Scopus journals. Many countries – core is in Ukraine, Kazakhstan, and Russia.

AA: These papers are still being published, despite warning several publishers. Some have translated plagiarism (e.g. of bacherlor theses). Some were published through non-existing peer reviewers, e.g., “Leon Holmes”.

AA: Data from @springernature.com: 8,432 submitted papers identified as coming from Tanu.pro – most got immediately rejected, but at least 79 got published. 48 now retracted. Authors on 13 papers had the same verbatim reply. Most authors were middle-career.

AA: There are no COPE guidelines about how to handle paper mill products like these. More papers and resources about Tanu here: papermills.tilda.ws/paperstanupro Discussion: What is the extent of the problem? Hard to know. We recognize some paper mills but not all!

Next: Renee Hoch from @plos.org with:’Sustainable Approaches to Upholding High Integrity Standards in the Face of Large-Scale Threats: Insights From PLOS One‘: @plosone.org has adaptable system-wide screening to screen for integrity issues on a large scale.

RH: There has been an increasing number of integrity cases over the past years. Highest representation from India, Pakistan, and China, but also multi-national. We screen e.g. for duplicate submissions, human research ethics approval, and paper mill signals.

RH: These integrity interventions have resulted in increased desk reject rate – Preventing misconduct to get published is less work than retracting later. We must be agile in adapting our approaches to respond to new trends, but megajournals have an advantage in detecting those.

This ends today’s scientific session. On to the Opening Reception!

Click here to go to Peer Review Congress Chicago – [Day 2] and [Day 3].

Discontinuous ridiculous stools – a preprint full of tortured phrases and stolen data

eliesbik — Tue, 29 Jul 2025 03:15:04 +0000

“Patients with provocative entrail illness unclassified gave to crisis division a 3-day history of sickness, retching, migraine and irregular stomach torment alongside discontinuous ridiculous stools as of late.”

If you cannot wrap your brain around this sentence, don’t worry. Neither can I.

A photo of a very ridiculous stool: a poop-emoji cake, with big white googly eyes and twisted candles on top. Taken at uBiome headquarters, March 2017.

Tortured phrases

The wording is full of tortured phrases, a specific way of rewording text used by authors who want to disguise plagiarized text. To avoid detection by plagiarism detection tools, they run the copied text through ‘synonymizer’ software to find alternative words. It can result in nonsensical or even funny-sounding phrases.

Some common tortured phrases include:

“Counterfeit consciousness” instead of “artificial intelligence”
“Profound neural organization” instead of “deep neural network”
“Colossal information” instead of “big data”
“Bosom peril” instead of “breast cancer”
“Haze figuring” instead of “cloud computing”

Such tortured phrases were first described by Guillaume Cabanac, Cyril Labbé, and Alexander Magazinov, in their 2021 preprint Tortured phrases: A dubious writing style emerging in science. Evidence of critical issues affecting established journals. Their Problematic Paper Screener database currently has over 21,000 papers containing five or more of those phrases. Most were published in the early 2020s, with their incidence declining a bit after 2022 – presumably because ChatGPT and other generative Artificial Intelligence tools can do a much better job rewriting, and thus hiding, plagiarized text.

Graph from the Problematic Paper Screener, showing the number of papers containing at least 5 known tortured phrases, plotted per year.

A preprint with lots of strange synonymized phrases

Doing a search for the tortured phrase “provocative gut illnesses” (Inflammatory Bowel Disease; IBD) in Google Scholar, I found this gem in the medRxiv preprints collection. It is called Significance of headache in inflammatory bowel diseases, by Baqir Ali Khalid et al., DOI: 10.1101/2023.02.05.23285412, first uploaded in February 2023. The seven authors are affiliated with five universities and medical colleges in Pakistan.

Screenshot of the title and authors from the medRxiv page.

In the article, the authors claim to have collected ‘data’ from 20 IBD patients who presented at the emergency department with headache and bloody stools.

The text is very hard to understand, with some over-synonymized sentences. See if you can figure out what the authors mean by:

“Cerebral vein apoplexy can be deadly and finding is trying as side effects are vague”
“We might want to urge clinicians to continually reexamine their choices, particularly in the event that there is nonappearance of clinical improvement after a generally deep rooted treatment”
“EIMs address the primary driver of horribleness in Compact disc.” (Perhaps the authors mean Crohn’s Disease, which is often abbreviated as “CD”.)
“In the last option study, the chances proportion was 2.66 (95% certainty stretch = 1.08-6.54) contrasted with everybody” (Chances proportion = odds ratio; certainty stretch = confidence interval)
“As neuropathic torment seriously influences personal satisfaction and legitimizes explicit medicines, it appears to be vital to be aware on the off chance that some IBD patients ought to profit from such medicines”

Turning a single case report into a 20-patient study

In the preprint, the authors describe 20 patients with “provocative entrail illness” (read: inflammatory bowel disease) who are described in detail. They all had a 3-day sickness, they all had “irregular stomach torment alongside discontinuous ridiculous stools as of late”, and they had all been in contact with “youngsters” two days earlier. Hmmm, all 20 of them?

Then their blood test results are not given as average values, but as very specific measurements: “Blood tests upon the arrival of affirmation showed a C-responding protein of 86 mg/L, a typical white platelet count and lack of iron frailty (hemoglobin 110 g/L). Other lab discoveries like liver and pancreas chemicals, creatinine, urea and electrolytes were ordinary. Egg whites was low (26 g/L) as it was during earlier weeks.”

This sounded more like a single blood test, rather than the result from a set of 20 patients.

With some sleuthing, it wasn’t hard to find the original text. Remember, text with tortured phrases is plagiarized text, so translating the text back into regular biomedical expressions could lead you to the source paper.

Here, the source paper was a case report about a 15-year old IBD patient, published by Orfei et al. in BMJ Case Reports, in 2019, DOI: 10.1136/bcr-2018-227228. The authors of the 2023 medRxiv preprint appear to have taken the 2019 case report to make it sound like a set of 20 patients.

Here is a side-by-side comparison of the 2019 BMJ Case Reports paper (left) and the 2023 medRxiv preprint (right). I have color-coded some sentences to help with the navigation. Can you spot the stray “her” that the preprint authors left in by accident?

Left:
Text from Martina Orfei et al., BMJ Case Rep (2019)
DOI: 10.1136/bcr-2018-227228. Right: text from
Baqir Ali Khalid et al., medRxiv Preprint (2023)
DOI: 10.1101/2023.02.05.23285412

Copied data from National Health Interview Survey

The preprint continues with “data” collected from “20 patients”. Tables 1 and 2 list patient characteristics such as smoking status, body mass index, and migraine occurrence. Interestingly, the text reads “The overall age-adjusted prevalence of migraine or severe headache was 15.4% (n = 9,062)
and of IBD was 1.2% (n = 862).” With only 20 patients in the study, those are unexpected n’s.

Using some of the numbers in Table 1, I could easily find the source of the data. All values were identical to those in Yong Liu et al., Headache (2021), DOI: 10.1111/head.14087, a study carried out on 60,436 US adults who participated in the 2015 and 2016 National Health Interview Survey. That’s a lot more than 20 patients!

Here’s a side-by-side comparison of the NHIS 2021 data and the 2023 preprint. The values are all identical.

Comparison of data found in Yong Liu et al., Headache (2021), DOI: 10.1111/head.14087 (left) and Baqir Ali Khalid, medRxiv (2023), DOI: 10.1101/2023.02.05.23285412 (right).

There are probably other scientific papers that were used to generate this preprint, but the evidence seems clear. Patient data was copied from two older sources, then the copied text was synonymized to avoid plagiarism.

You can find my analysis here on PubPeer. I also left a comment on the medRxiv website, and also notified the preprint server organizers of this tortured phrases pearl.

Preprint claiming that COVID-19 mRNA vaccines cause transcriptomic dysregulation is deeply flawed

eliesbik — Sat, 26 Jul 2025 00:24:24 +0000

Today, 25 July 2025, a preprint was posted claiming that significant gene expression changes were found in individuals with new-onset cancer and other diseases after receiving mRNA COVID-19 vaccines, compared to healthy individuals.

A preprint is a non-peer reviewed manuscript – a study or hypothesis that has not yet been evaluated by other scientists. These articles should always be read with caution. Preprints can be brilliant, misguided, or completely bonkers – but they have not been peer-reviewed.

So let’s take a closer look at this preprint.

Update, 12 September 2025: The preprint was withdrawn for “unresolved ethical issues concerning ethical oversight, legitimacy of institutional boards, validity of the study design, and potential biases in study interpretation that compromise the overall trust in the research findings.“

The manuscript compares blood samples from three groups of participants:

Group 1: 3 individuals who developed new-onset disease symptoms after COVID-19 vaccination
Group 2: 7 individuals with new-onset cancer diagnoses following vaccination
Control group: 803 healthy individuals

RNA was extracted from these individuals’ blood samples and sequenced to identify which genes were up- or downregulated between the groups.

Very unequal group sizes

From the groups listed above, it is clear that the particpant numbers were highly imbalanced. With fewer than 10 individuals in the two ‘sick’ groups, it is difficult to draw any meaningful conclusions. In such small groups, a single patient with an unusual RNA expression pattern could significantly skew the results. In contrast, outliers in the much larger control group will have far less influence on the average. This kind of imbalance increases the risk of generating unreliable or non-reproducible findings.

No details about the study’s participants

The preprint provides almost no information about the individuals enrolled in the study. While it describes the RNA extraction and sequencing methods in detail, it offers little to nothing about the participants themselves. In studies like this, researchers are expected to include a “Table 1” — a summary of key demographics and clinical characteristics (such as age, sex, race, BMI, smoking status, or other relevant factors) for each participant group at baseline. This table helps readers assess whether the groups were comparable at the start of the study.

Unfortunately, no such table is provided here. We don’t know the average age of the groups. Were the individuals who developed symptoms or cancer older or younger than the healthy controls? Were they from different geographic regions? Were they otherwise healthy prior to vaccination? None of this is disclosed — a major omission.

The timing of adverse events is also unclear. How long after vaccination did symptoms or cancer appear — days, weeks, or months? The authors do not give any specific details.

Most importantly, there is no information about the vaccination status of the 803 healthy individuals. Were they all vaccinated as well? Without that, the comparison becomes almost meaningless.

Cancer might not be vaccine-related

Diseases such as cardiovascular injury, thrombosis, or cancer can occur at any time. If someone is diagnosed with cancer a few weeks after receiving a vaccine, that doesn’t mean the vaccine caused it – it could simply be a coincidence. To assess whether a vaccine increases cancer risk, a more rigorous study design would compare, for example, 1,000 vaccinated individuals with 1,000 unvaccinated individuals, tracking how many develop cancer over time.

This study did not do that. In fact, we don’t even know how many of the healthy participants in the control group had been vaccinated — a critical piece of missing information.

It’s also important to remember that during the height of the COVID-19 pandemic and lockdowns, many people skipped routine medical check-ups, either due to overwhelmed hospitals or fear of exposure. Clinics were closed, appointments were delayed, and hospitals were full of critically ill patients. For many, it did not feel like the right time to schedule a mammogram or colonoscopy.

As the pandemic eased and vaccinations became widely available, people began returning to doctors for overdue screenings. Not unexpectedly, some of these delayed check-ups led to new diagnoses, including cancer. But that doesn’t mean the vaccines caused the cancer. It just means both events — vaccination and diagnosis — happened around the same time, following a long period of medical disruption.

Unclear ethics permits

The preprint states that the participating clinics obtained ethical approval through their own institutional review boards (IRBs). However, these clinics appear to be small businesses, often led by a single individual or a very small team. It is unclear whether they actually have formal IRB committees — or whether they are even authorized to recruit participants for human research.

Adding to the confusion, the clinics listed in the ethics section do not match the affiliations of the study’s authors. This raises further questions: Were the individuals involved properly trained to conduct research on human subjects? Were appropriate ethical procedures followed?

Interestingly, the IRB approval numbers provided all follow a very similar format, suggesting they may have come from the same external IRB service — though the preprint does not clarify this.

More details need to be given about the validity and lack of conflict of interest regarding these ethical permits, as well about the qualifications and training of the staff in these clinics to conduct human research.

Potentially biased recruitment

It is also unclear how and where the participants were screened and recruited. Enrolling over 800 healthy individuals and persuading them to donate blood is not an easy task. Where was this done, and how was it organized?

Even more crucially, how were the 10 post-vaccination patients selected? Were they chosen from a larger pool of vaccinated individuals in which most did not experience cancer, thrombosis, or other serious outcomes? If so, how many people were screened, and what were the inclusion criteria?

This is a critical issue that needs to be addressed. Without clear information about recruitment methods, it raises the concern that these cases may have been selectively chosen – a practice known as “cherry-picking” – which can seriously bias the results and undermine the study’s validity.

RNA expression of sick vs healthy people is expected to differ

It is not surprising that the transcriptomic profiles of people with cancer or other illnesses differ from those of healthy individuals. That is exactly what we would expect. And that’s essentially what this preprint found.

But despite how it’s being presented on social media, this is not the bombshell the authors claim it to be. It simply shows that sick people have gene expression patterns consistent with illness. There is no evidence here of a causal link between their disease and the COVID-19 vaccine.

No link between cancer and vaccination is proven

This study is so poorly designed that it demonstrates only one thing: people who are sick have different RNA transcription profiles than people who are healthy.

Duh. That’s a basic and well-known fact.

Because of its deeply flawed design, lack of proper controls, and absence of key participant data, this study offers no evidence – absolutely nothing – that mRNA vaccines cause cancer.

It is especially disappointing that a group of authors with MDs and PhDs attached their names to such a weak and misleading piece of work. One of the lead authors, Peter McCullough, has had his board certifications revoked — and studies like this help explain why.

Frankly, this paper wouldn’t even be acceptable at a high school science fair, let alone meet the standards expected of serious biomedical research.

Update 28 July:

Reese Richardson found a lot more problems in the Catanzaro preprint. He discovered the source of the 803 ‘healthy’ controls (who turned out to be deceased patients), inconsistencies with the sequencing center, and a dash of plagiarism! Read his blog post here: No, mRNA vaccines do not cause “transcriptomic chaos”.

Science Integrity Digest Summer 2025

eliesbik — Fri, 25 Jul 2025 21:02:53 +0000

It is hard to find the time to post here. I’m getting lots of requests to help scanning papers for image problems, and am also traveling a lot to give talks and be in panels. So my ‘monthly’ digests have now turned into quarterly digests, hahaha.

These past months, I have traveled to Berlin to receive the Einstein Foundation Award, to Oxford for the FAIRS Meeting, participated in a workshop in Stockholm organized by the Royal Swedish Academy of Sciences about the Reformation of Science Publishing, a conference in London at the Royal Society about the Future of Science Publishing, and a gathering with other science detectives and journalists in Krakow, Poland. In between, I gave several talks at research institutions and medical schools. I am getting pretty good at packing suitcases!

Here is a round-up of some noteworthy articles about research integrity.

Secret Sleuth Society Meeting in Krakow, Poland.

COSIG

COSIG stands for the Collection of Open Science Integrity Guides. It is an open‑source, community‑led set of guides intended to help more people to conduct post‑publication peer review. The project was led by Reese Richardson, and published in June 2025. Cosig.net currently has 30 guides on a range of topics, including general tips on how to comment on PubPeer or report integrity concerns, as well as discipline-specific forensic tools for image forensics, plagiarism detection, or statistical red flags. The idea is that anyone can be a sleuth and contribute to critically reading scientific papers. All tools are freely downloadable as individual PDFs or as a combined set. Read more about it on Retraction Watch.

Science Integrity ‘sleuths’ in the news

Research-integrity sleuths say their work is being ‘twisted’ to undermine science. Some sleuths fear that the business of cleaning up flawed studies is being weaponized against science itself. “We try to point out those bad papers because we still believe in science and want to make science better,” says Elisabeth Bik, a microbiologist and image-integrity specialist based in San Francisco, California. But, she adds, “I am very worried about how the work we do in pointing out bad papers is currently being misused, or even weaponized, to convince the general public that all science is bad”. [Miryam Naddaf, Nature, July 2025]
The COVID-19 pandemic transformed this scientist into a research-integrity sleuth. Lonni Besançon has faced online abuse, threats and legal challenges. But he remains proud of his service to science and society. “My main message to young scientists is simple: don’t be afraid to speak up. Research misconduct may not always be visible at first glance, but if you are vigilant and critical, you can spot it.”[Christine Ro, Nature, July 2025]
Quality of scientific papers questioned as academics ‘overwhelmed’ by the millions published. Widespread mockery of AI-generated rat with giant penis in one paper brings problem to public attention. Sir Mark Walport, the former government chief scientist and chair of the Royal Society’s publishing board, said nearly every aspect of scientific publishing was being transformed by technology, while deeply ingrained incentives for researchers and publishers often favoured quantity over quality. “Volume is a bad driver,” Walport said. “The incentive should be quality, not quantity. It’s about re-engineering the system in a way that encourages good research from beginning to end.” [Ian Sample, The Guardian, July 2025]
Journal plagued with problematic papers, likely from paper mills, pauses submissions. The halt will let Taylor & Francis focus on checking Bioengineered’s papers for fraudulent works and paid authorships. “Today feels like a big win for the scientific record,” says René Aquarius, a biomedical scientist in Radboud University Medical Centre’s neurosurgery department. Aquarius led a group of sleuths who published a preprint in March suggesting the journal was rife with problematic papers and that Taylor & Francis was not acting fast enough to investigate them. [Jeffrey Brainard, Science, July 2025]
Detecting fraud ‘is a job for professionals, not peer reviewers’. Integrity investigator Elisabeth Bik urges publishers, journals and institutions to take responsibility [Sophie Hogan, Research Professional News, July 2025]
More than two dozen papers by neural tube researcher come under scrutiny. One of the studies, published in 2021 in Science Advances, received an editorial expression of concern on 21 May, after the journal learned that an institutional review of alleged image problems is underway. Renowned data-integrity consultant Elisabeth Bik found the potential problems and posted her findings on PubPeer two months ago after scrutinizing about 100 of Yang’s studies. [Claudia López Lloreda, The Transmitter, June 2025]
Science sleuths flag hundreds of papers that use AI without disclosing it
Telltale signs of chatbot use are scattered through the scholarly literature — and, in some cases, have disappeared without a trace. Both Glynn and Strzelecki have identified instances in which publishers have removed the telltale AI phrases without indicating that the paper has been modified, dubbed stealth corrections. [Diana Kwon, Nature, April 2025]

Research Integrity at Publishers and Institutions

Academic Research Integrity Investigations Must be Independent, Fair, and Timely. We propose that research integrity violations of substantial scale should be independently investigated by appropriately resourced specialists. Such investigations should be completed within a time frame that facilitates meaningful corrective action when required or exoneration of the accused party when appropriate; completion of an investigation should rarely extend beyond one year and the results of the investigation should be made public.- Schrag, Patrick, Bik – J Law Med Ethics [May 2025]
Springer Nature launches new tool to spot awkward, tortured phrases. The new tool is based upon the tortured phrases catalogue of the Problematic Paper Screener that was created by Guillaume Cabanac, Cyril Labbé and Alexander Magazinov. It flags unusual and awkwardly constructed phrases that could indicate that an author used paraphrasing tools to prevent plagiarism from being discovered. [Rebecca Trager, Chemistry World, July 2025]
River Valley Technologies launches Research Integrity Dashboard. River Valley has already submitted some 4,000 animations of questionable images on PubPeer. The Research Integrity Dashboard integrates with existing workflows, offering both large and small publishers a scalable, customisable approach to safeguarding research ethics [Research Information, May 2025]
Frontiers broadens AI‑driven integrity checks with integration of Cactus’ Paperpal Preflight and Clear Skies’ Papermill Alarm and Oversight into AIRA [Frontiers, July 2025]

Notable retractions

The Institut Hospitalo Universitaire-Méditerranée Infection in Marseille (IHU-MI) now has 56 papers retracted, 47 authored by its former director Didier Raoult. I wrote about concerns in IHU papers here, here, and here. You can follow the retractions and corrections of all 846 IHU-MI articles with concerns at IHU-Correction.com or at the Retraction Watch Leaderboard. Raoult has reached position 12 on that leaderboard.
Sodom comet paper to be retracted two years after editor’s note acknowledging concerns. Scientific Reports has retracted a controversial paper claiming to present evidence an ancient city in the Middle East was destroyed by an exploding celestial body – an event the authors suggested could have inspired the Biblical account of Sodom and Gomorrah [Ellie Kincaid – Retraction Watch, April 2025]. I wrote about my concerns about the images in this paper in October 2021.
Former cancer researcher who sued university for discrimination hits 35 retractions. More than 100 of Rao’s papers have comments on PubPeer, most originating from a user called Lotus azoricus. We now know that pseudonym belongs to sleuth Elisabeth Bik. [Kate Travis, Retraction Watch, May 2025]
After 15 years of controversy, Science retracts ‘arsenic life’ paper. Many scientists, including David Sanders, a biologist at Purdue University in Lafayette, Ind. who has previously argued for the paper’s retraction in posts for Retraction Watch, believe the paper’s results were simply the result of contamination of the authors’ materials. [Ellie Kincaid, Retraction Watch, July 2025]

Royal Society in London

ScienceGuardians, where disgruntled authors complain about PubPeer

eliesbik — Fri, 18 Apr 2025 23:19:21 +0000

On Twitter/X, @SciGuardians, associated with the website ScienceGuardians.com, is promising to ‘uncover’ some big conspiracy of fraudulent @pubpeer.com users.

But in reality, the account appears to be run by one or more disgruntled scientists with dozens of problematic papers. And there is no big reveal.

Is it a PubPeer look-alike?

ScienceGuardians.com presents as a ‘Journal Club’ where people can comment on scientific papers using anonymous accounts. Very similar to PubPeer, actually. Here, anonymous user “VelvetPhantom” comments on a paper about silver nanoparticles.

Is it a COPE look-alike?

The ScienceGuardians website also features some training and resources, with flowcharts that resemble those used by COPE, the Committee on Publication Ethics. One flowchart covers the ethical reuse of figures and tables, another the addition of authors during the revision stage of submitted articles.

Who are the ScienceGuardians?

It is a bit unclear why ScienceGuardians set up a PubPeer-look-alike site and COPE-like guidelines, but of course, they are free to do so. As Oscar Wilde said, ‘Imitation is the sincerest form of flattery that mediocrity can pay to greatness‘.

But who is behind ScienceGuardians?

It might be tempting to speculate that the ScienceGuardians account and website could be run by a bunch of disgruntled scientists, whose papers have been criticized on PubPeer.

Only one person has publicly stated that they joined the platform: Dr. Wafik El-Deiry, a cancer researcher and Associate Dean for Oncologic Sciences at Brown University and the co-Editor in Chief of Oncotarget.

Dr. El-Deiry has published over 1,000 articles, as per Dimensions.ai. Before joining Brown University, he has worked at the Howard Hughes Medical Institute, the University of Pennsylvania, and Penn State University. According to his Wikipedia page, he has authored 13 papers that have been cited over 6,000 times.

He also has earned nearly 70 comments on PubPeer. They cover the usual range of duplicated and overlapping images. For a summary of the concerns raised on his papers, you can read this post on ForBetterScience.

El-Deiry was not happy about his PubPeer comments

Dr. El-Deiry was, understandably, not happy with all those PubPeer comments about his papers.

But instead of addressing the concerns by looking up the original data and providing the correct images, Dr. El-Deiry replied to most PubPeer comments with a screenshot of a lengthy statement, in which he lamented that PubPeer ‘has no moral, legal, or any other authority to smear the reputation of numerous individuals publicly including on social media‘.

He also wrote ‘Underlying political motivations are extremely troubling as is the anonymous nature of public attacks‘, ‘political motives by anonymous accusers‘, and ‘blackmail and exploitation‘.

ScienceGuardians in full DARVO mode

The SciGuardians X-account appears to be doing precisely what El-Deiry claims to despise. It anonymously tries to smear the reputation of PubPeer commenters, by making all kinds of false accusations.

It follows the classical DARVO pattern, a term coined by Dr. Jennifer Freyd that stands for “Deny, Attack, and Reverse Victim and Offender.”

In a long tweet posted today, ScienceGuardians complains about science critic ‘Clare Francis‘ (a pseudonym) who has used several PubPeer accounts to post concerns about Dr. El-Deiry’s papers. In an attempt to DARVO these critiques, Francis is described as the villain, and Dr. El-Deiry as the victim. For example, Francis’s accounts are described as ‘fraudulent aliases‘, and their posts as ‘threats, disrespect, harassment, and extortion‘ with an ‘unrelenting obsession‘ and ‘orchestrated attacks, deceptive behaviors, and fraudulent tactics‘. Dr. El-Deiry is painted as the victim of ‘coordinated personal attacks’.

Of note, nothing in the gish-gallop of adjectives and bold-face statements in that tweet shows any proof of threats, fraud, or extortion.

On April 11, SciGuardians tweeted about the ‘Coordinated Attacks on the Scientific Community‘ by PubPeer users who criticized papers by Dr. Sabine Hazan and Professor Jörg Rinklebe, who has earned a whopping 288 PubPeer posts.

In an April 10 SciGuardians post, the retraction of a paper by Dr. Sabine Hazan from Frontiers in Microbiology was described as ‘unjustified‘, ‘premature‘, and ‘a case in point of how commercially-driven decisions can distort the scientific process, stifling innovation and inquiry‘.

The ScienceGuardians’ use of libelous language

By using sensational language including words like ‘breaking‘, ‘uncovering‘, and ‘revealing‘, the @SciGuardians account is trying to pique interest.

The account has promised to reveal ‘evidence of fraud, identity manipulation, and coordinated misconduct‘ – but so far, all it has done is show that there are some prolific PubPeer users who have posted numerous concerns about Dr. El-Deiry’s and Dr. Hazan’s papers. Duh, that is hardly breaking news. And not fraudulent at all.

In several posts (here and here), SciGuardians discusses the ‘coordinated attacks’ by Pubpeer ‘perpetrators‘, a term suggesting illegal or fraudulent activities.

In fact, by using words such as ‘perpetrator‘, ‘false identities‘, ‘wire fraud‘, ‘prosecution‘, ‘confesses‘, and ‘expose‘, they make a nothing-burger out of showing that some people post under a pseudonym, just like folks at the ScienceGuardians’ website do! Ain’t nothing wrong or fraudulent with using a pseudonym.

Is ScienceGuardians related to CureGuardian?

The style of SciGuardians’ tweets reminds me of those by Matt Nachtrab, who relentlessly harassed me after I criticized papers related to his beloved $SAVA company – and who lost $50 million by ignoring our repeated warnings. You can find some examples of his tweets here, here, here, and here.

In an interesting detail, Matt was so angry with our PubPeer comments that he started CureGuardian.org – a name uncannily similar to ScienceGuardians.com. Might they be related? (just speculating here!)

Becoming ScienceGuardians

The ScienceGuardians’ website claims to be a ‘global hub for upholding the highest standards of integrity‘ and a ‘beacon of trust and accountability‘ seem to not match the libelous language they spout on X very well. But… freedom of speech and all of that, right?

Anyone can become a member of their platform, as long as you have an email address associated with an academic or publishing institution – no luck trying to sign up with your gmail account. They claim to be inclusive, where ‘every member, regardless of their role in the scienceguardians community, can share their insights and concerns. This inclusivity ensures that the perspectives of all stakeholders are respected and considered.’

So I just signed up with my Stanford affiliate email address. Let’s see how inclusive they are. Will they approve my membership?

Update: Unexpectedly, ScienceGuardians approved my membership the next day. I could even copy/paste some PubPeer comments, which I did for a week or so. Apparently, I am now the most prolific Science Guardian, hahaha.

Science Integrity Digest – catching up

eliesbik — Fri, 21 Feb 2025 22:59:51 +0000

Apologies for not posting for a while. Fall 2024 was a busy time for me, with travel to give talks in Australia and Singapore and some events in the US. Instead of trying to catch up with everything that has happened since September, here are some highlights.

Einstein Foundation Award

I’m thrilled to have won the 2024 Einstein Foundation Award for Promoting Quality in Research. The Einstein Foundation Berlin selects three winners each year. The 2024 winners are:

Early Career Award: Helena Jambor and Christopher Schmied at PixelQuality – Best practices for publishing images. ‘PixelQuality has established guidelines and checklists for publishing clear and reproducible images. It now aims to disseminate and refine them to handle AI-assisted image generation and analysis.’ ‘Research images are the proof of scientific findings, not just visuals. PixelQuality has set new standards for their reproducibility and transparency.’
Institutional Award: PubPeer – ‘PubPeer has become an essential part of the research communication landscape, with over 300,000 comments logged so far. It is estimated that since 2012, 19 percent of all retractions of papers worldwide in all academic domains had a prior discussion on the site. Beyond identifying flaws and fraud, PubPeer functions as an important tool to jointly improve scientific publications through „liquid feedback“.’
Individual Award: Elisabeth Bik – ‘Elisabeth Bik’s work in uncovering manipulated images, fraudulent research data and publications has created enormous impact all over the world. Her work has led to heightened awareness of questionable research practices and generated widespread attention to responsible conduct of research in the scientific community.’

The award ceremony will take place in March 2025 in Berlin, Germany.

The Einstein Foundation Award trophy is a piece of chalk, created by Professor Axel Kufus at the Berlin University of the Arts to honor ‘a basic tool that has been used to bring knowledge into the world for as long as we can remember’. Source: https://award.einsteinfoundation.de/about

The Bik Fund

Instead of accepting the Einstein Foundation award money for myself, I’ve decided to put it into ‘The Elisabeth Bik Science Integrity Fund’ – where I hope to help other science sleuths with small grants for e.g., traveling to conferences, buying equipment or software, or training. Our type of work often does not fit into the hypothesis-driven model of government or charity funds, so I hope to help by filling this funding gap.

The Bik Fund will be part of The Center For Scientific Integrity, Retraction Watch’s parent 501(c)3 non-profit organization.

It is my hope that we can make this fund grow, so we can help more integrity warriors in the future – donations to the Bik fund are tax-deductible and very welcome.

Press coverage:

Announcing the Elisabeth Bik Science Integrity Fund – Retraction Watch
Renowned scientific integrity investigator endows fund to support fellow sleuths – Microbiologist Elisabeth Bik donates $200,000 to support training, travel – Jeffrey Brainard – Science
There’s Fraud in Science – A New Fund Seeks To Tackle It – Molly Coddington – Technology Networks

Doctored

Charles Piller’s new book ‘Doctored – Fraud, Arrogance, and Tragedy in the Quest to Cure Alzheimer’s’ came out earlier this month.

The book describes how Matthew Schrag, a neuroscientist at Vanderbilt University and a fellow image-sleuth, discovered possibly altered images in a highly-cited paper about amyloid plaques in Alzheimer’s patients and in other papers connected to the biotech company Cassava Sciences (see also below). It also gives a disturbing insight into the amount of fraud in neuroscience and the lack of action by journals, institutions, and government agencies.

Matthew Schrag and other ‘science sleuths’—Kevin Patrick, Mu Yang, and I—worked for two years checking thousands of papers in Alzheimer’s research and related fields for the book, uncovering a concerning number of potential problems in papers by Sylvain Lesné, Berislav Zlokovic, Eliezer Masliah, and others.

Press coverage of Piller’s book:

How Fraud, Greed and Negligence Have Stymied Alzheimer’s Research and Progress Toward a Cure – Alexis Madrigal – KQED
The Science We Depend On: Behind Alzheimer’s Disease with Charles Piller – Ivan Oransky – MedCentral
The Devastating Legacy of Lies in Alzheimer’s Science – Charles Piller – The New York Times
Have doctors been wrong about how to treat Alzheimer’s disease? – The Economist
Tariffs, Future of Cancer and Alzheimer’s, Global Debt Cycles | Wall Street Week – Bloomberg

Podcasts with Charles Piller and/or Matthew Schrag (there are many more!)

Charles Piller (right) and I at a book signing event for ‘Doctored’ – February 2025 at the Commonwealth Club in San Francisco, CA

Cassava Sciences Phase 3 did not work

I have written in the past (here and here) about problematic images in papers by Dr. Hoau-Yan Wang and Cassava Sciences (NASDAQ: $SAVA), a biotech company testing a drug targeting Alzheimer’s disease. These concerns were first spotted by Matthew Schrag and published in a Citizen Petition, in an attempt to halt Cassava’s clinical trials. Despite the apparent problems, the FDA did not stop Cassava’s clinical trials of their Alzheimer’s drug, Simufilam. But Cassava’s stock dropped significantly, upsetting a lot of investors.

For three years, fans of the $SAVA stock harassed the Dr. Wang critics – including me – claiming that the drug would work fine; that all stockholders would be rich as long as they would HODL; and that all our concerns were FUD. They joined forces in a SAVAges Discord app, where they talked about how one day they would all buy Maseratis, Lamborghinis, and even a SAVA island — and how retarded and fraudulent the SAVA-critics were.

The authorities begged to differ, however. In March 2024, Science reported that FDA inspectors had found many problems in Dr. Wang’s lab, ranging from uncalibrated instruments, lack of control samples, and stored data files, to leaving out outliers based on subjective criteria. Another report by the City University of New York (CUNY) to the Office of Research Integrity (ORI), leaked to Science, described ‘egregious misconduct’ in Dr. Wang’s lab. In June last year the Department of Justice announced that Professor Wang had been charged with operating a ‘Multimillion-Dollar Grant Fraud Scheme’.

Meanwhile, lawsuits were filed back-and-forth. Cassava Sciences was suing the people who filed the FDA citizen petition and who set up the website CassavaFraud.com, while stockholders were suing Cassava in a class-action lawsuit for misrepresenting the quality of the experiments performed by Dr. Wang. The Securities and Exchange Commission (SEC) charged Cassava with misrepresenting clinical trials result, and later settled for $40M. And CUNY, instead of officially announcing their misconduct findings, was investigating who leaked their misconduct investigation report to Science.

Finally, last December the biotech company announced that the Phase 3 trial did not meet the anticipated endpoints. In other words, Simufilam did not halt or improve Alzheimer’s disease. The Cassava Sciences stock immediately dropped to $2-3 dollars, and the SAVAges Discord became a valley of despair. No more SAVA island.

$SAVA stock price, last 5 years. Source: https://www.nasdaq.com/market-activity/stocks/sava

Didier Raoult enters the Retraction Watch Leaderboard

As I’ve written previously, many papers from Professor Didier Raoult, the former director of the IHU Méditerranée Infection in Marseille, France, are problematic.

Some appear to contain manipulated images, while others describe research conducted without proper ethical permits. A number of Raoult’s studies were performed on vulnerable populations, such as homeless people, or on study participants in the Global South, which I [gasp!] dared to label neocolonial science.

As described in this paper by Fabrice Frank et al, the IHU-MI produced a set of 248 studies all with the same IRB approval number, 09–022. Despite sharing the same permit, the publications varied in terms of sample type, demographics, and countries. Other ‘multi-use’ IRB numbers were found too. Typically, an IRB approves one particular study, and researchers are not allowed to reuse this permit for other, unrelated, studies. Dr. Raoult appears to have cared little for such rules, however.

After raising our concerns on PubPeer and social media, and also writing to journal editors, several journals have started to issue Expressions of Concern and retractions.

Fabrice Frank has generated a list of all IHU-MI papers with concerns – there are over 800!

In fact, Raoult has now earned so many retractions (32 as of today) that he has entered the Retraction Watch Leaderboard. The most significant of them is the retraction of the paper that claimed that Hydroxychloroquine could treat COVID-19. I criticized this paper in my blog post in April 2020, days after its publication. It took more than four years to get it retracted.

With over 600 of his papers on PubPeer and 225 EoCs, it seems safe to assume that for Raoult’s position on the Retraction Watch Leaderboard list, the only way is up.

Artist: Sara Gironi Carnevale. Source: https://www.science.org/content/article/failure-every-level-how-science-sleuths-exposed-massive-ethics-violations-famed-french

Science Integrity Digest

The Camel’s Camel

Tortured phrases

Plagiarized from a PhD thesis

A stolen figure with non-udders

But wait, the authors wrote another camel udder microbiome paper!

Editorial fail

UnEDXpected Peaks

Materials Science

Energy-dispersive X-ray spectroscopy

Common concerns with EDX plots

A problematic set of papers from Pakistani researchers

UnEDXpected EDX plots

Other problems:

Peer Review Congress Chicago – Day 3

Opening

Use of AI to Assess Quality and Reporting

Open Science, Availability of Protocols, and Registration

Open Science and Data Sharing

AI for Detecting Problems and Assessing Quality in Peer Review

Peer Review Congress Chicago – Day 2

The Douglas G. Altman Lecture

Bias, Study Outcomes, and Reporting Concerns

Editorial and Publishing Processes and Models

Coffee break and poster viewing

Peer Review Times and Payment Incentives

Journal Prestige Can and Should Be Earned

Peer Review Congress Chicago – Day 1

Opening and welcome session

Author and Reviewer Use of AI

Authorship and Integrity Issues

Diversity and Research Environment

Research Misconduct and Integrity

Discontinuous ridiculous stools – a preprint full of tortured phrases and stolen data

Tortured phrases

A preprint with lots of strange synonymized phrases

Turning a single case report into a 20-patient study

Copied data from National Health Interview Survey

Preprint claiming that COVID-19 mRNA vaccines cause transcriptomic dysregulation is deeply flawed

Very unequal group sizes

No details about the study’s participants

Cancer might not be vaccine-related

Unclear ethics permits

Potentially biased recruitment

RNA expression of sick vs healthy people is expected to differ

No link between cancer and vaccination is proven

Update 28 July:

Science Integrity Digest Summer 2025

COSIG

Science Integrity ‘sleuths’ in the news

Research Integrity at Publishers and Institutions

Notable retractions

ScienceGuardians, where disgruntled authors complain about PubPeer

Is it a PubPeer look-alike?

Is it a COPE look-alike?

Who are the ScienceGuardians?

El-Deiry was not happy about his PubPeer comments

ScienceGuardians in full DARVO mode

The ScienceGuardians’ use of libelous language

Is ScienceGuardians related to CureGuardian?

Becoming ScienceGuardians

Science Integrity Digest – catching up

Einstein Foundation Award

The Bik Fund

Doctored

Cassava Sciences Phase 3 did not work

Didier Raoult enters the Retraction Watch Leaderboard