Federated Analytics

Lucky 13! First national TRE cohort is on-boarded

Philip Quinlan — Wed, 21 Dec 2022 14:21:10 +0000

Scottish public health data now discoverable via the Cohort Discovery Search Tool

We are delighted to announce that COVID data from Public Health Scotland (PHS) is now live on the Health Data Research Innovation Gateway’s Cohort Discovery Search Tool (CDST).

This is a massive achievement for the team, as PHS is the first national-scale Trusted Research Environment (TRE) to on-board its COVID data, as well as the second TRE overall. The first was the Edinburgh Parallel Computing Centre (EPCC), which on-boarded the ISARIC 4C cohort.

Not only does PHS’s addition bring CO-CONNECT’s running total of on-boarded COVID cohorts up to 13, it has also astronomically increased the number of searchable datasets on the CDST.

Thanks to PHS, a further 5.2 million records are now discoverable, making it the largest cohort the team has ever worked with. This title was previously held by ISARIC 4C, which contains around 270,000 records.

The post Lucky 13! First national TRE cohort is on-boarded first appeared on Federated Analytics.

CO-CONNECT Shortlisted for MRC award

Philip Quinlan — Tue, 06 Dec 2022 09:30:00 +0000

The project is a finalist for the Medical Research Council's 2022 Open Science Impact Prize

By Gabriella Linning

We are delighted to announce that the CO-CONNECT project has been shortlisted for the Medical Research Council (MRC) 2022 Impact Prize. The final results will be announced at an MRC awards ceremony in March 2023, the exact details of which are yet to be released.

The MRC Impact Prize, which recognises both individuals and groups who have made outstanding impacts in medical research, is split into three categories:

Open Science Impact
Outstanding Team Impact
Early Career Impact

Winning entries for each of these categories will be awarded a financial prize of up to £20,000. Recipients must then put this money towards activities that either:

Further their learning or development; or
Enable wider dissemination of their research impact.

CO-CONNECT was nominated by team members based at The University of Nottingham for the Open Science Impact prize, under the submission title “Making COVID-19 response data FAIR”.

The Open Science Impact prize is awarded for outstanding contributions to advancing open science in medical research. That is, “developing and implementing pioneering open science practices and principles to make sure research methods, findings and outputs [are] more accessible, transparent and reproducible.”

Each category now has three shortlisted nominations. The other finalists in the Open Science Impact category are the ‘Core Outcome Measures in Effectiveness Trials (COMET)’ Initiative based at the University of Liverpool, and ‘Enabling rapid communication, dissemination and uptake of COVID-19 mathematical modelling’ from Imperial College London.

We wish all finalists the very best of luck.

Jargon Buster

MRC

MRC is the shortened name for the Medical Research Council, which is responsible for co-ordinating and funding medical research across the UK. It is part of UK Research and Innovation (UKRI).

Dissemination

The action of spreading something (normally information) widely.

FAIR

In this context, FAIR is an acronym for: ‘[Making COVID-19 response data] Findable, Accessible, Interoperable and Reusable’.

Where can I find out more about...

MRC?

Visit the MRC website at: https://www.ukri.org/about-us/mrc/

MRC 2022 Impact Prize?

Visit the UKRI website at: https://www.ukri.org/what-we-offer/prizes/mrc-awards-and-recognition/mrc-impact-prize/#contents-list

the Open Science Impact prize?

Visit the UKRI website at: https://www.ukri.org/what-we-offer/prizes/mrc-awards-and-recognition/mrc-impact-prize/open-science-impact/#contents-list

the Outstanding Team Impact prize?

Visit the UKRI website at: https://www.ukri.org/what-we-offer/prizes/mrc-awards-and-recognition/mrc-impact-prize/outstanding-team-impact/#contents-list

the Early Career Impact prize?

Visit the UKRI website at: https://www.ukri.org/what-we-offer/prizes/mrc-awards-and-recognition/mrc-impact-prize/early-career-impact/#contents-list

Finalists in each MRC Impact Prize 2022 category

Open Science Impact

‘Making COVID-19 response data FAIR’ (CO-CONNECT) – The University of Nottingham
‘Core Outcome Measures in Effectiveness Trials (COMET)’ Initiative – The University of Liverpool
‘Enabling rapid communication, dissemination and uptake of COVID-19 mathematical modelling’ – Imperial College London

Outstanding Team Impact

‘RECOVERY trial: the world’s largest study of COVID-19 therapies’ – The University of Oxford
‘The impact of poor menstrual health and hygiene on adolescent schoolgirls and interventions to improve girls’ health and equity’ – Liverpool School of Tropical Medicine
‘Next generation imaging of human brain function’ – The University of Nottingham

Early Career Impact

Dr Natalie Shenker, Research Fellow – Imperial College London
Dr Segun Fatumo, Wellcome Intermediate Fellow – MRC Uganda Virus Research Institute and London School of Hygiene & Tropical Medicine
Dr Amy Orben, Programme Track Scientist – The University of Cambridge

CO-CONNECT Shortlisted in HDR UK’s Annual Awards

Philip Quinlan — Thu, 24 Nov 2022 12:00:00 +0000

We are happy to announce that CO-CONNECT has been shortlisted for Health Data Research UK’s Annual Award for Reproducibility Recognitions.

The project has been nominated for the team’s groundbreaking and innovative work with the Cohort Discovery Search Tool, which aims to make COVID-19 research cohorts more accessible, discoverable and searchable to researchers.

The results will be announced at HDR UK’s 2022 Annual Scientific Conference on Wednesday 14 December. The event will run online from 9:30-17:00.

This award is given to the project or group that best demonstrates HDR UK’s core value of reproducibility – the ability to repeat research, reuse techniques or verify results.

HDR UK asked the health data research community to tell them how they make their research reproducible or reusable, whether by using or creating open-source software, reporting guidelines or FAIR data.

Whilst CO-CONNECT’s work has drawn to a close, its legacy lives on. Publicly accessible technical documentation, specialised software and explainer videos are available for other data custodians to independently on-board their research cohorts onto Cohort Discovery. It is hoped that in time the tool will expand to include cohorts for other diseases besides COVID-19.

We wish all nominees the very best of luck.

Where can I find out more about...

the HDR UK Annual Awards & shortlist?

Visit the HDR UK website at: https://www.hdruk.ac.uk/news/hdr-uk-annual-awards-shortlist-announced-for-excellence-in-health-data-research/

HDR UK's Scientific Conference?

Visit the HDR UK website at: https://www.hdruk.ac.uk/news-opinion-events/events/health-data-research-uk-scientific-conference-2022-data-for-global-health-and-society/

the Cohort Discovery Search Tool?

You can read more on:

Our FAQs page at – https://co-connect.ac.uk/faq/
Our Overview page at – https://co-connect.ac.uk/overview/
the HDR UK Innovation Gateway website at – https://www.healthdatagateway.org/about/cohort-discovery

CO-CONNECT's technical documentation & other resources?

Technical Documentation

You can access CO-CONNECT’s technical documentation at: https://co-connect.ac.uk/docs/

Software

You can access and find out more about CO-CONNECT’s software by visiting our overview pages. Our explainer videos are also available here.

Overview page: https://co-connect.ac.uk/overview/
Finding data: https://co-connect.ac.uk/finding-data/
Accessing & analysing data: https://co-connect.ac.uk/analysing-data/

Explainer Videos

Our explainer videos can also be accessed separately on our YouTube channel at: https://www.youtube.com/@co-connect6367/videos

CO-CONNECT Draws to a Close

Philip Quinlan — Mon, 14 Nov 2022 12:30:00 +0000

The CO-CONNECT team celebrated the end of the project with a final get-together to reflect on what they have accomplished, and the legacy they are leaving behind

By Gabriella Linning

“It was an apt end to the project, with fascinating talks from a wide range of people with different areas of expertise – both from the internal CO-CONNECT team and a number of Data Partners involved in the project. Everyone had a great time and many new connections were made.”

On Thursday 27 October, the CO-CONNECT project officially drew to a close. The team commemorated this event by holding a final meeting at the Wellcome Trust’s Gibbs Building in London.

Well attended by team members, Data Partners and project partners from across the UK, the meeting provided the perfect opportunity for attendees to catch up and reflect upon their work over the last two years.

Presentations were given from across CO-CONNECT’s work packages, each focusing on what the team had delivered and the lessons they had learnt. Among the speakers were four of the project’s patient and public representatives (Alex Sloan, Antony Chuter, Jillian Beggs and Kauser Iqbal), who presented on their experiences working on the project.

“The main feedback we received was that attendees now have a new understanding of the breadth of the project and the potential for building upon its legacy.”

What is CO-CONNECT's Legacy?

Changing the conversation

Overall, perhaps the most significant impact from the CO-CONNECT project has been changing the conversation around federated analytics.

In the early stages of the CO-CONNECT project, one of the biggest questions being asked was why a data infrastructure that facilitates federated data analyses – such as the Cohort Discovery Search Tool (CDST) – should be developed and supported, particularly by data custodians in charge of Trusted Research Environments.

Whilst the time sensitivity of early COVID-19 research certainly helped the team to answer this question, it was their innovation that helped switch the conversation from “Why?” to “How?”. How can these resources be developed? How can they allow greater data connectivity and visibility without compromising on security and governance?

By the end of the project, CO-CONNECT had successfully on-boarded 50 datasets onto the Health Data Research Innovation Gateway and made 13 datasets discoverable through the Cohort Discovery Search Tool, including 12 COVID-19 research cohorts.

“After attending the end of project day it was really good to see the positive vibes across the whole group. The project took place during a chaotic period of time with lots of conflicting priorities where dealing with the here and now had to take front stage so considering all of that the work that has been achieved is, in my opinion, quite impressive.”

Setting new foundations for data infrastructure

CO-CONNECT has successfully created a new type of secure data architecture, one which has been approved by organisations hosting both unconsented and consented datasets, and which can be enhanced and re-used for other purposes.

Cohort Discovery Search Tool

The CDST remains live and active within the Health Data Research Innovation Gateway, and will continue to reduce the time and effort for both researchers and data custodians (our Data Partners) to carry out feasibility queries across multiple datasets from all over the UK.

Perhaps most excitingly, however, new cohorts may continue to be on-boarded. This is thanks to the innovative processes, software and tools developed by the CO-CONNECT team, which are publicly available for re-use.

These resources can be used by other research groups to on-board new cohorts onto the CDST, covering all diseases and not just COVID-19.

Processes & Software

The team has developed new processes for on-boarding new cohorts onto the CDST, as well as open-source software for mapping cohorts into OMOP and open APIs to enhance and expand their functionality. This includes:

the White Rabbit software, which has been accepted by Data Partners as software for profiling and producing meta-data on sensitive datasets;

the CaRROT Mapper and CaRROT CDM, two Open Source tools developed by the CO-CONNECT software team. These speed up the mapping of data sets to the OMOP standard, and are freely available for others to map their own data (see below). Other projects are already utilising them.

Furthermore, in collaboration with Data Partners, CO-CONNECT researched federated analysis. This is where the data remains within the Data Partners’ secure environment, but questions about the data are sent through the Health Data Research Innovation Gateway website and summary results returned. This method would provide more complex trend analysis rather than simple counts.
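
To make the idea concrete, here is a minimal Python sketch of a federated count. It is an illustration only and not CO-CONNECT's software: the `DataPartner` class, the `federated_count` function and the example records are all hypothetical. It shows the key point, which is that the question travels to each Data Partner and only a summary number travels back.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]
Question = Callable[[Record], bool]


@dataclass
class DataPartner:
    """Hypothetical stand-in for a Data Partner's secure environment."""
    name: str
    records: List[Record]  # stays inside the partner's environment

    def count_matching(self, question: Question) -> int:
        # Run the question locally and return only a summary count,
        # never the underlying records.
        return sum(1 for record in self.records if question(record))


def federated_count(partners: List[DataPartner], question: Question) -> Dict[str, int]:
    """Send one question to every partner and collect the counts they return."""
    return {partner.name: partner.count_matching(question) for partner in partners}


# Made-up records, for illustration only.
partners = [
    DataPartner("Partner A", [{"age": 70, "covid_positive": True},
                              {"age": 40, "covid_positive": True}]),
    DataPartner("Partner B", [{"age": 81, "covid_positive": True},
                              {"age": 66, "covid_positive": True},
                              {"age": 90, "covid_positive": False}]),
]


def over_65_and_positive(record: Record) -> bool:
    # The "question" sent to each partner.
    return record["age"] >= 65 and record["covid_positive"] is True


counts = federated_count(partners, over_65_and_positive)
print(counts)
```

In this sketch the caller only ever sees one number per partner. A real deployment would add governance checks and security controls that are not modelled here.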

Our resources

Access the CaRROT Common Data Model (CDM)

Visit our GitHub repository at: https://github.com/HDRUK/CaRROT-CDM

Access the CaRROT Mapper software

Visit our GitHub repository at: https://github.com/HDRUK/CaRROT-Mapper

Access our technical documentation

To access our comprehensive documentation for use in data governance applications and for technical teams implementing the solution, visit the webpage at: https://co-connect.ac.uk/docs/

Watch our Explainer Videos

Our explainer videos help data custodians and the general public to understand the solution, its benefits and how we have protected patient confidentiality.

Watch our explainer videos at: https://www.youtube.com/watch?v=ghXyU9THr18&list=PLHXTHXI_vn3bqTDcr3hZ2DgjF83mWFHLp

Where can I find out more about...

the CaRROT software?

The CO-CONNECT software team developed two Open Source tools called CaRROT Mapper and CaRROT CDM to speed up the mapping of data sets to the OMOP standard. These are freely available for others to map their own data and other projects are utilising them.

To find out more, visit the documentation at: https://hdruk.github.io/CaRROT-Docs/CaRROT-CDM/About/

the Cohort Discovery Search Tool?

Visit the webpage at: https://www.healthdatagateway.org/about/cohort-discovery

the Health Data Research Innovation Gateway?

Visit the HDR Innovation Gateway website at: https://www.healthdatagateway.org/

how patient confidentiality is maintained?

Visit our overview page at: https://co-connect.ac.uk/overview/

“Many of us are of an age when, in a medical setting, we found ourselves talked about in clinical terms that are not patient-friendly, as if we were not in the room. To me, the PPI role kept the patient in the room, involved in the discussion and, more importantly, actively engaged in seeking solutions.

So often, we see medical issues being dealt with in splendid isolation, treating the condition sometimes to the detriment of the person or not even considering other issues in the patient’s life or, indeed, what they can contribute to their own recovery. In this project, a holistic multi-disciplinary approach brought together the finest minds and skills to find a solution to support the body – the NHS – with the patients’ full involvement.”

Thank you everyone for all of your hard work.

ISARIC is 4C-ing!

Philip Quinlan — Fri, 04 Nov 2022 14:13:04 +0000

ISARIC 4C becomes CO-CONNECT's 12th COVID-19 cohort to be on-boarded onto the Cohort Discovery Search Tool

By Gabriella Linning

We are pleased to announce that ISARIC 4C has become the 12th cohort to be made live and discoverable on the Health Data Research Innovation Gateway’s Cohort Discovery Search Tool.

With an additional 270,000 people’s data now available, this marks a major milestone for the CO-CONNECT team as it is the largest cohort to be on-boarded thus far.

The ISARIC 4C COVID-19 cohort contains data on patients of all ages from across England, Scotland and Wales who were either admitted to hospital with COVID-19, or were admitted and later diagnosed with COVID-19.

ISARIC 4C

Philip Quinlan — Fri, 04 Nov 2022 13:15:32 +0000

Name: International Severe Acute Respiratory and emerging Infection Consortium – Comprehensive Clinical Characterisation Collaboration (ISARIC 4C)

Principal Investigator: Professor Peter Openshaw (Co-lead)

Leading Institutions: Imperial College London

HDR Innovation Gateway Metadata Rating: Platinum

The team at ISARIC 4C had been preparing for a respiratory pandemic for years before the global spread of the SARS-CoV-2 virus and COVID-19.

Thanks to this foresight, ISARIC 4C’s investigators were able to recruit patients, and to collect data and biological samples, from the beginning of the COVID-19 pandemic.

The data of over 270,000 patients across England, Scotland and Wales is now live and discoverable on the Cohort Discovery Search Tool. In total, there is information on 270,230 patients who had both data and samples collected, as well as 2,510 data-only patients.

These were patients of all ages who were admitted to hospital with COVID-19, as well as patients in hospital who were subsequently diagnosed with COVID-19.

ISARIC 4C is a key partner of UK-CIC.

The journey towards effective and efficient health data research – a personal perspective

Philip Quinlan — Tue, 01 Nov 2022 13:41:12 +0000

By Karen Mooney, a PPI representative

PPI representative Karen Mooney gives her thoughts on why it is important for effective nation-wide health data infrastructures to be developed and maintained across the UK.

“Our personal data is recorded somewhere in governmental systems from cradle to grave. Is it, therefore, unreasonable to use it for our own betterment and for that of our community and the National Health Service? I think not.”

More and better information improves the quality of research, which ultimately benefits patients. It also benefits society in terms of more cost-effective solutions and the potential for developing preventative treatments that minimise hospital occupancy.

I see the effective use of patient data as a journey, and having been involved in the PPI team of the CO-CONNECT project, I have witnessed the development of an infrastructure to improve the flow of data research traffic.

In my naivety, I had assumed that all four nations of the UK would have comparable systems and processes. I have been shocked at the differences in approach, which have made for delays in progress. Add to that a variety of lab technology for which measurements need to be equalised to ensure that we don’t compare trains with planes.

All of this requires funding and maintenance. We all know where the potholes are, and sometimes we must use costly diversions and alternative forms of travel to get there. Key networks could and should become the central nervous system of future research. Maintaining and growing such an infrastructure will improve the scope, depth, quality and speed of research to bring benefits directly to patients and our communities via new and improved interventions.

Our personal data is recorded somewhere in governmental systems from cradle to grave. Is it, therefore, unreasonable to use it for our own betterment and for that of our community and the National Health Service? I think not. Not so long as individual identities are protected, and this rich and valuable resource of data is ring-fenced for our benefit and by that, I mean not sold to the highest bidder for profit-making enterprise.

In a world that favours more, better, faster and louder, often driven by necessity, or indeed threat as we have seen with COVID-19, can we afford to omit the touchstone of reality or policing that comes with listening to the public and patient voice?

Whilst the data researchers use is from health records, including test results, we must never forget that the information is about a person. It is their story, and people are more willing to share their stories and vulnerabilities when they feel safe.

What are CO-CONNECT’s thoughts on: the 2022 IPDLN conference?

Philip Quinlan — Tue, 11 Oct 2022 09:41:38 +0000

Work Package 2 Lead Esmond Urwin tells us about his experience presenting at the 2022 IPDLN conference in Edinburgh, Scotland.

By Gabriella Linning

The 2022 International Population Data Linkage Network (IPDLN) conference was recently held in Edinburgh, Scotland. From 7 to 9 September, well over 400 people from around the world came together to present and discuss, in depth, complex questions around data linkage.

During this event, five members of the CO-CONNECT team gave four unique presentations about or related to the project. A summary of all these presentations, and links to their abstract summaries, can be found in our news piece about the conference.

Esmond Urwin, from the University of Nottingham, was one of these team members. Esmond gave a presentation outlining why there should be a standard process by which serological data – that is, data collected from analysing blood serum – is collected and recorded, and how this could potentially be achieved.

He agreed to sit with me and tell me more about his experience.

In your own words, can you explain broadly what your presentation was about?

Jargon Buster

Antibody

Antibodies are blood proteins produced by your body to counteract a specific antigen – a toxin or foreign substance that triggers an immune response.

Antibodies chemically combine with these substances that the body recognises as being potentially harmful – such as bacteria, viruses, and foreign substances in the blood.

You can find out more about antibodies by visiting this website: https://www.genome.gov/genetics-glossary/Antibody

Serology

Serology is the scientific study or diagnostic examination of blood serum, especially with regard to the response of the immune system to pathogens or introduced substances.

You can find out more by visiting the following website: https://www.healthline.com/health/serology

Qualitative

Qualitative data is data that cannot be objectively measured or counted, or data that is descriptive – it tells you about the ‘meaning’ or underlying qualities of an item or process. Qualitative data in statistics is also known as categorical data – data that can be arranged categorically based on the attributes and properties of a thing or a phenomenon.

In this case, data on COVID-19 test results are qualitative: although you can count how many tests are positive, negative and indeterminate, the count on its own doesn’t tell you much. How strong is the immune response? Which variant of the SARS-CoV-2 virus are people testing positive for? These are questions that a simple count cannot answer.

To find out more, visit the following website: https://www.questionpro.com/blog/qualitative-data/

Quantitative

Quantitative data is any data that has numerical properties. One of the most important functions of quantitative data is to answer questions like “How many?”, “How often?”, “How much?”. This data can be verified and conveniently evaluated using mathematical techniques. The only way to answer these questions is to collect data that is quantifiable, meaning it can be measured.

To find out more, visit the following website: https://www.questionpro.com/blog/qualitative-data/

The CO-CONNECT Work Package 2 presentation explained why a COVID serology (blood) data standard is needed and what it should look like. Currently, across the UK, public testing laboratories submit COVID serology results that contain qualitative test results – that is, whether they are positive, negative or indeterminate – but there is no standard reporting of quantitative results, such as numerical results and data about antibody levels.

And so, why is developing a standard system for reporting and recording quantitative information important?

Positive and negative results (qualitative results) give a good indication of who has and does not have COVID antibodies, thus giving an understanding of levels of infection for a section of a population at that specific point in time. To better understand the characteristics of an infection and actual levels of antibodies, quantitative data is needed. With this quantitative data, antibody levels within a population can be analysed. This can then be used to determine how outbreaks of COVID might affect people and how widespread different mutations of the COVID virus might be, and in turn to better define what can be done to counter such outbreaks.
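
As a rough illustration of the difference, the Python sketch below summarises the same small set of serology results in two ways. Everything in it is an assumption made for the example: the field names `outcome` and `antibody_level`, the helper functions, and the values themselves, which are made up and use arbitrary units.

```python
from collections import Counter
from statistics import median

# Made-up serology results, for illustration only (arbitrary units).
results = [
    {"outcome": "positive", "antibody_level": 120.0},
    {"outcome": "positive", "antibody_level": 35.5},
    {"outcome": "negative", "antibody_level": 2.0},
    {"outcome": "indeterminate", "antibody_level": 9.5},
]


def qualitative_summary(rows):
    """Count how many results fall into each category."""
    return Counter(row["outcome"] for row in rows)


def median_level(rows, outcome):
    """Median antibody level among results with the given outcome."""
    levels = [row["antibody_level"] for row in rows if row["outcome"] == outcome]
    return median(levels)


print(qualitative_summary(results))
print(median_level(results, "positive"))
```

The qualitative summary can only say how many results were positive, negative or indeterminate. The quantitative field is what lets a researcher ask how strong the response was among the positives.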

What does this standard look like? How did you and your team members go about making it?

We began by gathering an initial set of data attributes from CO-CONNECT’s contributors – colleagues, data partners and so on. In total, we gathered 36 attributes that came together to create the first draft of our data standard. Of course, this is quite a lot, so the next task was to strip it back.

At first, we had a difficult time determining which of these original 36 were the ‘important’ attributes, as they all seemed important to us. Spending time understanding each of the attributes, and using the knowledge gathered from existing healthcare data standards, helped us focus on what was actually important. This, together with thinking critically about which of the attributes would represent the ‘bare minimum’ (what is actually needed?), helped us to define the final 12 data attributes that make up our new standard for reporting COVID serological data.

We validated our work by collecting feedback from many of CO-CONNECT’s project partners, NHS laboratory technicians and the Work Package 2 members. Together, they supported our work, proving within the context of three NHS laboratory pilot studies that the 12 attributes chosen were the correct ones.

Following on from this, a further review of COVID serology studies nationally and worldwide proved that the chosen data standard attributes existed and were accounted for elsewhere in the world, further underlining that they are representative and worthy of a COVID serology minimum data standard.

Summary of Esmond's work

Esmond and his team set about creating a standard practice guideline on how COVID serology data should be recorded across the UK. This is to help improve the collection of comparable, quantitative data which would help researchers to better understand the characteristics of an infection.

For example, how does the level of antibodies change through time after someone is infected with the SARS-CoV-2 virus? How long does it take for antibody levels to build up? How long do they last?

To do this, Esmond and his team brainstormed an initial list of 36 characteristics that all COVID serology recording processes should have. With further feedback, investigation and real-world testing, they were able to bring this down to a 12-point standard: a guideline outlining the bare minimum characteristics that should be incorporated into serological data recording across the UK.

What did your audience think about this idea? What types of questions did they ask? Were there any surprises?

Overall, I would say there were two veins of questioning. On the one hand, some audience members asked about the representation of assays and test kits, which are currently accommodated within the standard. Others focused on what it would take to make this a reality and bring it to life within the UK, to which the answer focused upon the funding of a worthwhile but time-consuming task. Laboratory setups across the UK can differ quite a bit, hence the time and effort needed to assess each of those setups and enact the standardisation should not be underestimated.

Jargon Buster

Assay

An analysis that can qualitatively assess or quantitatively measure the presence or amount or the functional activity of a target entity, such as specific antibodies in the blood or a biochemical in a cell.

The standard

The set of guidelines Esmond and his colleagues worked to put together to be used as a measure, norm or model to help make COVID serological recording procedures comparable and consistent across the UK.

Interesting, could you expand on these issues a little bit more? Why were they of interest to the audience?

So, different laboratories record and structure COVID test data in different ways. Many of these laboratories may use the same methods for messaging and structuring for recording those results, but there are variations between them. Additionally, laboratories use different assays, test kits and laboratory machines, together with different laboratory software to record the data. Therefore, there are a multitude of variations throughout the testing process which can:

make the data recorded seem different between laboratories because it has, for example, been recorded using different words (from lab to lab), even though those words may mean the same thing;
leave too little explanation or definition of which laboratory machines and test kits are being used, together with the actual machine rules that define what is a positive or negative COVID serology test result.

The data standard presents a defined way in which assays / test kits (and associated number codes) can be described and recorded in a lab test result message. Thus, laboratory technicians and managers will know exactly what to write in a message and where it should be written. This seeks to eliminate variations between different laboratories, presenting a standard set of terms to use and the precise position within a laboratory message in which to record that information – making it easier to understand who is doing what, where and how, and what the final COVID serology testing results are.
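
The idea of a standard set of terms can be sketched in a few lines of Python. This is a hypothetical illustration and not the actual CO-CONNECT standard: the `STANDARD_TERMS` table, the lab-specific spellings in it and the `normalise_outcome` function are assumptions made for the example.

```python
# Hypothetical mapping from words a laboratory might use to one agreed term.
STANDARD_TERMS = {
    "positive": "positive",
    "pos": "positive",
    "detected": "positive",
    "negative": "negative",
    "neg": "negative",
    "not detected": "negative",
    "indeterminate": "indeterminate",
    "equivocal": "indeterminate",
}


def normalise_outcome(raw: str) -> str:
    """Translate a lab-specific outcome word into the agreed standard term."""
    key = raw.strip().lower()
    if key not in STANDARD_TERMS:
        # Unknown words are rejected rather than guessed at.
        raise ValueError(f"Unrecognised outcome term: {raw!r}")
    return STANDARD_TERMS[key]


print(normalise_outcome(" POS "))
print(normalise_outcome("Not Detected"))
```

Once every laboratory's wording resolves to the same small vocabulary, results recorded with different words but the same meaning become directly comparable.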

What did other members of the CO-CONNECT team have to say about the conference?

“This was my first big post-COVID conference coming two years into the CO-CONNECT project. It was a great opportunity to meet people in person that I’d spent two years talking to on Teams. My talk on the different governance approaches to getting approval for federated data projects for both consented and unconsented data was well received with many follow up questions being asked.”

CO-CONNECT takes on the IPDLN Conference

Philip Quinlan — Tue, 13 Sep 2022 10:25:35 +0000

By Gabriella Linning

Last week, several members of the CO-CONNECT team attended and presented at the 2022 International Population Data Linkage Network (IPDLN) Conference in Edinburgh, Scotland.

Five of our team members – Antony Chuter, Emily Jefferson, Esmond Urwin, Gordon Milligan and Jillian Beggs – spoke at the event, giving engaging and thought-provoking presentations about their work on CO-CONNECT. Continue reading to find out more.

Presentation slide from the opening session at the 2022 IPDLN Conference in Edinburgh, Scotland

CO-CONNECT Technical Programme Manager Gordon Milligan, University of Dundee

CO-CONNECT team member Esmond Urwin, University of Nottingham, beginning his presentation on why there should be a standard dataset format for recording COVID-19-related serology (a type of blood testing) data. Find out more below.

Team members Chris Hall (left), Esmond Urwin (middle) and Gordon Milligan (right) enjoying themselves at the IPDLN conference.

Group photo of the CO-CONNECT team and colleagues from the University of Dundee's Health Informatics Centre (HIC). From left: Gordon Milligan, Smarti Reel, Emily Jefferson, Shahzad Mumtaz, Chuang Gao, Jenny Johnston, Susan Krueger, Christian Cole and Chris Hall. (Middle) Antony Chuter

CO-CONNECT Patient and Public Involvement (PPI) Leads Jillian Beggs (left) and Antony Chuter (right) after giving their presentation on how the CO-CONNECT team meaningfully incorporated patient and public voices into their work. Read more below.

CO-CONNECT Principal Investigator Emily Jefferson as she begins her overview presentation on the CO-CONNECT project and the Cohort Discovery Search Tool. Read below to find out more.

Team members Gordon Milligan (left), Jenny Johnston (middle) and Esmond Urwin (right) enjoying themselves at the IPDLN conference.

Who spoke?

About what?

The COVID - Curated and Open aNalysis aND rEsearCh plaTform (CO-CONNECT)

In this presentation, Emily Jefferson, one of CO-CONNECT’s Principal Investigators, introduced both the CO-CONNECT project and the Health Data Research Innovation Gateway’s Cohort Discovery Search Tool to her colleagues at the IPDLN conference.

Here, Emily gave attendees an overview of the work done by the project as well as the capabilities of the Cohort Discovery Search Tool, providing insight as to how this data infrastructure will promote future COVID-19 research by making UK COVID-19 data FAIR – Findable, Accessible, Interoperable and Reusable.

Augmenting laboratory COVID serology granularity for SARS-CoV-2 reporting

Here, CO-CONNECT team member Esmond Urwin made the argument for why more detailed quantitative (number-based) data should be recorded when serological samples are tested in laboratories.

Esmond highlighted that qualitative (description-based) data alone does not provide detailed enough data for researchers to understand how prevalent the SARS-CoV-2 virus is among the UK population.

Therefore, Esmond suggested that a minimum dataset, a universal example that outlines what should be recorded and how, is required to support standardised data reporting.

Developing a new Governance Approval Process to support federated discovery and meta-analysis of data across the UK through the CO-CONNECT project

Gordon gave a presentation discussing the different governance approaches to get approval for federated data projects (such as CO-CONNECT) for both consented and unconsented data. According to Gordon, his presentation was well received with many follow-up questions being asked.

Jillian Beggs

Antony Chuter, PPI Co-Investigator

Collaborating with patient and public members in developing the COVID - Curated and open analysis and research platform (CO-CONNECT)

With our Patient and Public Involvement (PPI) Coordinator Tracy Jackson on leave, CO-CONNECT’s joint PPI Leads Jillian Beggs and Antony Chuter filled in to tell the attendees of the IPDLN conference about their experience working with the project.

Here, the two leads spoke about how the team were able to create an innovative approach to ensuring public voices are heard and acted upon within data linkage networks. To our knowledge, CO-CONNECT was the only project to have PPI voices heard during the course of the three-day conference.

The post CO-CONNECT takes on the IPDLN Conference first appeared on Federated Analytics.

Follow-COVID: a journey from paper to the Innovation Gateway

Philip Quinlan — Mon, 29 Aug 2022 14:00:26 +0000

Follow-COVID:

A journey from paper to the Innovation Gateway

By Gabriella Linning

Scott Horban, CO-CONNECT Data Team Leader, tells me what it took to get the Follow-COVID cohort live on the HDR Innovation Gateway.

Gathering data

Data entry by the box load

The CaRROT, the rabbit and the Innovation Gateway

The three benefits of collecting data on paper

Final thoughts

Follow-COVID is one of the projects my CO-CONNECT colleagues have partnered with to make their cohort data available on the Health Data Research (HDR) Innovation Gateway’s Cohort Discovery Search Tool.

Out of all the cohorts the team has successfully worked on thus far, Follow-COVID has, perhaps, the most interesting route to being discoverable.

Led by Dr David Connell at the University of Dundee, Follow-COVID looks to identify the long-term impacts and future healthcare needs of patients who have become severely ill with COVID-19.

Gathering the data

At the beginning of Follow-COVID, patient volunteers attended clinics in the Tayside, Lanarkshire & Highland regions of Scotland, where they answered extensive paper-based questionnaires.

Rather than treating this as a basic form-filling exercise, however, clinical staff actively engaged with their volunteers in an interview-style set-up. A staff member would read the questions out to a volunteer, the volunteer would provide their answer, and the staff member would fill in their answer on the questionnaire form.

Afterwards, it was the job of CO-CONNECT team members Scott Horban and Shameema Farvin Stalin to design a digital structure that would allow for the information given in the questionnaires to be “input into an efficient electronic format”.

Essentially, this means they had to design a digital template that the questionnaire information could be typed into. This template would re-organise and store the information in a way that is more efficient for a computer to process, and therefore, for researchers to use.

But why was this process necessary? Why could the patients’ answers not simply be copied from the questionnaire and typed into a computer? If the information needed to be stored electronically, why was the questionnaire not designed to be compatible with the structure of a computer database?

Scott tells me it was simply due to how quickly the project was developed in response to the global pandemic.

“…the pressures and timescales of COVID-19 expedited the entire process… meaning [we] were not involved until after the forms had been designed and data had been collected.”

In other words, the speed at which Follow-COVID was established, and its data collection processes were designed, meant there was no opportunity for Scott and his colleagues to provide their expertise in designing the questionnaire.

Now, I must stress that this method of data collection in no way negatively reflects the quality of the information collected, or the skill of the researchers involved in it. Researchers in different fields have differing expertise and skills.

Scott’s medical research colleagues were able to create an effective form to gather comprehensive information about their volunteers; it simply lacked the structure to be conveniently and constructively stored electronically. Therefore, Scott and his team “had to design a structure which reflected the content of the Case Report Forms but also worked in an electronic database setting”.

Data entry by the box load

Once the structure had been designed, Scott’s team handed over to their colleagues on the Data Entry Team at the University of Dundee’s Health Informatics Centre (HIC).

Scott says that what happened next was quite a sight, with Follow-COVID’s forms “literally [coming] to HIC in cardboard boxes”.

Apparently, the Data Entry Team then spent two months manually entering and checking around 40 double-sided pages of data for each participating volunteer.

“The 83 patients from Tayside and Lanarkshire alone required the HIC team to input 47,842 individual data values.”

Whilst this situation may seem odd in the world of health data research, Scott assures me that this type of task is actually not that uncommon: “[data entry] has been one of HIC’s core areas of service for many years, so while we expected the manual entry process to take some time, our data entry experts were well prepared for and experienced in this type of work.”

Either way, it is certainly more laborious and low-tech than I imagined.

The CaRROT, the rabbit and the Gateway

It was only after the Dundee data entry team had completed this time-consuming and meticulous task, however, that all the seemingly complex science-y and tech-y stuff began to happen.

Scott did an excellent job explaining the process of how a cohort is integrated into the Gateway’s Cohort Discovery Search Tool. Now I will do my absolute best to pass on the message. To help me do this, I have also created a “Jargon Buster” down below to help explain some of the more technical terms.

The first, and perhaps most important step, was for Follow-COVID’s data to be de-identified. This makes sure that the individuals included in the cohort cannot be identified by researchers who handle or use the data.
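To make the idea of de-identification a little more concrete, here is a minimal Python sketch of one common approach: dropping the fields that directly identify a person and replacing the study ID with a keyed pseudonym. This is an illustration only. The article does not describe how Follow-COVID’s data was actually de-identified, and the field names, the secret key and the `de_identify` function are all invented for this example.

```python
import hashlib
import hmac

# Hypothetical names of fields that directly identify a person.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}


def de_identify(record, secret_key):
    """Return a copy of the record without direct identifiers.

    The original patient ID is replaced by a keyed hash (a pseudonym), so the
    same patient always gets the same pseudonym, but the pseudonym cannot be
    turned back into the ID without the secret key.
    """
    pseudonym = hmac.new(
        secret_key, record["patient_id"].encode("utf-8"), hashlib.sha256
    ).hexdigest()[:16]
    cleaned = {
        field: value
        for field, value in record.items()
        if field not in DIRECT_IDENTIFIERS and field != "patient_id"
    }
    cleaned["pseudo_id"] = pseudonym
    return cleaned


# An invented questionnaire record, not real data.
record = {
    "patient_id": "FC-0001",
    "name": "Jane Example",
    "phone": "00000 000000",
    "answer_q1": "yes",
}
cleaned = de_identify(record, b"hypothetical-secret-key")
print(sorted(cleaned))
```

Real de-identification usually involves more than this, such as checking free-text notes and rare combinations of answers, so treat the sketch as a starting point for understanding rather than a recipe.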

After the data has been de-identified, it is then time for it to be “pre-processed”. Pre-processing happens for two reasons: (1) to help maintain data security and governance (I’ll expand upon the reason why later) and (2) to help make sure that the data is understandable to something called the CaRROT-Mapper Tool (which I will also refer back to shortly).

Pre-processing involved using a tool called OHDSI WhiteRabbit to generate metadata representing the de-identified data.

This stage is done by the Data Partner in charge of the cohort with support provided by the CO-CONNECT team. In the case of Follow-COVID, HIC were acting as the Data Partner, so the generation of metadata was supported by Erum Masood, who also works at the University of Dundee (Follow-COVID’s Data Custodian).

The Jargon Buster

Cohort

A cohort is any group of people with a shared characteristic.

In this case, the Follow-COVID cohort is all of the people who volunteered to participate in the Follow-COVID study.

The Follow-COVID data or dataset is all of the information provided by the people involved in the study (the Follow-COVID cohort) which researchers use for their research.

Innovation Gateway

The Health Data Research Innovation Gateway is a portal enabling researchers and innovators in academia, industry and the NHS to search for and request access to UK health research data.

The Gateway was developed with input from patients and researchers, and provides a library of information including data held and managed in the NHS, research charities, research institutes and universities. Researchers can search, browse and enquire about access.

The Innovation Gateway does not hold or store any patient or health data.

Click here to find out more.

Click here to visit the Innovation Gateway.

CaRROT-Mapper

CaRROT-Mapper is a piece of software that was developed by the wider CO-CONNECT development team, based at the Universities of Nottingham, Edinburgh and Dundee.

Metadata

Metadata is data that describes or gives information about other data.

In this case the metadata is describing or giving researchers information about what is happening in the Follow-COVID cohort.

This may include the headings, table column names, types of data and totals.

How is the cohort’s data organised? How many headings or categories are used?

What are these categories called?

What types of data do these categories organise? Yes and no questions? Description boxes? Multiple-choice answers?

How many women are in the cohort?

How many people in the cohort have asthma?

To learn more about, and see further examples of metadata, please visit: What is Metadata (with examples) – Data terminology (dataedo.com)
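For readers who like to see things in code, the questions above can be sketched in a few lines of Python. This is not the output of OHDSI WhiteRabbit or any CO-CONNECT tool; the table, the column names and the `describe` function are invented purely to show what “data about data” looks like.

```python
from collections import Counter

# Hypothetical, already de-identified questionnaire answers (invented).
rows = [
    {"sex": "F", "has_asthma": "yes", "age_band": "50-59"},
    {"sex": "M", "has_asthma": "no", "age_band": "60-69"},
    {"sex": "F", "has_asthma": "no", "age_band": "50-59"},
]


def describe(rows):
    """Summarise a table: its column names, row count and value counts."""
    columns = list(rows[0].keys())
    return {
        "n_rows": len(rows),
        "columns": columns,
        "value_counts": {
            column: dict(Counter(row[column] for row in rows))
            for column in columns
        },
    }


metadata = describe(rows)
print(metadata["columns"])                     # what are the categories called?
print(metadata["value_counts"]["sex"])         # how many women are in the cohort?
print(metadata["value_counts"]["has_asthma"])  # how many people have asthma?
```

Notice that the summary contains counts and names only. No individual row of answers is carried over into the metadata.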

Data Partner

CO-CONNECT are collaborating with a range of organisations who host and manage data. They are the legal data controllers of the data. We term them CO-CONNECT “Data Partners”.

A Data Source (or cohort as written in this article) is a specific data set hosted and managed by the Data Partner.

Mapping & mappings

Mapping can be defined as transforming the data to fit a widely agreed upon standard structure so that it can be more easily compared with other datasets.

Essentially, with mapping you are aiming to find a common format that can be used to link the data of multiple datasets together.

It can be viewed a bit like phone chargers – a few years ago every manufacturer had their own different connector, but now everyone uses the likes of USB-C, meaning standard cables can be used across many more phones without an adapter.
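In code, a mapping can be as simple as a lookup table that says “when this source writes X, record it as Y”. The short Python sketch below assumes two made-up cohorts that record the same fact in different ways. The cohort names, column names and codes are hypothetical and are not taken from CaRROT-Mapper.

```python
# Hypothetical mapping rules: each source's own labels -> one shared vocabulary.
RULES = {
    "cohort_a": {"column": "gender", "values": {"Female": "F", "Male": "M"}},
    "cohort_b": {"column": "sex_code", "values": {"1": "M", "2": "F"}},
}


def map_record(source, record):
    """Translate one source record into the shared format using RULES."""
    rule = RULES[source]
    raw_value = record[rule["column"]]
    # Values without a rule are flagged rather than guessed.
    return {"sex": rule["values"].get(raw_value, "unknown")}


print(map_record("cohort_a", {"gender": "Female"}))
print(map_record("cohort_b", {"sex_code": "2"}))
```

Both records come out in the same shape, which is what lets them be counted together afterwards.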

Common Data Models (CDMs)

Each cohort, database or dataset will have their own method or structure which they use to record and organise their data. This can make it difficult for researchers to look for or use information across multiple cohorts, as the same type of information might be recorded in different ways (such as different coding languages) or stored in different locations.

Using a Common Data Model (CDM) is one of the ways data researchers overcome this problem. CDMs, and the software tools that apply them, can help pool together data from various data sources (such as the cohorts of CO-CONNECT’s data partners).

In a sense, CDMs are third-party data translators, reading the different coding languages used in each cohort and re-writing their information in one standardised, easy to read language that is easier for researchers to search through.

CDMs - Paint Catalogue Analogy

Another method of visualising how CDMs work involves a bit of imagination.

Picture that you are on a website that sells cans of paint from various brands.

Now, for the sake of argument, also imagine that each brand organises their paint differently on their own websites. For example, some may first organise their paint by block colour, then shade. Others may organise by finish (e.g. matte, glossy) then colour, then shade.

When deciding what shade a can of paint is, one brand may simply label their paints as being ‘light’ or ‘dark’, another may label theirs as being ‘very light’, ‘light’, ‘neutral’, ‘dark’ or ‘very dark’ and so on.

Furthermore, each brand will have different ways (or languages) for how they name their paints. One brand may be very direct, saying they sell light blue, very light blue and very dark blue. Another says they sell duck egg blue, sky blue and sea blue. Some may just say they sell paint colours called duck egg, sky and sea. Other brands may sell “azul” paints.

This is all to say that searching for information on the different blue paints across these imaginary websites would be incredibly inconvenient.

How many different cans of blue paint are available in shops near me? How many light blues are there with a matte finish? These types of questions would take you a long time to answer.

Therefore, instead of manually sifting through all of these different websites, you go to one retailer that sells paint cans from multiple brands (our CDM). It re-organises all of the paint cans from across these different websites into one, single, standardised categorical system. One that is easy to navigate and search through with the help of filters.
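The paint analogy can also be written as a small Python sketch. Everything in it is made up for illustration: two imaginary brands with their own layouts, one converter per brand, and a single standardised catalogue that can be filtered in one go.

```python
# Hypothetical brand catalogues, each organised in its own way.
brand_a = [
    {"name": "light blue", "finish": "matte"},
    {"name": "very dark blue", "finish": "glossy"},
]
brand_b = [
    {"paint": "duck egg", "family": "blue", "tone": "light", "sheen": "matte"},
    {"paint": "sea", "family": "blue", "tone": "dark", "sheen": "matte"},
]


def from_brand_a(item):
    # Brand A packs shade and colour into the name, e.g. "very dark blue".
    *shade_words, colour = item["name"].split()
    if "light" in shade_words:
        shade = "light"
    elif "dark" in shade_words:
        shade = "dark"
    else:
        shade = "neutral"
    # Simplification: "very light"/"very dark" collapse to "light"/"dark".
    return {"brand": "A", "colour": colour, "shade": shade, "finish": item["finish"]}


def from_brand_b(item):
    # Brand B already separates the parts, but uses different field names.
    return {"brand": "B", "colour": item["family"], "shade": item["tone"],
            "finish": item["sheen"]}


# The "one retailer": every paint in the same standard layout.
catalogue = [from_brand_a(i) for i in brand_a] + [from_brand_b(i) for i in brand_b]


def search(colour, shade=None, finish=None):
    """Filter the standardised catalogue, whatever the original brand layout."""
    return [
        paint for paint in catalogue
        if paint["colour"] == colour
        and (shade is None or paint["shade"] == shade)
        and (finish is None or paint["finish"] == finish)
    ]


light_matte_blues = search("blue", shade="light", finish="matte")
print([paint["brand"] for paint in light_matte_blues])
```

The one question about light, matte blues is answered across both brands at once, even though neither brand’s original website used those exact categories.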

The metadata was then standardised, meaning it was converted into a format that was easier for the CaRROT-Mapper tool to read.

The metadata is mapped by the CaRROT-Mapper in order to create a list of rules or instructions that can be used later on to help map the real, de-identified data.

Mapping the metadata allows CO-CONNECT to help Data Partners prepare their cohorts for on-boarding onto the Cohort Discovery Search Tool, without compromising data security. This is because the CO-CONNECT team never handle the real data themselves.

Instead, the CO-CONNECT team then use the metadata to create synthetic or ‘pretend’ data. This synthetic data is then used alongside the list of instructions created earlier to test how good the metadata mappings are. This is done by running them through another software tool called the CaRROT-CDM (Common Data Model), which was developed by the Universities of Nottingham and Edinburgh.
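As a rough picture of how ‘pretend’ data can be produced from metadata alone, here is a minimal Python sketch. It assumes the metadata holds nothing more than value counts for each column, and it is not the method used by the CO-CONNECT tools. The column names, the counts and the `synthesise` function are invented for this example.

```python
import random

# Hypothetical metadata: value counts per column only, no patient rows.
metadata = {
    "sex": {"F": 6, "M": 4},
    "has_asthma": {"yes": 2, "no": 8},
}


def synthesise(metadata, n_rows, seed=0):
    """Create made-up rows whose values follow the counts in the metadata.

    Each column is sampled on its own, so links between columns in the real
    data are not reproduced. The rows only need to look realistic enough to
    test whether the mapping rules run correctly.
    """
    rng = random.Random(seed)  # fixed seed so the test data is repeatable
    rows = []
    for _ in range(n_rows):
        row = {}
        for column, counts in metadata.items():
            values = list(counts.keys())
            weights = list(counts.values())
            row[column] = rng.choices(values, weights=weights, k=1)[0]
        rows.append(row)
    return rows


synthetic = synthesise(metadata, n_rows=5)
for row in synthetic:
    print(row)
```

Because the rows are generated from the summary rather than copied from it, none of them corresponds to a real volunteer.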

Provided the test goes well, the CO-CONNECT team then tell the Data Partner how to use the information gained from the test to run the CaRROT-CDM on the real, de-identified data.

The Data Partner can then test the CDM and ensure data security within their own controlled and secure testing environment.

The CaRROT-CDM works by re-organising the Follow-COVID data into a standardised format, which allows it to be directly compared with other cohorts on the Cohort Discovery Search Tool.

This is what allows Cohort Discovery to search across all featured data cohorts to answer researchers’ initial explorative questions when searching for relevant datasets.

Where can I find out more about...

how CO-CONNECT processes data?

If you are interested in finding out more about how CO-CONNECT prepares data to be onboarded onto the Cohort Discovery Search Tool, watch our demonstration videos on YouTube at: CO-CONNECT Data Pipeline demo videos – YouTube

the Cohort Discovery Search Tool?

HDRUK Innovation Gateway | Cohort Discovery (healthdatagateway.org)

the HDR Innovation Gateway?

Click here to find out more.

Click here to visit the Innovation Gateway.

The three benefits of collecting data on paper

So, there you have it. This is all the work that was needed to get this one, singular cohort live and accessible through the HDR Innovation Gateway.

But perhaps you are, like I was during my conversation with Scott, left wondering: Why was the patient data collected on paper in the first place? Why did the researchers not just use the same form, but on a computer to save some time?

Scott’s answer?

“accessibility, flexibility and timeliness”.

Paper-based data collection also removes any IT literacy constraints from study design, as it means that any clinical member of staff could record the information without the need for any special access or training to use a computer system.

Scott says that the rigid structure of an electronic recording system can make it challenging to effectively categorise information that a patient provides, especially in an interview setting. In contrast, Scott says “with a pen and paper a clinician can write down any notes that they require”.

Finally, we have timeliness, which we coincidentally touched on earlier. In this situation collecting data on paper was far more convenient. As Scott pointed out, “the pandemic struck so suddenly and gathering data during its early stages was of utmost importance”. Therefore, it was more important to gather the data than to design a method of holding it.

Final thoughts

So, contrary to popular belief, scientific research is not always as high-tech and automated as we might think. The story of Follow-COVID truly highlights the value that seemingly old-fashioned methods still have in the new age of health and data science. What’s most important is the quality of the data gathered, as well as the anonymity, comfort and security of the patients involved.

This article was reviewed in consultation with members of the CO-CONNECT Patient Understanding Group (PUG).

The post Follow-COVID: a journey from paper to the Innovation Gateway first appeared on Federated Analytics.

CO-CONNECT Shortlisted for MRC award

CO-CONNECT Shortlisted for MRC award

The project is a finalist for the Medical Research Council's 2022 Open Impact Prize

By Gabriella Linning

Jargon Buster

Where can I find out more about...

Finalists in each MRC Impact Prize 2022 category

Open Science Impact

Outstanding Team Impact

Early Career Impact

CO-CONNECT Shortlisted in HDR UK’s Annual Awards

CO-CONNECT Shortlisted in HDR UK's Annual Awards

More from us:

Where can I find out more about...

CO-CONNECT Draws to a Close

CO-CONNECT Draws to a Close

The CO-CONNECT team celebrated the end of the project with a final get-together to reflect on what they have accomplished, and the legacy they are leaving behind

By Gabriella Linning

What is CO-CONNECT's Legacy?

Changing the conversation

By the end of this project, CO-CONNECT has successfully onboarded 50 datasets onto the Health Data Research Innovation Gateway and made thirteen datasets discoverable through the Cohort Discovery Search Tool, including 12 COVID-19 research cohorts.

Setting new foundations for data infrastructure

Cohort Discovery Search Tool

Processes & Software

Our resources

Where can I find out more about...

Thank you everyone for all of your hard work.

ISARIC is 4C-ing!

ISARIC is 4C-ing!

ISARIC 4C becomes CO-CONNECT's 12th COVID-19 cohort to be on-boarded onto the Cohort Discovery Search Tool

By Gabriella Linning

ISARIC 4C

ISARIC 4C

The journey towards effective and efficient health data research – a personal perspective

The journey towards effective and efficient health data research – a personal perspective

By Karen Mooney, a PPI representative

PPI representative Karen Mooney gives her thoughts on why it is important for effective nation-wide health data infrastructures to be developed and maintained across the UK.

What are CO-CONNECT’s thoughts on: the 2022 IPDLN conference?

What are CO-CONNECT's thoughts on: the 2022 IPDLN conference?

Work Package 2 Lead Esmond Urwin tells us about his experience presenting at the 2022 IPDLN conference in Edinburgh, Scotland.

By Gabriella Linning

In your own words, can you explain broadly what your presentation was about?

Jargon Buster

And so, why is developing a standard system for reporting and recording quantitative information important?

What does this standard look like? How did you and your team members go about making it?

Summary of Esmond's work

What did your audience think about this idea? What types of questions did they ask? Were there any surprises?

Jargon Buster

Interesting, could you expand on these issues a little bit more? Why were they of interest to the audience?

What did other members of the CO-CONNECT team have to say about the conference?

The journey towards effective and efficient health data research –
a personal perspective