Why I have an adverse reaction to the analysis of adverse events in clinical trials
Early in my career I worked as a statistician analysing data from primary care to monitor the harm of medicines newly released into the UK population. The design was simple: we undertook Prescription Event Monitoring (PEM), collecting data on ‘green cards’ for the first 10,000 patients prescribed the new medicine in England. Without access to electronic healthcare records, the process was manual. Thousands of GPs across the country were generous with their time, dutifully filling out these green cards and returning them to our unit, where a room full of people entered the data. It was an impressive team effort, and my job at the end of this journey was the easy part: analysing it!
Our aim was to identify potential adverse drug reactions; if any were found, possible causality with the medicine would then be followed up. During my time there I learnt a lot about signal detection. This predominantly relies on detecting a disproportionate rate of events for the medicine being studied compared with the rate reported across all previous studies (regardless of disease area) in the same database. As the studies were observational and often retrospective cohorts, the data had some serious limitations, but the importance of routine surveillance like this has its roots in the thalidomide tragedy of the 1960s, when the use of thalidomide in pregnant women resulted in thousands of children being born with a range of severe deformities. Because detecting drug harm matters, academics and industry have diligently and significantly advanced signal detection methods over the last 25 years. Many companies have invested heavily in statistical research programmes, and there is now an array of advanced methods available, along with software to implement them. Some may question how much impact these impressive methodological advances can have, given the limitations of the data, but the effort to ensure the best analysis is encouraging.
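The post does not say which disproportionality statistic the PEM unit used. Purely as an illustration, one widely used measure in pharmacovigilance databases is the proportional reporting ratio (PRR): the proportion of a medicine's reports that mention a given event, divided by the same proportion across all other medicines in the database. The sketch below assumes a simple 2×2 table of report counts; the counts, the function names and the flagging rule (at least three reports and a lower confidence limit above 1) are hypothetical choices for the example, not a threshold taken from the post.

```python
import math


def proportional_reporting_ratio(a, b, c, d, z=1.96):
    """PRR with an approximate log-scale confidence interval.

    a: reports of the event of interest for the new medicine
    b: reports of all other events for the new medicine
    c: reports of the event of interest for all other medicines
    d: reports of all other events for all other medicines
    """
    prr = (a / (a + b)) / (c / (c + d))
    # Approximate standard error of log(PRR)
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(prr) - z * se)
    upper = math.exp(math.log(prr) + z * se)
    return prr, lower, upper


def is_signal(a, lower, min_reports=3):
    # Hypothetical flagging rule, for illustration only
    return a >= min_reports and lower > 1


# Hypothetical counts: 20 of 1,000 reports for the new medicine mention the event,
# versus 200 of 50,000 reports for all other medicines in the database.
prr, lower, upper = proportional_reporting_ratio(20, 980, 200, 49_800)
print(f"PRR = {prr:.2f} (95% CI {lower:.2f} to {upper:.2f}); signal: {is_signal(20, lower)}")
```

With these made-up counts the PRR is 5: the event makes up five times as large a share of this medicine's reports as of all the others. A flag like this is a prompt for follow-up, not evidence of causality.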
After a few years I moved into clinical trials, where I was struck by the contrast with harm surveillance studies. Here I saw high-quality, prospectively collected data and a contemporaneous, unbiased control arm (Trialist, you don’t know how lucky you are!). However, at the end of the trial, the only analysis of adverse events (and all that regulators and journal editors seemed to demand) was to present them in tabulations, with any continuous data dichotomised as necessary. Between-arm imbalances were then either assessed subjectively or formally compared using inappropriate hypothesis tests. There was no clear guidance on what to include or how, and as a result the harm information was sometimes overwhelming.
Based on these experiences, I believe current practice for analysing adverse events in trials does a disservice to the data and to the patients who give their time so that we can collect it.
What’s the consequence and why does this matter?
Randomised controlled trials provide an early opportunity for us to start constructing a harm profile for a treatment. A harm profile will change over time and should include all events we think occur as a consequence of receiving the treatment that are unfavourable for the person taking it. We want to understand the whole profile so that we can weigh up the benefit of the treatment against its ‘risks’. And for treatments thought to be similar in efficacy, we want to be able to reliably compare their harm profiles to choose the optimal one. The disappointing truth is that, with our current approach to the analysis of adverse events, it is not easy to obtain a reliable and comprehensible picture of harm. It’s easy to miss important events and to find ourselves focusing on less important ones.
When we talk about harm, it’s not only the serious events that matter. Many frequently occurring events can matter more to patients because of the impact they have on their daily lives. Events that affect quality of life, such as headache, nausea, diarrhoea and drowsiness, sound rather mundane to us researchers but can have important impacts and follow-on consequences, including missing school or work. It’s important that we understand the true severity, duration and frequency of these types of events if we want to prescribe in an informed way for patients with differing demands in their lives.
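To make the point concrete, here is a minimal sketch of summarising one event term per patient by frequency, duration and severity, rather than only as ‘any event: yes/no’. The record layout (a hypothetical `AdverseEvent` with start and end study days and a severity grade from 1 to 3), the function name and the example data are all assumptions for illustration; real trial data would follow the trial's own coding and grading conventions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AdverseEvent:
    patient_id: str
    term: str
    start_day: int  # study day the episode started
    end_day: int    # study day the episode resolved (inclusive)
    severity: int   # hypothetical grading: 1 = mild, 2 = moderate, 3 = severe


def burden_by_patient(events, term):
    """Per-patient number of episodes, days affected and worst severity for one term.

    Assumes a patient's episodes of the same term do not overlap in time.
    """
    summary = defaultdict(lambda: {"episodes": 0, "days": 0, "max_severity": 0})
    for ev in events:
        if ev.term != term:
            continue
        s = summary[ev.patient_id]
        s["episodes"] += 1
        s["days"] += ev.end_day - ev.start_day + 1
        s["max_severity"] = max(s["max_severity"], ev.severity)
    return dict(summary)


# Hypothetical data: both patients would simply count as "had headache" in a frequency table.
events = [
    AdverseEvent("A", "headache", 3, 3, 1),
    AdverseEvent("B", "headache", 2, 6, 2),
    AdverseEvent("B", "headache", 10, 14, 3),
    AdverseEvent("B", "headache", 20, 21, 2),
    AdverseEvent("B", "nausea", 2, 3, 1),
]

for patient, stats in burden_by_patient(events, "headache").items():
    print(patient, stats)
```

Patient A had a single mild day of headache; patient B had three episodes covering 12 days, one of them severe. A yes/no table treats them identically.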
Can we really only monitor harm well in large pharmacovigilance cohorts?
Well, yes and no. We certainly won’t be able to detect all rare events in clinical trials, as they are often not large enough. Also, because the general population differs from trial populations, people in routine care can react differently. However, even though trials are relatively small compared with post-marketing surveillance, this shouldn’t dampen our ambition to get the most out of the data we have carefully gathered, and we should do so cumulatively. If we take inspiration from harm surveillance studies, we can use trial data as an opportunity to detect signals of harm that can inform later-phase studies as well as post-marketing surveillance.
As a clinical trialist, I accept my share of responsibility for having contributed to suboptimal analysis practice. I have regularly produced and presented the obligatory adverse event table for a final report (though I hasten to add that I have not undertaken hypothesis testing on them!). Having worked in this field for over 10 years, I now have a greater understanding of, and deeper insight into, how the clinical trials community has ended up here. The roots of poor analysis practice no doubt lie in the complexity and sheer amount of adverse event information. Adverse event data are multifaceted, which fundamentally stems from trialists being unable to specify all harm events of interest when planning a trial, in contrast to primary and secondary outcomes. There are many other considerations, such as the high number of events, the limited sample size, and the lack of research and guidance in the area.
I have given this conundrum some thought over the years, and recently outlined the issues with some suggestions for how we can make a real change to practice in a paper I wrote with Dr Rachel Phillips (QMUL and Imperial College London):
Improving the analysis of adverse event data in randomized controlled trials (Journal of Clinical Epidemiology, 2022)
In the paper we describe the status quo for adverse event (AE) analysis in randomised controlled trials (RCTs), aiming to untangle concepts and provide clarity around four key issues:
1. RCT data for treatment harm are undervalued
2. Adverse event analysis is difficult due to its multifaceted structure
3. Current analysis practice is unsatisfactory
4. The approach to selection and presentation of harm in publication is heterogeneous and biased
But, more importantly, we outline solutions to each of these issues. Some can be adopted immediately to improve practice now; others need more consensus or research to cement a longer-term change. Our focus is on analysis and reporting in trial publications, not on data monitoring committees or regulators.
The fundamental question we address first is: what are we aiming to do when we analyse adverse events? In other words, what is the research question? We should not be hypothesis testing (see the paper for the reasons why); instead, we can take inspiration from our pharmacovigilance colleagues and reframe the research question as ‘Detecting Signals for Adverse Reactions’. This provides a different context for interpretation and for our follow-up actions, and helps improve mechanistic insight.
We then suggest upping our ambitions for analysis: why not apply the established good statistical practice that we use for efficacy outcomes? This includes not dichotomising outcomes and undertaking adjusted analyses. We lay out multiple suggestions for where we can make immediate gains. An important part is adopting Bayesian methods for harm analysis, as they let us build on existing evidence. With harm outcomes this is vital, and it is essentially what we currently do in an unstructured, ad hoc manner. A Bayesian approach therefore feels like the natural framework for a harm context. There are also many new and existing methods for harm outcomes that are not being used and that we should be exploring (Phillips et al 2020).
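As a minimal sketch of what ‘building on existing evidence’ can look like, the example below gives each arm's event risk a beta prior, updates it with the trial data (the conjugate beta-binomial model), and estimates by simulation the posterior probability that the risk is higher on treatment. This is a deliberately simple illustration, not the method proposed in the paper: it uses a ‘had the event or not’ outcome for brevity, and the function name, the counts and the informative prior are hypothetical. In practice the prior would be built from earlier trials or other evidence.

```python
import random


def prob_higher_risk(x_trt, n_trt, x_ctl, n_ctl,
                     prior_trt=(1.0, 1.0), prior_ctl=(1.0, 1.0),
                     draws=100_000, seed=2022):
    """Monte Carlo estimate of P(risk on treatment > risk on control | data).

    Each arm's event risk gets an independent Beta(alpha, beta) prior; observing
    x events in n patients updates it to Beta(alpha + x, beta + n - x).
    """
    rng = random.Random(seed)
    a_t, b_t = prior_trt[0] + x_trt, prior_trt[1] + n_trt - x_trt
    a_c, b_c = prior_ctl[0] + x_ctl, prior_ctl[1] + n_ctl - x_ctl
    higher = sum(
        rng.betavariate(a_t, b_t) > rng.betavariate(a_c, b_c) for _ in range(draws)
    )
    return higher / draws


# Hypothetical trial: 12/150 patients report nausea on treatment vs 4/150 on control.
flat = prob_higher_risk(12, 150, 4, 150)

# Hypothetical informative prior for the treatment arm, standing in for earlier
# evidence of roughly a 5% nausea risk (worth about 40 prior patients).
informed = prob_higher_risk(12, 150, 4, 150, prior_trt=(2.0, 38.0))

print(f"P(higher nausea risk on treatment): flat prior {flat:.3f}, informed prior {informed:.3f}")
```

The informative prior pulls the treatment-arm estimate towards its lower prior risk, so the probability is somewhat smaller than under the flat prior. A fuller analysis would report the whole posterior distribution of the risk difference rather than a single probability.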
The one thing we feel we lack, a view endorsed by a survey of clinical trial statisticians in industry and academia, is helpful guidance for analysis. I almost want to say prescriptive, but I mean helpful in a ‘show me an example’ kind of way. Some limited guidance has been available for years, but it is so general that, as we have found in our work, people do not reference it and journal editors do not endorse it. The data are so complex, so bountiful and at the same time so limited (in sample size) that we almost give up. The bar has been set at frequency tables, and that’s what we stick to. But we need to push the boundaries, explore and get creative, especially with visualisations, which are a highly effective way to communicate harm but are rarely used in publications. We need consensus to develop helpful guidance, and further research to determine what the best methods really are. We also need consensus on how to select which harm outcomes to report from all the adverse event data that get recorded. And we need journal editors and funders to raise their standards for the analysis and reporting of harm, so engagement with these stakeholders is needed. (I am avoiding mentioning regulators for now; that is another conversation, with different drivers.)
Adverse event analysis in clinical trials is not a hot topic (does that make it a cold one?!), and I have yet to see it have its own session at an academic conference (at industry conferences, yes). I hope this will change, because while there are pockets of researchers making good progress in the field, we could make more progress by bringing our work together. Harm outcomes are fundamental to most drug trials, and without suitable analysis we don’t have the full picture needed to examine risk-benefit or to compare treatments based on harm.
Dr Victoria Cornelius
Imperial College London