Perinatal death in a term fetal growth restriction randomized controlled trial: the paradox of prior risk and consent

Linda van Wyk, MD, PhD; Kim E. Boers, MD, PhD; Sanne J. Gordijn, MD, PhD; Wessel Ganzevoort, MD, PhD; Henk A. Bremer, MD, PhD; Anneke Kwee, MD, PhD; Friso M. C. Delemarre, MD, PhD; Maria G. van Pampus, MD, PhD; Kitty W. M. Bloemenkamp, MD, PhD; Frans J. M. E. Roumen, MD, PhD; Jan M. M. van Lith, MD, PhD; Ben W. J. Mol, MD, PhD; Jim G. Thornton, MD, PhD; Sicco A. Scherjon, MD, PhD; Saskia le Cessie, PhD; on behalf of the DIGITAT Study Group


Introduction
The factors that influence patient participation in trials are widely discussed. 1-4 Patients' choices to participate are, for example, influenced by individual preferences and socioeconomic background. 1,5 Participation in a clinical trial can lead to the so-called "Hawthorne effect." 6 The Hawthorne effect was first described in studies conducted between 1924 and 1933, in which the productivity of factory workers increased while they were more strictly supervised as part of a study to improve the quality of the production process. 7,8 The exact influence of the Hawthorne effect on the outcome of clinical studies in general is unknown. Some studies report better outcomes than expected (in both arms of a study), which is then explained by the Hawthorne effect. 9-11 Other studies find no evidence of a Hawthorne effect. 12-14
A factor that could influence a possible Hawthorne effect is that the characteristics of people who consent to participate in clinical trials often differ from those of patients who decline participation. Recruitment to clinical trials is influenced by socioeconomic status, and less educated women are often less willing to participate. 15-19 The Hawthorne effect can result in better outcomes, not only in the intervention group, but also in the control group of the trial. This positive effect suggests that participating in a clinical trial may affect the behavior of both doctors and patients. Doctors may feel that they are being watched and may therefore adhere more strictly to protocols, perhaps leading to earlier or other interventions that could improve outcome. Patients may be more aware of risk factors because the study information provided before the trial leaves them better informed about their condition. They may also feel that they are being watched. Subsequently, they may adopt healthier behavior and delay less in reporting complaints or problems, leading to better outcomes.
In many pregnancy complications, it was unknown whether induction of labor, with the disadvantage of a less mature fetus, or expectant management, with the disadvantage of a prolonged pathologic condition, is the better option. We considered a randomized trial that compared these 2 management options in term fetal growth restriction (FGR): the Disproportionate Intrauterine Growth Intervention Trial At Term (DIGITAT). The data of the nonparticipants were also collected in exactly the same manner as those of the participants. The DIGITAT was analyzed by intention to treat; it compared labor induction with expectant monitoring in pregnancies complicated by FGR at term and showed equivalence for neonatal outcomes. 20,21 This study aimed to evaluate trial participation bias and to analyze the generalizability of the results by (1) assessing whether and how baseline characteristics of nonparticipants differed from those of participants and by (2) comparing study outcomes of the 2 groups. We also aimed to (3) explore a possible Hawthorne effect among trial participants.

Study design and patients
The design of the DIGITAT has been described elsewhere. The trial compared induction of labor with expectant management in term pregnancies complicated by antenatally suspected FGR. Participants allocated to the induction of labor group were induced within 48 hours of randomization. Participants allocated to the expectant monitoring group were monitored until the onset of spontaneous labor with daily fetal movement counts and twice weekly heart rate tracings, ultrasound examination, maternal blood pressure measurement, assessment of proteinuria, laboratory tests of liver and kidney functions, and full blood count. 22 Eligible patients who declined randomization, but agreed to follow-up of their medical data, were treated according to local protocols and patient preferences and at the discretion of the attending obstetrician. The nonparticipants of this study all formally declined participation after they received written and oral information about the study. As prespecified in the study protocol, the data of the nonparticipants who consented to data use were collected in the same way as those of the participants, parallel to the trial. Among the nonparticipants, women were considered to be monitored expectantly if labor was not induced and no primary cesarean delivery was performed within 48 hours after inclusion in the study. The primary outcome was a composite measure of adverse neonatal outcome, defined as death before hospital discharge, a 5-minute Apgar score of <7, an umbilical artery pH of <7.05, or admission to the neonatal intensive care unit. Secondary outcomes were cesarean delivery and instrumental vaginal delivery, length of stay in the neonatal intensive care or neonatal ward, length of maternal hospital stay, and maternal morbidity.
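The two definitions above (the composite primary outcome and the rule for labeling a nonparticipant as expectantly monitored) can be written as simple predicates. The following Python sketch is illustrative only: the class, field, and function names are hypothetical, the "intervention" label is our shorthand, and treating a missing Apgar score or pH as not meeting the criterion is an assumption that the text does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Neonate:
    # Hypothetical field names chosen for illustration only.
    died_before_discharge: bool
    apgar_5min: Optional[int]            # None when not recorded
    umbilical_artery_ph: Optional[float]  # None when not recorded
    nicu_admission: bool


def composite_adverse_outcome(n: Neonate) -> bool:
    """True if any component of the composite adverse neonatal outcome is met."""
    # Assumption: a missing Apgar score or pH does not count as an event.
    low_apgar = n.apgar_5min is not None and n.apgar_5min < 7
    low_ph = n.umbilical_artery_ph is not None and n.umbilical_artery_ph < 7.05
    return bool(n.died_before_discharge or low_apgar or low_ph or n.nicu_admission)


def nonparticipant_policy(induced_within_48h: bool,
                          primary_cesarean_within_48h: bool) -> str:
    """Label a nonparticipant by what happened within 48 hours of inclusion."""
    if induced_within_48h or primary_cesarean_within_48h:
        return "intervention"
    return "expectant"


example = Neonate(died_before_discharge=False, apgar_5min=6,
                  umbilical_artery_ph=7.20, nicu_admission=False)
print(composite_adverse_outcome(example), nonparticipant_policy(False, False))
```

Because the outcome is a logical OR of its components, a neonate counts once even when several criteria are met.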

Statistical analysis
Continuous variables were summarized as means with standard deviations or medians with interquartile ranges. Continuous variables were compared using Student's t test or the nonparametric Mann-Whitney U test, and the results were presented as differences in means with 95% confidence intervals (CIs). The chi-square test and Fisher exact test were used to compare categorical variables, and the results were presented as risk differences with 95% CIs. If >5% of the observations were missing, this was indicated in the footnote of the table.
Propensity score methods were used to compare outcomes while adjusting for group imbalances. 16 The propensity score is the probability that a woman participates in the randomized trial given her baseline characteristics (Table 1). This propensity score was calculated for all patients using logistic regression. Mean differences and risk differences were adjusted for the propensity score in linear regression models and additive risk models. Multiple imputation was used to handle missing data in the baseline variables, with imputation models that included baseline and outcome variables. Statistical analysis was performed using Statistical Package for the Social Sciences software (version 22.0; IBM Corp, Chicago, IL) and R (version 2.10.0; R Foundation for Statistical Computing, Vienna, Austria), using the package Multivariate Imputation by Chained Equations. 17
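To make the adjustment concrete, the sketch below walks through the two steps described above: a logistic model for the probability of participation, followed by an additive (linear) risk model for the outcome with the propensity score as a covariate. It is not the analysis code of the study, which used SPSS and R. It is a plain-Python illustration on a HYPOTHETICAL toy cohort with one binary baseline covariate (smoking), it omits multiple imputation, and every function name and count in it is an assumption made for illustration.

```python
import math


def predict_proba(beta, xi):
    """Logistic model probability for covariate vector xi."""
    z = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Logistic regression by batch gradient ascent; returns [intercept, coefs...]."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            resid = yi - predict_proba(beta, xi)
            grad[0] += resid
            for j, v in enumerate(xi):
                grad[j + 1] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def ols(X, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        pivot_value = A[c][c]
        A[c] = [v / pivot_value for v in A[c]]
        for r in range(p):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [row[p] for row in A]


# HYPOTHETICAL toy cohort, NOT DIGITAT data:
# (smoker, participant, number of women, number with adverse outcome)
cells = [(1, 1, 70, 7), (1, 0, 30, 3), (0, 1, 50, 2), (0, 0, 50, 2)]
rows = [(s, part, 1 if k < events else 0)
        for s, part, n, events in cells for k in range(n)]

baseline = [[s] for s, _, _ in rows]
participated = [part for _, part, _ in rows]
outcome = [o for _, _, o in rows]

# Step 1: propensity score = P(participation | baseline characteristics).
beta = fit_logistic(baseline, participated)
ps = [predict_proba(beta, x) for x in baseline]


def risk(group):
    sel = [o for part, o in zip(participated, outcome) if part == group]
    return sum(sel) / len(sel)


# Step 2: crude risk difference versus the difference adjusted for the score.
crude_rd = risk(1) - risk(0)
design = [[1.0, float(part), s] for part, s in zip(participated, ps)]
adjusted_rd = ols(design, outcome)[1]
print(f"crude RD = {crude_rd:.4f}, adjusted RD = {adjusted_rd:.4f}")
```

In this toy cohort the risk is identical for participants and nonparticipants within each smoking stratum, so the crude difference of 1.25 percentage points comes entirely from the unequal mix of smokers and essentially vanishes after adjustment for the propensity score.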

Results
For the DIGITAT, 1116 women were registered as eligible for participation: 650 women were randomized, and 466 women declined randomization. Of the women who declined randomization, 452 consented to follow-up of their medical data and were included in this study. Of these nonparticipants, 58 were induced within 48 hours and 6 had a primary cesarean delivery within 48 hours. The other 377 women were monitored expectantly (Figure).
The baseline characteristics of the 2 groups are presented in Table 1. Nonparticipants were older, had a lower body mass index (BMI), had a higher level of education, and were less likely to smoke. Nonparticipants did not have more serious growth restriction than the participants, and no differences were seen in hypertensive pregnancy complications.

AJOG MFM at a Glance
Why was this study conducted?
This study aimed to investigate possible factors contributing to the generalizability of a randomized trial in term fetal growth restriction by comparing perinatal characteristics and outcome between randomized and nonrandomized patients.

Key findings
Although maternal characteristics in general were more favorable in the nonrandomized patients, their perinatal outcome was less optimal. All 4 perinatal deaths occurred in the nonrandomized group. It remains debatable whether these findings can be attributed to the characteristics of the participants or their doctors or are related to the so-called "Hawthorne effect."

What does this add to what is known?
To assess generalizability of findings from a randomized trial, data on the nonrandomized patients also have to be evaluated and discussed.

The nonparticipants and/or their doctors had a strong preference for expectant management (Figure). Only 64 women (15%) were induced or had a primary cesarean delivery within 48 hours. Consequently, labor started spontaneously more often in nonparticipants, resulting in prolonged pregnancy (Table 2). More cesarean deliveries were performed in the nonparticipant group, and more nonparticipants than participants developed pregnancy-induced hypertension or preeclampsia after study inclusion.
To assess differences in fetal monitoring strategies between participants and nonparticipants during the trial period, we compared the number of fetal heart rate (FHR) tracings and the number of antenatal visits. In total, nonparticipants had more antenatal visits and FHR tracings than participants owing to the prolongation of pregnancy while awaiting spontaneous delivery. When the number of FHR tracings and antenatal visits was compared per day of expectant management between the nonparticipants and participants in the expectant monitoring groups, we found significantly more FHR tracings and a tendency toward more antenatal visits in the participants.
Neonatal outcomes are presented in Table 3. More babies of nonparticipants were severely growth restricted (<p3). No perinatal deaths occurred among participants. In contrast, 4 perinatal deaths occurred among nonparticipants (Fisher exact test, P=.03). Notably, 2 intrauterine deaths occurred under an expectant monitoring policy, 1 at 40+2 and the other at 41+4 weeks' gestation, with time to delivery after study entry of 14 and 24 days, respectively, and birthweights of 3026 and 2780 grams, respectively. The first patient had 8 FHR tracings and 5 antenatal visits while waiting for spontaneous delivery. Clinical and pathologic examination of the fetus and placenta revealed that fetal death was probably caused by a partial placental abruption. The second patient was monitored with 10 FHR tracings and 6 antenatal visits. Postmortem examination of the fetus and placenta showed that this stillbirth was associated with FGR. The third baby, with a birthweight of 2130 grams, died after induction and emergency cesarean delivery because of placental abruption at 37+2 weeks' gestation. The mother was included 1 day before delivery and declined randomization but gave permission for follow-up of medical data. However, the suspicion of FGR had already been raised at 35+6 weeks' gestation. This child died after a long hospital admission (>150 days) owing to serious complications of severe asphyxia. The fourth child was delivered at 40+2 weeks' gestation with a birthweight of 2665 grams and had been discharged from the hospital 12 days after study entry in apparent good health but died unexpectedly at the age of 28 days from an intracranial hemorrhage owing to an arteriovenous malformation.
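The Fisher exact P value for perinatal deaths can be checked from the counts in the text. The Python sketch below is a minimal illustration, assuming that the denominators are the 650 randomized women and the 452 nonparticipants with follow-up data (the denominators actually used in Table 3 are not reproduced here) and using the common two-sided definition that sums all tables no more probable than the observed one. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k events in the first row.
        return comb(row1, k) * comb(row2, col1 - k) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= p_obs * (1 + 1e-9))


# Participants: 0 deaths, 650 survivors; nonparticipants: 4 deaths, 448 survivors.
p_value = fisher_exact_two_sided(0, 650, 4, 448)
print(round(p_value, 2))  # 0.03
```

Under these assumed denominators, the result agrees with the reported P=.03.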
As reported in the original trial report, 1 participating woman who had induction of labor died unexpectedly at home 10 days after delivery. She had had an uncomplicated delivery at 38þ4 weeks' gestation. No cause was found at the postmortem examination. No maternal deaths occurred in the nonparticipant group.
We observed no significant differences in composite adverse neonatal outcome between the participants and nonparticipants (6.5% vs 8.3%). Neonatal admissions and length of stay in the hospital were also similar between the groups (Table 3).

Principal findings
Nonparticipants had more favorable baseline characteristics (lower BMI, higher level of education, and less smoking). Most nonparticipant women with suspected FGR and/or their doctors preferred an expectant management policy. Notably, 4 perinatal deaths occurred among nonparticipants compared with 0 among participants. The percentage of children with severe growth restriction (<p3) was higher among nonparticipants owing to prolongation of pregnancy.

Results
Women who are less educated and have a lower socioeconomic status are often less willing to participate in clinical trials. 15-19 However, in our study, we found the opposite; that is, highly educated women were less willing to be randomized for the trial. This could be explained by the fact that Dutch women are in general very reluctant when it comes to interventions during pregnancy. This phenomenon was also observed in our study, in which most nonparticipants preferred expectant management.
Despite the fact that the participants had less favorable characteristics at baseline, their perinatal mortality rate was lower than that of the nonparticipating patients. A similar finding has been observed in low birthweight infants (defined as a birthweight of <2500 grams), who had a lower mortality rate when their mothers smoked. 18 This so-called birthweight paradox can be explained by the fact that smoking causes FGR in otherwise healthy infants, whereas FGR in nonsmoking women is possibly caused by other, possibly more severe, pathology. Another possible explanation could be that the doctor is more aware of the risk factors in a pregnant woman who smokes and is therefore more alert. 23,24 All 4 perinatal deaths in our study were in the nonparticipant group. Notably, 3 of these 4 deaths were in pregnancies with a gestational age of more than 281 days (40 weeks) that were monitored expectantly. In terms of fetal monitoring strategies (FHR tracings and antenatal visits), these 4 pregnancies were monitored intensively, similar to the monitoring of the trial participants. However, we speculate that 2 of these deaths and perhaps the third death could have been prevented by induction of labor at an earlier gestational age.
Even though we found adequate fetal monitoring in the pregnancies in which a perinatal death occurred, participants overall had more FHR tracings per day of expectant management, which can be seen as a proxy for standard of care, and also more antenatal visits than nonparticipants. During the time of the study, no national protocols existed for monitoring strategies for pregnancies complicated by FGR, and policy was mainly based on the preference of the attending obstetrician and the patient and local protocols.
Participants in the trial were required to adhere to a strict study protocol. For the nonparticipants, the management was dependent on the preference of the attending obstetrician and local protocols.
Although nonparticipants were healthier at baseline, we nevertheless found a higher incidence of a birthweight of <p3 in the nonparticipants. In a follow-up study of trial participants after 2 years, we have shown that more severe FGR (<p3) is independently associated with a less optimal neurodevelopmental and behavioral outcome, regardless of management (induction of labor or expectant management). This outcome was not measured in the nonparticipant group, but the finding of more severe growth restriction among nonparticipants may therefore have long-term implications. 26

TABLE 2
Pregnancy outcome, fetal monitoring, and onset of labor

In our trial, we saw a more favorable outcome among participants (less severely growth restricted, no perinatal deaths) in both the intervention and expectantly monitored groups. Apart from slightly more FHR tracings in the participants, we found no clear evidence of a Hawthorne effect. We cannot dismiss the possibility that at least part of the cause of the adverse outcomes was that most nonparticipants seemed to favor an expectant management policy, subsequently leading to relatively more advanced gestational ages and more growth restriction.

Clinical implications
The primary DIGITAT report 20 advises that it is rational to choose induction of labor to prevent stillbirths. The secondary analyses of the DIGITAT 25,26 conclude that induction of labor beyond 38 weeks' gestation seems to be the most favorable management policy. The differences we found between the participants and nonparticipants could be explained by the fact that most nonparticipants preferred expectant management. This finding could be an additional argument for induction of labor before 40 weeks' gestation in term pregnancies complicated by FGR.

Research implications
FGR was defined using a population-based neonatal growth curve rather than an internationally accepted definition. Future studies should use an internationally accepted definition of FGR and should also use hemodynamic fetal parameters for the antenatal identification of FGR based on placental insufficiency. Such a trial is currently running in the Netherlands. In addition, a long-term follow-up study is important to draw definitive conclusions on perinatal management. Such a study, at the age of 11 years and older, is currently being conducted in both randomized and nonrandomized patients.

Strengths and limitations
Data from randomized and nonrandomized patients were included prospectively in the same database. A weakness is that fetal Doppler studies were not performed. A larger sample size is needed to adequately adjust for possible confounding when comparing the number of deaths between the randomized and nonrandomized patients.
As in all randomized controlled trials, we expect that there was a group of patients who were eligible for randomization in the study but were not approached by the clinician for whatever reason. Unfortunately, we do not have information about this group of patients.

Conclusion
We found that the nonparticipants in the DIGITAT had a worse outcome than the participants, despite the fact that these women were healthier at baseline. This shows the extreme importance of collecting data of those declining randomization in the same way as for patients participating in a trial. We have no strong evidence of the presence of the Hawthorne effect. The fact that most nonparticipants preferred expectant management and thereby prolonged the possibly undernourished fetal environment could explain the less favorable outcomes in these women and remains an argument for timely induction of labor.

Data are presented as median (interquartile range, 25th to 75th percentile) or number (percentage). For binary outcomes, CIs were not calculated when there were <5 events per group.
Apgar, appearance, pulse, grimace, activity, and respiration; CI, confidence interval; N/A, not applicable. a Difference corrected for propensity score; b According to Dutch fetal growth charts; c P<.001, P<.05; d Defined as death within 28 days after birth or before hospital discharge.