Alzheimer’s disease and related dementias (ADRD) are a significant public health concern for persons with the disease, their caregivers, and society. Over 5 million Americans are currently diagnosed and prevalence estimations are projected to near 14 million by 2050  . Financial projections over the next 4 decades suggest ADRD will cost the United States $20 trillion, and current annual federal funding for related research rests near $500 million  .
Caregivers of persons with ADRD are said to be the most widely studied caregiving population   . Caregiving of persons with ADRD is generally a familial role or responsibility assumed by a friend or neighbor. Schulz & Martire  have suggested that approximately 75% of home care for persons with ADRD is provided by family and friends, while the remaining 25% is purchased through the secondary market.
Successful interventions for caregivers of persons with dementia increasingly reflect a promising potential for the future. Targeted interventions addressing behavioral symptoms associated with dementia, caregiver strain, frustration, and depression, as well as financial counseling, have been found to address many important public health outcomes. Significant work remains, as over 80% of caregivers of persons with ADRD report high levels of stress, and nearly one half experience depressive episodes. These reactions within the caregiver role can lead to decreased caregiver health, increased functional decline, increased potential for abuse, decreased quality of life, early nursing home placement, and increased health care costs.
Many innovative designs and approaches for applied research exist for the caregiving population, but careful consideration is warranted. The established public health impact of ADRD necessitates the avoidance of studies using weak designs or with methodological flaws. Such studies may have serious validity problems, resulting in either the dissemination of misinformation or the transmission of useless information to decision makers. Basic science, randomized controlled trials (RCTs), qualitative approaches, and quasi-experimental designs serve as the primary basis for discussion.
Research design is dependent on many factors, but it namely hinges on the research question being asked, the practicality of recruiting participants for such a design, and the supporting resources available  . This section will begin with the most basic applications for research design for caregivers of persons with ADRD, which may often be seen in pilot work, and move on to explore possibilities in randomized controlled trials and alternate designs with varying levels of performance and possibility.
2.1. Basic Science
Basic science research has been used to compare the physiological symptoms of ADRD or dementia caregivers with generally demographically equivalent populations. Basic science designs can include prospective cohort studies, retrospective cohort studies, or case-control designs. These designs are typically used in pilot or exploratory stages, or when a basic mechanism of behavior or incidence is unknown or in need of further study.
An example of a prospective cohort design is found in the Shaw et al.  examination of accelerated risk of hypertensive blood pressure of caregivers of persons with ADRD. Blood pressures (BP) of ADRD caregivers and non-caregiving controls were tested semiannually over a period of 2 to 6 years (time variation due to staggered recruitment), and results suggested ADRD caregivers were more likely to meet hypertension criteria than the control.
Prospective cohort designs can be useful for assessing incidence, or the number or percentage of new cases that occur over the interval (e.g. how many ADRD caregivers met BP threshold criteria over the 2 to 6 years of study). These designs may also be beneficial in assessing the potential causes of a condition (e.g. caregiver strain increasing BP), though mixed evidence exists. Compared with retrospective approaches, prospective approaches allow for more complete data collection and generally result in less missing data. Additionally, prospective documentation of risk factors (e.g. for BP: age, gender, education, SES, BMI, use of antihypertensive medication) protects the measure from being influenced by knowledge of the outcome. Finally, the prospective approach allows for the study of multiple outcomes (e.g. one could also examine blood sugar, cholesterol, etc.).
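The incidence calculation described above can be sketched in a few lines. This is a minimal illustration, assuming a simple cumulative incidence (new cases divided by participants initially at risk) and a risk ratio between groups; the function names and all counts are hypothetical and are not data from Shaw et al.

```python
def cumulative_incidence(new_cases: int, at_risk: int) -> float:
    """Share of initially event-free participants who become new cases."""
    if at_risk <= 0:
        raise ValueError("at_risk must be a positive count")
    return new_cases / at_risk


def risk_ratio(incidence_exposed: float, incidence_unexposed: float) -> float:
    """Relative risk: incidence in the exposed group over the unexposed group."""
    if incidence_unexposed == 0:
        raise ValueError("risk ratio is undefined when the unexposed incidence is 0")
    return incidence_exposed / incidence_unexposed


# Hypothetical counts for illustration only (not data from Shaw et al.).
caregiver_incidence = cumulative_incidence(new_cases=30, at_risk=100)
control_incidence = cumulative_incidence(new_cases=15, at_risk=100)
rr = risk_ratio(caregiver_incidence, control_incidence)
print(f"caregivers {caregiver_incidence:.2f}, controls {control_incidence:.2f}, RR {rr:.2f}")
```

A risk ratio above 1 in such a sketch would only indicate an association; as with any cohort design, confounding variables still need to be considered before drawing causal conclusions.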
Although prospective cohort designs can point to potential causes of a condition, true causal inference is challenging due to lurking/confounding variables. As the example illustrates, prospective studies can take considerable time (and significant financial resources) to complete, and generally require large numbers of participants. Study attrition can also impact sample size and recruiting mechanisms. As displayed by Shaw et al., prospective cohort studies can be a promising means for pilot work with caregivers of persons with ADRD, though additional designs will be needed when introducing an intervention.
Retrospective cohort designs can be useful in the assessment of incidence (e.g. caregiver strain or depression). The retrospective design can assist in the assessment of the cause of the condition and it helps in the identification of outcomes that can take an extended period to develop (e.g. elevated blood pressure). Unlike the prospective design, an inherent time and resource burden is generally lifted due to the nature of retrospective review.
Like its prospective counterpart, causal inference is challenging due to the possibility of confounding variables, as well as the potential for missing or inaccurate data. By performing retrospective analysis, the investigator has generally relinquished control over sampling methods and the quality of the predictor variables included. Like the prospective cohort, these designs will generally be useful only for pilot collections and are largely unable to properly measure an intervention. Gaugler et al.  completed a retrospective study to determine if the early use of community-based assistance services in ADRD would delay nursing home placement. With available data from more than 4500 caregivers of persons with ADRD, three years of data were used in conjunction with a Cox proportional hazards model with key variables including stress processes, duration, and community-based assistance use  . Authors found that caregivers utilizing support services were more likely to delay nursing home admission, though also highlighted shortcomings of the retrospective design, namely in this case, being unable to determine the frequency of assistance received in early stages  .
Case-control studies are generally more epidemiological in nature, comparing two groups, one with the outcome (or disease/condition) of interest and one without, and following over a prescribed period of time. While this design is relevant for persons with ADRD (e.g. examining risk factors for developing the disease), it is less relevant and efficient for caregiver studies as the exposure is considered the outcome. In a study examining stress and psychological morbidity of caregivers of persons with ADRD, González-Salvador et al. reported a participant mix of 58 caregivers of persons with ADRD and 32 caregivers of non-ADRD persons (control). Standardized measures were used to assess participant symptomology, and psychological morbidity was found to be higher in ADRD caregivers.
Statistical methods employed in cohort studies generally include frequency and effect (or association). This can include measures of risk ratios and relative risk. Statistical methods for case-control studies can include chi-square (2 × 2) analysis, the Mantel-Haenszel statistic (for effect modification), Fisher’s exact test (if cell counts are <5), logistic regression (to adjust for confounders), odds ratios, and relative risk. In the Yaffe et al. prospective cohort study examining the characteristics and nursing home placement for persons with ADRD and their caregivers, authors calculated Kaplan-Meier estimates for nursing home placement. Additionally, they used Cox proportional hazards models to isolate variables independently associated with time to nursing home placement.
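To make the Kaplan-Meier estimate mentioned above more concrete, the following is a minimal sketch of the product-limit calculation in plain Python. The function name and the follow-up data are hypothetical and are not taken from Yaffe et al.; an actual analysis would normally rely on an established survival analysis package and would report confidence intervals.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  -- follow-up duration for each participant
    events -- 1 if the event (e.g. nursing home placement) was observed,
              0 if the participant was censored at that time
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    event_counts = Counter(t for t, e in zip(times, events) if e)
    exit_counts = Counter(times)  # events plus censorings at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            # Product-limit step: multiply by the share surviving past time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exit_counts[t]
    return curve


# Hypothetical months to placement; 0 marks a censored observation.
months = [6, 12, 12, 18, 24, 30]
placed = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(months, placed)
for month, s in curve:
    print(f"month {month}: estimated probability of remaining at home {s:.3f}")
```

Participants censored at a given time are still counted as at risk for events at that same time, which is the usual convention for this estimator.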
2.2. Randomized Controlled Trials (RCTs)
Randomized controlled trials are generally considered the gold standard of trial design, and include both intervention and control (or attention-control) groups. Group assignment is conducted through a randomization procedure, generally block or simple randomization, with the ability to stratify randomization (making treatment groups comparable) and to use minimization (minimizing the imbalance of present prognostic factors). A number of variations exist for RCTs and will be discussed here, along with their potential for application to ADRD caregiving intervention research.
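As an illustration of the randomization procedures described above, the sketch below implements permuted-block randomization within strata. It assumes two arms, a block size of 4, and a hypothetical stratification by caregiver relationship (spouse vs. non-spouse); the function names, seed, and stratum sizes are illustrative choices, not a procedure taken from any cited trial.

```python
import random


def block_randomize(n_participants, arms=("intervention", "control"),
                    block_size=4, rng=None):
    """Assign participants to arms in shuffled blocks so arms stay balanced."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    if rng is None:
        rng = random.Random()
    assignments = []
    while len(assignments) < n_participants:
        # Each block contains every arm equally often, in random order.
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        assignments.extend(block)
    return assignments[:n_participants]


def stratified_block_randomize(strata_sizes, seed=0, **kwargs):
    """Run block randomization separately within each stratum."""
    rng = random.Random(seed)  # fixed seed keeps the allocation list reproducible
    return {stratum: block_randomize(n, rng=rng, **kwargs)
            for stratum, n in strata_sizes.items()}


# Hypothetical strata: 8 spousal and 8 non-spousal caregivers.
allocation = stratified_block_randomize({"spouse": 8, "non-spouse": 8}, seed=42)
for stratum, arms in allocation.items():
    print(stratum, arms)
```

Because each stratum is filled in complete blocks here, both arms end up with the same number of spousal and non-spousal caregivers. Minimization is a different, adaptive procedure and is not shown.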
Superiority trials are perhaps the most common RCTs, targeting superior outcomes from the intervention versus control groups. In a study targeting behavioral symptoms of individuals with dementia, Gitlin et al.  provided up to 11 home and telephone interventions by an occupational therapist for persons with dementia and their caregivers, primarily training caregivers to identify and modify triggers of upset/agitation. This intervention group was then compared with a no-treatment control group.
Non-inferiority RCTs differ from superiority trials in that they seek to determine if the introduced intervention is not inferior to another (current-practice) intervention. For instance, one may hypothesize that, with training, a licensed practical nurse (LPN), or even a certified nursing assistant (CNA), could be equipped to provide the caregiver intervention in the Gitlin et al. study. One may then elect to perform a non-inferiority trial on the basis that the LPN or CNA (randomized) arms may be easier to administer (more availability) or less costly (significantly lower reimbursement rates). While one would not expect the LPN or CNA to do a “better job” than the occupational therapist used in the original intervention, the aim is simply to show a measure of equivalence (not worse than/at least as good). When planning a non-inferiority trial, we define an appropriate level of non-inferiority. This is not a clinically important measure, but a generally small difference by which we gauge if the new intervention is worse than the standard. The measure will generally be the difference between the two means, two proportions, or two survival rates.
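One common way to apply a non-inferiority margin is to compare it with the lower confidence bound of the difference between arms. The sketch below assumes a difference in means, a large-sample normal approximation, a one-sided alpha of 0.025, and an outcome on which higher scores are better; the function name, margin, and summary statistics are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def noninferiority_by_ci(mean_new, mean_std, sd_new, sd_std,
                         n_new, n_std, margin, alpha=0.025):
    """Check non-inferiority of the new arm on a higher-is-better outcome.

    The new intervention is declared non-inferior when the lower one-sided
    confidence bound for (new - standard) lies above -margin.
    """
    diff = mean_new - mean_std
    se = sqrt(sd_new ** 2 / n_new + sd_std ** 2 / n_std)
    z = NormalDist().inv_cdf(1 - alpha)
    lower_bound = diff - z * se
    return diff, lower_bound, lower_bound > -margin


# Hypothetical summary data: CNA-delivered arm vs. therapist-delivered arm.
diff, lower_bound, non_inferior = noninferiority_by_ci(
    mean_new=49.0, mean_std=50.0, sd_new=10.0, sd_std=10.0,
    n_new=100, n_std=100, margin=5.0)
print(f"difference {diff:.2f}, lower bound {lower_bound:.2f}, "
      f"non-inferior: {non_inferior}")
```

In this hypothetical case the new arm scores slightly lower on average, yet the lower bound stays inside the margin, so the data would be compatible with non-inferiority.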
Equivalence RCTs are used when it is not expected that the new therapy is superior to the standard, but may have some alternate advantages over the standard therapy. Equivalence trials are perhaps more difficult to apply to interventions for caregivers of persons with ADRD, though they should not be completely ignored, as they may be an option for trials involving intensive or invasive interventions with caregivers. Namely, this trial is intended to show the new therapy is not excessively negatively impacting the primary outcome. Unlike the non-inferiority trial, seeking “not worse than” status, we will rather seek the “same” or equivalent. When planning an equivalence trial, an equivalence region is selected, generally a range within which the two therapies will be considered equivalent (clinically). A two-sided hypothesis test focuses on the upper limit of what is considered clinically acceptable as the new treatment is generally not considered to be superior to the standard arm and is only being tested due to its favorable characteristics.
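An equivalence region can be tested in several ways. One widely used option, which the text above does not name and which is offered here only as an assumed technique, is the two one-sided tests (TOST) procedure. The sketch uses a normal approximation; the function name, the region of -5 to 5 points, and the estimates are hypothetical.

```python
from statistics import NormalDist


def tost_equivalence(diff, se, lower_limit, upper_limit, alpha=0.05):
    """Two one-sided tests for equivalence using a normal approximation.

    diff        -- estimated difference (new - standard)
    se          -- standard error of that difference
    lower_limit -- lower edge of the equivalence region
    upper_limit -- upper edge of the equivalence region
    """
    if se <= 0 or lower_limit >= upper_limit:
        raise ValueError("need se > 0 and lower_limit < upper_limit")
    normal = NormalDist()
    # Test 1: reject "diff <= lower_limit" when diff is far above the lower edge.
    p_lower = 1 - normal.cdf((diff - lower_limit) / se)
    # Test 2: reject "diff >= upper_limit" when diff is far below the upper edge.
    p_upper = normal.cdf((diff - upper_limit) / se)
    p_value = max(p_lower, p_upper)
    return p_value, p_value < alpha


# Hypothetical estimate: difference of -1.0 with a standard error of about 1.41.
p_value, equivalent = tost_equivalence(diff=-1.0, se=2 ** 0.5,
                                       lower_limit=-5.0, upper_limit=5.0)
print(f"TOST p-value {p_value:.4f}, equivalent: {equivalent}")
```

Equivalence is concluded only when both one-sided tests reject, which is why the larger of the two p-values is reported.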
Additional characteristics impact RCTs, particularly related to blinding of participants and investigators. The double-blind RCT provides the strongest type of evidence and involves neither the patient nor the investigator being privy to treatment arm assignment. This type of trial would be particularly difficult to implement with caregivers of persons with ADRD when performed outside of the pharmaceutical industry, as caregivers generally have the cognitive faculty to identify the treatments or instructions being received, and interventionists are able to “guess” which treatment arm is being provided. A possible approach, similar to the Rogers et al. double-blind placebo-controlled trial of donepezil (a pharmaceutical intervention intended to address cognition and behavior) for persons with dementia, could include testing caregiver quality of life while their care recipient is using donepezil or placebo. Another could test a pharmaceutical approach to caregiver strain through prescription of Prozac (or similar) versus placebo, though such studies may be rife with ethical complications and human subject protection considerations.
Single-blind RCTs may generally be the highest level of evidence available for caregiver interventions. An example of a single-blind trial for caregivers of persons with ADRD is the Gitlin et al. non-pharmacologic approach to managing challenging behaviors of Veterans with dementia. This trial includes a blinded assessor who is not privy to the assignment of Veteran/caregiver dyads receiving either an in-person behavioral intervention or a telephone (attention-control) education intervention. In this trial, it would not be feasible to blind caregivers to the intervention received, but through blinding the assessor, many concerns regarding bias can be controlled.
Non-blinded (open) RCTs, when used, are often chosen out of necessity. Constraints on research team size, funding, and design may sometimes be the cause. Observer bias can become a significant issue in these trials, and it may often be prevented with the use of a blinded assessor, if at all possible.
Cross-over trials are a unique brand of RCT that involve the use of each participant serving as their own control. This allows participants to receive both interventions, thereby decreasing the number of needed participants and perhaps alleviating some ethical concerns. Participants are still randomized as they will receive therapy A and B, at different periods (e.g. AB or BA). Cross-over trials include a necessary component of “washout”, or period between interventions. While pharmaceutical interventions may be the most practical means of determining the washout period due to chemical properties and half-lives, more thought and a reasonable hypothesis will be needed for behavioral and educational interventions. This is particularly important for trials with caregivers of persons with ADRD as it has been suggested washouts are not possible for learned interventions, making the cross-over trial unfeasible  .
Cross-over trials are beneficial, in that within-subject variability would generally be minimal when compared with between-subject variability (in a traditional RCT); this also helps with sample size numbers. While it may be conceivably possible to perform a cross-over trial for caregivers of persons with ADRD, a strong evidence-based hypothesis will be needed to justify the washout period. Conceivably, one may be challenged that if the intervention is worthwhile, perhaps defined as one that provides a lasting benefit to caregivers, a substantive washout period would be impossible. Additionally, with progressive dementias, it cannot be assumed caregivers of persons with ADRD will return to their pre-treatment state to provide equitable comparison between therapies. Finally, attrition rates can be high, and more damaging, as without the second trial, the first is largely unusable for statistical analysis.
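The sample size advantage described above comes from analyzing within-participant differences. The following sketch contrasts the standard error of a paired (within-subject) comparison with that of an unpaired (between-subject) comparison on the same scores. The data are hypothetical caregiver burden scores, and the sketch deliberately ignores period and carry-over effects, which a real cross-over analysis would need to address.

```python
from math import sqrt
from statistics import mean, stdev


def paired_vs_unpaired_se(scores_a, scores_b):
    """Compare standard errors of the A-B difference under two analyses."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least 2 scores")
    n = len(scores_a)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    # Paired analysis: variability of each participant's own difference.
    se_paired = stdev(diffs) / sqrt(n)
    # Unpaired analysis: treats the two sets of scores as independent groups.
    se_unpaired = sqrt(stdev(scores_a) ** 2 / n + stdev(scores_b) ** 2 / n)
    return mean(diffs), se_paired, se_unpaired


# Hypothetical scores for five caregivers under therapy A and therapy B.
therapy_a = [20, 30, 40, 50, 60]
therapy_b = [18, 27, 38, 46, 57]
mean_diff, se_paired, se_unpaired = paired_vs_unpaired_se(therapy_a, therapy_b)
print(f"mean difference {mean_diff:.2f}, paired SE {se_paired:.2f}, "
      f"unpaired SE {se_unpaired:.2f}")
```

When participants differ widely from one another but respond consistently to each therapy, as in these made-up scores, the paired standard error is far smaller, which is the source of the efficiency gain.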
Cluster randomized (group allocation) designs are particularly useful to use in multi-center trials and are often used in children’s oncology trials. If used for caregiving, they may be best suited for interventions that can be targeted for groups of caregivers, rather than individuals. In a cluster randomized trial, specific locations will be randomized, and all caregivers at that location would take part in the same intervention. This is a particularly viable method to test therapeutic group designs with caregivers of persons with ADRD, and allows for a feasible assessment of intervention effectiveness. A somewhat similarly designed trial by Dröes et al.  studied the effectiveness of meeting center support programs for caregivers of persons with dementia. Although it was not a perfect replica, as “site randomization” did not occur, it allowed the investigator to compare centers across the country using 2 separate programs for caregiver support, via pre and post-test.
Cluster randomized designs are often beneficial for recruitment purposes as physicians or others are not faced with ethical decisions regarding treatment assignment. Additionally, the potential for unblinding, or crossover, is significantly limited as only one intervention is taking place within the facility. It would be particularly important to control for reporting bias through utilizing a blinded data collector in a cluster randomized trial. Cluster randomized designs generally require a high sample size (due to power requirements) and can be particularly expensive to perform.
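The larger sample size requirement is commonly quantified with the design effect, 1 + (m - 1) × ICC, where m is the average cluster size and ICC is the intracluster correlation coefficient. The text above does not state this formula; it is a standard approximation introduced here for illustration, and the cluster size, ICC, and starting sample size are hypothetical.

```python
from math import ceil


def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation from randomizing clusters instead of individuals."""
    if cluster_size < 1 or not 0 <= icc <= 1:
        raise ValueError("need cluster_size >= 1 and 0 <= icc <= 1")
    return 1 + (cluster_size - 1) * icc


def inflate_sample_size(n_individual: int, cluster_size: float, icc: float) -> int:
    """Total participants needed once clustering is taken into account."""
    return ceil(n_individual * design_effect(cluster_size, icc))


# Hypothetical: 128 caregivers would suffice under individual randomization;
# sites enroll about 20 caregivers each and the assumed ICC is 0.05.
deff = design_effect(cluster_size=20, icc=0.05)
total_needed = inflate_sample_size(n_individual=128, cluster_size=20, icc=0.05)
print(f"design effect {deff:.2f}, total caregivers needed {total_needed}")
```

Even a modest ICC nearly doubles the required sample in this example, which helps explain the cost of cluster randomized trials.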
A number of benefits are associated with RCTs. Randomization inherently removes the bias associated with the assignment of participants to the intervention or control arms. This is important for caregiver research, as it prevents investigators from selecting the youngest, healthiest, most well-adjusted caregivers for their preferred intervention arm, resulting in evident bias of assignment. RCTs, through randomization and stratification, ensure comparable groups. This is additionally important in caregiver research, as caregiver “type” or relationship status of the caregiver (spouse vs. non-spouse) has been found to impact a number of important study characteristics including treatment process, outcomes, and attrition rate. The validity of the statistical tests of significance is additionally assured through the use of RCTs.
Despite the advantages, ethical issues can limit the feasibility of an RCT. The introduced intervention may be considered superior to the standard therapy or control (at least in the eyes of the designer). RCTs generally dictate that half of the participants will not receive this novel/promising therapy, and clinical equipoise may sometimes be an issue. While this may not be as egregious a concern when designing trials for caregivers of persons with ADRD, it still must be considered. One way to partially alleviate ethical concerns is to include, as Gitlin et al. did, an attention-control group that receives some form of beneficial intervention as the alternate arm of the study.
Statistical reporting for RCTs is generally guided by the CONSORT statement, describing key statistical elements for RCTs. This guidance highlights the use of sample size calculations (power analysis), intention to treat analysis, reporting of effect size and precision, and “addressing the effects of multiple analyses on trial findings”  . Multiple equations exist for sample size calculations incorporating the desired level of confidence (e.g. 95%), the population proportion, and the margin of error. Caution is generally given to these equations as they derive the minimal number of subjects needed for the proposed analysis and do not account for attrition. Sample size estimates may also be carried over from previous studies, or pilot data, though caution should be used when doing so  .
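One of the equations alluded to above, using the confidence level, an anticipated population proportion, and a margin of error, is n = z² p(1 - p) / e². The sketch below applies it and then inflates the result for anticipated attrition. The choice of this particular formula, the 20% attrition rate, and the function names are assumptions made for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(confidence: float, proportion: float,
                               margin_of_error: float) -> int:
    """Minimum n to estimate a proportion within +/- margin_of_error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * proportion * (1 - proportion) / margin_of_error ** 2)


def inflate_for_attrition(n: int, attrition_rate: float) -> int:
    """Enroll enough participants that n remain after the expected dropout."""
    if not 0 <= attrition_rate < 1:
        raise ValueError("attrition_rate must be in [0, 1)")
    return ceil(n / (1 - attrition_rate))


# 95% confidence, proportion 0.5 (the most conservative choice), margin 0.05.
n_minimum = sample_size_for_proportion(0.95, 0.5, 0.05)
# Hypothetical 20% attrition over the course of the study.
n_to_enroll = inflate_for_attrition(n_minimum, 0.20)
print(f"minimum analyzable sample {n_minimum}, enroll {n_to_enroll}")
```

The second step reflects the caution in the text: the formula gives the minimum number needed for analysis, so recruitment targets should be set higher when dropout is expected.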
When comparing treatment arms, Gitlin et al.  proposed performing sample size calculations based on 80% power for their two-sided alternate hypothesis using a t-test to compare the two treatment groups at the first collection period. Intention to treat analysis is performed in RCTs due to the need to avoid impact of attrition or crossover. Gitlin et al.  also used intention to treat analysis in their study to address issues of noncompliance, protocol deviations, and withdrawal and/or death.
Reporting of effect sizes allows for the description of the magnitude of the treatment effect. Effect sizes differ from significance tests in that they focus on the meaning of the results and allow for adequate comparisons in future analyses. Pearson’s correlation (r), used for paired quantitative data, is one of many methods for describing effect size. In line with Cohen’s guidelines for the social sciences, Gitlin et al. sought a medium effect size (0.50) in their study. Statistical significance is generally reported through a p-value in the context of a null hypothesis. In the Gitlin et al. study, the primary hypothesis would be considered significant at a type I error rate of 0.05.
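The quantities discussed above (80% power, a two-sided type I error rate of 0.05, and a medium effect size of 0.50) can be combined into an approximate per-group sample size. The sketch assumes the effect size is a standardized mean difference (Cohen's d) and uses the normal approximation n = 2(z₁₋α/₂ + z₁₋β)² / d². It is not the calculation reported by Gitlin et al., and an exact t-test calculation would give a slightly larger number.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05,
                power: float = 0.80) -> int:
    """Approximate participants per arm for a two-sided, two-group comparison."""
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


medium = n_per_group(effect_size=0.50)
small = n_per_group(effect_size=0.20)
print(f"per group: medium effect {medium}, small effect {small}")
```

Halving the detectable effect size roughly quadruples the required sample, which is why the choice of effect size deserves careful justification.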
Effects of multiple analyses on trial findings can be quite concerning, and are particularly possible with data in the field of public health. When large sets of data are available with numerous variables and outcomes, data dredging can result in false positives that are later reported as statistically significant. Thoughtful trial design and statistical consultation aid in the prevention of dredging. Peer review can also help prevent the possible dissemination of misinformation or findings incorrectly labeled as “statistically significant”.
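One way to guard against false positives from multiple analyses is to adjust the significance threshold. The sketch below shows two standard adjustments, Bonferroni and Holm, neither of which is named in the text above; they are offered as illustrative options, and the p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject only p-values at or below alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm step-down procedure: less conservative than Bonferroni."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The threshold relaxes as smaller p-values are rejected in turn.
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values also fail
    return reject


# Hypothetical p-values from four secondary outcomes in one trial.
p_values = [0.001, 0.015, 0.02, 0.20]
unadjusted = [p <= 0.05 for p in p_values]
bonferroni_result = bonferroni(p_values)
holm_result = holm(p_values)
print("unadjusted:", unadjusted)
print("Bonferroni:", bonferroni_result)
print("Holm:      ", holm_result)
```

Both procedures control the chance of any false positive across the family of tests; which outcomes belong to that family is best decided before the data are examined.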
Qualitative research generally answers quite different questions from those addressed in quantitative designs. Qualitative research will not address questions like “how many?”, “what are the causes?”, or “what is the strength of the relationship?” but rather attempt to generate understanding through social processes  . Through increased process understanding, a hypothesis may be generated and tested further  .
Qualitative research is particularly useful in early stages of clinical trial design to identify missing aims, variables, or needed collection intervals. Qualitative research may also assist with questionnaire and test development. Qualitative research can assist with a patient-centered design to ensure the right question is being asked and the right intervention is being tested. Many trials testing interventions to support caregivers of persons with ADRD maintain both quantitative and qualitative (mixed-methods) components to enrich the data being collected, though to do so, compromises must be made on both sides.
In a study comparing African American and Hispanic caregivers of persons with ADRD, a mixed-method design was used to harness information regarding depression, burden, behaviors, activities of daily living (ADL) scores, health status, availability of informal supports, and use of formal services  . While some of the measures were generally quantitative (Center for Epidemiologic Studies Depression Scale, Zarit burden questionnaire for perceived burden, and behavior checklist to identify difficult behaviors), other measures, including caregiver health status and the availability of informal and formal support involved a qualitative collection. This collection then leads to a thematic data analysis. For expansive amounts of data (dictated interviews, high number of participants and/or questions), NVivo  (or similar) software can be used to analyze the content.
A significant benefit of qualitative designs is the availability of open-ended questions. These questions are most likely to produce responses that are meaningful and culturally salient to the participant  . Researcher anticipation (pre-judgment) of participant answers is lessened if a structured interview has taken place. Qualitative data is inherently rich and explanatory in nature and allows for flexibility and probing, if more information is needed.
Despite the advantages naturally tied to qualitative designs, assumptions from the data being collected, generally from a fairly small group of participants, can only be generalized to that specific group of participants. Because of this, it can be difficult to make systematic comparisons between different populations. Qualitative collections are additionally dependent on the skills of the data collector, possibly creating a concern regarding inter-rater reliability, though not an insurmountable one. While this same criticism could be leveled at more quantitative designs, a much greater risk exists with qualitative collection as it is often dependent on free-text data, and bias can be easily introduced by the recorder, more so than can be done with a checklist or “yes/no” questions. There are a number of ways to address bias in qualitative collections. Measuring inter-rater reliability (consensus between data collectors), creating a design that is not particularly concerned with bias (or where bias minimally impacts the results), and defining explicit bias are all possibilities. Collecting statistical data is not as convenient in qualitative models, though this can be addressed through the use of mixed-methods, rather than exclusive reliance on the qualitative collection.
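Inter-rater reliability can be quantified in several ways. A common choice for two coders assigning categorical codes is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The text above does not prescribe this statistic, so it is shown only as one assumed option; the codes and coder labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters coding the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected if each rater assigned codes independently at random
    # according to their own marginal frequencies.
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical thematic codes assigned to ten interview excerpts.
coder_1 = ["strain", "strain", "support", "support", "strain",
           "support", "strain", "support", "strain", "support"]
coder_2 = ["strain", "support", "support", "strain", "strain",
           "support", "strain", "support", "strain", "support"]
kappa = cohens_kappa(coder_1, coder_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

Here the coders agree on 8 of 10 excerpts, but half of that agreement would be expected by chance alone, so kappa is noticeably lower than the raw 80% agreement.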
Analytical methods employed for qualitative designs are largely dependent on the organization or scaling of collected data. Much demographic data and sample description can be provided through means, medians, ranges, and standard deviations. Post-coding, if data can be transformed to ordinal or interval scales, could include statistical procedures such as the Chi-square test for independence (paired observation for two variables) and odds ratios (to measure strengths of associations), along with an unlimited number of techniques, provided the data can be transformed accordingly. Shaji et al. performed a qualitative study to increase the knowledge base regarding care arrangements and strain experienced by caregivers of persons with ADRD. Statistical procedures for methods and reporting results were displayed through frequencies and vignettes.
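Once coded responses are reduced to two binary variables, the Chi-square test for independence and the odds ratio mentioned above can be computed directly from the 2 × 2 table. The sketch uses the Pearson statistic without continuity correction and the closed-form p-value for one degree of freedom; the counts and variable labels are hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction).

    Table layout:   outcome yes   outcome no
        group 1          a            b
        group 2          c            d
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


def odds_ratio(a, b, c, d):
    """Odds of the outcome in group 1 relative to group 2."""
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is 0")
    return (a * d) / (b * c)


# Hypothetical coding: theme "high strain" present (yes/no) by caregiver type.
chi2, p_value = chi_square_2x2(30, 20, 15, 35)
or_estimate = odds_ratio(30, 20, 15, 35)
print(f"chi-square {chi2:.2f}, p {p_value:.4f}, odds ratio {or_estimate:.2f}")
```

With small expected cell counts, Fisher's exact test would generally be preferred over this approximation.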
As highlighted, RCTs are considered the gold standard of research, but are not always possible due to ethical issues, sample availability, and other constraints. For the purposes of this paper, a quasi-experimental design is defined as a study that is missing something (e.g. randomization, a control, etc.). This section may at first seem incomplete because it avoids repeating designs discussed previously (non-blinded RCTs, cross-over trials) that would generally be considered quasi-experimental were it not for the randomization that occurred. While meaningful quasi-experiments are present in the literature, researchers will need to keep internal validity in mind when drawing conclusions from the reported data.
Nonrandomized concurrent control studies can involve participants being divided into standard treatment groups and a novel intervention group. Participants would generally receive interventions over the same period of time, though the primary difference would be in the lack of randomization. This could become a possibility, particularly in a multi-center trial, for studies with caregivers of persons with ADRD. For example, if faced with limited resources for future trials, it may be feasible to compare a tailored behavioral intervention, like the Gitlin et al.  intervention with a standard intervention in an alternate (or similar) geographical area, as discussed by Schulz & Martire  , including support groups, individual counseling, or educational approaches.
Nonrandomized studies can alleviate ethical concerns regarding controls, or attention-controls receiving less efficacious therapies. Lack of randomization may also aid in subject recruitment as participants are not placed in a situation where they do not know what their intervention will be. Anytime randomization is removed, inherent problems exist. Control and intervention groups generally will not be comparable due to a number of extraneous prognostic factors. Additionally, it may be difficult to identify real differences between therapies, and if differences do exist, they may be due to differences in alternate baseline factors, including selection bias.
Single arm designs are another option when faced with time, financial, and ethical concerns. These trials have just one arm, an intervention, which is compared to the pre-set/defined outcome of previous trial/historical controls. This may be an option for trials with caregivers of persons with ADRD, particularly in pilot exploration (pre Phase III), or perhaps in place of a randomized Phase III when participant numbers are insufficient to complete a robust clinical trial. Single arm studies are beneficial for their time and cost-saving characteristics. Provided a reasonable hypothesis and trial design exist, many inherent biases may be avoided.
Despite their benefits, single arm studies can produce biased results in healthcare settings. Numerous confounders can unintentionally be left uncontrolled due to the comparison with a historical control. Perhaps the healthcare system has changed over time (improved or decreased quality), or there have been changes in disease classification or clinician/investigator differences. There may also be varying accuracy and completeness in the historical control, making statistical comparison more challenging. As it is difficult to quantify the many confounders, these studies will rightfully face scientific challenge.
Pretest-posttest designs are often used in behavioral research and are generally used to compare groups or measure change resulting from an intervention. Pretest-posttest designs may be randomized or nonrandomized. These designs may be particularly useful when faced with the same (time, financial, ethical) constraints that justify single arm studies. Pretest-posttest designs may be especially applicable to caregiver educational or group interventions, comparing a group of caregivers who received the intervention to a control (or no control).
Pretest-posttest designs are beneficial in that they provide a relatively simple measurement of change (e.g. impact of the intervention) between groups. Maturation and history can influence these designs, as psychological and biological characteristics of the participant can change over time, and will not be recognized in the context of the intervention. For instance, death of a close family member may impact caregiver distress scores despite the intervention received.
Statistical methods for nonrandomized trials are not entirely different from those for randomized trials, in that effect size and precision should still be reported (though cautiously), and sample size and power calculations can still be performed. Additionally, control variables must be used, requiring more complex analysis, and the results are more difficult to interpret. In a nonrandomized study with caregivers and persons with dementia involving a mental health service intervention, Woods et al. used Fisher’s exact test to improve chances of group compatibility. They additionally used multiple linear regression to identify variables predicting general health questionnaire (primary outcome measure) scores. Authors also used frequencies to describe their demographics and precision scores to describe study significance.
This paper was developed to provide an overview of research designs and considerations, along with examples from trials with caregivers of persons with Alzheimer’s disease and related dementias. Trial designs including basic science, randomized controlled, qualitative, and quasi-experimental have been highlighted.
Basic science designs are typically helpful in pilot stages, or when basic mechanisms are not completely understood. They can be less helpful when seeking definitive answers for behavioral symptomology. Randomized controlled trials were described as the gold standard in trial design as they impose random assignment of treatment, as well as a control, designed to provide an equitable sample for comparison. They can be limited for ethical reasons, when it is believed one treatment is superior to another, and it can be difficult, particularly in caregiving trials, to ensure the control is not receiving some sort of therapy outside of the designed intervention. Qualitative designs were identified as particularly useful in early stages of research, and a significant benefit is the ability to use open-ended questions to produce culturally relevant and salient/meaningful responses. Qualitative designs are generally limited by the manpower needed to conduct them, resulting in small sample sizes and a general inability to compare between populations. Quasi-experimental designs were described as useful when randomized controlled trials are not possible or warranted. They have generally been described as trials that are “missing something”, but have often been conducted as such due to ethical, financial, and timing concerns. Potential for bias is perhaps the biggest point of concern in quasi-experimental designs, as confounders may be left uncontrolled due to the lack of randomization or the reliance on a historical control.
It should be noted that all studies involving recruitment inherently possess volunteer bias, or the recognition that the population participating in a clinical trial is not necessarily representative of the population as a whole. Careful pre-trial consideration is warranted in the design phase to ensure that study results are transferable outside of the study sample and clinically meaningful in a field of increasing importance such as Alzheimer’s disease and related disorders. Although randomized controlled designs carry the “gold standard” adjective, a poorly designed RCT may result in less benefit to the field than a well-designed nonrandomized quasi design.
This paper was developed, in part, from the author’s doctoral qualification exam. Acknowledgements include his doctoral dissertation committee members: Dr. William C. Mann, Dr. John Kairalla, Dr. Mary Ellen Young, Dr. Orit Shechtman, and Dr. Jamie Pomeranz.