Estimating causal treatment effects from observational studies has obvious limitations and challenges. However, a randomized controlled trial is often not an option, for practical or ethical reasons. If the scientific question of interest really is a causal one, the analysis should target that question, even at the price of strong assumptions. In most cases, a scientist relates more easily to subject-specific assumptions than to an association that requires few assumptions but supports no causal statement, and which can be misleading in both magnitude and direction. Causal methods are more explicit about the assumptions needed for a causal interpretation, and are often more robust to certain types of bias, even though they are just as susceptible to, e.g., unmeasured confounding as more traditional methods.
The short-term effect of medication for treatment of ADHD is well documented. However, questions about long-term effects are less resolved, and more long-term prospective studies on treatment of adult ADHD patients with no prior medication, in a practical clinical setting, are warranted. Selection bias, high drop-out rates and limited reports on side-effects characterize the few that have been conducted.
To estimate the causal effect of medication, an ideal study design would compare two arms in a trial: an active arm on medication and a control arm without medication. For ADHD, however, present knowledge makes inclusion in the control arm questionable. Also, if all eligible patients are offered treatment, more patients are likely to participate. A research design where all patients are offered medication and follow-up assessments resembles clinical practice, with findings that apply to the population of interest. Causal effects can still be estimated under additional assumptions.
If all patients start on medication, and those who experience intolerance or a high number of side-effects terminate the treatment, there will be patients both on and off medication. A direct comparison of these groups yields a biased estimate of the causal treatment effect, due to feedback between the treatment assignment process and the outcome, e.g. symptoms. If treatment decreases subsequent symptoms on the one hand, but on the other hand increases side-effects and intolerance, which in turn lead to termination of treatment, one might expect the direct comparison between those on and off medication to overestimate the true treatment effect. Bias in the opposite direction can be expected due to time-varying confounding. Prior medication likely improves subsequent symptoms, which in turn confound the association between continued medication and final symptoms. The positive association between prior symptoms and both continued medication and final symptoms will weaken the negative association between continued medication and final symptoms, and result in underestimation of the true treatment effect.
A marginal structural model (MSM) with inverse probability weighting (IPW) can account for such feedback and give an unbiased estimate of the treatment effect, under specific assumptions. In spite of its popularity in a wide range of fields, the MSM also has limitations. For example, it is sensitive to misspecification of the treatment assignment model used for the weights. Fitting an MSM can give an overly low estimated probability of treatment for certain covariate combinations, which yields a high weight in the analysis (the inverse of the probability). This high weight propagates through time, since the weight in a later time period is the cumulative product of the weights from earlier periods. Imai and Ratkovic (I&R) have improved the robustness of weighting methods such as the MSM by introducing the covariate balancing propensity score (CBPS) methodology, first for cross-sectional and later for longitudinal data.
In this study, a re-analysis of longitudinal follow-up data on medication history and symptoms over one year for adult ADHD patients is presented. An ordinary longitudinal analysis of the same data has been published previously, with limited focus on the causal treatment effect and little attempt to account for the time-varying confounding mentioned above. The present analysis fits an ordinary MSM and compares it with the refinement of improved covariate balance (CBMSM), to estimate the causal effect of treatment on symptoms at one year follow-up.
The study sample, the measures used, and the formulation of the MSM and CBMSM, including the assumptions necessary for causal interpretation, are described in the Method section. In Results, the findings from the CBMSM method are presented and contrasted with the ordinarily fitted MSM. The Discussion section relates the results to the previously published analysis, lists some strengths and limitations of the methods, and interprets the results.
2.1. Sample and Treatment Schedule
Patients were included at a specialized outpatient clinic in Vestfold, Norway, between May 2009 and December 2010. Referred patients were aged 18 - 60, had to fulfill DSM-5 criteria for ADHD, and had no experience with any ADHD medication in adulthood. A total of 250 patients were included. Relevant baseline characteristics were: 129 were female (52%), 140 were unemployed or work disabled (56%), 188 had at least one comorbid disorder (75%), 156 used concomitant non-ADHD medication (69%), mean age was 32.5 years (SD = 9.8), mean body weight was 77.0 kg (SD = 16.8), and mean number of years of education was 11 (SD = 2.3). By DSM-IV subgroups of adult ADHD, 97 patients were categorized as ADHD-inattentive, 113 as ADHD-combined, 17 as ADHD-hyperactive, and 23 as ADHD-residual. A more complete description of the sample can be found in a recent publication.
All patients received methylphenidate as first-line medication and psychosocial treatment according to the national treatment guidelines (Norwegian Directorate of Health, 2005). Patients were assessed for symptoms, functioning and side-effects at scheduled follow-up visits at baseline, 6 weeks, 12 weeks, 26 weeks and 52 weeks. Standard titration with immediate-release methylphenidate (MPH-IR) was prescribed for the first six weeks: 5 mg three times a day and, if tolerated, a stepwise increase to a maximum of 60 mg/day. Thereafter a flexible dose titration was applied to optimize efficacy (maximum 120 mg/day). A switch to extended-release methylphenidate (MPH-ER) was offered at the three-month visit if patients reported difficulties with compliance, annoying fluctuations in effect, or otherwise wanted to try an easier administration form. If MPH was not tolerated or was ineffective, second-line medications were short-acting dextroamphetamine (dAMP) or atomoxetine (ATX). The dose of dAMP was escalated to a maximum of 50 mg/day, and the dose of ATX to a maximum of 120 mg/day.
2.2.1. Predictive Baseline Characteristics
Baseline characteristics of the sample have been thoroughly described previously, and comprise demographics, measures of functioning, mental distress, ADHD symptom level, and comorbid anxiety, depression, bipolar disorder, and drug and alcohol use disorders. To be diagnosed with ADHD, patients must retrospectively have endorsed at least 6 out of 9 DSM-IV symptoms of inattention and/or hyperactivity/impulsivity in childhood, and currently have been assessed by two board-certified psychiatrists, using the Norwegian version of the Diagnostic Interview for ADHD in Adults, second edition (DIVA 2.0), as having at least 5 out of 9 DSM-IV symptoms of inattention and/or hyperactivity/impulsivity for the last 6 months (according to the diagnostic threshold for ADHD in adults in DSM-5), with their ADHD symptoms related to significant impairment in social, academic, or occupational functioning. Comorbid psychiatric disorders were assessed with the MINI International Neuropsychiatric Interview Plus (M.I.N.I.-Plus). The time-fixed covariates selected for the (time-varying) regression model for treatment assignment were baseline side-effect level (predefined), measured by the mean score of the Canadian Attention Deficit Hyperactivity Disorder Resource Alliance (CADDRA) patient and ADHD medication form (8.36) for indicating side effects, alcohol use disorder (dichotomous), any anxiety disorder (dichotomous), baseline psychosocial function by the Global Assessment of Functioning (GAF), symptom level (GAF-S), and age at baseline.
2.2.2. Time-Varying Covariates and Outcome
The primary outcome measure was current ADHD symptoms on the 18-item Adult ADHD Self-Report Scale version 1.1 (ASRS), Norwegian version. A continuous scoring method was used: the frequency of ADHD symptoms present since the last visit was self-rated on a 5-point scale (0 - 4) for each item, giving a sum score from 0 to 72 points. Cronbach’s alpha was 0.86.
Two psychiatrists, not involved in the treatment, assessed overall psychosocial functioning (last two weeks) with the Global Assessment of Functioning (GAF) Scale, using the split version with symptom (GAF-S) and function (GAF-F) scores to improve reliability. The intra-class correlation coefficient between the raters was 0.83 for GAF-S and 0.79 for GAF-F in the pilot.
Level of mental distress over the last week was self-rated on 90 items on a 5-point scale, and the mean of all items is referred to as the Global Severity Index (GSI).
Side-effects (Mean Side Effects, MSE) were quantified by a measure of tolerability, the Patient and ADHD medication form. The questionnaire lists symptoms frequently associated with stimulant treatment, and each item is scored by frequency (score 0 - 3).
Dose is a time-varying covariate and expresses daily dose in mg (or dose equivalent) per day.
Medication (MED) is a time-varying dichotomous indicator (1/0) of whether on medication or not, at the time of assessment.
2.3. Estimation of Causal Effects
Standard notation in the causal inference literature makes use of “counterfactual outcomes”. A counterfactual, or potential outcome, is an outcome for a hypothetical treatment regime. There are general (non-parametric) assumptions necessary to estimate causal effects from observational data. Consistency means that a person’s counterfactual outcome under his/her observed treatment history is precisely his/her observed outcome. Exchangeability means that the treatment groups are representative of each other. In a randomized controlled trial (RCT), exchangeability is guaranteed by design, and implies that if the treatment group had been untreated, they would have responded similarly to the control group. In observational data, exchangeability is unrealistic, so instead one hopes for conditional exchangeability (exchangeability within levels of confounding covariates), often called “no unmeasured confounding”. Positivity means a positive probability for all levels of treatment, in all levels of measured covariates. In other words, with treatment = yes/no, positivity requires that there are both treated and untreated persons for all combinations of the confounding covariates. Correct model specification is necessary for all models, also in causal inference, both for the outcome model and for the models for the weights in the present analysis.
These assumptions are helpful for interpreting results, but are not testable from data, although some indication of non-positivity or misspecification is given by the weight distribution in weighting methods like the MSM. A mean of the “stabilized weights” far from 1 indicates non-positivity or model misspecification. If conditional exchangeability is satisfied, an average causal effect in the population can be estimated by combining stratified treatment effects within levels of confounding covariates, or by regression models. However, this becomes infeasible with a high number of covariates, some of which may be continuous. With many strata and limited sample size (or severe differences between treatment groups), some strata are bound to lack a treatment group. This non-positivity is not easily detectable, and leads to extrapolation in a regression model. In light of this, propensity score (PS) methods have become popular. The PS is the probability of being treated, conditional on all confounding covariates. It is a balancing score, a univariate measure that summarizes all confounding covariates. To adjust for confounding, it suffices to stratify/condition on the PS instead of all covariates, which avoids extrapolation and allows for diagnosing differences between the treatment groups. If the treatment groups have little overlap in the PS distribution, conditional exchangeability is questionable and causal inference might be flawed. The weights in the MSM are functions of the PS.
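The stratum-level positivity problem described above can be illustrated with a minimal sketch. The function name and the toy strata are hypothetical and not taken from the study; the code only flags covariate combinations in which one of the two treatment groups is empty.

```python
from collections import defaultdict


def positivity_violations(strata, treated):
    """Return the strata that lack either treated or untreated persons.

    strata:  one hashable covariate combination per person
    treated: one 0/1 treatment indicator per person
    """
    counts = defaultdict(lambda: [0, 0])  # [n_untreated, n_treated]
    for stratum, t in zip(strata, treated):
        counts[stratum][int(bool(t))] += 1
    return sorted(s for s, (n0, n1) in counts.items() if n0 == 0 or n1 == 0)


# Hypothetical data: two covariate combinations, four persons.
strata = [("anx", "young"), ("anx", "young"), ("no_anx", "old"), ("no_anx", "old")]
treated = [1, 0, 1, 1]
violations = positivity_violations(strata, treated)
```

With many covariates the number of strata grows quickly, which is why the text turns to the univariate PS instead of full stratification.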
Exchangeability between treatment groups implies covariate balance: for any covariate, the treatment groups have equal distributions, e.g. equality of weighted means. Conditional exchangeability implies that all confounding covariates are measured and balanced. In an observational study, without knowing whether or not conditional exchangeability is satisfied, checking covariate balance in measured covariates is informative and recommended, even though balance in measured covariates does not guarantee balance in unmeasured confounding covariates. Balance diagnostics for measured covariates are a minimum for unbiased estimation and are routinely reported in PS studies, but are often ignored in the MSM literature, maybe due to the more complex time-varying weights. The MSM has been found to be highly sensitive to misspecification of the treatment assignment model. Rare covariate combinations with a low number of treated result in large weights that dominate the analysis. Recent work of Imai and Ratkovic has focused on automatic improvement in covariate balance for the MSM in a longitudinal setting, to make it more robust. Their method, here denoted covariate balanced MSM (CBMSM), generalizes their covariate balancing propensity score (CBPS) from 2014 to the longitudinal setting, and is available in open-source software as an R package (CBPS) at CRAN. In the present analysis, the causal effect of ADHD medication on symptoms at one year follow-up is estimated, with adjustment for time-varying confounding (MSM), with and without improved covariate balance (CBMSM).
2.4. MSM for ADHD-Symptoms
A standard longitudinal analysis (linear mixed model) of the ASRS symptoms in the present study-sample, has previously been published (Figures 1-3)  . Large individual variation in symptoms over time was found, but rapid average improvement (during first 6 weeks), followed by persistent low level (Figure 1). Different medication histories reflected acute intolerance, continuous medication, treatment termination after some time, and off and on trajectories (Figure 2).
Figure 1. Symptom level (ASRS) at different assessments during one year treatment, in a Norwegian sample of 250 patients with ADHD diagnosed in adulthood.
Figure 2. Examples of different medication histories, intolerance (top left), continuous (top right), short term (bottom left), on/off (bottom right), in a Norwegian sample of 250 patients with ADHD diagnosed in adulthood.
Figure 3. Feedback between treatment (on/off medication each period) and symptoms (ASRS). Boxplot of ASRS distribution for those on/off medication in next period (left panel), and at the end of current period (right panel), in a Norwegian sample of 250 patients with ADHD diagnosed in adulthood.
Feedback between medication and symptoms is illustrated in Figure 3. Receiving medication in time-period j was followed by a lower average symptom level than no medication (Figure 3, right panel). On the other hand, the off-medication group at time-period j + 1 had a history of higher average symptom level at the prior time-period than the on-medication group (Figure 3, left panel). These box-plots suggest that the data contain information about the true treatment effect, and call for methods with the ability to account for different biases and target the effect of interest. An MSM (with IPW) can adjust for time-varying confounding, and for selection bias from differential loss to follow-up, in settings where standard regression models (univariate or longitudinal) fail.
The MSM is a model for a counterfactual outcome, univariate or repeated measure (longitudinal). In the present application, the univariate model is sufficient and easy to interpret. The MSM for the mean counterfactual end-of-study univariate ASRS level, denoted by Y (ADHD symptoms at one year follow-up), can be formulated as:
$$E[Y_{\bar{a}}] = \beta_0 + \beta_1 a_1 + \beta_2 a_2 + \beta_3 a_3 + \beta_4 a_4 \qquad (1)$$
where $a_0$, $a_1$, $a_2$, $a_3$ and $a_4$ are dichotomous (1/0) indicators of on/off medication at baseline, 6 weeks, 12 weeks, 26 weeks and 52 weeks, respectively. $\bar{a} = (a_0, a_1, a_2, a_3, a_4)$, which denotes medication history at 52 weeks, was limited to one of five different regimes (Table 1), for reasons that will become clear later. $Y_{\bar{a}}$ is the counterfactual final outcome under a hypothetical one year medication history. $\beta_0$ is the marginal mean of Y with medication at baseline only, $\beta_1$ corresponds to average change in Y under medication regime (1, 1, 0, 0, 0) relative to (1, 0, 0, 0, 0), $\beta_2$ corresponds to average change in Y under medication regime (1, 1, 1, 0, 0) relative to (1, 1, 0, 0, 0), and so on.
Table 1. Different medication regimes ($\bar{a}$) considered in this study, of a Norwegian sample of 250 patients with ADHD diagnosed in adulthood.
Figure 4. Causal diagram (DAG) of the longitudinal structure of the data, a Norwegian sample of 250 patients with ADHD diagnosed in adulthood.
The time-varying covariates GSI, GAF-S and GAF-F are viewed as closest in time to the actual follow-up assessment. MSE (side-effects) and ASRS (symptom level) contain information from the whole preceding period, but with most weight close to the follow-up assessment. MED (indicator for on/off medication) describes the treatment status at the time of assessment. MED = 0 means that medication was terminated some time during the preceding period. With uniformly distributed termination times, the parameters in Equation (1) therefore each represent the effect of extra medication during (on average) half the previous and half the following period. With respect to the direction of effects, influence is allowed from medication to symptom level in the same period, and from symptom level to medication in the following period (Figure 4).
To estimate the MSM in (1), a weighted univariate linear regression model (associational), conditional on medication history only, is fitted:
$$E[Y \mid \bar{A} = \bar{a}] = \gamma_0 + \gamma_1 a_1 + \gamma_2 a_2 + \gamma_3 a_3 + \gamma_4 a_4 \qquad (2)$$
with weights (stabilized) for each person $i$ at time $j$, specified by:
$$SW_i(j) = \prod_{k=1}^{j} \frac{P(A_k = a_{ik} \mid \bar{A}_{k-1} = \bar{a}_{i,k-1}, V = v_i)}{P(A_k = a_{ik} \mid \bar{A}_{k-1} = \bar{a}_{i,k-1}, V = v_i, \bar{L}_{k-1} = \bar{l}_{i,k-1})} \qquad (3)$$
The weights are estimated by a series of logistic regressions. $V$ represents baseline confounders and $\bar{L}$ the time-varying confounders (only in the denominator). Robust standard errors are used, e.g. by the “sandwich” software package in R, to account for the fact that the weights are unknown and have to be estimated. The parameters in (2) have causal interpretations, and are unbiased estimates of the theoretical parameters in (1), under the assumptions in the previous section, no selection bias from loss to follow-up, and no measurement error. All confounder adjustment is achieved through the weights. Weights can be calculated using the “ipw” software package in R. With no time-varying confounding, the numerator and denominator in (3) are equal and the weights equal one. If a combination of the confounders in the denominator gives a small estimated probability of treatment, the weight can become very large, and will propagate through time due to the cumulative product. This is automatically alleviated with the CBMSM algorithm.
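The cumulative-product structure of the stabilized weights, and the weighted regression with robust standard errors, can be sketched as follows. This is a minimal illustration in Python with NumPy, not the R workflow ("ipw", "sandwich") used in the study. The probabilities are hypothetical, and the HC0 sandwich form is an assumed, common robust choice that treats the weights as fixed.

```python
import numpy as np


def stabilized_weights(p_num, p_den):
    """Cumulative product over periods of numerator/denominator probabilities.

    p_num, p_den: arrays of shape (n_persons, n_periods) with the modelled
    probability of the treatment actually received in each period.
    Column j of the result is the weight at period j.
    """
    ratio = np.asarray(p_num, dtype=float) / np.asarray(p_den, dtype=float)
    return np.cumprod(ratio, axis=1)


def weighted_ols_sandwich(X, y, w):
    """Weighted least squares with an HC0 sandwich covariance (assumed form)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    bread = np.linalg.inv(X.T @ (w[:, None] * X))
    beta = bread @ (X.T @ (w * y))
    resid = y - X @ beta
    meat = X.T @ (((w * resid) ** 2)[:, None] * X)
    return beta, bread @ meat @ bread


# Hypothetical probabilities for two persons over four periods. Person 2 has
# a small denominator probability in the first period only.
p_num = np.array([[0.8, 0.8, 0.8, 0.8],
                  [0.8, 0.8, 0.8, 0.8]])
p_den = np.array([[0.8, 0.8, 0.8, 0.8],
                  [0.1, 0.8, 0.8, 0.8]])
sw = stabilized_weights(p_num, p_den)
```

In this toy example the first person keeps a weight of one throughout, while the single small probability for the second person inflates the weight in every later period, which is the propagation described above.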
2.5. Improved Covariate Balance with CBMSM
Estimating the weights in (3) for the MSM, is usually done with a series of logistic regressions by maximum likelihood (ML), in a generalized linear model (GLM). With misspecification in these PS-models, maximizing the likelihood might not balance the covariates  . I&R resolve this by explicitly imposing moment conditions (covariate balance equations), e.g. the weighted mean of a covariate in the treated group is set equal to the weighted mean in the untreated group (moment condition equal to zero). Parameter estimation and moment conditions are solved simultaneously. With only one dichotomous treatment variable (yes/no) in a cross-sectional setting, total number of balance equations equals the number of parameters, and CBPS in the simplest form is just-identified (Appendix)  . In a longitudinal setting, the potential number of different treatment histories is large, and balance is needed across all of them, which results in over-identification, i.e. more balance equations than parameters. This over-identified set of equations does not have one unique solution, but is solved to minimize a quadratic function of the moment conditions (as close as possible to zero) to give “minimum imbalance” by the generalized method of moments (GMM) (Appendix)  .
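The balance condition described above (weighted covariate mean among the treated equal to the weighted mean among the untreated) can be written as a sample moment and evaluated for a given set of propensity scores. The sketch below covers the cross-sectional, single-treatment case only. The function names are hypothetical, and the identity weighting matrix in the quadratic form is an assumption made for illustration, not the weighting matrix used by I&R.

```python
import numpy as np


def balance_moments(X, t, ps):
    """Sample covariate-balance moments for inverse probability weights.

    X:  (n, p) covariate matrix
    t:  (n,) 0/1 treatment indicator
    ps: (n,) propensity scores, strictly between 0 and 1
    Each entry is the difference between the IPW-weighted covariate total
    in the treated and untreated groups, divided by n.
    """
    X = np.asarray(X, dtype=float)
    t = np.asarray(t, dtype=float)
    ps = np.asarray(ps, dtype=float)
    contrast = t / ps - (1.0 - t) / (1.0 - ps)
    return (contrast[:, None] * X).mean(axis=0)


def gmm_objective(g, W=None):
    """Quadratic form g' W g; W defaults to the identity (an assumption)."""
    g = np.asarray(g, dtype=float)
    if W is None:
        W = np.eye(g.size)
    return float(g @ W @ g)


# Hypothetical age covariate with constant propensity score 0.5.
ps = np.full(4, 0.5)
balanced = balance_moments([[30], [40], [30], [40]], [1, 1, 0, 0], ps)
unbalanced = balance_moments([[30], [30], [40], [40]], [1, 1, 0, 0], ps)
```

In the just-identified case the propensity score parameters are chosen so that all moments are exactly zero. In the over-identified longitudinal case the quadratic form is minimized instead.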
With medication = yes/no in four consecutive time periods (everyone started off on medication), the number of potential treatment histories is 16. To avoid loss of precision in parameter estimates from patterns with few patients, only monotone treatment regimes were allowed (Table 1). This means that once a patient had temporarily terminated medication, a restart was excluded from the analysis; instead, the data for this patient were censored from the point of restart onward. In this way, e.g. $\beta_3$ in model (1) refers to the average change in Y among those with an average of 39 weeks of medication, relative to those with an average of 19 weeks (Table 1).
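The restriction from 16 potential histories to five monotone regimes, and the average medication durations quoted above, can be checked with a short enumeration. This is a sketch under the stated assumption of uniformly distributed termination times within a period; the helper names are hypothetical.

```python
from itertools import product

# Scheduled assessments in weeks (baseline, 6, 12, 26 and 52 weeks).
VISIT_WEEKS = [0, 6, 12, 26, 52]


def is_monotone(history):
    """True if medication is never restarted once it has been stopped."""
    return all(later <= earlier for earlier, later in zip(history, history[1:]))


def mean_weeks_on(history):
    """Average weeks on medication for a monotone history.

    history holds the on/off indicators at the four follow-up visits
    (everyone is on medication at baseline). Termination is assumed to be
    uniformly distributed within the period in which it occurs.
    """
    n_on = sum(history)
    if n_on == len(history):
        return float(VISIT_WEEKS[-1])
    return (VISIT_WEEKS[n_on] + VISIT_WEEKS[n_on + 1]) / 2


all_histories = list(product([0, 1], repeat=4))
monotone = sorted(h for h in all_histories if is_monotone(h))
durations = [mean_weeks_on(h) for h in monotone]
```

The resulting durations reproduce the period boundaries used in the Results section (3 - 9, 9 - 19, 19 - 39 and 39 - 52 weeks).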
To assess differences between treatment groups, overlap in the PS distributions for the five groups, was examined. Here, the PS represented probability for termination of medication sooner or later, as a function of baseline covariates.
2.5.1. Covariate (im) Balance
To assess balance, the standardized mean difference (SMD) for each covariate (the difference in weighted means between two treatment groups, divided by the population standard deviation) is calculated. More precisely, the SMD expresses imbalance, which is smaller for better balance. As a rule of thumb, a value less than 0.25 is commonly considered acceptable in a cross-sectional setting. No such acceptance level has been suggested for longitudinal data. Balance is more challenging in a longitudinal setting, because of the over-identified set of equations. For each time period, every covariate is balanced on all possible current and future treatment patterns, conditional on the past treatment history. With $J = 4$ as the number of time periods (as in the present data), each covariate enters $2^J - 2^{j-1}$ moment conditions at time period $j$; for example, in the first time period ($j = 1$), covariates enter 15 moment conditions, each of which represents the same number of equations as the number of unknown parameters and covariates. Obviously, for a covariate to satisfy a higher number of equations, the “best” solution is expected to be further away from an exact solution, which implies more imbalance. The GMM estimation algorithm minimizes imbalance across all moment conditions and time periods (Appendix).
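A weighted SMD of the kind described above can be sketched as follows. Two points are assumptions rather than details confirmed by the text: the "population standard deviation" is taken as the unweighted sample standard deviation of the covariate over both groups, and the number of moment conditions per period is taken as $2^J - 2^{j-1}$, which reproduces the 15 conditions quoted for the first of four periods.

```python
import numpy as np


def weighted_smd(x, treated, weights=None):
    """Difference in weighted group means divided by the overall SD.

    The denominator is the unweighted sample SD of x over both groups
    (an assumption about what 'population standard deviation' means here).
    """
    x = np.asarray(x, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    mean_treated = np.average(x[treated], weights=w[treated])
    mean_untreated = np.average(x[~treated], weights=w[~treated])
    return (mean_treated - mean_untreated) / x.std(ddof=1)


def n_moment_conditions(J, j):
    """Assumed count of moment conditions per covariate at period j of J."""
    return 2 ** J - 2 ** (j - 1)


# Hypothetical covariate values; the weights pull the group means together.
x = [1.0, 2.0, 3.0, 4.0]
treated = [1, 1, 0, 0]
smd_unweighted = weighted_smd(x, treated)
smd_weighted = weighted_smd(x, treated, weights=[1, 3, 3, 1])
```

An absolute SMD closer to zero after weighting indicates improved balance for that covariate.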
2.5.2. Estimating the Causal Effect of ADHD Medication
Model selection for the weights in Equation (3) was based on combinations of covariates that were significant predictors of continued medication at several time-points, and with resulting imbalance as low as possible. The following logistic model was chosen:
$$\operatorname{logit} P(MED_j = 1) = \alpha_0 + \alpha_1\, anx + \alpha_2\, mse_0 + \alpha_3\, age + \alpha_4\, gafs_0 + \alpha_5\, alc + \alpha_6\, dose_{j-1} + \alpha_7\, asrs_{j-1} + \alpha_8\, gsi_{j-1} + \alpha_9\, gaff_{j-1}^2 \qquad (4)$$
with the abbreviations anx: indicator for any anxiety disorder at baseline, $mse_0$: mean side-effects at baseline, age: age at baseline, $gafs_0$: baseline psychosocial function, symptom part, alc: indicator for baseline alcohol use disorder, $dose_{j-1}$: dose in previous period, $asrs_{j-1}$: ADHD symptoms at previous assessment, $gsi_{j-1}$: distress at previous assessment, $gaff_{j-1}^2$: squared psychosocial function, function part at previous assessment.
A similar model for censoring in the censoring weights was constructed. Censoring was used for deviation from monotone treatment (all patients that deviated from the five different regimes in Table 1), and for those with missing values in the ASRS outcome, at some point.
2.6. Missing Data
In these data, there were missing values, both in the outcome and in covariates. The 26 week assessment had the most missing values (gafs26 and gaff26: 66 missing, mse26: 63 missing, gsi26: 49 missing, asrs26: 46 missing), and 27 variables had no missing values. A total of 132 patients were complete cases, with no missing values at any assessment. With substantial missingness for some covariates (more than 20%), possibly caused by side-effects, multiple imputation (MI) was considered a suitable method to reduce the impact of loss of data. The missing covariates were multiply imputed with chained equations, under the assumption of missing at random (MAR), using the MICE package in R. The ASRS symptom score was also imputed when serving as a covariate, but not as outcome. Censoring weights were chosen to correct for potential bias from missing outcomes (loss to follow-up), a straightforward extension of the inverse probability of treatment weights used to correct for time-varying confounding. For the missing covariates, the data were imputed 10 times, and all analyses (measures of balance and parameter estimates) were repeated for each dataset and combined using Rubin’s rules. After imputation, the dataset consisted of 170 observations with no missing outcome or covariates, and with monotone treatment history.
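Combining the per-imputation results with Rubin's rules can be sketched as below. The function is a minimal illustration for a single scalar parameter, with hypothetical input values; it returns the pooled estimate and the total variance (within-imputation variance plus the inflated between-imputation variance).

```python
import numpy as np


def pool_rubin(estimates, variances):
    """Pool one scalar parameter over m imputed datasets with Rubin's rules.

    estimates: the m point estimates
    variances: the m squared standard errors
    Returns (pooled estimate, total variance).
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    pooled = q.mean()
    within = u.mean()
    between = q.var(ddof=1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total


# Hypothetical estimates and variances from three imputed datasets.
pooled, total_var = pool_rubin([10.0, 12.0, 14.0], [4.0, 4.0, 4.0])
```

Little between-imputation variability means that the total variance stays close to the average within-imputation variance.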
3.1. Overlap of Different Treatment Groups
The five different groups (Table 1) were comparable, with good overlap over the whole range of predicted probability for termination of treatment (PS), although with some variation in the right tail (Figure 5). The two groups with termination following baseline and 6 weeks had clearly right-shifted probability mass (higher probability for termination) compared to the other groups, which is intuitively reasonable, being closest in time to the explanatory covariates. No signs of serious violations of conditional exchangeability (among measured baseline covariates) or positivity were found (Figure 5 and Figure 9).
Figure 5. Overlap in the five different treatment groups, with respect to predicted probability for termination of medication (PS), by baseline covariates: psychosocial functioning at baseline, anx (indicator for baseline anxiety disorder), alc (indicator for baseline alcohol use disorder), baseline ADHD symptoms, baseline measure of distress, and an interaction term, in a Norwegian sample of 250 patients with ADHD diagnosed in adulthood.
3.2. Covariate (im) Balance
Figures 6-8 show graphically the SMDs for each covariate in the model from Equation (4), in estimation of the parameters in the MSM, and Table 2 summarizes these findings, with a comparison between the results from the CBMSM weights and from an ordinary fit of the MSM with GLM weights (“ipw” package in R). All results are MI-estimates based on 10 multiply imputed datasets. In estimation of both $\beta_3$ and $\beta_4$, the imbalance is clearly reduced with the CBMSM weights compared to unweighted estimation (Figure 6, Figure 7) and compared with the ordinarily fitted MSM with GLM weights (Figure 8). The recommended acceptance level of ±0.25 in the cross-sectional setting is indicated with dotted vertical lines. Three covariates exceeded this limit; however, those considered important time-varying confounders, like previous symptom level, psychosocial functioning and distress, were within the limit (Figures 6-8). Slightly less imbalance in estimation of $\beta_4$ compared to $\beta_3$ is also indicated in the figures. This is confirmed in Table 2, where the average absolute imbalance over all covariates is presented and compared with GLM weights. Estimates of early effects of treatment ($\beta_1$ and $\beta_2$) are clearly based on more imbalance than later effects ($\beta_3$ and $\beta_4$). This is due to more moment conditions for each covariate early in the observation period, and the fact that groups with early termination were more different from the completers than those with late termination. CBMSM weights are superior to GLM weights for estimation of all effects with respect to imbalance (Table 2). Finally, the average imbalance for estimation of both $\beta_3$ and $\beta_4$ is close to the cross-sectional acceptance level of 0.25 (Table 2).
Figure 6. Improved balance in covariates for estimation of $\beta_3$, by standardized mean differences (SMD), with covariates from Equation (4) and CBMSM weights (sorted by unweighted imbalance), in a Norwegian sample of 250 patients with ADHD diagnosed in adulthood. Results are based on multiple imputation to limit influence from missing covariates (10 imputed datasets, N = 170). $mse_0$: baseline mean side effects.
Figure 7. Improved balance in covariates for estimation of $\beta_4$, by standardized mean differences (SMD), with covariates from Equation (4) and CBMSM weights (sorted by unweighted imbalance), in a Norwegian sample of 250 patients with ADHD diagnosed in adulthood. Results are based on multiple imputation to limit influence from missing covariates (10 imputed datasets, N = 170). $mse_0$: baseline mean side effects.
Figure 8. Comparison of balance in covariates, by standardized mean differences (SMD), with covariates from Equation (4) and GLM versus CBMSM weights (sorted by GLM weight imbalance), in a Norwegian sample of 250 patients with ADHD diagnosed in adulthood. Results are based on multiple imputation to limit influence from missing covariates (10 imputed datasets, N = 170). $mse_0$: baseline mean side effects, alc: baseline alcohol use disorder, $gafs_0$: baseline psychosocial function, symptom part, $dose_{j-1}$: dose in previous period, age: age at baseline, $asrs_{j-1}$: ADHD symptoms at previous assessment, anx: baseline anxiety disorder, $gsi_{j-1}$: distress at previous assessment, $gaff_{j-1}$: psychosocial function, function part at previous assessment.
Table 2. Average and spread in absolute imbalance (SMD) among covariates in treatment assignment model, for estimation of parameters in the MSM. Results are based on multiple imputation to limit influence from missing covariates (10 imputed datasets, N = 170).
3.3. Estimation of Causal Effects of ADHD Medication
In Table 3(a), different estimates (MI-estimates) of the causal parameters in the MSM from Equation (1) are presented, and estimates for the CBMSM method across imputations are presented in Table 3(b). Each parameter has the interpretation of an effect of hypothetical medication in a period, in addition to being on medication up until that period. CBMSM estimates (Table 3(a), left-most column) are believed to be closest to the true unknown causal parameters, and show strong effects of medication in both the 19 - 39 weeks period and the 39 - 52 weeks period. Hypothetical medication relative to no medication in the 19 - 39 weeks period (being on medication until 19 weeks) would be expected to reduce ASRS symptoms at one year significantly, by 12.06 units (Table 3(a)), and hypothetical medication relative to no medication in the 39 - 52 weeks period (being on medication until 39 weeks) would result in a significant reduction in symptoms of 8.72 units (Table 3(a)). The coefficient $\beta_3$ represents the longest period of medication (20 weeks on average) and had the strongest effect. It represents a “direct effect”, not mediated through the last period. Effects of hypothetical medication in the first two periods (3 - 9 weeks and 9 - 19 weeks) were non-significant, indicating possible mediation through subsequent periods.
Compared to an ordinarily fitted MSM with GLM weights (Table 3(a), columns 4 - 6), the differences in magnitude are modest when the sizable improvement in average imbalance is taken into account (Table 2). The effect from the last period is reduced by 25% relative to the CBMSM, but is still strong and significant. To indicate the magnitude of selection bias from non-monotone treatment patterns or loss to follow-up, CBMSM estimates without censoring weights are presented in columns 10 - 12. Compared to the fully adjusted CBMSM estimates, the selection bias was not negligible and seemed to result in negative bias (underestimation), mostly in the 19 - 39 weeks effect, with a 15% reduction in the estimate. Without any adjustment, the effects would be greatly underestimated, with a relative bias of 11% (19 - 39 weeks period) and 47% (39 - 52 weeks period), respectively (Table 3(a), columns 13 - 15). For reference, an ordinary linear regression with adjustment for time-varying covariates is included (Table 3(a), columns 7 - 9). The results show serious underestimation with this approach, with 37% and 53% relative bias for the 19 - 39 weeks and 39 - 52 weeks periods, respectively. The effect of the 39 - 52 weeks period is no longer significant, and the bias is increased, also compared to the no adjustment (unweighted) case. This clearly demonstrates the failure of standard regression to account for time-varying confounding, together with the direction and magnitude of this bias. In Table 3(b), parameter estimates for the CBMSM method are shown across imputations. In each imputation, the missing covariates are predicted from all available data (e.g. including side-effects), and little between-imputation variability was found.
Separate models (CBMSM) were fitted (MI-estimates) to assess the influence of different periods of medication on ASRS symptoms at 26 weeks and at 12 weeks, in order to examine the course of treatment effects and symptoms (Table 4). With ASRS26 as outcome, the coefficient for the most recent period of medication represents the effect of hypothetical medication relative to no medication in the 19 - 26 weeks period (when being on medication prior to 19 weeks). This seven-week period of medication had a strong and significant effect, and would be expected to reduce symptoms by 20 units ( , , ), a magnitude of the same size as the total treatment effect from two periods in the ASRS52-model. Estimated effects of earlier periods were nonsignificant, even though they include the longest period of medication (10 weeks), possibly resulting from mediation through the last period. With ASRS12 as outcome, the coefficient for the most recent period of medication represents the effect of hypothetical medication relative to no medication in the 9 - 12 weeks period (when being on medication prior to 9 weeks). This three-week period of medication had a strong and significant effect, with an expected reduction in symptoms of 13 units ( , , ). The estimated effect of the earlier period, 3 - 9 weeks, was nonsignificant in spite of its longer duration, possibly because it is mediated through the last period.
In summary, medication seemed to have strong and positive effects on subsequent symptoms, across the whole year, both for early and late periods. Symptoms measured at early stages by ASRS12 and ASRS26 would be expected to be largely reduced by medication immediately prior to symptom assessment, with no direct effects from earlier medication. Symptoms at one year follow-up, ASRS52, would be expected to be influenced by more medication history. Hypothetical medication in the mid-period, 19 - 39 weeks (with no medication in the last period), seemed to be most influential, with a direct effect in addition to an indirect effect through medication in the 39 - 52 week period. This is in line with a dose-response effect (longer duration of treatment corresponds to larger effect), but in contrast to symptoms at early stages, with nonsignificant effects from earlier periods of treatment with longer duration. Treatment effect per week seemed strongest in the early stages, but persisted across the whole year.
Table 3. (a) Estimated average causal effects of medication in different time periods on final symptoms (ASRS) at one year follow-up, in a Norwegian sample of 250 patients with ADHD diagnosed in adulthood. Results are based on multiple imputation to limit influence from missing covariates (10 imputed datasets, N = 170). CBMSM: results of improved covariate balance from Imai & Ratkovic, GLM: ordinary IPW estimation by logistic regression in an MSM, LM: ordinary least squares regression with adjustment for time-varying covariates, CBMSMcens-: CBMSM algorithm without adjustment for censoring (loss to follow-up or non-monotone treatment regimes), Unweighted: without any adjustment (for confounding or censoring); (b) Parameter estimates for the CBMSM method, in each imputed dataset (10 imputed datasets, N = 170).
Column means are MI-estimates in first column, Table 3(a).
Table 4. Estimated average causal effects (CBMSM) of medication in different time periods on symptoms (ASRS) at one year, 26 weeks, and 12 weeks of follow-up, in a Norwegian sample of 250 patients with ADHD diagnosed in adulthood. Results are based on multiple imputation to limit influence from missing covariates (10 imputed datasets, N = 170).
In the present application of the CBMSM method, strong causal effects of medication on ADHD symptoms were found, under standard assumptions for causal inference. An MSM (GLM weights) was fitted to account for time-varying confounding and selection bias, but covariate balance was greatly improved with the CBMSM. The treatment effects estimated by the CBMSM are the ones believed to be closest to the unknown true levels. The alternative estimates were smaller (25% relative difference compared to the standard MSM), which indicates that for these data, standard analysis yields underestimated treatment effects.
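For readers unfamiliar with how weights enter an MSM, the following sketch shows the usual construction of stabilized inverse probability of treatment weights as a cumulative product over time periods. It assumes the common textbook form, where the numerator is the probability of the observed treatment given past treatment only and the denominator additionally conditions on covariate history; the fitted probabilities are taken as given inputs, and all names and numbers are hypothetical.

```python
import numpy as np

def stabilized_weights(p_num, p_den):
    """Cumulative stabilized weights for a time-varying treatment.

    p_num, p_den: arrays of shape (n_subjects, n_periods) holding, for each
    subject and period, the fitted probability of the treatment actually
    received (numerator model: past treatment only; denominator model:
    past treatment and covariate history).
    Returns an array of the same shape; column j is the weight up to period j.
    """
    ratio = np.asarray(p_num, dtype=float) / np.asarray(p_den, dtype=float)
    return np.cumprod(ratio, axis=1)

# Hypothetical probabilities for two subjects over two periods.
sw = stabilized_weights([[0.8, 0.5], [0.6, 0.9]],
                        [[0.4, 0.25], [0.6, 0.9]])
print(sw[:, -1])  # final-period weights used in the weighted outcome model
```

A subject whose treatment is poorly predicted by the denominator model receives a large weight, which is why the distribution of the weights is usually inspected before fitting the outcome model.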
With a maximum of one year medication in four consecutive time periods, all periods seemed to represent improvement in ADHD symptoms, over the course of the study. In terms of treatment effect per week, early stages seemed to have the strongest influence, with an average reduction of approximately 4 units per week of symptoms at 12 weeks (ASRS12) ( , , ) for hypothetical medication in the 9 - 12 week period. However, the treatment effect persisted over the whole year, with an expected reduction of 0.7 units per week of symptoms at the final assessment (ASRS52) with hypothetical continued medication over the last 13 weeks.
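The per-week figures above follow from dividing each estimated effect by the length of its period. The short sketch below reproduces that arithmetic using the rounded point estimates and period lengths quoted in the text (13 units over 3 weeks, 20 units over 7 weeks, 12.06 units over 20 weeks, 8.72 units over 13 weeks); the dictionary labels are only illustrative.

```python
# (estimated reduction in ASRS units, period length in weeks), as quoted in the text
effects = {
    "ASRS12, 9-12 weeks": (13.0, 3),
    "ASRS26, 19-26 weeks": (20.0, 7),
    "ASRS52, 19-39 weeks": (12.06, 20),
    "ASRS52, 39-52 weeks": (8.72, 13),
}

# Average reduction per week of hypothetical medication in each period.
per_week = {label: units / weeks for label, (units, weeks) in effects.items()}

for label, value in per_week.items():
    print(f"{label}: {value:.2f} units per week")
```

The division treats the effect as spread evenly over the period, which is a descriptive simplification rather than a modeling result.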
The CBMSM model revealed causal information not accessible in a standard longitudinal analysis. The average symptom level during the course of the observation period is characterized by a rapid drop in the first 6 weeks, followed by an enduring constant low level for the rest of the year (Figure 1). The improvement in symptoms could falsely be attributed to medication in the first 6 weeks alone. From the symptom curve alone, it cannot be determined whether maximum improvement was obtained after 6 weeks of medication, so that medication could be terminated, or whether continued medication would be necessary to maintain a low symptom level. Adjustment for time-varying side-effects, distress, psychosocial functioning and symptoms would not result in an unbiased effect of treatment. The reduction in average symptoms seen in Figure 1 underestimates the causal effects of treatment, which are better predicted from the sum of effects from different periods in the CBMSM model. In addition to a persistent and decreasing treatment effect across the whole period, another notable change in the treatment effect was revealed. The different models for symptoms at 12 weeks (ASRS12) and 26 weeks (ASRS26) both showed significant causal effects of hypothetical medication in the most recent treatment period, with no significant effects from earlier periods, in spite of their length. As for symptoms at end of follow-up (ASRS52), significant causal effects were found for hypothetical medication in the two most recent periods. The causal direct effect from hypothetical medication in the 19 - 39 week period on symptoms at one year may represent other pathways than through continued treatment in the 39 - 52 week period (pharmacological pathway). Clinical insight has suggested that a patient’s social environment needs time to adapt and trust the improvement, when the patient is treated. This adaptation process is slower than the pharmacological effect, but can be important for further improvement, through motivation, support, and positive feedback. A four and a half months adaptation process is one possible explanation for the direct effect from the 19 - 39 week period. In this case, treatment effect in the early periods would be mostly pharmacological. Alternatively, the direct effect from the 19 - 39 week period corresponds to a dose-response effect. In that case, other explanations for the lack of such dose-response effects on symptoms at 12 and 26 weeks are needed.
Figure 9. Stabilized censoring weight distributions (boxplots) for each time period, for censoring from missing in outcome (ASRS52) or non-monotone treatment patterns, in a Norwegian sample of 250 patients with ADHD diagnosed in adulthood.
If the models for ASRS12, ASRS26, and ASRS52 had been similar with respect to causal effects of medication, they could have been combined in a repeated measures MSM (longitudinal model), for possible gain in efficiency. In the present application, however, the differences in the separate univariate models were informative.
In conclusion, the CBMSM improved covariate balance substantially in these data, compared to the standard fitted MSM, and should therefore represent estimates closer to the causal effect of medication one would find in a successful RCT.
The improved covariate balance strengthened the treatment effect compared to the MSM, and even more so compared to ordinary longitudinal analysis with naive adjustment for time-varying confounding, as reported in the literature. The causal model also provided possible new clinical insight with respect to the dynamics of the effect of pharmacological treatment on adults with ADHD. A persistent and strong effect on improvement in symptoms across the whole year was supported, with both direct and indirect causal pathways.
Appendix: Covariate Balance in Cross-Sectional and Longitudinal Data
First, the cross-sectional situation is considered; a sample of size
which means that unbiased estimation of the treatment effect is possible by conditioning on the PS alone. It also implies covariate balance between treatment groups, for example equal weighted means in the treated and untreated. (A1) represents a dimension reduction and has led to the development of propensity score methods, such as weighting. Inverse probability of treatment weighting can consistently estimate the marginal mean of the counterfactual outcome for any treatment, and is the method on which the marginal structural model is based. Each observation is weighted by the inverse of the probability of his/her observed treatment assignment,
In observational studies, the PS has to be estimated, commonly by logistic regression fitted by maximum likelihood, for example parameterized by
where . If the model in (A3) is misspecified, the covariates are possibly unbalanced. To make estimation more robust to misspecification, I & R proposed to estimate the PS under an additional condition of covariate balance, formulated as
By iterated expectation it is easily seen that both terms in (A4) equal , the population mean, and also a weighted conditional mean of the treated/untreated, respectively. With (A3) as the parametric model for the PS (1-PS), the number of equations in (A4) (one for each of the K covariates) equals the number of unknown parameters, which is the just-identified case. I & R suggested solving these moment conditions either by generalized method of moments (GMM) or by empirical likelihood, because these methods easily generalize to the over-identified set of equations, including the longitudinal case.
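The just-identified cross-sectional case can be illustrated numerically. The sketch below is not the authors' implementation: it assumes a logistic PS model, uses simulated (hypothetical) data, and solves the sample balance conditions with a damped Newton iteration swapped in for GMM or empirical likelihood. All function and variable names are illustrative.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def balance_moments(beta, X, T):
    """Sample balance conditions: mean of [T/pi - (1-T)/(1-pi)] * X."""
    pi = expit(X @ beta)
    return X.T @ (T / pi - (1 - T) / (1 - pi)) / len(T)

def fit_balancing_ps(X, T, max_iter=100, tol=1e-10):
    """Solve the K balance equations for the K logistic coefficients."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        g = balance_moments(beta, X, T)
        g_norm = np.linalg.norm(g)
        if g_norm < tol:
            break
        pi = expit(X @ beta)
        # Jacobian of the moment conditions with respect to beta.
        w = T * (1 - pi) / pi + (1 - T) * pi / (1 - pi)
        jac = -(X * w[:, None]).T @ X / len(T)
        step = np.linalg.solve(jac, g)
        # Step halving: accept the first step that reduces the imbalance.
        t = 1.0
        while t > 1e-8:
            if np.linalg.norm(balance_moments(beta - t * step, X, T)) < g_norm:
                break
            t *= 0.5
        beta = beta - t * step
    return beta

# Simulated data: intercept plus two confounders (hypothetical values).
rng = np.random.default_rng(0)
n = 2000
Z = rng.normal(size=(n, 2))
X = np.column_stack([np.ones(n), Z])
T = rng.binomial(1, expit(X @ np.array([0.2, 0.5, -0.5]))).astype(float)

beta_hat = fit_balancing_ps(X, T)
pi_hat = expit(X @ beta_hat)
w_treated, w_control = T / pi_hat, (1 - T) / (1 - pi_hat)
mean_treated = (w_treated[:, None] * Z).sum(axis=0) / w_treated.sum()
mean_control = (w_control[:, None] * Z).sum(axis=0) / w_control.sum()
print(mean_treated, mean_control)  # weighted covariate means coincide
```

Because the balance conditions are solved exactly in this just-identified setting, the weighted covariate means of the treated and untreated agree up to numerical tolerance, which an ordinary maximum likelihood fit does not guarantee in finite samples.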
In the present application, the data are longitudinal with four time periods, . The time-varying covariate at a given time period j possibly depends on the past treatment history until the previous time period ( ), written . It needs to be balanced on all current and future treatment trajectories, written , and is conditional on the past history. The covariate balancing conditions are written
and can be represented in an orthogonal way, in the following manner: Let the time varying covariates be combined in a dimensional column vector . To determine the sign of each term in the moment conditions, the following dimensional column vector is needed:
As described above, the moment conditions balance covariates measured at time j across all possible current and future treatments, but not past treatments and their interactions. Therefore, moment conditions on past treatments and their interactions are not binding, and can be zeroed out. As time progresses, the number of non-binding moment conditions increases. The four different “selection matrices” that identify which conditions to zero out are given by:
where is the identity matrix of dimension .
The sample moment conditions for time-period j can then be written  :
where is the Kronecker product (matrix on right-hand side is multiplied by each element in matrix on left-hand side), and with
and combining for all time-periods yields the matrix G with dimension
with corresponding covariance matrix (dimension )
Since each set of moment conditions set equal to zero, as in (A4) (for time period 1 there are 15 conditions), has the same number of equations as the number of unknown parameters, the combined set of equations is over-identified, and there is no unique solution. Instead, a quadratic function of the moment conditions is minimized to come as close as possible to zero, to “minimize imbalance”, and this is achieved by the GMM estimator. The optimal GMM estimator for is given by 
where G is from (A7) and the covariance from (A8), where the expectation can be calculated analytically in the logistic regression case  . These estimates are used in the construction of the weights for the MSM to give estimates of the causal effects from a marginal weighted outcome regression model (Equation (2)).
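Two building blocks of this construction can be illustrated in a few lines: the Kronecker product, in which the right-hand matrix is multiplied by each element of the left-hand matrix, and a quadratic form in stacked moment conditions weighted by an inverse covariance matrix. The sketch assumes the generic GMM objective g'W⁻¹g; the matrices are small hypothetical examples and do not correspond to the G and covariance matrices of (A7) and (A8).

```python
import numpy as np

# Kronecker product: each element of A is replaced by that element times B.
A = np.array([[1, 2],
              [3, 4]])
B = np.eye(2, dtype=int)
K = np.kron(A, B)   # shape (4, 4)
print(K)

def gmm_objective(g, W):
    """Quadratic form g' W^{-1} g for stacked sample moment conditions g
    and a positive definite weighting (covariance) matrix W."""
    g = np.asarray(g, dtype=float)
    return float(g @ np.linalg.solve(W, g))

# Hypothetical moment vector and covariance, for illustration only.
g = np.array([1.0, 2.0])
W = np.diag([1.0, 4.0])
print(gmm_objective(g, W))
```

In an over-identified problem the objective generally cannot be driven to exactly zero, so the estimate is the parameter value that makes this weighted imbalance as small as possible.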
 Imai, K. and Ratkovic, M. (2015) Robust Estimation of Inverse Probability Weights for Marginal Structural Models. Journal of the American Statistical Association, 110, 1013-1023.
 Faraone, S.V. and Buitelaar, J. (2010) Comparing the Efficacy of Stimulants for ADHD in Children and Adolescents using Meta-Analysis. European Child & Adolescent Psychiatry, 19, 353-364.
 Faraone, S.V. and Glatt, S.J. (2010) A Comparison of the Efficacy of Medications for Adult Attention-Deficit/Hyperactivity Disorder Using Meta-Analysis of Effect Sizes. Journal of Clinical Psychiatry, 71, 754-763.
 Castells, X., et al. (2011) Efficacy of Methylphenidate for Adults with Attention-Deficit Hyperactivity Disorder: A Meta-Regression Analysis. CNS Drugs, 25, 157-169.
 Fredriksen, M., et al. (2013) Long-Term Efficacy and Safety of Treatment with Stimulants and Atomoxetine in Adult ADHD: A Review of Controlled and Naturalistic Studies. European Neuropsychopharmacology, 23, 508-527.
 Torgersen, T., Gjervan, B. and Rasmussen, K. (2008) Treatment of Adult ADHD: Is Current Knowledge Useful to Clinicians? Neuropsychiatric Disease and Treatment, 2008, 177-186.
 Fredriksen, M. and Peleikis, D.E. (2016) Long-Term Pharmacotherapy of Adults with Attention Deficit Hyperactivity Disorder: A Literature Review and Clinical Study. Basic & Clinical Pharmacology & Toxicology, 118, 23-31.
 Fredriksen, M., et al. (2014) Effectiveness of One-Year Pharmacological Treatment of Adult Attention-Deficit/Hyperactivity Disorder (ADHD): An Open-Label Prospective Study of Time in Treatment, Dose, Side-Effects and Comorbidity. European Neuropsychopharmacology, 24, 1873-1884.
 Fredriksen, M., et al. (2014) Childhood and Persistent ADHD Symptoms Associated with Educational Failure and Long-Term Occupational Disability in Adult ADHD. ADHD Attention Deficit and Hyperactivity Disorders, 6, 87-99.
 Sheehan, D.V., et al. (1998) The Mini-International Neuropsychiatric Interview (MINI): The Development and Validation of a Structured Diagnostic Psychiatric Interview for DSM-IV and ICD-10. Journal of Clinical Psychiatry, 59, 22-33.
 Leiknes, K., Leganger, S. and Malt, E. (2009) MINI International Neuropsychiatric Interview (1992-2009). [MINI Internasjonalt neurospykiatrisk intervju, Norwegian Translation Version 6.0.0] Norway. Norway Mapi Research Institute, Lexington.
 Kessler, R.C., et al. (2005) The World Health Organization Adult ADHD Self-Report Scale (ASRS): A Short Screening Scale for Use in the General Population. Psychological Medicine, 35, 245-256.
 Pedersen, G., Hagtvet, K.A. and Karterud, S. (2007) Generalizability Studies of the Global Assessment of Functioning-Split Version. Comprehensive Psychiatry, 48, 88-94.
 Derogatis, L.R. and Savitz, K.L. (1999) The SCL-90-R and Brief Symptom Inventory, and Matching Clinical Rating Scales. In: Maruish, M.E., Ed., The Use of Psychological Testing for Treatment Planning and Outcomes Assessment, 2nd Edition, Lawrence Erlbaum Associates Publishers, 1507.
 Harder, V.S. and Stuart, E.A. (2010) Propensity Score Techniques and the Assessment of Measured Covariate Balance to Test Causal Associations in Psychological Research. Psychological Methods, 15, 234-249.
 Vandecandelaere, M., et al. (2016) Time-Varying Treatments in Observational Studies: Marginal Structural Models of the Effects of Early Grade Retention on Math Achievement. Multivariate Behavioural Research, 51, 843-864.
 Azur, M.J., et al. (2011) Multiple Imputation by Chained Equations: What Is It and How Does It Work? International Journal of Methods in Psychiatric Research, 20, 40-49.