A Closer Look at the Kaplan-Meier and Life Table Models in Survival Analysis
Abstract: Survival analysis comprises a set of statistical methods used to study the timing and occurrence of events. This paper studies survival functions with particular reference to Kaplan-Meier (K-M) estimators and Life Tables. Secondary data on in-patients who reported cases of malaria (origin state), followed until death or recovery (censored), were extracted and analyzed using the Kaplan-Meier and Life Table functions in SPSS. Through this discourse, we show how survival probabilities can be obtained and graphed. We inferred from the data provided in this paper that the mean years of life left for a newborn child, e0, was 61.9 years. The crude mortality rate, the reciprocal of the expectation of life, was equal to 0.016, which translates to about 1,600 deaths per 100,000 person-years. The number of newborn children dying before age twenty (20) was given by l0 - l20 = 100000 - 97127 = 2873, so the probability of a newborn child dying before age twenty (20) was 0.0287. Comparing the population survival curves for users and non-users of insecticide treated nets (ITNs), after adjusting for gender, gave a p-value of 0.002 < 0.05, implying a statistically significant difference between the population survival curves. For patients who used ITNs, the probability of surviving for at least a day after admission was 0.9, while for non-users it was 0.8; the probability of surviving for at least 30 days after admission was 0.6 for ITN users and 0.2 for non-users. Survival analysis is suitable for studies concerned with the time until the occurrence of events. It can also be used to identify factors which significantly influence an event.

1. Introduction

Survival analysis derives its name from biostatistics, medical science and epidemiology; in engineering it is referred to as reliability analysis; in economics it is called duration analysis; and in sociology and demography the method is known as event history analysis. It is the field of statistics that models time-to-event, time-to-death, or time-to-failure data. Survival analysis contains an extremely powerful class of predictive methods that can be used for any time-to-event prediction. Its methods are concerned with the time from a known origin to either an event or a censoring point. It may deal with events such as the time from diagnosis of a disease to death; it can also be applied to time-dependent phenomena, such as time in hospital or time until a disease recurs. A standard way to display survival data is the Kaplan-Meier survival curve, which can be complemented with Life Table methods. With this curve, the flow of events is seen graphically as a step function: each time an event occurs, the survival curve is re-calculated. Life Tables, by contrast, calculate survival at fixed points in time. This paper is devoted to the study of survival functions with particular reference to Kaplan-Meier estimators and Life Tables. Special emphasis is placed on how Kaplan-Meier curves and Life Tables are generated, analyzed and interpreted; how they accommodate incomplete data; and how each of the procedures can be applied in real-life situations. Applying the theory behind Kaplan-Meier estimators and Life Tables can reveal hidden trends in health data, which in turn can inform interventions to curb further occurrences. These procedures are useful for prediction and for maintaining optimal policies in medical and biomedical studies. They can also serve as a research design for other research works.
The purpose of this paper is threefold: to take a closer look at the dynamics of Kaplan-Meier estimators, illustrating the discussion with examples; to give an overview of the Life Table model using the actuarial method and the Kaplan-Meier method; and finally, to analyze a real case scenario using the Kaplan-Meier and Life Table methods.

1.1. Theoretical Underpinning

Hosmer et al. have mentioned that the most common non-parametric approach in the literature is the Kaplan-Meier (K-M) estimator, which estimates the unadjusted probability of surviving beyond a certain time point. The K-M estimator works by breaking up the estimation of $S\left(t\right)$ into a series of intervals based on observed event times. Each subject contributes to the estimation of $S\left(t\right)$ until the event occurs or the subject is censored. According to Hosmer et al. (2008) and Hosmer et al. (1999), for each interval the probability of surviving until the end of the interval is calculated, given that subjects were at risk at the beginning of the interval. In medical research, for instance, it is used to measure the proportion of patients living for a certain amount of time after treatment. In employment studies, Kaplan-Meier estimators are used to measure the length of time graduates remain unemployed after graduation from college; in agricultural studies, researchers use this model to study how long it takes fruits to mature before they are harvested.

Kaplan and Meier published a paper on how to deal with incomplete observations. Thereafter, Kaplan-Meier curves and estimates of survival became very influential in dealing with time-to-event data, especially in cases of incomplete information. There are many real-life cases where time-to-event may be an important end-point variable. These include cancer survival times, onset of malaria, the time between contracting HIV and progression to clinical AIDS, and, quite recently, the duration from the time a person is infected with COVID-19 until death or recovery following the use of medication. In these examples the event of interest may vary, and survival times need not refer to death. The Kaplan-Meier method resembles a modified form of the life table method. In the life table method, the time axis is divided into many discrete time intervals, usually years. The number at risk at the beginning of the year, the number dying in the year, and the number censored or lost to follow-up in the year are all tabulated. The Kaplan-Meier method also divides the time axis into many discrete intervals; however, in this case the intervals are defined not by a fixed length but by the occurrence of an event. The probability of death during any of these intervals is simply the number who died divided by the number of individuals who started the interval. The probability of surviving the interval is one minus the probability of dying in the interval, as indicated by Rich et al. and Turkson.

In preparing for Kaplan-Meier analysis, each subject must be characterized by three variables namely:

・ The time variable which should be sequential;

・ The status variable at the end of the time (uncensored or censored); and

・ The treatment group variable.

These characteristics are displayed in Table 1. For the construction of survival time probabilities and curves, the survival times of the subjects are arranged from the shortest to the longest, without regard to when they entered the study. By this maneuver, all subjects within the group begin the analysis at the same point and are studied until the occurrence of the event of interest or censoring.

Table 1. Table for Kaplan-Meier analysis.

1.2. Kaplan-Meier as a Non-Parametric Technique

In non-parametric techniques, the models are adaptive and can exhibit a high degree of flexibility. The model structure is not specified a priori but is determined from data. This does not necessarily mean that the model lacks parameters; instead, the number and nature of the parameters are flexible and not fixed in advance. In simple statistical terms, the model does not assume that the data conform to a normal distribution, since such models rely on continuous rather than discrete data, as observed by Nisbet et al. These are the characteristics of the Kaplan-Meier (K-M) estimate. The Life Table procedure also falls under the non-parametric model. As described earlier, survival analysis deals with regression and density estimation for a single non-negative random variable in the presence of censoring.

The merit of the non-parametric model is its flexibility: the model becomes more complex as the number of observations increases. The demerits are twofold. Firstly, it is not easy to incorporate covariates; this can be overcome by fitting separate models to two subpopulations and comparing them. Secondly, the survival functions are not smooth; that is to say, they are piecewise constant. This can be addressed by adopting kernel smoothing techniques.

1.3. Generating the Kaplan-Meier (K-M) Product-Limit Estimator

The product-limit estimator is used to estimate the survival function from censored data. The intervals comprise the spaces between the data as they were observed; some of the data may be censored while others may be uncensored.

Suppose that ${T}_{1},{T}_{2},\cdots ,{T}_{n}$ are right censored by constants ${t}_{1},{t}_{2},\cdots ,{t}_{n}$. We observe $\left({Z}_{j},{\delta }_{j}\right)$, $j=1,2,\cdots ,n$, where ${Z}_{j}=\mathrm{min}\left({T}_{j},{t}_{j}\right)$ and ${\delta }_{j}=\left\{\begin{array}{ll}1,& \text{if}\ {T}_{j}\le {t}_{j}\ \text{(uncensored)}\\ 0,& \text{if}\ {T}_{j}>{t}_{j}\ \text{(censored)}\end{array}\right.$

Since the data occur naturally as ordered pairs, we know exactly which points were censored and which were not. Assuming there are no tied observations, then

${Z}_{\left(1\right)}<{Z}_{\left(2\right)}<\cdots <{Z}_{\left( n \right)}$

Since the observed data are all different, we must always ensure that when the data are ranked from smallest to largest, the censoring indicators are also re-arranged to follow the new ranking. Let ${\delta }_{\left(j\right)}$ denote the value of the indicator associated with ${Z}_{\left(j\right)}$. To construct the survival estimator, Namboodiri and Suchundran assert that we divide $\left(0,{Z}_{\left(n\right)}\right]$ into disjoint intervals ${I}_{j}=\left({Z}_{\left(j-1\right)},{Z}_{\left(j\right)}\right]$, $j=1,2,\cdots ,n$, with ${Z}_{\left(0\right)}=0.$

Definition: The risk set at time t, denoted by $R\left(t\right)$, is the set of indices of subjects who are still alive and under observation at time ${t}^{-}$, that is, just prior to t. Then

${N}_{j}$ (The number at risk) is the number of elements in the set $R\left({Z}_{\left(j\right)}\right).$

${D}_{j}$ (The number of deaths) is the observed failure at ${Z}_{\left(j\right)}$ and is equal to 0 or 1.

${p}_{j}$ (The conditional probability) = P (Surviving through ${I}_{j}$ | Alive at the start of ${I}_{j}$ ).

Estimate of ${p}_{j}$: ${\hat{p}}_{j}=1-\frac{\text{number dying in}\ {I}_{j}}{\text{number with the potential to die in}\ {I}_{j}}=\left\{\begin{array}{ll}1-\frac{1}{{N}_{j}},& \text{if}\ {\delta }_{\left(j\right)}=1\\ 1,& \text{if}\ {\delta }_{\left(j\right)}=0\end{array}\right.$

The product of all such estimates ${\hat{p}}_{j}$ gives an estimate of the survival function S. We note that as j increases, the risk sets diminish one at a time: ${N}_{j}=n-\left(j-1\right)=n-j+1$. Putting these ideas together, for a fixed value of t,

$\begin{array}{c}\hat{S}\left(t\right)={\prod }_{j:{Z}_{\left(j\right)}\le t}{\hat{p}}_{j}\\ ={\prod }_{j:{Z}_{\left(j\right)}\le t,{\delta }_{\left(j\right)}=1}\left(1-\frac{1}{{N}_{j}}\right)\\ ={\prod }_{j:{Z}_{\left(j\right)}\le t}{\left(1-\frac{1}{{N}_{j}}\right)}^{{\delta }_{\left(j\right)}}\\ ={\prod }_{j:{Z}_{\left(j\right)}\le t}{\left(1-\frac{1}{n-j+1}\right)}^{{\delta }_{\left(j\right)}}\\ ={\prod }_{j:{Z}_{\left(j\right)}\le t}{\left(\frac{n-j}{n-j+1}\right)}^{{\delta }_{\left(j\right)}}\end{array}$ (1)

This estimator (1) is the most common method of estimating the survival function:

$S\left(t\right)=P\left(T>t\right),\quad \hat{S}\left({t}_{j}\right)=\hat{S}\left({t}_{j-1}\right)\times \widehat{\mathrm{Pr}}\left(T>{t}_{j}|T\ge {t}_{j}\right)$ (2)

Equation (2) gives the probability of surviving past the previous failure time ${t}_{j-1}$ multiplied by the conditional probability of surviving past time ${t}_{j}.$
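As a minimal sketch of Equation (1), the Python function below computes the product-limit estimate for untied observation times. The function name `product_limit` and the five observations are illustrative assumptions; they are not taken from the malaria data.

```python
def product_limit(times, events, t):
    """Product-limit estimate of S(t) for distinct (untied) observation times.

    times  -- observed times Z_j = min(T_j, t_j)
    events -- indicators delta_j (1 = death observed, 0 = censored)
    """
    n = len(times)
    # Rank the times and carry each censoring indicator along with its time.
    ordered = sorted(zip(times, events))
    s = 1.0
    for j, (z, delta) in enumerate(ordered, start=1):
        if z > t:
            break
        if delta == 1:
            # Factor (n - j) / (n - j + 1); censored points contribute a factor of 1.
            s *= (n - j) / (n - j + 1)
    return s


# Hypothetical observations (assumed for illustration only).
times = [5, 8, 12, 20, 30]
events = [1, 0, 1, 1, 0]
print(product_limit(times, events, 12))
```

With these hypothetical values the estimate at t = 12 is (4/5) × (2/3) = 8/15. The censored observation at time 8 leaves the product unchanged but still shrinks the later risk sets.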

Suppose that we have n observations and that there are m unique event times arranged in ascending order ${t}_{\left(1\right)},{t}_{\left(2\right)},\cdots ,{t}_{\left(m\right)}.$

Between $t=0$ and $t={t}_{\left(1\right)}$, that is, the time of the first event, the estimate of the survival function is $\hat{S}\left(t\right)=1.$

${n}_{j}$ = # of subjects at risk of the event just before time ${t}_{j}.$

${d}_{j}$ = # of events observed (death) at time ${t}_{j}.$

${p}_{j}$ = P (surviving through ${t}_{\left(j\right)}$ | alive at the beginning of ${t}_{\left(j\right)}$ ) = $P\left(T>{t}_{\left(j\right)}|T\ge {t}_{\left(j\right)}\right).$

${q}_{j}=1-{p}_{j}$ (That is, the complement of ${p}_{j}$ ).

Recalling the multiplication rule for joint events ${A}_{1}$ and ${A}_{2}$,

$P\left({A}_{1}\cap {A}_{2}\right)=P\left({A}_{2}|{A}_{1}\right)P\left({A}_{1}\right)$

We can write the survival probability as $S\left(t\right)=P\left(T>t\right)={\prod }_{{t}_{j}\le t}{p}_{j}$. The estimates of ${q}_{j}$ and ${p}_{j}$ are ${\hat{q}}_{j}=\frac{{d}_{j}}{{n}_{j}}$ and ${\hat{p}}_{j}=1-{\hat{q}}_{j}=1-\frac{{d}_{j}}{{n}_{j}}=\frac{{n}_{j}-{d}_{j}}{{n}_{j}}$; thus the K-M estimator is given by

$\hat{S}\left(t\right)={\prod }_{{t}_{\left(j\right)}\le t}{\hat{p}}_{j}={\prod }_{{t}_{\left(j\right)}\le t}\frac{{n}_{j}-{d}_{j}}{{n}_{j}}$ (3)
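Equation (3) also covers tied event times, since several deaths at one time simply give a larger number of deaths at that time. The sketch below is one way this might be coded; the function name `kaplan_meier` and the seven observations are hypothetical, and censored subjects are assumed to remain at risk at their own censoring time.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return rows (t_j, n_j, d_j, S(t_j)) at each distinct event time."""
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    rows = []
    s = 1.0
    for t in sorted(deaths):
        n_j = sum(1 for u in times if u >= t)  # at risk just before t
        d_j = deaths[t]                        # events observed at t
        s *= (n_j - d_j) / n_j                 # one factor of Equation (3)
        rows.append((t, n_j, d_j, s))
    return rows


# Hypothetical data: two deaths tied at time 2, censoring at times 3, 5 and 9.
times = [2, 2, 3, 5, 5, 7, 9]
events = [1, 1, 0, 1, 0, 1, 0]
rows = kaplan_meier(times, events)
for row in rows:
    print(row)
```

Only event times produce rows; censored times lower the number at risk for later rows without adding a step to the curve.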

Using the K-M definition the estimator of survivorship in Equation (3) can be written alternatively as

$\begin{array}{l}\hat{S}\left({t}_{\left(j\right)}\right)=\hat{S}\left({t}_{\left(j-1\right)}\right)\times \hat{P}\left(T>{t}_{\left(j\right)}|T\ge {t}_{\left(j\right)}\right)\\ \text{where}\ \hat{S}\left({t}_{\left(j-1\right)}\right)={\prod }_{i=1}^{j-1}\hat{P}\left(T>{t}_{\left(i\right)}|T\ge {t}_{\left(i\right)}\right)\end{array}$ (4)

When there is no censoring, ${n}_{j}$ is just the number of survivors prior to time ${t}_{j}$. With censoring, ${n}_{j}$ is the number of survivors less the number of losses (censored cases). Only those surviving cases that are still being observed are “at risk” of death. The principle governing the use of the K-M estimate is illustrated as follows:

1.4. Kaplan-Meier Computations

We use part of the malaria data collected to illustrate the computation of the Kaplan-Meier estimator. The data are reproduced below.

Note: + indicates a censored value.

In the computation of K-M estimates, we include incomplete information about patients. We first present the data layout needed to do the computation and proceed with the use of Equation (4) (Table 2).

Table 2. The basic data layout needed to understand survival data.

1.5. Computing the K-M Survival Function Using Data from Table 2

In computing the survival function of users of ITN the following K-M formula was used

$\hat{S}\left({t}_{\left(j\right)}\right)=\hat{S}\left({t}_{\left(j-1\right)}\right)\times \hat{P}\left(T>{t}_{\left(j\right)}|T\ge {t}_{\left(j\right)}\right)\ \text{where}\ \hat{S}\left({t}_{\left(j-1\right)}\right)=\underset{i=1}{\overset{j-1}{\prod }}\hat{P}\left(T>{t}_{\left(i\right)}|T\ge {t}_{\left(i\right)}\right)$

Observation: Table 3 shows that the censored times were not included in the time column; they were nevertheless captured under the censored status. For instance, 28+ fell between times 23 and 34 and was recorded as censored in line with time t(4) and beyond. The censored time 45+ likewise fell between days 34 and 48 and was recorded in line with t(6).

Table 3. Results of the K-M computations.

1.6. The Life Table Method

Selvin has defined a life table as a highly organized description of age-specific mortality rates. Its application ranges well beyond mortality analysis; it has also attracted studies in the social sciences. There is a close connection between life tables and survival analysis in that both are expressed in terms of the follow-up of a cohort, and both rely on concepts like the survival function and the hazard function. The difference is that survival analysis is applied to data collected by following a relatively small cohort over a short period of time, whereas life table methods are generally used to analyze cross-sectional data from large populations. A significant feature of the life table method, according to Govindurajulu and Qadri, is that it does not require a standard population. Allison has observed that the life table is among the most common methods used to analyze survival data. Demographic and insurance studies also fall within its scope. The primary function of a life table is to summarize survival data grouped into intervals to provide estimates of the survivor function, the density function, and the hazard rate. Kalbfleisch and Prentice and Allison have indicated that the life table is designed for situations where only the interval in which failure or censoring occurred is known but the actual failure or censoring time is unknown. Teachman has noted that Life Tables are also useful for preliminary evaluation of data and for evaluating the fit of regression models. They also allow assessment of exogenous variables in more complex analyses. Additionally, they can be used to assess the mortality level of a population and its age structure, project the population into the future, and assess the survival rate and the number of cases at risk within a cohort.

There are basically three types of life tables: ordinary life tables, multiple-decrement life tables and cause-deleted life tables. In an ordinary life table, membership in a well-defined study group can be terminated by a single attrition factor, while in a multiple-decrement life table there are multiple reasons for attrition. Life table methods can be used to study a variety of cohorts: a birth cohort, where the attrition factors are causes of death; a marriage cohort, where the attrition factors are causes of marital disruption and death; and a disease cohort, where the attrition factor is death due to the disease. In this section we limit our discussion to ordinary life tables.

1.7. Ordinary Life Tables

According to Shryock and Siegel, the ordinary life table is a statistical technique that summarizes the mortality experience, survivorship and life expectancy of a population of varying age. It gives a measure of period age-specific death rates, and it describes the hypothetical survival experience of a synthetic or fictitious cohort subject to current death rates over an imagined lifetime. Another school of thought views the table’s numbers as a description of the stationary population model built up by a succession of birth cohorts of the same size, all of which experience the same age-specific death rates. The table also gives information about longevity and expectation of life, and it provides a technique for analyzing data for all the causes of the event of interest. This discussion is limited to the current or period life table, which is based on the mortality experience of a hypothetical cohort of newborn babies, usually 100,000 newborns, who are subject to the age-specific mortality rates on which the table is based. It traces the cohort of newborns throughout their lifetime under the assumption that they are subject to the age-specific mortality rates of a country. There are two types of current life tables: unabridged, for single years of life, and abridged, for five (5) year cohorts of life. The current discussion is based on the abridged life table.

From Table 4 and Table 5, we give brief explanations of some items in the nine columns. To calculate the age-specific mortality rate for the first age interval, we proceeded as follows:

${P}_{x}=90035$, ${D}_{x}=738$

${}_{5}q{}_{0}=\frac{738}{90035+0.5×738}=0.00816$
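The arithmetic above can be checked with a short Python sketch. The function name `mortality_probability` is an assumption made for illustration; the inputs are the values of ${P}_{x}$ and ${D}_{x}$ quoted in the text.

```python
def mortality_probability(deaths, population):
    """Convert deaths D_x and population P_x into q = D / (P + 0.5 * D)."""
    return deaths / (population + 0.5 * deaths)


q0 = mortality_probability(deaths=738, population=90035)
print(round(q0, 5))  # 0.00816
```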

Table 4. Column notation for a life table.

Table 5. Example of an abridged life table for a population under study.

The other columns can be calculated using the formulae given in the text above. ${l}_{0}$, the radix of the Life Table, is arbitrarily taken as 100,000. For example, in column (6), ${}_{5}d{}_{0}$ is equal to 0.00816 × 100000 = 816. In column (5), ${l}_{5}=100000-816=99184$, and ${}_{5}L{}_{0}=100000-0.9\times 816=99266$; the weight 0.9 is used because in early childhood mortality declines rapidly with age, so deaths are concentrated early in the interval. For the next interval, ${}_{5}L{}_{5}=99184-0.5\times 771=98799$, since at most ages deaths are assumed to be evenly distributed over the interval. ${T}_{x}=\sum {}_{n}L{}_{x}$, summed over age x and all older ages. For instance, ${T}_{80}=77402+78142+78616=234160.$

As stated earlier, the value of ${e}_{x}$ summarizes life table survival in terms of the mean life left to be lived at each age x, with ${e}_{x}=\frac{{T}_{x}}{{l}_{x}}$. From Table 5, it can be inferred that the mean years of life left for a newborn child, ${e}_{0}$, is 61.9 years. The crude mortality rate is the reciprocal of the expectation of life ${e}_{x}$. For a newborn child the crude mortality rate is 1/61.9, which is approximately 0.016; this translates to about 1,600 deaths per 100,000 person-years. The crude mortality rate measures the risk of death.

${\sum }_{0}^{n}{}_{n}d{}_{x}={l}_{0}$, that is to say, the sum of the number of deaths in all age intervals gives the radix. The number of persons dying before age x is given by ${l}_{0}-{l}_{x}$. In the Life Table (Table 5), the number of newborn children dying before age twenty (20) is given by ${l}_{0}-{l}_{20}=100000-97127=2873$.
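The column calculations in this paragraph can be reproduced with the following Python sketch. The variable names and the helper `round_half_up` are assumptions introduced here; the numerical inputs (738, 90035, 771 and the three ${}_{n}L{}_{x}$ values) are those quoted in the text.

```python
def round_half_up(x):
    """Round a non-negative value to the nearest integer, halves upward."""
    return int(x + 0.5)


radix = 100_000                                  # l_0, the arbitrary radix
q_first = 738 / (90035 + 0.5 * 738)              # probability of dying in the first interval
d_first = round_half_up(q_first * radix)         # deaths in the first interval
l_next = radix - d_first                         # survivors entering the next interval
L_first = round_half_up(radix - 0.9 * d_first)   # deaths weighted towards the start
L_second = round_half_up(l_next - 0.5 * 771)     # 771 deaths spread evenly
T_80 = 77402 + 78142 + 78616                     # sum of L from age 80 onwards

print(d_first, l_next, L_first, L_second, T_80)
# 816 99184 99266 98799 234160
```

Starting the second interval from 99184 survivors reproduces the value 98799 that is used later for the 5 - 10 age bracket.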

The probability of new born children dying before age x, where x is twenty (20) years is given by $\frac{{l}_{0}-{l}_{x}}{{l}_{0}}=\frac{100000-97127}{100000}=\frac{2873}{100000}=0.0287.$
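As a small check of the two results above, the sketch below computes the number and the probability of dying before a given age from ${l}_{0}$ and ${l}_{x}$. The function names are hypothetical.

```python
def deaths_before(l0, lx):
    """Number of the radix cohort dying before age x."""
    return l0 - lx


def prob_death_before(l0, lx):
    """Probability that a newborn dies before age x."""
    return (l0 - lx) / l0


print(deaths_before(100_000, 97_127))      # 2873
print(prob_death_before(100_000, 97_127))  # 0.02873
```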

1.8. Calculation of Survival Rates

We use the abridged life table (Table 5) to calculate survival rates. For population projections, 5-year survival rates are computed, and for estimates of net migration, 10-year survival rates are calculated. Calculation of survival rates relies on two columns from the table, i.e. ${T}_{x}$ and ${L}_{x}$ (MEASURE Evaluation).

$\text{5-year survival rate}={S}_{\left(x+5\right)}=\frac{{}_{5}L{}_{\left(x+5\right)}}{{}_{5}L{}_{x}}$ (5)

The probability that a person in age bracket (15 - 20) will survive to the next age bracket (20 - 25) is obtained as follows:

${S}_{\left(20\text{\hspace{0.17em}}\text{-}\text{\hspace{0.17em}}25\right)}=\frac{{L}_{\left(20\text{\hspace{0.17em}}\text{-}\text{\hspace{0.17em}}25\right)}}{{L}_{\left(15\text{\hspace{0.17em}}\text{-}\text{\hspace{0.17em}}20\right)}}=\frac{96934}{97439}=0.9948$

・ Survivorship of the youngest age cohort (0 - 5) and (5 - 10):

${S}_{\left(0\text{-}5,\ 5\text{-}10\right)}=\frac{{L}_{\left(0\text{-}5\right)}+{L}_{\left(5\text{-}10\right)}}{{l}_{0}\times \left(\text{age interval}\right)}=\frac{99266+98799}{100000\times 5}=0.3961$

・ Survivorship of the oldest age cohort ${S}_{{95}^{+}}=\frac{{T}_{90\text{\hspace{0.17em}}\text{-}\text{\hspace{0.17em}}95}}{{T}_{85\text{\hspace{0.17em}}\text{-}\text{\hspace{0.17em}}90}}=\frac{77402}{155543}=0.4976$
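The three survival rates above are simple ratios of life table columns, as the Python sketch below illustrates. The helper name `survival_rate` is an assumption; the inputs are the figures quoted from Table 5 in the text.

```python
def survival_rate(numerator, denominator):
    """Ratio of two life table quantities, as in Equation (5)."""
    return numerator / denominator


s_20_25 = survival_rate(96934, 97439)                  # (15 - 20) to (20 - 25)
s_youngest = survival_rate(99266 + 98799, 100_000 * 5) # youngest cohorts
s_oldest = survival_rate(77402, 155543)                # oldest cohort

print(round(s_20_25, 4), round(s_youngest, 4), round(s_oldest, 4))
# 0.9948 0.3961 0.4976
```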

1.9. Computation of Ordinary Life Table

LaMorte and Kleinbaum have presented a Life Table used widely in biostatistics called a cohort Life Table. This table summarizes the experiences of participants over a pre-defined period in a cohort study until the event of interest is observed or the study is terminated. We organize the table into equally spaced five (5) year intervals (that is, 0 - 4, 5 - 9, 10 - 14, 15 - 19 and so on) (Table 6).

Observation: From time 17 to 21 years the survival probabilities were all the same because all the subjects within that period were censored. The same applies to subjects within the time 24 to 30 years (Table 7).
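The actuarial construction can be sketched in Python as follows. This is an illustration under stated assumptions: it uses the usual actuarial convention that censored subjects count as at risk for half of the interval (average number at risk = number entering minus half the number censored), and the counts are hypothetical rather than the values of Table 6.

```python
def actuarial_life_table(n_start, deaths, censored):
    """Build life table rows from per-interval deaths and censored counts."""
    rows = []
    n_t = n_start
    s = 1.0
    for d_t, c_t in zip(deaths, censored):
        n_avg = n_t - c_t / 2      # average number at risk in the interval
        q_t = d_t / n_avg          # proportion dying
        p_t = 1 - q_t              # proportion surviving
        s *= p_t                   # cumulative survival S(t)
        rows.append({"N": n_t, "D": d_t, "C": c_t, "N_avg": n_avg,
                     "q": q_t, "p": p_t, "S": s})
        n_t -= d_t + c_t           # subjects entering the next interval
    return rows


# Hypothetical counts for four intervals (assumed for illustration only).
table = actuarial_life_table(n_start=20, deaths=[2, 1, 0, 3], censored=[1, 2, 4, 1])
for row in table:
    print(row)
```

In the third hypothetical interval every exit is a censoring, so the survival estimate is unchanged from the second interval. This is the same behaviour as noted in the observation above.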

Table 6. Life table construction using the actuarial method.

Footnote: t = Time interval in years; ${C}_{t}$ = Number censored; ${N}_{t}$ = Number at risk during interval; ${q}_{t}$ = Proportion dying during interval; ${N}_{\rho }$ = Average number at risk during interval; ${p}_{t}$ = Proportion surviving among those at risk; ${D}_{t}$ = Number of deaths during interval; $S\left(t\right)$ = Survival function.

Table 7. Life table construction using the Kaplan-Meier model.

2. Methodology

Secondary data were collected to assess the survival experiences of residents in the Sekondi-Takoradi district who use various means besides insecticide treated nets (ITNs) to prevent exposure to the malaria parasite. The researcher assessed hospital in-patients who reported cases of malaria (origin state), following them until death or censoring (recovery), in relation to causal factors (levels of exposure to the malaria parasite). The event of interest was death, measured as time until death. Predictor variables were age, gender, and preventive measures (level of exposure). The extracted data for each person were coded as follows:

Gender: Male = 1; Female = 2.

Type of preventive measure used: Insecticide treated net = 0; mosquito nets plus others = 1.

Censoring status: if the patient died, the status is coded as 1; if the patient was discharged, died from a different sickness or was alive at the end of the study, the status is coded as 0.

Difference between date of discharge/death and date of admission = survival time in days. Age was treated as a continuous variable. The entire data set was analyzed with the Kaplan-Meier and Life Table functions in the SPSS application software.
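The coding of survival time and censoring status described above can be sketched in Python as follows. The function name `code_patient` and the dates are hypothetical; the study itself carried out this step in SPSS.

```python
from datetime import date


def code_patient(admitted, ended, died_of_malaria):
    """Return (survival time in days, status) for one patient record.

    status is 1 for a malaria death and 0 for a censored record
    (discharged, died of another cause, or alive at the end of the study).
    """
    survt = (ended - admitted).days
    status = 1 if died_of_malaria else 0
    return survt, status


# Hypothetical records (assumed for illustration only).
print(code_patient(date(2020, 3, 1), date(2020, 3, 15), True))   # (14, 1)
print(code_patient(date(2020, 3, 1), date(2020, 4, 10), False))  # (40, 0)
```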

The following procedure was followed: “Analyze”, “Survival” and “Kaplan-Meier” were selected in turn. “SURVT” was selected from the variable list and entered into the “Time” box. The “Status” variable was selected and entered into the “Status” box. When a question mark in parentheses appeared after the status variable, the “Define Event” button was clicked and a value of 1 was inserted in the box, since the status variable was coded 1 for events and 0 for censored cases. To view the command syntax, the “Paste” option was selected rather than “OK”. The following syntax was obtained:

KM survt /STATUS=status(1) /PRINT TABLE MEAN.

To obtain Life Table estimates, these steps were followed: Analyze > Survival > Life Tables. The time-to-event and status variables were defined as described above for the K-M estimates. The time intervals for the life table analysis were then defined: “0 through 100 by 5” was entered, which produced 20 time intervals of equal length. The factor variable “preventive measure” was also entered into the factor box and its range (the number of categories) was defined; in this example there were two categories, thus (min = 1, max = 2). The prompts were followed to the end of the process.

3. Results and Discussion

3.1. Data on Malaria Cases for Users and Non-Users of ITN

We see from Table 8 that out of the 1793 patients sampled, 405 (22.6%) were using insecticide treated nets (ITNs), while the majority (77.4%) were using other types of nets such as window netting and ordinary treated nets. Of those who were using ITNs, 16% died within the six-month study period while 84% survived; of the non-users of ITNs, 23% died while 77% survived within the same period. The ratio of male to female deaths was 1.2:1. This ratio indicates that death due to malaria during the period of observation was not gender related.

Table 8. Summary of data on malaria cases for users and non-users of ITN.

3.2. Survival Functions for Gender and Used Method of Protection

The study demonstrated the use of real-life data in the computation of both the K-M estimators and the life table method. From the graph in Figure 1, it can be inferred that the survival experiences of males and females were approximately the same, which implies that gender did not contribute significantly to death due to malaria. The plot in Figure 2 gives a graphical picture of the survival curves of users and non-users of ITN. The survival experiences in the first few days of the study appeared to be the same, but thereafter the differences showed up clearly. The curve for the users of ITN consistently lay above that of the non-users, which shows that users of ITN had a better survival prognosis than non-users and that ITN was effective at all points during the period. From the graph one can estimate the median survival times for the two groups. This is done by locating 0.5 on the y-axis and proceeding horizontally until the line meets a curve; a vertical line is then drawn from the point of intersection down to the x-axis. From the graph, the median survival time of the non-users of ITN was approximately 10 days, while that of the users of ITN was close to forty (40) days. The median values further confirm that users of ITN had a better survival prognosis than non-users.

Figure 1. Survival curves of males and females and risk of malaria death. The curves for males and females cross each other at various points. Results show that many more females than males were censored.
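The graphical reading of the median has a simple computational counterpart: the median survival time is the first time at which the estimated survival function falls to 0.5 or below. The Python sketch below illustrates this on a hypothetical step curve; the function name and the curve values are assumptions and do not reproduce Figure 2.

```python
def median_survival(curve):
    """Return the first time at which S(t) <= 0.5, or None if never reached.

    curve -- (time, S(time)) pairs at the event times, in increasing time order
    """
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical Kaplan-Meier steps (assumed for illustration only).
curve = [(1, 0.9), (5, 0.75), (12, 0.6), (20, 0.45), (33, 0.3)]
print(median_survival(curve))  # 20
```

If the curve never drops to 0.5, for example under heavy censoring, the median is not estimable from the data and the function returns `None`.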

Figure 2. Survival curves of users and non-users of ITN. The curve for users of ITN lies above that of the non-users. For those who did not use ITN there were many steps within the curve with each step representing death.

The comparative analysis made on the Life Tableresults (Table9) suggests that patients who subscribed to ITNs as means of preventing exposure to the mosquito parasites had better and longer survival experiences than patients who subscribed to other means of protection. From the life table results presented as Table9, we compared the survival experiences of residents who subscribed to ITNs as means of reducing exposure to the mosquito parasites with those who subscribed to mosquito nets and other nets as means of protection. The results of the overall comparisons provided by the life table (Wilcoxin/Gehan statistics, p = 0.002) was the same as the one provided by the log rank test in the Kaplan-Meier method, they all showed that the method of protection adopted by the residents to prevent exposure to the mosquito parasite was highly significant. For the patients who subscribed to ITNs, the probability that they would survive for at least a day after admission at the hospital was about 0.9, for the non-users of ITNs, probability that they would survive for at least a day after admission at the hospital was 0.8, again, the probability that patients who use ITNs will survive for at least 30 days after admission was 0.6, while for patients who used other means of protection their probability was 0.2, finally, the probability that patients who used ITNs would survive for at least 60 days after admission was 0.3, but for those who subscribed to other means of protection, the probability of surviving for at least 60 days was 0.1.

3.3. Hypothesis Testing

A test of hypothesis was conducted to ascertain whether the survival curves for the two categories of users were the same or not.

Table 9. Life table results showing the survival functions for the method of protection adopted by the dwellers against mosquito bites.

Hypothesis: H0: The survival curves of the exposed and the less exposed groups are equivalent.

Test statistic: Log-Rank test (Mantel-Cox)

Decision criterion: Reject H0 if the p-value is less than α = 0.05.

Table 10. Summary of test results for testing the equality of survival curves for exposed and less exposed groups. The log-rank test was used to test whether the Kaplan-Meier curves for the two exposure groups in the entire population were statistically equivalent.

The computer output (Table 10) provides the information needed to test whether there was any difference in the population survival curves for the two classes of users of ITN, after adjusting for gender (since gender did not contribute significantly to the risk of malaria death). The null hypothesis for this test was that there was no difference in the survival curves of the users and non-users of ITN. The p-value of the log-rank test (0.002 < 0.05) was highly significant, implying that there was a statistically significant difference between the population survival curves. In our view, the purpose for which this study was carried out has therefore been achieved.
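
To make the test procedure concrete, the following is a minimal pure-Python sketch of a two-group log-rank (Mantel-Cox) test with the decision rule stated above. It is an illustration only, not the SPSS routine used in the study: the function name `logrank_test` and the follow-up times are hypothetical, and the sketch is unstratified, so it does not reproduce any adjustment for gender.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*: follow-up times; events_*: 1 = death observed, 0 = censored.
    Returns (chi_square, p_value) on 1 degree of freedom.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk, group A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk, group B
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n  # expected deaths in A under H0
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Hypothetical follow-up data in days (NOT the study data).
users_times = [10, 12, 14, 16, 18, 20]
users_events = [0, 0, 0, 0, 0, 0]
nonusers_times = [1, 2, 3, 4, 5, 6]
nonusers_events = [1, 1, 1, 1, 1, 1]

alpha = 0.05
chi_square, p_value = logrank_test(users_times, users_events,
                                   nonusers_times, nonusers_events)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(round(chi_square, 3), p_value, decision)
```

The statistic compares the observed number of deaths in one group with the number expected if both groups shared a single survival curve, summed over every distinct death time.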

4. Conclusion

Survival analysis is an important field in the theory and practice of biostatistics. As noted by Kosińska and Szwed, survival analysis is suitable for studies of the distribution in time of developmental events. It can be used to indicate the factors which significantly influence the course of development by modifying the duration of developmental stages. The techniques developed in survival analysis have penetrated many disciplines, and various methods are available in the literature for analyzing survival data. Owing to constraints of time and space, this study focused on the Kaplan-Meier estimator and Life Tables. Lyu used the Kaplan-Meier approach to study the behavior of lung cancer patients. Through this discourse, it has been shown how survival probabilities can be obtained and graphed, and how one can find the number at risk, the conditional probabilities, the number of deaths and the probability of a person surviving to the next period given survival through the current period. It has also been demonstrated how a life table can be generated and used to find age-specific mortality rates, survival rates, longevity or expectation of life, mean years left, and crude mortality rates. Additionally, it has been shown how researchers can find the number of newborns dying before age twenty and, hence, the probability of a newborn dying before age twenty. Above all, the study has shown how incomplete information can be included in the computation of both the K-M estimators and the life table methods.
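
The quantities summarized above can be illustrated with a minimal Python sketch of the Kaplan-Meier product-limit calculation. The function name `kaplan_meier` and the follow-up times are hypothetical and are not the study data; the only figures taken from this paper are the life table values l0 = 100000 and l20 = 97127 quoted in the abstract.

```python
def kaplan_meier(times, events):
    """Product-limit estimate.

    times: follow-up times; events: 1 = death observed, 0 = censored.
    Returns rows of (time, n_at_risk, deaths, conditional_survival, S(t)).
    """
    survival = 1.0
    rows = []
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in death_times:
        at_risk = sum(1 for u in times if u >= t)  # censored cases still count until they leave
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        conditional = 1.0 - deaths / at_risk       # P(survive past t | at risk at t)
        survival *= conditional                    # curve is re-calculated at each death time
        rows.append((t, at_risk, deaths, conditional, survival))
    return rows


# Hypothetical follow-up times in days (NOT the study data).
km_rows = kaplan_meier([1, 2, 2, 3, 4], [1, 0, 1, 1, 0])
for row in km_rows:
    print(row)

# Life table arithmetic using the values reported in the abstract.
l0, l20 = 100000, 97127
deaths_before_20 = l0 - l20        # newborns dying before age twenty
prob_death_before_20 = deaths_before_20 / l0
print(deaths_before_20, round(prob_death_before_20, 4))
```

Censored observations enter the calculation only through the number at risk, which is how incomplete information is used without being treated as a death.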

Cite this paper: Turkson, A.J. (2021) A Closer Look at the Kaplan-Meier and Life Table Models in Survival Analysis. Open Access Library Journal, 8, 1-19. doi: 10.4236/oalib.1108104.
References

   Hosmer, D.W., Lemeshow, S. and May, S. (2008) Descriptive Methods for Survival Data. In: Applied Survival Analysis, John Wiley, Hoboken. https://doi.org/10.1002/9780470258019

   Hosmer Jr., D.W. and Lemeshow, S. (1999) Applied Survival Analysis: Regression Modeling of Time to Event Data. John Wiley and Sons, Inc., New York.

   Kaplan, E.L. and Meier, P. (1958) Nonparametric Estimation from Incomplete Observations. Journal of the American Statistical Association, 53, 457-481. https://doi.org/10.1080/01621459.1958.10501452

   Rich, J.I., Neely, J.G., Courtney, P., Voelker, C.J., Nussenbaum, B. and Wang, E.W. (2010) A Practical Guide to Understanding Kaplan-Meier Curves. Otolaryngology–Head and Neck Surgery, 143, 331-336. https://doi.org/10.1016/j.otohns.2010.05.007

   Turkson, A.J. (2010) Understanding the Basic Concepts of Survival Analysis. Journal of Biostatistics, 4, 63-80.

   Nisbet, R., Gray, M. and Yale, K. (2018) Handbook of Statistical Analysis and Data Mining Applications. Elsevier Inc., Academic Press, Cambridge.

   Namboodiri, K. and Suchindran, C.M. (1987) Life Tables and Their Applications. Academic Press, Orlando.

   Selvin, S. (2008) Survival Analysis for Epidemiology and Medical Research: A Practical Guide. Cambridge University Press, New York.

   Govindarajulu, U.S. and Qadri, M. (2019) Survival and Mediation Analysis with Correlated Frailty. Current Research in Biostatistics, 9, 21-30. https://doi.org/10.3844/amjbsp.2019.21.30

   Allison, P.D. (1995) Survival Analysis Using the SAS System: A Practical Guide. SAS Institute, Inc., Cary.

   Kalbfleisch, J.D. and Prentice, R.L. (1980) The Statistical Analysis of Failure Time Data. John Wiley and Sons, New York.

   Allison, P.D. (1984) Event History Analysis. Sage Publications, Beverly Hills. https://doi.org/10.4135/9781412984195

   Teachman, J.D. (1983) Analyzing Social Processes: Life Table and Proportional Hazards Model. Social Science Research, 12, 263-301. https://doi.org/10.1016/0049-089X(83)90015-7

   Shryock, H. and Siegel, S. (1973) The Methods and Materials of Demography. US Government Printing Office, Washington DC.

   Measure Evaluation (n.d.) Overview of Life Tables and Survival Rates. https://www.measureevaluation.org/resources/training/online-courses-and-resources/non-certificate-courses-and-mini-tutorials/population-analysis-for-planners/lesson-7.html#:%7E:text=Survival%20rates%20are%20derived%20from%20life%20tables%20or%20census%20data,be%20alive%20in%20the%20future.&text=Life%20tables%20are%20used%20to,several%20types%20of%20life%20tables

   LaMorte, W.W. (2016) Survival Analysis. https://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Survival/index.html

   Kleinbaum, D.G. and Klein, M. (2005) Survival Analysis: A Self-Learning Text. Springer Science+Business Media, New York. https://doi.org/10.1007/0-387-29150-4

   Kosińska, M. and Szwed, A. (2014) Application of Survival Analysis in Studies of Human Ontogeny. Applied Mathematics, 5, 1697-1704. https://doi.org/10.4236/am.2014.511162

   Lyu, R. (2020) Survival Analysis of Lung Cancer Patients from TCGA Cohort. Advances in Lung Cancer, 9, 1-15. https://doi.org/10.4236/alc.2020.91001
