| Title: | Datasets from "Modelling Survival Data in Medical Research" by Collett |
|---|---|
| Description: | Datasets for the book entitled "Modelling Survival Data in Medical Research" by Collett (2023) <doi:10.1201/9781003282525>. The datasets provide extensive examples of time-to-event data. |
| Authors: | Mark Clements [aut, cre] (ORCID: <https://orcid.org/0000-0003-4518-5670>), Enoch Yi-Tung Chen [ctb] (ORCID: <https://orcid.org/0000-0003-2448-708X>) |
| Maintainer: | Mark Clements <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.3 |
| Built: | 2026-06-02 09:40:54 UTC |
| Source: | https://github.com/mclements/collett |
Clinical trial of 44 patients with chronic active hepatitis randomised to either the drug prednisolone or an untreated control group.
active_hepatitis
A data frame with 44 rows and 3 variables:
treatment (integer): treatment (1=prednisolone, 2=control)
time (integer): survival time from admission to study (months)
status (integer): event indicator (1=event, 0=right censored)
See Collett (2023)
Survival data for female breast cancer patients from Middlesex Hospital. The dataset includes the result of staining using Helix pomatia agglutinin (HPA).
bcancer
A data frame with 45 rows and 3 variables:
stain (integer): negative staining (=1) or positive staining (=2)
time (integer): survival time in months
status (integer): status at end of follow-up (0=censored, 1=death)
For details about the study design, see Leathem and Brooks (1987).
The dataset is described in Example 1.2 and Table 1.2 (Collett, 2023, pages 6-7).
Leathem AJ, Brooks S. Predictive value of lectin binding on breast-cancer recurrence and survival. The Lancet. 1987 May 9;329(8541):1054-6. doi:10.1016/S0140-6736(87)90482-X
    library(survival)
    plot(survfit(Surv(time,status)~stain, data=bcancer), col=1:2,
         xlab="Survival time (months)", ylab="Survival")
    legend("topright", legend=c("Negative staining","Positive staining"),
           col=1:2, lty=1, bty="n")
Placebo-controlled trial of bladder cancer patients randomised to thiotepa or to placebo.
bladder
A data frame with 86 rows and 6 variables:
patient (integer): patient number (1-86)
time (integer): survival time in months
status (integer): status of patient (0=censored, 1=recurrence)
treat (integer): treatment group (1=placebo, 2=thiotepa)
init (integer): initial number of tumours
size (integer): diameter of largest initial tumour in cm
See Collett (2023)
A study of 37 patients with leukaemia in complete remission who received a non-depleted allogenic bone marrow transplant.
bone_marrow
A data frame with 37 rows and 9 variables:
patient (integer): patient number (1-37)
time (integer): survival time in days
status (integer): status of patient (0=alive, 1=dead)
rage (integer): age of patient in years
dage (integer): age of donor in years
type (integer): type of leukaemia (1=AML, 2=ALL, 3=CML)
preg (integer): donor pregnancy (0=no, 1=yes)
index (double): index of cell-lymphocyte reactions
gvhd (integer): graft-versus-host disease (0=no, 1=yes)
See Collett (2023)
Patient outcome following bone marrow transplantation
bone_marrow_tx
A data frame with 2204 rows and 9 variables:
id (integer): patient id
leukaemia (character): type of leukaemia (CML, ALL, AML)
age (character): age group of patient in years (<=20, 21-40, >40)
match (integer): indicator for whether there was a donor gender match (0=no, 1=yes)
tcell (integer): indicator for whether there was T-cell depletion (1=yes, 0=no)
ptime (integer): time to platelet recovery (days)
pcens (integer): event indicator for platelet recovery (1=event, 0=censored)
rdtime (integer): time to relapse or death (days)
rdcens (integer): event indicator for relapse or death (1=event, 0=censored)
See Collett (2023)
Recurrence free survival in breast cancer patients
breast_rfs
A data frame with 686 rows and 11 variables:
id (integer): patient id
treat (integer): hormonal treatment (0=no tamoxifen, 1=tamoxifen)
age (integer): patient age (years)
men (integer): menopausal status (1=premenopausal, 2=postmenopausal)
size (integer): tumour size (mm)
grade (integer): tumour grade (1, 2, 3)
nodes (integer): number of positive lymph nodes
prog (integer): progesterone receptor status (femtomoles)
oest (integer): oestrogen receptor status (femtomoles)
time (integer): recurrence-free survival time (days)
status (integer): event indicator (0=censored, 1=relapse or death)
See Collett (2023)
The datasets are based on the official .zip file. A table for the dataset names and file names sorted by file name is here:
| Dataset name | File name |
| -------------------- | ----------------- |
| illustration | "A numerical illustration.dat" |
| leukaemia | "Bone marrow transplantation in the treatment of leukaemia.dat" |
| bone_marrow | "Bone marrow transplantation.dat" |
| ovarian | "Chemotherapy in ovarian cancer patients.dat" |
| active_hepatitis | "Chronic active hepatitis.dat" |
| granulomatous | "Chronic granulomatous disease.dat" |
| tamoxifen | "Clinical trial of tamoxifen in breast cancer patients.dat" |
| prostatic | "Comparison of two treatments for prostatic cancer.dat" |
| kidneytx | "Comparisons between kidney transplant centres.dat" |
| liverbase | "Data from a cirrhosis study (baseline).dat" |
| liver_counting | "Data from a cirrhosis study (in counting process format).dat" |
| lbrdata0 | "Data from a cirrhosis study (lbr data).dat" |
| HELP | "Health evaluation and linkage to primary care.dat" |
| dialysis | "Infection in patients on dialysis.dat" |
| bone_marrow_tx | "Patient outcome following bone marrow transplantation.dat" |
| bcancer | "Prognosis for women with breast cancer.dat" |
| pulmonary | "Pulmonary metastasis.dat" |
| breast_rfs | "Recurrence free survival in breast cancer patients.dat" |
| ulcer | "Recurrence of an ulcer.dat" |
| bladder | "Recurrence of bladder cancer.dat" |
| mammary | "Recurrence of mammary tumours in female rats.dat" |
| valve | "Survival following aortic valve replacement.dat" |
| tplant | "Survival following kidney transplantation.dat" |
| ducks | "Survival of black ducks.dat" |
| mice | "Survival of laboratory mice.dat" |
| liver | "Survival of liver transplant recipients.dat" |
| myeloma | "Survival of multiple myeloma patients.dat" |
| lung | "Survival of patients registered for a lung transplant.dat" |
| gcancer | "Survival of patients with gastric cancer.dat" |
| melanoma | "Survival times of patients with melanoma .dat" |
| livertx | "Time to death while waiting for a liver transplant.dat" |
| IUD | "Time to discontinuation of the use of an IUD.dat" |
| kidney | "Treatment of hypernephroma.dat" |
And now sorted by the dataset names:
| Dataset name | File name |
| -------------------- | ----------------- |
| active_hepatitis | "Chronic active hepatitis.dat" |
| bcancer | "Prognosis for women with breast cancer.dat" |
| bladder | "Recurrence of bladder cancer.dat" |
| bone_marrow | "Bone marrow transplantation.dat" |
| bone_marrow_tx | "Patient outcome following bone marrow transplantation.dat" |
| breast_rfs | "Recurrence free survival in breast cancer patients.dat" |
| dialysis | "Infection in patients on dialysis.dat" |
| ducks | "Survival of black ducks.dat" |
| gcancer | "Survival of patients with gastric cancer.dat" |
| granulomatous | "Chronic granulomatous disease.dat" |
| HELP | "Health evaluation and linkage to primary care.dat" |
| illustration | "A numerical illustration.dat" |
| IUD | "Time to discontinuation of the use of an IUD.dat" |
| kidney | "Treatment of hypernephroma.dat" |
| kidneytx | "Comparisons between kidney transplant centres.dat" |
| lbrdata0 | "Data from a cirrhosis study (lbr data).dat" |
| leukaemia | "Bone marrow transplantation in the treatment of leukaemia.dat" |
| liver | "Survival of liver transplant recipients.dat" |
| liver_counting | "Data from a cirrhosis study (in counting process format).dat" |
| liverbase | "Data from a cirrhosis study (baseline).dat" |
| livertx | "Time to death while waiting for a liver transplant.dat" |
| lung | "Survival of patients registered for a lung transplant.dat" |
| mammary | "Recurrence of mammary tumours in female rats.dat" |
| melanoma | "Survival times of patients with melanoma .dat" |
| mice | "Survival of laboratory mice.dat" |
| myeloma | "Survival of multiple myeloma patients.dat" |
| ovarian | "Chemotherapy in ovarian cancer patients.dat" |
| prostatic | "Comparison of two treatments for prostatic cancer.dat" |
| pulmonary | "Pulmonary metastasis.dat" |
| tamoxifen | "Clinical trial of tamoxifen in breast cancer patients.dat" |
| tplant | "Survival following kidney transplantation.dat" |
| ulcer | "Recurrence of an ulcer.dat" |
| valve | "Survival following aortic valve replacement.dat" |
As an alternative to using the R datasets, the collett_data
function allows for reading from the original .dat files that
are stored in the package.
collett_data(name)
| Argument | Description |
| -------- | ----------- |
| name | Character string with the original filename |
A data-frame
Maintainer: Mark Clements [email protected] (ORCID)
Other contributors:
Enoch Yi-Tung Chen [email protected] (ORCID) [contributor]
    head(collett_data("A numerical illustration.dat"))
    ## which is equivalent to:
    head(illustration)
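The following is a minimal sketch of reading a whitespace-delimited .dat file with base R. It does not reproduce the code of collett_data; the file layout (a header row followed by whitespace-separated values), the made-up file contents, and the helper name `read_dat_sketch` are all assumptions for illustration.

```r
## Hypothetical helper: read a whitespace-delimited .dat file with a header row.
## The file layout is an assumption, not taken from the package source.
read_dat_sketch <- function(path) {
  read.table(path, header = TRUE)
}

## Write a tiny made-up file to a temporary location and read it back
tmp <- tempfile(fileext = ".dat")
writeLines(c("time status", "10 1", "13 0", "18 1"), tmp)
d <- read_dat_sketch(tmp)
str(d)
```

For the files shipped with the package, collett_data remains the documented route.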
Time from dialysis to infection for patients with diseases of the kidney.
dialysis
A data frame with 13 rows and 5 variables:
patient (integer): patient id
time (integer): time to infection (days)
status (integer): event indicator (0=censored, 1=infection)
age (integer): age in years
sex (integer): sex of the patient (1=male, 2=female)
See Collett (2023)
Black ducks, Anas rubripes, were followed by the US Fish and Wildlife Service.
ducks
A data frame with 50 rows and 6 variables:
duck (integer): duck indicator
time (integer): survival time in days
status (integer): status of bird (0=alive or missing, 1=dead)
age (integer): age group (0=hatch-year bird, 1=bird aged >= 1 year)
weight (integer): weight of bird in g
length (integer): length of wing in mm
See Collett (2023)
Survival of patients with gastric cancer
gcancer
A data frame with 90 rows and 4 variables:
patient (integer): patient id
time (integer): survival time in days
status (integer): event indicator (0=censored, 1=dead)
treat (integer): treatment arm (0=chemotherapy alone, 1=chemotherapy and radiotherapy)
See Collett (2023)
Trial comparing interferon with a placebo.
granulomatous
A data frame with 128 rows and 12 variables:
patient (integer): patient number (1-128)
time (integer): time to first infection (days)
status (integer): status of patient (0=censored, 1=infection)
centre (integer): treatment centre; see Collett (2023, page 504)
treat (integer): treatment group (0=placebo, 1=interferon)
age (integer): age in years
sex (integer): sex (1=male, 2=female)
height (double): height in cm
weight (double): weight in kg
pattern (integer): pattern of inheritance (1=X-linked, 2=autosomal recessive)
cort (integer): use of corticosteroids at trial entry (1=used, 2=not used)
anti (integer): use of antibiotics at trial entry (1=used, 2=not used)
See Collett (2023)
A clinical trial for patients in a residential detoxification programme. Patients were randomised to either get a referral to a HELP clinic or not.
HELP
A data frame with 447 rows and 7 variables:
subject (integer): subject id
days (integer): time to linkage to primary care in days
status (integer): event indicator (0=no linkage, 1=linkage)
age (integer): age of patient in years
gender (integer): gender of the patient (0=female, 1=male)
housing (integer): homelessness status (0=homeless, 1=housed)
linkage (integer): assistance to linking to healthcare (0=no, 1=yes)
Collett (2023) defines this dataset as "help"; however, that name causes issues with R's help system, so the dataset has been renamed "HELP". Moreover, the book uses the variables "Time" and "Help", whereas the dataset includes the variables "days" and "linkage", respectively.
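To follow the book's notation, the two columns can be renamed after loading. The sketch below uses a small made-up data frame with the documented column names rather than the packaged data; the object names `help_like` and `book_names` are hypothetical.

```r
## Made-up rows using the column names documented above
help_like <- data.frame(subject = 1:3,
                        days    = c(12L, 40L, 7L),
                        status  = c(1L, 0L, 1L),
                        linkage = c(1L, 0L, 1L))

## Map the package names (left) to the names used in the book (right)
book_names <- c(days = "Time", linkage = "Help")
idx <- match(names(book_names), names(help_like))
names(help_like)[idx] <- book_names
names(help_like)
```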
Artificial data on patient survival classified according to factors a and b
illustration
A data frame with 37 rows and 4 variables:
a (integer): factor a
b (integer): factor b
time (integer): event time
status (integer): event status (1=event, 0=right censored)
See Collett (2023).
A very simple dataset showing potential right censoring for time to discontinuation of the use of an IUD.
IUD
A data frame with 18 rows and 2 variables:
time (integer): time in weeks to discontinuation of the use of an IUD
status (integer): indicator for whether the IUD was discontinued (0=No, 1=Yes)
These data are reported in Table 1.1 (Collett, 2023, page 6).
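With right-censored times such as these, the survivor function is usually estimated with the Kaplan-Meier (product-limit) method. The sketch below computes it by hand in base R; the times and status values are made up for illustration and are not the values in Table 1.1.

```r
## Made-up times (weeks) and status values, not the data in Table 1.1
time   <- c(10, 13, 18, 19, 23, 30)
status <- c( 1,  0,  0,  1,  0,  1)

## Distinct event times
tj <- sort(unique(time[status == 1]))
## Number at risk just before, and number of events at, each event time
nj <- sapply(tj, function(t) sum(time >= t))
dj <- sapply(tj, function(t) sum(time == t & status == 1))
## Product-limit estimate: S(t) = product over t_j <= t of (1 - d_j / n_j)
km <- data.frame(time = tj, n.risk = nj, n.event = dj,
                 surv = cumprod(1 - dj / nj))
km
```

Censored observations contribute to the risk sets up to their censoring time but never trigger a drop in the estimate.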
This study was undertaken at the University of Oklahoma Health Sciences Center to investigate survival among 36 patients with a kidney tumour (hypernephroma). Standard treatment included chemotherapy and immunotherapy, with some patients also having a nephrectomy, or surgical removal of the kidney. For further details, see Lee and Wang (2013).
kidney
A data frame with 36 rows and 4 variables:
nephrectomy (integer): indicator for nephrectomy (0=No; 1=Yes)
age (integer): age group (1=<60; 2=60-70; 3=>70)
time (integer): follow-up time in months
status (integer): status at the end of follow-up (1=died; 0=censored)
Lee ET, Wang J. Statistical Methods for Survival Data Analysis. New York, NY: John Wiley & Sons; 2013, fourth edition. https://www.wiley.com/en-sg/Statistical+Methods+for+Survival+Data+Analysis%252C+4th+Edition-p-9781118095027
Transplant survival rates by recipients of organs from deceased donors. No event was defined as being alive with a functioning graft at the last known follow-up.
kidneytx
A data frame with 1439 rows and 9 variables:
patient (integer): patient id
centre (integer): transplant centre (1-8)
tsurv (integer): transplant survival time (days)
tcens (integer): event indicator (0=censored, 1=transplant failure)
dage (integer): donor age (years)
dtype (integer): donor type (0=deceased following brain death, 1=circulatory death)
rage (integer): recipient age (years)
diab (integer): diabetic status (0=absent, 1=present)
cit (double): cold ischaemic time (hours)
See Collett (2023). Thirty-five patients had tsurv==0 (that is, the transplanted kidney did not function).
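Zero survival times matter in practice because parametric models that work on the log-time scale require strictly positive times. The sketch below counts the zero times and forms a subset with positive times, using made-up rows with the documented column names. Excluding these rows is only one possible choice and is an assumption here, not a recommendation taken from the book.

```r
## Made-up rows with the documented column names
tx <- data.frame(tsurv = c(0, 0, 35, 410, 1200),
                 tcens = c(1, 1, 1, 0, 0))

## How many grafts never functioned?
n_zero <- sum(tx$tsurv == 0)

## One possible choice (an assumption): analyse only positive survival times
tx_pos <- subset(tx, tsurv > 0)
c(n_zero = n_zero, n_remaining = nrow(tx_pos))
```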
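Follow-up measurements of log bilirubin from a cirrhosis study (file "Data from a cirrhosis study (lbr data).dat").
lbrdata0
A data frame with 42 rows and 3 variables:
patient (integer): patient id
time (integer): date of measurement (days)
lbr (double): log bilirubin level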
See Collett (2023)
Bone marrow transplantation in the treatment of leukaemia
leukaemia
A data frame with 23 rows and 8 variables:
patient (integer): patient id
time (integer): survival time in days
status (integer): event indicator (0=alive, 1=dead)
group (integer): disease group (1=ALL, 2=low-risk AML, 3=high-risk AML)
page (integer): age of patient in years
dage (integer): age of donor in years
precovery (integer): platelet recovery indicator (0=no, 1=yes)
ptime (character): time in days to return of platelets to normal level (if precovery=1)
See Collett (2023). Note that ptime is stored as character and will need conversion to numeric before use.
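A minimal sketch of the conversion follows. The values are made up, and the use of "." to mark a missing recovery time is an assumption about the coding; any non-numeric string is turned into NA in the same way.

```r
## Made-up values; "." as a missing-value code is an assumption
ptime_chr <- c("13", ".", "18", "12", ".")
precovery <- c(1L, 0L, 1L, 1L, 0L)

## Non-numeric strings become NA (as.numeric warns about the coercion)
ptime_num <- suppressWarnings(as.numeric(ptime_chr))

## Check: a recovery time is available exactly when precovery == 1
stopifnot(all(is.na(ptime_num) == (precovery == 0)))
ptime_num
```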
Survival of liver transplant recipients
liver
A data frame with 1761 rows and 7 variables:
patient (integer): patient id
age (integer): patient age in years
gender (integer): patient gender (1=male, 2=female)
disease (integer): primary disease (1=PBC, 2=PSC, 3=ALD)
time (integer): time to event (days)
status (integer): event indicator for graft failure from any cause (1 if cof>0, 0 otherwise)
cof (integer): cause of graft failure (0=functioning graft, 1=rejection, 2=thrombosis, 3=recurrent disease, 4=other)
See Collett (2023)
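The relationship between status and cof can be sketched as follows, using made-up cause codes with the documented coding. The cause-specific indicator is an illustrative addition and is not a variable in the dataset.

```r
## Made-up causes of graft failure using the documented coding
cof <- c(0L, 1L, 0L, 3L, 2L, 4L, 0L)

## Overall event indicator: graft failure from any cause
status <- as.integer(cof > 0)

## Cause-specific indicator, e.g. rejection (cof == 1); failures from other
## causes are then treated as censored in a cause-specific analysis
status_rejection <- as.integer(cof == 1)

table(cof, status)
```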
Artificial data
liver_counting
A data frame with 54 rows and 7 variables:
patient (integer): patient id
start (integer): start time (days)
stop (integer): stop time (days)
status (integer): event indicator (0=censored, 1=uncensored)
treat (integer): treatment group (0=placebo, 1=Liverol)
age (integer): age of the patient at start of study (years)
lbrt (double): logarithm of bilirubin level
See Collett (2023). Note that the variable for log of bilirubin differs to that for "liverbase".
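In counting process format each patient contributes one row per interval, and the covariate value applies over that interval. The sketch below builds such a response with `Surv(start, stop, status)` from the survival package; the rows are made up and only mimic the documented column names.

```r
library(survival)

## Made-up rows in counting process format: one row per interval
cp <- data.frame(patient = c(1, 1, 2, 2, 2),
                 start   = c(0, 47, 0, 30, 61),
                 stop    = c(47, 281, 30, 61, 604),
                 status  = c(0, 1, 0, 0, 0),
                 lbrt    = c(3.2, 3.8, 3.1, 2.9, 2.8))

## Each interval is (start, stop]; the event indicator refers to its end
s <- Surv(cp$start, cp$stop, cp$status)
s
attr(s, "type")
```

A response of this type can be used on the left-hand side of a coxph formula in the usual way.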
Artificial data
liverbase
A data frame with 12 rows and 6 variables:
patient (integer): patient id
time (integer): survival time in days
status (integer): event indicator (0=censored, 1=uncensored)
age (integer): age of the patient (years)
treat (integer): treatment group (0=placebo, 1=Liverol)
lbr (double): logarithm of bilirubin level
See Collett (2023)
Investigate the time on the liver transplantation list.
livertx
A data frame with 281 rows and 7 variables:
patient (integer): patient id
time (integer): time on the list
status (integer): event indicator (0=censored, including having a transplant, 1=died on the list)
age (integer): patient age in years
gender (integer): patient gender (1=male, 0=female)
bmi (double): body mass index (kg/m^2)
ukeld (integer): UK end-stage liver disease score
See Collett (2023). A higher UKELD is associated with worse disease severity.
Survival of patients registered for a lung transplant
lung
A data frame with 196 rows and 7 variables:
patient (integer): patient id
time (integer): time from registration to the earliest of removal from list, last known follow-up date, 30 April 2012, or death (days)
status (integer): event indicator (0=censored, 1=dead)
age (integer): age in years
gender (integer): gender (1=male, 2=female)
bmi (double): body mass index
disease (integer): disease (1=COPD, 2=fibrosis, 3=suppurative, 4=other)
See Collett (2023)
This is an animal experiment comparing continued use of retinyl acetate (related to vitamin A) throughout the study (treatment) with retinyl acetate for 60 days followed by no further treatment (control). The female rats all had mammary tumours.
mammary
A data frame with 254 rows and 4 variables:
rat (integer): id for each rat
treatment (integer): treatment arm indicator (1=treatment, 0=control)
time (double): follow-up time (days)
status (integer): recurrence indicator (0=no, 1=yes)
See Collett (2023)
Comparing two immunotherapy treatments for patients with melanoma
melanoma
A data frame with 30 rows and 4 variables:
age (integer): age group (1=21-44, 2=41-60, 3=61+)
treatment (integer): treatment arm (1=BCG, 2=C. parvum)
time (integer): survival time (months)
status (integer): event indicator (0=censored, 1=dead)
See Collett (2023)
Laboratory study of survival for two groups of mice exposed to radiation.
mice
A data frame with 181 rows and 3 variables:
environment (integer): type of environment (1=standard, 2=germ-free)
causeofdeath (integer): cause of death (1=thymic lymphoma, 2=reticulum cell sarcoma, 3=other causes)
time (integer): survival time (days)
See Collett (2023). Note that there are no censored event times.
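Because every animal has an observed cause of death and no time is censored, the cumulative incidence of a given cause by time t reduces to a simple proportion. The sketch below illustrates this with made-up rows that follow the documented coding; the helper `cuminc` is hypothetical and is not part of the package.

```r
## Made-up data using the documented coding
mice_like <- data.frame(
  environment  = c(1, 1, 1, 1, 2, 2, 2, 2),
  causeofdeath = c(1, 2, 3, 1, 2, 3, 3, 2),
  time         = c(159, 317, 40, 189, 430, 136, 246, 515))

## With no censoring, the cumulative incidence of cause k by time t is the
## proportion of animals that died from cause k at or before t
cuminc <- function(d, cause, t) mean(d$causeofdeath == cause & d$time <= t)

std <- subset(mice_like, environment == 1)
cuminc(std, cause = 1, t = 200)
```

With censored data this shortcut no longer applies and an estimator that accounts for censoring is needed.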
Patients aged 50-80 years who were diagnosed with multiple myeloma and treated with alkylating agents at West Virginia University Medical Center.
myeloma
A data frame with 48 rows and 10 variables:
patient (integer): patient identifier
time (integer): survival time in months
status (integer): status at follow-up (0=Alive, 1=Dead)
age (integer): age at diagnosis in years
sex (integer): sex of the patient (1=male, 2=female)
bun (integer): level of blood urea nitrogen at diagnosis (unit assumed to be mg/dL based on the normal range for adults reported by https://en.wikipedia.org/wiki/Blood_urea_nitrogen)
ca (integer): serum calcium at diagnosis in mg/dL
hb (double): serum hemoglobin level at diagnosis in g/dL (equivalently, grams per 100 mL)
pcells (integer): percent of plasma cells in the bone marrow at diagnosis
protein (integer): indicator for whether or not the Bence-Jones protein was present in the urine at diagnosis (0=absent, 1=present)
Krall et al (1975) did not provide the units for all of these measurements, and their analyses used some data transformations, such as log(bun). Collett (2023) converted the data from Krall et al (1975): BUN is reported by Krall and colleagues as X1=log(BUN), although the log base and unit are unclear. Krall and colleagues reported data for 65 individuals, including those younger than 50 and older than 80.
Krall JM, Uthoff VA, Harley JB. A step-up procedure for selecting variables associated with survival. Biometrics. 1975 Mar 1:49-57. doi:10.2307/2529709
    ## To be completed.
Trial for treatment of ovarian cancer patients comparing cyclophosphamide alone with cyclophosphamide combined with adriamycin.
ovarian
A data frame with 26 rows and 7 variables:
patient (integer): patient identifier
time (integer): survival time from randomisation in days
status (integer): event indicator (0=right censored, 1=event)
treat (integer): treatment (1=single, 2=combined)
age (integer): age of patients in years
rdisease (integer): extent of residual disease (1=incomplete, 2=complete)
perf (integer): performance status (1=good, 2=poor)
See Collett (2023)
Randomised controlled trial from the Veterans Administration Cooperative Urological Research Group. Includes patients who had stage III cancers and were randomised to placebo or daily oral treatment with 1.0 mg of diethylstilbestrol (DES).
prostatic
A data frame with 38 rows and 8 variables:
patient (integer): patient identifier
treatment (integer): treatment indicator (1=placebo; 2=daily treatment with 1.0 mg of diethylstilbestrol (DES))
time (integer): survival time from trial entry to end of follow-up in months
status (integer): follow-up status (0=alive or died from other causes, 1=died from prostate cancer)
age (integer): age at trial entry in years
shb (double): serum hemoglobin at trial entry in g/dL
size (integer): size of the primary tumour in cm^3
index (integer): Gleason index based on histopathology
TBC.
Andrews DF, Herzberg AM. Data: a collection of problems from many fields for the student and research worker. Springer Series in Statistics; Springer New York, NY; 1985. doi:10.1007/978-1-4612-5098-2
A very simple dataset with no censoring
pulmonary
A data frame with 11 rows and 1 variable:
time (integer): survival time from pulmonary metastasis to death in months
See Collett (2023)
Simulated data with left truncated follow-up and potentially right censored outcomes.
simdata
A data frame with 30 observations and 8 variables:
id (integer): index for each individual
trt (numeric): whether treated (1=treated; 0=not treated)
age (integer): age in years
entry_time (numeric): year of entry
observed_duration (numeric): years that an individual was observed
status (integer): status at the end of follow-up (1=event, 0=censored)
event_calendar_time (numeric): hypothetical event time in calendar time
stop_calendar_time (numeric): end of follow-up in calendar time
    ## Simulate survival for 30 individuals based on a Weibull distribution
    set.seed(13579)
    n <- 30
    ## Randomly assign treatment groups (sampled with replacement,
    ## so the two groups need not have exactly 15 individuals each)
    trt <- sample(c(1, 0), n, replace = TRUE)
    ## Randomly assign integer age 50-80 to each individual
    age <- sample(50:80, n, replace = TRUE)
    ## Simulate true event times based on Weibull distribution
    true_shape <- 3
    true_scale <- 8
    true_times <- rweibull(n, shape = true_shape, scale = true_scale)
    ## Simulate right censoring times based on exponential distribution
    censoring_rate <- 0.1
    censoring_times <- rexp(n, rate = censoring_rate)
    ## Random entry times
    entry_time <- runif(n, min = 2000, max = 2010)
    ## Convert durations (time-on-study) to calendar times
    event_calendar_time <- entry_time + true_times
    censor_calendar_time <- entry_time + censoring_times
    ## Study end time (Administrative Censoring)
    study_end_time <- 2012
    study_censor_calendar_time <- rep(study_end_time, n)
    ## Determine the calendar time when observation ends
    stop_calendar_time <- pmin(event_calendar_time,
                               censor_calendar_time,
                               study_censor_calendar_time)
    ## Calculate the observed duration
    observed_duration <- stop_calendar_time - entry_time
    ## Create tied data
    observed_duration <- round(observed_duration * 2) / 2
    ## Determine the final status (1 if event, 0 if censored)
    status <- as.integer(event_calendar_time <=
                         pmin(censor_calendar_time, study_censor_calendar_time))
    ## Create a data frame
    simdata <- data.frame(
      id = 1:n,
      trt,                 ## Treatment group
      age,                 ## Age at diagnosis
      entry_time,          ## When they entered (Calendar time)
      observed_duration,   ## Time-on-study
      status,              ## Event status (1=event, 0=censored)
      event_calendar_time, ## Hypothetical event time (Calendar time)
      stop_calendar_time   ## When observation ended (Calendar time)
    )
    ## Save the data frame
    ## save(simdata, file = "~/src/R/collett/data/simdata.rda")
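The same follow-up can be expressed on two time scales. The sketch below is one possible reading of how the columns might be used and is not taken from the package: on the time-on-study scale everyone starts at zero, whereas on the calendar-time scale follow-up starts at entry_time (delayed entry). The rows are made up and only reuse the documented column names.

```r
library(survival)

## Made-up rows with the documented column names
sim_like <- data.frame(
  entry_time         = c(2001.2, 2004.5, 2008.0, 2009.3),
  stop_calendar_time = c(2007.7, 2012.0, 2011.5, 2012.0),
  status             = c(1L, 0L, 1L, 0L))

## Time-on-study scale: everyone starts at 0
s_study <- with(sim_like, Surv(stop_calendar_time - entry_time, status))

## Calendar-time scale: follow-up starts at entry_time (delayed entry)
s_cal <- with(sim_like, Surv(entry_time, stop_calendar_time, status))

attr(s_study, "type")
attr(s_cal, "type")
```

In simdata itself, observed_duration has been rounded to the nearest half year, so it will not exactly equal stop_calendar_time minus entry_time.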
Given a data-frame with an "s" step function, expand the data-frame to include the steps.
Given a survfit object, return a data-frame
Given a summary.survfit object, return a data-frame
Calculates the linear predictor. Typically, the tt function should include an intercept term (see the examples below). Note that spline terms assume that the x argument is multiplicative; moreover, the additional arguments are not passed. For other types of tt terms, the x is passed directly to the tt function together with other arguments.
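To make the linear predictor concrete, consider a tt term of the form x*cbind(1,t), where the first column is the intercept term. The sketch below computes the log hazard ratio and its standard error over time in base R. The coefficients and covariance matrix are made-up numbers rather than output from a fitted model, and the code illustrates the arithmetic only; it is not the implementation of predict_coxph_tt.

```r
## Hypothetical coefficients and covariance matrix for tt(x) = x * cbind(1, t);
## these numbers are made up and do not come from a fitted model
beta <- c(-0.6, 0.0002)
V    <- matrix(c(0.04, -2e-5, -2e-5, 2e-8), 2, 2)

x     <- 1                       # covariate value (for example, treated)
times <- c(0, 500, 1000)
X     <- x * cbind(1, times)     # design matrix of the time-varying term

fitted <- drop(X %*% beta)                 # log hazard ratio at each time
se.fit <- sqrt(rowSums((X %*% V) * X))     # sqrt(diag(X V X'))

data.frame(times, fitted, se.fit, hr = exp(fitted))
```

Without the intercept column, the log hazard ratio would be forced to zero at t = 0, which is why an intercept term is typically included.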
    step_s(
      data, x, y, ymin, ymax, group,
      add_origin = TRUE, x_origin = 0, y_origin = 1
    )

    ## S3 method for class 'survfit'
    as.data.frame(x, row.names, optional, type = c("expanded", "plain"), ...)

    ## S3 method for class 'summary.survfit'
    as.data.frame(x, row.names, optional, type = c("expanded", "plain"), ...)

    predict_coxph_tt(object, newdata, type = "lp", se.fit = FALSE, ...)

    predict_coxph_tv(object, data, id)

    plot_coxph_functional(
      formula, data, x = NULL, pch = 19,
      ylab = "Martingale residual for null model",
      smoother = c("loess", "lm"),
      smoother.formula = resi ~ xi,
      smoother.args = list(),
      points.args = list(),
      ...
    )
| Argument | Description |
| -------- | ----------- |
| data | a dataset for evaluation of the coxph model |
| x | a numeric vector for the smoother (defaults to the 401 values between the range) |
| y | name of the y-variable (required) |
| ymin | name of the ymin variable (optional) |
| ymax | name of the ymax variable (optional) |
| group | name of a grouping variable (optional) |
| add_origin | logical for whether to add an origin to the start of each group |
| x_origin | double for the value of x at the origin |
| y_origin | double for the value of y, ymin and ymax at the origin |
| row.names | not used (in generic signature) |
| optional | not used (in generic signature) |
| type | a character for the type of prediction (currently only the linear predictor for the tt argument) |
| ... | other arguments to pass to the plot function |
| object | a coxph object |
| newdata | a data-frame used for the predictions |
| se.fit | a logical for whether to return the standard errors |
| id | a character for the subject id |
| formula | a formula with a Surv on the lhs and a single variable on the rhs |
| pch | an integer for the pch argument in the plot for the residuals |
| ylab | a character for the ylab argument in the plot |
| smoother | a character for the name of the smoother |
| smoother.formula | a formula for the smoother in terms of resi and xi |
| smoother.args | a list of arguments to pass to the smoother function |
| points.args | a list of arguments to pass to the points function |
expanded data-frame with the same names
a vector of fitted values (when se.fit=FALSE) or a data-frame with fitted and se.fit columns (when se.fit=TRUE)
an update of data with survival probabilities
invisible plot return
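The idea of expanding a step function can be sketched in base R as follows. The helper `expand_steps` is hypothetical and is not the package's step_s; it only illustrates adding, for each jump, the point just before the jump so that a line or ribbon drawn through the points shows horizontal and vertical segments.

```r
## Hypothetical helper (not the package's step_s): expand a right-continuous
## step function defined by points (x[i], y[i])
expand_steps <- function(x, y) {
  n <- length(x)
  if (n < 2) return(data.frame(x = x, y = y))
  ## interleave (x[i], y[i]) with (x[i+1], y[i]), then add the last point
  xx <- c(rbind(x[-n], x[-1]), x[n])
  yy <- c(rbind(y[-n], y[-n]), y[n])
  data.frame(x = xx, y = yy)
}

steps <- expand_steps(x = c(0, 2, 5), y = c(1, 0.8, 0.5))
steps
```

The expanded points here are (0, 1), (2, 1), (2, 0.8), (5, 0.8) and (5, 0.5).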
    step_s(data.frame(g=c(1,1), a=1:2, b=4:5), a, b)
    step_s(data.frame(g=c(2,2), a=3:4, b=6:7), a, b)
    step_s(data.frame(g=c(1,1,2,2), a=1:4, b=4:7), a, b, group=g)

    library(survival)
    library(tinyplot)
    sfit1 = survfit(Surv(time,status)~rx, data=survival::colon, subset=etype==1)
    with(as.data.frame(sfit1),
         tinyplot::plt(surv~time|strata,ymin=lower,ymax=upper,type="ribbon"))

    library(survival)
    library(tinyplot)
    sfit1 = survfit(Surv(time,status)~rx, data=survival::colon, subset=etype==1)
    with(as.data.frame(sfit1, type="expanded"),
         tinyplot::plt(surv~time|strata,ymin=lower,ymax=upper,type="ribbon"))

    library(splines)
    library(tinyplot)
    fit1 = coxph(Surv(time,status)~tt(treat),data=breast_rfs,
                 tt=function(x,t,...) x*cbind(1,t))
    fit2 = coxph(Surv(time,status)~tt(treat),data=breast_rfs,
                 tt=function(x,t,...) x*ns(t,df=4,intercept=TRUE))
    times = seq(0,2500,len=301L)
    nd = data.frame(treat=1,time=times)
    df1 = transform(predict_coxph_tt(fit1,nd,se.fit=TRUE),
                    lower=exp(fitted-1.96*se.fit),
                    upper=exp(fitted+1.96*se.fit),
                    fitted=exp(fitted),
                    model="linear",times=times)
    df2 = transform(predict_coxph_tt(fit2,nd,se.fit=TRUE),
                    lower=exp(fitted-1.96*se.fit),
                    upper=exp(fitted+1.96*se.fit),
                    fitted=exp(fitted),
                    model="ns",times=times)
    with(rbind(df1,df2),
         plt(fitted ~ times | model, ymin=lower, ymax=upper, type="ribbon",
             xlab="Time since diagnosis (days)",
             ylab="Hazard ratio comparing treated with untreated"))
    with(subset(breast_rfs,status==1), rug(time))

    library(survival)
    liver = transform(collett::liverbase, lbr=NULL)
    liver = tmerge(liver, liver, id=patient, status=event(time,status))
    liver = tmerge(liver,
                   rbind(with(collett::liverbase, data.frame(patient,tstart=0,lbr)),
                         with(collett::lbrdata0, data.frame(patient,tstart=time,lbr))),
                   id=patient, lbr = tdc(tstart,lbr))
    fit3 = coxph(Surv(tstart,tstop,status)~lbr+treat,liver)
    predict_coxph_tv(fit3,data=subset(liver,patient %in% c(1,7)), id="patient")

    library(survival)
    par(mfrow=c(2,2))
    plot_coxph_functional(Surv(time,status)~hb, data=collett::myeloma,
                          xlab="Value of Hb")
    plot_coxph_functional(Surv(time,status)~bun, data=collett::myeloma,
                          xlab="Value of Bun")
    plot_coxph_functional(Surv(time,status)~log(bun), data=collett::myeloma,
                          xlab="Value of log Bun")
Clinical trial for breast cancer patients comparing combined tamoxifen and radiotherapy with tamoxifen alone.
tamoxifen
A data frame with 641 rows and 18 variables:
id (integer): patient identifier
treat (integer): treatment group (0=tamoxifen+radiotherapy, 1=tamoxifen)
age (integer): patient age at study entry (years)
size (double): tumour size (cm)
hist (integer): tumour histology (1=ductal, 2=lobular, 3=medullary, 4=mixed, 5=other)
hr (integer): hormone receptor level (0=negative, 1=positive)
hb (integer): haemoglobin level (g/l)
andis (integer): axillary relapse (0=no, 1=yes)
lsurv (integer): time to local relapse or last follow-up (days)
ls (integer): local relapse (0=no, 1=yes)
asurv (integer): time to axillary relapse or last follow-up (days)
as (integer): axillary relapse (0=no, 1=yes)
dsurv (integer): time to distant relapse or last follow-up (days)
ds (integer): distant relapse (0=no, 1=yes)
msurv (integer): time to second malignancy or last follow-up (days)
ms (integer): second malignancy (0=no, 1=yes)
tsurv (integer): time from randomisation to death or last follow-up (days)
ts (integer): status at last follow-up (0=alive, 1=dead)
See Collett (2023)
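As a minimal sketch of how the time and status pairs above might be used, the following compares overall survival (tsurv, ts) between the two arms with a Kaplan-Meier estimate. The six rows of tam_toy are invented purely for illustration and are not taken from the trial; with the package installed, collett::tamoxifen could be substituted for tam_toy.

```r
library(survival)

# Hypothetical rows in the tamoxifen layout (NOT real patient data)
tam_toy <- data.frame(
  treat = c(0, 0, 0, 1, 1, 1),                  # 0=tamoxifen+radiotherapy, 1=tamoxifen
  tsurv = c(1200, 2500, 3000, 800, 1500, 2800), # days to death or last follow-up
  ts    = c(1, 0, 0, 1, 1, 0)                   # 0=alive, 1=dead
)

# Kaplan-Meier estimate of overall survival by treatment arm
fit <- survfit(Surv(tsurv, ts) ~ treat, data = tam_toy)
print(fit)

plot(fit, col = 1:2, xlab = "Time since randomisation (days)", ylab = "Survival")
legend("bottomleft", legend = c("Tamoxifen + radiotherapy", "Tamoxifen alone"),
       col = 1:2, lty = 1, bty = "n")
```

The same pattern should apply to the other endpoints by swapping in the matching pair, for example Surv(lsurv, ls) for local relapse.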
Survival following kidney transplantation
tplant
A data frame with 434 rows and 7 variables:
patient (integer): patient id
donor (integer): donor id
time (integer): survival time in days
status (integer): event indicator (0=censored, 1=graft failure or death with a functioning graft)
age (integer): patient age (years)
diabetes (integer): diabetes status (0=absent, 1=present)
cit (double): cold ischaemic time, the time in hours between retrieval of the kidney from the donor and the transplantation
See Collett (2023)
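Because the data carry a donor id, recipients of kidneys from the same donor may have correlated outcomes. The sketch below shows one possible way to allow for that, a Cox model with a robust variance grouped by donor; it is not necessarily the analysis used in the book. The data frame tplant_toy is simulated with the same column layout (two recipients per donor is an assumption of the simulation), so the estimates themselves carry no meaning.

```r
library(survival)

# Simulated stand-in with the tplant column layout (NOT real patient data):
# 40 hypothetical donors, each providing kidneys to two recipients.
set.seed(1)
n <- 80
tplant_toy <- data.frame(
  patient  = seq_len(n),
  donor    = rep(seq_len(n / 2), each = 2),
  time     = ceiling(rexp(n, rate = 1 / 1000)),  # survival time (days)
  status   = rbinom(n, 1, 0.5),                  # 0=censored, 1=event
  age      = sample(20:70, n, replace = TRUE),
  diabetes = rbinom(n, 1, 0.2),
  cit      = round(runif(n, 5, 30), 1)           # cold ischaemic time (hours)
)

# How many recipients share each donor?
table(table(tplant_toy$donor))

# Cox model with a robust (sandwich) variance clustered on donor
fit <- coxph(Surv(time, status) ~ age + diabetes + cit + cluster(donor),
             data = tplant_toy)
summary(fit)$coefficients
```

A shared frailty term would be an alternative design choice when the donor effect itself is of interest rather than a nuisance.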
A double-blind trial comparing two treatments for ulcers. Data from Belgium.
ulcer
A data frame with 43 rows and 6 variables:
patient (integer): patient id
age (integer): age at the end of the trial in years
duration (integer): duration of verified disease (1: <5 years, 2: >=5 years)
treatment (integer): treatment arm (1=A, 2=B)
time (integer): time since last visit (months)
result (integer): result of the last visit (1=no ulcer detected, 2=ulcer detected)
See Collett (2023)
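Since recurrence is only observed at visits, these data are censored rather than exact. The sketch below shows one way the time and result columns might be turned into a censored response with Surv(type = "interval2"). It rests on a simplifying assumption that is not stated in the description above: an ulcer detected at the last visit is treated as a recurrence at some unknown time before that visit (left-censored), and no ulcer as right-censored at that visit. If an earlier negative visit were known, its time would replace the NA lower bound. The rows of ulcer_toy are invented for illustration.

```r
library(survival)

# Hypothetical rows in the ulcer layout (NOT real patient data)
ulcer_toy <- data.frame(
  patient   = 1:6,
  treatment = c(1, 1, 1, 2, 2, 2),
  time      = c(6, 12, 12, 6, 12, 12),  # months
  result    = c(2, 1, 2, 1, 1, 2)       # 1=no ulcer detected, 2=ulcer detected
)

# Assumed coding: ulcer detected -> (NA, time] (left-censored),
#                 no ulcer       -> (time, NA) (right-censored)
lower <- ifelse(ulcer_toy$result == 2, NA, ulcer_toy$time)
upper <- ifelse(ulcer_toy$result == 2, ulcer_toy$time, NA)
y <- Surv(lower, upper, type = "interval2")

# Inspect the response; the status column uses 0=right censored, 2=left censored
print(y)
y[, "status"]
```

A response built this way can be passed to functions that accept interval-censored data, such as survreg().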
Patients following an aortic valve replacement are measured for left ventricular mass index (LVMI).
valve
A data frame with 988 rows and 11 variables:
id (integer): patient id
futime (double): total follow-up time from date of surgery (years)
status (integer): event indicator (0=censored, 1=death)
time (double): time of LVMI measurement after surgery (years)
lvmi (double): standardised LVMI
age (integer): age of patient in years
sex (integer): sex of patient (0=male, 1=female)
redo (integer): previous cardiac surgery (0=no, 1=yes)
emerg (integer): operative urgency (0=elective, 1=urgent or emergency)
dm (integer): preoperative diabetes mellitus (0=no, 1=yes)
type (integer): type of valve (1=human tissue, 2=porcine tissue)
See Collett (2023)
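The data are in long format, with one row per LVMI measurement and the survival outcome repeated on each row. As a hedged sketch, the repeated measurements could be converted to counting-process (tstart, tstop) intervals with survival::tmerge, following the same pattern as the liver example earlier in this manual. The rows of valve_toy are invented for illustration; with the package installed, collett::valve could be used instead.

```r
library(survival)

# Hypothetical rows in the valve layout (NOT real patient data)
valve_toy <- data.frame(
  id     = c(1, 1, 1, 2, 2),
  futime = c(5, 5, 5, 3, 3),            # total follow-up (years)
  status = c(1, 1, 1, 0, 0),            # 0=censored, 1=death
  time   = c(0.1, 1.0, 2.5, 0.2, 1.5),  # time of LVMI measurement (years)
  lvmi   = c(120, 110, 105, 150, 140)
)

# One row per patient with the survival outcome
base <- unique(valve_toy[, c("id", "futime", "status")])

# Set the follow-up window and the death event for each patient
cp <- tmerge(base, base, id = id, death = event(futime, status))

# Add LVMI as a time-dependent covariate; it is NA before the first measurement
cp <- tmerge(cp, valve_toy, id = id, lvmi = tdc(time, lvmi))
cp[, c("id", "tstart", "tstop", "death", "lvmi")]
```

The resulting data frame could then be used in a model such as coxph(Surv(tstart, tstop, death) ~ lvmi, data = cp), where rows with a missing lvmi are dropped by default.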