% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/orsf_predict.R
\name{predict.orsf_fit}
\alias{predict.orsf_fit}
\title{Compute predictions using ORSF}
\usage{
\method{predict}{orsf_fit}(
  object,
  new_data,
  pred_horizon = NULL,
  pred_type = "risk",
  na_action = "fail",
  boundary_checks = TRUE,
  n_thread = 1,
  verbose_progress = FALSE,
  pred_aggregate = TRUE,
  ...
)
}
\arguments{
\item{object}{(\emph{orsf_fit}) a trained oblique random survival forest
(see \link{orsf}).}

\item{new_data}{a \link{data.frame}, \link[tibble:tibble-package]{tibble}, or \link[data.table:data.table]{data.table} containing the observations for which predictions will be computed.}

\item{pred_horizon}{(\emph{double}) a value or vector indicating the time(s)
that predictions will be calibrated to. E.g., if you were predicting
risk of incident heart failure within the next 10 years, then
\code{pred_horizon = 10}. \code{pred_horizon} can be \code{NULL} if \code{pred_type} is
\code{'mort'}, since mortality predictions are aggregated over all
event times.}

\item{pred_type}{(\emph{character}) the type of predictions to compute. Valid
options are
\itemize{
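\item 'risk' : probability of having an event at or before \code{pred_horizon}.
\item 'surv' : 1 - risk.
\item 'chf' : cumulative hazard function.
\item 'mort' : mortality prediction, i.e., the number of events in the forest's population if all observations had characteristics like the current observation.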
}}

\item{na_action}{(\emph{character}) what should happen when \code{new_data} contains missing values (i.e., \code{NA} values). Valid options are:
\itemize{
\item 'fail' : an error is thrown if \code{new_data} contains \code{NA} values
\item 'pass' : the output will have \code{NA} in all rows where \code{new_data} has one or more \code{NA} values for the predictors used by \code{object}
\item 'omit' : rows in \code{new_data} with incomplete data will be dropped
\item 'impute_meanmode' : missing values for continuous and categorical variables in \code{new_data} will be imputed using the mean and mode, respectively. To clarify,
the mean and mode used to impute missing values are from the
training data of \code{object}, not from \code{new_data}.
}}

\item{boundary_checks}{(\emph{logical}) if \code{TRUE}, \code{pred_horizon} will be
checked to make sure the requested values are less than the maximum
observed time in \code{object}'s training data. If \code{FALSE}, these checks
are skipped.}

\item{n_thread}{(\emph{integer}) number of threads to use while computing predictions. Default is one thread. To use the maximum number of threads that your system provides for concurrent execution, set \code{n_thread = 0}.}

\item{verbose_progress}{(\emph{logical}) if \code{TRUE}, progress messages are
printed in the console. If \code{FALSE} (the default), nothing is printed.}

\item{pred_aggregate}{(\emph{logical}) If \code{TRUE} (the default), predictions
will be aggregated over all trees by taking the mean. If \code{FALSE}, the
returned output will contain one row per observation and one column
for each tree. If the length of \code{pred_horizon} is two or more and
\code{pred_aggregate} is \code{FALSE}, then the result will be a list of such
matrices, with the i'th item in the list corresponding to the i'th
value of \code{pred_horizon}.}

\item{...}{Further arguments passed to or from other methods (not currently used).}
}
\value{
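a \code{matrix} of predictions. Column \code{j} of the matrix corresponds
to value \code{j} in \code{pred_horizon}. Row \code{i} of the matrix corresponds to
row \code{i} in \code{new_data}. If \code{pred_aggregate} is \code{FALSE}, columns
instead correspond to trees, and the result is a list of such matrices
when \code{pred_horizon} has two or more values.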
}
\description{
Predicted risk, survival, hazard, or mortality from an ORSF model.
}
\details{
\code{new_data} must have the same columns with equivalent types as the data
used to train \code{object}. Also, factors in \code{new_data} must not have levels
that were not in the data used to train \code{object}.
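
As an illustration, factor levels can be compared before calling
\code{predict()}. The sketch below uses only base R; \code{find_new_levels()}
is a hypothetical helper written for this example and is not part of
\code{aorsf}, and the two small data frames are made up.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# Hypothetical helper (not part of aorsf): for each factor column in
# the training data, list values in new_data that are not among the
# training levels. Columns with no new levels are left out.
find_new_levels <- function(train, new_data) {
  factor_cols <- names(train)[vapply(train, is.factor, logical(1))]
  out <- lapply(factor_cols, function(col) {
    seen <- unique(as.character(new_data[[col]]))
    setdiff(seen[!is.na(seen)], levels(train[[col]]))
  })
  names(out) <- factor_cols
  Filter(length, out)
}

# made-up data: 'x' is a level of sex that the training data lacks
train <- data.frame(sex = factor(c("f", "m")), age = c(50, 60))
new_data <- data.frame(sex = factor(c("f", "x")), age = c(55, 65))

find_new_levels(train, new_data)
}\if{html}{\out{</div>}}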

\code{pred_horizon} values should not exceed the maximum follow-up time in
\code{object}'s training data. To override this restriction, set
\code{boundary_checks = FALSE}, which allows any \code{pred_horizon} value.
Note that predictions beyond the maximum follow-up time
in \code{object}'s training data are equal to predictions at the
maximum follow-up time, because \code{aorsf} does not estimate survival
beyond its maximum observed time.
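
The behavior described above can be checked directly. The following is a
minimal sketch, assuming the \code{pbc_orsf} data shipped with \code{aorsf};
\code{n_tree = 50} is an arbitrary choice that keeps the fit fast. Given the
note above, the two columns of \code{pred} are expected to agree.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(aorsf)

set.seed(329730)

# small forest; n_tree = 50 is only to keep this sketch fast
fit <- orsf(data = pbc_orsf,
            formula = Surv(time, status) ~ . - id,
            n_tree = 50)

# maximum follow-up time in the training data
max_time <- max(pbc_orsf$time)

# request predictions at and beyond the maximum follow-up time,
# skipping the boundary checks that would otherwise stop this
pred <- predict(fit,
                new_data = pbc_orsf[1:5, ],
                pred_type = 'risk',
                pred_horizon = c(max_time, 2 * max_time),
                boundary_checks = FALSE)

# compare predictions at the two horizons
all.equal(pred[, 1], pred[, 2])
}\if{html}{\out{</div>}}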

If \code{pred_horizon} is unspecified, it may be set automatically to the value
used for \code{oobag_pred_horizon} when \code{object} was created (see \link{orsf}).
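
A minimal sketch of this default, assuming the \code{pbc_orsf} data shipped
with \code{aorsf} and an arbitrary \code{n_tree = 50} to keep the fit fast.
If the default is applied as described, the two calls should return the
same predictions.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(aorsf)

set.seed(329730)

# fit with an explicit out-of-bag prediction horizon of 5 years
fit <- orsf(data = pbc_orsf,
            formula = Surv(time, status) ~ . - id,
            n_tree = 50,
            oobag_pred_horizon = 365.25 * 5)

# pred_horizon left unspecified
pred_default <- predict(fit, new_data = pbc_orsf[1:5, ])

# pred_horizon set to the same value used when fitting
pred_explicit <- predict(fit,
                         new_data = pbc_orsf[1:5, ],
                         pred_horizon = 365.25 * 5)

# compare the two sets of predictions
all.equal(pred_default, pred_explicit)
}\if{html}{\out{</div>}}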
}
\section{Examples}{
Begin by fitting an ORSF ensemble:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(aorsf)

set.seed(329730)

index_train <- sample(nrow(pbc_orsf), 150) 

pbc_orsf_train <- pbc_orsf[index_train, ]
pbc_orsf_test <- pbc_orsf[-index_train, ]

fit <- orsf(data = pbc_orsf_train, 
            formula = Surv(time, status) ~ . - id,
            oobag_pred_horizon = 365.25 * 5)
}\if{html}{\out{</div>}}

Predict risk, survival, or cumulative hazard at one or several times:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# predicted risk, the default
predict(fit, 
        new_data = pbc_orsf_test[1:5, ], 
        pred_type = 'risk', 
        pred_horizon = c(500, 1000, 1500))
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode">}}\preformatted{##             [,1]       [,2]       [,3]
## [1,] 0.459077419 0.73067673 0.89246351
## [2,] 0.032194868 0.08028381 0.15592011
## [3,] 0.115945485 0.24099853 0.38094684
## [4,] 0.008378033 0.02964250 0.06977315
## [5,] 0.009798295 0.01793586 0.04454374
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# predicted survival, i.e., 1 - risk
predict(fit, 
        new_data = pbc_orsf_test[1:5, ], 
        pred_type = 'surv',
        pred_horizon = c(500, 1000, 1500))
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode">}}\preformatted{##           [,1]      [,2]      [,3]
## [1,] 0.5409226 0.2693233 0.1075365
## [2,] 0.9678051 0.9197162 0.8440799
## [3,] 0.8840545 0.7590015 0.6190532
## [4,] 0.9916220 0.9703575 0.9302269
## [5,] 0.9902017 0.9820641 0.9554563
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# predicted cumulative hazard function
# (expected number of events for person i at time j)
predict(fit, 
        new_data = pbc_orsf_test[1:5, ], 
        pred_type = 'chf',
        pred_horizon = c(500, 1000, 1500))
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode">}}\preformatted{##            [,1]       [,2]       [,3]
## [1,] 0.63532189 1.27109029 1.74481341
## [2,] 0.03415809 0.09124550 0.20017014
## [3,] 0.14715014 0.34375274 0.62976148
## [4,] 0.00857621 0.03195771 0.08744159
## [5,] 0.01043219 0.01888677 0.05177019
}\if{html}{\out{</div>}}
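
Predictions from individual trees can be requested with
\code{pred_aggregate = FALSE}. The block below is a minimal sketch that
refits a small forest so it runs on its own; \code{n_tree = 50} and the
horizon of 1000 are arbitrary choices for illustration. Based on the
description of \code{pred_aggregate}, the row means of the tree-level output
should agree with the aggregated predictions.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(aorsf)

set.seed(329730)

# small forest; n_tree = 50 is only to keep this sketch fast
fit <- orsf(data = pbc_orsf,
            formula = Surv(time, status) ~ . - id,
            n_tree = 50)

new_data <- pbc_orsf[1:5, ]

# aggregated over trees (the default)
pred_mean <- predict(fit,
                     new_data = new_data,
                     pred_horizon = 1000)

# one row per observation, one column per tree
pred_tree <- predict(fit,
                     new_data = new_data,
                     pred_horizon = 1000,
                     pred_aggregate = FALSE)

dim(pred_tree)

# compare the aggregated predictions with the mean over trees
all.equal(as.numeric(pred_mean), rowMeans(pred_tree))
}\if{html}{\out{</div>}}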

Predict mortality, defined as the number of events in the forest’s
population if all observations had characteristics like the current
observation. This type of prediction does not require you to specify a
prediction horizon:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{predict(fit, 
        new_data = pbc_orsf_test[1:5, ], 
        pred_type = 'mort')
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode">}}\preformatted{##           [,1]
## [1,] 78.646185
## [2,] 20.872849
## [3,] 37.341745
## [4,] 13.616617
## [5,]  8.798328
}\if{html}{\out{</div>}}
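
Handle missing values in \code{new_data} with \code{na_action}. The block
below is a minimal sketch that refits a small forest so it runs on its own;
\code{n_tree = 50}, the horizon of 1000, and the choice to blank out
\code{age} in the second row are arbitrary and only for illustration.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(aorsf)

set.seed(329730)

# small forest; n_tree = 50 is only to keep this sketch fast
fit <- orsf(data = pbc_orsf,
            formula = Surv(time, status) ~ . - id,
            n_tree = 50)

# introduce a missing predictor value in the second row
new_data <- pbc_orsf[1:5, ]
new_data$age[2] <- NA

# 'fail' (the default) throws an error, so capture its message
res_fail <- tryCatch(
  predict(fit, new_data = new_data, pred_horizon = 1000),
  error = function(e) conditionMessage(e)
)

# 'pass' keeps every row and returns NA where predictors are missing
pred_pass <- predict(fit, new_data = new_data,
                     pred_horizon = 1000, na_action = 'pass')

# 'omit' drops rows with incomplete data
pred_omit <- predict(fit, new_data = new_data,
                     pred_horizon = 1000, na_action = 'omit')

# 'impute_meanmode' fills in values using the training data
pred_impute <- predict(fit, new_data = new_data,
                       pred_horizon = 1000, na_action = 'impute_meanmode')

is.na(pred_pass[, 1])
nrow(pred_omit)
anyNA(pred_impute)
}\if{html}{\out{</div>}}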
}

