- random_forest_error(forest, X_train, X_test, inbag=None, calibrate=True, memory_constrained=False, memory_limit=None)
Calculate error bars from scikit-learn RandomForest estimators.
RandomForest is a regressor or classifier object. The variance calculated here can be used to plot error bars for RandomForest objects.
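As a rough illustration of how a variance array might be turned into error bars, the sketch below uses a made-up stand-in for the returned array and made-up predictions. Clipping negative estimates to zero before taking the square root is one simple convention assumed here, not something this function prescribes.

```python
import numpy as np

# Stand-in for the array returned by random_forest_error (values are made up).
variance = np.array([0.25, 0.04, -0.01, 1.0])

# Variance estimates can be negative, so clip at zero before the square root.
std_err = np.sqrt(np.clip(variance, 0.0, None))

# Hypothetical point predictions, e.g. from forest.predict(X_test).
y_hat = np.array([1.0, 2.0, 3.0, 4.0])

# One-standard-error band around each prediction.
lower, upper = y_hat - std_err, y_hat + std_err
```

The `std_err` values could then be passed as `yerr` to a plotting call such as matplotlib's `errorbar`.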
- forest: RandomForest
Regressor or Classifier object.
- X_train: ndarray
An array with shape (n_train_sample, n_features). The design matrix for training data.
- X_test: ndarray
An array with shape (n_test_sample, n_features). The design matrix for testing data.
- inbag: ndarray, optional
The inbag matrix that fit the data. If set to None (default), it will be inferred from the forest. However, this only works for trees for which bootstrapping was set to True, that is, if sampling was done with replacement. Otherwise, users need to provide their own inbag matrix.
- calibrate: boolean, optional
Whether to apply calibration to mitigate Monte Carlo noise. Some variance estimates may be negative due to Monte Carlo effects if the number of trees in the forest is too small. Default: True.
- memory_constrained: boolean, optional
Whether or not there is a restriction on memory. If False, it is assumed that an ndarray of shape (n_train_sample, n_test_sample) fits in main memory. Setting to True can actually provide a speedup if memory_limit is tuned to the optimal range.
- memory_limit: int, optional
An upper bound, in megabytes, on how much memory the intermediate matrices will take up. This must be provided if memory_constrained=True.
Returns an array with the unbiased sampling variance (V_IJ_unbiased) for a RandomForest object.
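For the inbag parameter described above, a hand-built matrix might look like the following sketch. It assumes the matrix has shape (n_train_sample, n_trees) and stores how many times each training sample was drawn for each tree; that layout is an assumption and is not stated in this reference. The helper name `build_inbag` and its `sample_indices` input are hypothetical.

```python
import numpy as np

def build_inbag(sample_indices, n_train_sample):
    """Count how often each training sample appears in each tree's sample.

    sample_indices: list of 1-D integer arrays, one per tree (hypothetical input).
    Returns an array of shape (n_train_sample, n_trees) (assumed layout).
    """
    n_trees = len(sample_indices)
    inbag = np.zeros((n_train_sample, n_trees), dtype=int)
    for t, idx in enumerate(sample_indices):
        # bincount gives the number of times each row index was drawn.
        inbag[:, t] = np.bincount(idx, minlength=n_train_sample)
    return inbag

# Simulate sampling with replacement for a tiny forest.
rng = np.random.default_rng(0)
n_train_sample, n_trees = 6, 4
draws = [rng.integers(0, n_train_sample, size=n_train_sample) for _ in range(n_trees)]
inbag = build_inbag(draws, n_train_sample)
```

Each column sums to the number of draws made for that tree, which is a quick sanity check before passing the matrix as `inbag`.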
The calculation of error is based on the infinitesimal jackknife variance, as described in [Wager2014], and is a Python implementation of the R code provided at: https://github.com/swager/randomForestCI
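To make the estimator concrete, here is a minimal NumPy sketch of the infinitesimal jackknife variance with the Monte Carlo bias correction, following the formulas given in [Wager2014]. It is an illustration under stated assumptions and not the library's code: the function name `ij_variance`, the array layouts, and the toy "forest" (bagged sample means) are all hypothetical.

```python
import numpy as np

def ij_variance(inbag, pred):
    """Bias-corrected infinitesimal jackknife variance (sketch).

    inbag: (n_train, n_trees) bootstrap counts N_bi (assumed layout).
    pred:  (n_test, n_trees) per-tree predictions t_b(x) (assumed layout).
    """
    n_train, n_trees = inbag.shape
    pred_centered = pred - pred.mean(axis=1, keepdims=True)
    # Cov_i(x) = (1/B) * sum_b (N_bi - 1) * (t_b(x) - mean_b t_b(x))
    cov = (inbag - 1) @ pred_centered.T / n_trees   # (n_train, n_test)
    v_ij = (cov ** 2).sum(axis=0)                    # (n_test,)
    # Monte Carlo bias correction: (n / B^2) * sum_b (t_b(x) - mean)^2
    correction = n_train / n_trees**2 * (pred_centered ** 2).sum(axis=1)
    return v_ij - correction

# Toy example: each "tree" predicts the mean of its bootstrap sample.
rng = np.random.default_rng(0)
n_train, n_trees = 20, 500
y = rng.normal(size=n_train)
idx = rng.integers(0, n_train, size=(n_trees, n_train))
inbag = np.stack([np.bincount(row, minlength=n_train) for row in idx], axis=1)
pred = (inbag * y[:, None]).sum(axis=0, keepdims=True) / n_train  # (1, n_trees)
v = ij_variance(inbag, pred)
```

With few trees the correction term can exceed the raw estimate, which is how negative values can arise.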
[Wager2014] S. Wager, T. Hastie, B. Efron. “Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife”, Journal of Machine Learning Research vol. 15, pp. 1625-1651, 2014.