edu.rit.numeric
Class RobustFit

java.lang.Object
  extended by edu.rit.numeric.RobustFit

public class RobustFit
extends Object

Class RobustFit uses a robust estimation procedure to fit a series of (x,y) data points to a model. The data series is an instance of class XYSeries. The model is represented by a ParameterizedFunction that computes the y value from an x value and the model's parameters.

Given a data series, a model function, and an initial guess for the parameter values, class RobustFit's fit() method finds parameter values that minimize the following metric:

    Σ_i ρ(y_i − f(x_i, parameters))

where f is the model function and ρ is a metric function: NORMAL, EXPONENTIAL, CAUCHY, or some other metric function.

In other words, the fit() method fits the model to the data by adjusting the parameters to minimize the metric.

The metric function is the negative logarithm of the probability distribution of the errors in the y values. The above metric functions correspond to normal, two-sided exponential, and Cauchy error distributions.

The metric functions differ in how they treat outliers, i.e., data points that deviate from the model. The normal metric function gives increasing weights to points with increasing deviations. However, because of the increasing weights, outlier points may skew the fit (hence, this is not really a "robust" metric function). The exponential metric function gives equal weights to all points, regardless of deviation. This reduces the influence of outliers on the fit, yielding a more robust fit. With the Cauchy metric function, the weights first increase, then decrease as the deviations increase. This reduces the influence of outliers even further.
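The different treatment of outliers can be made concrete with a small, self-contained sketch. The formulas for the three metric functions do not appear in this text, so the sketch assumes the forms conventionally used for these error distributions: ρ(z) = z²/2 (normal), ρ(z) = |z| (two-sided exponential), and ρ(z) = log(1 + z²/2) (Cauchy). The weight of a deviation z is taken as the magnitude of dρ/dz. The class name MetricSketch and all of its members are hypothetical and are not part of edu.rit.numeric.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical, self-contained sketch; not part of edu.rit.numeric.
final class MetricSketch {
    // Assumed conventional forms of the three metric functions rho(z).
    static final DoubleUnaryOperator NORMAL = z -> 0.5 * z * z;
    static final DoubleUnaryOperator EXPONENTIAL = z -> Math.abs(z);
    static final DoubleUnaryOperator CAUCHY = z -> Math.log(1.0 + 0.5 * z * z);

    // Weight of a deviation z, taken as the magnitude of d(rho)/dz.
    static double normalWeight(double z) { return Math.abs(z); }
    static double exponentialWeight(double z) { return z == 0.0 ? 0.0 : 1.0; }
    static double cauchyWeight(double z) { return Math.abs(z) / (1.0 + 0.5 * z * z); }

    // Metric: sum over i of rho(y_i - f(x_i, parameters)),
    // with the model values f(x_i, parameters) precomputed in fOfX.
    static double metric(DoubleUnaryOperator rho, double[] y, double[] fOfX) {
        double sum = 0.0;
        for (int i = 0; i < y.length; i++) {
            sum += rho.applyAsDouble(y[i] - fOfX[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Print the weight each metric assigns as the deviation grows.
        for (double z : new double[] {0.5, 1.0, 2.0, 10.0}) {
            System.out.printf("z=%5.1f normal=%7.3f exponential=%5.3f cauchy=%5.3f%n",
                z, normalWeight(z), exponentialWeight(z), cauchyWeight(z));
        }
    }
}
```

Under these assumed forms, the normal weight grows without bound, the exponential weight stays constant for any nonzero deviation, and the Cauchy weight rises to a peak and then falls off, which matches the behavior described above.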

The fit() method uses class MDMinimizationDownhillSimplex to find the parameter values that minimize the metric. The inputs to and outputs from the fit() method are stored in fields of an instance of class RobustFit.

The fitWithDistribution() method uses the bootstrapping technique to determine the distribution of the model parameters, which depends on the error distribution of the data points. Bootstrapping performs multiple iterations of the model fitting procedure. On each iteration, a trial data set the same size as the original data set is created by sampling the original data points with replacement, and model parameters for the trial data set are computed. The fitWithDistribution() method outputs a series of the parameter values found at each iteration; the confidence region for the parameters; and the goodness-of-fit p-value.
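The resampling step of bootstrapping can be sketched as follows. This is a minimal illustration under stated assumptions, not the class's actual implementation: the class BootstrapSketch and its methods are hypothetical, data points are held in plain arrays rather than an XYSeries, and the per-trial fit is a stand-in (the mean of y, which minimizes the sum of squared deviations for a constant model) rather than a downhill simplex minimization.

```java
import java.util.Random;

// Hypothetical, self-contained sketch; not part of edu.rit.numeric.
final class BootstrapSketch {
    // Create a trial data set the same size as the original by sampling
    // the original (x, y) points with replacement.
    static double[][] resample(double[][] points, Random prng) {
        int n = points.length;
        double[][] trial = new double[n][];
        for (int i = 0; i < n; i++) {
            trial[i] = points[prng.nextInt(n)].clone();
        }
        return trial;
    }

    // Stand-in for the model fit (an assumption for illustration only):
    // for the constant model f(x) = p0, the mean of y minimizes the
    // sum of squared deviations.
    static double fitConstant(double[][] points) {
        double sum = 0.0;
        for (double[] p : points) {
            sum += p[1];
        }
        return sum / points.length;
    }

    // Run T bootstrap trials and collect the fitted parameter of each.
    static double[] bootstrap(double[][] points, int T, Random prng) {
        double[] series = new double[T];
        for (int t = 0; t < T; t++) {
            series[t] = fitConstant(resample(points, prng));
        }
        return series;
    }
}
```

The array returned by bootstrap() plays the role of a parameter series for a one-parameter model: its spread reflects how much the fitted parameter varies from one resampled data set to the next.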


Field Summary
static Function CAUCHY
          The Cauchy metric function.
 double[] confidenceRegionLowerBound
          The lower bound of the confidence region for the model parameters.
 double[] confidenceRegionUpperBound
          The upper bound of the confidence region for the model parameters.
 XYSeries data
          The data series.
static Function EXPONENTIAL
          The exponential metric function.
 int M
          The number of parameters in the model, M.
 Function metric
          The metric function.
 double[] metricSeries
          The metric values for the model parameter distribution.
 double metricValue
          The metric value.
 ParameterizedFunction model
          The model function.
static Function NORMAL
          The normal metric function.
 double[] param
          The model parameters.
 double[][] paramSeries
          The model parameter distribution.
 double pValue
          The goodness-of-fit p-value.
 
Constructor Summary
RobustFit(ParameterizedFunction model)
          Construct a new robust fitting object for the given model.
 
Method Summary
 void fit(XYSeries data)
          Fit the given data series to the model.
 void fitWithDistribution(XYSeries data, int T, Random prng, double conf)
          Fit the given data series to the model and compute the distribution of the model parameters.
protected  void initializeSimplex(MDMinimizationDownhillSimplex minimizer)
          Initialize the simplex in the given downhill simplex minimizer object.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

model

public final ParameterizedFunction model
The model function. When model.f() is called, the x argument is x_i, the x value of a data point; the p argument contains the model parameters; and the return value is f(x_i, parameters).


M

public final int M
The number of parameters in the model, M.


metric

public Function metric
The metric function. By default, this is CAUCHY. It can instead be set to NORMAL, EXPONENTIAL, or some other metric function.


param

public final double[] param
The model parameters. On input to the fit() and fitWithDistribution() methods, param contains the initial guess for the model parameters. On output from the fit() and fitWithDistribution() methods, param contains the fitted parameter values.


data

public XYSeries data
The data series. It contains the (x,y) data points to be fitted to the model. It is specified as an argument of the fit() and fitWithDistribution() methods.


metricValue

public double metricValue
The metric value. An output of the fit() and fitWithDistribution() methods. It is set to the value of the metric for the model with the fitted parameters stored in param.


paramSeries

public double[][] paramSeries
The model parameter distribution. An output of the fitWithDistribution() method. paramSeries is a T-element array, where T is the number of trials. Each element of paramSeries is an M-element array giving the fitted parameter values for the corresponding trial.


metricSeries

public double[] metricSeries
The metric values for the model parameter distribution. An output of the fitWithDistribution() method. metricSeries is a T-element array, where T is the number of trials. Each element of metricSeries gives the value of the metric for the model with the parameters stored in the corresponding element of paramSeries.


confidenceRegionLowerBound

public double[] confidenceRegionLowerBound
The lower bound of the confidence region for the model parameters. An output of the fitWithDistribution() method. The confidence level is specified as an argument of the fitWithDistribution() method; for example, 0.90 specifies a 90% confidence level. The confidence region is an M-dimensional rectangular hyperprism centered on the fitted parameters stored in param, such that the given fraction of the model parameter distribution stored in paramSeries falls within the hyperprism. confidenceRegionLowerBound gives the lower bound of each dimension of the confidence region hyperprism.


confidenceRegionUpperBound

public double[] confidenceRegionUpperBound
The upper bound of the confidence region for the model parameters. An output of the fitWithDistribution() method. confidenceRegionUpperBound gives the upper bound of each dimension of the confidence region hyperprism.
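The defining property of the confidence region can be checked directly from the output fields. The sketch below only verifies a given pair of bounds against a parameter series; it does not attempt to reproduce how the class computes the bounds, which this text does not describe. The class ConfidenceRegionSketch and its methods are hypothetical and are not part of edu.rit.numeric.

```java
// Hypothetical, self-contained sketch; not part of edu.rit.numeric.
final class ConfidenceRegionSketch {
    // Fraction of trials whose parameters all lie within [lower, upper]
    // in every dimension, i.e. inside the rectangular hyperprism.
    static double fractionInside(double[][] paramSeries,
                                 double[] lower, double[] upper) {
        int inside = 0;
        for (double[] trial : paramSeries) {
            boolean ok = true;
            for (int j = 0; j < trial.length; j++) {
                if (trial[j] < lower[j] || trial[j] > upper[j]) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                inside++;
            }
        }
        return (double) inside / paramSeries.length;
    }

    // True if the hyperprism is centered on param, within tolerance tol.
    static boolean isCentered(double[] param, double[] lower,
                              double[] upper, double tol) {
        for (int j = 0; j < param.length; j++) {
            double center = 0.5 * (lower[j] + upper[j]);
            if (Math.abs(center - param[j]) > tol) {
                return false;
            }
        }
        return true;
    }
}
```

For a confidence level conf, one would expect fractionInside() applied to paramSeries and the two bound arrays to be approximately conf, and isCentered() applied to param to hold.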


pValue

public double pValue
The goodness-of-fit p-value. An output of the fitWithDistribution() method. This gives the probability that a metric value greater than or equal to metricValue would occur by chance, even if the model with the parameters stored in param is correct.
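One plausible way to estimate such a probability from bootstrap output is the fraction of trial metric values that are at least as large as the observed one. This is an assumption made for illustration; the text does not state how the class derives pValue. The class PValueSketch and its method are hypothetical and are not part of edu.rit.numeric.

```java
// Hypothetical, self-contained sketch; not part of edu.rit.numeric.
final class PValueSketch {
    // Empirical estimate (an assumed approach): the fraction of trial
    // metric values greater than or equal to the observed metric value.
    static double empiricalPValue(double[] metricSeries, double metricValue) {
        int count = 0;
        for (double m : metricSeries) {
            if (m >= metricValue) {
                count++;
            }
        }
        return (double) count / metricSeries.length;
    }
}
```

With this estimate, a small value means that few trials produced a metric as large as the observed one, which suggests the model fits the data poorly.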


NORMAL

public static final Function NORMAL
The normal metric function.


EXPONENTIAL

public static final Function EXPONENTIAL
The exponential metric function.


CAUCHY

public static final Function CAUCHY
The Cauchy metric function.

Constructor Detail

RobustFit

public RobustFit(ParameterizedFunction model)
Construct a new robust fitting object for the given model. The model field is set to the corresponding argument. The M field is set by calling the model function's parameterLength() method. The param field is allocated with M elements; initially, the elements are 0.

Parameters:
model - Model function.
Throws:
NullPointerException - (unchecked exception) Thrown if model is null.
Method Detail

fit

public void fit(XYSeries data)
Fit the given data series to the model. The data series is stored in the data field. The model function was specified to the constructor, and is also stored in the model field. On input to the fit() method, param contains the initial guess for the model parameters. On output from the fit() method, param contains the fitted parameter values and metricValue contains the value of the metric for the fitted parameters.

The fit() method uses the downhill simplex technique to find the model parameters that minimize the metric. This involves initializing the simplex in an MDMinimizationDownhillSimplex object. The initializeSimplex() method is called to initialize the simplex.

Parameters:
data - Data series.
Throws:
TooManyIterationsException - (unchecked exception) Thrown if too many iterations occurred without finding parameters that minimize the metric function.

fitWithDistribution

public void fitWithDistribution(XYSeries data,
                                int T,
                                Random prng,
                                double conf)
Fit the given data series to the model and compute the distribution of the model parameters. The data series is stored in the data field. The distribution is computed by the bootstrapping technique with T trials, using the given pseudorandom number generator. The given confidence level is used to compute the confidence region; for example, 0.90 specifies a 90% confidence level. The model function was specified to the constructor, and is also stored in the model field. On input to the fitWithDistribution() method, param contains the initial guess for the model parameters. On output from the fitWithDistribution() method, param contains the fitted parameter values, metricValue contains the value of the metric for the fitted parameters, paramSeries contains the series of fitted parameter values from all the trials, metricSeries contains the metric values from all the trials, confidenceRegionLowerBound and confidenceRegionUpperBound contain the lower and upper bounds of the confidence region hyperprism, and pValue contains the goodness-of-fit p-value.

The fitWithDistribution() method uses the downhill simplex technique to find the model parameters that minimize the metric. This involves initializing the simplex in an MDMinimizationDownhillSimplex object. The initializeSimplex() method is called to initialize the simplex.

Parameters:
data - Data series.
T - Number of trials.
prng - Pseudorandom number generator.
conf - Confidence level, in the range 0.0 .. 1.0.
Throws:
IllegalArgumentException - (unchecked exception) Thrown if conf is out of bounds.
TooManyIterationsException - (unchecked exception) Thrown if too many iterations occurred without finding parameters that minimize the metric function.

initializeSimplex

protected void initializeSimplex(MDMinimizationDownhillSimplex minimizer)
Initialize the simplex in the given downhill simplex minimizer object. The simplex points must be set based on the initial guess for the parameter values stored in the param field. For further information about initializing the simplex, see class MDMinimizationDownhillSimplex.

The default implementation of this method sets the first simplex point to param, sets the second simplex point to param except element 0 is set to perturb(param[0]), sets the third simplex point to param except element 1 is set to perturb(param[1]), and so on. perturb(x) = 1.01x if x ≠ 0; perturb(0) = 0.01. This method can be overridden to initialize the simplex differently.
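The default simplex layout described above can be sketched in a few lines. The sketch returns the M+1 simplex points as a plain array; how the points are actually stored into an MDMinimizationDownhillSimplex object is not covered here. The class SimplexInitSketch and its methods are hypothetical and are not part of edu.rit.numeric.

```java
// Hypothetical, self-contained sketch; not part of edu.rit.numeric.
final class SimplexInitSketch {
    // perturb(x) = 1.01x if x != 0; perturb(0) = 0.01.
    static double perturb(double x) {
        return x != 0.0 ? 1.01 * x : 0.01;
    }

    // Build M+1 simplex points from an M-element initial guess:
    // point 0 is param itself; point i+1 is param with element i perturbed.
    static double[][] initialSimplex(double[] param) {
        int M = param.length;
        double[][] simplex = new double[M + 1][];
        simplex[0] = param.clone();
        for (int i = 0; i < M; i++) {
            simplex[i + 1] = param.clone();
            simplex[i + 1][i] = perturb(param[i]);
        }
        return simplex;
    }
}
```

Because a zero element is perturbed to 0.01 instead of being scaled, every simplex point differs from the first point, so the simplex is non-degenerate even when the initial guess contains zeros.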



Copyright © 2005-2012 by Alan Kaminsky. All rights reserved. Send comments to ark@cs.rit.edu.