Case Study: Spline Regression


Case study background and problem formulations

Instructions for optimization with PSG Run-File, PSG MATLAB Toolbox, PSG MATLAB Subroutines and PSG R.

PROBLEM1: problem_Logexp_Sum_Of_Splines
Maximize logexp_sum(spline_sum) (function Logarithms Exponents Sum applied to Spline Sum)
Calculate:
logexp_sum(spline_sum) (function Logarithms Exponents Sum applied to Spline Sum)
logistic(spline_sum) (function Logistic applied to Spline Sum)
——————————————————————–————————————————
logexp_sum = Logarithms Exponents Sum
spline_sum = Spline Sum, which calculates spline values depending upon regression variables for every scenario
logistic = Logistic, which calculates values of the logistic function of the spline regression for every scenario
———————————————————————————
Sum of 15 Third Degree Polynomial Splines Consisting of 5 Pieces
———————————————————————————

        | # of Variables | # of Scenarios | Objective Value | Solving Time, PC 3.14GHz (sec)
Dataset | 286            | 4000           | -0.67938        | 10.66

Environments
Run-File           | Problem Statement | Data | Solution
MATLAB Toolbox     | Data
MATLAB Subroutines | MATLAB Code       | Data
R                  | R Code            | Data
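
The objective and the calculated quantities in Problem 1 can be illustrated with a minimal Python sketch. It assumes that logexp_sum corresponds to the standard average log-likelihood of logistic regression; the exact normalization used by PSG is not stated here, so treat the formula as an assumption. The names `logistic`, `log_likelihood`, `scores` and `labels` are hypothetical, and `scores[j]` stands in for the value of spline_sum on scenario j.

```python
import math

def logistic(s):
    """Numerically stable logistic function 1 / (1 + exp(-s))."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def log_likelihood(scores, labels):
    """Average logistic log-likelihood (assumed form of the objective).

    scores[j] plays the role of spline_sum for scenario j;
    labels[j] is the observed outcome, 0 or 1.
    """
    total = 0.0
    for s, y in zip(scores, labels):
        # log(1 + exp(s)) computed without overflow for large |s|
        softplus = max(s, 0.0) + math.log1p(math.exp(-abs(s)))
        total += y * s - softplus
    return total / len(scores)
```

With all scores equal to zero the function returns -ln 2 ≈ -0.6931, the value obtained when every scenario is assigned probability 0.5.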

————————————————————————————
PROBLEM2: problem_Logexp_Sum_Of_Splines_Cross_Validation
4-fold cross-validation
Maximize logexp_sum(spline_sum) (function Logarithms Exponents Sum applied to Spline Sum)
Calculate:
logexp_sum(spline_sum) (function Logarithms Exponents Sum applied to Spline Sum on the out-of-sample data)
logistic(spline_sum) (function Logistic applied to Spline Sum on the in-sample data)
logistic(spline_sum) (function Logistic applied to Spline Sum on the out-of-sample data)
——————————————————————–————————————————
crossvalidation(N,Matrix) = matrix operation that splits the input Matrix into N pairs of complementary sub-matrices
logexp_sum = Logarithms Exponents Sum
spline_sum = Spline Sum, which calculates spline values depending upon regression variables for every scenario
logistic = Logistic, which calculates values of the logistic function of the spline regression for every scenario
——————————————————————–————————————————
Sum of 15 Third Degree Polynomial Splines Consisting of 5 Pieces
——————————————————————–—————

        | # of Variables | # of Scenarios | Objective Value | Solving Time, PC 3.14GHz (sec)
Dataset | 301            | 3000           | -0.67912        | 12.07

Environments
Run-File           | Problem Statement | Data | Solution
MATLAB Toolbox     | Data
MATLAB Subroutines | MATLAB Code       | Data
R                  | R Code            | Data
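
The splitting performed by crossvalidation(N,Matrix) in Problem 2 can be sketched as follows. This is only an illustration of the idea of N complementary (in-sample, out-of-sample) pairs; how PSG actually assigns rows to folds is not described here, so the interleaved assignment below is an assumption, and `crossvalidation_split` is a hypothetical helper name.

```python
def crossvalidation_split(n_folds, rows):
    """Split rows into n_folds pairs (in_sample, out_of_sample).

    Row i is held out in fold i % n_folds (assumed assignment rule);
    the two parts of each pair are complementary.
    """
    pairs = []
    for k in range(n_folds):
        out_of_sample = [r for i, r in enumerate(rows) if i % n_folds == k]
        in_sample = [r for i, r in enumerate(rows) if i % n_folds != k]
        pairs.append((in_sample, out_of_sample))
    return pairs

# Usage: 4 folds over 4000 scenario indices
pairs = crossvalidation_split(4, list(range(4000)))
```

Under this scheme each of the 4 in-sample parts holds 3000 of the 4000 rows, which is consistent with the 3000 scenarios reported for Problem 2.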
————————————————————————————
PROBLEM3: problem_logexp_sum_of_splines_boolean
Maximize logexp_sum(spline_sum) (function Logarithms Exponents Sum applied to Spline Sum)
subject to
polynom_abs – variable ≤ 0 (constraint on the coefficients of every spline)
linear ≤ const (constraint on number of factors)
Calculate:
logexp_sum(spline_sum) (function Logarithms Exponents Sum applied to Spline Sum)
logistic(spline_sum) (function Logistic applied to Spline Sum)
——————————————————————–————————————————
logexp_sum = Logarithms Exponents Sum
spline_sum = Spline Sum, which calculates spline values depending upon regression variables for every scenario
polynom_abs = Polynomial Absolute
linear = Linear Function
logistic = Logistic, which calculates values of the logistic function of the spline regression for every scenario
———————————————————————————
Sum of 15 Third Degree Polynomial Splines Consisting of 5 Pieces
———————————————————————————

        | # of Variables | # of Scenarios | Objective Value | Solving Time, PC 3.14GHz (sec)
Dataset | 301            | 4000           | -0.68694        | 84.96

Environments
Run-File           | Problem Statement | Data | Solution
MATLAB Toolbox     | Data
MATLAB Subroutines | MATLAB Code       | Data
R                  | R Code            | Data

NOTE: Problem statements can be simplified using MultiConstraint.
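
The two constraint types in Problem 3 can be read as a standard factor-selection construction. The sketch below checks feasibility under that reading: for every spline, the sum of absolute coefficients must not exceed a bound multiplied by that factor's Boolean variable, and the number of included factors must not exceed a limit. The multiplier `big_m`, the limit `max_factors`, and the unweighted form of `polynom_abs` are assumptions made for illustration; the actual coefficients are given in the problem statement files.

```python
def polynom_abs(coeffs):
    """Sum of absolute values of one spline's coefficients (assumed unweighted)."""
    return sum(abs(c) for c in coeffs)

def is_feasible(spline_coeffs, include, big_m, max_factors):
    """Check the two constraint types for a candidate solution.

    spline_coeffs[i] -- coefficients of the spline for factor i
    include[i]       -- Boolean variable: 1 = factor included, 0 = not included
    big_m            -- hypothetical bound that switches coefficients off when include[i] = 0
    max_factors      -- hypothetical limit on the number of included factors
    """
    for coeffs, z in zip(spline_coeffs, include):
        # polynom_abs - variable <= 0, with the variable scaled by big_m
        if polynom_abs(coeffs) - big_m * z > 0:
            return False
    # linear <= const: at most max_factors factors are included
    return sum(include) <= max_factors
```

For example, `is_feasible([[0.5, -0.2], [0.0, 0.0]], [1, 0], 10.0, 1)` holds, while a nonzero coefficient on an excluded factor violates the first constraint.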

————————————————————————————
PROBLEM4: problem_logexp_sum_of_splines_knots
Maximize logexp_sum(spline_sum) (function Logarithms Exponents Sum applied to Spline Sum)
subject to
linear ≤ const (constraint on the value of the sum of splines at knot points)
Calculate:
logexp_sum(spline_sum) (function Logarithms Exponents Sum applied to Spline Sum)
logistic(spline_sum) (function Logistic applied to Spline Sum)
——————————————————————–————————————————
logexp_sum = Logarithms Exponents Sum
spline_sum = Spline Sum, which calculates spline values depending upon regression variables for every scenario
linear = Linear Function
logistic = Logistic, which calculates values of the logistic function of the spline regression for every scenario
———————————————————————————
Sum of 15 Third Degree Polynomial Splines Consisting of 5 Pieces
———————————————————————————

        | # of Variables | # of Scenarios | Objective Value | Solving Time, PC 3.14GHz (sec)
Dataset | 316            | 4000           | -0.68122        | 18.42

Environments
Run-File           | Problem Statement | Data | Solution
MATLAB Toolbox     | Data
MATLAB Subroutines | MATLAB Code       | Data
R                  | R Code            | Data

————————————————————————————

CASE STUDY SUMMARY
The PSG function logexp_sum (Maximum Likelihood for Logistic Regression) is maximized to find the spline variables that best approximate the data (see Problem 1). The estimated spline may “overfit” the in-sample data, which can result in poor out-of-sample performance. Cross-validation is used to check for overfitting (see Problem 2). To prepare data for cross-validation we use the PSG crossvalidation(N,Matrix) matrix operation, which splits the input Matrix of scenarios into N pairs of complementary sub-matrices. Overfitting can be reduced by dropping some factors. The factors to keep in the sum of splines are selected by solving an optimization problem with additional Boolean variables that indicate whether each factor is included: 1 = the factor is included in the sum of splines, 0 = not included (see Problem 3). Another way to reduce overfitting is to control the values of the splines at knot points by setting an upper bound (see Problem 4).