Classification in Loan Application Process

Case study background and problem formulations

Instructions for optimization with PSG Run-File, PSG MATLAB Toolbox, PSG MATLAB Subroutines and PSG R.

PROBLEM 0: problem_Logexp_Sum (original untransformed features)
Maximize logexp_sum     (log-likelihood function applied to original untransformed features)
Calculate:
pr_pen(difference of losses)     (Probability of Exceedance applied to difference of losses based on original untransformed features)
——————————————————————–————————————————
logexp_sum = Logarithms Exponents Sum = log-likelihood function
pr_pen = Probability of Exceedance
——————————————————————–————————————————
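As a rough illustration of the objective in Problem 0, the sketch below evaluates a logistic regression log-likelihood in plain Python. It assumes binary labels in {0, 1}, a linear score with an intercept, and averaging over scenarios. The function name log_likelihood and this normalization are assumptions made for illustration; they are not taken from the PSG definition of logexp_sum.

```python
import math


def log_likelihood(weights, intercept, features, labels):
    """Average logistic log-likelihood over all scenarios.

    Assumption: labels are 0 or 1 and the score is linear in the features.
    """
    total = 0.0
    for x, y in zip(features, labels):
        score = intercept + sum(w * xi for w, xi in zip(weights, x))
        # Numerically stable evaluation of log(1 + exp(score)).
        log1pexp = max(score, 0.0) + math.log1p(math.exp(-abs(score)))
        total += y * score - log1pexp
    return total / len(labels)


# Hypothetical toy data: two features, four scenarios.
toy_features = [[1.0, 0.2], [0.5, 1.5], [2.0, 0.1], [0.3, 0.9]]
toy_labels = [1, 0, 1, 0]
value_at_zero = log_likelihood([0.0, 0.0], 0.0, toy_features, toy_labels)
```

With all coefficients set to zero every scenario gets probability 0.5, so the value equals -log(2) regardless of the data; a fitted model should achieve a larger (less negative) value.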

Dataset | # of Variables | # of Scenarios | Objective Value | Solving Time, PC 3.14GHz (sec)
Dataset1 (2012) | 4 | 380465 | -0.238801512644 | 2.82

Environments:
Run-File: Problem Statement, Data, Solution
Matlab Toolbox: Data
Matlab Subroutines: Matlab Code, Data
R: R Code, Data
Download other datasets in Run-File Environment.
Instructions for importing problems from Run-File to PSG MATLAB.

Problem Datasets | # of Variables | # of Scenarios | Objective Value | Solving Time, PC 3.50GHz (sec)
Dataset2 (2013): Problem statement, Data, Solution | 4 | 380465 | -0.244676286732 | 4.56
Dataset3 (2014): Problem statement, Data, Solution | 4 | 380465 | -0.216362946123 | 11.89

PROBLEM 1: problem_Logexp_Sum (for spline transformation of features)
Maximize logexp_sum(spline_sum)      (log-likelihood function applied to spline function)
Calculate:
logistic(spline_sum)             (calculation of Logistic to get transformed feature)
——————————————————————–————————————————
logexp_sum = Logarithms Exponents Sum = log-likelihood function
logistic = Logistic; calculates the value of the logistic function of the spline approximation for every scenario
spline_sum = Spline Sum; calculates the spline value, which depends on the regression variables, for every scenario
——————————————————————–————————————————
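The feature transformation described for Problem 1 can be pictured with the following sketch, which evaluates a piecewise cubic polynomial and passes the result through the logistic function. The knot positions, coefficient layout, and function names (spline_value, logistic) are hypothetical and chosen for illustration; the sketch does not reproduce how PSG parameterizes spline_sum or enforces smoothness at the knots.

```python
import bisect
import math


def spline_value(x, knots, coeffs):
    """Evaluate a piecewise cubic at x.

    Assumed layout: on [knots[k], knots[k+1]) the value is
    c0 + c1*t + c2*t**2 + c3*t**3 with t = x - knots[k].
    Points outside the knot range use the nearest piece.
    """
    k = bisect.bisect_right(knots, x) - 1
    k = min(max(k, 0), len(coeffs) - 1)
    t = x - knots[k]
    c0, c1, c2, c3 = coeffs[k]
    return c0 + t * (c1 + t * (c2 + t * c3))


def logistic(s):
    """Numerically stable logistic function 1 / (1 + exp(-s))."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


# Hypothetical spline with two pieces that join continuously at x = 10.
knots = [0.0, 10.0, 20.0]
coeffs = [(-1.0, 0.1, 0.0, 0.0), (0.0, 0.05, 0.0, 0.0)]

transformed = [logistic(spline_value(x, knots, coeffs)) for x in (5.0, 10.0, 15.0)]
```

Each raw feature value is mapped to a number between 0 and 1, which is then used in place of the original feature.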

Dataset | # of Variables | # of Scenarios | Objective Value | Solving Time, PC 3.14GHz (sec)
Dataset1 (DTI, 2012) | 20 | 380465 | -0.441511982280 | 1.45

Environments:
Run-File: Problem Statement, Data, Solution
Matlab Toolbox: Data
Matlab Subroutines: Matlab Code, Data
R: R Code, Data
Download other datasets in Run-File Environment.
Instructions for importing problems from Run-File to PSG MATLAB.

Problem Datasets | # of Variables | # of Scenarios | Objective Value | Solving Time, PC 3.50GHz (sec)
Dataset2 (DTI, 2013): Problem statement, Data, Solution | 20 | 380465 | -0.363102932426 | 48.09
Dataset3 (DTI, 2014): Problem statement, Data, Solution | 20 | 380465 | -0.321050341841 | 81.66
Dataset4 (EmpLen, 2012): Problem statement, Data, Solution | 20 | 380465 | -0.215776179308 | 20.04
Dataset5 (EmpLen, 2013): Problem statement, Data, Solution | 20 | 380465 | -0.225138264176 | 52.72
Dataset6 (EmpLen, 2014): Problem statement, Data, Solution | 20 | 380465 | -0.130285134598 | 120.86
Dataset7 (FICO, 2012): Problem statement, Data, Solution | 20 | 380465 | -0.283383677945 | 49.90
Dataset8 (FICO, 2013): Problem statement, Data, Solution | 20 | 380465 | -0.299431106338 | 121.41
Dataset9 (FICO, 2014): Problem statement, Data, Solution | 20 | 380465 | -0.251899113731 | 293.00

PROBLEM 2: problem_Logexp_Sum (transformed features)
Maximize logexp_sum     (log-likelihood function applied to transformed features)
Calculate:
pr_pen(difference of losses)     (Probability of Exceedance applied to difference of losses based on transformed features)
——————————————————————–————————————————
logexp_sum = Logarithms Exponents Sum = log-likelihood function
pr_pen = Probability of Exceedance
——————————————————————–————————————————

Dataset | # of Variables | # of Scenarios | Objective Value | Solving Time, PC 3.14GHz (sec)
Dataset1 (2012) | 4 | 380465 | -0.17741580617 | 1.61

Environments:
Run-File: Problem Statement, Data, Solution
Matlab Toolbox: Data
Matlab Subroutines: Matlab Code, Data
R: R Code, Data
Download other datasets in Run-File Environment.
Instructions for importing problems from Run-File to PSG MATLAB.

Problem Datasets | # of Variables | # of Scenarios | Objective Value | Solving Time, PC 3.50GHz (sec)
Dataset2 (2013): Problem statement, Data, Solution | 4 | 380465 | -0.17389892051 | 3.59
Dataset3 (2014): Problem statement, Data, Solution | 4 | 380465 | -0.111404489632 | 8.97

PROBLEM 3: minimizing buffered probability of exceedance (bPOE) with transformed features (equivalent to maximization of bAUC)
Minimize bPOE    (Buffered Probability of Exceedance applied to transformed features)
subject to
linear = const (linear constraint)
Calculate:
pr_pen(difference of losses)     (Probability of Exceedance applied to difference of losses based on transformed features)
——————————————————————–————————————————
bPOE = Buffered Probability of Exceedance
pr_pen = Probability of Exceedance
——————————————————————–————————————————
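To make the two quantities in Problem 3 concrete, the sketch below computes an empirical bPOE and an empirical Probability of Exceedance for a small sample. It assumes the commonly used minimization formula bPOE_z(X) = min over a >= 0 of E[max(a(X - z) + 1, 0)] and equally weighted scenarios. The function names and the toy sample are hypothetical, and the brute-force search over breakpoints is for illustration only; it is not how PSG evaluates its functions.

```python
def empirical_bpoe(sample, threshold):
    """Empirical bPOE at the given threshold, equally weighted scenarios.

    The objective is convex and piecewise linear in a, so its minimum over
    a >= 0 is attained at a = 0 or at a breakpoint a = 1 / (threshold - x)
    for some sample point x below the threshold.
    """
    n = len(sample)

    def objective(a):
        return sum(max(a * (x - threshold) + 1.0, 0.0) for x in sample) / n

    candidates = [0.0] + [1.0 / (threshold - x) for x in sample if x < threshold]
    return min(objective(a) for a in candidates)


def empirical_poe(sample, threshold):
    """Fraction of sample points strictly above the threshold."""
    return sum(x > threshold for x in sample) / len(sample)


# Hypothetical sample of loss differences.
sample = [0.0, 1.0, 2.0, 3.0]
bpoe = empirical_bpoe(sample, 2.5)
poe = empirical_poe(sample, 2.5)
```

For this sample the average of the two largest outcomes equals the threshold 2.5, so bPOE is 0.5, while only one outcome in four exceeds 2.5, so the Probability of Exceedance is 0.25. bPOE is never smaller than the Probability of Exceedance at the same threshold.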

Dataset | # of Variables | # of Scenarios | Objective Value | Solving Time, PC 3.14GHz (sec)
Dataset1 (2012) | 4 | 380465 | 0.106004778111 | 7.47

Environments:
Run-File: Problem Statement, Data, Solution
Matlab Toolbox: Data
Matlab Subroutines: Matlab Code, Data
R: R Code, Data
Download other datasets in Run-File Environment.
Instructions for importing problems from Run-File to PSG MATLAB.

Problem Datasets | # of Variables | # of Scenarios | Objective Value | Solving Time, PC 3.50GHz (sec)
Dataset2 (2013): Problem statement, Data, Solution | 4 | 380465 | 0.09195791885 | 14.38
Dataset3 (2014): Problem statement, Data, Solution | 4 | 380465 | 0.044281544942 | 105.65

PROBLEM 4: minimizing Probability of Exceedance using transformed features (equivalent to maximization of AUC)
Minimize pr_pen(difference of losses)     (Probability of Exceedance applied to difference of losses based on transformed features)
subject to
linear = const (linear constraint)
——————————————————————–————————————————
pr_pen = Probability of Exceedance
——————————————————————–————————————————
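The stated equivalence between minimizing the Probability of Exceedance and maximizing AUC can be checked on a toy example. The sketch below assumes that each scenario is a pair consisting of one positive and one negative observation, that the "difference" is the negative score minus the positive score, and that exceedance means this difference is strictly above zero. These conventions, the function names, and the scores are assumptions for illustration; the case study's own construction of scenarios may differ.

```python
from itertools import product


def pairwise_poe(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs in which the negative
    observation scores strictly higher than the positive one."""
    diffs = [n - p for p, n in product(pos_scores, neg_scores)]
    return sum(d > 0 for d in diffs) / len(diffs)


def auc(pos_scores, neg_scores):
    """AUC as the fraction of correctly ordered pairs, ties counted as 0.5."""
    pairs = list(product(pos_scores, neg_scores))
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
    return wins / len(pairs)


# Hypothetical classifier scores without ties.
pos_scores = [0.9, 0.8, 0.4]
neg_scores = [0.7, 0.3, 0.2]

poe_value = pairwise_poe(pos_scores, neg_scores)
auc_value = auc(pos_scores, neg_scores)
```

When no two scores are tied, the two quantities sum to one, so lowering the Probability of Exceedance of the pairwise difference raises AUC by the same amount.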

Dataset | # of Variables | # of Scenarios | Objective Value | Solving Time, PC 3.14GHz (sec)
Dataset1 (2012) | 4 | 380465 | 0.045272219018 | 113.33

Environments:
Run-File: Problem Statement, Data, Solution
Matlab Toolbox: Data
Matlab Subroutines: Matlab Code, Data
R: R Code, Data
Download other datasets in Run-File Environment.
Instructions for importing problems from Run-File to PSG MATLAB.

Problem Datasets | # of Variables | # of Scenarios | Objective Value | Solving Time, PC 3.50GHz (sec)
Dataset2 (2013): Problem statement, Data, Solution | 4 | 380465 | 0.039795749026 | 450.90
Dataset3 (2014): Problem statement, Data, Solution | 4 | 380465 | 0.017920403518 | 1691.12

CASE STUDY SUMMARY

Problem 0 is standard logistic regression on the original, untransformed features. Problem 1 transforms the features using cubic splines. Each spline transforms one-dimensional observation data. The input data for building a spline are the data of the independent variables (features), the dependent variables, and the parameters defining the number of knots and the smoothing degree of the spline. The splines were built by maximizing the logistic regression log-likelihood function (logexp_sum). Problem 2 is logistic regression with the transformed features. Problem 3 maximizes Buffered AUC (bAUC) by minimizing the buffered probability of exceedance (bPOE). Problem 4 maximizes AUC by minimizing the Probability of Exceedance (PSG function pr_pen).