Robust approach to machine learning models comparison - Dmitry Larko, Sr. Data Scientist, H2O.ai
Dmitry Larko
Sr. Data Scientist @ H2O.ai
@DmitryLarko / dmitry@h2o.ai / h2o.ai
Robust approach to ML models comparison
Problem and data
• Competition: Mercedes-Benz Greener Manufacturing
• The target – car testing time (sec)
• Evaluation metric – R²
• Train 4029 rows
• Test 4029 rows: 81% private, 19% public
• Features (378 columns):
• Binary – tests' characteristics (369 columns)
• Categorical – car's characteristics (8 columns)
• ID – numerical order
What could possibly go wrong?
Leaderboard shake-up stats
• Biggest improvement: 3808 places (3923 ⇒ 115)
• Second biggest improvement: 2838 places (2843 ⇒ 5)
• Biggest fall: 3564 places (I won't point anyone out)
Source: https://www.kaggle.com/c/mercedes-benz-greener-manufacturing/discussion/36103
Leaderboard shake-up
[Leaderboard screenshots omitted; source: https://www.kaggle.com/c/mercedes-benz-greener-manufacturing/discussion/36103]
Why?
• Small public test:
• Test 4029 rows: 81% private, 19% public
• Outliers:
• Dropping them, or stratifying with respect to them, did not help
• The metric is very sensitive to outliers:
• The 5-fold cross-validation std of R² is 0.068
• Most competitors overfit to the public leaderboard
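A quick, self-contained illustration (my own, not from the original slides) of how fragile R² is: one extreme target that the model misses can drag the score down sharply, even though every other prediction is unchanged. The numbers below are synthetic:

```python
import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
y_true = rng.normal(100, 10, size=500)        # e.g. typical testing times
y_pred = y_true + rng.normal(0, 3, size=500)  # a reasonably good model
print(r2_score(y_true, y_pred))               # ~0.91 on the clean data

# One extreme target the model misses changes the picture completely,
# even though the other 500 predictions are untouched.
y_true_out = np.append(y_true, 265.0)
y_pred_out = np.append(y_pred, 110.0)
print(r2_score(y_true_out, y_pred_out))       # drops to roughly 0.6
```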
Better approach to cross-validation
• 10 different splits into 5 folds – 50 scores for each model
• To compare two models we use the t-test for related samples
(scipy.stats.ttest_rel)
• We assess the significance of the difference between the two models:

$$T(X_1^n, X_2^n) = \frac{\bar{X}_1 - \bar{X}_2}{S / \sqrt{n}}$$

$X_1, X_2$ – out-of-fold values of $R^2$ for the respective folds ($\bar{X}_1, \bar{X}_2$ are their means),
$S$ – standard deviation of the pairwise differences of $X_1$ and $X_2$, $S = \sqrt{\mathrm{Var}(X_1 - X_2)}$,
$n$ – number of paired scores (50 in our case)
Wiki: http://en.wikipedia.org/wiki/T-test#Dependent_t-test
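Here is a minimal sketch of the scheme above, assuming scikit-learn's RepeatedKFold for the 10×5 splits; the dataset and the two models are placeholders:

```python
import numpy as np
from scipy.stats import ttest_rel
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import RepeatedKFold, cross_val_score

X, y = make_regression(n_samples=1000, n_features=20, noise=10, random_state=0)

# 10 different splits into 5 folds -> 50 out-of-fold R^2 scores per model.
cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=42)

scores_a = cross_val_score(Ridge(), X, y, scoring="r2", cv=cv)
scores_b = cross_val_score(
    RandomForestRegressor(n_estimators=100, random_state=0),
    X, y, scoring="r2", cv=cv,
)

# The scores are paired (same folds for both models), hence the
# t-test for related samples.
t_stat, p_value = ttest_rel(scores_a, scores_b)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```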
Better approach to cross-validation
• The main strategy is to test fewer hypotheses
• When changing a hyper-parameter, we should observe how the t-statistic changes. If the t-statistic changes smoothly and has an optimum where p < 0.05, then the difference is statistically significant
• If the t-statistic changes erratically with small changes of the hyper-parameter, there is no real difference
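A sketch of that hyper-parameter check, reusing the setup from the previous snippet; Ridge's alpha stands in for whatever hyper-parameter you are tuning:

```python
import numpy as np
from scipy.stats import ttest_rel
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import RepeatedKFold, cross_val_score

X, y = make_regression(n_samples=1000, n_features=20, noise=10, random_state=0)
cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=42)

# Baseline model with the current hyper-parameter value.
baseline = cross_val_score(Ridge(alpha=1.0), X, y, scoring="r2", cv=cv)

# Sweep the hyper-parameter (skipping the baseline value itself) and
# watch how the paired t-statistic moves along the grid.
for alpha in [0.01, 0.1, 10.0, 100.0, 1000.0]:
    candidate = cross_val_score(Ridge(alpha=alpha), X, y, scoring="r2", cv=cv)
    t_stat, p_value = ttest_rel(candidate, baseline)
    print(f"alpha={alpha:>7}: t={t_stat:+.2f}  p={p_value:.3f}")

# A smooth trend with an optimum where p < 0.05 suggests a real effect;
# erratic jumps between neighbouring values suggest noise.
```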
Why is testing a lot of hypotheses a bad thing?
• The more hypothesis tests you run, the higher the probability of getting a significant score by pure chance (bad luck)
• There are many approaches to avoid this, but the easiest one is to run fewer tests and think carefully about what to test 🙂
*https://xkcd.com/882/
- Is that enough?
- Well, no 🙁
Outliers
• ML algorithms get distracted by outliers and spend a large part of the learning process trying to predict them
• Assuming we can identify the outliers, there are two ways to fight them:
• Remove the outliers from the dataset, and train and validate your models without them. An overview of this approach can be found here: https://www.kaggle.com/c/sberbank-russian-housing-market/discussion/35684. That team took 1st place using it.
• Keep the outliers in the train part but remove them from validation, so your model is trained with outliers, but the decision about performance is made without considering their impact. Check the cross_validation_score_statement function in this repo: https://github.com/Danila89/cross_validation_custom (a sketch of the idea follows the diagrams below)
Dealing with outliers. Approach 1.
[Diagram: remove the outliers from the dataset first, then split the remaining rows into train and validation.]
Dealing with outliers. Approach 2.
[Diagram: split the dataset as usual; outliers stay in the train part but are removed from validation.]
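A minimal sketch of Approach 2 (my own illustration of the idea, not the actual cross_validation_score_statement function from the repo above): fit on the full training folds, but score each fold's R² on non-outlier rows only. The outlier rule here is a deliberately naive placeholder:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold

X, y = make_regression(n_samples=1000, n_features=20, noise=10, random_state=0)

# Placeholder outlier rule: flag extreme targets; use whatever fits your data.
is_outlier = np.abs(y - y.mean()) > 3 * y.std()

scores = []
for train_idx, valid_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = Ridge().fit(X[train_idx], y[train_idx])  # outliers stay in train...
    keep = valid_idx[~is_outlier[valid_idx]]         # ...but leave validation
    scores.append(r2_score(y[keep], model.predict(X[keep])))

print(np.mean(scores), np.std(scores))
```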
Which approach to choose?
That depends, but as a rule of thumb:
• If the outliers were introduced by mistake (all records in millions, a few in thousands) – Approach 1 is preferable
• If the outliers are defined by the nature of the data – use Approach 2
Wrapping up
• I did not participate in this competition, but I found these approaches promising enough to share with you.
• All kudos go to Danila Savenkov, who placed 11th in this competition and shared his insights here (he covers much more than this presentation):
• https://www.kaggle.com/c/mercedes-benz-greener-manufacturing/discussion/36242
• https://www.youtube.com/watch?v=0qHXNeuNOAE (English subtitles available)
• slides: https://github.com/yandexdataschool/ml-training-website/raw/gh-pages/presentations/Savenkov_KaggleMercedes_2017_eng.pdf