Notebook

Tackling overfitting via cross-validation over quarters

by Thomas Wiecki, Quantopian Inc.

Overfitting is probably the biggest potential pitfall in algorithmic trading. You work on a factor or algorithm that looks pretty good, and you have some ideas to improve it so that it looks even better. Excitedly, you turn the algorithm on, only to be dismayed when the out-of-sample performance doesn't look nearly as good as your backtest. We at Quantopian certainly observe this pattern a lot when evaluating algorithms for inclusion in the fund. If you want your algorithm to do well in the contest, you need to be very careful in this regard.

Note that the context here is not the same as in machine learning, although it is definitely related. Here, we're talking about manual factor development. As such, how we do hold-out here is also a bit different from what you might want to do with a machine learning algorithm.

We could just split our time period in two, develop on the first section, and, when we're done, make sure the factor still works on e.g. the last 6 months we haven't looked at. Unfortunately, your factor might only work in certain market regimes (bull/bear market, low/high-vol regimes, etc.) or time periods. So it is quite common, when the hold-out result doesn't look great, to explain it away by saying "well, the markets were really weird recently, so I won't take that negative result too seriously."

Another approach is to instead evaluate your factor only on even quarters and keep the odd quarters as the final test set. If the factor then fails on the test set, you are almost out of excuses.
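
To make the parity concrete, here is a small illustrative snippet (standalone pandas, not part of the notebook's pipeline) labeling each quarter of a hypothetical year as train or test:

```python
import pandas as pd

# Hypothetical year: even quarters form the train set, odd quarters the test set
quarter_starts = pd.date_range("2014-01-01", "2014-12-31", freq="QS")
for d in quarter_starts:
    role = "train" if d.quarter % 2 == 0 else "test"
    print("Q%d starts %s -> %s" % (d.quarter, d.date(), role))
```

With this convention Q2 and Q4 are train quarters, matching the `split_quarters` function defined later in the notebook.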

In [19]:
from quantopian.research import run_pipeline
from quantopian.pipeline import Pipeline
from quantopian.pipeline.factors import Returns
from quantopian.pipeline.filters import QTradableStocksUS
from quantopian.pipeline.data.factset import Fundamentals as fsf

import pandas as pd
import numpy as np

import alphalens as al
import pyfolio as pf
import matplotlib.pyplot as plt

Next we will define a very simple factor (the negative debt-to-assets ratio, favoring companies with low leverage) that we want to hold for 10 trading days, or two weeks.

In [20]:
ZSCORE_FILTER = 3 # Maximum number of standard deviations from the mean before a value counts as an outlier
ZERO_FILTER = 0.001 # Minimum absolute weight we allow before dropping a security


def make_pipeline():
    
    # Setting up the variables
    alpha_factor = -fsf.debt.latest / \
                    fsf.assets.latest
    
    # Standardized logic for each input factor after this point
    alpha_w = alpha_factor.winsorize(min_percentile=0.02,
                                     max_percentile=0.98,
                                     mask=QTradableStocksUS() & alpha_factor.isfinite())
    
    alpha_z = alpha_w.zscore()
    alpha_weight = alpha_z / 100.0
    
    outlier_filter = alpha_z.abs() < ZSCORE_FILTER
    zero_filter = alpha_weight.abs() > ZERO_FILTER

    universe = QTradableStocksUS() & \
               outlier_filter & \
               zero_filter

    pipe = Pipeline(
        columns={
            'alpha_weight': alpha_weight
        },
        screen=universe
    )
    return pipe
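
As a hedged illustration of what the winsorize/zscore/filter chain does, the same steps can be sketched on a plain pandas Series (this mimics, but is not, the Pipeline API; the random data is made up):

```python
import numpy as np
import pandas as pd

# Illustrative sketch (plain pandas, not Pipeline code) of the same
# winsorize -> zscore -> scale -> filter chain used in make_pipeline above.
np.random.seed(0)
values = pd.Series(np.random.randn(500))

# Winsorize: clip to the 2nd and 98th percentiles
lo, hi = values.quantile(0.02), values.quantile(0.98)
clipped = values.clip(lower=lo, upper=hi)

# Cross-sectional z-score, then scale down to target portfolio weights
z = (clipped - clipped.mean()) / clipped.std()
weights = z / 100.0

# The same outlier and near-zero filters as in the pipeline
mask = (z.abs() < 3) & (weights.abs() > 0.001)
print("kept %d of %d names" % (mask.sum(), len(values)))
```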
In [21]:
pipe = make_pipeline()
In [22]:
start = pd.Timestamp("2010-01-05")
end = pd.Timestamp("2017-01-01")
results = run_pipeline(pipe, start_date=start, end_date=end)

Subsample the factor to a 2-week frequency, as that is how we would actually trade it.

In [23]:
dts = results.index.get_level_values(0).drop_duplicates()
dts_subsampled = dts.to_series().resample('2W-MON').first().tolist()
dts_subsampled[:5]
Out[23]:
[Timestamp('2010-01-05 00:00:00'),
 Timestamp('2010-01-12 00:00:00'),
 Timestamp('2010-01-26 00:00:00'),
 Timestamp('2010-02-09 00:00:00'),
 Timestamp('2010-02-23 00:00:00')]
In [24]:
results = results.loc[dts_subsampled]
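
The resampling rule is easiest to see on a toy date range (illustrative only; `'2W-MON'` creates bins anchored on Mondays, and `first()` keeps the earliest date falling into each bin):

```python
import pandas as pd

# Toy version of the 2-week subsampling above, on plain business days
dts = pd.date_range("2010-01-05", "2010-03-01", freq="B")
picked = dts.to_series().resample("2W-MON").first().tolist()
print(picked)
```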

Our split function returns either the even quarters (train) or the odd quarters (test) of the factor.

In [25]:
def split_quarters(factor, train_test='train'):
    odd_even = 0 if train_test == 'train' else 1        
    return factor.iloc[factor.index.get_level_values(0).quarter % 2 == odd_even]
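
A quick sanity check of the split logic on synthetic data (the function is restated here, with `.loc` instead of `.iloc`, purely so the snippet is self-contained):

```python
import numpy as np
import pandas as pd

def split_quarters(factor, train_test='train'):
    # Same logic as the cell above: even quarters -> train, odd -> test
    odd_even = 0 if train_test == 'train' else 1
    return factor.loc[factor.index.get_level_values(0).quarter % 2 == odd_even]

# Toy (date, asset) MultiIndex frame, one row per month and asset
dates = pd.date_range("2014-01-01", "2014-12-31", freq="MS")
idx = pd.MultiIndex.from_product([dates, ["AAA", "BBB"]], names=["date", "asset"])
toy = pd.DataFrame({"alpha_weight": np.arange(len(idx), dtype=float)}, index=idx)

train = split_quarters(toy, "train")
test = split_quarters(toy, "test")
print(sorted(train.index.get_level_values("date").quarter.unique()))  # [2, 4]
print(sorted(test.index.get_level_values("date").quarter.unique()))   # [1, 3]
```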

Split our factor results into the train component.

In [26]:
results_train = split_quarters(results, train_test='train')

Analyze the factor with Alphalens.

In [27]:
assets = results_train.index.levels[1]
pricing = get_pricing(assets, start, end + pd.Timedelta(days=30), fields="close_price")
In [28]:
factor_train = al.utils.get_clean_factor_and_forward_returns(results_train['alpha_weight'], pricing,
                                                             periods=[10])
Dropped 0.0% entries from factor data: 0.0% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
In [29]:
al.tears.create_summary_tear_sheet(factor_train)
Quantiles Statistics
min max mean std count count %
factor_quantile
1 -0.027616 -0.008685 -0.016031 0.005472 33456 20.019986
2 -0.009471 -0.001687 -0.005490 0.001914 33404 19.988870
3 -0.002940 0.005316 0.001650 0.002238 33407 19.990665
4 0.004175 0.010979 0.007784 0.001746 33404 19.988870
5 0.010446 0.013812 0.012110 0.000729 33442 20.011609
Returns Analysis
10D
Ann. alpha -0.006
beta 0.010
Mean Period Wise Return Top Quantile (bps) -3.748
Mean Period Wise Return Bottom Quantile (bps) 2.731
Mean Period Wise Spread (bps) -6.478
Information Analysis
10D
IC Mean -0.003
IC Std. 0.102
Risk-Adjusted IC -0.025
t-stat(IC) -0.240
p-value(IC) 0.811
IC Skew 0.301
IC Kurtosis 0.011
Turnover Analysis
10D
Quantile 1 Mean Turnover 0.0
Quantile 2 Mean Turnover 0.0
Quantile 3 Mean Turnover 0.0
Quantile 4 Mean Turnover 0.0
Quantile 5 Mean Turnover 0.0
10D
Mean Factor Rank Autocorrelation 1.0
In [30]:
al.tears.create_full_tear_sheet(factor_train)
Quantiles Statistics
min max mean std count count %
factor_quantile
1 -0.027616 -0.008685 -0.016031 0.005472 33456 20.019986
2 -0.009471 -0.001687 -0.005490 0.001914 33404 19.988870
3 -0.002940 0.005316 0.001650 0.002238 33407 19.990665
4 0.004175 0.010979 0.007784 0.001746 33404 19.988870
5 0.010446 0.013812 0.012110 0.000729 33442 20.011609
Returns Analysis
10D
Ann. alpha -0.006
beta 0.010
Mean Period Wise Return Top Quantile (bps) -3.748
Mean Period Wise Return Bottom Quantile (bps) 2.731
Mean Period Wise Spread (bps) -6.478
Information Analysis
10D
IC Mean -0.003
IC Std. 0.102
Risk-Adjusted IC -0.025
t-stat(IC) -0.240
p-value(IC) 0.811
IC Skew 0.301
IC Kurtosis 0.011
Turnover Analysis
10D
Quantile 1 Mean Turnover 0.0
Quantile 2 Mean Turnover 0.0
Quantile 3 Mean Turnover 0.0
Quantile 4 Mean Turnover 0.0
Quantile 5 Mean Turnover 0.0
10D
Mean Factor Rank Autocorrelation 1.0

As you can see, we only have data for even quarters. Alphalens fortunately handles this reasonably well and just shows straight lines in periods where no data is present.

A couple of indicators suggest that we might be on to something: the mean returns are spread between the top and bottom quantiles. However, the mean IC is close to zero and far from statistically significant (p-value 0.811), which should make us suspicious.
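
The significance numbers above come from a one-sample t-test of the period-wise IC series against zero (this is, to my knowledge, what Alphalens computes internally; the IC series here is synthetic, matching only the rough mean/std shown above):

```python
import numpy as np
from scipy import stats

# Synthetic IC series with roughly the mean/std from the tear sheet above
np.random.seed(42)
ic = np.random.normal(-0.003, 0.102, size=90)

t_stat, p_value = stats.ttest_1samp(ic, 0.0)
ir = ic.mean() / ic.std()  # risk-adjusted IC (information ratio of the IC)
print("t-stat %.3f, p-value %.3f, risk-adjusted IC %.3f" % (t_stat, p_value, ir))
```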

Usually at this stage you might get a bit excited and start improving the factor. Maybe a better way to compute the underlying ratio, maybe normalizing across sectors, winsorizing differently, etc. This is totally fine; you want a factor that works well. What is not fine is not having hold-out data to see whether, in that process, you overfit your factor.

Fortunately we were wise enough to split our data, so once you are really satisfied with your factor (but not sooner!) you can move on to your hold-out set. It is crucial to realize that you can only do this once, so practice self-discipline, lest you shoot yourself in the foot.

Evaluating the factor on hold-out data

In [31]:
results_test = split_quarters(results, train_test='test')
In [32]:
factor_test = al.utils.get_clean_factor_and_forward_returns(results_test['alpha_weight'], pricing,
                                                           periods=[10])

al.tears.create_summary_tear_sheet(factor_test)
Dropped 0.0% entries from factor data: 0.0% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
In [33]:
al.tears.create_full_tear_sheet(factor_test)
Quantiles Statistics
min max mean std count count %
factor_quantile
1 -0.027605 -0.008714 -0.016014 0.005445 32765 20.021632
2 -0.009482 -0.001686 -0.005520 0.001919 32715 19.991078
3 -0.002941 0.005241 0.001640 0.002248 32707 19.986190
4 0.004185 0.010957 0.007789 0.001743 32715 19.991078
5 0.010468 0.013822 0.012128 0.000722 32746 20.010022
Returns Analysis
10D
Ann. alpha -0.003
beta -0.032
Mean Period Wise Return Top Quantile (bps) -3.524
Mean Period Wise Return Bottom Quantile (bps) 3.094
Mean Period Wise Spread (bps) -6.618