
101 Alphas #2 with Parameter Optimization

From the paper 101 Formulaic Alphas

$ (-1 * correlation(rank(delta(log(volume), 2)), rank(((close - open) / open)), 6)) $

This factor returns a negative value when the change in volume is highly correlated with the intraday return. In other words, if volume increases (decreases) by a lot on days when the intraday return is high (low), this factor is negative.
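Outside the Quantopian environment, the formula can be sketched with plain pandas/NumPy on toy data (the tickers and numbers below are made up for illustration):

```python
import numpy as np
import pandas as pd

# Toy data: rows = days, columns = assets (hypothetical tickers)
rng = np.random.default_rng(0)
days = pd.date_range("2003-01-02", periods=10, freq="B")
assets = ["A", "B", "C", "D"]
volume = pd.DataFrame(rng.integers(1_000, 10_000, (10, 4)), index=days, columns=assets)
open_ = pd.DataFrame(100 + rng.normal(0, 1, (10, 4)), index=days, columns=assets)
close = open_ * (1 + rng.normal(0, 0.01, (10, 4)))

# delta(log(volume), 2): change in log volume vs. two days ago
vol_change = np.log(volume).diff(2)
intraday_ret = close / open_ - 1

# Cross-sectional ranks each day, then a 6-day rolling correlation per asset
vol_rank = vol_change.rank(axis=1)
ret_rank = intraday_ret.rank(axis=1)
alpha2 = -vol_rank.rolling(6).corr(ret_rank)
print(alpha2.iloc[-1])
```

The final row gives each asset's factor value for the latest day; a correlation is bounded, so values fall in [-1, 1].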

I am postulating that the idea behind this factor is that large moves on heavy volume are liquidity-demanding trades (ideally by uninformed traders). Traders providing liquidity in these instances would demand a premium/discount to take the other side, compensating for the risk of trading with an informed trader or of being stuck with too large an inventory. Note that this is roughly the opposite of how technical analysis generally views volume/price relationships (although I am oversimplifying a bit with this statement).

My in-sample data for this runs from 2003 to 2012. However, it should be noted that this paper was published in 2015. Therefore, any out-of-sample testing should be done on data after 2015, once the researcher gets to that stage. 2012 to 2015 could possibly serve as a cross-validation set for tuning hyperparameters if any kind of machine learning is used to tweak the factor.

Parameter Optimization

In this notebook, I will perform a bit of parameter optimization, partly to see which parameters are best for performance. However, I am more interested in how sensitive the factor's performance is to changes in the input parameters. If performance is highly sensitive to small changes in the inputs, I would assign a higher likelihood that the researchers overfit this factor.

To keep things simple for the moment, I will only adjust the correlation lookback window in the optimization. In the future, I may work on tweaking other parameters if I can find an efficient workflow for doing so.
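One crude way to summarize that sensitivity, sketched here with illustrative IC values (the actual values are computed further down in the notebook): if the IC surface across lookback windows is a smooth plateau rather than a sharp spike, overfitting of the window choice is less likely.

```python
import pandas as pd

# Hypothetical mean 5-day ICs keyed by correlation window (illustrative numbers)
ics = pd.Series(
    {4: 0.0055, 6: 0.0072, 8: 0.0075, 10: 0.0078, 12: 0.0081,
     14: 0.0082, 16: 0.0082, 18: 0.0081, 20: 0.0079},
    name="mean_5d_ic",
)

# A low coefficient of variation across windows suggests a flat, robust
# IC surface; a high one suggests performance hinges on one magic value.
cv = ics.std() / ics.mean()
print("best window:", ics.idxmax(), "CV:", round(cv, 3))
```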

In [1]:
# Typical imports for use with Pipeline
from quantopian.pipeline import Pipeline, CustomFactor
from quantopian.research import run_pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.data import Fundamentals  
from quantopian.pipeline.classifiers.fundamentals import Sector 
from quantopian.pipeline.filters import QTradableStocksUS, Q500US

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import alphalens as al
In [2]:
class VolumeChange(CustomFactor):
    """Factor returning the change in log volume compared
    to (window_length - 1) days ago. Essentially, this is the
    percent change in volume."""
    inputs = [USEquityPricing.volume]
    window_length = 3
    window_safe = True

    def compute(self, today, asset_ids, out, volume):
        # volume[0] is the oldest row, so the lookback tracks window_length
        out[:] = np.log(volume[-1]) - np.log(volume[0])
        
class IntradayReturn(CustomFactor):
    """Factor returning the return from today's open to
    today's close."""
    inputs = [USEquityPricing.open, USEquityPricing.close]
    window_length = 1
    window_safe = True

    def compute(self, today, asset_ids, out, open_, close):
        out[:] = close / open_ - 1

def make_alpha_2(mask, window_length=6):
    """Construct a factor returning the negative of the rank correlation over the
    past `window_length` days between the intraday return and the VolumeChange.

    Parameters
    ----------
    mask: Filter
        Filter representing which assets are included in the factor computation.
    window_length: int
        Lookback window, in days, for the correlation.

    Returns
    -------
    Factor

    Notes
    -----
    This is a measure of whether returns are correlated with volume. It is
    negative when volume is heavier on up moves and lighter on down moves. It is
    positive when volume is heavier on down moves and lighter on up moves.
    """
    class Alpha2(CustomFactor):
#         inputs = [VolumeChange().rank(), IntradayReturn().rank()]
#         window_length = 6

        def compute(self, today, asset_ids, out, volume_change, intraday_return):
            # Columns are assets, so corrwith yields the per-asset correlation
            # of the two ranked inputs over the lookback window
            volume_change_df = pd.DataFrame(volume_change)
            intraday_return_df = pd.DataFrame(intraday_return)
            out[:] = -volume_change_df.corrwith(intraday_return_df)
        
    return Alpha2(mask=mask, 
                  inputs = [VolumeChange(mask=mask).rank(), 
                            IntradayReturn(mask=mask).rank()],
                  window_length=window_length
                 )
In [3]:
def make_pipeline(corr_param_range):
    base_universe = QTradableStocksUS()
#     base_universe = Fundamentals.symbol.latest.element_of(['GS', 'AAPL', 'XOM'])
    closed_end_funds = Fundamentals.share_class_description.latest.startswith('CE')
    universe = base_universe & ~closed_end_funds
    
    factor_dict = {}
    for i in corr_param_range:
        factor_dict['alpha_2_{}'.format(i)] = make_alpha_2(universe, i)

    factor_dict['sector_code'] = Sector(mask=universe)
    
    return Pipeline(columns=factor_dict, screen=universe)

start_date = '2003-01-01' 
end_date = '2012-12-31'
# end_date = '2003-01-10'
corr_param_range = [4, 6, 8, 10, 12, 14, 16, 18, 20]

result = run_pipeline(make_pipeline(corr_param_range), start_date, end_date, chunksize=504)  
col_order = []

# Reorder Columns
for i in corr_param_range:
    col_order.append('alpha_2_{}'.format(i))
col_order.append('sector_code')
result = result[col_order]
In [4]:
result.head()
Out[4]:
alpha_2_4 alpha_2_6 alpha_2_8 alpha_2_10 alpha_2_12 alpha_2_14 alpha_2_16 alpha_2_18 alpha_2_20 sector_code
2003-01-02 00:00:00+00:00 Equity(2 [ARNC]) 0.202107 0.240665 -0.176664 -0.301140 -0.239590 -0.177081 -0.091721 -0.092662 -0.141505 101
Equity(24 [AAPL]) -0.679728 -0.021000 0.214151 -0.009809 -0.014433 -0.156374 -0.180991 -0.132036 -0.109547 311
Equity(41 [ARCB]) -0.989867 -0.535938 -0.544902 -0.224799 -0.152206 -0.169425 -0.057055 -0.071686 0.062592 310
Equity(60 [ABS]) 0.062825 0.045906 0.285614 0.223836 0.221915 0.022063 -0.078990 0.048286 -0.059117 205
Equity(62 [ABT]) -0.483985 -0.017441 0.203240 0.254046 0.189048 0.085416 0.094814 -0.025710 -0.060899 206

Code to get factor_data

In [5]:
def get_al_prices(result, periods=(1, 5, 21)):
    """Fetch pricing covering the factor dates, plus enough extra
    business days to compute the longest forward return."""
    assets = result.index.levels[1].unique()
    start_date = result.index.get_level_values(0)[0]
    end_date = result.index.get_level_values(0)[-1] + max(periods) * pd.tseries.offsets.BDay()
    pricing = get_pricing(assets, start_date, end_date, fields="open_price")
    return pricing

def get_factor_data(result, 
                    factor_col, 
                    prices,
                    forward_returns,
                    quantiles=5,
                    bins=None, 
                    groupby=None, 
                    binning_by_group=False,
                    groupby_labels=None,
                    max_loss=0.35):

    factor_data = al.utils.get_clean_factor(result[factor_col], 
                                            forward_returns,
                                            groupby=groupby,
                                            binning_by_group=binning_by_group,
                                            groupby_labels=groupby_labels,
                                            quantiles=quantiles,
                                            bins=bins,
                                            max_loss=max_loss)
    
    return factor_data

Optimize by Correlation Window

In [6]:
periods = (1, 3, 5, 7, 10, 12, 15, 20)
prices = get_al_prices(result, periods)
forward_returns = al.utils.compute_forward_returns(result[result.columns[0]], prices, periods)
In [7]:
forward_returns.head()
Out[7]:
1D 3D 5D 7D 10D 12D 15D 20D
date asset
2003-01-02 00:00:00+00:00 Equity(2 [ARNC]) 0.022567 0.059515 -0.046867 -0.012123 -0.035583 -0.057731 -0.057731 -0.141545
Equity(24 [AAPL]) 0.030631 0.029928 0.018126 0.037516 -0.010538 -0.010538 -0.008431 -0.011803
Equity(31 [ABAX]) 0.041850 0.083150 0.069659 0.029460 0.026982 0.010738 -0.040198 0.000000
Equity(39 [DDC]) 0.010651 0.039744 0.014238 -0.021414 -0.004653 -0.011772 0.078480 0.065699
Equity(41 [ARCB]) 0.072531 0.086505 0.087304 0.076790 0.045382 -0.026395 -0.074927 -0.071422
In [8]:
ic_dict = {}
for factor_col in result.columns:
    if factor_col != 'sector_code':
        print "-"*30 + "\nGetting Factor Data for '{}'".format(factor_col)
        factor_data = get_factor_data(result, 
                                      factor_col, 
                                      prices,
                                      forward_returns)
        print "-"*30 + "\nCalculating ICs for '{}'".format(factor_col)
        ic_dict[factor_col] = al.performance.mean_information_coefficient(factor_data)
------------------------------
Getting Factor Data for 'alpha_2_4'
Dropped 0.4% entries from factor data: 0.4% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
------------------------------
Calculating ICs for 'alpha_2_4'
------------------------------
Getting Factor Data for 'alpha_2_6'
Dropped 0.4% entries from factor data: 0.4% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
------------------------------
Calculating ICs for 'alpha_2_6'
------------------------------
Getting Factor Data for 'alpha_2_8'
Dropped 0.4% entries from factor data: 0.4% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
------------------------------
Calculating ICs for 'alpha_2_8'
------------------------------
Getting Factor Data for 'alpha_2_10'
Dropped 0.4% entries from factor data: 0.4% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
------------------------------
Calculating ICs for 'alpha_2_10'
------------------------------
Getting Factor Data for 'alpha_2_12'
Dropped 0.4% entries from factor data: 0.4% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
------------------------------
Calculating ICs for 'alpha_2_12'
------------------------------
Getting Factor Data for 'alpha_2_14'
Dropped 0.4% entries from factor data: 0.4% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
------------------------------
Calculating ICs for 'alpha_2_14'
------------------------------
Getting Factor Data for 'alpha_2_16'
Dropped 0.4% entries from factor data: 0.4% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
------------------------------
Calculating ICs for 'alpha_2_16'
------------------------------
Getting Factor Data for 'alpha_2_18'
Dropped 0.4% entries from factor data: 0.4% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
------------------------------
Calculating ICs for 'alpha_2_18'
------------------------------
Getting Factor Data for 'alpha_2_20'
Dropped 0.4% entries from factor data: 0.4% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
------------------------------
Calculating ICs for 'alpha_2_20'
In [9]:
ic_df = pd.DataFrame.from_dict(ic_dict)[col_order[:-1]]
ic_df
Out[9]:
alpha_2_4 alpha_2_6 alpha_2_8 alpha_2_10 alpha_2_12 alpha_2_14 alpha_2_16 alpha_2_18 alpha_2_20
1D 0.003254 0.003735 0.004258 0.003816 0.004303 0.004392 0.004393 0.004439 0.004360
3D 0.004920 0.006562 0.006746 0.006705 0.007213 0.007156 0.007333 0.007258 0.007115
5D 0.005548 0.007183 0.007520 0.007788 0.008084 0.008186 0.008216 0.008079 0.007898
7D 0.005438 0.006946 0.007501 0.007732 0.008063 0.008078 0.008069 0.007813 0.007775
10D 0.004768 0.006439 0.006953 0.007176 0.007472 0.007451 0.007345 0.007350 0.007581
12D 0.004447 0.005994 0.006439 0.006692 0.006944 0.006888 0.006908 0.007069 0.007269
15D 0.003852 0.005273 0.005759 0.005897 0.006156 0.006289 0.006502 0.006769 0.007131
20D 0.003233 0.004494 0.004924 0.005311 0.005719 0.006072 0.006539 0.006968 0.007388
In [10]:
ic_df.loc['5D'].idxmax()
Out[10]:
'alpha_2_16'
In [11]:
ic_df.plot();
In [12]:
import seaborn as sns

sns.heatmap(ic_df, annot=True, cmap='RdBu', vmin=-.01, vmax=.01)
Out[12]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f09b5c31ad0>

Tearsheet on Original Params

Correlation window = 6 days

In [13]:
# Pass the column name as a string so result[factor_col] yields a Series,
# which is what alphalens' get_clean_factor expects
factor_data = get_factor_data(result,
                              'alpha_2_6',
                              prices,
                              forward_returns)
Dropped 0.4% entries from factor data: 0.4% in forward returns computation and 0.0% in binning phase (set max_loss=0 to see potentially suppressed Exceptions).
max_loss is 35.0%, not exceeded: OK!
In [14]:
al.tears.create_full_tear_sheet(factor_data, long_short=True, group_neutral=False)
Quantiles Statistics
min max mean std count count %
factor_quantile
1 -1.000000 -0.307079 -0.672667 0.137022 917070 20.022300
2 -0.587819 0.005523 -0.324010 0.096262 915526 19.988590
3 -0.312260 0.255733 -0.045512 0.092879 915557 19.989267
4 -0.047081 0.539635 0.239701 0.100056 915526 19.988590
5 0.242650 1.000000 0.619084 0.153605 916564 20.011253
Returns Analysis
1D 3D 5D 7D 10D 12D 15D 20D
Ann. alpha 0.010 0.014 0.013 0.013 0.011 0.010 0.008 0.006
beta 0.004 0.006 0.004 0.002 0.001 0.002 0.002 0.002
Mean Period Wise Return Top Quantile (bps) 0.548 0.804 0.732 0.690 0.570 0.474 0.348 0.257
Mean Period Wise Return Bottom Quantile (bps) -0.416 -0.579 -0.543 -0.513 -0.510 -0.453 -0.386 -0.318
Mean Period Wise Spread (bps) 0.964 1.383 1.275 1.203 1.080 0.927 0.735 0.575
<matplotlib.figure.Figure at 0x7f09b59f3810>