This guide will help you prepare and submit your Python model script (.py file), output file (output.csv), and model documentation (.pdf file).
Your Python script should include two key components:
main(chars: pd.DataFrame, features: pd.DataFrame, daily_ret: pd.DataFrame) -> pd.DataFrame
This function loads packages, prepares data, trains the model, and calculates portfolio weights. It returns a DataFrame with the required columns for the output file.
Function Signature:
def main(chars: pd.DataFrame, features: pd.DataFrame, daily_ret: pd.DataFrame) -> pd.DataFrame:
    """
    Main function to load packages, prepare data, train model, and calculate portfolio weights.

    Args:
        chars (pd.DataFrame): DataFrame containing characteristics data.
        features (pd.DataFrame): DataFrame containing feature names.
        daily_ret (pd.DataFrame): DataFrame containing daily returns data.

    Returns:
        pd.DataFrame: DataFrame with columns 'id', 'eom', and 'w'.
    """
Example Implementation:
def ecdf(data: pd.Series) -> pd.Series:
    """Example helper function for an empirical CDF (ECDF) transform."""
    if data.empty:
        return data
    sorted_data = data.sort_values()
    ranks = sorted_data.rank(method='min', pct=True)
    return pd.Series(ranks, index=data.index)
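As an illustration, with hypothetical sample values, the helper maps each observation to its empirical percentile (the largest value maps to 1.0, the smallest to 1/3 of the sample size):

```python
import pandas as pd

def ecdf(data: pd.Series) -> pd.Series:
    """Map each value to its empirical CDF percentile (ties get the minimum rank)."""
    if data.empty:
        return data
    sorted_data = data.sort_values()
    ranks = sorted_data.rank(method='min', pct=True)
    return pd.Series(ranks, index=data.index)

sample = pd.Series([3.0, 1.0, 2.0])
percentiles = ecdf(sample)  # 3.0 -> 1.0, 1.0 -> 1/3, 2.0 -> 2/3
```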
def prepare_data(chars: pd.DataFrame, features: pd.Series, eom: str) -> pd.DataFrame:
    """Example helper function to apply an ECDF transformation grouped by 'eom'."""
    for feature in features:
        is_zero = chars[feature] == 0  # Preserve zeros
        chars[feature] = chars.groupby(eom)[feature].transform(ecdf)
        chars.loc[is_zero, feature] = 0  # Restore zeros
        chars[feature] = chars[feature].fillna(0.5)  # Impute missing values
    return chars
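A quick sketch, using made-up values for a single hypothetical feature column `f1`, of how the zero-preservation and imputation steps behave:

```python
import numpy as np
import pandas as pd

def ecdf(data: pd.Series) -> pd.Series:
    if data.empty:
        return data
    return pd.Series(data.sort_values().rank(method='min', pct=True), index=data.index)

chars = pd.DataFrame({'eom': ['2024-07'] * 3, 'f1': [3.0, 0.0, np.nan]})
is_zero = chars['f1'] == 0                            # remember exact zeros
chars['f1'] = chars.groupby('eom')['f1'].transform(ecdf)
chars.loc[is_zero, 'f1'] = 0                          # restore zeros
chars['f1'] = chars['f1'].fillna(0.5)                 # impute missing at the median percentile
# f1 becomes [1.0, 0.0, 0.5]: 3.0 gets the top percentile,
# the zero is preserved, and the missing value is imputed to 0.5.
```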
def fit_xgb(train: pd.DataFrame, features: pd.Series) -> xgb.Booster:
    """Example helper function to train an XGBoost model on the training data."""
    dtrain = xgb.DMatrix(data=train[features], label=train['ret_exc_lead1m'])
    params = {
        'booster': 'gbtree',
        'eta': 0.1,
        'max_depth': 3,
        'subsample': 0.5,
        'colsample_bytree': 0.5,
        'objective': 'reg:squarederror',
        'verbosity': 0
    }
    model = xgb.train(params, dtrain, num_boost_round=100)
    return model
def main(chars: pd.DataFrame, features: pd.DataFrame, daily_ret: pd.DataFrame) -> pd.DataFrame:
    """Main function to load packages, prepare data, train model, and calculate portfolio weights."""
    import pandas as pd
    import xgboost as xgb

    eom = 'eom'
    features = features['features']
    chars = prepare_data(chars, features, eom)
    train = chars[chars['ctff_test'] == False]
    test = chars[chars['ctff_test'] == True].copy()  # copy to avoid SettingWithCopyWarning
    model = fit_xgb(train, features)
    dtest = xgb.DMatrix(test[features])
    test['pred'] = model.predict(dtest)
    # Rank predictions within each month, demean the ranks, and scale the weights
    test['rank'] = test.groupby(eom)['pred'].rank(ascending=False, method='average')
    test['rank'] = test.groupby(eom)['rank'].transform(lambda x: x - x.mean())
    test['w'] = test.groupby(eom)['rank'].transform(lambda x: x / x.abs().sum() * 2)
    return test[['id', eom, 'w']]
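The demeaned-rank weighting at the end of main can be checked in isolation. A sketch with hypothetical predictions for one month, assuming the scheme is meant to produce a zero-net-weight portfolio with gross exposure of 2 per month:

```python
import pandas as pd

# Hypothetical predictions for a single month, mirroring the ranking logic in main()
test = pd.DataFrame({'id': [1, 2, 3, 4],
                     'eom': ['2024-07'] * 4,
                     'pred': [0.04, 0.01, -0.02, 0.03]})
test['rank'] = test.groupby('eom')['pred'].rank(ascending=False, method='average')
test['rank'] = test.groupby('eom')['rank'].transform(lambda x: x - x.mean())
test['w'] = test.groupby('eom')['rank'].transform(lambda x: x / x.abs().sum() * 2)
# Per month: weights sum to 0 (demeaned ranks) and |w| sums to 2 (scaling step).
```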
Make sure your script is executable as a standalone program by including the following code at the end of your script:
if __name__ == "__main__":
    features, chars, daily_ret = load_data()
    pf = main(chars, features, daily_ret)
    export_data(pf)
Your output file should be named output.csv and contain exactly three columns: id, eom, and w.
Example of output.csv:
id,eom,w
1,2024-07,0.123
2,2024-07,-0.456
3,2024-07,0.789
...
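If you write the file yourself rather than relying on a provided export helper, a minimal sketch (the sample values here are hypothetical) is:

```python
import pandas as pd

# Hypothetical result of main(); your real DataFrame comes from the model
pf = pd.DataFrame({'id': [1, 2, 3],
                   'eom': ['2024-07'] * 3,
                   'w': [0.123, -0.456, 0.789]})

# Keep only the three required columns, in order, and omit the index column
pf[['id', 'eom', 'w']].to_csv('output.csv', index=False)
```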
Your model documentation should contain details about your model, methodology, and assumptions.
Before you submit, ensure you have the following:
Python model script (.py file) with the main function and the executable code block.
Output file (output.csv) with exactly three columns: id, eom, and w.
Model documentation (.pdf file) with details about your model, methodology, and assumptions.