This guide will help you prepare and submit your Python model script (.py file), output file (output.csv), and model documentation (.pdf file).
Your Python script should include two key components: a main function and an executable code block.
main(chars: pd.DataFrame, features: pd.DataFrame, daily_ret: pd.DataFrame) -> pd.DataFrame
This function loads packages, prepares data, trains the model, and calculates portfolio weights. It returns a DataFrame with the required columns for the output file.
Function Signature:
def main(chars: pd.DataFrame, features: pd.DataFrame, daily_ret: pd.DataFrame) -> pd.DataFrame:
    """
    Main function to load packages, prepare data, train model, and calculate portfolio weights.

    Args:
        chars (pd.DataFrame): DataFrame containing characteristics data.
        features (pd.DataFrame): DataFrame containing feature names.
        daily_ret (pd.DataFrame): DataFrame containing daily returns data.

    Returns:
        pd.DataFrame: DataFrame with columns 'id', 'eom', and 'w'.
    """
Example Implementation:
# Module-level imports used by the helper functions and type hints
import pandas as pd
import xgboost as xgb


def ecdf(data: pd.Series) -> pd.Series:
    """Example helper function for ecdf."""
    if data.empty:
        return data
    sorted_data = data.sort_values()
    cdf_values = sorted_data.rank(method='min', pct=True)
    # Align the CDF values back to the original index order
    return pd.Series(cdf_values, index=data.index)


def prepare_data(chars: pd.DataFrame, features: pd.Series, eom: str) -> pd.DataFrame:
    """Example helper function to apply an ECDF transformation grouped by 'eom'."""
    for feature in features:
        is_zero = chars[feature] == 0  # Preserve zeros
        chars[feature] = chars.groupby(eom)[feature].transform(ecdf)
        chars.loc[is_zero, feature] = 0  # Restore zeros
        chars[feature] = chars[feature].fillna(0.5)  # Impute missing values
    return chars
def fit_xgb(train: pd.DataFrame, features: pd.Series) -> xgb.Booster:
    """Example helper function to train an XGBoost model on the training data."""
    dtrain = xgb.DMatrix(data=train[features], label=train['ret_exc_lead1m'])
    params = {
        'booster': 'gbtree',
        'eta': 0.1,
        'max_depth': 3,
        'subsample': 0.5,
        'colsample_bytree': 0.5,
        'objective': 'reg:squarederror',
        'verbosity': 0
    }
    model = xgb.train(params, dtrain, num_boost_round=100)
    return model
def main(chars: pd.DataFrame, features: pd.DataFrame, daily_ret: pd.DataFrame) -> pd.DataFrame:
    """Main function to load packages, prepare data, train model, and calculate portfolio weights."""
    import pandas as pd
    import xgboost as xgb

    eom = 'eom'
    features = features['features']
    chars = prepare_data(chars, features, eom)
    train = chars[chars['ctff_test'] == False]
    test = chars[chars['ctff_test'] == True].copy()  # Copy to avoid SettingWithCopyWarning
    model = fit_xgb(train, features)
    dtest = xgb.DMatrix(test[features])
    test['pred'] = model.predict(dtest)
    # Rank predictions within each month, demean the ranks, and scale so absolute weights sum to 2
    test['rank'] = test.groupby(eom)['pred'].rank(ascending=False, method='average')
    test['rank'] = test.groupby(eom)['rank'].transform(lambda x: x - x.mean())
    test['w'] = test.groupby(eom)['rank'].transform(lambda x: x / x.abs().sum() * 2)
    return test[['id', eom, 'w']]
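Because the ranks are demeaned before scaling, the resulting weights in each month sum to zero and their absolute values sum to two (a dollar-neutral portfolio with gross exposure of 2). The snippet below is a minimal, optional sanity check of the DataFrame returned by main; the helper name and tolerance are illustrative assumptions, not part of the required submission.

import pandas as pd


def check_weights(pf: pd.DataFrame, tol: float = 1e-8) -> None:
    """Hypothetical sanity check for the portfolio weights returned by main()."""
    assert list(pf.columns) == ['id', 'eom', 'w'], "Expected columns id, eom, w"
    monthly = pf.groupby('eom')['w']
    # Demeaned ranks imply the weights net to ~0 within each month
    assert (monthly.sum().abs() < tol).all(), "Weights should sum to ~0 per month"
    # Dividing by the absolute rank sum and multiplying by 2 implies gross exposure of ~2 per month
    assert ((monthly.apply(lambda w: w.abs().sum()) - 2).abs() < tol).all(), \
        "Absolute weights should sum to ~2 per month"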
Make sure your script is executable as a standalone program by including the following code at the end of your script:
if __name__ == "__main__":
    features, chars, daily_ret = load_data()
    pf = main(chars, features, daily_ret)
    export_data(pf)
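load_data() and export_data() are assumed to be provided by the evaluation environment. If you want to run the script locally before submitting, you could temporarily stub them out; the sketch below is only an illustration, and the input file names are assumptions rather than part of the required interface.

import pandas as pd


def load_data():
    """Hypothetical local stand-in for the provided load_data()."""
    chars = pd.read_csv('chars.csv')          # assumed local file name
    features = pd.read_csv('features.csv')    # assumed to contain a 'features' column
    daily_ret = pd.read_csv('daily_ret.csv')  # assumed local file name
    return features, chars, daily_ret


def export_data(pf: pd.DataFrame) -> None:
    """Hypothetical local stand-in for the provided export_data()."""
    pf.to_csv('output.csv', index=False)      # writes the required output file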
Your output file should be named output.csv and contain exactly three columns: id, eom, and w.
Example of output.csv:
id,eom,w
1,2024-07,0.123
2,2024-07,-0.456
3,2024-07,0.789
...
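Before submitting, you can quickly confirm that the file matches this layout; the check below assumes output.csv is in the working directory.

import pandas as pd

out = pd.read_csv('output.csv')
# Exactly three columns, in the order shown in the example above
assert list(out.columns) == ['id', 'eom', 'w'], f"Unexpected columns: {list(out.columns)}"
print(out.head())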
Your model documentation (.pdf file) should contain details about your model, methodology, and assumptions.
Before you submit, ensure you have the following:
- Python script (.py file) with a main function and an executable code block
- Output file (output.csv) containing exactly three columns: id, eom, and w
- Model documentation (.pdf file) with details about your model, methodology, and assumptions