This guide will help you prepare and submit your Python model script (.py file), output file (output.csv), and model documentation (.pdf file).
Your Python script should include two key components:
main(chars: pd.DataFrame, features: pd.DataFrame, daily_ret: pd.DataFrame) -> pd.DataFrame
This function loads packages, prepares data, trains the model, and calculates portfolio weights. It returns a DataFrame with the required columns for the output file.
Function Signature:
def main(chars: pd.DataFrame, features: pd.DataFrame, daily_ret: pd.DataFrame) -> pd.DataFrame:
    """
    Main function to load packages, prepare data, train model, and calculate portfolio weights.

    Args:
        chars (pd.DataFrame): DataFrame containing characteristics data.
        features (pd.DataFrame): DataFrame containing feature names.
        daily_ret (pd.DataFrame): DataFrame containing daily returns data.

    Returns:
        pd.DataFrame: DataFrame with columns 'id', 'eom', and 'w'.
    """
Example Implementation:
def ecdf(data: pd.Series) -> pd.Series:
    """Example helper function for an empirical CDF (ECDF) transform."""
    if data.empty:
        return data
    sorted_data = data.sort_values()
    ranks = sorted_data.rank(method='min', pct=True)
    return pd.Series(ranks, index=data.index)
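As an illustration, with hypothetical sample values, the helper maps each observation to its empirical percentile (the largest value maps to 1.0, the smallest to 1/3 of the sample size):

```python
import pandas as pd

def ecdf(data: pd.Series) -> pd.Series:
    """Map each value to its empirical CDF percentile (ties get the minimum rank)."""
    if data.empty:
        return data
    sorted_data = data.sort_values()
    ranks = sorted_data.rank(method='min', pct=True)
    return pd.Series(ranks, index=data.index)

sample = pd.Series([3.0, 1.0, 2.0])
percentiles = ecdf(sample)  # 3.0 -> 1.0, 1.0 -> 1/3, 2.0 -> 2/3
```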
def prepare_data(chars: pd.DataFrame, features: pd.Series, eom: str) -> pd.DataFrame:
    """Example helper function to apply an ECDF transformation grouped by 'eom'."""
    for feature in features:
        is_zero = chars[feature] == 0  # Preserve zeros
        chars[feature] = chars.groupby(eom)[feature].transform(ecdf)
        chars.loc[is_zero, feature] = 0  # Restore zeros
        chars[feature] = chars[feature].fillna(0.5)  # Impute missing values
    return chars
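A quick sketch, using made-up values for a single hypothetical feature column `f1`, of how the zero-preservation and imputation steps behave:

```python
import numpy as np
import pandas as pd

def ecdf(data: pd.Series) -> pd.Series:
    if data.empty:
        return data
    return pd.Series(data.sort_values().rank(method='min', pct=True), index=data.index)

chars = pd.DataFrame({'eom': ['2024-07'] * 3, 'f1': [3.0, 0.0, np.nan]})
is_zero = chars['f1'] == 0                            # remember exact zeros
chars['f1'] = chars.groupby('eom')['f1'].transform(ecdf)
chars.loc[is_zero, 'f1'] = 0                          # restore zeros
chars['f1'] = chars['f1'].fillna(0.5)                 # impute missing at the median percentile
# f1 becomes [1.0, 0.0, 0.5]: 3.0 gets the top percentile,
# the zero is preserved, and the missing value is imputed to 0.5.
```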
def fit_xgb(train: pd.DataFrame, features: pd.Series) -> xgb.Booster:
    """Example helper function to train an XGBoost model on the training data."""
    dtrain = xgb.DMatrix(data=train[features], label=train['ret_exc_lead1m'])
    params = {
        'booster': 'gbtree',
        'eta': 0.1,
        'max_depth': 3,
        'subsample': 0.5,
        'colsample_bytree': 0.5,
        'objective': 'reg:squarederror',
        'verbosity': 0
    }
    model = xgb.train(params, dtrain, num_boost_round=100)
    return model
def main(chars: pd.DataFrame, features: pd.DataFrame, daily_ret: pd.DataFrame) -> pd.DataFrame:
    """Main function to load packages, prepare data, train model, and calculate portfolio weights."""
    import pandas as pd
    import xgboost as xgb

    eom = 'eom'
    features = features['features']
    chars = prepare_data(chars, features, eom)
    train = chars[chars['ctff_test'] == False]
    test = chars[chars['ctff_test'] == True].copy()  # copy to avoid SettingWithCopyWarning
    model = fit_xgb(train, features)
    dtest = xgb.DMatrix(test[features])
    test['pred'] = model.predict(dtest)
    # Rank predictions within each month, demean the ranks, and scale the weights
    test['rank'] = test.groupby(eom)['pred'].rank(ascending=False, method='average')
    test['rank'] = test.groupby(eom)['rank'].transform(lambda x: x - x.mean())
    test['w'] = test.groupby(eom)['rank'].transform(lambda x: x / x.abs().sum() * 2)
    return test[['id', eom, 'w']]
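The demeaned-rank weighting at the end of main can be checked in isolation. A sketch with hypothetical predictions for one month, assuming the scheme is meant to produce a zero-net-weight portfolio with gross exposure of 2 per month:

```python
import pandas as pd

# Hypothetical predictions for a single month, mirroring the ranking logic in main()
test = pd.DataFrame({'id': [1, 2, 3, 4],
                     'eom': ['2024-07'] * 4,
                     'pred': [0.04, 0.01, -0.02, 0.03]})
test['rank'] = test.groupby('eom')['pred'].rank(ascending=False, method='average')
test['rank'] = test.groupby('eom')['rank'].transform(lambda x: x - x.mean())
test['w'] = test.groupby('eom')['rank'].transform(lambda x: x / x.abs().sum() * 2)
# Per month: weights sum to 0 (demeaned ranks) and |w| sums to 2 (scaling step).
```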
Make sure your script is executable as a standalone program by including the following code at the end of your script:
if __name__ == "__main__":
    features, chars, daily_ret = load_data()
    pf = main(chars, features, daily_ret)
    export_data(pf)
Your output file should be named output.csv and contain exactly three columns: id, eom, and w.
Example of output.csv:
id,eom,w
1,2024-07,0.123
2,2024-07,-0.456
3,2024-07,0.789
...
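If you write the file yourself rather than relying on a provided export helper, a minimal sketch (the sample values here are hypothetical) is:

```python
import pandas as pd

# Hypothetical result of main(); your real DataFrame comes from the model
pf = pd.DataFrame({'id': [1, 2, 3],
                   'eom': ['2024-07'] * 3,
                   'w': [0.123, -0.456, 0.789]})

# Keep only the three required columns, in order, and omit the index column
pf[['id', 'eom', 'w']].to_csv('output.csv', index=False)
```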
Your model documentation should contain details about your model, methodology, and assumptions.
Before you submit, ensure you have the following:
Python model script (.py file) with the main function and the executable code block.
Output file (output.csv) with exactly three columns: id, eom, and w.
Model documentation (.pdf file) with details about your model, methodology, and assumptions.