This guide will help you prepare and submit your R model script (e.g. a .R file), your output file (output.csv), and your model documentation (a .pdf file).
Your R script should include two key components: a main function and an executable code block (shown at the end of this guide).
main(chars: data.frame, features: vector, daily_ret: data.frame) -> data.frame
This function loads packages, prepares the data, trains the model, and calculates portfolio weights. It returns a data.frame with the required columns for the output file.
Function Signature:
main <- function(chars, features, daily_ret) {
  # Main function to load packages, prepare data, train the model,
  # and calculate portfolio weights.
  #
  # Args:
  #   chars: data.frame containing characteristics data.
  #   features: vector of feature names.
  #   daily_ret: data.frame containing daily returns data.
  #
  # Returns:
  #   data.frame with columns 'id', 'eom', and 'w'.
}
Example Implementation:
# Example helper function implementing an ECDF (percentile-rank) transform.
ecdf_transform <- function(data) {
  if (length(data) == 0) return(numeric(0))                   # no data
  if (all(is.na(data))) return(rep(NA_real_, length(data)))   # all NA
  # Rank the non-NA values in their original positions and scale to (0, 1]
  result <- rep(NA_real_, length(data))
  non_na <- !is.na(data)
  result[non_na] <- rank(data[non_na], ties.method = "min") / sum(non_na)
  return(result)
}
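For example, applied to a small vector (values chosen purely for illustration), the transform preserves NAs and assigns tied values the same rank:
ecdf_transform(c(3, 1, NA, 2, 2))
# returns 1.00 0.25 NA 0.50 0.50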
"""Example helper function applying an ECDF transformation grouped by 'eom'"""
prepare_data <- function(chars, features) {
eom <- "eom"
for(feature in features) {
is_zero <- chars[[feature]] == 0 # Preserve zeros
# Apply ECDF transformation grouped by 'eom'
chars[[feature]] <- ave(chars[[feature]], chars[[eom]], FUN = function(x) ecdf_transform(x))
chars[[feature]][is_zero] <- 0 # Restore zeros
# Impute missing values with 0.5
chars[[feature]][is.na(chars[[feature]])] <- 0.5
}
return(chars)
}
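As a quick illustration on a toy data.frame (column names and values assumed for the example; the real data has many more columns), the NA in the second month is imputed with 0.5:
toy <- data.frame(
  eom = c("2024-06", "2024-06", "2024-07", "2024-07"),
  size = c(10, 20, NA, 5)
)
prepare_data(toy, "size")$size
# returns 0.5 1.0 0.5 1.0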
""" Example helper function to train an XGBoost model on the training data. """
fit_xgb <- function(train, features) {
library(xgboost)
# Prepare the training matrix for xgboost
dtrain <- xgb.DMatrix(data = as.matrix(train[, ..features]), label = train$ret_exc_lead1m)
params <- list(
booster = "gbtree",
eta = 0.1,
max_depth = 3,
subsample = 0.5,
colsample_bytree = 0.5,
objective = "reg:squarederror",
verbosity = 0
)
model <- xgb.train(params = params, data = dtrain, nrounds = 100)
return(model)
}
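After fitting, you can optionally inspect which features the model relies on via xgboost's built-in importance report (assuming train and features are prepared as in the main function below):
model <- fit_xgb(train, features)
head(xgb.importance(model = model))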
""" Main function to to load packages, prepare data, train model, and calculate portfolio weights. """
main <- function(chars, features, daily_ret) {
library(xgboost)
eom <- "eom"
features <- as.character(features$features)
# Prepare the data
chars <- prepare_data(chars, features)
# Split data into training and testing sets
train <- subset(chars, ctff_test == FALSE)
test <- subset(chars, ctff_test == TRUE)
# Fit model on training data
model <- fit_xgb(train, features)
# Predict on test data
dtest <- xgb.DMatrix(data = as.matrix(test[, ..features]))
test$pred <- predict(model, dtest)
# Compute ranking and adjust weights
test$rank <- ave(test$pred, test[[eom]], FUN = function(x) rank(-x, ties.method = "average"))
test$rank <- ave(test$rank, test[[eom]], FUN = function(x) x - mean(x))
test$w <- ave(test$rank, test[[eom]], FUN = function(x) x / sum(abs(x)) * 2)
# Return the required columns
result <- data.frame(
id = test$id,
eom = test[[eom]],
w = test$w
)
return(result)
}
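Because the weights are demeaned within-month ranks scaled by their total absolute value, each month is dollar-neutral with absolute weights summing to 2. A quick check along these lines (pf being the data.frame returned by main) can catch scaling mistakes before export:
pf <- main(chars, features, daily_ret)
stopifnot(all(abs(tapply(pf$w, pf$eom, sum)) < 1e-8))           # net exposure ~ 0
stopifnot(all(abs(tapply(abs(pf$w), pf$eom, sum) - 2) < 1e-8))  # gross exposure = 2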
Make sure your script is executable as a standalone program by including the following code at the end of your script:
if (interactive()) {
  data_list <- load_data()
  features <- data_list$features
  chars <- data_list$chars
  daily_ret <- data_list$daily_ret
  pf <- main(chars, features, daily_ret)
  export_data(pf)
}
Your output file should be named output.csv and must contain exactly three columns: id, eom, and w.
Example of output.csv:
id,eom,w
1,2024-07,0.123
2,2024-07,-0.456
3,2024-07,0.789
...
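If you need to write the file yourself, outside the platform's export_data helper (which is assumed to produce output.csv), a plain write.csv call yields the same format:
write.csv(pf, "output.csv", row.names = FALSE, quote = FALSE)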
Your model documentation file should be a PDF describing your model, methodology, and assumptions.
Before you submit, ensure you have the following:
- An R script with a main function and the executable code block.
- An output file (output.csv) containing exactly three columns: id, eom, and w.
- A PDF file documenting your model, methodology, and assumptions.