R_guide

Guide for Submitting Your R Script and Output File

This guide will help you prepare and submit your R model script (e.g. a .R file), output file (output.csv), and model documentation (a .pdf file).

Requirements for Your R Script

Your R script should include two key components:

1. main(chars: data.frame, features: vector, daily_ret: data.frame) -> data.frame
2. Standardized script execution statement

1. Main Function

This function loads packages, prepares data, trains the model, and calculates portfolio weights. It returns a data.frame with the required columns for the output file.

Function Signature:

main <- function(chars, features, daily_ret) {
  # Main function to load packages, prepare data, train the model, and calculate portfolio weights.
  # Args:
  #   chars: data.frame containing characteristics data.
  #   features: vector of feature names.
  #   daily_ret: data.frame containing daily returns data.
  # Returns:
  #   data.frame with columns 'id', 'eom', and 'w'.
}

Example Implementation:

# Example helper function: empirical CDF (percentile rank) of a numeric vector.
ecdf_transform <- function(data) {
  if (length(data) == 0) return(numeric(0))                  # no data
  if (all(is.na(data))) return(rep(NA_real_, length(data)))  # all NA
  # Rank the non-missing values in their original order so that each
  # percentile rank lines up with the observation it belongs to
  result <- rep(NA_real_, length(data))
  non_na_indices <- which(!is.na(data))
  result[non_na_indices] <- rank(data[non_na_indices], ties.method = "min") / length(non_na_indices)
  return(result)
}


# Example helper function applying an ECDF transformation grouped by 'eom'.
prepare_data <- function(chars, features) {
  eom <- "eom"
  for (feature in features) {
    # Remember exact zeros so they can be restored (NA-safe comparison)
    is_zero <- !is.na(chars[[feature]]) & chars[[feature]] == 0
    # Apply ECDF transformation grouped by 'eom'
    chars[[feature]] <- ave(chars[[feature]], chars[[eom]], FUN = ecdf_transform)
    chars[[feature]][is_zero] <- 0  # Restore zeros
    # Impute missing values with 0.5
    chars[[feature]][is.na(chars[[feature]])] <- 0.5
  }
  return(chars)
}

# Example helper function to train an XGBoost model on the training data.
fit_xgb <- function(train, features) {
  library(xgboost)
  # Build the training matrix; as.data.frame() makes the column selection
  # work whether 'train' is a data.frame or a data.table
  x_train <- as.matrix(as.data.frame(train)[, features, drop = FALSE])
  dtrain <- xgb.DMatrix(data = x_train, label = train$ret_exc_lead1m)
  params <- list(
    booster = "gbtree",
    eta = 0.1,
    max_depth = 3,
    subsample = 0.5,
    colsample_bytree = 0.5,
    objective = "reg:squarederror",
    verbosity = 0
  )
  model <- xgb.train(params = params, data = dtrain, nrounds = 100)
  return(model)
}

# Main function to load packages, prepare data, train the model, and calculate portfolio weights.
main <- function(chars, features, daily_ret) {
  library(xgboost)
  eom <- "eom"
  # Accept either a plain vector of names or a data.frame with a 'features' column
  if (is.data.frame(features)) features <- features$features
  features <- as.character(features)

  # Prepare the data
  chars <- prepare_data(chars, features)

  # Split data into training and testing sets
  train <- subset(chars, ctff_test == FALSE)
  test <- subset(chars, ctff_test == TRUE)

  # Fit model on training data
  model <- fit_xgb(train, features)

  # Predict on test data
  x_test <- as.matrix(as.data.frame(test)[, features, drop = FALSE])
  dtest <- xgb.DMatrix(data = x_test)
  test$pred <- predict(model, dtest)

  # Rank predictions within each month (rank 1 = highest prediction),
  # center the ranks, and scale so absolute weights sum to 2 per month
  test$rank <- ave(test$pred, test[[eom]], FUN = function(x) rank(-x, ties.method = "average"))
  test$rank <- ave(test$rank, test[[eom]], FUN = function(x) x - mean(x))
  test$w <- ave(test$rank, test[[eom]], FUN = function(x) x / sum(abs(x)) * 2)

  # Return the required columns
  result <- data.frame(
    id = test$id,
    eom = test[[eom]],
    w = test$w
  )
  return(result)
}
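
The weight construction at the end of main can be checked by hand on a few rows. The sketch below repeats the same three ave() steps on made-up predictions (the toy data frame and its values are illustrative assumptions, not part of the competition data) and confirms that, within each month, the weights sum to zero and their absolute values sum to 2.

```r
# Toy predictions for two months (values are made up for illustration)
toy <- data.frame(
  id   = c(1, 2, 3, 1, 2, 3, 4),
  eom  = c(rep("2024-06", 3), rep("2024-07", 4)),
  pred = c(0.02, -0.01, 0.03, 0.00, 0.01, -0.02, 0.04)
)

# Same three steps as in main(): descending rank, center, scale
toy$rank <- ave(toy$pred, toy$eom, FUN = function(x) rank(-x, ties.method = "average"))
toy$rank <- ave(toy$rank, toy$eom, FUN = function(x) x - mean(x))
toy$w    <- ave(toy$rank, toy$eom, FUN = function(x) x / sum(abs(x)) * 2)

# Per-month diagnostics: net exposure (sum of w) and gross exposure (sum of |w|)
net   <- tapply(toy$w, toy$eom, sum)
gross <- tapply(toy$w, toy$eom, function(w) sum(abs(w)))
print(toy[, c("id", "eom", "pred", "w")])
```

Because rank(-x) gives rank 1 to the largest prediction, the largest prediction ends up with the most negative weight in this construction. Ranking x instead of -x would reverse the sign of every weight.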

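The data preparation step (percentile ranks within each month, zeros preserved, missing values set to 0.5) can likewise be traced on a small example. This is a minimal sketch with a hypothetical helper pct_rank and a made-up feature column x; it mirrors the logic of the example helpers rather than calling them.

```r
# Hypothetical helper: percentile rank of the non-missing values, NA elsewhere
pct_rank <- function(v) {
  out <- rep(NA_real_, length(v))
  ok <- !is.na(v)
  out[ok] <- rank(v[ok], ties.method = "min") / sum(ok)
  out
}

# Toy data: two months, one feature containing a zero and a missing value
toy <- data.frame(
  eom = c("2024-06", "2024-06", "2024-06", "2024-07", "2024-07"),
  x   = c(10, 0, NA, 5, 7)
)

is_zero <- !is.na(toy$x) & toy$x == 0          # remember exact zeros
toy$x_t <- ave(toy$x, toy$eom, FUN = pct_rank)  # rank within each month
toy$x_t[is_zero] <- 0                           # restore zeros
toy$x_t[is.na(toy$x_t)] <- 0.5                  # impute missing values
print(toy)
```

In the first month the value 10 is the larger of the two observed values and maps to 1, the zero stays 0, and the missing value becomes 0.5. In the second month 5 and 7 map to 0.5 and 1.
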
2. Standardized Script Execution Statement

Make sure your script is executable as a standalone program by including the following code at the end of your script:

if (interactive()) {
  data_list <- load_data()
  features <- data_list$features
  chars <- data_list$chars
  daily_ret <- data_list$daily_ret
  
  pf <- main(chars, features, daily_ret)
  export_data(pf)
}
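
This guide does not define load_data() or export_data(); they are presumably provided by the submission environment. For local testing, stand-ins along the following lines may be enough. The file names, the dir argument, and the CSV format are all assumptions made for this sketch.

```r
# Hypothetical stand-ins for local testing only. The file names, the 'dir'
# argument, and the CSV format are assumptions, not part of this guide.
load_data <- function(dir = ".") {
  list(
    chars     = read.csv(file.path(dir, "chars.csv")),
    features  = read.csv(file.path(dir, "features.csv")),
    daily_ret = read.csv(file.path(dir, "daily_ret.csv"))
  )
}

export_data <- function(pf, dir = ".") {
  # row.names = FALSE keeps the file to the columns of 'pf' only
  write.csv(pf, file.path(dir, "output.csv"), row.names = FALSE)
}
```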

Output File

Your output file should be named output.csv and contain exactly three columns: id, eom, and w.

Example of output.csv:

id,eom,w
1,2024-07,0.123
2,2024-07,-0.456
3,2024-07,0.789
...
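
Before exporting, it can help to check the returned data.frame against this layout. The helper below is a hypothetical sketch (check_output is not part of the guide). It assumes the columns must appear in the order shown in the example header and that w is numeric.

```r
# Hypothetical helper (not part of the guide): stop with an error if the
# portfolio data.frame does not match the required three-column layout.
check_output <- function(pf) {
  stopifnot(
    is.data.frame(pf),
    identical(names(pf), c("id", "eom", "w")),  # assumes this column order
    is.numeric(pf$w)                             # assumes numeric weights
  )
  invisible(pf)
}

# Usage with the example rows above
pf <- data.frame(id = c(1, 2, 3), eom = "2024-07", w = c(0.123, -0.456, 0.789))
check_output(pf)
```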

PDF Model Documentation

Your model documentation file should be a PDF that describes your model, methodology, and assumptions.

Submission Checklist

Before you submit, ensure you have the following:

R Script (.R file) with the main function and executable code block.
Output File (output.csv) containing exactly three columns: id, eom, and w.
Model Documentation (.pdf file) detailing your model, methodology, and assumptions.