Modeling intensity vs time.

One of the challenges in assessing match demands is that the intensity and density of actions are time-dependent, i.e., the longer the period, the lower the average intensity. Delaney et al. proposed to model the relationship between match locomotor intensity and time using a power law. We used this method in a recent paper to examine to what extent different small-sided game (SSG) formats could be used to either under- or overload the running and/or mechanical demands of competitive matches. In this post, we will see how to perform these analyses in R.

library(tidyverse)
library(readxl)
library(RcppRoll)

First, I need to import the csv files to create my database. For these models, we usually work with multiple csv files. An easy and efficient way to import a batch of files is to map the read_csv function over the different files.

For this, the first step is to list all the files in our directory and then pass the list of files as an argument to read_csv with the map_dfr function from the purrr package.

my_files <- list.files(path = "C:/Users/mlacome/Documents/MathLacome-2018-blog/1_DataSets/0_GameData", pattern="*.csv", full.names = TRUE)
my_file <- map_dfr(my_files, read_csv, skip = 2, .id = "FileName")
## Parsed with column specification:
## cols(
##   `PLAYER NAME` = col_character(),
##   `TEAM FIXTURE` = col_character(),
##   `MATCH FIXTURE` = col_character(),
##   HALF = col_character(),
##   TIME = col_double(),
##   `Player X Position` = col_integer(),
##   `Player Y Position` = col_integer()
## )
head(my_file)
## # A tibble: 6 x 8
##   FileName `PLAYER NAME` `TEAM FIXTURE` `MATCH FIXTURE` HALF        TIME
##   <chr>    <chr>         <chr>          <chr>           <chr>      <dbl>
## 1 1        Player1       Team1          Team2 v Team1   First Half 0.   
## 2 1        Player1       Team1          Team2 v Team1   First Half 0.100
## 3 1        Player1       Team1          Team2 v Team1   First Half 0.200
## 4 1        Player1       Team1          Team2 v Team1   First Half 0.300
## 5 1        Player1       Team1          Team2 v Team1   First Half 0.400
## 6 1        Player1       Team1          Team2 v Team1   First Half 0.500
## # ... with 2 more variables: `Player X Position` <int>, `Player Y
## #   Position` <int>

We imported tracking data containing only the players' x and y positions. Using mutate, we can calculate distances and speeds, and thus the distances covered in different speed zones.

Game_analysis <- my_file %>%
  group_by(`PLAYER NAME`, `MATCH FIXTURE`, HALF) %>%
  # positions are in cm, hence the division by 100
  mutate(Distance_m = sqrt((`Player X Position` - lag(`Player X Position`))^2 + (`Player Y Position` - lag(`Player Y Position`))^2) / 100) %>%
  # 10 Hz sampling: each step lasts 0.1 s
  mutate(Speed_km.h = (Distance_m / 0.1) * 3.6) %>%
  mutate(Distance_14.4 = ifelse(Speed_km.h > 14.4, Distance_m, NA))
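As a quick sanity check on these formulas, here is a toy example with hypothetical positions (in cm, not match data): a 30/40 cm step is a 0.5 m displacement in 0.1 s, i.e. 18 km.h-1.

```r
library(dplyr)

# Toy positions in cm, sampled at 10 Hz (hypothetical values)
toy <- tibble(x = c(0, 30, 60), y = c(0, 40, 80))

toy <- toy %>%
  mutate(Distance_m = sqrt((x - lag(x))^2 + (y - lag(y))^2) / 100,  # cm -> m
         Speed_km.h = (Distance_m / 0.1) * 3.6)                     # m per 0.1 s -> km/h

toy$Speed_km.h
## [1] NA 18 18
```

The first row is NA because lag has no previous sample, which is why na.rm handling matters later in the rolling sums.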

Delaney used different rolling-average windows (from 1 to 15 min) to build his model. We can set the durations with a parameter. As my files are sampled at 10 Hz, I only need to multiply the number of minutes by 600.

roll_mean_dur <- c(1:15)*600L

In this example, we want to perform the rolling sum on distance covered and on distance above 14.4 km.h-1 per minute. The first step is to create a function (rolling_foo) that computes the rolling sum of distance covered. Then, we use map_dfr to apply our function for each rolling duration and store the results in a data frame. It takes approximately 1 min to analyse 4 files of 90+ minutes. I'm sure there is a way to make this faster and would be happy to get your feedback.

start_seg <- Sys.time()

rolling_foo <- function(x){
  Game_analysis %>%
    group_by(`PLAYER NAME`, `MATCH FIXTURE`, HALF) %>%
    mutate(Roll_dur = x) %>%
    mutate(Distance = roll_sum(Distance_m, x, by = 1L, fill = NA, na.rm = TRUE)) %>%
    mutate(DistanceHS = roll_sum(Distance_14.4, x, by = 1L, fill = NA, na.rm = TRUE))
}

Game_analysis_2 <- map_dfr(roll_mean_dur, rolling_foo) %>%
  group_by(`PLAYER NAME`, `MATCH FIXTURE`, Roll_dur) %>%
  summarise(Peak_Dist_min = max(Distance, na.rm=T)/(mean(Roll_dur, na.rm=T)/600),
            Peak_DistHS_min = max(DistanceHS, na.rm=T)/(mean(Roll_dur, na.rm=T)/600))

end_seg <- Sys.time()
(end_seg - start_seg)
## Time difference of 58.9198 secs
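On the speed question, one option (a sketch, not benchmarked against the full pipeline) is to replace roll_sum with a cumulative-sum trick, which computes a fixed-window rolling sum in a single pass. The windows below are right-aligned rather than centred, but the maximum over all complete windows, which is what we summarise, is unchanged.

```r
# Rolling sum over a window of n samples via cumulative sums.
# NAs are treated as zeros (mimicking na.rm = TRUE); incomplete windows get NA.
roll_sum_cumsum <- function(v, n) {
  v[is.na(v)] <- 0
  cs  <- cumsum(v)
  res <- cs - dplyr::lag(cs, n, default = 0)
  res[seq_len(n - 1)] <- NA   # first n-1 windows are incomplete
  res
}

roll_sum_cumsum(c(1, 2, 3, 4), 2)
## [1] NA  3  5  7
```

Swapping this helper into rolling_foo in place of roll_sum should give the same peak values while avoiding repeated window scans.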

As described in Delaney's paper, a power-law curve describes a non-linear but clearly dependent relationship between two variables (x and y) and is given by the equation y = c * x^n, where c and n are constants. To conclude this post, we fit a linear regression of the log of peak distances against the log of durations.

(Peak_dist_model <- lm(log(Peak_Dist_min) ~ log(Roll_dur/600), data = Game_analysis_2))
## 
## Call:
## lm(formula = log(Peak_Dist_min) ~ log(Roll_dur/600), data = Game_analysis_2)
## 
## Coefficients:
##       (Intercept)  log(Roll_dur/600)  
##            5.3120            -0.1626
(Peak_Dist_Intercept <- exp(Peak_dist_model$coefficients[1]))
## (Intercept) 
##    202.7516
(Peak_Dist_Slope <- Peak_dist_model$coefficients[2])
## log(Roll_dur/600) 
##        -0.1626441
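Back-transforming the intercept recovers the two constants of the power law, so we can write a small helper (the function name is mine) that predicts the expected peak running intensity for any window duration:

```r
# Predicted peak distance per minute for a window of dur_min minutes,
# using y = c0 * x^n with the constants fitted above
predict_peak_dist <- function(dur_min, c0 = 202.7516, n = -0.1626441) {
  c0 * dur_min^n
}

predict_peak_dist(1)    # ~203 m/min over the most intense 1-min period
predict_peak_dist(10)   # ~139 m/min over a 10-min period
```

Computing the same peak values for an SSG format and placing them against this curve is one way to judge whether a drill under- or overloads match demands.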
