Working with Fitbit Data in R

Dec 27, 2017

Several years late to the game, I got a Fitbit for Christmas this year. Now that my student lifestyle has finally been replaced by a desk job, it seemed like a good time to start keeping track of how much I exercise.

As a bonus, a Fitbit gives me one more modality of personal data to analyze. I spent a chunk of Christmas break experimenting with how I could analyze my own data in R. This post will go over the basic steps for working with heart rate and step count data.

API Access

The basic Fitbit app provides some summary data about your resting heart rate, steps taken, hours slept, etc. This data can also be downloaded as CSV files from the settings section of the Fitbit online dashboard.

To obtain more granular data, you need to register your app in the Fitbit developer centre. While the form seems long, it’s a formality: all apps registered as “personal” are automatically approved. I made no attempt to come up with an elaborate description of my app, and simply stated that I wanted access to my personal data.

Once your application has been registered, click the link to the OAuth 2.0 Tutorial page and follow the steps. At the end, you are given a personalized “token” (an OAuth 2.0 access token) that can be used to make requests to the Fitbit API.

Using the API in R

To get started with the Fitbit Web API, the OAuth 2.0 Tutorial page contains an example using the UNIX command-line tool curl. The example retrieves your user profile. To experiment with other types of data, you can find other API endpoint URLs in the Fitbit Web API documentation.

Rather than making command line requests, I wanted to read my Fitbit data directly into R. The package curl is a web client interface for R, and can be used to make requests to the Fitbit API.

To get started, we need to define a new handle and set the headers. This authorizes us to make requests to the Fitbit API. After setting up the handle, we can use the curl function to open a connection to the API endpoint.

library(curl);

# Read Fitbit token from file
token <- scan('~/fitbit_token.txt', what = character() );

h <- new_handle();

handle_setheaders(
    h,
    'Authorization' = paste('Bearer', token)
    );
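One thing to watch for: scan() splits the file on whitespace, so an empty token file, or one with stray extra text, would quietly produce an unexpected Authorization header, and the problem would likely only show up later when requests are rejected. A small defensive sketch, using a hypothetical read_token() helper that is not part of any package:

```r
# Hypothetical helper (not from any package): read an OAuth token from a
# file and stop with a clear message unless it holds exactly one token.
read_token <- function(path) {
    # scan() splits on whitespace, so extra text yields extra elements
    token <- scan(path, what = character(), quiet = TRUE);
    if (length(token) != 1) {
        stop('Expected one token in ', path, ', found ', length(token));
    }
    token;
}

# usage: a temporary file stands in for ~/fitbit_token.txt
token.file <- tempfile();
writeLines('example-token', token.file);
read_token(token.file);
## [1] "example-token"
```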

Plotting Minute-Level Heart Rate Data

The Fitbit Web API documentation explains how to get your heart rate as a time series. There are four different formats for retrieving the data. I was interested in minute-level data from Christmas Day, so I opted for the API endpoint that returns all observations from a single day.

date <- '2017-12-25';

# open the curl connection
hr.connection <- curl(
    paste0('https://api.fitbit.com/1/user/-/activities/heart/date/', date, '/1d/1min.json'), 
    handle = h
    );
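The same URL pattern comes up again for other data types, so it may be worth wrapping in a function. A minimal sketch: fitbit_intraday_url() is a hypothetical helper, not part of the curl package or the Fitbit API, and it assumes other resources follow the same /1d/[detail level].json pattern as the heart rate endpoint, which is worth checking against the endpoint documentation.

```r
# Hypothetical helper (not part of curl or the Fitbit API): build a one-day
# intraday URL following the same pattern as the heart rate request above.
# Assumes the resource supports /1d/[detail level].json.
fitbit_intraday_url <- function(resource, date, detail = '1min') {
    paste0(
        'https://api.fitbit.com/1/user/-/activities/',
        resource, '/date/', date, '/1d/', detail, '.json'
        );
}

fitbit_intraday_url('heart', '2017-12-25');
## [1] "https://api.fitbit.com/1/user/-/activities/heart/date/2017-12-25/1d/1min.json"
```

With the handle from earlier, curl(fitbit_intraday_url('heart', date), handle = h) would open the same connection as the code above.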

To read data from the curl connection into a variable, I used the readLines function. This returns a JSON string, which I then parsed into a list with the fromJSON function from the jsonlite package.

library(jsonlite);

# read to string object
# turn off warnings for no end-of-line character on final line
hr.string <- readLines( hr.connection, warn = FALSE );

hr.content <- fromJSON( hr.string );
str(hr.content);
## List of 2
##  $ activities-heart         :'data.frame':   1 obs. of  2 variables:
##   ..$ dateTime: chr "2017-12-25"
##   ..$ value   :'data.frame': 1 obs. of  3 variables:
##   .. ..$ customHeartRateZones:List of 1
##   .. .. ..$ : list()
##   .. ..$ heartRateZones      :List of 1
##   .. .. ..$ :'data.frame':   4 obs. of  5 variables:
##   .. .. .. ..$ caloriesOut: num [1:4] 1584 582 0 0
##   .. .. .. ..$ max        : int [1:4] 97 135 164 220
##   .. .. .. ..$ min        : int [1:4] 30 97 135 164
##   .. .. .. ..$ minutes    : int [1:4] 872 106 0 0
##   .. .. .. ..$ name       : chr [1:4] "Out of Range" "Fat Burn" "Cardio" "Peak"
##   .. ..$ restingHeartRate    : int 80
##  $ activities-heart-intraday:List of 3
##   ..$ dataset        :'data.frame':  978 obs. of  2 variables:
##   .. ..$ time : chr [1:978] "06:19:00" "06:20:00" "06:21:00" "06:22:00" ...
##   .. ..$ value: int [1:978] 96 97 88 90 86 82 80 79 77 76 ...
##   ..$ datasetInterval: int 1
##   ..$ datasetType    : chr "minute"

The minute-level heart rate data is available under activities-heart-intraday, in the dataset element. It contains the time in hour:minute:second format and the corresponding heart rate in beats per minute. To convert the times into date-time objects R can work with, I used the chron package.

library(chron);

heartrate <- hr.content[['activities-heart-intraday']]$dataset;
heartrate$time <- chron(
    dates = rep(date, nrow(heartrate)), 
    times = heartrate$time,
    format = c('dates' = 'y-m-d', 'times' = 'h:m:s')
    );
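If you would rather not depend on chron, base R's POSIXct can do the same job. This is a swapped-in alternative, not what I used in this post. The three observations below are the first values shown in the str() output above, and treating the clock times as UTC is an assumption made only to avoid daylight-saving surprises.

```r
# Base-R alternative to chron (a substitution, not what this post uses)
example.date <- '2017-12-25';

# first three observations from the str() output above
example.hr <- data.frame(
    time = c('06:19:00', '06:20:00', '06:21:00'),
    value = c(96L, 97L, 88L),
    stringsAsFactors = FALSE
    );

# paste date and clock time together, then parse; tz = 'UTC' is only a
# convenience assumption to avoid daylight-saving shifts
example.hr$time <- as.POSIXct(
    paste(example.date, example.hr$time),
    format = '%Y-%m-%d %H:%M:%S',
    tz = 'UTC'
    );

# consecutive observations are one minute apart
as.numeric(diff(example.hr$time), units = 'secs');
## [1] 60 60
```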

While the heart rate data is interesting by itself, I also wanted to analyze how my pulse compared to my activity level. To this end, I obtained data on steps per minute from the API.

steps.connection <- curl(
    url = paste0('https://api.fitbit.com/1/user/-/activities/steps/date/', date, '/1d.json'),
    handle = h
    );

steps.content <- fromJSON( readLines(steps.connection, warn = FALSE) );

steps <- steps.content[['activities-steps-intraday']]$dataset;
steps$time <- chron(
    dates = rep(date, nrow(steps)), 
    times = steps$time,
    format = c('dates' = 'y-m-d', 'times' = 'h:m:s')
    );
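The heart rate and step count code is nearly identical, which suggests a small helper. In this sketch, extract_intraday() is hypothetical, it uses POSIXct rather than chron (a substitution on my part), and it names the value column after the resource so that merging the two series later gives descriptive column names directly. The fake.content list only mimics the shape of the parsed JSON; its values are made up.

```r
# Hypothetical helper: extract an intraday series from a parsed response.
# Uses POSIXct instead of chron, and names the value column after the
# resource so a later merge() gives descriptive column names.
extract_intraday <- function(content, resource, date) {
    dataset <- content[[paste0('activities-', resource, '-intraday')]]$dataset;
    result <- data.frame(
        time = as.POSIXct(
            paste(date, dataset$time),
            format = '%Y-%m-%d %H:%M:%S',
            tz = 'UTC'    # convenience assumption to avoid DST shifts
            )
        );
    result[[resource]] <- dataset$value;
    result;
}

# made-up list with the same shape as fromJSON() output for steps
fake.content <- list(
    'activities-steps-intraday' = list(
        dataset = data.frame(
            time = c('06:19:00', '06:20:00'),
            value = c(0L, 12L),
            stringsAsFactors = FALSE
            )
        )
    );

fake.steps <- extract_intraday(fake.content, 'steps', '2017-12-25');
```

With the objects from this post, merge(extract_intraday(steps.content, 'steps', date), extract_intraday(hr.content, 'heart', date), by = 'time') should give columns named time, steps and heart. Since merge() keeps only matching timestamps by default, it would also drop the minutes without a heart rate estimate.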

Missing data is handled differently for step counts and heart rate. Timepoints without an estimate of beats per minute are left out of the heart rate intraday time series. By contrast, the steps time series includes every timepoint, and timepoints when you were not wearing your Fitbit are entered with a step count of zero. To make the two series comparable, I restricted the step counts to the timepoints present in the heart rate data.
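To make the difference concrete, here is a toy example with made-up numbers: the step series has every minute, the heart rate series skips two minutes where the tracker (hypothetically) wasn't worn, and the same %in% filter I use in the code below drops those minutes from the step counts.

```r
# made-up toy data: steps has every minute, heart rate skips minutes
# where (hypothetically) the tracker was not worn
toy.steps <- data.frame(
    time = c('10:00:00', '10:01:00', '10:02:00', '10:03:00'),
    value = c(30L, 0L, 0L, 45L),
    stringsAsFactors = FALSE
    );
toy.heartrate <- data.frame(
    time = c('10:00:00', '10:03:00'),
    value = c(95L, 101L),
    stringsAsFactors = FALSE
    );

# keep only step counts at minutes that also have a heart rate estimate
toy.steps <- toy.steps[ toy.steps$time %in% toy.heartrate$time, ];

# the zero-step minutes 10:01 and 10:02 are gone
toy.steps$time;
## [1] "10:00:00" "10:03:00"
```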

Finally, I plotted heartrate and step count in separate panels of the same plot.

steps <- steps[ steps$time %in% heartrate$time, ];

par(
    mfrow = c(2, 1),
    mar = c(3, 4, 0, 0)
    );

plot(
    heartrate, 
    pch = 19,
    ylab = 'Heartrate',
    xlab = ''
    );

plot(
    steps, 
    pch = 19,
    ylab = 'Steps per minute',
    xlab = ''
    );

Predicting Heart Rate

As expected, the highest heartrate peaks correspond to times when I was moving around. I explored this trend further by merging the data and comparing steps per minute to my heartrate.

merged.data <- merge(steps, heartrate, by = 'time');
names(merged.data) <- c('time', 'steps', 'heartrate');

plot(
    heartrate ~ steps,
    merged.data,
    pch = 19, 
    xlab = 'Steps per minute',
    ylab = 'Heartrate'
    );

linear.model <- lm(heartrate ~ steps, merged.data);
abline(linear.model, lty = 2, lwd = 2, col = 'firebrick');

There’s a pretty clear linear trend, suggesting that a linear model can be used to predict my heartrate based on how fast I’m walking.
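To turn the fitted line into an actual prediction, predict() takes new step counts. A toy sketch with made-up data that lie exactly on a line, so the result is easy to check by hand:

```r
# made-up toy data lying exactly on a line, so predictions are easy to check
toy.walk <- data.frame(steps = c(0, 20, 40, 60, 80, 100));
toy.walk$heartrate <- 80 + 0.3 * toy.walk$steps;

toy.model <- lm(heartrate ~ steps, toy.walk);

# expected heart rate at 50 and 120 steps per minute:
# 80 + 0.3 * 50 = 95 and 80 + 0.3 * 120 = 116 (up to rounding error)
toy.predictions <- predict(toy.model, newdata = data.frame(steps = c(50, 120)));
toy.predictions;
```

With the real model, predict(linear.model, newdata = data.frame(steps = c(0, 60, 120))) would give the expected heart rate at rest and at two walking paces.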

plot(
    heartrate, 
    pch = 19,
    ylab = 'Heartrate',
    xlab = '',
    main = 'Simple Linear Model',
    cex = 0.7
    );

# fitted values follow the row order of merged.data
points(
    linear.model$fitted.values ~ merged.data$time,
    col = 'firebrick',
    pch = 19,
    cex = 0.7
    );

# add legend in the top-left corner
legend(
    'topleft',
    bty = 'n',
    legend = c('observed', 'expected'),
    pch = 19,
    col = c('black', 'firebrick'),
    cex = 0.7
    );

The simple linear regression model roughly captures most of the increases in heart rate. The fit might improve further if we also consider the number of steps taken in the few minutes before each heart rate measurement.

# generate lagged time series
merged.data$steps.lag1 <- c(NA, merged.data$steps[-nrow(merged.data)]);
merged.data$steps.lag2 <- c(NA, merged.data$steps.lag1[-nrow(merged.data)]);
merged.data$steps.lag3 <- c(NA, merged.data$steps.lag2[-nrow(merged.data)]);
merged.data$steps.lag4 <- c(NA, merged.data$steps.lag3[-nrow(merged.data)]);

full.lagged.model <- lm(
    heartrate ~ steps + steps.lag1 + steps.lag2 + steps.lag3 + steps.lag4,
    merged.data
    );

print( summary(full.lagged.model)$coefficients );
##                Estimate Std. Error    t value     Pr(>|t|)
## (Intercept) 82.96694430 0.27501733 301.678964 0.000000e+00
## steps        0.08652370 0.02236382   3.868914 1.166379e-04
## steps.lag1   0.10983762 0.02678509   4.100700 4.464906e-05
## steps.lag2   0.06211817 0.02665392   2.330545 1.998203e-02
## steps.lag3   0.03632664 0.02673857   1.358586 1.745945e-01
## steps.lag4   0.03147964 0.02232500   1.410063 1.588424e-01
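One caveat about the lags: c(NA, x[-n]) shifts by row, not by minute. Because minutes without a heart rate estimate were dropped earlier, consecutive rows can be more than one minute apart, so steps.lag1 is not always the previous minute. A sketch of a timestamp-based alternative follows; lag_by_minutes() is a hypothetical helper, and it assumes the time column is POSIXct (the chron times used in this post would need converting first).

```r
# Hypothetical helper: value recorded exactly k minutes earlier, matched by
# timestamp rather than by row; NA when that minute is missing.
# Assumes `time` is POSIXct (seconds since the epoch underneath).
lag_by_minutes <- function(time, value, k) {
    seconds <- as.numeric(time);
    value[ match(seconds - k * 60, seconds) ];
}

# made-up toy series with no observation at 10:02
toy.lags <- data.frame(
    time = as.POSIXct(
        c('2017-12-25 10:00:00', '2017-12-25 10:01:00', '2017-12-25 10:03:00'),
        tz = 'UTC'
        ),
    steps = c(10, 20, 30)
    );

# add steps.lag1 and steps.lag2 columns in a loop
for (k in 1:2) {
    toy.lags[[paste0('steps.lag', k)]] <- lag_by_minutes(toy.lags$time, toy.lags$steps, k);
}

# a row-based shift would give NA 10 20 here
toy.lags$steps.lag1;
## [1] NA 10 NA
```

Whether the lags should come from the full minute-by-minute steps series (which has every minute, but records zeros while the tracker is off) or from the restricted one is a judgment call; either way, the lagged models would need to be refit to see whether the estimates change.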

The lagged coefficients suggest that my heart rate takes about two minutes to return to normal after walking: the step counts from one and two minutes earlier have significant coefficients, while the three- and four-minute lags do not. I further considered whether there could be interaction effects between the step counts, and found a significant interaction between steps.lag1 and steps.lag2.

lagged.model <- lm(
    heartrate ~ steps + steps.lag1 + steps.lag2 + steps.lag1:steps.lag2,
    merged.data
    );

print( summary(lagged.model)$coefficients );
##                           Estimate   Std. Error    t value     Pr(>|t|)
## (Intercept)           82.193639668 0.2754924930 298.351649 0.000000e+00
## steps                  0.111584933 0.0210876358   5.291486 1.499874e-07
## steps.lag1             0.239150192 0.0284393464   8.409131 1.460689e-16
## steps.lag2             0.262778894 0.0266710111   9.852603 6.817695e-22
## steps.lag1:steps.lag2 -0.003526606 0.0003580413  -9.849720 6.997837e-22
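Written out, with s_t the steps in minute t and the estimates from the table above, the interaction means the estimated effect of steps one minute earlier depends on the steps taken two minutes earlier:

```latex
\begin{aligned}
\widehat{\mathrm{HR}}_t &= \hat\beta_0 + \hat\beta_1 s_t + \hat\beta_2 s_{t-1} + \hat\beta_3 s_{t-2} + \hat\beta_4\, s_{t-1} s_{t-2} \\
\frac{\partial \widehat{\mathrm{HR}}_t}{\partial s_{t-1}} &= \hat\beta_2 + \hat\beta_4\, s_{t-2} \approx 0.239 - 0.00353\, s_{t-2} \\
\hat\beta_2 + \hat\beta_4\, s_{t-2} = 0 &\iff s_{t-2} = -\hat\beta_2 / \hat\beta_4 \approx 0.2392 / 0.003527 \approx 68
\end{aligned}
```

So, by this fit, each extra step in the previous minute adds less to the predicted heart rate the busier the minute before it was, and the added contribution would vanish at around 68 steps per minute. The same applies to s_{t-2}, whose slope is 0.263 - 0.00353 s_{t-1}.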

This updated model seems to be a good fit to most of the increases in heart rate. However, its intercept is 82.2, so for minutes with no recent steps it predicts about 82 beats per minute. The lowest observed heart rate is around 60 beats per minute, and there is considerable variability between 60 and 82 that the model does not capture.

plot(
    heartrate ~ time,
    merged.data, 
    pch = 19,
    ylab = 'Heartrate',
    xlab = '',
    main = 'Two Minute Lagged Model with Interaction',
    cex = 0.7
    );

# the first two rows have no steps.lag2 value and are dropped by lm()
points(
    lagged.model$fitted.values ~ merged.data$time[-c(1, 2)],
    col = 'firebrick',
    pch = 19,
    cex = 0.7
    );

# add legend in the top-left corner
legend(
    'topleft',
    bty = 'n',
    legend = c('observed', 'expected'),
    pch = 19,
    col = c('black', 'firebrick'),
    cex = 0.7
    );

Conclusion

The Fitbit API makes it easy to access your own data from R. At the moment I only have a few days’ worth of data, but as time goes on I’m hoping to do more fun analyses. With an increasing number of companies collecting data on me, I’d like to at least be aware of how much they can decipher about my daily whereabouts and activities.

Update: The same day I posted this, Nick Strayer published a great analysis of all of his Fitbit data from 2017 – check it out!