Paper:
Chen, Q. and Poorthuis, A. (2021) Identifying home locations in human mobility data: an open-source R package for comparison and reproducibility. International Journal of Geographical Information Science, Pages 1425-1448, https://doi.org/10.1080/13658816.2021.1887489
# Install development version from GitHub
install_github("spatialnetworkslab/homelocator")
These are some basic examples that show you how to use common functions in the package.
You need to make sure the input dataset includes three essential attributes:
- a unique identifier for the person or user
- a unique identifier for the spatial location for the data point
- a timestamp that reflects the time the data point was created
You can use validate_dataset()
to validate your input dataset before starting identifying meaningful locations. In this function, you need to specify the names of three essential attribute that used in your dataset.
# Load homelocator library
library(homelocator)
#> Welcome to homelocator package!
# Load other needed libraries
library(tidyverse)
library(here)
# load test sample dataset
data("test_sample", package = "homelocator")
df_validated <- validate_dataset(test_sample, user = "u_id", timestamp = "created_at", location = "grid_id")
#> 🎉 Congratulations!! Your dataset has passed validation.
#> 👤 There are 100 unique users in your dataset.
#> 🌏 Now start your journey identifying their meaningful location(s)!
#> 👏 Good luck!
#>
head(df_validated)
#> # A tibble: 6 x 3
#> u_id grid_id created_at
#> <int> <int> <dttm>
#> 1 92298565 1581 2016-04-17 22:43:06
#> 2 33908340 1461 2014-10-03 16:29:48
#> 3 92298565 1136 2014-02-07 07:26:15
#> 4 11616678 1375 2014-07-18 10:08:21
#> 5 11616678 1375 2013-11-24 23:16:24
#> 6 47727539 736 2016-06-21 15:59:49
To speed up computing progress, you can nest the validated dataset by user so that the subsequent location inference can be applied to each user at the same time.
df_nested <- nest_verbose(df_validated, c("created_at", "grid_id"))
#> 🛠 Start nesting...
#> ✅ Finish nesting!
#> ⌛ Nesting time: 0.192 secs
#>
head(df_nested)
#> # A tibble: 6 x 2
#> u_id data
#> <int> <list>
#> 1 92298565 <tibble [1,291 × 2]>
#> 2 33908340 <tibble [1,170 × 2]>
#> 3 11616678 <tibble [938 × 2]>
#> 4 47727539 <tibble [307 × 2]>
#> 5 54875363 <tibble [903 × 2]>
#> 6 40153763 <tibble [1,688 × 2]>
head(df_nested$data[[1]])
#> # A tibble: 6 x 2
#> created_at grid_id
#> <dttm> <int>
#> 1 2016-04-17 22:43:06 1581
#> 2 2014-02-07 07:26:15 1136
#> 3 2012-08-18 19:26:31 1038
#> 4 2014-11-25 19:39:00 1699
#> 5 2014-12-23 01:54:55 1499
#> 6 2015-06-01 19:48:18 740
Add additional needed varialbes derived from the timestamp column. These are often used/needed as intermediate variables in home location algorithms, such as year, month, day, day of the week and hour of the day, etc.
df_enriched <- enrich_timestamp(df_nested, timestamp = "created_at")
#> 🛠 Enriching variables from timestamp...
#>
#> ✅ Finish enriching! New added variables: year, month, day, wday, hour, ymd.
#> ⌛ Enriching time: 0.708 secs
#>
head(df_enriched$data[[1]])
#> # A tibble: 6 x 8
#> created_at grid_id year month day wday hour ymd
#> <dttm> <int> <dbl> <dbl> <int> <dbl> <int> <date>
#> 1 2016-04-17 22:43:06 1581 2016 4 17 1 22 2016-04-17
#> 2 2014-02-07 07:26:15 1136 2014 2 7 6 7 2014-02-07
#> 3 2012-08-18 19:26:31 1038 2012 8 18 7 19 2012-08-18
#> 4 2014-11-25 19:39:00 1699 2014 11 25 3 19 2014-11-25
#> 5 2014-12-23 01:54:55 1499 2014 12 23 3 1 2014-12-23
#> 6 2015-06-01 19:48:18 740 2015 6 1 2 19 2015-06-01
Current available recipes, where HMLC
is the default recipe used in
identify_location
:
HMLC
:- Weighs data points across multiple time frames to ‘score’ potentially meaningful locations for each user
FREQ
- Selects the most frequently ‘visited’ location assuming a user is active mainly around their home location.
OSNA
: Efstathiades et al.2015- Finds the most ‘popular’ location during ‘rest’, ‘active’ and ‘leisure time. Here we focus on ’rest’ and ‘leisure’ time to find the most possible home location for each user.
APDM
: Ahas et al. 2010- Calculates the average and standard deviation of start time data points by a single user, in a single location.
# default recipe: homelocator -- HMLC
identify_location(test_sample, user = "u_id", timestamp = "created_at", location = "grid_id", show_n_loc = 1, recipe = "HMLC")
# recipe: Frequency -- FREQ
identify_location(test_sample, user = "u_id", timestamp = "created_at", location = "grid_id",
show_n_loc = 1, recipe = "FREQ")
# recipe: Online Social Network Activity -- OSNA
identify_location(test_sample, user = "u_id", timestamp = "created_at", location = "grid_id",
show_n_loc = 1, recipe = "OSNA")
# recipe: Online Social Network Activity -- APDM
## APDM recipe strictly returns the most likely home location
## It is important to create your location neighbors table before you use the recipe!!
## example: st_queen <- function(a, b = a) st_relate(a, b, pattern = "F***T****")
## neighbors <- st_queen(df_sf) ===> convert result to dataframe
data("df_neighbors", package = "homelocator")
identify_location(test_sample, user = "u_id", timestamp = "created_at", location = "grid_id",
show_n_loc = 1, recipe = "APDM")