-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create furness_balancing.R #36
Open
mem48
wants to merge
2
commits into
main
Choose a base branch
from
furness
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from 1 commit
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,304 @@ | ||
# Furness balancing functions | ||
|
||
#' Furness Balance a matrix | ||
#' | ||
#' Furness balancing fills a matrix when only the row and column sums are known. | ||
#' | ||
#' @param mat matrix of numeric values e.g. `matrix(rep(1, 6), nrow = 3)` | ||
#' @param rsum vector of desired row sums without NAs | ||
#' @param csum vector of desired column sums without NAs | ||
#' @param n number of iterations of balancing | ||
#' @param check logical (default TRUE) check if output is valid | ||
#' @param int_only logical (default FALSE) only return whole numbers | ||
#' @param quiet logical if FALSE extra messages | ||
#' @return A vector of rounded numeric | ||
#' @export | ||
#' @example | ||
#' mat = matrix(rep(1, 6), nrow = 3) | ||
#' rownames(mat) = c("car","bike","other") | ||
#' colnames(mat) = c("company","private") | ||
#' rsum = c(32,15,10) | ||
#' csum = c(4, 53) | ||
#' furness_partial(mat, rsum, csum) | ||
furness_balance <- function(mat, rsum, csum, n = 100, check = TRUE, int_only = FALSE, quiet = TRUE){ | ||
|
||
rname <- rownames(mat) | ||
cname <- colnames(mat) | ||
|
||
# Get scale about right | ||
mat <- mat / (sum(mat, na.rm = TRUE) / sum(rsum)) | ||
#mat_orig = mat | ||
|
||
for(i in seq_len(n)){ | ||
mat <- bal_func(mat, rsum = rsum, csum = csum, int_only = int_only) | ||
if(i == 1 & quiet){ | ||
message("First pass") | ||
print(summary(rowSums(mat, na.rm = TRUE) - rsum)) | ||
} | ||
if(i == n & quiet){ | ||
message("Last pass") | ||
print(summary(rowSums(mat, na.rm = TRUE) - rsum)) | ||
} | ||
} | ||
|
||
|
||
rownames(mat) <- rname | ||
colnames(mat) <- cname | ||
|
||
# Check | ||
if(check){ | ||
if(!all(rowSums(mat_fin) == rsum)){ | ||
print("\n") | ||
print(mat) | ||
print(rsum) | ||
print(csum) | ||
stop("Rows don't match ",i) | ||
} | ||
if(!all(colSums(mat_fin) == csum)){ | ||
print("\n") | ||
print(mat) | ||
print(rsum) | ||
print(csum) | ||
stop("Cols don't match ",i) | ||
} | ||
} | ||
|
||
return(mat) | ||
} | ||
|
||
|
||
|
||
|
||
#' Furness Balance a partial matrix | ||
#' | ||
#' Furness balance a matrix when some values are missing, a version of a | ||
#' version of Furness balancing when the matrix values are partially known | ||
#' | ||
#' @param mat matrix of numeric values can contain NAs | ||
#' @param rsum vector of row sums without NAs | ||
#' @param csum vector of column sums without NAs | ||
#' @param n number of iterations of balancing | ||
#' @param check logical (default TRUE) check if output is valid | ||
#' @param int_only logical (default TRUE) only return whole numbers | ||
#' @return A vector of rounded numeric | ||
#' @export | ||
#' @example | ||
#' mat = matrix(c(NA,0,NA,32,NA,NA), nrow = 3) | ||
#' rownames(mat) = c("car","bike","other") | ||
#' colnames(mat) = c("company","private") | ||
#' rsum = c(32,15,10) | ||
#' csum = c(4, 53) | ||
#' furness_partial(mat, rsum, csum) | ||
furness_partial <- function(mat, rsum, csum, n = 100, check = TRUE, int_only = TRUE){ | ||
|
||
rname <- rownames(mat) | ||
cname <- colnames(mat) | ||
|
||
mat_change <- is.na(mat) | ||
mat_change <- ifelse(mat_change, 1, NA) | ||
|
||
rsum_change = rsum - rowSums(mat, na.rm = TRUE) | ||
csum_change = csum - colSums(mat, na.rm = TRUE) | ||
|
||
# Get scale about right | ||
mat_change <- mat_change / (sum(mat_change, na.rm = TRUE) / sum(rsum_change)) | ||
mat_change_orig <- mat_change | ||
i <- 1 | ||
while(i < n){ | ||
i <- i + 1 | ||
mat_old <- mat_change | ||
mat_change <- bal_func(mat2 = mat_change, | ||
rsum2 = rsum_change, | ||
csum2 = csum_change, | ||
int_only = int_only) | ||
# Are we stuck in a loop? | ||
if(identical(mat_old, mat_change)){ | ||
mat_change <- bal_func(mat2 = mat_change_orig, | ||
rsum2 = rsum_change, | ||
csum2 = csum_change, | ||
int_only = int_only) | ||
#message("Stuck in loop") | ||
} | ||
#print(mat_change) | ||
|
||
# Check | ||
if(all(rowSums(mat_change, na.rm = TRUE) == rsum_change)){ | ||
if(all(colSums(mat_change, na.rm = TRUE) == csum_change)){ | ||
break | ||
} | ||
} | ||
|
||
} | ||
mat_change <- round(mat_change) | ||
|
||
mat_fin = ifelse(is.na(mat), mat_change, mat) | ||
|
||
|
||
# Check | ||
if(check){ | ||
if(!all(rowSums(mat_fin) == rsum)){ | ||
print("\n") | ||
print(mat) | ||
print(rsum) | ||
print(csum) | ||
stop("Rows don't match ",i) | ||
} | ||
if(!all(colSums(mat_fin) == csum)){ | ||
print("\n") | ||
print(mat) | ||
print(rsum) | ||
print(csum) | ||
stop("Cols don't match ",i) | ||
} | ||
} | ||
|
||
|
||
return(mat_fin) | ||
} | ||
|
||
|
||
|
||
|
||
|
||
#' Fill an incomplete matrix when some totals are unknown | ||
#' | ||
#' Fill a matrix from row and column totals with some missing values. Sometimes | ||
#' you can have a matrix with some missing values (e.g. small values suppressed | ||
#' for privacy), this function will infill missing NA values in a matrix based | ||
#' on row and column sums that can also contain NA values and a total value that | ||
#' can't be NA. | ||
#' | ||
#' | ||
#' @param mat matrix of numeric values with some NAs | ||
#' @param rsum vector of row sums (can contain NAs) | ||
#' @param csum vector of column sums (can contain NAs) | ||
#' @param tt desired sum of matrix | ||
#' @return A filled in matrix | ||
#' @export | ||
#' @example | ||
#' mat = matrix(c(0,0,NA,32,15,6), nrow = 3) | ||
#' rownames(mat) = c("car","bike","other") | ||
#' colnames(mat) = c("company","private") | ||
#' rsum = c(32,15,10) | ||
#' csum = c(NA, 53) | ||
#' tt = 57 | ||
#' matrix_fill_incomplete(mat, rsum,csum, 57) | ||
|
||
matrix_fill_incomplete <- function(mat, rsum, csum, tt){ | ||
|
||
rname <- rownames(mat) | ||
cname <- colnames(mat) | ||
|
||
# Generate combinations of number that sum to total | ||
n_gaps = sum(is.na(mat)) | ||
combinations = generate_combinations(tt - sum(mat, na.rm = TRUE), n_gaps) | ||
|
||
combinations_mat <- list() | ||
na_indices <- which(is.na(mat)) | ||
for(i in seq(1, length(combinations))){ | ||
mat_sub <- mat | ||
mat_sub[na_indices] <- combinations[[i]] | ||
|
||
rsum_mat <- rowSums(mat_sub) | ||
csum_mat <- colSums(mat_sub) | ||
|
||
if(!all(rsum_mat == rsum, na.rm = TRUE)){ | ||
mat_sub <- NULL | ||
} | ||
if(!all(csum_mat == csum, na.rm = TRUE)){ | ||
mat_sub <- NULL | ||
} | ||
combinations_mat[[i]] <- mat_sub | ||
} | ||
|
||
combinations_mat <- combinations_mat[lengths(combinations_mat) > 0] | ||
mat_fin <- combinations_mat[[sample(seq_along(combinations_mat), 1)]] | ||
|
||
return(mat_fin) | ||
} | ||
|
||
#' Generate combinations of number that sum to totals | ||
#' | ||
#' Internal function that returns rounded numbers sometimes rounded up sometimes | ||
#' rounded down | ||
#' | ||
#' @param t numbers to split | ||
#' @param n numeric of number of splits required | ||
#' @param prefix numeric() | ||
#' @return list of combinations | ||
#' @examples | ||
#' generate_combinations(5,3) | ||
#' @noRd | ||
generate_combinations <- function(t, n, prefix = numeric()) { | ||
result <- list() | ||
if (n == 1) { | ||
if (t > 0) { | ||
result <- list(c(prefix, t)) | ||
} | ||
} else { | ||
if (n < 1 || t <= 0) return(list()) | ||
for (i in t:1) { | ||
temp <- generate_combinations(t - i, n - 1, c(prefix, i)) | ||
result <- c(result, temp) | ||
} | ||
} | ||
return(result) | ||
} | ||
|
||
|
||
|
||
|
||
|
||
#' Round numbers randomly up or down | ||
#' | ||
#' Internal function that returns rounded numbers sometimes rounded up sometimes | ||
#' rounded down | ||
#' | ||
#' @param x Vector of numeric | ||
#' @return A vector of rounded numeric | ||
#' @noRd | ||
|
||
round_half_random <- function(x) { | ||
tweaks <- runif(length(x), min = -0.5, max = 0.5) | ||
# Never tweak to 0 | ||
tweaks <- ifelse(x < 1, 0, tweaks) | ||
round(x + tweaks) | ||
} | ||
|
||
#' Internal balancing function | ||
#' | ||
#' | ||
#' | ||
#' @param mat2 matrix of numeric | ||
#' @param rsum2 vector of row sums | ||
#' @param csum2 vector of column sums | ||
#' @param int_only logical (default FALSE) only return whole numbers | ||
#' | ||
#' @return a matrix | ||
#' @noRd | ||
bal_func <- function(mat2, rsum2, csum2, int_only = FALSE){ | ||
# Find ratio of rows | ||
mat_rsum <- rowSums(mat2, na.rm = TRUE) | ||
mat_rratio <- rsum2 / mat_rsum | ||
mat_rratio[is.nan(mat_rratio)] <- 0 | ||
|
||
mat2 <- mat2 * mat_rratio | ||
|
||
if(int_only){ | ||
mat2 <- round_half_random(mat2) | ||
} | ||
|
||
# Find ratio of columns | ||
mat_csum <- colSums(mat2, na.rm = TRUE) | ||
mat_cratio <- csum2 / mat_csum | ||
mat_cratio[is.nan(mat_cratio)] <- 0 | ||
|
||
mat2 <- sweep(mat2, MARGIN=2, mat_cratio, `*`) | ||
mat2[is.nan(mat2)] <- 0 | ||
|
||
if(int_only){ | ||
mat2 <- round_half_random(mat2) | ||
} | ||
|
||
return(mat2) | ||
} |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to see an example in the package's style. Clearly works but I think we should have an example that uses real example data. Could example datasets in this package work, or do we need to create more packaged data, e.g. starting with NTEM? Having done some reading it seems to me that Furness balancing is more about updating SIM estimates over time, which would be v. handy and links nicely with the spanishoddata package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't know about over time, all of these functions are for filling in matrices where you know row/column totals but not internal values. So classic example is knowing the number of pupils at schools and the number of children in each LSOA, but you don't have OD data. You can prime the matrix with values from a distance decay curve and then run the furness balance and you should get a reasonable OD matrix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't the function assume OD data in matrix form, with rows representing origins and columns representing destinations?
Fine if so, although will not work on large sparse OD datasets, e.g. all schools to all LSOAs which would be 10 * 30 = 300,000,000 cells.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes all these functions are matrix based
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, need to chat this over. Next time both in office could be good.