| Title: | Functions for Discordant Kinship Modeling |
|---|---|
| Description: | Functions for discordant kinship modeling (and other sibling-based quasi-experimental designs). Contains data restructuring functions and functions for generating biometrically informed data for kin pairs. See [Garrison and Rodgers, 2016 <doi:10.1016/j.intell.2016.08.008>], [Sims, Trattner, and Garrison, 2024 <doi:10.3389/fpsyg.2024.1430978>] for empirical examples, and [Garrison and colleagues for theoretical work <doi:10.1101/2025.08.25.25334395>]. |
| Authors: | S. Mason Garrison [aut, cre, cph] (ORCID: <https://orcid.org/0000-0002-4804-6003>), Jonathan Trattner [aut] (ORCID: <https://orcid.org/0000-0002-1097-7603>), Yoo Ri Hwang [aut], Cermet Ream [ctb] |
| Maintainer: | S. Mason Garrison <[email protected]> |
| License: | GPL-3 |
| Version: | 1.3 |
| Built: | 2026-05-22 06:04:53 UTC |
| Source: | https://github.com/r-computing-lab/discord |
This function checks for common errors in the provided data, including the correct specification of identifiers (ID, sex, race) and their existence in the data.
check_discord_errors(data, id, sex, race, pair_identifiers)
data |
The data to perform a discord regression on. |
id |
A unique kinship pair identifier. |
sex |
A character string for the sex column name. |
race |
A character string for the race column name. |
pair_identifiers |
A character vector of length two that contains the variable identifier for each kinship pair. |
An error message if any of the error conditions is met.
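The kinds of checks described above can be sketched in base R. This is only an illustration: `check_pair_columns()` is a hypothetical helper, not the package's implementation, and it assumes that demographic columns are stored once per sibling (for example `sex_s1` and `sex_s2`).

```r
# Hypothetical helper sketching identifier checks; not part of discord.
check_pair_columns <- function(data, id = NULL, sex = NULL, race = NULL,
                               pair_identifiers = c("_s1", "_s2")) {
  cols <- names(data)
  if (length(pair_identifiers) != 2) {
    stop("`pair_identifiers` must have length two.")
  }
  # The pair ID, when supplied, must be an existing column.
  if (!is.null(id) && !(id %in% cols)) {
    stop("ID column '", id, "' not found in data.")
  }
  # Assumption: sex and race appear as stems of per-sibling columns.
  for (stem in c(sex, race)) {
    if (!any(grepl(stem, cols, fixed = TRUE))) {
      stop("No column matching '", stem, "' found in data.")
    }
  }
  # Each pair identifier must appear in at least one column name.
  for (suffix in pair_identifiers) {
    if (!any(grepl(suffix, cols, fixed = TRUE))) {
      stop("No column containing '", suffix, "' found in data.")
    }
  }
  invisible(TRUE)
}

toy <- data.frame(
  id = 1:2,
  sex_s1 = c(0, 1), sex_s2 = c(1, 1),
  height_s1 = c(60, 65), height_s2 = c(62, 64)
)

ok <- check_pair_columns(toy, id = "id", sex = "sex")
bad <- tryCatch(
  check_pair_columns(toy, id = "id", race = "race"),
  error = function(e) conditionMessage(e)
)
```

Here `ok` is `TRUE`, while `bad` holds the error message raised because the toy data has no race column.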
This function determines the order of sibling pairs based on an outcome variable. It checks which member of a kinship pair has more of the specified outcome and adds a new column named 'order' to the dataset, indicating which sibling (identified as "s1" or "s2") has more of the outcome. If the two siblings have the same amount of the outcome, it randomly assigns one as having more.
check_sibling_order(..., fast = FALSE)
... |
Additional arguments to be passed to the function. |
fast |
Logical. If TRUE, uses a faster method for data processing. |
A one-row data frame with a new column order indicating which family member ("s1" or "s2") has more of the outcome.
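The ordering rule, including the random tie-break, can be sketched in base R as follows. The helper `pair_order()` and the toy column names are hypothetical and are used only for illustration; the package's own implementation may differ.

```r
set.seed(1)

# Hypothetical helper: label the sibling with more of the outcome.
pair_order <- function(y1, y2) {
  if (y1 > y2) return("s1")
  if (y2 > y1) return("s2")
  # Tie: assign one sibling at random.
  sample(c("s1", "s2"), size = 1)
}

toy <- data.frame(
  height_s1 = c(70, 64, 66),
  height_s2 = c(68, 67, 66)
)

# Apply the rule row by row; the third pair is a tie.
toy$order <- mapply(pair_order, toy$height_s1, toy$height_s2)
toy
```

The first two pairs are labelled deterministically; the label for the tied third pair depends on the random draw.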
A data frame that accompanies the regression vignette. It contains data on SES and flu vaccination.
data_flu_ses
A data frame.
Kinship pairs and their relatedness, SES, and flu vaccination information.
NLSY/R Lab
A data frame output from the NlsyLinks package that contains data for kinship pairs' height and weight.
data_sample
A data frame.
Kinship pairs and their relatedness, height, and weight information.
NLSY/R Lab
Perform a Between-Family Linear Regression within the Discordant Kinship Framework
discord_between_model(data, outcome, predictors, demographics = NULL, id = NULL, sex = "sex", race = "race", pair_identifiers = c("_s1", "_s2"), data_processed = FALSE, coding_method = "none", fast = TRUE)
data |
The data set with kinship pairs |
outcome |
A character string containing the outcome variable of interest. |
predictors |
A character vector containing the column names for predicting the outcome. Can be NULL if no predictors are desired. |
demographics |
Indicates which demographic variables (sex, race) are present in the data. If both are present (recommended), the value should be "both". Other options include "sex", "race", or "none". |
id |
Defaults to NULL. If supplied, must specify the column name corresponding to unique kinship pair identifiers. |
sex |
A character string for the sex column name. |
race |
A character string for the race column name. |
pair_identifiers |
A character vector of length two that contains the variable identifier for each kinship pair. Default is c("_s1","_s2"). |
data_processed |
Logical. Whether the data have already been preprocessed by discord_data; default is FALSE. |
coding_method |
A character string that indicates what kind of additional coding schemes should be used. Default is "none". Other options include "binary" and "multi". |
fast |
Logical. If TRUE, uses a faster method for data processing. |
Resulting 'lm' object from performing the between-family regression.
discord_between_model(data = data_sample, outcome = "height", predictors = "weight", pair_identifiers = c("_s1", "_s2"), sex = NULL, race = NULL)
Custom Conditions for the discord package
discord_cond(type, msg, class = paste0("discord-", type), call = NULL, ...)
type |
One of the following conditions: c("error", "warning", "message") |
msg |
Message |
class |
Default is to prefix the 'type' argument with "discord-", but can be more specific to the problem at hand. |
call |
What triggered the condition? |
... |
Additional arguments that can be coerced to character or single condition object. |
A condition for discord.
## Not run:
derr <- function(x) discord_cond("error", x)
dwarn <- function(x) discord_cond("warning", x)
dmess <- function(x) discord_cond("message", x)

return_class <- function(func) {
  tryCatch(func,
    error = function(cond) class(cond),
    warning = function(cond) class(cond),
    message = function(cond) class(cond)
  )
}

return_class(derr("error-class"))
return_class(dwarn("warning-class"))
return_class(dmess("message-class"))
## End(Not run)
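For readers unfamiliar with classed conditions, the general pattern can be sketched in base R. The constructor `make_cond()` below is hypothetical and only mirrors the argument names in the usage above; it is not the package's code.

```r
# Hypothetical constructor: build a condition object with a custom class
# placed ahead of the base condition classes.
make_cond <- function(type, msg, class = paste0("discord-", type), call = NULL) {
  structure(
    list(message = msg, call = call),
    class = c(class, type, "condition")
  )
}

# Signal a classed warning and capture the class vector of what was caught.
cond <- make_cond("warning", "pair identifiers look unusual")
caught <- tryCatch(
  warning(cond),
  warning = function(w) class(w)
)

# Handlers can also target the custom class directly.
msg <- tryCatch(
  stop(make_cond("error", "missing id column")),
  `discord-error` = function(e) conditionMessage(e)
)
```

Because the custom class comes first in the class vector, callers can handle package-specific conditions separately from generic errors, warnings, and messages.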
Restructure Data to Determine Kinship Differences
discord_data(data, outcome, predictors, id = NULL, sex = "sex", race = "race", pair_identifiers = c("_s1", "_s2"), demographics = "both", coding_method = "none", fast = TRUE, ...)
data |
The data set with kinship pairs |
outcome |
A character string containing the outcome variable of interest. |
predictors |
A character vector containing the column names for predicting the outcome. Can be NULL if no predictors are desired. |
id |
Defaults to NULL. If supplied, must specify the column name corresponding to unique kinship pair identifiers. |
sex |
A character string for the sex column name. |
race |
A character string for the race column name. |
pair_identifiers |
A character vector of length two that contains the variable identifier for each kinship pair. Default is c("_s1","_s2"). |
demographics |
Indicates which demographic variables (sex, race) are present in the data. If both are present (default, and recommended), the value should be "both". Other options include "sex", "race", or "none". |
coding_method |
A character string that indicates what kind of additional coding schemes should be used. Default is "none". Other options include "binary" and "multi". |
fast |
Logical. If TRUE, uses a faster method for data processing. |
... |
Additional arguments to be passed to the function. |
A data frame that contains analyzable, paired data for performing kinship regressions.
discord_data(data = data_sample, outcome = "height", predictors = "weight", pair_identifiers = c("_s1", "_s2"), sex = NULL, race = NULL, demographics = "none")
Perform a Linear Regression within the Discordant Kinship Framework
discord_regression(data, outcome, predictors, demographics = NULL, id = NULL, sex = "sex", race = "race", pair_identifiers = c("_s1", "_s2"), data_processed = FALSE, coding_method = "none", fast = TRUE)
discord_within_model(data, outcome, predictors, demographics = NULL, id = NULL, sex = "sex", race = "race", pair_identifiers = c("_s1", "_s2"), data_processed = FALSE, coding_method = "none", fast = TRUE)
data |
The data set with kinship pairs |
outcome |
A character string containing the outcome variable of interest. |
predictors |
A character vector containing the column names for predicting the outcome. Can be NULL if no predictors are desired. |
demographics |
Indicates which demographic variables (sex, race) are present in the data. If both are present (recommended), the value should be "both". Other options include "sex", "race", or "none". |
id |
Defaults to NULL. If supplied, must specify the column name corresponding to unique kinship pair identifiers. |
sex |
A character string for the sex column name. |
race |
A character string for the race column name. |
pair_identifiers |
A character vector of length two that contains the variable identifier for each kinship pair. Default is c("_s1","_s2"). |
data_processed |
Logical. Whether the data have already been preprocessed by discord_data; default is FALSE. |
coding_method |
A character string that indicates what kind of additional coding schemes should be used. Default is "none". Other options include "binary" and "multi". |
fast |
Logical. If TRUE, uses a faster method for data processing. |
Resulting 'lm' object from performing the discordant regression.
discord_regression(data = data_sample, outcome = "height", predictors = "weight", pair_identifiers = c("_s1", "_s2"), sex = NULL, race = NULL)
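To make the within-pair idea concrete, the following base R sketch fits a difference-score regression on simulated pairs. It assumes the model regresses the outcome difference on the outcome mean and on the predictor's difference and mean. That model form is an assumption here rather than something stated on this page, and all variable names and simulated effects are hypothetical.

```r
set.seed(42)
n <- 500

# Simulated pairs: `fam` is a shared family-level influence (a confound).
fam <- rnorm(n)
x_s1 <- fam + rnorm(n)
x_s2 <- fam + rnorm(n)
y_s1 <- x_s1 + fam + rnorm(n)
y_s2 <- x_s2 + fam + rnorm(n)

# Order each pair so the sibling with more of the outcome comes first,
# then form within-pair differences and means.
s1_higher <- y_s1 >= y_s2
pairs <- data.frame(
  y_diff = ifelse(s1_higher, y_s1 - y_s2, y_s2 - y_s1),
  y_mean = (y_s1 + y_s2) / 2,
  x_diff = ifelse(s1_higher, x_s1 - x_s2, x_s2 - x_s1),
  x_mean = (x_s1 + x_s2) / 2
)

# Assumed within-pair model: differences regressed on means and differences.
fit <- lm(y_diff ~ y_mean + x_diff + x_mean, data = pairs)
coef(fit)
```

In this sketch the coefficient on `x_diff` carries the within-family association, since anything shared by both siblings cancels out of the difference scores.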
Generates paired multivariate data for kinship pairs based on specified ACE (Additive genetic, Common environment, unique Environment) parameters with covariance structure.
kinsim(r_all = c(1, 0.5), c_all = 1, npg_all = 500, npergroup_all = rep(npg_all, length(r_all)), mu_all = 0, variables = 2, mu_list = rep(mu_all, variables), r_vector = NULL, c_vector = NULL, ace_all = c(1, 1, 1), ace_list = matrix(rep(ace_all, variables), byrow = TRUE, nrow = variables), cov_a = 0, cov_c = 0, cov_e = 0, id = NULL, ...)
r_all |
Numeric vector. Levels of genetic relatedness for each group; default is c(1, 0.5) representing MZ and DZ twins respectively. |
c_all |
Numeric. Default shared variance for common environment; default is 1. |
npg_all |
Integer. Default sample size per group; default is 500. |
npergroup_all |
Numeric vector. Sample sizes by group; default repeats npg_all for each group in r_all. |
mu_all |
Numeric. Default mean value for all generated variables; default is 0. |
variables |
Integer. Number of variables to generate; default is 2. Currently limited to a maximum of two variables. |
mu_list |
Numeric vector. Means for each variable; default repeats mu_all for each variable. |
r_vector |
Numeric vector. Alternative specification providing genetic relatedness coefficients for the entire sample; default is NULL. |
c_vector |
Numeric vector. Alternative specification providing shared-environmental relatedness coefficients for the entire sample; default is NULL. |
ace_all |
Numeric vector. Default variance components in order c(a, c, e) for all variables; default is c(1, 1, 1). |
ace_list |
Matrix. ACE variance components by variable, where each row represents a variable and columns are a, c, e components; default repeats ace_all for each variable. |
cov_a |
Numeric. Shared variance for additive genetics between variables; default is 0. |
cov_c |
Numeric. Shared variance for shared-environment between variables; default is 0. |
cov_e |
Numeric. Shared variance for non-shared-environment between variables; default is 0. |
id |
Numeric vector. Optional unique identifiers for each kinship pair; default is NULL. |
... |
Additional arguments passed to other methods. |
This function extends the univariate ACE model to multivariate data, allowing simulation of correlated phenotypes across kinship pairs with different levels of genetic relatedness. It supports simulation of up to two phenotypic variables with specified genetic and environmental covariance structures.
A data frame with the following columns:
genetic component for variable i for kin1
genetic component for variable i for kin2
shared-environmental component for variable i for kin1
shared-environmental component for variable i for kin2
non-shared-environmental component for variable i for kin1
non-shared-environmental component for variable i for kin2
generated variable i for kin1
generated variable i for kin2
level of relatedness for the kin pair
Unique identifier for each kinship pair
# Generate basic multivariate twin data with default parameters
twin_data <- kinsim()

# Generate data with genetic correlation between variables
correlated_data <- kinsim(cov_a = 0.5)

# Generate data for different relatedness groups with custom parameters
family_data <- kinsim(
  r_all = c(1, 0.5, 0.25), # MZ twins, DZ twins, and half-siblings
  npergroup_all = c(100, 100, 150), # Sample sizes per group
  ace_list = matrix(
    c(
      1.5, 0.5, 1.0, # Variable 1 ACE components
      0.8, 1.2, 1.0
    ), # Variable 2 ACE components
    nrow = 2, byrow = TRUE
  ),
  cov_a = 0.3, # Genetic covariance
  cov_c = 0.2 # Shared environment covariance
)
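The ACE decomposition behind this kind of simulation can be illustrated with a univariate base R sketch. It is not the package's algorithm: `simulate_pairs()` is a hypothetical helper that assumes the a, c, e values are variances, that the common environment is fully shared within a pair, and that additive genetic scores correlate at the relatedness level r. Under those assumptions the implied pair correlation is (r * a + c) / (a + c + e).

```r
set.seed(123)

# Hypothetical univariate ACE simulator for one relatedness group.
simulate_pairs <- function(n, r, ace = c(1, 1, 1)) {
  # Additive genetic scores: correlated r between the two kin.
  a_shared <- rnorm(n)
  A1 <- sqrt(ace[1]) * (sqrt(r) * a_shared + sqrt(1 - r) * rnorm(n))
  A2 <- sqrt(ace[1]) * (sqrt(r) * a_shared + sqrt(1 - r) * rnorm(n))
  # Common environment: identical for both members of a pair (assumption).
  C <- sqrt(ace[2]) * rnorm(n)
  # Non-shared environment: independent for each member.
  E1 <- sqrt(ace[3]) * rnorm(n)
  E2 <- sqrt(ace[3]) * rnorm(n)
  data.frame(r = r, y_1 = A1 + C + E1, y_2 = A2 + C + E2)
}

mz <- simulate_pairs(5000, r = 1)
dz <- simulate_pairs(5000, r = 0.5)

# Observed pair correlations, to compare against (r * a + c) / (a + c + e).
pair_cor <- c(mz = cor(mz$y_1, mz$y_2), dz = cor(dz$y_1, dz$y_2))
pair_cor
```

With equal variance components, the formula gives 2/3 for r = 1 and 1/2 for r = 0.5, so the simulated correlations should fall close to those values.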
This function calculates differences and means of a given variable for each kinship pair. The order of subtraction and the variables' names in the output dataframe depend on the order column set by check_sibling_order(). If the demographics parameter is set to "race", "sex", or "both", it also prepares demographic information accordingly, swapping the order of demographics as per the order column.
make_mean_diffs(..., fast = FALSE)
... |
Additional arguments to be passed to the function. |
fast |
Logical. If TRUE, uses a faster method for data processing. |
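The difference and mean calculation can be sketched in base R. The helper `mean_diff()` and the toy column names are hypothetical; the sketch assumes the `order` column holds "s1" or "s2" for the sibling with more of the outcome, and that differences are taken as that sibling minus the other.

```r
# Hypothetical helper: order-aware difference and pair mean for one variable.
mean_diff <- function(v_s1, v_s2, order) {
  first <- ifelse(order == "s1", v_s1, v_s2)   # sibling with more of the outcome
  second <- ifelse(order == "s1", v_s2, v_s1)  # the other sibling
  data.frame(diff = first - second, mean = (v_s1 + v_s2) / 2)
}

toy <- data.frame(
  weight_s1 = c(150, 120),
  weight_s2 = c(140, 135),
  order = c("s1", "s1") # set from the outcome, not from weight
)

out <- mean_diff(toy$weight_s1, toy$weight_s2, toy$order)
out
```

Because the order comes from the outcome rather than from the variable being differenced, a predictor's difference can be negative, as in the second pair.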