add_bipartite_summaries
This function takes an interaction_model and returns an interaction_model in which row_universe and column_universe each have two additional columns, $coreness and $component_label. It treats each row index and each column index as a node in a bipartite graph. Importantly, the behavior is unexpected if im is a symmetric graph: if something appears in both the rows and the columns (e.g., as in a symmetric graph), it becomes two separate nodes, one row node and one column node, and the overlap is ignored.
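A minimal base-R sketch of the component-labeling half of this idea follows. It assumes the interactions are available as a plain two-column edge list; the data frame `edges` and the helper `bipartite_components` are hypothetical and are not part of the package. Each row id is prefixed with `row:` and each column id with `col:`, so a row and a column with the same name become different nodes. That is why any overlap in a symmetric graph is ignored. Components are found with union-find. Coreness is not computed here.

```r
# Hypothetical toy data: one row per (row id, column id) interaction.
edges = data.frame(
  row = c("a", "a", "b", "c", "d"),
  col = c("x", "y", "y", "z", "z")
)

# Hypothetical helper: label connected components of the bipartite graph
# whose nodes are the row ids and the column ids.
bipartite_components = function(edges) {
  # Tag each id with its side, so row "a" and column "a" are different nodes.
  from = paste0("row:", edges$row)
  to = paste0("col:", edges$col)
  nodes = unique(c(from, to))

  # Union-find: every node starts as its own parent.
  parent = setNames(nodes, nodes)
  find = function(v) {
    while (parent[[v]] != v) v = parent[[v]]
    v
  }
  for (i in seq_along(from)) {
    root_from = find(from[i])
    root_to = find(to[i])
    if (root_from != root_to) parent[[root_from]] = root_to
  }

  # Nodes sharing a root are in the same component; number components 1, 2, ...
  roots = vapply(nodes, find, character(1))
  data.frame(
    node = nodes,
    component_label = match(roots, unique(roots))
  )
}

bipartite_components(edges)
```

In this toy example, rows a and b and columns x and y form one component. Rows c and d and column z form the other.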
Best Feature Function (bff)
Given a pc object from pca, or a rotated version, we wish to interpret the individual dimensions. Often, each unit/row (or context/column) of the original interaction_model will have some sort of text description. For example, if each row is an R package, we have the package title and description. Convert these text descriptions into an interaction_model where the units (i.e., the variable before the *) match either the units or the context of the pc.
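The base-R sketch below roughly illustrates the conversion step. It splits each description into lower-case words and returns one row per (unit, word) pair. The data frame `descriptions`, the helper `text_to_long`, and the simple tokenization rule are assumptions for illustration. The step that builds the interaction_model from this long table is not shown.

```r
# Hypothetical toy data: one text description per unit (here, per R package).
descriptions = data.frame(
  package = c("pkgA", "pkgB"),
  description = c("Tools for sparse matrices.", "Plotting tools for networks.")
)

# Hypothetical helper: split each text into lower-case words and return a
# long table with one row per (unit, word) occurrence.
text_to_long = function(unit, text) {
  # Treat any run of characters other than a-z and 0-9 as a word separator.
  words = strsplit(tolower(text), "[^a-z0-9]+")
  long = data.frame(
    unit = rep(unit, lengths(words)),
    word = unlist(words)
  )
  # A leading separator produces an empty string; drop those.
  long[long$word != "", , drop = FALSE]
}

text_to_long(descriptions$package, descriptions$description)
```

The `unit` column plays the role of the variable before the `*`, so its values should match the units (or contexts) of the pc being interpreted.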
core
Given an interaction_model, this returns a new interaction_model that is the "k-core" of the "largest connected component" of the original interaction_model. This function is recommended when diagnose(im) shows that the majority of rows/columns have 1, 2, or 3 connections. In this case, the data is potentially too sparse for pca. If you simply throw away the rows/columns that are weakly connected, you will reduce the connections of those that remain, so some of them may fall below the threshold. The k-core is what you get if you keep repeating this removal until no row or column falls below the threshold. In particular, it finds the largest subset of rows and columns from the interaction_model such that every row and column has at least core_threshold connections (i.e., "data points" in interaction_tibble). This is exactly the k-core if the rows and columns correspond to unique (non-overlapping) elements. If the elements in the rows match some elements in the columns, then those elements are represented twice: once as a row and once as a column. It is possible that only one of those is retained.
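The iterative removal can be sketched in base R on a plain edge list with one row per data point, so repeated (row, column) pairs count as multiple connections. The helper `bipartite_k_core` and the toy `edges` data frame are hypothetical. The sketch skips the restriction to the largest connected component. It also counts connections for rows and columns separately, so a shared element is treated once as a row and once as a column.

```r
# Hypothetical toy data: one row per data point, so repeated pairs count twice.
edges = data.frame(
  row = c("a", "a", "a", "b", "b", "b", "c", "c", "d"),
  col = c("x", "y", "z", "x", "y", "z", "x", "y", "x")
)

# Hypothetical helper: repeatedly drop data points whose row or column has
# fewer than core_threshold connections, until nothing else changes.
bipartite_k_core = function(edges, core_threshold) {
  repeat {
    # Connections per row id and per column id, computed separately, so an
    # element that is both a row and a column is counted as two nodes.
    row_deg = table(edges$row)
    col_deg = table(edges$col)
    keep = as.vector(row_deg[edges$row]) >= core_threshold &
      as.vector(col_deg[edges$col]) >= core_threshold
    # Stop once every remaining row and column meets the threshold.
    if (all(keep)) return(edges)
    edges = edges[keep, , drop = FALSE]
  }
}

bipartite_k_core(edges, core_threshold = 2)
bipartite_k_core(edges, core_threshold = 3)
```

With core_threshold = 2, only row d (one connection) is removed. With core_threshold = 3, the first pass keeps only rows a, b and columns x, y. Each of these then has just two connections, so the removal cascades until nothing is left.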
#' core
#' Given an interaction_model, this returns another interaction_model that corresponds to the "k-core" of the input. This function is recommended when diagnose(im) shows that the majority of rows/columns have 1, 2, or 3 connections. In this case, the data is potentially too sparse for pca. If you simply throw away the rows/columns that are weakly connected, you will reduce the connections of those that remain, so some of them may fall below the threshold. The k-core is what you get if you keep repeating this removal until no row or column falls below the threshold. In particular, it finds the largest subset of rows and columns from the interaction_model such that every row and column has at least core_threshold connections (i.e., "data points" in interaction_tibble). This is exactly the k-core if the rows and columns correspond to unique (non-overlapping) elements. If the elements in the rows match some elements in the columns, then those elements are represented twice: once as a row and once as a column. It is possible that only one of those is retained.
#'
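#' @param im_input an interaction_model.
#' @param core_threshold the minimum number of connections (data points in interaction_tibble) that every retained row and column must have.
#'
#' @return an interaction_model containing the k-core of im_input.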
#' @export
#'
#' @import dplyr
#'
#' @examples
core = function(im_input, core_threshold){
  subset_im(im_input)
  im = im_input
  # Add coreness and component labels if they have not been computed yet.
  if(!"coreness" %in% colnames(im$row_universe)){
    print("adding graph summaries (coreness and connected components).")
    im = add_bipartite_summaries(im)
  }