README.Rmd

---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# gdim

<!-- badges: start -->
[![R-CMD-check](https://github.com/RoheLab/gdim/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/RoheLab/gdim/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/RoheLab/gdim/branch/main/graph/badge.svg)](https://app.codecov.io/gh/RoheLab/gdim?branch=main)
[![CRAN status](https://www.r-pkg.org/badges/version/gdim)](https://CRAN.R-project.org/package=gdim)
<!-- badges: end -->

`gdim` estimates graph dimension using cross-validated eigenvalues, via the graph-splitting technique developed in <https://arxiv.org/abs/2108.03336>. Theoretically, the method works by computing a special type of cross-validated eigenvalue which follows a simple central limit theorem. This allows users to perform hypothesis tests on the rank of the graph.

## Installation

You can install `gdim` from CRAN with:

``` r
install.packages("gdim")

# to get the development version from GitHub:
install.packages("pak")
pak::pak("RoheLab/gdim")
```

## Example

`eigcv()` is the main function in `gdim`. The single required parameter for the function is the maximum possible dimension, `k_max`. 

In the following example, we generate a random graph from the stochastic block model (SBM) with 1000 nodes and 5 blocks (as such, we would expect the estimated graph dimension to be 5). 

```{r}
library(fastRG)

B <- matrix(0.1, 5, 5)
diag(B) <- 0.3

model <- sbm(
  n = 1000,
  k = 5,
  B = B,
  expected_degree = 40,
  poisson_edges = FALSE,
  allow_self_loops = FALSE
)

A <- sample_sparse(model)
```
Here, `A` is the adjacency matrix.

Now, we call the `eigcv()` function with `k_max=10` to estimate graph dimension. 
```{r eigcv}
library(gdim)

eigcv_result <- eigcv(A, k_max = 10)
eigcv_result
```
In this example, `eigcv()` suggests `k=5`. 

To visualize the result, use `plot()` which returns a `ggplot` object. The function displays the test statistic (z score) for each hypothesized graph dimension.

```{r}
plot(eigcv_result)
```

## Reference

Chen, Fan, Sebastien Roch, Karl Rohe, and Shuqi Yu. “Estimating Graph Dimension with Cross-Validated Eigenvalues.” ArXiv:2108.03336 [Cs, Math, Stat], August 6, 2021. https://arxiv.org/abs/2108.03336.