Additional functions

Generating background noise

The gen_bkgnoise() function allows users to generate multivariate Gaussian noise to serve as background data in high-dimensional spaces.

# Example: Generate 4D background noise
bkg_data <- gen_bkgnoise(n = 500, p = 4, 
                         m = c(0, 0, 0, 0), s = c(2, 2, 2, 2))
head(bkg_data)
#> # A tibble: 6 × 4
#>       x1     x2     x3     x4
#>    <dbl>  <dbl>  <dbl>  <dbl>
#> 1 -0.889  0.419  0.632 -0.219
#> 2  3.64  -1.50  -1.76   1.70 
#> 3  0.408 -1.43  -0.511  0.978
#> 4  0.643 -0.191 -2.93   0.326
#> 5  0.271  1.43   2.51   2.83 
#> 6 -3.66   3.31   2.01   1.65

The generated data has independent dimensions with specified means (m) and standard deviations (s).
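
For intuition, independent Gaussian noise of this kind can be sketched in base R. This is a minimal illustration, not the package's implementation; sketch_bkgnoise() and its defaults are hypothetical names introduced here.

```r
# Hypothetical helper (an assumption for illustration, not the package's code).
sketch_bkgnoise <- function(n, p, m = rep(0, p), s = rep(1, p)) {
  stopifnot(length(m) == p, length(s) == p)
  # Draw each dimension independently from a normal with its own mean and sd.
  cols <- lapply(seq_len(p), function(j) rnorm(n, mean = m[j], sd = s[j]))
  names(cols) <- paste0("x", seq_len(p))
  as.data.frame(cols)
}

set.seed(1)
noise <- sketch_bkgnoise(n = 500, p = 4, m = rep(0, 4), s = rep(2, 4))
dim(noise)
```

Because each column is drawn separately, the sample correlation between dimensions should be close to zero for large n.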

Randomizing rows

randomize_rows() shuffles the rows of the input data into a random order.

randomized_data <- randomize_rows(bkg_data)
head(randomized_data)
#> # A tibble: 6 × 4
#>        x1      x2     x3    x4
#>     <dbl>   <dbl>  <dbl> <dbl>
#> 1 -0.0180  0.0876  0.873  2.80
#> 2 -2.95   -2.82    1.29  -1.15
#> 3 -2.09   -0.593  -1.25   2.25
#> 4  3.35   -2.89   -3.40  -1.12
#> 5 -0.660  -0.817  -0.722 -3.07
#> 6 -0.280   0.109  -0.341  1.88
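
Row shuffling of this kind can be sketched in base R by indexing with a random permutation. The helper sketch_randomize_rows() is a hypothetical name used only for illustration; it is not claimed to match the package's internals.

```r
# Hypothetical helper (an assumption for illustration).
sketch_randomize_rows <- function(data) {
  # sample.int(n) returns a random permutation of 1..n.
  data[sample.int(nrow(data)), , drop = FALSE]
}

toy <- data.frame(id = 1:6, value = letters[1:6])
set.seed(42)
shuffled <- sketch_randomize_rows(toy)
shuffled
```

Each row is kept intact, so the pairing between columns is preserved; only the row order changes.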

Relocating clusters

relocate_clusters() translates clusters along any dimension(s). It centers each cluster (subtracting its mean) and then adds a translation vector taken from a user-supplied matrix (vert_mat). In the example below, vert_mat has one row per cluster and one column per dimension.

df <- tibble::tibble(
  x1 = rnorm(12),
  x2 = rnorm(12),
  x3 = rnorm(12),
  x4 = rnorm(12),
  cluster = rep(1:3, each = 4)
)

vert_mat <- matrix(c(
  5, 0, 0, 0,
  0, 5, 0, 0,
  0, 0, 5, 0
), nrow = 3, byrow = TRUE)

relocated_df <- relocate_clusters(df, vert_mat)
head(relocated_df)
#> # A tibble: 6 × 5
#>        x1     x2     x3      x4 cluster
#>     <dbl>  <dbl>  <dbl>   <dbl>   <int>
#> 1  0.291   5.72  -1.80   2.04         2
#> 2  0.0748  2.97   1.58  -0.0442       2
#> 3  5.49    0.693 -1.43  -0.350        1
#> 4  4.68   -1.02   0.504  0.661        1
#> 5 -0.609  -0.286  5.18  -0.641        3
#> 6 -1.09    0.273  5.73  -0.156        3
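
The center-then-translate step described above can be sketched in base R as follows. This is a minimal sketch under stated assumptions: sketch_relocate() is a hypothetical helper, the cluster column is assumed to be named cluster, and row i of vert_mat is assumed to belong to the i-th cluster in sorted order.

```r
# Hypothetical helper (an assumption for illustration, not the package's code).
sketch_relocate <- function(data, vert_mat) {
  coord_cols <- setdiff(names(data), "cluster")
  clusters <- sort(unique(data$cluster))
  stopifnot(nrow(vert_mat) == length(clusters),
            ncol(vert_mat) == length(coord_cols))
  for (i in seq_along(clusters)) {
    rows <- data$cluster == clusters[i]
    for (j in seq_along(coord_cols)) {
      values <- data[[coord_cols[j]]][rows]
      # Center the cluster on zero, then shift it to its new location.
      data[[coord_cols[j]]][rows] <- values - mean(values) + vert_mat[i, j]
    }
  }
  data
}

set.seed(7)
toy <- data.frame(x1 = rnorm(8), x2 = rnorm(8), cluster = rep(1:2, each = 4))
shift <- matrix(c(5, 0,
                  0, 5), nrow = 2, byrow = TRUE)
moved <- sketch_relocate(toy, shift)
colMeans(moved[moved$cluster == 1, c("x1", "x2")])
```

After relocation, each cluster's mean coincides with its row of the translation matrix, while the shape of the cluster around that mean is unchanged.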

Generating rotation matrices

The gen_rotation() function creates a rotation matrix in p-dimensional space from a list of coordinate planes and rotation angles. As the output below shows (cos 60° = 0.5), angles are given in degrees.

rotations_4d <- list(
  list(plane = c(1, 2), angle = 60),
  list(plane = c(3, 4), angle = 90)
)

rot_mat <- gen_rotation(p = 4, planes_angles = rotations_4d)
rot_mat
#>           [,1]       [,2]         [,3]          [,4]
#> [1,] 0.5000000 -0.8660254 0.000000e+00  0.000000e+00
#> [2,] 0.8660254  0.5000000 0.000000e+00  0.000000e+00
#> [3,] 0.0000000  0.0000000 6.123234e-17 -1.000000e+00
#> [4,] 0.0000000  0.0000000 1.000000e+00  6.123234e-17
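
A matrix of this form can be built by composing plane (Givens) rotations. The sketch below is an illustration in base R, not the package's implementation; sketch_rotation() is a hypothetical name, and the order in which the plane rotations are multiplied is an assumption (it does not matter when the planes are disjoint, as in the example above).

```r
# Hypothetical helper (an assumption for illustration).
sketch_rotation <- function(p, planes_angles) {
  rot <- diag(p)
  for (pa in planes_angles) {
    i <- pa$plane[1]
    j <- pa$plane[2]
    theta <- pa$angle * pi / 180   # degrees to radians
    givens <- diag(p)              # identity except in the (i, j) plane
    givens[i, i] <- cos(theta)
    givens[i, j] <- -sin(theta)
    givens[j, i] <- sin(theta)
    givens[j, j] <- cos(theta)
    rot <- givens %*% rot
  }
  rot
}

rot <- sketch_rotation(4, list(
  list(plane = c(1, 2), angle = 60),
  list(plane = c(3, 4), angle = 90)
))
round(rot, 3)
```

The result is orthogonal with determinant 1, so multiplying data by it preserves distances and angles.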

Normalizing data

When combining clusters or transforming data geometrically, magnitudes can differ drastically. The normalize_data() function rescales the entire dataset to fit within the interval [-1, 1] by dividing every value by the dataset's maximum absolute value, so the relative scale between columns is preserved.

norm_data <- normalize_data(bkg_data)
head(norm_data)
#>            x1          x2          x3          x4
#> 1 -0.11003546  0.05185252  0.07819091 -0.02712044
#> 2  0.45089188 -0.18533577 -0.21766661  0.21049062
#> 3  0.05048517 -0.17688512 -0.06323987  0.12099897
#> 4  0.07962656 -0.02357571 -0.36309332  0.04033840
#> 5  0.03355515  0.17711161  0.31003192  0.35059034
#> 6 -0.45328952  0.40926876  0.24924429  0.20465305
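
Scaling by a single global maximum can be sketched in base R as below. sketch_normalize() is a hypothetical helper written for illustration; the handling of an all-zero input is an assumption.

```r
# Hypothetical helper (an assumption for illustration).
sketch_normalize <- function(data) {
  values <- as.matrix(data)
  max_abs <- max(abs(values))      # one scale factor for the whole dataset
  if (max_abs == 0) {
    return(as.data.frame(values))  # nothing to rescale
  }
  as.data.frame(values / max_abs)
}

toy <- data.frame(x1 = c(-4, 2, 1), x2 = c(0.5, -8, 3))
scaled <- sketch_normalize(toy)
scaled
```

Because one factor is shared by all columns, the largest-magnitude value maps to -1 or 1 and every other value keeps its proportion to it.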

Generating cluster locations

To place clusters at different positions, gen_clustloc() generates cluster centers in a simplex-like arrangement, so that the centers are as close to equidistant from one another as possible. In the example below, the result is a 4 × 5 matrix with one column per cluster center.

centers <- gen_clustloc(p = 4, k = 5)
head(centers)
#>            [,1]       [,2]       [,3]        [,4]       [,5]
#> [1,]  0.7446399 -1.9779807  0.4843451  0.38800876  0.3609869
#> [2,]  1.6147255  0.8596287 -0.5623052 -0.70052032 -1.2115286
#> [3,] -1.1242003  1.2844618  1.0435773 -0.01292967 -1.1909091
#> [4,] -1.2368310  0.7327734 -0.1596049 -0.16166881  0.8253313
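
To illustrate what "equidistant" means here, the sketch below builds an exact regular simplex for the special case k = p + 1 and checks the pairwise distances with dist(). It is not the algorithm used by gen_clustloc(); sketch_simplex_centers() is a hypothetical helper.

```r
# Hypothetical helper (an assumption for illustration): k equidistant
# centers in k - 1 dimensions.
sketch_simplex_centers <- function(k) {
  stopifnot(k >= 2)
  verts <- diag(k)                              # k unit vectors, pairwise distance sqrt(2)
  centered <- sweep(verts, 2, colMeans(verts))  # move the centroid to the origin
  decomp <- svd(centered)
  keep <- seq_len(k - 1)                        # the last singular value is ~0
  coords <- decomp$u[, keep, drop = FALSE] %*% diag(decomp$d[keep], nrow = k - 1)
  t(coords)                                     # (k - 1) x k, one column per center
}

centers_sketch <- sketch_simplex_centers(5)
pairwise <- dist(t(centers_sketch))
round(range(pairwise), 6)
```

All pairwise distances agree up to floating-point error. When k exceeds p + 1, exact equidistance is impossible, which is why the arrangement can only be simplex-like.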

Numeric generators

Two helper functions, gen_nproduct() and gen_nsum(), generate vectors of positive integers: gen_nproduct() returns integers whose product is approximately a user-specified target, and gen_nsum() returns integers whose sum is exactly the target.

The function gen_nsum(n, k) divides a total sum n into k positive integers. It first assigns an equal base value to each element and then randomly distributes any remainder, ensuring the elements sum exactly to n.

gen_nsum(n = 100, k = 3)
#> [1] 34 33 33
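
The base-plus-remainder scheme described above can be sketched in base R. sketch_nsum() is a hypothetical helper for illustration; which elements receive the extra unit is an assumption about the randomization.

```r
# Hypothetical helper (an assumption for illustration).
sketch_nsum <- function(n, k) {
  stopifnot(k >= 1, n >= k)
  parts <- rep(n %/% k, k)        # equal base value for every element
  remainder <- n %% k
  if (remainder > 0) {
    # Give one extra unit to `remainder` randomly chosen elements.
    lucky <- sample.int(k, remainder)
    parts[lucky] <- parts[lucky] + 1
  }
  parts
}

set.seed(3)
parts <- sketch_nsum(n = 100, k = 3)
sum(parts)
```

Since the remainder is always smaller than k, no element differs from another by more than 1.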

The function gen_nproduct(n, p) aims to produce p positive integers whose product is approximately n. It starts with all elements equal to the rounded \(p^{th}\) root of n and iteratively adjusts elements up or down in a randomized manner until the product is within a small tolerance of n. This accommodates the fact that exact integer solutions for a given product are often impossible.

gen_nproduct(n = 500, p = 4)
#> [1] 4 5 5 5
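
The iterative adjustment described above can be sketched as follows. This is a minimal illustration, not the package's implementation: sketch_nproduct(), the relative tolerance tol, and the iteration cap max_iter are all assumptions introduced here.

```r
# Hypothetical helper (an assumption for illustration).
sketch_nproduct <- function(n, p, tol = 0.01, max_iter = 1000) {
  # Start with every element at the rounded p-th root of n (at least 1).
  x <- rep(max(1, round(n^(1 / p))), p)
  for (iter in seq_len(max_iter)) {
    current <- prod(x)
    if (abs(current - n) / n <= tol) break   # close enough to the target
    i <- sample.int(p, 1)                    # pick a random element to adjust
    if (current > n && x[i] > 1) {
      x[i] <- x[i] - 1
    } else if (current < n) {
      x[i] <- x[i] + 1
    }
  }
  x
}

set.seed(11)
dims <- sketch_nproduct(n = 500, p = 4)
prod(dims)
```

For n = 500 and p = 4, the starting vector is four 5s (product 625), and a single decrement gives a product of exactly 500. For other targets the loop may stop at max_iter without reaching the tolerance, so the result should be treated as approximate.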