These are helper functions included in the package.
The gen_bkgnoise() function allows users to generate
multivariate Gaussian noise to serve as background data in
high-dimensional spaces.
# Example: Generate 4D background noise
bkg_data <- gen_bkgnoise(n = 500, p = 4,
                         m = c(0, 0, 0, 0), s = c(2, 2, 2, 2))
head(bkg_data)
#> # A tibble: 6 × 4
#> x1 x2 x3 x4
#> <dbl> <dbl> <dbl> <dbl>
#> 1 -0.889 0.419 0.632 -0.219
#> 2 3.64 -1.50 -1.76 1.70
#> 3 0.408 -1.43 -0.511 0.978
#> 4 0.643 -0.191 -2.93 0.326
#> 5 0.271 1.43 2.51 2.83
#> 6 -3.66 3.31 2.01 1.65
The generated data has independent dimensions with specified means
(m) and standard deviations (s).
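The idea of independent Gaussian dimensions can be sketched in base R. The helper sketch_bkgnoise() below is a hypothetical name used only for illustration, and it returns a plain data frame rather than a tibble; it is not the package's implementation.

```r
# Hypothetical helper: draws each dimension independently from a normal
# distribution with its own mean and standard deviation.
sketch_bkgnoise <- function(n, p, m, s) {
  stopifnot(length(m) == p, length(s) == p)
  cols <- lapply(seq_len(p), function(j) rnorm(n, mean = m[j], sd = s[j]))
  names(cols) <- paste0("x", seq_len(p))  # x1, x2, ..., as in the output above
  as.data.frame(cols)
}

set.seed(123)
noise_sketch <- sketch_bkgnoise(n = 500, p = 4,
                                m = c(0, 0, 0, 0), s = c(2, 2, 2, 2))
dim(noise_sketch)
```

Because the columns are drawn separately, their sample correlations should be close to zero for large n.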
randomize_rows() shuffles the rows of the input data into a
random order.
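A common way to shuffle rows in base R is to index the data with a random permutation of its row numbers. The sketch below uses a hypothetical helper, sketch_randomize_rows(), and is only an assumption about how such a shuffle might look.

```r
# Hypothetical helper: reorders rows using a random permutation.
sketch_randomize_rows <- function(data) {
  data[sample(nrow(data)), , drop = FALSE]
}

set.seed(42)
toy <- data.frame(id = 1:10, value = (1:10)^2)
shuffled <- sketch_randomize_rows(toy)
head(shuffled)
```

Only the row order changes; every row keeps its original values.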
randomized_data <- randomize_rows(bkg_data)
head(randomized_data)
#> # A tibble: 6 × 4
#> x1 x2 x3 x4
#> <dbl> <dbl> <dbl> <dbl>
#> 1 -0.0180 0.0876 0.873 2.80
#> 2 -2.95 -2.82 1.29 -1.15
#> 3 -2.09 -0.593 -1.25 2.25
#> 4 3.35 -2.89 -3.40 -1.12
#> 5 -0.660 -0.817 -0.722 -3.07
#> 6 -0.280 0.109 -0.341 1.88
relocate_clusters() allows users to translate clusters
in any dimension(s). This is achieved by centering each cluster
(subtracting its mean) and then adding a translation vector from a
provided matrix (vert_mat).
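The center-then-translate step can be sketched in base R as follows. The helper sketch_relocate() is hypothetical, and it assumes that the cluster labels are the integers 1, 2, ... and that row i of vert_mat is the translation vector for cluster i. Unlike the package output shown in this section, it does not reorder rows.

```r
# Hypothetical helper: centers each cluster at the origin, then shifts it
# by the matching row of vert_mat (assumed: row i belongs to cluster i).
sketch_relocate <- function(data, vert_mat) {
  coord_cols <- setdiff(names(data), "cluster")
  for (cl in unique(data$cluster)) {
    rows <- data$cluster == cl
    block <- as.matrix(data[rows, coord_cols])
    centered <- sweep(block, 2, colMeans(block))        # subtract cluster mean
    data[rows, coord_cols] <- sweep(centered, 2, vert_mat[cl, ], "+")
  }
  data
}

set.seed(1)
toy <- data.frame(x1 = rnorm(6), x2 = rnorm(6), cluster = rep(1:2, each = 3))
shift <- matrix(c(5, 0,
                  0, 5), nrow = 2, byrow = TRUE)
moved <- sketch_relocate(toy, shift)
```

After relocation, the mean of each cluster equals its row of the translation matrix.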
df <- tibble::tibble(
  x1 = rnorm(12),
  x2 = rnorm(12),
  x3 = rnorm(12),
  x4 = rnorm(12),
  cluster = rep(1:3, each = 4)
)
vert_mat <- matrix(c(
  5, 0, 0, 0,
  0, 5, 0, 0,
  0, 0, 5, 0
), nrow = 3, byrow = TRUE)
relocated_df <- relocate_clusters(df, vert_mat)
head(relocated_df)
#> # A tibble: 6 × 5
#> x1 x2 x3 x4 cluster
#> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 0.291 5.72 -1.80 2.04 2
#> 2 0.0748 2.97 1.58 -0.0442 2
#> 3 5.49 0.693 -1.43 -0.350 1
#> 4 4.68 -1.02 0.504 0.661 1
#> 5 -0.609 -0.286 5.18 -0.641 3
#> 6 -1.09 0.273 5.73 -0.156 3
The gen_rotation() function creates a rotation matrix in
high-dimensional space for given planes and angles.
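A rotation in a coordinate plane can be written as an identity matrix with a 2 × 2 rotation block placed at the two chosen axes, and several such rotations can be combined by matrix multiplication. The sketch below uses a hypothetical helper, sketch_rotation(); it assumes that angles are given in degrees and that the plane rotations are multiplied in list order.

```r
# Hypothetical helper: composes plane rotations into one p x p matrix.
# Assumptions: angles are in degrees; rotations are multiplied in list order.
sketch_rotation <- function(p, planes_angles) {
  rot <- diag(p)
  for (pa in planes_angles) {
    i <- pa$plane[1]
    j <- pa$plane[2]
    theta <- pa$angle * pi / 180
    plane_rot <- diag(p)
    plane_rot[i, i] <- cos(theta)
    plane_rot[i, j] <- -sin(theta)
    plane_rot[j, i] <- sin(theta)
    plane_rot[j, j] <- cos(theta)
    rot <- rot %*% plane_rot
  }
  rot
}

rot_sketch <- sketch_rotation(4, list(
  list(plane = c(1, 2), angle = 60),
  list(plane = c(3, 4), angle = 90)
))
# A rotation matrix is orthogonal, so t(R) %*% R should be the identity.
max(abs(crossprod(rot_sketch) - diag(4)))
```

When the planes share no axes, as here, the order of multiplication does not affect the result.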
rotations_4d <- list(
  list(plane = c(1, 2), angle = 60),
  list(plane = c(3, 4), angle = 90)
)
rot_mat <- gen_rotation(p = 4, planes_angles = rotations_4d)
rot_mat
#> [,1] [,2] [,3] [,4]
#> [1,] 0.5000000 -0.8660254 0.000000e+00 0.000000e+00
#> [2,] 0.8660254 0.5000000 0.000000e+00 0.000000e+00
#> [3,] 0.0000000 0.0000000 6.123234e-17 -1.000000e+00
#> [4,] 0.0000000 0.0000000 1.000000e+00 6.123234e-17
When combining clusters or transforming data geometrically,
magnitudes can differ drastically. The normalize_data()
function rescales the entire dataset to fit within \([-1, 1]\) based on
its maximum absolute value.
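One way to rescale by the maximum absolute value is to divide every entry by the largest absolute entry in the whole dataset. The helper sketch_normalize() below is hypothetical and assumes all columns are numeric.

```r
# Hypothetical helper: divides every value by the single largest absolute
# value in the data, so all entries fall within [-1, 1].
sketch_normalize <- function(data) {
  mat <- as.matrix(data)
  as.data.frame(mat / max(abs(mat)))
}

set.seed(7)
raw <- data.frame(x1 = rnorm(20, sd = 5), x2 = rnorm(20, sd = 0.5))
scaled <- sketch_normalize(raw)
range(as.matrix(scaled))
```

Using one global scale factor, rather than one per column, keeps the relative spread of the dimensions unchanged.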
norm_data <- normalize_data(bkg_data)
head(norm_data)
#> x1 x2 x3 x4
#> 1 -0.11003546 0.05185252 0.07819091 -0.02712044
#> 2 0.45089188 -0.18533577 -0.21766661 0.21049062
#> 3 0.05048517 -0.17688512 -0.06323987 0.12099897
#> 4 0.07962656 -0.02357571 -0.36309332 0.04033840
#> 5 0.03355515 0.17711161 0.31003192 0.35059034
#> 6 -0.45328952 0.40926876 0.24924429 0.20465305
To place clusters in different positions, gen_clustloc()
generates points in a simplex-like arrangement, so that each
cluster center is as close to equidistant from the others as
possible.
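For intuition, exactly equidistant centers exist when the number of clusters is one more than the number of dimensions: they are the vertices of a regular simplex. The sketch below builds such vertices with a hypothetical helper, sketch_simplex_centers(); it covers only the case p = k - 1 and is not the method used by gen_clustloc(), which is not described here.

```r
# Hypothetical helper: k equidistant points in k - 1 dimensions.
# Takes the k standard basis vectors, centers them at the origin, and
# expresses them in the (k - 1)-dimensional subspace they span.
sketch_simplex_centers <- function(k) {
  basis <- diag(k)
  centered <- sweep(basis, 2, colMeans(basis))
  decomp <- svd(centered)
  keep <- seq_len(k - 1)
  coords <- decomp$u[, keep, drop = FALSE] %*% diag(decomp$d[keep], nrow = k - 1)
  t(coords)  # one column per cluster center, as in the printed output
}

centers_sketch <- sketch_simplex_centers(5)
dist(t(centers_sketch))  # all pairwise distances should be equal
```

When k exceeds p + 1, exact equidistance is no longer possible, which is why the arrangement is described as only simplex-like.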
centers <- gen_clustloc(p = 4, k = 5)
head(centers)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0.7446399 -1.9779807 0.4843451 0.38800876 0.3609869
#> [2,] 1.6147255 0.8596287 -0.5623052 -0.70052032 -1.2115286
#> [3,] -1.1242003 1.2844618 1.0435773 -0.01292967 -1.1909091
#> [4,] -1.2368310 0.7327734 -0.1596049 -0.16166881 0.8253313
Two helper functions, gen_nproduct() and
gen_nsum(), generate numeric vectors of positive integers
whose product approximates, or whose sum equals, a user-specified
target, respectively.
The function gen_nsum(n, k) divides a total sum
n into k positive integers. It first assigns
an equal base value to each element and then randomly distributes any
remainder, ensuring the elements sum exactly to n.
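The base-plus-remainder idea can be sketched as below. The helper sketch_nsum() is hypothetical; it assumes n >= k and that the remainder is handed out one unit at a time to randomly chosen elements, which is one possible reading of "randomly distributes".

```r
# Hypothetical helper: splits n into k positive integers that sum to n.
# Assumption: each leftover unit goes to a different, randomly chosen element.
sketch_nsum <- function(n, k) {
  stopifnot(n >= k)
  parts <- rep(n %/% k, k)          # equal base value
  extra <- sample.int(k, n %% k)    # positions that receive one extra unit
  parts[extra] <- parts[extra] + 1
  parts
}

set.seed(10)
parts_sketch <- sketch_nsum(103, 5)
sum(parts_sketch)
```

Under this scheme the elements differ from each other by at most one.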
The function gen_nproduct(n, p) aims to produce
p positive integers whose product is approximately
n. It starts with all elements equal to the rounded \(p\)th root of n and
iteratively adjusts elements up or down in a randomized manner until the
product is within a small tolerance of n. This accommodates
the fact that exact integer solutions for a given product are often
impossible.
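A minimal sketch of this adjust-until-close loop follows. The helper sketch_nproduct(), the relative tolerance tol, and the iteration cap max_iter are all assumptions made for illustration; the package's actual tolerance and adjustment rule are not specified here.

```r
# Hypothetical helper: finds p positive integers whose product is near n.
# Assumptions: relative tolerance `tol`; `max_iter` guards against cases
# where no product lands inside the tolerance.
sketch_nproduct <- function(n, p, tol = 0.05, max_iter = 1000) {
  parts <- rep(max(1, round(n^(1 / p))), p)   # start at the rounded p-th root
  for (iter in seq_len(max_iter)) {
    if (abs(prod(parts) - n) / n <= tol) break
    j <- sample.int(p, 1)                     # pick one element at random
    if (prod(parts) < n) {
      parts[j] <- parts[j] + 1
    } else if (parts[j] > 1) {
      parts[j] <- parts[j] - 1                # never drop below 1
    }
  }
  parts
}

set.seed(3)
factors_sketch <- sketch_nproduct(500, 3)
prod(factors_sketch)
```

For n = 500 and p = 3 the starting point is already within the assumed tolerance, so the result illustrates that the product is close to, but not necessarily equal to, the target.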