TECHNIQUES TO PRODUCE AND EVALUATE REALISTIC MULTIVARIATE SYNTHETIC DATA

Blog Article

Abstract: Data modeling requires a sufficient sample size for reproducibility. A small sample size can inhibit model evaluation. A synthetic data generation technique addressing this small sample size problem is evaluated: from the space of arbitrarily distributed samples, a subgroup (class) has a latent multivariate normal characteristic; synthetic data can be generated from this class with univariate kernel density estimation (KDE); and synthetic samples are statistically like their respective samples.

Three samples (n = 667) were investigated with 10 input variables (X). KDE was used to augment the sample size in X. Maps produced univariate normal variables in Y.
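The augmentation and mapping steps can be illustrated for a single variable. This is a minimal sketch under stated assumptions, not the study's implementation: the Gaussian kernel, the Silverman rule-of-thumb bandwidth, the rank-based map to normal scores, and the function names `kde_sample` and `to_normal_scores` are all illustrative choices that the text above does not specify.

```python
import numpy as np
from statistics import NormalDist


def kde_sample(x, size, rng):
    """Draw `size` values from a Gaussian KDE fitted to the 1-D sample x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Silverman's rule-of-thumb bandwidth (assumed here, not from the article).
    q75, q25 = np.percentile(x, [75, 25])
    sigma = min(x.std(ddof=1), (q75 - q25) / 1.34)
    h = 0.9 * sigma * n ** (-0.2)
    # Sampling from a Gaussian KDE: pick an observation, add kernel noise.
    centers = rng.choice(x, size=size, replace=True)
    return centers + h * rng.standard_normal(size)


def to_normal_scores(x):
    """Map a 1-D sample to approximately standard normal values via its ranks."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x)) + 1      # ranks 1..n
    u = ranks / (x.size + 1)                   # strictly inside (0, 1)
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(p) for p in u])


rng = np.random.default_rng(0)
x = rng.lognormal(size=667)          # stand-in for one skewed input variable
x_aug = kde_sample(x, 5000, rng)     # augmented sample in X
y = to_normal_scores(x_aug)          # corresponding variable in Y
```

Applied column by column, a map of this kind yields one approximately normal variable in Y per input variable in X; whether the resulting joint distribution is multivariate normal is a separate question.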

Principal component analysis in Y produced uncorrelated variables in T, where the probability density functions were approximated as normal and characterized; synthetic data was generated with normally distributed univariate random variables in T. Reversing each step produced synthetic data in Y and X. All samples were approximately multivariate normal in Y, permitting the generation of synthetic data.
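The PCA step and its reversal can be sketched as follows. This is an illustration under assumptions rather than the authors' code: the input `Y` is simulated here, the helper name `pca_synthesize` is hypothetical, and each component in T is modeled as a zero-mean normal whose variance is the corresponding eigenvalue of the sample covariance.

```python
import numpy as np


def pca_synthesize(Y, n_synth, rng):
    """Generate synthetic rows in Y-space from independent normals in T-space."""
    Y = np.asarray(Y, dtype=float)
    mu = Y.mean(axis=0)
    cov = np.cov(Y, rowvar=False)
    # Eigen-decomposition of the covariance: columns of P are the principal axes.
    eigvals, P = np.linalg.eigh(cov)
    eigvals = np.clip(eigvals, 0.0, None)   # guard against tiny negative values
    # T = (Y - mu) @ P would have uncorrelated columns with variances eigvals,
    # so draw each synthetic component as an independent univariate normal.
    T_synth = rng.standard_normal((n_synth, Y.shape[1])) * np.sqrt(eigvals)
    # Reverse the rotation and centering to return to Y.
    return T_synth @ P.T + mu


rng = np.random.default_rng(1)
true_cov = np.array([[1.0, 0.6, 0.3],
                     [0.6, 1.0, 0.5],
                     [0.3, 0.5, 1.0]])
Y = rng.multivariate_normal(np.zeros(3), true_cov, size=667)  # simulated Y
Y_synth = pca_synthesize(Y, 20000, rng)
```

Returning from Y to X would then apply the inverse of each univariate map, for example by reading quantiles of the augmented X sample at the normal CDF values of the synthetic Y columns.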

Probability density function and covariance comparisons showed similarity between samples and synthetic samples. A class of samples has a latent normal characteristic. For such samples, this approach offers a solution to the small sample size problem.
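One way such comparisons could be carried out is sketched below. The specific measures are assumptions, since the text does not name the tests used: a two-sample Kolmogorov-Smirnov statistic for each variable's distribution and the largest absolute difference between covariance matrices. The function names and the simulated data are hypothetical.

```python
import numpy as np


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def max_cov_difference(A, B):
    """Largest absolute entrywise difference between two sample covariances."""
    return float(np.max(np.abs(np.cov(A, rowvar=False) - np.cov(B, rowvar=False))))


rng = np.random.default_rng(2)
sample = rng.standard_normal((667, 2))        # stand-in for an observed sample
synthetic = rng.standard_normal((5000, 2))    # stand-in for its synthetic sample
shifted = synthetic + 1.0                     # deliberately mismatched data

ks_same = ks_statistic(sample[:, 0], synthetic[:, 0])
ks_shifted = ks_statistic(sample[:, 0], shifted[:, 0])
cov_gap = max_cov_difference(sample, synthetic)
```

A small statistic for every variable together with a small covariance gap is consistent with the synthetic sample resembling the original; the shifted data shows how a mismatch in location inflates the KS statistic.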

Further studies are required to understand this latent class.
