Daniel Sousa
Adv. Artif. Intell. Mach. Learn., 1 (3):203-220
Daniel Sousa: Department of Geography, San Diego State University, San Diego, CA 92182, USA.
DOI: 10.54364/AAIML.2021.1113
Article History: Received on: 09-Dec-21, Accepted on: 18-Dec-21, Published on: 25-Dec-21
Corresponding Author: Daniel Sousa
Email: DAN.SOUSA@SDSU.EDU
Citation: Daniel Sousa and Christopher Small (2021). Joint Characterization of Multiscale Information in High Dimensional Data. Adv. Artif. Intell. Mach. Learn., 1 (3):203-220
High-dimensional feature spaces can contain information on multiple scales. At global scales, spanning an entire feature space, covariance structure among dimensions can determine topology and intrinsic dimensionality. In addition, local-scale information can be captured by the structure of low-dimensional manifolds embedded within the high-dimensional feature space. Such manifolds may not be easily resolved by the global covariance structure. Analysis tools that preferentially operate at one scale can therefore fail to capture all of the information present in data with cross-scale complexity. We propose a multiscale joint characterization approach designed to exploit synergies between global and local approaches to dimensionality reduction. We illustrate this approach using Principal Components Analysis (PCA) to characterize global variance structure and t-distributed Stochastic Neighbor Embedding (t-SNE) to characterize local manifold structure, and we also compare against a second approach for characterizing local manifold structure, Laplacian Eigenmaps (LE). Using both low-dimensional synthetic images and high-dimensional imaging spectroscopy data, we show that joint characterization can detect and isolate signals that are not evident from either algorithm alone. Broadly, t-SNE is effective at rendering a randomly oriented low-dimensional map of local manifolds (clustering), and PCA renders this map interpretable by providing global, physically meaningful structure. LE provides additional useful context by reinforcing and refining the feature space topology found by PCA: it simplifies structural interpretation, clarifies endmember identification, and highlights potential new endmembers that are not evident from the other methods alone. The approach is demonstrated on hyperspectral imagery of agriculture, where it resolves crop-specific, field-scale differences in vegetation reflectance.
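The workflow summarized above can be sketched in Python with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the synthetic linear-mixture cube, the neighborhood sizes, and the function names `make_synthetic_cube` and `joint_characterization` are hypothetical, and Laplacian Eigenmaps is computed here with scikit-learn's `SpectralEmbedding`. Each pixel of a (rows, cols, bands) image cube becomes one point in a bands-dimensional feature space, and the same points are then embedded three ways: PCA for global variance structure, and t-SNE and LE for local manifold structure.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, SpectralEmbedding


def make_synthetic_cube(rows=20, cols=20, bands=50, n_endmembers=3, seed=0):
    """Hypothetical test data (not the paper's synthetic images): each pixel is
    a random convex mixture of a few smooth 'endmember' spectra plus noise."""
    rng = np.random.default_rng(seed)
    wavelengths = np.linspace(0.0, 1.0, bands)
    # Smooth Gaussian-shaped spectra centered at different wavelengths
    centers = np.linspace(0.2, 0.8, n_endmembers)
    endmembers = np.exp(-((wavelengths[None, :] - centers[:, None]) ** 2) / 0.02)
    # Dirichlet draws give non-negative abundances that sum to one per pixel
    abundances = rng.dirichlet(np.ones(n_endmembers), size=rows * cols)
    pixels = abundances @ endmembers
    pixels += rng.normal(0.0, 0.01, size=pixels.shape)
    return pixels.reshape(rows, cols, bands)


def joint_characterization(cube, n_pcs=3, n_neighbors=15, seed=0):
    """Embed the same pixels with a global method (PCA) and two local methods
    (t-SNE and Laplacian Eigenmaps) so the results can be compared jointly."""
    rows, cols, bands = cube.shape
    X = cube.reshape(rows * cols, bands)  # one row per pixel, one column per band

    # Global variance structure: leading principal components
    pca = PCA(n_components=n_pcs)
    pcs = pca.fit_transform(X)

    # Local manifold structure: 2-D t-SNE map (its orientation is arbitrary)
    tsne = TSNE(n_components=2, perplexity=30, init="pca",
                random_state=seed).fit_transform(X)

    # Laplacian Eigenmaps: spectral embedding of a k-nearest-neighbor graph
    le = SpectralEmbedding(n_components=2, affinity="nearest_neighbors",
                           n_neighbors=n_neighbors,
                           random_state=seed).fit_transform(X)

    return {
        "pcs": pcs,
        "explained_variance_ratio": pca.explained_variance_ratio_,
        "tsne": tsne,
        "le": le,
    }
```

Calling `res = joint_characterization(make_synthetic_cube())` returns per-pixel coordinates from all three methods. One way to compare them jointly (an assumption about presentation, not a prescribed step) is to scatter `res["tsne"]` colored by `res["pcs"][:, 0]`, so that the arbitrarily oriented t-SNE clusters inherit the physically interpretable ordering of the global principal components; the same coloring can be applied to `res["le"]`.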
The fundamental premise of joint characterization could readily be extended to other high-dimensional datasets, including image time series and non-image data. The approach may prove particularly useful for other geospatial data, since both robust manifold structure (due to spatial autocorrelation) and physically interpretable global variance structure (due to physical generative processes) are frequently present.