zezhishao/DailyArXiv


Daily Papers

The project automatically fetches the latest papers from arXiv based on keywords.

The subheadings in the README file represent the search keywords.

Only the most recent articles for each keyword are retained, up to a maximum of 100 papers.

You can click the 'Watch' button to receive daily email notifications.

Last update: 2024-11-28

Time Series

CatNet: Effective FDR Control in LSTM with Gaussian Mirrors and SHAP Feature Importance (2024-11-26)

We introduce CatNet, an algorithm that effectively controls the False Discovery Rate (FDR) and selects significant features in LSTM with the Gaussian Mirror (GM) method. To evaluate the feature importance of LSTM in time series, we introduce a vector of the derivative of the SHapley Additive exPlanations (SHAP) to measure feature importance. We also propose a new kernel-based dependence measure to avoid multicollinearity in the GM algorithm and to achieve robust feature selection with controlled FDR. We use simulated data to evaluate CatNet's performance in both linear models and LSTM models with different link functions. The algorithm effectively controls the FDR while maintaining a high statistical power in all cases. We also evaluate the algorithm's performance in different low-dimensional and high-dimensional cases, demonstrating its robustness in various input dimensions. To evaluate CatNet's performance in real-world applications, we construct a multi-factor investment portfolio to forecast the prices of S&P 500 index components. The results demonstrate that our model achieves superior predictive accuracy compared to traditional LSTM models without feature selection and FDR control. Additionally, CatNet effectively captures common market-driving features, which supports informed decision-making in financial markets by enhancing the interpretability of predictions. Our study integrates the Gaussian Mirror algorithm with LSTM models for the first time and introduces SHAP values as a new feature importance metric for FDR control methods, marking a significant advancement in feature selection and error control for neural networks.
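
As background for the FDR-control step described above, here is a minimal sketch of a generic mirror-statistic selection rule of the kind used by Gaussian-Mirror and data-splitting methods; it is not the paper's CatNet algorithm, and the statistics, threshold search, and example data are illustrative assumptions.

```python
# Generic mirror-statistic FDR selection rule (illustrative, not the paper's CatNet).
# Assumes one mirror statistic per feature: large positive values indicate signal,
# while null features are approximately symmetric around zero.
import numpy as np

def select_features(mirror_stats, fdr_level=0.1):
    """Return indices of features selected at the nominal FDR level."""
    m = np.asarray(mirror_stats, dtype=float)
    for t in np.sort(np.abs(m[m != 0])):              # candidate thresholds
        # Strongly negative statistics estimate the false discoveries among positives.
        fdp_hat = np.sum(m <= -t) / max(np.sum(m >= t), 1)
        if fdp_hat <= fdr_level:
            return np.where(m >= t)[0]                 # smallest threshold that works
    return np.array([], dtype=int)

# Toy example: 5 signal features with large positive statistics among 50 nulls.
rng = np.random.default_rng(0)
stats = np.concatenate([rng.normal(0, 1, 50), rng.normal(8, 1, 5)])
print(select_features(stats, fdr_level=0.1))
```

The rule exploits the approximate sign symmetry of null statistics: the count of strongly negative values estimates how many nulls slipped past the positive threshold.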

Evolving Markov Chains: Unsupervised Mode Discovery and Recognition from Data Streams (2024-11-26)

Markov chains are simple yet powerful mathematical structures to model temporally dependent processes. They generally assume stationary data, i.e., fixed transition probabilities between observations/states. However, live, real-world processes, like in the context of activity tracking, biological time series, or industrial monitoring, often switch behavior over time. Such behavior switches can be modeled as transitions between higher-level \emph{modes} (e.g., running, walking, etc.). Yet all modes are usually not previously known, often exhibit vastly differing transition probabilities, and can switch unpredictably. Thus, to track behavior changes of live, real-world processes, this study proposes an online and efficient method to construct Evolving Markov chains (EMCs). EMCs adaptively track transition probabilities, automatically discover modes, and detect mode switches in an online manner. In contrast to previous work, EMCs are of arbitrary order, the proposed update scheme does not rely on tracking windows, only updates the relevant region of the probability tensor, and enjoys geometric convergence of the expected estimates. Our evaluation of synthetic data and real-world applications on human activity recognition, electric motor condition monitoring, and eye-state recognition from electroencephalography (EEG) measurements illustrates the versatility of the approach and points to the potential of EMCs to efficiently track, model, and understand live, real-world processes.
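
As a small illustration of tracking time-varying transition probabilities online, the sketch below uses an exponential-forgetting update that touches only the row of the observed previous state. It is a minimal example in the spirit of the abstract, not the paper's EMC update scheme; the state count and forgetting factor are assumptions.

```python
# Online tracking of a first-order transition matrix with exponential forgetting
# (illustrative sketch; the paper's EMC update is more general, e.g. arbitrary order).
import numpy as np

class OnlineTransitionTracker:
    def __init__(self, n_states, gamma=0.99):
        self.gamma = gamma                                       # forgetting factor in (0, 1)
        self.P = np.full((n_states, n_states), 1.0 / n_states)   # start uniform

    def update(self, prev_state, next_state):
        row = self.P[prev_state]                 # only the row of the observed state changes
        row *= self.gamma                        # decay old evidence
        row[next_state] += 1.0 - self.gamma      # add the new observation
        row /= row.sum()                         # safeguard: keep a valid probability row

tracker = OnlineTransitionTracker(n_states=3)
stream = [0, 1, 1, 2, 0, 1, 2, 2, 2, 0]
for a, b in zip(stream[:-1], stream[1:]):
    tracker.update(a, b)
print(np.round(tracker.P, 3))
```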

Comment: 20 pages, 8 figures

Time-Series Forecasting in Smart Manufacturing Systems: An Experimental Evaluation of the State-of-the-art Algorithms (2024-11-26)

Time series forecasting (TSF) is growing in various domains, including manufacturing. Although numerous TSF algorithms have been developed recently, a systematic validation and evaluation of these algorithms, which holds substantial value for researchers and practitioners, is still missing. This study aims to fill this gap by evaluating state-of-the-art (SoTA) TSF algorithms on thirteen manufacturing datasets, focusing on their applicability to manufacturing. Each algorithm was selected based on its TSF category to ensure a representative set of algorithms. The evaluation includes different scenarios to evaluate the models using two problem categories and two forecasting horizons. To evaluate the performance, the weighted absolute percentage error (WAPE) was calculated, and additional post hoc analyses were conducted to assess the significance of observed differences. Only algorithms with code available from open-source libraries were utilized, and no hyperparameter tuning was done. This allowed us to evaluate the algorithms as "out-of-the-box" solutions that can be easily implemented, ensuring their usability in manufacturing by practitioners with limited technical knowledge. This, in turn, facilitates the adoption of these techniques in smart manufacturing systems. Based on the results, transformer- and MLP-based architectures demonstrated the best performance, with MLP-based architectures winning the most scenarios. For univariate TSF, PatchTST emerged as the most robust, particularly for long-term horizons, while for multivariate problems, MLP-based architectures like N-HITS and TiDE showed superior results. The study revealed that simpler algorithms like XGBoost could outperform complex algorithms in certain tasks. These findings challenge the assumption that more sophisticated models produce better results. Additionally, the research highlighted the importance of computational resource considerations, showing variations in runtime and memory usage across different algorithms.
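
For reference, WAPE as commonly defined is the sum of absolute errors divided by the sum of absolute actuals; a minimal sketch follows (the study's exact implementation may differ).

```python
# Weighted Absolute Percentage Error (WAPE), standard definition.
import numpy as np

def wape(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.abs(y_true - y_pred).sum() / np.abs(y_true).sum()

print(wape([100, 200, 300], [110, 190, 330]))  # 50 / 600 = 0.0833...
```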

From RNNs to Foundation Models: An Empirical Study on Commercial Building Energy Consumption (2024-11-26)

Accurate short-term energy consumption forecasting for commercial buildings is crucial for smart grid operations. While smart meters and deep learning models enable forecasting using past data from multiple buildings, data heterogeneity from diverse buildings can reduce model performance. The impact of increasing dataset heterogeneity in time series forecasting, while keeping size and model constant, is understudied. We tackle this issue using the ComStock dataset, which provides synthetic energy consumption data for U.S. commercial buildings. Two curated subsets, identical in size and region but differing in building type diversity, are used to assess the performance of various time series forecasting models, including fine-tuned open-source foundation models (FMs). The results show that dataset heterogeneity and model architecture have a greater impact on post-training forecasting performance than the parameter count. Moreover, despite the higher computational cost, fine-tuned FMs demonstrate competitive performance compared to base models trained from scratch.

Comment: NeurIPS 2024 Workshop on Time Series in the Age of Large Models

Unveiling the Secrets: How Masking Strategies Shape Time Series Imputation (2024-11-26)

Time series imputation is a critical challenge in data mining, particularly in domains like healthcare and environmental monitoring, where missing data can compromise analytical outcomes. This study investigates the influence of diverse masking strategies, normalization timing, and missingness patterns on the performance of eleven state-of-the-art imputation models across three diverse datasets. Specifically, we evaluate the effects of pre-masking versus in-mini-batch masking, augmentation versus overlaying of artificial missingness, and pre-normalization versus post-normalization. Our findings reveal that masking strategies profoundly affect imputation accuracy, with dynamic masking providing robust augmentation benefits and overlay masking better simulating real-world missingness patterns. Sophisticated models, such as CSDI, exhibited sensitivity to preprocessing configurations, while simpler models like BRITS delivered consistent and efficient performance. We highlight the importance of aligning preprocessing pipelines and masking strategies with dataset characteristics to improve robustness under diverse conditions, including high missing rates. This study provides actionable insights for designing imputation pipelines and underscores the need for transparent and comprehensive experimental designs.
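
A minimal sketch of the two masking placements compared above, with illustrative shapes and missing rates (not taken from the paper): pre-masking fixes one artificial-missingness pattern before training, while in-mini-batch (dynamic) masking draws a fresh pattern for every batch.

```python
# Pre-masking vs. in-mini-batch (dynamic) masking for imputation experiments (sketch).
import numpy as np

rng = np.random.default_rng(42)

def pre_mask(data, rate=0.2):
    """Mask once, up front; the same entries stay missing for all epochs."""
    mask = rng.random(data.shape) < rate
    return np.where(mask, np.nan, data)

def batch_mask(batch, rate=0.2):
    """Mask inside the training loop; every call hides a different random subset."""
    mask = rng.random(batch.shape) < rate
    return np.where(mask, np.nan, batch)

data = rng.normal(size=(8, 24, 3))       # (series, time steps, variables), illustrative
static_view = pre_mask(data)             # fixed artificial missingness
for epoch in range(2):
    dynamic_view = batch_mask(data[:4])  # fresh mask per mini-batch / epoch
```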

MFF-FTNet: Multi-scale Feature Fusion across Frequency and Temporal Domains for Time Series Forecasting (2024-11-26)

Time series forecasting is crucial in many fields, yet current deep learning models struggle with noise, data sparsity, and capturing complex multi-scale patterns. This paper presents MFF-FTNet, a novel framework addressing these challenges by combining contrastive learning with multi-scale feature extraction across both frequency and time domains. MFF-FTNet introduces an adaptive noise augmentation strategy that adjusts scaling and shifting factors based on the statistical properties of the original time series data, enhancing model resilience to noise. The architecture is built around two complementary modules: a Frequency-Aware Contrastive Module (FACM) that refines spectral representations through frequency selection and contrastive learning, and a Complementary Time Domain Contrastive Module (CTCM) that captures both short- and long-term dependencies using multi-scale convolutions and feature fusion. A unified feature representation strategy enables robust contrastive learning across domains, creating an enriched framework for accurate forecasting. Extensive experiments on five real-world datasets demonstrate that MFF-FTNet significantly outperforms state-of-the-art models, achieving a 7.7% MSE improvement on multivariate tasks. These findings underscore MFF-FTNet's effectiveness in modeling complex temporal patterns and managing noise and sparsity, providing a comprehensive solution for both long- and short-term forecasting.

Confidence surfaces for the mean of locally stationary functional time series (2024-11-26)

The problem of constructing a simultaneous confidence surface for the 2-dimensional mean function of a non-stationary functional time series is challenging as these bands can not be built on classical limit theory for the maximum absolute deviation between an estimate and the time-dependent regression function. In this paper, we propose a new bootstrap methodology to construct such a region. Our approach is based on a Gaussian approximation for the maximum norm of sparse high-dimensional vectors approximating the maximum absolute deviation which is suitable for nonparametric inference of high-dimensional time series. The elimination of the zero entries produces (besides the time dependence) additional dependencies such that the "classical" multiplier bootstrap is not applicable. To solve this issue we develop a novel multiplier bootstrap, where blocks of the coordinates of the vectors are multiplied with random variables, which mimic the specific structure between the vectors appearing in the Gaussian approximation. We prove the validity of our approach by asymptotic theory, demonstrate good finite sample properties by means of a simulation study and illustrate its applicability by analyzing a data example.

Disentangled Interpretable Representation for Efficient Long-term Time Series Forecasting (2024-11-26)

Industry 5.0 introduces new challenges for Long-term Time Series Forecasting (LTSF), characterized by high-dimensional, high-resolution data and high-stakes application scenarios. Against this backdrop, developing efficient and interpretable models for LTSF becomes a key challenge. Existing deep learning and linear models often suffer from excessive parameter complexity and lack intuitive interpretability. To address these issues, we propose DiPE-Linear, a Disentangled interpretable Parameter-Efficient Linear network. DiPE-Linear incorporates three temporal components: Static Frequential Attention (SFA), Static Temporal Attention (STA), and Independent Frequential Mapping (IFM). These components alternate between learning in the frequency and time domains to achieve disentangled interpretability. The decomposed model structure reduces parameter complexity from quadratic in fully connected networks (FCs) to linear and computational complexity from quadratic to log-linear. Additionally, a Low-Rank Weight Sharing policy enhances the model's ability to handle multivariate series. Despite operating within a subspace of FCs with limited expressive capacity, DiPE-Linear demonstrates comparable or superior performance to both FCs and nonlinear models across multiple open-source and real-world LTSF datasets, validating the effectiveness of its sophisticatedly designed structure. The combination of efficiency, accuracy, and interpretability makes DiPE-Linear a strong candidate for advancing LTSF in both research and real-world applications. The source code is available at https://github.com/wintertee/DiPE-Linear.

Comment: This work is submitted to IEEE International Conference on Data Engineering (ICDE) 2025

GraphSubDetector: Time Series Subsequence Anomaly Detection via Density-Aware Adaptive Graph Neural Network (2024-11-26)

Time series subsequence anomaly detection is an important task in a large variety of real-world applications ranging from health monitoring to AIOps, and is challenging due to the following reasons: 1) how to effectively learn complex dynamics and dependencies in time series; 2) diverse and complicated anomalous subsequences as well as the inherent variance and noise of normal patterns; 3) how to determine the proper subsequence length for effective detection, which is a required parameter for many existing algorithms. In this paper, we present a novel approach to subsequence anomaly detection, namely GraphSubDetector. First, it adaptively learns the appropriate subsequence length with a length selection mechanism that highlights the characteristics of both normal and anomalous patterns. Second, we propose a density-aware adaptive graph neural network (DAGNN), which can generate representations that are more robust against the variance of normal data for anomaly detection by message passing between subsequences. The experimental results demonstrate the effectiveness of the proposed algorithm, which achieves superior performance on multiple time series anomaly benchmark datasets compared to state-of-the-art algorithms.

Conformalised Conditional Normalising Flows for Joint Prediction Regions in time series (2024-11-26)

Conformal Prediction offers a powerful framework for quantifying uncertainty in machine learning models, enabling the construction of prediction sets with finite-sample validity guarantees. While easily adaptable to non-probabilistic models, applying conformal prediction to probabilistic generative models, such as Normalising Flows, is not straightforward. This work proposes a novel method to conformalise conditional normalising flows, specifically addressing the problem of obtaining prediction regions for multi-step time series forecasting. Our approach leverages the flexibility of normalising flows to generate potentially disjoint prediction regions, leading to improved predictive efficiency in the presence of potential multimodal predictive distributions.
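
For background, the basic split-conformal step underlying such methods looks roughly as follows. The sketch conformalises a plain point forecaster with absolute-error scores and illustrative data; the paper instead conformalises conditional normalising flows to obtain possibly disjoint regions.

```python
# Split conformal prediction interval for a point forecaster (background sketch only).
import numpy as np

def conformal_interval(cal_true, cal_pred, test_pred, alpha=0.1):
    """Return (lower, upper) bounds for the test predictions."""
    cal_true, cal_pred = np.asarray(cal_true, float), np.asarray(cal_pred, float)
    scores = np.abs(cal_true - cal_pred)                     # nonconformity scores
    n = len(scores)
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)   # finite-sample correction
    q = np.quantile(scores, q_level, method="higher")
    test_pred = np.asarray(test_pred, float)
    return test_pred - q, test_pred + q

rng = np.random.default_rng(0)
y_cal = rng.normal(size=200)
yhat_cal = y_cal + rng.normal(0, 0.5, 200)                   # imperfect calibration preds
lower, upper = conformal_interval(y_cal, yhat_cal, test_pred=np.array([0.0, 1.0, 2.0]))
print(lower, upper)
```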

Comment: Workshop on Bayesian Decision-making and Uncertainty, 38th Conference on Neural Information Processing Systems (NeurIPS 2024)

Achieving Privacy Utility Balance for Multivariate Time Series Data (2024-11-26)

Utility-preserving data privatization is of utmost importance for data-producing agencies. The popular noise-addition privacy mechanism distorts autocorrelation patterns in time series data, thereby marring utility; in response, McElroy et al. (2023) introduced all-pass filtering (FLIP) as a utility-preserving time series data privatization method. Adapting this concept to multivariate data is more complex, and in this paper we propose a multivariate all-pass (MAP) filtering method, employing an optimization algorithm to achieve the best balance between data utility and privacy protection. To test the effectiveness of our approach, we apply MAP filtering to both simulated and real data, sourced from the U.S. Census Bureau's Quarterly Workforce Indicator (QWI) dataset.

CMAViT: Integrating Climate, Management, and Remote Sensing Data for Crop Yield Estimation with Multimodel Vision Transformers (2024-11-25)

Crop yield prediction is essential for agricultural planning but remains challenging due to the complex interactions between weather, climate, and management practices. To address these challenges, we introduce a deep learning-based multi-model called Climate-Management Aware Vision Transformer (CMAViT), designed for pixel-level vineyard yield predictions. CMAViT integrates both spatial and temporal data by leveraging remote sensing imagery and short-term meteorological data, capturing the effects of growing season variations. Additionally, it incorporates management practices, which are represented in text form, using a cross-attention encoder to model their interaction with time-series data. This innovative multi-modal transformer tested on a large dataset from 2016-2019 covering 2,200 hectares and eight grape cultivars including more than 5 million vines, outperforms traditional models like UNet-ConvLSTM, excelling in spatial variability capture and yield prediction, particularly for extreme values in vineyards. CMAViT achieved an R2 of 0.84 and a MAPE of 8.22% on an unseen test dataset. Masking specific modalities lowered performance: excluding management practices, climate data, and both reduced R2 to 0.73, 0.70, and 0.72, respectively, and raised MAPE to 11.92%, 12.66%, and 12.39%, highlighting each modality's importance for accurate yield prediction. Code is available at https://github.com/plant-ai-biophysics-lab/CMAViT.

Clustering Time Series Data with Gaussian Mixture Embeddings in a Graph Autoencoder Framework (2024-11-25)

Time series data analysis is prevalent across various domains, including finance, healthcare, and environmental monitoring. Traditional time series clustering methods often struggle to capture the complex temporal dependencies inherent in such data. In this paper, we propose the Variational Mixture Graph Autoencoder (VMGAE), a graph-based approach for time series clustering that leverages the structural advantages of graphs to capture enriched data relationships and produces Gaussian mixture embeddings for improved separability. Comparisons with baseline methods are included with experimental results, demonstrating that our method significantly outperforms state-of-the-art time-series clustering techniques. We further validate our method on real-world financial data, highlighting its practical applications in finance. By uncovering community structures in stock markets, our method provides deeper insights into stock relationships, benefiting market prediction, portfolio optimization, and risk management.

Comment: First two listed authors have equal contribution; author ordering is determined by coin flip

UniTS: A Unified Multi-Task Time Series Model (2024-11-25)

Although pre-trained transformers and reprogrammed text-based LLMs have shown strong performance on time series tasks, the best-performing architectures vary widely across tasks, with most models narrowly focused on specific areas, such as time series forecasting. Unifying predictive and generative time series tasks within a single model remains challenging. We introduce UniTS, a unified multi-task time series model that utilizes task tokenization to integrate predictive and generative tasks into a single framework. UniTS employs a modified transformer block to capture universal time series representations, enabling transferability from a heterogeneous, multi-domain pre-training dataset (characterized by diverse dynamic patterns, sampling rates, and temporal scales) to a wide range of downstream datasets with varied task specifications and data domains. Tested on 38 datasets across human activity sensors, healthcare, engineering, and finance, UniTS achieves superior performance compared to 12 forecasting models, 20 classification models, 18 anomaly detection models, and 16 imputation models, including adapted text-based LLMs. UniTS also demonstrates strong few-shot and prompt capabilities when applied to new domains and tasks. In single-task settings, UniTS outperforms competitive task-specialized time series models. Code and datasets are available at https://github.com/mims-harvard/UniTS.

Comment: NeurIPS 2024

Motion Code: Robust Time Series Classification and Forecasting via Sparse Variational Multi-Stochastic Processes Learning (2024-11-25)

Despite extensive research, time series classification and forecasting on noisy data remain highly challenging. The main difficulties lie in finding suitable mathematical concepts to describe time series and effectively separate noise from the true signals. Unlike traditional methods treating time series as static vectors or fixed sequences, we propose a novel framework that views each time series, regardless of length, as a realization of a continuous-time stochastic process. This mathematical approach captures dependencies across timestamps and detects hidden, time-varying signals within the noise. However, real-world data often involves multiple distinct dynamics, making it insufficient to model the entire process with a single stochastic model. To address this, we assign each dynamic a unique signature vector and introduce the concept of "most informative timestamps" to infer a sparse approximation of the individual dynamics from these vectors. The resulting model, called Motion Code, includes parameters that fully capture diverse underlying dynamics in an integrated manner, enabling simultaneous classification and forecasting of time series. Extensive experiments on noisy datasets, including real-world Parkinson's disease sensor tracking, demonstrate Motion Code's strong performance against established benchmarks for time series classification and forecasting.

Comment: 20 pages, 5 figures, 4 tables

Any2Any: Incomplete Multimodal Retrieval with Conformal Prediction (2024-11-25)

Autonomous agents perceive and interpret their surroundings by integrating multimodal inputs, such as vision, audio, and LiDAR. These perceptual modalities support retrieval tasks, such as place recognition in robotics. However, current multimodal retrieval systems encounter difficulties when parts of the data are missing due to sensor failures or inaccessibility, such as silent videos or LiDAR scans lacking RGB information. We propose Any2Any, a novel retrieval framework that addresses scenarios where both query and reference instances have incomplete modalities. Unlike previous methods limited to the imputation of two modalities, Any2Any handles any number of modalities without training generative models. It calculates pairwise similarities with cross-modal encoders and employs a two-stage calibration process with conformal prediction to align the similarities. Any2Any enables effective retrieval across multimodal datasets, e.g., text-LiDAR and text-time series. It achieves a Recall@5 of 35% on the KITTI dataset, which is on par with baseline models with complete modalities.

Enhancing In-Hospital Mortality Prediction Using Multi-Representational Learning with LLM-Generated Expert Summaries (2024-11-25)

In-hospital mortality (IHM) prediction for ICU patients is critical for timely interventions and efficient resource allocation. While structured physiological data provides quantitative insights, clinical notes offer unstructured, context-rich narratives. This study integrates these modalities with Large Language Model (LLM)-generated expert summaries to improve IHM prediction accuracy. Using the MIMIC-III database, we analyzed time-series physiological data and clinical notes from the first 48 hours of ICU admission. Clinical notes were concatenated chronologically for each patient and transformed into expert summaries using Med42-v2 70B. A multi-representational learning framework was developed to integrate these data sources, leveraging LLMs to enhance textual data while mitigating direct reliance on LLM predictions, which can introduce challenges in uncertainty quantification and interpretability. The proposed model achieved an AUPRC of 0.6156 (+36.41%) and an AUROC of 0.8955 (+7.64%) compared to a time-series-only baseline. Expert summaries outperformed clinical notes or time-series data alone, demonstrating the value of LLM-generated knowledge. Performance gains were consistent across demographic groups, with notable improvements in underrepresented populations, underscoring the framework's equitable application potential. By integrating LLM-generated summaries with structured and unstructured data, the framework captures complementary patient information, significantly improving predictive performance. This approach showcases the potential of LLMs to augment critical care prediction models, emphasizing the need for domain-specific validation and advanced integration strategies for broader clinical adoption.

Responsible forecasting: identifying and typifying forecasting harms (2024-11-25)

Data-driven organizations around the world routinely use forecasting methods to improve their planning and decision-making capabilities. Although much research exists on the harms resulting from traditional machine learning applications, little has specifically focused on the ethical impact of time series forecasting. Yet forecasting raises unique ethical issues due to the way it is used in different organizational contexts, supports different goals, and involves different data processing, model development, and evaluation pipelines. These differences make it difficult to apply machine learning harm taxonomies to common forecasting contexts. We leverage multiple interviews with expert industry practitioners and academic researchers to remedy this knowledge gap by cataloguing and analysing under-explored domains, applications, and scenarios where forecasting may cause harm, with the goal of developing a novel taxonomy of forecasting-specific harms. Inspired by Microsoft Azure taxonomy for responsible innovation, we combined a human-led inductive coding scheme with an AI-driven analysis centered on the extraction of key taxonomies of harm in forecasting. The taxonomy is designed to guide researchers and practitioners and encourage ethical reflection on the impact of their decisions during the forecasting process. A secondary objective is to create a research agenda focused on possible forecasting-related measures to mitigate harm. Our work extends the growing literature on machine learning harms by identifying unique forms of harm that may occur in forecasting.

A Dataset for Evaluating Online Anomaly Detection Approaches for Discrete Multivariate Time Series (2024-11-25)

Benchmarking anomaly detection approaches for multivariate time series is challenging due to the lack of high-quality datasets. Current publicly available datasets are too small, not diverse and feature trivial anomalies, which hinders measurable progress in this research area. We propose a solution: a diverse, extensive, and non-trivial dataset generated via state-of-the-art simulation tools that reflects realistic behaviour of an automotive powertrain, including its multivariate, dynamic and variable-state properties. To cater for both unsupervised and semi-supervised anomaly detection settings, as well as time series generation and forecasting, we make different versions of the dataset available, where training and test subsets are offered in contaminated and clean versions, depending on the task. We also provide baseline results from a small selection of approaches based on deterministic and variational autoencoders, as well as a non-parametric approach. As expected, the baseline experimentation shows that the approaches trained on the semi-supervised version of the dataset outperform their unsupervised counterparts, highlighting a need for approaches more robust to contaminated training data.

Comment: Submitted to the IEEE Transactions on Reliability journal

Machine learning for cerebral blood vessels' malformations (2024-11-25)

Cerebral aneurysms and arteriovenous malformations are life-threatening hemodynamic pathologies of the brain. While surgical intervention is often essential to prevent fatal outcomes, it carries significant risks both during the procedure and in the postoperative period, making the management of these conditions highly challenging. Parameters of cerebral blood flow, routinely monitored during medical interventions, could potentially be utilized in machine learning-assisted protocols for risk assessment and therapeutic prognosis. To this end, we developed a linear oscillatory model of blood velocity and pressure for clinical data acquired from neurosurgical operations. Using the method of Sparse Identification of Nonlinear Dynamics (SINDy), the parameters of our model can be reconstructed online within milliseconds from a short time series of the hemodynamic variables. The identified parameter values enable automated classification of the blood-flow pathologies by means of logistic regression, achieving an accuracy of 73 %. Our results demonstrate the potential of this model for both diagnostic and prognostic applications, providing a robust and interpretable framework for assessing cerebral blood vessel conditions.
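
The SINDy step mentioned above reduces to a sparse regression over a library of candidate terms. Below is a minimal sketch of the standard sequentially thresholded least-squares routine on a toy linear oscillator; the clinical model, library, and thresholds in the paper are different.

```python
# Sequentially thresholded least squares (the sparse-regression core of SINDy), sketch.
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """theta: library matrix (n_samples, n_terms); dxdt: derivatives (n_samples, n_states)."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for k in range(dxdt.shape[1]):                 # refit each state on active terms
            active = ~small[:, k]
            if active.any():
                xi[active, k] = np.linalg.lstsq(theta[:, active], dxdt[:, k], rcond=None)[0]
    return xi

# Toy data: harmonic oscillator x' = y, y' = -x, with library [x, y].
t = np.linspace(0, 10, 500)
x, y = np.cos(t), -np.sin(t)
theta = np.column_stack([x, y])
dxdt = np.column_stack([np.gradient(x, t), np.gradient(y, t)])
print(np.round(stlsq(theta, dxdt), 2))                 # approximately [[0, -1], [1, 0]]
```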

Comment: 14 pages, 6 main figures, 5 supplementary figures, 2 supplementary tables

Towards Foundation Models for Critical Care Time Series (2024-11-25)

Notable progress has been made in generalist medical large language models across various healthcare areas. However, large-scale modeling of in-hospital time series data - such as vital signs, lab results, and treatments in critical care - remains underexplored. Existing datasets are relatively small, but combining them can enhance patient diversity and improve model robustness. To effectively utilize these combined datasets for large-scale modeling, it is essential to address the distribution shifts caused by varying treatment policies, necessitating the harmonization of treatment variables across the different datasets. This work aims to establish a foundation for training large-scale multi-variate time series models on critical care data and to provide a benchmark for machine learning models in transfer learning across hospitals to study and address distribution shift challenges. We introduce a harmonized dataset for sequence modeling and transfer learning research, representing the first large-scale collection to include core treatment variables. Future plans involve expanding this dataset to support further advancements in transfer learning and the development of scalable, generalizable models for critical healthcare applications.

Comment: Accepted for Oral Presentation at AIM-FM Workshop at NeurIPS 2024

Emulating complex dynamical simulators with random Fourier features (2024-11-25)

A Gaussian process (GP)-based methodology is proposed to emulate complex dynamical computer models (or simulators). The method relies on emulating the numerical flow map of the system over an initial (short) time step, where the flow map is a function that describes the evolution of the system from an initial condition to a subsequent value at the next time step. This yields a probabilistic distribution over the entire flow map function, with each draw offering an approximation to the flow map. The model output time series is then predicted (under the Markov assumption) by drawing a sample from the emulated flow map (i.e., its posterior distribution) and using it to iterate from the initial condition ahead in time. Repeating this procedure with multiple such draws creates a distribution over the time series. The mean and variance of this distribution at a specific time point serve as the model output prediction and the associated uncertainty, respectively. However, drawing a GP posterior sample that represents the underlying function across its entire domain is computationally infeasible, given the infinite-dimensional nature of this object. To overcome this limitation, one can generate such a sample in an approximate manner using random Fourier features (RFF). RFF is an efficient technique for approximating the kernel and generating GP samples, offering both computational efficiency and theoretical guarantees. The proposed method is applied to emulate several dynamic nonlinear simulators including the well-known Lorenz and van der Pol models. The results suggest that our approach has promising predictive performance and the associated uncertainty can capture the dynamics of the system appropriately.
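
A minimal sketch of the random Fourier feature approximation to an RBF kernel, the ingredient used above to draw approximate GP samples of the flow map; the lengthscale, feature count, and inputs are illustrative assumptions.

```python
# Random Fourier features approximating an RBF (squared-exponential) kernel, sketch.
import numpy as np

rng = np.random.default_rng(0)
d, n_features, lengthscale = 1, 2000, 1.0
w = rng.normal(0.0, 1.0 / lengthscale, size=(n_features, d))   # spectral frequencies
b = rng.uniform(0.0, 2 * np.pi, size=n_features)               # random phases

def phi(x):
    """Feature map such that phi(x) @ phi(y).T approximates k_RBF(x, y)."""
    return np.sqrt(2.0 / n_features) * np.cos(x @ w.T + b)

x = np.linspace(-3, 3, 50).reshape(-1, 1)
k_approx = phi(x) @ phi(x).T
k_exact = np.exp(-0.5 * (x - x.T) ** 2 / lengthscale ** 2)
print(np.max(np.abs(k_approx - k_exact)))      # small for large n_features

# An approximate GP prior sample is then a random linear combination of the features.
sample = phi(x) @ rng.normal(size=n_features)
```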

Learning Predictive Checklists with Probabilistic Logic Programming (2024-11-25)

Checklists have been widely recognized as effective tools for completing complex tasks in a systematic manner. Although originally intended for use in procedural tasks, their interpretability and ease of use have led to their adoption for predictive tasks as well, including in clinical settings. However, designing checklists can be challenging, often requiring expert knowledge and manual rule design based on available data. Recent work has attempted to address this issue by using machine learning to automatically generate predictive checklists from data, although these approaches have been limited to Boolean data. We propose a novel method for learning predictive checklists from diverse data modalities, such as images and time series. Our approach relies on probabilistic logic programming, a learning paradigm that enables matching the discrete nature of checklists with continuous-valued data. We propose a regularization technique to trade off between the information captured in discrete concepts of continuous data and to permit a tunable level of interpretability for the learned checklist concepts. We demonstrate that our method outperforms various explainable machine learning techniques on prediction tasks involving image sequences, time series, and clinical notes.

Comment: 36 pages

Unified Principal Components Analysis of Functional Time Series (2024-11-25)

Functional time series (FTS) are increasingly available from diverse real-world applications such as finance, traffic, and environmental science. To analyze such data, it is common to perform dimension reduction on FTS, converting serially dependent random functions to vector time series for downstream tasks. Traditional methods like functional principal component analysis (FPCA) and dynamic FPCA (DFPCA) can be employed for the dimension reduction of FTS. However, these methods may either not be theoretically optimal or be too redundant to represent serially dependent functional data. In this article, we introduce a novel dimension reduction method for FTS based on dynamic FPCA. Through a new concept called optimal functional filters, we unify the theories of FPCA and dynamic FPCA, providing a parsimonious and optimal representation for FTS adapting to its serial dependence structure. This framework is referred to as principal analysis via dependency-adaptivity (PADA). Under a hierarchical Bayesian model, we establish an implementation procedure of PADA for dimension reduction and prediction of irregularly observed FTS. We establish the statistical consistency of PADA in achieving parsimonious and optimal dimension reduction and demonstrate its effectiveness through extensive simulation studies. Finally, we apply our method to daily PM2.5 concentration data, validating the effectiveness of PADA for analyzing FTS data.

Modeling large dimensional matrix time series with partially known and latent factors (2024-11-25)

This article considers modeling large-dimensional matrix time series by introducing a regression term into the matrix factor model. This is an extension of the classic matrix factor model that incorporates the information of known factors or useful covariates. We establish the convergence rates of the coefficient matrix, the loading matrices, and the signal part. The theoretical results coincide with the rates in Wang et al. (2019). We conduct numerical studies to verify the performance of our estimation procedure in finite samples. Finally, we demonstrate the superiority of our proposed model using daily stock returns data.

Comment: 20 pages, 4 figures

OrionBench: Benchmarking Time Series Generative Models in the Service of the End-User (2024-11-24)

Time series anomaly detection is a vital task in many domains, including patient monitoring in healthcare, forecasting in finance, and predictive maintenance in energy industries. This has led to a proliferation of anomaly detection methods, including deep learning-based methods. Benchmarks are essential for comparing the performances of these models as they emerge, in a fair, rigorous, and reproducible approach. Although several benchmarks for comparing models have been proposed, these usually rely on a one-time execution over a limited set of datasets, with comparisons restricted to a few models. We propose OrionBench: an end-user centric, continuously maintained benchmarking framework for unsupervised time series anomaly detection models. Our framework provides universal abstractions to represent models, hyperparameter standardization, extensibility to add new pipelines and datasets, pipeline verification, and frequent releases with published updates of the benchmark. We demonstrate how to use OrionBench, and the performance of pipelines across 17 releases published over the course of four years. We also walk through two real scenarios we experienced with OrionBench that highlight the importance of continuous benchmarking for unsupervised time series anomaly detection.

Comment: This work is accepted by IEEE BigData 2024

Incorporating Metabolic Information into LLMs for Anomaly Detection in Clinical Time-Series (2024-11-24)

Anomaly detection in clinical time-series holds significant potential in identifying suspicious patterns in different biological parameters. In this paper, we propose a targeted method that incorporates the clinical domain knowledge into LLMs to improve their ability to detect anomalies. We introduce the Metabolism Pathway-driven Prompting (MPP) method, which integrates the information about metabolic pathways to better capture the structural and temporal changes in biological samples. We applied our method for doping detection in sports, focusing on steroid metabolism, and evaluated using real-world data from athletes. The results show that our method improves anomaly detection performance by leveraging metabolic context, providing a more nuanced and accurate prediction of suspicious samples in athletes' profiles.

Dynamical Mode Recognition of Coupled Flame Oscillators by Supervised and Unsupervised Learning Approaches (2024-11-24)

Combustion instability in gas turbines and rocket engines, as one of the most challenging problems in combustion research, arises from the complex interactions among flames, which are also influenced by chemical reactions, heat and mass transfer, and acoustics. Identifying and understanding combustion instability is essential to ensure the safe and reliable operation of many combustion systems, where exploring and classifying the dynamical behaviors of complex flame systems is a core task. To facilitate fundamental studies, the present work concerns dynamical mode recognition of coupled flame oscillators made of flickering buoyant diffusion flames, which have gained increasing attention in recent years but are not sufficiently understood. The time series data of the flame oscillators are generated by fully validated reacting flow simulations. Due to limitations of expertise-based models, a data-driven approach is adopted. In this study, a nonlinear dimensionality reduction model, the variational autoencoder (VAE), is used to project the simulation data onto a 2-dimensional latent space. Based on the phase trajectories in latent space, both supervised and unsupervised classifiers are proposed for datasets with and without known labels, respectively. For labeled datasets, we establish the Wasserstein-distance-based classifier (WDC) for mode recognition; for unlabeled datasets, we develop a novel unsupervised classifier (GMM-DTWC) combining dynamic time warping (DTW) and the Gaussian mixture model (GMM). Through comparison with conventional approaches to dimensionality reduction and classification, the proposed supervised and unsupervised VAE-based approaches exhibit prominent performance in distinguishing dynamical modes, implying their potential extension to dynamical mode recognition of complex combustion problems.
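
For reference, the dynamic time warping distance used inside the GMM-DTWC classifier above follows the classic textbook recursion; a minimal sketch (not the authors' pipeline):

```python
# Classic DTW distance between two 1-D sequences (textbook recursion, illustrative).
import numpy as np

def dtw_distance(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

t = np.linspace(0, 2 * np.pi, 100)
print(dtw_distance(np.sin(t), np.sin(t + 0.3)))   # small: the phase shift is warped away
```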

Comment: research paper (29 pages, 20 figures)

Beyond Data Scarcity: A Frequency-Driven Framework for Zero-Shot Forecasting (2024-11-24)

Time series forecasting is critical in numerous real-world applications, requiring accurate predictions of future values based on observed patterns. While traditional forecasting techniques work well in in-domain scenarios with ample data, they struggle when data is scarce or not available at all, motivating the emergence of zero-shot and few-shot learning settings. Recent advancements often leverage large-scale foundation models for such tasks, but these methods require extensive data and compute resources, and their performance may be hindered by ineffective learning from the available training set. This raises a fundamental question: What factors influence effective learning from data in time series forecasting? Toward addressing this, we propose using Fourier analysis to investigate how models learn from synthetic and real-world time series data. Our findings reveal that forecasters commonly suffer from poor learning from data with multiple frequencies and poor generalization to unseen frequencies, which impedes their predictive performance. To alleviate these issues, we present a novel synthetic data generation framework, designed to enhance real data or replace it completely by creating task-specific frequency information, requiring only the sampling rate of the target data. Our approach, Freq-Synth, improves the robustness of both foundation as well as nonfoundation forecast models in zero-shot and few-shot settings, facilitating more reliable time series forecasting under limited data scenarios.

TableTime: Reformulating Time Series Classification as Zero-Shot Table Understanding via Large Language Models (2024-11-24)

Large language models (LLMs) have demonstrated their effectiveness in multivariate time series classification (MTSC). Effective adaptation of LLMs for MTSC necessitates informative data representations. Existing LLM-based methods directly encode embeddings for time series within the latent space of LLMs from scratch to align with the semantic space of LLMs. Despite their effectiveness, we reveal that these methods conceal three inherent bottlenecks: (1) they struggle to encode temporal and channel-specific information in a lossless manner, both of which are critical components of multivariate time series; (2) it is difficult to align the learned representation space with the semantic space of the LLMs; (3) they require task-specific retraining, which is both computationally expensive and labor-intensive. To bridge these gaps, we propose TableTime, which reformulates MTSC as a table understanding task. Specifically, TableTime introduces the following strategies: (1) convert multivariate time series into a tabular form, thus minimizing information loss to the greatest extent; (2) represent tabular time series in text format to achieve natural alignment with the semantic space of LLMs; (3) design a reasoning framework that integrates contextual text information, neighborhood assistance, multi-path inference and problem decomposition to enhance the reasoning ability of LLMs and realize zero-shot classification. Extensive experiments performed on 10 publicly representative datasets from the UEA archive verify the superiority of TableTime.
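
To make the tabular reformulation concrete, here is a rough sketch of serialising a multivariate series as a small text table inside an LLM prompt; the column names, rounding, and prompt wording are illustrative assumptions, not the paper's format.

```python
# Serialise a multivariate time series as a text table for an LLM prompt (sketch).
import numpy as np

def series_to_table(values, channels):
    header = "step | " + " | ".join(channels)
    rows = [header, "-" * len(header)]
    for t, row in enumerate(values):
        rows.append(f"{t} | " + " | ".join(f"{v:.2f}" for v in row))
    return "\n".join(rows)

x = np.random.default_rng(0).normal(size=(5, 3))        # 5 time steps, 3 channels
prompt = ("Classify the activity recorded in the table below.\n\n"
          + series_to_table(x, ["acc_x", "acc_y", "acc_z"]))
print(prompt)
```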

Tackling Data Heterogeneity in Federated Time Series Forecasting (2024-11-24)

Time series forecasting plays a critical role in various real-world applications, including energy consumption prediction, disease transmission monitoring, and weather forecasting. Although substantial progress has been made in time series forecasting, most existing methods rely on a centralized training paradigm, where large amounts of data are collected from distributed devices (e.g., sensors, wearables) to a central cloud server. However, this paradigm has overloaded communication networks and raised privacy concerns. Federated learning, a popular privacy-preserving technique, enables collaborative model training across distributed data sources. However, directly applying federated learning to time series forecasting often yields suboptimal results, as time series data generated by different devices are inherently heterogeneous. In this paper, we propose a novel framework, Fed-TREND, to address data heterogeneity by generating informative synthetic data as auxiliary knowledge carriers. Specifically, Fed-TREND generates two types of synthetic data. The first type of synthetic data captures the representative distribution information from clients' uploaded model updates and enhances clients' local training consensus. The second kind of synthetic data extracts long-term influence insights from global model update trajectories and is used to refine the global model after aggregation. Fed-TREND is compatible with most time series forecasting models and can be seamlessly integrated into existing federated learning frameworks to improve prediction performance. Extensive experiments on eight datasets, using several federated learning baselines and four popular time series forecasting models, demonstrate the effectiveness and generalizability of Fed-TREND.

Quantile deep learning models for multi-step ahead time series prediction (2024-11-24)

Uncertainty quantification is crucial in time series prediction, and quantile regression offers a valuable mechanism for uncertainty quantification which is useful for extreme value forecasting. Although deep learning models have been prominent in multi-step ahead prediction, the development and evaluation of quantile deep learning models have been limited. We present a novel quantile regression deep learning framework for multi-step time series prediction. In this way, we elevate the capabilities of deep learning models by incorporating quantile regression, thus providing a more nuanced understanding of predictive values. We provide an implementation of prominent deep learning models for multi-step ahead time series prediction and evaluate their performance under high volatility and extreme conditions. We include multivariate and univariate modelling strategies and provide a comparison with conventional deep learning models from the literature. Our models are tested on two cryptocurrencies: Bitcoin and Ethereum, using daily close-price data and selected benchmark time series datasets. The results show that integrating a quantile loss function with deep learning provides additional predictions for selected quantiles without a loss in prediction accuracy when compared to the literature. Our quantile model has the ability to handle volatility more effectively and provides additional information for decision-making and uncertainty quantification through the use of quantiles when compared to conventional deep learning models.
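
The quantile (pinball) loss that such frameworks optimise has a standard form; a minimal sketch with illustrative quantile levels (the paper's architectures and training setup are not shown):

```python
# Pinball (quantile) loss: one loss (and prediction head) per quantile level, sketch.
import numpy as np

def pinball_loss(y_true, y_pred, q):
    diff = np.asarray(y_true, float) - np.asarray(y_pred, float)
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

y = np.array([10.0, 12.0, 9.0])
yhat = np.array([11.0, 11.0, 11.0])
for q in (0.05, 0.5, 0.95):
    print(q, pinball_loss(y, yhat, q))
```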

Reliable Generation of Privacy-preserving Synthetic Electronic Health Record Time Series via Diffusion Models (2024-11-23)

Electronic Health Records (EHRs) are rich sources of patient-level data, offering valuable resources for medical data analysis. However, privacy concerns often restrict access to EHRs, hindering downstream analysis. Current EHR de-identification methods are flawed and can lead to potential privacy leakage. Additionally, existing publicly available EHR databases are limited, preventing the advancement of medical research using EHR. This study aims to overcome these challenges by generating realistic and privacy-preserving synthetic electronic health records (EHRs) time series efficiently. We introduce a new method for generating diverse and realistic synthetic EHR time series data using Denoising Diffusion Probabilistic Models (DDPM). We conducted experiments on six databases: Medical Information Mart for Intensive Care III and IV (MIMIC-III/IV), the eICU Collaborative Research Database (eICU), and non-EHR datasets on Stocks and Energy. We compared our proposed method with eight existing methods. Our results demonstrate that our approach significantly outperforms all existing methods in terms of data fidelity while requiring less training effort. Additionally, data generated by our method yields a lower discriminative accuracy compared to other baseline methods, indicating the proposed method can generate data with less privacy risk. The proposed diffusion-model-based method can reliably and efficiently generate synthetic EHR time series, which facilitates the downstream medical data analysis. Our numerical results show the superiority of the proposed method over all other existing methods.

Asynchronous Jump Testing and Estimation in High Dimensions Under Complex Temporal Dynamics (2024-11-23)

Most high dimensional changepoint detection methods assume the error process is stationary and changepoints occur synchronously across dimensions. The violation of these assumptions, which in applied settings is increasingly likely as the dimensionality of the time series being analyzed grows, can dramatically curtail the sensitivity or the accuracy of these methods. We propose AJDN (Asynchronous Jump Detection under Nonstationary noise). AJDN is a high dimensional multiscale jump detection method that tests and estimates jumps in an otherwise smoothly varying mean function for high dimensional time series with nonstationary noise where the jumps across dimensions may not occur at the same time. AJDN is correct in the sense that it detects the correct number of jumps with a prescribed probability asymptotically and its accuracy in estimating the locations of the jumps is asymptotically nearly optimal under the asynchronous jump assumption. Through a simulation study we demonstrate AJDN's robustness across a wide variety of stationary and nonstationary high dimensional time series, and we show its strong performance relative to some existing high dimensional changepoint detection methods. We apply AJDN to a seismic time series to demonstrate its ability to accurately detect jumps in real-world high dimensional time series with complex temporal dynamics.

Modeling Latent Neural Dynamics with Gaussian Process Switching Linear Dynamical Systems (2024-11-23)

Understanding how the collective activity of neural populations relates to computation and ultimately behavior is a key goal in neuroscience. To this end, statistical methods which describe high-dimensional neural time series in terms of low-dimensional latent dynamics have played a fundamental role in characterizing neural systems. Yet, what constitutes a successful method involves two opposing criteria: (1) methods should be expressive enough to capture complex nonlinear dynamics, and (2) they should maintain a notion of interpretability often only warranted by simpler linear models. In this paper, we develop an approach that balances these two objectives: the Gaussian Process Switching Linear Dynamical System (gpSLDS). Our method builds on previous work modeling the latent state evolution via a stochastic differential equation whose nonlinear dynamics are described by a Gaussian process (GP-SDEs). We propose a novel kernel function which enforces smoothly interpolated locally linear dynamics, and therefore expresses flexible -- yet interpretable -- dynamics akin to those of recurrent switching linear dynamical systems (rSLDS). Our approach resolves key limitations of the rSLDS such as artifactual oscillations in dynamics near discrete state boundaries, while also providing posterior uncertainty estimates of the dynamics. To fit our models, we leverage a modified learning objective which improves the estimation accuracy of kernel hyperparameters compared to previous GP-SDE fitting approaches. We apply our method to synthetic data and data recorded in two neuroscience experiments and demonstrate favorable performance in comparison to the rSLDS.

Comment: 38th Conference on Neural Information Processing Systems (NeurIPS 2024)

Nd-BiMamba2: A Unified Bidirectional Architecture for Multi-Dimensional Data Processing (2024-11-22)

Deep learning models often require specially designed architectures to process data of different dimensions, such as 1D time series, 2D images, and 3D volumetric data. Existing bidirectional models mainly focus on sequential data, making it difficult to scale effectively to higher dimensions. To address this issue, we propose a novel multi-dimensional bidirectional neural network architecture, named Nd-BiMamba2, which efficiently handles 1D, 2D, and 3D data. Nd-BiMamba2 is based on the Mamba2 module and introduces innovative bidirectional processing mechanisms and adaptive padding strategies to capture bidirectional information in multi-dimensional data while maintaining computational efficiency. Unlike existing methods that require designing specific architectures for different dimensional data, Nd-BiMamba2 adopts a unified architecture with a modular design, simplifying development and maintenance costs. To verify the portability and flexibility of Nd-BiMamba2, we successfully exported it to ONNX and TorchScript and tested it on different hardware platforms (e.g., CPU, GPU, and mobile devices). Experimental results show that Nd-BiMamba2 runs efficiently on multiple platforms, demonstrating its potential in practical applications. The code is open-source: https://github.com/Human9000/nd-Mamba2-torch

Point process analysis of geographical diffusion of news in Argentina (2024-11-22)

The diffusion of information plays a crucial role in a society, affecting its economy and the well-being of the population. Characterizing the diffusion process is challenging because it is highly non-stationary and varies with the media type. To understand the spreading of newspaper news in Argentina, we collected data from more than 27000 articles published in six main provinces during four months. We classified the articles into 20 thematic axes and obtained a set of time series that capture daily newspaper attention on different topics in different provinces. To analyze the data we use a point process approach. For each topic, $n$, and for all pairs of provinces, $i$ and $j$, we use two measures to quantify the synchronicity of the events, $Q_s(i,j)$, which quantifies the number of events that occur almost simultaneously in $i$ and $j$, and $Q_a(i,j)$, which quantifies the direction of news spreading. Our analysis unveils how fast the information diffusion process is, showing pairs of provinces with very similar and almost simultaneous temporal variations of media attention. On the other hand, we also calculate other measures computed from the raw time series, such as Granger Causality and Transfer Entropy, which do not perform well in this context because they often return opposite directions of information transfer. We interpret this as due to different factors such as the characteristics of the data, which is highly non-stationary and the features of the information diffusion process, which is very fast and probably acts at a sub-resolution time scale.
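
As a rough illustration of the synchronicity idea behind $Q_s(i,j)$, the sketch below counts events in one province that have a near-simultaneous counterpart in another; the event threshold and the one-day window are assumptions, and the paper's exact definition may differ.

```python
# Counting near-simultaneous "events" between two daily attention series (rough sketch).
import numpy as np

def event_days(series, threshold):
    """Days on which attention to a topic exceeds a threshold (an 'event')."""
    return np.where(np.asarray(series) > threshold)[0]

def synchronous_events(series_i, series_j, threshold=5, window=1):
    ev_i, ev_j = event_days(series_i, threshold), event_days(series_j, threshold)
    return sum(int(np.any(np.abs(ev_j - d) <= window)) for d in ev_i)

rng = np.random.default_rng(1)
province_a = rng.poisson(2, 120)
province_b = np.roll(province_a, 1)          # province b lags province a by one day
print(synchronous_events(province_a, province_b))
```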

Exploring Kolmogorov-Arnold Networks for Interpretable Time Series Classification (2024-11-22)

Time series classification is a relevant step supporting decision-making processes in various domains, and deep neural models have shown promising performance. Despite significant advancements in deep learning, the theoretical understanding of how and why complex architectures function remains limited, prompting the need for more interpretable models. Recently, the Kolmogorov-Arnold Networks (KANs) have been proposed as a more interpretable alternative. While KAN-related research is significantly rising, to date, the study of KAN architectures for time series classification has been limited. In this paper, we aim to conduct a comprehensive and robust exploration of the KAN architecture for time series classification on the UCR benchmark. More specifically, we look at a) how reference architectures for forecasting transfer to classification, b) how hyperparameters and implementation choices influence classification performance in view of finding the configuration that performs best on the selected benchmark, c) the complexity trade-offs, and d) the interpretability advantages. Our results show that (1) Efficient KAN outperforms MLP in performance and computational efficiency, showcasing its suitability for classification tasks. (2) Efficient KAN is more stable than KAN across grid sizes, depths, and layer configurations, particularly with lower learning rates. (3) KAN maintains competitive accuracy compared to state-of-the-art models like HIVE-COTE2, with smaller architectures and faster training times, supporting its balance of performance and transparency. (4) The interpretability of the KAN model aligns with findings from SHAP analysis, reinforcing its capacity for transparent decision-making.

Random Fourier Signature Features 2024-11-22
Show

Tensor algebras give rise to one of the most powerful measures of similarity for sequences of arbitrary length called the signature kernel, accompanied by attractive theoretical guarantees from stochastic analysis. Previous algorithms to compute the signature kernel scale quadratically in terms of the length and the number of the sequences. To mitigate this severe computational bottleneck, we develop a random Fourier feature-based acceleration of the signature kernel acting on the inherently non-Euclidean domain of sequences. We show uniform approximation guarantees for the proposed unbiased estimator of the signature kernel, while keeping its computation linear in the sequence length and number. In addition, combined with recent advances on tensor projections, we derive two even more scalable time series features with favourable concentration properties and computational complexity both in time and memory. Our empirical results show that the reduction in computational cost comes at a negligible price in terms of accuracy on moderate-sized datasets, and it enables one to scale to large datasets up to a million time series.

A Unified Energy Management Framework for Multi-Timescale Forecasting in Smart Grids 2024-11-22
Show

Accurate forecasting of the electrical load, such as the magnitude and the timing of peak power, is crucial to successful power system management and implementation of smart grid strategies like demand response and peak shaving. In multi-time-scale optimization scheduling, rolling optimization is a common solution. However, rolling optimization needs to consider the coupling of different optimization objectives across time scales. It is challenging to accurately capture the mid- and long-term dependencies in time series data. This paper proposes Multi-pofo, a multi-scale power load forecasting framework that captures such dependencies via a novel architecture equipped with a temporal positional encoding layer. To validate the effectiveness of the proposed model, we conduct experiments on real-world electricity load data. The experimental results show that our approach outperforms several strong baseline methods.

Submitted to PES GM 2025

Stable Neural Stochastic Differential Equations in Analyzing Irregular Time Series Data 2024-11-22
Show

Irregular sampling intervals and missing values in real-world time series data present challenges for conventional methods that assume consistent intervals and complete data. Neural Ordinary Differential Equations (Neural ODEs) offer an alternative approach, utilizing neural networks combined with ODE solvers to learn continuous latent representations through parameterized vector fields. Neural Stochastic Differential Equations (Neural SDEs) extend Neural ODEs by incorporating a diffusion term, although this addition is not trivial, particularly when addressing irregular intervals and missing values. Consequently, careful design of drift and diffusion functions is crucial for maintaining stability and enhancing performance, while incautious choices can result in adverse properties such as the absence of strong solutions, stochastic destabilization, or unstable Euler discretizations, significantly affecting Neural SDEs' performance. In this study, we propose three stable classes of Neural SDEs: Langevin-type SDE, Linear Noise SDE, and Geometric SDE. Then, we rigorously demonstrate their robustness in maintaining excellent performance under distribution shift, while effectively preventing overfitting. To assess the effectiveness of our approach, we conduct extensive experiments on four benchmark datasets for interpolation, forecasting, and classification tasks, and analyze the robustness of our methods with 30 public datasets under different missing rates. Our results demonstrate the efficacy of the proposed method in handling real-world irregular time series data.

Published at the Twelfth International Conference on Learning Representations (ICLR 2024), Spotlight presentation (Notable Top 5%). https://openreview.net/forum?id=4VIgNuQ1pY
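
For intuition on the model class, a minimal Euler-Maruyama simulator for a one-dimensional SDE is sketched below; the mean-reverting drift with constant diffusion is only a toy stand-in for the Langevin-type, Linear Noise, and Geometric drift-diffusion pairings studied in the paper, and all names and parameters are assumptions.

```python
import numpy as np

def euler_maruyama(x0, drift, diffusion, dt=0.01, n_steps=500, seed=0):
    # Simulate dX = drift(X) dt + diffusion(X) dW with the Euler-Maruyama scheme.
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        dw = rng.normal(scale=np.sqrt(dt))
        x[k + 1] = x[k] + drift(x[k]) * dt + diffusion(x[k]) * dw
    return x

# Toy Langevin-style example: mean-reverting drift, constant noise
path = euler_maruyama(x0=1.0, drift=lambda x: -2.0 * x, diffusion=lambda x: 0.3)
```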

ArrivalNet: Predicting City-wide Bus/Tram Arrival Time with Two-dimensional Temporal Variation Modeling 2024-11-22
Show

Accurate arrival time prediction (ATP) of buses and trams plays a crucial role in public transport operations. Current methods focused on modeling one-dimensional temporal information but overlooked the latent periodic information within time series. Moreover, most studies developed algorithms for ATP based on a single or a few routes of public transport, which reduces the transferability of the prediction models and their applicability in public transport management systems. To this end, this paper proposes \textit{ArrivalNet}, a two-dimensional temporal variation-based multi-step ATP for buses and trams. It decomposes the one-dimensional temporal sequence into intra-periodic and inter-periodic variations, which can be recast into two-dimensional tensors (2D blocks). Each row of a tensor contains the time points within a period, and each column involves the time points at the same intra-periodic index across various periods. The transformed 2D blocks in different frequencies have an image-like feature representation that enables effective learning with computer vision backbones (e.g., convolutional neural network). Drawing on the concept of residual neural network, the 2D block module is designed as a basic module for flexible aggregation. Meanwhile, contextual factors like workdays, peak hours, and intersections, are also utilized in the augmented feature representation to improve the performance of prediction. 125 days of public transport data from Dresden were collected for model training and validation. Experimental results show that the root mean square error, mean absolute error, and mean absolute percentage error of the proposed predictor decrease by at least 6.1%, 14.7%, and 34.2% compared with state-of-the-art baseline methods.
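
The period-folding idea can be sketched in a few lines: the 1-D sequence is recast into a 2-D block whose rows hold the intra-periodic variation and whose columns hold the inter-periodic variation. The function name and the hourly/daily example are assumptions for illustration, not the ArrivalNet code.

```python
import numpy as np

def fold_to_2d(series, period):
    # Recast a 1-D temporal sequence into a 2-D block: each row is one period,
    # each column is the same intra-periodic index across periods.
    series = np.asarray(series)
    n_periods = len(series) // period
    return series[: n_periods * period].reshape(n_periods, period)

# Hypothetical example: 3 days of hourly arrival-delay observations, daily period of 24
blocks = fold_to_2d(np.arange(72, dtype=float), period=24)   # shape (3, 24)
```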

Con4m: Context-aware Consistency Learning Framework for Segmented Time Series Classification 2024-11-22
Show

Time Series Classification (TSC) encompasses two settings: classifying entire sequences or classifying segmented subsequences. The raw time series for segmented TSC usually contain Multiple classes with Varying Duration of each class (MVD). Therefore, the characteristics of MVD pose unique challenges for segmented TSC, yet have been largely overlooked by existing works. Specifically, there exists a natural temporal dependency between consecutive instances (segments) to be classified within MVD. However, mainstream TSC models rely on the assumption of independent and identically distributed (i.i.d.), focusing on independently modeling each segment. Additionally, annotators with varying expertise may provide inconsistent boundary labels, leading to unstable performance of noise-free TSC models. To address these challenges, we first formally demonstrate that valuable contextual information enhances the discriminative power of classification instances. Leveraging the contextual priors of MVD at both the data and label levels, we propose a novel consistency learning framework Con4m, which effectively utilizes contextual information more conducive to discriminating consecutive segments in segmented TSC tasks, while harmonizing inconsistent boundary labels for training. Extensive experiments across multiple datasets validate the effectiveness of Con4m in handling segmented TSC tasks on MVD.

Recursive Gaussian Process State Space Model 2024-11-22
Show

Learning dynamical models from data is not only fundamental but also holds great promise for advancing principle discovery, time-series prediction, and controller design. Among various approaches, Gaussian Process State-Space Models (GPSSMs) have recently gained significant attention due to their combination of flexibility and interpretability. However, for online learning, the field lacks an efficient method suitable for scenarios where prior information regarding data distribution and model function is limited. To address this issue, this paper proposes a recursive GPSSM method with adaptive capabilities for both operating domains and Gaussian process (GP) hyperparameters. Specifically, we first utilize first-order linearization to derive a Bayesian update equation for the joint distribution between the system state and the GP model, enabling closed-form and domain-independent learning. Second, an online selection algorithm for inducing points is developed based on informative criteria to achieve lightweight learning. Third, to support online hyperparameter optimization, we recover historical measurement information from the current filtering distribution. Comprehensive evaluations on both synthetic and real-world datasets demonstrate the superior accuracy, computational efficiency, and adaptability of our method compared to state-of-the-art online GPSSM techniques.

Introducing Spectral Attention for Long-Range Dependency in Time Series Forecasting 2024-11-22
Show

Sequence modeling faces challenges in capturing long-range dependencies across diverse tasks. Recent linear and transformer-based forecasters have shown superior performance in time series forecasting. However, they are constrained by their inherent inability to effectively address long-range dependencies in time series data, primarily due to using fixed-size inputs for prediction. Furthermore, they typically sacrifice essential temporal correlation among consecutive training samples by shuffling them into mini-batches. To overcome these limitations, we introduce a fast and effective Spectral Attention mechanism, which preserves temporal correlations among samples and facilitates the handling of long-range information while maintaining the base model structure. Spectral Attention preserves long-period trends through a low-pass filter and facilitates gradient to flow between samples. Spectral Attention can be seamlessly integrated into most sequence models, allowing models with fixed-sized look-back windows to capture long-range dependencies over thousands of steps. Through extensive experiments on 11 real-world time series datasets using 7 recent forecasting models, we consistently demonstrate the efficacy of our Spectral Attention mechanism, achieving state-of-the-art results.

Co-first Author: Bong Gyun Kang, Dongjun Lee. NeurIPS 2024 (Conference on Neural Information Processing Systems)
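
The "preserve long-period trends through a low-pass filter" ingredient can be illustrated with a simple FFT-based filter; the actual Spectral Attention module operates on model features across consecutive training samples, so the function below and its keep_fraction parameter are assumptions for intuition only.

```python
import numpy as np

def low_pass(series, keep_fraction=0.1):
    # Keep only the lowest-frequency components of a series via the real FFT.
    freq = np.fft.rfft(series)
    cutoff = max(1, int(len(freq) * keep_fraction))
    freq[cutoff:] = 0.0
    return np.fft.irfft(freq, n=len(series))

# A slow sinusoid buried in noise: the filtered output retains the long-period trend
rng = np.random.default_rng(0)
trend = low_pass(np.sin(np.linspace(0, 20, 512)) + 0.3 * rng.standard_normal(512))
```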

Modelling Loss of Complexity in Intermittent Time Series and its Application 2024-11-21
Show

In this paper, we developed a nonparametric relative entropy (RlEn) for modelling loss of complexity in intermittent time series. This technique consists of two steps. First, we fit a nonlinear autoregressive model where the lag order is determined by a Bayesian Information Criterion (BIC), and the complexity of each intermittent time series is obtained by our novel relative entropy. Second, change-points in complexity are detected by using a cumulative sum (CUSUM) based method. Using simulations and comparisons to the popular approximate entropy (ApEn) method, the performance of RlEn was assessed for its (1) ability to localise complexity change-points in intermittent time series; (2) ability to faithfully estimate underlying nonlinear models. The performance of the proposal was then examined in a real analysis of fatigue-induced changes in the complexity of human motor outputs. The results demonstrated that the proposed method outperformed ApEn in accurately detecting complexity changes in intermittent time series segments.

44 pages, 4 figures
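
A minimal sketch of the CUSUM change-point step on a complexity series follows; it assumes a single mean shift and omits the nonlinear autoregressive fit and the relative-entropy computation that precede it in the paper.

```python
import numpy as np

def cusum_change_point(x):
    # Index maximising the CUSUM statistic of a 1-D series (assumes one mean
    # shift); the returned value is the first index of the second segment.
    x = np.asarray(x, dtype=float)
    cusum = np.abs(np.cumsum(x - x.mean()))
    return int(np.argmax(cusum[:-1])) + 1

# Hypothetical complexity values before and after fatigue sets in
rng = np.random.default_rng(0)
complexity = np.r_[rng.normal(1.0, 0.1, 50), rng.normal(1.6, 0.1, 50)]
print(cusum_change_point(complexity))   # close to 50
```
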
A Bayesian mixture model for Poisson network autoregression 2024-11-21
Show

In this paper, we propose a new Bayesian Poisson network autoregression mixture model (PNARM). Our model combines ideas from the models of Dahl 2008, Ren et al. 2024 and Armillotta and Fokianos 2024, as it is motivated by the following aims. We consider the problem of modelling multivariate count time series, since they arise in many real-world data sets but have been studied less than their Gaussian-distributed counterparts (Fokianos 2024). Additionally, we assume that the time series occur on the nodes of a known underlying network where the edges dictate the form of the structural vector autoregression model, as a means of imposing sparsity. A further aim is to accommodate heterogeneous node dynamics, and to develop a probabilistic model for clustering nodes that exhibit similar behaviour. We develop an MCMC algorithm for sampling from the model's posterior distribution. The model is applied to a data set of COVID-19 cases in the counties of the Republic of Ireland.

Spectral domain likelihoods for Bayesian inference in time-varying parameter models 2024-11-21
Show

Inference for locally stationary processes is often based on some local Whittle-type approximation of the likelihood function defined in the frequency domain. The main reasons for using such a likelihood approximation are that i) it has substantially lower computational cost and better scalability to long time series compared to the time domain likelihood, particularly when used for Bayesian inference via Markov Chain Monte Carlo (MCMC), ii) convenience when the model itself is specified in the frequency domain, and iii) it provides access to bootstrap and subsampling MCMC which exploit the asymptotic independence of Fourier transformed data. Most of the existing literature compares the asymptotic performance of the maximum likelihood estimator (MLE) from such frequency domain likelihood approximation with the exact time domain MLE. Our article uses three simulation studies to assess the finite-sample accuracy of several frequency domain likelihood functions when used to approximate the posterior distribution in time-varying parameter models. The methods are illustrated on an application to egg price data.
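
For reference, the standard stationary Whittle log-likelihood, which the local, time-varying approximations above build on, can be evaluated directly from the periodogram; the AR(1) spectral density and all names below are illustrative, and additive constants are omitted.

```python
import numpy as np

def whittle_loglik(x, spectral_density, params):
    # Whittle approximation: sum over nonzero Fourier frequencies of
    # -log f(w) - I(w) / f(w), with I the periodogram (constants omitted).
    n = len(x)
    freqs = np.fft.rfftfreq(n)[1:]
    periodogram = np.abs(np.fft.rfft(x - x.mean())[1:]) ** 2 / n
    f = spectral_density(freqs, params)
    return float(np.sum(-np.log(f) - periodogram / f))

# Illustrative AR(1) spectral density and a toy evaluation on white noise
ar1_sd = lambda w, p: p["sigma2"] / (2 * np.pi * (1 - 2 * p["phi"] * np.cos(2 * np.pi * w) + p["phi"] ** 2))
ll = whittle_loglik(np.random.default_rng(0).standard_normal(256), ar1_sd, {"phi": 0.5, "sigma2": 1.0})
```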

Transfer Learning on Transformers for Building Energy Consumption Forecasting -- A Comparative Study 2024-11-21
Show

This study investigates the application of Transfer Learning (TL) on Transformer architectures to enhance building energy consumption forecasting. Transformers are a relatively new deep learning architecture, which has served as the foundation for groundbreaking technologies such as ChatGPT. While TL has been studied in the past, prior studies considered either one data-centric TL strategy or used older deep learning models such as Recurrent Neural Networks or Convolutional Neural Networks. Here, we carry out an extensive empirical study on six different data-centric TL strategies and analyse their performance under varying feature spaces. In addition to the vanilla Transformer architecture, we also experiment with Informer and PatchTST, specifically designed for time series forecasting. We use 16 datasets from the Building Data Genome Project 2 to create building energy consumption forecasting models. Experimental results reveal that while TL is generally beneficial, especially when the target domain has no data, careful selection of the exact TL strategy should be made to gain the maximum benefit. This decision largely depends on the feature space properties such as the recorded weather features. We also note that PatchTST outperforms the other two Transformer variants (vanilla Transformer and Informer). Our findings advance building energy consumption forecasting using advanced approaches like TL and Transformer architectures.

WaveRoRA: Wavelet Rotary Route Attention for Multivariate Time Series Forecasting 2024-11-21
Show

In recent years, Transformer-based models (Transformers) have achieved significant success in multivariate time series forecasting (MTSF). However, previous works focus on extracting features either from the time domain or the frequency domain, which inadequately captures the trends and periodic characteristics. To address this issue, we propose a wavelet learning framework to model complex temporal dependencies of the time series data. The wavelet domain integrates both time and frequency information, allowing for the analysis of local characteristics of signals at different scales. Additionally, the Softmax self-attention mechanism used by Transformers has quadratic complexity, which leads to excessive computational costs when capturing long-term dependencies. Therefore, we propose a novel attention mechanism: Rotary Route Attention (RoRA). Unlike Softmax attention, RoRA utilizes rotary position embeddings to inject relative positional information to sequence tokens and introduces a small number of routing tokens $r$ to aggregate information from the $KV$ matrices and redistribute it to the $Q$ matrix, offering linear complexity. We further propose WaveRoRA, which leverages RoRA to capture inter-series dependencies in the wavelet domain. We conduct extensive experiments on eight real-world datasets. The results indicate that WaveRoRA outperforms existing state-of-the-art models while maintaining lower computational costs. Our code is available at https://github.com/Leopold2333/WaveRoRA.

Model architecture changed

On the Use of Relative Validity Indices for Comparing Clustering Approaches 2024-11-21
Show

Relative Validity Indices (RVIs) such as the Silhouette Width Criterion and Davies Bouldin indices are the most widely used tools for evaluating and optimising clustering outcomes. Traditionally, their ability to rank collections of candidate dataset partitions has been used to guide the selection of the number of clusters, and to compare partitions from different clustering algorithms. However, there is a growing trend in the literature to use RVIs when selecting a Similarity Paradigm (SP) for clustering - the combination of normalisation procedure, representation method, and distance measure which affects the computation of object dissimilarities used in clustering. Despite the growing prevalence of this practice, there has been no empirical or theoretical investigation into the suitability of RVIs for this purpose. Moreover, since RVIs are computed using object dissimilarities, it remains unclear how they would need to be implemented for fair comparisons of different SPs. This study presents the first comprehensive investigation into the reliability of RVIs for SP selection. We conducted extensive experiments with seven popular RVIs on over 2.7 million clustering partitions of synthetic and real-world datasets, encompassing feature-vector and time-series data. We identified fundamental conceptual limitations undermining the use of RVIs for SP selection, and our empirical findings confirmed this predicted unsuitability. Among our recommendations, we suggest instead that practitioners select SPs by using external validation on high quality labelled datasets or carefully designed outcome-oriented objective criteria, both of which should be informed by careful consideration of dataset characteristics, and domain requirements. Our findings have important implications for clustering methodology and evaluation, suggesting the need for more rigorous approaches to SP selection.

Multi-Modal Forecaster: Jointly Predicting Time Series and Textual Data 2024-11-21
Show

Current forecasting approaches are largely unimodal and ignore the rich textual data that often accompany the time series, due to the lack of well-curated multimodal benchmark datasets. In this work, we develop TimeText Corpus (TTC), a carefully curated, time-aligned text and time series dataset for multimodal forecasting. Our dataset is composed of sequences of numbers and text aligned to timestamps, and includes data from two different domains: climate science and healthcare. Our data is a significant contribution to the rare selection of available multimodal datasets. We also propose the Hybrid Multi-Modal Forecaster (Hybrid-MMF), a multimodal LLM that jointly forecasts both text and time series data using shared embeddings. However, contrary to our expectations, our Hybrid-MMF model does not outperform existing baselines in our experiments. This negative result highlights the challenges inherent in multimodal forecasting. Our code and data are available at https://github.com/Rose-STL-Lab/Multimodal_Forecasting.

21 pages, 4 tables, 2 figures

Fast Machine-Precision Spectral Likelihoods for Stationary Time Series 2024-11-20
Show

We provide in this work an algorithm for approximating a very broad class of symmetric Toeplitz matrices to machine precision in $\mathcal{O}(n \log n)$ time with applications to fitting time series models. In particular, for a symmetric Toeplitz matrix $\mathbf{\Sigma}$ with values $\mathbf{\Sigma}_{j,k} = h_{|j-k|}$ ...
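
The $\mathcal{O}(n \log n)$ flavour of such Toeplitz computations can be illustrated with the textbook circulant-embedding matrix-vector product below; this is a standard trick, not the paper's machine-precision likelihood algorithm.

```python
import numpy as np
from scipy.linalg import toeplitz

def toeplitz_matvec(h, v):
    # Multiply the symmetric Toeplitz matrix Sigma[j, k] = h[|j - k|] by v in
    # O(n log n) by embedding it in a 2n x 2n circulant and using the FFT.
    n = len(h)
    col = np.concatenate([h, [0.0], h[-1:0:-1]])   # first column of the circulant
    out = np.fft.ifft(np.fft.fft(col) * np.fft.fft(v, 2 * n))
    return out[:n].real

h = 0.5 ** np.arange(6)                            # e.g. AR(1)-like autocovariances
v = np.random.default_rng(0).standard_normal(6)
assert np.allclose(toeplitz_matvec(h, v), toeplitz(h) @ v)
```
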
VisTR: Visualizations as Representations for Time-series Table Reasoning 2024-11-20
Show

Table reasoning involves transforming natural language questions into corresponding answers based on the provided data table. Recent research exploits large language models (LLMs) to facilitate table reasoning, which however struggle with pattern recognition and lack support for visual-based pattern exploration. To address these limitations, we propose VisTR, a framework that leverages visualizations as representations to facilitate data pattern recognition and support cross-modal exploration. We describe VisTR as a process consisting of four major modules: 1) visualization alignment that utilizes multimodal LLMs to align visualizations across various modalities, including chart, text, and sketch; 2) visualization referencing that decomposes a table into multifaceted visualization references that comprehensively represent the table; 3) visualization pruning that incorporates data and retrieval pruning to excise visualization references with poor information and enhance retrieval efficiency; and 4) visualization interaction that offers an interactive visual interface with multimodal interactions for user-friendly table reasoning. Quantitative evaluation with existing multimodal LLMs demonstrates the effectiveness of the alignment model in cross-modal visualization pairings. We further illustrate the applicability of the proposed framework in various time-series table reasoning and exploration tasks.

11 pages, 10 figures
Conformal Prediction for Hierarchical Data 2024-11-20
Show

Reconciliation has become an essential tool in multivariate point forecasting for hierarchical time series. However, there is still a lack of understanding of the theoretical properties of probabilistic Forecast Reconciliation techniques. Meanwhile, Conformal Prediction is a general framework with growing appeal that provides prediction sets with probabilistic guarantees in finite sample. In this paper, we propose a first step towards combining Conformal Prediction and Forecast Reconciliation by analyzing how including a reconciliation step in the Split Conformal Prediction (SCP) procedure enhances the resulting prediction sets. In particular, we show that the validity granted by SCP remains while improving the efficiency of the prediction sets. We also advocate a variation of the theoretical procedure for practical use. Finally, we illustrate these results with simulations.

14 pages, 2 figures
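
A minimal sketch of the base Split Conformal Prediction step that the reconciliation-enhanced procedure builds on (no reconciliation here; function and variable names are assumptions):

```python
import numpy as np

def split_conformal_interval(cal_residuals, y_pred, alpha=0.1):
    # Base SCP: the (1 - alpha) quantile of absolute calibration residuals gives
    # a symmetric interval around new point forecasts. (Linear-interpolated
    # quantile for brevity; exact SCP uses the ceil((n+1)(1-alpha))-th order statistic.)
    n = len(cal_residuals)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q = np.quantile(np.abs(cal_residuals), level)
    return y_pred - q, y_pred + q

# Hypothetical calibration residuals from a fitted forecaster and two new forecasts
rng = np.random.default_rng(0)
lo, hi = split_conformal_interval(rng.normal(0.0, 2.0, 200), np.array([10.0, 12.5]))
```
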
SynEHRgy: Synthesizing Mixed-Type Structured Electronic Health Records using Decoder-Only Transformers 2024-11-20
Show

Generating synthetic Electronic Health Records (EHRs) offers significant potential for data augmentation, privacy-preserving data sharing, and improving machine learning model training. We propose a novel tokenization strategy tailored for structured EHR data, which encompasses diverse data types such as covariates, ICD codes, and irregularly sampled time series. Using a GPT-like decoder-only transformer model, we demonstrate the generation of high-quality synthetic EHRs. Our approach is evaluated using the MIMIC-III dataset, and we benchmark the fidelity, utility, and privacy of the generated data against state-of-the-art models.

Generation of synthetic gait data: application to multiple sclerosis patients' gait patterns 2024-11-20
Show

Multiple sclerosis (MS) is the leading cause of severe non-traumatic disability in young adults and its incidence is increasing worldwide. The variability of gait impairment in MS necessitates the development of a non-invasive, sensitive, and cost-effective tool for quantitative gait evaluation. The eGait movement sensor, designed to characterize human gait through unit quaternion time series (QTS) representing hip rotations, is a promising approach. However, the small sample sizes typical of clinical studies pose challenges for the stability of gait data analysis tools. To address these challenges, this article presents two key scientific contributions. First, a comprehensive framework is proposed for transforming QTS data into a form that preserves the essential geometric properties of gait while enabling the use of any tabular synthetic data generation method. Second, a synthetic data generation method is introduced, based on nearest neighbors weighting, which produces high-fidelity synthetic QTS data suitable for small datasets and private data environments. The effectiveness of the proposed method is demonstrated through its application to MS gait data, showing very good fidelity and preservation of the initial geometry of the data. Thanks to this work, we are able to produce synthetic data sets and work on the stability of clustering methods.

Transformers with Sparse Attention for Granger Causality 2024-11-20
Show

Temporal causal analysis means understanding the underlying causes behind observed variables over time. Deep learning based methods such as transformers are increasingly used to capture temporal dynamics and causal relationships beyond mere correlations. Recent works suggest self-attention weights of transformers as a useful indicator of causal links. We leverage this to propose a novel modification to the self-attention module to establish causal links between the variables of multivariate time-series data with varying lag dependencies. Our Sparse Attention Transformer captures causal relationships using a two-fold approach - performing temporal attention first followed by attention between the variables across the time steps masking them individually to compute Granger Causality indices. The key novelty in our approach is the ability of the model to assert importance and pick the most significant past time instances for its prediction task against manually feeding a fixed time lag value. We demonstrate the effectiveness of our approach via extensive experimentation on several synthetic benchmark datasets. Furthermore, we compare the performance of our model with the traditional Vector Autoregression based Granger Causality method that assumes fixed lag length.
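
The fixed-lag VAR-based Granger causality baseline mentioned above can be run with statsmodels; the simulated pair of series and the chosen maximum lag are purely illustrative (this is the classical comparison point, not the Sparse Attention Transformer).

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
cause = rng.standard_normal(300)
effect = np.r_[0.0, 0.0, 0.8 * cause[:-2]] + 0.1 * rng.standard_normal(300)

# Tests whether the second column Granger-causes the first, for lags 1..3
data = np.column_stack([effect, cause])
results = grangercausalitytests(data, maxlag=3)
p_value_lag2 = results[2][0]["ssr_ftest"][1]   # F-test p-value at lag 2
```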

Quantized symbolic time series approximation 2024-11-20
Show

Time series are ubiquitous in numerous science and engineering domains, e.g., signal processing, bioinformatics, and astronomy. Previous work has verified the efficacy of symbolic time series representation in a variety of engineering applications due to its storage efficiency and numerosity reduction. The most recent symbolic aggregate approximation technique, ABBA, has been shown to preserve essential shape information of time series and improve downstream applications, e.g., neural network inference regarding prediction and anomaly detection in time series. Motivated by the emergence of high-performance hardware which enables efficient computation for low bit-width representations, we present a new quantization-based ABBA symbolic approximation technique, QABBA, which exhibits improved storage efficiency while retaining the original speed and accuracy of symbolic reconstruction. We prove an upper bound for the error arising from quantization and discuss how the number of bits should be chosen to balance this with other errors. An application of QABBA with large language models (LLMs) for time series regression is also presented, and its utility is investigated. By representing the symbolic chain of patterns on time series, QABBA not only avoids the training of embedding from scratch, but also achieves a new state-of-the-art on Monash regression dataset. The symbolic approximation to the time series offers a more efficient way to fine-tune LLMs on the time series regression task which contains various application domains. We further present a set of extensive experiments performed across various well-established datasets to demonstrate the advantages of the QABBA method for symbolic approximation.

TSINR: Capturing Temporal Continuity via Implicit Neural Representations for Time Series Anomaly Detection 2024-11-20
Show

Time series anomaly detection aims to identify unusual patterns in data or deviations from systems' expected behavior. The reconstruction-based methods are the mainstream in this task, which learn point-wise representation via unsupervised learning. However, the unlabeled anomaly points in training data may cause these reconstruction-based methods to learn and reconstruct anomalous data, resulting in the challenge of capturing normal patterns. In this paper, we propose a time series anomaly detection method based on implicit neural representation (INR) reconstruction, named TSINR, to address this challenge. Due to the property of spectral bias, TSINR enables prioritizing low-frequency signals and exhibiting poorer performance on high-frequency abnormal data. Specifically, we adopt INR to parameterize time series data as a continuous function and employ a transformer-based architecture to predict the INR of given data. As a result, the proposed TSINR method achieves the advantage of capturing the temporal continuity and thus is more sensitive to discontinuous anomaly data. In addition, we further design a novel form of INR continuous function to learn inter- and intra-channel information, and leverage a pre-trained large language model to amplify the intense fluctuations in anomalies. Extensive experiments demonstrate that TSINR achieves superior overall performance on both univariate and multivariate time series anomaly detection benchmarks compared to other state-of-the-art reconstruction-based methods. Our codes are available.

Accepted by SIGKDD 2025

Rethinking the Power of Timestamps for Robust Time Series Forecasting: A Global-Local Fusion Perspective 2024-11-20
Show

Time series forecasting has played a pivotal role across various industries, including finance, transportation, energy, healthcare, and climate. Due to the abundant seasonal information they contain, timestamps possess the potential to offer robust global guidance for forecasting techniques. However, existing works primarily focus on local observations, with timestamps being treated merely as an optional supplement that remains underutilized. When data gathered from the real world is polluted, the absence of global information will damage the robust prediction capability of these algorithms. To address these problems, we propose a novel framework named GLAFF. Within this framework, the timestamps are modeled individually to capture the global dependencies. Working as a plugin, GLAFF adaptively adjusts the combined weights for global and local information, enabling seamless collaboration with any time series forecasting backbone. Extensive experiments conducted on nine real-world datasets demonstrate that GLAFF significantly enhances the average performance of widely used mainstream forecasting models by 12.5%, surpassing the previous state-of-the-art method by 5.5%.

Accepted by NeurIPS 2024

A Gap in Time: The Challenge of Processing Heterogeneous IoT Data in Digitalized Buildings 2024-11-20
Show

The increasing demand for sustainable energy solutions has driven the integration of digitalized buildings into the power grid, leveraging Internet-of-Things (IoT) technologies to enhance energy efficiency and operational performance. Despite their potential, effectively utilizing IoT point data within deep-learning frameworks presents significant challenges, primarily due to its inherent heterogeneity. This study investigates the diverse dimensions of IoT data heterogeneity in both intra-building and inter-building contexts, examining their implications for predictive modeling. A benchmarking analysis of state-of-the-art time series models highlights their performance on this complex dataset. The results emphasize the critical need for multi-modal data integration, domain-informed modeling, and automated data engineering pipelines. Additionally, the study advocates for collaborative efforts to establish high-quality public datasets, which are essential for advancing intelligent and sustainable energy management systems in digitalized buildings.

4 figures, 1 tables, 9 pages

Generalized Prompt Tuning: Adapting Frozen Univariate Time Series Foundation Models for Multivariate Healthcare Time Series 2024-11-19
Show

Time series foundation models are pre-trained on large datasets and are able to achieve state-of-the-art performance in diverse tasks. However, to date, there has been limited work demonstrating how well these models perform in medical applications, where labeled data can be scarce. Further, we observe that currently, the majority of time series foundation models either are univariate in nature, or assume channel independence, meaning that they handle multivariate time series but do not model how the different variables relate. In this paper, we propose a prompt-tuning-inspired fine-tuning technique, Generalized Prompt Tuning (Gen-P-Tuning), that enables us to adapt an existing univariate time series foundation model (treated as frozen) to handle multivariate time series prediction. Our approach provides a way to combine information across channels (variables) of multivariate time series. We demonstrate the effectiveness of our fine-tuning approach against various baselines on two MIMIC classification tasks, and on influenza-like illness forecasting.

Machine Learning for Health (ML4H 2024)

Machine Learning Approaches on Crop Pattern Recognition a Comparative Analysis 2024-11-19
Show

Monitoring agricultural activities is important to ensure food security. Remote sensing plays a significant role for large-scale continuous monitoring of cultivation activities. Time series remote sensing data were used for the generation of the cropping pattern. Classification algorithms are used to classify crop patterns and map agricultural land use. Some conventional classification methods including support vector machine (SVM) and decision trees were applied for crop pattern recognition. However, in this paper, we are proposing Deep Neural Network (DNN) based classification to improve the performance of crop pattern recognition and make a comparative analysis with two (2) other machine learning approaches including Naive Bayes and Random Forest.

Published in ICNTET2018: International Conference on New Trends in Engineering & Technology Tirupathi Highway, Tiruvallur Dist Chennai, India, September 7-8, 2018

Smart Predict-then-Optimize Method with Dependent Data: Risk Bounds and Calibration of Autoregression 2024-11-19
Show

The predict-then-optimize (PTO) framework is indispensable for addressing practical stochastic decision-making tasks. It consists of two crucial steps: initially predicting unknown parameters of an optimization model and subsequently solving the problem based on these predictions. Elmachtoub and Grigas [1] introduced the Smart Predict-then-Optimize (SPO) loss for the framework, which gauges the decision error arising from predicted parameters, and a convex surrogate, the SPO+ loss, which incorporates the underlying structure of the optimization model. The consistency of these different loss functions is guaranteed under the assumption of i.i.d. training data. Nevertheless, various types of data are often dependent, such as power load fluctuations over time. This dependent nature can lead to diminished model performance in testing or real-world applications. Motivated to make intelligent predictions for time series data, we present an autoregressive SPO method directly targeting the optimization problem at the decision stage in this paper, where the conditions of consistency are no longer met. Therefore, we first analyze the generalization bounds of the SPO loss within our autoregressive model. Subsequently, the uniform calibration results in Liu and Grigas [2] are extended in the proposed model. Finally, we conduct experiments to empirically demonstrate the effectiveness of the SPO+ surrogate compared to the absolute loss and the least squares loss, especially when the cost vectors are determined by stationary dynamical systems and demonstrate the relationship between normalized regret and mixing coefficients.

10 pages
A data driven approach to classify descriptors based on their efficiency in translating noisy trajectories into physically-relevant information 2024-11-19
Show

Reconstructing the physical complexity of many-body dynamical systems can be challenging. Starting from the trajectories of their constitutive units (raw data), typical approaches require selecting appropriate descriptors to convert them into time-series, which are then analyzed to extract interpretable information. However, identifying the most effective descriptor is often non-trivial. Here, we report a data-driven approach to compare the efficiency of various descriptors in extracting information from noisy trajectories and translating it into physically relevant insights. As a prototypical system with non-trivial internal complexity, we analyze molecular dynamics trajectories of an atomistic system where ice and water coexist in equilibrium near the solid/liquid transition temperature. We compare general and specific descriptors often used in aqueous systems: number of neighbors, molecular velocities, Smooth Overlap of Atomic Positions (SOAP), Local Environments and Neighbors Shuffling (LENS), Orientational Tetrahedral Order, and distance from the fifth neighbor ($d_5$). Using Onion Clustering -- an efficient unsupervised method for single-point time-series analysis -- we assess the maximum extractable information for each descriptor and rank them via a high-dimensional metric. Our results show that advanced descriptors like SOAP and LENS outperform classical ones due to higher signal-to-noise ratios. Nonetheless, even simple descriptors can rival or exceed advanced ones after local signal denoising. For example, $d_5$, initially among the weakest, becomes the most effective at resolving the system's non-local dynamical complexity after denoising. This work highlights the critical role of noise in information extraction from molecular trajectories and offers a data-driven approach to identify optimal descriptors for systems with characteristic internal complexity.

19 pages, 5 figures + 3 in supporting information (at the bottom of the manuscript)

Machine Learning Algorithms to Assess Site Closure Time Frames for Soil and Groundwater Contamination 2024-11-19
Show

Monitored Natural Attenuation (MNA) is gaining prominence as an effective method for managing soil and groundwater contamination due to its cost-efficiency and minimal environmental disruption. Despite its benefits, MNA necessitates extensive groundwater monitoring to ensure that contaminant levels decrease to meet safety standards. This study expands the capabilities of PyLEnM, a Python package designed for long-term environmental monitoring, by incorporating new algorithms to enhance its predictive and analytical functionalities. We introduce methods to estimate the timeframe required for contaminants like Sr-90 and I-129 to reach regulatory safety standards using linear regression and to forecast future contaminant levels with the Bidirectional Long Short-Term Memory (Bi-LSTM) networks. Additionally, Random Forest regression is employed to identify factors influencing the time to reach safety standards. Our methods are illustrated using data from the Savannah River Site (SRS) F-Area, where preliminary findings reveal a notable downward trend in contaminant levels, with variability linked to initial concentrations and groundwater flow dynamics. The Bi-LSTM model effectively predicts contaminant concentrations for the next four years, demonstrating the potential of advanced time series analysis to improve MNA strategies and reduce reliance on manual groundwater sampling. The code, along with its usage instructions, validation, and requirements, is available at: https://github.com/csplevuanh/pylenm_extension.

The paper will be withdrawn to fix some work issues with the sections on Bi-LSTM models

Ichnos: A Carbon Footprint Estimator for Scientific Workflows 2024-11-19
Show

We propose Ichnos, a novel and flexible tool to estimate the carbon footprint of Nextflow workflows based on detailed workflow traces, CI time series, and power models. First, Ichnos takes as input the automatically-generated workflow trace produced by Nextflow. Use of these traces is an original contribution, ensuring that users do not need to manually monitor power consumption and enabling analysis of previously executed workflows. Next, Ichnos allows users to provide their own resource power model for utilised compute resources to accurately reflect processor settings, such as the processor frequency, instead of solely relying on a linear function. Finally, Ichnos converts estimated energy consumption to overall carbon emissions using fine-grained time-series CI data for each workflow task and only resorts to coarse-grained yearly averages where high-resolution location-based CI data are not available. Additionally, Ichnos reports estimated energy consumption and carbon emissions per task, providing greater granularity than existing methodologies and allowing users to identify which of their tasks have the largest footprint to address. We provide the implementation of Ichnos as open-source. We demonstrate our tool on traces of two real-world Nextflow workflows, compare the estimated energy consumption against RAPL and the GA methodology, and show the tool's functionality by varying the granularity of provided CI data and varying the processor frequency settings of assigned compute resources.

Extended Abstract for LOCO 2024. GitHub Repository: https://github.com/westkath/ichnos
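
The core conversion the tool performs, energy from a power model times runtime and then carbon via time-resolved carbon intensity (CI), can be sketched as follows; the linear power model, the hourly CI handling, and all names are simplifying assumptions rather than Ichnos's actual interfaces.

```python
def task_footprint(runtime_h, cpu_usage, power_model, ci_series):
    # Per-task energy (kWh) and carbon (gCO2e) from a trace entry: power_model
    # maps utilisation to watts, ci_series gives gCO2e/kWh for each hour the task ran.
    energy_kwh = power_model(cpu_usage) * runtime_h / 1000.0
    hours = max(1, int(round(runtime_h)))
    avg_ci = sum(ci_series[:hours]) / min(hours, len(ci_series))
    return energy_kwh, energy_kwh * avg_ci

linear_power = lambda u: 50.0 + 150.0 * u    # hypothetical 50 W idle / 200 W full load
energy, co2 = task_footprint(runtime_h=2.0, cpu_usage=0.75,
                             power_model=linear_power, ci_series=[220.0, 180.0, 240.0])
```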

Nonstationary functional time series forecasting 2024-11-19
Show

We propose a nonstationary functional time series forecasting method with an application to age-specific mortality rates observed over the years. The method begins by taking the first-order difference and estimating its long-run covariance function. Through eigen-decomposition, we obtain a set of estimated functional principal components and their associated scores for the differenced series. These components allow us to reconstruct the original functional data and compute the residuals. To model the temporal patterns in the residuals, we again perform dynamic functional principal component analysis and extract its estimated principal components and the associated scores for the residuals. As a byproduct, we introduce a geometrically decaying weighted approach to assign higher weights to the most recent data than those from the distant past. Using the Swedish age-specific mortality rates from 1751 to 2022, we demonstrate that the weighted dynamic functional factor model can produce more accurate point and interval forecasts, particularly for male series exhibiting higher volatility.

34 pages, 10 figures
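
A compressed sketch of the first two steps described above, first-order differencing followed by principal component extraction on the differenced curves; the long-run covariance is simplified to a plain sample covariance and the mortality-style data below are synthetic.

```python
import numpy as np

def differenced_fpca(curves, n_components=3):
    # Difference a functional time series (years x grid points), then extract
    # the leading principal components and scores of the differenced curves.
    diffs = np.diff(curves, axis=0)
    centered = diffs - diffs.mean(axis=0)
    cov = centered.T @ centered / len(centered)
    eigvals, eigvecs = np.linalg.eigh(cov)
    components = eigvecs[:, ::-1][:, :n_components]    # leading eigenvectors
    scores = centered @ components
    return components, scores

# Hypothetical log mortality surface: 100 years observed over 50 age groups
rates = np.cumsum(np.random.default_rng(0).normal(0, 0.01, (100, 50)), axis=0)
phi, beta = differenced_fpca(rates)
```
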
Different PCA approaches for vector functional time series with applications to resistive switching processes 2024-11-19
Show

This paper is motivated by modeling the cycle-to-cycle variability associated with the resistive switching operation behind memristors. As the data are by nature curves, functional principal component analysis is a suitable candidate to explain the main modes of variability. Taking into account this data-driven motivation, in this paper we propose two new forecasting approaches based on studying the sequential cross-dependence between and within a multivariate functional time series in terms of vector autoregressive modeling of the most explicative functional principal component scores. The main difference between the two methods lies in whether a univariate or multivariate PCA is performed so that we have a different set of principal component scores for each functional time series or the same one for all of them. Finally, the sample performance of the proposed methodologies is illustrated by an application on a bivariate functional time series of reset-set curves.

O-MAGIC: Online Change-Point Detection for Dynamic Systems 2024-11-19
Show

The capture of changes in dynamic systems, especially ordinary differential equations (ODEs), is an important and challenging task, with multiple applications in biomedical research and other scientific areas. This article proposes a fast and mathematically rigorous online method, called ODE-informed MAnifold-constrained Gaussian process Inference for Change point detection (O-MAGIC), to detect changes of parameters in the ODE system using noisy and sparse observation data. O-MAGIC imposes a Gaussian process prior on the time series of system components with a latent manifold constraint, induced by restricting the derivative process to satisfy ODE conditions. To detect the parameter changes from the observation, we propose a procedure based on a two-sample generalized likelihood ratio (GLR) test that can detect multiple change points in the dynamic system automatically. O-MAGIC bypasses conventional numerical integration and achieves substantial savings in computation time. By incorporating the ODE structures through manifold constraints, O-MAGIC enjoys a significant advantage in detection delay, while following principled statistical construction under the Bayesian paradigm, which further enables it to handle systems with missing data or unobserved components. O-MAGIC can also be applied to general nonlinear systems. Simulation studies on three challenging examples: SEIRD model, Lotka-Volterra model and Lorenz model are provided to illustrate the robustness and efficiency of O-MAGIC, compared with numerical integration and other popular time-series-based change point detection benchmark methods.

A Review on Generative AI Models for Synthetic Medical Text, Time Series, and Longitudinal Data 2024-11-19
Show

This paper presents the results of a novel scoping review on the practical models for generating three different types of synthetic health records (SHRs): medical text, time series, and longitudinal data. The innovative aspects of the review, which incorporate study objectives, data modality, and research methodology of the reviewed studies, uncover the importance and the scope of the topic for the digital medicine context. In total, 52 publications met the eligibility criteria for generating medical time series (22), longitudinal data (17), and medical text (13). Privacy preservation was found to be the main research objective of the studied papers, along with class imbalance, data scarcity, and data imputation as the other objectives. The adversarial network-based, probabilistic, and large language models exhibited superiority for generating synthetic longitudinal data, time series, and medical texts, respectively. Finding a reliable performance measure to quantify SHR re-identification risk is the major research gap of the topic.

27 pages, 3 figures
E-STGCN: Extreme Spatiotemporal Graph Convolutional Networks for Air Quality Forecasting 2024-11-19
Show

Modeling and forecasting air quality plays a crucial role in informed air pollution management and protecting public health. The air quality data of a region, collected through various pollution monitoring stations, display nonlinearity, nonstationarity, and a highly dynamic nature, and exhibit intense stochastic spatiotemporal correlation. Geometric deep learning models such as Spatiotemporal Graph Convolutional Networks (STGCN) can capture spatial dependence while forecasting temporal time series data for different sensor locations. Another key characteristic often ignored by these models is the presence of extreme observations in the air pollutant levels for severely polluted cities worldwide. Extreme value theory is a commonly used statistical method to predict the expected number of violations of the National Ambient Air Quality Standards for air pollutant concentration levels. This study develops an extreme value theory-based STGCN model (E-STGCN) for air pollution data to incorporate extreme behavior across pollutant concentrations. Along with spatial and temporal components, E-STGCN uses generalized Pareto distribution to investigate the extreme behavior of different air pollutants and incorporate it inside graph convolutional networks. The proposal is then applied to analyze air pollution data (PM2.5, PM10, and NO2) of 37 monitoring stations across Delhi, India. The forecasting performance for different test horizons is evaluated compared to benchmark forecasters (both temporal and spatiotemporal). It was found that E-STGCN has consistent performance across all the seasons in Delhi, India, and the robustness of our results has also been evaluated empirically. Moreover, combined with conformal prediction, E-STGCN can also produce probabilistic prediction intervals.
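
The extreme-value ingredient, a generalized Pareto fit to threshold exceedances of a pollutant series, can be sketched with scipy; the synthetic PM2.5 data, the threshold choice, and the exceedance-probability calculation are illustrative, and the graph-convolutional part is omitted.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(1)
pm25 = rng.gamma(shape=2.0, scale=40.0, size=2000)   # synthetic PM2.5 concentrations
threshold = np.quantile(pm25, 0.95)                  # peaks-over-threshold level
excesses = pm25[pm25 > threshold] - threshold

# Fit the generalized Pareto distribution to the exceedances (location fixed at 0)
shape, loc, scale = genpareto.fit(excesses, floc=0.0)

# Unconditional probability of exceeding a high level, e.g. 300 ug/m3
p_over_threshold = (pm25 > threshold).mean()
p_exceed_300 = p_over_threshold * genpareto.sf(300.0 - threshold, shape, loc=0.0, scale=scale)
```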

Extending the Burrows-Wheeler Transform for Cartesian Tree Matching and Constructing It 2024-11-19
Show

Cartesian tree matching is a form of generalized pattern matching where a substring of the text matches with the pattern if they share the same Cartesian tree. This form of matching finds application for time series of stock prices and can be of interest for melody matching between musical scores. For the indexing problem, the state-of-the-art data structure is a Burrows-Wheeler transform based solution due to [Kim and Cho, CPM'21], which uses nearly succinct space and can count the number of substrings that Cartesian tree match with a pattern in time linear in the pattern length. The authors address the construction of their data structure with a straight-forward solution that, however, requires pointer-based data structures, which asymptotically need more space than compact solutions [Kim and Cho, CPM'21, Section A.4]. We address this bottleneck by a construction that requires compact space and has a time complexity linear in the product of the text length with some logarithmic terms. Additionally, we can extend this index for indexing multiple circular texts in the spirit of the extended Burrows-Wheeler transform without sacrificing the time and space complexities. We present this index in a dynamic variant, where we pay a logarithmic slowdown and need compact space for the extra functionality that we can incrementally add texts. Our extended setting is of interest for finding repetitive motifs common in the aforementioned applications, independent of offsets and scaling.
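
To make the matching notion concrete: two windows Cartesian-tree match when their (min-rooted) Cartesian trees have the same shape, which the linear-time stack construction below exposes via the parent array. This is an illustration of the matching relation only, not the Burrows-Wheeler-based index.

```python
def cartesian_parents(a):
    # Parent index of each position in the min-rooted Cartesian tree of `a`,
    # built in O(n) with a stack holding the current right spine.
    parent = [-1] * len(a)
    stack = []
    for i, v in enumerate(a):
        last = -1
        while stack and a[stack[-1]] > v:
            last = stack.pop()
        if last != -1:
            parent[last] = i        # last popped node becomes the left child of i
        if stack:
            parent[i] = stack[-1]   # i becomes the right child of the new top
        stack.append(i)
    return parent

# Two price windows with the same "shape" induce the same parent structure
assert cartesian_parents([3, 1, 4, 2, 5]) == cartesian_parents([30, 10, 40, 15, 50])
```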

Contrast Similarity-Aware Dual-Pathway Mamba for Multivariate Time Series Node Classification 2024-11-19
Show

Multivariate time series (MTS) data is generated through multiple sensors across various domains such as engineering application, health monitoring, and the internet of things, characterized by its temporal changes and high dimensional characteristics. Over the past few years, many studies have explored the long-range dependencies and similarities in MTS. However, long-range dependencies are difficult to model due to temporal changes, and high dimensionality makes it difficult to obtain similarities effectively and efficiently. Thus, to address these issues, we propose contrast similarity-aware dual-pathway Mamba for MTS node classification (CS-DPMamba). Firstly, to obtain the dynamic similarity of each sample, we use a temporal contrastive learning module to acquire MTS representations, and then construct a similarity matrix between MTS representations using Fast Dynamic Time Warping (FastDTW). Secondly, we apply the DPMamba to consider the bidirectional nature of MTS, allowing us to better capture long-range and short-range dependencies within the data. Finally, we utilize the Kolmogorov-Arnold Network enhanced Graph Isomorphism Network to complete the information interaction in the matrix and MTS node classification task. By comprehensively considering the long-range dependencies and dynamic similarity features, we achieved precise MTS node classification. We conducted experiments on multiple University of East Anglia (UEA) MTS datasets, which encompass diverse application scenarios. Our results demonstrate the superiority of our method through both supervised and semi-supervised experiments on the MTS classification task.

Submitted to Knowledge-Based Systems on Nov 17, 2024
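
The similarity-matrix construction between learned representations could look roughly like the sketch below using the fastdtw package; the contrastive encoder is omitted and the per-sample representations here are random placeholders.

```python
import numpy as np
from fastdtw import fastdtw
from scipy.spatial.distance import euclidean

def dtw_distance_matrix(representations):
    # Pairwise FastDTW distances between sample representations
    # (each representation is a time-steps x feature-dim array).
    n = len(representations)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d, _ = fastdtw(representations[i], representations[j], dist=euclidean)
            dist[i, j] = dist[j, i] = d
    return dist

# Hypothetical per-sample representations for five MTS nodes
reps = [np.random.default_rng(k).standard_normal((40, 8)) for k in range(5)]
S = dtw_distance_matrix(reps)
```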

Hybrid Gaussian Process Regression with Temporal Feature Extraction for Partially Interpretable Remaining Useful Life Interval Prediction in Aeroengine Prognostics 2024-11-19
Show

The estimation of Remaining Useful Life (RUL) plays a pivotal role in intelligent manufacturing systems and Industry 4.0 technologies. While recent advancements have improved RUL prediction, many models still face interpretability and compelling uncertainty modeling challenges. This paper introduces a modified Gaussian Process Regression (GPR) model for RUL interval prediction, tailored for the complexities of manufacturing process development. The modified GPR predicts confidence intervals by learning from historical data and addresses uncertainty modeling in a more structured way. The approach effectively captures intricate time-series patterns and dynamic behaviors inherent in modern manufacturing systems by coupling GPR with deep adaptive learning-enhanced AI process models. Moreover, the model evaluates feature significance to ensure more transparent decision-making, which is crucial for optimizing manufacturing processes. This comprehensive approach supports more accurate RUL predictions and provides transparent, interpretable insights into uncertainty, contributing to robust process development and management.
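
Plain GPR interval prediction, the baseline the modified model above builds on, can be sketched with scikit-learn; the temporal feature extraction and deep adaptive coupling described in the paper are not reproduced, and the degradation data below are synthetic.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
cycles = rng.uniform(0, 300, size=80).reshape(-1, 1)       # hypothetical usage cycles
rul = 300 - cycles.ravel() + rng.normal(0, 10, size=80)    # noisy remaining useful life

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=50.0) + WhiteKernel(),
                               normalize_y=True)
gpr.fit(cycles, rul)

query = np.array([[150.0]])
mean, std = gpr.predict(query, return_std=True)
interval_95 = (mean - 1.96 * std, mean + 1.96 * std)       # RUL confidence interval
```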

Trajectory

Title Date Abstract Comment
Dynamic Trajectory Adaptation for Efficient UAV Inspections of Wind Energy Units 2024-11-26
Show

The research presents an automated method for determining the trajectory of an unmanned aerial vehicle (UAV) for wind turbine inspection. The proposed method enables efficient data collection from multiple wind installations using UAV optical sensors, considering the spatial positioning of blades and other components of the wind energy installation. It includes component segmentation of the wind energy unit (WEU), determination of the blade pitch angle, and generation of optimal flight trajectories, considering safe distances and optimal viewing angles. The results of computational experiments have demonstrated the advantage of the proposed method in monitoring WEU, achieving a 78% reduction in inspection time, a 17% decrease in total trajectory length, and a 6% increase in average blade surface coverage compared to traditional methods. Furthermore, the process minimizes the average deviation from the optimal trajectory by 68%, indicating its high accuracy and ability to compensate for external influences.

Unmanned aerial vehicles, wind turbine inspection, automated trajectory determination, dynamic trajectory adaptation, image segmentation, computer vision, optical sensors, wind energy unit

RealTraj: Towards Real-World Pedestrian Trajectory Forecasting 2024-11-26
Show

This paper jointly addresses three key limitations in conventional pedestrian trajectory forecasting: pedestrian perception errors, real-world data collection costs, and person ID annotation costs. We propose a novel framework, RealTraj, that enhances the real-world applicability of trajectory forecasting. Our approach includes two training phases--self-supervised pretraining on synthetic data and weakly-supervised fine-tuning with limited real-world data--to minimize data collection efforts. To improve robustness to real-world errors, we focus on both model design and training objectives. Specifically, we present Det2TrajFormer, a trajectory forecasting model that remains invariant to tracking noise by using past detections as inputs. Additionally, we pretrain the model using multiple pretext tasks, which enhance robustness and improve forecasting performance based solely on detection data. Unlike previous trajectory forecasting methods, our approach fine-tunes the model using only ground-truth detections, significantly reducing the need for costly person ID annotations. In the experiments, we comprehensively verify the effectiveness of the proposed method against the limitations, and the method outperforms state-of-the-art trajectory forecasting methods on multiple datasets.

Enhancing Lane Segment Perception and Topology Reasoning with Crowdsourcing Trajectory Priors 2024-11-26
Show

In autonomous driving, recent advances in lane segment perception provide autonomous vehicles with a comprehensive understanding of driving scenarios. Moreover, incorporating prior information into such perception models is an effective way to ensure robustness and accuracy. However, utilizing diverse sources of prior information still faces three key challenges: acquiring high-quality prior information, aligning priors with online perception, and integrating them efficiently. To address these issues, we investigate prior augmentation from the novel perspective of trajectory priors. In this paper, we first extract crowdsourced trajectory data from the Argoverse2 motion forecasting dataset and encode the trajectories into a rasterized heatmap and vectorized instance tokens, then incorporate this prior information into the online mapping model in different ways. To mitigate the misalignment between priors and online perception, we design a confidence-based fusion module that takes alignment into account during the fusion process. We conduct extensive experiments on the OpenLane-V2 dataset. The results indicate that our method significantly outperforms current state-of-the-art methods.

Characterized Diffusion Networks for Enhanced Autonomous Driving Trajectory Prediction 2024-11-25
Show

In this paper, we present a novel trajectory prediction model for autonomous driving, combining a Characterized Diffusion Module and a Spatial-Temporal Interaction Network to address the challenges posed by dynamic and heterogeneous traffic environments. Our model enhances the accuracy and reliability of trajectory predictions by incorporating uncertainty estimation and complex agent interactions. Through extensive experimentation on public datasets such as NGSIM, HighD, and MoCAD, our model significantly outperforms existing state-of-the-art methods. We demonstrate its ability to capture the underlying spatial-temporal dynamics of traffic scenarios and improve prediction precision, especially in complex environments. The proposed model showcases strong potential for application in real-world autonomous driving systems.

7 pages, 0 figures
InTraGen: Trajectory-controlled Video Generation for Object Interactions 2024-11-25
Show

Advances in video generation have significantly improved the realism and quality of created scenes. This has fueled interest in developing intuitive tools that let users leverage video generation as world simulators. Text-to-video (T2V) generation is one such approach, enabling video creation from text descriptions only. Yet, due to the inherent ambiguity in texts and the limited temporal information offered by text prompts, researchers have explored additional control signals like trajectory-guided systems, for more accurate T2V generation. Nonetheless, methods to evaluate whether T2V models can generate realistic interactions between multiple objects are lacking. We introduce InTraGen, a pipeline for improved trajectory-based generation of object interaction scenarios. We propose 4 new datasets and a novel trajectory quality metric to evaluate the performance of the proposed InTraGen. To achieve object interaction, we introduce a multi-modal interaction encoding pipeline with an object ID injection mechanism that enriches object-environment interactions. Our results demonstrate improvements in both visual fidelity and quantitative performance. Code and datasets are available at https://github.com/insait-institute/InTraGen

Bring the Heat: Rapid Trajectory Optimization with Pseudospectral Techniques and the Affine Geometric Heat Flow Equation 2024-11-24
Show

Generating optimal trajectories for high-dimensional robotic systems in a time-efficient manner while adhering to constraints is a challenging task. This paper introduces PHLAME, which applies pseudospectral collocation and spatial vector algebra to efficiently solve the Affine Geometric Heat Flow (AGHF) Partial Differential Equation (PDE) for trajectory optimization. Unlike traditional PDE approaches like the Hamilton-Jacobi-Bellman (HJB) PDE, which solve for a function over the entire state space, computing a solution to the AGHF PDE scales more efficiently because its solution is defined over a two-dimensional domain, thereby avoiding the intractability of state-space scaling. To solve the AGHF one usually applies the Method of Lines (MOL), which discretizes one variable of the AGHF PDE, and converts the PDE into a system of ordinary differential equations (ODEs) that are solved using standard time-integration methods. Though powerful, this method requires a fine discretization to generate accurate solutions and requires evaluating the AGHF PDE which is computationally expensive for high-dimensional systems. PHLAME overcomes this deficiency by using a pseudospectral method, which reduces the number of function evaluations required to yield a high accuracy solution thereby allowing it to scale efficiently to high-dimensional robotic systems. To further increase computational speed, this paper presents analytical expressions for the AGHF and its Jacobian, both of which can be computed efficiently using rigid body dynamics algorithms. PHLAME is tested across various dynamical systems, with and without obstacles and compared to a number of state-of-the-art techniques. PHLAME generates trajectories for a 44-dimensional state-space system in $\sim5$ seconds, much faster than current state-of-the-art techniques. A project page is available at https://roahmlab.github.io/PHLAME/

26 pages, 8 figures, A project page can be found at https://roahmlab.github.io/PHLAME/

FollowGen: A Scaled Noise Conditional Diffusion Model for Car-Following Trajectory Prediction 2024-11-23
Show

Vehicle trajectory prediction is crucial for advancing autonomous driving and advanced driver assistance systems (ADAS). Although deep learning-based approaches - especially those utilizing transformer-based and generative models - have markedly improved prediction accuracy by capturing complex, non-linear patterns in vehicle dynamics and traffic interactions, they frequently overlook detailed car-following behaviors and the inter-vehicle interactions critical for real-world driving applications, particularly in fully autonomous or mixed traffic scenarios. To address the issue, this study introduces a scaled noise conditional diffusion model for car-following trajectory prediction, which integrates detailed inter-vehicular interactions and car-following dynamics into a generative framework, improving both the accuracy and plausibility of predicted trajectories. The model utilizes a novel pipeline to capture historical vehicle dynamics by scaling noise with encoded historical features within the diffusion process. Particularly, it employs a cross-attention-based transformer architecture to model intricate inter-vehicle dependencies, effectively guiding the denoising process and enhancing prediction accuracy. Experimental results on diverse real-world driving scenarios demonstrate the state-of-the-art performance and robustness of the proposed method.

arXiv admin note: text overlap with arXiv:2406.11941

Learning-based Trajectory Tracking for Bird-inspired Flapping-Wing Robots 2024-11-22
Show

Bird-sized flapping-wing robots offer significant potential for agile flight in complex environments, but achieving agile and robust trajectory tracking remains a challenge due to the complex aerodynamics and highly nonlinear dynamics inherent in flapping-wing flight. In this work, a learning-based control approach is introduced to unlock the versatility and adaptiveness of flapping-wing flight. We propose a model-free reinforcement learning (RL)-based framework for a high degree-of-freedom (DoF) bird-inspired flapping-wing robot that allows for multimodal flight and agile trajectory tracking. Stability analysis was performed on the closed-loop system comprising the flapping-wing system and the RL policy. Additionally, simulation results demonstrate that the RL-based controller can successfully learn complex wing trajectory patterns, achieve stable flight, switch between flight modes spontaneously, and track different trajectories under various aerodynamic conditions.

RED: Effective Trajectory Representation Learning with Comprehensive Information 2024-11-22
Show

Trajectory representation learning (TRL) maps trajectories to vectors that can then be used for various downstream tasks, including trajectory similarity computation, trajectory classification, and travel-time estimation. However, existing TRL methods often produce vectors that, when used in downstream tasks, yield insufficiently accurate results. A key reason is that they fail to utilize the comprehensive information encompassed by trajectories. We propose a self-supervised TRL framework, called RED, which effectively exploits multiple types of trajectory information. Overall, RED adopts the Transformer as the backbone model and masks the constituting paths in trajectories to train a masked autoencoder (MAE). In particular, RED considers the moving patterns of trajectories by employing a Road-aware masking strategy that retains key paths of trajectories during masking, thereby preserving crucial information of the trajectories. RED also adopts a spatial-temporal-user joint Embedding scheme to encode comprehensive information when preparing the trajectories as model inputs. To conduct training, RED adopts Dual-objective task learning: the Transformer encoder predicts the next segment in a trajectory, and the Transformer decoder reconstructs the entire trajectory. RED also considers the spatial-temporal correlations of trajectories by modifying the attention mechanism of the Transformer. We compare RED with 9 state-of-the-art TRL methods for 4 downstream tasks on 3 real-world datasets, finding that RED can usually improve the accuracy of the best-performing baseline by over 5%.

This paper is accepted by VLDB2025

Trajectory Planning and Control for Robotic Magnetic Manipulation 2024-11-22
Show

Robotic magnetic manipulation offers a minimally invasive approach to gastrointestinal examinations through capsule endoscopy. However, controlling such systems using external permanent magnets (EPM) is challenging due to nonlinear magnetic interactions, especially when there are complex navigation requirements such as avoidance of sensitive tissues. In this work, we present a novel trajectory planning and control method incorporating dynamics and navigation requirements, using a single EPM fixed to a robotic arm to manipulate an internal permanent magnet (IPM). Our approach employs a constrained iterative linear quadratic regulator that considers the dynamics of the IPM to generate optimal trajectories for both the EPM and IPM. Extensive simulations and real-world experiments, motivated by capsule endoscopy operations, demonstrate the robustness of the method, showcasing resilience to external disturbances and precise control under varying conditions. The experimental results show that the IPM reaches the goal position with a maximum mean error of 0.18 cm and a standard deviation of 0.21 cm. This work introduces a unified framework for constrained trajectory optimization in magnetic manipulation, directly incorporating both the IPM's dynamics and the EPM's manipulability.

8 pages, 6 figures
Bi-level Trajectory Optimization on Uneven Terrains with Differentiable Wheel-Terrain Interaction Model 2024-11-22
Show

Navigation of wheeled vehicles on uneven terrain necessitates going beyond the 2D approaches for trajectory planning. Specifically, it is essential to incorporate the full 6dof variation of vehicle pose and its associated stability cost in the planning process. To this end, most recent works aim to learn a neural network model to predict the vehicle evolution. However, such approaches are data-intensive and fraught with generalization issues. In this paper, we present a purely model-based approach that just requires the digital elevation information of the terrain. Specifically, we express the wheel-terrain interaction and 6dof pose prediction as a non-linear least squares (NLS) problem. As a result, trajectory planning can be viewed as a bi-level optimization. The inner optimization layer predicts the pose on the terrain along a given trajectory, while the outer layer deforms the trajectory itself to reduce the stability and kinematic costs of the pose. We improve the state-of-the-art in the following respects. First, we show that our NLS based pose prediction closely matches the output from a high-fidelity physics engine. This result coupled with the fact that we can query gradients of the NLS solver, makes our pose predictor, a differentiable wheel-terrain interaction model. We further leverage this differentiability to efficiently solve the proposed bi-level trajectory optimization problem. Finally, we perform extensive experiments, and comparison with a baseline to showcase the effectiveness of our approach in obtaining smooth, stable trajectories.

8 pages, 7 figures, submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

Grid and Road Expressions Are Complementary for Trajectory Representation Learning 2024-11-22
Show

Trajectory representation learning (TRL) maps trajectories to vectors that can be used for many downstream tasks. Existing TRL methods use either grid trajectories, capturing movement in free space, or road trajectories, capturing movement in a road network, as input. We observe that the two types of trajectories are complementary, providing either region and location information or road structure and movement regularity. Therefore, we propose a novel multimodal TRL method, dubbed GREEN, to jointly utilize Grid and Road trajectory Expressions for Effective representatioN learning. In particular, we transform raw GPS trajectories into both grid and road trajectories and tailor two encoders to capture their respective information. To align the two encoders such that they complement each other, we adopt a contrastive loss to encourage them to produce similar embeddings for the same raw trajectory and design a masked language model (MLM) loss to use grid trajectories to help reconstruct masked road trajectories. To learn the final trajectory representation, a dual-modal interactor is used to fuse the outputs of the two encoders via cross-attention. We compare GREEN with 7 state-of-the-art TRL methods for 3 downstream tasks, finding that GREEN consistently outperforms all baselines and improves the accuracy of the best-performing baseline by an average of 15.99%.

This paper is accepted by KDD2025(August Cycle)

Landing Trajectory Prediction for UAS Based on Generative Adversarial Network 2024-11-21
Show

Models for trajectory prediction are an essential component of many advanced air mobility studies. These models help aircraft detect conflicts and plan avoidance maneuvers, which is especially important in Unmanned Aircraft System (UAS) landing management due to the congested airspace near vertiports. In this paper, we propose a landing trajectory prediction model for UAS based on a Generative Adversarial Network (GAN). GANs are a well-established class of neural networks that have achieved state-of-the-art results in many generation tasks. A GAN consists of a neural network generator and a neural network discriminator. Owing to the learning capacity of these networks, the generator can capture the features of the sample trajectories: it takes the previous trajectory as input and outputs sampled flight states. Experimental results show that the proposed model produces more accurate predictions than the baseline method (GMR) on various datasets. To evaluate the proposed model, we also create a real UAV landing dataset that includes more than 2600 trajectories of drones controlled manually by real pilots.

9 pages, AIAA SCITECH 2023
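
For readers unfamiliar with the generator/discriminator structure the abstract refers to, a minimal conditional-GAN sketch for trajectory continuation is shown below (PyTorch). All dimensions, the stand-in data, and the training loop are hypothetical; this is not the paper's model.

```python
# Minimal sketch of a conditional GAN for trajectory continuation: the generator maps a
# past-trajectory encoding plus noise to a future state, and the discriminator scores
# real vs. generated continuations. Dimensions and data are illustrative only.
import torch
import torch.nn as nn

PAST, FUTURE, NOISE = 16, 4, 8   # flattened past states, future states, noise dims

generator = nn.Sequential(nn.Linear(PAST + NOISE, 64), nn.ReLU(), nn.Linear(64, FUTURE))
discriminator = nn.Sequential(nn.Linear(PAST + FUTURE, 64), nn.ReLU(), nn.Linear(64, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

past = torch.randn(32, PAST)      # stand-in batch of observed landing segments
future = torch.randn(32, FUTURE)  # stand-in ground-truth continuations

for _ in range(100):
    # Discriminator step: real continuations vs. generated ones.
    fake = generator(torch.cat([past, torch.randn(32, NOISE)], dim=1)).detach()
    d_loss = bce(discriminator(torch.cat([past, future], dim=1)), torch.ones(32, 1)) + \
             bce(discriminator(torch.cat([past, fake], dim=1)), torch.zeros(32, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator step: try to fool the discriminator.
    fake = generator(torch.cat([past, torch.randn(32, NOISE)], dim=1))
    g_loss = bce(discriminator(torch.cat([past, fake], dim=1)), torch.ones(32, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```
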

Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning 2024-11-21
Show

Learning from multiple domains is a primary factor that influences the generalization of a single unified robot system. In this paper, we aim to learn the trajectory prediction model by using broad out-of-domain data to improve its performance and generalization ability. The trajectory model is designed to predict any-point trajectories in the current frame given an instruction and can provide detailed control guidance for robotic policy learning. To handle the diverse out-of-domain data distribution, we propose a sparsely-gated MoE (\textbf{Top-1} gating strategy) architecture for the trajectory model, coined \textbf{Tra-MoE}. The sparse activation design enables a good balance between parameter cooperation and specialization, effectively benefiting from large-scale out-of-domain data while maintaining constant FLOPs per token. In addition, we further introduce an adaptive policy conditioning technique by learning 2D mask representations for predicted trajectories, which is explicitly aligned with image observations to guide action prediction more flexibly. We perform extensive experiments on both simulation and real-world scenarios to verify the effectiveness of Tra-MoE and the adaptive policy conditioning technique. We also conduct a comprehensive empirical study to train Tra-MoE, demonstrating that our Tra-MoE consistently exhibits superior performance compared to the dense baseline model, even when the latter is scaled to match Tra-MoE's parameter count.

15 pages, 5 figures
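
A minimal sketch of the sparsely-gated Top-1 mixture-of-experts idea mentioned above, in PyTorch: each token is routed to its single highest-scoring expert, so per-token compute stays constant as experts are added. Layer sizes and routing details here are hypothetical, not Tra-MoE's implementation.

```python
import torch
import torch.nn as nn


class Top1MoE(nn.Module):
    """Sparsely-gated MoE layer with Top-1 routing (illustrative, not Tra-MoE itself)."""

    def __init__(self, dim: int, num_experts: int = 4, hidden: int = 256):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim). Each token is sent to one expert only, so FLOPs per token
        # stay constant regardless of how many experts exist.
        scores = self.gate(x).softmax(dim=-1)      # (tokens, num_experts)
        weight, idx = scores.max(dim=-1)           # Top-1 gating
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                out[mask] = weight[mask].unsqueeze(-1) * expert(x[mask])
        return out


tokens = torch.randn(10, 64)
print(Top1MoE(dim=64)(tokens).shape)               # torch.Size([10, 64])
```
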
FlightPatchNet: Multi-Scale Patch Network with Differential Coding for Flight Trajectory Prediction 2024-11-21
Show

Accurate multi-step flight trajectory prediction plays an important role in Air Traffic Control, which can ensure the safety of air transportation. Two main issues limit the flight trajectory prediction performance of existing works. The first issue is the negative impact on prediction accuracy caused by the significant differences in data range. The second issue is that real-world flight trajectories involve underlying temporal dependencies, and existing methods fail to reveal the hidden complex temporal variations and only extract features from one single time scale. To address the above issues, we propose FlightPatchNet, a multi-scale patch network with differential coding for flight trajectory prediction. Specifically, FlightPatchNet first utilizes the differential coding to encode the original values of longitude and latitude into first-order differences and generates embeddings for all variables at each time step. Then, a global temporal attention is introduced to explore the dependencies between different time steps. To fully explore the diverse temporal patterns in flight trajectories, a multi-scale patch network is delicately designed to serve as the backbone. The multi-scale patch network exploits stacked patch mixer blocks to capture inter- and intra-patch dependencies under different time scales, and further integrates multi-scale temporal features across different scales and variables. Finally, FlightPatchNet ensembles multiple predictors to make direct multi-step prediction. Extensive experiments on ADS-B datasets demonstrate that our model outperforms the competitive baselines.
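
A minimal sketch of the differential-coding step described above: longitude and latitude are replaced by their first-order differences, which narrows the value range while keeping the track recoverable. Assumes NumPy; the sample fixes are made up.

```python
# Minimal sketch of differential coding: encode longitude/latitude as first-order
# differences so all variables share a comparable numeric range. Values are illustrative.
import numpy as np

traj = np.array([[116.30, 39.90, 1000.0],   # columns: lon, lat, altitude (hypothetical)
                 [116.32, 39.91, 1050.0],
                 [116.35, 39.93, 1100.0]])

diff_lonlat = np.diff(traj[:, :2], axis=0)            # first-order differences of lon/lat
encoded = np.concatenate([diff_lonlat, traj[1:, 2:]], axis=1)

# The absolute track can be recovered by cumulatively summing the differences from the
# first fix, so no information is lost by the encoding.
recovered = traj[0, :2] + np.cumsum(diff_lonlat, axis=0)
assert np.allclose(recovered, traj[1:, :2])
print(encoded)
```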

Dynamic Trajectory and Power Control in Ultra-Dense UAV Networks: A Mean-Field Reinforcement Learning Approach 2024-11-21
Show

In ultra-dense unmanned aerial vehicle (UAV) networks, it is challenging to coordinate the resource allocation and interference management among large-scale UAVs, for providing flexible and efficient service coverage to the ground users (GUs). In this paper, we propose a learning-based resource allocation scheme in an ultra-dense UAV communication network, where the GUs' service demands are time-varying with unknown distributions. We formulate the non-cooperative game among multiple co-channel UAVs as a stochastic game, where each UAV jointly optimizes its trajectory, user association, and downlink power control to maximize the expectation of its locally cumulative energy efficiency under the interference and energy constraints. To cope with the scalability issue in a large-scale network, we further formulate the problem as a mean-field game (MFG), which simplifies the interactions among the UAVs into a two-player game between a representative UAV and a mean-field. We prove the existence and uniqueness of the equilibrium for the MFG, and propose a model-free mean-field reinforcement learning algorithm named maximum entropy mean-field deep Q network (ME-MFDQN) to solve the mean-field equilibrium in both fully and partially observable scenarios. The simulation results reveal that the proposed algorithm improves the energy efficiency compared with the benchmark algorithms. Moreover, the performance can be further enhanced if the GUs' service demands exhibit higher temporal correlation or if the UAVs have wider observation capabilities over their nearby GUs.

Trajectory Representation Learning on Road Networks and Grids with Spatio-Temporal Dynamics 2024-11-21
Show

Trajectory representation learning is a fundamental task for applications in fields including smart cities and urban planning, as it facilitates the utilization of trajectory data (e.g., vehicle movements) for various downstream applications, such as trajectory similarity computation or travel time estimation. This is achieved by learning low-dimensional representations from high-dimensional and raw trajectory data. However, existing methods for trajectory representation learning either rely on grid-based or road-based representations, which are inherently different and thus could lose information contained in the other modality. Moreover, these methods overlook the dynamic nature of urban traffic, relying on static road network features rather than time-varying traffic patterns. In this paper, we propose TIGR, a novel model designed to integrate grid and road network modalities while incorporating spatio-temporal dynamics to learn rich, general-purpose representations of trajectories. We evaluate TIGR on two real-world datasets and demonstrate the effectiveness of combining both modalities by substantially outperforming state-of-the-art methods, i.e., up to 43.22% for trajectory similarity, up to 16.65% for travel time estimation, and up to 10.16% for destination prediction.

Trajectory Tracking Using Frenet Coordinates with Deep Deterministic Policy Gradient 2024-11-21
Show

This paper studies the application of the DDPG algorithm to trajectory-tracking tasks and proposes a trajectory-tracking control method based on the Frenet coordinate system. By converting the vehicle's position and velocity information from the Cartesian coordinate system to the Frenet coordinate system, this method can more accurately describe the vehicle's deviation from and travel distance along the center line of the road. The DDPG algorithm adopts the Actor-Critic framework, uses deep neural networks for policy and value evaluation, and combines the experience replay mechanism and a target network to improve the algorithm's stability and data utilization efficiency. Experimental results show that the DDPG algorithm based on the Frenet coordinate system performs well in trajectory-tracking tasks in complex environments, achieves high-precision and stable path tracking, and demonstrates its application potential in autonomous driving and intelligent transportation systems. Keywords: DDPG; path tracking; robot navigation.
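
The key preprocessing step named above is the Cartesian-to-Frenet conversion: longitudinal arc length s along a reference line and signed lateral offset d from it. Below is a generic geometric sketch of that conversion against a polyline centerline, assuming NumPy; it is not the paper's exact implementation, and the centerline and query point are made up for illustration.

```python
import numpy as np

def cartesian_to_frenet(point, centerline):
    """Project a 2D point onto a polyline centerline and return (s, d)."""
    best, s_base = None, 0.0
    for a, b in zip(centerline[:-1], centerline[1:]):
        seg = b - a
        seg_len = np.linalg.norm(seg)
        rel = point - a
        t = np.clip(np.dot(rel, seg) / seg_len**2, 0.0, 1.0)   # clamp to the segment
        proj = a + t * seg
        dist = np.linalg.norm(point - proj)
        side = np.sign(seg[0] * rel[1] - seg[1] * rel[0])       # +1 if left of segment
        if best is None or dist < best[2]:
            best = (s_base + t * seg_len, side * dist, dist)
        s_base += seg_len
    s, d, _ = best
    return s, d

centerline = np.array([[0.0, 0.0], [10.0, 0.0], [20.0, 5.0]])
print(cartesian_to_frenet(np.array([12.0, 2.0]), centerline))   # ~ (12.68, 0.89)
```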

Almost Global Trajectory Tracking for Quadrotors Using Thrust Direction Control on $\mathcal{S}^2$ 2024-11-20
Show

Many of the existing works on quadrotor control address the trajectory tracking problem by employing a cascade design in which the translational and rotational dynamics are stabilized by two separate controllers. The stability of the cascade is often proved by employing trajectory-based arguments, most notably, integral input-to-state stability. In this paper, we follow a different route and present a control law ensuring that a composite function constructed from the translational and rotational tracking errors is a Lyapunov function for the closed-loop cascade. In particular, starting from a generic control law for the double integrator, we develop a suitable attitude control extension, by leveraging a backstepping-like procedure. Using this construction, we provide an almost global stability certificate. The proposed design employs the unit sphere $\mathcal{S}^2$ to describe the rotational degrees of freedom required for position control. This enables a simpler controller tuning and an improved tracking performance with respect to previous global solutions. The new design is demonstrated via numerical simulations and on real-world experiments.

Hierarchical Diffusion Policy: manipulation trajectory generation via contact guidance 2024-11-20
Show

Decision-making in robotics using denoising diffusion processes has increasingly become a hot research topic, but end-to-end policies perform poorly in tasks with rich contact and have limited controllability. This paper proposes Hierarchical Diffusion Policy (HDP), a new imitation learning method that uses objective contacts to guide the generation of robot trajectories. The policy is divided into two layers: the high-level policy predicts the contact for the robot's next object manipulation based on 3D information, while the low-level policy predicts the action sequence toward the high-level contact based on the latent variables of observation and contact. We represent both levels of the policy as conditional denoising diffusion processes, and combine behavioral cloning and Q-learning to optimize the low-level policy for accurately guiding actions towards contact. We benchmark Hierarchical Diffusion Policy across 6 different tasks and find that it significantly outperforms the existing state-of-the-art imitation learning method Diffusion Policy with an average improvement of 20.8%. We find that contact guidance yields significant improvements, including superior performance, greater interpretability, and stronger controllability, especially on contact-rich tasks. To further unlock the potential of HDP, this paper proposes a set of key technical contributions including snapshot gradient optimization, 3D conditioning, and prompt guidance, which improve the policy's optimization efficiency, spatial awareness, and controllability, respectively. Finally, real-world experiments verify that HDP can handle both rigid and deformable objects.

arXiv admin note: text overlap with arXiv:2303.04137 by other authors

A data driven approach to classify descriptors based on their efficiency in translating noisy trajectories into physically-relevant information 2024-11-19
Show

Reconstructing the physical complexity of many-body dynamical systems can be challenging. Starting from the trajectories of their constitutive units (raw data), typical approaches require selecting appropriate descriptors to convert them into time-series, which are then analyzed to extract interpretable information. However, identifying the most effective descriptor is often non-trivial. Here, we report a data-driven approach to compare the efficiency of various descriptors in extracting information from noisy trajectories and translating it into physically relevant insights. As a prototypical system with non-trivial internal complexity, we analyze molecular dynamics trajectories of an atomistic system where ice and water coexist in equilibrium near the solid/liquid transition temperature. We compare general and specific descriptors often used in aqueous systems: number of neighbors, molecular velocities, Smooth Overlap of Atomic Positions (SOAP), Local Environments and Neighbors Shuffling (LENS), Orientational Tetrahedral Order, and distance from the fifth neighbor ($d_5$). Using Onion Clustering -- an efficient unsupervised method for single-point time-series analysis -- we assess the maximum extractable information for each descriptor and rank them via a high-dimensional metric. Our results show that advanced descriptors like SOAP and LENS outperform classical ones due to higher signal-to-noise ratios. Nonetheless, even simple descriptors can rival or exceed advanced ones after local signal denoising. For example, $d_5$, initially among the weakest, becomes the most effective at resolving the system's non-local dynamical complexity after denoising. This work highlights the critical role of noise in information extraction from molecular trajectories and offers a data-driven approach to identify optimal descriptors for systems with characteristic internal complexity.

19 pages, 5 figures + 3 in supporting information (at the bottom of the manuscript)

C$^{2}$INet: Realizing Incremental Trajectory Prediction with Prior-Aware Continual Causal Intervention 2024-11-19
Show

Trajectory prediction for multi-agents in complex scenarios is crucial for applications like autonomous driving. However, existing methods often overlook environmental biases, which leads to poor generalization. Additionally, hardware constraints limit the use of large-scale data across environments, and continual learning settings exacerbate the challenge of catastrophic forgetting. To address these issues, we propose the Continual Causal Intervention (C$^{2}$INet) method for generalizable multi-agent trajectory prediction within a continual learning framework. Using variational inference, we align environment-related prior with posterior estimator of confounding factors in the latent space, thereby intervening in causal correlations that affect trajectory representation. Furthermore, we store optimal variational priors across various scenarios using a memory queue, ensuring continuous debiasing during incremental task training. The proposed C$^{2}$INet enhances adaptability to diverse tasks while preserving previous task information to prevent catastrophic forgetting. It also incorporates pruning strategies to mitigate overfitting. Comparative evaluations on three real and synthetic complex datasets against state-of-the-art methods demonstrate that our proposed method consistently achieves reliable prediction performance, effectively mitigating confounding factors unique to different scenarios. This highlights the practical value of our method for real-world applications.

Age of Information Minimization in UAV-Assisted Covert Communication: Trajectory and Beamforming Design 2024-11-19
Show

Unmanned aerial vehicles (UAVs) have the potential for time-sensitive applications. Due to wireless channel variation, received data may have an expiration time, particularly in critical situations such as rescue operations, natural disasters, or the military. Age of Information (AoI) is a metric that measures the freshness of received packets to specify the validity period of information. In addition, it is necessary to guarantee the privacy of confidential information transmission through air-to-ground links against eavesdroppers. This paper investigates UAV-assisted covert communication to minimize AoI in the presence of an aerial eavesdropper for the first time. However, to ensure the eavesdropper's error detection rate, UAV-enabled beamforming employs the power-domain non-orthogonal multiple access (PD-NOMA) technique to cover the covert user by a public user. PD-NOMA technique significantly improves the user's AoI, too. The joint optimization problem contains non-convex constraints and coupled optimization variables, including UAV trajectory, beamforming design, and the user's AoI which is challenging to derive a direct solution. We have developed an efficient alternating optimization technique to address the formulated optimization problem. Numerical results demonstrate the impact of the main parameters on the performance of the proposed communication system.

A Linear Differential Inclusion for Contraction Analysis to Known Trajectories 2024-11-18
Show

Infinitesimal contraction analysis provides exponential convergence rates between arbitrary pairs of trajectories of a system by studying the system's linearization. An essentially equivalent viewpoint arises through stability analysis of a linear differential inclusion (LDI) encompassing the incremental behavior of the system. In this note, we study contraction of a system to a particular known trajectory, deriving a new LDI characterizing the error between arbitrary trajectories and this known trajectory. As with classical contraction analysis, this new inclusion is constructed via first partial derivatives of the system's vector field, and contraction rates are obtained with familiar tools: uniform bounding of the logarithmic norm and LMI-based Lyapunov conditions. Our LDI is guaranteed to outperform a usual contraction analysis in two special circumstances: i) when the bound on the logarithmic norm arises from an interval overapproximation of the Jacobian matrix, and ii) when the norm considered is the $\ell_1$ norm. Finally, we demonstrate how the proposed approach strictly improves an existing framework for ellipsoidal reachable set computation.
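
As background for the contraction rates mentioned above (standard material, not the note's new LDI construction): for dynamics $\dot{x} = f(t, x)$, a uniform bound on the logarithmic norm of the Jacobian gives an exponential contraction rate between any pair of trajectories.

```latex
% Logarithmic norm and the standard contraction estimate (background, not the paper's LDI):
\mu(A) = \lim_{h \to 0^{+}} \frac{\lVert I + hA \rVert - 1}{h}, \qquad
\mu\!\Big(\tfrac{\partial f}{\partial x}(t,x)\Big) \le -c \ \ \forall\, t, x
\;\Longrightarrow\;
\lVert x_1(t) - x_2(t) \rVert \le e^{-ct}\,\lVert x_1(0) - x_2(0) \rVert .
```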

Enhancing Decision Transformer with Diffusion-Based Trajectory Branch Generation 2024-11-18
Show

Decision Transformer (DT) can learn an effective policy from offline datasets by converting offline reinforcement learning (RL) into a supervised sequence modeling task, where the trajectory elements are generated auto-regressively conditioned on the return-to-go (RTG). However, the sequence modeling learning approach tends to learn policies that converge on the sub-optimal trajectories within the dataset, for lack of bridging data to move to better trajectories, even if the condition is set to the highest RTG. To address this issue, we introduce Diffusion-Based Trajectory Branch Generation (BG), which expands the trajectories of the dataset with branches generated by a diffusion model. The trajectory branch is generated based on the segment of the trajectory within the dataset, and leads to trajectories with higher returns. We concatenate the generated branch with the trajectory segment as an expansion of the trajectory. After expanding, DT has more opportunities to learn policies to move to better trajectories, preventing it from converging to the sub-optimal trajectories. Empirically, after processing with BG, DT outperforms state-of-the-art sequence modeling methods on the D4RL benchmark, demonstrating the effectiveness of adding branches to the dataset without further modifications.
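
For concreteness, the return-to-go values that condition DT-style models are simply suffix sums of the reward sequence; a tiny sketch (NumPy, with an illustrative reward sequence):

```python
# Minimal sketch of return-to-go (RTG) conditioning values: RTG_t is the sum of
# rewards from step t onward. The reward sequence below is illustrative only.
import numpy as np

rewards = np.array([1.0, 0.0, 2.0, 5.0])
rtg = np.cumsum(rewards[::-1])[::-1]   # suffix sums: [8., 7., 7., 5.]
print(rtg)
```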

Map-Free Trajectory Prediction with Map Distillation and Hierarchical Encoding 2024-11-17
Show

Reliable motion forecasting of surrounding agents is essential for ensuring the safe operation of autonomous vehicles. Many existing trajectory prediction methods rely heavily on high-definition (HD) maps as strong driving priors. However, the availability and accuracy of these priors are not guaranteed due to substantial costs to build, localization errors of vehicles, or ongoing road constructions. In this paper, we introduce MFTP, a Map-Free Trajectory Prediction method that offers several advantages. First, it eliminates the need for HD maps during inference while still benefiting from map priors during training via knowledge distillation. Second, we present a novel hierarchical encoder that effectively extracts spatial-temporal agent features and aggregates them into multiple trajectory queries. Additionally, we introduce an iterative decoder that sequentially decodes trajectory queries to generate the final predictions. Extensive experiments show that our approach achieves state-of-the-art performance on the Argoverse dataset under the map-free setting.

Efficient Estimation of Relaxed Model Parameters for Robust UAV Trajectory Optimization 2024-11-17
Show

Online trajectory optimization and optimal control methods are crucial for enabling sustainable unmanned aerial vehicle (UAV) services, such as agriculture, environmental monitoring, and transportation, where available actuation and energy are limited. However, optimal controllers are highly sensitive to model mismatch, which can occur due to loaded equipment, packages to be delivered, or pre-existing variability in fundamental structural and thrust-related parameters. To circumvent this problem, optimal controllers can be paired with parameter estimators to improve their trajectory planning performance and perform adaptive control. However, UAV platforms are limited in terms of onboard processing power, oftentimes making nonlinear parameter estimation too computationally expensive to consider. To address these issues, we propose a relaxed, affine-in-parameters multirotor model along with an efficient optimal parameter estimator. We convexify the nominal Moving Horizon Parameter Estimation (MHPE) problem into a linear-quadratic form (LQ-MHPE) via an affine-in-parameter relaxation on the nonlinear dynamics, resulting in fast quadratic programs (QPs) that facilitate adaptive Model Predictive Control (MPC) in real time. We compare this approach to the equivalent nonlinear estimator in Monte Carlo simulations, demonstrating a decrease in average solve time and trajectory optimality cost by 98.2% and 23.9-56.2%, respectively.

8 pages, 5 figures, submitted to IEEE Sustech 2025

Stable Continual Reinforcement Learning via Diffusion-based Trajectory Replay 2024-11-16
Show

Given the inherent non-stationarity prevalent in real-world applications, continual Reinforcement Learning (RL) aims to equip the agent with the capability to address a series of sequentially presented decision-making tasks. Within this problem setting, a pivotal challenge revolves around the \textit{catastrophic forgetting} issue, wherein the agent is prone to effortlessly erode the decisional knowledge associated with past encountered tasks when learning the new one. In recent progress, \textit{generative replay} methods have showcased substantial potential by employing generative models to replay the data distribution of past tasks. Compared to storing the data from past tasks directly, this category of methods circumvents the growing storage overhead and possible data privacy concerns. However, constrained by the expressive capacity of generative models, existing \textit{generative replay} methods face challenges in faithfully reconstructing the data distribution of past tasks, particularly in scenarios with a myriad of tasks or high-dimensional data. Inspired by the success of diffusion models in various generative tasks, this paper introduces a novel continual RL algorithm DISTR (Diffusion-based Trajectory Replay) that employs a diffusion model to memorize the high-return trajectory distribution of each encountered task and replays these distributions during policy learning on new tasks. Besides, considering the impracticality of replaying all past data each time, a prioritization mechanism is proposed to prioritize the trajectory replay of pivotal tasks in our method. Empirical experiments on the popular continual RL benchmark \texttt{Continual World} demonstrate that our proposed method obtains a favorable balance between \textit{stability} and \textit{plasticity}, surpassing various existing continual RL baselines in average success rate.

10 pages, 3 figures, 1 table, inclusion at ICLR 2024 Workshop on Generative Models for Decision Making

UniTraj: Learning a Universal Trajectory Foundation Model from Billion-Scale Worldwide Traces 2024-11-16
Show

Human trajectory modeling is essential for deciphering movement patterns and supporting advanced applications across various domains. However, existing methods are often tailored to specific tasks and regions, resulting in limitations related to task specificity, regional dependency, and data quality sensitivity. Addressing these challenges requires a universal human trajectory foundation model capable of generalizing and scaling across diverse tasks and geographic contexts. To this end, we propose UniTraj, a Universal human Trajectory foundation model that is task-adaptive, region-independent, and highly generalizable. To further enhance performance, we construct WorldTrace, the first large-scale, high-quality, globally distributed dataset sourced from open web platforms, encompassing 2.45 million trajectories with billions of points across 70 countries. Through multiple resampling and masking strategies designed for pre-training, UniTraj effectively overcomes geographic and task constraints, adapting to heterogeneous data quality. Extensive experiments across multiple trajectory analysis tasks and real-world datasets demonstrate that UniTraj consistently outperforms existing approaches in terms of scalability and adaptability. These results underscore the potential of UniTraj as a versatile, robust solution for a wide range of trajectory analysis applications, with WorldTrace serving as an ideal but non-exclusive foundation for training.

Tenure and Research Trajectories 2024-11-15
Show

Tenure is a cornerstone of the US academic system, yet its relationship to faculty research trajectories remains poorly understood. Conceptually, tenure systems may act as a selection mechanism, screening in high-output researchers; a dynamic incentive mechanism, encouraging high output prior to tenure but low output after tenure; and a creative search mechanism, encouraging tenured individuals to undertake high-risk work. Here, we integrate data from seven different sources to trace US tenure-line faculty and their research outputs at an unprecedented scale and scope, covering over 12,000 researchers across 15 disciplines. Our analysis reveals that faculty publication rates typically increase sharply during the tenure track and peak just before obtaining tenure. Post-tenure trends, however, vary across disciplines: in lab-based fields, such as biology and chemistry, research output typically remains high post-tenure, whereas in non-lab-based fields, such as mathematics and sociology, research output typically declines substantially post-tenure. Turning to creative search, faculty increasingly produce novel, high-risk research after securing tenure. However, this shift toward novelty and risk-taking comes with a decline in impact, with post-tenure research yielding fewer highly cited papers. Comparing outcomes across common career ages but different tenure years or comparing research trajectories in tenure-based and non-tenure-based research settings underscores that breaks in the research trajectories are sharply tied to the individual's tenure year. Overall, these findings provide a new empirical basis for understanding the tenure system, individual research trajectories, and the shape of scientific output.

Temporal Patterns of Multiple Long-Term Conditions in Individuals with Intellectual Disability Living in Wales: An Unsupervised Clustering Approach to Disease Trajectories 2024-11-15
Show

Identifying and understanding the co-occurrence of multiple long-term conditions (MLTC) in individuals with intellectual disabilities (ID) is vital for effective healthcare management. These individuals often face earlier onset and higher prevalence of MLTCs, yet specific co-occurrence patterns remain unexplored. This study applies an unsupervised approach to characterise MLTC clusters based on shared disease trajectories using electronic health records (EHRs) from 13069 individuals with ID in Wales (2000-2021). Disease associations and temporal directionality were assessed, followed by spectral clustering to group shared trajectories. The population consisted of 52.3% males and 47.7% females, with an average of 4.5 conditions per patient. Males under 45 formed a single cluster dominated by neurological conditions (32.4%), while males above 45 had three clusters, the largest characterised circulatory (51.8%). Females under 45 formed one cluster with digestive conditions (24.6%) as most prevalent, while those aged 45 and older showed two clusters: one dominated by circulatory (34.1%), and the other by digestive (25.9%) and musculoskeletal (21.9%) system conditions. Mental illness, epilepsy, and reflux were common across groups. These clusters offer insights into disease progression in individuals with ID, informing targeted interventions and personalised healthcare strategies.

Explanation for Trajectory Planning using Multi-modal Large Language Model for Autonomous Driving 2024-11-15
Show

End-to-end style autonomous driving models have been developed recently. These models lack interpretability of the decision-making process from perception to control of the ego vehicle, which can cause anxiety for passengers. To alleviate this, it is effective to build a model that outputs captions describing the future behaviors of the ego vehicle and the reasons for them. However, the existing approaches generate reasoning text that inadequately reflects the future plans of the ego vehicle, because they train models to output captions using momentary control signals as inputs. In this study, we propose a reasoning model that takes future planning trajectories of the ego vehicle as inputs to overcome this limitation, together with a newly collected dataset.

Accepted and presented at ECCV 2024 2nd Workshop on Vision-Centric Autonomous Driving (VCAD) on September 30, 2024. 13 pages, 5 figures

Enhancing Maritime Trajectory Forecasting via H3 Index and Causal Language Modelling (CLM) 2024-11-14
Show

The prediction of ship trajectories is a growing field of study in artificial intelligence. Traditional methods rely on the use of LSTM, GRU networks, and even Transformer architectures for the prediction of spatio-temporal series. This study proposes a viable alternative for predicting these trajectories using only GNSS positions. It considers this spatio-temporal problem as a natural language processing problem. The latitude/longitude coordinates of AIS messages are transformed into cell identifiers using the H3 index. Thanks to the pseudo-octal representation, it becomes easier for language models to learn the spatial hierarchy of the H3 index. The method is compared with a classical Kalman filter, widely used in the maritime domain, and introduces the Fréchet distance as the main evaluation metric. We show that it is possible to predict ship trajectories quite precisely up to 8 hours ahead with 30 minutes of context, using solely GNSS positions, without relying on any additional information such as speed, course, or external conditions - unlike many traditional methods. We demonstrate that this alternative works well enough to predict trajectories worldwide.

28 pages, 18 figures
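
The tokenization step described above, reduced to its essence: each AIS latitude/longitude fix is mapped to an H3 cell identifier, turning a trajectory into a sequence of discrete tokens a language model can ingest. The sketch assumes the h3-py package; resolution 8 and the sample position are illustrative, not the paper's settings.

```python
# Minimal sketch: map an AIS fix to an H3 cell identifier (discrete token).
import h3

lat, lon = 48.3904, -4.4861                 # hypothetical AIS position
cell = h3.latlng_to_cell(lat, lon, 8)       # h3-py >= 4; on v3 use h3.geo_to_h3(lat, lon, 8)
print(cell)                                 # a 15-character hexadecimal cell id
```
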
Integrated Precoder and Trajectory Design for MIMO UAV-Assisted Relay System With Finite-Alphabet Inputs 2024-11-13
Show

Unmanned aerial vehicles (UAVs) are gaining widespread use in wireless relay systems due to their exceptional flexibility and cost-effectiveness. This paper focuses on the integrated design of UAV trajectories and the precoders at both the transmitter and UAV in a UAV-assisted relay communication system, accounting for transmit power constraints and UAV flight limitations. Unlike previous works that primarily address multiple-input single-output (MISO) systems with Gaussian inputs, we investigate a more realistic scenario involving multiple-input multiple-output (MIMO) systems with finite-alphabet inputs. To tackle the challenging and inherently non-convex problem, we propose an efficient solution algorithm that leverages successive convex approximation and alternating optimization techniques. Simulation results validate the effectiveness of the proposed algorithm, demonstrating its capability to optimize system performance.

DiVR: incorporating context from diverse VR scenes for human trajectory prediction 2024-11-13
Show

Virtual environments provide a rich and controlled setting for collecting detailed data on human behavior, offering unique opportunities for predicting human trajectories in dynamic scenes. However, most existing approaches have overlooked the potential of these environments, focusing instead on static contexts without considering user-specific factors. Employing the CREATTIVE3D dataset, our work models trajectories recorded in virtual reality (VR) scenes for diverse situations including road-crossing tasks with user interactions and simulated visual impairments. We propose Diverse Context VR Human Motion Prediction (DiVR), a cross-modal transformer based on the Perceiver architecture that integrates both static and dynamic scene context using a heterogeneous graph convolution network. We conduct extensive experiments comparing DiVR against existing architectures including MLP, LSTM, and transformers with gaze and point cloud context. Additionally, we also stress test our model's generalizability across different users, tasks, and scenes. Results show that DiVR achieves higher accuracy and adaptability compared to other models and to static graphs. This work highlights the advantages of using VR datasets for context-aware human trajectory modeling, with potential applications in enhancing user experiences in the metaverse. Our source code is publicly available at https://gitlab.inria.fr/ffrancog/creattive3d-divr-model.

Efficient Trajectory Generation in 3D Environments with Multi-Level Map Construction 2024-11-13
Show

We propose a robust and efficient framework to generate global trajectories for ground robots in complex 3D environments. The proposed method takes point cloud as input and efficiently constructs a multi-level map using triangular patches as the basic elements. A kinematic path search is adopted on the patches, where motion primitives on different patches combine to form the global min-time cost initial trajectory. We use a same-level expansion method to locate the nearest obstacle for each trajectory waypoint and construct an objective function with curvature, smoothness and obstacle terms for optimization. We evaluate the method on several complex 3D point cloud maps. Compared to existing methods, our method demonstrates higher robustness to point cloud noise, enabling the generation of high quality trajectory while maintaining high computational efficiency. Our code will be publicly available at https://github.com/ck-tian/MLMC-planner.

In-Trajectory Inverse Reinforcement Learning: Learn Incrementally Before An Ongoing Trajectory Terminates 2024-11-12
Show

Inverse reinforcement learning (IRL) aims to learn a reward function and a corresponding policy that best fit the demonstrated trajectories of an expert. However, current IRL works cannot learn incrementally from an ongoing trajectory because they have to wait to collect at least one complete trajectory to learn. To bridge the gap, this paper considers the problem of learning a reward function and a corresponding policy while observing the initial state-action pair of an ongoing trajectory and keeping updating the learned reward and policy when new state-action pairs of the ongoing trajectory are observed. We formulate this problem as an online bi-level optimization problem where the upper level dynamically adjusts the learned reward according to the newly observed state-action pairs with the help of a meta-regularization term, and the lower level learns the corresponding policy. We propose a novel algorithm to solve this problem and guarantee that the algorithm achieves sub-linear local regret $O(\sqrt{T}+\log T+\sqrt{T}\log T)$. If the reward function is linear, we prove that the proposed algorithm achieves sub-linear regret $O(\log T)$. Experiments are used to validate the proposed algorithm.

UniTE: A Survey and Unified Pipeline for Pre-training Spatiotemporal Trajectory Embeddings 2024-11-12
Show

Spatiotemporal trajectories are sequences of timestamped locations, which enable a variety of analyses that in turn enable important real-world applications. It is common to map trajectories to vectors, called embeddings, before subsequent analyses. Thus, the qualities of embeddings are very important. Methods for pre-training embeddings, which leverage unlabeled trajectories for training universal embeddings, have shown promising applicability across different tasks, thus attracting considerable interest. However, research progress on this topic faces two key challenges: a lack of a comprehensive overview of existing methods, resulting in several related methods not being well-recognized, and the absence of a unified pipeline, complicating the development of new methods and the analysis of methods. We present UniTE, a survey and a unified pipeline for this domain. In doing so, we present a comprehensive list of existing methods for pre-training trajectory embeddings, which includes methods that either explicitly or implicitly employ pre-training techniques. Further, we present a unified and modular pipeline with publicly available underlying code, simplifying the process of constructing and evaluating methods for pre-training trajectory embeddings. Additionally, we contribute a selection of experimental results using the proposed pipeline on real-world datasets. Implementation of the pipeline is publicly available at https://github.com/Logan-Lin/UniTE.

Cross-Domain Transfer Learning using Attention Latent Features for Multi-Agent Trajectory Prediction 2024-11-12
Show

With the advancements of sensor hardware, traffic infrastructure and deep learning architectures, trajectory prediction of vehicles has established a solid foundation in intelligent transportation systems. However, existing solutions are often tailored to specific traffic networks at particular time periods. Consequently, deep learning models trained on one network may struggle to generalize effectively to unseen networks. To address this, we proposed a novel spatial-temporal trajectory prediction framework that performs cross-domain adaption on the attention representation of a Transformer-based model. A graph convolutional network is also integrated to construct dynamic graph feature embeddings that accurately model the complex spatial-temporal interactions between the multi-agent vehicles across multiple traffic domains. The proposed framework is validated on two case studies involving the cross-city and cross-period settings. Experimental results show that our proposed framework achieves superior trajectory prediction and domain adaptation performances over the state-of-the-art models.

Accepted at the IEEE International Conference on Systems, Man, and Cybernetics 2024

Tracing the Roots: Leveraging Temporal Dynamics in Diffusion Trajectories for Origin Attribution 2024-11-12
Show

Diffusion models have revolutionized image synthesis, garnering significant research interest in recent years. Diffusion is an iterative algorithm in which samples are generated step-by-step, starting from pure noise. This process introduces the notion of diffusion trajectories, i.e., paths from the standard Gaussian distribution to the target image distribution. In this context, we study discriminative algorithms operating on these trajectories. Specifically, given a pre-trained diffusion model, we consider the problem of classifying images as part of the training dataset, generated by the model or originating from an external source. Our approach demonstrates the presence of patterns across steps that can be leveraged for classification. We also conduct ablation studies, which reveal that using higher-order gradient features to characterize the trajectories leads to significant performance gains and more robust algorithms.

'Explaining RL Decisions with Trajectories': A Reproducibility Study 2024-11-11
Show

This work investigates the reproducibility of the paper 'Explaining RL decisions with trajectories'. The original paper introduces a novel approach in explainable reinforcement learning based on attributing the decisions of an agent to specific clusters of trajectories encountered during training. We verify the main claims from the paper, which state that (i) training on fewer trajectories induces a lower initial state value, (ii) trajectories in a cluster present similar high-level patterns, (iii) distant trajectories influence the decision of an agent, and (iv) humans correctly identify the attributed trajectories to the decision of the agent. We recover the environments used by the authors based on the partial original code they provided for one of the environments (Grid-World), and implement the remaining ones from scratch (Seaquest, HalfCheetah, Breakout and Q*Bert). While we confirm that (i), (ii), and (iii) partially hold, we extend the largely qualitative experiments from the authors by introducing a quantitative metric to further support (iii), and new experiments and visual results for (i). Moreover, we investigate the use of different clustering algorithms and encoder architectures to further support (ii). We could not support (iv), given the limited extent of the original experiments. We conclude that, while some of the claims can be supported, further investigations and experiments could be of interest. We recognise the novelty of the work from the authors and hope that our work paves the way for clearer and more transparent approaches.

Quadrotor Trajectory Tracking Using Linear and Nonlinear Model Predictive Control 2024-11-11
Show

Accurate trajectory tracking is an essential characteristic for the safe navigation of a quadrotor in cluttered or disturbed environments. In this paper, we present in detail two state-of-the-art model-based control frameworks for trajectory tracking: the Linear Model Predictive Controller (LMPC) and the Nonlinear Model Predictive Controller (NMPC). Additionally, the kinematic and dynamic models of the quadrotor are comprehensively described. Finally, a simulation system is implemented to verify feasibility, demonstrating the effectiveness of both controllers.

In Vietnamese language, in the 25th National Conference on Electronics, Communications and Information Technology (REV-ECIT 2022), Hanoi, Vietnam

$\mathsf{QuITO}$ $\textsf{v.2}$: Trajectory Optimization with Uniform Error Guarantees under Path Constraints 2024-11-11
Show

This article introduces a new transcription, change point localization, and mesh refinement scheme for direct optimization-based solutions and for uniform approximation of optimal control trajectories associated with a class of nonlinear constrained optimal control problems (OCPs). The base transcription algorithm for which we establish the refinement algorithm is a direct multiple shooting technique -- $\mathsf{QuITO}$ $\textsf{v.2}$ (Quasi-Interpolation based Trajectory Optimization). The mesh refinement technique consists of two steps -- localization of certain irregular regions in an optimal control trajectory via wavelets, followed by a targeted $h$-refinement approach around such regions of irregularity. Theoretical approximation guarantees on uniform grids are presented for optimal controls with certain regularity properties, along with guarantees of localization of change points by wavelet transform. Numerical illustrations are provided for control profiles involving discontinuities to show the effectiveness of the localization and refinement strategy. We also announce, and make freely available, a new software package based on $\mathsf{QuITO}$ $\textsf{v.2}$ along with all its functionalities for completeness. The package is available at: https://github.com/chatterjee-d/QuITOv2.git.

Submitted; 42 pages, comments are welcome

Time-delayed Dynamic Mode Decomposition for families of periodic trajectories in Cislunar Space 2024-11-10
Show

In recent years, the development of the Lunar Gateway and Artemis missions has renewed interest in lunar exploration, including both manned and unmanned missions. This interest necessitates accurate initial orbit determination (IOD) and orbit prediction (OP) in this domain, which faces significant challenges such as severe nonlinearity, sensitivity to initial conditions, large state-space volume, and sparse, faint, and unreliable measurements. This paper explores the capability of data-driven Koopman operator-based approximations for OP in these scenarios. Three stable periodic trajectories from distinct cislunar families are analyzed. The analysis includes theoretical justification for using a linear time-invariant system as the data-driven surrogate. This theoretical framework is supported by experimental validation. Furthermore, the accuracy is assessed by comparing the spectral content captured to period estimates derived from the fast Fourier transform (FFT) and Poincare-like sections.

arXiv admin note: text overlap with arXiv:2401.13784
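
For readers unfamiliar with the underlying machinery, here is a compact sketch of time-delayed (Hankel) dynamic mode decomposition on a scalar signal: build a delay-embedded matrix, fit the best-fit linear operator between shifted snapshot matrices via the SVD, and read oscillation periods off the eigenvalues. This is textbook DMD on synthetic data, not the authors' cislunar implementation.

```python
# Sketch: time-delayed (Hankel) DMD on a synthetic periodic signal.
import numpy as np

dt = 0.1
t = np.arange(0, 100, dt)
x = np.sin(2 * np.pi * t / 7.0)          # synthetic signal with a 7 s period

d = 40                                    # number of time delays
# Hankel (delay-embedded) matrix: each column is a window of d samples.
H = np.column_stack([x[i:i + d] for i in range(len(x) - d)])

X, Y = H[:, :-1], H[:, 1:]                # shifted snapshot matrices
U, s, Vt = np.linalg.svd(X, full_matrices=False)
r = 2                                     # a single sinusoid has rank 2
Ur, sr, Vr = U[:, :r], s[:r], Vt[:r].T
A_tilde = Ur.T @ Y @ Vr / sr              # reduced best-fit linear operator

eigvals = np.linalg.eigvals(A_tilde)
freqs = np.abs(np.angle(eigvals)) / (2 * np.pi * dt)   # Hz
print("recovered periods (s):", 1.0 / freqs)            # both modes ~7 s
```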

RRT* Based Optimal Trajectory Generation with Linear Temporal Logic Specifications under Kinodynamic Constraints 2024-11-09
Show

In this paper, we present a novel RRT*-based strategy for generating kinodynamically feasible paths that satisfy temporal logic specifications. Our approach integrates a robustness metric for Linear Temporal Logics (LTL) with the system's motion constraints, ensuring that the resulting trajectories are both optimal and executable. We introduce a cost function that recursively computes the robustness of temporal logic specifications while penalizing time and control effort, striking a balance between path feasibility and logical correctness. We validate our approach with simulations and real-world experiments in complex environments, demonstrating its effectiveness in producing robust and practical motion plans. This work represents a significant step towards expanding the applicability of motion planning algorithms to more complex, real-world scenarios.
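
As a rough illustration of the kind of cost described in the abstract above, the sketch below scores a trajectory with a simple quantitative robustness term for "always avoid an obstacle and eventually reach the goal," traded off against time and control effort. The geometry, weights, and robustness computation are toy assumptions, not the paper's recursive formulation.

```python
# Illustrative cost: temporal-logic robustness ("always avoid the obstacle,
# eventually reach the goal") traded off against time and control effort.
# Geometry, weights, and the robustness computation are toy assumptions.
import numpy as np

def robustness(traj, obstacle_c, obstacle_r, goal_c, goal_r):
    """Quantitative robustness of: G(outside obstacle) AND F(inside goal)."""
    d_obs = np.linalg.norm(traj - obstacle_c, axis=1) - obstacle_r   # >0: outside
    d_goal = goal_r - np.linalg.norm(traj - goal_c, axis=1)          # >0: inside
    always_safe = d_obs.min()        # "always": worst case over time
    eventually_goal = d_goal.max()   # "eventually": best case over time
    return min(always_safe, eventually_goal)   # conjunction: minimum of both

def trajectory_cost(traj, controls, dt, w_time=1.0, w_u=0.1, w_rho=5.0):
    rho = robustness(traj,
                     obstacle_c=np.array([2.0, 2.0]), obstacle_r=0.5,
                     goal_c=np.array([4.0, 4.0]), goal_r=0.3)
    time_cost = dt * len(traj)
    effort = np.sum(controls ** 2) * dt
    return w_time * time_cost + w_u * effort - w_rho * rho   # reward robustness

# Toy usage: a straight line from (0, 0) to (4, 4) clips the obstacle,
# so its robustness is negative and the cost is penalized accordingly.
traj = np.linspace([0.0, 0.0], [4.0, 4.0], 50)
controls = np.full((50, 2), 0.2)
print("cost:", trajectory_cost(traj, controls, dt=0.1))
```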

Online Omnidirectional Jumping Trajectory Planning for Quadrupedal Robots on Uneven Terrains 2024-11-09
Show

Natural terrain complexity often necessitates agile movements like jumping in animals to improve traversal efficiency. To enable similar capabilities in quadruped robots, complex real-time jumping maneuvers are required. Current research does not adequately address the problem of online omnidirectional jumping and neglects the robot's kinodynamic constraints during trajectory generation. This paper proposes a general and complete cascade online optimization framework for omnidirectional jumping for quadruped robots. Our solution systematically encompasses jumping trajectory generation, a trajectory tracking controller, and a landing controller. It also incorporates environmental perception to navigate obstacles that standard locomotion cannot bypass, such as jumping from high platforms. We introduce a novel jumping plane to parameterize omnidirectional jumping motion and formulate a tightly coupled optimization problem accounting for the kinodynamic constraints, simultaneously optimizing CoM trajectory, Ground Reaction Forces (GRFs), and joint states. To meet the online requirements, we propose an accelerated evolutionary algorithm as the trajectory optimizer to address the complexity of kinodynamic constraints. To ensure stability and accuracy in environmental perception post-landing, we introduce a coarse-to-fine relocalization method that combines global Branch and Bound (BnB) search with Maximum a Posteriori (MAP) estimation for precise positioning during navigation and jumping. The proposed framework achieves jump trajectory generation in approximately 0.1 seconds with a warm start and has been successfully validated on two quadruped robots on uneven terrains. Additionally, we extend the framework's versatility to humanoid robots.

Submitted to IJRR
TranSPORTmer: A Holistic Approach to Trajectory Understanding in Multi-Agent Sports 2024-11-09
Show

Understanding trajectories in multi-agent scenarios requires addressing various tasks, including predicting future movements, imputing missing observations, inferring the status of unseen agents, and classifying different global states. Traditional data-driven approaches often handle these tasks separately with specialized models. We introduce TranSPORTmer, a unified transformer-based framework capable of addressing all these tasks, showcasing its application to the intricate dynamics of multi-agent sports scenarios like soccer and basketball. Using Set Attention Blocks, TranSPORTmer effectively captures temporal dynamics and social interactions in an equivariant manner. The model's tasks are guided by an input mask that conceals missing or yet-to-be-predicted observations. Additionally, we introduce a CLS extra agent to classify states along soccer trajectories, including passes, possessions, uncontrolled states, and out-of-play intervals, contributing to an enhancement in modeling trajectories. Evaluations on soccer and basketball datasets show that TranSPORTmer outperforms state-of-the-art task-specific models in player forecasting, player forecasting-imputation, ball inference, and ball imputation. https://youtu.be/8VtSRm8oGoE

Accepted to ACCV 2024

Energy-efficient Hybrid Model Predictive Trajectory Planning for Autonomous Electric Vehicles 2024-11-09
Show

To tackle the twin challenges of limited battery life and lengthy charging durations in electric vehicles (EVs), this paper introduces an Energy-efficient Hybrid Model Predictive Planner (EHMPP), which employs an energy-saving optimization strategy. EHMPP focuses on refining the design of the motion planner to be seamlessly integrated with the existing automatic driving algorithms, without additional hardware. It has been validated through simulation experiments on the Prescan, CarSim, and Matlab platforms, demonstrating that it can increase passive recovery energy by 11.74% and effectively track motor speed and acceleration at optimal power. To sum up, EHMPP not only aids in trajectory planning but also significantly boosts energy efficiency in autonomous EVs.

Accepted at the IEEE International Conference on Systems, Man, and Cybernetics (SMC) 2024

Safe Reinforcement Learning of Robot Trajectories in the Presence of Moving Obstacles 2024-11-08
Show

In this paper, we present an approach for learning collision-free robot trajectories in the presence of moving obstacles. As a first step, we train a backup policy to generate evasive movements from arbitrary initial robot states using model-free reinforcement learning. When learning policies for other tasks, the backup policy can be used to estimate the potential risk of a collision and to offer an alternative action if the estimated risk is considered too high. No matter which action is selected, our action space ensures that the kinematic limits of the robot joints are not violated. We analyze and evaluate two different methods for estimating the risk of a collision. A physics simulation performed in the background is computationally expensive but provides the best results in deterministic environments. If a data-based risk estimator is used instead, the computational effort is significantly reduced, but an additional source of error is introduced. For evaluation, we successfully learn a reaching task and a basketball task while keeping the risk of collisions low. The results demonstrate the effectiveness of our approach for deterministic and stochastic environments, including a human-robot scenario and a ball environment, where no state can be considered permanently safe. By conducting experiments with a real robot, we show that our approach can generate safe trajectories in real time.

IEEE Robotics and Automation Letters (RA-L); 8 pages; 7 figures
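
The action-selection logic described in the abstract above can be summarized in a few lines: query the task policy, estimate the collision risk of its action, and fall back to the backup policy's evasive action when the risk is too high. In the sketch below, `task_policy`, `backup_policy`, and `estimate_collision_risk` are placeholders standing in for the learned components.

```python
# Sketch of risk-gated action selection: prefer the task policy's action unless
# its estimated collision risk exceeds a threshold, then use the backup policy.
# `task_policy`, `backup_policy`, and `estimate_collision_risk` are placeholders
# for the learned components described in the abstract.
import numpy as np

RISK_THRESHOLD = 0.05

def estimate_collision_risk(state, action):
    # Stand-in for either the background physics simulation or the learned,
    # data-based risk estimator; here risk simply grows with action magnitude.
    return float(np.clip(np.linalg.norm(action) - 0.8, 0.0, 1.0))

def task_policy(state):
    return np.array([0.9, 0.1])      # placeholder task action

def backup_policy(state):
    return np.array([0.1, -0.2])     # placeholder evasive action

def select_action(state):
    action = task_policy(state)
    if estimate_collision_risk(state, action) > RISK_THRESHOLD:
        action = backup_policy(state)   # evade instead of risking a collision
    return action

print(select_action(state=np.zeros(4)))
```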

Generating Synthetic Functional Data for Privacy-Preserving GPS Trajectories 2024-11-08
Show

This research presents FDASynthesis, a novel algorithm designed to generate synthetic GPS trajectory data while preserving privacy. After pre-processing the input GPS data, human mobility traces are modeled as multidimensional curves using Functional Data Analysis (FDA). Then, the synthesis process identifies the K-nearest trajectories and averages their Square-Root Velocity Functions (SRVFs) to generate synthetic data. This results in synthetic trajectories that maintain the utility of the original data while ensuring privacy. Although applied for human mobility research, FDASynthesis is highly adaptable to different types of functional data, offering a scalable solution in various application domains.

Updated version, correction of the notation
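
The core synthesis step in the abstract, averaging Square-Root Velocity Functions (SRVFs) of neighboring trajectories, can be sketched as below for 2-D curves sampled on a shared time grid. The curve registration/alignment that a full FDA pipeline would perform is deliberately omitted, and the trajectories are synthetic.

```python
# Sketch: Square-Root Velocity Function (SRVF) averaging of K trajectories.
# Trajectories are 2-D curves on a shared time grid; the curve registration a
# full FDA pipeline would perform is omitted, and the data are synthetic.
import numpy as np

def srvf(traj, dt):
    v = np.gradient(traj, dt, axis=0)                    # velocity
    speed = np.linalg.norm(v, axis=1, keepdims=True)
    return v / np.sqrt(np.maximum(speed, 1e-8))          # q(t) = v / sqrt(|v|)

def srvf_to_traj(q, dt, start):
    v = q * np.linalg.norm(q, axis=1, keepdims=True)     # invert: v = q * |q|
    return start + np.cumsum(v, axis=0) * dt

dt = 0.1
t = np.arange(0, 10, dt)[:, None]
neighbors = [np.hstack([t, np.sin(t + phase)]) for phase in (0.0, 0.2, 0.4)]

q_mean = np.mean([srvf(x, dt) for x in neighbors], axis=0)
start_mean = np.mean([x[0] for x in neighbors], axis=0)
synthetic = srvf_to_traj(q_mean, dt, start_mean)
print(synthetic.shape)   # a synthetic trajectory on the same time grid
```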

SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation 2024-11-07
Show

Methods for image-to-video generation have achieved impressive, photo-realistic quality. However, adjusting specific elements in generated videos, such as object motion or camera movement, is often a tedious process of trial and error, e.g., involving re-generating videos with different random seeds. Recent techniques address this issue by fine-tuning a pre-trained model to follow conditioning signals, such as bounding boxes or point trajectories. Yet, this fine-tuning procedure can be computationally expensive, and it requires datasets with annotated object motion, which can be difficult to procure. In this work, we introduce SG-I2V, a framework for controllable image-to-video generation that is self-guided, offering zero-shot control by relying solely on the knowledge present in a pre-trained image-to-video diffusion model without the need for fine-tuning or external knowledge. Our zero-shot method outperforms unsupervised baselines while being competitive with supervised models in terms of visual quality and motion fidelity.

Project page: https://kmcode1.github.io/Projects/SG-I2V/

Optimal Flow Matching: Learning Straight Trajectories in Just One Step 2024-11-07
Show

Over the last several years, there has been a boom in the development of Flow Matching (FM) methods for generative modeling. One intriguing property pursued by the community is the ability to learn flows with straight trajectories which realize the Optimal Transport (OT) displacements. Straightness is crucial for the fast integration (inference) of the learned flow's paths. Unfortunately, most existing flow straightening methods are based on non-trivial iterative FM procedures which accumulate the error during training or exploit heuristics based on minibatch OT. To address these issues, we develop and theoretically justify the novel \textbf{Optimal Flow Matching} (OFM) approach which allows recovering the straight OT displacement for the quadratic transport in just one FM step. The main idea of our approach is to employ vector fields for FM that are parameterized by convex functions.

Pose2Trajectory: Using Transformers on Body Pose to Predict Tennis Player's Trajectory 2024-11-07
Show

Tracking the trajectory of tennis players can help camera operators in production. Predicting future movement enables cameras to automatically track and predict a player's future trajectory without human intervention. Predicting future human movement in the context of complex physical tasks is also intellectually satisfying. Swift advancements in sports analytics and the wide availability of videos for tennis have inspired us to propose a novel method called Pose2Trajectory, which predicts a tennis player's future trajectory as a sequence derived from their body joints' data and ball position. Demonstrating impressive accuracy, our approach capitalizes on body joint information to provide a comprehensive understanding of the human body's geometry and motion, thereby enhancing the prediction of the player's trajectory. We use an encoder-decoder Transformer architecture trained on the joints and trajectory information of the players with ball positions. The predicted sequence can provide information to help close-up cameras keep tracking the tennis player by following centroid coordinates. We generate a high-quality dataset from multiple videos to assist tennis player movement prediction using object detection and human pose estimation methods. It contains bounding boxes and joint information for tennis players and ball positions in singles tennis games. Our method shows promising results in predicting the tennis player's movement trajectory with different sequence prediction lengths using the joints and trajectory information with the ball position.

TrajGPT: Controlled Synthetic Trajectory Generation Using a Multitask Transformer-Based Spatiotemporal Model 2024-11-07
Show

Human mobility modeling from GPS-trajectories and synthetic trajectory generation are crucial for various applications, such as urban planning, disaster management and epidemiology. Both of these tasks often require filling gaps in a partially specified sequence of visits - a new problem that we call "controlled" synthetic trajectory generation. Existing methods for next-location prediction or synthetic trajectory generation cannot solve this problem as they lack the mechanisms needed to constrain the generated sequences of visits. Moreover, existing approaches (1) frequently treat space and time as independent factors, an assumption that fails to hold true in real-world scenarios, and (2) suffer from challenges in accuracy of temporal prediction as they fail to deal with mixed distributions and the inter-relationships of different modes with latent variables (e.g., day-of-the-week). These limitations become even more pronounced when the task involves filling gaps within sequences instead of solely predicting the next visit. We introduce TrajGPT, a transformer-based, multi-task, joint spatiotemporal generative model to address these issues. Taking inspiration from large language models, TrajGPT poses the problem of controlled trajectory generation as that of text infilling in natural language. TrajGPT integrates the spatial and temporal models in a transformer architecture through a Bayesian probability model that ensures that the gaps in a visit sequence are filled in a spatiotemporally consistent manner. Our experiments on public and private datasets demonstrate that TrajGPT not only excels in controlled synthetic visit generation but also outperforms competing models in next-location prediction tasks - Relatively, TrajGPT achieves a 26-fold improvement in temporal accuracy while retaining more than 98% of spatial accuracy on average.

10 pages, 3 figures, 32nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL 2024)

GTA: Generative Trajectory Augmentation with Guidance for Offline Reinforcement Learning 2024-11-07
Show

Offline Reinforcement Learning (Offline RL) presents challenges of learning effective decision-making policies from static datasets without any online interactions. Data augmentation techniques, such as noise injection and data synthesizing, aim to improve Q-function approximation by smoothing the learned state-action region. However, these methods often fall short of directly improving the quality of offline datasets, leading to suboptimal results. In response, we introduce GTA, Generative Trajectory Augmentation, a novel generative data augmentation approach designed to enrich offline data by augmenting trajectories to be both high-rewarding and dynamically plausible. GTA applies a diffusion model within the data augmentation framework. GTA partially noises original trajectories and then denoises them with classifier-free guidance via conditioning on amplified return value. Our results show that GTA, as a general data augmentation strategy, enhances the performance of widely used offline RL algorithms across various tasks with unique challenges. Furthermore, we conduct a quality analysis of data augmented by GTA and demonstrate that GTA improves the quality of the data. Our code is available at https://github.com/Jaewoopudding/GTA

NeurIPS 2024. Previously accepted (Spotlight) to ICLR 2024 Workshop on Generative Models for Decision Making. Jaewoo Lee and Sujin Yun are equal contribution authors
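
The "partially noise, then denoise" step mentioned in the abstract amounts to running the forward diffusion process only up to an intermediate step so that part of the original trajectory structure is retained. A minimal sketch of that forward step is given below; the denoiser network and the amplified-return, classifier-free-guidance conditioning are left as placeholders.

```python
# Sketch of partial noising: run the forward diffusion only up to an
# intermediate step t_partial < T so that part of the original trajectory
# structure is retained. The denoiser and the return conditioning are omitted.
import numpy as np

T = 100
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

def partially_noise(trajectory, t_partial, rng):
    """Sample x_{t_partial} ~ q(x_t | x_0) from the forward diffusion process."""
    a_bar = alpha_bars[t_partial]
    eps = rng.standard_normal(trajectory.shape)
    return np.sqrt(a_bar) * trajectory + np.sqrt(1.0 - a_bar) * eps

rng = np.random.default_rng(0)
trajectory = rng.standard_normal((32, 17))   # placeholder (horizon, state-action dims)
x_t = partially_noise(trajectory, t_partial=40, rng=rng)

# A real implementation would now denoise x_t with a diffusion model using
# classifier-free guidance conditioned on an amplified return value.
print(x_t.shape)
```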

Efficient Trajectory Forecasting and Generation with Conditional Flow Matching 2024-11-07
Show

Trajectory prediction and generation are crucial for autonomous robots in dynamic environments. While prior research has typically focused on either prediction or generation, our approach unifies these tasks to provide a versatile framework and achieve state-of-the-art performance. While diffusion models excel in trajectory generation, their iterative sampling process is computationally intensive, hindering robotic systems' dynamic capabilities. We introduce Trajectory Conditional Flow Matching (T-CFM), a novel approach that uses flow matching techniques to learn the solver as a time-varying vector field for efficient, fast trajectory generation. T-CFM demonstrates effectiveness in adversarial tracking, real-world aircraft trajectory forecasting, and long-horizon planning, outperforming state-of-the-art baselines with 35% higher predictive accuracy and 142% improved planning performance. Crucially, T-CFM achieves up to 100$\times$ speed-up compared to diffusion models without sacrificing accuracy, enabling real-time decision making in robotics. Codebase: https://github.com/CORE-Robotics-Lab/TCFM
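
As background for the abstract above, the generic conditional flow matching objective regresses a vector field onto the straight-line velocity between paired samples along the interpolation x_t = (1 - t) x0 + t x1. The tiny linear "model" in the sketch below is only a stand-in to keep the example self-contained; it is not T-CFM's architecture.

```python
# Sketch of the generic conditional flow matching objective: regress a vector
# field v(x_t, t) onto the straight-line velocity (x1 - x0) along
# x_t = (1 - t) * x0 + t * x1. The linear "model" is a toy stand-in.
import numpy as np

rng = np.random.default_rng(0)
dim = 2
W = np.zeros((dim, dim + 2))                   # toy linear field: v = W @ [x_t, t, 1]

lr = 1e-2
for step in range(2000):
    x0 = rng.standard_normal(dim)                                  # source sample
    x1 = np.array([3.0, -1.0]) + 0.1 * rng.standard_normal(dim)    # target sample
    t = rng.uniform()
    x_t = (1 - t) * x0 + t * x1
    target = x1 - x0                           # straight-line (flow matching) velocity
    feats = np.concatenate([x_t, [t, 1.0]])
    err = W @ feats - target
    W -= lr * np.outer(err, feats)             # SGD on the squared regression error

feats = np.concatenate([np.array([3.0, -1.0]), [1.0, 1.0]])
print("learned field near the target mean at t=1:", W @ feats)
```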

Optimal Convex Cover as Collision-free Space Approximation for Trajectory Generation 2024-11-06
Show

We propose an online iterative algorithm to find an optimal convex cover to under-approximate the free space for autonomous navigation to delineate Safe Flight Corridors (SFC). The convex cover consists of a set of polytopes such that the union of the polytopes represents obstacle-free space, allowing us to find trajectories for robots that lie within the convex cover. In order to find the SFC that facilitates optimal trajectory generation, we iteratively find overlapping polytopes of maximum volumes that include specified waypoints initialized by a geometric or kinematic planner. Constraints at waypoints appear in two alternating stages of a joint optimization problem, which is solved by a method inspired by the Alternating Direction Method of Multipliers (ADMM) with partially distributed variables. We validate the effectiveness of our proposed algorithm using a range of parameterized environments and show its applications for two-stage motion planning.

ET-SEED: Efficient Trajectory-Level SE(3) Equivariant Diffusion Policy 2024-11-06
Show

Imitation learning, e.g., diffusion policy, has been proven effective in various robotic manipulation tasks. However, extensive demonstrations are required for policy robustness and generalization. To reduce the demonstration reliance, we leverage spatial symmetry and propose ET-SEED, an efficient trajectory-level SE(3) equivariant diffusion model for generating action sequences in complex robot manipulation tasks. Further, previous equivariant diffusion models require per-step equivariance in the Markov process, making it difficult to learn a policy under such strong constraints. We theoretically extend equivariant Markov kernels and simplify the condition of the equivariant diffusion process, thereby significantly improving training efficiency for trajectory-level SE(3) equivariant diffusion policy in an end-to-end manner. We evaluate ET-SEED on representative robotic manipulation tasks involving rigid, articulated, and deformable objects. Experiments demonstrate the superior data efficiency and manipulation proficiency of our proposed method, as well as its ability to generalize to unseen configurations with only a few demonstrations. Website: https://et-seed.github.io/

Accepted to CoRL 2024 Workshop on X-Embodiment Robot Learning

Efficient and Robust Freeway Traffic Speed Estimation under Oblique Grid using Vehicle Trajectory Data 2024-11-06
Show

Accurately estimating spatiotemporal traffic states on freeways is a significant challenge due to limited sensor deployment and potential data corruption. In this study, we propose an efficient and robust low-rank model for precise spatiotemporal traffic speed state estimation (TSE) using low-penetration vehicle trajectory data. Leveraging traffic wave priors, an oblique grid-based matrix is first designed to transform the inherent dependencies of spatiotemporal traffic states into the algebraic low-rankness of a matrix. Then, with the enhanced traffic state low-rankness in the oblique matrix, a low-rank matrix completion method is tailored to explicitly capture spatiotemporal traffic propagation characteristics and precisely reconstruct traffic states. In addition, an anomaly-tolerant module based on a sparse matrix is developed to accommodate corrupted data input and thereby improve the TSE model robustness. Notably, driven by the understanding of traffic waves, the computational complexity of the proposed efficient method is only correlated with the problem size itself, not with the dataset size or hyperparameter selection prevalent in existing studies. Extensive experiments demonstrate the effectiveness, robustness, and efficiency of the proposed model. The performance of the proposed method achieves up to a 12% improvement in Root Mean Squared Error (RMSE) in the TSE scenarios and an 18% improvement in RMSE in the robust TSE scenarios, and it runs more than 20 times faster than the state-of-the-art (SOTA) methods.

accepted by T-ITS
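
The oblique-grid idea in the abstract above can be illustrated by shearing a space-time speed matrix along an assumed congestion wave speed so that backward-propagating waves line up with matrix columns, which makes the matrix closer to low-rank before completion. The wave speed, grid resolution, and synthetic data in the sketch below are illustrative assumptions only, not the paper's construction.

```python
# Sketch: shear a space-time speed matrix along an assumed congestion wave
# speed so backward-propagating waves align with columns (an "oblique grid"),
# making the matrix closer to low-rank before completion. Wave speed, grid
# resolution, and data are illustrative assumptions only.
import numpy as np

n_space, n_time = 40, 200
dx, dt = 0.1, 5.0 / 3600.0            # km per cell, hours per time step
wave_speed = -18.0                     # km/h, an assumed congestion wave speed

x = np.arange(n_space) * dx
t = np.arange(n_time) * dt
# Synthetic speed field: a low-speed wave propagating backward in space.
speed = 90.0 - 40.0 * np.exp(-((x[:, None] - (4.5 + wave_speed * t[None, :])) ** 2) / 0.1)

def to_oblique(mat, wave_speed, dx, dt):
    """Shift each row by the time the wave needs to travel to that location."""
    shifted = np.empty_like(mat)
    for i in range(mat.shape[0]):
        offset = int(round((i * dx) / (wave_speed * dt)))   # signed time steps
        shifted[i] = np.roll(mat[i], -offset)
    return shifted

oblique = to_oblique(speed, wave_speed, dx, dt)
for name, m in (("original", speed), ("oblique ", oblique)):
    s = np.linalg.svd(m, compute_uv=False)
    print(name, "energy in top-2 singular values:", round(float(s[:2].sum() / s.sum()), 3))
```
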
Biomechanics-Aware Trajectory Optimization for Navigation during Robotic Physiotherapy 2024-11-06
Show

Robotic devices hold promise for aiding patients in orthopedic rehabilitation. However, current robotic-assisted physiotherapy methods struggle including biomechanical metrics in their control algorithms, crucial for safe and effective therapy. This paper introduces BATON, a Biomechanics-Aware Trajectory Optimization approach to robotic Navigation of human musculoskeletal loads. The method integrates a high-fidelity musculoskeletal model of the human shoulder into real-time control of robot-patient interaction during rotator cuff tendon rehabilitation. We extract skeletal dynamics and tendon loading information from an OpenSim shoulder model to solve an optimal control problem, generating strain-minimizing trajectories. Trajectories were realized on a healthy subject by an impedance-controlled robot while estimating the state of the subject's shoulder. Target poses were prescribed to design personalized rehabilitation across a wide range of shoulder motion avoiding high-strain areas. BATON was designed with real-time capabilities, enabling continuous trajectory replanning to address unforeseen variations in tendon strain, such as those from changing muscle activation of the subject.

13 pages, 9 figures, under review

How to Drawjectory? -- Trajectory Planning using Programming by Demonstration 2024-11-06
Show

A flight trajectory defines how exactly a quadrocopter moves in the three-dimensional space from one position to another. Automatic flight trajectory planning faces challenges such as high computational effort and a lack of precision. Hence, when low computational effort or precise control is required, programming the flight route trajectory manually might be preferable. However, this requires in-depth knowledge of how to accurately plan flight trajectories in three-dimensional space. We propose planning quadrocopter flight trajectories manually using the Programming by Demonstration (PbD) approach -- simply drawing the trajectory in the three-dimensional space by hand. This simplifies the planning process and reduces the level of in-depth knowledge required. We implemented the approach in the context of the Quadcopter Lab at Ulm University. In order to evaluate our approach, we compare the precision, accuracy, and required time of trajectories drawn by a user with our approach against those manually programmed using a domain-specific language. The evaluation shows that the Drawjectory workflow is, on average, 78.7 seconds faster without a significant loss of precision, shown by an average deviation of 6.67 cm.

Revisiting CNNs for Trajectory Similarity Learning 2024-11-05
Show

Similarity search is a fundamental but expensive operator in querying trajectory data, due to its quadratic complexity of distance computation. To mitigate the computational burden for long trajectories, neural networks have been widely employed for similarity learning and each trajectory is encoded as a high-dimensional vector for similarity search with linear complexity. Given the sequential nature of trajectory data, previous efforts have been primarily devoted to the utilization of RNNs or Transformers. In this paper, we argue that the common practice of treating trajectory as sequential data results in excessive attention to capturing long-term global dependency between two sequences. Instead, our investigation reveals the pivotal role of local similarity, prompting a revisit of simple CNNs for trajectory similarity learning. We introduce ConvTraj, incorporating both 1D and 2D convolutions to capture sequential and geo-distribution features of trajectories, respectively. In addition, we conduct a series of theoretical analyses to justify the effectiveness of ConvTraj. Experimental results on four real-world large-scale datasets demonstrate that ConvTraj achieves state-of-the-art accuracy in trajectory similarity search. Owing to the simple network structure of ConvTraj, the training and inference speed on the Porto dataset with 1.6 million trajectories are increased by at least $240$x and $2.16$x, respectively. The source code and dataset can be found at \textit{\url{https://github.com/Proudc/ConvTraj}}.
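
A minimal PyTorch sketch of the two-branch idea in the abstract is shown below: 1-D convolutions over the coordinate sequence capture sequential structure, while 2-D convolutions over a rasterized occupancy grid capture geo-distribution. Layer sizes and pooling choices are arbitrary illustrations, not the paper's architecture.

```python
# Minimal PyTorch sketch of a two-branch trajectory encoder in the spirit of
# the abstract: 1-D convolutions over the (x, y) sequence capture sequential
# structure; 2-D convolutions over a rasterized grid capture geo-distribution.
# Layer sizes are arbitrary illustrations, not the paper's architecture.
import torch
import torch.nn as nn

class TwoBranchTrajEncoder(nn.Module):
    def __init__(self, embed_dim=64):
        super().__init__()
        self.seq_branch = nn.Sequential(
            nn.Conv1d(2, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.grid_branch = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32 + 16, embed_dim)

    def forward(self, seq, grid):
        # seq: (batch, 2, length) coordinates; grid: (batch, 1, H, W) raster.
        a = self.seq_branch(seq).squeeze(-1)
        b = self.grid_branch(grid).flatten(1)
        return self.head(torch.cat([a, b], dim=1))

enc = TwoBranchTrajEncoder()
emb = enc(torch.randn(4, 2, 100), torch.randn(4, 1, 32, 32))
print(emb.shape)   # torch.Size([4, 64])
```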

NEOviz: Uncertainty-Driven Visual Analysis of Asteroid Trajectories 2024-11-05
Show

We introduce NEOviz, an interactive visualization system designed to assist planetary defense experts in the visual analysis of the movements of near-Earth objects in the Solar System that might prove hazardous to Earth. Asteroids are often discovered using optical telescopes and their trajectories are calculated from images, resulting in an inherent asymmetric uncertainty in their position and velocity. Consequently, we typically cannot determine the exact trajectory of an asteroid, and an ensemble of trajectories must be generated to estimate an asteroid's movement over time. When propagating these ensembles over decades, it is challenging to visualize the varying paths and determine their potential impact on Earth, which could cause catastrophic damage. NEOviz equips experts with the necessary tools to effectively analyze the existing catalog of asteroid observations. In particular, we present a novel approach for visualizing the 3D uncertainty region through which an asteroid travels, while providing accurate spatial context in relation to system-critical infrastructure such as Earth, the Moon, and artificial satellites. Furthermore, we use NEOviz to visualize the divergence of asteroid trajectories, capturing high-variance events in an asteroid's orbital properties. For potential impactors, we combine the 3D visualization with an uncertainty-aware impact map to illustrate the potential risks to human populations. NEOviz was developed with continuous input from members of the planetary defense community through a participatory design process. It is exemplified in three real-world use cases and evaluated via expert feedback interviews.

Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery 2024-11-04
Show

This paper presents a framework for extracting georeferenced vehicle trajectories from high-altitude drone footage, addressing key challenges in urban traffic monitoring and limitations of traditional ground-based systems. We employ state-of-the-art computer vision and deep learning to create an end-to-end pipeline that enhances vehicle detection, tracking, and trajectory stabilization. Conducted in the Songdo International Business District, South Korea, the study used a multi-drone experiment over 20 intersections, capturing approximately 12TB of 4K video data over four days. We developed a novel track stabilization method that uses detected vehicle bounding boxes as exclusion masks during image registration, which, combined with advanced georeferencing techniques, accurately transforms vehicle coordinates into real-world geographical data. Additionally, our framework includes robust vehicle dimension estimation and detailed road segmentation for in-depth traffic analysis. The framework produced two high-quality datasets: the Songdo Traffic dataset, comprising nearly 1 million unique vehicle trajectories, and the Songdo Vision dataset, containing over 5,000 human-annotated frames with about 300,000 vehicle instances in four classes. Comparisons between drone-derived data and high-precision sensor data from an instrumented probe vehicle highlight the accuracy and consistency of our framework's extraction in dense urban settings. By publicly releasing these datasets and the pipeline source code, this work sets new benchmarks for data quality, reproducibility, and scalability in traffic research. Results demonstrate the potential of integrating drone technology with advanced computer vision for precise, cost-effective urban traffic monitoring, providing valuable resources for the research community to develop intelligent transportation systems and improve traffic management strategies.

Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis 2024-11-04
Show

Recently, a series of diffusion-aware distillation algorithms have emerged to alleviate the computational overhead associated with the multi-step inference process of Diffusion Models (DMs). Current distillation techniques often dichotomize into two distinct aspects: i) ODE Trajectory Preservation; and ii) ODE Trajectory Reformulation. However, these approaches suffer from severe performance degradation or domain shifts. To address these limitations, we propose Hyper-SD, a novel framework that synergistically amalgamates the advantages of ODE Trajectory Preservation and Reformulation, while maintaining near-lossless performance during step compression. Firstly, we introduce Trajectory Segmented Consistency Distillation to progressively perform consistent distillation within pre-defined time-step segments, which facilitates the preservation of the original ODE trajectory from a higher-order perspective. Secondly, we incorporate human feedback learning to boost the performance of the model in a low-step regime and mitigate the performance loss incurred by the distillation process. Thirdly, we integrate score distillation to further improve the low-step generation capability of the model and offer the first attempt to leverage a unified LoRA to support the inference process at all steps. Extensive experiments and user studies demonstrate that Hyper-SD achieves SOTA performance from 1 to 8 inference steps for both SDXL and SD1.5. For example, Hyper-SDXL surpasses SDXL-Lightning by +0.68 in CLIP Score and +0.51 in Aes Score in the 1-step inference.

Accepted by NeurIPS 2024 (Camera-Ready Version). Project Page: https://hyper-sd.github.io/

Intrinsic Dimensionality of Fermi-Pasta-Ulam-Tsingou High-Dimensional Trajectories Through Manifold Learning 2024-11-04
Show

A data-driven approach based on unsupervised machine learning is proposed to infer the intrinsic dimensions $m^{\ast}$ of the high-dimensional trajectories of the Fermi-Pasta-Ulam-Tsingou (FPUT) model. Principal component analysis (PCA) is applied to trajectory data consisting of $n_s = 4,000,000$ datapoints, of the FPUT $\beta$ model with $N = 32$ coupled oscillators, revealing a critical relationship between $m^{\ast}$ and the model's nonlinear strength. For weak nonlinearities, $m^{\ast} \ll n$, where $n = 2N$. In contrast, for strong nonlinearities, $m^{\ast} \rightarrow n - 1$, consistent with the ergodic hypothesis. Furthermore, one of the potential limitations of PCA is addressed through an analysis with t-distributed stochastic neighbor embedding ($t$-SNE). Accordingly, we found strong evidence suggesting that the datapoints lie near or on a curved low-dimensional manifold for weak nonlinearities.

20 pages, 18 figures
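
The intrinsic-dimension estimate described in the abstract above can be illustrated with plain PCA: compute the explained-variance spectrum and count the components needed to reach a variance threshold. The data below are synthetic points near a 3-D subspace, not FPUT trajectories, and the 99% threshold is an arbitrary choice.

```python
# Sketch: estimate an intrinsic dimension as the number of principal components
# needed to reach a variance threshold. The data are synthetic points near a
# 3-D subspace of a 64-D space (not FPUT trajectories); 99% is an arbitrary cut.
import numpy as np

rng = np.random.default_rng(0)
n_points, ambient_dim, true_dim = 5000, 64, 3

basis = np.linalg.qr(rng.standard_normal((ambient_dim, true_dim)))[0]
data = rng.standard_normal((n_points, true_dim)) @ basis.T
data += 0.01 * rng.standard_normal(data.shape)            # small off-manifold noise

centered = data - data.mean(axis=0)
_, s, _ = np.linalg.svd(centered, full_matrices=False)
explained = (s ** 2) / np.sum(s ** 2)
m_star = int(np.searchsorted(np.cumsum(explained), 0.99) + 1)
print("estimated intrinsic dimension:", m_star)            # ~3
```
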
Enhancing Social Robot Navigation with Integrated Motion Prediction and Trajectory Planning in Dynamic Human Environments 2024-11-04
Show

Navigating safely in dynamic human environments is crucial for mobile service robots, and social navigation is a key aspect of this process. In this paper, we propose an integrative approach that combines motion prediction and trajectory planning to enable safe and socially-aware robot navigation. The main idea of the proposed method is to leverage the advantages of Socially Acceptable trajectory prediction and Timed Elastic Band (TEB) by incorporating human interactive information including position, orientation, and motion into the objective function of the TEB algorithms. In addition, we designed social constraints to ensure the safety of robot navigation. The proposed system is evaluated through physical simulation using both quantitative and qualitative metrics, demonstrating its superior performance in avoiding human and dynamic obstacles, thereby ensuring safe navigation. The implementations are open source at: \url{https://github.com/thanhnguyencanh/SGan-TEB.git}

In the 24th International Conference on Control, Automation, and Systems (ICCAS 2024), Jeju, Korea

Estimating Generalization Performance Along the Trajectory of Proximal SGD in Robust Regression 2024-11-03
Show

This paper studies the generalization performance of iterates obtained by Gradient Descent (GD), Stochastic Gradient Descent (SGD) and their proximal variants in high-dimensional robust regression problems. The number of features is comparable to the sample size and errors may be heavy-tailed. We introduce estimators that precisely track the generalization error of the iterates along the trajectory of the iterative algorithm. These estimators are provably consistent under suitable conditions. The results are illustrated through several examples, including Huber regression, pseudo-Huber regression, and their penalized variants with non-smooth regularizer. We provide explicit generalization error estimates for iterates generated from GD and SGD, or from proximal SGD in the presence of a non-smooth regularizer. The proposed risk estimates serve as effective proxies for the actual generalization error, allowing us to determine the optimal stopping iteration that minimizes the generalization error. Extensive simulations confirm the effectiveness of the proposed generalization error estimates.

Camera-ready version of NeurIPS 2024 paper

Interaction-Aware Trajectory Prediction for Safe Motion Planning in Autonomous Driving: A Transformer-Transfer Learning Approach 2024-11-03
Show

A critical aspect of safe and efficient motion planning for autonomous vehicles (AVs) is to handle the complex and uncertain behavior of surrounding human-driven vehicles (HDVs). Despite intensive research on driver behavior prediction, existing approaches typically overlook the interactions between AVs and HDVs assuming that HDV trajectories are not affected by AV actions. To address this gap, we present a transformer-transfer learning-based interaction-aware trajectory predictor for safe motion planning of autonomous driving, focusing on a vehicle-to-vehicle (V2V) interaction scenario consisting of an AV and an HDV. Specifically, we construct a transformer-based interaction-aware trajectory predictor using widely available datasets of HDV trajectory data and further transfer the learned predictor using a small set of AV-HDV interaction data. Then, to better incorporate the proposed trajectory predictor into the motion planning module of AVs, we introduce an uncertainty quantification method to characterize the errors of the predictor, which are integrated into the path-planning process. Our experimental results demonstrate the value of explicitly considering interactions and handling uncertainties.

Labeled random finite sets vs. trajectory random finite sets 2024-11-03
Show

The paper [12] discussed two approaches for multitarget tracking (MTT): the generalized labeled multi-Bernoulli (GLMB) filter and three Poisson multi-Bernoulli mixture (PMBM) filters. The paper [13] discussed two frameworks for multitarget trajectory representation--labeled random finite set (LRFS) and set of trajectories (SoT)--and the merging of SoT and PMBM into trajectory PMBM (TPMBM) theory. This paper summarizes and augments the main findings of [12], [13]--specifically, why SoT, PMBM, and TPMBM are physically and mathematically erroneous.

6 pages, 1 figure
TrajRoute: Rethinking Routing with a Simple Trajectory-Based Approach -- Forget the Maps and Traffic! 2024-11-02
Show

The abundance of vehicle trajectory data offers a new opportunity to compute driving routes between origins and destinations. Current graph-based routing pipelines, while effective, involve substantial costs in constructing, maintaining, and updating road network graphs to reflect real-time conditions. In this study, we propose a new trajectory-based routing paradigm that bypasses current workflows by directly utilizing raw trajectory data to compute efficient routes. Our method, named TrajRoute, uniquely "follows" historical trajectories from a source to a destination, constructing paths that reflect actual driver behavior and implicit preferences. To supplement areas with sparse trajectory data, the road network is also incorporated into TrajRoute's index, and tunable parameters are introduced to control the balance between road segments and trajectories, ensuring a unified and adaptable routing approach. We experimentally verify our approach by comparing it to an existing online routing service. Our results demonstrate that as the number of trajectories covering the road network increases, TrajRoute produces increasingly accurate travel time and route length estimates while gradually eliminating the need to downgrade to the road network. This highlights the potential of simpler, data-driven pipelines for routing, offering lower-maintenance alternatives to conventional systems.

GPTR: Gaussian Process Trajectory Representation for Continuous-Time Motion Estimation 2024-11-02
Show

Continuous-time trajectory representation has gained significant popularity in recent years, as it offers an elegant formulation that allows the fusion of a larger number of sensors and sensing modalities, overcoming limitations of traditional discrete-time frameworks. To bolster the adoption of the continuous-time paradigm, we propose a so-called Gaussian Process Trajectory Representation (GPTR) framework for continuous-time motion estimation (CTME) tasks. Our approach stands out by employing a third-order random jerk model, featuring closed-form expressions for both rotational and translational state derivatives. This model provides smooth, continuous trajectory representations that are crucial for precise estimation of complex motion. To support the wider robotics and computer vision communities, we have made the source code for GPTR available as a light-weight header-only library. This format was chosen for its ease of integration, allowing developers to incorporate GPTR into existing systems without needing extensive code modifications. Moreover, we also provide a set of optimization examples with LiDAR, camera, IMU, UWB factors, and closed-form analytical Jacobians under the proposed GP framework. Our experiments demonstrate the efficacy and efficiency of GP-based trajectory representation in various motion estimation tasks, and the examples can serve as the prototype to help researchers quickly develop future applications such as batch optimization, calibration, sensor fusion, trajectory planning, etc., with continuous-time trajectory representation. Our project is accessible at https://github.com/brytsknguyen/gptr .

The source code has been released. All feedback is welcome

SPOT: SE(3) Pose Trajectory Diffusion for Object-Centric Manipulation 2024-11-01
Show

We introduce SPOT, an object-centric imitation learning framework. The key idea is to capture each task by an object-centric representation, specifically the SE(3) object pose trajectory relative to the target. This approach decouples embodiment actions from sensory inputs, facilitating learning from various demonstration types, including both action-based and action-less human hand demonstrations, as well as cross-embodiment generalization. Additionally, object pose trajectories inherently capture planning constraints from demonstrations without the need for manually crafted rules. To guide the robot in executing the task, the object trajectory is used to condition a diffusion policy. We show improvement compared to prior work on RLBench simulated tasks. In real-world evaluation, using only eight demonstrations shot on an iPhone, our approach completed all tasks while fully complying with task constraints. Project page: https://nvlabs.github.io/object_centric_diffusion

RopeTP: Global Human Motion Recovery via Integrating Robust Pose Estimation with Diffusion Trajectory Prior 2024-11-01
Show

We present RopeTP, a novel framework that combines Robust pose estimation with a diffusion Trajectory Prior to reconstruct global human motion from videos. At the heart of RopeTP is a hierarchical attention mechanism that significantly improves context awareness, which is essential for accurately inferring the posture of occluded body parts. This is achieved by exploiting the relationships with visible anatomical structures, enhancing the accuracy of local pose estimations. The improved robustness of these local estimations allows for the reconstruction of precise and stable global trajectories. Additionally, RopeTP incorporates a diffusion trajectory model that predicts realistic human motion from local pose sequences. This model ensures that the generated trajectories are not only consistent with observed local actions but also unfold naturally over time, thereby improving the realism and stability of 3D human motion reconstruction. Extensive experimental validation shows that RopeTP surpasses current methods on two benchmark datasets, particularly excelling in scenarios with occlusions. It also outperforms methods that rely on SLAM for initial camera estimates and extensive optimization, delivering more accurate and realistic trajectories.

Accepted by WACV 2025 (Round 1)

Pedestrian Trajectory Prediction with Missing Data: Datasets, Imputation, and Benchmarking 2024-10-31
Show

Pedestrian trajectory prediction is crucial for several applications such as robotics and self-driving vehicles. Significant progress has been made in the past decade thanks to the availability of pedestrian trajectory datasets, which enable trajectory prediction methods to learn from pedestrians' past movements and predict future trajectories. However, these datasets and methods typically assume that the observed trajectory sequence is complete, ignoring real-world issues such as sensor failure, occlusion, and limited fields of view that can result in missing values in observed trajectories. To address this challenge, we present TrajImpute, a pedestrian trajectory prediction dataset that simulates missing coordinates in the observed trajectory, enhancing real-world applicability. TrajImpute maintains a uniform distribution of missing data within the observed trajectories. In this work, we comprehensively examine several imputation methods to reconstruct the missing coordinates and benchmark them for imputing pedestrian trajectories. Furthermore, we provide a thorough analysis of recent trajectory prediction methods and evaluate the performance of these models on the imputed trajectories. Our experimental evaluation of the imputation and trajectory prediction methods offers several valuable insights. Our dataset provides a foundational resource for future research on imputation-aware pedestrian trajectory prediction, potentially accelerating the deployment of these methods in real-world applications. Publicly accessible links to the datasets and code files are available at https://github.com/Pranav-chib/TrajImpute.

Accepted at NeurIPS 2024

Conformalized Prediction of Post-Fault Voltage Trajectories Using Pre-trained and Finetuned Attention-Driven Neural Operators 2024-10-31
Show

This paper proposes a new data-driven methodology for predicting intervals of post-fault voltage trajectories in power systems. We begin by introducing the Quantile Attention-Fourier Deep Operator Network (QAF-DeepONet), designed to capture the complex dynamics of voltage trajectories and reliably estimate quantiles of the target trajectory without any distributional assumptions. The proposed operator regression model maps the observed portion of the voltage trajectory to its unobserved post-fault trajectory. Our methodology employs a pre-training and fine-tuning process to address the challenge of limited data availability. To ensure data privacy while learning the pre-trained model, we use federated learning with data from neighboring buses, enabling the model to learn the underlying voltage dynamics from such buses without directly sharing their data. After pre-training, we fine-tune the model with data from the target bus, allowing it to adapt to unique dynamics and operating conditions. Finally, we integrate conformal prediction into the fine-tuned model to ensure coverage guarantees for the predicted intervals. We evaluated the performance of the proposed methodology using the New England 39-bus test system considering detailed models of voltage and frequency controllers. Two metrics, Prediction Interval Coverage Probability (PICP) and Prediction Interval Normalized Average Width (PINAW), are used to numerically assess the model's performance in predicting intervals. The results show that the proposed approach offers practical and reliable uncertainty quantification in predicting the interval of post-fault voltage trajectories.
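
The final conformalization step in the abstract resembles generic conformalized quantile regression: compute conformity scores on a calibration set and widen the predicted interval by their finite-sample quantile. The sketch below shows only that generic adjustment; the quantile forecasts are synthetic, deliberately too-narrow placeholders standing in for the operator network's outputs.

```python
# Sketch of a generic conformalized-quantile adjustment: compute conformity
# scores on a calibration set and widen the interval by their finite-sample
# quantile. The quantile forecasts are synthetic and stand in for a model.
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.1                                    # target 90% coverage
n_cal, n_test = 2000, 2000

def simulate(n):
    mu = rng.uniform(0.8, 1.2, n)              # "predicted" trajectory value
    y = mu + rng.normal(0.0, 0.3, n)           # true value with noise
    return y, mu - 0.2, mu + 0.2               # too-narrow lower/upper forecasts

y_cal, lo_cal, hi_cal = simulate(n_cal)
scores = np.maximum(lo_cal - y_cal, y_cal - hi_cal)        # conformity scores
k = int(np.ceil((n_cal + 1) * (1 - alpha)))
q_hat = np.sort(scores)[min(k, n_cal) - 1]                  # finite-sample quantile

y_test, lo_test, hi_test = simulate(n_test)
before = np.mean((y_test >= lo_test) & (y_test <= hi_test))
after = np.mean((y_test >= lo_test - q_hat) & (y_test <= hi_test + q_hat))
print("coverage before correction:", before)                # well below 0.90
print("coverage after correction: ", after)                 # close to 0.90
```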

Graph Neural Networks

Title Date Abstract Comment
Instance-Aware Graph Prompt Learning 2024-11-26
Show

Graph neural networks stand as the predominant technique for graph representation learning owing to their strong expressive power, yet the performance highly depends on the availability of high-quality labels in an end-to-end manner. Thus the pretraining and fine-tuning paradigm has been proposed to mitigate the label cost issue. Subsequently, the gap between the pretext tasks and downstream tasks has spurred the development of graph prompt learning which inserts a set of graph prompts into the original graph data with minimal parameters while preserving competitive performance. However, the current exploratory works are still limited since they all concentrate on learning fixed task-specific prompts which may not generalize well across the diverse instances that the task comprises. To tackle this challenge, we introduce Instance-Aware Graph Prompt Learning (IA-GPL) in this paper, aiming to generate distinct prompts tailored to different input instances. The process involves generating intermediate prompts for each instance using a lightweight architecture, quantizing these prompts through trainable codebook vectors, and employing the exponential moving average technique to ensure stable training. Extensive experiments conducted on multiple datasets and settings showcase the superior performance of IA-GPL compared to state-of-the-art baselines.

CliquePH: Higher-Order Information for Graph Neural Networks through Persistent Homology on Clique Graphs 2024-11-26
Show

Graph neural networks have become the default choice by practitioners for graph learning tasks such as graph classification and node classification. Nevertheless, popular graph neural network models still struggle to capture higher-order information, i.e., information that goes \emph{beyond} pairwise interactions. Recent work has shown that persistent homology, a tool from topological data analysis, can enrich graph neural networks with topological information that they otherwise could not capture. Calculating such features is efficient for dimension 0 (connected components) and dimension 1 (cycles). However, when it comes to higher-order structures, it does not scale well, with a complexity of $O(n^d)$, where $n$ is the number of nodes and $d$ is the order of the structures. In this work, we introduce a novel method that extracts information about higher-order structures in the graph while still using the efficient low-dimensional persistent homology algorithm. On standard benchmark datasets, we show that our method can lead to up to 31% improvements in test accuracy.

Published in Proceedings of the Third Learning on Graphs Conference (LoG 2024), PMLR 269

Orientation-Aware Graph Neural Networks for Protein Structure Representation Learning 2024-11-26
Show

By folding to particular 3D structures, proteins play a key role in living beings. To learn meaningful representation from a protein structure for downstream tasks, not only the global backbone topology but the local fine-grained orientational relations between amino acids should also be considered. In this work, we propose the Orientation-Aware Graph Neural Networks (OAGNNs) to better sense the geometric characteristics in protein structure (e.g. inner-residue torsion angles, inter-residue orientations). Extending a single weight from a scalar to a 3D vector, we construct a rich set of geometric-meaningful operations to process both the classical and SO(3) representations of a given structure. To plug our designed perceptron unit into existing Graph Neural Networks, we further introduce an equivariant message passing paradigm, showing superior versatility in maintaining SO(3)-equivariance at the global scale. Experiments have shown that our OAGNNs have a remarkable ability to sense geometric orientational features compared to classical networks. OAGNNs have also achieved state-of-the-art performance on various computational biology applications related to protein 3D structures.

Powerformer: A Section-adaptive Transformer for Power Flow Adjustment 2024-11-26
Show

In this paper, we present a novel transformer architecture tailored for learning robust power system state representations, which strives to optimize power dispatch for the power flow adjustment across different transmission sections. Specifically, our proposed approach, named Powerformer, develops a dedicated section-adaptive attention mechanism, separating itself from the self-attention used in conventional transformers. This mechanism effectively integrates power system states with transmission section information, which facilitates the development of robust state representations. Furthermore, by considering the graph topology of power system and the electrical attributes of bus nodes, we introduce two customized strategies to further enhance the expressiveness: graph neural network propagation and multi-factor attention mechanism. Extensive evaluations are conducted on three power system scenarios, including the IEEE 118-bus system, a realistic 300-bus system in China, and a large-scale European system with 9241 buses, where Powerformer demonstrates its superior performance over several baseline methods.

8 figures
A Graph Neural Network deep-dive into successful counterattacks 2024-11-26
Show

A counterattack in soccer is a high speed, high intensity direct attack that can occur when a team transitions from a defensive state to an attacking state after regaining possession of the ball. The aim is to create a goal-scoring opportunity by covering a lot of ground with minimal passes before the opposing team can recover their defensive shape. The purpose of this research is to build gender-specific Graph Neural Networks to model the likelihood of a counterattack being successful and uncover what factors make them successful in professional soccer. These models are trained on a total of 20863 frames of synchronized on-ball event and spatiotemporal (broadcast) tracking data. This dataset is derived from 632 games of MLS (2022), NWSL (2022) and international soccer (2020-2022). With this data we demonstrate that gender-specific Graph Neural Networks outperform architecturally identical gender-ambiguous models in predicting the successful outcome of counterattacks. We show, using Permutation Feature Importance, that byline to byline speed, angle to the goal, angle to the ball and sideline to sideline speed are the node features with the highest impact on model performance. Additionally, we offer some illustrative examples on how to navigate the infinite solution search space to aid in identifying improvements for player decision making. This research is accompanied by an open-source repository containing all data and code, as well as an open-source Python package that simplifies converting spatiotemporal data into graphs and facilitates testing, validation, training and prediction with this data. This should allow the reader to replicate and improve upon our research more easily.

11 pages, 11 figures, first submitted (and accepted) at MIT Sloan Sports Analytics Conference 2023
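
Permutation Feature Importance, used in the abstract above to rank node features, is straightforward to sketch in its generic form: permute one feature column at a time and measure the drop in a model's held-out score. A small classifier on synthetic tabular data stands in for the Graph Neural Network, and the feature names are illustrative only.

```python
# Sketch: generic Permutation Feature Importance. Permute one feature column at
# a time and measure the drop in held-out score. A random forest on synthetic
# tabular data stands in for the Graph Neural Network; feature names are
# illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, names = 3000, ["byline_speed", "angle_to_goal", "angle_to_ball", "noise"]
X = rng.standard_normal((n, len(names)))
y = ((1.5 * X[:, 0] + 1.0 * X[:, 1] + 0.5 * X[:, 2]
      + 0.3 * rng.standard_normal(n)) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
base = model.score(X_te, y_te)

for j, name in enumerate(names):
    drops = []
    for _ in range(5):                          # average over several shuffles
        X_perm = X_te.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        drops.append(base - model.score(X_perm, y_te))
    print(f"{name:>14}: importance = {np.mean(drops):.3f}")
```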

Rewiring Techniques to Mitigate Oversquashing and Oversmoothing in GNNs: A Survey 2024-11-26
Show

Graph Neural Networks (GNNs) are powerful tools for learning from graph-structured data, but their effectiveness is often constrained by two critical challenges: oversquashing, where the excessive compression of information from distant nodes results in significant information loss, and oversmoothing, where repeated message-passing iterations homogenize node representations, obscuring meaningful distinctions. These issues, intrinsically linked to the underlying graph structure, hinder information flow and constrain the expressiveness of GNNs. In this survey, we examine graph rewiring techniques, a class of methods designed to address these structural bottlenecks by modifying graph topology to enhance information diffusion. We provide a comprehensive review of state-of-the-art rewiring approaches, delving into their theoretical underpinnings, practical implementations, and performance trade-offs.

Epidemiology-informed Graph Neural Network for Heterogeneity-aware Epidemic Forecasting 2024-11-26
Show

Among various spatio-temporal prediction tasks, epidemic forecasting plays a critical role in public health management. Recent studies have demonstrated the strong potential of spatio-temporal graph neural networks (STGNNs) in extracting heterogeneous spatio-temporal patterns for epidemic forecasting. However, most of these methods bear an over-simplified assumption that two locations (e.g., cities) with similar observed features in previous time steps will develop similar infection numbers in the future. In fact, for any epidemic disease, there exists strong heterogeneity of its intrinsic evolution mechanisms across geolocation and time, which can eventually lead to diverged infection numbers in two "similar" locations. However, such mechanistic heterogeneity is non-trivial to capture due to the existence of numerous influencing factors like medical resource accessibility, virus mutations, mobility patterns, etc., most of which are spatio-temporal yet unreachable or even unobservable. To address this challenge, we propose a Heterogeneous Epidemic-Aware Transmission Graph Neural Network (HeatGNN), a novel epidemic forecasting framework. By binding an epidemiological mechanistic model to a GNN, HeatGNN learns epidemiology-informed location embeddings that reflect each location's own transmission mechanisms over time. With the time-varying mechanistic affinity graphs computed with the epidemiology-informed location embeddings, a heterogeneous transmission graph network is designed to encode the mechanistic heterogeneity among locations, providing additional predictive signals to facilitate accurate forecasting. Experiments on three benchmark datasets have revealed that HeatGNN outperforms various strong baselines. Moreover, our efficiency analysis verifies the real-world practicality of HeatGNN on datasets of different sizes.

14 pages, 6 figures, 3 tables

Knowledge-aware Evolutionary Graph Neural Architecture Search 2024-11-26
Show

Graph neural architecture search (GNAS) can customize high-performance graph neural network architectures for specific graph tasks or datasets. However, existing GNAS methods begin searching for architectures from a zero-knowledge state, ignoring the prior knowledge that may improve the search efficiency. The available knowledge base (e.g. NAS-Bench-Graph) contains many rich architectures and their multiple performance metrics, such as the accuracy (#Acc) and number of parameters (#Params). This study proposes exploiting such prior knowledge to accelerate the multi-objective evolutionary search on a new graph dataset, named knowledge-aware evolutionary GNAS (KEGNAS). KEGNAS employs the knowledge base to train a knowledge model and a deep multi-output Gaussian process (DMOGP) in one go, which generates and evaluates transfer architectures in only a few GPU seconds. The knowledge model first establishes a dataset-to-architecture mapping, which can quickly generate candidate transfer architectures for a new dataset. Subsequently, the DMOGP with architecture and dataset encodings is designed to predict multiple performance metrics for candidate transfer architectures on the new dataset. According to the predicted metrics, non-dominated candidate transfer architectures are selected to warm-start the multi-objective evolutionary algorithm for optimizing the #Acc and #Params on a new dataset. Empirical studies on NAS-Bench-Graph and five real-world datasets show that KEGNAS swiftly generates top-performance architectures, achieving 4.27% higher accuracy than advanced evolutionary baselines and 11.54% higher accuracy than advanced differentiable baselines. In addition, ablation studies demonstrate that the use of prior knowledge significantly improves the search performance.

This work has been accepted by Knowledge-Based Systems
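
The warm-start step described above keeps only candidate transfer architectures that are non-dominated under the predicted objectives (#Acc to maximize, #Params to minimize). A plain-Python sketch of that filter, using hypothetical architecture names and surrogate predictions rather than the authors' code:

```python
def non_dominated(candidates):
    """candidates: list of (name, predicted_acc, predicted_params).
    Keep architectures for which no other candidate is at least as good on both
    objectives and strictly better on one (maximize acc, minimize params)."""
    keep = []
    for name, acc, params in candidates:
        dominated = any(
            (a >= acc and p <= params) and (a > acc or p < params)
            for _, a, p in candidates
        )
        if not dominated:
            keep.append((name, acc, params))
    return keep

# Hypothetical predictions from a surrogate such as the DMOGP described above.
preds = [("arch_a", 0.81, 1.2e6), ("arch_b", 0.79, 0.4e6), ("arch_c", 0.78, 0.9e6)]
print(non_dominated(preds))  # arch_a and arch_b survive; arch_c is dominated by arch_b
```
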

GrokFormer: Graph Fourier Kolmogorov-Arnold Transformers 2024-11-26
Show

Graph Transformers (GTs) have demonstrated remarkable performance in incorporating various graph structure information, e.g., long-range structural dependency, into graph representation learning. However, self-attention -- the core module of GTs -- preserves only low-frequency signals on graph features, retaining only homophilic patterns that capture similar features among the connected nodes. Consequently, it has insufficient capacity in modeling complex node label patterns, such as the opposite of homophilic patterns -- heterophilic patterns. Some improved GTs deal with the problem by learning polynomial filters or performing self-attention over the first-order graph spectrum. However, these GTs either ignore rich information contained in the whole spectrum or neglect higher-order spectrum information, resulting in limited flexibility and frequency response in their spectral filters. To tackle these challenges, we propose a novel GT network, namely Graph Fourier Kolmogorov-Arnold Transformers (GrokFormer), to go beyond the self-attention in GTs. GrokFormer leverages learnable activation functions in the order-$K$ graph spectrum through Fourier series modeling to i) learn eigenvalue-targeted filter functions producing a learnable basis that can flexibly capture a broad range of frequency signals, and ii) extract first- and higher-order graph spectral information adaptively. In doing so, GrokFormer can effectively capture intricate patterns hidden across different orders and levels of frequency signals, learning expressive, order-and-frequency-adaptive graph representations. Comprehensive experiments conducted on 10 node classification datasets across various domains, scales, and levels of graph heterophily, as well as 5 graph classification datasets, demonstrate that GrokFormer outperforms state-of-the-art GTs and other advanced graph neural networks.

13 pages, 6 figures, 7 tables
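
The learnable spectral filter can be thought of as a Fourier series over the Laplacian eigenvalues, h(λ) = Σ_k a_k sin(kλ) + b_k cos(kλ), applied in the eigenbasis. The PyTorch sketch below shows only that first-order spectral filtering step, with assumed shapes and names; it is an illustration of the idea, not GrokFormer itself.

```python
import torch
import torch.nn as nn

class FourierSpectralFilter(nn.Module):
    """Filter node features X with h(lambda) = sum_k a_k sin(k*lambda) + b_k cos(k*lambda),
    applied in the eigenbasis of the normalized Laplacian. Illustrative sketch only."""
    def __init__(self, num_terms: int = 4):
        super().__init__()
        self.a = nn.Parameter(torch.zeros(num_terms))
        self.b = nn.Parameter(torch.ones(num_terms))

    def forward(self, adj: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        deg = adj.sum(dim=1)
        d_inv_sqrt = torch.diag(deg.clamp(min=1e-9).pow(-0.5))
        lap = torch.eye(adj.shape[0]) - d_inv_sqrt @ adj @ d_inv_sqrt
        evals, evecs = torch.linalg.eigh(lap)                         # spectrum lies in [0, 2]
        k = torch.arange(1, self.a.numel() + 1, dtype=evals.dtype)
        h = (self.a * torch.sin(k * evals.unsqueeze(-1))
             + self.b * torch.cos(k * evals.unsqueeze(-1))).sum(-1)   # learnable response h(lambda)
        return evecs @ torch.diag(h) @ evecs.T @ x                    # filtered node features

adj = torch.tensor([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
x = torch.randn(3, 5)
print(FourierSpectralFilter()(adj, x).shape)  # torch.Size([3, 5])
```
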

DGNN-YOLO: Dynamic Graph Neural Networks with YOLO11 for Small Object Detection and Tracking in Traffic Surveillance 2024-11-26
Show

Accurate detection and tracking of small objects such as pedestrians, cyclists, and motorbikes are critical for traffic surveillance systems, which are crucial in improving road safety and decision-making in intelligent transportation systems. However, traditional methods struggle with challenges such as occlusion, low resolution, and dynamic traffic conditions, necessitating innovative approaches to address these limitations. This paper introduces DGNN-YOLO, a novel framework integrating dynamic graph neural networks (DGNN) with YOLO11 to enhance small object detection and tracking in traffic surveillance systems. The framework leverages YOLO11's advanced spatial feature extraction capabilities for precise object detection and incorporates DGNN to dynamically model spatial-temporal relationships for robust real-time tracking. By constructing and updating graph structures, DGNN-YOLO effectively represents objects as nodes and their interactions as edges, ensuring adaptive and accurate tracking in complex and dynamic environments. Extensive experiments demonstrate that DGNN-YOLO consistently outperforms state-of-the-art methods in detecting and tracking small objects under diverse traffic conditions, achieving the highest precision (0.8382), recall (0.6875), and mAP@0.5:0.95 (0.6476), showcasing its robustness and scalability, particularly in challenging scenarios involving small and occluded objects. This work provides a scalable, real-time traffic surveillance and analysis solution, significantly contributing to intelligent transportation systems.
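
The "objects as nodes, interactions as edges" construction can be illustrated with a toy per-frame graph builder that links detections whose box centers lie within a distance threshold; the detection format and the 50-pixel radius below are arbitrary assumptions, not the paper's pipeline.

```python
import math

def build_interaction_graph(detections, radius=50.0):
    """detections: list of (track_id, cx, cy) box centers for one video frame.
    Returns nodes and edges linking detections closer than `radius` pixels."""
    nodes = [d[0] for d in detections]
    edges = []
    for i, (id_i, xi, yi) in enumerate(detections):
        for id_j, xj, yj in detections[i + 1:]:
            if math.hypot(xi - xj, yi - yj) < radius:
                edges.append((id_i, id_j))
    return nodes, edges

frame = [(1, 100.0, 120.0), (2, 130.0, 110.0), (3, 400.0, 90.0)]
print(build_interaction_graph(frame))  # ([1, 2, 3], [(1, 2)]) -- only nearby objects interact
```
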

Generalization, Expressivity, and Universality of Graph Neural Networks on Attributed Graphs 2024-11-26
Show

We analyze the universality and generalization of graph neural networks (GNNs) on attributed graphs, i.e., with node attributes. To this end, we propose pseudometrics over the space of all attributed graphs that describe the fine-grained expressivity of GNNs. Namely, GNNs are both Lipschitz continuous with respect to our pseudometrics and can separate attributed graphs that are distant in the metric. Moreover, we prove that the space of all attributed graphs is relatively compact with respect to our metrics. Based on these properties, we prove a universal approximation theorem for GNNs and generalization bounds for GNNs on any data distribution of attributed graphs. The proposed metrics compute the similarity between the structures of attributed graphs via a hierarchical optimal transport between computation trees. Our work extends and unites previous approaches which either derived theory only for graphs with no attributes, derived compact metrics under which GNNs are continuous but without separation power, or derived metrics under which GNNs are continuous and separate points but the space of graphs is not relatively compact, which prevents universal approximation and generalization analysis.

GraphSubDetector: Time Series Subsequence Anomaly Detection via Density-Aware Adaptive Graph Neural Network 2024-11-26
Show

Time series subsequence anomaly detection is an important task in a large variety of real-world applications ranging from health monitoring to AIOps, and is challenging due to the following reasons: 1) how to effectively learn complex dynamics and dependencies in time series; 2) diverse and complicated anomalous subsequences as well as the inherent variance and noise of normal patterns; 3) how to determine the proper subsequence length for effective detection, which is a required parameter for many existing algorithms. In this paper, we present a novel approach to subsequence anomaly detection, namely GraphSubDetector. First, it adaptively learns the appropriate subsequence length with a length selection mechanism that highlights the characteristics of both normal and anomalous patterns. Second, we propose a density-aware adaptive graph neural network (DAGNN), which generates representations that are more robust against the variance of normal data for anomaly detection through message passing between subsequences. The experimental results demonstrate the effectiveness of the proposed algorithm, which achieves superior performance on multiple time series anomaly benchmark datasets compared to state-of-the-art algorithms.
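
Before any graph neural network is applied, subsequence-level methods like the one above first slice the series into windows and relate similar windows to each other. A minimal NumPy sketch of that preprocessing (window length, stride, and the cosine-similarity graph are illustrative choices, not the paper's DAGNN):

```python
import numpy as np

def subsequences(series: np.ndarray, length: int, stride: int = 1) -> np.ndarray:
    """Slice a 1-D series into overlapping windows of `length`."""
    return np.stack([series[i:i + length]
                     for i in range(0, len(series) - length + 1, stride)])

def similarity_edges(windows: np.ndarray, threshold: float = 0.9):
    """Connect windows whose z-normalized cosine similarity exceeds `threshold`."""
    z = (windows - windows.mean(1, keepdims=True)) / (windows.std(1, keepdims=True) + 1e-9)
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-9)
    sim = z @ z.T
    return [(i, j) for i in range(len(z)) for j in range(i + 1, len(z)) if sim[i, j] > threshold]

series = np.sin(np.linspace(0, 8 * np.pi, 200))
wins = subsequences(series, length=25, stride=5)
print(wins.shape, len(similarity_edges(wins)))   # (36, 25) and the number of high-similarity pairs
```
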

Depth-PC: A Visual Servo Framework Integrated with Cross-Modality Fusion for Sim2Real Transfer 2024-11-26
Show

Visual servo techniques guide robotic motion using visual information to accomplish manipulation tasks, requiring high precision and robustness against noise. Traditional methods often require prior knowledge and are susceptible to external disturbances. Learning-driven alternatives, while promising, frequently struggle with the scarcity of training data and fall short in generalization. To address these challenges, we propose a novel visual servo framework, Depth-PC, that leverages simulation training and exploits semantic and geometric information of keypoints from images, enabling zero-shot transfer to real-world servo tasks. Our framework focuses on the servo controller, which intertwines keypoint feature queries and relative depth information. The fused features from these two modalities are then processed by a Graph Neural Network to establish geometric and semantic correspondence between keypoints and update the robot state. Through simulation and real-world experiments, our approach demonstrates a superior convergence basin and accuracy compared to state-of-the-art methods, fulfilling the requirements for robotic servo tasks while enabling zero-shot application to real-world scenarios. In addition to the enhancements achieved with our proposed framework, we have also substantiated the efficacy of cross-modality feature fusion within the realm of servo tasks.

ScaleNet: Scale Invariance Learning in Directed Graphs 2024-11-26
Show

Graph Neural Networks (GNNs) have advanced relational data analysis but lack invariance learning techniques common in image classification. In node classification with GNNs, it is actually the ego-graph of the center node that is classified. This research extends the scale invariance concept to node classification by drawing an analogy to image processing: just as scale invariance is used in image classification to capture multi-scale features, we propose the concept of "scaled ego-graphs". Scaled ego-graphs generalize traditional ego-graphs by replacing undirected single-edges with "scaled-edges", which are ordered sequences of multiple directed edges. We empirically assess the performance of the proposed scale invariance in graphs on seven benchmark datasets, across both homophilic and heterophilic structures. Our scale-invariance-based graph learning outperforms inception models derived from random walks by being simpler, faster, and more accurate. The scale invariance explains inception models' success on homophilic graphs and limitations on heterophilic graphs. To ensure the applicability of inception models to heterophilic graphs as well, we further present ScaleNet, an architecture that leverages multi-scaled features. ScaleNet achieves state-of-the-art results on five out of seven datasets (four homophilic and one heterophilic) and matches top performance on the remaining two, demonstrating its excellent applicability. This represents a significant advance in graph learning, offering a unified framework that enhances node classification across various graph types. Our code is available at https://github.com/Qin87/ScaleNet/tree/July25.

Scale invariance in node classification is demonstrated and applied in graph transformation to develop ScaleNet, which achieves state-of-the-art performance on both homophilic and heterophilic directed graphs
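
One way to read the "scaled edges" above is as ordered compositions of directed edges, which at order two correspond to matrix products of the adjacency matrix and its transpose. The NumPy sketch below builds a few such multi-scale adjacencies; this is an interpretation for illustration, not the authors' exact construction.

```python
import numpy as np

def scaled_adjacencies(a: np.ndarray):
    """For a directed adjacency matrix `a`, build a few order-2 'scaled' variants:
    successor-of-successor (A@A), shared-target (A@A.T) and shared-source (A.T@A).
    Binarized so they can be used as extra ego-graph scales."""
    at = a.T
    scales = {"A": a, "AA": a @ a, "AAt": a @ at, "AtA": at @ a}
    return {name: (m > 0).astype(int) for name, m in scales.items()}

a = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]])      # tiny directed chain 0 -> 1 -> 2
for name, m in scaled_adjacencies(a).items():
    print(name, m.tolist())
```
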

X-MeshGraphNet: Scalable Multi-Scale Graph Neural Networks for Physics Simulation 2024-11-26
Show

Graph Neural Networks (GNNs) have gained significant traction for simulating complex physical systems, with models like MeshGraphNet demonstrating strong performance on unstructured simulation meshes. However, these models face several limitations, including scalability issues, requirement for meshing at inference, and challenges in handling long-range interactions. In this work, we introduce X-MeshGraphNet, a scalable, multi-scale extension of MeshGraphNet designed to address these challenges. X-MeshGraphNet overcomes the scalability bottleneck by partitioning large graphs and incorporating halo regions that enable seamless message passing across partitions. This, combined with gradient aggregation, ensures that training across partitions is equivalent to processing the entire graph at once. To remove the dependency on simulation meshes, X-MeshGraphNet constructs custom graphs directly from CAD files by generating uniform point clouds on the surface or volume of the object and connecting k-nearest neighbors. Additionally, our model builds multi-scale graphs by iteratively combining coarse and fine-resolution point clouds, where each level refines the previous, allowing for efficient long-range interactions. Our experiments demonstrate that X-MeshGraphNet maintains the predictive accuracy of full-graph GNNs while significantly improving scalability and flexibility. This approach eliminates the need for time-consuming mesh generation at inference, offering a practical solution for real-time simulation across a wide range of applications. The code for reproducing the results presented in this paper is available through NVIDIA Modulus: github.com/NVIDIA/modulus/tree/main/examples/cfd/xaeronet.
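
The mesh-free graph construction described above (sample a point cloud from the geometry, then connect k nearest neighbors) is easy to prototype with a KD-tree; the point count and k below are arbitrary placeholders, not values from the paper.

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_graph(points: np.ndarray, k: int = 4):
    """Connect each point to its k nearest neighbors (excluding itself)."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k + 1)        # first neighbor returned is the point itself
    edges = {(i, j) if i < j else (j, i)
             for i, nbrs in enumerate(idx) for j in nbrs[1:]}
    return sorted(edges)

surface_points = np.random.rand(500, 3)          # stand-in for points sampled from a CAD surface
print(len(knn_graph(surface_points)))            # number of undirected kNN edges
```
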

Contrastive Graph Condensation: Advancing Data Versatility through Self-Supervised Learning 2024-11-26
Show

With the increasing computation of training graph neural networks (GNNs) on large-scale graphs, graph condensation (GC) has emerged as a promising solution to synthesize a compact, substitute graph of the large-scale original graph for efficient GNN training. However, existing GC methods predominantly employ classification as the surrogate task for optimization, thus excessively relying on node labels and constraining their utility in label-sparsity scenarios. More critically, this surrogate task tends to overfit class-specific information within the condensed graph, consequently restricting the generalization capabilities of GC for other downstream tasks. To address these challenges, we introduce Contrastive Graph Condensation (CTGC), which adopts a self-supervised surrogate task to extract critical, causal information from the original graph and enhance the cross-task generalizability of the condensed graph. Specifically, CTGC employs a dual-branch framework to disentangle the generation of the node attributes and graph structures, where a dedicated structural branch is designed to explicitly encode geometric information through nodes' positional embeddings. By implementing an alternating optimization scheme with contrastive loss terms, CTGC promotes the mutual enhancement of both branches and facilitates high-quality graph generation through the model inversion technique. Extensive experiments demonstrate that CTGC excels in handling various downstream tasks with a limited number of labels, consistently outperforming state-of-the-art GC methods.

Limeade: Let integer molecular encoding aid 2024-11-25
Show

Mixed-integer programming (MIP) is a well-established framework for computer-aided molecular design (CAMD). By precisely encoding the molecular space and score functions, e.g., a graph neural network, the molecular design problem is represented and solved as an optimization problem, the solution of which corresponds to a molecule with an optimal score. However, both the extremely large search space and the complicated scoring process limit the use of MIP-based CAMD to specific and tiny problems. Moreover, the optimal molecule may not be meaningful in practice if the scores are imperfect. Instead of pursuing optimality, this paper exploits the ability of MIP in molecular generation and proposes Limeade as an end-to-end tool from real-world needs to feasible molecules. Beyond the basic constraints for structural feasibility, Limeade supports inclusion and exclusion of SMARTS patterns, automating the process of interpreting and formulating chemical requirements as mathematical constraints.

32 pages, 2 figures
TEG-DB: A Comprehensive Dataset and Benchmark of Textual-Edge Graphs 2024-11-25
Show

Text-Attributed Graphs (TAGs) augment graph structures with natural language descriptions, facilitating detailed depictions of data and their interconnections across various real-world settings. However, existing TAG datasets predominantly feature textual information only at the nodes, with edges typically represented by mere binary or categorical attributes. This lack of rich textual edge annotations significantly limits the exploration of contextual relationships between entities, hindering deeper insights into graph-structured data. To address this gap, we introduce Textual-Edge Graphs Datasets and Benchmark (TEG-DB), a comprehensive and diverse collection of benchmark textual-edge datasets featuring rich textual descriptions on nodes and edges. The TEG-DB datasets are large-scale and encompass a wide range of domains, from citation networks to social networks. In addition, we conduct extensive benchmark experiments on TEG-DB to assess the extent to which current techniques, including pre-trained language models, graph neural networks, and their combinations, can utilize textual node and edge information. Our goal is to elicit advancements in textual-edge graph research, specifically in developing methodologies that exploit rich textual node and edge descriptions to enhance graph analysis and provide deeper insights into complex real-world networks. The entire TEG-DB project is publicly available as an open-source repository on GitHub at https://github.com/Zhuofeng-Li/TEG-Benchmark.

Accepted by NeurIPS 2024

Graph neural networks with configuration cross-attention for tensor compilers 2024-11-25
Show

With the recent popularity of neural networks comes the need for efficient serving of inference workloads. A neural network inference workload can be represented as a computational graph with nodes as operators transforming multidimensional tensors. The tensors can be transposed and/or tiled in a combinatorially large number of ways, some configurations leading to accelerated inference. We propose TGraph, a neural graph architecture that allows screening for fast configurations of the target computational graph, thus representing an artificial intelligence (AI) tensor compiler in contrast to the traditional heuristics-based compilers. The proposed solution improves the mean Kendall's $\tau$ across layout collections of TpuGraphs from 29.8% for the reliable baseline to 67.4% for TGraph. We estimate the potential CO$_2$ emission reduction associated with our work to be equivalent to over 50% of the total household emissions in the areas hosting AI-oriented data centers.

Graph Neural Networks-based Parameter Design towards Large-Scale Superconducting Quantum Circuits for Crosstalk Mitigation 2024-11-25
Show

To demonstrate the supremacy of quantum computing, increasingly large-scale superconducting quantum computing chips are being designed and fabricated, sparking the demand for electronic design automation in pursuit of better efficiency and effectiveness. However, the complexity of simulating quantum systems poses a significant challenge to computer-aided design of quantum chips. Harnessing the scalability of graph neural networks (GNNs), we here propose a parameter designing algorithm for large-scale superconducting quantum circuits. The algorithm depends on the so-called 'three-stair scaling' mechanism, which comprises two neural-network models: an evaluator supervisedly trained on small-scale circuits for applying to medium-scale circuits, and a designer unsupervisedly trained on medium-scale circuits for applying to large-scale ones. We demonstrate our algorithm in mitigating quantum crosstalk errors, which are commonly present and closely related to the graph structures and parameter assignments of superconducting quantum circuits. Parameters for both single- and two-qubit gates are considered simultaneously. Numerical results indicate that the well-trained designer achieves notable advantages not only in efficiency but also in effectiveness, especially for large-scale circuits. For example, in superconducting quantum circuits consisting of around 870 qubits, the trained designer requires only 27 seconds to complete the frequency designing task which necessitates 90 minutes for the traditional Snake algorithm. More importantly, the crosstalk errors using our algorithm are only 51% of those produced by the Snake algorithm. Overall, this study initially demonstrates the advantages of applying graph neural networks to design parameters in quantum processors, and provides insights for systems where large-scale numerical simulations are challenging in electronic design automation.

A Data-Driven Approach to Dataflow-Aware Online Scheduling for Graph Neural Network Inference 2024-11-25
Show

Graph Neural Networks (GNNs) have shown significant promise in various domains, such as recommendation systems, bioinformatics, and network analysis. However, the irregularity of graph data poses unique challenges for efficient computation, leading to the development of specialized GNN accelerator architectures that surpass traditional CPU and GPU performance. Despite this, the structural diversity of input graphs results in varying performance across different GNN accelerators, depending on their dataflows. This variability in performance due to differing dataflows and graph properties remains largely unexplored, limiting the adaptability of GNN accelerators. To address this, we propose a data-driven framework for dataflow-aware latency prediction in GNN inference. Our approach involves training regressors to predict the latency of executing specific graphs on particular dataflows, using simulations on synthetic graphs. Experimental results indicate that our regressors can predict the optimal dataflow for a given graph with up to 91.28% accuracy and a Mean Absolute Percentage Error (MAPE) of 3.78%. Additionally, we introduce an online scheduling algorithm that uses these regressors to enhance scheduling decisions. Our experiments demonstrate that this algorithm achieves up to $3.17\times$ speedup in mean completion time and $6.26\times$ speedup in mean execution time compared to the best feasible baseline across all datasets.

Accepted for ASP-DAC 2025

CafkNet: GNN-Empowered Forward Kinematic Modeling for Cable-Driven Parallel Robots 2024-11-25
Show

Cable-driven parallel robots (CDPRs) have gained significant attention due to their promising advantages. When deploying CDPRs in practice, the kinematic modeling is a key question. Unlike serial robots, CDPRs have a simple inverse kinematics problem but a complex forward kinematics (FK) issue. So, the development of accurate and efficient FK solvers has been a prominent research focus in CDPR applications. By observing the topology within CDPRs, in this paper, we propose a graph-based representation to model CDPRs and introduce CafkNet, a fast and general FK solving method, leveraging Graph Neural Network (GNN) to learn the topological structure and yield the real FK solutions with superior generality, high accuracy, and low time cost. CafkNet is extensively tested on 3D and 2D CDPRs in different configurations, both in simulators and real scenarios. The results demonstrate its ability to learn CDPRs' internal topology and accurately solve the FK problem. Then, the zero-shot generalization from one configuration to another is validated. Also, the sim2real gap can be bridged by CafkNet using both simulation and real-world data. To the best of our knowledge, it is the first study that employs the GNN to solve the FK problem for CDPRs.

The 2024 IEEE International Conference on Robotics and Biomimetics (IEEE ROBIO 2024). Bangkok, Thailand, December 10-14 2024. Videos and codes are available at https://sites.google.com/view/cafknet/site

Graph Adapter of EEG Foundation Models for Parameter Efficient Fine Tuning 2024-11-25
Show

In diagnosing mental diseases from electroencephalography (EEG) data, neural network models such as Transformers have been employed to capture temporal dynamics. Additionally, it is crucial to learn the spatial relationships between EEG sensors, for which Graph Neural Networks (GNNs) are commonly used. However, fine-tuning large-scale, complex neural network models to capture both temporal and spatial features simultaneously increases computational costs due to the larger number of trainable parameters. Combined with the limited availability of EEG datasets for downstream tasks, this makes it challenging to fine-tune large models effectively. We propose EEG-GraphAdapter (EGA), a parameter-efficient fine-tuning (PEFT) approach to address these challenges. EGA is integrated into pre-trained temporal backbone models as a GNN-based module and fine-tuned on its own while keeping the backbone model parameters frozen. This enables the acquisition of spatial representations of EEG signals for downstream tasks, significantly reducing computational overhead and data requirements. Experimental evaluations on healthcare-related downstream tasks of Major Depressive Disorder and Abnormality Detection demonstrate that our EGA improves performance by up to 16.1% in the F1-score compared with the backbone BENDR model.

Under review
DF-GNN: Dynamic Fusion Framework for Attention Graph Neural Networks on GPUs 2024-11-25
Show

Attention Graph Neural Networks (AT-GNNs), such as GAT and Graph Transformer, have demonstrated superior performance compared to other GNNs. However, existing GNN systems struggle to efficiently train AT-GNNs on GPUs due to their intricate computation patterns. The execution of AT-GNN operations without kernel fusion results in heavy data movement and significant kernel launch overhead, while fixed thread scheduling in existing GNN kernel fusion strategies leads to sub-optimal performance, redundant computation and unbalanced workload. To address these challenges, we propose a dynamic kernel fusion framework, DF-GNN, for the AT-GNN family. DF-GNN introduces a dynamic bi-level thread scheduling strategy, enabling flexible adjustments to thread scheduling while retaining the benefits of shared memory within the fused kernel. DF-GNN tailors specific thread scheduling for operations in AT-GNNs and considers the performance bottleneck shift caused by the presence of super nodes. Additionally, DF-GNN is integrated with the PyTorch framework for high programmability. Evaluations across diverse GNN models and multiple datasets reveal that DF-GNN surpasses existing GNN kernel optimization works like cuGraph and dgNN, with speedups up to $7.0\times$ over the state-of-the-art non-fusion DGL sparse library. Moreover, it achieves an average speedup of $2.16\times$ in end-to-end training compared to the popular GNN computing framework DGL.

Federated Hypergraph Learning: Hyperedge Completion with Local Differential Privacy 2024-11-25
Show

As the volume and complexity increase, graph-structured data commonly need to be split and stored across distributed systems. To enable data mining on subgraphs within these distributed systems, federated graph learning has been proposed, allowing collaborative training of Graph Neural Networks (GNNs) across clients without sharing raw node features. However, when dealing with graph structures that involve high-order relationships between nodes, known as hypergraphs, existing federated graph learning methods are less effective. In this study, we introduce FedHGL, an innovative federated hypergraph learning algorithm. FedHGL is designed to collaboratively train a comprehensive hypergraph neural network across multiple clients, facilitating mining tasks on subgraphs of a hypergraph where relationships are not merely pairwise. To address the high-order information loss between subgraphs caused by distributed storage, we introduce a pre-propagation hyperedge completion operation before the federated training process. In this pre-propagation step, cross-client feature aggregation is performed and distributed at the central server to ensure that this information can be utilized by the clients. Furthermore, by incorporating local differential privacy (LDP) mechanisms, we ensure that the original node features are not disclosed during this aggregation process. Experimental results on seven real-world datasets confirm the effectiveness of our approach and demonstrate its performance advantages over traditional federated graph learning methods.

Towards a General Recipe for Combinatorial Optimization with Multi-Filter GNNs 2024-11-24
Show

Graph neural networks (GNNs) have achieved great success for a variety of tasks such as node classification, graph classification, and link prediction. However, the use of GNNs (and machine learning more generally) to solve combinatorial optimization (CO) problems is much less explored. Here, we introduce GCON, a novel GNN architecture that leverages a complex filter bank and localized attention mechanisms to solve CO problems on graphs. We show how our method differentiates itself from prior GNN-based CO solvers and how it can be effectively applied to the maximum cut, minimum dominating set, and maximum clique problems in an unsupervised learning setting. GCON is competitive across all tasks, consistently outperforms other specialized GNN-based approaches, and is on par with the powerful Gurobi solver on the max-cut problem. We provide an open-source implementation of our work at https://github.com/WenkelF/copt.

In Proceedings of the Third Learning on Graphs Conference (LoG 2024, Oral); 20 pages, 2 figures

Bias-Free Sentiment Analysis through Semantic Blinding and Graph Neural Networks 2024-11-24
Show

This paper introduces the Semantic Propagation Graph Neural Network (SProp GNN), a machine learning sentiment analysis (SA) architecture that relies exclusively on syntactic structures and word-level emotional cues to predict emotions in text. By semantically blinding the model to information about specific words, it is robust to biases such as political or gender bias that have been plaguing previous machine learning-based SA systems. The SProp GNN shows performance superior to lexicon-based alternatives such as VADER and EmoAtlas on two different prediction tasks, and across two languages. Additionally, it approaches the accuracy of transformer-based models while significantly reducing bias in emotion prediction tasks. By offering improved explainability and reducing bias, the SProp GNN bridges the methodological gap between interpretable lexicon approaches and powerful, yet often opaque, deep learning models, offering a robust tool for fair and effective emotion analysis in understanding human behavior through text.

TASER: Temporal Adaptive Sampling for Fast and Accurate Dynamic Graph Representation Learning 2024-11-23
Show

Recently, Temporal Graph Neural Networks (TGNNs) have demonstrated state-of-the-art performance in various high-impact applications, including fraud detection and content recommendation. Despite the success of TGNNs, they are prone to the prevalent noise found in real-world dynamic graphs like time-deprecated links and skewed interaction distribution. The noise causes two critical issues that significantly compromise the accuracy of TGNNs: (1) models are supervised by inferior interactions, and (2) noisy input induces high variance in the aggregated messages. However, current TGNN denoising techniques do not consider the diverse and dynamic noise pattern of each node. In addition, they also suffer from the excessive mini-batch generation overheads caused by traversing more neighbors. We believe the remedy for fast and accurate TGNNs lies in temporal adaptive sampling. In this work, we propose TASER, the first adaptive sampling method for TGNNs optimized for accuracy, efficiency, and scalability. TASER adapts its mini-batch selection based on training dynamics and temporal neighbor selection based on the contextual, structural, and temporal properties of past interactions. To alleviate the bottleneck in mini-batch generation, TASER implements a pure GPU-based temporal neighbor finder and a dedicated GPU feature cache. We evaluate the performance of TASER using two state-of-the-art backbone TGNNs. On five popular datasets, TASER outperforms the corresponding baselines by an average of 2.3% in Mean Reciprocal Rank (MRR) while achieving an average of 5.1x speedup in training time.

IPDPS 2024
Adaptive Least Mean pth Power Graph Neural Networks 2024-11-23
Show

In the presence of impulsive noise and missing observations, accurate online prediction of time-varying graph signals poses a crucial challenge in numerous application domains. We propose the Adaptive Least Mean $p^{th}$ Power Graph Neural Networks (LMP-GNN), a universal framework combining an adaptive filter and a graph neural network for online graph signal estimation. LMP-GNN retains the advantage of adaptive filtering in handling noise and missing observations as well as the online update capability. The graph neural network incorporated within LMP-GNN can train and update filter parameters online, instead of relying on predefined filter parameters as in previous methods, outputting more accurate prediction results. The adaptive update scheme of LMP-GNN follows the solution of an $l_p$-norm optimization, rooted in the minimum dispersion criterion, and yields robust estimation results for time-varying graph signals under impulsive noise. A special case of LMP-GNN, named Sign-GNN, is also provided and analyzed. Experimental results on two real-world datasets (a temperature graph and a traffic graph) under four different noise distributions demonstrate the effectiveness and robustness of the proposed LMP-GNN.
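
For a plain linear filter, the least mean p-th power update that underlies the framework is w ← w + μ p |e|^{p-1} sign(e) x, which down-weights large errors relative to LMS when p < 2 and is therefore robust to impulsive noise. A NumPy sketch of that scalar update (the graph/GNN parts of the paper are omitted; all values are illustrative):

```python
import numpy as np

def lmp_step(w: np.ndarray, x: np.ndarray, d: float, mu: float = 0.01, p: float = 1.5):
    """One adaptive least-mean-pth-power update.
    w: filter weights, x: input vector, d: desired (observed) sample."""
    e = d - w @ x                                         # prediction error
    return w + mu * p * np.abs(e) ** (p - 1) * np.sign(e) * x

rng = np.random.default_rng(0)
w_true, w = np.array([0.5, -0.3]), np.zeros(2)
for _ in range(2000):
    x = rng.normal(size=2)
    d = w_true @ x + rng.standard_t(df=1.5) * 0.01        # heavy-tailed (impulsive) noise
    w = lmp_step(w, x, d)
print(w)                                                   # approaches w_true = [0.5, -0.3]
```
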

A GAN Approach for Node Embedding in Heterogeneous Graphs Using Subgraph Sampling 2024-11-23
Show

Graph neural networks (GNNs) face significant challenges with class imbalance, leading to biased inference results. To address this issue in heterogeneous graphs, we propose a novel framework that combines Graph Neural Network (GNN) and Generative Adversarial Network (GAN) to enhance classification for underrepresented node classes. The framework incorporates an advanced edge generation and selection module, enabling the simultaneous creation of synthetic nodes and edges through adversarial learning. Unlike previous methods, which predominantly focus on homogeneous graphs due to the difficulty of representing heterogeneous graph structures in matrix form, this approach is specifically designed for heterogeneous data. Existing solutions often rely on pre-trained models to incorporate synthetic nodes, which can lead to optimization inconsistencies and mismatches in data representation. Our framework avoids these pitfalls by generating data that aligns closely with the inherent graph topology and attributes, ensuring a more cohesive integration. Evaluations on multiple real-world datasets demonstrate the method's superiority over baseline models, particularly in tasks focused on identifying minority node classes, with notable improvements in performance metrics such as F-score and AUC-PRC score. These findings highlight the potential of this approach for addressing critical challenges in the field.

TANGNN: a Concise, Scalable and Effective Graph Neural Networks with Top-m Attention Mechanism for Graph Representation Learning 2024-11-23
Show

In the field of deep learning, Graph Neural Networks (GNNs) and Graph Transformer models, with their outstanding performance and flexible architectural designs, have become leading technologies for processing structured data, especially graph data. Traditional GNNs often face challenges in capturing information from distant vertices effectively. In contrast, Graph Transformer models are particularly adept at managing long-distance node relationships. Despite these advantages, Graph Transformer models still encounter issues with computational and storage efficiency when scaled to large graph datasets. To address these challenges, we propose an innovative Graph Neural Network (GNN) architecture that integrates a Top-m attention mechanism aggregation component and a neighborhood aggregation component, effectively enhancing the model's ability to aggregate relevant information from both local and extended neighborhoods at each layer. This method not only improves computational efficiency but also enriches the node features, facilitating a deeper analysis of complex graph structures. Additionally, to assess the effectiveness of our proposed model, we have applied it to citation sentiment prediction, a novel task previously unexplored in the GNN field. Accordingly, we constructed a dedicated citation network, ArXivNet. In this dataset, we specifically annotated the sentiment polarity of the citations (positive, neutral, negative) to enable in-depth sentiment analysis. Our approach has shown superior performance across a variety of tasks including vertex classification, link prediction, sentiment prediction, graph regression, and visualization. It outperforms existing methods in terms of effectiveness, as demonstrated by experimental results on multiple datasets.

The code and ArXivNet dataset are available at https://github.com/ejwww/TANGNN

Enriching GNNs with Text Contextual Representations for Detecting Disinformation Campaigns on Social Media 2024-11-23
Show

Disinformation on social media poses both societal and technical challenges, requiring robust detection systems. While previous studies have integrated textual information into propagation networks, they have yet to fully leverage the advancements in Transformer-based language models for high-quality contextual text representations. This work addresses this gap by incorporating Transformer-based textual features into Graph Neural Networks (GNNs) for fake news detection. We demonstrate that contextual text representations enhance GNN performance, achieving a 33.8% relative improvement in Macro F1 over models without textual features and 9.3% over static text representations. We further investigate the impact of different feature sources and the effects of noisy data augmentation. We expect our methodology to open avenues for further research, and we have made our code publicly available.

Work still in progress. Accepted as Extended Abstract Poster at LoG Conference 2024

GeoScatt-GNN: A Geometric Scattering Transform-Based Graph Neural Network Model for Ames Mutagenicity Prediction 2024-11-22
Show

This paper tackles the pressing challenge of mutagenicity prediction by introducing three ground-breaking approaches. First, it showcases the superior performance of 2D scattering coefficients extracted from molecular images, compared to traditional molecular descriptors. Second, it presents a hybrid approach that combines geometric graph scattering (GGS), Graph Isomorphism Networks (GIN), and machine learning models, achieving strong results in mutagenicity prediction. Third, it introduces a novel graph neural network architecture, MOLG3-SAGE, which integrates GGS node features into a fully connected graph structure, delivering outstanding predictive accuracy. Experimental results on the ZINC dataset demonstrate significant improvements, emphasizing the effectiveness of blending 2D and geometric scattering techniques with graph neural networks. This study illustrates the potential of GNNs and GGS for mutagenicity prediction, with broad implications for drug discovery and chemical safety assessment.

Lie-Equivariant Quantum Graph Neural Networks 2024-11-22
Show

Discovering new phenomena at the Large Hadron Collider (LHC) involves the identification of rare signals over conventional backgrounds. Thus binary classification tasks are ubiquitous in analyses of the vast amounts of LHC data. We develop a Lie-Equivariant Quantum Graph Neural Network (Lie-EQGNN), a quantum model that is not only data efficient, but also has symmetry-preserving properties. Since Lorentz group equivariance has been shown to be beneficial for jet tagging, we build a Lorentz-equivariant quantum GNN for quark-gluon jet discrimination and show that its performance is on par with its classical state-of-the-art counterpart LorentzNet, making it a viable alternative to the conventional computing paradigm.

10 pages, 5 figures, accepted to the Machine Learning with New Compute Paradigms (MLNCP) Workshop at NeurIPS 2024

Generalizable data-driven turbulence closure modeling on unstructured grids with differentiable physics 2024-11-22
Show

Differentiable physical simulators are proving to be valuable tools for developing data-driven models in computational fluid dynamics (CFD). These simulators enable end-to-end training of machine learning (ML) models embedded within CFD solvers. This paradigm enables novel algorithms which combine the generalization power and low cost of physics-based simulations with the flexibility and automation of deep learning methods. In this study, we introduce a framework for embedding deep learning models within a generic finite element solver to solve the Navier-Stokes equations, specifically applying this approach to learn a subgrid scale closure with a graph neural network (GNN). We validate our method for flow over a backwards-facing step and test its performance on novel geometries, demonstrating the ability to generalize to novel geometries without sacrificing stability. Additionally, we show that our GNN-based closure model may be learned in a data-limited scenario by interpreting closure modeling as a solver-constrained optimization. Our end-to-end learning paradigm demonstrates a viable pathway for physically consistent and generalizable data-driven closure modeling across complex geometries.

Financial Fraud Detection using Jump-Attentive Graph Neural Networks 2024-11-22
Show

As the availability of financial services online continues to grow, the incidence of fraud has surged correspondingly. Fraudsters continually seek new and innovative ways to circumvent the detection algorithms in place. Traditionally, fraud detection relied on rule-based methods, where rules were manually created based on transaction data features. However, these techniques soon became ineffective due to their reliance on manual rule creation and their inability to detect complex data patterns. Today, a significant portion of the financial services sector employs various machine learning algorithms, such as XGBoost, Random Forest, and neural networks, to model transaction data. While these techniques have proven more efficient than rule-based methods, they still fail to capture interactions between different transactions and their interrelationships. Recently, graph-based techniques have been adopted for financial fraud detection, leveraging graph topology to aggregate neighborhood information of transaction data using Graph Neural Networks (GNNs). Despite showing improvements over previous methods, these techniques still struggle to keep pace with the evolving camouflaging tactics of fraudsters and suffer from information loss due to over-smoothing. In this paper, we propose a novel algorithm that employs an efficient neighborhood sampling method, effective for camouflage detection and preserving crucial feature information from non-similar nodes. Additionally, we introduce a novel GNN architecture that utilizes attention mechanisms and preserves holistic neighborhood information to prevent information loss. We test our algorithm on financial data to show that our method outperforms other state-of-the-art graph algorithms.

International Conference on Machine Learning and Applications 2024

What Do GNNs Actually Learn? Towards Understanding their Representations 2024-11-22
Show

In recent years, graph neural networks (GNNs) have achieved great success in the field of graph representation learning. Although prior work has shed light on the expressiveness of those models (i.e., whether they can distinguish pairs of non-isomorphic graphs), it is still not clear what structural information is encoded into the node representations that are learned by those models. In this paper, we address this gap by studying the node representations learned by four standard GNN models. We find that some models produce identical representations for all nodes, while the representations learned by other models are linked to some notion of walks of specific length that start from the nodes. We establish Lipschitz bounds for these models with respect to the number of (normalized) walks. Additionally, we investigate the influence of node features on the learned representations. We find that if the initial representations of all nodes point in the same direction, the representations learned at the $k$-th layer of the models are also related to the initial features of nodes that can be reached in exactly $k$ steps. We also apply our findings to understand the phenomenon of oversquashing that occurs in GNNs. Our theoretical analysis is validated through experiments on synthetic and real-world datasets.
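
The walk counts that the analysis above ties representations to are directly computable: entry (i, j) of A^k counts length-k walks from i to j, and row sums count walks starting at each node. A quick NumPy check on a 4-cycle, where every node starts the same number of walks and is therefore indistinguishable to purely walk-based representations:

```python
import numpy as np

def walks_of_length(adj: np.ndarray, k: int) -> np.ndarray:
    """Row i of adj^k sums to the number of length-k walks starting at node i."""
    return np.linalg.matrix_power(adj, k).sum(axis=1)

cycle = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]])       # 4-cycle: 0-1-2-3-0
print(walks_of_length(cycle, 1), walks_of_length(cycle, 3))  # [2 2 2 2] [8 8 8 8]
```
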

Machine Learning for Practical Quantum Error Mitigation 2024-11-22
Show

Quantum computers progress toward outperforming classical supercomputers, but quantum errors remain their primary obstacle. The key to overcoming errors on near-term devices has emerged through the field of quantum error mitigation, enabling improved accuracy at the cost of additional run time. Here, through experiments on state-of-the-art quantum computers using up to 100 qubits, we demonstrate that, without sacrificing accuracy, machine learning for quantum error mitigation (ML-QEM) drastically reduces the cost of mitigation. We benchmark ML-QEM using a variety of machine learning models -- linear regression, random forests, multi-layer perceptrons, and graph neural networks -- on diverse classes of quantum circuits, over increasingly complex device-noise profiles, under interpolation and extrapolation, and in both numerics and experiments. These tests employ the popular digital zero-noise extrapolation method as an added reference. Finally, we propose a path toward scalable mitigation by using ML-QEM to mimic traditional mitigation methods with superior runtime efficiency. Our results show that classical machine learning can extend the reach and practicality of quantum error mitigation by reducing its overheads and highlight its broader potential for practical quantum computations.

11 pages, 7 figures (main text) + 9 pages, 4 figures (supplementary information)

Can GNNs Learn Link Heuristics? A Concise Review and Evaluation of Link Prediction Methods 2024-11-22
Show

This paper explores the ability of Graph Neural Networks (GNNs) to learn various forms of information for link prediction, alongside a brief review of existing link prediction methods. Our analysis reveals that GNNs cannot effectively learn structural information related to the number of common neighbors between two nodes, primarily due to the nature of set-based pooling of the neighborhood aggregation scheme. Also, our extensive experiments indicate that trainable node embeddings can improve the performance of GNN-based link prediction models. Importantly, we observe that the denser the graph, the greater the improvement. We attribute this to the characteristics of node embeddings, where the link state of each link sample could be encoded into the embeddings of nodes that are involved in the neighborhood aggregation of the two nodes in that link sample. In denser graphs, every node has more opportunities to participate in the neighborhood aggregation of other nodes and to encode the states of more link samples into its embedding, thus learning better node embeddings for link prediction. Lastly, we demonstrate that the insights gained from our research carry important implications in identifying the limitations of existing link prediction methods, which could guide the future development of more robust algorithms.
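
For reference, the structural quantity the paper argues set-based neighborhood pooling struggles to recover — the number of common neighbors of a candidate pair — is itself a classical link prediction heuristic; a minimal NetworkX version (illustrative, not from the paper):

```python
import networkx as nx

def common_neighbor_score(g: nx.Graph, u, v) -> int:
    """Classic CN heuristic: more shared neighbors suggests a more likely link."""
    return len(set(g.neighbors(u)) & set(g.neighbors(v)))

g = nx.karate_club_graph()
for u, v in [(0, 5), (0, 9), (15, 20)]:          # arbitrary candidate pairs
    print((u, v), common_neighbor_score(g, u, v))
```
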

Enhancing Link Prediction with Fuzzy Graph Attention Networks and Dynamic Negative Sampling 2024-11-22
Show

Link prediction is crucial for understanding complex networks, but traditional Graph Neural Networks (GNNs) often rely on random negative sampling, leading to suboptimal performance. This paper introduces Fuzzy Graph Attention Networks (FGAT), a novel approach integrating fuzzy rough sets for dynamic negative sampling and enhanced node feature aggregation. Fuzzy Negative Sampling (FNS) systematically selects high-quality negative edges based on fuzzy similarities, improving training efficiency. The FGAT layer incorporates fuzzy rough set principles, enabling robust and discriminative node representations. Experiments on two research collaboration networks demonstrate FGAT's superior link prediction accuracy, outperforming state-of-the-art baselines by leveraging the power of fuzzy rough sets for effective negative sampling and node feature learning.

5 pages
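
Similarity-driven negative sampling of the kind described above (of which the paper's fuzzy-rough similarity is one instance) amounts to ranking non-edges by how similar their endpoints look and training on the most similar ones as hard negatives. A toy NumPy stand-in using cosine similarity, not the paper's fuzzy measure:

```python
import numpy as np

def hard_negative_edges(emb: np.ndarray, edges: set, num_samples: int):
    """Rank non-edges by cosine similarity of node embeddings and return
    the most similar ones as 'hard' negative training pairs."""
    z = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-9)
    sim = z @ z.T
    n = len(emb)
    non_edges = [(i, j) for i in range(n) for j in range(i + 1, n)
                 if (i, j) not in edges and (j, i) not in edges]
    non_edges.sort(key=lambda e: sim[e], reverse=True)
    return non_edges[:num_samples]

emb = np.random.rand(30, 16)                  # placeholder node embeddings
edges = {(0, 1), (1, 2), (2, 3)}              # placeholder positive edges
print(hard_negative_edges(emb, edges, num_samples=5))
```
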
Swift: A Multi-FPGA Framework for Scaling Up Accelerated Graph Analytics 2024-11-21
Show

Graph analytics are vital in fields such as social networks, biomedical research, and graph neural networks (GNNs). However, traditional CPUs and GPUs struggle with the memory bottlenecks caused by large graph datasets and their fine-grained memory accesses. While specialized graph accelerators address these challenges, they often support only moderate-sized graphs (under 500 million edges). Our paper proposes Swift, a novel scale-up graph accelerator framework that processes large graphs by leveraging the flexibility of FPGA custom datapath and memory resources, and optimizes utilization of high-bandwidth 3D memory (HBM). Swift supports up to 8 FPGAs in a node. Swift introduces a decoupled, asynchronous model based on the Gather-Apply-Scatter (GAS) scheme. It partitions the graph into subgraphs across FPGAs, and divides each subgraph into intervals based on source vertex IDs. Processing on these intervals is decoupled and executed asynchronously, instead of bulk-synchronous operation, where throughput is limited by the slowest task. This enables simultaneous processing within each multi-FPGA node and optimizes the utilization of communication (PCIe), off-chip (HBM), and on-chip BRAM/URAM resources. Swift demonstrates significant performance improvements compared to prior scalable FPGA-based frameworks, performing 12.8 times better than ForeGraph. Performance against Gunrock on NVIDIA A40 GPUs is mixed, because NVLink gives the GPU system a nearly 5X bandwidth advantage, but the FPGA system nevertheless achieves 2.6x greater energy efficiency.

Accepted in International Conference on Field Programmable Technology (FPT-2024)

LLMs as Zero-shot Graph Learners: Alignment of GNN Representations with LLM Token Embeddings 2024-11-21
Show

Zero-shot graph machine learning, especially with graph neural networks (GNNs), has garnered significant interest due to the challenge of scarce labeled data. While methods like self-supervised learning and graph prompt learning have been extensively explored, they often rely on fine-tuning with task-specific labels, limiting their effectiveness in zero-shot scenarios. Inspired by the zero-shot capabilities of instruction-fine-tuned large language models (LLMs), we introduce a novel framework named Token Embedding-Aligned Graph Language Model (TEA-GLM) that leverages LLMs as cross-dataset and cross-task zero-shot learners for graph machine learning. Concretely, we pretrain a GNN, aligning its representations with token embeddings of an LLM. We then train a linear projector that transforms the GNN's representations into a fixed number of graph token embeddings without tuning the LLM. A unified instruction is designed for various graph tasks at different levels, such as node classification (node-level) and link prediction (edge-level). These design choices collectively enhance our method's effectiveness in zero-shot learning, setting it apart from existing methods. Experiments show that our graph token embeddings help the LLM predictor achieve state-of-the-art performance on unseen datasets and tasks compared to other methods using LLMs as predictors.
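
The alignment trick described above keeps both the GNN and the LLM frozen and trains only a linear projector that turns a pooled graph representation into a fixed number of token-embedding-sized "graph tokens". A PyTorch sketch with placeholder dimensions and names (not the authors' code):

```python
import torch
import torch.nn as nn

class GraphTokenProjector(nn.Module):
    """Map one pooled GNN graph representation to `num_tokens` pseudo-token
    embeddings that can be prepended to an LLM's input embeddings."""
    def __init__(self, gnn_dim: int, llm_dim: int, num_tokens: int = 8):
        super().__init__()
        self.num_tokens = num_tokens
        self.proj = nn.Linear(gnn_dim, num_tokens * llm_dim)   # the only trainable part

    def forward(self, graph_repr: torch.Tensor) -> torch.Tensor:
        # graph_repr: (batch, gnn_dim) -> (batch, num_tokens, llm_dim)
        out = self.proj(graph_repr)
        return out.view(graph_repr.shape[0], self.num_tokens, -1)

graph_repr = torch.randn(2, 256)            # pooled output of a frozen, pretrained GNN
tokens = GraphTokenProjector(256, 4096)(graph_repr)
print(tokens.shape)                          # torch.Size([2, 8, 4096]) -- ready to prepend to LLM inputs
```
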

Embedding-based Multimodal Learning on Pan-Squamous Cell Carcinomas for Improved Survival Outcomes 2024-11-21
Show

Cancer clinics capture disease data at various scales, from genetic to organ level. Current bioinformatic methods struggle to handle the heterogeneous nature of this data, especially with missing modalities. We propose PARADIGM, a Graph Neural Network (GNN) framework that learns from multimodal, heterogeneous datasets to improve clinical outcome prediction. PARADIGM generates embeddings from multi-resolution data using foundation models, aggregates them into patient-level representations, fuses them into a unified graph, and enhances performance for tasks like survival analysis. We train GNNs on pan-Squamous Cell Carcinomas and validate our approach on Moffitt Cancer Center lung SCC data. Multimodal GNN outperforms other models in patient survival prediction. Converging individual data modalities across varying scales provides a more insightful disease view. Our solution aims to understand the patient's circumstances comprehensively, offering insights on heterogeneous data integration and the benefits of converging maximum data views.

Graph Neural Networks and Arithmetic Circuits 2024-11-21
Show

We characterize the computational power of neural networks that follow the graph neural network (GNN) architecture, not restricted to aggregate-combine GNNs or other particular types. We establish an exact correspondence between the expressivity of GNNs using diverse activation functions and arithmetic circuits over real numbers. In our results the activation function of the network becomes a gate type in the circuit. Our result holds for families of constant depth circuits and networks, both uniformly and non-uniformly, for all common activation functions.

Learning Pore-scale Multi-phase Flow from Experimental Data with Graph Neural Network 2024-11-21
Show

Understanding the process of multiphase fluid flow through porous media is crucial for many climate change mitigation technologies, including CO$_2$ geological storage, hydrogen storage, and fuel cells. However, current numerical models are often incapable of accurately capturing the complex pore-scale physics observed in experiments. In this study, we address this challenge using a graph neural network-based approach and directly learn pore-scale fluid flow using micro-CT experimental data. We propose a Long-Short-Edge MeshGraphNet (LSE-MGN) that predicts the state of each node in the pore space at each time step. During inference, given an initial state, the model can autoregressively predict the evolution of the multiphase flow process over time. This approach successfully captures the physics from the high-resolution experimental data while maintaining computational efficiency, providing a promising direction for accurate and efficient pore-scale modeling of complex multiphase fluid flow dynamics.

Accepted for Machine Learning and the Physical Sciences Workshop at the 38th Conference on Neural Information Processing Systems (NeurIPS 2024)

GNN-MultiFix: Addressing the pitfalls for GNNs for multi-label node classification 2024-11-21
Show

Graph neural networks (GNNs) have emerged as powerful models for learning representations of graph data, showing state-of-the-art results in various tasks. Nevertheless, the superiority of these methods is usually supported either by evaluating their performance on a small subset of benchmark datasets or by reasoning about their expressive power in terms of certain graph isomorphism tests. In this paper, we critically analyse both these aspects through a transductive setting for the task of node classification. First, we delve deeper into the case of multi-label node classification, which offers a more realistic scenario and has been ignored in most related works. Through analysing the training dynamics of GNN methods, we highlight the failure of GNNs to learn over multi-label graph datasets even when training data is abundant. Second, we show that, specifically for transductive node classification, even the most expressive GNN may fail to learn in the absence of node attributes and without using explicit label information as input. To overcome this deficit, we propose a straightforward approach, referred to as GNN-MultiFix, that integrates the feature, label, and positional information of a node. GNN-MultiFix demonstrates significant improvement across all the multi-label datasets. We release our code at https://anonymous.4open.science/r/Graph-MultiFix-4121.
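
As a rough, hypothetical illustration of combining feature, label, and positional information into one node input (the actual GNN-MultiFix architecture may differ), consider:

```python
import torch

def build_multifix_input(x, labels, train_mask, pos_enc, num_classes):
    """Concatenate node features, label information (training nodes only),
    and a positional encoding; non-training nodes get all-zero label vectors."""
    label_feat = torch.zeros(x.size(0), num_classes)
    label_feat[train_mask] = labels[train_mask].float()  # multi-hot labels
    return torch.cat([x, label_feat, pos_enc], dim=-1)

# Toy example: 100 nodes, 16 features, 5 labels, 8-d positional encoding.
x = torch.randn(100, 16)
y = (torch.rand(100, 5) > 0.7).float()
train_mask = torch.zeros(100, dtype=torch.bool)
train_mask[:60] = True
pos_enc = torch.randn(100, 8)
node_input = build_multifix_input(x, y, train_mask, pos_enc, num_classes=5)  # (100, 29)
```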

Teaching MLPs to Master Heterogeneous Graph-Structured Knowledge for Efficient and Accurate Inference 2024-11-21
Show

Heterogeneous Graph Neural Networks (HGNNs) have achieved promising results in various heterogeneous graph learning tasks, owing to their superiority in capturing the intricate relationships and diverse relational semantics inherent in heterogeneous graph structures. However, the neighborhood-fetching latency incurred by structure dependency in HGNNs makes it challenging to deploy for latency-constrained applications that require fast inference. Inspired by recent GNN-to-MLP knowledge distillation frameworks, we introduce HG2M and HG2M+ to combine both HGNN's superior performance and MLP's efficient inference. HG2M directly trains student MLPs with node features as input and soft labels from teacher HGNNs as targets, and HG2M+ further distills reliable and heterogeneous semantic knowledge into student MLPs through reliable node distillation and reliable meta-path distillation. Experiments conducted on six heterogeneous graph datasets show that despite lacking structural dependencies, HG2Ms can still achieve competitive or even better performance than HGNNs and significantly outperform vanilla MLPs. Moreover, HG2Ms demonstrate a 379.24$\times$ speedup in inference over HGNNs on the large-scale IGB-3M-19 dataset, showcasing their ability for latency-sensitive deployments.
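
A plain GNN-to-MLP soft-label distillation objective of the kind HG2M builds on can be sketched like this; the reliability-based node and meta-path distillation of HG2M+ is not modeled here:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, train_mask,
                      temperature=2.0, alpha=0.5):
    """KL divergence between temperature-softened student and teacher
    distributions on all nodes, plus cross-entropy on labeled training nodes."""
    t = temperature
    soft = F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)
    hard = F.cross_entropy(student_logits[train_mask], labels[train_mask])
    return alpha * soft + (1.0 - alpha) * hard
```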

Predicting Wall Thickness Changes in Cold Forging Processes: An Integrated FEM and Neural Network approach 2024-11-21
Show

This study presents a novel approach for predicting wall thickness changes in tubes during the nosing process. Specifically, we first provide a thorough analysis of nosing processes and the influencing parameters. We further set up a Finite Element Method (FEM) simulation to better analyse the effects of varying process parameters. However, traditional FEM simulations, while accurate, are time-consuming and computationally intensive, rendering them inapplicable for real-time applications. We therefore present a novel modeling framework based on specifically designed graph neural networks as surrogate models. To this end, we extend the neural network architecture by directly incorporating information about the nosing process, adding different types of edges and their corresponding encoders to model object interactions. This augmentation enhances model accuracy and opens the possibility of employing precise surrogate models within closed-loop production processes. The proposed approach is evaluated using a new evaluation metric termed area between thickness curves (ABTC). The results demonstrate promising performance and highlight the potential of neural networks as surrogate models in predicting wall thickness changes during nosing forging processes.

Topology-Aware Popularity Debiasing via Simplicial Complexes 2024-11-21
Show

Recommender systems (RS) play a critical role in delivering personalized content across various online platforms, leveraging collaborative filtering (CF) as a key technique to generate recommendations based on users' historical interaction data. Recent advancements in CF have been driven by the adoption of Graph Neural Networks (GNNs), which model user-item interactions as bipartite graphs, enabling the capture of high-order collaborative signals. Despite their success, GNN-based methods face significant challenges due to the inherent popularity bias in the user-item interaction graph's topology, leading to skewed recommendations that favor popular items over less-known ones. To address this challenge, we propose a novel topology-aware popularity debiasing framework, Test-time Simplicial Propagation (TSP), which incorporates simplicial complexes (SCs) to enhance the expressiveness of GNNs. Unlike traditional methods that focus on pairwise relationships, our approach captures multi-order relationships through SCs, providing a more comprehensive representation of user-item interactions. By enriching the neighborhoods of tail items and leveraging SCs for feature smoothing, TSP enables the propagation of multi-order collaborative signals and effectively mitigates biased propagation. Our TSP module is designed as a plug-and-play solution, allowing for seamless integration into pre-trained GNN-based models without the need for fine-tuning additional parameters. Extensive experiments on five real-world datasets demonstrate the superior performance of our method, particularly in long-tail recommendation tasks. Visualization results further confirm that TSP produces more uniform distributions of item representations, leading to fairer and more accurate recommendations.

Graph Knowledge Distillation to Mixture of Experts 2024-11-21
Show

In terms of accuracy, Graph Neural Networks (GNNs) are the best architectural choice for the node classification task. Their drawback in real-world deployment is the latency that emerges from the neighbourhood processing operation. One solution to the latency issue is to perform knowledge distillation from a trained GNN to a Multi-Layer Perceptron (MLP), where the MLP processes only the features of the node being classified (and possibly some pre-computed structural information). However, the performance of such MLPs in both transductive and inductive settings remains inconsistent for existing knowledge distillation techniques. We propose to address the performance concerns by using a specially-designed student model instead of an MLP. Our model, named Routing-by-Memory (RbM), is a form of Mixture-of-Experts (MoE), with a design that enforces expert specialization. By encouraging each expert to specialize on a certain region of the hidden representation space, we demonstrate experimentally that it is possible to derive considerably more consistent performance across multiple datasets. Code available at https://github.com/Rufaim/routing-by-memory.

Heterophilic Graph Neural Networks Optimization with Causal Message-passing 2024-11-21
Show

In this work, we discover that causal inference provides a promising approach to capture heterophilic message-passing in Graph Neural Networks (GNNs). By leveraging cause-effect analysis, we can discern heterophilic edges based on asymmetric node dependency. The learned causal structure offers more accurate relationships among nodes. To reduce the computational complexity, we introduce intervention-based causal inference in graph learning. We first simplify causal analysis on graphs by formulating it as a structural learning model and define the optimization problem within the Bayesian scheme. We then present an analysis of decomposing the optimization target into a consistency penalty and a structure modification based on cause-effect relations. We then estimate this target by conditional entropy and present insights into how conditional entropy quantifies the heterophily. Accordingly, we propose CausalMP, a causal message-passing discovery network for heterophilic graph learning, which iteratively learns the explicit causal structure of input graphs. We conduct extensive experiments in both heterophilic and homophilic graph settings. The results demonstrate that our model achieves superior link prediction performance. Training on the causal structure can also enhance node representations in classification tasks across different base models.

Scalable Multitask Learning Using Gradient-based Estimation of Task Affinity 2024-11-20
Show

Multitask learning is a widely used paradigm for training models on diverse tasks, with applications ranging from graph neural networks to language model fine-tuning. Since tasks may interfere with each other, a key notion for modeling their relationships is task affinity. This includes pairwise task affinity, computed among pairs of tasks, and higher-order affinity, computed among subsets of tasks. Naively computing either of them requires repeatedly training on data from various task combinations, which is computationally intensive. We present a new algorithm, Grad-TAG, that can estimate task affinities without this repeated training. The key idea of Grad-TAG is to train a "base" model for all tasks and then use a linearization technique to estimate the loss of the model for a specific task combination. The linearization works by computing a gradient-based approximation of the loss, using low-dimensional projections of gradients as features in a logistic regression to predict labels for the task combination. We show that the linearized model can provably approximate the loss when the gradient-based approximation is accurate, and we also verify this empirically on several large models. Then, given the estimated task affinity, we design a semi-definite program for clustering similar tasks by maximizing the average density of clusters. We evaluate Grad-TAG's performance across seven datasets, including multi-label classification on graphs, and instruction fine-tuning of language models. Our task affinity estimates are within 2.7% distance of the true affinities while needing only 3% of the FLOPs of full training. On our largest graph with 21M edges and 500 labeling tasks, our algorithm delivers estimates within 5% distance of the true affinities, using only 112 GPU hours. Our results show that Grad-TAG achieves excellent performance and runtime tradeoffs compared to existing approaches.
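
The linearization idea, low-dimensional projections of gradients used as logistic-regression features, can be illustrated with a toy sketch; the gradients here are random stand-ins, and the way Grad-TAG actually computes and combines per-task gradients is simplified away:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in per-example gradients of the base model and binary labels
# for one hypothetical task combination.
num_examples, grad_dim, proj_dim = 500, 10_000, 64
gradients = rng.normal(size=(num_examples, grad_dim))
labels = rng.integers(0, 2, size=num_examples)

# Low-dimensional random projection of the gradients, used as features.
projection = rng.normal(size=(grad_dim, proj_dim)) / np.sqrt(proj_dim)
features = gradients @ projection

# Logistic regression on projected gradients stands in for the fine-tuned
# model on this task combination, avoiding any retraining of the base model.
clf = LogisticRegression(max_iter=1000).fit(features, labels)
print("held-in accuracy:", clf.score(features, labels))
```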

16 pages. Appeared in KDD 2024

Investigating Graph Neural Networks and Classical Feature-Extraction Techniques in Activity-Cliff and Molecular Property Prediction 2024-11-20
Show

Molecular featurisation refers to the transformation of molecular data into numerical feature vectors. It is one of the key research areas in molecular machine learning and computational drug discovery. Recently, message-passing graph neural networks (GNNs) have emerged as a novel method to learn differentiable features directly from molecular graphs. While such techniques hold great promise, further investigations are needed to clarify if and when they indeed manage to definitively outcompete classical molecular featurisations such as extended-connectivity fingerprints (ECFPs) and physicochemical-descriptor vectors (PDVs). We systematically explore and further develop classical and graph-based molecular featurisation methods for two important tasks: molecular property prediction, in particular, quantitative structure-activity relationship (QSAR) prediction, and the largely unexplored challenge of activity-cliff (AC) prediction. We first give a technical description and critical analysis of PDVs, ECFPs and message-passing GNNs, with a focus on graph isomorphism networks (GINs). We then conduct a rigorous computational study to compare the performance of PDVs, ECFPs and GINs for QSAR and AC-prediction. Following this, we mathematically describe and computationally evaluate a novel twin neural network model for AC-prediction. We further introduce an operation called substructure pooling for the vectorisation of structural fingerprints as a natural counterpart to graph pooling in GNN architectures. We go on to propose Sort & Slice, a simple substructure-pooling technique for ECFPs that robustly outperforms hash-based folding at molecular property prediction. Finally, we outline two ideas for future research: (i) a graph-based self-supervised learning strategy to make classical molecular featurisations trainable, and (ii) trainable substructure-pooling via differentiable self-attention.
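
A simplified sketch of the Sort & Slice idea, keeping the substructure identifiers that occur in the most training molecules instead of hash-folding them; real ECFP substructure generation (e.g. via RDKit) is abstracted into toy integer IDs:

```python
from collections import Counter
import numpy as np

def sort_and_slice_vectorizer(train_substructure_sets, vec_len=2048):
    """Select the `vec_len` most frequent substructure IDs in the training set
    and vectorize a molecule by which of those IDs it contains."""
    counts = Counter()
    for substructures in train_substructure_sets:
        counts.update(substructures)
    kept = [sub for sub, _ in counts.most_common(vec_len)]
    index = {sub: i for i, sub in enumerate(kept)}

    def vectorize(substructure_set):
        v = np.zeros(len(kept), dtype=np.float32)
        for sub in substructure_set:
            if sub in index:
                v[index[sub]] = 1.0
        return v

    return vectorize

# Toy integer IDs in place of real ECFP substructure identifiers.
train = [{1, 2, 3}, {2, 3, 4}, {3, 5}]
vectorize = sort_and_slice_vectorizer(train, vec_len=4)
print(vectorize({2, 5, 99}))  # unseen ID 99 is simply ignored
```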

Doctoral Thesis (Mathematical Institute, University of Oxford)

Graph neural network framework for energy mapping of hybrid monte-carlo molecular dynamics simulations of Medium Entropy Alloys 2024-11-20
Show

Machine learning (ML) methods have drawn significant interest in material design and discovery. Graph neural networks (GNNs), in particular, have demonstrated strong potential for predicting material properties. The present study proposes a graph-based representation for modeling medium-entropy alloys (MEAs). Hybrid Monte-Carlo molecular dynamics (MC/MD) simulations are employed to achieve thermally stable structures across various annealing temperatures in an MEA. These simulations generate dump files and potential energy labels, which are used to construct graph representations of the atomic configurations. Edges are created between each atom and its 12 nearest neighbors without incorporating explicit edge features. These graphs then serve as input for a Graph Convolutional Neural Network (GCNN) based ML model to predict the system's potential energy. The GCNN architecture effectively captures the local environment and chemical ordering within the MEA structure. The GCNN-based ML model demonstrates strong performance in predicting potential energy at different steps, showing satisfactory results on both the training data and unseen configurations. Our approach presents a graph-based modeling framework for MEAs and high-entropy alloys (HEAs), which effectively captures the local chemical order (LCO) within the alloy structure. This allows us to predict key material properties influenced by LCO in both MEAs and HEAs, providing deeper insights into how atomic-scale arrangements affect the properties of these alloys.
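
The 12-nearest-neighbour graph construction described above can be sketched with SciPy; periodic boundary conditions and the GCNN itself are omitted, and all names are illustrative:

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_edges(positions, k=12):
    """Return a (2, num_atoms * k) edge index connecting every atom to its
    k nearest neighbours (periodic boundaries ignored for simplicity)."""
    tree = cKDTree(positions)
    # Query k + 1 neighbours because the nearest neighbour of a point is itself.
    _, idx = tree.query(positions, k=k + 1)
    src = np.repeat(np.arange(len(positions)), k)
    dst = idx[:, 1:].reshape(-1)
    return np.stack([src, dst])

positions = np.random.rand(200, 3) * 20.0  # toy atomic coordinates
edge_index = knn_edges(positions, k=12)    # shape (2, 2400)
```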

28 pages, 9 figures
Predictive Insights into LGBTQ+ Minority Stress: A Transductive Exploration of Social Media Discourse 2024-11-20
Show

Individuals who identify as sexual and gender minorities, including lesbian, gay, bisexual, transgender, queer, and others (LGBTQ+), are more likely to experience poorer health than their heterosexual and cisgender counterparts. One primary source that drives these health disparities is minority stress (i.e., chronic and social stressors unique to LGBTQ+ communities' experiences adapting to the dominant culture). This stress is frequently expressed in LGBTQ+ users' posts on social media platforms. However, these expressions are not just straightforward manifestations of minority stress. They involve linguistic complexity (e.g., idiom or lexical diversity), rendering them challenging for many traditional natural language processing methods to detect. In this work, we designed a hybrid model using Graph Neural Networks (GNN) and Bidirectional Encoder Representations from Transformers (BERT), a pre-trained deep language model, to improve the classification performance of minority stress detection. We experimented with our model on a benchmark social media dataset for minority stress detection (LGBTQ+ MiSSoM+). The dataset is comprised of 5,789 human-annotated Reddit posts from LGBTQ+ subreddits. Our approach enables the extraction of hidden linguistic nuances through pretraining on a vast amount of raw data, while also engaging in transductive learning to jointly develop representations for both labeled training data and unlabeled test data. The RoBERTa-GCN model achieved an accuracy of 0.86 and an F1 score of 0.86, surpassing the performance of other baseline models in predicting LGBTQ+ minority stress. Improved prediction of minority stress expressions on social media could lead to digital health interventions to improve the wellbeing of LGBTQ+ people, a community with high rates of stress-sensitive health problems.

This paper is accepted in 2024 IEEE 11th International Conference on Data Science and Advanced Analytics (DSAA)

Advancing Heatwave Forecasting via Distribution Informed-Graph Neural Networks (DI-GNNs): Integrating Extreme Value Theory with GNNs 2024-11-20
Show

Heatwaves, prolonged periods of extreme heat, have intensified in frequency and severity due to climate change, posing substantial risks to public health, ecosystems, and infrastructure. Despite advancements in Machine Learning (ML) modeling, accurate heatwave forecasting at weather scales (1--15 days) remains challenging due to the non-linear interactions between atmospheric drivers and the rarity of these extreme events. Traditional models relying on heuristic feature engineering often fail to generalize across diverse climates and capture the complexities of heatwave dynamics. This study introduces the Distribution-Informed Graph Neural Network (DI-GNN), a novel framework that integrates principles from Extreme Value Theory (EVT) into the graph neural network architecture. DI-GNN incorporates Generalized Pareto Distribution (GPD)-derived descriptors into the feature space, adjacency matrix, and loss function to enhance its sensitivity to rare heatwave occurrences. By prioritizing the tails of climatic distributions, DI-GNN addresses the limitations of existing methods, particularly in imbalanced datasets where traditional metrics like accuracy are misleading. Empirical evaluations using weather station data from British Columbia, Canada, demonstrate the superior performance of DI-GNN compared to baseline models. DI-GNN achieved significant improvements in balanced accuracy, recall, and precision, with high AUC and average precision scores, reflecting its robustness in distinguishing heatwave events.
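
A hypothetical sketch of deriving GPD-based tail descriptors from a single station's series with SciPy; how DI-GNN injects such descriptors into the feature space, adjacency matrix, and loss is not reproduced here:

```python
import numpy as np
from scipy.stats import genpareto

def gpd_tail_features(series, quantile=0.95):
    """Fit a Generalized Pareto Distribution to exceedances over a high
    threshold and return simple tail descriptors for one station."""
    threshold = np.quantile(series, quantile)
    excesses = series[series > threshold] - threshold
    shape, loc, scale = genpareto.fit(excesses, floc=0.0)
    # Probability that a new excess exceeds the largest observed excess.
    tail_prob = genpareto.sf(excesses.max(), shape, loc=loc, scale=scale)
    return np.array([threshold, shape, scale, tail_prob])

temps = np.random.normal(22.0, 6.0, size=5_000)  # stand-in daily max temperatures
print(gpd_tail_features(temps))
```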

23 pages, 13 figures, pdf format

Effective Analog ICs Floorplanning with Relational Graph Neural Networks and Reinforcement Learning 2024-11-20
Show

Analog integrated circuit (IC) floorplanning is typically a manual process with the placement of components (devices and modules) planned by a layout engineer. This process is further complicated by the interdependence of floorplanning and routing steps, numerous electric and layout-dependent constraints, as well as the high level of customization expected in analog design. This paper presents a novel automatic floorplanning algorithm based on reinforcement learning. It is augmented by a relational graph convolutional neural network model for encoding circuit features and positional constraints. The combination of these two machine learning methods enables knowledge transfer across different circuit designs with distinct topologies and constraints, increasing the *generalization ability* of the solution. Applied to $6$ industrial circuits, our approach surpassed established floorplanning techniques in terms of speed, area and half-perimeter wire length. When integrated into a *procedural generator* for layout completion, overall layout time was reduced by 67.3% with an 8.3% mean area reduction compared to manual layout.

7 pages, 7 figures, Accepted at DATE25

Regional Ocean Forecasting with Hierarchical Graph Neural Networks 2024-11-20
Show

Accurate ocean forecasting systems are vital for understanding marine dynamics, which play a crucial role in environmental management and climate adaptation strategies. Traditional numerical solvers, while effective, are computationally expensive and time-consuming. Recent advancements in machine learning have revolutionized weather forecasting, offering fast and energy-efficient alternatives. Building on these advancements, we introduce SeaCast, a neural network designed for high-resolution, medium-range ocean forecasting. SeaCast employs a graph-based framework to effectively handle the complex geometry of ocean grids and integrates external forcing data tailored to the regional ocean context. Our approach is validated through experiments at a high spatial resolution using the operational numerical model of the Mediterranean Sea provided by the Copernicus Marine Service, along with both numerical and data-driven atmospheric forcings.

28 pages, 35 figures. Accepted to the Tackling Climate Change with Machine Learning workshop at NeurIPS 2024

Domain Adaptive Unfolded Graph Neural Networks 2024-11-20
Show

Over the last decade, graph neural networks (GNNs) have made significant progress in numerous graph machine learning tasks. In real-world applications, where domain shifts occur and labels are often unavailable for a new target domain, graph domain adaptation (GDA) approaches have been proposed to facilitate knowledge transfer from the source domain to the target domain. Previous efforts in tackling distribution shifts across domains have mainly focused on aligning the node embedding distributions generated by the GNNs in the source and target domains. However, as the core part of GDA approaches, the impact of the underlying GNN architecture has received limited attention. In this work, we explore this orthogonal direction, i.e., how to facilitate GDA with architectural enhancement. In particular, we consider a class of GNNs that are designed explicitly based on optimization problems, namely unfolded GNNs (UGNNs), whose training process can be represented as bi-level optimization. Empirical and theoretical analyses demonstrate that when transferring from the source domain to the target domain, the lower-level objective value generated by the UGNNs significantly increases, resulting in an increase in the upper-level objective as well. Motivated by this observation, we propose a simple yet effective strategy called cascaded propagation (CP), which is guaranteed to decrease the lower-level objective value. The CP strategy is widely applicable to general UGNNs, and we evaluate its efficacy with three representative UGNN architectures. Extensive experiments on five real-world datasets demonstrate that the UGNNs integrated with CP outperform state-of-the-art GDA baselines.

Self-Supervised Conditional Distribution Learning on Graphs 2024-11-20
Show

Graph contrastive learning (GCL) has shown promising performance in semisupervised graph classification. However, existing studies still encounter significant challenges in GCL. First, successive layers in graph neural network (GNN) tend to produce more similar node embeddings, while GCL aims to increase the dissimilarity between negative pairs of node embeddings. This inevitably results in a conflict between the message-passing mechanism of GNNs and the contrastive learning of negative pairs via intraviews. Second, leveraging the diversity and quantity of data provided by graph-structured data augmentations while preserving intrinsic semantic information is challenging. In this paper, we propose a self-supervised conditional distribution learning (SSCDL) method designed to learn graph representations from graph-structured data for semisupervised graph classification. Specifically, we present an end-to-end graph representation learning model to align the conditional distributions of weakly and strongly augmented features over the original features. This alignment effectively reduces the risk of disrupting intrinsic semantic information through graph-structured data augmentation. To avoid conflict between the message-passing mechanism and contrastive learning of negative pairs, positive pairs of node representations are retained for measuring the similarity between the original features and the corresponding weakly augmented features. Extensive experiments with several benchmark graph datasets demonstrate the effectiveness of the proposed SSCDL method.

8 pages
ORID: Organ-Regional Information Driven Framework for Radiology Report Generation 2024-11-20
Show

The objective of Radiology Report Generation (RRG) is to automatically generate coherent textual analyses of diseases based on radiological images, thereby alleviating the workload of radiologists. Current AI-based methods for RRG primarily focus on modifications to the encoder-decoder model architecture. To advance these approaches, this paper introduces an Organ-Regional Information Driven (ORID) framework which can effectively integrate multi-modal information and reduce the influence of noise from unrelated organs. Specifically, based on LLaVA-Med, we first construct an RRG-related instruction dataset to improve organ-regional diagnosis description ability, obtaining LLaVA-Med-RRG. After that, we propose an organ-based cross-modal fusion module to effectively combine the information from the organ-regional diagnosis description and radiology image. To further reduce the influence of noise from unrelated organs on the radiology report generation, we introduce an organ importance coefficient analysis module, which leverages a Graph Neural Network (GNN) to examine the interconnections of the cross-modal information of each organ region. Extensive experiments and comparisons with state-of-the-art methods across various evaluation metrics demonstrate the superior performance of our proposed method.

13 pages, 11 figures, WACV2025

Epidemiology-informed Network for Robust Rumor Detection 2024-11-20
Show

The rapid spread of rumors on social media has posed significant challenges to maintaining public trust and information integrity. Since an information cascade process is essentially a propagation tree, recent rumor detection models leverage graph neural networks to additionally capture information propagation patterns, thus outperforming text-only solutions. Given the variations in topics and social impact of the root node, different source information naturally has distinct outreach capabilities, resulting in different heights of propagation trees. This variation, however, impedes the data-driven design of existing graph-based rumor detectors. Given a shallow propagation tree with limited interactions, it is unlikely for graph-based approaches to capture sufficient cascading patterns, questioning their ability to handle less popular news or early detection needs. In contrast, a deep propagation tree is prone to noisy user responses, and this can in turn obfuscate the predictions. In this paper, we propose a novel Epidemiology-informed Network (EIN) that integrates epidemiological knowledge to enhance performance by overcoming data-driven methods' sensitivity to data quality. Meanwhile, to adapt epidemiology theory to rumor detection, it is expected that each user's stance toward the source information will be annotated. To bypass the costly and time-consuming human labeling process, we take advantage of large language models to generate stance labels, facilitating optimization objectives for learning epidemiology-informed representations. Our experimental results demonstrate that the proposed EIN not only outperforms state-of-the-art methods on real-world datasets but also exhibits enhanced robustness across varying tree depths.

MLDGG: Meta-Learning for Domain Generalization on Graphs 2024-11-19
Show

Domain generalization on graphs aims to develop models with robust generalization capabilities, ensuring effective performance on the testing set despite disparities between testing and training distributions. However, existing methods often rely on static encoders directly applied to the target domain, constraining their adaptability. In contrast to conventional methodologies, which concentrate on developing specific generalized models, our framework, MLDGG, endeavors to achieve adaptable generalization across diverse domains by integrating cross-multi-domain meta-learning with structure learning and semantic identification. Initially, it introduces a generalized structure learner to mitigate the adverse effects of task-unrelated edges, enhancing the comprehensiveness of representations learned by Graph Neural Networks (GNNs) while capturing shared structural information across domains. Subsequently, a representation learner is designed to disentangle domain-invariant semantic and domain-specific variation information in node embedding by leveraging causal reasoning for semantic identification, further enhancing generalization. In the context of meta-learning, meta-parameters for both learners are optimized to facilitate knowledge transfer and enable effective adaptation to graphs through fine-tuning within the target domains, where target graphs are inaccessible during training. Our empirical results demonstrate that MLDGG surpasses baseline methods, showcasing its effectiveness in three different distribution shift settings.

Accepted in KDD 2025 (research track)

Efficient Model-Stealing Attacks Against Inductive Graph Neural Networks 2024-11-19
Show

Graph Neural Networks (GNNs) are recognized as potent tools for processing real-world data organized in graph structures. Especially inductive GNNs, which allow for the processing of graph-structured data without relying on predefined graph structures, are becoming increasingly important in a wide range of applications. As such these networks become attractive targets for model-stealing attacks where an adversary seeks to replicate the functionality of the targeted network. Significant efforts have been devoted to developing model-stealing attacks that extract models trained on images and texts. However, little attention has been given to stealing GNNs trained on graph data. This paper identifies a new method of performing unsupervised model-stealing attacks against inductive GNNs, utilizing graph contrastive learning and spectral graph augmentations to efficiently extract information from the targeted model. The new type of attack is thoroughly evaluated on six datasets and the results show that our approach outperforms the current state-of-the-art by Shen et al. (2021). In particular, our attack surpasses the baseline across all benchmarks, attaining superior fidelity and downstream accuracy of the stolen model while necessitating fewer queries directed toward the target model.

Accepted at ECAI - 27th European Conference on Artificial Intelligence

Benchmarking Positional Encodings for GNNs and Graph Transformers 2024-11-19
Show

Recent advances in Graph Neural Networks (GNNs) and Graph Transformers (GTs) have been driven by innovations in architectures and Positional Encodings (PEs), which are critical for augmenting node features and capturing graph topology. PEs are essential for GTs, where topological information would otherwise be lost without message-passing. However, PEs are often tested alongside novel architectures, making it difficult to isolate their effect on established models. To address this, we present a comprehensive benchmark of PEs in a unified framework that includes both message-passing GNNs and GTs. We also establish theoretical connections between MPNNs and GTs and introduce a sparsified GRIT attention mechanism to examine the influence of global connectivity. Our findings demonstrate that previously untested combinations of GNN architectures and PEs can outperform existing methods and offer a more comprehensive picture of the state-of-the-art. To support future research and experimentation in our framework, we make the code publicly available.

Estimating Dark Matter Halo Masses in Simulated Galaxy Clusters with Graph Neural Networks 2024-11-19
Show

Galaxies grow and evolve in dark matter halos. Because dark matter is not visible, galaxies' halo masses ($\rm{M}_{\rm{halo}}$) must be inferred indirectly. We present a graph neural network (GNN) model for predicting $\rm{M}_{\rm{halo}}$ from stellar mass ($\rm{M}_{*}$) in simulated galaxy clusters using data from the IllustrisTNG simulation suite. Unlike traditional machine learning models like random forests, our GNN captures the information-rich substructure of galaxy clusters by using spatial and kinematic relationships between galaxy neighbours. A GNN model trained on the TNG-Cluster dataset and independently tested on the TNG300 simulation achieves superior predictive performance compared to other baseline models we tested. Future work will extend this approach to different simulations and real observational datasets to further validate the GNN model's ability to generalise.

9 pages, 4 figures, accepted at the NeurIPS ML4PS 2024 workshop

Graph Neural Network-Based Entity Extraction and Relationship Reasoning in Complex Knowledge Graphs 2024-11-19
Show

This study proposes a knowledge graph entity extraction and relationship reasoning algorithm based on a graph neural network, using a graph convolutional network and graph attention network to model the complex structure in the knowledge graph. By building an end-to-end joint model, this paper achieves efficient recognition and reasoning of entities and relationships. In the experiment, this paper compared the model with a variety of deep learning algorithms and verified its superiority through indicators such as AUC, recall rate, precision rate, and F1 value. The experimental results show that the model proposed in this paper performs well on all indicators; on complex knowledge graphs in particular, it exhibits stronger generalization ability and stability. This provides strong support for further research on knowledge graphs and also demonstrates the application potential of graph neural networks in entity extraction and relationship reasoning.

GNNAS-Dock: Budget Aware Algorithm Selection with Graph Neural Networks for Molecular Docking 2024-11-19
Show

Molecular docking is a major element in drug discovery and design. It enables the prediction of ligand-protein interactions by simulating the binding of small molecules to proteins. Despite the availability of numerous docking algorithms, no single algorithm consistently outperforms the others across a diverse set of docking scenarios. This paper introduces GNNAS-Dock, a novel Graph Neural Network (GNN)-based automated algorithm selection system for molecular docking in blind docking situations. GNNs are adopted to process the complex structural data of both ligands and proteins, benefiting from their inherent graph-like properties to predict the performance of various docking algorithms under different conditions. The present study pursues two main objectives: 1) predict the performance of each candidate docking algorithm, in terms of Root Mean Square Deviation (RMSD), thereby identifying the most accurate method for specific scenarios; and 2) choose the most computationally efficient docking algorithm for each docking case, aiming to reduce the time required for docking while maintaining high accuracy. We validate our approach on the PDBBind 2020 refined set, which contains about 5,300 pairs of protein-ligand complexes.

On Size and Hardness Generalization in Unsupervised Learning for the Travelling Salesman Problem 2024-11-19
Show

We study the generalization capability of Unsupervised Learning in solving the Travelling Salesman Problem (TSP). We use a Graph Neural Network (GNN) trained with a surrogate loss function to generate an embedding for each node. We use these embeddings to construct a heat map that indicates the likelihood of each edge being part of the optimal route. We then apply local search to generate our final predictions. Our investigation explores how different training instance sizes, embedding dimensions, and distributions influence the outcomes of Unsupervised Learning methods. Our results show that training with larger instance sizes and increasing embedding dimensions can build a more effective representation, enhancing the model's ability to solve TSP. Furthermore, in evaluating generalization across different distributions, we first determine the hardness of various distributions and explore how different hardnesses affect the final results. Our findings suggest that models trained on harder instances exhibit better generalization capabilities, highlighting the importance of selecting appropriate training instances in solving TSP using Unsupervised Learning.

Guiding Word Equation Solving using Graph Neural Networks (Extended Technical Report) 2024-11-19
Show

This paper proposes a Graph Neural Network-guided algorithm for solving word equations, based on the well-known Nielsen transformation for splitting equations. The algorithm iteratively rewrites the first terms of each side of an equation, giving rise to a tree-like search space. The choice of path at each split point of the tree significantly impacts solving time, motivating the use of Graph Neural Networks (GNNs) for efficient split decision-making. Split decisions are encoded as multi-classification tasks, and five graph representations of word equations are introduced to encode their structural information for GNNs. The algorithm is implemented as a solver named DragonLi. Experiments are conducted on artificial and real-world benchmarks. The algorithm performs particularly well on satisfiable problems. For single word equations, DragonLi can solve significantly more problems than well-established string solvers. For the conjunction of multiple word equations, DragonLi is competitive with state-of-the-art string solvers.

GNN-Based Code Annotation Logic for Establishing Security Boundaries in C Code 2024-11-19
Show

Securing sensitive operations in today's interconnected software landscape is crucial yet challenging. Modern platforms rely on Trusted Execution Environments (TEEs), such as Intel SGX and ARM TrustZone, to isolate security-sensitive code from the main system, reducing the Trusted Computing Base (TCB) and providing stronger assurances. However, identifying which code should reside in TEEs is complex and requires specialized expertise, which is not supported by current automated tools. Existing solutions often migrate entire applications to TEEs, leading to suboptimal use and an increased TCB. To address this gap, we propose Code Annotation Logic (CAL), a pioneering tool that automatically identifies security-sensitive components for TEE isolation. CAL analyzes codebases, leveraging a graph-based approach with novel feature construction and employing a custom graph neural network model to accurately determine which parts of the code should be isolated. CAL effectively optimizes the TCB, reducing the burden of manual analysis and enhancing overall security. Our contributions include the definition of security-sensitive code, the construction and labeling of a comprehensive dataset of source files, a feature-rich, graph-based data preparation pipeline, and the CAL model for TEE integration. Evaluation results demonstrate CAL's efficacy in identifying sensitive code with a recall of 86.05%, an F1 score of 81.56%, and an identification rate of 91.59% for security-sensitive functions. By enabling efficient code isolation, CAL advances the secure development of applications using TEEs, offering a practical solution for developers to reduce attack vectors.

Submitted
Graph as a feature: improving node classification with non-neural graph-aware logistic regression 2024-11-19
Show

Graph Neural Networks (GNNs), with their message-passing framework that leverages both structural and feature information, have become a standard method for solving graph-based machine learning problems. However, these approaches still struggle to generalise well beyond datasets that exhibit strong homophily, where nodes of the same class tend to connect. This limitation has led to the development of complex neural architectures that pose challenges in terms of efficiency and scalability. In response to these limitations, we focus on simpler and more scalable approaches and introduce Graph-aware Logistic Regression (GLR), a non-neural model designed for node classification tasks. Unlike traditional graph algorithms that use only a fraction of the information accessible to GNNs, our proposed model simultaneously leverages both node features and the relationships between entities. However, instead of relying on message passing, our approach encodes each node's relationships as an additional feature vector, which is then combined with the node's own attributes. Extensive experimental results, conducted within a rigorous evaluation framework, show that our proposed GLR approach outperforms both foundational and sophisticated state-of-the-art GNN models in node classification tasks. Going beyond the traditional limited benchmarks, our experiments indicate that GLR increases generalisation ability while achieving gains in computation time of up to two orders of magnitude compared to its best neural competitor.
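
A simplified sketch of the idea, with each node's relationships encoded as an extra feature vector and fed to an ordinary logistic regression; the paper's exact relationship encoding may differ:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import normalize

def graph_aware_features(adjacency, features):
    """Concatenate each node's own attributes with a row-normalized encoding
    of its relationships (here simply its normalized adjacency row)."""
    relation = normalize(adjacency, norm="l1", axis=1)
    return np.hstack([features, relation])

# Toy example: 50 nodes, 8 features, random sparse-ish adjacency, 3 classes.
rng = np.random.default_rng(0)
A = (rng.random((50, 50)) < 0.1).astype(float)
X = rng.normal(size=(50, 8))
y = rng.integers(0, 3, size=50)

Z = graph_aware_features(A, X)                       # shape (50, 58)
clf = LogisticRegression(max_iter=2000).fit(Z, y)    # plain, scalable classifier
```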

Variational Graph Autoencoder for Heterogeneous Information Networks with Missing and Inaccurate Attributes 2024-11-19
Show

Heterogeneous Information Networks (HINs), which consist of various types of nodes and edges, have recently demonstrated excellent performance in graph mining. However, most existing heterogeneous graph neural networks (HGNNs) ignore the problems of missing attributes, inaccurate attributes and scarce labels for nodes, which limits their expressiveness. In this paper, we propose a generative self-supervised model GraMI to address these issues simultaneously. Specifically, GraMI first initializes all the nodes in the graph with a low-dimensional representation matrix. After that, based on the variational graph autoencoder framework, GraMI learns both node-level and attribute-level embeddings in the encoder, which can provide fine-grained semantic information to construct node attributes. In the decoder, GraMI reconstructs both links and attributes. Instead of directly reconstructing raw features for attributed nodes, GraMI generates the initial low-dimensional representation matrix for all the nodes, based on which raw features of attributed nodes are further reconstructed to leverage accurate attributes. In this way, GraMI can not only complete informative features for non-attributed nodes, but rectify inaccurate ones for attributed nodes. Finally, we conduct extensive experiments to show the superiority of GraMI in tackling HINs with missing and inaccurate attributes.

Accepted by KDD 2025
RELIEF: Reinforcement Learning Empowered Graph Feature Prompt Tuning 2024-11-19
Show

The advent of the "pre-train, prompt" paradigm has recently extended its generalization ability and data efficiency to graph representation learning, following its achievements in Natural Language Processing (NLP). Initial graph prompt tuning approaches tailored specialized prompting functions for Graph Neural Network (GNN) models pre-trained with specific strategies, such as edge prediction, thus limiting their applicability. In contrast, another pioneering line of research has explored universal prompting via adding prompts to the input graph's feature space, thereby removing the reliance on specific pre-training strategies. However, the necessity to add feature prompts to all nodes remains an open question. Motivated by findings from prompt tuning research in the NLP domain, which suggest that highly capable pre-trained models need less conditioning signal to achieve desired behaviors, we advocate for strategically incorporating necessary and lightweight feature prompts to certain graph nodes to enhance downstream task performance. This introduces a combinatorial optimization problem, requiring a policy to decide 1) which nodes to prompt and 2) what specific feature prompts to attach. We then address the problem by framing the prompt incorporation process as a sequential decision-making problem and propose our method, RELIEF, which employs Reinforcement Learning (RL) to optimize it. At each step, the RL agent selects a node (discrete action) and determines the prompt content (continuous action), aiming to maximize cumulative performance gain. Extensive experiments on graph and node-level tasks with various pre-training strategies in few-shot scenarios demonstrate that our RELIEF outperforms fine-tuning and other prompt-based approaches in classification performance and data efficiency.

Accepted by SIGKDD 2025

Optimizing Luxury Vehicle Dealership Networks: A Graph Neural Network Approach to Site Selection 2024-11-18
Show

This study presents a novel application of Graph Neural Networks (GNNs) to optimize dealership network planning for a luxury car manufacturer in the U.S. By conducting a comprehensive literature review on dealership location determinants, the study identifies 65 county-level explanatory variables, augmented by two additional measures of regional interconnectedness derived from social and mobility data. An ablation study involving 34 variable combinations and ten state-of-the-art GNN operators reveals key insights into the predictive power of various variables, particularly highlighting the significance of competition, demographic factors, and mobility patterns in influencing dealership location decisions. The analysis pinpoints seven specific counties as promising targets for network expansion. This research not only illustrates the effectiveness of GNNs in solving complex geospatial decision-making problems but also provides actionable recommendations and valuable methodological insights for industry practitioners.

Accepted at IEEE BigData 2024, 10 pages, 4 figures, 6 tables; code and data are available at https://github.com/carocciluca/gnn-site-selection

Robust Subgraph Learning by Monitoring Early Training Representations 2024-11-18
Show

Graph neural networks (GNNs) have attracted significant attention for their outstanding performance in graph learning and node classification tasks. However, their vulnerability to adversarial attacks, particularly through susceptible nodes, poses a challenge in decision-making. The need for robust graph summarization is evident in adversarial challenges resulting from the propagation of attacks throughout the entire graph. In this paper, we address both performance and adversarial robustness in graph input by introducing the novel technique SHERD (Subgraph Learning Hale through Early Training Representation Distances). SHERD leverages information from layers of a partially trained graph convolutional network (GCN) to detect susceptible nodes during adversarial attacks using standard distance metrics. The method identifies "vulnerable (bad)" nodes and removes such nodes to form a robust subgraph while maintaining node classification performance. Through our experiments, we demonstrate the increased performance of SHERD in enhancing robustness by comparing the network's performance on original and subgraph inputs against various baselines alongside existing adversarial attacks. Our experiments across multiple datasets, including citation datasets such as Cora, Citeseer, and Pubmed, as well as microanatomical tissue structures of cell graphs in the placenta, highlight that SHERD not only achieves substantial improvement in robust performance but also outperforms several baselines in terms of node classification accuracy and computational complexity.

Efficient and Robust Continual Graph Learning for Graph Classification in Biology 2024-11-18
Show

Graph classification is essential for understanding complex biological systems, where molecular structures and interactions are naturally represented as graphs. Traditional graph neural networks (GNNs) perform well on static tasks but struggle in dynamic settings due to catastrophic forgetting. We present Perturbed and Sparsified Continual Graph Learning (PSCGL), a robust and efficient continual graph learning framework for graph data classification, specifically targeting biological datasets. We introduce a perturbed sampling strategy to identify critical data points that contribute to model learning and a motif-based graph sparsification technique to reduce storage needs while maintaining performance. Additionally, our PSCGL framework inherently defends against graph backdoor attacks, which is crucial for applications in sensitive biological contexts. Extensive experiments on biological datasets demonstrate that PSCGL not only retains knowledge across tasks but also enhances the efficiency and robustness of graph classification models in biology.

Thermodynamic Transferability in Coarse-Grained Force Fields using Graph Neural Networks 2024-11-18
Show

Coarse-graining is a molecular modeling technique in which an atomistic system is represented in a simplified fashion that retains the most significant system features that contribute to a target output, while removing the degrees of freedom that are less relevant. This reduction in model complexity allows coarse-grained molecular simulations to reach increased spatial and temporal scales compared to corresponding all-atom models. A core challenge in coarse-graining is to construct a force field that represents the interactions in the new representation in a way that preserves the atomistic-level properties. Many approaches to building coarse-grained force fields have limited transferability between different thermodynamic conditions as a result of averaging over internal fluctuations at a specific thermodynamic state point. Here, we use a graph-convolutional neural network architecture, the Hierarchically Interacting Particle Neural Network with Tensor Sensitivity (HIP-NN-TS), to develop a highly automated training pipeline for coarse grained force fields which allows for studying the transferability of coarse-grained models based on the force-matching approach. We show that this approach not only yields highly accurate force fields, but also that these force fields are more transferable through a variety of thermodynamic conditions. These results illustrate the potential of machine learning techniques such as graph neural networks to improve the construction of transferable coarse-grained force fields.

Post-referee revisions. Accepted by Journal of Chemical Theory and Computation (JCTC). 46 pages, 10 figures + TOC figure + SI (19 pages, 6 figures)

PyGim: An Efficient Graph Neural Network Library for Real Processing-In-Memory Architectures 2024-11-18
Show

Graph Neural Networks (GNNs) are emerging ML models for analyzing graph-structured data. GNN execution involves both compute-intensive and memory-intensive kernels; the latter dominates the total time and is significantly bottlenecked by data movement between memory and processors. Processing-In-Memory (PIM) systems can alleviate this data movement bottleneck by placing simple processors near or inside memory arrays. In this work, we introduce PyGim, an efficient ML library that accelerates GNNs on real PIM systems. We propose intelligent parallelization techniques for memory-intensive kernels of GNNs tailored for real PIM systems, and develop a handy Python API for them. We provide hybrid GNN execution, in which the compute-intensive and memory-intensive kernels are executed in processor-centric and memory-centric computing systems, respectively. We extensively evaluate PyGim on a real-world PIM system with 1992 PIM cores using emerging GNN models, and demonstrate that it outperforms its state-of-the-art CPU counterpart on Intel Xeon by 3.04x on average, and achieves higher resource utilization than CPU and GPU systems. Our work provides useful recommendations for software, system and hardware designers. PyGim is publicly available at https://github.com/CMU-SAFARI/PyGim.

Graph Artificial Intelligence for Quantifying Compatibility Mechanisms in Traditional Chinese Medicine 2024-11-18
Show

Traditional Chinese Medicine (TCM) involves complex compatibility mechanisms characterized by multi-component and multi-target interactions, which are challenging to quantify. To address this challenge, we applied graph artificial intelligence to develop a TCM multi-dimensional knowledge graph that bridges traditional TCM theory and modern biomedical science (https://zenodo.org/records/13763953 ). Using feature engineering and embedding, we processed key TCM terminology and Chinese herbal pieces (CHP), introducing medicinal properties as virtual nodes and employing graph neural networks with attention mechanisms to model and analyze 6,080 Chinese herbal formulas (CHF). Our method quantitatively assessed the roles of CHP within CHF and was validated using 215 CHF designed for COVID-19 management. With interpretable models, open-source data, and code (https://github.com/ZENGJingqi/GraphAI-for-TCM ), this study provides robust tools for advancing TCM theory and drug discovery.

10 pages, 5 figures. Includes open-source dataset and code for reproducibility

Physics meets Topology: Physics-informed topological neural networks for learning rigid body dynamics 2024-11-18
Show

Rigid body interactions are fundamental to numerous scientific disciplines, but remain challenging to simulate due to their abrupt nonlinear nature and sensitivity to complex, often unknown environmental factors. These challenges call for adaptable learning-based methods capable of capturing complex interactions beyond explicit physical models and simulations. While graph neural networks can handle simple scenarios, they struggle with complex scenes and long-term predictions. We introduce a novel framework for modeling rigid body dynamics and learning collision interactions, addressing key limitations of existing graph-based methods. Our approach extends the traditional representation of meshes by incorporating higher-order topology complexes, offering a physically consistent representation. Additionally, we propose a physics-informed message-passing neural architecture, embedding physical laws directly in the model. Our method demonstrates superior accuracy, even during long rollouts, and exhibits strong generalization to unseen scenarios. Importantly, this work addresses the challenge of multi-entity dynamic interactions, with applications spanning diverse scientific and engineering domains.

17 pages, 9 figures
Unveiling the Inflexibility of Adaptive Embedding in Traffic Forecasting 2024-11-18
Show

Spatiotemporal Graph Neural Networks (ST-GNNs) and Transformers have shown significant promise in traffic forecasting by effectively modeling temporal and spatial correlations. However, rapid urbanization in recent years has led to dynamic shifts in traffic patterns and travel demand, posing major challenges for accurate long-term traffic prediction. The generalization capability of ST-GNNs in extended temporal scenarios and cross-city applications remains largely unexplored. In this study, we evaluate state-of-the-art models on an extended traffic benchmark and observe substantial performance degradation in existing ST-GNNs over time, which we attribute to their limited inductive capabilities. Our analysis reveals that this degradation stems from an inability to adapt to evolving spatial relationships within urban environments. To address this limitation, we reconsider the design of adaptive embeddings and propose a Principal Component Analysis (PCA) embedding approach that enables models to adapt to new scenarios without retraining. We incorporate PCA embeddings into existing ST-GNN and Transformer architectures, achieving marked improvements in performance. Notably, PCA embeddings allow for flexibility in graph structures between training and testing, enabling models trained on one city to perform zero-shot predictions on other cities. This adaptability demonstrates the potential of PCA embeddings in enhancing the robustness and generalization of spatiotemporal models.
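
A hypothetical sketch of the PCA-embedding idea, deriving node embeddings from historical observations so that a new city's sensor graph can be embedded without retraining (names and dimensions are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_node_embeddings(history, dim=16):
    """history: (num_timesteps, num_nodes) past observations.
    Returns (num_nodes, dim) embeddings from PCA over each node's historical
    profile; the same recipe can be applied to an unseen city's data."""
    profiles = history.T - history.T.mean(axis=1, keepdims=True)
    return PCA(n_components=dim).fit_transform(profiles)

history = np.random.rand(2016, 207)         # e.g. one week of 5-minute data, 207 sensors
embeddings = pca_node_embeddings(history)   # (207, 16), used in place of learned embeddings
```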

Integrating GNN and Neural ODEs for Estimating Non-Reciprocal Two-Body Interactions in Mixed-Species Collective Motion 2024-11-18
Show

Analyzing the motion of multiple biological agents, be it cells or individual animals, is pivotal for the understanding of complex collective behaviors. With the advent of advanced microscopy, detailed images of complex tissue formations involving multiple cell types have become more accessible in recent years. However, deciphering the underlying rules that govern cell movements is far from trivial. Here, we present a novel deep learning framework for estimating the underlying equations of motion from observed trajectories, a pivotal step in decoding such complex dynamics. Our framework integrates graph neural networks with neural differential equations, enabling effective prediction of two-body interactions based on the states of the interacting entities. We demonstrate the efficacy of our approach through two numerical experiments. First, we used simulated data from a toy model to tune the hyperparameters. Based on the obtained hyperparameters, we then applied this approach to a more complex model with non-reciprocal forces that mimic the collective dynamics of the cells of slime molds. Our results show that the proposed method can accurately estimate the functional forms of two-body interactions -- even when they are nonreciprocal -- thereby precisely replicating both individual and collective behaviors within these systems.

Accepted at NeurIPS 2024. Some contents are omitted due to arXiv's storage limit. Please refer to the full paper at OpenReview (NeurIPS 2024) or https://github.com/MasahitoUWAMICHI/collectiveMotionNN
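
Roughly, the combination above amounts to an ODE whose right-hand side sums learned pairwise forces over observed interaction edges. The toy sketch below uses a plain Euler integrator and an unconstrained pairwise MLP, which can represent non-reciprocal forces; it only illustrates the idea, is not the paper's architecture, and every name in it is an assumption.

```python
import torch
import torch.nn as nn

class PairwiseForce(nn.Module):
    """MLP f(x_i, x_j) giving the force exerted on i by j. Nothing forces
    f(i, j) == -f(j, i), so non-reciprocal interactions can be represented."""
    def __init__(self, dim: int = 2, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, dim))

    def forward(self, xi, xj):
        return self.net(torch.cat([xi, xj], dim=-1))

def rollout(x0, edge_index, force, steps=50, dt=0.05):
    """Euler rollout of dx/dt = sum_j f(x_i, x_j) over the observed edges."""
    src, dst = edge_index
    x = x0
    traj = [x]
    for _ in range(steps):
        f = force(x[dst], x[src])                               # force on dst from src
        dxdt = torch.zeros_like(x).index_add_(0, dst, f)        # sum incoming forces
        x = x + dt * dxdt
        traj.append(x)
    return torch.stack(traj)                                    # (steps + 1, N, dim)
```

Training would then fit the force MLP by matching such rollouts to the observed trajectories.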

The GECo algorithm for Graph Neural Networks Explanation 2024-11-18
Show

Graph Neural Networks (GNNs) are powerful models that can manage complex data sources and their interconnection links. One of GNNs' main drawbacks is their lack of interpretability, which limits their application in sensitive fields. In this paper, we introduce a new methodology involving graph communities to address the interpretability of graph classification problems. The proposed method, called GECo, exploits the idea that if a community is a densely connected subset of graph nodes, this property should play a role in graph classification. This is reasonable, especially considering message passing, the basic mechanism of GNNs. GECo analyzes the contribution of each community in the graph to the classification result, building a mask that highlights the graph structures relevant to the prediction. GECo is tested with Graph Convolutional Networks on six artificial and four real-world graph datasets and is compared to the main explainability methods, such as PGMExplainer, PGExplainer, GNNExplainer, and SubgraphX, using four different metrics. The obtained results outperform the other methods on the artificial graph datasets and on most real-world datasets.
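
One illustrative (not the authors') way to estimate a community's contribution to a graph-level prediction is to detect communities, re-run the classifier on each community's induced subgraph, and compare against the full-graph prediction; `model_predict_proba` below is a placeholder for any trained graph classifier.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def community_scores(G: nx.Graph, model_predict_proba, target_class: int):
    """Score each community by how the target-class probability changes
    when the graph is restricted to that community alone."""
    full_prob = model_predict_proba(G)[target_class]
    scores = []
    for community in greedy_modularity_communities(G):
        sub = G.subgraph(community).copy()
        prob = model_predict_proba(sub)[target_class]
        scores.append((set(community), prob - full_prob))
    # Communities with the largest delta are the most relevant to the prediction.
    return sorted(scores, key=lambda t: t[1], reverse=True)
```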

Graph Neural Networks on Graph Databases 2024-11-18
Show

Training graph neural networks on large datasets has long been a challenge. Traditional approaches include efficiently representing the whole graph in-memory, designing parameter efficient and sampling-based models, and graph partitioning in a distributed setup. Separately, graph databases with native graph storage and query engines have been developed, which enable time and resource efficient graph analytics workloads. We show how to directly train a GNN on a graph DB, by retrieving minimal data into memory and sampling using the query engine. Our experiments show resource advantages for single-machine and distributed training. Our approach opens up a new way of scaling GNNs as well as a new application area for graph DBs.

14 pages, 8 figures
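
A heavily simplified sketch of the query-driven sampling idea, assuming a Neo4j-style database and a hypothetical node schema (`id`, `features`); the paper's actual queries, database, and mini-batch format may differ.

```python
from neo4j import GraphDatabase  # assumes a running Neo4j instance

# Hypothetical Cypher query: fetch a node's 1-hop neighbourhood with features.
NEIGHBOUR_QUERY = """
MATCH (n {id: $node_id})-[r]-(m)
RETURN n.features AS center, collect(m.features) AS neighbours
"""

def sample_minibatch(driver, node_ids):
    """Pull only the sampled neighbourhoods into memory, instead of the whole
    graph, and hand them to the GNN training step."""
    batch = []
    with driver.session() as session:
        for node_id in node_ids:
            record = session.run(NEIGHBOUR_QUERY, node_id=node_id).single()
            batch.append((record["center"], record["neighbours"]))
    return batch

# Example connection (credentials and URI are placeholders):
# driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
```
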
Federated Graph Condensation with Information Bottleneck Principles 2024-11-18
Show

Graph condensation, which reduces the size of a large-scale graph by synthesizing a small-scale condensed graph as its substitution, brings immediate benefits to various graph learning tasks. However, existing graph condensation methods rely on centralized data storage, which is infeasible for real-world decentralized data distributions, and overlook data holders' privacy-preserving requirements. To bridge the gap, we propose and study the novel problem of federated graph condensation for graph neural networks (GNNs). Specifically, we first propose a general framework for federated graph condensation, in which we decouple the typical gradient matching process for graph condensation into client-side gradient calculation and server-side gradient matching. In this way, the burdensome computation cost on the client side is largely alleviated. Moreover, our empirical studies show that under the federated setting, the condensed graph consistently leaks data membership privacy, i.e., the condensed graph obtained during federated training can be used to steal the training data under membership inference attacks (MIA). To tackle this issue, we incorporate information bottleneck principles into federated graph condensation, which only requires extracting partial node features in one local pre-training step and reusing those features during federated training. Extensive experiments on real-world datasets demonstrate that our framework consistently protects membership privacy during training, while also achieving comparable and even superior performance to existing centralized graph condensation and federated graph learning methods.

14 pages
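
To make the client/server decoupling concrete, here is a stripped-down gradient-matching step with a minimal one-layer "GNN" and a learnable condensed feature matrix; this sketches only the generic gradient-matching recipe, not the paper's full pipeline (the information-bottleneck feature extraction, in particular, is omitted), and all names are assumptions.

```python
import torch
import torch.nn.functional as F

def client_gradient(model, A, X, y):
    """Client side: gradient of the task loss on the private graph only;
    only these gradients (not the raw graph) are sent to the server."""
    logits = model(A @ X)            # 1-hop neighbourhood aggregation, then linear layer
    loss = F.cross_entropy(logits, y)
    return torch.autograd.grad(loss, tuple(model.parameters()))

def server_match_step(model, grads_avg, A_syn, X_syn, y_syn, opt):
    """Server side: update the condensed features X_syn (a leaf tensor with
    requires_grad=True, optimised by `opt`, e.g. Adam([X_syn])) so that the
    gradients they induce match the averaged client gradients."""
    logits = model(A_syn @ X_syn)
    loss = F.cross_entropy(logits, y_syn)
    g_syn = torch.autograd.grad(loss, tuple(model.parameters()), create_graph=True)
    match = sum(((g - gc.detach()) ** 2).sum() for g, gc in zip(g_syn, grads_avg))
    opt.zero_grad()
    match.backward()                  # gradient flows back into X_syn
    opt.step()
    return match.item()
```
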
Dual-Frequency Filtering Self-aware Graph Neural Networks for Homophilic and Heterophilic Graphs 2024-11-18
Show

Graph Neural Networks (GNNs) have excelled in handling graph-structured data, attracting significant research interest. However, two primary challenges have emerged: interference between topology and attributes distorting node representations, and the low-pass filtering nature of most GNNs leading to the oversight of valuable high-frequency information in graph signals. These issues are particularly pronounced in heterophilic graphs. To address these challenges, we propose Dual-Frequency Filtering Self-aware Graph Neural Networks (DFGNN). DFGNN integrates low-pass and high-pass filters to extract smooth and detailed topological features, using frequency-specific constraints to minimize noise and redundancy in the respective frequency bands. The model dynamically adjusts filtering ratios to accommodate both homophilic and heterophilic graphs. Furthermore, DFGNN mitigates interference by aligning topological and attribute representations through dynamic correspondences between their respective frequency bands, enhancing overall model performance and expressiveness. Extensive experiments conducted on benchmark datasets demonstrate that DFGNN outperforms state-of-the-art methods in classification performance, highlighting its effectiveness in handling both homophilic and heterophilic graphs.

11 pages, 17 figures
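
The low-/high-pass split itself is standard: with the symmetrically normalised adjacency A_hat, the product A_hat X keeps the smooth (low-frequency) part of the node signal and (I - A_hat) X keeps the detail (high-frequency) part. A tiny dual-branch sketch with an assumed learnable mixing ratio follows; DFGNN's frequency-specific constraints and topology/attribute alignment are not reproduced here.

```python
import torch
import torch.nn as nn

class DualFrequencyLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.low = nn.Linear(in_dim, out_dim)          # low-pass branch
        self.high = nn.Linear(in_dim, out_dim)         # high-pass branch
        self.alpha = nn.Parameter(torch.tensor(0.5))   # learned mixing ratio

    def forward(self, A, X):
        # Symmetrically normalised adjacency with self-loops: D^-1/2 (A + I) D^-1/2
        A_hat = A + torch.eye(A.size(0), device=A.device)
        d = A_hat.sum(dim=1)
        D_inv_sqrt = torch.diag(d.pow(-0.5))
        A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt
        low = self.low(A_norm @ X)                     # smooth component
        high = self.high(X - A_norm @ X)               # detail component
        a = torch.sigmoid(self.alpha)
        return a * low + (1 - a) * high
```
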
MEEG and AT-DGNN: Improving EEG Emotion Recognition with Music Introducing and Graph-based Learning 2024-11-18
Show

We present the MEEG dataset, a multi-modal collection of music-induced electroencephalogram (EEG) recordings designed to capture emotional responses to various musical stimuli across different valence and arousal levels. This public dataset facilitates an in-depth examination of brainwave patterns within musical contexts, providing a robust foundation for studying brain network topology during emotional processing. Leveraging the MEEG dataset, we introduce the Attention-based Temporal Learner with Dynamic Graph Neural Network (AT-DGNN), a novel framework for EEG-based emotion recognition. This model combines an attention mechanism with a dynamic graph neural network (DGNN) to capture intricate EEG dynamics. The AT-DGNN achieves state-of-the-art (SOTA) performance with an accuracy of 83.74% in arousal recognition and 86.01% in valence recognition, outperforming existing SOTA methods. Comparative analysis with traditional datasets, such as DEAP, further validates the model's effectiveness and underscores the potency of music as an emotional stimulus. This study advances graph-based learning methodology in brain-computer interfaces (BCI), significantly improving the accuracy of EEG-based emotion recognition. The MEEG dataset and source code are publicly available at https://github.com/xmh1011/AT-DGNN.

Stealing Training Graphs from Graph Neural Networks 2024-11-17
Show

Graph Neural Networks (GNNs) have shown promising results in modeling graphs in various tasks. The training of GNNs, especially on specialized tasks such as bioinformatics, demands extensive expert annotations, which are expensive and usually contain sensitive information of data providers. The trained GNN models are often shared for deployment in the real world. As neural networks can memorize the training samples, the model parameters of GNNs have a high risk of leaking private training data. Our theoretical analysis shows the strong connections between trained GNN parameters and the training graphs used, confirming the training graph leakage issue. However, explorations into training data leakage from trained GNNs are rather limited. Therefore, we investigate a novel problem of stealing graphs from trained GNNs. To obtain high-quality graphs that resemble the target training set, a graph diffusion model with diffusion noise optimization is deployed as a graph generator. Furthermore, we propose a selection method that effectively leverages GNN model parameters to identify training graphs from samples generated by the graph diffusion model. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed framework in stealing training graphs from the trained GNN.

To appear in KDD 2025

Robust Defense Against Extreme Grid Events Using Dual-Policy Reinforcement Learning Agents 2024-11-17
Show

Reinforcement learning (RL) agents are powerful tools for managing power grids. They use large amounts of data to inform their actions and receive rewards or penalties as feedback to learn favorable responses for the system. Once trained, these agents can efficiently make decisions that would be too computationally complex for a human operator. This ability is especially valuable in decarbonizing power networks, where the demand for RL agents is increasing. These agents are well suited to controlling grid actions, as the action space is constantly growing due to uncertainties in renewable generation, microgrid integration, and cybersecurity threats. To assess the efficacy of RL agents in response to an adverse grid event, we use the Grid2Op platform for agent training. We employ a proximal policy optimization (PPO) algorithm in conjunction with graph neural networks (GNNs). By simulating agents' responses to grid events, we assess their performance in avoiding grid failure for as long as possible. The performance of an agent is expressed concisely through its reward function, which helps the agent learn optimal ways to reconfigure the grid's topology amid such events. To model multi-actor scenarios that threaten modern power networks, particularly those resulting from cyberattacks, we integrate an opponent that acts iteratively against a given agent. This interplay between the RL agent and the opponent is utilized in N-k contingency screening, providing a novel alternative to traditional security assessment.

6 pages, 5 figures, submitted to the 2025 Texas Power and Energy Conference (TPEC)
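
For reference, the core of a PPO update is the clipped surrogate loss below; in this setting the log-probabilities would come from a GNN policy over the grid graph and the actions would be topology reconfigurations (the paper's reward shaping and opponent model are not reproduced).

```python
import torch

def ppo_clipped_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Standard PPO-Clip surrogate: limit how far the updated policy can
    move away from the policy that collected the data."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```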

Modularity aided consistent attributed graph clustering via coarsening 2024-11-17
Show

Graph clustering is an important unsupervised learning technique for partitioning graphs with attributes and detecting communities. However, current methods struggle to accurately capture true community structures and intra-cluster relations, be computationally efficient, and identify smaller communities. We address these challenges by integrating coarsening and modularity maximization, effectively leveraging both adjacency and node features to enhance clustering accuracy. We propose a loss function incorporating log-determinant, smoothness, and modularity components using a block majorization-minimization technique, resulting in superior clustering outcomes. The method is theoretically consistent under the Degree-Corrected Stochastic Block Model (DC-SBM), ensuring asymptotic error-free performance and complete label recovery. Our provably convergent and time-efficient algorithm seamlessly integrates with graph neural networks (GNNs) and variational graph autoencoders (VGAEs) to learn enhanced node features and deliver exceptional clustering performance. Extensive experiments on benchmark datasets demonstrate its superiority over existing state-of-the-art methods for both attributed and non-attributed graphs.

The first two authors contributed equally to this work
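
For reference, the modularity term for a soft assignment matrix H (one row per node, one column per cluster) is Q = tr(H^T B H) / (2m), with the modularity matrix B = A - d d^T / (2m). A small sketch follows; the paper's full objective also includes log-determinant and smoothness terms and a coarsening step, which are not shown.

```python
import numpy as np

def soft_modularity(A: np.ndarray, H: np.ndarray) -> float:
    """Modularity of a soft cluster assignment H (N x k, rows sum to 1)
    for a (possibly weighted) adjacency matrix A."""
    d = A.sum(axis=1)                      # node degrees
    two_m = d.sum()                        # 2 * number of (weighted) edges
    B = A - np.outer(d, d) / two_m         # modularity matrix
    return float(np.trace(H.T @ B @ H) / two_m)
```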
