You can search the table for keywords and sort by PI name or by whether the project relates to AI, Quantum, Computer Science, Data Science, or Digital Twins.

Be sure to reference the proposal ID number when applying, as many PIs have multiple proposals.

Click here for complete submission guidelines.

Click here to apply in InfoReady. 

PI (click to read full proposal) | Project Title (click to read full abstract) | AI | Q | CS | DS | DT
He, Ting (#29679)

Implementation of Optimized Resource Allocation Algorithms for Distributed LLM Serving

Large Language Models (LLMs) represent a transformative step toward Artificial General Intelligence (AGI), enabling breakthroughs across diverse domains such as healthcare, education, and scientific discovery. However, their deployment remains highly resource-intensive, requiring clusters of well-connected high-end GPUs that are prohibitively expensive for most organizations. This creates an urgent need for methods that democratize access to LLMs without compromising performance. A promising solution is PETALS, a recently proposed open-source system that enables distributed LLM serving by pooling computational resources from distributed nodes equipped with modest GPUs. PETALS significantly reduces the cost barrier by leveraging collaborative resource sharing, with broad applications in Edge AI scenarios where the LLM must be hosted on premises (e.g., due to privacy/policy requirements). Despite its potential, PETALS suffers from limitations in throughput and response time due to its simplistic resource allocation strategies, particularly in distributing transformer blocks across nodes and scheduling inference requests. Recent work in Dr. He’s group has addressed these challenges by developing analytical models of PETALS’ system behavior. These models enabled the development of heuristic algorithms for optimized block placement and request routing, achieving 60–80% reductions in response time in simulation studies. However, these algorithms have not yet been validated in real-world deployments, primarily due to PETALS’ lack of extensible interfaces for integrating advanced resource management strategies. This project proposes to bridge this gap by extending PETALS with a suite of APIs that enable seamless adoption of third-party resource allocation algorithms. The APIs will expose critical system metrics, including available GPU memory (in terms of transformer blocks, KV caches, and concurrent sessions), demand forecasts, remaining-time and total response time estimates, and queueing status at granular levels (client, node, and node chains hosting the full LLM). These enhancements will not only allow the integration of Dr. He’s algorithms into PETALS but also foster further innovations in distributed LLM serving. By enabling systematic resource management, this work will advance the efficiency and accessibility of LLMs, a cornerstone for scaling AI applications in research and industry.
X
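
For readers outside systems research, the sketch below illustrates the kind of information such APIs would expose and how a third-party placement heuristic could consume it. The field names and the greedy heuristic are hypothetical illustrations, not the PETALS interface or Dr. He's published algorithms.

```python
# Hypothetical sketch: illustrative metric fields and a toy placement heuristic,
# not the PETALS API or the proposal's optimized algorithms.
from dataclasses import dataclass

@dataclass
class NodeMetrics:
    node_id: str
    free_blocks: int              # transformer blocks that still fit in GPU memory
    queued_requests: int          # current queueing status at this node
    est_block_latency_ms: float   # forecast per-block processing latency

def place_blocks(num_blocks: int, nodes: list[NodeMetrics]) -> dict[str, list[int]]:
    """Greedily assign each transformer block to the node with the lowest
    estimated load, subject to its remaining memory budget."""
    budget = {n.node_id: n.free_blocks for n in nodes}
    load = {n.node_id: n.queued_requests * n.est_block_latency_ms for n in nodes}
    latency = {n.node_id: n.est_block_latency_ms for n in nodes}
    placement = {n.node_id: [] for n in nodes}
    for block in range(num_blocks):
        candidates = [nid for nid in budget if budget[nid] > 0]
        if not candidates:
            raise RuntimeError("not enough aggregate GPU memory for the model")
        best = min(candidates, key=lambda nid: load[nid])
        placement[best].append(block)
        budget[best] -= 1
        load[best] += latency[best]   # placing a block adds to that node's load
    return placement

nodes = [NodeMetrics("a", 12, 3, 40.0), NodeMetrics("b", 20, 1, 55.0)]
print(place_blocks(24, nodes))
```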
Hu, Xianbiao (XB) (#29748)

Robust Automated Driving in Adverse Weather: AI-Enabled Scene Understanding, Perception, and Control with Digital Twins



XX
Radice, David (#29857)

Development of a Scalable Multigrid Solver in AthenaK

We propose to develop a scalable multigrid solver for AthenaK, a high-performance compressible magnetohydrodynamics code designed for large-scale computational astrophysics and numerical relativity simulations. While AthenaK excels at solving hyperbolic conservation laws and has demonstrated exceptional scaling to 64,000 GPUs, many astrophysical and terrestrial applications require the solution of Poisson-like (elliptic) equations. These include self-gravity calculations essential for phenomena ranging from planet formation to cosmological structure formation, implicit flux-limited diffusion for radiation-hydrodynamics, and the conformally flat approximation to general relativity used in core-collapse supernovae simulations. We will implement a flexible multigrid method combined with Krylov subspace iterative solvers, exploring strategies optimized for GPU performance including integration with the algebraic multigrid solver MueLU from the Trilinos project. This development will leverage existing multigrid implementations from the legacy Athena++ code while adopting GPU-optimized algorithms to minimize kernel-launch latency. The resulting solver will expand AthenaK’s capabilities to address a broad range of astrophysical and terrestrial applications involving self-gravity, diffusive processes, and incompressible flows, while positioning AthenaK as a competitive platform for applications beyond the astronomy and astrophysics community.


X
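
As a point of reference for the numerics involved, the following self-contained Python sketch runs a geometric multigrid V-cycle (weighted-Jacobi smoothing, full-weighting restriction, linear prolongation) on a 1D Poisson problem. It is a textbook illustration of the method, not AthenaK or MueLU code.

```python
import numpy as np

def smooth(u, f, h, sweeps=3, omega=2/3):
    # weighted-Jacobi smoother for -u'' = f on a uniform grid, Dirichlet BCs
    for _ in range(sweeps):
        unew = u.copy()
        unew[1:-1] = (1 - omega) * u[1:-1] + omega * 0.5 * (u[:-2] + u[2:] + h**2 * f[1:-1])
        u = unew
    return u

def residual(u, f, h):
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / h**2
    return r

def v_cycle(u, f, h):
    n = len(u) - 1                       # number of intervals on this level
    if n <= 2:                           # coarsest level: one unknown, solve directly
        u[1:-1] = 0.5 * h**2 * f[1:-1]
        return u
    u = smooth(u, f, h)                  # pre-smoothing
    r = residual(u, f, h)
    rc = np.zeros(n // 2 + 1)            # full-weighting restriction to coarse grid
    rc[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]
    ec = v_cycle(np.zeros_like(rc), rc, 2 * h)
    e = np.zeros_like(u)                 # linear prolongation of the correction
    e[::2] = ec
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])
    return smooth(u + e, f, h)           # post-smoothing

n = 64
x = np.linspace(0.0, 1.0, n + 1)
h = 1.0 / n
f = np.pi**2 * np.sin(np.pi * x)         # exact solution is sin(pi x)
u = np.zeros(n + 1)
for it in range(10):
    u = v_cycle(u, f, h)
    print("cycle", it, "max residual", np.max(np.abs(residual(u, f, h))))
print("max error vs exact:", np.max(np.abs(u - np.sin(np.pi * x))))
```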
Zuo, Wangda (#29879)
Physics Informed Machine Learning and Transfer Learning for HVAC Controls in Data Centers

The exponential growth of artificial intelligence and cloud computing has accelerated the energy demand of high-performance computing (HPC) infrastructure. Data centers now account for a significant portion of global electricity usage, with cooling systems often consuming more than 40% of a facility’s total power. As server densities increase, traditional rule-based control strategies struggle to manage complex, non-linear thermal dynamics, while purely data-driven methods lack safety guarantees.  This project addresses these barriers by developing a Physics-Informed Machine Learning (PIML) framework accelerated by Transfer Learning. Unlike “black-box” AI models, PIML integrates fundamental thermodynamic laws directly into the training process, ensuring that control actions are physically consistent and safe. Furthermore, to avoid the prohibitive cost of retraining models for every scenario, this project will apply Transfer Learning techniques. This will allow control policies optimized in a source simulation to be efficiently adapted to varying IT load profiles or operational constraints, reducing data requirements and enabling robust, sustainable HPC operations.

XX
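
The sketch below shows, on a synthetic single-zone (1R1C) thermal model with made-up parameter values, how a physics-informed loss combines a data-mismatch term with an energy-balance residual so that physically inconsistent predictions are penalized even when they fit the sensors. In the proposed work the candidate trajectories would come from a neural surrogate or controller rather than fixed arrays.

```python
import numpy as np

# Illustrative 1R1C zone model with invented parameters (not a real data center).
rng = np.random.default_rng(0)
dt, steps = 60.0, 200                                          # 1-minute steps
t_out = 30.0 + 2.0 * np.sin(np.arange(steps) * dt / 3600.0)    # outdoor temp (C)
q_cool = 5.0 + rng.uniform(-1.0, 1.0, steps)                   # cooling power (kW)
R, C = 2.0, 2.0e4                                              # K/kW, kJ/K

t_true = np.empty(steps); t_true[0] = 26.0
for k in range(steps - 1):                                     # explicit Euler energy balance
    t_true[k + 1] = t_true[k] + dt * ((t_out[k] - t_true[k]) / (R * C) - q_cool[k] / C)
t_meas = t_true + rng.normal(0.0, 0.05, steps)                 # noisy sensor readings

def loss_terms(t_pred, lam=1e4):
    data = np.mean((t_pred - t_meas) ** 2)                     # fit to measurements
    dTdt = np.diff(t_pred) / dt                                # physics residual: the
    balance = (t_out[:-1] - t_pred[:-1]) / (R * C) - q_cool[:-1] / C   # prediction must obey
    phys = np.mean((dTdt - balance) ** 2)                      # the energy balance itself
    return data, phys, data + lam * phys

for name, cand in [("fit-the-noise", t_meas), ("physics-consistent", t_true)]:
    d, p, tot = loss_terms(cand)
    print(f"{name:>18}: data={d:.4f}  physics={p:.2e}  total={tot:.4f}")
```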
Ward, Charlotte (#29915)
Testing models for black hole accretion disk instabilities with simulation-based inference, deep learning, and Rubin Observatory light curves

We have entered an exciting era for time-domain astronomy and data science, with the Legacy Survey of Space and Time (LSST) at Rubin Observatory discovering millions of transient and variable objects each year. This proposal aims to develop the statistical methods needed to detect and interpret variability from changing-state active galactic nuclei (CSAGN), which are important probes of supermassive black hole (SMBH) growth. Recent general-relativistic magneto-hydrodynamic (GRMHD) simulations have shown that tearing of the inner accretion disk may result in the unusual variability properties of CSAGN. We aim to support a Rising Researcher to lead an interdisciplinary project between the astronomy and statistics departments in order to develop a simulation-based inference framework to a) estimate the parameters of a GRMHD simulator given realistic light curve data, and b) construct a classifier that can distinguish light curves arising from ‘changing-state events’ from typical light curves of active galactic nuclei. This will be achieved by implementing a normalizing-flow-based neural posterior estimation approach and a neural network classifier. The proposed work will form the foundation for larger proposals to build a real-time detection system for CSAGN that will benefit the broader Rubin community, and support a graduate student to undertake this interdisciplinary study in data science and machine learning.

XX
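
To make the simulation-based inference loop concrete, the toy sketch below infers the parameters of a damped-random-walk light-curve simulator with simple rejection ABC on summary statistics. The proposal's approach replaces this with normalizing-flow neural posterior estimation and a GRMHD simulator; the simulator, priors, and summaries here are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 365.0, 200))        # irregular observation times (days)

def simulate_drw(tau, sigma, times, rng):
    """Damped random walk (Ornstein-Uhlenbeck) light curve, a common toy model
    for AGN variability; exact conditional sampling between epochs."""
    flux = np.empty(len(times))
    flux[0] = rng.normal(0.0, sigma)
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        rho = np.exp(-dt / tau)
        flux[i] = rng.normal(rho * flux[i - 1], sigma * np.sqrt(1.0 - rho**2))
    return flux

def summaries(flux):
    # crude summary statistics: variance and lag-1 correlation
    return np.array([np.var(flux), np.corrcoef(flux[:-1], flux[1:])[0, 1]])

tau_true, sigma_true = 50.0, 0.3
s_obs = summaries(simulate_drw(tau_true, sigma_true, t, rng))   # "observed" light curve

n_sims, keep = 5000, 100
taus = rng.uniform(5.0, 200.0, n_sims)            # prior draws
sigmas = rng.uniform(0.05, 1.0, n_sims)
dist = np.empty(n_sims)
for i in range(n_sims):                            # simulate, then compare summaries
    s = summaries(simulate_drw(taus[i], sigmas[i], t, rng))
    dist[i] = np.linalg.norm((s - s_obs) / (np.abs(s_obs) + 1e-9))
idx = np.argsort(dist)[:keep]                      # accept the closest simulations
print("posterior mean tau   ~", round(taus[idx].mean(), 1), " (true", tau_true, ")")
print("posterior mean sigma ~", round(sigmas[idx].mean(), 3), " (true", sigma_true, ")")
```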
Zuo, Wangda (#29916)
Adaptive Building HVAC Control via Meta-Reinforcement Learning

Reinforcement learning (RL) has demonstrated strong potential for optimizing HVAC control strategies by simultaneously reducing energy consumption and improving indoor thermal comfort, outperforming conventional rule-based control approaches in many simulation-based studies. Despite these advantages, several critical challenges hinder the large-scale real-world deployment of RL-based HVAC controllers. RL training typically requires years of operational data to achieve stable and satisfactory performance. Moreover, the trial-and-error nature of RL training can result in suboptimal or unsafe control actions during the learning process, raising significant concerns for real building operations. In addition, buildings are inherently heterogeneous, which severely limits the adaptability of RL controllers trained for a single building context when transferred to new buildings or operating conditions. To address these challenges, this project aims to develop an adaptive building HVAC control framework based on meta-reinforcement learning (Meta-RL). Unlike standard RL, Meta-RL trains control policies over a distribution of building environments, enabling the learned controller to rapidly adapt to new environments using only limited additional training data. In this study, multiple task distributions will be systematically constructed to evaluate the feasibility of Meta-RL across different adaptation scenarios, including variations in weather conditions, occupancy patterns, and building configurations.

X
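
The sketch below illustrates the adaptation idea behind meta-learning using the Reptile update on a distribution of toy supervised "building response" tasks: a meta-learned initialization adapts to a new building from only a few samples. It is a simplified stand-in for the proposed Meta-RL pipeline, and all task parameters are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    """A toy 'building': indoor temperature response is linear in two features
    (outdoor temperature, HVAC power) with building-specific coefficients."""
    w = rng.normal([0.6, -1.2], 0.3)              # per-building true coefficients
    X = rng.normal(size=(64, 2))
    y = X @ w + rng.normal(0.0, 0.05, 64)
    return X, y

def adapt(theta, X, y, lr=0.05, steps=20):
    for _ in range(steps):                        # inner-loop SGD on one building
        grad = 2.0 * X.T @ (X @ theta - y) / len(y)
        theta = theta - lr * grad
    return theta

theta = np.zeros(2)                               # meta-initialization
meta_lr = 0.2
for _ in range(300):                              # Reptile outer loop over buildings
    X, y = sample_task()
    theta = theta + meta_lr * (adapt(theta, X, y) - theta)

# Compare few-shot adaptation to a new building from scratch vs from the meta-init.
X_new, y_new = sample_task()
few = slice(0, 8)                                 # only 8 samples from the new building
err_scratch = np.mean((X_new @ adapt(np.zeros(2), X_new[few], y_new[few]) - y_new) ** 2)
err_meta = np.mean((X_new @ adapt(theta, X_new[few], y_new[few]) - y_new) ** 2)
print(f"test MSE from scratch: {err_scratch:.3f}   from meta-init: {err_meta:.3f}")
```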
Silverman, Justin (#29934)
Causal Inference for Noisy Sequence Count Data

Across biomedicine, modern sequencing technologies—including microbiome profiling, single-cell RNA sequencing, and bulk gene expression assays—are routinely used to study how treatments, exposures, or environmental factors affect complex biological systems. These technologies generate multivariate count data that serve only as indirect, noisy measurements of biologically meaningful quantities such as microbial abundance, gene expression, or cellular activity. In addition to the intrinsic challenges posed by these data—compositional constraints, sparsity, and count uncertainty—analyses are further complicated by pervasive confounding. Large-scale meta-analyses have demonstrated that host and environmental covariates such as body mass index, age, or medication use can fully explain microbial differences previously attributed to disease, underscoring the need for principled causal methods in this domain. While causal inference methods based on the potential outcomes framework (POF) are widely used to address confounding in biomedical studies, they have not been adapted to the challenges of sequence count data. Existing approaches implicitly define potential outcomes on the observed count scale, even though treatments act on latent biological quantities—true microbial abundance or gene expression—rather than on the stochastic, technology-dependent counts produced by sequencing. This creates a fundamental mismatch between the scientific question and the statistical estimand: causal effects on observed counts conflate biological signal with sampling variability, sequencing depth, and other technical artifacts. Without modeling the latent potential outcomes, causal estimation performed on observed counts lacks a clear biological interpretation and can be misleading. The core objective of this project is to develop computational methods that explicitly define and estimate causal effects on latent potential outcomes, while rigorously accounting for uncertainty in the measurement process linking latent biology to observed sequence counts. The proposed approach will introduce joint models that couple latent biological states with stochastic observation models for sequencing data, enabling causal estimands to be defined on the latent biological scale rather than the observed count scale. Building on our recent advances in probabilistic modeling of count data and partially identified causal inference, this work will (i) formalize biologically meaningful causal estimands for latent outcomes, (ii) characterize identifiability and uncertainty under realistic sequencing and sampling models, and (iii) develop scalable algorithms—using both parametric and flexible nonparametric components—for estimating latent average treatment effects in high-dimensional settings. The methods will be evaluated using simulation studies and real microbiome, single-cell, and gene expression datasets to assess statistical performance, robustness to measurement noise, and practical scalability. This project will develop new statistical methods for biologically meaningful causal inference from sequencing data. More broadly, the resulting tools will be applicable across microbiome research, genomics, and other domains where causal questions must be answered using indirect, high-dimensional measurements. Central to this effort is the training of a graduate student, who will work at the intersection of causal inference, probabilistic modeling, and statistical computing to develop and implement these methods.
Through hands-on involvement in defining latent causal estimands, characterizing identifiability under realistic sequencing models, and building scalable computational algorithms, the student will gain rigorous training in modern causal methodology for complex biological data.

X
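
The following toy simulation illustrates the estimand mismatch the abstract describes: a treatment that changes only one taxon's latent abundance induces a nonzero "effect" on the observed counts of an unaffected taxon, through compositionality and varying sequencing depth, while the latent causal effect is zero. The abundances and multinomial observation model are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, taxa = 2000, 5                       # samples per arm, number of taxa

# Latent truth: treatment doubles taxon 0 only; taxon 1 is biologically unaffected.
base = np.array([5.0, 5.0, 4.0, 4.0, 3.0])            # log absolute abundance
effect = np.array([np.log(2.0), 0.0, 0.0, 0.0, 0.0])
lat_ctrl = rng.normal(base, 0.3, (n, taxa))
lat_trt = rng.normal(base + effect, 0.3, (n, taxa))

def sequence(latent):
    """Observation model: counts are multinomial draws whose probabilities are
    the (closed) relative abundances, at a library depth that varies per sample."""
    abundance = np.exp(latent)
    probs = abundance / abundance.sum(axis=1, keepdims=True)
    depth = rng.integers(5_000, 50_000, len(latent))
    return np.array([rng.multinomial(d, p) for d, p in zip(depth, probs)])

y_ctrl, y_trt = sequence(lat_ctrl), sequence(lat_trt)

taxon = 1   # the biologically unaffected taxon
print("naive ATE on observed counts :", y_trt[:, taxon].mean() - y_ctrl[:, taxon].mean())
print("ATE on latent log-abundance  :", lat_trt[:, taxon].mean() - lat_ctrl[:, taxon].mean())
# The count-scale "effect" is nonzero (driven by compositionality and depth),
# while the latent causal effect -- the estimand this project targets -- is ~0.
```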
Ward, Charlotte (#29938)
Joint analysis of ground and space-based imaging in the era of Rubin and Euclid: applying a GPU-accelerated, neural network-stabilized approach to modeling multi-resolution astronomical datasets

With Vera C. Rubin Observatory’s Legacy Survey of Space and Time (LSST) poised to discover tens of millions of variable and transient objects over the next decade, extracting accurate host galaxy properties for these sources presents a critical algorithmic challenge: ground-based imaging will suffer from significant galaxy blending due to the point spread function, with half of Rubin galaxies overlapping with nearby sources. This proposal aims to support a junior researcher to scale up and test Scarlet2, a GPU-compatible software package written in JAX that enables simultaneous modeling of multi-resolution imaging from ground-based (Rubin) and space-based (Euclid) telescopes, leveraging Euclid’s superior spatial resolution to improve galaxy property extraction from Rubin data. Scarlet2’s key advantages include GPU acceleration for computational efficiency, the ability to model complex non-parametric galaxy morphologies using either wavelet coefficients or score-based neural network priors trained on relevant galaxy populations, and simultaneous modeling of correlated noise from image alignment. Our specific objectives are to: (1) develop an optimal training set and train the neural network prior for AGN host galaxies, (2) create a scalable pipeline for extracting and jointly modeling overlapping Rubin-Euclid imaging, (3) benchmark the joint analysis approach against single-survey data products to validate performance and reliability, and (4) release Jupyter notebooks on the Rubin Science Platform demonstrating the methodology for broader community use. We will apply this pipeline to determine host galaxy properties of changing-look AGN, establishing the foundation for future large-scale cosmological applications.

XX
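
The sketch below is a 1D linear toy (not Scarlet2 or its API) showing why jointly fitting a sharp space-based observation together with a blurry ground-based one recovers a blended scene far better than the ground data alone; the same logic underlies the proposed Rubin-Euclid joint modeling.

```python
import numpy as np

rng = np.random.default_rng(0)
npix = 60
x = np.arange(npix)
# True scene: two blended sources (think AGN host plus a close neighbour)
scene = 3.0 * np.exp(-0.5 * ((x - 27) / 2.0) ** 2) + 2.0 * np.exp(-0.5 * ((x - 33) / 2.0) ** 2)

def blur_matrix(fwhm):
    # linear observation operator: convolution with a Gaussian PSF of given FWHM
    sigma = fwhm / 2.355
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / sigma) ** 2)
    return K / K.sum(axis=1, keepdims=True)

G = blur_matrix(8.0)    # ground-based PSF (poor seeing)
H = blur_matrix(1.5)    # space-based PSF (sharp)
y_ground = G @ scene + rng.normal(0, 0.02, npix)
y_space = H @ scene + rng.normal(0, 0.02, npix)

ridge = 1e-3 * np.eye(npix)             # small ridge prior keeps the inversion stable
def solve(A, y):
    return np.linalg.solve(A.T @ A + ridge, A.T @ y)

s_ground = solve(G, y_ground)                                               # ground only
s_joint = solve(np.vstack([G, H]), np.concatenate([y_ground, y_space]))     # joint fit

print("scene RMSE, ground-only fit:", round(float(np.sqrt(np.mean((s_ground - scene) ** 2))), 3))
print("scene RMSE, joint fit      :", round(float(np.sqrt(np.mean((s_joint - scene) ** 2))), 3))
```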
Emami-Meybodi, Hamid (#29940)
Time-Series Entropy-Aware Digital Twin for Gas-Lift Optimization in Unconventional Wells

Unconventional shale oil wells operated under gas lift exhibit strongly coupled, nonlinear, and time-varying multiphase flow behavior driven by evolving reservoir conditions, operating practices, and surface constraints. A central variable governing gas-lift effectiveness and well deliverability is flowing bottomhole pressure (FBHP). The proposed research aims to develop an AI-enabled, closed-loop workflow for dynamic gas-lift operations by coupling an entropy-aware FBHP digital twin with reinforcement learning-based control. The first component of the workflow focuses on FBHP prediction using an augmented Zentropy-Enhanced Neural Network (ZENN) that incorporates time-series data via Long Short-Term Memory (LSTM). The second component of the research integrates the ZENN-based FBHP digital twin with deep reinforcement learning to optimize field-wide gas-lift allocation. Specifically, Proximal Policy Optimization (PPO) will be employed to learn an injected-gas allocation policy that maximizes total oil production across a set of gas-lifted wells while satisfying shared gas-supply constraints and per-well operational limits. The expected outcome of this work is a prototype “digital twin + control” framework that (i) develops a time series-oriented AI model based on ZENN-LSTM to predict FBHP accurately and robustly under dynamic gas-lift conditions and (ii) identifies gas-allocation strategies that outperform baseline heuristics while respecting operational constraints.

Project Description: The Zentropy-Enhanced Neural Network (ZENN) [1] is a recently proposed thermodynamics-inspired, data-driven modeling framework that embeds zentropy theory into neural network architectures to explicitly account for data heterogeneity and intrinsic uncertainty. As illustrated in Fig. 1, unlike conventional ANNs [2], which implicitly assume data homogeneity and rely solely on empirical loss functions, ZENN simultaneously learns an energy term and an intrinsic entropy term for each latent configuration, integrating them via the Helmholtz energy formulation. In contrast to conventional ANNs, which often overfit and struggle with extrapolation, ZENN demonstrates superior stability and accuracy, especially in predicting high-order derivatives, due to its thermodynamically consistent architecture. These characteristics make ZENN particularly well suited to predicting FBHP during gas-lift operations, where production data originate from heterogeneous sources (rates, pressures, temperatures, injection conditions) and exhibit strong nonlinear behavior, especially in unconventional reservoirs, where well behavior can vary significantly across the same reservoir. However, the ZENN approach does not capture time-dependent relationships. This is where LSTM networks [3] can be utilized, as they capture nonlinear, time-dependent relationships in well and reservoir data. Their ability to learn both short- and long-term trends makes them effective for handling noisy, history-dependent pressure behavior in wells. This project aims to develop, for the first time, a time-series, entropy-aware, data-driven digital twin [1] by incorporating ZENN and LSTM for FBHP prediction in gas-lifted unconventional oil wells, integrating reinforcement learning [4] to optimize field-wide injected-gas allocation and maximize overall production, subject to operational constraints.
Proximal Policy Optimization (PPO) [5] will be employed as the reinforcement learning algorithm due to its training stability, sample efficiency, and suitability for continuous, constrained control problems typical of gas-lift operations. The research will leverage field data from 21 unconventional wells in the Permian Basin [6], including wellhead pressure, hydrocarbon production rates, gas and water rates, and gauge pressure measurements.

XX
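
As a concrete baseline of the kind the learned PPO policy would be compared against, the sketch below allocates a shared lift-gas supply across wells by greedy marginal-value allocation on hypothetical concave gas-lift performance curves (all well parameters invented).

```python
import numpy as np

# Hypothetical concave gas-lift performance curves: oil rate vs injected gas.
# a_i is the saturation rate per well, b_i controls how quickly extra gas stops helping.
a = np.array([420.0, 310.0, 500.0, 260.0])     # bbl/d asymptote per well
b = np.array([0.8, 1.4, 0.6, 1.0])             # per MMscf/d

def oil_rate(q):                                # q: injected gas per well (MMscf/d)
    return a * (1.0 - np.exp(-b * q))

def allocate(total_gas, step=0.01):
    """Greedy marginal-value allocation: repeatedly give the next increment of
    lift gas to the well with the highest marginal oil gain (optimal for
    independent concave response curves)."""
    q = np.zeros_like(a)
    for _ in range(int(total_gas / step)):
        marginal = oil_rate(q + step) - oil_rate(q)
        q[np.argmax(marginal)] += step
    return q

q_greedy = allocate(total_gas=6.0)
q_even = np.full_like(a, 6.0 / len(a))
print("greedy allocation (MMscf/d):", np.round(q_greedy, 2), " total oil:", round(oil_rate(q_greedy).sum(), 1))
print("even split        (MMscf/d):", q_even, " total oil:", round(oil_rate(q_even).sum(), 1))
```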
Maeng, Kiwan (#29942)
Building a Benchmark and Simulation Framework for Agentic AI Systems

Agentic AI—multiple AI agents interacting with one another and with external tools to solve complex problems—is becoming a new paradigm in artificial intelligence. Agentic AI imposes heavy demands on computational resources and exhibits unique system characteristics, calling for tailored system design and research from the larger system/hardware community. However, research on system optimization for agentic AI is hindered by two key challenges: (1) low experimental reproducibility stemming from the inherently stochastic nature of agent behaviors, and (2) the substantial computational resource requirements. To accelerate system research in this emerging domain, we propose to develop a benchmark and simulation methodology/framework for agentic AI. Our approach centers on collecting replayable traces of agent behaviors and building a simulation framework that can efficiently run the traces to evaluate the system overheads under varying system and hardware configurations. By decoupling workload behavior from underlying system execution and controlling each separately, our framework will enable rapid, scalable, and reproducible exploration of the system design space for agentic AI. Our framework will be open-sourced to the community for broader impacts.

X
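
The sketch below shows one possible shape of a replayable trace and of the decoupling the abstract describes: recorded agent and tool events are replayed under different hypothetical hardware profiles to estimate end-to-end latency. The schema, fields, and throughput numbers are illustrative, not the proposed framework's format.

```python
from dataclasses import dataclass

@dataclass
class TraceEvent:
    agent: str
    kind: str          # "llm_call" or "tool_call"
    prompt_tokens: int
    output_tokens: int
    depends_on: int    # index of the event that must finish first (-1 for none)

# A tiny replayable trace: planner LLM call -> tool call -> worker LLM call.
trace = [
    TraceEvent("planner", "llm_call", 900, 250, -1),
    TraceEvent("planner", "tool_call", 0, 0, 0),
    TraceEvent("worker", "llm_call", 1400, 600, 1),
]

def replay(trace, prefill_tps, decode_tps, tool_latency_s):
    """Replay a recorded trace under a hypothetical system configuration and
    return the completion time of every event (simple dependency chain)."""
    finish = []
    for ev in trace:
        start = finish[ev.depends_on] if ev.depends_on >= 0 else 0.0
        if ev.kind == "llm_call":
            dur = ev.prompt_tokens / prefill_tps + ev.output_tokens / decode_tps
        else:
            dur = tool_latency_s
        finish.append(start + dur)
    return finish

for name, prefill, decode, tool in [("datacenter-GPU-like", 8000, 60, 0.4),
                                    ("edge-GPU-like", 1500, 15, 0.4)]:
    print(f"{name:>20}: end-to-end {replay(trace, prefill, decode, tool)[-1]:.1f} s")
```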
Zhang, Rui (#29946)
Gene Set Function Discovery by Harnessing Large Language Models

Scientific Aims. Gene set function discovery identifies the biological roles of gene groups, but traditional enrichment approaches are limited to known functions in reference databases, hindering the discovery of novel mechanisms for understudied genes. This project develops an innovative framework harnessing large language models (LLMs) for gene set function discovery through three interdependent tasks: (1) a knowledge-grounded retrieval system using heterogeneous genomics knowledge graphs to mitigate LLM hallucination, (2) an adaptive LLM pipeline for hypothesis generation with reinforcement learning fine-tuning to incentivize biological reasoning, and (3) a composite confidence score combining similarity metrics, LLM predictions, and literature evidence for reliable uncertainty quantification.

Intellectual Merit. Our innovation lies in seamlessly integrating expert genomics knowledge with LLM reasoning capabilities through a knowledge-grounded, reasoning-rich, and uncertainty-aware framework. From a biological perspective, we accelerate hypothesis generation by overcoming the limitations of traditional statistical methods. From an AI perspective, we contribute novel solutions including heterogeneous genetic knowledge graphs, reinforcement learning for biological reasoning, and normalized pointwise mutual information for confidence estimation based on literature mining.

Broader Impacts. This research will revolutionize functional genomics by enabling the discovery of novel biological mechanisms not captured in existing databases, particularly for understudied gene sets from cutting-edge omics experiments. Our framework will significantly impact drug discovery and personalized medicine by helping researchers identify therapeutic targets and understand disease mechanisms, potentially accelerating treatment development for rare diseases and complex disorders.

X
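
The confidence-scoring ingredient mentioned above, normalized pointwise mutual information, reduces to a few lines; the sketch below computes it from hypothetical literature co-occurrence counts.

```python
import math

def npmi(n_together, n_x, n_y, n_total):
    """Normalized pointwise mutual information of two terms (e.g., a gene set
    and a candidate function) from literature co-occurrence counts.
    Returns a score in [-1, 1]; 1 means the terms only ever occur together."""
    p_xy = n_together / n_total
    p_x, p_y = n_x / n_total, n_y / n_total
    if p_xy == 0.0:
        return -1.0
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / (-math.log(p_xy))

# Hypothetical counts from a corpus of 1,000,000 abstracts
print(npmi(n_together=120, n_x=400, n_y=900, n_total=1_000_000))   # strong association
print(npmi(n_together=1, n_x=400, n_y=900, n_total=1_000_000))     # near independence
```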
Li, Xiantao (#29969)
Quantum-Accelerated State Estimation: Breaking the Covariance Bottleneck in High-Dimensional Systems

State estimation—inferring the internal state of a system from noisy sensor data—is the backbone of modern control theory, used in everything from tracking satellites to monitoring power grids. The Kalman Filter (KF) is the optimal estimator for linear systems, but it faces a severe scaling limit in high dimension. The Ensemble Kalman Filter (EnKF) approximates covariance using Monte Carlo samples, but it introduces sampling noise, requires managing a larger number of trajectories, and can under-resolve low-probability but high-impact events that are critical for safety and reliability. We propose a fundamentally new approach grounded in the PI’s expertise in simulating open quantum systems via ensembles of quantum trajectories. The PI’s recent unitary-dilation framework for SDEs maps the evolution of covariance-relevant statistics to a deterministic Lindblad master equation on a dilated quantum processor, enabling direct estimation of quadratic uncertainty measures, which is particularly efficient for high-dimensional systems with sparse observations.

XXX
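
For context, the sketch below is the textbook linear Kalman filter on a toy constant-velocity tracking problem; the covariance propagation in the update step is exactly the object whose cost the proposal targets in high dimension.

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])     # constant-velocity dynamics
H = np.array([[1.0, 0.0]])                # we only measure position
Q = 1e-3 * np.eye(2)                      # process noise covariance
R = np.array([[0.25]])                    # measurement noise covariance

# simulate a trajectory and noisy position measurements
x_true = np.array([0.0, 1.0])
xs, zs = [], []
for _ in range(50):
    x_true = F @ x_true + rng.multivariate_normal(np.zeros(2), Q)
    xs.append(x_true.copy())
    zs.append(H @ x_true + rng.normal(0.0, np.sqrt(R[0, 0]), 1))

x, P = np.array([0.0, 0.0]), np.eye(2)    # initial state estimate and covariance
for z in zs:
    # predict
    x = F @ x
    P = F @ P @ F.T + Q
    # update: maintaining P is what becomes the bottleneck as the state dimension grows
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(2) - K @ H) @ P

print("final position estimate:", round(x[0], 3), " true:", round(xs[-1][0], 3))
```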
Li, Xiantao (#29970)
Quantum Algorithms for Non-Hermitian Materials and Topological Phases

A ground or thermal state of a quantum system can be reached via imaginary time evolution, which is intrinsically non-unitary and can be viewed through an effective non-Hermitian generator. In addition, many emerging materials, ranging from driven-dissipative lattices to non-Hermitian topological systems, are governed by effective non-unitary dynamics. These models exhibit distinctive physical signatures, including non-Hermitian topological phases and the skin effect, but they can be difficult to simulate reliably at scale. We propose a dilation-based strategy that renders non-unitary evolution compatible with unitary quantum circuits while remaining hardware-aware. The project will deliver: (1) a prototype implementation of dilation-based ground-state preparation using quantum imaginary time evolution on digital quantum platforms; and (2) preliminary studies of non-Hermitian topological phases and the non-Hermitian skin effect, on IBM and IonQ quantum computers.

XX
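
The sketch below classically emulates the core idea on a 3-site transverse-field Ising chain: repeated application of the non-unitary propagator exp(-tau H), followed by renormalization, converges to the ground state. On quantum hardware this non-unitary step is what a dilation construction realizes; the model and parameters here are illustrative.

```python
import numpy as np

sx = np.array([[0.0, 1.0], [1.0, 0.0]])
sz = np.array([[1.0, 0.0], [0.0, -1.0]])
I2 = np.eye(2)

def kron3(a, b, c):
    return np.kron(a, np.kron(b, c))

# Transverse-field Ising chain on 3 sites (open boundary), illustrative parameters.
J, h = 1.0, 0.7
H = (-J * (kron3(sz, sz, I2) + kron3(I2, sz, sz))
     - h * (kron3(sx, I2, I2) + kron3(I2, sx, I2) + kron3(I2, I2, sx)))

# Imaginary-time propagator exp(-tau*H): non-unitary, so on hardware it must be
# embedded in a larger unitary (dilation); classically we apply it directly.
tau, steps = 0.1, 200
evals, evecs = np.linalg.eigh(H)
prop = evecs @ np.diag(np.exp(-tau * evals)) @ evecs.T

psi = np.ones(8) / np.sqrt(8.0)       # arbitrary initial state with ground-state overlap
for _ in range(steps):
    psi = prop @ psi
    psi /= np.linalg.norm(psi)        # renormalize after the non-unitary step

print("variational energy :", psi @ H @ psi)
print("exact ground energy:", evals[0])
```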
Khan, Arslan (#29980)
Behavior-Centric High Fidelity Sensor Modeling for Robust Digital Twins

Digital twins are critical for validating Cyber-Physical Systems (CPS), yet they predominantly rely on idealized sensor models that fail to capture the complex, firmware-driven behaviors of modern sensors. This emulation gap is often so severe that the code running on the digital twin frequently utilizes a fundamentally different software stack than the physical system, causing the results of the emulation to diverge significantly from real-world results. To address this, we propose a behavior-centric sensor modeling framework that bridges the divide between physical reality and digital simulation. By moving beyond static noise parameters, we utilize hybrid system identification to construct probabilistic state machines that represent sensor behaviors, such as bursty latency, mode switching, and correlated drift, as abstract classes rather than specific device instances. We further introduce a continuous validation pipeline using statistical distance metrics (e.g., Wasserstein distance) to detect distributional mismatch between simulated and physical streams in real time. The result is a reusable ontology of sensor behaviors that allows researchers to run the exact same software stack in simulation as in deployment, ensuring robust validation against realistic operational uncertainties. Students will work on our lab’s in-house simulator, BoardRunner, developed with the DARPA FIRE program, to extend the capabilities of our framework.

X
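
The validation idea is easy to make concrete: the sketch below compares simulated and "physical" sensor-latency samples with an empirical 1-Wasserstein distance (approximated from matched quantiles). The latency distributions are invented; the point is that an idealized sensor model is flagged while a bursty behavioral model is not.

```python
import numpy as np

def wasserstein_1d(a, b, n_quantiles=201):
    """Approximate 1-Wasserstein distance between two samples by averaging the
    absolute differences of their quantile functions."""
    qs = np.linspace(0.0, 1.0, n_quantiles)
    return np.mean(np.abs(np.quantile(a, qs) - np.quantile(b, qs)))

rng = np.random.default_rng(0)
# Hypothetical sensor read latencies (ms): the physical sensor occasionally
# bursts onto a slow firmware path; the idealized simulator never does.
physical = np.where(rng.random(5000) < 0.1, rng.normal(40, 5, 5000), rng.normal(5, 1, 5000))
ideal_sim = rng.normal(5, 1, 5000)
bursty_sim = np.where(rng.random(5000) < 0.1, rng.normal(40, 5, 5000), rng.normal(5, 1, 5000))

print("ideal model  vs physical:", round(wasserstein_1d(ideal_sim, physical), 2), "ms")
print("bursty model vs physical:", round(wasserstein_1d(bursty_sim, physical), 2), "ms")
```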
Iadecola, Thomas (#29992)
Transport and information dynamics of kinetically constrained quantum many-body systems

Kinetic constraints arise naturally in strongly interacting systems, where forbidden local rearrangements force transport of particles or energy to occur through collective motion, leading to slow dynamics as seen in glassy materials. In quantum many-body systems, such constraints give rise to an even richer phenomenology due to phase coherence and interference effects. While these systems remain poorly understood, growing evidence suggests they may support robust quantum information storage and processing. Their intrinsic complexity also makes them natural targets for study using emerging quantum simulation platforms, such as Rydberg atoms in optical tweezers.

In this project, we will study a quantum many-body system with “tower of Hanoi” kinetic constraints, which can be naturally implemented in Rydberg atom quantum simulators. The system consists of two species of hard-core particles, a mobile “light” species whose configuration constrains the motion of a “heavy” species. We will investigate three key aspects of the resulting dynamics. First, we will characterize particle transport, focusing on how the heavy-particle dynamics slow with increasing light-particle density and determining whether transport is diffusive or anomalous. Second, we will analyze the system’s Hilbert space as a graph of allowed transitions, identifying interference-based motifs that may produce localized eigenstates via Aharonov–Bohm caging, a distinctly quantum phenomenon accessible in experiments. Finally, we will study energy transport and entanglement dynamics, which are expected to exhibit behavior beyond classical intuition due to the non-commutativity of local energy operators.

This work will be closely connected to ongoing discussions with the experimental group of Bryce Gadway (PSU Physics), whose Rydberg atom quantum simulator can realize these constraints. Our results will help guide and interpret early experiments on this platform.

X
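
The "Hilbert space as a graph" analysis can be prototyped in a few lines: the sketch below enumerates hard-core particle configurations on a short chain, applies an illustrative facilitation constraint (not the project's actual "tower of Hanoi" rules), and counts the dynamically disconnected sectors of the resulting transition graph.

```python
from itertools import combinations

L, N = 8, 3          # sites on an open chain, hard-core particles

def configs(L, N):
    return [frozenset(c) for c in combinations(range(L), N)]

def allowed_moves(conf, L):
    """Facilitated hops under an illustrative kinetic constraint: a particle may
    hop to an adjacent empty site only if another particle sits next to either
    end of the bond it hops across."""
    out = []
    for i in conf:
        for j in (i - 1, i + 1):
            if 0 <= j < L and j not in conf:
                neighbors = {i - 1, i + 1, j - 1, j + 1} - {i, j}
                if any(0 <= k < L and k in conf for k in neighbors):
                    out.append(conf - {i} | {j})
    return out

nodes = configs(L, N)
adj = {c: allowed_moves(c, L) for c in nodes}

# connected components of the transition graph = dynamically disconnected sectors
seen, sectors = set(), 0
for c in nodes:
    if c in seen:
        continue
    sectors += 1
    stack = [c]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        stack.extend(adj[u])

print(f"{len(nodes)} configurations split into {sectors} dynamical sectors")
```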
Hao, Wenrui (#29993)
Learning Patterns and Structures in Partial Differential Equations via Large Language Models

Partial differential equations (PDEs) underpin many digital twin models used to represent complex spatiotemporal processes in biology, physics, and engineering. In applications such as disease progression and treatment response, PDE-based models encode how system states evolve and form patterns over time and space. Identifying stable solution patterns, such as regimes in which no drug intervention is required for a particular patient, is critical for digital twin design, validation, and control, yet remains computationally expensive and difficult to explore using traditional modeling workflows alone. Conventional numerical approaches, including time integration, stability analysis, and bifurcation-based solvers, have been effective for simulating PDE models but often do not scale well when models must be personalized, parameter spaces become high-dimensional, or rapid decision-making is required. These limitations pose significant challenges for AI-driven digital twins, where models are queried repeatedly to support prediction, uncertainty quantification, and intervention planning.

This project aims to harness Large Language Models (LLMs) as AI engines that augment PDE-based digital twins, with a focus on reaction–diffusion systems and pattern formation problems. Rather than treating PDE solvers as black boxes, we propose an AI-integrated framework in which LLMs learn from simulation data, prior model evaluations, and domain knowledge to identify patterns, summarize system behaviors, and guide efficient exploration of model parameters. The overarching goal is to enable faster, more interpretable, and more adaptive digital twin workflows that support real-time analysis and control.

Using canonical models such as the Gray–Scott reaction–diffusion system as testbeds, the project will investigate how LLMs can (i) automatically recognize and label qualitative pattern regimes (e.g., spots, stripes, mixed, or chaotic structures), (ii) learn mappings between model parameters and clinically or physically meaningful outcomes such as stability or multistability, and (iii) recommend promising parameter regions for targeted simulation where transitions or multiple steady states are likely to occur. By learning from accumulated simulations and historical model evaluations, LLMs can substantially reduce the need for exhaustive parameter sweeps.

Beyond pattern recognition, the project will explore how LLMs can organize, interpret, and communicate large collections of PDE solutions, transforming raw simulation outputs into actionable insights for digital twin deployment. This includes identifying robust pattern regimes, detecting early-warning signals of transitions, and supporting hypothesis generation for personalized intervention strategies. All AI-generated insights will be validated through targeted numerical simulations to ensure reliability and trustworthiness.

PI Hao brings expertise in nonlinear PDE–based pattern formation, AI-driven modeling, and digital twin development, while Co-PI Yin is an expert in large language models and AI systems. The two PIs have a strong history of long-term collaboration, providing a solid foundation for this interdisciplinary project. Rising Researchers, including a graduate student from Co-PI Yin’s group and a postdoctoral researcher from PI Hao’s group, will contribute to LLM-enhanced digital twin frameworks, AI-guided PDE simulation pipelines, pattern discovery and representation learning, and human–AI interaction for scientific modeling. The project sits at the intersection of artificial intelligence, digital twins, and scientific computing, with strong potential impact on personalized medicine, biological modeling, and AI-driven decision support systems.

X
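
For reference, the Gray–Scott testbed mentioned above fits in a short script; the sketch below integrates the system with standard illustrative parameters and reports a crude pattern statistic of the kind an LLM-guided workflow might label or reason over.

```python
import numpy as np

def laplacian(z):
    # periodic 5-point Laplacian
    return (np.roll(z, 1, 0) + np.roll(z, -1, 0) +
            np.roll(z, 1, 1) + np.roll(z, -1, 1) - 4.0 * z)

def gray_scott(F, k, n=96, steps=5000, Du=0.16, Dv=0.08, dt=1.0):
    """Integrate du/dt = Du*lap(u) - u*v^2 + F*(1-u), dv/dt = Dv*lap(v) + u*v^2 - (F+k)*v
    and return a crude pattern summary (variance of v)."""
    rng = np.random.default_rng(0)
    u = np.ones((n, n))
    v = np.zeros((n, n))
    u[40:56, 40:56], v[40:56, 40:56] = 0.5, 0.25          # seed a perturbation
    v += 0.01 * rng.random((n, n))
    for _ in range(steps):
        uvv = u * v * v
        u += dt * (Du * laplacian(u) - uvv + F * (1.0 - u))
        v += dt * (Dv * laplacian(v) + uvv - (F + k) * v)
    return float(np.var(v))

# two parameter sets from qualitatively different regimes (spot-forming vs decaying)
print("F=0.035, k=0.065 -> var(v) =", round(gray_scott(0.035, 0.065), 4))
print("F=0.035, k=0.090 -> var(v) =", round(gray_scott(0.035, 0.090), 4))
```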
Emami-Meybodi, Hamid (#30008)
Learning-Accelerated Modeling and Optimization of In-Situ H2 Production and CO2 Storage

Abstract: In-situ hydrogen (H2) production through engineered water-rock reactions in mafic and ultramafic formations (“orange hydrogen”) offers a promising low-carbon energy pathway, particularly when coupled with permanent CO2 sequestration via mineralization. These systems involve tightly coupled multiphase flow and geochemical reactions, making prediction and optimization highly challenging. High-resolution reactive transport models can resolve these processes, but are computationally prohibitive for uncertainty quantification and operational optimization. The proposed research aims to develop a physics-informed deep-learning framework to enable rapid, high-resolution prediction and optimization of coupled in-situ H2 production and CO2 mineralization. The research consists of two integrated components. First, a surrogate model is developed to learn the solution operator of the multiphase reactive transport system, enabling fast, generalizable predictions of pressure evolution, gas saturation, mineralization, and H2 generation. Second, the surrogate is embedded within an optimization workflow to identify operational strategies that maximize H2 production and CO2 storage while minimizing pressure buildup. This deep-learning-accelerated framework provides a scalable tool for evaluating and designing coupled H2 production and CO2 sequestration systems, advancing the development of low-carbon subsurface energy technologies.

Project Description: Recent advances in deep learning in geoscience applications have enabled rapid prediction of multiphase subsurface flow [1]. In particular, Fourier Neural Operators (FNO) [2] have emerged as a powerful approach for learning solution operators of partial differential equations, offering high-resolution generalization and efficient prediction across heterogeneous domains (see Fig. 1) [3]. Compared with traditional Convolutional Neural Network (CNN) surrogates, which perform local image-to-image regression, FNO better represents global pressure propagation and has superior data utilization efficiency. In this project, both FNO- and CNN-based surrogate models are developed and systematically compared to quantify their relative strengths in predicting coupled multiphase flow, gas generation, and geochemical reactions. Once trained, the surrogate models serve as fast forward engines for optimization and uncertainty analysis, replacing computationally expensive numerical solvers. The surrogate models enable large-scale parametric studies and multi-objective optimization to identify operational strategies that balance competing performance metrics [4, 5]. Specifically, the optimization framework targets maximizing H2 production and CO2 mineralization efficiency while minimizing pressure buildup. This learning-accelerated workflow enables thousands to millions of model evaluations that would otherwise be infeasible using high-fidelity simulations. This project aims to develop a physics-informed prediction and optimization framework for coupled in-situ H2 production and CO2 mineralization. Training data for surrogate model development are generated from 3D multiphase reactive transport simulations implemented in the MOOSE framework [6], which resolve fully coupled multiphase flow and geochemical reactions under realistic geological heterogeneity and operational conditions.

References
[1] Tahmasebi, P., et al. Machine learning in geo- and environmental sciences: From small to large scale. Advances in Water Resources, 2020. 142: 103619.
[2] Wen, G., et al. U-FNO – An enhanced Fourier neural operator-based deep-learning model for multiphase flow. Advances in Water Resources, 2022. 163: 104180.
[3] Wen, G., et al. Real-time high-resolution CO2 geological storage prediction using nested Fourier neural operators. Energy & Environmental Science, 2023. 16(4): 1732-1741.
[4] Ma, M., Zhang, Q., and Emami-Meybodi, H. A proxy-based workflow for screening and optimizing cyclic CO2 injection in shale reservoirs. Fuel, 2025.
[5] Bocoum, A.O. and Rasaei, M.R. Multi-objective optimization of WAG injection using machine learning and data-driven proxy models. Applied Energy, 2023. 349: 121593.
[6] Gaston, D., Newman, C., Hansen, G., Lebrun-Grandié. MOOSE: A parallel computational framework for coupled systems of nonlinear equations. Nuclear Engineering and Design, 2009. 239(10): 1768-1778.

XX
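
The core FNO building block, a spectral convolution, is compact; the sketch below shows a single-channel 1D forward pass (FFT, truncate to the lowest modes, multiply by learnable complex weights, inverse FFT). A real FNO adds per-mode channel-mixing weights, pointwise nonlinearities, and training in a deep-learning framework; this is an illustration only.

```python
import numpy as np

def spectral_conv_1d(x, weights, modes):
    """Forward pass of a single 1D Fourier layer: transform to Fourier space,
    keep only the lowest `modes` frequencies, scale them by (learnable) complex
    weights, and transform back."""
    x_hat = np.fft.rfft(x)                     # to Fourier space
    out_hat = np.zeros_like(x_hat)
    out_hat[:modes] = weights * x_hat[:modes]  # act only on retained modes
    return np.fft.irfft(out_hat, n=len(x))     # back to physical space

rng = np.random.default_rng(0)
n, modes = 256, 16
x = np.sin(2 * np.pi * np.linspace(0.0, 1.0, n)) + 0.1 * rng.standard_normal(n)
weights = rng.standard_normal(modes) + 1j * rng.standard_normal(modes)  # would be trained
y = spectral_conv_1d(x, weights, modes)
print(y.shape, "- a globally coupled, resolution-independent transform of the input")
```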
Liu, Chaoxing (#30013)
Discovering Emergent Quantum Matter in Ideal Topological Flat Bands with Neural Networks

Topological flat bands (TFBs) have sparked intense research interest due to their potential to realize exotic correlated physics, including fractional Chern insulators (FCIs) and superconductivity. The PI has recently proposed a new design principle to realize TFBs in two-dimensional moiré heterostructures with type-II band alignment. Crucially, these predicted bands possess “ideal quantum geometry,” a condition theoretically argued to promote interaction-driven topological phases. However, the full interacting phase diagram of these systems remains unexplored because strong Coulomb interactions could induce significant inter-band mixing, which is challenging to treat with standard numerical methods such as band-projected exact diagonalization. To overcome this, we propose leveraging the variational Monte Carlo method, combined with a transformer-based neural-network representation of the many-body wavefunction, to map the many-body phase diagram of ideal TFBs. This unbiased numerical approach will allow us to explore correlated phenomena in two-dimensional moiré heterostructures — such as FCIs, Kondo physics, and superconductivity — beyond the system-size limitations of band-projected exact diagonalization. Planned activities include: (1) solving the many-body ground state using exact diagonalization for small systems in the limit of negligible inter-band mixing; (2) solving the many-body ground state with the variational Monte Carlo method and transformer-based neural networks for small systems, and comparing against the exact diagonalization solution to validate the numerical approach; (3) applying the neural network approach to larger systems to explore the interacting phase diagram and enable reliable extrapolation toward the thermodynamic limit; and (4) applying the neural network approach to realistic moiré models for two-dimensional moiré heterostructures with type-II band alignment to optimize model parameters for the fractional Chern insulator phase.

XX
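
The variational Monte Carlo loop itself is simple; the sketch below applies it to the textbook 1D harmonic oscillator with a Gaussian trial wavefunction (Metropolis sampling of |psi|^2, averaging the local energy). In the proposed work the Gaussian ansatz is replaced by a transformer-based neural wavefunction for the moiré models.

```python
import numpy as np

def vmc_energy(alpha, n_samples=200_000, step=1.0, seed=0):
    """Variational Monte Carlo for a 1D harmonic oscillator (hbar = m = omega = 1)
    with trial wavefunction psi(x) = exp(-alpha x^2)."""
    rng = np.random.default_rng(seed)
    x = 0.0
    energies = []
    for _ in range(n_samples):
        x_new = x + rng.uniform(-step, step)
        # Metropolis acceptance on |psi|^2
        if rng.random() < np.exp(-2.0 * alpha * (x_new**2 - x**2)):
            x = x_new
        # local energy E_L = -(1/2) psi''/psi + (1/2) x^2
        energies.append(alpha + x**2 * (0.5 - 2.0 * alpha**2))
    return np.mean(energies[n_samples // 10:])    # discard burn-in

for alpha in (0.3, 0.5, 0.8):
    print(f"alpha={alpha:.1f}  <E> ~ {vmc_energy(alpha):.4f}")
# the variational minimum is at alpha = 0.5, where <E> equals the exact ground-state energy 0.5
```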
Obringer, Renee (#30020)
AI-Driven Risk Analysis for Future Utility-Scale Renewable Energy Integration

Renewable energy is being rapidly adopted around the world as a means of decarbonizing the electricity sector, while also providing adequate supply for growing populations and end-uses. However, these energy sources are susceptible to changes in the surrounding environmental conditions, which can reduce their reliability in the face of extreme events. Further, the plans for large-scale renewable energy integration align with intensifying climate change, which is likely to lead to changes in the known patterns and frequencies of extreme events, possibly leading to unexpected impacts on the electricity sector. Given this potential feedback loop between climate change and one of our primary means of reducing emissions, there is an urgent need to understand how renewable energy systems respond to extreme weather events, which are likely to increase in intensity and frequency in the next few decades. Despite acknowledgement in the research community that this need exists, there is limited movement to integrate these impacts into existing energy system models used in practice, which often require extensive calibration of individual systems. There is a need to expand the capabilities of these models to provide a more representative view of how renewable energy technologies might behave in the future. This research aims to leverage explainable AI to fill these gaps in the understanding of weather-related impacts on renewable energy systems, which will, in turn, facilitate more proactive resilience planning within the energy sector.

Specifically, the goal of this project will be to develop a probabilistic AI-driven framework to predict renewable energy potential under various extreme weather events and analyze the risk of supply inadequacies due to those events. Under the mentorship of the PI, the graduate student will integrate several AI algorithms into an ensemble framework that can predict the renewable energy generation under specific extreme weather events, such as droughts and heatwaves. The study will leverage publicly available data, as well as simulated weather data to generate rare, but plausible events for the analysis. This work will build off the PI’s preliminary analysis considering average long-term changes to the renewable energy sector to further explore the unique relationship between extreme weather and renewable energy, now and in the future. Ultimately, this proposal aims to develop an AI-driven framework that can facilitate proactive resilience planning within the electricity sector. Further, this framework will be generalizable such that it can be easily transferred to other sectors and events for future analysis.

XX
Li, Bin (#30022)
Scalable Computational Frameworks for Large-Scale Digital Twins: A Heterogeneous Mean-Field Approach

Digital Twins (DT) are evolving rapidly from isolated, high-fidelity replicas of single assets (e.g., a single turbine or robot) into massive, interconnected systems. For example, a DT-enabled intelligent traffic system requires the simultaneous coordination of thousands of interacting agents (e.g., operating vehicles and traffic facilities such as traffic signals and traffic cameras). However, a critical gap exists in the current DT literature. The vast majority of existing work focuses on application-level system design and scenario visualization (e.g., high-fidelity graphical rendering in game engines). These application-centric studies often rely on idealized communication assumptions, presuming unlimited bandwidth, zero latency, and perfect synchronization. This assumption is fundamentally unrealistic for large-scale deployments. In practice, synchronizing such large-scale systems triggers the “Curse of Dimensionality”. As the number of physical entities grows, the communication and computational resources required to model interactions and maintain physical-virtual coherence grow exponentially. Current centralized algorithms, which attempt to track every individual entity, suffer from high complexity, leading to prohibitive resource consumption that violates real-world network constraints. This overhead causes unacceptable latency, where the “Twin World” lags behind the physical system, breaking the real-time bi-directional feedback loop essential for safety-critical control. Consequently, standard “application-first” methods become computationally intractable for large-scale systems, highlighting an urgent need for resource-efficient, theoretically grounded synchronization strategies that account for realistic physical constraints.

X
Li, Bin (#30023)
Generative Artificial Intelligence for Adaptive and Scalable Extended Reality

This project addresses fundamental bottlenecks that limit the widespread adoption of Extended Reality (XR) by everyday users. Current XR systems face two primary obstacles: (1) the high cost and complexity of authoring high-fidelity 3D content, which typically requires domain expertise in 3D modeling, animation, and programming; and (2) a steep learning curve driven by non-intuitive interaction methods, such as handheld controllers, scripted gestures, or rigid voice commands. While XR (VR, AR, and MR) offers transformative potential across applications like VR-based education, AR-based assistance, and MR-based training, scaling these experiences to meet diverse user needs remains impractical under manual workflows. Building upon our research in the paper “When Generative AI Meets Extended Reality: Enabling Scalable and Natural Interactions”, this project focuses on Context-Aware Embodied AI. In particular, we focus on fine-tuning Vision-Language Models (VLMs) to interpret physical spatial data and user intent. This enables “scene-aware” assistants that provide real-time, spatially anchored guidance, such as automatically detecting a floor to place navigation arrows or identifying specific building entrances to anchor digital labels. This move toward intelligent, adaptive assistants helps move XR beyond one-size-fits-all solutions.

X
Li, Bin (#30025)
Constrained Online Learning in Evolving Environments

Many real-world problems, including digital twin synchronization, scheduling for federated learning, and control of quantum systems, can be framed as constrained online optimization or learning. In these problems, an agent repeatedly makes decisions over time and incurs a performance cost along with several constraint-related penalties. The goal is to achieve low long-run cost while keeping the long-run average penalties within acceptable limits. The key challenge is that the cost and penalty functions for a given round are revealed only after the decision is made. To make such problems tractable, many works assume the system behaves similarly over time, for example by modeling costs and penalties as independent and identically distributed (i.i.d.) across rounds. We are interested in the harder constrained online learning setting, where the agent only observes the outcomes of the action it actually took. This creates a natural exploration–exploitation tradeoff, as in reinforcement learning, and better reflects practical systems where outcomes of actions not taken are not observable. In this project, we will develop theory to explain and guarantee rapid recovery from periods in which the constraints are violated, including worst-case performance bounds that remain favorable even when such infeasible periods are long. We will also generalize the approach beyond our initial wireless scheduling case study, with the goal of making the technique broadly applicable in constrained online learning and machine learning systems.

XX
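
Below is a minimal illustration of constrained decision-making with bandit feedback, in the drift-plus-penalty style (a virtual queue for the constraint, epsilon-greedy exploration, and empirical estimates updated only for the action actually played). The environment and parameters are invented, and this is a generic sketch rather than the project's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
# Three actions with unknown mean cost and mean penalty (e.g., power used);
# the long-run average penalty must stay below a budget.
mean_cost = np.array([1.0, 0.6, 0.2])
mean_pen = np.array([0.1, 0.5, 0.9])
budget = 0.4

T, eps, V = 20_000, 0.05, 10.0
Q = 0.0                                            # virtual queue for the constraint
sum_c, sum_p, cnt = np.zeros(3), np.zeros(3), np.ones(3)   # bandit estimates
tot_cost = tot_pen = 0.0
for t in range(T):
    if rng.random() < eps:                         # occasional exploration
        a = rng.integers(3)
    else:                                          # drift-plus-penalty index
        a = int(np.argmin(V * (sum_c / cnt) + Q * (sum_p / cnt)))
    c = mean_cost[a] + 0.05 * rng.standard_normal()   # only the played action
    p = mean_pen[a] + 0.05 * rng.standard_normal()    # is ever observed
    sum_c[a] += c; sum_p[a] += p; cnt[a] += 1
    Q = max(Q + p - budget, 0.0)                   # queue grows when over budget
    tot_cost += c; tot_pen += p

print(f"avg cost {tot_cost / T:.3f}   avg penalty {tot_pen / T:.3f} (budget {budget})")
```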
Van Duin, Adrianus C (#30030)
A Unified Database and Metric System for Parameter Analysis and Transferability of Reactive Force Fields

Reactive force fields, particularly ReaxFF, are widely used to model chemically reactive systems at length and time scales inaccessible to quantum mechanical methods. Since its introduction (van Duin et al., 2001), ReaxFF has enabled simulations of continuous bond breaking and formation in complex environments such as catalysis, materials growth, corrosion, energetic materials, and solid–liquid interfaces (Mao et al., 2023; Senftle et al., 2016). Many academic and industry groups around the world are currently using ReaxFF simulations – over 1,000 ReaxFF-related publications are reported in the literature, and ReaxFF has been integrated in leading open-source (LAMMPS) and commercial (AMS, Materials Studio) software frameworks.

Despite its success, ReaxFF force-field development remains fragmented and difficult to evaluate systematically, especially for outside groups not directly involved in the training process. Parameter sets are typically developed in isolated studies using different training datasets, weighting schemes, and optimization strategies, and are often reported as single optimized solutions (Dumortier et al., 2024; Shchygol et al., 2019). Multiple studies have demonstrated that many distinct parameter sets can fit the same training data equally well while producing substantially different predictions outside the fitting domain (Larentzos et al., 2015; Shchygol et al., 2019). These observations highlight unresolved challenges related to parameter uncertainty, identifiability, and transferability (Krishnamoorthy et al., 2021; Mishra et al., 2018; Senftle et al., 2016).

In addition, numerous advanced ReaxFF parameterization strategies have been proposed, including genetic algorithms (Jaramillo-Botero et al., 2014), Monte Carlo and simulated annealing approaches (Iype et al., 2013), evolutionary multi-objective methods (Krishnamoorthy et al., 2021), and machine-learning-assisted frameworks (Guo et al., 2020; Kaymak et al., 2022). While these methods improve optimization efficiency, they do not resolve a fundamental gap: the lack of a unified framework to analyze, compare, and interpret the resulting force fields in a consistent and reproducible manner (Dumortier et al., 2024; Senftle et al., 2016).

This project proposes to develop a unified, data-driven infrastructure for organizing and evaluating reactive force-field parameter sets, with an initial focus on ReaxFF. Rather than creating new force fields, the project emphasizes systematic post-optimization analysis and will provide facilities to merge force fields and training sets. A curated database will link published ReaxFF parameter sets with their associated training data, optimization methods, and validation targets. Standardized analysis tools will be developed to quantify parameter uncertainty, correlations, sensitivity, and predictive performance across multiple chemical environments and property classes.

The resulting open-source software platform will be designed to integrate with existing molecular simulation workflows and to scale efficiently on ICDS computational infrastructure. The project offers multiple entry points for Rising Researchers, including database curation, development of analysis metrics, benchmarking studies, visualization tools, and high-throughput computational workflows.

References: Dumortier et al. JCTC 20, 3779, 2024; Guo et al. Comp. Mat. Sci. 172, 2020; Iype et al. J. Comp. Chem. 34, 1143, 2013; Jaramillo-Botero et al. JCTC 10, 1426, 2014; Kaymak et al. JCTC 18, 5181, 2022; Krishnamoorthy et al. SoftwareX 13, 2021; Larentzos et al. JCTC 11, 381, 2015; Mao et al. Prog. Energy & Combust. Sci. 97, 2023; Mishra et al. npj Comput. Mater. 4, 2018; Senftle et al. npj Comput. Mater. 2, 2016; Shchygol et al. JCTC 15, 6799, 2019; van Duin et al. J. Phys. Chem. A 105, 9396, 2001.

XX
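
The sketch below shows one possible shape of a database record and a transferability metric: each parameter set carries provenance plus reference-vs-predicted validation pairs grouped by property class, and per-class RMSE exposes where a force field stops transferring. All field names and numbers are illustrative, not the project's schema.

```python
from dataclasses import dataclass, field
import math

@dataclass
class ForceFieldEntry:
    """One illustrative database row: a published parameter set plus provenance
    and validation results grouped by property class."""
    name: str
    elements: tuple
    training_set: str
    optimizer: str
    validation: dict = field(default_factory=dict)   # {"property_class": [(reference, predicted), ...]}

def rmse_by_class(entry):
    out = {}
    for prop, pairs in entry.validation.items():
        out[prop] = math.sqrt(sum((ref - pred) ** 2 for ref, pred in pairs) / len(pairs))
    return out

ff = ForceFieldEntry(
    name="CHO-example", elements=("C", "H", "O"),
    training_set="combustion-DFT-v1", optimizer="single-parabola search",
    validation={
        "bond_energies_kcal": [(85.0, 83.1), (101.0, 104.2), (72.0, 70.5)],
        "reaction_barriers_kcal": [(31.0, 36.5), (12.0, 9.1)],
    },
)
print(rmse_by_class(ff))   # per-property-class errors expose (non)transferability
```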
Grab, Heather (#30049)
DeepFlora: Integrating Citizen Science and Remote Sensing for Spatiotemporal Biodiversity Prediction

This project aims to develop DeepFlora, a scalable spatiotemporal deep learning framework that predicts flowering plant community composition and blooming phenology by integrating citizen science biodiversity records with remotely sensed environmental data. The central motivation is a persistent data gap in resource use and conservation ecology: while flowering plants provide critical resources for pollinators and many other organisms, we lack high-resolution, time-resolved maps of where and when flowering resources occur across landscapes. Addressing this gap requires new approaches that can fuse heterogeneous, imperfect, and large-volume data sources using modern AI methods. DeepFlora leverages deep convolutional neural networks trained on millions of citizen-reported plant occurrence records combined with spatial environmental data. In this project, we will adapt and extend these models to incorporate a temporal dimension, enabling daily predictions of flowering probability for over 100 key plant species across Pennsylvania at 30 m spatial resolution. Input data will include plant occurrence records from the Global Biodiversity Information Facility, remotely-sensed satellite imagery, and daily weather and phenology covariates. The resulting model outputs will provide spatially explicit predictions of flowering plant communities and their seasonal dynamics. From a data science perspective, the project centers on several open and tractable challenges that may be of interest to Rising Researchers. These include learning from irregular and biased citizen science data, integrating multi-modal inputs that vary in scale and uncertainty, extending spatial deep learning models to spatiotemporal prediction, and evaluating model performance in data-sparse regions. There are also opportunities to explore transfer learning, uncertainty quantification, and model interpretability in an applied environmental context. While the ecological application focuses on flowering plant communities and pollinator habitat, the methods developed are broadly applicable to biodiversity modeling, land-cover dynamics, and ecological forecasting. Rising Researchers with interests in machine learning, geospatial data science, remote sensing, or scientific visualization would have opportunities to contribute to model development, data pipelines, evaluation strategies, or interactive visualization tools. Contributions from Rising Researchers could also include improving temporal encoding methods, testing alternative architectures, developing scalable training workflows using high-performance computing, or creating visualization interfaces that translate complex model outputs into interpretable maps.

X
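
One concrete piece of the "temporal encoding" challenge mentioned above: the sketch below shows a standard cyclical day-of-year encoding that lets a model treat December 31 and January 1 as neighbors; it is an illustrative starting point, not the DeepFlora implementation.

```python
import numpy as np

def temporal_features(day_of_year):
    """Cyclical encoding of observation date: two smooth features that wrap
    correctly across the year boundary, suitable as extra input channels for a
    spatiotemporal model of daily flowering probability."""
    theta = 2.0 * np.pi * (np.asarray(day_of_year) - 1) / 365.25
    return np.stack([np.sin(theta), np.cos(theta)], axis=-1)

# December 31 and January 1 end up adjacent in feature space, unlike raw day-of-year.
for d in (1, 92, 183, 274, 365):
    s, c = temporal_features(d)
    print(f"day {d:>3}: sin={s:+.2f} cos={c:+.2f}")
```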
Shang, Shunli (#30053)
Thermodynamics-Inspired AI-Enabled Framework for Predictive Modeling of Materials Properties

Materials properties originate from the free-energy landscape and its derivatives, underscoring its critical role in governing structure, stability, and response with respect to external conditions. The proposed project seeks to establish a thermodynamics-informed, artificial intelligence (AI) enabled framework that leverages the free-energy landscape for predictive materials modeling. We will demonstrate two complementary approaches: (1) Bottom-Up Forward Prediction, where materials properties are derived from chemistry and configurations using our multi-entropy (zentropy) formalism based on density functional theory (DFT) and machine learning [JPCM 36, 2024, 343003, DOI: 10.1088/1361-648X/ad4762]; and (2) Top-Down Inverse Discovery, where macroscopic properties guide reconstruction of the free-energy landscape using our zentropy-enhanced neural network (ZENN) [PNAS 123, 2026, e2511227122, DOI: 10.1073/pnas.2511227122]. Our testbed system is the medium-entropy alloy VCoNi, chosen for its chemical complexity and technological relevance. We will focus on two representative properties, stacking fault energy (SFE) and the fcc-hcp phase transformation, each directly tied to the free-energy landscape. To support these efforts and broaden accessibility, we will build PyZENN, an open-source Python package for learning and predicting free-energy landscapes. The project integrates predictive modeling, AI development, and computational tool building, while fostering interdisciplinary collaboration with junior researchers. The outcomes will provide preliminary data, software infrastructure, and conceptual advances to position our group for competitive federal proposals.

XX
Ivory, Sarah (#30073)
AI-Assisted Fossil Pollen Recognition with ZENN-based Uncertainty Modeling

Significance: Climate is changing at historically unprecedented rates with important implications for ecosystem stability. In Africa, in particular, people depend daily on services provided by natural ecosystems, like medicine, food, and fresh water. Thus, understanding and predicting ecosystem changes is critical, as most economies aren’t prepared to adapt once resources are gone. Information about the response of vegetation to climate in the past from fossil pollen plays an important role in understanding natural ranges of variability and ecosystem vulnerability under changing conditions not observed in the historical record.
Problem: Pollen analysis today is performed in the same way as it was over 100 years ago and is a slow, manual process. Machine learning and AI tools stand to revolutionize the way that fossil data is generated, the volume of data available, and techniques for generating predictions from sparse, uncertain data. Unfortunately, pollen data is heterogeneous (it comes from multiple sources and is viewed from multiple perspectives), and identifying pollen from a fossil record requires algorithms to extrapolate from training sets derived from modern pollen reference images. Traditional image classifiers do not perform well on complex, multi-domain, heterogeneous datasets.
The proposition: Entropy-based methods that characterize physical disorder have the potential to effectively learn and capture the underlying structure of complex data and generalize beyond training data. The zentropy-enhanced neural network (ZENN, 10.1073/pnas.2511227122) is a newly available method that applies the concept of intrinsic entropy and Helmholtz energy to data science. This project has two aims. First, we wish to determine whether ZENN improves classification of pollen fossils in comparison to state-of-the-art techniques. To accomplish this, a student would conduct a literature review and compile recent studies applying image classification techniques to modern reference or fossil pollen datasets. ZENN would be applied to each published training set and compared against published classification rates from other methods. Second, a reference library of modern African pollen types is currently in the Ivory Paleoecology Lab for building a training library. For this project, a student might develop a small pilot image classifier for four common and morphologically distinct pollen taxa (such as Poaceae, Celtis, Podocarpus, Ericaceae). Then, this classifier could be applied to fossil samples from Lake Mahoma, Uganda, to evaluate its performance on real samples. Samples from this lake have already been analyzed manually and include important changes in abundance of many common, morphologically distinct taxa. This project would potentially leverage all four pillars of ICDS, especially data sciences and AI. Further, the research themes of this project, with a focus on impacts of climate change, ecosystems, and risk, align with the interests of many ICDS students and faculty who are not currently leveraging fossil data.
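To make the scope of the four-taxon pilot concrete, the following is a minimal sketch (assuming PyTorch/torchvision) of fine-tuning a small pretrained backbone for the four classes; directory layout, hyperparameters, and the smoke-test data are placeholders, and this is a conventional baseline rather than the ZENN method itself.

```python
# Illustrative sketch of a four-taxon pilot classifier via transfer learning;
# not ZENN, and all hyperparameters/data below are placeholders.
import torch
import torch.nn as nn
from torchvision import models

TAXA = ["Poaceae", "Celtis", "Podocarpus", "Ericaceae"]

model = models.resnet18(weights="IMAGENET1K_V1")   # pretrained backbone (use weights=None to skip download)
model.fc = nn.Linear(model.fc.in_features, len(TAXA))  # replace the 1000-class head

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One gradient step on a batch of (N, 3, 224, 224) images."""
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Smoke test with random data; real use would load labeled reference images.
print(train_step(torch.randn(8, 3, 224, 224), torch.randint(0, 4, (8,))))
```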

XX
Ivory, Sarah (#30074)
Integrating human and ecological dimensions of restoration: A user-friendly digital platform for assessing and monitoring impacts of people-centered nature-based climate solutions

Abstract: Climate hazards such as flooding, drought, and wildfire increasingly threaten ecosystems and livelihoods in Africa, driving interest in nature-based solutions such as forest restoration for adaptation to climate change. Despite interest, there are few existing tools to holistically evaluate restoration success across both ecological and social dimensions, and none that leverage AI to integrate these outcomes and make them accessible to communities. This project will develop a social monitoring module for a restoration platform and a roadmap for integrating AI-driven monitoring using Plant Village and geospatial AI tools. Together, these efforts will lay the groundwork for tailorable socio-ecological monitoring of restoration outcomes to inform effective climate adaptation strategies and put data in the hands of communities.
Significance: Africa is particularly vulnerable to climate extremes associated with flooding, drought, and wildfire, with amplified risks and impacts on ecosystems and communities. Community-based reforestation is a nature-based solution centered around people that helps mitigate climate impacts, with natural or assisted regeneration of tree cover, as trees provide many regulating and supporting ecosystem services such as soil stabilization and evaporative cooling, as well as provisioning and cultural resources for people such as food, medicine, timber, bioenergy, and recreation.
Problem: There is strong demand for forest restoration, and many well-publicized reforestation efforts have taken place throughout Africa. However, ensuring that social benefits are delivered equitably and inclusively alongside ecological goals is essential for lasting, sustainable impacts of restoration efforts. Yet there has been little empirical study gauging holistic socio-ecological success of different restoration schemes towards intervention goals. Specifically, the lack of long-term and effective socio-ecological monitoring approaches remains an important gap given the many challenges. For instance, a recent review found that most existing restoration monitoring tools have limited capacity to collect non-ecological data, even as many advanced monitoring frameworks suggest tracking social outcomes with socio-economic and governance indicators.
The proposition: New data tools, such as cloud-based web apps and artificial intelligence (AI), have the potential to connect communities with their data and provide real-time information for effective decision-making throughout restoration endeavors. However, while apps and AI can be powerfully transformative, the former needs to be developed and adequately customized and the latter needs to be integrated into socio-ecological frameworks thoughtfully. In particular, assessing restoration success requires monitoring not only ecological parameters but also whether the needs and aspirations of affected nature-dependent communities are met. In this project, a Rising Researcher would develop an assessment module with indicators for social measures of restoration implementation and outcomes, a module to ultimately integrate into the existing restoration app, such as the Regreening App. RISE would provide consultation and technical expertise during the initial phase of application development, helping to position the project for future external funding and a more fully resourced implementation.
Second, they would explore the potential for using Plant Village AI functions and real time data to improve speed and accuracy of ecological data collection as well as to investigate the potential for employing geospatial AI for decision-making and outcome assessment.

X
Gadway, Bryce (#30075)
Classical Algorithms and Quantum Simulations supporting Quantum-Enhanced Sensing

The proposed project seeks to use quantum devices at Penn State, namely many-atom Rydberg arrays with controllable interactions, to develop techniques for quantum-enhanced sensing and to explore thermalization in closed quantum systems. Starting with two-level systems and the generation of squeezed states, this project seeks to develop quantum sensors that go beyond normal squeezing approaches, as well as extensions to multi-level systems for multi-mode squeezing and entanglement. This project combines expertise in theoretical many-body quantum physics with the experimental development of quantum simulators and sensors.
The project seeks input from theoretical quantum researchers to efficiently model the behavior of our many-atom systems, in particular quantum squeezing and entanglement growth, using advanced techniques from matrix product states, cluster discrete Truncated Wigner approximations, moving-averaged cluster expansions, and other theoretical techniques. These challenging tasks seek input from researchers with background knowledge of quantum systems and computational techniques.
Additionally, this project seeks input from quantum simulation researchers capable of performing quantum simulations on native hardware (devices) developed at Penn State. For this, we seek researchers to generate quantum simulation data using a dipolar Rydberg atom simulator, to be benchmarked against the aforementioned theoretical simulation methods. The researchers sought should be comfortable with Python and coding for theoretical modeling and instrumentation control, and should be familiar with the operation and control of laser and microwave systems.
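As a small-scale illustration of the squeezing quantities involved (and not of the proposed MPS/dTWA methods), the following numpy/scipy sketch evolves a coherent spin state under one-axis twisting, H = χJz², in the symmetric subspace and tracks the Wineland squeezing parameter; the atom number and coupling are arbitrary placeholders.

```python
# Exact-diagonalization toy model of one-axis-twisting spin squeezing in the
# symmetric (j = N/2) subspace; illustrative only, with placeholder parameters.
import numpy as np
from scipy.linalg import expm, eigh

def collective_spin_ops(N):
    """Collective spin matrices Jx, Jy, Jz for N spins in the j = N/2 subspace."""
    jtot = N / 2
    m = np.arange(jtot, -jtot - 1, -1)                      # j, j-1, ..., -j
    Jz = np.diag(m)
    cp = np.sqrt(jtot * (jtot + 1) - m[1:] * (m[1:] + 1))   # <m+1|J+|m>
    Jp = np.diag(cp, 1)
    return (Jp + Jp.T) / 2, (Jp - Jp.T) / 2j, Jz

def squeezing_parameter(psi, Jx, Jy, Jz, N):
    """Wineland parameter xi^2 = N * min_theta Var(J_perp) / |<J>|^2."""
    ev = lambda A: np.real(psi.conj() @ A @ psi)
    mean_len2 = ev(Jx) ** 2 + ev(Jy) ** 2 + ev(Jz) ** 2
    variances = []
    for th in np.linspace(0, np.pi, 181):                   # directions perpendicular to x
        Jperp = np.cos(th) * Jy + np.sin(th) * Jz
        variances.append(ev(Jperp @ Jperp) - ev(Jperp) ** 2)
    return N * min(variances) / mean_len2

N, chi = 20, 1.0
Jx, Jy, Jz = collective_spin_ops(N)
_, vecs = eigh(Jx)
psi0 = vecs[:, -1]                                          # coherent spin state along +x
for t in [0.0, 0.02, 0.05, 0.1]:
    psi_t = expm(-1j * chi * t * (Jz @ Jz)) @ psi0
    print(f"chi*t = {t:.2f}  xi^2 = {squeezing_parameter(psi_t, Jx, Jy, Jz, N):.3f}")
```

At t = 0 the parameter is 1 (coherent state); values below 1 indicate metrologically useful squeezing.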

X
Brandt, William Nielsen (#30080)
A Generalized, Survey-Agnostic Framework for CNN-Based Galaxy Merger Identification

Supermassive black holes (SMBHs) are thought to reside in the centers of almost all massive galaxies in the Universe. When SMBHs are actively accreting gas and growing, they are observed as active galactic nuclei (AGNs). Galaxy mergers are a key driver of galaxy and SMBH growth: when two or more galaxies merge, their gas reservoirs can be disturbed, fueling both star formation and SMBH accretion and triggering the coevolution of galaxies and their central SMBHs. To study the role of galaxy mergers in galaxy-SMBH evolution, the key is to accurately identify large, statistically meaningful samples of galaxy mergers that are well characterized across the electromagnetic spectrum.
Identifying galaxy mergers is challenging because galaxy morphologies evolve significantly across different merger stages. Many methods are sensitive to different morphological features and often suffer from strong incompleteness. For example, identifying close galaxy pairs based upon projected or physical separation tends to select early-stage mergers, but cannot effectively select post-mergers where the pairs have just coalesced (e.g., Lackner et al. 2014; Mundy et al. 2017). Other approaches based on nonparametric morphological measurements can identify galaxies with disturbed structures, but they are typically sensitive to only a fraction of the merger timeline and depend strongly on image quality and survey characteristics.
Recent studies have demonstrated that machine learning (ML) techniques, particularly Convolutional Neural Networks (CNNs), have the potential to mitigate these biases compared to traditional galaxy merger classification methods and provide more complete and accurate results (e.g., Ackermann et al. 2018; Bickley et al. 2021). However, most studies focus on deep high-resolution Hubble Space Telescope (HST) imaging in small fields, such as the ≈ 0.3 deg² Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS), which are most suitable for sampling distant faint galaxies (e.g., Ferreira et al. 2020; Schechter et al. 2025).
With the advent of upcoming wide-field surveys such as the Vera C. Rubin Observatory’s Legacy Survey of Space and Time (LSST) (e.g., Ivezic et al. 2019) and Roman (e.g., Akeson et al. 2019), there will soon be extensive high-quality and multi-band imaging for millions of galaxies in the nearby universe spanning ∼ 10⁴ deg². These data include ≈ 15 deg² of LSST Deep-Drilling Fields (DDFs), which already have sensitive multiwavelength coverage from X-rays to radio with superb galaxy and AGN characterization (e.g., Chen et al. 2018; Ni et al. 2021). These datasets are well-suited for identifying galaxy mergers in the nearby universe and for studying the role of mergers in SMBH evolution since Cosmic Noon (10 billion years ago). Thus, there is a strong need for a generalized end-to-end framework for accurate and complete galaxy merger classification that can be easily applied across surveys with different instruments, data quality, and redshift coverage.
The goal can be achieved by training CNNs on mock multi-band galaxy images generated from cosmological simulations such as IllustrisTNG (e.g., Nelson et al. 2019), and validating and benchmarking them using real HST and Euclid imaging in several well-characterized small-to-middle-sized extragalactic survey fields, e.g., CANDELS, Cosmic Evolution Survey (COSMOS), and Wide Chandra Deep Field-South (W-CDF-S).
The COSMOS and W-CDF-S fields are two of the LSST DDFs, and our group already has extensive experience studying galaxy-SMBH coevolution using the rich multiwavelength data in the DDFs, including AGN selection and characterization (Zou et al. 2022), mapping SMBH growth rate as a function of galaxy properties (Zou et al. 2024), and characterizing the drivers of SMBH growth decline (Yu et al. 2025). A public generalized framework for galaxy merger classification based upon the multi-band imaging in these fields will not only advance our understanding of the role that mergers play in SMBH evolution, but will also broadly benefit the LSST community.

XX
Jiang, Wen (#30089)
Deblurring of Electron Beam Induced Motion to Improve Cryo-EM Image Quality

Cryogenic electron microscopy (cryo-EM) has revolutionized structural biology in the past decade, enabling the determination of macromolecular structures at near-atomic resolutions (2-4 Å) for a wide range of protein complexes, viruses, and amyloids. This capability has significantly advanced the structural-functional understanding of basic sciences, disease mechanisms, and treatments. A critical challenge in cryo-EM imaging is the radiation damage caused by inelastically scattered electrons and the resultant beam-induced motions of the sample. As the high-energy electron beam (e.g., 300 keV) traverses the frozen specimen, the ratio of “bad” inelastic scattering to “good” elastic scattering is approximately 3:1. The inelastic scattering events deposit energy, leading to sample damage and non-uniform motions. These motions, particularly rapid movements during the initial phase of exposure, blur the recorded images, thereby limiting the achievable quality of 2D images and the resolution of 3D reconstructions derived from these “blurred” 2D images.
In recent years, the development of fast direct electron detectors has partially addressed this issue. These detectors can output short movies at tens of frames per second, allowing for computational motion correction to align the frames and reduce blurring. While current motion correction algorithms have significantly improved image quality, they still struggle to effectively compensate for the faster, more complex motions that occur, especially at the beginning of exposure. Our Falcon 4i camera on the state-of-the-art Thermo Fisher Titan Krios 300 kV Cs-corrected FEG TEM in the Huck Cryo-EM Facility offers a powerful capability known as electron event recording (EER). Instead of recording the data as a stack of 2D image arrays, this mode captures the precise x/y coordinates and a high-resolution timestamp (at an internal recording frame rate of 320 frames/sec) for each electron hitting the sensor, effectively creating a 3D (x, y, and time) point cloud of electrons. This rich dataset, with a time resolution of approximately 3 ms, theoretically provides an unprecedented opportunity for cryo-EM to better deblur the fast motions. However, current motion correction software in cryo-EM requires a stack of images as input, forcing the down-conversion of the EER recordings to a stack of 2D images, which results in a loss of more than 10-fold in time resolution. In this project, we aim to develop an AI-based motion deblurring method that works directly with the EER recording of the “3D electron cloud” using its original spatial and temporal resolution. The goal is to achieve the highest quality recovery of image signals and improve the resolution of subsequent 3D reconstructions.
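To make the "3D electron cloud" data layout concrete, the following is a small numpy sketch that bins synthetic (x, y, t) electron events into frames at the internal frame rate; detector size, event counts, and exposure time are placeholders, and an AI deblurring model would instead consume the point cloud directly.

```python
# Sketch of an EER-style event list: each detected electron is an (x, y, t)
# record. Binning into frames at the full internal rate is shown for synthetic
# events; all sizes are placeholders, not Falcon 4i specifics.
import numpy as np

rng = np.random.default_rng(0)
n_events, nx, ny = 200_000, 128, 128
frame_rate, exposure_s = 320.0, 1.0

# Synthetic electron events: x, y in pixels, t in seconds
x = rng.uniform(0, nx, n_events)
y = rng.uniform(0, ny, n_events)
t = rng.uniform(0, exposure_s, n_events)

n_frames = int(exposure_s * frame_rate)
frames, _ = np.histogramdd(
    np.column_stack([t, y, x]),
    bins=(n_frames, ny, nx),
    range=((0, exposure_s), (0, ny), (0, nx)),
)
print(frames.shape)                 # (320, 128, 128): one sparse frame per ~3 ms
print(frames.sum() == n_events)     # every event lands in exactly one bin
```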

XX
Jiang, Wen (#30090)
Learning Helical Polymer Architecture by Contrastive Learning

Cryo-electron microscopy (cryo-EM) has emerged as a fundamental tool for resolving the structures of biological macromolecules, thereby contributing to our understanding of the structural basis of biological functions. Among the major targets of structural biology, helical polymers constitute a broad and important class of molecular assemblies, spanning systems from bacteriophage tails and cytoskeletal filaments to amyloid fibrils implicated in neurodegenerative disease. In recent years, an increasing number of helical structures have been solved at near-atomic resolutions (2–4 Å) using cryo-EM, which images frozen hydrated specimens on electron microscopy grids using high-energy electron beams.
Helical filaments typically lie flat on the two-dimensional cryo-EM grids on which they are imaged. A single micrograph may contain tens of filaments, often on the order of tens of nanometers in width and extending for hundreds of nanometers in length. Although many filaments are approximately straight over short distances, small bends and variations accumulate over longer scales. To manage this variability computationally, filaments are commonly subdivided into shorter “segments”. This subdivision facilitates operations such as Fourier transforms and template matching, but it also has a conceptual cost: once subdivided, segments are generally treated as independent particles, and contextual information along the filament is largely discarded. Consequently, relationships between nearby segments along the same filament are underutilized.
This project seeks to reintroduce these contextual priors by leveraging contrastive learning, a neural-network framework designed to learn structure from relationships rather than from absolute labels. In this setting, segments derived from the same filament, or from related filament types, are associated based on their relative positions along the filament axis, encouraging learned representations to preserve these relationships in a latent space. Conceptually, this approach is closely aligned with methods such as non-metric multidimensional scaling (nMDS), which uncover global structure from relational information while remaining robust to outliers. We will initially focus on low-order correlations along filaments and subsequently incorporate segment image data themselves, in a manner complementary to existing reconstruction pipelines such as RELION. Importantly, contrastive learning also supports “hard negative mining” (training on difficult cases), which is particularly relevant for helical polymers where distinct symmetries may produce similar power spectra.
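To illustrate the contrastive objective in this setting, the following is a minimal sketch (assuming PyTorch) of an InfoNCE-style loss in which segments cut from the same filament are treated as positives; the encoder is omitted and the embeddings, batch size, and temperature are placeholders rather than anything from RELION or the project pipeline.

```python
# Minimal sketch of a contrastive objective over filament segments: segments
# sharing a filament_id attract, all others repel. Illustrative only.
import torch
import torch.nn.functional as F

def filament_info_nce(embeddings: torch.Tensor, filament_ids: torch.Tensor,
                      temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style loss over a batch of segment embeddings (B, d)."""
    z = F.normalize(embeddings, dim=1)                 # unit vectors
    sim = z @ z.T / temperature                        # scaled cosine similarities
    B = z.shape[0]
    eye = torch.eye(B, dtype=torch.bool)
    pos = (filament_ids[:, None] == filament_ids[None, :]) & ~eye
    sim = sim.masked_fill(eye, float("-inf"))          # exclude self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_counts = pos.sum(dim=1).clamp(min=1)
    loss = -(log_prob.masked_fill(~pos, 0.0).sum(dim=1) / pos_counts)
    return loss[pos.sum(dim=1) > 0].mean()             # anchors with at least one positive

# Toy batch: 6 segment embeddings from 3 filaments (ids 0, 0, 1, 1, 2, 2)
emb = torch.randn(6, 32, requires_grad=True)
print(filament_info_nce(emb, torch.tensor([0, 0, 1, 1, 2, 2])))
```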

XX
Yost, Kaleigh (#30093)
Modeling System Response of Partially Saturated Deposits Subjected to Earthquake Shaking

Soil liquefaction is a phenomenon that occurs when loose, saturated soils are subjected to earthquake shaking and subsequently lose their strength and stiffness. Partially saturated soils, which commonly exist in highly stratified deposits and in portions of soil profiles subject to significant fluctuations in groundwater table, are also liquefiable. The presence of trapped air bubbles in the otherwise saturated soil void space results in increased compressibility of the void space and consequently enhanced resistance to liquefaction. The presence of partially saturated soils also impacts the systems-level response of the soil profile, including the dynamic interactions between individual soil layers, generation of excess pore water pressure, and upward flow of water between layers during the liquefaction event. Existing standard-of-practice and even state-of-the-art procedures to evaluate liquefaction hazard typically ignore the presence of partial saturation, resulting in notable overestimation of the severity of liquefaction manifestation in partially saturated deposits. This has costly consequences for insurance, rebuilding, and retrofitting efforts in regions with high earthquake hazard. This project aims to develop a numerical framework to mechanistically account for the increased liquefaction resistance of partially saturated soils in a dynamic effective stress analysis (ESA) using the finite difference method as implemented in the commercial software FLAC2D. Existing literature has proven the merits of dynamic ESAs for demonstrating systems-level liquefaction mechanisms but has not yet incorporated partially saturated soils. To do this, the student will explore how partial saturation can be incorporated into ESAs through modification of appropriate soil constitutive models and parameters and mechanistically-consistent hydromechanical coupling.
The project team has collected detailed, high-resolution subsurface data from liquefaction case history sites in Napier, New Zealand and has compiled a database of liquefaction case histories in partially saturated deposits from Christchurch, New Zealand. The student will use these real-world data to validate the numerical framework. The framework will then be extended by leveraging the computing resources at ICDS to upscale the study, either through generation of randomized soil profiles or compilation of a larger case history dataset, to draw broader conclusions about the implications of partial saturation for liquefaction studies.
The following are the research objectives of this project:
(1) Develop a computational framework capable of modeling behavior of partially saturated soils under earthquake loading;
(2) Validate the computational framework with multiple liquefaction case histories;
(3) Leverage high-performance computing resources at ICDS to investigate a larger database of partially saturated soil profiles using the computational framework developed herein.
The outcomes of this project are expected to drive a paradigm shift in dynamic ESA modeling of complex soil profiles, significantly improving the ability to accurately model soil behavior during earthquakes by mechanistically accounting for the increased liquefaction resistance in partially saturated deposits.

X
Harlim, John (#30094)
Variational loss functions for training AI/ML models with noisy data

The success of AI/ML models depends critically on the quality of the available data. In many applications, raw observational data are corrupted by noise and are therefore not used directly for training. Instead, AI/ML models are often trained on computer-generated datasets reconstructed from raw observations using principled data assimilation procedures that filter noise and enforce physical consistency. Empirically, we observe that AI/ML models trained with standard loss functions can remain unstable or fail to produce accurate predictions when trained on either raw or assimilated datasets.
Interestingly, when the training loss is penalized with constraints that resemble weak or variational formulations that are often used in numerical analysis, including finite-element methods, the resulting models exhibit stability, which allows one to reproduce invariant statistical quantities, such as climatological statistics in climate prediction applications. These observations motivate the need for a rigorous mathematical understanding of variational penalization in machine learning training.
In this project, we propose to study this problem using tools from signal processing, particularly wavelet expansions. Beyond developing theoretical foundations, we will investigate the robustness of this training strategy across different machine learning architectures, enabling its deployment in a broad class of practical AI models beyond learning dynamical systems applications, which serves as the primary focus of this work.
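One common way such a weak-form penalty can be realized (a sketch of the general idea, not the project's specific formulation, which centers on wavelet expansions) is to test the dynamics residual against smooth, compactly supported functions and integrate by parts so that no derivative of the noisy data is needed; the test functions and toy model below are illustrative choices.

```python
# Sketch of a weak-form (variational) penalty for learning dynamics dx/dt = f(x)
# from noisy trajectories; sine test functions vanish at the interval ends, so
# integration by parts moves the time derivative off the noisy data.
import math
import torch

def weak_form_penalty(x: torch.Tensor, t: torch.Tensor, f, n_test: int = 8):
    """x: (T, d) noisy trajectory sampled at times t: (T,); f maps states to dx/dt."""
    T = t[-1] - t[0]
    s = (t - t[0]) / T                                   # rescale to [0, 1]
    fx = f(x)                                            # (T, d)
    penalty = 0.0
    for k in range(1, n_test + 1):
        phi = torch.sin(math.pi * k * s)                 # phi(0) = phi(1) = 0
        dphi = (math.pi * k / T) * torch.cos(math.pi * k * s)
        # weak residual: -∫ phi' x dt - ∫ phi f(x) dt  (boundary terms vanish)
        r = (-torch.trapz(dphi[:, None] * x, t, dim=0)
             - torch.trapz(phi[:, None] * fx, t, dim=0))
        penalty = penalty + (r ** 2).sum()
    return penalty / n_test

# Toy check: a noisy circular trajectory and the linear model f(x) = A x that
# generates it give a small penalty; this term would be added to the usual loss.
t = torch.linspace(0, 5, 200)
x = torch.stack([torch.sin(t), torch.cos(t)], dim=1) + 0.05 * torch.randn(200, 2)
A = torch.tensor([[0.0, 1.0], [-1.0, 0.0]], requires_grad=True)
print(weak_form_penalty(x, t, lambda z: z @ A.T))
```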

XXX
Renganathan, Ashwin (#30096)
Real-time Agentic Digital Twins for Aerospace Testing & Evaluation

To prepare defense systems for combat readiness, extensive testing & evaluation (T&E) is inevitable. However, T&E is time-consuming and expensive, impeding the US’s ability to mitigate and overcome technological surprise. Inspired by DARPA’s mission of tightly coupling a real-time digital twin with an AI-driven test agent, we will build an end-to-end, closed-loop digital twin capability that (a) runs fast enough for operational use, (b) remains statistically calibrated as new data arrive, and (c) actively chooses the next most valuable tests, simulations, or sensing actions.

XX
Renganathan, Ashwin (#30098)
Reduced order modeling for supersonic and hypersonic aerodynamic flows via probabilistic machine learning

The goal of this project is to develop a novel, data-driven reduced order modeling methodology and associated software, applicable to next-generation aerospace and defense applications. Specifically, we will focus on supersonic and hypersonic aerodynamic flows, which are highly convection-dominated, leading to “shocks” which are difficult to model and emulate with existing reduced order modeling techniques. This project will develop strategies, founded on computational science, machine learning, optimization, and software development, to overcome these limitations. If successful, our tool can drastically reduce turn-around times and cost for design of revolutionary next-generation aerospace systems, thereby contributing to US leadership in this domain.
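For context on the baseline this project aims to go beyond, the following is a generic sketch (numpy, synthetic data) of projection-based reduced order modeling via proper orthogonal decomposition; convection-dominated flows with shocks are precisely the regime where such linear subspaces decay slowly.

```python
# Generic POD/SVD reduced-order-modeling illustration on a synthetic snapshot
# matrix; not the proposed probabilistic ML methodology.
import numpy as np

rng = np.random.default_rng(1)
n_dof, n_snapshots, r = 2000, 60, 10

# Synthetic snapshot matrix (columns = flow states at different parameters/times)
snapshots = rng.standard_normal((n_dof, 5)) @ rng.standard_normal((5, n_snapshots))
snapshots += 0.01 * rng.standard_normal((n_dof, n_snapshots))

U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
basis = U[:, :r]                                   # POD modes
energy = np.cumsum(s**2) / np.sum(s**2)
print(f"retained energy with r={r}: {energy[r-1]:.4f}")

# Reduce and reconstruct a state by projection onto the low-dimensional basis
x_new = snapshots[:, 0]
x_hat = basis @ (basis.T @ x_new)
print("relative reconstruction error:", np.linalg.norm(x_new - x_hat) / np.linalg.norm(x_new))
```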

XX
Renganathan, Ashwin (#30099)
From Optimization to Sampling: Optimality-Consistent Generative Models for Aerodynamic Design

Aerodynamic shape design in aerospace and automotive engineering is routinely formulated as a constrained optimization problem solved via expensive CFD-driven iterative loops (often requiring hundreds to thousands of flow solves per design task). This proposal advances a different paradigm: replace repeated optimization with sampling. We will learn a conditional generative model (diffusion- or flow-matching-inspired) that directly samples near-optimal, constraint-satisfying geometries on demand. Rather than producing a single optimum per run, the model yields a distribution of high-performing designs conditioned on mission/operating parameters (e.g., lift targets, Reynolds number, packaging constraints).
The core technical requirement is consistency with optimality conditions. We propose to incorporate Karush–Kuhn–Tucker (KKT) residuals and constraint satisfaction into the generative training objective and/or sampling guidance, yielding a sampler that is “optimizer-distilled”: it amortizes the cost of optimization into an offline training stage and returns designs in milliseconds at inference.
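The following sketch (PyTorch, toy objective and constraints rather than any aerodynamic model) shows the kind of KKT-residual term that could be added to a training objective or used as sampling guidance: it measures stationarity, primal/dual feasibility, and complementarity for a candidate design and multipliers.

```python
# Sketch of a KKT residual on a toy problem min f(x) s.t. g(x) <= 0; the
# objective and constraints are placeholders for CFD-based quantities.
import torch

def f(x):            # toy "drag" objective
    return (x ** 2).sum()

def g(x):            # toy constraints, e.g. a "lift target": 1 - x0 <= 0
    return torch.stack([1.0 - x[0], x[1] - 2.0])

def kkt_residual(x: torch.Tensor, lam: torch.Tensor) -> torch.Tensor:
    x = x.detach().requires_grad_(True)
    lagrangian = f(x) + lam @ g(x)
    grad_x, = torch.autograd.grad(lagrangian, x, create_graph=True)
    stationarity = grad_x.norm() ** 2
    primal = torch.relu(g(x)).norm() ** 2          # penalize violated constraints
    dual = torch.relu(-lam).norm() ** 2            # multipliers must be >= 0
    complementarity = (lam * g(x)).norm() ** 2
    return stationarity + primal + dual + complementarity

# A KKT point of the toy problem gives ~0 residual; a non-optimal design does not.
print(kkt_residual(torch.tensor([1.0, 0.0]), torch.tensor([2.0, 0.0])))   # ~0
print(kkt_residual(torch.tensor([0.5, 0.5]), torch.tensor([0.0, 0.0])))   # > 0
```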

XXX
Li, Zhenlong (#30102)
Exploration of AI-powered Autonomous Geographic Modeling for Geospatial Science

Modeling is critical in geospatial analysis, aiming to uncover patterns and dynamics in societal and environmental contexts. Here, geographic modeling refers to a logical series of steps and mathematical representations that support prediction, explanation, and what-if analysis for geographic phenomena, such as explaining biodiversity decline or predicting how a process unfolds in areas with specific land-use changes. One key mission of geospatial science is to convert a geographically embedded physical/human process and data into a modeling workflow and computational solution. Human GIS analysts usually follow an iterative cycle when conducting geographic modeling. They choose candidate models, gather data, define variables, set parameters, run experiments, and judge results. This cycle is demanding and slow, even for simple models. It also depends heavily on individual experience and available time. Generative AI, especially large language models (LLMs), is changing how we represent, analyze, and work with geographic information. These systems support natural language interaction, code generation, and tool use, which lowers barriers to complex spatial workflows and accelerates analysis. More importantly, they are reshaping how geographic knowledge is produced by shifting part of the procedural and interpretive work from humans to AI-assisted systems. In this context, Autonomous GIS is an emerging paradigm that treats the system as an artificial geospatial analyst capable of planning and executing multi-step geospatial tasks. However, while early work has explored autonomous geoprocessing and data access, autonomous geographic modeling remains largely unexplored. This gap limits how far AI can go in supporting end-to-end scientific modeling, from problem framing to model selection, estimation, evaluation, and refinement.
This project will develop an AI-powered autonomous geographic modeling agent that helps researchers translate spatial questions and associated data into complete, executable modeling workflows. The agent will propose, run, and iteratively refine modeling pipelines with limited human intervention, while maintaining human-in-the-loop oversight and ensuring that each step is auditable and reproducible. Project deliverables will include a working prototype, evaluation results on representative modeling tasks, and a well-defined pathway to a larger external proposal on AI-enabled autonomous geographic modeling.

XXX
Kim, Taegyu (#30107)
Fuzzing-Based Dataset Synthesis for Training AI in Safe Robot Control

Research Problem and Motivation: Autonomous robots have been increasingly deployed in various domains. For example, many logistics and E-commerce companies, such as Amazon, have managed warehouses with multiple robots and developed delivery robots. To meet their increasing needs, robot developers nowadays have trained AI models to control robots instead of the rule-based, handcrafted robot development. Traditionally, developers have trained robot control AI models that can properly adjust motor/actuator power in response to a robot’s sensory feedback. However, training a robot control AI model to execute a command-level objective (e.g., moving to Waypoint X) from raw sensory–motor-level control (e.g., setting a motor control signal) is inefficient. A promising alternative is command-level robot control AI training by collecting and abstracting command-level behaviors of existing well-established robots and training a robot control AI, using the collected behavioral data. Fortunately, such existing robot programs support robot control commands for robot operators to use. By collecting and abstracting commands and their resulting behaviors, we can train AIs to assign commands to control robots, such as “move to Waypoint X”, for more efficient command execution than adjusting raw motor control signals. The promising learning source is a robot software design specification. Yet, such specifications often suffer from missing, ambiguous, or inconsistent descriptions, or implementation bugs causing unintended, unsafe behaviors. Alternatively, we can collect and synthesize command-level robot behaviors as robot control AI training datasets. However, this approach requires extensive command-level behavioral data to represent as many robot behaviors as possible. Furthermore, the resulting dataset may include buggy or unsafe behaviors. These factors for safe robot control AI training dataset synthesis introduce four technical challenges: (1) automatic extraction of command-level robot physical behaviors, (2) classification of safe and unsafe behaviors, (3) abstraction of command-level behaviors and their execution conditions suitable for AI learning, and (4) limited fuzzing-based extraction scalability.
Research Objective: To address the four challenges, we propose a fuzzing-based safe robot behavior dataset synthesis technique with the physical simulator (e.g., Gazebo [10]). This technique consists of four thrusts: (Thrust I) fuzzing-based extraction of command-level robot physical behaviors, (Thrust II) control-metric-guided classification of safe and unsafe behaviors, (Thrust III) abstraction of robot physical behaviors to make it suitable for AI learning, and (Thrust IV) fuzzing acceleration via forward shifting times (i.e., advancing simulation time rather than rewinding it). The resulting dataset will enable AIs to learn how to compose these known commands to safely execute the given objectives in the physical robots.
Research Impacts and Expected Outcomes: Our research will advance and accelerate safe robot AI training by acting as a bridge between robot control commands and physical operations. This enables efficient and scalable gathering of command-level robot behaviors for safe robot control AI training and the deployment of AIs on robots. The expected outcomes will be our system software and datasets for safe robot AI training.
Furthermore, we plan to publish two research papers: one based on Thrusts I–III, focusing on safe robot behavior learning, and another based on Thrusts I and IV, focusing on scalable and accelerated robot AI training. Finally, we will submit one proposal to the NSF SHF program as part of the NSF Future CoRe.
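As a structural illustration of Thrusts I and II (not the actual Gazebo pipeline), the sketch below randomly fuzzes command-level inputs, executes each through a simulator interface, and labels the observed behavior with a control metric; `run_in_simulator`, the command vocabulary, and the safety threshold are hypothetical placeholders for the real interfaces.

```python
# Sketch of fuzzing-based dataset synthesis: sample commands, simulate, label.
# The simulator call and metric threshold below are stand-ins, not real APIs.
import json
import random

COMMANDS = ["move_to_waypoint", "rotate", "set_speed"]

def random_command():
    return {"cmd": random.choice(COMMANDS),
            "args": [round(random.uniform(-10, 10), 2) for _ in range(2)]}

def run_in_simulator(command):
    """Hypothetical stand-in for executing a command in a physics simulator."""
    overshoot = abs(command["args"][0]) * random.uniform(0.0, 0.2)
    return {"trajectory_overshoot_m": overshoot, "reached_goal": overshoot < 1.5}

def synthesize_dataset(n_cases: int = 1000, overshoot_limit_m: float = 1.0):
    dataset = []
    for _ in range(n_cases):
        cmd = random_command()
        behavior = run_in_simulator(cmd)
        label = "safe" if behavior["trajectory_overshoot_m"] <= overshoot_limit_m else "unsafe"
        dataset.append({"command": cmd, "behavior": behavior, "label": label})
    return dataset

data = synthesize_dataset(100)
print(json.dumps(data[0], indent=2))
print(sum(d["label"] == "safe" for d in data), "safe cases of", len(data))
```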

X
Brandt, William Nielsen (#30108)
Machine Learning Identification and AGN Characterization of High-Redshift Protoclusters

Understanding the co-evolution of supermassive black holes (SMBHs) and their host galaxies in dense cosmic environments represents one of the most fundamental challenges in modern astrophysics. Protoclusters, the progenitors of local massive galaxy clusters, existed during the peak epoch of both cosmic star formation and black-hole growth (cosmic noon, Universe age < 4 Gyr), and are the ideal sites for studying environmental effects on galaxy evolution. These extreme environments are believed to impact galaxy mergers, regulate star formation, trigger active galactic nuclei (AGN), and affect SMBH growth [1,2]. The exact mechanisms, however, remain unclear, leaving the following questions unanswered: How does the dense protocluster environment trigger or suppress AGN growth compared to field galaxies [3]? What role do AGN play in regulating star formation and shaping the evolutionary pathway from protoclusters to mature clusters [4]? How do AGN trace and influence the assembly of massive dark matter halos in the early universe? Addressing these questions requires simultaneous identification of both protoclusters and their AGN members across large cosmological volumes, which is a challenge that demands innovative computational approaches beyond traditional methods.
Classical techniques for protocluster identification rely on 2D density mapping of galaxy positions [5], which assumes accurate knowledge of galaxy distances (redshifts). While successful in identifying some relatively low-redshift (later time), well-studied systems, these methods become increasingly ineffective for high-redshift (earlier time) protoclusters for several reasons. First, galaxies and AGN become increasingly faint, having larger photometric uncertainties, which propagate into their photometric redshift determinations and degrade 3D structure reconstruction. Second, classical methods focus on identifying galaxy members and rely on external information for AGN member identification, failing to efficiently incorporate multi-dimensional information such as colors, morphologies, and multi-wavelength properties. While spectroscopic observations provide reliable distances, obtaining spectra for thousands of galaxies across multiple protoclusters is time-consuming and economically prohibitive. Current AGN-protocluster studies are therefore limited to only <10 confirmed protoclusters with AGN detection, resulting in severe small-number statistics that preclude robust evolutionary studies.
Recent observations hint at elevated AGN activity in protoclusters compared to the field [6,7], but systematic confirmation requires analyzing at least several tens of systems with statistically significant AGN samples. Machine learning (ML) methods offer transformative capabilities for multi-dimensional feature extraction, enabling joint models to simultaneously learn patterns from multi-band photometry, morphology, and spatial clustering. In addition, trained ML classifiers can easily scale to process millions of galaxies and are readily applicable to future surveys and potentially transferable to other domain studies, making them highly cost-effective for large-scale science.
The upcoming Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) Deep-Drilling Fields (DDFs) present a transformative opportunity. These regions will receive ~ 5-10x deeper coverage than the main LSST survey, with rich multi-wavelength ancillary data spanning X-ray to radio wavelengths.
These datasets are optimal for simultaneous protocluster and AGN identification extending beyond the local universe, but realizing their full potential requires sophisticated ML frameworks. Our team has extensive research experience characterizing general AGN populations across a wide range of parameter spaces [8,9] and is eager to recruit Rising Researcher(s) to develop ML/AI applications that will advance AGN-protocluster studies.
This project will proceed through three integrated phases over 12 months. In Phase 1, the Rising Researcher will develop and optimize ML classifiers by implementing convolutional neural network (CNN) architectures for multi-band imaging and eXtreme Gradient Boosting (XGBoost) models incorporating galaxy/AGN properties and photometric redshift uncertainties, training these hybrid models on cosmological simulations to learn protocluster signatures. The benchmark performance will be evaluated and compared with traditional 2D density mapping and 3D spectroscopic Voronoi tessellation Monte Carlo methods. Phase 2 will validate and calibrate classifiers using existing deep surveys: applying trained models to extragalactic fields with extensive spectroscopic validation data (e.g., XMM-LSS and COSMOS), characterizing AGN populations in identified protoclusters. In Phase 3, the validated ML pipeline will be deployed to the LSST DDFs to simultaneously identify protocluster candidates and AGN members. We anticipate delivering a trained classifier and a high-confidence protocluster catalog with well-characterized AGN populations.
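To illustrate the tabular half of the Phase 1 hybrid, the following is a minimal XGBoost sketch on synthetic per-galaxy features; the feature names, labels, and hyperparameters are placeholders, not results from real survey or simulation data.

```python
# Illustrative XGBoost classifier over placeholder galaxy features; synthetic
# labels correlated with overdensity stand in for simulation-based training data.
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
n = 5000
X = np.column_stack([
    rng.uniform(1.5, 4.0, n),      # photometric redshift
    rng.uniform(0.02, 0.3, n),     # photo-z uncertainty
    rng.normal(0.5, 0.3, n),       # a color index
    rng.lognormal(0.0, 0.8, n),    # local galaxy overdensity
])
y = (X[:, 3] + rng.normal(0, 0.5, n) > 1.5).astype(int)   # synthetic "member" label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = xgb.XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.05,
                        subsample=0.8, eval_metric="logloss")
clf.fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```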

XXX
Jiang, Wen (#30109)
Robust Determination of 3D Structural Heterogeneity from Cryo-EM 2D Projection Images

Cryogenic electron microscopy (cryo-EM) has revolutionized structural biology in the past decade, enabling the determination of macromolecular structures at near-atomic resolutions (2-4 Å) for a wide range of protein complexes, viruses, and amyloids. This has significantly advanced the structural-functional understanding of basic sciences, disease mechanisms, and potential treatments. Due to radiation damage to the biological samples caused by the electron beam, cryo-EM is limited to low-dose imaging and relies on computational averaging of the 2D projections of a large number (10⁵-10⁶) of copies of the target proteins in different, unknown orientations (poses). Computational methods in cryo-EM have been successful in determining the poses and obtaining high-resolution 3D reconstructions when the protein particles have identical 3D structures or only a few discrete structures. However, significant challenges remain when the protein particles assume continuously varying conformations, either from the intrinsic dynamics essential for protein functions—for example, catalyzing a reaction—or from damage to the samples during protein purification or sample grid preparation.
Several deep neural network-based methods are currently available for de novo estimation of the heterogeneous states of protein particles from 2D projection images. However, these methods are still limited; for example, they may require known particle poses or can handle only conformational heterogeneity of a single type of protein, but not compositional heterogeneity arising from multiple types of proteins in the dataset. Furthermore, the models learned by these methods are often limited to a single dataset for training, lacking the ability to generalize to different datasets. In this project, we aim to significantly improve computational analysis by using the Zentropy-Enhanced Neural Network (ZENN), a thermodynamics-inspired computational framework for heterogeneous data-driven modeling that has been developed by the co-PI Liu and collaborator Hao (https://doi.org/10.1073/pnas.2511227122). ZENN has demonstrated superior effectiveness in classification tasks on images (CIFAR-10/100) and texts (BBC News and AG News) and in energy landscape reconstructions, showing superior generalization capabilities and robustness. As a framework for data-driven machine learning, ZENN is a versatile and robust approach for scientific problems involving complex, heterogeneous datasets, which presents exciting potential for the robust determination of 3D structural heterogeneity from cryo-EM 2D projection images.

XX
Lee, Dongwon (#30110)
Cross-Platform Narrative Chain Graphs for Tracking Information and Toxicity Drift

Social platforms now form a coupled, multimodal information ecosystem where ideas recur, get reframed, and spread far beyond their original context. Many influential messages are history dependent. Later conclusions or calls to action often only make sense when earlier premises, introduced in prior contexts and repeatedly resurfacing over time, are remembered. At the same time, cross platform amplification can reshape content, shifting stance and certainty and sometimes escalating toward targeted toxicity. This project will develop AI methods to (1) cluster recurring narrative units into canonical identities across platforms, (2) model their evolution as narrative chains over time, and (3) trace how those chains propagate via quotes, reposts, clips, captions, and paraphrases to quantify semantic drift and toxicity drift during spread. Using temporal modeling methods, we will forecast which narratives are likely to jump platforms, persist, or escalate. We will evaluate multiple modeling families, including time series and point process baselines, as well as graph-based representations and learned embeddings, prioritizing interpretability and robustness.
This project will develop an AI driven framework to model how information spreads and evolves across modern social platforms. These platforms increasingly support multiple modalities and formats, including short text posts, images, videos, and longer spoken or streamed content such as podcasts, livestreams, and debates. Rather than treating individual posts, episodes, or threads as isolated units, we will focus on cross episode narrative fragments, which are recurring pieces of information that are paraphrased, reframed, and reintroduced over time, and study how they later appear as quotes, clips, screenshots, captions, reposts, and paraphrases across platforms.
The core technical output will be a “Temporal Narrative Chain Graph” with two linked layers. First, we will extract narrative units, such as event framings, causal explanations, and prescriptions or calls to action, from multimodal platform content and cluster rephrasings into canonical narrative identities. We will then induce narrative chains that represent how later conclusions build on earlier premises over time, moving from premise to reframing to conclusion to call to action, capturing history dependence and escalation dynamics. Second, we will link each canonical narrative identity to downstream echoes using robust quote and paraphrase tracing based on retrieval, reranking, and confidence calibration, producing diffusion subgraphs that show how a narrative travels across platforms and communities.
Using these coupled graphs, we will quantify two key forms of change during spread. The first is semantic drift, meaning changes in meaning, stance, or certainty. The second is toxicity drift, meaning shifts toward targeted hostility or aggressive framing. We will also develop predictive models, especially scalable temporal modeling and diffusion analysis methods such as time series forecasting, point process models, dynamic network analytics, and graph-based representations when appropriate. Because repeated exposure can create misleading signals, the project will include explicit robustness tests and bias audits to ensure models do not equate repetition with credibility, sometimes called the “repetition equals credibility” bias. The result will be a scalable computational pipeline and evaluation suite that supports interdisciplinary research on information dynamics in the social media era.
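As a minimal illustration of the first step (clustering rephrasings into canonical narrative identities), the sketch below groups near-duplicate posts with TF-IDF cosine similarity and agglomerative clustering; this is a simple stand-in for the learned embeddings and calibrated paraphrase tracing the project would actually evaluate, and the posts and threshold are made up.

```python
# Toy clustering of paraphrased narrative units (requires scikit-learn >= 1.2
# for the `metric` parameter); a stand-in baseline, not the project's method.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics.pairwise import cosine_similarity

posts = [
    "City water tests show contamination near the plant",
    "tests show the water near the plant is contaminated!!",
    "Officials deny any contamination in city water",
    "officials say city water is fine, no contamination found",
    "New bike lanes open downtown next week",
]

tfidf = TfidfVectorizer(lowercase=True, stop_words="english").fit_transform(posts)
distance = 1.0 - cosine_similarity(tfidf)

clusterer = AgglomerativeClustering(n_clusters=None, distance_threshold=0.8,
                                    metric="precomputed", linkage="average")
labels = clusterer.fit_predict(distance)
for label, post in zip(labels, posts):
    print(label, "|", post)
```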

XX
Lee, Dongwon (#30111)
Evaluating Refusal Behaviors in Large Language Models

The rapid proliferation of large language models (LLMs) across domains has created novel opportunities for accelerating knowledge. Nevertheless, this unique advancement has created emerging risks related to reliability, trust, and responsible use. The generation of hallucinations and other nonfactual content, along with toxic and harmful responses, remains a well-documented challenge in large language models. In response, contemporary AI systems increasingly employ refusal behaviors, such as explicitly declining to answer, to mitigate the risks associated with generating incorrect or unsafe content. While refusals offer a promising approach to enhancing model safety and reliability, their broader impacts on effective AI deployment and user interaction have yet to be systematically examined. This project proposes a systematic, data-driven investigation of refusal behaviors in LLMs, intending to inform the development of more robust, trustworthy, and user-aware AI systems. Using controlled online experiments and computational analysis, the study will evaluate how different refusal strategies, varying in frequency and type, affect the reliability, usability, and assessment of AI systems.

XX
Li, Zhenlong (#30116)
Natural Language to Geovisual Analytics: Agentic AI for Spatial Data Visualization

Geovisualization is often the first step in geospatial analysis and geo-visual analytics. It enables users to detect spatial patterns, compare regions, and develop hypotheses that motivate subsequent modeling and inference. In many applications, maps and linked charts provide the fastest mechanism for assessing spatial structure, heterogeneity, and change over time. Despite its importance, geovisualization workflows remain difficult to execute and reproduce. Traditional approaches typically require specialized expertise to identify suitable datasets, understand schemas and coordinate reference systems, choose visualization methods, write spatial queries and spatial programs, and refine results through repeated trial and error. These requirements increase the entry barrier for non specialists and slow exploratory work even for experienced analysts, particularly when analysis involves large, heterogeneous, or rapidly updating geospatial datasets.
This project will develop an AI-driven geovisualization agent that bridges this gap by translating natural language questions into executable, transparent geovisual analytics workflows. The agent will map user intent to a sequence of operations including data selection, spatial and temporal filtering, aggregation, statistical summaries, comparison across places or periods, and visualization design choices. The system will then generate either static or interactive maps and charts and will support iterative refinement through user feedback. Additionally, the agent will maintain an end-to-end execution workflow, enabling results to be reproduced, inspected, and shared. A core technical emphasis is scalable data access and computation. The agent will be designed to connect to large geospatial databases, e.g., the database maintained by the GIBD lab, including large-scale geotagged social media datasets and human mobility origin-destination flows derived from SafeGraph. The agent will also support user uploads of geospatial datasets for in-browser visualization and analysis, enabling external users to work with their own data and, when permitted, contextualize findings with derived summaries from the lab infrastructure.
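To make the intent-to-workflow mapping concrete, the following sketch (geopandas/matplotlib) shows the kind of structured workflow specification an agent could emit from a natural-language question and then execute; the question, dataset path, and column names are hypothetical placeholders, and the execution call is left commented out since the file does not exist.

```python
# Sketch of executing a structured geovisualization workflow spec; the dataset,
# columns, and question below are hypothetical placeholders.
import geopandas as gpd
import matplotlib.pyplot as plt

# e.g., parsed from: "Map average visit counts by county for June"
workflow = {
    "dataset": "counties_with_visits.geojson",        # hypothetical file
    "filter": ("month", "==", 6),
    "aggregate": {"by": "county_name", "how": "mean"},
    "visualize": {"type": "choropleth", "value": "visit_count"},
}

def execute(spec):
    gdf = gpd.read_file(spec["dataset"])
    col, op, val = spec["filter"]
    if op == "==":
        gdf = gdf[gdf[col] == val]
    agg = spec["aggregate"]
    gdf = gdf.dissolve(by=agg["by"], aggfunc=agg["how"]).reset_index()
    ax = gdf.plot(column=spec["visualize"]["value"], legend=True, figsize=(8, 6))
    ax.set_title("Agent-generated choropleth")
    plt.savefig("map.png", dpi=150)
    return gdf

# execute(workflow)  # requires the hypothetical dataset to exist
```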

XXX
Saha, Suman (#30117)
Composite Pedagogical Digital Twins: Modeling and Synthesizing Instructional Strategies from Educational Data

Pedagogy refers to how instructional concepts are explained, sequenced, and scaffolded to support learning. Prior work in computing and engineering education shows that effective pedagogical practices such as clear explanation, scaffolding, timely feedback, and cognitive load management strongly influence how students engage with complex material. Despite this, pedagogy remains difficult to model and transfer at scale using computational methods, as effective instructional practices are often embedded in individual instructors’ routines rather than represented in forms that support systematic analysis and reuse. At the same time, educational technologies such as intelligent tutoring systems and large language model based assistants are increasingly used to support adaptive learning and personalization, particularly in large courses where individualized instructor attention is limited. While these systems effectively scale instructional support by adjusting content presentation, pacing, or task difficulty, they rarely model how expert instructors teach. Instead, instructional behavior is shaped indirectly through design choices, decision rules, or prompt templates, even when systems rely on instructor-created instructional materials.
Recent advances in learning analytics, machine learning, and representation learning, combined with the growing availability of digital teaching artifacts such as recorded lectures, slides, worked examples, and instructional explanations, make it increasingly feasible to model pedagogy computationally. When analyzed systematically, these materials encode not only what is taught, but how concepts are introduced, sequenced, and reinforced over time. Prior work by the project team, ALAASKA, shows that pedagogically informed interaction behavior can be learned from instructional data in a real classroom setting. However, the system is grounded in a single pedagogical perspective and therefore inherits the strengths and limitations of that approach. Modeling the pedagogy of a single instructor is insufficient for building robust or generalizable instructional systems. Individual instructors rarely excel across all dimensions of effective pedagogy. One instructor may offer strong conceptual explanations but limited scaffolding for novice learners, while another may excel at step-by-step guidance but provide fewer opportunities for abstraction or transfer. A pedagogical digital twin derived from a single instructor, therefore, risks encoding idiosyncratic practices rather than instructional strategies that generalize across contexts. This project addresses this limitation by introducing the Composite Pedagogical Digital Twin, a computational framework for modeling pedagogy as learnable instructional strategy representations rather than instructor-specific teaching traits. By analyzing instructional materials from multiple instructors teaching a common curriculum, the project identifies effective pedagogical strategies across key dimensions and selectively integrates them into a composite representation. This approach captures how effective instruction emerges across instructors, rather than reproducing the teaching style of any one individual. The project focuses on methodological development rather than classroom intervention. Using a controlled course context with multiple instructors, the work will develop computational representations of pedagogical dimensions such as explanation structure, scaffolding patterns, sequencing decisions, and feedback timing.
Outcomes include a principled modeling framework, curated datasets of instructional artifacts, and preliminary empirical results that support future pedagogy-aware educational AI systems and external funding proposals. For Rising Researchers, this project offers opportunities to contribute to dataset construction, representation learning, comparative modeling, and analysis at the intersection of education, data science, and artificial intelligence. More broadly, the Composite Pedagogical Digital Twin lays the groundwork for next-generation educational AI systems that adapt instructional strategies, not only content delivery, enabling high-quality pedagogy to scale across courses, instructors, and institutions.

XX
Sathian, Krishnankutty (#30121)
Uncovering innate language mechanisms through neural decoding and data-driven AI models

Understanding how the human brain transforms the acoustic signals of speech into meaningful judgments is a central challenge in neuroscience and an emerging frontier for artificial intelligence. Although language is often viewed as arbitrary, there is growing evidence for systematic mappings between sound and meaning, a phenomenon known as iconicity. Iconicity facilitates language learning and is thought to have bootstrapped language evolution. Motivated by our recent machine-learning modeling that demonstrates how specific combinations of acoustic features predict listeners’ iconicity judgments, this project will investigate the neural basis of sound-to-meaning mappings using stereo-electroencephalography (sEEG) data collected from patients undergoing monitoring for epilepsy treatment.
Participants will engage in psychophysical tasks that require them to listen to spoken items and make judgments about their sound and meaning. This design will allow us to examine how the brain transforms sound into semantic representations and to unravel the mechanisms that reflect fundamental properties of human language. The analysis will rely on advanced computational modeling and AI-driven analysis of neural data. Rising Researchers will contribute by developing pipelines for preprocessing and aligning sEEG recordings to the external stimuli, performing time-frequency decomposition of the time series, implementing time-resolved decoding and temporal generalization analyses to trace the flow of information over milliseconds, applying unsupervised approaches such as Gaussian mixture models to cluster electrodes according to representational dynamics, and constructing deep learning models such as convolutional neural networks and transformer-based architectures to identify neural patterns predictive of sound-to-meaning judgments. This work will require expertise in machine learning for decoding time series data, signal processing, and the use of high-performance computing resources for large-scale neural data analysis. These skills are essential for training models, performing permutation-based statistical evaluation, and developing a robust AI-guided framework for understanding how the brain maps sound to meaning.
This project is theoretically significant because it advances understanding of the neural basis of iconicity and clarifies how sound-based mappings relate to both phonological and semantic processing in language. It is technically innovative because it integrates computational neuroscience, machine learning, and cognitive science to reveal how the human brain gradually transforms acoustic features into higher level semantic representations. The findings will be clinically relevant because iconicity in language appears to remain robust even in individuals with aphasia (language dysfunction resulting from stroke, traumatic brain injury or neurodegenerative disorders). The insights gained may guide new approaches to rehabilitation and inform the development of future language-based brain-computer interfaces. Beyond clinical relevance, understanding how sound maps to meaning also has broad implications for communication, design of brand names, and consumer psychology, where intuitive associations between sounds and meaning shape perception and decision-making.
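As a minimal illustration of time-resolved decoding, the sketch below (scikit-learn) fits one linear classifier per time point on epoch data of shape (trials, channels, times) and tracks cross-validated accuracy; the data are synthetic stand-ins for preprocessed sEEG epochs, with an injected class-dependent signal in a late window.

```python
# Illustrative time-resolved decoding on synthetic epochs; not project data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_trials, n_channels, n_times = 120, 64, 50
X = rng.standard_normal((n_trials, n_channels, n_times))
y = rng.integers(0, 2, n_trials)            # e.g., binary sound-to-meaning judgment
X[y == 1, :10, 30:] += 0.4                  # weak class-dependent signal, late window

accuracy = np.zeros(n_times)
for t in range(n_times):
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    accuracy[t] = cross_val_score(clf, X[:, :, t], y, cv=5, scoring="accuracy").mean()

print("peak decoding accuracy:", accuracy.max(), "at time bin", int(accuracy.argmax()))
```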

XX
Liu, Shimin (#30122)
Transforming 100-Terabyte Fiber-Optic Sensing Data into a Predictive Geohazard Modeling Platform

The critical mineral supply chain is a national priority, with its security and sustainability depending on safe and reliable deep subsurface mining. Achieving safe extraction at increasing depths requires continuous monitoring of stress, deformation, and dynamic rock mass responses that govern seismicity and instability. Recent advances in distributed fiber-optic sensing enable kilometer-scale, high-resolution monitoring of strain and acoustic activity, but these deployments now generate data volumes exceeding 100 terabytes per month, far beyond the capacity of traditional, event-based analysis approaches.
This project leverages a first-of-its-kind, industry-scale fiber-optic dataset from two deep subsurface mining operations, where approximately 8 km of permanently installed fibers at depths of 600–1200 m continuously record stress and seismic responses. We will establish the computational foundation for a scalable geohazard modeling platform by harmonizing extreme-scale DAS/DTS/DSS datasets, extracting multi-scale seismic and instability precursors, and developing physics-informed, AI-driven models linking stress evolution to geohazard likelihood. Using ICDS data storage and high-performance computing infrastructure, the project will deliver a prototype, AI-enabled analytics platform that converts continuous fiber-optic data streams into actionable geohazard indicators. The outcomes will support near–real-time subsurface diagnostics, predictive geohazard assessment, and future digital-twin development, positioning the team for external funding in AI-enabled geohazard prediction and intelligent subsurface monitoring.

XX
Farooque, Mahfuza (#30124)
ARES: Autonomous Reasoning & Engineered Safety: An AI-Augmented Neuro-Symbolic IDE for Verifiable Safety Specifications and Formal Twins

Autonomous and AI-enabled systems are increasingly deployed in high-stakes settings, yet their safety and reliability often depend on informal natural-language requirements (e.g., "the robot must never enter an unsafe zone" or "the controller must maintain separation from obstacles"). Turning these requirements into precise, machine-checkable specifications is slow and difficult because it typically requires specialized expertise in formal methods. This "formalization bottleneck" limits the practical adoption of rigorous assurance, creates gaps between what stakeholders intend and what systems actually enforce, and makes audits and regressions harder as systems evolve. For example, a requirement such as "the controller must maintain separation from obstacles under all operating conditions" must ultimately be translated into an explicit invariant with quantified assumptions over system state, timing, and environmental constraints—an error-prone process when done manually.
ARES (Autonomous Reasoning & Engineered Safety) proposes an AI-augmented, neuro-symbolic development environment that helps engineers translate natural-language safety requirements into solver-validated formal specifications and proof evidence. The key principle is solver-first authority: AI components propose candidate formalizations and proof steps, but a trusted symbolic backend (e.g., theorem provers or SMT solvers) checks every step and produces actionable diagnostics when something fails (e.g., missing assumptions, type mismatches, incorrect quantifiers, or unprovable goals). ARES only accepts artifacts that come with machine-checkable evidence, reducing hallucination risk and making the workflow robust enough for safety-critical contexts.
ARES focuses on generating reusable, proof-backed safety artifacts we call formal twins: structured formal specifications and proofs (or counterexamples) that serve as safety envelopes, regression checks, and audit-ready assurance documentation. The project also introduces an agentic repair loop, APOLLO, which uses solver diagnostics to iteratively refine failed formalizations and proofs. When a requirement cannot be proven due to ambiguity or inconsistency, APOLLO produces a structured proof obligation report that explains what information is missing or contradictory, enabling targeted human refinement rather than trial-and-error debugging.
Current LLM-based systems can produce plausible formal text, but they are not inherently trustworthy. ARES directly addresses this limitation by placing formal verification at the center of the loop and treating AI as a proposal generator rather than an authority. The result is a practical pathway from natural-language requirements to verifiable safety constraints, enabling stronger assurance and easier maintenance over time.
This project is designed for meaningful contributions from Rising Researchers with backgrounds in AI/ML, formal methods, systems, human-centered computing, and safety engineering. Example contribution paths include building and evaluating kernel-guided repair policies that use solver diagnostics to guide proof and specification refinement; developing retrieval methods over formal libraries for lemma and tactic selection; creating traceability mechanisms that link natural-language requirements to formal clauses and proof steps; designing reusable safety-envelope templates in Higher-Order Logic; and conducting controlled evaluations on benchmarks such as ProofNet and miniF2F.
Expected outcomes include a prototype workflow, a reproducible evaluation report, and a curated set of requirement-to-proof exemplars demonstrating how solver-first AI can support verifiable safety specification at scale.
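To make the solver-first pattern concrete, a minimal sketch using the z3 SMT solver as an illustrative stand-in (the actual ARES backend, encoding, and variable names are not specified here and are assumptions):

```python
# Minimal sketch of the "solver-first" idea using the z3 SMT solver.
# An LLM would propose the formalization; the solver either validates it or
# returns a concrete counterexample that drives the repair loop.
from z3 import Real, Solver, And, Not, sat

dist, speed, braking = Real("dist"), Real("speed"), Real("braking")

# Proposed safety invariant: separation stays above the stopping distance.
assumptions = And(speed >= 0, speed <= 10, braking == 2, dist >= 0)
invariant = dist > speed * speed / (2 * braking)

s = Solver()
s.add(assumptions, Not(invariant))        # search for a violating state
if s.check() == sat:
    print("counterexample:", s.model())   # feeds an APOLLO-style repair step
else:
    print("invariant holds under the stated assumptions")
```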

XX
Cameron, Christopher Daryl (#30125)
Developing Platforms for Human-AI Empathic Collaboration

This project will examine how to foster empathetic motivations and abilities using the development of novel platforms for human-AI interactions. With the accelerated growth of large language models, people are faced with increasing opportunities to receive "empathetic" messages from chatbots, raising several scientific, ethical, and practical questions about what empathy means in these contexts and whether and how it should be developed reliably. The PI brought together social scientists, philosophers, engineers and computer scientists to discuss these issues (see 2024 and 2026 conferences on empathic AI here at Penn State through the Consortium on Moral Decision-Making; as well as Perry & Cameron, forthcoming, Empathy and Artificial Intelligence: Challenges, Advances, and Ethical Considerations, from Cambridge University Press, which grew from the conference).
With this project, the PI and Rising Researcher will work together to develop a novel platform for fostering empathetic growth through human-LLM interaction in survey environments. The PI and Rising Researcher will consult existing attempts to develop in-the-loop empathic LLM interventions, with an aim to broaden approaches to respect the broader array of empathy sub-facets (e.g., emotion sharing; perspective-taking; compassion) and types of empathic challenge contexts (e.g., intergroup conflict; greater uncertainty of need). Through collaboration over the course of a year, the team will explore possible challenges to the development of effective empathetic LLMs, including the risks for overly agreeable, "sycophantic" feedback, as well as considering different methodologies to benchmark effective empathetic and moral growth (e.g., greater willingness to engage in empathetic opportunities, or make more temperate moral judgments, after human-AI intervention). Such efforts will support the longer-term research program in the PI's lab to think about how human-AI interactions could bootstrap moral development, providing pilot data to support external grant submissions to federal agencies (e.g., National Science Foundation) and private foundations (e.g., John Templeton Foundation).
Importantly, this project requires partnership with a Rising Researcher who has experience in working with, developing, and optimizing LLMs for human testing. This experience is necessary to consider ways in which to fine-tune LLMs to optimize empathetic development possibilities through human interaction, and to ensure that the interventions to do so are scalable. The ideal goal would be to build a platform that could be integrated into several testing environments, tunable to the needs of research questions, and which could have an accessible user interface on both the participant and researcher ends.

X
Gong, Xi (#30126)
Characterizing Wildfire Smoke Mixtures Using Machine Learning and Geospatial Modeling

Wildfires in the United States have intensified in frequency, severity, and duration, resulting in widespread and recurring exposure to wildfire smoke and a growing burden of adverse health outcomes. Although wildfire-attributable PM2.5 is commonly used to characterize smoke exposure, this single-metric approach obscures substantial chemical heterogeneity in wildfire emissions. In reality, wildfire smoke consists of complex, high-dimensional mixtures of organic aerosols, black carbon, trace metals, and other constituents, whose relative compositions vary with fuel type, combustion conditions, fire behavior, and regional fire regimes. The frequent overlap and succession of smoke plumes further amplify this complexity. These sources of variability pose a major data science and geospatial modeling challenge: how to identify a finite set of major, recurring wildfire "fingerprints" from millions of possible combinations of pollutant types and proportions that reliably capture environmentally realistic wildfire smoke mixtures across space and time.
This project will address this challenge through a geospatial data science framework that integrates long-term air monitoring, satellite-derived wildfire and smoke observations, and meteorological data. It will leverage advanced geospatial modeling and customized machine learning methods to identify recurrent wildfire "fingerprints" and map their spatiotemporal distributions across the United States. The project will produce a computationally derived, composition-informed characterization of wildfire smoke that advances data-driven exposure assessment and informs future environmental health research and policy development in an era of escalating wildfire risk.
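One way to make the "fingerprint" idea concrete is a small factorization sketch on synthetic speciated measurements (the species names, sample counts, and the choice of non-negative matrix factorization are illustrative assumptions, not the project's method):

```python
# Minimal sketch of extracting recurring smoke "fingerprints" from speciated
# aerosol measurements (synthetic data only). Non-negative matrix
# factorization yields a small set of non-negative composition profiles plus
# per-sample weights indicating how strongly each profile contributes.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(1)
species = ["organic_carbon", "black_carbon", "potassium", "zinc", "sulfate"]
X = rng.gamma(shape=2.0, scale=1.0, size=(500, len(species)))  # hypothetical samples x species

model = NMF(n_components=3, init="nndsvda", max_iter=500, random_state=0)
weights = model.fit_transform(X)        # contribution of each fingerprint to each sample
fingerprints = model.components_        # one composition profile per fingerprint

for k, profile in enumerate(fingerprints):
    top = species[int(np.argmax(profile))]
    print(f"fingerprint {k}: dominated by {top}")
```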

XX
Du, Manyu (#30127)
Physics-Informed Artificial Intelligence for Predictive Modeling of Gene Expression Mediated by Transcriptional Condensates

Precise spatial and temporal regulation of gene expression is essential for the development of a single-cell embryo into a complex organism. Our recent studies of eukaryotic gene regulation reveal that diffraction-limited transcriptional condensates — composed of RNA polymerase II, Mediator, and associated cofactors — serve as a primary driver of transcriptional activity and a substantially better predictor of gene expression than the longstanding enhancer–promoter interaction model. However, the highly dynamic nature of condensate–gene interactions and the near-diffraction-limited size of these structures make them difficult to detect with conventional imaging and threshold-based analysis pipelines. To address this challenge, we propose to develop a physics-informed, artificial intelligence (AI)-enabled framework for quantitative analysis of multi-channel super-resolution images capturing transcriptional condensates and nascent transcriptional activity. Building on advanced machine learning architectures, such as the zentropy-enhanced neural network (ZENN) recently developed at Penn State, we aim to generate predictive models that link condensate size, intensity, and spatial organization to gene expression dynamics. This integrated computational framework will provide foundational tools for dissecting the physical principles that govern eukaryotic gene regulation.

XX
Ogunmodimu, Olumide (#30129)
Leveraging Graph Neural Networks to Model Slurry Behavior in Stirred Mills for Improved Energy Efficiency in Mineral Processing

Dense slurry flows in stirred media mills exhibit complex, non-Newtonian behavior due to the formation of dynamic particle contact networks. At high solid concentrations, these slurries can undergo shear thickening or jamming, which hinders efficient mixing and grinding. In this project, we propose to model and predict the evolution of particle contact and force networks within a stirred mill to better understand the microstructural origins of flow behavior. Key metrics, such as coordination number (average number of contacts per particle), contact persistence, and renewal rates, will be used to characterize the transition between fluid-like and solid-like regimes. Additionally, the emergence of system-spanning force chains, captured through force network percolation, will be analyzed as indicators of shear jamming. Preliminary findings suggest that increasing solids fraction or impeller speed leads to a rise in coordination number and the development of persistent frictional contact networks, correlating with increased flow resistance. Conversely, rapidly renewing contact networks reflect a well-mixed, fluid-like state. This work offers a novel, quantitative framework for identifying operating conditions that lead to excessive thickening or dead zones, ultimately guiding the design of more energy-efficient and uniform mixing strategies in mineral processing.
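As a small illustration of two of the metrics named above, a sketch computing the mean coordination number and a force-network percolation check from a toy contact list (the particle indices, forces, and boundary choice are hypothetical; real inputs would come from the mill simulations):

```python
# Minimal sketch of two metrics from the abstract, computed from a toy list of
# particle contacts: the mean coordination number, and whether "strong"
# contacts form a system-spanning force chain, tested via graph connectivity.
import networkx as nx

n_particles = 6
# (i, j, normal_force) tuples -- illustrative values only
contacts = [(0, 1, 5.0), (1, 2, 4.5), (2, 3, 0.2), (3, 4, 6.1), (4, 5, 5.8), (0, 5, 0.1)]

coordination = 2 * len(contacts) / n_particles   # average contacts per particle
print("mean coordination number:", coordination)

# Keep only contacts carrying more than the mean force and test for a
# connected path between two (assumed) boundary particles.
mean_force = sum(f for _, _, f in contacts) / len(contacts)
strong = nx.Graph()
strong.add_nodes_from(range(n_particles))
strong.add_edges_from((i, j) for i, j, f in contacts if f > mean_force)
print("strong force network spans the domain:", nx.has_path(strong, 0, 4))
```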

XX
Li, Qunhua (#30130)
AI-Enabled End-to-End Particle Tracking to Quantify Transcriptional Condensate–Gene Interactions

Precise control of gene expression is fundamental to human health, and its disruption contributes to many diseases. Recent discoveries suggest that gene activity is influenced not only by DNA regulatory elements, but also by the dynamic organization of transcriptional machinery within the cell nucleus, including structures known as transcriptional condensates. These findings point to a previously underappreciated physical mechanism of gene regulation with important implications for understanding and potentially modulating disease-relevant gene expression.
However, current analytical tools are not well suited to accurately quantify these dynamic processes in live-cell imaging data. This project seeks to develop AI-driven computational approaches to track and analyze transcriptional condensates and gene activity across space and time. The work will involve applying modern machine learning and computer vision methods—such as object detection, tracking in image sequences, and representation learning—to multi-channel microscopy videos. Students will contribute to building models that improve particle localization and tracking, and to developing interpretable approaches that relate measurable features (e.g., distance, size, motion) to changes in gene expression. By combining AI, data science, and biological imaging, the project aims to generate new quantitative insight into gene regulation and lay a foundation for future translational research in gene-based therapies.
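As one small building block of such a tracking pipeline, a sketch of frame-to-frame linking by optimal assignment on synthetic detections (the centroids and the simple distance cost are placeholders; the project's detectors and linking costs may differ):

```python
# Minimal sketch of frame-to-frame particle linking: detections in consecutive
# frames are matched one-to-one by minimizing total displacement.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

frame_a = np.array([[10.0, 12.0], [40.0, 41.0], [75.0, 20.0]])  # condensate centroids, frame t
frame_b = np.array([[11.5, 12.5], [39.0, 43.0], [76.0, 21.0]])  # centroids, frame t+1

cost = cdist(frame_a, frame_b)               # pairwise distances as assignment cost
rows, cols = linear_sum_assignment(cost)     # optimal one-to-one matching
for i, j in zip(rows, cols):
    print(f"particle {i} -> particle {j}, displacement {cost[i, j]:.2f} px")
```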

XX
Bi, Zhen (#30132)
Tensor Network Methods for Exotic Quantum Matter: Generalized Symmetries and Open Quantum Systems

Introduction: Understanding and classifying quantum phases of matter with no classical analog is a central goal of modern theoretical physics, with broad interdisciplinary impact across high energy physics, quantum information science, and quantum computing. Symmetry has long served as a powerful organizing principle for phases of matter: it constrains low-energy excitations, shapes universal behavior, and can enable intrinsically quantum phases such as symmetry-protected topological states and topological order—phenomena with deep relevance to next-generation quantum technologies. Two major frontiers are now pushing this framework into new territory. First, the very notion of symmetry is being expanded, revealing generalized symmetries that go beyond the conventional onsite paradigm and open the door to new forms of exotic quantum matter. Second, the study of quantum phases is moving beyond the zero-temperature ground-state picture toward the more realistic setting of nonequilibrium open quantum systems, where dissipation, noise, and mixed states can fundamentally reshape phase structure and dynamics.
Generalized symmetry and topological phases: One active direction in my group is to understand the structure and consequences of modulated symmetries—symmetry operations that go beyond conventional onsite global symmetries by weaving spatial structure directly into the symmetry definition itself. Two prominent examples that have received growing attention are multipolar symmetries and exponential symmetries. These symmetries can be engineered in current-generation quantum simulator platforms, and they may also arise effectively in more traditional condensed matter settings, including quantum Hall systems. Our central goal is to understand what kinds of symmetry-protected topological (SPT) phases modulated symmetries can enable, and how these phases—along with their diagnostics, boundary phenomena, and robustness—differ from those protected by ordinary global symmetries.
Open quantum systems: Another major frontier is the study of open quantum systems, where a quantum many-body system continuously interacts with its environment and is therefore naturally described by mixed states rather than pure states. In this setting, the symmetry structure becomes richer and more subtle: mixed-state dynamics can support notions such as strong and weak symmetries, which have no direct counterpart in the traditional ground-state paradigm. This added complexity can enable new forms of exotic quantum matter—phases and phenomena that are inaccessible in closed, pure-state systems—and it opens a wide, largely unexplored landscape that calls for systematic theoretical and computational exploration in the mixed-state regime.
Tensor network methods: In this project, we will adapt the powerful tensor-network toolkit to the two frontiers described above. For pure states with ordinary global symmetries, tensor-network methods have become a gold standard for analyzing symmetry-protected topological phases and related topological phenomena. By contrast, how to systematically extend these methods to modulated symmetries and to mixed states in open quantum systems remains far less developed. Our goal is to build the necessary theoretical and algorithmic foundations and use them to characterize the new quantum phases that arise in these settings.
Concrete goals:
Modulated symmetries and tensor-network constraints. We will derive the appropriate symmetry "push-through" conditions for modulated symmetries, establishing a practical tensor-network framework to classify and diagnose topological phases protected by these spatially structured symmetries. These results will also motivate new generalized-symmetry model Hamiltonians that can realize exotic SPT phases, and we will actively pursue opportunities to collaborate with quantum-simulation experiments to explore their realization.
Mixed-state structure and information-theoretic diagnostics. We will establish quantitative links between injective matrix-product-state structure in the canonical purification of a mixed state and entanglement/correlation measures defined directly on the density matrix—such as mutual information and conditional mutual information. This will yield diagnostics tailored to mixed-state quantum phases and sharpen our understanding of phases of matter in open quantum systems.
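As a tiny illustration of the mixed-state diagnostics named in the second goal, a sketch computing mutual information directly from a two-qubit density matrix (the example state is generic and not a result of this project):

```python
# Minimal sketch of one mixed-state diagnostic: mutual information
# I(A:B) = S(A) + S(B) - S(AB) from a two-qubit density matrix.
import numpy as np

def von_neumann_entropy(rho):
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-(evals * np.log2(evals)).sum())

def partial_trace(rho, keep):
    # rho is 4x4 on qubits (A, B); keep=0 traces out B, keep=1 traces out A
    r = rho.reshape(2, 2, 2, 2)
    return np.trace(r, axis1=1, axis2=3) if keep == 0 else np.trace(r, axis1=0, axis2=2)

# Classically correlated mixed state: 0.5 |00><00| + 0.5 |11><11|
rho = np.zeros((4, 4)); rho[0, 0] = rho[3, 3] = 0.5

mutual_info = (von_neumann_entropy(partial_trace(rho, 0))
               + von_neumann_entropy(partial_trace(rho, 1))
               - von_neumann_entropy(rho))
print("mutual information (bits):", mutual_info)   # 1.0 for this state
```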

X
Nudy, Matthew (#30141)
Zentropy-Enhanced Neural Network (ZENN) for Automated Breast Arterial Calcification Detection on Women’s Health Initiative Mammograms

Breast arterial calcification (BAC) is an incidental finding on mammograms obtained for breast cancer screening1 and reflects medial layer arterial calcification (Figure 1). BAC has been associated with cardiovascular disease (CVD) risk and downstream cardiovascular outcomes among women.2 Currently, there are no standardized methods for BAC detection and reporting.3 Mammogram images obtained for research purposes are often heterogeneous across acquisition settings, digitization processes, and image quality, creating multi-source variation that can reduce the stability and generalizability of standard deep learning models designed to detect BAC. We will address this gap by applying a Zentropy-Enhanced Neural Network (ZENN), a thermodynamics-inspired framework designed for entropy-aware learning from complex, heterogeneous datasets.4 ZENN embeds zentropy theory into deep learning to explicitly learn both energy and intrinsic entropy components, enabling robust integration of heterogeneous real-world data and leveraging a learnable temperature variable to identify latent multi-source structure that conventional cross-entropy cannot capture.
As a high-impact testbed, we will apply ZENN to longitudinal mammograms from participants enrolled in the Women's Health Initiative (WHI). The WHI is a large National Institutes of Health-funded study of menopausal women in the United States, which began enrollment in 1993 and is currently following participants for the development of health outcomes including cardiovascular disease.5,6 This cohort includes 848 participants from the WHI-Hormone Therapy Clinical Trial (411 randomized to hormone therapy and 437 randomized to placebo) with baseline mammograms collected ~6 months prior to randomization and follow-up mammograms obtained at years 1 and 2 following randomization (Figure 2). Mammogram films were digitized using a Lumisys 85 laser digitizer (50 μm resolution; 12-bit depth), saved in bitmap format, and labeled with laterality and standard views (craniocaudal and mediolateral oblique for both breasts).7,8 Drs. Chetlen and Sivarajah will determine "ground truth" on the mammogram images: they will determine which images are BAC negative and which contain BAC. Among those mammograms with BAC, Drs. Chetlen and Sivarajah will determine BAC severity by grading the BAC as mild, moderate, or severe (Figure 1). These determinations will assist in training the ZENN program.
Project goals are to: (1) develop and adapt a ZENN-based pipeline for robust BAC detection and quantification on WHI mammograms despite multi-source variation; (2) generate rigorous preliminary data to support a large grant submission to the National Institutes of Health and support downstream cardiovascular risk assessment in women by enabling scalable, reproducible BAC measurement across longitudinal imaging; and (3) produce a peer-reviewed publication.
This work aligns with ICDS priorities in AI and data science by pairing a novel physics-inspired learning framework with a clinically meaningful, heterogeneous imaging dataset to improve generalization, reliability, and translational potential of automated BAC assessment.
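For readers unfamiliar with the learnable-temperature idea mentioned above, a deliberately simplified PyTorch sketch of a classification head with a learned temperature (an illustrative toy, not the published ZENN architecture; the three-class output mirrors the mild/moderate/severe grading only as an assumption):

```python
# Minimal sketch of a classification head whose logits are scaled by a
# temperature learned jointly with the weights; illustrative only.
import torch
import torch.nn as nn

class TemperatureScaledHead(nn.Module):
    def __init__(self, in_features: int, n_classes: int = 3):  # e.g., mild/moderate/severe
        super().__init__()
        self.linear = nn.Linear(in_features, n_classes)
        self.log_temperature = nn.Parameter(torch.zeros(1))    # learned alongside the weights

    def forward(self, features):
        logits = self.linear(features)
        return logits / torch.exp(self.log_temperature)        # temperature-scaled logits

head = TemperatureScaledHead(in_features=512)
scores = head(torch.randn(4, 512))          # hypothetical image embeddings
loss = nn.CrossEntropyLoss()(scores, torch.tensor([0, 1, 2, 1]))
print(float(loss))
```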

XXX
Renganathan, Ashwin (#30143)
AI Driven Cost-aware Multifidelity Multiobjective Optimization for Aircraft Design

Multiobjective optimization is a difficult problem in mathematical programming that requires the identification of a set of solutions representing a pragmatic compromise between the objectives – the so-called Pareto frontier. Traditional methods for solving such problems, e.g., via evolutionary algorithms, are sample-hungry and prohibitive for real-world applications. Furthermore, when the objectives are available for evaluation at multiple levels of fidelity, no method exists to exploit the cost–fidelity tradeoff in a principled manner. The goal of this project is to develop a cost-aware, probabilistic, AI-driven, fast multifidelity method for multiobjective optimization that is theoretically grounded. We will apply it to the aerodynamic design optimization of the NASA Common Research Model (CRM) aircraft.
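For context, a minimal sketch of the Pareto-frontier concept via non-dominated filtering on synthetic two-objective samples (the objectives and sample counts are illustrative; the project's contribution is the cost-aware multifidelity acquisition built on top of this idea):

```python
# Minimal sketch of identifying non-dominated (Pareto-optimal) designs from a
# set of sampled objective values, with both objectives minimized.
import numpy as np

def pareto_mask(objectives):
    """Boolean mask of non-dominated points (both objectives minimized)."""
    n = objectives.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        dominators = (np.all(objectives <= objectives[i], axis=1)
                      & np.any(objectives < objectives[i], axis=1))
        if dominators.any():
            mask[i] = False
    return mask

rng = np.random.default_rng(2)
objs = rng.uniform(size=(200, 2))        # e.g., drag and structural weight, both minimized
front = objs[pareto_mask(objs)]
print(f"{len(front)} non-dominated designs out of {len(objs)}")
```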

XX
Renganathan, Ashwin (#30145)
FoamPilot: An LLM-based Agent to Automate CFD Workflows with OpenFoam

Computational fluid dynamics (CFD) is widely used to simulate complex fluid flows across engineering domains. In the engineering design context, CFD codes are typically wrapped within a decision-making tool that queries the CFD code several times, under varying boundary and mesh conditions. The time it takes to set up the CFD simulation for changing conditions (particularly meshing) is nontrivial and compounds to be prohibitive in the design context. This proposal describes an LLM-assisted automation layer for CFD workflows using OpenFOAM. The goal is to reduce turnaround time and operator effort for routine CFD tasks—case setup, meshing, solver selection, job submission, monitoring, post-processing, and reporting—while maintaining engineering rigor through guardrails, validation, and reproducible execution. The system combines a Large Language Model (LLM) with deterministic tooling (OpenFOAM utilities, Python scripts, CI checks, and HPC schedulers) so that the LLM orchestrates and documents decisions rather than acting as an unbounded "black box."
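A minimal sketch of the guardrail pattern this describes: an LLM proposes a case change, but deterministic OpenFOAM tooling validates it before anything is accepted (propose_case_edit is a hypothetical stand-in for the LLM call, and the subprocess step assumes OpenFOAM's checkMesh utility is on PATH):

```python
# Minimal sketch: the LLM only proposes; a deterministic mesh check gates
# acceptance, and failures are returned to the agent as diagnostics.
import subprocess

def propose_case_edit(task_description: str) -> str:
    # Placeholder for an LLM call that returns a candidate case directory.
    return "cavityCase"

def mesh_is_valid(case_dir: str) -> bool:
    result = subprocess.run(["checkMesh", "-case", case_dir],
                            capture_output=True, text=True)
    return result.returncode == 0 and "Mesh OK" in result.stdout

case = propose_case_edit("refine the mesh near the moving wall")
if mesh_is_valid(case):
    print(f"accepted: {case}")            # only now would the solver be launched
else:
    print(f"rejected: {case}; diagnostics returned to the agent for repair")
```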

XX
Liu, Shimin (#30149)
AI-Powered CFD Framework for Evaluating Respiratory Health Impacts of Inhaled Mineral Dust

Inhalation of mineral dust remains a persistent occupational and environmental health challenge, contributing to elevated risks of chronic respiratory and cardiovascular disease worldwide. Health risk is governed not solely by ambient dust concentration, but by lung-resolved deposition and delivered dose, which depend jointly on mineral dust properties (particle size, morphology, chemical and mineralogical composition) and respiratory conditions (breathing patterns, activity level, and airway geometry). While computational fluid dynamics (CFD) can resolve these mechanisms with high physical fidelity, the extreme computational cost of high-resolution respiratory CFD has limited its application for health-relevant parameterization, uncertainty analysis, and prevention-oriented risk assessment across realistic exposure scenarios.
This project will develop an AI-powered, physics-consistent CFD framework to enable scalable evaluation of respiratory deposition and health risk from inhaled mineral dust. Leveraging unique multi-decadal datasets from CDC-NIOSH–funded studies, historical Penn State respirable dust records, high-fidelity CFD simulations, and emerging lung-on-chip toxicity data, the project will integrate heterogeneous exposure and response data into a unified computational asset. Physics-informed AI methods will be used to extract key CFD-resolved indicators governing particle transport, deposition hot spots, and dose, and to construct AI-accelerated surrogate models that map dust properties and breathing conditions directly to health-relevant deposition metrics. The resulting framework will transform computationally intensive CFD workflows into a rapid, uncertainty-aware predictive capability, enabling construction of a quantitative respiratory risk matrix to support occupational health assessment, prevention strategies, and future translational research.

XXX
Edgerton, Jared (#30150)
Silencing by Statute: Copyright Law as Information Repression

States increasingly rely on legal and regulatory mechanisms—rather than overt censorship—to suppress unfavorable political information online. This project studies copyright enforcement as a form of lawfare using large-scale copyright takedown request data from the Lumen Database, a public repository that archives legal complaints submitted to major online platforms. Lumen is widely used by researchers, journalists, and civil-society organizations to study platform governance, copyright enforcement, and online censorship, and it contains detailed information on complainants, intermediaries, legal claims, and targeted content. The research team has already collected and consolidated the full universe of available Lumen records spanning 1997–2024, yielding a comprehensive corpus of takedown requests. While extensive, these data are currently unstructured for systematic social-scientific inference. The central objective of this Stage 1 project is therefore measurement-focused: to convert raw takedown records into a structured, validated dataset that enables analysis of digital governance, information control, and the strategic use of law.
The core research tasks involve building a transparent classification and labeling framework for takedown requests. We will develop a taxonomy of takedown behavior (e.g., routine intellectual-property enforcement versus potentially strategic or politically relevant suppression), define coding rules for key fields (complainant identity and type, intermediaries, targeted platforms/domains/URLs, stated legal basis, content category, and indicators of coordination), and produce a human-labeled training set. Using this training data, the project will train and validate supervised classification models that can scale labeling across the full corpus, with explicit reporting of model performance (e.g., precision/recall), calibration, and measurement uncertainty. A priority throughout is interpretability and reproducibility: clear codebooks, replicable computational pipelines (R/Python), and documentation that ensures the dataset can be extended beyond the initial project period.
A second line of work links classification outputs to relational and temporal structure. The project will construct network representations connecting complainants, agents or intermediaries, platforms, and targeted domains, enabling detection of repeated or coordinated takedown behavior, bursts of activity, and cross-platform targeting. These network and time-stamped features will be used to generate descriptive and inferential evidence on when copyright-based takedowns appear consistent with strategic information suppression, and how these patterns vary across political contexts and over time. Substantively, the project speaks to core social-science debates about censorship, repression, and institutional substitution—specifically, when states rely on legal tools as complements to, or substitutes for, more overt coercive strategies.
Rising Researchers would contribute directly to (1) designing coding and labeling protocols; (2) implementing entity resolution and record linkage for complainants and intermediaries; (3) training, validating, and stress-testing supervised classifiers; (4) building network and temporal representations of takedown activity; and (5) producing reproducible outputs (tables and figures) that support paper drafting and external grant proposals.
Deliverables from this effort will support a planned multi-paper research agenda and near-term external funding submissions (including an NSF proposal), with the structured dataset serving as the core preliminary research product.
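A minimal sketch of the supervised labeling step with the precision/recall reporting described above (the toy texts and the routine-versus-strategic labels are illustrative placeholders for the human-coded Lumen training set):

```python
# Minimal sketch: TF-IDF features plus logistic regression for scaling the
# human-coded taxonomy across a corpus, with a held-out performance report.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

texts = ["takedown of pirated film stream", "removal of opposition news article",
         "copyright claim on music upload", "takedown targeting investigative report"] * 25
labels = ["routine", "strategic", "routine", "strategic"] * 25   # illustrative taxonomy labels

X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=0.3, random_state=0)
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))   # precision/recall per class
```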

X
Liu, Xiao (#30151)
Neural Variability and Behavioral Performance Through the Lens of Brain–Artificial Neural Network Similarity

The brain does not respond to the same stimulus in a fixed or deterministic manner. Instead, identical sensory inputs or task demands can elicit markedly different neural and behavioral responses across time. For example, repeated presentations of the same visual stimulus may evoke neural responses of varying magnitude in visual cortex, and individuals often show substantial trial-to-trial variability in reaction time or memory performance during identical tasks.
This neural and behavioral variability, which was once treated as noise, is now increasingly recognized as a fundamental feature of brain function. It reflects moment-to-moment fluctuations in attention, perception, and internal state, as well as stable individual traits. Importantly, the magnitude and structure of this variability change across the lifespan and are systematically altered in neurological and neurodevelopmental disorders, including Alzheimer's disease (AD) and autism. Understanding the neural basis of variability therefore has broad implications: it may reveal how the brain dynamically processes information, improve brain–computer interfaces by accounting for endogenous fluctuations, and uncover early, sensitive biomarkers of brain dysfunction. Despite its importance, the mechanistic origins of neural variability remain poorly understood.
Artificial neural networks (ANNs), originally inspired by biological neural systems, have recently emerged as powerful tools for probing brain function. Converging evidence shows that ANNs and the human brain can exhibit strikingly similar response patterns to identical stimuli, and critically that ANN models whose representations more closely resemble those of the brain achieve superior task performance. These findings raise a fundamental but unexplored question: can trial-to-trial variability in neural and behavioral performance be understood through fluctuations in brain–ANN representational similarity?
The goal of this project is to quantify neural variability through the lens of brain–ANN similarity and to determine whether this similarity explains variability in behavioral performance. We hypothesize that the brain's responses to visual stimuli vary over time in their representational similarity to ANNs, and that greater brain–ANN similarity predicts better subsequent memory for those stimuli. To test this hypothesis, we will compare activation patterns in biological neural networks and ANNs elicited by identical visual inputs and examine how moment-to-moment fluctuations in their similarity relate to differences in memory outcomes.
By reframing neural variability as a dynamic alignment between biological and artificial systems, this project introduces a novel computational framework for understanding variability in brain function and behavior, with implications for cognitive neuroscience, artificial intelligence, and translational research in aging and disease.
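One standard way to quantify brain–ANN alignment is representational similarity analysis; a minimal sketch on synthetic data (stimulus count, feature dimensions, and the correlation-distance metric are illustrative assumptions, not the project's design):

```python
# Minimal sketch of representational similarity analysis: compare the
# stimulus-by-stimulus dissimilarity structure of (synthetic) neural responses
# and ANN activations via a rank correlation of the two dissimilarity vectors.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
n_stimuli = 40
ann_activations = rng.standard_normal((n_stimuli, 512))        # hypothetical ANN layer features
brain_responses = ann_activations[:, :128] + rng.standard_normal((n_stimuli, 128))  # noisy "neural" data

rdm_ann = pdist(ann_activations, metric="correlation")          # representational dissimilarities
rdm_brain = pdist(brain_responses, metric="correlation")

rho, _ = spearmanr(rdm_ann, rdm_brain)
print("brain-ANN representational similarity (Spearman rho):", round(rho, 3))
```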

XXX
Hu, Renyu (#30152)
Accelerating Planetary Climate Modeling with Transformer-Based Radiative Transfer Emulators

This project aims to substantially improve planetary climate modeling by developing transformer-based machine learning emulators that dramatically accelerate radiative transfer calculations, a key computational bottleneck. Leveraging recent advances in neural network architectures and extensive training on diverse atmospheric datasets, these emulators will be integrated into one-dimensional climate models to achieve speedups of up to 100 times without sacrificing accuracy. The accelerated models will enable rapid interpretation of thermal emission spectra from the James Webb Space Telescope, facilitating detailed analysis of exoplanet atmospheres across a wide range of compositions and conditions. Ultimately, this work will enhance our ability to explore planetary climates efficiently and lay the foundation for future extensions to three-dimensional climate simulations.

X
Edgerton, Jared (#30153)
No Joking Matter: Foreign Policy Framing in Traditional and Non-Traditional Political Media

Political comedy and other non-traditional media have become central venues for discussion of foreign policy and international affairs, yet they remain understudied relative to traditional news outlets. This project examines how foreign policy issues are framed across traditional media (e.g., mainstream news podcasts) and non-traditional media (e.g., left- and right-leaning comedy podcasts), focusing on high-salience international cases such as Ukraine–Russia, Gaza, Venezuela, and Greenland. While traditional media have historically dominated elite discourse, non-traditional sources now reach mass audiences—particularly among younger and politically engaged listeners—raising important questions about how tone, framing, and informational content differ across media ecosystems.
The project builds directly on an ongoing, related effort analyzing foreign funding and political content in podcast media. Leveraging an already assembled corpus of podcast transcripts and metadata, this Stage 1 project shifts focus toward comparative framing rather than funding effects. The core objective is measurement: to develop scalable, validated classifications of foreign policy framing, sentiment, narrative structure, and references to conspiratorial or unverified claims across media types and ideological orientations.
Using computational text analysis and supervised machine-learning methods, the project will classify discussions of foreign policy topics by framing dimensions such as threat attribution, moral evaluation, institutional trust, and use of humor or irony. Temporal and comparative analyses will assess how framing differs between comedians and traditional media sources on the same issues and whether non-traditional sources are more likely to engage with speculative or conspiratorial narratives (e.g., claims about bioweapons, intelligence manipulation, or covert state control). Rising Researchers will contribute to labeling protocols, model development, and reproducible analysis pipelines. Substantively, the project advances social-science debates on media systems, political communication, and foreign policy opinion formation, while aligning with ICDS priorities in data science and AI-enabled social measurement.

X
Liu, Xiao (#30155)
Big Data Approaches to Neural Mechanisms Linking Early Menopause to Dementias in Women

Menopause is a major biological transition in women's aging, characterized by a midlife decline in ovarian hormone production. Women who experience early menopause undergo accelerated physiological aging and face increased risks of cognitive impairment and neurodegenerative disease, including Alzheimer's disease (AD). Early menopause may therefore contribute to the nearly twofold higher prevalence of AD observed in females compared with males.
Dr. Luo's (co-PI) group has used large-scale population data to study factors underlying dementia prevalence, with particular focus on AD and its disproportionate burden in women. Dr. Liu's (PI) laboratory has recently characterized a highly structured infra-slow (<0.1 Hz) global brain activity, measured as global mean blood-oxygenation-level–dependent (gBOLD) signal with fMRI, which is linked to arousal, memory, and AD-related pathology. Importantly, they recently showed that gBOLD activity is significantly reduced in older women with a history of early menopause compared with age-matched controls, and correlated with cognitive dysfunction, suggesting that altered gBOLD activity may mediate the link between early menopause and elevated dementia risk.
The goal of this project is to elucidate how menopause-related hormonal changes influence gBOLD activity and to determine whether gBOLD mediates the association between early menopause and dementia risk in females. We will first analyze repeated (N = 60) resting-state fMRI and hormone data from a single female in the 28andMe dataset to characterize within-individual changes in gBOLD activity across menstrual phases and their associations with estradiol. We will then leverage resting-state fMRI and phenotypic data from over 60,000 participants in the UK Biobank to test whether gBOLD mediates the relationship between early menopause (defined as natural menopause before age 45) and incident dementia, and whether this pathway contributes to female-specific dementia risk. By mining these large-scale neuroimaging datasets, this computational project is expected to provide mechanistic insight into how early menopause alters brain function and contributes to heightened dementia risk in women.
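For readers unfamiliar with the mediation logic, a minimal product-of-coefficients sketch on synthetic data (the variable names and linear models are illustrative; the real analyses would adjust for covariates and treat incident dementia with appropriate survival or logistic models):

```python
# Minimal sketch of mediation: does gBOLD carry part of the association
# between early menopause and a cognitive outcome? Synthetic data only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 2000
early_menopause = rng.integers(0, 2, n)
gbold = -0.5 * early_menopause + rng.standard_normal(n)                   # path a (synthetic)
outcome = -0.4 * gbold + 0.1 * early_menopause + rng.standard_normal(n)   # paths b and c'

a = sm.OLS(gbold, sm.add_constant(early_menopause)).fit().params[1]
model_b = sm.OLS(outcome, sm.add_constant(np.column_stack([gbold, early_menopause]))).fit()
b = model_b.params[1]

print("indirect (mediated) effect a*b:", round(a * b, 3))
print("direct effect c':", round(model_b.params[2], 3))
```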

XX
Shandera, Sarah Elizabeth (#30161)
Parallelizing quantum circuits on mixed states

Understanding and controlling the dynamics of quantum information in quantum circuits has many applications in advancing our fundamental understanding of quantum systems as well as in practical computational questions. The researcher working on this project will help improve tools to design quantum networks with novel information dynamics by developing and implementing a parallelization technique that enables the efficient calculation of a class of circuit dynamics over qubits initialized in mixed states. The mixed states allow thermodynamically defined optimization criteria to be used to control how quantum correlations spread, and lead to a large class of circuit models that interpolate between and extend the information dynamics achieved by well-known Hamiltonians. However, exploring these models on larger qubit networks is essential for understanding and characterizing finite system size effects.
The existing code, which uses Python and PennyLane, runs efficiently on CPUs for mixed states of up to 12 qubits. For more qubits, PennyLane can achieve significant GPU acceleration as long as pure states are used. Similar improvements for mixed states in the context of the circuits we are using require the development of techniques for sampling the density matrix, or the quantum trajectories. The appropriate techniques must take into account the optimization criteria used to generate the circuit. Trajectory techniques should be validated on existing results for the smaller system sizes. The researcher will be encouraged to engage with AI on code development for the algorithms, learning to increase their personal efficiency while taking responsibility for accuracy of the result.
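For orientation, a minimal sketch of evaluating a small noisy circuit on PennyLane's mixed-state simulator (the circuit, noise channel, and noise strength are placeholders, not the project's thermodynamically optimized circuits):

```python
# Minimal sketch: run a two-qubit circuit on the "default.mixed" device and
# check that the resulting state is genuinely mixed (purity below 1).
import pennylane as qml

dev = qml.device("default.mixed", wires=2)

@qml.qnode(dev)
def noisy_bell(p):
    qml.Hadamard(wires=0)
    qml.CNOT(wires=[0, 1])
    qml.DepolarizingChannel(p, wires=0)     # mixedness enters through the channel
    return qml.density_matrix(wires=[0, 1])

rho = noisy_bell(0.1)
purity = (rho @ rho).trace().real
print("purity of the two-qubit state:", round(float(purity), 4))
```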

X
Van Duin, Adrianus C (#30162)
Development of AI-driven Dataset of the Nitrogen-vacancy Centers in Diamond for Possible Reactive Atomistic Modeling

Nitrogen-vacancy centers (NVCs) in diamond have unique optical and electronic properties, which can be utilized in nanoscale sensing, quantum information technologies, or biomedical applications. With the use of NVCs, changes in the magnetic field on the order of nT can be measured, enabling, e.g., precise mapping of brain activity at the cellular level (Li et al. 2025). The NVCs can also serve as qubits, characterized by relatively long coherence times (up to 1 s at low temperatures) that can survive even at room temperature (Katsumi et al. 2025). Fabricating these nitrogen-vacancy centers in a controllable manner is a key area of ongoing research. While first-principles calculations have greatly contributed to understanding the properties of NVCs (Gali et al. 2019), these calculations are generally limited in time and size scale. With reactive molecular dynamics, such as the ReaxFF method (van Duin et al. 2001), much longer and larger systems can be considered. The ReaxFF method can be a useful approach for testing a range of growth scenarios, providing a unique atomistic perspective essential for a better understanding of the underlying structural changes. The ReaxFF method has been shown to accurately model not only carbon growth (Chen et al. 2023, Li et al. 2018) but also CVD deposition processes (Xuan et al. 2019, Zhang et al. 2020, Yan et al. 2024) and ion irradiation (Zhang et al. 2022, Feng et al. 2023).
In this project, we propose to utilize the CHON-2019 ReaxFF parameter set (Kowalik et al. 2019) and generate a range of diamond structures by introducing nitrogen atoms and carbon vacancies. These generated models will be subjected to a series of heating/cooling scenarios to investigate the capability of the ReaxFF force field to model the mobility of the introduced defects. Based on the generated dataset, a machine learning (ML) approach can be further utilized to predict the most stable structures (Bezik et al. 2025). With the combined ReaxFF/ML approach, a set of structures can also be generated for comparison with DFT data, which can later be used to further optimize the ReaxFF parameter set, if needed. Additionally, the structural features of the generated structures will be compared to experimental data to assess which simulation scenario produces the most realistic samples.
References:
Li Y, Li H, Yi T, Li C, Wei J. A review of the study of diamond NV color centers: fabrication, application and challenge. Functional Diamond. 2025 Dec 31;5(1):2567286.
Katsumi R, Takada K, Jelezko F, Yatsui T. Recent progress in hybrid diamond photonics for quantum information processing and sensing. Communications Engineering. 2025 May 8;4(1):85.
Gali Á. Ab initio theory of the nitrogen-vacancy center in diamond. Nanophotonics. 2019 Nov 26;8(11):1907-43.
van Duin AC, Dasgupta S, Lorant F, Goddard WA. ReaxFF: a reactive force field for hydrocarbons. The Journal of Physical Chemistry A. 2001 Oct 18;105(41):9396-409.
Li K, Zhang H, Li G, Zhang J, Bouhadja M, Liu Z, Skelton AA, Barati M. ReaxFF molecular dynamics simulation for the graphitization of amorphous carbon: a parametric study. Journal of Chemical Theory and Computation. 2018 Apr 19;14(5):2322-31.
Chen S, Bai Q, Wang H, Wang S. Unveiling the site-dependent characteristics and atomic mechanism of graphene growth on the polycrystalline diamond: Insight from the experiments and ReaxFF studies. Carbon. 2023 Sep 1;213:118303.
Xuan Y, Jain A, Zafar S, Lotfi R, Nayir N, Wang Y, Choudhury TH, Wright S, Feraca J, Rosenbaum L, Redwing JM. Multi-scale modeling of gas-phase reactions in metal-organic chemical vapor deposition growth of WSe2. Journal of Crystal Growth. 2019 Dec 1;527:125247.
Yan Z, Tian Y, Liu R, Liu B, Shao Y, Liu M. Atomistic insights into chemical vapor deposition process of preparing silicon carbide materials using ReaxFF-MD simulations. Computational Materials Science. 2024 May 25;2411:113032.
Zhang W, Van Duin AC. Atomistic-scale simulations of the graphene growth on a silicon carbide substrate using thermal decomposition and chemical vapor deposition. Chemistry of Materials. 2020 Sep 10;32(19):8306-17.
Zhang Y, Yuan H, Yan W, Zhang Z, Chen S, Liao B, Ouyang X, Chen L, Zhang X. The effects of atomic oxygen and ion irradiation degradation on multi-polymers: A combined ground-based exposure and ReaxFF-MD simulation. Polymer Degradation and Stability. 2022 Nov 1;205:110134.
Feng S, Guo F, Yuan C, Cheng X, Wang Y, Zhang H, Chen J, Su L. Effect of neutron irradiation on structure and decomposition of α-RDX: A ReaxFF molecular dynamics study. Computational and Theoretical Chemistry. 2023 Jan 1;1219:113965.
Kowalik M, Ashraf C, Damirchi B, Akbarian D, Rajabpour S, Van Duin AC. Atomistic scale analysis of the carbonization process for C/H/O/N-based polymers with the ReaxFF reactive force field. The Journal of Physical Chemistry B. 2019 May 30;123(25):5357-67.
Bezik CT, Ethier JG, Vashisth A, Varshney V. A Hybrid Simulation-Machine Learning Framework for Predicting Thermal Degradation in High-Performance Polyimides. Macromolecules. 2025 Dec 17.
Hossain MJ, Pawar G, van Duin AC. Development and applications of an eReaxFF force field for graphitic anodes of lithium-ion batteries. Journal of the Electrochemical Society. 2022 Nov 25;169(11):110540.
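As an illustration of the structure-generation step described above, a minimal sketch using ASE to build a defective diamond supercell (the use of ASE, the supercell size, and the output format are assumptions; the heating/cooling scenarios themselves would be run with a ReaxFF engine such as LAMMPS using the CHON-2019 parameter set):

```python
# Minimal sketch of generating an NV-like defect model: a diamond supercell
# with one substitutional nitrogen and an adjacent carbon vacancy.
from ase.build import bulk

diamond = bulk("C", "diamond", a=3.567, cubic=True) * (3, 3, 3)  # 216-atom supercell
diamond[0].symbol = "N"       # substitutional nitrogen
del diamond[1]                # nearest-neighbor carbon removed to create the vacancy

print(f"{len(diamond)} atoms, composition: {diamond.get_chemical_formula()}")
diamond.write("nv_diamond.xyz")   # starting structure for subsequent reactive MD
```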

XX
Van Duin, Adrianus C (#30163)
A ReaxFF-First Digital Twin Framework Bridging Experimental Structure and Reactive Atomistic Simulation

Reactive force fields, particularly ReaxFF, are widely used to model chemically reactive systems at length and time scales inaccessible to quantum mechanical methods. Since its introduction 1, ReaxFF has enabled simulations in complex environments such as catalysis, materials growth, corrosion, energetic materials, and solid–liquid interfaces 2,3. Despite its wide adoption, ReaxFF is most often applied in a loosely coupled manner with experiments, where experimental insight informs simulation design through expert interpretation, but experimental characterization data—such as X-ray diffraction (XRD) or microscopy, which provide rich but indirect structural information 4,5—are not directly or systematically translated into the atomistic structures 6,7. As a result, discrepancies can arise between the two, especially in systems where defects and disorder play a dominant role. This project addresses this gap by developing a ReaxFF-first digital bridge between experiment and atomistic simulation, focused explicitly on enabling mechanism-level insights on defect-driven and disordered materials 8. Moreover, this project focuses on a practical workflow for mechanistic interpretation of experimental data at realistic scales, without attempting full inverse reconstruction or closed-loop optimization.
The proposed workflow consists of four tightly connected components. First, experimentally accessible inputs are used to generate families of plausible, non-ideal initial atomistic structures. AI tools can be used here as a translator from large, high-dimensional experimental data to physically meaningful structural descriptors, using unsupervised and weakly supervised approaches 9,10. Importantly, AI enables an ensemble-based inverse approach instead of forcing a single "best" structure 11. Second, ReaxFF reactive molecular dynamics is applied to simulate growth, annealing, or processing-relevant conditions. Third, experiment-facing structural observables are computed directly from ReaxFF trajectories, and transparent comparison metrics are used to assess structural consistency between simulation outputs and experimental data. Here, AI acts as an accelerator and interpreter that makes rich, high-dimensional reactive simulation data comparable to experimental observables. This is done through fast surrogate mappings between ReaxFF-derived structural statistics and experiment-facing observables 12,13. Furthermore, interpretable AI methods can identify which atomistic features most strongly influence experimental signatures, enabling mechanism-level insight rather than black-box prediction.
This project offers multiple entry points for Rising Researchers, including experiment-informed structure generation, development of analysis and comparison metrics, AI integration into the bridge, and validation or benchmarking studies on representative material systems.
1. Van Duin, A. C. T., Dasgupta, S., Lorant, F. & Goddard, W. A. ReaxFF: A reactive force field for hydrocarbons. Journal of Physical Chemistry A 105, 9396–9409 (2001).
2. Senftle, T. P. et al. The ReaxFF reactive force-field: Development, applications and future directions. npj Computational Materials vol. 2 Preprint at https://doi.org/10.1038/npjcompumats.2015.11 (2016).
3. Mao, Q. et al. Classical and reactive molecular dynamics: Principles and applications in combustion and energy systems. Progress in Energy and Combustion Science vol. 97 Preprint at https://doi.org/10.1016/j.pecs.2023.101084 (2023).
4. Barnes, A., Afroz, M. M., Kyung Shin, Y., T van Duin, A. C. & Li-Oakey, D. Mapping TpPa-1 Covalent Organic Framework (COF) Molecular Interactions in Mixed Solvents via Atomistic Modeling and Experimental Study. (2024).
5. Zhu, J. et al. Advances in developing cost-effective carbon fibers by coupling multiscale modeling and experiments: A critical review. Progress in Materials Science vol. 146 Preprint at https://doi.org/10.1016/j.pmatsci.2024.101329 (2024).
6. Duong, P. H. H. et al. Mechanistic study of pH effect on organic solvent nanofiltration using carboxylated covalent organic framework as a modeling and experimental platform. Sep. Purif. Technol. 282, (2022).
7. Li, T. et al. Critical nanoparticle formation in iron combustion: single particle experiments with in-situ multi-parameter diagnostics aided by multi-scale simulations. Fuel 404, (2026).
8. Zarrouk, T., Ibragimova, R., Bartók, A. P. & Caro, M. A. Experiment-Driven Atomistic Materials Modeling: A Case Study Combining X-Ray Photoelectron Spectroscopy and Machine Learning Potentials to Infer the Structure of Oxygen-Rich Amorphous Carbon. J. Am. Chem. Soc. 146, 14645–14659 (2024).
9. Yu, M., Moses, I. A., Reinhart, W. F. & Law, S. Multimodal Machine Learning Analysis of GaSe Molecular Beam Epitaxy Growth Conditions. ACS Appl. Mater. Interfaces 17, 34707–34716 (2025).
10. Chin, J. R. et al. Analyzing the impact of Se concentration during the molecular beam epitaxy deposition of 2D SnSe with atomistic-scale simulations and explainable machine learning. Mater. Today Adv. 28, (2025).
11. Ghosh, A., Ziatdinov, M., Dyck, O., Sumpter, B. G. & Kalinin, S. V. Bridging microscopy with molecular dynamics and quantum simulations: an atomAI based pipeline. NPJ Comput. Mater. 8, (2022).
12. Kalinin, S. V., Sumpter, B. G. & Archibald, R. K. Big-deep-smart data in imaging for guiding materials design. Nat. Mater. 14, 973–980 (2015).
13. Sumpter, B. G., Vasudevan, R. K., Potok, T. & Kalinin, S. V. A bridge for accelerating materials by design. npj Computational Materials vol. 1 Preprint at https://doi.org/10.1038/npjcompumats.2015.8 (2015).

XX
Chen, Jinghui (#30165)
Improving Reasoning and Decision-Making Reliability and Soundness in Large Language Model Agents

AI agents built on large language models are increasingly capable of planning, reasoning, and interacting with tools to perform complex multi-step tasks. However, despite these advances, current agent systems often lack reliable and sound reasoning behaviors: they may construct fragile plans, overlook constraints, or fail to recognize errors early, leading to inefficiency or task failure. This project seeks to improve the reasoning capability and reliability of AI agents by studying how agents can better evaluate, monitor, and revise their own decisions as tasks unfold. One key direction is enhancing reflective reasoning, where agents assess and refine plans before and during execution, complemented by self-checking mechanisms that help detect inconsistencies and unexpected outcomes. The project emphasizes broadly applicable principles and practical components that improve robustness while remaining computationally efficient. Outcomes will include reusable agent modules, empirical evaluations, and preliminary results that support publications in robust and trustworthy AI. 

XX
Bose, Mallika (#30166)
Evaluating AI-Assisted Power Mapping for Relational Placemaking

Understanding relationships among stakeholders involved in urban redevelopment is critical to identifying how power operates across institutions, spatial scales, and decision-making processes, particularly in contexts shaped by racial inequity and uneven investment. Power mapping, an analytic and visual practice historically rooted in civil rights activism and resistance to structural racism (Inwood and Alderman 2020), has long served as a method for making power relationships intelligible across disciplines, institutions, and diverse interest groups. However, existing approaches to power analysis rely heavily on qualitative research, which can be time-intensive, incomplete, and difficult to visually represent or analyze. This project proposes to evaluate the potential of artificial intelligence (AI) tools, including large language models and network visualization techniques, to support more thorough, efficient, and visually expressive power analysis methods for relational planning practice.

X
Lin, Yan (#30170)
Estimating Legacy Uranium Mine Waste Redistribution Across the Western United States Using Knowledge-informed Machine Learning

More than a century of hard-rock mining has left over 160,000 abandoned mines across the western United States, including more than 4,000 abandoned uranium mine (AUM) sites. These legacy mines release uranium and associated metals into surrounding soils, dust, surface water, and groundwater, particularly in arid and semi-arid regions where wind-driven and hydrologic transport processes dominate. Prior studies have demonstrated strong spatial associations between environmental metal concentrations and proximity to mine sites, wind exposure, landforms, and drainage pathways, with disproportionate impacts on rural and Indigenous communities. However, most existing assessments are limited to local scales or simple proximity-based metrics, which are difficult to scale nationally and often fail to capture the complex, multi-pathway processes that govern mine-waste redistribution.
This project will develop a scalable, knowledge-informed machine-learning framework to estimate the spatial redistribution of uranium mine waste across the western United States. Building on the PI's prior work on the Navajo Nation, the study will scale up to the national scale, integrating atmospheric and hydrologic transport pathways by combining meteorological data (e.g., wind speed, direction, relative humidity), topographic and landform characteristics, vegetation cover, and downslope drainage metrics. The modeling framework will leverage large national environmental datasets, including the National Uranium Resource Evaluation (NURE) database and USGS soil geochemistry data, to train and validate random forest models and compare their performance with alternative approaches such as neural networks and regression models. Project outputs will include high-resolution maps of predicted uranium and co-occurring metal concentrations, along with uncertainty estimates and variable importance analyses.
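A minimal sketch of the random-forest step with the variable-importance output mentioned above (the predictor names, synthetic response, and data volume are illustrative placeholders for the NURE/USGS-derived training data):

```python
# Minimal sketch: random forest mapping transport-related predictors to a soil
# uranium concentration, plus the feature importances the project plans to report.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(5)
n = 1000
X = pd.DataFrame({
    "dist_to_mine_km": rng.uniform(0.1, 50, n),
    "mean_wind_speed": rng.uniform(1, 10, n),
    "downslope_drainage": rng.uniform(0, 1, n),
    "vegetation_cover": rng.uniform(0, 1, n),
})
y = 30 / (1 + X["dist_to_mine_km"]) + 2 * X["mean_wind_speed"] + rng.normal(0, 1, n)

rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
for name, imp in sorted(zip(X.columns, rf.feature_importances_), key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```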

XX
Ryan, Timothy Michael (#30171)
Development of an Analytical Research Suite for 3D Image Analysis

Modern humans have relatively lightly built skeletons compared to other primates and earlier extinct hominin species, a condition that may predispose contemporary humans to bone-related health risks such as osteopenia and osteoporosis. Multiple factors are likely responsible for low bone mass in humans, including reduced physical activity levels, dietary/nutritional changes, hormonal changes, and evolved life history strategies, among others. Work in my lab uses three-dimensional microcomputed tomography (microCT) scan data to quantify variation in cortical and trabecular bone structure in humans and other vertebrates, using both a comparative evolutionary morphology perspective and an experimental animal model approach. Bone is a dynamic, living tissue that is impacted by metabolic demands and environmental stimuli like the mechanical loading experienced during locomotion and other behaviors. The different tissue types within the skeleton, the dense cortical bone and the complex mesh-like trabecular bone, respond to internal and external stimuli in different ways that can be used to understand behavioral and biological characteristics of living and extinct organisms.
Ongoing work in my research lab has been focused on developing a multi-step, open-source workflow to process and analyze large volumetric microCT image datasets using Python, Visualization Toolkit, R, and other tools. Our workflow currently combines multiple Python scripts, together with one proprietary software package called Medtool (Dr Pahr Ingenieurs e.U.; Gross et al., 2014). Our workflow includes reproducible routines for microCT image/mesh reconstruction, machine learning based image segmentation, 3D registration, statistical analysis, and visual comparisons of quantitative results. To compare commonly studied structural characteristics like the bone volume fraction, average strut thickness/separation, and anisotropy between species or experimental groups, we map the quantified structural variables to point clouds and use the Coherent Point Drift algorithm (Myronenko & Song, 2010) to rigidly, affinely, and deformably register a mean point cloud to each individual's unique point cloud (DeMars et al., 2021). This approach allows us to quantitatively assess location-specific differences between groups or species using either Bayesian or frequentist statistical approaches, depending on the limitations of each sample being analyzed. This unique approach allows for whole bone comparisons of trabecular bone variables in a manner that was previously not possible.
One major challenge in our analytical pipeline, however, is that the statistical analysis of such large datasets, with upwards of thousands of pointwise comparisons, is resource and time intensive and difficult to use without a background in Python coding or high computing resources, limiting its more widespread adoption and use.
The goals of this project are to:
– Refactor and parallelize existing code in our processing pipeline, especially the code used for statistical analyses of multi-dimensional data, and optimize the code to run efficiently on ICDS resources
– Build in flexibility in the types of statistical approaches available in the code, including both Bayesian and frequentist methods as well as data reduction methods like principal components analysis and linear modeling
– Develop analytical tools to allow us to easily visualize and evaluate impacts of point cloud/shape deformations
– Explore the implementation of novel point set registration approaches, including machine-learning-based approaches
– Apply the pipeline to the analysis of human skeletal variation
Robust quantitative characterization of complex anatomical and morphological structures is critical for the life sciences and related disciplines. Analysis of complex skeletal data provides unparalleled insights into morphological structure and function with relevance to various interconnected disciplines including human health, evolutionary developmental biology, comparative functional morphology, biomechanics, and paleontology. Improving our ability to conduct repeatable, high-throughput phenotyping will help us address critical questions related to bone biology and function as well as human bone health.
References
DeMars, L. J., Stephens, N. B., Saers, J. P., Gordon, A., Stock, J. T., & Ryan, T. M. (2021). Using point clouds to investigate the relationship between trabecular bone phenotype and behavior: An example utilizing the human calcaneus. American Journal of Human Biology, 33(2), e23468.
Gross, T., Kivell, T. L., Skinner, M. M., Nguyen, N. H., & Pahr, D. H. (2014). A CT-image-based framework for the holistic analysis of cortical and trabecular bone morphology. Palaeontologia Electronica, 17(3;33A), 1–13. http://palaeoelectronica.org/content/2014/889-holistic-analysis-of-bone
Myronenko, A., & Song, X. (2010). Point set registration: Coherent point drift. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(12), 2262–2275. https://doi.org/10.1109/TPAMI.2010.46
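For orientation, the point-cloud registration step described above could look roughly like the sketch below, which uses the open-source pycpd implementation of Coherent Point Drift (Myronenko & Song, 2010). The file names, array shapes, and class usage reflect an assumed interface rather than the lab's actual pipeline.

```python
# Minimal sketch of registering a mean point cloud to an individual's point
# cloud with Coherent Point Drift (assumes the third-party pycpd package;
# file names and shapes are illustrative).
import numpy as np
from pycpd import DeformableRegistration

mean_cloud = np.load("mean_calcaneus_points.npy")     # (N, 3) template, hypothetical file
individual_cloud = np.load("individual_points.npy")   # (M, 3) target, hypothetical file

# Deformably register the mean cloud (Y) onto the individual cloud (X).
reg = DeformableRegistration(X=individual_cloud, Y=mean_cloud)
warped_mean, _ = reg.register()

# After registration, structural variables mapped onto the mean cloud can be
# compared point-by-point across individuals at corresponding locations.
np.save("warped_mean_points.npy", warped_mean)
```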

XX
Ryan, Timothy Michael (#30172)
High-Throughput Phenotyping of Bone Microstructure from MicroCT Data

Quantitative characterization of complex anatomical and morphological structure and form is critical for the life sciences and related disciplines. Over the last two decades, there has been a significant increase in the use and availability of high-resolution 3D morphological data in the life sciences using a variety of volumetric imaging modalities (e.g., micro- and nanocomputed tomography, micromagnetic resonance, FIB-SEM, confocal microscopy). These data provide unparalleled insights into morphological structure and function with relevance to various interconnected disciplines, including evolutionary developmental biology, comparative functional morphology, biomechanics, and paleontology. The rapid growth in the amount and availability of these data highlights the need for high-quality analytical tools to conduct repeatable high-throughput phenotyping. For the last 10 years, my research lab has developed a multi-step, open-source workflow to process and analyze 3D microCT image data using Python, Visualization Toolkit, R, and other open-source tools. We leverage these methods to quantify three-dimensional structural variation in trabecular bone, the complex mesh-like bone found in the joints of long bones and in short bones like the vertebral bodies. Comparative analyses of trabecular bone structure in various primate and other mammalian species as well as experimental animal models are essential for addressing questions related to skeletal biomechanics, phenotypic plasticity, human and primate evolutionary morphology, and for understanding and addressing important issues related to human skeletal health. Our workflow currently combines multiple custom Python scripts together with one proprietary software package called Medtool (Dr Pahr Ingenieurs e.U.; Gross et al., 2014; DeMars et al., 2021). We use Medtool to quantify various structural features of trabecular bone, including the bone volume fraction, average trabecular strut thickness and separation, and structural anisotropy (Sorrentino et al., 2021). These structural variables are quantified using a rolling volume of interest (VOI) approach in which these structural features are measured within a large number of small VOIs placed on a regular grid across the entire microCT image volume. The quantitated structural features are then mapped to tetrahedral finite element meshes for visualization and further statistical analysis. This method allows us to compare structures within and between species or experimental groups to address questions related to skeletal response to locomotor loading. One significant technical limitation in our work is our reliance on the proprietary Medtool software package as part of our otherwise open-source pipeline. While innovative and useful, this software has several limitations, particularly in not being adaptable to different types of data or easily allowing characterization of unique structural or compositional features present in our data (e.g., bone mineral density). In addition, the yearly licenses are relatively expensive and node-locked, and the code is designed for workstations and not parallelized for efficient processing. 
The goals of this project, therefore, are to:
– Develop multiprocessing/parallelized code to holistically quantify trabecular bone structure in whole bones (and other complex 3D structures), mostly using existing open-source functions (e.g., CGAL mesh, BoneJ-headless)
– Develop code to interpolate quantitated variables onto tetrahedral meshes and/or directly onto deformable, mesh-free point clouds
– Integrate the code with upstream and downstream components of our existing analytical pipeline
– Optimize the code to run efficiently on ICDS resources
– Apply the code to the analysis of skeletal variation in vertebrates
References
DeMars, L. J., Stephens, N. B., Saers, J. P., Gordon, A., Stock, J. T., & Ryan, T. M. (2021). Using point clouds to investigate the relationship between trabecular bone phenotype and behavior: An example utilizing the human calcaneus. American Journal of Human Biology, 33(2), e23468.
Gross, T., Kivell, T. L., Skinner, M. M., Nguyen, N. H., & Pahr, D. H. (2014). A CT-image-based framework for the holistic analysis of cortical and trabecular bone morphology. Palaeontologia Electronica, 17(3;33A), 1–13. http://palaeoelectronica.org/content/2014/889-holistic-analysis-of-bone
Sorrentino, R., Stephens, N. B., Marchi, D., DeMars, L. J. D., Figus, C., Bortolini, E., Badino, F., Saers, J. P. P., Bettuzzi, M., Boschin, F., Capecchi, G., Feletti, F., Guarnieri, T., May, H., Morigi, M. P., Parr, W., Ricci, S., Ronchitelli, A., Stock, J. T., Carlson, K. J., Ryan, T. M., Belcastro, M. G., & Benazzi, S. (2021). Unique foot posture in Neanderthals reflects their body mass and high mechanical stress. Journal of Human Evolution, 161, 103093.
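To make the rolling volume-of-interest idea concrete, here is a minimal open-source sketch that computes bone volume fraction (BV/TV) on a regular grid of cubic VOIs from a binary segmented microCT volume. The input array, VOI size, and grid spacing are illustrative assumptions rather than the lab's Medtool-based settings; in the proposed work the grid loop would be the natural unit to parallelize across processes or ICDS nodes.

```python
# Minimal sketch: bone volume fraction (BV/TV) on a regular grid of cubic
# VOIs across a binary (bone = 1, background = 0) microCT volume.
# The volume file, VOI edge length, and grid step are illustrative assumptions.
import numpy as np

volume = np.load("segmented_microct.npy").astype(bool)   # hypothetical (Z, Y, X) array
voi_edge = 32      # VOI edge length in voxels (assumption)
step = 16          # grid spacing in voxels (assumption)

results = []       # (z, y, x, BV/TV) for each VOI center
for z in range(0, volume.shape[0] - voi_edge + 1, step):
    for y in range(0, volume.shape[1] - voi_edge + 1, step):
        for x in range(0, volume.shape[2] - voi_edge + 1, step):
            voi = volume[z:z + voi_edge, y:y + voi_edge, x:x + voi_edge]
            bvtv = voi.mean()                              # bone voxels / total voxels
            results.append((z + voi_edge // 2, y + voi_edge // 2,
                            x + voi_edge // 2, bvtv))

results = np.array(results)
# These pointwise values could then be interpolated onto a tetrahedral mesh
# or point cloud for visualization and statistical comparison.
np.save("bvtv_grid.npy", results)
```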

XX
Leja, Joel (#30173)
SBI-Powered Population Modeling: Understanding Our Origins Using Millions of Milky Ways Across Cosmic Time

Our goal is to understand how galaxies like the Milky Way evolve over cosmic time. We use as input images of many galaxies at different cosmic times, from which we must ascertain their typical formation histories. The key challenge is that light from distant galaxies is complex to interpret, resulting in typical factor of 2-3 uncertainties in basic parameters such as mass in stars or current rate of star formation, and factor of 10 uncertainties in their formation history. This large measurement uncertainty is coupled with a complex range of possibilities for galaxies, which span 3-6+ orders of magnitude in mass, size, formation pathway, and current star formation activity. The combination of their intrinsic diversity and their weakly-measured properties requires a large number (> 10^6) of independent observations to be combined in order to put scientifically meaningful factor-of-two constraints on typical growth histories. Until recently, standard MCMC inference techniques with 10-15 parameters would take ~10 hours per object, rendering multi-level or population modeling on top of this effectively impossible. However, new neural density estimators such as simulation-based inference (SBI) have radically altered this picture, with trained models yielding accurate posteriors in ~10 seconds per object. We are currently exploiting this approach to generate catalogs of individual galaxy properties for ~10^6 distant galaxies from the SHELA/HETDEX survey, of which Penn State is a leading member. These posteriors are ready to be turned into fully self-consistent galaxy formation histories for huge elliptical galaxies such as M87, grand spirals such as the Milky Way, and to some extent dimmer, irregular, lower-mass galaxies. This previous analysis is open-source and supported by ICDS, providing a firm foundation for follow-up work. Each individual galaxy we observe is frozen in time. However, these systems can be causally connected on a statistical level, as galaxies observed at an earlier time must evolve into galaxies observed at a later time. This yet-to-be-realized constraint can be encoded into a full population model, allowing us to cut through fundamental degeneracies: the formation history of any individual galaxy will be constrained also by the galaxies that exist before and after it. This will be accomplished by developing a rigorous methodology to draw possible formation histories from the posteriors of individual systems, and applying a multi-level model to force the sum of the populations to be self-consistent, i.e., that the uncertain formation histories of later systems are probed by the well-constrained current star formation rates and stellar masses of earlier systems. A key technical challenge is to allow for the effects of galaxy mergers, i.e., that multiple early-universe galaxies can be summed to create later-universe galaxies. Crucially, information will ultimately flow both ways in this model; not only will this work produce a self-consistent story of galaxy growth, it will also build this story into the underlying inference framework, leaving us with both a complete story of galaxy formation and a statistical machine to apply that story to future observations. In this way this is a pilot project which will unlock future work on more ambitious surveys, such as the ongoing PFS spectroscopic galaxy survey of millions of galaxies existing 6-12 billion years ago, of which Penn State is an active member.
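As a rough sketch of the per-galaxy inference step, the code below trains a neural posterior estimator with the open-source sbi package on simulated (parameters, photometry) pairs and then draws posterior samples for a mock observation. The toy simulator, parameter ranges, and package calls are illustrative assumptions standing in for a real stellar-population synthesis model, not the group's actual pipeline.

```python
# Minimal sketch of amortized posterior inference with the open-source `sbi`
# package (assumed interface); the toy simulator and priors are placeholders.
import torch
from sbi.inference import SNPE
from sbi.utils import BoxUniform

# Toy "simulator": maps (log stellar mass, log star formation rate) to 5
# noisy broadband fluxes.  Purely illustrative.
def simulator(theta):
    weights = torch.linspace(0.5, 1.5, 5)
    fluxes = theta[:, :1] * weights + 0.3 * theta[:, 1:2]
    return fluxes + 0.05 * torch.randn_like(fluxes)

prior = BoxUniform(low=torch.tensor([8.0, -2.0]), high=torch.tensor([12.0, 2.0]))

theta = prior.sample((10_000,))
x = simulator(theta)

inference = SNPE(prior=prior)
density_estimator = inference.append_simulations(theta, x).train()
posterior = inference.build_posterior(density_estimator)

x_obs = simulator(torch.tensor([[10.5, 0.3]]))   # a mock "observed" galaxy
samples = posterior.sample((1_000,), x=x_obs)     # ~seconds per object once trained
print(samples.mean(dim=0), samples.std(dim=0))
```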

X
Chen, Jinghui (#30175)
Towards Better Distillation for Reliable and Efficient Small Language Model Agents

AI agents built on large language models are increasingly used to perform multi-step tasks that require planning, reasoning, and interaction with tools. While large models can exhibit strong agentic behaviors, their computational cost limits practical deployment, motivating the use of smaller or compressed models. However, smaller agents often struggle to sustain effective progress over long task horizons, exhibiting behaviors such as repetition, stagnation, or failure to recover from early mistakes. This project investigates how agent learning and distillation methods can better capture progress-aware behaviors, enabling efficient agents to recognize when they are making meaningful forward progress and to adapt when they are not. The goal is to develop broadly applicable principles for improving the reliability and efficiency of small language model agents without relying on large-scale inference at deployment time. Outcomes will include conceptual frameworks, empirical studies, and reusable components that support robust agent design and future publications on efficient agentic AI. 

XX
Singh, Madhusudan (#30178)
Quantum-Enabled Multimodal Disease Detection for Early Pancreatic Cancer Diagnosis

Early detection of pancreatic cancer remains a major unmet challenge in healthcare, with over 80% of cases diagnosed at advanced stages. This project proposes the development of a quantum-enabled computational framework that integrates multimodal clinical data, medical imaging, molecular biomarkers, and patient-reported symptoms to identify early-stage disease signatures that are difficult or impossible to detect using conventional machine learning methods. The research leverages quantum machine learning (QML), specifically Variational Quantum Transformers (VQTs) and hybrid quantum–classical classifiers, to encode heterogeneous, high-dimensional clinical data into quantum feature spaces. The central hypothesis is that quantum representations can capture subtle, nonlinear correlations across disparate biomedical modalities more effectively than classical models, enabling earlier and more sensitive disease detection. Designed explicitly for the ICDS Rising Researcher program, this project provides a platform for interdisciplinary training at the intersection of quantum computing, AI, data science, and healthcare. Outcomes include a validated prototype, rigorous benchmarking against classical approaches, peer-reviewed publications, and a strong foundation for external funding proposals.
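For orientation, a hybrid quantum-classical classifier of the general kind described here could be prototyped along the lines below using the open-source PennyLane library. The two-qubit circuit, angle encoding, and toy features are illustrative assumptions and not the proposed VQT architecture.

```python
# Minimal sketch of a hybrid quantum-classical classifier (assumes the
# open-source PennyLane library; circuit size and encoding are toy choices).
import pennylane as qml
from pennylane import numpy as np

n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def circuit(weights, features):
    # Angle-encode two scaled clinical features into qubit rotations.
    qml.AngleEmbedding(features, wires=range(n_qubits))
    # Trainable entangling layers act as the variational "quantum feature space".
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    return qml.expval(qml.PauliZ(0))

def predict(weights, features):
    return (circuit(weights, features) + 1.0) / 2.0      # map expectation to [0, 1]

def loss(weights, X, y):
    preds = [predict(weights, x) for x in X]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

# Toy data: two scaled features per "patient", binary label.
X = np.array([[0.1, 0.9], [0.8, 0.2], [0.2, 0.7], [0.9, 0.1]], requires_grad=False)
y = np.array([1.0, 0.0, 1.0, 0.0], requires_grad=False)

weights = np.random.random(size=(3, n_qubits, 3), requires_grad=True)
opt = qml.GradientDescentOptimizer(stepsize=0.2)
for _ in range(50):
    weights = opt.step(lambda w: loss(w, X, y), weights)
print("training loss:", loss(weights, X, y))
```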

XXXX
Wang, Shujie (#30179)
Advancing Cryosphere Research Through LLM-Powered Knowledge Discovery: A Pilot Initiative

The cryosphere, the frozen component of the Earth system, plays a vital role in regulating global climate and sustaining human society. Ongoing cryospheric changes, including glacier retreat, ice sheet mass loss, and declining sea ice, have far-reaching consequences for sea level rise, water resources, agriculture, and biodiversity. Among these, the Antarctic ice sheet represents the largest potential contributor to future sea level rise and remains a major source of uncertainty in sea level projections, driven by complex ice flow, fracture processes, and coupled ice–ocean–atmosphere interactions operating across multiple spatial and temporal scales. Predicting the future evolution of the Antarctic ice sheet is therefore highly challenging, and the relevant scientific knowledge is dispersed across a large and fragmented body of literature. To address these challenges, this project will conduct a pilot study leveraging advanced large language models (LLMs) and foundation-model-driven approaches to construct quantitative, causal knowledge graphs from existing literature and multi-source datasets. We designate this effort as a pilot because the cryosphere research community has not yet systematically explored the potential of large foundation models to support scientific discovery and knowledge integration, despite their demonstrated success in other domains such as biomedical science and applied mathematics. Using Antarctic ice shelf research as a case study, we will evaluate: (1) whether LLMs can extract scientifically meaningful and interpretable graph-structured knowledge from the literature, and (2) whether these literature-derived graphs can be refined and evolved by integrating observational and modeling datasets. Ice shelves are floating extensions of the grounded ice sheet and have undergone substantial thinning and retreat in recent decades. Because ice shelves exert backstress that restrains grounded ice discharge into the ocean, accurate prediction of changes in ice shelf thickness and extent is essential for projecting Antarctic mass loss. However, integrating the many interacting drivers of ice shelf change within a traditional modeling framework remains difficult, motivating the need for new AI-enabled pathways that can synthesize both scientific text and real-world datasets into interpretable, testable representations. This interdisciplinary project represents an innovative and exploratory effort to bridge foundation models and cryosphere science through close collaboration between domain and AI experts. Dr. Shujie Wang (Geography) contributes extensive multi-source cryosphere datasets and expertise in Antarctic ice shelf processes, while Dr. Wenpeng Yin (Computer Science and Engineering) brings strong experience in AI for science, including foundation-model-based methods for structured knowledge extraction and data-driven modeling. The expected outcomes include a prototype LLM-enabled pipeline for extracting graph-structured scientific knowledge from ice shelf literature; an initial quantitative knowledge graph linking physical processes, environmental forcings, and observable ice shelf responses; a proof-of-concept framework for refining the graph using observational and modeling datasets; and an evaluation protocol to assess robustness, interpretability, and scientific usefulness through domain expert feedback. 
If successful, the workflow will be readily extensible to other cryospheric components such as sea ice, snow, and icebergs, enabling broader impact across cryosphere and climate research. While ambitious, this pilot effort represents a timely and critical step toward using foundation models to accelerate scientific discovery and reduce uncertainty in major Earth system challenges.

XXX
Singh, Madhusudan (#30180)
Explainable AI Ledger for Vehicle Reasoning: Auditable Vision–Language Models for Transportation Systems

As vehicles and transportation systems increasingly rely on Artificial Intelligence (AI) for perception and decision-making, explainability, accountability, and the ability to audit after an incident have become pressing problems. Vision–Language Models (VLMs) now have strong multimodal reasoning abilities, but their outputs remain short-lived, hard to interpret, and hard to verify after safety-critical events. This project proposes an Explainable AI Ledger for Vehicle Reasoning that combines VLM-based perception and scene understanding with blockchain-based explainability. The main idea is to generate short, meaningful summaries of scene understanding that include structured semantic maps and natural language explanations. These summaries are then cryptographically signed and hashed to an immutable ledger. These records make it possible to perform forensic reconstruction, hold AI-driven vehicles accountable, and govern their reasoning after accidents. The project will investigate (1) ways to compress multimodal explanations so they can be stored on-chain, (2) ways to measure how well raw sensor data and VLM-generated descriptions match up, and (3) ways to use decentralized oracles to validate explanations. The research is specifically tailored for ICDS Rising Researchers, providing interdisciplinary training at the convergence of AI, data science, transportation systems, and computational governance.

XXX
Singh, Madhusudan (#30183)
Explainable Retrieval-Augmented Generation with On-Chain Reasoning Artifacts

Retrieval-Augmented Generation (RAG) has emerged as a dominant architecture for grounding large language models in external knowledge sources. However, while RAG improves factual accuracy, it remains largely opaque: users cannot reliably inspect which documents were retrieved, how they were used, or whether explanations were altered after the fact. This lack of auditability is a critical barrier to deploying RAG systems in high-stakes domains such as science, policy, healthcare, and public infrastructure. This project proposes an Explainable RAG framework with on-chain reasoning artifacts, in which key intermediate reasoning components (retrieved document identifiers, relevance scores, explanation summaries, and confidence metrics) are cryptographically hashed and anchored to an immutable ledger. Rather than storing raw data on-chain, the system records compact, verifiable reasoning traces that allow post-hoc auditing, forensic reconstruction, and trust assessment of AI outputs. The research explores (1) which reasoning artifacts are necessary and sufficient for meaningful explanation, (2) how to compress and abstract these artifacts for ledger anchoring, and (3) how on-chain explainability affects trust, reproducibility, and system performance. The project is explicitly designed for ICDS Rising Researchers, enabling interdisciplinary contributions spanning AI, data science, and computational governance.
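To illustrate what such a reasoning artifact might look like, the sketch below builds a compact trace record and anchors only its hash in an append-only log. The record fields and the local JSONL file standing in for a blockchain ledger are simplifying assumptions; a real deployment would anchor the hash through an actual chain client.

```python
# Minimal sketch: build a compact RAG reasoning-trace record and anchor only
# its hash.  Record fields and the append-only JSONL "ledger" are illustrative.
import hashlib
import json
import time

def make_trace(query, retrieved, explanation, confidence):
    """Compact reasoning artifact: no raw documents, only identifiers and scores."""
    return {
        "timestamp": time.time(),
        "query_hash": hashlib.sha256(query.encode()).hexdigest(),
        "retrieved": [{"doc_id": d, "score": round(s, 4)} for d, s in retrieved],
        "explanation_summary": explanation,
        "confidence": confidence,
    }

def anchor(trace, ledger_path="ledger.jsonl", prev_hash="0" * 64):
    """Hash the canonical trace and append only (hash, prev) to the log."""
    payload = json.dumps(trace, sort_keys=True).encode()
    trace_hash = hashlib.sha256(prev_hash.encode() + payload).hexdigest()
    with open(ledger_path, "a") as f:
        f.write(json.dumps({"hash": trace_hash, "prev": prev_hash}) + "\n")
    return trace_hash  # the trace itself stays off-chain, keyed by this hash

trace = make_trace(
    query="Which sources support the stated claim?",
    retrieved=[("doc_041", 0.92), ("doc_187", 0.85)],
    explanation="Answer grounded primarily in doc_041; doc_187 provides context.",
    confidence=0.78,
)
print("anchored reasoning trace:", anchor(trace))
```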

XXX
Tahmasbi, Nargess (#30184)
Generative AI Decision Support System for Sustainable Consumer Choices in Online Retail

In today’s environmentally conscious landscape, consumers increasingly want to choose sustainable products, but online retail platforms rarely provide clear, trustworthy, or verifiable information about the environmental and social impact of what they’re buying. This project develops a Generative AI-powered decision support system designed to synthesize and present sustainability insights about products sold on platforms like Amazon, starting with the personal care and beauty category. At its core, the system will use Large Language Models (LLMs) trained on independent, verified sustainability data sources such as EPA databases, Data.gov, corporate disclosures, and third-party certifications (e.g., Biodegradable Products Institute, LEED). These models will generate natural-language sustainability summaries that are understandable, traceable, and verifiable. The system also incorporates user trust modeling, allowing users to rate and flag the AI-generated information, creating a feedback loop to improve transparency and reliability over time. Rising Researchers will participate in a range of technical and applied research tasks including:
– Engineering LLM-based pipelines to process and summarize multi-source data
– Designing user-facing interfaces to visualize sustainability impact and track user trust
– Implementing feedback mechanisms to rate AI outputs and reinforce model learning
– Applying behavioral science concepts to analyze how explainability affects consumer intent
This project builds on the PI’s validated trust model and a peer-reviewed study presented and published at HICSS 2025. It aligns closely with ICDS’s mission by combining cutting-edge AI with cross-disciplinary sustainability and consumer behavior research, ultimately advancing both theoretical insights and real-world impact in ethical AI for social good.

XX
Yuan, Yubai (#30185)
Evaluating Large Language Models on Complex Network Analysis

The recent success of Large Language Models (LLMs) has revolutionized and automated natural language processing and text data analysis. However, the capacity of LLMs to handle and analyze network data remains largely under-examined. LLMs are primarily trained on massive textual corpora, whose sequential dependence structure is fundamentally different from graph structure; it is therefore unclear whether the generalization capacity of LLMs transfers to network data and achieves comparable performance on network analysis. Given that network analysis is a cornerstone of modern research—ranging from social computing to bioinformatics—establishing a rigorous baseline for LLM reliability in the graph domain is essential to determine their utility in both scientific and practical applications. This proposal aims to systematically investigate the performance of LLMs on network analysis. Specifically, we evaluate and quantify LLMs’ performance on each of the following tasks, which are both fundamental in science and prevalent in applications. The tasks follow a hierarchy based on both their computational complexity and intrinsic difficulty.
Task 1: Basic network summary statistics: provide the distribution of node degrees, network density, connectivity, shortest distances, and clustering coefficients.
Task 2: Distribution of motifs: count the number of specific motifs such as triads, stars, and cliques; adaptively discover recurrent sub-graphs without pre-specification.
Task 3: Community detection: partition networks into cohesive clusters or estimate the total number of underlying communities; extract communities from a background network.
Task 4: Prediction at the node, link, and graph level: recover missing edges or predict future edges based on the current network; classify nodes based on network structure and contextual information.
Task 5: Causal inference on networks: estimate treatment effects and spillover effects under different types of network interference; provide statistical inference and uncertainty quantification for causal estimation.
Task 6: Experimental design on networks: propose optimal strategies for network interventions under specific objectives or constraints, such as (1) selecting seed nodes to maximize information diffusion, (2) treatment assignment plans to maximize population-level outcomes under a budget constraint, and (3) A/B testing designs on networks to efficiently estimate spillover.
To ensure systematic and robust evaluation, the project will examine the performance of LLMs on both synthetic networks generated with representative topologies and real-world networks from different domains, including social networks, molecular networks, and protein-protein interaction networks. The performance of LLMs will be benchmarked against state-of-the-art algorithms for each specific task from the statistics and machine learning communities. The project aims to further dissect performance across various network sizes, topological constraints, and node feature dimensions, while comparing the reasoning capabilities of diverse open-source and closed-source LLM architectures. This project seeks to inspire a new paradigm of AI-empowered network analysis, where the generative power of LLMs is strategically integrated with human expert guidance to solve complex structural problems.
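As a concrete example of a Task 1-style evaluation, the sketch below computes a ground-truth summary statistic with NetworkX, serializes the edge list into a prompt, and compares an LLM's numeric reply against the exact value. The prompt wording and the LLM stub are assumptions rather than the project's benchmarking harness.

```python
# Minimal sketch: benchmark an LLM on a basic network statistic (Task 1)
# against an exact NetworkX baseline.  The prompt format and the `ask_llm`
# stub are placeholders; any chat-completion client could be plugged in.
import networkx as nx

def ask_llm(prompt: str) -> str:
    """Stub for an LLM call; replace with a real client returning text."""
    raise NotImplementedError("plug in an LLM client here")

G = nx.erdos_renyi_graph(n=30, p=0.15, seed=42)          # synthetic test network
edges = sorted(G.edges())

truth = nx.average_clustering(G)                          # exact baseline
prompt = (
    "Here is an undirected graph given as an edge list:\n"
    f"{edges}\n"
    "Compute the average clustering coefficient. Reply with a single number."
)

try:
    reply = ask_llm(prompt)
    error = abs(float(reply.strip()) - truth)
    print(f"LLM: {reply.strip()}  exact: {truth:.4f}  abs. error: {error:.4f}")
except NotImplementedError:
    print(f"exact average clustering coefficient: {truth:.4f}")
```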

XX
Wang, Julian (#30187)
A Diagnostic Digital Twin for Residential Building Envelope Deficiency Identification

Residential building envelope deficiencies—such as excessive air infiltration, degraded insulation, or inefficient window systems—are a major contributor to thermal vulnerability and energy inefficiency in existing housing stock, particularly in older and underperforming residential buildings. Despite their importance, diagnosing these deficiencies at scale remains challenging, as current approaches rely heavily on labor-intensive on-site inspections and detailed audits. Meanwhile, low-cost indoor and outdoor environmental sensors increasingly provide rich time-series data, which can in principle reflect how the envelope system responds to external weather conditions, yet these data are rarely used to infer underlying envelope conditions. This project proposes a physics-informed diagnostic digital twin framework to infer residential building envelope deficiencies from sparse indoor–outdoor time-series data. The core idea is to formulate envelope diagnosis as an inverse inference problem, rather than a forward prediction task. Large-scale building performance simulations will be conducted using representative residential home prototypes and historical weather data from the Philadelphia region as a proof-of-concept case. These simulations will generate a labeled response space linking envelope deficiency states to observable indoor environmental time-series behavior under realistic outdoor forcing. An inverse, probabilistic machine-learning model will be developed to map observed time-series data to the most plausible envelope deficiency states, together with associated uncertainty. By projecting real-world observations onto this simulation-derived response manifold, the framework enables rapid, non-intrusive envelope diagnostics without requiring detailed physical inspections. The project aligns with ICDS priorities in Digital Twins and Artificial Intelligence, and provides a computational foundation for scalable retrofit diagnostics and future external funding proposals.
Keywords: digital twin; inverse modeling; building envelope diagnostics; time-series analysis; physics-informed machine learning; residential buildings; computational simulation; Philadelphia housing stock
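One simple way to realize the inverse, probabilistic mapping described above is a nearest-neighbor lookup in the simulation-derived response space, as sketched below. The feature construction, file names, and choice of k are illustrative assumptions rather than the proposed model.

```python
# Minimal sketch: probabilistic inverse lookup of envelope-deficiency states.
# A library of simulated indoor-temperature responses (one row per simulated
# home, labeled with its deficiency state) is compared with an observed
# series; class frequencies among the k nearest simulations give a crude
# posterior.  Arrays, labels, and k are illustrative assumptions.
import numpy as np
from collections import Counter

sim_series = np.load("simulated_indoor_temp.npy")   # hypothetical (n_sims, n_hours)
sim_labels = np.load("deficiency_labels.npy")       # hypothetical (n_sims,) state names
obs_series = np.load("observed_indoor_temp.npy")    # hypothetical (n_hours,)

def features(series):
    """Simple summary features of an indoor-temperature response."""
    return np.array([series.mean(), series.std(), np.ptp(series),
                     np.abs(np.diff(series)).mean()])

sim_feats = np.array([features(s) for s in sim_series])
obs_feat = features(obs_series)

k = 25
dists = np.linalg.norm(sim_feats - obs_feat, axis=1)
neighbors = sim_labels[np.argsort(dists)[:k]]

# Relative class frequencies among neighbors serve as a rough posterior
# over deficiency states, with the spread indicating uncertainty.
for state, count in Counter(neighbors).most_common():
    print(f"{state}: {count / k:.2f}")
```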

XX
Fantle, Matthew Scott (#30190)
Evaluating the Ocean Carbon Cycle and Deep-Time Paleoclimate model with a spatial clustering approach

This project aims to develop data-informed interpretations of the climate system by applying a new Dynamic Graph-Theoretic clustering framework to evaluate modeled and empirical ancient marine geochemical datasets. The team will utilize a clustering approach recently developed by our group (ExoCCyle; Bogumil et al., 2025), which allows oceanographic and climate data (e.g., temperature, salinity) to be spatially grouped into similar “basins”. Spatially aggregating data can reduce 3D and 4D data to simpler, interpretable frameworks and yield more statistically robust geochemical interpretations by averaging over local noise/model errors. We are interested in applying this technique to datasets including modern and ancient marine geochemical data (associated with the Paleocene-Eocene Thermal Maximum) with varying spatial resolutions, with the goal of determining how the ocean carbon cycle responds to climate perturbations. The methods developed in this project will enable researchers to formulate data-driven hypotheses about the paleoclimate system’s response to a rapid increase in atmospheric carbon dioxide levels and to effectively identify relationships among the variables that govern the climate system.

XX
Tirupatikumara, Soundar Rajan (#30191)
Optical Interconnect Routing for Photonic Chips via Hybrid Surface Minimization

As Photonic Integrated Circuits (PICs) transition into high-volume manufacturing, with a 2026 market valuation projected to exceed $20 billion, they have become the backbone for next-generation AI accelerators and scalable quantum processors. However, critical to this deployment is the optimization of waveguide routing. Unlike electronic circuits that tolerate sharp 90-degree turns, photonic waveguides suffer from catastrophic signal leakage at sub-micron bends. A single unoptimized turn can incur insertion losses as high as 1.7 to 3.5 dB, converting quantum information into wasted heat. Consequently, optimized routing is essential not just for connectivity, but for minimizing total path length and footprint to enable portable, high-density optical hardware. We propose a hybrid workflow that bridges the gap between the speed of heuristics and the accuracy of inverse design. We utilize the Surface Minimization Algorithm, which is derived from string theory matrix models, as a generative engine for local topology, followed by physics-based fine-tuning and global AI placement. Our work rests on the following theoretical foundation: instead of treating waveguides as 1D lines, we model connections as continuous surface manifolds with finite tension. Minimizing the surface area of this manifold yields two emergent geometric properties that address the limitations of automated photonic routing:
• Native Adiabaticity: The minimization of surface tension acts as a geometric proxy for adiabaticity. The resulting paths naturally relax into smooth, curvature-continuous shapes that minimize scattering loss without manual smoothing.
• Emergent Multi-Way Junctions: Surface minimization naturally collapses cascaded binary splitters into single, stable trifurcations and orthogonal sprouts whenever they represent the energy-minimal solution. This directly resolves the “daisy-chaining” inefficiency of standard routers.

XX
Medvedev, Paul (#30192)
Digital twin genomes for oncology

Digital twins are a growing area of focus among researchers spanning many fields, with particularly tremendous potential in human oncology. We envision an oncology digital twin as a dynamic, patient-specific computational model that mirrors the biological and clinical state of an individual’s cancer over time. It can be constructed from genomic profiling, imaging, pathology, and clinical data; with this data, it could capture a tumor’s molecular drivers, clonal structure, growth behavior, and interaction with therapy. With new data generated during treatment, the model would be updated to reflect changes in tumor burden and evolutionary dynamics. In this way, the digital twin would serve as a faithful representation of the disease, allowing real-world measurements to inform care through predictive simulations. An oncology digital twin could profoundly improve how cancer is treated by helping clinicians stay ahead of the disease. By continuously combining genomic data, imaging, and clinical information, the model could flag early signs of resistance and allow different treatment options to be explored virtually without risk to the patient. This could spare patients from ineffective therapies, reduce unnecessary toxicity, and help expedite the identification of the most promising strategy. In the long term, this could allow cancer care to become more adaptive and closely tailored to each patient’s evolving tumor biology. In spite of this great potential, this vision of digital twins in oncology is still in its infancy. One of the immediate bottlenecks is our inability to generate a DNA sequence that resembles a human genome in structure. Current simulators are limited to taking a reference genome and implanting into it variation that is observed in other humans. However, this type of model is very limited in what it can generate and will overfit the current datasets, which are biased in favor of populations with the current majority of sequencing data. In this project, we propose to test the hypothesis that a low-order Markovian process trained on human genomes can generate a genome that resembles a human genome in its substring composition (its k-mer spectrum). Unlike the implanting model above, a low-order Markov model does not have the power to overfit the training data. On the other hand, we hypothesize that it is powerful enough to match the k-mer spectrum patterns of real genomes. The goal of the project is to test the validity of this hypothesis. The outcome of this project will guide the next steps toward the oncology digital twin vision. If our hypothesis is validated, it will be of great interest to the genomics community and will open the door to the creation of realistic human genome sequences without overfitting to biased data. If the hypothesis is disproved, it will drive the next steps of using a more powerful type of generative model that captures long-range interactions. In either case, it will form the basis of a larger NSF or NIH proposal to develop oncology digital twins. Digital twins are a growing funding opportunity, and this project will facilitate the generation of training data for genome-centric digital twin projects and preliminary results for such funding. 
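A toy version of the hypothesis test could look like the sketch below: fit a low-order Markov chain to a training sequence, sample a synthetic sequence, and compare k-mer spectra. The chain order, k, similarity score, and the random stand-in for a real genome are all illustrative assumptions.

```python
# Minimal sketch: train a low-order Markov chain on a DNA sequence, generate a
# synthetic sequence, and compare k-mer spectra.  Order, k, and the toy
# training sequence are illustrative assumptions.
import random
from collections import Counter, defaultdict

def train_markov(seq, order=3):
    counts = defaultdict(Counter)
    for i in range(len(seq) - order):
        counts[seq[i:i + order]][seq[i + order]] += 1
    return counts

def generate(counts, length, order=3, seed=0):
    rng = random.Random(seed)
    state = rng.choice(list(counts))
    out = list(state)
    for _ in range(length - order):
        nxt = counts.get("".join(out[-order:]))
        if not nxt:
            nxt = Counter("ACGT")                      # fall back to uniform
        out.append(rng.choices(list(nxt), weights=list(nxt.values()))[0])
    return "".join(out)

def kmer_spectrum(seq, k=5):
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

training = "".join(random.Random(1).choices("ACGT", k=100_000))  # stand-in for a real genome
model = train_markov(training, order=3)
synthetic = generate(model, length=100_000, order=3)

real, fake = kmer_spectrum(training), kmer_spectrum(synthetic)
# Crude spectrum-similarity score: cosine similarity over the real k-mers.
dot = sum(real[km] * fake[km] for km in real)
norm = (sum(v * v for v in real.values()) ** 0.5) * (sum(v * v for v in fake.values()) ** 0.5)
print("k-mer spectrum cosine similarity:", round(dot / norm, 4))
```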

XX
Brunner, Gerd (#30193)
Computational Framework for Automated Microcirculation Assessment and Multimodal Machine Learning–Based Risk Identification in Peripheral Artery Disease

Peripheral artery disease (PAD), or lower-extremity (LE) arterial disease, is associated with reduced LE blood flow, impaired leg muscle function, possible limb loss, and an increased risk of atherothrombotic cardiovascular events and mortality. Intermittent claudication (IC) is a classic PAD symptom occurring in 40% of symptomatic patients, and is associated with 5-, 10-, and 15-year mortality rates of 30%, 50%, and 70%, respectively. The primary location of claudication pain is in the calf muscles, and alterations of the microcirculation contribute to functional impairment in PAD patients. However, it remains unknown how impairments of the microvascular circulation in the calf muscles contribute to disease progression and adverse outcomes. We have developed a method based on contrast-enhanced magnetic resonance imaging (CE-MRI) to evaluate microvascular perfusion in the calf muscles, the predominant location of PAD claudication pain. In PAD, prolonged ischemia results in pathophysiological changes of the extracellular matrix and is associated with inflammatory processes that can result in interstitial fibrosis. MRI can non-invasively quantify extracellular volume fraction (ECV) and perform T1 mapping, which can detect diffuse (interstitial) disease. These findings suggest that non-invasive T1 mapping and ECV quantification, known measures of interstitial fibrosis and inflammation, are of interest in PAD. However, the role of calf muscle ECV and T1 mapping remains understudied in PAD, and their relationship with disease progression and outcomes is unknown. The overall objective is to combine multi-modality data and multi-sequence MRIs with sophisticated machine learning (ML)/deep learning (DL) approaches that will help identify PAD patients with an elevated risk profile and extract features of disease progression. The work will be based on our previously developed ML/DL methods incorporating support vector machine (SVM), logistic regression (LR), extreme gradient boosting (XGB), and convolutional neural network (CNN) approaches, including resNet and divNet. The readily available multi-sequence and time-series MRI data will be combined with clinical patient data obtained from the electronic health record (EHR) system, questionnaires, exercise capacity measures, and demographic information. The long-term goal of this project is to prospectively study the utility of an ML/DL framework to assess PAD patients longitudinally and for the computational tools to aid in the clinical treatment decision process.  

XX
Kaul, Ribhu (#30195)
Finite Size spectra of fermions from quantum Monte Carlo

Quantum Monte Carlo provides the most controlled way to study two-dimensional quantum many-body problems when the sign problem is absent. Quantum Monte Carlo, which works in imaginary time, is usually limited to thermodynamic quantities in equilibrium (ground state) or equal-time correlation functions. Even in sign-problem-free models, extracting the spectrum of excitations is a difficult challenge – this issue is related to the well-known difficulty of analytically continuing from imaginary time to real time to obtain spectral functions. Recently we have shown with our collaborators that this problem can be mitigated if one limits oneself to low-lying energy spectra (as opposed to the entire spectral function) in quantum spin models. An outstanding question is how to adapt this to fermionic models, which are simulated with entirely different algorithms than quantum spin models. 

XX
Mahony, Shaun (#30198)
Predicting genomic regulatory elements across species using domain adaptive neural networks

Every cell in our body contains a copy of the same DNA genome, but different cell types achieve their own particular biological functions and behaviors by producing different sets of RNA molecules and proteins. The first step in gene regulation is controlled by proteins called transcription factors, which recognize specific DNA patterns on the genome and recruit molecular machinery to turn nearby genes on or off (eventually determining which RNAs and proteins are created). Thus, the function and identity of a cell is determined by which combination of transcription factors are active in that cell type. The architecture of gene regulatory networks – i.e., the transcription factors that are active in a given cell type and the DNA patterns that they recognize – is highly conserved across closely related species. In other words, the regulatory patterns that determine whether a gene is expressed in human livers are very similar to the regulatory patterns that determine whether a gene is expressed in mouse livers. This observation raises an interesting question: rather than trying to experimentally characterize gene regulatory codes in every species separately, could we train a computational model on experimental data performed in one species and then use that model to accurately predict what the same experiment would look like in other species? Such cross-species regulatory models would open the possibility of studying gene regulation in cell types that are difficult to study directly in humans, and they would allow the study of cell types in agricultural and other species of interest without the need for costly experiments. We and others have demonstrated that convolutional and transformer-based neural networks are highly effective at learning gene regulatory codes from experimental data. However, current approaches are not fully effective at cross-species predictions; models trained on one species and tested on another consistently underperform models trained and tested in the same species. In previous work, we demonstrated that this performance gap is due to a domain shift that exists between genomes from different species (Cochran, et al. Genome Res, 2022). We further showed that a simple domain adaptation approach closed some of the performance gap, but issues remained. In this project, our goal is to implement additional domain adaptation strategies to enable accurate cross-species gene regulatory predictions. We are particularly interested in the multi-source training scenario, where we have labeled training data from multiple genomes/domains. We wish to test several recent approaches for training domain adaptive neural networks, including those based on moment matching and Wasserstein distance guided representations. Our ultimate goal is to train cross-species models that can accurately predict gene regulatory features across hundreds of vertebrate genomes, thereby enabling the study of how regulatory networks and cellular function evolve across species. 
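As one example of the moment-matching strategies mentioned above, the sketch below adds a CORAL-style covariance-alignment penalty between features from a labeled source genome and an unlabeled target genome to a standard classification loss. The network, feature dimension, loss weight, and random stand-in data are illustrative assumptions, not the lab's models or datasets.

```python
# Minimal sketch of moment-matching domain adaptation (CORAL-style): align
# second-order statistics of features from a source genome and a target
# genome while training a binding-site classifier on source labels only.
# Network size, feature dimension, and loss weight are illustrative.
import torch
import torch.nn as nn

def coral_loss(f_src, f_tgt):
    """Squared Frobenius distance between feature covariance matrices."""
    def cov(f):
        f = f - f.mean(dim=0, keepdim=True)
        return (f.T @ f) / (f.shape[0] - 1)
    d = f_src.shape[1]
    return ((cov(f_src) - cov(f_tgt)) ** 2).sum() / (4 * d * d)

encoder = nn.Sequential(nn.Conv1d(4, 32, 15, padding=7), nn.ReLU(),
                        nn.AdaptiveAvgPool1d(1), nn.Flatten())
classifier = nn.Linear(32, 1)
opt = torch.optim.Adam(list(encoder.parameters()) + list(classifier.parameters()), lr=1e-3)

# Random stand-ins for one-hot DNA batches: (batch, 4 channels, 500 bp).
x_src = torch.randn(64, 4, 500)
y_src = torch.randint(0, 2, (64, 1)).float()     # labels only for the source genome
x_tgt = torch.randn(64, 4, 500)                   # unlabeled target genome

for step in range(100):
    f_src, f_tgt = encoder(x_src), encoder(x_tgt)
    cls = nn.functional.binary_cross_entropy_with_logits(classifier(f_src), y_src)
    loss = cls + 1.0 * coral_loss(f_src, f_tgt)   # weight is a placeholder
    opt.zero_grad()
    loss.backward()
    opt.step()
```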

XX
Medvedev, Paul (#30200)
Reinforcement learning for understanding the complexity of our genomes

How much information does the human genome contain? This seemingly simple question remains unanswered. However, it is a foundational question that points the arrow of measurability at questions about the relationship of human abilities to those of artificial intelligence. The notion of information content is something that Computer Science and Electrical Engineering have long been well suited to address. Informational entropy, first introduced by the pioneering work of Claude Shannon, has been applied to the DNA sequence of the human genome, but the results do not give a satisfactory answer to our guiding question. They essentially say that the genome has entropy similar to a random string, in spite of the fact that biology tells us the genome is far from random. Entropy-based compressors can represent the human genome sequence in about 500 million bits, giving an upper bound on the information content. As better and better compressors are applied, the number of bits can decrease. At the other end of the spectrum, the most powerful notion of information content is that of Kolmogorov complexity. The Kolmogorov complexity of a string is the length of the shortest program that can generate it. Kolmogorov complexity is widely adopted as the best measure of the true information content of a string. However, Kolmogorov complexity is not computable, so there is no algorithm that can determine it for the human genome. In this project, we propose to use Reinforcement Learning to tackle the challenge of coming up with a program to generate a human genome. The length of such a program would be a much better estimate of the information content of the human genome than informational entropy. In a recent Nature paper, Mankowitz et al. showed how Reinforcement Learning can be used to shorten an assembly program for sorting numbers beyond what had previously been achieved through human creativity. The same idea can be applied here. We propose a simple assembly language for generating a genome, with three operations: INSERT, DUPLICATE, and DELETE; we task the agent with writing the shortest possible program to generate the sequence of the human genome. This project is a novel application of Reinforcement Learning to the study of a fundamental question of biology and evolution, namely how much information our genome contains. We hope that at the end of this project we will have a working prototype that can beat the current entropy-based compression levels.
References
1. Mankowitz, D. J., et al. (2023). Faster sorting algorithms discovered using deep reinforcement learning. Nature.
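A toy version of the proposed setup is sketched below: an environment whose state is the sequence built so far, whose actions are INSERT/DUPLICATE/DELETE instructions, and whose reward trades off matching the target sequence against program length. The reward shape and action encodings are illustrative assumptions rather than the project's design.

```python
# Minimal sketch of the proposed RL environment: the agent emits
# INSERT / DUPLICATE / DELETE instructions that build a sequence, and is
# rewarded for matching the target with as short a program as possible.
# Reward shape and action encodings are illustrative assumptions.
class GenomeProgramEnv:
    def __init__(self, target):
        self.target = target
        self.reset()

    def reset(self):
        self.seq = ""           # sequence generated so far
        self.program_len = 0    # number of instructions used
        return self.seq

    def step(self, action):
        """action: ('INSERT', base) | ('DUPLICATE', start, length) | ('DELETE', pos)"""
        op = action[0]
        if op == "INSERT":
            self.seq += action[1]
        elif op == "DUPLICATE":
            start, length = action[1], action[2]
            self.seq += self.seq[start:start + length]
        elif op == "DELETE":
            pos = action[1]
            self.seq = self.seq[:pos] + self.seq[pos + 1:]
        self.program_len += 1

        matched = sum(a == b for a, b in zip(self.seq, self.target))
        done = self.seq == self.target
        # Reward: favor matching the target while penalizing program length.
        reward = matched / len(self.target) - 0.01 * self.program_len
        return self.seq, reward, done

env = GenomeProgramEnv(target="ACGTACGTTT")
env.step(("INSERT", "A")); env.step(("INSERT", "C")); env.step(("INSERT", "G"))
env.step(("INSERT", "T"))
state, reward, done = env.step(("DUPLICATE", 0, 4))       # copy "ACGT"
print(state, round(reward, 3), done)
```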

XX
Chen, Qian (#30202)
Learning to Decide from Decentralized Data: A Federated Multi-Modal Framework for Language-Enhanced Clinical Decision Support

Accurate and timely clinical decision-making increasingly relies on information from heterogeneous sources within electronic health records (EHRs), including both structured data (e.g., vitals, laboratory results, medications) and unstructured clinical narratives written by physicians. While structured variables are widely used in predictive models, unstructured text—despite containing rich contextual information about clinical reasoning, uncertainty, and disease progression—remains underutilized in deployable decision-support systems. This project aims to develop a privacy-preserving, multi-modal clinical decision-support framework that integrates structured EHR data with physician notes using large language model (LLM) representations, with the goal of supporting—not replacing—clinical judgment. A key challenge in developing robust clinical AI systems is generalizability across institutions. Models trained at a single hospital often overfit to local patient populations and documentation styles, leading to degraded performance when deployed elsewhere. Leveraging data from multiple institutions is therefore essential, yet strict privacy regulations (e.g., HIPAA), IRB requirements, and institutional governance policies prevent centralized data pooling. To address this barrier, the proposed project adopts federated learning, which enables collaborative model training across institutions without sharing raw patient data. Each participating site performs local updates and shares only aggregated model parameters, preserving privacy while benefiting from distributed data. However, federated learning in healthcare faces an additional challenge: non-IID data heterogeneity. Patient demographics, disease prevalence, and clinical documentation practices vary substantially across care units and hospitals, which can destabilize training and impair model performance. This project addresses these issues by developing a federated multi-modal learning framework that explicitly accounts for institutional and domain heterogeneity. We employ a FedProx-style optimization objective with an ℓ₂ proximal regularization term to stabilize training under heterogeneous data distributions and mitigate client drift. Beyond empirical evaluation, the project will investigate the theoretical properties of this regularization—examining how it affects convergence, stability, and bias under varying degrees of heterogeneity. Methodologically, the project explores two complementary strategies for multi-modal representation learning: (1) a unified encoder that jointly learns representations from structured variables and clinical text, and (2) a dual-pass design in which the same encoder processes structured data and text separately before fusion. This comparison sheds light on how cross-modal interaction and representation sharing influence generalization in federated environments. The framework will be evaluated using a large, de-identified critical care dataset, configured to simulate a multi-site federated setting with heterogeneous patient populations and documentation styles. The primary application is disease identification and risk prediction across multiple diagnostic categories. 
Importantly, the envisioned clinical use positions the system as a decision-support “safety net”: when model predictions diverge from frontline clinical assessments, cases are flagged for senior clinician review rather than automated decision-making.Overall, this project advances the foundations of privacy-preserving, language-enhanced clinical decision support by integrating AI, data science, and computational methods. It contributes both methodological insight and practical guidance for deploying trustworthy multi-modal AI systems in real-world healthcare settings.
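The FedProx-style proximal term described above amounts to adding an ℓ₂ penalty between local and global weights during each client's update, as in the sketch below. The model, toy data, and μ value are illustrative assumptions rather than the project's architecture.

```python
# Minimal sketch of a FedProx-style local update: each client minimizes its
# task loss plus (mu/2) * ||w - w_global||^2 to limit client drift under
# non-IID data.  Model, toy data, and mu are illustrative assumptions.
import copy
import torch
import torch.nn as nn

def local_update(global_model, loader, mu=0.1, lr=1e-3, epochs=1):
    model = copy.deepcopy(global_model)
    global_params = [p.detach().clone() for p in global_model.parameters()]
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = nn.functional.binary_cross_entropy_with_logits(model(x), y)
            prox = sum(((p - g) ** 2).sum()
                       for p, g in zip(model.parameters(), global_params))
            (loss + 0.5 * mu * prox).backward()
            opt.step()
    return model.state_dict()

# Toy client data: fused structured + text-embedding features, binary label.
global_model = nn.Linear(16, 1)
loader = [(torch.randn(8, 16), torch.randint(0, 2, (8, 1)).float()) for _ in range(5)]
client_weights = local_update(global_model, loader)
# A server step (FedAvg-style) would then average the state dicts returned by
# the participating clients to form the next global model.
```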

XX
Kovalenko, Ilya (#30203)
GenTwin: LLM-Powered Generative Digital Twins for Dynamic Control of Adaptive Manufacturing Systems

Digital twins (DTs) can significantly improve performance, energy efficiency, and operational flexibility in manufacturing systems. However, their development remains expert-intensive, time-consuming, and configuration-specific. These limitations hinder DT scalability in manufacturing environments that require frequent reconfiguration. This work leverages recent advances in Large Language Models (LLMs) to address this challenge by introducing a Generative Digital Twin (GenTwin) framework in which LLMs act as a reasoning layer over structured, machine-readable system representations. The proposed approach enables automated construction, adaptation, and validation of DTs in response to system reconfigurations, reducing manual effort and improving deployability in modern manufacturing systems.

XX
Subramanyam, Anirudh (#30204)
Development of a GPU-Native Solver for Vehicle Routing and Scheduling in Julia

Background and Gap
Automated vehicle scheduling and route optimization is fundamental to modern transportation and logistics. Applications include intra-factory transport by autonomous mobile robots, inter-city truck freight, urban last-mile delivery, and ride-sharing platforms. All of these applications solve a variant of the “vehicle routing problem”, a foundational problem in discrete optimization. Recent applications demand vehicle routing solvers that run in real time and at large scale. GPU acceleration is a natural fit for this shift, especially as GPUs become common in modern computing systems, data centers, and embedded/edge platforms such as autonomous robots and drones. However, state-of-the-art solvers rely on branch-and-bound and local/metaheuristic search algorithms that were engineered for CPUs and are difficult to port efficiently to GPUs. This is because they use branch-heavy logic for neighborhood exploration, pointer-heavy data structures, and sequential search procedures that do not map cleanly to “single-instruction multiple-thread” GPU execution. Recent solvers like NVIDIA cuOpt demonstrate the promise of GPU acceleration. However, cuOpt is implemented in CUDA C++, which limits extensibility and imposes steep development barriers for industrial engineering researchers who need rapid iteration and customization in a high-level language. In particular, its architecture makes it difficult to (i) customize search neighborhoods and kernels, (ii) implement new application-specific constraints (e.g., new vehicle types), and (iii) embed the solver inside larger planning workflows.
Proposed Approach
The project will develop one of the first open-source GPU-native vehicle routing solvers written entirely in the Julia programming language. Julia allows us to write high-level code (e.g., using CUDA.jl) that achieves performance comparable to C++ without the development overhead. Packages like KernelAbstractions.jl provide hardware-agnostic layers, enabling a single codebase that can target NVIDIA GPUs and other backends via portable kernels. Julia’s powerful multiple dispatch can support extensible constraint modules without rewriting the core search engine.
Project Goals
The goal is to provide an open-source Julia package with: (i) GPU-friendly data structures for representing vehicle routes and operational constraints (including time windows, pickup and delivery precedences, multiple depots, and heterogeneous fleets), (ii) GPU kernels for high-throughput parallel evaluation of local search neighborhoods and route feasibility, (iii) metaheuristic solvers for vehicle fleet optimization, and (iv) documentation and benchmarks against existing CPU solvers and NVIDIA cuOpt on standard test instances.
Expected Outcomes
The project provides an opportunity for the selected graduate student to train at the intersection of high-performance GPU computing, combinatorial optimization, and transportation and supply chain research. In addition to PI Subramanyam, the student will also train under the guidance and mentorship of external collaborators from industry and national labs (e.g., AMD and Argonne). We anticipate at least one peer-reviewed manuscript and one conference presentation in a leading computational optimization or operations research outlet. 
In addition, the developed software will serve as the foundation for seed proposals to the NVIDIA Academic Grant Program, whose call explicitly mentions operations research and route optimization, and larger multi-year proposals to the Operations Engineering program within NSF-CMMI.

X
Kaul, Ribhu (#30205)
Simulating conformal field theories on a quantum computer (correct version)

An enduring intellectual challenge in fundamental theoretical physics is to understand quantum field theory in non-perturbative settings, e.g. conformally invariant field theories that emerge at critical points. While remarkable progress has been made in this area over the last fifty years, many exciting questions remain outstanding. A new tool for attacking these problems has emerged in the form of the various quantum simulation platforms developed over the last decade. In this research project we will develop a method for studying three-dimensional conformal critical points on current qubit platforms. Our idea is to take advantage of the state-operator correspondence of conformal field theory, which reduces the problem of conformal field theory to computing the spectrum of a quantum many-body problem. A central goal of this project is to determine how to encode these quantum many-body problems in current qubit technology, and how to extract conformal data from these quantum simulations on a quantum computer. The project will involve extensive simulation of quantum systems on classical computers to benchmark the proposed quantum simulations. We will work with quantum computing expert Professor Bryce Gadway to figure out how these quantum simulations can best be carried out on a system of Rydberg atoms. This project will involve home-grown code development and large-scale simulations to collect data on the spectra of the Hubbard model, and developing an understanding of atomic-physics Rydberg-atom quantum simulators.

XX
Mittal, Tushar (#30206)
Fusing Analog Archives and Modern Geophysics: A Data-Driven Multi-Physics Reconstruction of the Central Atlantic Magmatic Province Sills

The 201 Ma Central Atlantic Magmatic Province (CAMP) sills in the Newark-Gettysburg basins are a critical geological target linked to both the End-Triassic Mass Extinction and critical mineral deposits. However, the causal mechanisms linking sill emplacement to both climate-forcing volatile release (CO₂, CH₄) and hydrothermal ore formation remain poorly constrained due to a lack of resolved 3D subsurface architecture, especially the complex multi-layered magmatic sills. This project proposes a 4D thermo-chemical reconstruction of the Newark-Gettysburg Basin sills using a multi-modal data fusion framework: high-resolution 2025 USGS Earth MRI aeromagnetic surveys, state geological GIS maps, and unstructured legacy data from geological reports via modern Natural Language Processing (NLP) pipelines. We will develop a computational workflow that integrates these inputs into an existing multi-physics thermal and fluid flow solver to constrain the emplacement history of the intrusions. By identifying specific zones of fracture permeability and modeling the metamorphic decarbonation reactions, this work will provide rigorous, data-constrained estimates of the carbon pulse responsible for the End-Triassic Mass Extinction and locate predictive targets for cobalt mineralization. This research advances the mission of computational science by establishing a scalable pipeline for transforming existing, but hard to use, scientific archives (e.g., geological reports) into quantitative boundary conditions for high-performance computing. 

XX
Honavar, Vasant Gajanan (#30207)
Exploring Hyperdimensional Representation for Quantum AI

Project Overview: This project proposes an exploratory, interdisciplinary investigation of the role of quantum computing in modern machine learning, with a particular focus on transformer-based models and representation learning. Rather than pursuing algorithmic speedups in isolation, the project adopts a representation-centric perspective to examine whether quantum state representations and similarity measures offer qualitatively different computational affordances than those used in contemporary AI systems. Transformer-based machine learning models have become the dominant paradigm for language, vision, and multi-modal AI, but their success relies on increasingly expensive representational and computational assumptions. At the same time, quantum computing is frequently proposed as a disruptive alternative for machine learning, despite ongoing uncertainty about where it provides genuine advantages for modern architectures. This project is motivated by the observation that many debates around quantum machine learning conflate algorithmic acceleration with representational innovation. By reframing the problem around representations—how information is encoded, compared, and transformed—the project seeks to clarify both the opportunities and the fundamental limitations of quantum approaches for contemporary AI. The project brings together accomplished senior researchers with complementary expertise in AI (Honavar) and Quantum Computing (Ghosh) with a rising researcher.
Project Objectives and Scope. The Rising Researcher will pursue one or more of the following objectives, depending on background and interests:
– Analyze formal relationships between transformer representations (embeddings and attention scores) and candidate quantum state representations.
– Investigate quantum similarity measures as analogs or alternatives to dot-product attention.
– Examine representational constraints arising from data loading, noise, and training limitations in quantum settings.
– Propose minimal hybrid classical–quantum workflows informed by representational analysis.
The project is intentionally exploratory and conceptual, with expectations calibrated to the scope and duration of Rising Researcher support.
Expected Outcomes include:
– A conceptual framework clarifying representational opportunities and limits of quantum computing for modern machine learning.
– One or more publishable manuscripts or workshop papers.
– Preliminary results that position the Rising Researcher and PI for future external funding proposals.
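To ground the objective of treating quantum similarity as an alternative to dot-product attention, the numpy sketch below contrasts softmax dot-product attention weights with weights derived from state-overlap (fidelity) similarities between amplitude-normalized vectors. The toy embeddings and the normalization scheme are illustrative assumptions, not a proposed algorithm.

```python
# Minimal sketch contrasting classical dot-product attention weights with
# weights built from quantum-state overlaps |<psi|phi>|^2 of amplitude-encoded
# vectors.  Toy embeddings and the normalization scheme are assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, n_keys = 8, 5
query = rng.normal(size=d)
keys = rng.normal(size=(n_keys, d))

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Classical scaled dot-product attention weights.
attn = softmax(keys @ query / np.sqrt(d))

# Amplitude encoding: normalize vectors to unit L2 norm, as a quantum state's
# amplitude vector must be; similarity is the squared overlap (fidelity).
psi = query / np.linalg.norm(query)
phis = keys / np.linalg.norm(keys, axis=1, keepdims=True)
fidelity = (phis @ psi) ** 2
quantum_weights = fidelity / fidelity.sum()

for i, (a, q) in enumerate(zip(attn, quantum_weights)):
    print(f"key {i}: dot-product attention {a:.3f}   fidelity-based {q:.3f}")
```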

XX
Roman-Reyna, Veronica (#30208)
AI-Driven Workflow for Functional Profiling of Microbial Communities in Compost Methane Biofilters.

The mitigation of environmentally harmful gases often relies on naturally occurring microorganisms operating within complex microbial communities. Harnessing the metabolic capabilities of these communities in engineered biotechnological systems provides an opportunity to understand their collective behavior, interactions, and responses to controlled operational conditions. Compost-based methane biofilters serve as a representative case study in which methanotrophic metabolism is utilized to oxidize methane from contaminated air streams; however, the functional potential of the broader microbial consortia driving this process remains poorly characterized. Using existing metagenomic datasets generated across multiple process conditions, this project aims to develop an AI-driven bioinformatics workflow that integrates established computational tools to elucidate system-level functional potential in compost methane biofilters. The workflow will identify functional patterns and pathway-level shifts in microbial communities, relating these to key biogeochemical processes, including methane oxidation, carbon assimilation, nitrogen cycling, and copper homeostasis. This project will enable mechanistic understanding and improved prediction of biofilter performance, while establishing a reusable analytical framework for future studies of microbial consortia–based systems.

XX
Li, Qunhua (#30209)
Reproducible and Transparent LLM-assisted Annotation for Political Text and Multimodal Data

Large language models (LLMs) are increasingly used to annotate large-scale text and multimedia data in social science research, enabling unprecedented analysis of how elected officials communicate with constituents across digital and physical platforms. However, LLM annotation suffers from poor reproducibility, opaque reasoning, and misalignment with human codebooks, which can bias downstream statistical inference and undermine trust in AI-assisted research. This project develops reproducible and transparent LLM annotation frameworks for political text and multimodal data. We introduce (1) a dual-agent system that enforces codebook-grounded rationales during annotation, and (2) a multi-agent, human-in-the-loop framework that uses diverse LLM agents to iteratively refine codebooks and produce expert-quality annotations. The resulting annotations will serve as reliable surrogates for costly and error-prone human coding in downstream analysis. These methods will be applied to large-scale political communication texts curated by the team. This provides an unparalleled testbed for scalable, reproducible AI methods to study democratic accountability and governance. A Rising Researcher will implement, evaluate, and extend these frameworks, producing open-source tools, curated annotations, and publishable research at the intersection of AI, data science, and computational social science.
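
A hedged sketch of how the dual-agent, codebook-grounded annotation loop might be wired, with a purely hypothetical call_llm stand-in and illustrative codebook entries; none of the prompts or labels come from the project.

```python
# Hedged sketch of the dual-agent idea described above. `call_llm` is a hypothetical
# stand-in for any chat-completion client; the codebook, prompts, and verification
# criterion are illustrative assumptions, not the project's materials.
CODEBOOK = {
    "constituency_service": "Statements describing help provided to constituents.",
    "position_taking": "Statements expressing a policy position.",
}

def call_llm(prompt: str) -> str:
    """Hypothetical LLM interface; replace with a real API client."""
    raise NotImplementedError

def annotate_with_rationale(text: str) -> dict:
    # Agent 1: propose a label plus a rationale grounded in the codebook definitions.
    labels = "\n".join(f"- {k}: {v}" for k, v in CODEBOOK.items())
    proposal = call_llm(
        f"Codebook:\n{labels}\n\nText: {text}\n"
        "Return 'label | rationale', citing the codebook definition you relied on."
    )
    label, rationale = [s.strip() for s in proposal.split("|", 1)]
    # Agent 2: verify that the rationale actually follows from the cited definition.
    verdict = call_llm(
        f"Codebook definition: {CODEBOOK.get(label, 'unknown')}\n"
        f"Rationale: {rationale}\nDoes the rationale justify the label? Answer yes/no."
    )
    return {"label": label, "rationale": rationale, "verified": verdict.lower().startswith("yes")}
```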

XX
Zhu, Linxiao (#30210)
Machine Learning Design of Selective Emitter for Thermophotovoltaics

There is a significant need for efficient heat-to-electricity converters that operate at high temperatures (~1000 K), driven by interests including modular nuclear reactors for coping with surging electricity demand from AI, deployment of nuclear reactors in space, and recovery of waste heat from industrial processes and concentrated solar thermal. Conventional steam-based power cycles are bulky and difficult to miniaturize. Further, steam-based power can raise safety concerns due to leakage of the working fluid. Efficient solid-state heat-to-electricity converters provide an attractive alternative, in particular for miniaturizing the system, such as for modular nuclear reactors, and for applications where a compact size and the absence of vibration and noise are preferred. The main existing solid-state conversion technology, based on the thermoelectric effect, suffers from limited efficiency, typically below 10% at these high temperatures, due to materials challenges [1]. Thermophotovoltaic power generation, which converts thermal radiation photons emitted by a hot thermal emitter directly to electricity using the photovoltaic effect, has garnered substantial recent interest. The power conversion efficiency of thermophotovoltaics can overcome the Shockley-Queisser limit, the efficiency limit for single-junction solar cells [2], and has shown rapid progress in recent years [3]. PI Zhu has worked in the field of thermophotovoltaics [4-6], but the emitter designs have zero or relatively small spectral selectivity. A key challenge in achieving efficient thermophotovoltaics is to develop a spectrally selective emitter that maximally emits thermal radiation at photon energies above the bandgap energy of the photovoltaic cell but has negligible emissivity below the bandgap energy, as illustrated in Fig. 1. Such a spectrally selective thermal emitter can lead to high heat-to-electricity conversion efficiency. Moreover, the reduced parasitic heating of the photovoltaic cell from the selective emitter will lower the temperature of the cell, leading to an enhanced lifespan of the system and a reduced need for active cooling. However, achieving such a spectrally selective thermal emitter is challenging, as dramatically different optical properties are needed over a broad spectral range. The structures also need to be stable at high temperatures. Typically, physical intuition is not sufficient to achieve an ideal selective emitter. Therefore, machine learning methods for optimized thermal emitter design are needed. We will first consider machine learning design of multilayer structures to achieve selective emission. Further, we will consider applying machine learning to nanostructured emitters to enhance the spectral selectivity. This research is done in collaboration with Dr. Bed Poudel from the Department of Materials Science and Engineering. Thus, the rising researcher will have the opportunity to learn from an interdisciplinary team.

References Cited
[1] J. He and T. M. Tritt, Science 357, eaak9997 (2017).
[2] E. Rephaeli and S. Fan, Opt. Express 17, 15145 (2009).
[3] A. Lenert, D. M. Bierman, Y. Nam, W. R. Chan, I. Celanovic, M. Soljacic, and E. N. Wang, Nat. Nanotechnol. 9, 126 (2014).
[4] K. A. Arpin et al., Nature Communications 4, 2630 (2013).
[5] A. Fiorino, L. Zhu, D. Thompson, R. Mittapally, P. Reddy, and E. Meyhofer, Nat. Nanotechnol. 13, 806 (2018).
[6] R. Mittapally, B. Lee, L. Zhu, A. Reihani, J. W. Lim, D. Fan, S. R. Forrest, P. Reddy, and E. Meyhofer, Nature Communications 12, 4364 (2021).
[7] A. Sharan and L. Zhu, J. Appl. Phys. 138, 083103 (2025).
[8] A. Kalantari Dehaghi and L. Zhu, arXiv e-prints, arXiv:2511.20372 (2025).
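
For illustration only, the sketch below shows an optimization loop of the kind the proposal describes, with a toy parametric emissivity model standing in for the actual multilayer electromagnetic calculation; the bandgap, temperature, and figure of merit are assumptions.

```python
# Toy sketch of the optimization loop only: a parametric stand-in for emissivity replaces
# the multilayer electromagnetic solver, and the figure of merit (in-band minus
# out-of-band emission, weighted by a 1000 K blackbody-like spectrum) is an assumed
# objective. Photon energies are in eV; the bandgap value is illustrative.
import numpy as np
from scipy.optimize import minimize

E = np.linspace(0.1, 2.0, 400)      # photon energy grid (eV)
E_GAP = 0.6                          # assumed PV cell bandgap (eV)
KT = 8.617e-5 * 1000.0               # kT at 1000 K in eV

def blackbody_weight(e):
    return e**3 / np.expm1(e / KT)   # Planck-like spectral weight (arbitrary units)

def emissivity(params, e):
    # Toy surrogate: a logistic step whose edge position and width are the design variables.
    edge, width = params
    return 1.0 / (1.0 + np.exp(-(e - edge) / max(width, 1e-3)))

def neg_selectivity(params):
    eps = emissivity(params, E)
    w = blackbody_weight(E)
    useful = np.trapz(eps * w * (E >= E_GAP), E)      # emission above the bandgap
    parasitic = np.trapz(eps * w * (E < E_GAP), E)    # sub-bandgap (parasitic) emission
    return -(useful - parasitic)

res = minimize(neg_selectivity, x0=[0.5, 0.2], bounds=[(0.2, 1.5), (0.01, 0.5)])
print("optimized edge (eV), width (eV):", res.x)
```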

X
Papakonstantinou, Kostas (#30212)
AI-supported Topology Optimization of Metamaterials

The primary scientific objective of this effort is to investigate AI frameworks for topology optimization of metamaterials, enabling the design of devices and structures with enhanced functionalities. Metamaterials are engineered substances designed to have unique properties not found in naturally occurring materials. Unlike traditional materials, whose properties depend on their chemical or atomic composition, metamaterials derive their behaviors from their internal structure and geometry. This engineering field can thus enable radically new performance, technologies, and breakthroughs. We will explore and benchmark a range of optimization techniques, from established gradient-based, adjoint methods to evolutionary approaches, and we will investigate how modern AI techniques can address the multiple design, algorithmic, and dimensionality challenges. In particular, we are interested in investigating: whether and how diffusion, flow-based, and other generative deep learning models and architectures can be integrated into the optimization framework to enable better and more scalable designs under nonlinearities; whether and how Deep Reinforcement Learning (DRL) approaches can ease or eliminate the need for supervised learning of such models; and how these techniques behave when underlying uncertainties are also prevalent and the entire formulation should be cast in terms of Uncertainty Quantification (UQ) principles. By achieving these objectives, this project will set the foundations for advancing relevant algorithmic and methodological understanding and will provide several future research and funding opportunities in the fields of optimization, metamaterials, deep learning, and UQ.

XX
Dudas, Patrick (#30213)
AI-Powered Smart Glasses for Digital-Twin-Based Training in Manufacturing

Smart glasses are emerging as a transformative technology in industrial environments, driven by advancements in augmented reality and artificial intelligence. Early generations of smart glasses demonstrated value by providing hands-free, context-aware support for complex tasks, while also revealing ergonomic and safety challenges that slowed adoption. Recent studies in logistics and warehousing show renewed potential: artificial intelligence (AI)-enabled smart glasses can deliver real-time guidance and reduce cognitive demands. Yet questions remain around trust, privacy, and overall user acceptance. This study examines the effectiveness of Meta Display smart glasses in warehouse order-picking tasks through a controlled, within-subjects laboratory experiment. Participants will complete equivalent tasks using traditional picking methods, voice-only Meta smart glasses, and full AI-supported Meta Display smart glasses. Performance outcomes—including task speed, accuracy, workload, and error recovery—will be evaluated alongside qualitative feedback on usability and situational awareness. By combining objective metrics with user experience insights, this research will clarify how AI-enabled wearables influence efficiency and human performance. The findings will guide the development of workforce-aligned training that prepares individuals and organizations for evolving Industry 4.0 and Industry 5.0 environments.

XX
Honavar, Vasant Gajanan (#30214)
Advanced Machine Learning Methods for Predictive Modeling of Health Risks

Predictive modeling of health risks from longitudinal, multi-modal health data. Project Overview: This project proposes an exploratory, interdisciplinary investigation of predictive modeling of health risks using multimodal longitudinal health data. The focus is on methodological foundations for learning from temporally evolving, heterogeneous data rather than on immediate clinical deployment. Health data collected over time—such as electronic health records, clinical measurements, and other patient-related data—are characterized by irregular sampling, missing values, and evolving distributions. These properties pose fundamental challenges for reliable predictive modeling. This project is motivated by the need for principled approaches to representation, temporal modeling, and uncertainty and bias quantification and mitigation in longitudinal health risk prediction. By emphasizing foundational modeling questions, the project seeks to clarify both the potential and the limitations of current AI approaches in real-world health data settings. The project pairs accomplished senior researchers with complementary expertise in AI and health sciences with a rising researcher.

Project Objectives and Scope: The Rising Researcher will pursue one or more of the following objectives, depending on background and interests: develop representations for multimodal longitudinal health data that capture temporal structure and cross-modal relationships for health risk prediction; evaluate predictive models for health risk assessment under realistic conditions of missing data and distribution shift; investigate uncertainty-aware or interpretable modeling approaches suitable for health risk prediction; and analyze sources of bias and limitations in longitudinal predictive modeling. The team has access to a large clinical data set of patients with and without some form of cardiovascular disease diagnosis. The project is intentionally exploratory and conceptual, with expectations calibrated to the scope and duration of Rising Researcher support.

Expected Outcomes: Methodological insights into predictive modeling of health risks from multimodal longitudinal data; a proof-of-concept demonstration of improved cardiovascular risk prediction relative to the current state of the art (AHA's PREVENT calculator); one or more publishable manuscripts; and preliminary results that position the collaborative team for future proposals for funding from NIH, NSF, or ARPA-H.

XX
Hu, Yuqing (#30215)
AI-Driven BIM-Graph Agents for Intelligent and Adaptive Indoor Robotic Inspection

Indoor facility inspection is a critical yet under-automated component of building operations, relying heavily on manual workflows and fragmented data collected through visual inspection, mobile devices, and unstructured reports. While Building Information Models (BIM) and digital twins are increasingly available, they are rarely used as active computational representations for inspection planning, spatial reasoning, or adaptive decision-making. At the same time, recent advances in large foundation models have demonstrated strong capabilities in reasoning and perception but lack access to structured, domain-specific representations of the built environment. This project addresses this gap by developing an AI-driven computational framework that integrates BIM-derived graph representations with foundation-model-based agents for intelligent and adaptive indoor robotic inspection. The proposed research centers on a BIM-Graph representation, in which architectural, mechanical, and spatial elements are encoded as hierarchical, multi-scale graphs capturing topology, adjacency, and functional relationships. This representation enables AI agents to reason explicitly over building structure rather than relying solely on unstructured sensory inputs. A Large Language Model (LLM)–based planning agent operates over the BIM-Graph to generate inspection plans and waypoints informed by building semantics, while a Vision-Language Model (VLM) agent grounds perception in real-time observations, detects anomalies or obstructions, and updates the graph during execution. The interaction between these agents forms a cooperative planning–perception loop that supports adaptive re-planning in dynamic indoor environments. The project will evaluate the computational performance of this framework in terms of planning efficiency, task completion, and spatial localization accuracy within digital twin settings. This project is well suited for ICDS Rising Researchers with interests in artificial intelligence, graph learning, robotics, digital twins, or computational sciences. Potential contributions include developing BIM-to-graph extraction pipelines, designing graph-aware reasoning and prompting strategies for foundation models, integrating perception outputs with structured graph updates, and conducting computational evaluations of agent performance. Through this work, Rising Researchers will gain hands-on experience at the intersection of AI, data science, and the built environment, while engaging with the ICDS community through seminars, workshops, and symposia.
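
A minimal sketch of the BIM-Graph idea, assuming hypothetical element names and relations; it shows only how a planning agent might query a graph of building elements, not the project's BIM-to-graph pipeline.

```python
# Minimal sketch with hypothetical element IDs and attributes: a few BIM-derived elements
# and their containment/adjacency relations encoded as a networkx graph that a planning
# agent could query. Illustrative only, not a BIM-to-graph extraction pipeline.
import networkx as nx

G = nx.DiGraph()
G.add_node("floor_2", kind="storey")
G.add_node("room_201", kind="space", use="mechanical")
G.add_node("ahu_01", kind="air_handling_unit", inspect=True)
G.add_node("door_201a", kind="door")

G.add_edge("floor_2", "room_201", rel="contains")
G.add_edge("room_201", "ahu_01", rel="contains")
G.add_edge("room_201", "door_201a", rel="bounded_by")

# Example query a planning agent might issue: which inspectable assets sit on floor_2?
targets = [
    n for n in nx.descendants(G, "floor_2")
    if G.nodes[n].get("inspect", False)
]
print(targets)  # ['ahu_01']
```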

XXX
Honavar, Vasant Gajanan (#30217)
Understanding and Improving the Robustness of Learning in Overparameterized Systems

Project Overview: This project proposes an exploratory, interdisciplinary investigation of robust learning in modern overparameterized machine learning systems through the lens of Z-information. While many learning models achieve similar predictive accuracy, they often differ dramatically in robustness to perturbation, retraining, noise, and distribution shift. Z-information extends classical information-theoretic measures by explicitly accounting for configuration entropy: the multiplicity of internal representations and computational paths that support a given observable behavior. From this perspective, robust models are those whose predictive performance is supported by a broad equivalence class of internal realizations, rather than by fragile, narrowly tuned configurations. This project is motivated by the need for principled learning objectives and evaluation criteria that distinguish among accuracy-matched solutions based on robustness. By grounding robust learning in Z-information, the project seeks to clarify how redundancy, stochasticity, and architectural depth can improve stability without violating classical information-theoretic constraints. The project pairs accomplished senior researchers with complementary expertise in AI, statistical physics, and materials science with a rising researcher.

Project Objectives and Scope: The Rising Researcher will pursue one or more of the following objectives, depending on background and interests: develop learning objectives or regularization strategies inspired by Z-information that explicitly favor solutions with high configuration entropy; evaluate the robustness of learned models under perturbation, retraining, and distribution shift while holding predictive accuracy fixed; investigate empirical proxies for configuration entropy, such as curvature, perturbation sensitivity, and retraining variability; and analyze sources of fragility and bias in overparameterized learning systems through the lens of multiplicity and redundancy of solutions and learning dynamics. The project is intentionally exploratory and conceptual, with expectations calibrated to the scope and duration of Rising Researcher support.

Expected Outcomes: Methodological insights linking robustness and stability to configuration entropy and Z-information; proof-of-concept demonstrations of robustness improvements without loss of predictive accuracy; one or more publishable manuscripts; and preliminary results that position the collaborative team for future proposals for funding from NSF or DOE.

XX
Tak, Hyungsuk (#30218)
Deep Learning Model Checking with Application to Astronomical Reverberation Mapping Time Lag Estimation

The primary goal of this project is to apply advanced model-checking procedures, such as simulation-based calibration and posterior predictive checks, to deep learning models in order to identify the best-performing model among the candidate models developed in the PIs’ previous work for inferring time lags in astronomical reverberation mapping. For reference, reverberation mapping provides a reliable method for mass measurement by estimating time lags between multi-band time series data sets of active galactic nuclei. This is fundamental to understanding the co-evolution of supermassive black holes and galaxies.
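
The sketch below illustrates the simulation-based calibration logic with a toy conjugate-normal model standing in for the deep learning posterior; all distributions and sample sizes are assumptions chosen so the check runs exactly.

```python
# Sketch of the simulation-based calibration (SBC) logic with a toy conjugate-normal
# model in place of the deep learning posterior; prior, likelihood, and draw counts are
# assumptions chosen so the posterior is available in closed form.
import numpy as np

rng = np.random.default_rng(1)
N_SIM, N_DRAWS, N_OBS, SIGMA = 500, 100, 20, 1.0

ranks = []
for _ in range(N_SIM):
    theta_true = rng.normal(0.0, 1.0)                  # draw a "true" parameter from the prior N(0, 1)
    y = rng.normal(theta_true, SIGMA, size=N_OBS)      # simulate data under that parameter
    # Exact conjugate posterior replaces an approximate inference step.
    prec = 1.0 + N_OBS / SIGMA**2
    mu_post = (y.sum() / SIGMA**2) / prec
    post = rng.normal(mu_post, np.sqrt(1.0 / prec), size=N_DRAWS)
    ranks.append(int(np.sum(post < theta_true)))       # SBC rank statistic

# Under a correct model, ranks are uniform on {0, ..., N_DRAWS}; strong deviations
# (e.g., U-shaped or hump-shaped histograms) flag miscalibrated inference.
hist, _ = np.histogram(ranks, bins=10, range=(0, N_DRAWS))
print(hist)
```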

X
Edgerton, Jared (#30219)
Paramilitary Violence in Strong States: The Ku Klux Klan and State-Sanctioned Coercion

This project reconstructs the organizational structure, coordination, and violence of the Ku Klux Klan (KKK) using already-declassified FBI and related archival records. While the KKK has been widely studied historically, much of the empirical evidence on how it operated as a paramilitary organization—how members coordinated, how violence and intimidation were deployed, and how activity varied across time and place—remains locked in unstructured archival text. Declassified FBI records contain detailed descriptions of KKK meetings, leadership structures, intimidation campaigns, violent events, arrests, and law enforcement responses, but these materials are not readily usable for systematic social-scientific analysis without substantial data engineering and measurement work. The central objective of this project is therefore to transform publicly available archival records into research-ready actor–event and network datasets. Using reproducible computational pipelines, the project will extract individuals, organizational units, events, and relationships from declassified FBI documents and related public records. These data will be used to construct dynamic networks linking people, local chapters, and paramilitary activities over time, enabling analysis of how KKK organization and violence evolved in response to political context and state enforcement. Substantively, the project focuses on research questions that are well supported by the available data: (1) how paramilitary violence functioned as a form of informal repression within a strong state; (2) how patterns of intimidation and violence were structured to maximize signaling and performative effects; and (3) how KKK organizations adapted structurally over time in response to surveillance, arrests, and legal pressure. Rising Researchers will contribute to document processing, annotation, entity resolution, and network construction, producing reusable data infrastructure that supports multiple empirical papers and future extensions to other archival corpora on domestic political violence. The project advances ICDS priorities by applying data science and AI methods to convert complex historical records into validated measures for interdisciplinary social-scientific research.

X
Radlinska, Aleksandra Z (#30220)
A Digital Twin for Concrete Slabs Using GPR Simulation and Machine Learning

Ground penetrating radar (GPR) for the non-destructive evaluation (NDE) of concrete infrastructure generates high-dimensional waveform data that are challenging to analyze, interpret, and integrate into reliable condition assessments. Although electromagnetic simulation enables realistic modeling of wave propagation under controlled defect scenarios, translating these signals into accurate predictions of subsurface conditions remains difficult. At the same time, recent advances in machine learning provide powerful tools for extracting information from complex signals, provided that representative and well-characterized training data are available. Emerging digital twin concepts offer a promising framework for integrating physics-based simulation, experimental measurements, and data-driven inference into continuously updated computational models of physical systems. This project aims to develop physics-based digital twin models of concrete laboratory slabs by combining high-fidelity finite-difference time-domain electromagnetic simulations, controlled experimental GPR measurements, and machine learning techniques. Synthetic datasets will be generated to span realistic variability in defect geometries, material properties, and antenna configurations, while laboratory measurements will be used to validate and calibrate the simulations. Signal processing and feature extraction pipelines will transform raw radar data into representations suitable for learning-based inference. Machine learning models will then estimate subsurface defect properties, including depth, size, and spatial extent, while accounting for uncertainty and variability. The resulting closed-loop computational workflow will enable rapid and scalable inference, continuous refinement of model fidelity as new data become available, and objective performance benchmarking. The proposed framework provides a reusable platform for advancing digital inspection methodologies in laboratory settings and establishes a foundation for future extension to in-service bridge deck monitoring and broader infrastructure health assessment applications.
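
As a hedged illustration of the inference step (not the project's FDTD simulations or feature pipeline), the sketch below trains a regressor to recover defect depth from simple features of synthetic radar traces.

```python
# Hedged sketch of one inference step in the proposed loop: hand-crafted features from
# synthetic A-scan waveforms feed a regressor that predicts defect depth. The toy waveform
# generator, wave speed, and features are illustrative assumptions only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
t = np.linspace(0, 20e-9, 512)                          # 20 ns time window

def synthetic_ascan(depth_m, v=1.0e8):
    """Toy A-scan: a Ricker-like echo delayed by the two-way travel time."""
    delay = 2 * depth_m / v
    arg = (t - delay) * 1.5e9
    echo = (1 - 2 * np.pi**2 * arg**2) * np.exp(-np.pi**2 * arg**2)
    return echo + 0.05 * rng.normal(size=t.size)

def features(trace):
    peak = np.argmax(np.abs(trace))
    return [t[peak], np.abs(trace[peak]), np.sum(trace**2)]   # arrival time, amplitude, energy

depths = rng.uniform(0.05, 0.30, size=300)              # defect depths between 5 and 30 cm
X = np.array([features(synthetic_ascan(d)) for d in depths])
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, depths)

test_depth = 0.18
print("predicted depth (m):", model.predict([features(synthetic_ascan(test_depth))])[0])
```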

X
Brunner, Gerd (#30221)
Plaque Characteristics in Atherosclerotic Carotid Artery Disease

Atherosclerotic carotid artery disease (ACAD) is one of the major underlying causes of ischemic stroke. Advanced atherosclerotic plaques are characterized by the formation of a lipid-rich/necrotic core (LRNC) which is often accompanied by a thin fibrous cap, intra-plaque hemorrhage (IPH), and calcium crystal formation leading to plaque calcification. Plaque characteristics have been associated with a higher risk of plaque rupture, highlighting the importance of plaque imaging. Magnetic resonance imaging (MRI) based carotid plaque characteristics and the remodeling index have been associated with cardiovascular (CV) events in participants of the epidemiological Multi-Ethnic Study of Atherosclerosis (MESA). In the MRI sub-study of the AIM-HIGH trial, carotid lipid core and fibrous cap thickness or rupture were associated with CV outcomes. Calcified plaques in the carotids have been studied in the Diabetes Heart Study with computed tomography (CT) scans which identified carotid artery calcification as a significant predictor of cardiovascular events. Taken together, these findings highlight the importance of carotid plaque characteristics for CVD risk assessment. An image fusion workflow that incorporates multi-modality imaging could help with validating and standardizing carotid plaque characteristics. This proposal will explore a multi-modality approach to identify carotid plaque characteristics which will be validated with histology sections. The resulting validated plaque characteristics model will then be applied to readily available carotid plaque imaging scans which will be combined with data from the electronic health record (EHR), lipid levels, demographic information, and outcomes data of ACAD patients. The multi-modality data will be utilized to develop a machine learning (ML)/deep learning (DL) approach to identify ACAD patients with adverse outcomes. The long-term goal of this project is to develop a prospective clinical trial to study the utility of a ML/DL framework to assess ACAD patients longitudinally and for the computational tools to aid in the clinical treatment decision making process.

XX
Chakraborty, Prakash (#30222)
Signature Transforms for Phase Detection in Sleep and Acoustic Time Series

Time-series data in scientific and engineering settings are often heterogeneous and imperfect: multichannel, noisy, and frequently irregularly sampled. They also commonly exhibit phase transitions, where the underlying dynamics shift in ways that are scientifically meaningful. Examples include transitions between sleep stages (or related physiological states) and changes in acoustic patterns that reflect different sources, environments, or operating conditions. This project will develop signature-based phase detection and time-series segmentation methods in two application domains: (i) Sleep physiology: segmenting physiological signals into interpretable phases (e.g., sleep stages and transition periods) and detecting change points corresponding to state transitions. (ii) Acoustics: segmenting acoustic recordings into phases and detecting change points associated with changes in source characteristics, environment, or operating condition. Methodologically, the project will leverage the signature transform from rough path theory, which represents a time-indexed path through a structured collection of features with strong theoretical foundations and increasing empirical adoption. The Rising Researcher will contribute by implementing scalable signature feature computation and model training pipelines, designing and evaluating detection or segmentation methods, and benchmarking performance against baselines (e.g., spectral features, state-space models, or modern deep learning approaches). Expected outcomes include reusable software, validated empirical results on open datasets, and a publishable manuscript or technical report.
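
For readers unfamiliar with the signature transform, the sketch below computes the level-1 and level-2 signature terms of a toy piecewise-linear path with plain numpy; the path itself is an assumption, and production code would typically use a dedicated library.

```python
# Minimal numpy sketch of the level-1 and level-2 signature of a piecewise-linear
# multichannel path; the two-channel toy path is an assumption. Libraries such as
# iisignature or signatory compute higher levels, but the low levels fit in a few lines.
import numpy as np

def signature_level2(path):
    """path: (n_points, d) array. Returns (level-1 increment vector, level-2 d x d matrix)."""
    increments = np.diff(path, axis=0)                # segment-wise increments
    d = path.shape[1]
    s1 = increments.sum(axis=0)                       # level 1: total increment
    s2 = np.zeros((d, d))
    running = np.zeros(d)                             # increment accumulated so far
    for dx in increments:
        s2 += np.outer(running, dx) + 0.5 * np.outer(dx, dx)   # exact for linear segments
        running += dx
    return s1, s2

t = np.linspace(0, 1, 50)
path = np.column_stack([np.sin(2 * np.pi * t), t])    # toy 2-channel path
s1, s2 = signature_level2(path)
# The antisymmetric part of s2 is the signed (Levy) area, a classic change-sensitive feature.
print("level 1:", s1, "\nLevy area:", 0.5 * (s2[0, 1] - s2[1, 0]))
```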

XX
Edgerton, Jared (#30224)
Rhetorical Polarization and Media Transformation, 1995–2025

This project supports an ICDS Rising Researcher (advanced Ph.D. student, postdoctoral researcher, or non-tenure-line faculty member) to conduct computational analysis of long-run changes in elite political rhetoric on U.S. television news. The Rising Researcher will take primary responsibility for extending, labeling, and modeling a large-scale corpus of televised political discourse spanning 1995–2025, working under the supervision of the project PIs. The project is designed as a discrete, technically focused research module integrating natural language processing, Bayesian modeling, and high-performance computing in direct alignment with the ICDS mission. The motivation is that polarization is neither linear nor uniform: its intensity varies across issues, time, and communicative contexts, and it may exhibit cycles of polarization, realignment, and depolarization rather than monotonic change. A longer, fine-grained view of televised discourse—who speaks, on which network, and in what rhetorical register—enables direct measurement of how rhetorical polarization evolves and how these shifts map onto changes in party coalitions and media institutions. Data will include transcripts of news and talk programming from ABC, CBS, CNN, MSNBC, NBC, and Fox News archived in LexisNexis from 1995–2020, with coverage extended through 2021–2025 via the LexisNexis API. The unit of analysis is an utterance: a speaker-attributed span of speech associated with a timestamp, program, and network. Key tasks include metadata normalization, near-duplicate detection and removal, segmentation into speaker turns, and contributor linkage (including linking speakers to political actors and labeling partisan affiliation where possible). Methodologically, the Rising Researcher will expand expert-coded training data and implement active-learning workflows to classify issue domains and rhetorical frames; measure rhetorical style (affect, incivility, outrage, hedging, modal assertiveness) using dictionaries and supervised classifiers; estimate diachronic semantic change using aligned time-sliced embeddings; and construct uncertainty-aware polarization indices using hierarchical Bayesian models. Validation will include event-study designs around major political shocks, difference-in-differences comparisons between network pairs, convergent validity checks, and robustness analyses across dictionaries, embedding families, and temporal smoothing windows. Compute-intensive stages will run on ICDS-supported high-performance computing using containerized workflows to ensure reproducibility.
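
One small piece of the embedding workflow, sketched under assumptions: aligning two time-sliced embedding matrices with an orthogonal Procrustes rotation before computing per-word change scores. The random matrices stand in for embeddings trained per time slice, and the shared anchor vocabulary (same row order) is an illustrative assumption.

```python
# Hedged sketch of the embedding-alignment step for measuring diachronic semantic change.
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(0)
vocab, dim = 1000, 100
emb_1995 = rng.normal(size=(vocab, dim))
# Pretend the later slice is a rotated, slightly noisy version of the earlier one.
Q_true, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
emb_2005 = emb_1995 @ Q_true + 0.01 * rng.normal(size=(vocab, dim))

# Solve min_R ||emb_1995 @ R - emb_2005||_F over orthogonal R (shared anchor rows assumed).
R, _ = orthogonal_procrustes(emb_1995, emb_2005)
aligned = emb_1995 @ R

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Semantic-change score for a word = 1 - cosine similarity after alignment.
word_idx = 42
print("change score:", 1 - cosine(aligned[word_idx], emb_2005[word_idx]))
```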

X
Kifer, Daniel (#30225)
Accelerating metal defect discovery via “domain expert” AI-agents for Transmission Electron Microscopy (TEM) operation

Transmission Electron Microscopy (TEM), with its ability to directly image individual atoms, is an indispensable tool for defect analysis in nuclear reactor components, turbine alloys, solid-state batteries, and semiconductor devices. However, operating a TEM and interpreting its output requires years of specialized training for each application domain and hundreds of thousands of dollars in costs. Analyzing structural steel in a light water reactor to determine fracture susceptibility after radiation exposure requires a different skillset than analyzing whether a turbine’s blade can survive extreme operating conditions. An expert must rely on domain knowledge, intuition, and accumulated experience to manually adjust imaging parameters, identify regions of interest, evaluate image quality in real-time, and adapt their analysis based on the data they observe. A single characterization session can require hours of continuous expert attention, and comprehensive defect analysis often demands weeks or months of labor. In 2026, Penn State acquired the ability to control TEM instruments programmatically (imaging parameters, stage positioning, aperture selection, data acquisition) through APIs such as Thermo Fisher’s AutoScript. Recent work from Lawrence Berkeley National Laboratory demonstrates that LLMs can control TEM via text-based instructions [1]. However, even LLMs trained for materials sciences question-answering tasks struggle in laboratory settings and often deviate from their instructions [2]. The reason is that TEM expertise requires not only knowledge of domain-specific literature, but also direct experience gained from iterative feedback: observing images, adjusting parameters, evaluating results, and refining strategy (i.e., data-efficient domain-specific reinforcement learning). The goals of open science and data security further require that these skills must be added to open-source models. This project’s goal is to design cooperative AI agents, using reinforcement learning, that will observe and learn from the workflow of senior researchers as they operate the TEM across diverse metallic samples, capturing how experts select imaging modes, target regions of interest, adjust parameters, and recover from errors. A high-level planner agent will interpret research goals and orchestrate strategy, while a low-level executor agent will send commands to the microscope based on real-time observations. Agents will ground their decisions in real instrument feedback: agents will compare predictions to observations and learn to verify and adjust rather than hallucinate. Agents will also use Retrieval-Augmented Generation (RAG) to consult instrument manuals, crystallography databases, and published methods at runtime, so that they may even support materials they have not seen before. This combination of learned expert intuition, on-demand knowledge retrieval, and direct instrument control will assist researchers more efficiently, accelerating the pace of materials discovery. [1] Wall, M.K., Pattison, A.J., Barnard, E.S., Ribet, S.M., & Ercius, P. (2025). TEM Agent: enhancing transmission electron microscopy (TEM) with modern AI tools. arXiv:2511.08819. [2] Krishnan, N.M.A. et al. (2025). Evaluating large language model agents for automation of atomic force microscopy. Nature Communications. DOI: 10.1038/s41467-025-64105-7

XX
Papakonstantinou, Kostas (#30226)
AI-informed optimal control of large systems under uncertainty

The proposed research aims at studying and developing novel AI-informed frameworks for stochastic, optimal, multi-agent control, based on Foundation Models, Large Language Models (LLMs), Deep Reinforcement Learning (DRL), and Partially Observable Markov Decision Processes (POMDPs). Multi-agent distributed control, with one or multiple objectives and constraints, can also find several favorable application domains in general problems involving safety-critical systems. Such systems should have the capacity to operate under incomplete information and should be supported by advanced systems of decision-making able to integrate fast incoming data and adjust actions in real-time, aiming for optimality under safety considerations and other constraints. We are thus interested in investigating the following key areas in this project: (i) Whether and how LLMs can be inherently incorporated in DRL-POMDP frameworks, in order to address existing algorithmic, agents’ cooperation, and dimensionality challenges. (ii) Whether and how DL architectures can support efficient scaling up of stochastic control solutions to systems with thousands of agents and more. (iii) How can safety-critical systems be approached considering the rarity of failure data? Overall, this project aims to set foundations for advancing relevant algorithmic and methodological understanding, offer solutions to important, timely, and pressing issues encountered in numerous industries, and provide several future research and funding opportunities.

XX
Vasco-Correa, Juliana (#30227)
Developing a digital twin of anaerobic digestion systems by integrating multi-omics-derived functional indicators into an ADM1-based mechanistic model

Integrating multi-omics data into computational digital twins is a central challenge in modeling microbial community–driven engineered biological systems. This project uses anaerobic digestion (AD) as a representative system to develop a mechanistic digital twin framework that incorporates multi-omics-derived functional indicators into process-scale models, enabling biological information to inform system-level simulations in an interpretable manner. Using existing experimental datasets that combine reactor performance data with metagenomic and metatranscriptomic sequencing, the project will extract pathway-level microbial functional indicators and integrate them into an Anaerobic Digestion Model No. 1 (ADM1)–based modeling framework. By linking multi-omics data with mechanistic modeling, the proposed digital twin will improve ADM1’s ability to represent functional shifts across different operational regimes. More broadly, this work will establish a scalable and generalizable framework for integrating multi-omics data into digital twins of engineered microbial systems, advancing data-driven modeling capabilities for AD and related microbial consortia–based technologies.

XX
Brick, Timothy Raymond (#30228)
Private Social Environment Analysis via AI-Glasses

The goal of this project is to pilot and test a privacy-preserving analysis pipeline capable of processing egocentric audio and video from face-mounted “AI Glasses” to deliver insights and scientific data. Social environment is one of the primary factors in determining success and failure in a number of important life challenges. For example, a person recovering from opioid use disorder may be at much higher risk in the company of friends who are still using than when surrounded by others in or supportive of recovery. Similar patterns have been shown in recovery from other major life events, such as spinal cord injury or cancer. In team interactions, social environment is tightly related to psychological safety and other metrics of team functioning–for medical care teams, such effectiveness may be life-altering. However, data about a person’s social interactions is extremely difficult to collect and process because these interactions are interwoven throughout everyday life. AI glasses and other devices for passive streaming of egocentric audio and video data provide one possible solution to this problem. Such devices are becoming increasingly common and affordable, but suffer from a number of deeply concerning privacy challenges, as well as technological limitations. As a first step while a more complete legal and ethical framework is developed, this project will use laboratory-collected data about in-laboratory simulated team meetings and social interactions collected via open-source audiovisual streaming glasses, and develop a privacy-preserving, AI-based pipeline for extracting insights. Initial results will be used to apply for external funding to tackle the many challenges of taking the project to scale.

XX
Burghardt, Liana (#30229)
Accelerating the discovery of the genomic basis of plant-environment interactions using AI for image analysis, genome-trait associations, and time series analysis.

Burghardt Lab research focuses on beneficial plant-microbe interactions. In particular, we study legumes that are used as cash crops, forages, and cover crops. These plants have a beneficial relationship with rhizobial bacteria that provide nitrogen (a limiting resource in many agricultural environments) in exchange for energy from photosynthesis. However, despite centuries of use, there remains a significant gap between the potential and realized benefits from these interactions. My research program fills this gap by measuring ecological and evolutionary processes affecting rhizobia and legume hosts, with the long-term goal of enhancing legume productivity and reducing inputs in agricultural systems. The purpose of this Rising Researcher project is to explore and connect diverse data types in new ways. These data include thousands of images of plant root symbiotic and anatomical traits; hundreds of long-read-sequenced strains of rhizobial bacteria; multi-year time series of whole-genome sequencing data for rhizobial bacteria living in plants in the field; greenhouse phenotyping of plant and bacterial symbiotic traits; and plant genome sequences. More specifically, we plan to: 1) Use AI-driven image segmentation and computer vision to quantify traits from plant-environment interactions, focusing on key plant-microbe mutualisms and stress tolerance, 2) Analyze the genetic basis of symbiotic traits in plant and microbial genomes by comparing traditional genome-wide association studies and pangenome analysis with advanced machine-learning methods, and 3) Develop methods to infer environmental factors influencing microbial evolution and microbiome variation linked to roots, measured seasonally over several years. The short-term goal is to publish novel insights into the genomic basis of plant-microbe interactions and stress responses by analyzing and combining these data types in novel ways. In the longer term, the goal is to generate preliminary data and methodologies that advance high-throughput below-ground plant and microbial phenotyping to increase agricultural sustainability, given increased environmental stressors.

XX
Bilen, Sven G (#30230)
A Data-Driven Machine Learning and Signal Analytics Framework for Subsurface Defect Mapping Using Wideband Radar Sensing

Wideband pulsed electromagnetic radar sensing is widely used to interrogate complex and heterogeneous materials in applications where direct access is limited and internal features must be inferred indirectly from measured waveforms. Typical targets include internal discontinuities, voids, material interfaces, and embedded objects whose signatures are often weak, overlapping, and distorted by propagation effects. The resulting time-domain signals are high dimensional and difficult to interpret because the observed response reflects a superposition of antenna coupling, dispersive propagation, attenuation, scattering, and multipath interference. Additional variability introduced by material heterogeneity, environmental conditions, and sensor configuration further complicates reliable defect detection and characterization. This project will develop a scalable computational framework that integrates physics-based electromagnetic simulation, signal processing, and machine learning to enable automated detection and mapping of subsurface defects from wideband radar waveforms. High-fidelity simulations will be used as a controllable data generator to produce diverse synthetic datasets spanning realistic variability in material properties, defect geometries, and sensing configurations. Signal processing pipelines will transform raw waveforms into compact and physically meaningful representations that enhance defect sensitivity while suppressing nuisance variability. Machine learning models will leverage these representations to estimate defect presence, depth, spatial extent, and associated uncertainty, with emphasis on robustness under domain shift between simulated and experimental data. Validation will be performed using laboratory radar mapping datasets to quantify detection accuracy, localization performance, uncertainty calibration, and scalability for large-area inspection workflows. Computational pipelines will be implemented using reproducible, high-throughput workflows suitable for execution on shared high-performance computing resources. Resulting datasets, software tools, and benchmarking workflows will be curated to promote reuse and knowledge transfer within the ICDS community.

X
Zhong, Baxi (#30231)
From Molting to Motion: A Data-Driven Study of the Physics of Locomotion Development in Centipedes

Centipedes exhibit strikingly different locomotor strategies depending on their developmental mode, yet how development shapes the physics of locomotion remains poorly understood. Anamorphic centipedes, which gradually acquire legs across molts, employ leg-driven gaits that are robust to changing morphology, whereas epimorphic centipedes, which hatch with a fixed number of legs, use body-undulation–driven gaits that are highly effective but fragile to leg loss. This project aims to quantify how these contrasting locomotor strategies emerge and persist across development by developing data-driven methods to extract detailed kinematics from challenging high-speed videos of centipedes across species, sizes, and life stages. The work will address key challenges in tracking animals with variable limb number, frequent occlusions, and small, fast-moving bodies by integrating computer vision, data analysis, and experimental design. The resulting kinematic datasets will enable rigorous tests of how developmental constraints bias locomotor control toward robustness or optimization, advancing fundamental understanding of locomotion development while producing reusable, open-source tools and datasets of broad relevance to biomechanics and robotics.

XXXX
Brick, Timothy Raymond (#30232)
End-to-end AI-assisted Study and Analysis Deployment for Behavioral Science Research

Ecological Momentary Assessment research in the behavioral sciences leverages the use of smartphone-based surveys alongside passive smartphone measures, wearables, and internet-of-things devices to collect data about human processes while those processes unfold in everyday life. However, the design and programming of these studies and the analysis of their results can be complex and challenging for average behavioral science researchers, who often do not have training in programming or technology. The eventual goal of this project is to develop an agentic AI system capable of working interactively with scientists to design, preregister, deploy, and analyze the data resulting from studies in the EMA paradigm. This Rising Researcher proposal will develop a pilot AI capable of demonstrating the feasibility of a larger-scale project.

XXX
Ogunmodimu, Olumide (#30233)
Physics-Informed Graph Neural Networks for Modeling Bubble-Particle Interactions in Coarse Particle Flotation

Coarse particle flotation remains a key challenge in mineral processing due to limited bubble-particle attachment efficiency under turbulent flow conditions. Technologies like the HydroFloat™ separator have improved coarse particle recovery by creating a fluidized environment that promotes interaction, yet a fundamental understanding of the micro-scale dynamics governing attachment and detachment remains incomplete. Traditional modeling tools, such as CFD-DEM simulations, provide detailed insights but are computationally expensive and impractical for real-time or system-scale applications. This project proposes a Physics-Informed Graph Neural Network (PI-GNN) framework to model and predict bubble-particle attachment behavior in coarse particle flotation. Inspired by recent advances in physics-informed machine learning for particulate systems, the model will represent particles and bubbles as nodes in a graph, with edges capturing local interaction features such as collision dynamics, hydrodynamic forces, and surface chemistry. Physical constraints, such as non-negative contact durations, friction limits, and bounds on attachment probability, will be embedded in the network architecture to ensure realistic, interpretable outputs. Training will be based on a hybrid dataset of high-fidelity CFD-DEM simulations and experimental data from HydroFloat tests. The resulting model will provide a scalable, data-driven surrogate for traditional simulations, enabling real-time predictions of attachment efficiency and supporting optimization of flotation performance. Ultimately, this research aims to advance the integration of machine learning and physical modeling in mineral processing, providing actionable insights for improving coarse particle recovery and process sustainability.

XX
Xue, Lingzhou (#30234)
Causal Inference for Non-Euclidean Data Objects

Traditional causal discovery methods fundamentally rely on vector-space assumptions (e.g., additivity of noise, linear residuals) that are violated by modern scientific data structures, including microbiome compositions (simplex), brain connectomes (SPD manifolds), and phylogenetic trees. Applying Euclidean methods to these curved spaces yields spurious causal discoveries. This proposal seeks to establish a causal learning framework for non-Euclidean data objects. We will (1) generalize causal models using Fréchet expectations and geodesic perturbations; (2) develop statistical tests based on metric-preserving kernels; and (3) validate these methods on microbiome data, establishing a pipeline for non-Euclidean causal discovery.
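
A toy illustration of the Fréchet-mean idea on the unit sphere, which stands in for the curved spaces named above; the synthetic data and the fixed number of iterations are assumptions.

```python
# Illustrative sketch of the Frechet mean on the unit sphere (a simple non-Euclidean space
# standing in for compositions, SPD matrices, or trees): minimize the sum of squared
# geodesic distances by Riemannian gradient descent with explicit log/exp maps.
import numpy as np

rng = np.random.default_rng(0)

def log_map(p, x):
    """Tangent vector at p pointing toward x along the sphere geodesic."""
    c = np.clip(p @ x, -1.0, 1.0)
    v = x - c * p
    nv = np.linalg.norm(v)
    return np.zeros_like(p) if nv < 1e-12 else np.arccos(c) * v / nv

def exp_map(p, v):
    nv = np.linalg.norm(v)
    return p if nv < 1e-12 else np.cos(nv) * p + np.sin(nv) * v / nv

# Synthetic points clustered around the north pole of S^2.
pts = rng.normal(loc=[0, 0, 1], scale=0.2, size=(200, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)

mean = pts[0]
for _ in range(50):
    grad = np.mean([log_map(mean, x) for x in pts], axis=0)   # Riemannian gradient step
    mean = exp_map(mean, grad)

print("Frechet mean:", np.round(mean, 3))   # lands close to the north pole [0, 0, 1]
```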

XX
Xue, Lingzhou (#30235)
Novel Algorithms and Applications for Online Reinforcement Learning

This proposed project aims to advance the state of the art in online reinforcement learning (RL) by developing novel algorithms that address critical challenges in scalability, privacy, and communication efficiency under both single-agent and multi-agent environments. Unlike traditional static RL methods, online RL enables agents to continuously learn a global policy and to adapt in real time. Moreover, we will explore the application of these algorithms to data assimilation for optimizing the prediction of high-impact weather events, leveraging high-performance computing to solve real-world challenges. This proposal aligns with the ICDS mission to foster interdisciplinary innovation in Artificial Intelligence and Computational Science.

XX
Housego, Rachel (#30236)
Modeling Groundwater Levels and Saltwater Intrusion in the Heretaunga Aquifer to Investigate Compound Hazards

Coastal regions worldwide face increasing risk from compound natural hazards driven by the interaction of groundwater processes, climate stressors, and seismic activity. The Hawke’s Bay region of Aotearoa New Zealand’s North Island is particularly vulnerable due to its proximity to the Hikurangi Subduction Zone and the presence of a shallow, highly permeable coastal aquifer that is sensitive to saltwater intrusion. Elevated groundwater levels and pore pressures can intensify earthquake impacts by promoting sediment liquefaction and lateral spreading, while saltwater intrusion threatens freshwater resources under conditions of pumping, drought, sea-level rise, and tectonic subsidence. This project will develop a three-dimensional numerical groundwater flow and transport model in FEFLOW to simulate groundwater levels and the freshwater–saltwater interface in the Heretaunga aquifer. Using existing hydrogeologic and geospatial datasets, the model will be calibrated to observed groundwater levels and applied in transient scenarios that incorporate sea-level rise at the coastal boundary. The resulting field-informed, calibrated model will provide a quantitative framework for investigating how groundwater conditions contribute to compound earthquake and coastal hazards. Project outcomes include a transferable modeling workflow, enhanced understanding of groundwater–hazard interactions, and results that will form the basis of a peer-reviewed manuscript co-authored by the Rising Researcher.

XX
Sorokina, Nonna (#30237)
Real Options of Advanced Nuclear Reactors: Investor Perspective

Nuclear energy is essential for advancing energy independence, economic competitiveness, and long-term sustainability, yet private capital investment remains constrained by persistent perceptions of regulatory rigidity, technological irreversibility, and societal resistance. These barriers are especially binding for new nuclear technologies, despite major advances in Small Modular Reactors (SMRs) and microreactors that offer scalable, modular, and potentially lower-risk deployment pathways. This project proposes a dual financial innovation to unlock private investment in nuclear energy: the Nuclear Energy Acceptance Rating (NEAR), a sentiment-based index grounded in the Sociotechnical Readiness Level framework, and a real-options investment framework specifically designed to capture the flexibility inherent in modular reactor technologies. NEAR translates public, policy, and media sentiment toward nuclear energy into a finance-ready signal that can be embedded directly into portfolio optimization and hedging strategies. Building on prior work demonstrating that sentiment is both measurable and financially actionable, the project will construct NEAR using large-scale natural language processing and machine learning techniques applied to diverse textual data sources. This dynamic index will then be integrated into rolling portfolio optimization models with options-based hedging, allowing investment strategies to adjust in real time to evolving societal attitudes and regulatory environments. In parallel, the project will develop real-options models tailored to SMRs and microreactors, explicitly valuing staged deployment, modular expansion, and abandonment options that reduce capital irreversibility and enhance strategic flexibility relative to traditional large-scale nuclear investments. The central innovation of the project lies in integrating societal readiness directly into financial decision-making while simultaneously accounting for the intrinsic option value embedded in modular nuclear technologies. By linking NEAR to real-options investment strategies, the framework enables portfolios not only to hedge against sentiment-driven shocks but also to exploit the expandability and scalability of next-generation reactor designs. Preliminary evidence from prior work indicates that sentiment-aware, option-hedged portfolios outperform equity-only strategies, including during major crisis periods such as Fukushima. Extending this approach to real-options valuation of SMRs and microreactors is expected to further improve risk-adjusted performance and reduce downside exposure, directly addressing key deterrents to private capital participation. The project will integrate large-scale sentiment analytics, financial market data, and energy-sector investment parameters to test the performance of NEAR-guided real-options portfolios against conventional investment benchmarks. Results will be disseminated through interdisciplinary conferences and peer-reviewed journals in finance, energy economics, and sustainability policy. In parallel, the project will support workforce development and interdisciplinary training for undergraduate, graduate, and postdoctoral researchers working at the intersection of finance, data science, and energy systems. By embedding research outputs into educational programs and engaging industry and regional partners, the project advances Penn State’s land-grant mission while building practical pathways for nuclear deployment and investment.
Ultimately, this research establishes a scalable, adaptive financial architecture capable of mobilizing private capital at the pace and scale required for nuclear energy to serve as a cornerstone of a secure, low-carbon energy future.
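
As a hedged, toy illustration of the real-options logic (not a valuation of any actual reactor project), the sketch below prices an option to add a second module on a binomial lattice; every numerical input is an assumption.

```python
# Toy sketch of the real-options logic only: a binomial lattice values the option to add
# a second reactor module if project value rises, rather than committing everything up
# front. Volatility, module cost, payoff multiple, and horizon are illustrative assumptions.
import numpy as np

V0, sigma, r, T, steps = 100.0, 0.35, 0.04, 5.0, 60     # first module's project value process
expand_cost, expand_multiple = 80.0, 0.9                # second module: cost and relative value

dt = T / steps
u = np.exp(sigma * np.sqrt(dt))
d = 1 / u
p = (np.exp(r * dt) - d) / (u - d)                      # risk-neutral up probability

# Terminal project values for the underlying module and the expansion payoff at maturity.
j = np.arange(steps + 1)
V = V0 * u**j * d**(steps - j)
option = np.maximum(expand_multiple * V - expand_cost, 0.0)

# Backward induction with early exercise (expansion can be taken at any review point).
for n in range(steps - 1, -1, -1):
    V = V0 * u**np.arange(n + 1) * d**(n - np.arange(n + 1))
    cont = np.exp(-r * dt) * (p * option[1:n + 2] + (1 - p) * option[0:n + 1])
    option = np.maximum(cont, expand_multiple * V - expand_cost)

print("value of the modular expansion option:", round(float(option[0]), 2))
```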

XX
Blanco, Carlos (#30241)
Lighting the Dark Cosmos: Bayesian Inverse Design of New Materials for Next-Gen Dark Matter Detectors.

This proposal seeks to transform the search for sub-GeV Dark Matter (DM) by enabling the design of a new class of anisotropic molecular crystals. Current detectors rely on isotropic targets, which lack the directional sensitivity required to identify the daily modulation signal characteristic of the DM wind. Identifying optimal anisotropic targets, however, presents a formidable challenge: navigating a chemical space of roughly 10^60 molecules to maximize complex quantum properties such as scintillation efficiency and directional scattering form factors. We propose a novel Inverse Design framework that integrates Graph Transformers for robust molecular representation learning with a decoupled Bayesian Optimization strategy. Validation will be powered by our GPU-accelerated SCarFFF code, enabling rapid ab initio assessment of candidates. This interdisciplinary effort will bridge particle physics and AI, delivering a prioritized list of materials for the next generation of directional DM detectors.

XXX
Mejia, Alfonso Ignacio (#30242)
A Flow-Based Deep Learning Framework for Efficient Probabilistic Time Series Forecasting: Real-Time Ensemble Flood Prediction

Probabilistic time series forecasting is essential for decision-making under uncertainty, particularly in high-impact domains such as flood prediction, where understanding the full range of possible outcomes can directly save lives and reduce economic losses. Despite the clear need for uncertainty-aware forecasts, most operational flood forecasting systems remain deterministic due to the prohibitive computational cost of generating large, well-calibrated ensembles. Recent deep generative models, especially diffusion-based approaches, have demonstrated strong probabilistic forecasting skill but require hundreds of sequential inference steps per ensemble member, rendering them impractical for real-time operational deployment. This project proposes a flow-based deep learning framework for efficient probabilistic time-series forecasting, with real-time ensemble flood prediction as a primary application. Specifically, the research will develop and adapt flow-matching generative models as a computationally efficient alternative to diffusion models. The framework will consist of three core components: (1) an encoder network that ingests historical observations, meteorological forcings, and watershed characteristics to produce a latent context representation; (2) a velocity network that learns how to transport samples from a base distribution to the target forecast distribution conditioned on this context; and (3) an efficient ordinary differential equation solver for rapid ensemble generation. Methodological innovations will address challenges unique to time series forecasting, including preserving temporal structure, designing effective context representations, improving uncertainty calibration through stochastic augmentation, and benchmarking probability path designs informed by hydrological priors. The approach will be evaluated through application to ensemble streamflow forecasting in flash-flood-prone Pennsylvania watersheds, using historical and forecast meteorological data. Forecast skill, uncertainty calibration, and computational efficiency will be compared against existing benchmarks, including NOAA operational forecasts and USGS streamflow observations. Operational viability will be assessed through profiling inference latency, memory usage, and energy consumption, as well as performance on extreme and out-of-distribution events. While flood forecasting is a compelling use case, the proposed framework is domain-agnostic and broadly applicable to probabilistic forecasting problems across energy systems, natural hazards, public health, finance, and transportation. The project aligns with ICDS’s mission to advance AI-driven, computationally efficient data science and will deliver open-source software, benchmark datasets, peer-reviewed publications, and a scalable methodology ready for real-world deployment.
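
A minimal sketch of one flow-matching training step consistent with the three components described above, with random tensors standing in for the encoder context and observed hydrographs; the architecture and hyperparameters are assumptions, not the proposed model.

```python
# Minimal sketch of conditional flow-matching training with a linear probability path.
# Toy tensors stand in for the encoder context and streamflow targets; sizes, the path,
# and the optimizer settings are illustrative assumptions.
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    def __init__(self, x_dim=24, ctx_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + ctx_dim + 1, 128), nn.SiLU(),
            nn.Linear(128, 128), nn.SiLU(),
            nn.Linear(128, x_dim),
        )

    def forward(self, x_t, t, ctx):
        return self.net(torch.cat([x_t, t, ctx], dim=-1))

x_dim, ctx_dim, batch = 24, 32, 64                      # 24-step forecast horizon (toy)
model = VelocityNet(x_dim, ctx_dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for _ in range(200):                                    # toy training loop
    x1 = torch.randn(batch, x_dim)                      # stand-in for observed hydrographs
    ctx = torch.randn(batch, ctx_dim)                   # stand-in for encoder output
    x0 = torch.randn(batch, x_dim)                      # base distribution sample
    t = torch.rand(batch, 1)
    x_t = (1 - t) * x0 + t * x1                         # linear (optimal-transport) path
    target_v = x1 - x0                                  # its constant velocity
    loss = ((model(x_t, t, ctx) - target_v) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# At inference time, ensemble members come from integrating dx/dt = v(x, t, ctx) from
# t=0 to 1 for many base samples x0, e.g. with a few Euler or Heun steps.
```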

XXX
Sorokina, Nonna (#30243)
Economic Impact of Opioid Epidemic on Rural Pennsylvania

The opioid epidemic, once viewed primarily as an urban crisis, now imposes a disproportionate and deeply entrenched burden on rural Pennsylvania, where overdose death rates per capita exceed those in urban areas and communities face compounding medical, economic, and workforce challenges. Although recent data suggest a decline in overdose mortality, it remains unclear whether this improvement reflects a true reduction in opioid misuse or primarily the effects of expanded harm-reduction measures such as naloxone distribution. Understanding whether addiction prevalence, economic disruption, and social costs are also declining is essential for ensuring that public interventions remain effective and responsive to evolving community needs.

This project conducts a comprehensive county-level assessment of the economic and social costs of opioid misuse across rural Pennsylvania from 2006 to 2023, integrating mortality and nonfatal overdose incidents, prescription rates, treatment admissions, healthcare utilization, labor market outcomes, and law-enforcement expenditures. By linking opioid outcomes with economic, demographic, and healthcare access indicators, the study identifies the factors most strongly associated with sustained addiction and mortality at the community level. The project further evaluates how state-administered interventions, including medication-assisted treatment programs, naloxone distribution, and community outreach funded by the Pennsylvania Department of Drug and Alcohol Programs and the Department of Health, influence both health outcomes and economic recovery over time.

Methodologically, the study leverages advanced computational modeling, geospatial analysis, and AI-enabled data integration to combine administrative, public health, and socioeconomic datasets that are not typically analyzed jointly. Multivariate and structural models are used to estimate causal relationships between intervention intensity and outcomes, while forecasting models identify counties at elevated risk of future opioid-related harm and rising economic costs. Cost-effectiveness metrics translate public spending into measurable improvements in health, employment, and treatment engagement, enabling direct comparison across programs and regions.

A central goal of the project is to inform evidence-based allocation of opioid settlement funds administered at the state level. Rather than relying solely on mortality statistics, the study evaluates broader indicators of addiction burden and economic distress to determine where resources can generate the greatest combined health and economic returns. The results provide actionable guidance for state agencies and the General Assembly on how to prioritize interventions that reduce long-term dependence, strengthen community recovery, and mitigate labor-market and healthcare system strain. By shifting the focus from short-term mortality reduction to sustained community resilience, this project supports more strategic, data-driven opioid policy and improves the effectiveness of public investments in rural health and economic stability.
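As one hedged example of the kind of multivariate county-level model the abstract describes, the snippet below fits a two-way fixed-effects panel regression of an outcome on intervention intensity, with standard errors clustered by county. All variable names and the file path are hypothetical stand-ins for the project's actual county-year panel, not its real schema.

```python
# Illustrative two-way fixed-effects panel regression (hypothetical variable names).
import pandas as pd
import statsmodels.formula.api as smf

def fit_twfe(panel: pd.DataFrame):
    """County and year dummies absorb time-invariant county traits and statewide shocks;
    standard errors are clustered by county."""
    model = smf.ols(
        "overdose_rate ~ naloxone_kits + mat_slots + unemployment + median_income"
        " + C(county) + C(year)",
        data=panel,
    )
    return model.fit(cov_type="cluster", cov_kwds={"groups": panel["county"]})

# Example usage on a long-format panel with one row per county-year:
# panel = pd.read_csv("pa_county_panel_2006_2023.csv")
# print(fit_twfe(panel).summary())
```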

XX
Gulis, Cindy (#30245)
Building a New Cosmological Paradigm with High-Performance Computing and Data Analysis

Recent observations, including those from the James Webb Space Telescope (JWST), have revealed persistent tensions with the standard ΛCDM cosmological model, such as evidence for evolving dark energy and the presence of massive galaxies and supermassive black holes at cosmic dawn. To address these challenges, Prof. Gulis has developed Harmonic Axion Cosmology (HAC), a unified theoretical framework in which axion fields oscillating at different mass scales and cosmic epochs naturally give rise to evolving dark energy, ultralight dark matter produced via the misalignment mechanism, and primordial black holes formed from inflation-era overdensities.

This project aims to perform the first cosmological simulations of HAC by implementing its physical components into initial-condition generators and hydrodynamic simulation codes. Individual HAC components will be tested using controlled small-volume simulations, followed by large-volume simulations to track the formation and evolution of galaxies and supermassive black holes across cosmic time. The resulting simulations will be post-processed with comprehensive three-dimensional radiative transfer calculations to derive multi-wavelength galaxy properties for direct comparison with JWST and other observational datasets.

By integrating fundamental theory, high-performance computing, numerical simulation, and observational analysis, this project will provide a rigorous computational test of a new cosmological paradigm. It will offer ICDS Rising Researchers the opportunity to contribute to theory-based, computation-intensive, and data-driven investigations at the frontier of modern cosmology.
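For readers unfamiliar with the misalignment mechanism invoked above, the toy script below integrates the homogeneous axion equation of motion, $\ddot{\phi} + 3H\dot{\phi} + m^{2}\phi = 0$, in a radiation-dominated background: the field stays frozen while $H \gg m$, then oscillates and its energy density redshifts like cold dark matter ($\rho a^{3}$ approaching a constant). Units, the choice $H(t)=1/(2t)$, and all parameter values are illustrative assumptions; this is not the HAC model's actual implementation.

```python
# Toy illustration of the axion misalignment mechanism (illustrative units and parameters).
import numpy as np
from scipy.integrate import solve_ivp

m = 1.0        # axion mass (arbitrary units)
phi_i = 1.0    # initial misalignment field value

def rhs(t, y):
    phi, dphi = y
    H = 1.0 / (2.0 * t)                      # radiation-dominated Hubble rate
    return [dphi, -3.0 * H * dphi - m**2 * phi]

# Start deep in the frozen regime (H >> m) and evolve well past H ~ m.
t_eval = np.logspace(-3, np.log10(50.0), 2000)
sol = solve_ivp(rhs, (1e-3, 50.0), [phi_i, 0.0], t_eval=t_eval, rtol=1e-8, atol=1e-10)

# In radiation domination a ~ t^(1/2); once oscillations begin, the energy density
# 0.5*dphi^2 + 0.5*m^2*phi^2 falls like a^-3, so rho * a^3 flattens to ~constant,
# i.e., the field behaves like pressureless (cold) dark matter.
a = np.sqrt(sol.t / sol.t[0])
rho = 0.5 * sol.y[1] ** 2 + 0.5 * m**2 * sol.y[0] ** 2
print("late-time rho * a^3 (should flatten):", (rho * a**3)[-5:])
```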

XXX