Kenyon Abstract

Multilayered Large Language Model Strategies for Generating Time Series Simulation Data

Jon Chun, Kenyon College – Co-Founder, AI for the Humanities Curriculum and AI Digital Collaboratory


Abstract:

In the rapidly changing AI landscape, Large Language Models (LLMs) like OpenAI's GPT4 may initially appear tangential to simulation technologies. On closer inspection, however, their potential becomes clear, presenting exciting opportunities for NAFEMS members. This presentation will deliver a hands-on exploration of leveraging OpenAI's GPT4 and associated LLM frameworks to generate synthetic time series datasets and to enrich existing ones, all within the context of physical simulations for anomaly detection, a key area of interest in the manufacturing industry.
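
As a concrete illustration of the most basic approach, the sketch below prompts GPT4 for a small synthetic sensor time series and parses the reply into a DataFrame. It is a minimal sketch only, assuming the pre-1.0 openai Python package and an OPENAI_API_KEY environment variable; the prompt wording and column names are illustrative placeholders, not taken from the presentation.

```python
# Minimal sketch: prompting GPT4 for a synthetic sensor time series.
# Assumes the pre-1.0 openai Python package and OPENAI_API_KEY in the environment.
import io
import os

import openai
import pandas as pd

openai.api_key = os.environ["OPENAI_API_KEY"]

prompt = (
    "Generate 50 rows of CSV with columns timestamp,vibration_rms,temperature_c "
    "simulating a healthy rotating machine sampled once per minute. "
    "Return only the CSV, no commentary."
)

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.7,
)

csv_text = response["choices"][0]["message"]["content"]
df = pd.read_csv(io.StringIO(csv_text))  # parse the model's CSV reply into a DataFrame
print(df.head())
```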

We'll use popular open datasets used to train predictive maintenance models as both a ground truth reference and a source for augmentation, including normal and abnormal time series of vibration, temperature, pressure, and current/voltage measurements. These ground truth reference datasets will serve to evaluate the normal and abnormal time series generated and augmented with the best LLM strategy identified. Our performance metrics will be based on two statistical profiles tailored to two types of time series abnormalities: (a) a global regime/distribution shift, like those found in asset price trading bands, and (b) more localized feature anomalies that often predict impending failures.
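
As a rough sketch of these two statistical profiles, the example below uses generic statistics in place of the presentation's exact metrics: a two-sample Kolmogorov-Smirnov test for (a) global regime/distribution shift, and a rolling z-score for (b) localized feature anomalies. The window and threshold values are illustrative assumptions.

```python
# Sketch of two evaluation profiles for synthetic time series:
# (a) a two-sample KS test for global regime/distribution shift, and
# (b) a rolling z-score to flag localized feature anomalies.
import numpy as np
from scipy.stats import ks_2samp


def regime_shift_score(reference: np.ndarray, candidate: np.ndarray) -> float:
    """KS statistic between reference and candidate series (0 = identical distributions)."""
    statistic, _p_value = ks_2samp(reference, candidate)
    return statistic


def local_anomaly_mask(series: np.ndarray, window: int = 50, threshold: float = 3.0) -> np.ndarray:
    """Flag points whose rolling z-score exceeds the threshold."""
    mask = np.zeros(len(series), dtype=bool)
    for i in range(window, len(series)):
        window_vals = series[i - window:i]
        mu, sigma = window_vals.mean(), window_vals.std()
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            mask[i] = True
    return mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    normal = rng.normal(0.0, 1.0, 2_000)   # "healthy" reference signal
    shifted = rng.normal(0.5, 1.3, 2_000)  # synthetic series with a distribution shift
    spiky = normal.copy()
    spiky[1_500] += 8.0                    # single localized fault-like spike
    print("KS statistic (shifted vs. normal):", regime_shift_score(normal, shifted))
    print("Local anomalies flagged:", local_anomaly_mask(spiky).sum())
```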

We'll review an incremental progression of GPT4 utilization, illuminating its potential while addressing its inherent limitations. We'll begin with basic GPT models, explore various prompt engineering strategies, then delve into the Python Code Interpreter extension and OpenAI tool use via Langchain. We'll explicitly address issues around hallucination, stale training data, and the innumeracy of LLMs. The culmination will dive into OpenAI's 0613 model updates, which introduce function calling, a new API specification for structured outputs that dramatically enhances the reliability of programmatically interfacing with GPT3.5 and GPT4 models.
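
To make the 0613 function calling update concrete, the sketch below forces the model to return a synthetic time series through a declared function schema. It is a minimal sketch assuming the pre-1.0 openai Python package; the emit_time_series schema is a hypothetical example, not part of OpenAI's API.

```python
# Sketch of 0613-style function calling: the declared JSON schema constrains
# the model's reply to structured arguments rather than free-form text.
import json
import os

import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

functions = [
    {
        "name": "emit_time_series",  # hypothetical function name for illustration
        "description": "Return a synthetic sensor time series as structured JSON.",
        "parameters": {
            "type": "object",
            "properties": {
                "sensor": {"type": "string", "description": "Sensor type, e.g. vibration"},
                "values": {"type": "array", "items": {"type": "number"}},
            },
            "required": ["sensor", "values"],
        },
    }
]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": "Generate 20 vibration readings for a failing bearing."}],
    functions=functions,
    function_call={"name": "emit_time_series"},  # force the structured reply
)

call = response["choices"][0]["message"]["function_call"]
payload = json.loads(call["arguments"])  # arguments arrive as a JSON string
print(payload["sensor"], len(payload["values"]))
```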

The presentation will wrap with a comparative analysis of LLM data synthesis and augmentation techniques against traditional approaches, including both open source and commercial offerings like Gretel.ai. Listeners will leave with a practical, up-to-date understanding of the latest state-of-the-art GPT4 LLM and how to better utilize such generative models for generating or augmenting data for simulation or for fine-tuning other AI models. We will close with an update on recent and anticipated AI advancements, enabling you to better align future LLM applications with particular physical simulations.