Building a realistic generative model of order flow in financial markets is a difficult problem with many potential applications for market participants. In this study, we propose an autoregressive generative model that produces tokenized limit order book (LOB) messages. These messages are interpreted by a Jax-LOB simulator, which updates the LOB state. To handle long sequences efficiently, the model uses simplified structured state-space layers to process both order book states and tokenized messages. Using LOBSTER data from NASDAQ equity LOBs, we build a custom tokenizer for the message data that converts groups of successive digits into tokens, similar to how language models tokenize text. Out-of-sample results show that the model approximates the data distribution well, as indicated by low model perplexity. In addition, mid-price returns calculated from the generated order flow are significantly correlated with the actual data, which indicates conditional forecasting ability. Given the granularity and fidelity of the generated data, the model opens up research directions beyond forecasting, such as serving as a world model in high-frequency financial reinforcement learning applications. We encourage others to use and extend the model in developing autoregressive large financial models for generating high-frequency financial data, and we are committed to open-sourcing our code to facilitate further research.
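To make the digit-group tokenization idea concrete, here is a minimal Python sketch. It is an illustration only, not the paper's tokenizer: the function names `tokenize_int` and `detokenize_int`, the group size of 3 digits, and the fixed field width of 9 digits are all assumptions chosen for the example. The abstract only states that groups of successive digits become tokens.

```python
from typing import List

# Assumed for illustration: 3 digits per token, so each token is in 0..999.
GROUP = 3


def tokenize_int(value: int, width: int = 9, group: int = GROUP) -> List[int]:
    """Zero-pad a non-negative integer to `width` digits and split it
    into consecutive `group`-digit chunks, one token per chunk."""
    if width % group != 0:
        raise ValueError("width must be a multiple of group")
    if value < 0 or value >= 10 ** width:
        raise ValueError("value does not fit in the given width")
    digits = str(value).zfill(width)
    return [int(digits[i:i + group]) for i in range(0, width, group)]


def detokenize_int(tokens: List[int], group: int = GROUP) -> int:
    """Invert tokenize_int by reassembling the digit groups."""
    value = 0
    for token in tokens:
        value = value * 10 ** group + token
    return value


# A hypothetical message field (e.g. a price in integer ticks).
price = 1234567
tokens = tokenize_int(price)
print(tokens)                  # [1, 234, 567]
print(detokenize_int(tokens))  # 1234567
```

Splitting numbers into fixed-size digit groups keeps the vocabulary small (1,000 tokens for 3-digit groups) while still representing large values exactly, which is the same trade-off subword tokenizers make for text.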
A Token-Level Autoregressive Generative Model of Message Flow Using Deep State Space Network: Exploring Generative AI for End-to-End Limit Order Book Modelling (arXiv:2309.00638v1 [q-fin.TR])
by instadatahelp | Sep 6, 2023 | AI Blogs