Deep Learning for High-Frequency Data

Order Model pipeline: tick data → tokenizer → causal Transformer → order decoder (MarS, ICLR 2025)

High-frequency financial data (order books, trade events, tick-by-tick prices) poses distinctive modeling challenges: a low signal-to-noise ratio, irregular sampling, heavy-tailed return distributions, and volatility clustering at multiple timescales. Deep learning, and large Transformer-based architectures in particular, has emerged as a powerful approach to capturing the temporal dynamics of financial markets at this granularity.
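To make two of these properties concrete, the following sketch measures heavy tails and volatility clustering on a simulated return series. It is an illustration under stated assumptions, not part of any model discussed here: the GARCH(1,1) simulator and its parameters (`omega`, `alpha`, `beta`) are hypothetical stand-ins for real tick returns, chosen only because such a process is known to produce both effects.

```python
import math
import random

def simulate_garch_returns(n, omega=1e-6, alpha=0.1, beta=0.85, seed=0):
    """Simulate n returns from a GARCH(1,1) process (toy stand-in for real data)."""
    rng = random.Random(seed)
    var = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    returns = []
    for _ in range(n):
        r = math.sqrt(var) * rng.gauss(0.0, 1.0)
        returns.append(r)
        # Today's squared return feeds tomorrow's variance: volatility clusters.
        var = omega + alpha * r * r + beta * var
    return returns

def excess_kurtosis(xs):
    """Sample excess kurtosis; positive values indicate tails heavier than Gaussian."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 * m2) - 3.0

def autocorr(xs, lag):
    """Sample autocorrelation of xs at the given non-negative lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
    return cov / var

returns = simulate_garch_returns(20000)
abs_returns = [abs(r) for r in returns]

print("excess kurtosis of returns:", excess_kurtosis(returns))
print("lag-1 autocorrelation of returns:", autocorr(returns, 1))
print("lag-1 autocorrelation of |returns|:", autocorr(abs_returns, 1))
```

On a series like this one would expect positive excess kurtosis, near-zero autocorrelation in the raw returns, and clearly positive autocorrelation in their absolute values, which is the usual signature of volatility clustering. The same three diagnostics can be applied to real or model-generated returns.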

A recent direction is the development of generative foundation models for market microstructure. Models such as MarS (ICLR 2025) treat market dynamics as a sequence generation problem: given a stream of orders and limit order book (LOB) snapshots, a causal Transformer learns to autoregressively generate the next order, capturing the joint distribution of order type, price, volume, and timing. Trained on billions of trade events across thousands of equities, these models reproduce key stylized facts (heavy-tailed returns, volatility clustering, and the absence of linear autocorrelation in returns) without having them hard-coded.
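The tokenize, predict, decode loop described above can be sketched in a few lines. Everything in this example is an assumption made for illustration: the `Order` fields, the bin counts, and the mixed-radix token layout are hypothetical and are not MarS's actual tokenization, and a simple bigram count table stands in for the causal Transformer so that the sketch stays self-contained.

```python
import random
from collections import Counter, defaultdict
from typing import NamedTuple

# Hypothetical vocabulary sizes; a real system would choose these from data.
N_TYPES, N_PRICE, N_VOL, N_DT = 3, 11, 4, 4
VOCAB = N_TYPES * N_PRICE * N_VOL * N_DT

class Order(NamedTuple):
    order_type: int  # e.g. 0 = limit buy, 1 = limit sell, 2 = cancel
    price_bin: int   # binned price offset from the mid price
    volume_bin: int  # binned order size
    dt_bin: int      # binned time since the previous order

def encode(o):
    """Pack the four order fields into one integer token (mixed-radix)."""
    return ((o.order_type * N_PRICE + o.price_bin) * N_VOL + o.volume_bin) * N_DT + o.dt_bin

def decode(tok):
    """Invert encode(): recover the four fields from a token."""
    tok, dt = divmod(tok, N_DT)
    tok, vol = divmod(tok, N_VOL)
    typ, price = divmod(tok, N_PRICE)
    return Order(typ, price, vol, dt)

def fit_bigram(tokens):
    """Count next-token frequencies; a stand-in for training a causal Transformer."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, context, n_steps, rng):
    """Autoregressive sampling: each new token is drawn given the tokens so far."""
    out = list(context)
    for _ in range(n_steps):
        nxt_counts = counts.get(out[-1])
        if not nxt_counts:
            break  # unseen context; a neural model would still give a distribution
        toks, weights = zip(*nxt_counts.items())
        out.append(rng.choices(toks, weights=weights, k=1)[0])
    return out

rng = random.Random(0)
# Random orders as placeholder "history"; real input would be an order stream.
history = [
    Order(rng.randrange(N_TYPES), rng.randrange(N_PRICE),
          rng.randrange(N_VOL), rng.randrange(N_DT))
    for _ in range(5000)
]
tokens = [encode(o) for o in history]
model = fit_bigram(tokens)
generated = generate(model, tokens[:1], 5, rng)
print([decode(t) for t in generated])
```

Packing all four fields into a single token means that sampling one token samples order type, price, volume, and timing jointly, rather than as independent draws. The bigram table conditions only on the previous token, whereas a causal Transformer would condition on the full history and on LOB state.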

Applications include realistic synthetic market data generation for backtesting, stress-testing strategies under rare market conditions, training reinforcement learning agents in simulated environments, and studying market impact and microstructure dynamics.