Comment by tshaddox

2 months ago

> It would almost be more interesting to specifically train the model on half the available market data, then test it on another half.

Yes, ideally you’d have a model trained only on data up to some date, say January 1, 2010, and then start running the agents in a simulation where you give them each day’s new data (news, stock prices, etc.) one day at a time.
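A minimal sketch of that day-by-day replay, assuming each day is just a dict with a `price` key (news and other fields are left out for brevity). The names `walk_forward`, `decide`, and `momentum` are hypothetical, not from any particular backtesting library. The point is only that the agent is handed the history up to today and nothing later.

```python
from typing import Callable, Dict, List, Sequence

# Assumption: one "day" is a dict of whatever was published that day.
Day = Dict[str, float]


def walk_forward(
    days: Sequence[Day],
    decide: Callable[[List[Day]], float],
    start_wealth: float = 1000.0,
) -> float:
    """Replay days in order; decide() only ever sees data up to today.

    decide() returns the fraction of wealth (clamped to 0..1) to hold
    in the asset from today's price until tomorrow's price.
    """
    wealth = start_wealth
    history: List[Day] = []
    for today, tomorrow in zip(days, days[1:]):
        history.append(today)  # reveal exactly one new day
        weight = min(max(decide(list(history)), 0.0), 1.0)
        day_return = tomorrow["price"] / today["price"] - 1.0
        wealth *= 1.0 + weight * day_return
    return wealth


def momentum(history: List[Day]) -> float:
    """Toy agent: fully invested only if the price rose since yesterday."""
    if len(history) < 2:
        return 0.0
    return 1.0 if history[-1]["price"] > history[-2]["price"] else 0.0


if __name__ == "__main__":
    prices = [{"price": p} for p in (100.0, 110.0, 99.0)]
    print(walk_forward(prices, momentum))
```

An LLM-backed agent would slot in as another `decide` function. The loop itself is what enforces the "one day at a time" rule.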

hxtk 2 months ago

I suspect trading firms have already done this to the maximum extent that it's profitable to do so. I think if you were to integrate LLMs into a trading algorithm, you would need to incorporate more than just signals from the market itself. For example, I hazard a guess you could outperform a model that operates purely on market data with a model that also includes a vector embedding of a selection of key social and news media accounts or other information sources that have historically been difficult to encode until LLMs.

solotronics 2 months ago
The part people are missing here is that if the trading firms are all doing something, that in itself influences the market.
If they are all giving the LLMs money to invest and the AIs generally buy the same group of stocks, those stocks will go up. As more people attempt the strategy, it infuses fresh capital and, more importantly, signals to the trading firms that there are inflows to these stocks. I think it's probably a reflexive loop at this point.
- brendoelfrendo 2 months ago
  
  They could have the AI perform paper trading: give it a simulated account but real data. This would make sense to me if it was just a research project. That said, I imagine the more high-tech trading firms started running this research a long time ago and wouldn't be surprised if there were already LLM-based trading bots that could be influencing the market.
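A rough sketch of the paper-trading idea above: a simulated account that is filled at whatever real quote you feed it. `PaperAccount` and its methods are hypothetical names, and the sketch assumes every order fills instantly at the quoted price with no fees, which a real broker would not guarantee.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PaperAccount:
    """Simulated account: real prices in, no real orders out."""

    cash: float
    positions: Dict[str, int] = field(default_factory=dict)

    def buy(self, symbol: str, shares: int, price: float) -> None:
        cost = shares * price
        if shares <= 0 or cost > self.cash:
            raise ValueError("invalid or unaffordable order")
        self.cash -= cost
        self.positions[symbol] = self.positions.get(symbol, 0) + shares

    def sell(self, symbol: str, shares: int, price: float) -> None:
        held = self.positions.get(symbol, 0)
        if shares <= 0 or shares > held:
            raise ValueError("cannot sell more than is held")
        self.cash += shares * price
        self.positions[symbol] = held - shares

    def equity(self, prices: Dict[str, float]) -> float:
        """Cash plus positions marked at the given prices."""
        return self.cash + sum(n * prices[s] for s, n in self.positions.items())


if __name__ == "__main__":
    acct = PaperAccount(cash=1000.0)
    acct.buy("XYZ", 5, 100.0)  # "XYZ" is a placeholder symbol
    print(acct.equity({"XYZ": 120.0}))
```

Because these fills never reach an exchange, the simulated trades cannot move prices, so a paper account says nothing about the feedback effect described in the parent comment.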
giantg2 2 months ago

"includes a vector embedding of a selection of key social and news media accounts or other information sources that have historically been difficult to encode until LLMs."
Not really. Sentiment analysis in social networks has been around for years. It's probably cheaper to buy that analysis and feed it to LLMs than to have LLMs do it.

IgorPartola 2 months ago

I mean, ultimately this is an exercise in frustration, because if you do that you will have trained your model on market patterns that might not be in place anymore. For example, after the 2008 recession, regulations changed. So do market dynamics actually work the same in 2025 as in 2005? I honestly don’t know, but intuitively I would say that it is possible that they do not.

I think a potentially better way would be to segment the market up to today but take half or 10% of all the stocks and make only those available to the LLM. Then run the test on the rest. This accounts for rules and external forces changing how markets operate over time. And you can do this over and over picking a different 10% market slice for training data each time.

But then your problem is that if you exclude, let’s say, Intel from your training data and AMD from your testing data, then their ups and downs don’t really make sense, since they are direct competitors. If you separate by market segment, then training the model on software tech companies might not actually tell you accurately how it would do for commodities or currency trading. Or maybe I am wrong and trading is trading no matter what you are trading.
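The repeated "pick a different 10% slice each time" idea could be sketched like this. `ticker_splits` is a hypothetical helper and the ticker names in the usage example are placeholders. It only produces the splits and does nothing about the competitor problem raised in the last paragraph.

```python
import random
from typing import List, Sequence, Tuple


def ticker_splits(
    tickers: Sequence[str],
    train_frac: float = 0.1,
    repeats: int = 5,
    seed: int = 0,
) -> List[Tuple[List[str], List[str]]]:
    """Return `repeats` random (train, test) partitions of the ticker list.

    Each partition puts roughly `train_frac` of the tickers in train and
    everything else in test; all tickers cover the same date range.
    """
    rng = random.Random(seed)
    n_train = max(1, round(len(tickers) * train_frac))
    splits = []
    for _ in range(repeats):
        train = set(rng.sample(list(tickers), n_train))
        test = [t for t in tickers if t not in train]
        splits.append((sorted(train), test))
    return splits


if __name__ == "__main__":
    universe = [f"TICK{i}" for i in range(10)]  # placeholder symbols
    for train, test in ticker_splits(universe, repeats=3):
        print(train, len(test))
```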

godelski 2 months ago
> I think a potentially better way would be to segment the market up to today but take half or 10% of all the stocks and make only those available to the LLM.
Autocorrelation is going to bite you in the ass.
Those stocks are going to be coupled. Let's take an easy example. Suppose you include Nvidia in the training data and hold out AMD for test. Is there information leakage? Yes. The problem is that each company isn't independent. You have information leakage in both the setting where companies grow together as well as zero sum games (since x + y = 0, if you know x then you know y). But in this example AMD trends with Nvidia. Maybe not as much, but they go in the same direction. They're coupled.
Not to mention that in the specific setting the LLMs were given news and other information.
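To make the coupling point concrete, here is a small sketch on synthetic data (no real prices). It assumes two stocks whose daily returns share one common "sector" shock plus independent noise. `coupled_returns` and `sector_weight` are made-up names for illustration. The stronger the shared shock, the more a held-out series can be predicted from one that was in the training set.

```python
import random
from math import sqrt
from typing import List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def coupled_returns(
    n_days: int, sector_weight: float, seed: int = 0
) -> Tuple[List[float], List[float]]:
    """Two synthetic return series driven by one shared sector shock."""
    rng = random.Random(seed)
    a, b = [], []
    for _ in range(n_days):
        sector = rng.gauss(0.0, 1.0)  # shock that hits both stocks
        a.append(sector_weight * sector + rng.gauss(0.0, 1.0))
        b.append(sector_weight * sector + rng.gauss(0.0, 1.0))
    return a, b


if __name__ == "__main__":
    train_stock, held_out = coupled_returns(5000, sector_weight=2.0)
    print("coupled:", round(pearson(train_stock, held_out), 2))

    x, y = coupled_returns(5000, sector_weight=0.0)
    print("independent:", round(pearson(x, y), 2))

    # Zero-sum case from the comment: y = -x is perfectly (anti)correlated.
    print("zero-sum:", round(pearson(x, [-v for v in x]), 2))
```

Under these assumptions the expected correlation is `w**2 / (w**2 + 1)` for sector weight `w`, so a ticker-level holdout is only clean when that weight is close to zero.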
chris_st 2 months ago
> you will have trained your model on market patterns that might not be in place anymore
My working definition of technical analysis [0]
[0]: https://en.wikipedia.org/wiki/Technical_analysis
- IgorPartola 2 months ago
  
  It is always fun (in a broad sense of that word) when I make a comment on an industry I know nothing about and somehow stumble onto a thing that not only has a name but also research. I am sure there is a German word for that feeling of discovering something that countless others have already discovered.
  
- stouset 2 months ago
  
  I am frankly astonished at the number of otherwise-intelligent people who actually seem to believe in this stuff.
  One of the worst possible things to do in a competitive market is to trade by some publicly-available formulaic strategy. It’s like announcing your rock-paper-scissors move to your opponent in advance.
  
noduerme 2 months ago

Just to name a different but related approach, as a hobby project I built a (non LLM) model that trained mainly on data from stocks that didn't move much over the past decade, seeking ways to beat the performance of those particular stocks. I put it into practice for a couple of years, and came out roughly even by constantly rebalancing a basket of stocks that, as a whole, dropped by about 20%. I considered that to be a success, although it would've been nicer to make money.
0manrho 2 months ago

> you will have trained your model on market patterns that might not be in place anymore
How is that relevant to what was proposed? If it's trading and training on 2010 data, what relevance do today's market dynamics and regulations have?
Which further begs the question, what's the point of this exercise?
Is it to develop a model that can compete effectively in today's market? If so then yeah, the 2010 trading/training idea probably isn't the best idea for the reasons you've outlined.
Or is it to determine the capacity of an AI to learn and compete effectively within any given arbitrary market/era? If so, then today's dynamics/constraints are irrelevant unless you're explicitly trying to train/trade on today's markets (which isn't what the person you're replying to proposed, but is obviously a valid desire and test case to evaluate in its own right).
Or is it evaluating its ability to identify what those constraints/limitations are and then build strategies based on it? In which case it doesn't matter when you're training/trading so much as your ability to feed it accurate and complete data for that time period be it today, or 15 years ago or whenever, which is no small ask.