I’ve got a dataset with 5 columns: time, indicator 1, indicator 2, indicator 3, and result. The result shows True or False, based on certain conditions between the indicators over time.
For instance, one condition that leads to a True result is: if indicator 1 at time t-2 is higher than indicator 1 at time t, and indicator 2 at time t-5 is more than double indicator 2 at time t, then the result is True. Other conditions give a False result.
I’m working on training a machine learning model on this data, but I’m unsure if I should include these conditions explicitly as features during the learning process, or if the model will naturally learn these relationships on its own.
What kind of model would work best for this problem, and should I manually add these conditions, or let the model figure them out?
>I’m working on training a machine learning model on this data, but I’m unsure if I should include these conditions explicitly as features during the learning process, or if the model will naturally learn these relationships on its own
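Most models can pick up on these patterns themselves, but only if the lagged values are actually in the input. A model that sees one row at a time has no way to compare indicator 1 at time t with indicator 1 at time t-2 unless you add the earlier value (or the comparison itself) as a column. Adding the conditions explicitly as features usually improves performance, because the model can focus on prediction instead of rediscovering the rules. So, yeah, I’d recommend including those conditions as features if you have the time and resources. (Just be sure to encode them properly as 1s and 0s.)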
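In case it helps, here’s a rough sketch of how the two conditions from your example could be turned into 0/1 columns with pandas. The column names (`ind1`, `ind2`, `ind3`) and the helper `add_condition_features` are made up for illustration, and it assumes one row per time step with no gaps.

```python
import pandas as pd


def add_condition_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add 0/1 columns for the two lagged conditions (hypothetical helper)."""
    # Assumption: one row per time step, so shift(k) means "k steps earlier".
    out = df.sort_values("time").reset_index(drop=True)

    # Condition A: indicator 1 at t-2 is higher than indicator 1 at t.
    out["ind1_lag2_gt_now"] = (out["ind1"].shift(2) > out["ind1"]).astype(int)

    # Condition B: indicator 2 at t-5 is more than double indicator 2 at t.
    out["ind2_lag5_gt_double_now"] = (
        out["ind2"].shift(5) > 2 * out["ind2"]
    ).astype(int)

    # Both conditions together (the rule that gives True in the example).
    out["both_conditions"] = (
        out["ind1_lag2_gt_now"] & out["ind2_lag5_gt_double_now"]
    )
    return out


# Tiny made-up dataset, only to show the call.
df = pd.DataFrame({
    "time": range(8),
    "ind1": [5, 4, 3, 6, 2, 1, 7, 0],
    "ind2": [10, 9, 8, 7, 6, 4, 5, 2],
    "ind3": [1, 1, 2, 2, 3, 3, 4, 4],
})
features = add_condition_features(df)
print(features[["time", "ind1_lag2_gt_now", "ind2_lag5_gt_double_now", "both_conditions"]])
```

Rows without enough history (the first 2 or 5 rows) get 0 here, because a comparison against a missing value is False. You may prefer to drop those rows instead.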
>What kind of model would work best for this problem?
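Decision trees or random forests should do really well if your features are solid, since your target is basically a set of if/then rules on comparisons, and that is the kind of thing tree splits represent directly. Personally, I tend to experiment with neural networks a lot, but that’s just my style.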
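Here’s a minimal sketch of what that could look like with scikit-learn’s `RandomForestClassifier`. The data is synthetic (random numbers labelled with the rule from your example), the column names are assumptions, and the split is by time order rather than shuffled so the model is scored on later rows than it trained on.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the real dataset (assumption: one row per time step).
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "time": np.arange(n),
    "ind1": rng.random(n),
    "ind2": rng.random(n),
    "ind3": rng.random(n),
})

# Explicit 0/1 features for the two conditions from the question.
cond_a = df["ind1"].shift(2) > df["ind1"]        # ind1 at t-2 higher than at t
cond_b = df["ind2"].shift(5) > 2 * df["ind2"]    # ind2 at t-5 more than double t
df["cond_a"] = cond_a.astype(int)
df["cond_b"] = cond_b.astype(int)

# Label generated from the same rule, only so the sketch has something to learn.
df["result"] = (cond_a & cond_b).astype(int)

feature_cols = ["ind1", "ind2", "ind3", "cond_a", "cond_b"]

# Time-ordered split: first 80% for training, last 20% held out.
split = int(n * 0.8)
train, test = df.iloc[:split], df.iloc[split:]

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(train[feature_cols], train["result"])

accuracy = model.score(test[feature_cols], test["result"])
print(f"held-out accuracy: {accuracy:.3f}")
```

On real data the score will depend on how noisy the labels are, so treat this only as a template for the workflow.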
@Vinnie
Thanks a lot! I have a follow-up: what if the result isn’t just True/False but has levels like 0, 1, 2, and 3? Would this change the approach? Would models like decision trees or random forests still be effective for this multiclass setup (since the levels increase sequentially), or should I look at it as a regression problem?
@Tully
>what if the result isn’t just True/False but has levels like 0, 1, 2, and 3
It depends on what those “levels” mean. If 0 is actually less than 1, which is less than 2, and so on (meaning there’s a real order to them), then you can treat it like a regression problem and round the predictions back to the nearest level. Keep in mind that regression also assumes the gaps between levels are roughly comparable; if they aren’t, plain multiclass classification is still a reasonable choice. If the levels are just labels (like you could swap them with “A,” “B,” “C,” etc.), then stick with classification. Decision trees and random forests can handle both setups without much trouble. If you share more specifics, I could suggest what to do, but no pressure if you’d rather not. Let me know if you have more questions!
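If you want to compare the two framings, a quick sketch like this might help. The data and the rule that produces levels 0 to 3 are invented for illustration (the level is just a binned sum of two random indicators); the point is only to show `RandomForestClassifier` and `RandomForestRegressor` side by side on the same ordered target.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import accuracy_score, mean_absolute_error

rng = np.random.default_rng(0)
n = 2000
X = rng.random((n, 2))  # two made-up indicators in [0, 1)

# Made-up ordered target: bin the sum of the indicators into levels 0..3.
y = np.digitize(X.sum(axis=1), bins=[0.5, 1.0, 1.5])

split = int(n * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Framing 1: multiclass classification (levels treated as separate classes).
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
clf_pred = clf.predict(X_test)

# Framing 2: regression (levels treated as numbers), then round back to 0..3.
reg = RandomForestRegressor(n_estimators=200, random_state=0)
reg.fit(X_train, y_train)
reg_raw = reg.predict(X_test)
reg_pred = np.clip(np.rint(reg_raw), 0, 3).astype(int)

clf_acc = accuracy_score(y_test, clf_pred)
reg_acc = accuracy_score(y_test, reg_pred)
reg_mae = mean_absolute_error(y_test, reg_raw)

print(f"classifier accuracy:          {clf_acc:.3f}")
print(f"rounded regressor accuracy:   {reg_acc:.3f}")
print(f"regressor mean absolute error: {reg_mae:.3f}")
```

With ordered levels it is also worth looking at how far off the mistakes are (the mean absolute error above), not just accuracy, since predicting 2 when the truth is 3 is usually less bad than predicting 0.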