Do XGBoost trees use all features… or do they select some randomly?

I know that in the Random Forest algorithm, each tree is built using a random subset of features (a process known as feature bagging). Do XGBoost's trees work the same way? It seems like this would be really useful for the kind of data I'm working with.

Yes. XGBoost has parameters that control the fraction of features sampled while building each tree: colsample_bytree draws a subset once per tree, colsample_bylevel resamples at each depth level, and colsample_bynode resamples at each split. They apply cumulatively, so you can combine them.
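
For example, a minimal sketch (the synthetic data and the 0.8/0.9 fractions are just placeholders):

```python
import numpy as np
import xgboost as xgb

# Toy data just to make the example runnable; shapes/values are arbitrary.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model = xgb.XGBClassifier(
    n_estimators=100,
    colsample_bytree=0.8,   # sample 80% of features once per tree
    colsample_bylevel=0.9,  # resample 90% of those at each depth level
    colsample_bynode=0.9,   # resample 90% of those again at each split
)
model.fit(X, y)
```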

Yeah, but the downside is that it’s not easy to apply weights to feature selection in XGBoost.

Kelby said:
Yeah, but the downside is that it’s not easy to apply weights to feature selection in XGBoost.

Does this ever affect performance compared to a Random Forest? Could a Random Forest end up performing better because of how it handles feature selection?

@Ozzy
I get what you’re asking. My point was that XGBoost doesn’t make it easy to prioritize certain features during selection. I don’t use Random Forests often, but I think they have a similar limitation.

From what I’ve seen, Random Forests don’t usually outperform XGBoost. Not saying it never happens, but in my own experience, XGBoost tends to do better.
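
It's easy enough to check on your own data, though. A quick side-by-side cross-validation like the sketch below will tell you which one wins for your problem (the dataset and hyperparameters here are placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Placeholder dataset; swap in your own X, y.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
xgbc = XGBClassifier(n_estimators=200, random_state=0)

print("Random Forest:", cross_val_score(rf, X, y, cv=5).mean())
print("XGBoost:      ", cross_val_score(xgbc, X, y, cv=5).mean())
```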

@Kelby
I’m a bit unclear on your answer. Does XGBoost randomly select features for the trees, or is it mainly focused on the features that give the highest immediate information gain?

Ozzy said:
@Kelby
I’m a bit unclear on your answer. Does XGBoost randomly select features for the trees, or is it mainly focused on the features that give the highest immediate information gain?

It does both, depending on the settings. By default, XGBoost considers every feature at every split and greedily picks the one with the highest gain. If you set any of the colsample_* parameters below 1.0, it first draws a random subset of features and then picks the best split by gain within that subset.
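
If you specifically want Random-Forest-style behavior, the XGBoost docs describe running it as a forest: a single boosting round, many parallel trees, and row plus per-split column sampling. Rough sketch (data and values are illustrative):

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))
y = (X[:, 0] - X[:, 2] > 0).astype(int)

rf_like = xgb.XGBClassifier(
    n_estimators=1,          # a single boosting round...
    num_parallel_tree=100,   # ...made of 100 trees grown independently (a forest)
    learning_rate=1.0,       # no shrinkage, as in a plain forest
    subsample=0.8,           # row bagging per tree
    colsample_bynode=0.25,   # fresh feature subset at every split (~sqrt(16)/16)
)
rf_like.fit(X, y)
```

The xgboost package also ships an XGBRFClassifier wrapper that bundles these random-forest defaults for you.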

@Kelby
Alright, thanks for clarifying.