Do XGBoost trees use all features… or do they select some randomly?

I know that in the Random Forest algorithm, each tree is built using a random subset of features (a process known as feature bagging). Do XGBoost's trees work the same way? It seems like this would be really useful for the kind of data I'm working with.

Yes. XGBoost has parameters that control the fraction of features sampled while building each tree: colsample_bytree draws a subset once per tree, colsample_bylevel resamples at each depth level, and colsample_bynode resamples at each split. They apply cumulatively, so you can combine them.
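
For example, a minimal sketch (the synthetic data and the 0.8/0.9 fractions are just placeholders):

```python
import numpy as np
import xgboost as xgb

# Toy data just to make the example runnable; shapes/values are arbitrary.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model = xgb.XGBClassifier(
    n_estimators=100,
    colsample_bytree=0.8,   # sample 80% of features once per tree
    colsample_bylevel=0.9,  # resample 90% of those at each depth level
    colsample_bynode=0.9,   # resample 90% of those again at each split
)
model.fit(X, y)
```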

Yeah, but the downside is that it’s not easy to apply weights to feature selection in XGBoost.

Kelby said:
Yeah, but the downside is that it’s not easy to apply weights to feature selection in XGBoost.

Does this ever affect performance compared to a Random Forest? Could a Random Forest end up performing better because of how it handles feature selection?

@Ozzy
I get what you’re asking. My point was that XGBoost doesn’t make it easy to prioritize certain features during selection. I don’t use Random Forests often, but I think they have a similar limitation.

From what I’ve seen, Random Forests don’t usually outperform XGBoost. Not saying it never happens, but in my own experience, XGBoost tends to do better.
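
It's easy enough to check on your own data, though. A quick side-by-side cross-validation like the sketch below will tell you which one wins for your problem (the dataset and hyperparameters here are placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Placeholder dataset; swap in your own X, y.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
xgbc = XGBClassifier(n_estimators=200, random_state=0)

print("Random Forest:", cross_val_score(rf, X, y, cv=5).mean())
print("XGBoost:      ", cross_val_score(xgbc, X, y, cv=5).mean())
```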

@Kelby
I’m a bit unclear on your answer. Does XGBoost randomly select features for the trees, or is it mainly focused on the features that give the highest immediate information gain?

Ozzy said:
@Kelby
I’m a bit unclear on your answer. Does XGBoost randomly select features for the trees, or is it mainly focused on the features that give the highest immediate information gain?

It does both, depending on the settings. By default, XGBoost considers every feature at every split and greedily picks the one with the highest gain. If you set any of the colsample_* parameters below 1.0, it first draws a random subset of features and then picks the best split by gain within that subset.
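
If you specifically want Random-Forest-style behavior, the XGBoost docs describe running it as a forest: a single boosting round, many parallel trees, and row plus per-split column sampling. Rough sketch (data and values are illustrative):

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))
y = (X[:, 0] - X[:, 2] > 0).astype(int)

rf_like = xgb.XGBClassifier(
    n_estimators=1,          # a single boosting round...
    num_parallel_tree=100,   # ...made of 100 trees grown independently (a forest)
    learning_rate=1.0,       # no shrinkage, as in a plain forest
    subsample=0.8,           # row bagging per tree
    colsample_bynode=0.25,   # fresh feature subset at every split (~sqrt(16)/16)
)
rf_like.fit(X, y)
```

The xgboost package also ships an XGBRFClassifier wrapper that bundles these random-forest defaults for you.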

@Kelby
Alright, thanks for clarifying.