I’ve been doing research in FL for a while now and have looked into a few of its subfields. Whenever I want to compare a new method against older ones for a project, it takes a very long time to get the old methods working on common datasets like CIFAR-10 that weren’t used in the original papers. I’m currently using an off-the-shelf benchmark suite, fl-bench, but I’m still struggling to get FedAvg to converge on CIFAR-10 splits that are even slightly non-i.i.d. I think this makes working in the field really difficult. Have you run into the same thing, or have I been missing something important all along?
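For context, here’s a rough sketch of the kind of setup I mean. This is generic PyTorch, not fl-bench’s API; the Dirichlet alpha, client count, and the tiny linear model are placeholder choices just to show the label-skew partition plus one FedAvg round, not my actual configuration.

```python
# Sketch only: Dirichlet label-skew partition of CIFAR-10 + one FedAvg round.
# Assumes torch/torchvision are installed; all constants are placeholders.
import numpy as np
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

def dirichlet_partition(labels, num_clients=10, alpha=0.5, seed=0):
    """Split sample indices across clients with per-class proportions drawn
    from Dirichlet(alpha); smaller alpha => more label skew (less i.i.d.)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        proportions = rng.dirichlet([alpha] * num_clients)
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, part in enumerate(np.split(idx, cuts)):
            client_indices[client_id].extend(part.tolist())
    return client_indices

def local_update(model, loader, epochs=1, lr=0.01):
    """Plain local SGD on one client; returns the updated state_dict."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return {k: v.detach().clone() for k, v in model.state_dict().items()}

def fedavg_aggregate(client_states, client_sizes):
    """Weighted average of client parameters (the FedAvg server step)."""
    total = sum(client_sizes)
    return {k: sum(s[k].float() * (n / total)
                   for s, n in zip(client_states, client_sizes))
            for k in client_states[0]}

if __name__ == "__main__":
    train = torchvision.datasets.CIFAR10("./data", train=True, download=True,
                                         transform=T.ToTensor())
    parts = dirichlet_partition(train.targets, num_clients=10, alpha=0.5)

    global_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
    states, sizes = [], []
    for idx in parts:
        loader = torch.utils.data.DataLoader(
            torch.utils.data.Subset(train, idx), batch_size=64, shuffle=True)
        local = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
        local.load_state_dict(global_model.state_dict())
        states.append(local_update(local, loader))
        sizes.append(len(idx))
    global_model.load_state_dict(fedavg_aggregate(states, sizes))
```

Even in something this minimal, the choices of alpha, local epochs, local learning rate, and client sampling all interact, which is exactly where my runs fall apart.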
A related problem I’ve seen is that reviewers at top venues don’t really push people to reduce their reliance on hyperparameters; I’d even say it’s implicitly discouraged. As a rule, more complicated methods come with more hyperparameters, while more robust methods tend to be simpler, and simplicity isn’t valued much by most reviewers, especially less experienced ones.
More broadly, most deep learning experiments are poorly designed and full of hidden tricks, because nobody cares about the experimental setup itself; people only want to see how the methods rank and assume the specific benchmark doesn’t matter. In FL and other complex settings this is almost never true: small changes to the experimental setup can have large effects on performance and on the optimal hyperparameters.
If you want to get started quickly, I think the best option is to find a library that already has the best practices and recent tricks built in; reimplementing them yourself can take months of work.
i hate that simpler methods aren’t valued.
I don’t know much about federated learning specifically, but hyperparameters play a role in all deep learning methods to some extent. Some methods are more forgiving than others, but most papers don’t discuss this.
In machine learning, some effects only show up if the hyperparameters are set just right. Statistically speaking, you can’t train ImageNet models to high accuracy without first doing a hyperparameter search.
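To make “search” concrete, here’s a minimal random-search sketch on a toy synthetic task; the ranges, trial count, and model are arbitrary illustrations, nothing ImageNet-specific.

```python
# Sketch only: random search over learning rate and weight decay on toy data.
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(512, 20)
y = (X[:, 0] > 0).long()  # toy binary labels

def train_and_eval(lr, weight_decay, epochs=50):
    """Train a linear classifier with full-batch SGD, return train accuracy."""
    model = nn.Linear(20, 2)
    opt = torch.optim.SGD(model.parameters(), lr=lr, weight_decay=weight_decay)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()
    return (model(X).argmax(1) == y).float().mean().item()

rng = np.random.default_rng(0)
best = None
for _ in range(20):                      # 20 random trials
    lr = 10 ** rng.uniform(-4, 0)        # log-uniform learning rate
    wd = 10 ** rng.uniform(-6, -2)       # log-uniform weight decay
    acc = train_and_eval(lr, wd)
    if best is None or acc > best[0]:
        best = (acc, lr, wd)
print(f"best acc={best[0]:.3f} at lr={best[1]:.2e}, wd={best[2]:.2e}")
```

The same idea scales up, just with far more compute per trial; that cost is the whole problem.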
I definitely don’t expect every method to work with every set of hyperparameters. What I don’t understand is why there isn’t a single set of best practices once you get to FL, and why the results differ so much depending on the setup. Also, in my experience, if you don’t have exactly the right hyperparameters in FL, the method doesn’t just work a little worse; it doesn’t work at all, which makes hyperparameter search very hard. This may sound like a rant, and in a way it is, but what I really want to know is whether this happens to everyone or whether I’m missing a paper, blog post, or something else that lays out the best practices.
The more moving parts your “global” algorithm has, the more likely it is that some hyperparameter is set badly; it’s similar to RL in that respect. Likewise, even small differences in how the same method is implemented can have a big effect on the results.
You’re definitely not the only one. Most of my work was on numerical (tabular) datasets, and most methods (GANs, tree ensembles, VAEs, etc.) are very sensitive to the hyperparameters you choose. It got so bad that I usually couldn’t even reproduce the authors’ results. Very annoying.