FedADT: An Adaptive Method Based on Derivative Term for Federated Learning

Huimin Gao; Qingtao Wu; Xuhui Zhao; Junlong Zhu; Mingchuan Zhang

doi:10.3390/s23136034

Sensors (Basel). 2023 Jun 29;23(13):6034. doi: 10.3390/s23136034.

Authors

Huimin Gao¹, Qingtao Wu¹,², Xuhui Zhao¹, Junlong Zhu¹, Mingchuan Zhang¹

Affiliations

¹ School of Information Engineering, Henan University of Science and Technology, Luoyang 471023, China.
² Intelligent System Science and Technology Innovation Center, Longmen Laboratory, Luoyang 471023, China.

Abstract

Federated learning serves as a novel distributed training framework that enables multiple Internet of Things clients to collaboratively train a global model while their data remains local. However, implementing federated learning faces many problems in practice, such as the large number of training rounds needed for convergence due to model size, and the lack of adaptivity of stochastic gradient-based updates on the client side. Moreover, the optimization process is sensitive to noise, which can degrade the performance of the final model. For these reasons, we propose Federated Adaptive learning based on Derivative Term, called FedADT, which incorporates an adaptive step size and the difference of gradients into the local model update. To further reduce the influence of noise on the derivative term, which is estimated by the difference of gradients, we apply a moving average decay to the derivative term. We also analyze the convergence of the proposed algorithm for non-convex objective functions: a convergence rate of 1/nT can be achieved by choosing appropriate hyper-parameters, where n is the number of clients and T is the number of iterations. Finally, image classification experiments are conducted by training a widely used convolutional neural network on the MNIST and Fashion MNIST datasets to verify the effectiveness of FedADT. In addition, the receiver operating characteristic curve is used to present the results of the proposed algorithm when predicting clothing categories on the Fashion MNIST dataset.
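
The ingredients named in the abstract (an adaptive step size, a derivative term estimated from the difference of successive gradients, a moving average on that term, and server-side averaging of client models) can be illustrated with a minimal Python sketch. This is not the paper's update rule, which the abstract does not state. The Adam-style moment estimates, the hyper-parameter names and values (lr, beta1, beta2, gamma, kd, eps), the persistent per-client state, and the toy quadratic client objectives are all assumptions chosen only for illustration.

```python
import math

def new_state(dim):
    # Per-client optimizer state (hypothetical layout, not from the paper).
    return {"m": [0.0] * dim, "v": [0.0] * dim,
            "d": [0.0] * dim, "prev": [0.0] * dim}

def local_step(w, grad, state, lr=0.01, beta1=0.9, beta2=0.999,
               gamma=0.9, kd=0.1, eps=1e-8):
    """One illustrative local update; all coefficients are assumed values."""
    m, v, d, prev = state["m"], state["v"], state["d"], state["prev"]
    new_w = []
    for i, (wi, gi) in enumerate(zip(w, grad)):
        m[i] = beta1 * m[i] + (1 - beta1) * gi        # first-moment estimate
        v[i] = beta2 * v[i] + (1 - beta2) * gi * gi   # second-moment estimate
        # Derivative term: moving average of the gradient difference.
        # prev starts at zero, so the first difference is the gradient itself.
        d[i] = gamma * d[i] + (1 - gamma) * (gi - prev[i])
        prev[i] = gi
        # Adaptive step: scale the combined direction by sqrt(v).
        new_w.append(wi - lr * (m[i] + kd * d[i]) / (math.sqrt(v[i]) + eps))
    return new_w

def average(models):
    # Server step: coordinate-wise mean of the client models.
    n = len(models)
    return [sum(col) / n for col in zip(*models)]

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy setup: client k minimizes 0.5 * ||w - target_k||^2, so the minimizer
# of the averaged objective is the mean of the targets.
targets = [[1.0, -2.0], [3.0, 0.0]]
optimum = average(targets)
states = [new_state(2) for _ in targets]
global_w = [0.0, 0.0]
initial_dist = distance(global_w, optimum)

for _ in range(50):                      # communication rounds
    local_models = []
    for target, state in zip(targets, states):
        w = list(global_w)               # client starts from the global model
        for _ in range(5):               # local steps per round
            grad = [wi - ti for wi, ti in zip(w, target)]
            w = local_step(w, grad, state)
        local_models.append(w)
    global_w = average(local_models)

final_dist = distance(global_w, optimum)
print(initial_dist, final_dist)
```

In this toy run the global model should end closer to the mean of the client targets than where it started. The sketch says nothing about the convergence rate or the experimental results reported in the paper.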

Keywords: derivative; distributed training; federated learning; stochastic gradient.

MeSH terms

Algorithms*
Humans
Internet
Learning*
Neural Networks, Computer
ROC Curve

Grants and funding

61971458/National Natural Science Foundation of China