Talk title: Optimal Subsampling for Data Streams with Categorical Responses
Time: 2024.05.09 15:00-16:30
Tencent Meeting ID: 398 674 421
Meeting link: https://meeting.tencent.com/dm/9A0FGQRemlKd
Abstract: Timely analysis of categorical data that arrive quickly in large chunks is in high demand, especially when storing or accessing the historical data is not always possible or desirable. This work introduces an efficient subsampling procedure for online data streams under a multinomial logistic model, which sequentially updates the parameter estimator. The proposed online subsampling and estimation algorithm is computationally efficient, requires minimal storage, and accommodates the scenario in which labels are expensive to measure and are not all available initially. Theoretical properties that quantify the asymptotic behavior of the proposed estimator are established. Optimal subsampling probabilities are derived under the A-optimality criterion. An adaptive subsampling algorithm is suggested for ease of practical implementation. The advantages of the proposed method are illustrated through numerical studies on both simulated and real data sets.