Talk title: RANK: Large-Scale Inference with Graphical Nonlinear Knockoffs
Speaker: Prof. Li Gaorong (李高荣), Beijing Normal University
Time: Friday, June 14, 2019, 8:30-9:30 p.m.
Venue: School conference room (90510)
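About the speaker: Li Gaorong is a professor and doctoral supervisor at Beijing Normal University. He currently serves as an executive council member of the National Research Society for the Teaching of Industrial Statistics (全国工业统计学教学研究会), a council member of the High-Dimensional Data Statistics Branch of the Chinese Association for Applied Statistics (中国现场统计研究会), a council member and deputy secretary-general of its Survival Analysis Branch, an executive council member of the Beijing Applied Statistics Society (北京应用统计学会), and a reviewer for Mathematical Reviews. His research covers statistical learning, deep learning, causal inference, nonparametric statistics, complex high-dimensional data analysis, model and variable selection, measurement error models, longitudinal and panel data analysis, and empirical likelihood inference. In recent years he has made several visits to Hong Kong Baptist University, Nanyang Technological University in Singapore, and City University of Hong Kong. He has published more than 80 papers in Chinese and international journals, including "The Annals of Statistics", "Statistics and Computing", "Statistica Sinica", "Journal of Multivariate Analysis", "Computational Statistics and Data Analysis", and "中国科学:数学", and has published two monographs with Science Press (科学出版社): "Semiparametric Models for Longitudinal Data" (《纵向数据半参数模型》) and "Modern Measurement Error Models" (《现代测量误差模型》). He has led and participated in a number of national and provincial or ministerial research projects, funded by the National Natural Science Foundation of China, the Beijing Natural Science Foundation, and the Specialized Research Fund for the Doctoral Program of Higher Education, among others. In 2010 he was selected for the "Young and Middle-Aged Core Talent Training Program" (中青年骨干人才培养计划) under the Beijing municipal universities' plan for strengthening education through talent, and for the Beijing Outstanding Talent Training Funding Program. In 2012 he was selected, as an exceptional case, for the "Jinghua Talent" (京华人才) support program of Beijing University of Technology. He currently leads research projects funded by the National Natural Science Foundation of China and the Beijing Natural Science Foundation.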
Abstract: Power and reproducibility are key to enabling refined scientific discoveries in contemporary big data applications with general high-dimensional nonlinear models. In this paper, we provide theoretical foundations on the power and robustness of the model-X knockoffs procedure recently introduced by Candès, Fan, Janson and Lv (2018), in the high-dimensional setting where the covariate distribution is characterized by a Gaussian graphical model. We establish that, under mild regularity conditions, the power of the oracle knockoffs procedure with known covariate distribution in high-dimensional linear models is asymptotically one as the sample size goes to infinity. When moving away from this ideal case, we suggest a modified model-X knockoffs method, called graphical nonlinear knockoffs (RANK), to accommodate the unknown covariate distribution. We provide theoretical justification for the robustness of the modified procedure by showing that, with the estimated covariate distribution, the false discovery rate (FDR) is asymptotically controlled at the target level and the power is asymptotically one. To the best of our knowledge, this is the first formal theoretical result on the power of the knockoffs procedure. Simulation results demonstrate that, compared to existing approaches, our method performs competitively in both FDR control and power. A real data set is analyzed to further assess the performance of the suggested knockoffs procedure.
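For readers unfamiliar with how a knockoffs procedure turns statistics into a selected set with FDR control, the following is a minimal sketch of the standard knockoff+ thresholding step from the knockoffs literature. It is an illustration only, not the RANK procedure itself: it assumes the feature statistics W_j have already been computed (a large positive W_j suggests the original variable matters more than its knockoff copy), and it does not cover knockoff construction or estimation of the covariate distribution. The function names and the example values are hypothetical.

```python
def knockoff_plus_threshold(w, q):
    """Return the knockoff+ threshold for statistics w at target FDR level q.

    The threshold is the smallest t among the nonzero |w_j| such that
    (1 + #{j: w_j <= -t}) / max(1, #{j: w_j >= t}) <= q.
    Returns float('inf') when no candidate satisfies the bound.
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)  # proxy count of false discoveries
        positives = sum(1 for x in w if x >= t)   # number of selections at level t
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")


def knockoff_select(w, q):
    """Return the indices j with w_j at or above the knockoff+ threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Hypothetical statistics: six strong signals followed by four noise variables.
w_example = [5.0, 4.2, 3.9, 3.1, 2.8, 2.5, -0.4, 0.3, -0.2, 0.1]
selected = knockoff_select(w_example, q=0.2)
print(selected)
```

The "+1" in the numerator is what distinguishes knockoff+ from the plain knockoff threshold; it makes the rule more conservative, so that with very few statistics nothing may be selected at all.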