Institute of Mathematical Sciences

Seminar: Off-policy Evaluation with Deeply-abstracted States


Time: Friday, December 6, 2024, 14:30-15:30

Location: IMS, RS408

Speaker: Meiling Hao, University of International Business and Economics

 

Abstract: Off-policy evaluation (OPE) is crucial for assessing a target policy’s impact offline before its deployment. However, achieving accurate OPE in large state spaces remains challenging. This paper studies state abstractions, originally designed for policy learning, in the context of OPE. Our contributions are four-fold: (i) We define a set of irrelevance conditions central to learning state abstractions for OPE. (ii) We derive a backward-model-irrelevance condition for achieving irrelevance in (marginalized) importance sampling ratios by constructing a time-reversed Markov decision process (MDP) based on the standard MDP. (iii) We propose a novel iterative procedure that sequentially projects the original state space into a smaller space, resulting in a deeply-abstracted state, which substantially reduces the sample complexity of OPE arising from high cardinality. (iv) We prove the Fisher consistency of various OPE estimators when applied to our proposed abstract state spaces.


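Address: 393 Middle Huaxia Road, Pudong New Area, Shanghai
Postal code: 201210
Building 8, 319 Yueyang Road, Xuhui District, Shanghai
200031 (Yueyang Road Campus)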