Seminar | Institute of Mathematical Sciences
Time: Friday, December 6th, 2024, 14:30-15:30
Location: IMS, RS408
Speaker: Meiling Hao, University of International Business and Economics
Abstract: Off-policy evaluation (OPE) is crucial for assessing a target policy's impact offline before its deployment. However, achieving accurate OPE in large state spaces remains challenging. This paper studies state abstractions – originally designed for policy learning – in the context of OPE. Our contributions are four-fold: (i) We define a set of irrelevance conditions central to learning state abstractions for OPE. (ii) We derive a backward-model-irrelevance condition for achieving irrelevance in (marginalized) importance sampling ratios by constructing a time-reversed Markov decision process (MDP) based on the standard MDP. (iii) We propose a novel iterative procedure that sequentially projects the original state space into a smaller space, resulting in a deeply abstracted state, which substantially reduces the sample complexity of OPE arising from high cardinality. (iv) We prove the Fisher consistency of various OPE estimators when applied to our proposed abstract state spaces.
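For context, one common generic form of a marginalized importance sampling OPE estimator (a textbook-style sketch, not necessarily the estimator analyzed in the talk) weights observed rewards by a ratio of state visitation densities together with the policy ratio:

```latex
\hat{V}(\pi) \;=\; \frac{1}{nT}\sum_{i=1}^{n}\sum_{t=0}^{T-1}
\omega(S_{i,t})\,\frac{\pi(A_{i,t}\mid S_{i,t})}{b(A_{i,t}\mid S_{i,t})}\,R_{i,t},
\qquad
\omega(s) \;=\; \frac{d^{\pi}(s)}{d^{b}(s)},
```

where $d^{\pi}$ and $d^{b}$ denote state visitation distributions under the target policy $\pi$ and the behavior policy $b$ (the exact weighting depends on the discounted versus average-reward setting). State abstraction aims to let the ratio $\omega$ depend only on a lower-dimensional summary of the state $s$, which is where the irrelevance conditions discussed in the talk enter.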