Seminar | Institute of Mathematical Sciences
Time: Friday, December 6th, 2024, 14:30-15:30
Location: IMS, RS408
Speaker: Meiling Hao, University of International Business and Economics
Abstract: Off-policy evaluation (OPE) is crucial for assessing a target policy's impact offline before its deployment. However, achieving accurate OPE in large state spaces remains challenging. This paper studies state abstractions – originally designed for policy learning – in the context of OPE. Our contributions are four-fold: (i) We define a set of irrelevance conditions central to learning state abstractions for OPE. (ii) We derive a backward-model-irrelevance condition for achieving irrelevance in (marginalized) importance sampling ratios by constructing a time-reversed Markov decision process (MDP) based on the standard MDP. (iii) We propose a novel iterative procedure that sequentially projects the original state space into a smaller space, resulting in a deeply abstracted state, which substantially reduces the sample complexity of OPE arising from high cardinality. (iv) We prove the Fisher consistency of various OPE estimators when applied to our proposed abstract state spaces.
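For context, one common generic form of a marginalized importance sampling OPE estimator (a textbook-style sketch, not necessarily the estimator analyzed in the talk) weights observed rewards by a ratio of state visitation densities together with the policy ratio:

```latex
\hat{V}(\pi) \;=\; \frac{1}{nT}\sum_{i=1}^{n}\sum_{t=0}^{T-1}
\omega(S_{i,t})\,\frac{\pi(A_{i,t}\mid S_{i,t})}{b(A_{i,t}\mid S_{i,t})}\,R_{i,t},
\qquad
\omega(s) \;=\; \frac{d^{\pi}(s)}{d^{b}(s)},
```

where $d^{\pi}$ and $d^{b}$ denote state visitation distributions under the target policy $\pi$ and the behavior policy $b$ (the exact weighting depends on the discounted versus average-reward setting). State abstraction aims to let the ratio $\omega$ depend only on a lower-dimensional summary of the state $s$, which is where the irrelevance conditions discussed in the talk enter.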