This page is dedicated to interesting papers in mean field games, Monte Carlo methods, and some things on Markov Decision Processes (MDP). Bibliography Tool

- G. Bianconi. Statistical physics of exchangeable sparse simple networks, multiplex networks and simplicial complexes. Phys. Rev. E 105, 034310 , 2022. Abstract.
Exchangeability is a desired statistical property of network ensembles requiring their invariance upon relabelling of the nodes. However combining sparsity of network ensembles with exchangeability is challenging. Here we propose a statistical physics framework and a Metropolis-Hastings algorithm defining exchangeable sparse network ensembles. The model generates networks with heterogeneous degree distributions by enforcing only global constraints while existing (non exchangeable) exponential random graphs enforce an extensive number of local constraints. This very general theoretical framework to describe exchangeable networks is here first formulated for uncorrelated simple networks and then it is extended to treat simple networks with degree correlations, directed networks, bipartite networks and generalized network structures including multiplex networks and simplicial complexes. In particular here we formulate and treat both uncorrelated and correlated exchangeable ensembles of simplicial complexes using statistical mechanics approaches.
- R. Carmona, M. Laurière, Z. Tan. Model-Free Mean-Field Reinforcement Learning: Mean-Field MDP and Mean-Field Q-Learning. Ann. Appl. Probab. 33 (6B) 5334 - 5381, 2023. Abstract.
We study infinite horizon discounted mean field control (MFC) problems with common noise through the lens of mean field Markov decision processes (MFMDP). We allow the agents to use actions that are randomized not only at the individual level but also at the level of the population. This common randomization is introduced for the purpose of exploration from a reinforcement learning (RL) paradigm. It also allows us to establish connections between both closed-loop and open-loop policies for MFC and Markov policies for the MFMDP. In particular, we show that there exists an optimal closed-loop policy for the original MFC and we prove dynamic programming principles for the state and state-action value functions. Building on this framework and the notion of state-action value function, we then propose RL methods for such problems, by adapting existing tabular and deep RL methods to the mean-field setting. The main difficulty is the treatment of the population state, which is an input of the policy and the value function. We provide convergence guarantees for the tabular Q-learning algorithm based on discretizations of the simplex. We also show that neural network based deep RL algorithms are more suitable for continuous spaces as they allow us to avoid discretizing the mean field state space. Numerical examples are provided.
- M. Kearns, Y. Mansour, A. Y. Ng. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes. Machine Learning, 49, 193–208, 2002. Abstract.
A critical issue for the application of Markov decision processes (MDPs) to realistic problems is how the complexity of planning scales with the size of the MDP. In stochastic environments with very large or infinite state spaces, traditional planning and reinforcement learning algorithms may be inapplicable, since their running time typically grows linearly with the state space size in the worst case. In this paper we present a new algorithm that, given only a generative model (a natural and common type of simulator) for an arbitrary MDP, performs on-line, near-optimal planning with a per-state running time that has no dependence on the number of states. The running time is exponential in the horizon time (which depends only on the discount factor γ and the desired degree of approximation to the optimal policy). Our algorithm thus provides a different complexity trade-off than classical algorithms such as value iteration—rather than scaling linearly in both horizon time and state space size, our running time trades an exponential dependence on the former in exchange for no dependence on the latter. Our algorithm is based on the idea of sparse sampling. We prove that a randomly sampled look-ahead tree that covers only a vanishing fraction of the full look-ahead tree nevertheless suffices to compute near-optimal actions from any state of an MDP. Practical implementations of the algorithm are discussed, and we draw ties to our related recent results on finding a near-best strategy from a given class of strategies in very large partially observable MDPs (Kearns, Mansour, & Ng. Neural information processing systems 13, to appear).
- L. Kocsis, C. Szepesvári . Bandit Based Monte-Carlo Planning. In: Machine Learning: ECML 2006. Lecture Notes in Computer Science(), vol 4212. Springer, Berlin, Heidelberg., 2006. Abstract.
For large state-space Markovian Decision Problems Monte-Carlo planning is one of the few viable approaches to find near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs the algorithm is shown to be consistent and finite sample bounds are derived on the estimation error due to sampling. Experimental results show that in several domains, UCT is significantly more efficient than its alternatives.