Daniil Tiapkin

Research Scientist, Google DeepMind, Paris.


Hello! I am a Research Scientist at Google DeepMind in Paris, working on reinforcement learning and post-training of foundation models.

I defended my PhD in September 2025 at CMAP, École Polytechnique (Institut Polytechnique de Paris) and LMO, Université Paris-Saclay, under the supervision of Éric Moulines and Gilles Stoltz. My thesis, Sample-Efficient Reinforcement Learning: Exploration, Imitation, and Online Learning, is available online.

Before joining DeepMind full-time, I was a student researcher under the supervision of Mathieu Blondel, working on distillation of language models — see On Teacher Hacking in Language Model Distillation.

Earlier, I did research at the HDI Lab at HSE University, where I also completed a Master’s degree in Applied Mathematics and Computer Science (program “Math of Machine Learning”).

research interests

  • Reinforcement learning: exploration, sample efficiency, imitation, RLHF.
  • Connections between amortized sampling and RL.
  • Post-training of foundation models.
  • Online learning and bandits.

news

Apr 01, 2026 Beyond Softmax and Entropy: Convergence Rates of Policy Gradients with f-SoftArgmax Parameterization & Coupled Regularization accepted at ICLR 2026, and On Global Convergence Rates for Federated Softmax Policy Gradient under Heterogeneous Environments accepted at AISTATS 2026.
Feb 23, 2026 I joined Google DeepMind Paris as a Research Scientist!
Nov 20, 2025 Together with collaborators, we released gfnx, a fast and scalable JAX library for Generative Flow Networks (paper: arXiv:2511.16592).
Sep 16, 2025 I defended my PhD at Institut Polytechnique de Paris! Thesis: Sample-Efficient Reinforcement Learning: Exploration, Imitation, and Online Learning. Many thanks to my reviewers Shie Mannor and Emilie Kaufmann, to the jury president Erwan Le Pennec, and to the examiners Claire Vernade and Aurélien Garivier — and, of course, to my advisors Éric Moulines and Gilles Stoltz.
May 01, 2025 Three papers accepted at ICML 2025: On Teacher Hacking in Language Model Distillation, Revisiting Non-Acyclic GFlowNets in Discrete Environments, and Finite-Sample Convergence Bounds for Trust Region Policy Optimization in Mean-Field Games.

selected publications

  1. ICLR
    Beyond Softmax and Entropy: Convergence Rates of Policy Gradients with f-SoftArgmax Parameterization & Coupled Regularization
    S. Labbi, D. Tiapkin, P. Mangold, and 1 more author
    In International Conference on Learning Representations (ICLR), 2026
  2. Software
    gfnx: Fast and Scalable Library for Generative Flow Networks in JAX
    D. Tiapkin, A. Agarkov, N. Morozov, and 4 more authors
    2025
  3. ICML
    On Teacher Hacking in Language Model Distillation
    D. Tiapkin, D. Calandriello, J. Ferret, and 4 more authors
    In International Conference on Machine Learning (ICML), 2025
  4. AISTATS
    Generative Flow Networks as Entropy-Regularized RL
    D. Tiapkin, N. Morozov, A. Naumov, and 1 more author
    In International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
    Oral presentation
  5. ICLR
    Demonstration-Regularized RL
    D. Tiapkin, D. Belomestny, D. Calandriello, and 5 more authors
    In International Conference on Learning Representations (ICLR), 2024