Daniil Tiapkin
Research Scientist, Google DeepMind, Paris.
Hello! I am a Research Scientist at Google DeepMind in Paris, working on reinforcement learning and, more broadly, post-training of foundation models.
I defended my PhD in September 2025 at CMAP, École Polytechnique (Institut Polytechnique de Paris) and LMO, Université Paris-Saclay, under the supervision of Éric Moulines and Gilles Stoltz. My thesis, Sample-Efficient Reinforcement Learning: Exploration, Imitation, and Online Learning, is available online.
Before joining DeepMind full-time, I was a student researcher under the supervision of Mathieu Blondel, working on the distillation of language models (see On Teacher Hacking in Language Model Distillation).
Earlier, I did research at the HDI Lab at HSE University, where I also completed a Master’s degree in Applied Mathematics and Computer Science (the “Math of Machine Learning” program).
Research interests
- Reinforcement learning: exploration, sample efficiency, imitation, RLHF.
- Connections between amortized sampling and RL.
- Post-training of foundation models.
- Online learning and bandits.
news
| Date | News |
|---|---|
| Apr 01, 2026 | Beyond Softmax and Entropy: Convergence Rates of Policy Gradients with f-SoftArgmax Parameterization & Coupled Regularization accepted at ICLR 2026, and On Global Convergence Rates for Federated Softmax Policy Gradient under Heterogeneous Environments accepted at AISTATS 2026. |
| Feb 23, 2026 | I joined Google DeepMind Paris as a Research Scientist! |
| Nov 20, 2025 | Together with collaborators, I released gfnx, a fast and scalable JAX library for Generative Flow Networks (paper: arXiv:2511.16592). |
| Sep 16, 2025 | I defended my PhD at Institut Polytechnique de Paris! Thesis: Sample-Efficient Reinforcement Learning: Exploration, Imitation, and Online Learning. Many thanks to my reviewers Shie Mannor and Emilie Kaufmann, to the jury president Erwan Le Pennec, and to the examiners Claire Vernade and Aurélien Garivier — and, of course, to my advisors Éric Moulines and Gilles Stoltz. |
| May 01, 2025 | Three papers accepted at ICML 2025: On Teacher Hacking in Language Model Distillation, Revisiting Non-Acyclic GFlowNets in Discrete Environments, and Finite-Sample Convergence Bounds for Trust Region Policy Optimization in Mean-Field Games. |