Daniil Tiapkin

Research Scientist, Google DeepMind, Paris.


Hello! I am a Research Scientist at Google DeepMind in Paris, working on reinforcement learning and post-training of foundation models.

I defended my PhD in September 2025 at CMAP, École Polytechnique (Institut Polytechnique de Paris) and LMO, Université Paris-Saclay, under the supervision of Éric Moulines and Gilles Stoltz. My thesis, Sample-Efficient Reinforcement Learning: Exploration, Imitation, and Online Learning, is available online.

Before joining DeepMind full-time, I was a student researcher under the supervision of Mathieu Blondel, working on distillation of language models — see On Teacher Hacking in Language Model Distillation.

Earlier, I did research at the HDI Lab at HSE University, where I also completed a Master’s degree in Applied Mathematics and Computer Science (program “Math of Machine Learning”).

research interests

  • Reinforcement learning: exploration, sample efficiency, imitation, RLHF.
  • Connections between amortized sampling and RL.
  • Post-training of foundation models.
  • Online learning and bandits.

news

Apr 01, 2026 Beyond Softmax and Entropy: Convergence Rates of Policy Gradients with f-SoftArgmax Parameterization & Coupled Regularization accepted at ICLR 2026, and On Global Convergence Rates for Federated Softmax Policy Gradient under Heterogeneous Environments accepted at AISTATS 2026.
Feb 23, 2026 I joined Google DeepMind Paris as a Research Scientist!
Nov 20, 2025 Together with collaborators, we released gfnx, a fast and scalable JAX library for Generative Flow Networks (paper: arXiv:2511.16592).
Sep 16, 2025 I defended my PhD at Institut Polytechnique de Paris! Thesis: Sample-Efficient Reinforcement Learning: Exploration, Imitation, and Online Learning. Many thanks to my reviewers Shie Mannor and Emilie Kaufmann, to the jury president Erwan Le Pennec, and to the examiners Claire Vernade and Aurélien Garivier — and, of course, to my advisors Éric Moulines and Gilles Stoltz.
May 01, 2025 Three papers accepted at ICML 2025: On Teacher Hacking in Language Model Distillation, Revisiting Non-Acyclic GFlowNets in Discrete Environments, and Finite-Sample Convergence Bounds for Trust Region Policy Optimization in Mean-Field Games.

selected publications

  1. ICLR
    Beyond Softmax and Entropy: Convergence Rates of Policy Gradients with f-SoftArgmax Parameterization & Coupled Regularization
    S. Labbi, D. Tiapkin, P. Mangold, and 1 more author
    In International Conference on Learning Representations (ICLR), 2026
  2. Software
    gfnx: Fast and Scalable Library for Generative Flow Networks in JAX
    D. Tiapkin, A. Agarkov, N. Morozov, and 4 more authors
    2025
  3. ICML
    On Teacher Hacking in Language Model Distillation
    D. Tiapkin, D. Calandriello, J. Ferret, and 4 more authors
    In International Conference on Machine Learning (ICML), 2025
  4. AISTATS
    Generative Flow Networks as Entropy-Regularized RL
    D. Tiapkin, N. Morozov, A. Naumov, and 1 more author
    In International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
    Oral presentation
  5. ICLR
    Demonstration-Regularized RL
    D. Tiapkin, D. Belomestny, D. Calandriello, and 5 more authors
    In International Conference on Learning Representations (ICLR), 2024