Arianne, 6 October 2025 at 15:27

2025-10-06T15:27:06Z

← Previous revision		Revision of 6 October 2025 at 11:27
Line 2:		Line 2:

	== Definition ==		== Definition ==
	~~XXXXXXXXX~~		'''[[Algorithme|Algorithm]]''' for '''[[apprentissage par renforcement|reinforcement learning]]''' that improves the efficiency of '''[[entraînement|training]]''' and the performance of '''[[Grand modèle de langues (GML)|large language models]]''' by using sequence-level (?) importance ratios and operations.

	== French ==		== French ==
Line 12:		Line 12:
	'''GSPO'''		'''GSPO'''

	~~A new reinforcement learning algorithm for training large language models that addresses critical stability issues in existing methods. Current state-of-the-art algorithms like GRPO exhibit severe stability issues when training gigantic language model that can lead to catastrophic model collapse. GSPO resolves these issues by performing optimization at the sequence level rather than the token level, leading to more stable and efficient training.~~		''Reinforcement learning algorithm that improves training efficiency and performance of large language models by using sequence-level importance ratios and operations.''

	~~GSPO, a solution to improve the stability of current RL training methods for large language models. By aligning the optimization approach with the sequence-level nature of rewards and avoiding problematic token-level importance weighting, GSPO provides a more stable and efficient foundation for scaling RL training.~~

	== Source ==		== Source ==

Pitpitt: Page created with « ==en construction== == Définition == XXXXXXXXX == Français == ''' XXXXXXXXX ''' == Anglais == '''Group Sequence Policy Optimization''' '''GSPO''' A new reinforcement learning algorithm for training large language models that addresses critical stability issues in existing methods. Current state-of-the-art algorithms like GRPO exhibit severe stability issues when training gigantic language model that can lead to catastrophic model collapse. GSPO resolves t... »

2025-09-04T13:47:52Z

New page

==under construction==

== Definition ==
XXXXXXXXX

== French ==
''' XXXXXXXXX '''

== English ==
'''Group Sequence Policy Optimization'''

'''GSPO'''

A new reinforcement learning algorithm for training large language models that addresses critical stability issues in existing methods. State-of-the-art algorithms such as GRPO can become severely unstable when training very large language models, which can lead to catastrophic model collapse. GSPO resolves these issues by performing optimization at the sequence level rather than the token level, leading to more stable and efficient training.
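
The difference between token-level and sequence-level optimization can be written out as importance ratios. The notation here is an assumption added for illustration and is not given on this page: $x$ is a prompt, $y_i$ is the $i$-th sampled response with $|y_i|$ tokens, and $\pi_\theta$ and $\pi_{\theta_{\mathrm{old}}}$ are the current and sampling policies. The length-normalized form of the sequence ratio should be checked against the linked source before being relied on.

```latex
% Token-level ratio (GRPO-style): one weight per token t of response y_i
\[
w_{i,t}(\theta) =
  \frac{\pi_\theta(y_{i,t} \mid x,\, y_{i,<t})}
       {\pi_{\theta_{\mathrm{old}}}(y_{i,t} \mid x,\, y_{i,<t})}
\]

% Sequence-level ratio (GSPO-style): one weight per response,
% i.e. the geometric mean of the token-level ratios
\[
s_i(\theta) =
  \left(
    \frac{\pi_\theta(y_i \mid x)}{\pi_{\theta_{\mathrm{old}}}(y_i \mid x)}
  \right)^{1/|y_i|}
  = \exp\!\left(
      \frac{1}{|y_i|} \sum_{t=1}^{|y_i|}
      \log \frac{\pi_\theta(y_{i,t} \mid x,\, y_{i,<t})}
                {\pi_{\theta_{\mathrm{old}}}(y_{i,t} \mid x,\, y_{i,<t})}
    \right)
\]
```

Because the reward is assigned to the whole response, a single weight per response matches the unit at which the reward is defined.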

GSPO is a solution for improving the stability of current RL training methods for large language models. By aligning the optimization approach with the sequence-level nature of rewards and avoiding problematic token-level importance weighting, GSPO provides a more stable and efficient foundation for scaling RL training.
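
A minimal Python sketch of the idea follows, under stated assumptions. It takes per-token log-probabilities under the old and new policies, computes one length-normalized importance ratio per response, and applies a clipped objective per response instead of per token. All function names, the group-normalized advantage, and the clipping range `eps` are hypothetical choices made for illustration; they are not taken from this page, and this is not the reference implementation.

```python
import math


def token_ratios(logp_new, logp_old):
    """Token-level importance ratios: one weight per token."""
    return [math.exp(n - o) for n, o in zip(logp_new, logp_old)]


def sequence_ratio(logp_new, logp_old):
    """Sequence-level ratio: exp of the mean per-token log-ratio.

    Assumption: the ratio is length-normalized, i.e. it is the
    geometric mean of the token-level ratios.
    """
    if len(logp_new) != len(logp_old) or not logp_new:
        raise ValueError("log-prob lists must be non-empty and equal length")
    diffs = [n - o for n, o in zip(logp_new, logp_old)]
    return math.exp(sum(diffs) / len(diffs))


def group_advantages(rewards):
    """Rewards normalized within the group of sampled responses."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + 1e-8) for r in rewards]


def clipped_sequence_objective(ratios, advantages, eps):
    """Mean over responses of min(s * A, clip(s, 1 - eps, 1 + eps) * A)."""
    terms = []
    for s, a in zip(ratios, advantages):
        clipped = min(max(s, 1.0 - eps), 1.0 + eps)
        terms.append(min(s * a, clipped * a))
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Hypothetical log-probabilities for a group of two responses.
    logp_old = [[-1.0, -2.0, -0.5], [-0.7, -1.2]]
    logp_new = [[-0.9, -2.3, -0.5], [-0.6, -1.0]]
    rewards = [1.0, 0.0]

    ratios = [sequence_ratio(n, o) for n, o in zip(logp_new, logp_old)]
    advantages = group_advantages(rewards)
    # eps=0.2 is an illustrative value only.
    print("token-level ratios:", [token_ratios(n, o) for n, o in zip(logp_new, logp_old)])
    print("sequence-level ratios:", ratios)
    print("objective:", clipped_sequence_objective(ratios, advantages, eps=0.2))
```

In this sketch every token of a response shares the same weight and is clipped together, so one outlying token ratio cannot dominate the update for that response.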

== Source ==

[https://huggingface.co/papers/2507.18071 Source: Hugging Face]

[[Catégorie:vocabulary]]

Group Sequence Policy Optimization - Revision history
