The purpose of this article is twofold. First, we use the next-token probabilities given by a language model to explicitly define a category of texts in natural language enriched over the unit interval, in the sense of Bradley, Terilla, and Vlassopoulos. We explicitly consider the terminating conditions for text generation and determine when the enrichment itself can be interpreted as a probability distribution over texts. Second, we compute the Möbius function and the magnitude of an associated generalized metric space of texts. The magnitude function of that space is a sum, over texts (prompts), of the t-logarithmic (Tsallis) entropies of the next-token probability distributions associated with each prompt, plus the cardinality of the model's possible outputs. A suitable evaluation of the derivative of the magnitude function recovers a sum of Shannon entropies, which justifies viewing magnitude as a partition function. Following Leinster and Shulman, we also express the magnitude function of the generalized metric space as an Euler characteristic of magnitude homology and give an explicit description of the zeroth and first magnitude homology groups.
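As an illustrative sketch of the entropies referred to above (the notation for prompts x, next-token distributions p(.|x), and the output set O is ours, not the paper's), recall the standard facts about the t-logarithmic (Tsallis) entropy of a finite distribution p, for t > 0 and t ≠ 1:
\[
  S_t(p) \;=\; \sum_i p_i \,\ln_t\!\frac{1}{p_i} \;=\; \frac{1-\sum_i p_i^{\,t}}{t-1},
  \qquad
  \lim_{t\to 1} S_t(p) \;=\; -\sum_i p_i \ln p_i \;=\; H(p),
  \qquad
  \frac{d}{dt}\Big|_{t=1}\sum_i p_i^{\,t} \;=\; -H(p),
\]
where \(\ln_t\) is the t-logarithm and \(H\) the Shannon entropy. Schematically, a magnitude function of the shape \(|O| + \sum_{x} S_t\bigl(p(\cdot\mid x)\bigr)\) therefore admits extraction of the Shannon entropies \(H\bigl(p(\cdot\mid x)\bigr)\) by a limit or a derivative in t; the precise evaluation used is specified in the paper.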
Keywords: categorical magnitude, language model, generalized metric space, entropy
2020 MSC: 18D20; 68T50; 94A17
Theory and Applications of Categories, Vol. 44, 2025, No. 37, pp 1256-1281.
Published 2025-11-10.