Institute for Georgian Literature – Tbilisi State University – Georgia
13 Mar 2025
Query: ['poésie_nom', 10]
Result: poétique_adj 0.841
poème_nom 0.790
prose_nom 0.733
littérature_nom 0.715
poète_nom 0.704
poétique_nom 0.701
poésie_nom 0.700
anthologie_nom 0.695
littéraire_adj 0.655
sonnet_nom 0.651
(authentic data, Wikipedia model)
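Such a neighbor list can be reproduced with gensim's word2vec interface; a minimal sketch, assuming a model trained on lemma_POS tokens (the file name is hypothetical):

```python
from gensim.models import KeyedVectors

# Hypothetical path; any word2vec model trained on lemma_POS tokens
# (e.g. from the French Wikipedia) would work the same way.
wv = KeyedVectors.load("frwiki_lemma_pos.kv")

# Ten nearest neighbors of 'poésie_nom' by cosine similarity.
for word, score in wv.most_similar("poésie_nom", topn=10):
    print(f"{word}\t{score:.3f}")
```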
Query: ['prose_nom', 'littérature_nom']
Result: 0.511518681366
Query: ['poésie_nom', 'littérature_nom']
Result: 0.714615326722
(authentic data, Wikipedia model)
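The pairwise scores above are plain cosine similarities between the two word vectors; with the same (assumed) model:

```python
# Cosine similarity between two lemma_POS tokens (wv as loaded above).
print(wv.similarity("prose_nom", "littérature_nom"))   # ≈ 0.51
print(wv.similarity("poésie_nom", "littérature_nom"))  # ≈ 0.71
```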
Axis: [["bonheur", "joie"], # positive
["malheur", "tristesse"]] # negative
Query: ange
Result: 0.0875
Query: monstre
Result: -0.1407
(authentic data)
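The exact procedure behind these axis scores is not given here; one common construction, sketched below, takes the axis as the difference of the mean positive and mean negative vectors and scores each word by its cosine with that axis:

```python
import numpy as np

def axis_score(wv, positive, negative, word):
    """Cosine between a word vector and a semantic axis built as
    mean(positive vectors) - mean(negative vectors)."""
    pos = np.mean([wv[w] for w in positive], axis=0)
    neg = np.mean([wv[w] for w in negative], axis=0)
    axis = pos - neg
    v = wv[word]
    return float(v @ axis / (np.linalg.norm(v) * np.linalg.norm(axis)))

# The happiness/sadness axis from the example above.
print(axis_score(wv, ["bonheur", "joie"], ["malheur", "tristesse"], "ange"))     # positive score
print(axis_score(wv, ["bonheur", "joie"], ["malheur", "tristesse"], "monstre"))  # negative score
```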
## Topic and subgenre
Topic 10: detective, inspector, police. Distinctive of crime fiction (content & statistics; p < α = 0.01).
Topic 49: death, crime, to kill. Distinctive of crime fiction (content & statistics; p < α = 0.01).
Topic 47: door, room, to open.
Topic 26: beach, sand, sun. Distinctive of non-crime fiction (p < α = 0.001).
Topic 2: judge, prison, lawyer/attorney. Statistically significant (crime fiction): (1,4), (4,5), etc.
Topic 33: black, hair, eyes, wear, eye, face. Statistically significant: crime fiction all but (2,3); non-crime fiction (1,3), (2,5).
(top 50 topics, cosine/weighted)
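The test behind these p-values is not specified on the slide; a minimal sketch of one plausible group comparison, a Mann-Whitney U test on hypothetical per-novel topic scores:

```python
from scipy.stats import mannwhitneyu

# Hypothetical mean scores of one topic per novel, in the two groups.
crime_scores     = [0.12, 0.09, 0.15, 0.11, 0.13]
non_crime_scores = [0.03, 0.05, 0.02, 0.04, 0.06]

stat, p = mannwhitneyu(crime_scores, non_crime_scores, alternative="two-sided")
print(f"U = {stat}, p = {p:.4f}")  # distinctive of crime fiction if p < α
```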
(Each word has a score in each topic; here ordered by topic/rank)
(Each topic has a score in each document; ordered by document)
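Both matrices can be read directly off a fitted model; a toy sketch with gensim's LdaModel (corpus and topic count are illustrative only):

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Toy corpus; in practice each document would be a full novel as a bag of words.
docs = [["detective", "inspector", "police", "crime"],
        ["beach", "sand", "sun", "sea"],
        ["judge", "prison", "lawyer", "crime"]]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

lda = LdaModel(corpus, id2word=dictionary, num_topics=2, random_state=0)

# Word scores per topic (word-topic matrix, ordered by rank within each topic):
for t in range(lda.num_topics):
    print(t, lda.show_topic(t, topn=4))

# Topic scores per document (topic-document matrix, ordered by document):
for bow in corpus:
    print(lda.get_document_topics(bow, minimum_probability=0.0))
```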
“A topic model is a generative model for documents: it specifies a simple probabilistic procedure by which documents can be generated. To make a new document, one chooses a distribution over topics. Then, for each word in that document, one chooses a topic at random according to this distribution, and draws a word from that topic. Standard statistical techniques can be used to invert this process, inferring the set of topics that were responsible for generating a collection of documents.” (Steyvers and Griffiths 2006)
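The generative procedure in this quote can be written out in a few lines; a sketch with made-up topic-word distributions φ over a six-word vocabulary:

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["detective", "police", "beach", "sun", "door", "room"]
# Made-up topic-word distributions φ (each row sums to 1).
phi = np.array([[0.40, 0.40, 0.00, 0.00, 0.10, 0.10],   # a "crime" topic
                [0.00, 0.00, 0.50, 0.30, 0.10, 0.10]])  # a "beach" topic

def generate_document(n_words, alpha=(0.5, 0.5)):
    theta = rng.dirichlet(alpha)              # choose a distribution over topics
    words = []
    for _ in range(n_words):
        z = rng.choice(len(phi), p=theta)     # choose a topic for this word
        w = rng.choice(len(vocab), p=phi[z])  # draw a word from that topic
        words.append(vocab[w])
    return words

print(generate_document(8))
```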
“The computational problem of inferring the hidden topic structure from the documents is the problem of computing the posterior distribution, the conditional distribution of the hidden variables, given the documents.”
(David Blei, “Probabilistic Topic Models”, 2012)
p(Z, φ, θ | w, α, β)
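In this notation, Blei's posterior is simply the joint distribution over the hidden variables divided by the marginal probability of the observed words; the denominator is intractable to compute exactly, which is why approximate methods (Gibbs sampling, variational inference) are used in practice:

```latex
p(Z, \varphi, \theta \mid w, \alpha, \beta)
  = \frac{p(Z, \varphi, \theta, w \mid \alpha, \beta)}
         {p(w \mid \alpha, \beta)}
```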
(Figure: the topic mixture distributions of the model, showing several possible distributions over three topics.)
Introductory articles
* Blei, David M. (2012). “Probabilistic topic models”. In: Communications of the ACM, 55(4): 77–84. http://www.cs.princeton.edu/~blei/papers/Blei2012.pdf
* Steyvers, M. and Griffiths, T. (2006). “Probabilistic Topic Models”. In: Landauer, T. et al. (eds), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum.
Video lectures
* Jordan Boyd-Graber (2015). “Topic Models”. YouTube.com. https://www.youtube.com/watch?v=yK7nN3FcgUs
* David Blei (2009). “Topic Models”. Videolectures.net. http://videolectures.net/mlss09uk_blei_tm/