

However, the pandemic-related terminology is well covered. There is a high overlap in the frequently used terms across the first 13 months, which may indicate a narrow focus of reporting in certain periods. Our study shows that online media responded promptly to the pandemic with a large number of COVID-19-related articles. Finally, the findings indicate that the most influential entities have lower overlaps for the identified persons and higher overlaps for locations and institutions.
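The overlap between frequently used terms in two periods can be quantified with the Jaccard similarity, the measure the study applies to its most frequent terms. The following is a minimal Python sketch of that idea, not the authors' implementation: the function names `top_terms` and `jaccard`, the cutoff `k`, and the toy token lists are all assumptions made for illustration.

```python
from collections import Counter


def top_terms(tokens, k):
    """Return the set of the k most frequent tokens (hypothetical helper)."""
    return {term for term, _ in Counter(tokens).most_common(k)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; taken as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Invented toy data standing in for the tokenized articles of two months.
month_1 = "virus mask lockdown virus test mask virus lockdown".split()
month_2 = "vaccine virus mask vaccine test virus mask".split()

terms_1 = top_terms(month_1, 3)  # {"virus", "mask", "lockdown"}
terms_2 = top_terms(month_2, 3)  # {"vaccine", "virus", "mask"}
similarity = jaccard(terms_1, terms_2)
print(f"Jaccard similarity of top terms: {similarity:.2f}")
```

A value near 1 means the two periods share almost all of their most frequent terms; a value near 0 means the vocabulary has shifted. In practice the result depends heavily on preprocessing choices (lemmatization, stop-word removal, the cutoff `k`), none of which are specified here.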

Furthermore, there are high overlaps in the terminology used in all articles published during the pandemic with a slight shift in the pandemic-related terms between the first and the second wave. The results show there is no significant correlation between the number of articles and the number of new daily COVID-19 cases. Finally, we apply named entity recognition to extract the most frequent entities and track the dynamics of changes during the observed period. Next, we compare the occurrence of the pandemic-related terms during the two waves of the pandemic. Secondly, we analyze the content by extracting the most frequent terms and apply the Jaccard similarity.
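The correlation test between the number of published articles and the number of new daily cases could be sketched as follows. The text does not say which correlation coefficient was used, so Pearson's r is assumed here purely for illustration; the function name `pearson` and the daily counts are invented. The sketch computes only the coefficient and does not perform a significance test.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations, and the sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented daily counts for one week (not data from the study).
articles_per_day = [120, 135, 128, 140, 131, 125, 138]
new_cases_per_day = [10, 80, 45, 20, 95, 60, 30]

r = pearson(articles_per_day, new_cases_per_day)
print(f"Pearson r = {r:.3f}")
```

A coefficient close to zero is consistent with, but does not by itself establish, the absence of a significant relationship; a real analysis would also report a p-value and might prefer a rank-based coefficient for count data.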

Firstly, we test the correlations between the number of articles and the number of new daily COVID-19 cases. We collected a dataset of news articles published by Croatian online media during the first 13 months of the pandemic. The goal of this study is to perform a longitudinal analysis of the COVID-19 related content based on natural language processing methods. Infoveillance of online media during the COVID-19 pandemic is an important step toward a better understanding of crisis communication. Online media plays an important role in public health emergencies and serves as a communication platform.

The linguistic analysis of the extracted collocations shows, among other things, that a contrastive comparison can be used to identify the main characteristics and trends regarding lexical innovations, as well as to highlight their problematic aspects, e.g., when lexical innovations, particularly under the influence of foreign-language elements, also introduce changes in spelling and syntactic roles. The extracted data were compared with existing dictionaries of Slovene in terms of new vocabulary, typical collocations, and set phrases, as well as semantic shifts. The results of the analysis are shown to be of interest in the monitoring of lexical innovations in Slovene vocabulary and for updating dictionaries. This article presents a lexical analysis of data extracted for a specific collocation window from the Janes and Kres corpora of Slovene.

The linguistic analysis of the extracted collocations showed, among other things, that a comparative analysis makes it possible to identify the main characteristics and trends of lexical innovations and to detect problematic points where lexical innovations, especially under the influence of foreign-language elements, also introduce changes in spelling and syntactic role into Slovene. We analyzed the extracted data comparatively against current dictionaries of Slovene in terms of vocabulary not yet registered, entry into typical collocations and set phrases, and semantic changes. In this article we describe a lexical analysis of data extracted for a specific collocation window from the Janes and Kres corpora and present results that are of interest for monitoring lexical innovations in Slovene vocabulary and for updating it in dictionaries.

The results of the analysis show that heteronormativity persists to such an extent that it becomes naturalized, whereas any counterforce is seen as disruptive. Our analysis is mainly qualitative and based on content analysis, followed by a critical discourse analysis examining how much control one social group imposes on another and how it tries to limit the freedom of other people's actions, using the concepts of "normal" and "natural" as key aspects in forming gender and sexual identities.

The dataset for our study was extracted from the Janes corpus of Slovenian user-generated content, which contains almost 215 million tokens from Slovenian blog posts, forum messages, news comments, tweets, etc., and is richly annotated with socio-demographic and linguistic metadata. Since Twitter, as one of the main social networking platforms, plays an important part in forming gender and sexual identities, the aim of this study was to perform a corpus analysis of Twitter discourse pertaining to the LGBTQ+ community in Slovenia.
