MFO/LERMA Seminar: ‘Building and Mining Corpora for Social Media Discourse Analysis’


Wednesday 1 June 2022
15:00-16:30

‘Channels of Digital Scholarship’ Seminar I: New tools and old questions in the analysis of textual corpora

Seminar organised by the Maison Française d’Oxford (UAR 3129, UMIFRE 11) with the support of LERMA (UR 853)

Online round table on Zoom: ‘Building and Mining Corpora for Social Media Discourse Analysis’

Wednesday 1 June 2022, 15:00-16:30

Organiser and discussant: Grégoire Lacaze (AMU, LERMA / MFO)

Invited speakers:
Bernie Hogan (Senior Research Fellow, Oxford Internet Institute, University of Oxford): “Theorising and integrating platform signals into digital text corpora”

Gudrun Ledegen (Université Rennes 2, Laboratoire PREFICS): “Suicide prevention chat, quantitative and qualitative description of a discourse genre for better listening”

Abstracts of the talks: https://mfo.web.ox.ac.uk/event/channels-digital-scholarship-seminar-i-new-tools-and-old-questions-2

Zoom registration: https://us06web.zoom.us/meeting/register/tZ0oduuqrj8jGtaykEVmZnxXQHT96nCArhR0

Video of the seminar session:


Framing statement:

Social media discourse analysis raises the topical question of how to build a corpus of digital posts. A central issue in this process is deciding where the boundaries of a corpus should lie. In this round table, we will discuss how much data, and which types of data, should be selected when building a corpus.

A key property of social media platforms is that they are open environments: new posts and comments can be added at any time, with no time limit. This strongly affects the singularity of any corpus compiled at a given moment, since it captures the platform only as it stood at that time.

We will also address the reproducibility of such data in the light of the FAIR principles (Findable, Accessible, Interoperable, Reusable). Once corpora have been compiled, they must be stored in secure, permanent repositories, which underlines the importance of open data for long-term analyses.

Once built, corpora can be analysed with data-mining techniques. Several approaches and methodologies will be presented, some of them based on deep learning with neural networks. Digital corpora naturally call for digital analysis tools, and algorithms and software such as the open-source Iramuteq will be demonstrated.

A recurring question in corpus building is the dichotomy between qualitative and quantitative analysis.