DIGSUM Workshop on Webscraping and Text Visualisation

On Friday, April 11, from 10:00 to 12:00 and from 13:00 to 15:00, DIGSUM hosts a methods workshop. It will be led by our invited guests Dalia Ortiz Pablo and Maria Skeppstedt, Research Engineers at the Centre for Digital Humanities and Social Sciences at Uppsala University (CDHU). The workshop will focus on methods for webscraping and text visualisation.

Seats are limited to 15.
This is an on-site event at Umeå University only.
Sign up [here]!

Time and location: 10:00 - 15:00, ULED.A.310 - Triple Helix [mazemap]

Webscraping using VKontakte scraper
Web scraping is a powerful tool for automating data collection, but building a functional scraper comes with its own challenges. In this hands-on workshop, we’ll take a deep dive into a custom-built web scraper, exploring its purpose, inner workings, and practical applications. Specifically, we will look at the development of the VK-scraper, an open-source data collection tool for the non-Western social media platform VKontakte.

We’ll start with a live demo to see the scraper in action, followed by a detailed breakdown of its architecture, including how it retrieves, processes, and stores data. A step-by-step code walkthrough will highlight key design choices, common pitfalls, and lessons learned during development.

Attendees will then get a chance to experiment with the scraper, modifying and testing it to extract new data. By the end of the session, participants will have a solid understanding of how this scraper works, how to customise it, and best practices for using it.

Text visualisation using the Word Rain technique
The second part of the workshop will explore methods developed at CDHU that give the user an overview of large text collections, along with the possibility to zoom in and access content at a more detailed level. We will focus on the Word Rain method, an improvement on the classic word cloud, which can be used for exploring and analysing texts.

We will describe the theory behind the Word Rain method, as well as show practical examples of how you can create word rains for your own texts. We will end by also showing some examples of our Topic Timelines visualization, where we move from the word-level of the Word Rain to the topic level.

Dalia Ortiz Pablo has a background in mathematics and data science. In 2022, she joined the CDHU as a research engineer and has since contributed to multidisciplinary projects in the humanities and social sciences. Her expertise lies in machine learning, computer vision, and web scraping.

Maria Skeppstedt received her PhD in Computer and Systems Sciences in 2015. After a few years of postdoctoral research in applied natural language processing, she has spent the past six years working within research infrastructures, developing tools for searching, processing, annotating, and visualizing terms and text. As a research engineer at CDHU, she specializes in creating and applying new visualization techniques for exploring large text collections.
