HANDLE: Get a Grip on MALLET

poster / demo / art installation
Authorship
  1. David Lawrence Shepard

    University of California, Los Angeles (UCLA)

Work text

Large collections of digital text from various sources are now available to researchers and the public, and topic modeling is one of the most common approaches to mining that text. However, existing topic modeling tools are often inaccessible to less technically adept users: the most popular tools require either using the command line or writing code [1]. While a GUI [2] exists, researchers have expressed frustration with its limited feature set. HANDLE (Heuristic Analytical Digital Language Environment), developed at UCLA’s Scholarly Innovation Lab, is a more capable GUI for MALLET [3] that simplifies text importing, topic model building, visualization, and data export. Increasing the accessibility of text mining facilitates research at all levels, from undergraduate to faculty, by making it easier both to perform research and to communicate it to a wider audience.

One-step Importing

HANDLE removes one of the main stumbling blocks for first-time users by simplifying text import. First, it features a drag-and-drop interface for importing files, so students do not need to learn MALLET's command-line tools; both plain text and Word documents are supported. Second, a window shows the text as MALLET will understand it: punctuation is removed, and stopwords are displayed as struck-through text. Third, the defaults preserve accent marks and non-Latin characters, and more advanced users can adjust these settings. Fourth, stopwords are configurable: users can drag and drop stopword list files into the program and add new stopwords through a contextual menu.

Convenient Experimentation

Most users need to try several topic models with different settings before they find one that works, but MALLET is run from the command line, which leaves users retyping commands just to change a single setting. HANDLE addresses this with a graphical interface for creating topic models, which reduces the opportunities for error.
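To make concrete what a GUI saves users from retyping, here is a minimal sketch, not HANDLE's actual code, that builds MALLET commands from a stored settings record. The flags (`import-dir`, `--keep-sequence`, `--remove-stopwords`, `--stoplist-file`, `train-topics`, `--num-topics`, `--num-iterations`, `--optimize-interval`, `--random-seed`, `--output-topic-keys`, `--output-doc-topics`) are MALLET's own; the `bin/mallet` launcher path, the `ModelSettings` record, the helper function names, and the JSON project format are hypothetical choices made for illustration.

```python
import json
import shlex
from dataclasses import asdict, dataclass
from typing import List, Optional

# Hypothetical launcher path; adjust for your MALLET installation and OS.
MALLET = "bin/mallet"


@dataclass
class ModelSettings:
    """Hypothetical record of the settings used for one topic model."""
    name: str
    num_topics: int = 20
    num_iterations: int = 1000
    optimize_interval: int = 10
    random_seed: int = 1  # a fixed seed makes reruns reproducible


def import_command(corpus_dir: str, output: str,
                   stoplist: Optional[str] = None) -> List[str]:
    """Build an import-dir command that keeps token order and drops stopwords."""
    cmd = [MALLET, "import-dir", "--input", corpus_dir, "--output", output,
           "--keep-sequence", "--remove-stopwords"]
    if stoplist is not None:
        # --stoplist-file replaces MALLET's default English stoplist.
        cmd += ["--stoplist-file", stoplist]
    return cmd


def train_command(instances: str, s: ModelSettings) -> List[str]:
    """Build a train-topics command whose output files are named after the model."""
    return [MALLET, "train-topics",
            "--input", instances,
            "--num-topics", str(s.num_topics),
            "--num-iterations", str(s.num_iterations),
            "--optimize-interval", str(s.optimize_interval),
            "--random-seed", str(s.random_seed),
            "--output-topic-keys", f"{s.name}-keys.txt",
            "--output-doc-topics", f"{s.name}-doc-topics.txt"]


def save_history(history: List[ModelSettings]) -> str:
    """Serialize every model's settings so a project can be reopened later."""
    return json.dumps([asdict(s) for s in history], indent=2)


def load_history(text: str) -> List[ModelSettings]:
    """Rebuild the settings records from their JSON form."""
    return [ModelSettings(**d) for d in json.loads(text)]


if __name__ == "__main__":
    # Three variants that differ only in the number of topics.
    history = [ModelSettings(name=f"k{k}", num_topics=k) for k in (10, 20, 40)]
    print(shlex.join(import_command("texts", "corpus.mallet")))
    for s in history:
        print(shlex.join(train_command("corpus.mallet", s)))
    assert load_history(save_history(history)) == history
```

Keeping each model's settings as data rather than as typed commands is what makes side-by-side comparison and documentation cheap: the command for any past model can be regenerated exactly, and could be handed to `subprocess.run(cmd, check=True)` to execute it.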
HANDLE also lists every topic model a user has created and records the settings used for each, which makes models easier to compare and document. Users can save a project (a collection of documents and topic models) and return to it later.

Easy Visualization

HANDLE has integrated visualizations. For each topic model, an interface inspired by LDAViz [4] can be used to explore topic similarity and the top words in each topic. Another interface displays the topics that make up a specific document, along with their proportions. These visualizations can be exported as images for use in articles and papers.

Export Data into Familiar Tools

Because no tool can anticipate every use case, HANDLE can export its data in formats that are easy to manipulate. Summaries of the topics within each document, called “document topic reports” in MALLET, can be exported as spreadsheets. Topic model files can be exported for use with command-line MALLET or with HANDLE on other computers. HANDLE lets users focus on their data rather than on getting the tool to run, and work with that data in environments they already know.

HANDLE will be released as an open-source project on GitHub by summer 2020.
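The spreadsheet export of document topic reports can be approximated outside HANDLE. The sketch below, which is not HANDLE's implementation, converts a MALLET `--output-doc-topics` file into CSV and ranks the topics within one document, as a per-document view would. It assumes the dense layout written by MALLET 2.0.8 and later (document index, document name, then one tab-separated proportion per topic); older versions used a different, sparse layout and are not handled. The function names and the two-document sample are made up for illustration.

```python
import csv
import io
from typing import Iterable, List, TextIO, Tuple

Doc = Tuple[str, List[float]]


def parse_doc_topics(lines: Iterable[str]) -> List[Doc]:
    """Parse dense doc-topics lines: index, name, then one proportion per topic.

    Lines starting with '#' (a header in some MALLET versions) are skipped.
    """
    docs = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        docs.append((fields[1], [float(p) for p in fields[2:]]))
    return docs


def write_spreadsheet(docs: List[Doc], out: TextIO) -> None:
    """Write one row per document and one column per topic."""
    num_topics = max((len(props) for _, props in docs), default=0)
    writer = csv.writer(out)
    writer.writerow(["document"] + [f"topic_{k}" for k in range(num_topics)])
    for name, props in docs:
        writer.writerow([name] + props)


def top_topics(props: List[float], n: int = 3) -> List[Tuple[int, float]]:
    """Return the n largest (topic, proportion) pairs for one document."""
    return sorted(enumerate(props), key=lambda kp: kp[1], reverse=True)[:n]


if __name__ == "__main__":
    # Made-up two-document, three-topic sample in the assumed dense layout.
    sample = ["0\tletters.txt\t0.70\t0.20\t0.10\n",
              "1\tdiary.txt\t0.05\t0.15\t0.80\n"]
    docs = parse_doc_topics(sample)
    buf = io.StringIO()
    write_spreadsheet(docs, buf)
    print(buf.getvalue())
    print(top_topics(docs[1][1], 2))  # [(2, 0.8), (1, 0.15)]
```

Writing plain CSV keeps the output readable by any spreadsheet program, which matches the goal of letting users continue their analysis in tools they already know.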


Conference Info

In review

ADHO - 2020
"carrefours / intersections"

Hosted at Carleton University, Université d'Ottawa (University of Ottawa)

Ottawa, Ontario, Canada

July 20, 2020 - July 25, 2020

475 works by 1078 authors indexed

Conference cancelled due to coronavirus. Online conference held at https://hcommons.org/groups/dh2020/. Data for this conference were initially prepared and cleaned by May Ning.

Conference website: https://dh2020.adho.org/

References: https://dh2020.adho.org/abstracts/

Series: ADHO (15)

Organizers: ADHO