Introduction to the TXM content analysis platform

workshop / tutorial
Authorship
  1. Serge Heiden

    Project Manager of the TXM Platform Development


The objective of the “Introduction to TXM” tutorial is to introduce participants to the methodology of textometric content analysis (http://textometrie.ens-lyon.fr/?lang=en) through hands-on work with the TXM software on their own laptop computers. By the end of the tutorial, participants will be able to input their own textual corpora (Unicode-encoded raw texts or XML-tagged texts) into TXM and to analyze them with the panel of content analysis tools available: word pattern frequency lists, KWIC concordances and text browsing, a rich full-text search engine syntax (allowing the user to express various sequences of word forms, part-of-speech and lemma combinations, constrained by XML structures), statistical analysis of vocabulary specific to a sub-corpus, statistical collocation analysis, etc.
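To give a flavor of that search syntax: TXM's full-text search engine is based on the CWB Corpus Query Processor (CQP), whose queries match sequences of annotated tokens. The following is a minimal sketch, assuming English TreeTagger annotations exposed as `pos` and `lemma` properties and a sentence structure `s` (the exact property and structure names depend on how the corpus was imported); it matches a determiner, followed by any number of adjectives, followed by a singular or plural noun, within one sentence:

```
[pos="DT"] [pos="JJ"]* [pos="NN.?"] within s
```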

During the tutorial, each participant will install TXM (from http://sourceforge.net/projects/txm) and the TreeTagger lemmatizer (http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger) on her Windows, Mac or Linux laptop, and will leave the tutorial with a ready-to-use environment.

The tutorial will also introduce participants to the TXM community ecosystem (users mailing list and wiki, bug reports, etc.) and to the TXM portal server software (see, for example, http://portal.textometrie.org/demo) for online corpus distribution and analysis. Time permitting, TEI encoding aspects of corpora related to TXM could also be introduced, as well as the encoding and analysis of speech transcriptions or parallel corpora.

The tutorial will be taught in English for the first time at DH2013 (the TXM graphical user interface is already available in English), and will complement two accepted contributions introducing the TXM platform during the conference:

— “TXM Platform for analysis of TEI encoded textual sources” (#391, long paper);
— “TXM Portal: Providing Online Access to Textometric Corpus Analysis” (#399, poster with live demo).
Tutorial Instructor
Serge Heiden

Project manager of the TXM platform development (http://textometrie.ens-lyon.fr/spip.php?article9). S. Heiden develops the textometric content analysis methodology through the development of tools able to process richly encoded corpora. Working on the relation between analysis tools and XML-TEI encoded corpora, he is involved in the TEI Consortium's activities as convener of the TEI Tools SIG (http://www.tei-c.org/Activities/SIG/Tools).

Target audience and expected number of participants
The ideal number of participants is about 12-15 people; the maximum is about 20.

Each participant should come with her own laptop computer. The tutorial needs to run for at least a full day (*): typically a half day for TXM tool fundamentals, and a half day for the main corpus formats (TXT and XML) and input procedures into the platform.

(*) The regular TXM tutorials run for two days (one day for the TXM introduction, one day for corpus formatting and import into TXM).

Brief Outline
9am – 12pm
— Install & introduction: 45'
— TXM user interface & windows, corpus Description command
— Main tools: 2h15
  — Lexicon analysis & spreadsheet export
  — Index building for distributional semantics & Corpus Query Language syntax
  — Concordance & reading, Progression graphics
  — Partitions, Subcorpus & Specificity/Factorial analysis
  — Cooccurrence analysis
  — TXM portal demo (optional)
  — TXM community: mailing lists, web sites and documentation

1pm – 5pm
— TXM import strategy and main corpus formats (TXT-Unicode+CSV, XML+CSV, XML-TEI): 30'
— TXT-Unicode sample corpus and TXT+CSV import into TXM, sample analysis: 1h15
— Introduction to XML and to TXT2XML conversion tools: 30'
— XML sample corpus and XML/w+CSV import into TXM, sample analysis: 1h45
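To illustrate the kind of word-level markup targeted by the afternoon's XML/w import session, a minimal sketch of one corpus text might look like the following (the element and attribute names here are illustrative assumptions; TXM's import documentation defines the exact expectations):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<text>
  <p>
    <!-- each <w> element carries one word token with optional annotations -->
    <w pos="DT" lemma="the">The</w>
    <w pos="NN" lemma="platform">platform</w>
    <w pos="VBZ" lemma="run">runs</w>
    <w pos="RB" lemma="everywhere">everywhere</w>
  </p>
</text>
```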

Conference Info

ADHO - 2013
"Freedom to Explore"

Hosted at University of Nebraska–Lincoln

Lincoln, Nebraska, United States

July 16, 2013 - July 19, 2013

243 works by 575 authors indexed

XML available from https://github.com/elliewix/DHAnalysis (still needs to be added)

Conference website: http://dh2013.unl.edu/

Series: ADHO (8)

Organizers: ADHO

Tags
  • Keywords: None
  • Language: English
  • Topics: None