Cleaning and Exploring Your Data With Open Refine
Intersect Australia, Australia
Paul Arthur, University of Western Sydney
Locked Bag 1797
Penrith NSW 2751
Converted from a Word document
Pre-Conference Workshop and Tutorial (Round 2)
databases & dbms
This three-hour workshop introduces participants to Open Refine, which is a powerful tool for cleaning, normalization, and exploration of datasets. In this tutorial we’ll work through the various features of Refine, including importing data, faceting, clustering, and calling into remote APIs, by working on a fictional but plausible humanities research project.
Ingrid Mason (on behalf of Intersect Australia, tutorial instructors TBC)
eResearch, digital humanities
Area of Expertise
The eResearch analysts and software developers at Intersect Australia deliver training and tutorials to support uptake of ubiquitous and domain-specific technologies employed in data-intensive research. This training is regularly delivered to academics and higher degree students in its member universities in Australia. A common area of expertise shared across the team is data, information and technology management, informatics and data modelling, through to technical evaluation and software engineering to enable the development, application, and uptake of technologies in support of research.
This workshop is aimed at attendees who are accustomed to using ubiquitous tools on their desktop computer, e.g., Excel, and who are interested in building further skills to process their data. It covers tasks such as:
• Cleaning up messy data: If you have a text file with semi-structured data, you can edit it using facets, transformations, and clustering to make the data cleanly structured.
• Transformation of data: Converting data to different formats, and normalising and de-normalising data.
• Parsing of data from websites: Open Refine contains a URL fetch feature.
• Adding data to your dataset by fetching it from web services that return JSON: A good example of this is geocoding addresses to geographic coordinates.
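To give a feel for what clustering does when cleaning messy values, here is a minimal Python sketch of a key-collision approach: each value is reduced to a normalised "fingerprint", and values that share a fingerprint are grouped as likely variants of one another. This is an illustration written for this abstract, not Open Refine's own implementation; the function names `fingerprint` and `cluster` and the sample station names are assumptions made up for the example.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a string to a normalised key (hypothetical helper)."""
    # Fold accented characters to their closest ASCII form.
    folded = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Trim, lowercase, and drop punctuation.
    cleaned = re.sub(r"[^\w\s]", "", folded.strip().lower())
    # Sort and de-duplicate tokens so word order does not matter.
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values whose fingerprints collide; keep only real variants."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(set(g)) > 1]


# Made-up sample values, for illustration only.
names = [
    "Penrith Police Station",
    "penrith police station.",
    "Police Station, Penrith",
    "Parramatta Police Station",
]
clusters = cluster(names)
print(clusters)
```

In this sketch the three Penrith spellings end up in one group, which a person could then review and merge into a single preferred form.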
Attendees should bring a laptop on which they have administrator access (so that they can download and install software), and robust wireless must be available.
Open Refine is a powerful free tool for exploring, normalizing, and cleaning up datasets. In this 2.5-hour tutorial we’ll work through the various features of Refine, including importing data, faceting, clustering, and calling into remote APIs, by working on a fictional but plausible humanities research project. We will start with a research question in mind and use the features of Refine to gain insights and to find answers. The research question relates to NSW police stations—finding out what we can about where they are located, their heritage status, and the kinds of archival records that State Records NSW holds on them. During this course you will be shown how to get map coordinates by creating direct batch queries to Google Maps.
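The geocoding step mentioned above amounts to building one request URL per address and reading the coordinates out of the JSON that comes back. The following Python sketch shows that shape without making any network call. It assumes the Google Geocoding web service endpoint and a response with a `results[0].geometry.location` object; the helper names, the `YOUR_API_KEY` placeholder, and the sample response (with placeholder coordinates, not a real lookup) are all assumptions for illustration. It is not the workshop's own material.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint for the Google Geocoding web service.
GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address, api_key="YOUR_API_KEY"):
    """Build a request URL for one address (hypothetical helper)."""
    return GEOCODE_ENDPOINT + "?" + urlencode({"address": address, "key": api_key})


def extract_lat_lng(response_text):
    """Return (lat, lng) from a JSON response, or None if there is no match."""
    payload = json.loads(response_text)
    results = payload.get("results", [])
    if not results:
        return None
    location = results[0]["geometry"]["location"]
    return location["lat"], location["lng"]


# Placeholder response with made-up coordinates, standing in for a real reply.
sample_response = json.dumps(
    {"status": "OK",
     "results": [{"geometry": {"location": {"lat": -33.75, "lng": 150.69}}}]}
)

url = build_geocode_url("Penrith NSW 2751")
coords = extract_lat_lng(sample_response)
print(url)
print(coords)
```

A batch run would repeat these two steps for every address in a column, pausing between requests to respect the service's usage limits.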
Participants will learn
• How to install Open Refine.
• How to create a project.
• How to organise your data.
• Ways to explore your data.
• How to work with APIs.
• How to export a dataset.
For more information, see http://intersect.org.au/course-resources.
Hosted at Western Sydney University
June 29, 2015 - July 3, 2015
280 works by 609 authors indexed
Conference website: https://web.archive.org/web/20190121165412/http://dh2015.org/
Series: ADHO (10)