Cleaning Spreadsheet Data with OpenRefine

Got messy spreadsheets? OpenRefine is a powerful, free, open-source tool for cleaning and transforming data in a way that is easy to reproduce. This class is aimed at people who need to clean messy data, including spreadsheets of survey responses, patient encounters, financial records, or workshop attendance. We will cover the basics of cleaning data in OpenRefine and go over some more advanced techniques, including pulling in additional data from an API. If you want something more powerful than Excel but don't want to spend the time learning a programming language like R or Python, OpenRefine could be the perfect tool for you!

Learning Objectives

By the end of the class, learners should be able to:

  • Explain how OpenRefine works on their computer
  • Use OpenRefine to:
    • Facet data to find typos and errors
    • Cluster data to easily correct typos at scale
    • Split data into multiple columns
    • Pull in additional data from an API
    • Transform wide data into tidy data format 
  • Export their cleaned data in a variety of formats
  • Save their cleaning scripts so they can be re-used

Prerequisites / Preparation

Please complete the following tasks before coming to class:

Instructor

Ariel Deardorff, Data Services Librarian, UCSF Library

Dial-In Information

This class will be held online via Zoom. The Zoom link will be sent out a week in advance.

Tuesday, June 16 at 9:30am to 11:00am

Virtual Event

Event Type

Class/Info Session

Audience

Students, Postdocs, Faculty, Staff

Location

Online

Tags

Data Science Initiative, data management, data sharing, open refine

Website

https://calendars.library.ucsf.edu/ev...

Cost

Free

Department/Group

UCSF Library

Contact Info

ariel.deardorff@ucsf.edu

This event requires registration.