Cleaning and Preparing Data with Pandas (2 hours)
Manipulating data in the NCAA games data set.
Data cleaning is a critical step for any data science, machine learning, statistical, or analytics project. In this two-hour live online course, we'll cover the basics of pruning, cleaning, and formatting data through tasks like dataframe selection, filtering, outlier removal, coalescing blanks, and formatting data types. Afterwards, you will be prepared to handle more advanced areas in Pandas like data transformation, feature selection, and machine learning.
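As a rough illustration of the tasks named above (not the course's actual material), the following sketch applies type formatting, blank coalescing, outlier removal, and selection with filtering to a tiny table. The column names and values are invented for the example, and the 1.5 × IQR rule is just one common outlier heuristic, assumed here for demonstration.

```python
import pandas as pd

# Hypothetical game records; column names and values are invented for illustration.
games = pd.DataFrame({
    "team": ["A", "B", "C", "D", "E", "F"],
    "points": ["71", "65", "68", "70", "300", "66"],  # numbers stored as text
    "conference": ["East", None, "West", "East", "West", None],
})

# Formatting data types: convert the text column to numbers.
games["points"] = pd.to_numeric(games["points"])

# Coalescing blanks: replace missing conference names with a placeholder.
games["conference"] = games["conference"].fillna("Unknown")

# Outlier removal (assumed heuristic): keep rows within 1.5 * IQR of the quartiles.
q1 = games["points"].quantile(0.25)
q3 = games["points"].quantile(0.75)
iqr = q3 - q1
in_range = games["points"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
cleaned = games[in_range]

# Selection and filtering: keep two columns for rows matching a condition.
east = cleaned.loc[cleaned["conference"] == "East", ["team", "points"]]
print(east)
```

Whether an extreme value such as the 300-point row should be dropped or corrected depends on the data; the sketch drops it only to show the mechanics.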
What you'll learn—and how you can apply it
By the end of this hands-on course, you’ll understand:
- What constitutes data cleaning and why it is necessary.
- Techniques for dealing with missing values and outliers.
- When data should be modified versus removed.
And you’ll be able to:
- Take raw inputs and sanitize them for more sophisticated tasks.
- Strategize how to handle outliers, missing values, and bad data.
- Cast grungy values into proper data types, including freeform text, dates, and times.
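To give a sense of what casting messy values can look like, here is a minimal sketch under stated assumptions: the sample table, its column names, and the choice to turn unparseable entries into missing values (rather than raising an error) are all illustrative and not taken from the course.

```python
import pandas as pd

# Hypothetical raw inputs with inconsistent text, a non-numeric score, and a bad date.
raw = pd.DataFrame({
    "name": ["  alice ", "BOB", "cHaRlie"],
    "score": ["12", "n/a", " 7"],
    "played_on": ["2023-03-14", "not recorded", "2023-03-16"],
})

clean = pd.DataFrame({
    # Freeform text: trim whitespace and normalize capitalization.
    "name": raw["name"].str.strip().str.title(),
    # Numbers: entries that cannot be parsed become NaN instead of raising.
    "score": pd.to_numeric(raw["score"].str.strip(), errors="coerce"),
    # Dates: entries that do not match the assumed format become NaT.
    "played_on": pd.to_datetime(raw["played_on"], format="%Y-%m-%d", errors="coerce"),
})

print(clean.dtypes)
print(clean)
```

Coercing bad entries to missing values keeps the column usable as a numeric or datetime type, and leaves the decision about filling or dropping those rows as a separate step.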
This training is for you because...
- You’re a spreadsheet user looking for a better way to clean data.
- You work with data science professionals seeking more usable data.
- You want to become a data professional who can transform raw data into usable formats.
Prerequisites
- Basic Python proficiency (variables, loops, collections, operators, etc.)
- Basic Pandas proficiency is recommended, but not required
Jupyter Notebooks / Setup
- Get set up with Anaconda Notebooks. Watch this video to learn how.
- Or, if you prefer to use a Jupyter Notebook on your machine, you can install Anaconda Distribution (free) and open JupyterLab. Instructions can be found within the Get Started with Anaconda course or in our online documentation.
- Alternatively, you can use any preferred Jupyter Notebook service.
Recommended preparation
- Introduction to Python Programming on-demand course
- Introduction to Pandas for Data Analysis on-demand course
Recommended follow-up
- Introduction to SQL on-demand course
- Introduction to Data Visualization on-demand course
- Introduction to Machine Learning on-demand course
About the Instructor
Thomas Nield is the founder of Nield Consulting Group and Yawman Flight, as well as an instructor at the University of Southern California. He enjoys making technical content relatable and relevant to those unfamiliar with or intimidated by it. Thomas regularly teaches classes on data analysis, machine learning, mathematical optimization, and practical artificial intelligence. At USC he teaches AI System Safety, developing systematic approaches for identifying AI-related hazards in aviation and ground vehicles. He has authored three books, including Essential Math for Data Science (O’Reilly) and Getting Started with SQL (O’Reilly).
He is also the founder and inventor of Yawman Flight, a company developing universal handheld flight controls for flight simulation and unmanned aerial vehicles. You can find him at:
Cost: $49. An Anaconda Learning subscription is not required.
You may cancel your registration at any time before the course airs. Once the course is live, there are no refunds or credits. All registered users will receive a Zoom recording of the live course one day after the course airs.
Important info:
The tutorial will be conducted using Zoom Meetings. The name you used to register for the event must match the name you use when you log in to Zoom. If it will not, please email learning@anaconda.com to let us know.
All participants will have their microphones muted and cameras off upon entry to help minimize distractions during the live event. Support and Q&A will be conducted via the Chat function within Zoom.
Questions? Issues? Contact learning@anaconda.com.