Data analytics with Python intermediate: importing and cleaning data
Time: 18:00 - 21:00
This course will be delivered online. See the ‘What is the course about?’ section in course details for more information.
Course Code: CADS12
Duration: 6 sessions (over 3 weeks)
What is the course about?
Data analysis requires clean, orderly data to work on. Sadly, most useful data comes from the real world, which is neither clean nor orderly. A great deal of an analyst’s time is spent bridging the gap between the two.
The process begins with getting the data into a form Python can work with. We look at various file formats (CSV, Excel, ARFF, Pickle, fixed-width, partially structured, XML, HTML, JSON) and provide a crash course in databases and web APIs for the non-specialist. Along the way we meet a variety of Python libraries. We also tackle the problem of extracting data from sources that were never intended to be used that way, such as web pages or log files.
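As a taste of what importing looks like, here is a minimal sketch using pandas (assumed to be installed). The sample strings are hypothetical stand-ins for real files, and the regular expression assumes a simple "date time LEVEL message" log layout. It sticks to readers that need no extra packages; pandas' Excel, XML and HTML readers may rely on optional dependencies such as openpyxl or lxml.

```python
import io

import pandas as pd

# Hypothetical in-memory samples standing in for real files.
csv_text = "name,age\nAda,36\nAlan,41\n"
fwf_text = "Ada   36\nAlan  41\n"
json_text = '[{"name": "Ada", "age": 36}, {"name": "Alan", "age": 41}]'
log_text = (
    "2024-01-05 10:15:02 ERROR disk full\n"
    "2024-01-05 10:16:40 INFO retrying\n"
)

# Delimited text: the most common case.
from_csv = pd.read_csv(io.StringIO(csv_text))

# Fixed-width text: columns are defined by character widths, not delimiters.
from_fwf = pd.read_fwf(io.StringIO(fwf_text), widths=[6, 2], names=["name", "age"])

# JSON: a list of records maps naturally onto rows.
from_json = pd.read_json(io.StringIO(json_text))

# Partially structured text such as a log file: pull fields out with a regex
# (this pattern assumes the hypothetical "date time LEVEL message" layout).
lines = pd.Series(log_text.splitlines())
from_log = lines.str.extract(
    r"^(?P<date>\S+) (?P<time>\S+) (?P<level>[A-Z]+) (?P<message>.*)$"
)

print(from_csv)
print(from_log)
```

All three structured readers produce the same two-row table; the log file needs more work because nothing in it was designed to be read as data.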
But this is only the first step; at this point your data is unlikely to be of much use to you. The second half of the course presents a menagerie of issues you may encounter and techniques for resolving them. We discuss detecting and fixing errors in your data, often using statistical methods; encoding data in a convenient way; and finally “sculpting” your data into a form that reflects how you will use it. These steps often raise subtle methodological questions, and there is not always a simple right way to proceed.
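To illustrate what "detecting and fixing errors using statistical methods" can mean, here is a small sketch with hypothetical temperature readings. It flags outliers with Tukey's interquartile-range rule (a common choice, not necessarily the one taught on the course), treats them as missing, and fills the gaps by linear interpolation.

```python
import numpy as np
import pandas as pd

# Hypothetical daily temperature readings: 95.0 looks like a data-entry error
# and one reading is missing.
readings = pd.Series([12.1, 12.8, 13.0, np.nan, 13.4, 95.0, 13.9, 14.2])

# Tukey's rule: flag values more than 1.5 IQRs outside the middle 50%.
# quantile() ignores missing values; comparisons with NaN evaluate to False.
q1, q3 = readings.quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (readings < q1 - 1.5 * iqr) | (readings > q3 + 1.5 * iqr)

# One possible fix: treat outliers as missing, then fill every gap by
# linear interpolation between neighbouring readings.
cleaned = readings.mask(is_outlier).interpolate()

print(pd.DataFrame({"raw": readings, "outlier": is_outlier, "cleaned": cleaned}))
```

Dropping the outlier, capping it at the fence, or keeping it with a flag would all be defensible alternatives; which one is right depends on how the cleaned data will be used.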
This is a live online course. You will need:
- An internet connection. The classes work best in Chrome.
- A computer with microphone and camera.
We will contact you with joining instructions before your course starts.
What will we cover?
• Extracting data from a wide variety of file types
• Querying a relational database
• Advanced Pandas and Numpy techniques
• Detecting and handling feature-level problems such as data corruption, human error and misunderstandings, outliers and inliers, and missing or redundant data
• Detecting and handling observation-level problems such as inconsistency, duplication and sample bias
• Using regression, interpolation and filters to transform messy or incomplete data
• Working with different levels of numerical precision and identifying problems related to numerical datatypes
• Using Pandas to change the dimensionality and granularity of your data
• Creating and managing hierarchical indices in Pandas
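Several of these topics fit together in one short workflow. The sketch below uses an in-memory SQLite database with a hypothetical sales table: it queries the database into pandas, coarsens the granularity with a group-by (which produces a hierarchical index), and then changes dimensionality by pivoting one index level into columns. Table and column names are illustrative only.

```python
import sqlite3

import pandas as pd

# An in-memory SQLite database holding a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (region TEXT, month TEXT, product TEXT, units INTEGER);
    INSERT INTO sales VALUES
        ('North', '2024-01', 'A', 10), ('North', '2024-01', 'B', 4),
        ('North', '2024-02', 'A', 7),  ('South', '2024-01', 'A', 5),
        ('South', '2024-02', 'B', 9),  ('South', '2024-02', 'A', 3);
    """
)

# Query the relational database straight into a DataFrame.
sales = pd.read_sql_query("SELECT region, month, product, units FROM sales", conn)
conn.close()

# Coarsen granularity: total units per region and month. The result is a
# Series with a two-level (hierarchical) index: region, then month.
by_region_month = sales.groupby(["region", "month"])["units"].sum()

# Select from the hierarchical index by its outer level...
north = by_region_month.loc["North"]

# ...or change dimensionality by pivoting the inner level into columns.
wide = by_region_month.unstack("month", fill_value=0)

print(by_region_month)
print(wide)
```

Going from one row per sale to one row per region and month discards the product detail, which is exactly the kind of decision about granularity that depends on the intended analysis.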
What will I achieve?
By the end of this course you should be able to...
• Import data into Python from a wide variety of common file formats
• Extract data from a database
• Use some advanced features from Pandas
• Describe and diagnose a wide variety of error types that real-life data can contain, and fix them
• Where multiple approaches to fixing a problem exist, describe the pros and cons of each and the implications for future analysis.
What level is the course and do I need any particular skills?
This is an intermediate course. You should already be confident with basic Python programming at the level of our Introduction to Python course, and you should have used the basic features of Numpy and Pandas; Introduction to Data Analytics with Python is ideal preparation for this.
You do not need any prior knowledge of statistics or analytical methods.
How will I be taught, and will there be any work outside the class?
Although the course has some theoretical underpinning, it is almost entirely practical, taught through demonstrations and hands-on programming and problem-solving activities.
Are there any other costs? Is there anything I need to bring?
There are no additional costs. Please bring a pen and paper for taking notes.
When I've finished, what course can I do next?
You might want to explore our Excel and Power BI courses in data analysis, such as Data analysis with Power BI, Excel analysing data (stage 1 & 2) and Introduction to DAX: data analysis expression for Power BI. Alternatively, you might find it beneficial to attend one of our maths courses, such as Probability and statistics for Data Analysis.
Rich is a programmer, writer and educator with a particular interest in creative practice. In his previous career he worked as a software developer in the City, first at a dot-com startup and later at a top-tier investment bank, where he worked mostly on trading floor systems and got to play with a wide range of languages and technologies. He now teaches coding and maths-related courses full time. Besides his work at City Lit, he also teaches at Central Saint Martins, the Architectural Association and The Photographers' Gallery, and is the author of two books about mathematics. His technical collaborations with artists have been shown at, among others, the Hayward Gallery, the V&A, the ICA and Camden Arts Centre. He has a BSc in Mathematics from the Open University, as well as a BA in English Literature and a PhD in Philosophy (both from Cardiff). He continues to teach a little philosophy and literature, especially where they intersect with his other interests, and as a partner in Minimum Labyrinth he has brought these ideas to wider audiences in collaboration with the Museum of London, the Barbican and various private sponsors.
Please note: We reserve the right to change our tutors from those advertised. This rarely happens, but if it does, we are unable to refund fees on that basis. Our tutors may have different teaching styles; however, we guarantee a consistent quality of teaching in all our courses.