Course Hive
Python Pandas: Importing flat files

Python Tutorial: Learn Python For Data Science - Python Pandas: Importing flat files

4.0 (1)
15 learners

This course includes

  • 1.5 hours of video
  • Certificate of completion
  • Access on mobile and TV

Full Transcript

Learn how to import flat files using pandas: https://www.datacamp.com/courses/importing-data-in-python-part-1

Congrats! You're now able to import a bunch of different types of flat files into Python as NumPy arrays. Although arrays are incredibly powerful and serve a number of essential purposes, they cannot fulfil one of the most basic needs of a Data Scientist: to have ‘[two]-dimensional labeled data structure[s] with columns of potentially different types’ that you can easily perform a plethora of Data Sciencey type things on: manipulate, slice, reshape, groupby, join, merge, perform statistics in a missing-value-friendly manner, deal with time series.

The need for such a data structure, among other issues, prompted Wes McKinney to develop the pandas library for Python. Nothing speaks to the project of pandas more than the documentation itself: ‘Python has long been great for data munging and preparation, but less so for data analysis and modeling. pandas helps fill this gap, enabling you to carry out your entire data analysis workflow in Python without having to switch to a more domain specific language like R.’

The data structure most relevant to the data manipulation and analysis workflow that pandas offers is the dataframe, and it is the Pythonic analogue of R’s dataframe. As Hadley Wickham tweeted, "A matrix has rows and columns. A data frame has observations and variables." Manipulating dataframes in pandas can be useful in all steps of the data scientific method, from exploratory data analysis to data wrangling, preprocessing, building models and visualization.

Here we will see its great utility in importing flat files, even merely in the way that it deals with missing data, comments, and the many other issues that plague working data scientists. For all of these reasons, it is now standard and best practice in Data Science to use pandas to import flat files as dataframes.
Later in this course, we’ll see how many other types of data, whether they’re stored in relational databases, HDF5, MATLAB or Excel files, can easily be imported as dataframes.

To use pandas, you first need to import it. Then, if we wish to import a CSV, in the most basic case all we need to do is call the function read_csv() and supply it with a single argument, the name of the file. Having assigned the dataframe to the variable data, we can check the first 5 rows of the dataframe, including the header, with the command data.head(). We can also easily convert the dataframe to a NumPy array by using the dataframe attribute values.

Now it's your turn to play around importing flat files using Python. You'll get experience importing a flat file that is straightforward, and you'll also get experience importing a flat file that has a few issues, such as containing comments and strings that should be interpreted as missing values: have fun importing!
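
The steps just described can be sketched roughly as follows. This is a minimal illustration, not the code shown in the video: the file contents, the column names, the '#' comment character and the 'Nothing' missing-value marker are all made up, and the data lives in memory (via io.StringIO) so the sketch runs without a file on disk. With a real file you would pass its name to read_csv() instead.

```python
import io

import pandas as pd

# Hypothetical file contents, held in memory so no file on disk is needed.
clean_csv = io.StringIO(
    "name,age,city\n"
    "Ana,34,Lisbon\n"
    "Ben,28,Oslo\n"
    "Cal,45,Lima\n"
)

# Most basic case: a single argument (a filename, or a file-like object as here).
data = pd.read_csv(clean_csv)

# Show up to the first five rows, with the column header.
print(data.head())

# Convert the dataframe to a NumPy array through the values attribute.
data_array = data.values

# A messier hypothetical file: a line starting with '#' is a comment,
# and the string 'Nothing' stands in for a missing value.
messy_csv = io.StringIO(
    "# exported from a made-up survey tool\n"
    "name,age,city\n"
    "Ana,34,Lisbon\n"
    "Ben,Nothing,Oslo\n"
    "Cal,45,Nothing\n"
)

# comment= tells pandas which character starts a comment;
# na_values= lists extra strings to treat as missing.
messy = pd.read_csv(messy_csv, comment="#", na_values=["Nothing"])
print(messy)

# Count the missing entries per column.
print(messy.isna().sum())
```

The comment and na_values arguments are only two of the options read_csv() accepts; which ones you need depends on how your own file is laid out.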
