Course Hive

Python Tutorial: Learn Python For Data Science - Importing flat files with Python

4.0 (1)
15 learners

This course includes

  • 1.5 hours of video
  • Certificate of completion
  • Access on mobile and TV

Full Transcript

Learn how to import flat files with Python: https://www.datacamp.com/courses/importing-data-in-python-part-1

Now that you know how to import plain text files, we're going to look at flat files, such as titanic.csv, in which each row is a unique passenger onboard and each column is a feature or attribute, such as gender, cabin and 'survived or not'. It is essential for any budding data scientist to know precisely what the term flat file means: flat files are basic text files containing records, that is, table data, without structured relationships. This is in contrast to a relational database, for example, in which columns of distinct tables can be related. We'll get to these later.

To be even more precise, flat files consist of records, where by a record we mean a row of fields or attributes, each of which contains at most one item of information. In the flat file titanic.csv, each row or record is a unique passenger onboard and each column is a feature or attribute, such as name, gender and cabin.

It is also essential to note that a flat file can have a header, as titanic.csv does. A header is a row that occurs as the first row and describes the contents of the data columns, or states what the corresponding attributes or features in each column are. It will be important to know whether or not your file has a header, as it may alter your data import. The reason that flat files are so important in data science is that we data scientists really like to think in records, or rows of attributes.

Now you may have noticed that the file extension was .csv. You may be wondering what this is. Well, CSV is an acronym for comma-separated values and it means exactly what it says: the values in each row are separated by commas. Another common extension for a flat file is .txt, which means a text file. Values in flat files can be separated by characters or sequences of characters other than commas, such as a tab, and the character or characters in question are called a delimiter.
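To make the ideas of header and delimiter concrete, here is a minimal sketch using Python's standard csv module. The sample text and the helper read_flat_file are made up for illustration (they are not from the lesson); the data only imitates the shape of a file like titanic.csv.

```python
import csv
import io

# Hypothetical sample data standing in for a flat file such as titanic.csv.
CSV_TEXT = "Name,Sex,Cabin,Survived\nAlice,female,C85,1\nBob,male,,0\n"
# The same records, but tab-delimited instead of comma-delimited.
TSV_TEXT = CSV_TEXT.replace(",", "\t")


def read_flat_file(text, delimiter=",", has_header=True):
    """Parse delimited text and return (header, records).

    header is the first row when has_header is True, otherwise None.
    Every field comes back as a string; no type conversion is attempted.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if has_header:
        return rows[0], rows[1:]
    return None, rows


header, records = read_flat_file(CSV_TEXT)
tab_header, tab_records = read_flat_file(TSV_TEXT, delimiter="\t")

print(header)        # column names taken from the first row
print(records[0])    # the first record, one field per column
```

Note that if has_header is set to False for a file that does have a header, the header row is treated as just another record, which is why knowing whether the file has a header matters for the import.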
Here is an example of a tab-delimited file: the data consists of the famous MNIST digit recognition images, where each row contains the pixel values of a given image. Note that all fields in the MNIST data are numeric, while titanic.csv also contained strings.

How do we import such files? If they consist entirely of numbers and we want to store them as a NumPy array, we could use NumPy. If, instead, we want to store the data in a DataFrame, we could use pandas. Most of the time, you will use one of these options. In the rest of this chapter, you'll learn how to import flat files that contain only numerical data, such as the MNIST data, and how to import flat files that contain both numerical data and strings, such as titanic.csv. But first, let's get you to do a couple of quick multiple-choice questions to test your knowledge of flat files.
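As a preview, the two import routes mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the in-memory sample data is invented (a tiny stand-in for the MNIST and Titanic files), and in practice you would pass a file path rather than an io.StringIO object.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical tab-delimited, purely numeric data (stand-in for MNIST rows).
NUMERIC_TSV = "0\t12\t255\n1\t0\t34\n"
# Hypothetical comma-delimited data with a header and mixed types
# (stand-in for titanic.csv).
MIXED_CSV = "Name,Sex,Survived\nAlice,female,1\nBob,male,0\n"

# Numeric-only flat file -> NumPy array (values are parsed as floats by default).
digits = np.loadtxt(io.StringIO(NUMERIC_TSV), delimiter="\t")

# Mixed numeric/string flat file -> pandas DataFrame.
# read_csv treats the first row as the header unless told otherwise.
passengers = pd.read_csv(io.StringIO(MIXED_CSV))

print(digits.shape)
print(passengers.dtypes)
```

The split follows the lesson's rule of thumb: an array suits data where every field is a number, while a DataFrame keeps a separate type per column and so can hold strings alongside numbers.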
