Full Transcript
Learn how to import data in Python: https://www.datacamp.com/courses/importing-data-in-python-part-1

Welcome to Importing Data in Python! My name is Hugo Bowne-Anderson and I am a Data Scientist at DataCamp. As a Data Scientist, on a daily basis you will need to clean data, wrangle and munge it, visualize it, build predictive models and interpret these models. But before doing any of this, you will need to know how to get data into Python.

There are many common sources of data that you'll need to import into Python: (i) flat files such as .txts and .csvs; (ii) files native to other software, such as Excel spreadsheets, Stata, SAS and MATLAB files; (iii) relational databases such as SQLite & PostgreSQL; (iv) data from the world wide web; and (v) a special and essential case of this: pulling data from Application Programming Interfaces, also known as APIs, such as the Twitter streaming API, which allows us to stream real-time tweets. We'll cover all of these topics in this course.

First off, however, we're going to learn how to import basic text files, which we can broadly classify into 2 types of files: those containing plain text, such as the opening of Mark Twain's novel The Adventures of Huckleberry Finn, which you can see here; and those containing records, that is, table data, such as titanic.csv, in which each row is a unique passenger onboard and each column is a characteristic or feature, such as gender, cabin and 'survived or not'. The latter is known as a flat file and we'll come back to these in a minute.

In this section, we'll figure out how to read lines from a plain text file, so let's do it! To check out any plain text file, you can use Python's basic open() function to open a connection to the file.
To do so, you assign the filename to a variable as a string; pass the filename to the function open(), along with the argument mode='r', which makes sure that we can only read it (we wouldn't want to accidentally write to it!); and assign the text from the file to a variable text by applying the method read() to the connection to the file. After you do this, make sure that you close the connection to the file using the command file.close(). It's always best practice to clean while cooking! You can then print the file to console and check it out using the command print(text).

A brief side note: if you wanted to open a file in order to write to it, you would pass it the argument mode='w'. We won't use that in this course, as this is a course on Importing Data, but it is good to know.

You can avoid having to close the connection to the file by using a with statement. This allows you to create a context in which you can execute commands with the file open. Once out of this clause/context, the file is no longer open; the with statement relies on what is called a Context Manager to take care of this. What you're doing here is called 'binding' a variable in the context manager construct; while still within this construct, the variable file will be bound to the file object returned by open(filename, 'r'). It is best practice to use the with statement, as you never have to concern yourself with closing the files again.

In the following interactive coding sessions, you'll figure out how to print files to console; you'll also learn to print specific lines, which can be very useful for large files. Then we'll be back to discuss flat files, and then I'll show you how to use the Python package NumPy to make our job of importing flat files & numerical data a far easier beast to tame: enjoy!
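The steps described above can be sketched roughly as follows. This is a minimal illustration, not the code shown in the video: the file name sample.txt and its three lines of content are hypothetical stand-ins, written to a temporary folder so that the example runs on its own.

```python
import os
import tempfile

# Hypothetical sample file so the sketch is self-contained; in practice
# you would point filename at an existing plain text file.
sample = "first line\nsecond line\nthird line\n"
filename = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(filename, mode="w") as f:   # mode='w' opens the file for writing
    f.write(sample)

# Manual approach: open a connection, read the text, then close it yourself.
file = open(filename, mode="r")       # mode='r' opens the file read-only
text = file.read()
file.close()
print(text)
print(file.closed)                    # the connection is closed at this point

# with statement: the file is closed automatically on leaving the block.
with open(filename, mode="r") as file:
    first = file.readline()           # each call reads one more line
    second = file.readline()
print(first, end="")
print(file.closed)                    # closed again, without calling close()
```

Calling readline() repeatedly, rather than read(), is one way to look at only the first few lines, which helps when a file is too large to print in full.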
