What is CSV file?
A CSV is a comma-separated values file, which allows data to be saved in a tabular format. CSVs look like a garden-variety spreadsheet but with a . csv extension. CSV files can be used with most any spreadsheet program, such as Microsoft Excel or Google Spreadsheets.
Where Do CSV Files Come From?
CSV files are normally created by programs that handle large amounts of data. They are a convenient way to export data from spreadsheets and databases as well as import or use it in other programs. For example, you might export the results of a data mining program to a CSV file and then import that into a spreadsheet to analyze the data, generate graphs for a presentation, or prepare a report for publication.
How do you create a CSV file from Excel?
Convert an Excel spreadsheet into a comma separated value file to reduce the chance of import errors when uploading contacts
- In your Excel spreadsheet, click File.
- Click Save As.
- Click Browse to choose where you want to save your file.
- Select “CSV” from the “Save as type” drop-down menu.
- Click Save.
Parsing CSV Files With Python’s Built-in CSV Library
The csv library provides functionality to both read from and write to CSV files. Designed to work out of the box with Excel-generated CSV files, it is easily adapted to work with a variety of CSV formats. The csv library contains objects and other code to read, write, and process data from and to CSV files.
Read/Write CSV Files With csv
Reading from a CSV file is done using the reader object. The CSV file is opened as a text file with Python’s built-in open() function, which returns a file object. This is then passed to the reader, which does the heavy lifting.
CSV Regular Reader/Writer
import csv
with open('names.csv', 'r') as csv_file:
# create a reader to read CSV file content
csv_reader = csv.reader(csv_file)
#print(csv_reader)
#next(csv_reader)
# Open a new file to write new CSV file
with open('new_names.csv', 'w') as new_file:
# Create a CSV writer
csv_writer = csv.writer(new_file, delimiter='\t')
for line in csv_reader:
#print(line[2])
csv_writer.writerow(line)
Each row returned by the reader is a list of String elements containing the data found by removing the delimiters. The first row returned contains the column names, which is handled in a special way.
Read/Write CSV Files Into a Dictionary With csv
Rather than deal with a list of individual String elements, you can read CSV data directly into a dictionary (technically, an Ordered Dictionary) as well. In this way we can manipluate the field names more easily with DictReader/DictWriter.
import csv
with open('names.csv', 'r') as csv_file:
csv_reader = csv.DictReader(csv_file)
with open('new_names.csv', 'w') as new_file:
#fieldnames = ['first_name', 'last_name', 'email']
fieldnames = ['first_name', 'last_name']
# Create a CSV DictWriter
csv_writer = csv.DictWriter(new_file,
fieldnames=fieldnames, delimiter='\t')
# add header for the csv file
csv_writer.writeheader()
for line in csv_reader:
del line['email'] # remove the email field
csv_writer.writerow(line)
Where did the dictionary keys come from? The first line of the CSV file is assumed to contain the keys to use to build the dictionary. If you don’t have these in your CSV file, you should specify your own keys by setting the fieldnames optional parameter to a list containing them.
Optional Python CSV reader Parameters
The reader object can handle different styles of CSV files by specifying additional parameters, some of which are shown below:
- delimiter specifies the character used to separate each field. The default is the comma (‘,’).
- quotechar specifies the character used to surround fields that contain the delimiter character. The default is a double quote (‘ ” ‘).
- escapechar specifies the character used to escape the delimiter character, in case quotes aren’t used. The default is no escape character.
In case a CSV file contains address field that contains comma in the data. At the same time we’re also using comma as the field delimiter. We have a problem then.
There are three different ways to handle this situation:
- Use a different delimiter That way, the comma can safely be used in the data itself. You use the delimiter optional parameter to specify the new delimiter.
- Wrap the data in quotes The special nature of your chosen delimiter is ignored in quoted strings. Therefore, you can specify the character used for quoting with the quotechar optional parameter. As long as that character also doesn’t appear in the data, you’re fine.
- Escape the delimiter characters in the data Escape characters work just as they do in format strings, nullifying the interpretation of the character being escaped (in this case, the delimiter). If an escape character is used, it must be specified using the escapechar optional parameter.
Parsing CSV Files With the pandas Library
Of course, the Python CSV library isn’t the only game in town. Reading CSV files is possible in pandas as well. It is highly recommended if you have a lot of data to analyze.
pandas is an open-source Python library that provides high performance data analysis tools and easy to use data structures. pandas is available for all Python installations, but it is a key part of the Anaconda distribution and works extremely well in Jupyter notebooks to share data, code, analysis results, visualizations, and narrative text.
Reading CSV Files with read_csv()
You can use the read_csv() function to read a CSV file. We will try to read the same “names.csv” file as above. By default, the read_csv() method treats the values in the first row of a CSV file as column headers. However, you can pass your custom header names while reading a file via the read_csv() method.
import pandas as pd
col_names = ['First Name',
'Last Name',
'Email']
name_data = pd.read_csv(r'names.csv', names=col_names,
header=None)
print(name_data)
First Name Last Name Email 0 first_name last_name email 1 John Doe john-doe@bogusemail.com 2 Mary Smith-Robinson maryjacobs@bogusemail.com 3 Dave Smith davesmith@bogusemail.com 4 Jane Stuart janestuart@bogusemail.com 5 Tom Wright tomwright@bogusemail.com 6 Steve Robinson steverobinson@bogusemail.com 7 Nicole Jacobs nicolejacobs@bogusemail.com 8 Jane Wright janewright@bogusemail.com 9 Jane Doe janedoe@bogusemail.com 10 Kurt Wright kurtwright@bogusemail.com 11 Kurt Robinson kurtrobinson@bogusemail.com 12 Jane Jenkins janejenkins@bogusemail.com 13 Neil Robinson neilrobinson@bogusemail.com 14 Tom Patterson tompatterson@bogusemail.com 15 Sam Jenkins samjenkins@bogusemail.com 16 Steve Stuart stevestuart@bogusemail.com 17 Maggie Patterson maggiepatterson@bogusemail.com 18 Maggie Stuart maggiestuart@bogusemail.com 19 Jane Doe janedoe@bogusemail.com 20 Steve Patterson stevepatterson@bogusemail.com 21 Dave Smith davesmith@bogusemail.com 22 Sam Wilks samwilks@bogusemail.com 23 Kurt Jefferson kurtjefferson@bogusemail.com 24 Sam Stuart samstuart@bogusemail.com 25 Jane Stuart janestuart@bogusemail.com 26 Dave Davis davedavis@bogusemail.com 27 Sam Patterson sampatterson@bogusemail.com 28 Tom Jefferson tomjefferson@bogusemail.com 29 Jane Stuart janestuart@bogusemail.com 30 Maggie Jefferson maggiejefferson@bogusemail.com 31 Mary Wilks marywilks@bogusemail.com 32 Neil Patterson neilpatterson@bogusemail.com 33 Corey Davis coreydavis@bogusemail.com 34 Steve Jacobs stevejacobs@bogusemail.com 35 Jane Jenkins janejenkins@bogusemail.com 36 John Jacobs johnjacobs@bogusemail.com 37 Neil Smith neilsmith@bogusemail.com 38 Corey Wilks coreywilks@bogusemail.com 39 Corey Smith coreysmith@bogusemail.com 40 Mary Patterson marypatterson@bogusemail.com 41 Jane Stuart janestuart@bogusemail.com 42 Travis Arnold travisarnold@bogusemail.com 43 John Robinson johnrobinson@bogusemail.com 44 Travis Arnold travisarnold@bogusemail.com
Writing CSV Files with to_csv()
The process of creating or writing a CSV file through Pandas can be a little more complicated than reading CSV, but it’s still relatively simple. We use the to_csv() function to perform this task. However, you have to create a Pandas DataFrame first, followed by writing that DataFrame to the CSV file.
Column names can also be specified via the keyword argument columns, as well as a different delimiter via the sep argument. Again, the default delimiter is a comma, ‘,’.
Here is a simple example showing how to export a DataFrame to a CSV file via to_csv():
import pandas as pd
city = pd.DataFrame([['Folsom', 'CA'],
['Seaside', 'OR'],
['Long Beach', 'WA']],
columns=['City', 'State'])
city.to_csv('city.csv')
print(city)
City State 0 Folsom CA 1 Seaside OR 2 Long Beach WA
In the above example, we have created a DataFrame named city. Subsequently, we have written that DataFrame to a file named “city.csv” using the to_csv() function. The resulting CSV file should have the above contents.