CSV Files – Comma-Separated Values

What is CSV file?

A CSV is a comma-separated values file, which allows data to be saved in a tabular format. CSVs look like a garden-variety spreadsheet but with a . csv extension. CSV files can be used with most any spreadsheet program, such as Microsoft Excel or Google Spreadsheets.

Where Do CSV Files Come From?

CSV files are normally created by programs that handle large amounts of data. They are a convenient way to export data from spreadsheets and databases as well as import or use it in other programs. For example, you might export the results of a data mining program to a CSV file and then import that into a spreadsheet to analyze the data, generate graphs for a presentation, or prepare a report for publication.

How do you create a CSV file from Excel?

Convert an Excel spreadsheet into a comma separated value file to reduce the chance of import errors when uploading contacts

  1. In your Excel spreadsheet, click File.
  2. Click Save As.
  3. Click Browse to choose where you want to save your file.
  4. Select “CSV” from the “Save as type” drop-down menu.
  5. Click Save.

Parsing CSV Files With Python’s Built-in CSV Library

The csv library provides functionality to both read from and write to CSV files. Designed to work out of the box with Excel-generated CSV files, it is easily adapted to work with a variety of CSV formats. The csv library contains objects and other code to read, write, and process data from and to CSV files.

Read/Write CSV Files With csv

Reading from a CSV file is done using the reader object. The CSV file is opened as a text file with Python’s built-in open() function, which returns a file object. This is then passed to the reader, which does the heavy lifting.

CSV Regular Reader/Writer

import csv

with open('names.csv', 'r') as csv_file:
# create a reader to read CSV file content
csv_reader = csv.reader(csv_file)
#print(csv_reader)
#next(csv_reader)
# Open a new file to write new CSV file
with open('new_names.csv', 'w') as new_file:
# Create a CSV writer
csv_writer = csv.writer(new_file, delimiter='\t')
for line in csv_reader:
#print(line[2])
csv_writer.writerow(line)

Each row returned by the reader is a list of String elements containing the data found by removing the delimiters. The first row returned contains the column names, which is handled in a special way.

Read/Write CSV Files Into a Dictionary With csv

Rather than deal with a list of individual String elements, you can read CSV data directly into a dictionary (technically, an Ordered Dictionary) as well. In this way we can manipluate the field names more easily with DictReader/DictWriter.

import csv
with open('names.csv', 'r') as csv_file:
csv_reader = csv.DictReader(csv_file)
with open('new_names.csv', 'w') as new_file:
#fieldnames = ['first_name', 'last_name', 'email']
fieldnames = ['first_name', 'last_name']
# Create a CSV DictWriter
csv_writer = csv.DictWriter(new_file,
fieldnames=fieldnames, delimiter='\t')
# add header for the csv file
csv_writer.writeheader()
for line in csv_reader:
del line['email'] # remove the email field
csv_writer.writerow(line)

Where did the dictionary keys come from? The first line of the CSV file is assumed to contain the keys to use to build the dictionary. If you don’t have these in your CSV file, you should specify your own keys by setting the fieldnames optional parameter to a list containing them.

Optional Python CSV reader Parameters

The reader object can handle different styles of CSV files by specifying additional parameters, some of which are shown below:

  • delimiter specifies the character used to separate each field. The default is the comma (‘,’).
  • quotechar specifies the character used to surround fields that contain the delimiter character. The default is a double quote (‘ ” ‘).
  • escapechar specifies the character used to escape the delimiter character, in case quotes aren’t used. The default is no escape character.

In case a CSV file contains address field that contains comma in the data. At the same time we’re also using comma as the field delimiter. We have a problem then.

There are three different ways to handle this situation:

  • Use a different delimiter That way, the comma can safely be used in the data itself. You use the delimiter optional parameter to specify the new delimiter.
  • Wrap the data in quotes The special nature of your chosen delimiter is ignored in quoted strings. Therefore, you can specify the character used for quoting with the quotechar optional parameter. As long as that character also doesn’t appear in the data, you’re fine.
  • Escape the delimiter characters in the data Escape characters work just as they do in format strings, nullifying the interpretation of the character being escaped (in this case, the delimiter). If an escape character is used, it must be specified using the escapechar optional parameter.

Parsing CSV Files With the pandas Library

Of course, the Python CSV library isn’t the only game in town. Reading CSV files is possible in pandas as well. It is highly recommended if you have a lot of data to analyze.

pandas is an open-source Python library that provides high performance data analysis tools and easy to use data structures. pandas is available for all Python installations, but it is a key part of the Anaconda distribution and works extremely well in Jupyter notebooks to share data, code, analysis results, visualizations, and narrative text.

Reading CSV Files with read_csv()

You can use the read_csv() function to read a CSV file. We will try to read the same “names.csv” file as above. By default, the read_csv() method treats the values in the first row of a CSV file as column headers. However, you can pass your custom header names while reading a file via the read_csv() method.

import pandas as pd
col_names = ['First Name',
'Last Name',
'Email']
name_data = pd.read_csv(r'names.csv', names=col_names,
header=None)
print(name_data)

    First Name       Last Name                           Email
0   first_name       last_name                           email
1         John             Doe         john-doe@bogusemail.com
2         Mary  Smith-Robinson       maryjacobs@bogusemail.com
3         Dave           Smith        davesmith@bogusemail.com
4         Jane          Stuart       janestuart@bogusemail.com
5          Tom          Wright        tomwright@bogusemail.com
6        Steve        Robinson    steverobinson@bogusemail.com
7       Nicole          Jacobs     nicolejacobs@bogusemail.com
8         Jane          Wright       janewright@bogusemail.com
9         Jane             Doe          janedoe@bogusemail.com
10        Kurt          Wright       kurtwright@bogusemail.com
11        Kurt        Robinson     kurtrobinson@bogusemail.com
12        Jane         Jenkins      janejenkins@bogusemail.com
13        Neil        Robinson     neilrobinson@bogusemail.com
14         Tom       Patterson     tompatterson@bogusemail.com
15         Sam         Jenkins       samjenkins@bogusemail.com
16       Steve          Stuart      stevestuart@bogusemail.com
17      Maggie       Patterson  maggiepatterson@bogusemail.com
18      Maggie          Stuart     maggiestuart@bogusemail.com
19        Jane             Doe          janedoe@bogusemail.com
20       Steve       Patterson   stevepatterson@bogusemail.com
21        Dave           Smith        davesmith@bogusemail.com
22         Sam           Wilks         samwilks@bogusemail.com
23        Kurt       Jefferson    kurtjefferson@bogusemail.com
24         Sam          Stuart        samstuart@bogusemail.com
25        Jane          Stuart       janestuart@bogusemail.com
26        Dave           Davis        davedavis@bogusemail.com
27         Sam       Patterson     sampatterson@bogusemail.com
28         Tom       Jefferson     tomjefferson@bogusemail.com
29        Jane          Stuart       janestuart@bogusemail.com
30      Maggie       Jefferson  maggiejefferson@bogusemail.com
31        Mary           Wilks        marywilks@bogusemail.com
32        Neil       Patterson    neilpatterson@bogusemail.com
33       Corey           Davis       coreydavis@bogusemail.com
34       Steve          Jacobs      stevejacobs@bogusemail.com
35        Jane         Jenkins      janejenkins@bogusemail.com
36        John          Jacobs       johnjacobs@bogusemail.com
37        Neil           Smith        neilsmith@bogusemail.com
38       Corey           Wilks       coreywilks@bogusemail.com
39       Corey           Smith       coreysmith@bogusemail.com
40        Mary       Patterson    marypatterson@bogusemail.com
41        Jane          Stuart       janestuart@bogusemail.com
42      Travis          Arnold     travisarnold@bogusemail.com
43        John        Robinson     johnrobinson@bogusemail.com
44      Travis          Arnold     travisarnold@bogusemail.com

Writing CSV Files with to_csv()

The process of creating or writing a CSV file through Pandas can be a little more complicated than reading CSV, but it’s still relatively simple. We use the to_csv() function to perform this task. However, you have to create a Pandas DataFrame first, followed by writing that DataFrame to the CSV file.

Column names can also be specified via the keyword argument columns, as well as a different delimiter via the sep argument. Again, the default delimiter is a comma, ‘,’.

Here is a simple example showing how to export a DataFrame to a CSV file via to_csv():

import pandas as pd
city = pd.DataFrame([['Folsom', 'CA'],
['Seaside', 'OR'],
['Long Beach', 'WA']],
columns=['City', 'State'])
city.to_csv('city.csv')
print(city)

         City State
0      Folsom    CA
1     Seaside    OR
2  Long Beach    WA

In the above example, we have created a DataFrame named city. Subsequently, we have written that DataFrame to a file named “city.csv” using the to_csv() function. The resulting CSV file should have the above contents.