Pandas is a Python library for loading and manipulating tabular data, which makes it very useful for machine learning. It is also recommended in Google’s ML course.

A quick introduction to Pandas can be found here.

A complete tutorial of Pandas from Data School can be found here.

The basic data structures in Pandas are DataFrame and Series.
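
As a minimal sketch of how the two relate: a DataFrame is a table, and each of its columns is a Series. The column names and values below are made up for illustration.

```python
import pandas as pd

# Hypothetical data, for illustration only
city_names = pd.Series(['San Francisco', 'San Jose', 'Sacramento'])
population = pd.Series([852469, 1015785, 485199])

# A DataFrame can be built from a dict that maps column names to Series
cities = pd.DataFrame({'city_name': city_names, 'population': population})

# Selecting a single column gives back a Series
population_column = cities['population']
print(type(cities).__name__)
print(type(population_column).__name__)
```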

Basic Operations

import pandas as pd
# Read in .csv file
california_housing_dataframe = pd.read_csv(
    "", sep=",")
# Read in .xlsx file
rssi_dataframe = pd.read_excel(file_path)
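# Only show the head of the table
california_housing_dataframe.head()
# A quick description of the data
california_housing_dataframe.describe()
# Histogram of certain feature
california_housing_dataframe.hist('housing_median_age')
# Value counts
df['col_name'].value_counts()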
# Remove rows (keep only the rows where 'col_name' is not 'Tina')
df[df['col_name'] != 'Tina']
# Reset index
df = df.reset_index(drop=True)
# Create a new dataframe
df = pd.DataFrame(data = some_list, columns = ['col_name'])
df = pd.DataFrame(data = some_dict)
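
The row removal and index reset above work together: filtering keeps the original index labels, so the index has gaps until it is reset. A small sketch, assuming a hypothetical frame with 'name' and 'age' columns:

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({'name': ['Jason', 'Tina', 'Amy'], 'age': [42, 36, 24]})

# Keep only the rows whose 'name' is not 'Tina'; the surviving rows keep
# their original index labels
filtered = df[df['name'] != 'Tina']
labels_before = filtered.index.tolist()

# drop=True discards the old labels instead of adding them as a new column
filtered = filtered.reset_index(drop=True)
labels_after = filtered.index.tolist()

# value_counts() tallies how often each value occurs in a column
counts = df['name'].value_counts()
```

Without `drop=True`, `reset_index` would keep the old labels in a new column named `index`.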


# Return element by index (absolute position)
some_value = california_housing_dataframe.iloc[i]['housing_median_age']
# Return element by index label (original index)
some_value = california_housing_dataframe.loc[i]['housing_median_age']
# Check diff of consecutive rows (df.diff is not very convenient)
# index_list is assumed to hold the index labels of df (default integer index)
index_list = df.index.tolist()
for index, row in df.iterrows():
  if index == index_list[-1]:
    # Reached the last row: there is no next row, so stop
    break
  this_time = row['time']
  next_time = df.loc[index + 1, 'time']
  time_diff = (next_time - this_time).total_seconds()
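
The difference between `.iloc` and `.loc` only shows up when the index labels differ from the row positions, for example after filtering. A minimal sketch, assuming a hypothetical frame whose labels are 10, 20, 30:

```python
import pandas as pd

# Hypothetical data with non-default index labels, for illustration only
df = pd.DataFrame({'housing_median_age': [15.0, 19.0, 17.0]},
                  index=[10, 20, 30])

# .iloc selects by absolute position: position 1 is the second row
by_position = df.iloc[1]['housing_median_age']

# .loc selects by index label: label 10 is the first row
by_label = df.loc[10]['housing_median_age']

# Passing row and column in one call avoids chained indexing
by_label_direct = df.loc[10, 'housing_median_age']
```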

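One possible alternative to looping over rows, if a whole column of differences is acceptable, is to line each row up with the next one using `shift`. This is a sketch with made-up timestamps, not a drop-in replacement for the loop above:

```python
import pandas as pd

# Hypothetical timestamps, for illustration only
df = pd.DataFrame({'time': pd.to_datetime([
    '2013-12-12 10:00:00',
    '2013-12-12 10:00:30',
    '2013-12-12 10:02:00',
])})

# shift(-1) moves each value up one row, so every row sees the next row's time
next_time = df['time'].shift(-1)

# Seconds until the next row; the last row has no successor and becomes NaN
df['time_diff'] = (next_time - df['time']).dt.total_seconds()
```

The last row has nothing to compare against, so its `time_diff` is NaN rather than being skipped.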

# Create datetime object
place_time = pd.to_datetime('2013-12-12 10:00:00')
# Time diff
t1 = pd.to_datetime('1/1/2015 01:00')
t2 = pd.to_datetime('1/1/2015 03:30')
print((t2 - t1).total_seconds() / 3600.0)
# Get day of datetime
# Add new column based on two columns (use of lambda)
import numpy as np
df['delta'] = df.apply(lambda row: pd.Timedelta(row['stop'] - row['start']) /
                       np.timedelta64(1, 's'), axis=1)
## Get seconds of the day from datetime
df['seconds_of_day'] = (df['datetime'] -