I’m learning the Pandas library for Python. Here are some note.
Series
- General form:
pandas.Series(data=None, index=None)
- Example:
pandas.Series(['big', 'small', 'red', 42])
- Example:
pandas.Series(data=['big', 'small', 'red', 42], index=['desired_wealth', 'actual_wealth', 'status', 'answer'])
- Getting data out of a Series:
x = series['status']
- Getting multiple data out:
y = series[['status','answer']]
- Get a sub-series, where the value is transformed (in this case: multiplied by 3). Returns a series:
z = seriesName * 3
- Get sub-series from a series where value meets a condition (in this case: != 3). Returns a series:
zz = seriesName[seriesName != 3]
DataFrame
- General form:
pandas.DataFrame(data=None)
- Example:
pandas.DataFrame({ 'year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012], 'team': ['Bears', 'Bears', 'Bears', 'Packers', 'Packers', 'Lions', 'Lions', 'Lions'], 'wins': [11, 8, 10, 15, 11, 6, 10, 4], 'losses': [5, 8, 6, 1, 5, 10, 6, 12]})
- Functions/selectors you can call on a DataFrame named d:
- d.dtypes - what is the data type of each column
- d.describe() - show some aggregated metrics about the values in the data frame
- d.head() - show the first few rows
- d.tail() - show the last few rows
- d.head(2) - show the firs 2 rows
- Load a CSV file:
- d =
pandas.read_csv('./file1.csv')
- d =
- Create a data frame from a literal:
my_cars = ['Prius', 'Corolla', 'Matchbox'] horsepower = [120, 130, 0] colors = ['gold', 'silver', 'rainbow'] t = DataFrame({ 'my_cars':Series(my_cars), 'horsepower':Series(horsepower), 'color':Series(colors) })
- Create a data frame from a literal, with an index:
d = DataFrame({ 'one':Series([1, 2, 3], index=['a','b','c']), 'two':Series([1, 2, 3], index=['a','b','c']) })
- Get a column from a data frame:
- Treat it like a dictionary:
x = d['columnName']
- Or treat it like a field:
x = d.columnName
- Treat it like a dictionary:
- Get multiple columns from a data frame:
x = d[['colName1', 'colName2']]
- Get a row from a data frame:
d.loc[index]
d.loc[3]
d.loc['namedIndex']
- Select rows from a data frame that meet a criteria:
d[d['horsepower'] >= 120]
- Get rows 3-5 (Note: Last number is strictly less than)
d[3:6]
- Select and project (one column)
- d[‘color’][d[‘horsepower’] >= 120]
- Select and project (multiple columns)
- d[[‘color’,’horsepower’]][d[‘horsepower’] >= 120]
- The boolean test return all the indices meeting that criteria. Then those indices are used to select the rows from the color+horsepower intermediate data frame.
- d[[‘color’,’horsepower’]][d[‘horsepower’] >= 120]
- Apply a function to every column:
d.apply(numpy.median)
- Apply a function to each value one column:
d['colName'].map(lambda x: x+1)
- Apply a function to every value in every row and column
d['colName'].applymap(lambda x: x+1)
- Apply an aggregate function to a column:
numpy.mean(d['horsepower'])
Matrix Multiply Example From Udacity Course
import numpy from pandas import DataFrame, Series def numpy_dot(): ''' 4 points for gold, 2 points per silver, one point per bronze. Using the numpy.dot function, return a new dataframe with a) a column called 'country_name' with the country name b) a column called 'points' with the total number of points the country earned. ''' countries = ['Russian Fed.', 'Norway', 'Canada', 'United States', 'Netherlands', 'Germany', 'Switzerland', 'Belarus', 'Austria', 'France', 'Poland', 'China', 'Korea', 'Sweden', 'Czech Republic', 'Slovenia', 'Japan', 'Finland', 'Great Britain', 'Ukraine', 'Slovakia', 'Italy', 'Latvia', 'Australia', 'Croatia', 'Kazakhstan'] gold = [13, 11, 10, 9, 8, 8, 6, 5, 4, 4, 4, 3, 3, 2, 2, 2, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0] silver = [11, 5, 10, 7, 7, 6, 3, 0, 8, 4, 1, 4, 3, 7, 4, 2, 4, 3, 1, 0, 0, 2, 2, 2, 1, 0] bronze = [9, 10, 5, 12, 9, 5, 2, 1, 5, 7, 1, 2, 2, 6, 2, 4, 3, 1, 2, 1, 0, 6, 2, 1, 0, 1] d = DataFrame({ 'country_name':Series(countries), 'gold':Series(gold), 'silver':Series(silver), 'bronze':Series(bronze) })[['gold','silver','bronze']] v = numpy.dot(d,[4,2,1]) return DataFrame({ 'country_name':Series(countries), 'points':Series(v) })
</code>