Data Science - Python DataFrame


Create a DataFrame with Pandas

A data frame is a structured representation of data, organized in rows and columns like a table.

Let's define a data frame with 3 columns and 5 rows with fictional numbers:

Example

import pandas as pd

d = {'col1': [1, 2, 3, 4, 7], 'col2': [4, 5, 6, 9, 5], 'col3': [7, 8, 12, 1, 11]}

df = pd.DataFrame(data=d)

print(df)

Example Explained

  • Import the Pandas library as pd
  • Define data with columns and rows in a variable named d
  • Create a data frame using the function pd.DataFrame()
  • The data frame contains 3 columns and 5 rows
  • Print the data frame output with the print() function

We write pd. in front of DataFrame() to tell Python that we want to use DataFrame() from the Pandas library.

Note the capital D and F in DataFrame! Python names are case-sensitive.
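
As a quick check of the steps above, the following sketch rebuilds the same data frame and lists its column names. It relies on the columns attribute, which the example above does not use, and the variable name column_names is only illustrative.

```python
import pandas as pd

# Same fictional data as in the example: the dictionary keys become the
# column names, and each list holds the values of one column
d = {'col1': [1, 2, 3, 4, 7], 'col2': [4, 5, 6, 9, 5], 'col3': [7, 8, 12, 1, 11]}

df = pd.DataFrame(data=d)

# Collect the column names in a plain Python list (illustrative name)
column_names = list(df.columns)
print(column_names)  # ['col1', 'col2', 'col3']
```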


Interpreting the Output

This is the output:

   col1  col2  col3
0     1     4     7
1     2     5     8
2     3     6    12
3     4     9     1
4     7     5    11

We see that "col1", "col2" and "col3" are the names of the columns.

Do not be confused by the column of numbers from 0 to 4 on the left. This is the index, and it tells us the position of each row.

In Python, numbering starts at zero, so the first row has position 0.
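
To make the zero-based numbering concrete, here is a minimal sketch that prints the row positions and then reads the first row. It uses df.index and df.iloc, which are not part of the example above, and first_row is just an illustrative name.

```python
import pandas as pd

d = {'col1': [1, 2, 3, 4, 7], 'col2': [4, 5, 6, 9, 5], 'col3': [7, 8, 12, 1, 11]}
df = pd.DataFrame(data=d)

# The numbers on the left of the printed output are the index
print(list(df.index))  # [0, 1, 2, 3, 4]

# Position 0 refers to the first row
first_row = df.iloc[0]
print(first_row['col1'], first_row['col2'], first_row['col3'])  # 1 4 7
```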

Now, we can use Python to count the columns and rows.

df.shape returns a tuple of the form (rows, columns), so we can use df.shape[1] to find the number of columns:

Example

Count the number of columns:

count_column = df.shape[1]
print(count_column)

Likewise, we can use df.shape[0] to find the number of rows:

Example

Count the number of rows:

count_row = df.shape[0]
print(count_row)
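
Because df.shape holds both counts, they can also be read in a single step. This is a small sketch that reuses the variable names count_row and count_column from the examples above.

```python
import pandas as pd

d = {'col1': [1, 2, 3, 4, 7], 'col2': [4, 5, 6, 9, 5], 'col3': [7, 8, 12, 1, 11]}
df = pd.DataFrame(data=d)

# df.shape is a tuple: the first entry is the row count,
# the second entry is the column count
count_row, count_column = df.shape
print(count_row, count_column)  # 5 3
```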

Why Can We Not Just Count the Rows and Columns Ourselves?

If we work with larger data sets that have many columns and rows, counting them by hand is tedious and error-prone. If we use the built-in tools correctly, such as df.shape, we can be sure that the count is correct.
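
To illustrate, the sketch below builds a hypothetical larger data frame and counts its rows and columns with df.shape. The names big and big_df and the generated values are invented for this illustration only.

```python
import pandas as pd

# Hypothetical larger data set: 12 columns (col1 to col12),
# each holding the 1000 numbers from 0 to 999
big = {f'col{i}': list(range(1000)) for i in range(1, 13)}
big_df = pd.DataFrame(data=big)

print(big_df.shape[0])  # 1000 rows
print(big_df.shape[1])  # 12 columns
```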