Jupyter Notebook is an open-source web application for interactive computing that is widely used for data transformation, modeling, and visualization.
Below is a quick refresher on installing Jupyter Notebook and using Python commands to handle data sets within the app.
Let's jump right in!
Create and activate a virtual environment
macOS Terminal
User-Macbook:~ user$ cd desktop
User-Macbook:desktop user$ python3 -m venv jup
User-Macbook:desktop user$ cd jup
User-Macbook:jup user$ source bin/activate
Windows Command Prompt
C:\Users\Owner> cd desktop
C:\Users\Owner\desktop> py -m venv jup
C:\Users\Owner\desktop> cd jup
C:\Users\Owner\desktop\jup> Scripts\activate
First, type python3 -m venv jup on macOS or py -m venv jup on Windows to create a virtual environment named jup. Then cd into the new directory and activate the environment (source bin/activate on macOS, Scripts\activate on Windows) so that the packages you install in the next step stay inside it.
Install JupyterLab
How to install JupyterLab on macOS Terminal
(jup)User-Macbook:jup user$ pip install jupyterlab
How to install JupyterLab on Windows Command Prompt
(jup)C:\Users\Owner\desktop\jup> pip install jupyterlab
Install JupyterLab, the interactive development environment for notebooks, with the command pip install jupyterlab. This may take a couple of minutes.
Open Jupyter Notebook in the browser
Open Jupyter Notebook on macOS Terminal
(jup)User-Macbook:jup user$ jupyter notebook
Open Jupyter Notebook on Windows Command Prompt
(jup)C:\Users\Owner\desktop\jup> jupyter notebook
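Once the installation is complete, launch Jupyter Notebook in your browser with the command jupyter notebook. Depending on your JupyterLab version, the classic notebook package may not be installed alongside it; if the command is not found, run pip install notebook first, or open the JupyterLab interface with jupyter lab instead.
There is no need to cd into a separate Jupyter folder. Run the command from the jup directory, and the browser window will list that directory's contents, which are the virtual environment's own folders.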
How to create a new Jupyter notebook
Click the "New" button on the right side of the screen, then select "Python 3".
An untitled Jupyter notebook will then be created in a new tab.

Install numpy, pandas, and nltk in the Jupyter notebook
import sys
!{sys.executable} -m pip install numpy pandas nltk
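In the first cell, type the two lines above. They run pip install numpy pandas nltk with the notebook kernel's own Python interpreter (sys.executable), so the packages are installed into the same environment the notebook runs in. Press Shift + Enter to run the cell. An asterisk appears in the brackets beside the cell, [*], while the code is running.
When it finishes, the asterisk is replaced by a number and a new cell appears below. You are now ready to use Python commands to load in and clean your data.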
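To double-check that the notebook's kernel is running inside the jup environment (and therefore sees the packages you just installed), you can compare sys.prefix with sys.base_prefix. This is a minimal sketch; the helper name in_virtualenv is hypothetical, and the check assumes a standard venv-created environment.

```python
import sys

def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the environment folder,
    # while sys.base_prefix still points at the base Python install.
    return sys.prefix != sys.base_prefix

print("Running in a virtual environment:", in_virtualenv())
print("Interpreter:", sys.executable)
```

If the first line prints False, the kernel is using a different Python than the one in jup.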
How to import pandas and load in your file
Import pandas into Jupyter notebook then load in a JSON file
import pandas as pd
df = pd.read_json(r'C:\Users\Owner\Desktop\data.json')
Import pandas into Jupyter notebook then load in a CSV file
import pandas as pd
df = pd.read_csv(r'C:\Users\Owner\Desktop\data.csv')
Import pandas into Jupyter notebook then load in an Excel file
import pandas as pd
df = pd.read_excel(r'C:\Users\Owner\Desktop\data.xls')
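In a new cell, import pandas as pd. Then assign the result of the matching reader function, pd.read_json(), pd.read_csv(), or pd.read_excel(), to a variable (in this case df, for dataframe), passing the full path to your file as r'full_path_to_file'. The r prefix makes it a raw string, so the backslashes in a Windows path are not treated as escape characters. Reading Excel files also requires an engine package: openpyxl for .xlsx or xlrd for .xls.
Run the cell.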
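If you want to try the readers before pointing them at a real file, note that pd.read_csv() accepts any file-like object as well as a path. The sketch below uses io.StringIO with a few rows of made-up sample data (the names, cities, and scores are hypothetical) in place of data.csv.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for a real file on disk.
csv_text = """name,city,score
Ana,Lisbon,88
Ben,Oslo,92
Cara,Lima,75
"""

# read_csv accepts a path or any file-like object; io.StringIO lets
# us demo it without creating a file.
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (3, 3)
```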
How to display the dataframe in Jupyter notebook
display the dataframe in Jupyter notebook
df
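Run a cell containing df to display the dataframe. Jupyter shows the value of the last expression in a cell, so a dataframe appears as a formatted table.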
Working with columns
display all columns in Jupyter notebook
df.columns
Lists the names of all of the columns in the data frame.
display specific columns in Jupyter notebook
df['column_name']
df[['column_name1', 'column_name2', 'column_name3']]
Display one column by passing its name in square brackets, or several columns by passing a list of names (note the double brackets).
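The difference between the two forms is easiest to see on a small frame. In this sketch the column names and values are hypothetical: a single name returns a pandas Series, while a list of names returns a DataFrame.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cara"],
    "city": ["Lisbon", "Oslo", "Lima"],
    "score": [88, 92, 75],
})

one = df["name"]                 # single name -> Series
several = df[["name", "score"]]  # list of names -> DataFrame

print(type(one).__name__)      # Series
print(type(several).__name__)  # DataFrame
print(list(df.columns))        # ['name', 'city', 'score']
```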
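sort by a column in Jupyter notebook
df.sort_values('column_name')
df.sort_values('column_name', ascending=False)
Displays the data frame with its rows sorted by the values in the specified column (alphabetically for text). Add ascending=False to sort in descending order. The result is a new data frame, so assign it back (df = df.sort_values('column_name')) to keep the new order.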
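A short sketch of both sort directions, using hypothetical example data. It also shows that sort_values() leaves the original frame untouched unless you reassign the result.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"name": ["Cara", "Ana", "Ben"], "score": [75, 88, 92]})

by_name = df.sort_values("name")                          # A -> Z
by_score_desc = df.sort_values("score", ascending=False)  # highest first

print(by_name["name"].tolist())         # ['Ana', 'Ben', 'Cara']
print(by_score_desc["score"].tolist())  # [92, 88, 75]
# df itself is unchanged because the result was not assigned back.
print(df["name"].tolist())              # ['Cara', 'Ana', 'Ben']
```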
drop/delete columns in Jupyter notebook
df = df.drop(columns=['column_name'])
Drops the specified column from the data set.
create a new column from existing columns in Jupyter notebook
df['new_column'] = df['column_name1'] + df['column_name2']
Creates a new column holding the row-by-row sum of the two columns. For text columns, + joins the strings together instead.
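Dropping a column and deriving a new one can be combined in a few lines. This is a sketch on hypothetical columns q1, q2, and notes; the point is that drop() returns a new frame (hence the reassignment) and that + on numeric columns adds values row by row.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "q1": [10, 20, 30],
    "q2": [1, 2, 3],
    "notes": ["a", "b", "c"],
})

# drop() returns a new DataFrame; reassign to keep the change.
df = df.drop(columns=["notes"])

# Element-wise sum, row by row.
df["total"] = df["q1"] + df["q2"]

print(df.columns.tolist())   # ['q1', 'q2', 'total']
print(df["total"].tolist())  # [11, 22, 33]
```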
Working with rows
display first rows in Jupyter notebook
df.head()
df.head(10)
Displays the first 5 rows of the data frame. Add a number in the parentheses and that number of rows will display.
display last rows in Jupyter notebook
df.tail()
df.tail(10)
Displays the last 5 rows of the data frame. Add a number in the parentheses and that number of rows will display.
display a range of rows in Jupyter notebook
df.iloc[0:5]
Displays rows 0 through 4 of the data frame; the end position of the slice (5) is excluded. Note that iloc counts rows and columns from 0, not 1.
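The row helpers above can be checked on a hypothetical 12-row frame with a single numeric column n, which makes the default of 5 rows and the excluded slice end easy to see.

```python
import pandas as pd

# Hypothetical 12-row frame with a single numeric column.
df = pd.DataFrame({"n": list(range(12))})

print(len(df.head()))              # 5 (the default)
print(len(df.head(10)))            # 10
print(df.tail(3)["n"].tolist())    # [9, 10, 11]
print(df.iloc[0:5]["n"].tolist())  # [0, 1, 2, 3, 4]  (position 5 is excluded)
```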
Working with the columns and rows
locate a specific value by row and column in Jupyter notebook
df.iloc[2,1]
Displays the value at a specific position. The example above returns the value at row position 2, column position 1, which is the third row and second column because counting starts at 0.
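Here is a sketch with hypothetical data showing which cell df.iloc[2, 1] lands on. For comparison it also uses df.loc, which selects by index label and column name rather than by position; with the default 0, 1, 2, ... index the two happen to agree.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cara", "Dev"],
    "score": [88, 92, 75, 60],
})

# Row position 2 is the third row; column position 1 is "score".
print(df.iloc[2, 1])        # 75
# Same cell by label: index label 2, column name "score".
print(df.loc[2, "score"])   # 75
```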
locate specific rows by column in Jupyter notebook
df.loc[df['column_name'] == 'column_value']
Displays all of the rows where the column equals the specified value.
locate specific rows that match all of the column values in Jupyter notebook
df.loc[(df['column_1'] == 'column_value') & (df['column_2'] == 'column_value')]
Displays only the rows that have the specified value in column 1 and the specified value in column 2. Wrap each condition in its own parentheses and combine them with &.
locate specific rows that match any of the column values in Jupyter notebook
df.loc[(df['column_1'] == 'column_value') | (df['column_2'] == 'column_value')]
Displays all rows that have the specified value in column 1 or the specified value in column 2. Wrap each condition in its own parentheses and combine them with |.
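The three filters above can be compared side by side on hypothetical city and team columns. The parentheses around each condition matter because & and | bind more tightly than == in Python.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "city": ["Lisbon", "Oslo", "Lisbon", "Lima"],
    "team": ["red", "blue", "blue", "red"],
})

lisbon = df.loc[df["city"] == "Lisbon"]
# Each condition needs its own parentheses because & and | bind
# more tightly than ==.
both = df.loc[(df["city"] == "Lisbon") & (df["team"] == "blue")]
either = df.loc[(df["city"] == "Lisbon") | (df["team"] == "blue")]

print(lisbon.index.tolist())  # [0, 2]
print(both.index.tolist())    # [2]
print(either.index.tolist())  # [0, 1, 2]
```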
locate specific rows in a column that contain a certain word in Jupyter notebook
df.loc[df['column_1'].str.contains('word')]
Lists all of the rows whose column_1 value contains the specified word. The match is case-sensitive by default.
drop specific rows in a column that contain a certain word in Jupyter notebook
df.loc[~df['column_1'].str.contains('word')]
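Drops all of the rows whose column_1 value contains the specified word; the ~ operator inverts the condition. The result is a new data frame, so assign it (df = ...) to keep the change.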
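A sketch of keeping and dropping rows by substring, on a hypothetical review column that includes a missing value. Passing na=False to str.contains() treats missing values as non-matches so the mask stays purely True/False; without it, a missing value can make the boolean selection fail.

```python
import pandas as pd

# Hypothetical example data, including one missing value.
df = pd.DataFrame({"review": ["great value", "not great", "poor fit", None]})

# na=False treats missing values as "no match" so the mask stays boolean.
mask = df["review"].str.contains("great", na=False)

keep = df.loc[mask]      # rows that contain the word
without = df.loc[~mask]  # everything else (~ inverts the mask)

print(keep["review"].tolist())  # ['great value', 'not great']
print(len(without))             # 2
```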
Changing data types
view data types in Jupyter notebook
df.dtypes
Outputs the data type (e.g. object, int64, float64, datetime64) of each column.
change the data type to integer in Jupyter notebook
df['column_name'] = df['column_name'].astype(int)
Changes the data type of the specified column from string to integer. astype() returns a new column, so assign it back to keep the change.
change the data type to datetime in Jupyter notebook
df['column_name'] = pd.to_datetime(df['column_name'])
Converts the specified column from strings to the datetime64 data type.
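Both conversions in one sketch, using hypothetical qty and when columns that start out as text. Once converted, the columns support arithmetic and date accessors such as .dt.month.

```python
import pandas as pd

# Hypothetical example data: both columns start out holding strings.
df = pd.DataFrame({
    "qty": ["3", "10", "7"],
    "when": ["2023-01-05", "2023-02-10", "2023-03-15"],
})
print(df.dtypes)

# Both conversions return new Series; assign them back to keep them.
df["qty"] = df["qty"].astype(int)
df["when"] = pd.to_datetime(df["when"])

print(df.dtypes)
print(df["qty"].sum())               # 20
print(df["when"].dt.month.tolist())  # [1, 2, 3]
```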
Working with the data set as a whole
reset the index in Jupyter notebook
df.reset_index(drop=True, inplace=True)
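Resets the index to 0, 1, 2 and so on after you delete or reorder rows. drop=True discards the old index instead of adding it as a new column, and inplace=True modifies df directly.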
list duplicates in Jupyter notebook
df[df.duplicated()]
Lists the rows that repeat an earlier row in the data set.
drop duplicates in Jupyter notebook
df = df.drop_duplicates()
Drops any duplicate rows from the data set, keeping the first occurrence of each.
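The duplicate and index helpers work well together. In this sketch (hypothetical data), duplicated() flags the later copy of a repeated row, drop_duplicates() removes it and leaves a gap in the index, and reset_index(drop=True) renumbers the rows.

```python
import pandas as pd

# Hypothetical example data: row 2 repeats row 0.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Ana", "Cara"],
    "score": [88, 92, 88, 75],
})

dupes = df[df.duplicated()]   # later repeats of an earlier row
print(dupes.index.tolist())   # [2]

df = df.drop_duplicates()
print(df.index.tolist())      # [0, 1, 3]  (gap left by the removed row)

df = df.reset_index(drop=True)  # renumber 0..n-1, discard the old index
print(df.index.tolist())        # [0, 1, 2]
```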
combine dataframes in Jupyter notebook
data = pd.concat([df1, df2, df3])
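Stacks the rows of each data frame into one new data frame. Add ignore_index=True to number the combined rows 0, 1, 2 and so on instead of keeping each frame's original index.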
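A minimal sketch of stacking three hypothetical one-row frames. Without ignore_index=True the result would keep each frame's own index and repeat the label 0 three times.

```python
import pandas as pd

# Hypothetical one-row frames with matching columns.
df1 = pd.DataFrame({"name": ["Ana"], "score": [88]})
df2 = pd.DataFrame({"name": ["Ben"], "score": [92]})
df3 = pd.DataFrame({"name": ["Cara"], "score": [75]})

# Stack rows; ignore_index=True renumbers the result 0..n-1.
data = pd.concat([df1, df2, df3], ignore_index=True)

print(data["name"].tolist())  # ['Ana', 'Ben', 'Cara']
print(data.index.tolist())    # [0, 1, 2]
```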
Saving updated data frames as files
create and save JSON file in Jupyter notebook
df.to_json('name_of_new_file.json')
Save the existing data frame as a new JSON file.
create and save CSV file in Jupyter notebook
df.to_csv('name_of_new_file.csv')
Save the existing data frame as a new CSV file.
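create and save Excel file without the index in Jupyter notebook
df.to_excel('name_of_new_file.xlsx', index=False)
Save the existing data frame as a new Excel file without the index numbers. Writing .xlsx files requires an engine package such as openpyxl. Note that index=False can also be passed to to_csv(); to_json() accepts it only with certain orient values.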
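To see what index=False changes without writing files, note that to_csv() returns the CSV text when no path is given. This sketch uses hypothetical data and reads the output back with read_csv() to confirm the round trip.

```python
import io
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"name": ["Ana", "Ben"], "score": [88, 92]})

# With no path, to_csv returns the CSV text instead of writing a file.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index)     # first line is ",name,score" (blank index header)
print(without_index)  # first line is "name,score"

# Round trip: read the CSV text back into a DataFrame.
back = pd.read_csv(io.StringIO(without_index))
print(back["score"].tolist())  # [88, 92]
```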