{
"cells": [
{
"cell_type": "markdown",
"id": "47fbeb51-864b-4e14-a0b2-fed06b014c2e",
"metadata": {},
"source": [
"# Pandas Exercises"
]
},
{
"cell_type": "markdown",
"id": "1d26ef1e-3c49-4a46-a4c5-b68825d5a953",
"metadata": {},
"source": [
"## Creating DataFrames and Using Sample Data Sets"
]
},
{
"cell_type": "markdown",
"id": "5252953e-79d1-43f3-9bf9-e4ef72e0987c",
"metadata": {},
"source": [
"This is the Jupyter Notebook **solution set* for the article, [Pandas Practice Questions – Fifty-Two Examples to Make You an Expert](https://codesolid.com/pandas-practice-questions-twenty-one-examples-to-make-you-an-expert/)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "df91e92f-5f2b-4e98-81f7-84e7478ed432",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import seaborn as sb"
]
},
{
"cell_type": "markdown",
"id": "6950cc47-3cc0-4565-8df1-d01741b7ea5f",
"metadata": {},
"source": [
"**1.** Using NumPy, create a Pandas DataFrame with five rows and three columms:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "dd1a5014-d17e-4fd0-b7a2-db261a53a02a",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
0
\n",
"
1
\n",
"
2
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
0
\n",
"
1
\n",
"
2
\n",
"
\n",
"
\n",
"
1
\n",
"
3
\n",
"
4
\n",
"
5
\n",
"
\n",
"
\n",
"
2
\n",
"
6
\n",
"
7
\n",
"
8
\n",
"
\n",
"
\n",
"
3
\n",
"
9
\n",
"
10
\n",
"
11
\n",
"
\n",
"
\n",
"
4
\n",
"
12
\n",
"
13
\n",
"
14
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 0 1 2\n",
"0 0 1 2\n",
"1 3 4 5\n",
"2 6 7 8\n",
"3 9 10 11\n",
"4 12 13 14"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": []
},
{
"cell_type": "markdown",
"id": "9c74f94d-4adf-403d-a2c4-9efd200ba5b2",
"metadata": {},
"source": [
"**2.** For a Pandas DataFrame created from a NumPy array, what is the default behavior for the labels for the columns? For the rows?"
]
},
{
"cell_type": "markdown",
"id": "e7508b50-d724-47a7-8272-9ba040cd9c65",
"metadata": {},
"source": [
"Both the \"columns\" value and the \"index\" value (for the rows) are set to zero based numeric arrays."
]
},
{
"cell_type": "markdown",
"id": "1a20442a-ea2c-4d32-877b-bfe5ae349318",
"metadata": {},
"source": [
"**3.** Create a second DataFrame as above with five rows and three columns, setting the row labels to the names of any five major US cities and the column labels to the first three months of the year."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "02686eae-cba2-4180-94ff-4dc85eca0731",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
January
\n",
"
February
\n",
"
March
\n",
"
\n",
" \n",
" \n",
"
\n",
"
NewYork
\n",
"
0
\n",
"
1
\n",
"
2
\n",
"
\n",
"
\n",
"
LosAngeles
\n",
"
3
\n",
"
4
\n",
"
5
\n",
"
\n",
"
\n",
"
Atlanta
\n",
"
6
\n",
"
7
\n",
"
8
\n",
"
\n",
"
\n",
"
Boston
\n",
"
9
\n",
"
10
\n",
"
11
\n",
"
\n",
"
\n",
"
SanFrancisco
\n",
"
12
\n",
"
13
\n",
"
14
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" January February March\n",
"NewYork 0 1 2\n",
"LosAngeles 3 4 5\n",
"Atlanta 6 7 8\n",
"Boston 9 10 11\n",
"SanFrancisco 12 13 14"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = DataFrame(np.arange(15).reshape(5,3))\n",
"df.index = [\"NewYork\", \"LosAngeles\", \"Atlanta\", \"Boston\", \"SanFrancisco\"]\n",
"df.columns = [\"January\", \"February\", \"March\"]\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "f6c2f45b-0fea-49bf-98a0-572d594a0680",
"metadata": {},
"source": [
"**4.** You recall that the Seaborn package has some data sets built in, but can't remember how to list and load them. Assuming the functions to do so have \"data\" in the name, how might you locate them? You can assume a Jupyter Notebook / IPython environment and explain the process, or write the code to do it in Python."
]
},
{
"cell_type": "markdown",
"id": "8afae59d-eb5a-4692-9924-b54d49942769",
"metadata": {},
"source": [
"Method 1: In an empty code cell, type sb + tab to bring up a list of names. Type \"data\" to filter the names."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "ce709b0a-cb22-4f33-9492-62f8346d0ef4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['get_data_home', 'get_dataset_names', 'load_dataset']"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Method 2:\n",
"[x for x in dir(sb) if \"data\" in x]"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "81dabb30-8220-49c3-981c-106d190a5de1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['anagrams',\n",
" 'anscombe',\n",
" 'attention',\n",
" 'brain_networks',\n",
" 'car_crashes',\n",
" 'diamonds',\n",
" 'dots',\n",
" 'exercise',\n",
" 'flights',\n",
" 'fmri',\n",
" 'gammas',\n",
" 'geyser',\n",
" 'iris',\n",
" 'mpg',\n",
" 'penguins',\n",
" 'planets',\n",
" 'taxis',\n",
" 'tips',\n",
" 'titanic']"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sb.get_dataset_names()"
]
},
{
"cell_type": "markdown",
"id": "70efcde8-cd73-4d61-8534-e6a3c469dbff",
"metadata": {},
"source": [
"## Loading data from CSV"
]
},
{
"cell_type": "markdown",
"id": "00214582-1340-4700-99b5-9f36afce3e16",
"metadata": {},
"source": [
"**5**. Zillow home data is available at this URL: https://files.zillowstatic.com/research/public_csvs/zhvi/Metro_zhvi_uc_sfrcondo_tier_0.33_0.67_sm_sa_month.csv\n",
"\n",
"Open this file as a DataFrame in Pandas."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "2838d3d2-5f1c-4e97-aab6-c7236ced82ad",
"metadata": {},
"outputs": [],
"source": [
"df_homes = pd.read_csv(\"https://files.zillowstatic.com/research/public_csvs/zhvi/Metro_zhvi_uc_sfrcondo_tier_0.33_0.67_sm_sa_month.csv\")\n"
]
},
{
"cell_type": "markdown",
"id": "4476553a-1f93-4259-87df-69670ef486c0",
"metadata": {},
"source": [
"**6.** Save the DataFrame, df_homes, to a local CSV file, \"zillow_home_data.csv\". "
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "1d237615-d23c-4f62-a0b8-c709deb95647",
"metadata": {},
"outputs": [],
"source": [
"df_homes.to_csv(\"../data/zillow_home_data.csv\")"
]
},
{
"cell_type": "markdown",
"id": "182e0039-63dc-49df-9413-7eede1453fe0",
"metadata": {},
"source": [
"**7.** Load zillow_home_data.csv back into a new Dataframe, df_homes_2"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "53665adb-adb2-462f-a244-01665182e870",
"metadata": {},
"outputs": [],
"source": [
"df_homes_2 = pd.read_csv(\"../data/zillow_home_data.csv\")"
]
},
{
"cell_type": "markdown",
"id": "eaaa6d3b-21e6-4e4a-916a-d522f1fd60c6",
"metadata": {},
"source": [
"**8.** Compare the dimensions of the two DataFrames, df_homes and df_homes_2. Are they equal? If not, how can you fix it?"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "dff1485e-fb64-4a19-a6ee-15dc38c72b2e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(908, 271)\n",
"(908, 272)\n",
"False\n"
]
}
],
"source": [
"print(df_homes.shape)\n",
"print(df_homes_2.shape)\n",
"print(df_homes.shape == df_homes_2.shape)"
]
},
{
"cell_type": "markdown",
"id": "7e6d423c-75b5-4813-85cd-700f598491ee",
"metadata": {},
"source": [
"To fix the fact that they're not equal, save file again this time using index=False to avoid saving the index as a CSV column."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "553a674c-345a-46ff-8466-6a8f003630db",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"True\n"
]
}
],
"source": [
"df_homes.to_csv(\"../data/zillow_home_data.csv\", index=False)\n",
"df_homes_2 = pd.read_csv(\"../data/zillow_home_data.csv\")\n",
"print(df_homes.shape == df_homes_2.shape)"
]
},
{
"cell_type": "markdown",
"id": "0f34b104-fc3a-492c-964b-45daa12850b0",
"metadata": {},
"source": [
"**9.** A remote spreadsheet showing how a snapshot of how traffic increased for a hypothetical website is available here: https://github.com/CodeSolid/CodeSolid.github.io/raw/main/booksource/data/AnalyticsSnapshot.xlsx. Load the worksheet page of the spreasheet data labelled \"February 2022\" as a DataFrame named \"feb\". Note: the leftmost column in the spreadsheet is the index column."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "7b8772dd-5d19-4510-a725-80c7273ce61b",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
This Month
\n",
"
Last Month
\n",
"
Month to Month Increase
\n",
"
\n",
" \n",
" \n",
"
\n",
"
Users
\n",
"
1800.0
\n",
"
280.0
\n",
"
5.428571
\n",
"
\n",
"
\n",
"
New Users
\n",
"
1700.0
\n",
"
298.0
\n",
"
4.704698
\n",
"
\n",
"
\n",
"
Page Views
\n",
"
2534.0
\n",
"
436.0
\n",
"
4.811927
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" This Month Last Month Month to Month Increase\n",
"Users 1800.0 280.0 5.428571\n",
"New Users 1700.0 298.0 4.704698\n",
"Page Views 2534.0 436.0 4.811927"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"url = \"https://github.com/CodeSolid/CodeSolid.github.io/raw/main/booksource/data/AnalyticsSnapshot.xlsx\"\n",
"feb = pd.read_excel(url, sheet_name=\"February 2022\", index_col=0)\n",
"feb"
]
},
{
"cell_type": "markdown",
"id": "ad366534-b168-490c-a55d-654f3ef44288",
"metadata": {},
"source": [
"**10.** The \"Month to Month Increase\" column is a bit hard to understand, so ignore it for now. Given the values for \"This Month\" and \"Last Month\", create a new column, \"Percentage Increase\"."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "d053f773-c9a9-4592-aaaa-660ae8e189e5",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
This Month
\n",
"
Last Month
\n",
"
Month to Month Increase
\n",
"
Percentage Increase
\n",
"
\n",
" \n",
" \n",
"
\n",
"
Users
\n",
"
1800.0
\n",
"
280.0
\n",
"
5.428571
\n",
"
542.857143
\n",
"
\n",
"
\n",
"
New Users
\n",
"
1700.0
\n",
"
298.0
\n",
"
4.704698
\n",
"
470.469799
\n",
"
\n",
"
\n",
"
Page Views
\n",
"
2534.0
\n",
"
436.0
\n",
"
4.811927
\n",
"
481.192661
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" This Month Last Month Month to Month Increase \\\n",
"Users 1800.0 280.0 5.428571 \n",
"New Users 1700.0 298.0 4.704698 \n",
"Page Views 2534.0 436.0 4.811927 \n",
"\n",
" Percentage Increase \n",
"Users 542.857143 \n",
"New Users 470.469799 \n",
"Page Views 481.192661 "
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"feb[\"Percentage Increase\"] = (feb[\"This Month\"] - feb[\"Last Month\"]) / feb[\"Last Month\"] * 100\n",
"feb"
]
},
{
"cell_type": "markdown",
"id": "8a71222f-2ab4-47bb-806d-2610e25b3a91",
"metadata": {},
"source": [
"## Basic Operations on Data"
]
},
{
"cell_type": "markdown",
"id": "0b9cf32b-3132-40cc-a5ca-26d6e6a36e4b",
"metadata": {},
"source": [
"**11.** Using Seaborn, get a dataset about penguins into a dataframe named \"df_penguins\". Note that because all of the following questions depend on this example, we'll provide the solution here so no one gets stuck:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "a8b68caf-a998-414a-9de7-899eae7213c7",
"metadata": {},
"outputs": [],
"source": [
"df_penguins = sb.load_dataset('penguins')"
]
},
{
"cell_type": "markdown",
"id": "f7170135-17bd-45e2-9239-dad647ed6eaf",
"metadata": {},
"source": [
"**12.** Write the code to show the the number of rows and columns in df_penguins"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "6f565afb-d5f4-462d-b3e5-946bc91bd83e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(344, 7)"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_penguins.shape"
]
},
{
"cell_type": "markdown",
"id": "9d3396f3-3fe6-4571-83b2-1e70f8a11513",
"metadata": {},
"source": [
"**13.** How might you show the first few rows of df_penguins?"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "68d5946e-5011-4735-983e-7485f44a60f2",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
species
\n",
"
island
\n",
"
bill_length_mm
\n",
"
bill_depth_mm
\n",
"
flipper_length_mm
\n",
"
body_mass_g
\n",
"
sex
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
Adelie
\n",
"
Torgersen
\n",
"
39.1
\n",
"
18.7
\n",
"
181.0
\n",
"
3750.0
\n",
"
Male
\n",
"
\n",
"
\n",
"
1
\n",
"
Adelie
\n",
"
Torgersen
\n",
"
39.5
\n",
"
17.4
\n",
"
186.0
\n",
"
3800.0
\n",
"
Female
\n",
"
\n",
"
\n",
"
2
\n",
"
Adelie
\n",
"
Torgersen
\n",
"
40.3
\n",
"
18.0
\n",
"
195.0
\n",
"
3250.0
\n",
"
Female
\n",
"
\n",
"
\n",
"
3
\n",
"
Adelie
\n",
"
Torgersen
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
\n",
"
\n",
"
4
\n",
"
Adelie
\n",
"
Torgersen
\n",
"
36.7
\n",
"
19.3
\n",
"
193.0
\n",
"
3450.0
\n",
"
Female
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" species island bill_length_mm bill_depth_mm flipper_length_mm \\\n",
"0 Adelie Torgersen 39.1 18.7 181.0 \n",
"1 Adelie Torgersen 39.5 17.4 186.0 \n",
"2 Adelie Torgersen 40.3 18.0 195.0 \n",
"3 Adelie Torgersen NaN NaN NaN \n",
"4 Adelie Torgersen 36.7 19.3 193.0 \n",
"\n",
" body_mass_g sex \n",
"0 3750.0 Male \n",
"1 3800.0 Female \n",
"2 3250.0 Female \n",
"3 NaN NaN \n",
"4 3450.0 Female "
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_penguins.head()"
]
},
{
"cell_type": "markdown",
"id": "c841930d-e235-468a-809d-8427f8739076",
"metadata": {},
"source": [
"**14.** How can you return the unique species of penguins from df_penguins? How many unique species are there?"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "130ed9bd-397b-4504-bf26-6b0dc716ba8b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0 Adelie\n",
"152 Chinstrap\n",
"220 Gentoo\n",
"Name: species, dtype: object\n",
"There are 3 unique species, ['Adelie', 'Chinstrap', 'Gentoo'].\n"
]
}
],
"source": [
"species = df_penguins[\"species\"].copy()\n",
"unique = species.fillna(0)\n",
"unique = unique.drop_duplicates()\n",
"nrows = unique.shape[0]\n",
"print(unique)\n",
"print(f\"There are {nrows} unique species, {list(unique.values)}.\")"
]
},
{
"cell_type": "markdown",
"id": "3912a3df-d931-4d74-a6fa-34ad11f83f94",
"metadata": {},
"source": [
"**15.** What function can we use to drop the rows that have missing data?"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "436cf3d3-db4a-42e3-81d3-9f2df28265f1",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
species
\n",
"
island
\n",
"
bill_length_mm
\n",
"
bill_depth_mm
\n",
"
flipper_length_mm
\n",
"
body_mass_g
\n",
"
sex
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
Adelie
\n",
"
Torgersen
\n",
"
39.1
\n",
"
18.7
\n",
"
181.0
\n",
"
3750.0
\n",
"
Male
\n",
"
\n",
"
\n",
"
1
\n",
"
Adelie
\n",
"
Torgersen
\n",
"
39.5
\n",
"
17.4
\n",
"
186.0
\n",
"
3800.0
\n",
"
Female
\n",
"
\n",
"
\n",
"
2
\n",
"
Adelie
\n",
"
Torgersen
\n",
"
40.3
\n",
"
18.0
\n",
"
195.0
\n",
"
3250.0
\n",
"
Female
\n",
"
\n",
"
\n",
"
4
\n",
"
Adelie
\n",
"
Torgersen
\n",
"
36.7
\n",
"
19.3
\n",
"
193.0
\n",
"
3450.0
\n",
"
Female
\n",
"
\n",
"
\n",
"
5
\n",
"
Adelie
\n",
"
Torgersen
\n",
"
39.3
\n",
"
20.6
\n",
"
190.0
\n",
"
3650.0
\n",
"
Male
\n",
"
\n",
"
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
\n",
"
\n",
"
338
\n",
"
Gentoo
\n",
"
Biscoe
\n",
"
47.2
\n",
"
13.7
\n",
"
214.0
\n",
"
4925.0
\n",
"
Female
\n",
"
\n",
"
\n",
"
340
\n",
"
Gentoo
\n",
"
Biscoe
\n",
"
46.8
\n",
"
14.3
\n",
"
215.0
\n",
"
4850.0
\n",
"
Female
\n",
"
\n",
"
\n",
"
341
\n",
"
Gentoo
\n",
"
Biscoe
\n",
"
50.4
\n",
"
15.7
\n",
"
222.0
\n",
"
5750.0
\n",
"
Male
\n",
"
\n",
"
\n",
"
342
\n",
"
Gentoo
\n",
"
Biscoe
\n",
"
45.2
\n",
"
14.8
\n",
"
212.0
\n",
"
5200.0
\n",
"
Female
\n",
"
\n",
"
\n",
"
343
\n",
"
Gentoo
\n",
"
Biscoe
\n",
"
49.9
\n",
"
16.1
\n",
"
213.0
\n",
"
5400.0
\n",
"
Male
\n",
"
\n",
" \n",
"
\n",
"
333 rows × 7 columns
\n",
"
"
],
"text/plain": [
" species island bill_length_mm bill_depth_mm flipper_length_mm \\\n",
"0 Adelie Torgersen 39.1 18.7 181.0 \n",
"1 Adelie Torgersen 39.5 17.4 186.0 \n",
"2 Adelie Torgersen 40.3 18.0 195.0 \n",
"4 Adelie Torgersen 36.7 19.3 193.0 \n",
"5 Adelie Torgersen 39.3 20.6 190.0 \n",
".. ... ... ... ... ... \n",
"338 Gentoo Biscoe 47.2 13.7 214.0 \n",
"340 Gentoo Biscoe 46.8 14.3 215.0 \n",
"341 Gentoo Biscoe 50.4 15.7 222.0 \n",
"342 Gentoo Biscoe 45.2 14.8 212.0 \n",
"343 Gentoo Biscoe 49.9 16.1 213.0 \n",
"\n",
" body_mass_g sex \n",
"0 3750.0 Male \n",
"1 3800.0 Female \n",
"2 3250.0 Female \n",
"4 3450.0 Female \n",
"5 3650.0 Male \n",
".. ... ... \n",
"338 4925.0 Female \n",
"340 4850.0 Female \n",
"341 5750.0 Male \n",
"342 5200.0 Female \n",
"343 5400.0 Male \n",
"\n",
"[333 rows x 7 columns]"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_penguins.dropna()"
]
},
{
"cell_type": "markdown",
"id": "f1feab44-7136-4736-a11c-ee005fdb0698",
"metadata": {},
"source": [
"**16.** By default, will this modify df_penguins or will it return a copy?"
]
},
{
"cell_type": "markdown",
"id": "8b0981bc-d8b1-42a5-9df6-e2723f61013c",
"metadata": {},
"source": [
"It will return a copy."
]
},
{
"cell_type": "markdown",
"id": "7f35c024-ac5c-43e0-8ea7-3e03953f644b",
"metadata": {},
"source": [
"**17.** How can we override the default?"
]
},
{
"cell_type": "markdown",
"id": "3f62e100-1000-40df-b21b-6af919fbb945",
"metadata": {},
"source": [
"We can use ```df_penguins.dropna(inplace=True)```"
]
},
{
"cell_type": "markdown",
"id": "a17f9c69-c8e2-4331-b9c7-c5be6a13a968",
"metadata": {},
"source": [
"**18.** Create a new DataFrame, df_penguins_full, with the missing data deleted."
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "f160094c-ba0c-4fe1-96f0-d6b23e8162c9",
"metadata": {},
"outputs": [],
"source": [
"df_penguins_full = df_penguins.dropna()"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "54156eb4-45a6-4fcf-b40c-b406ee493180",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['species', 'island', 'bill_length_mm', 'bill_depth_mm',\n",
" 'flipper_length_mm', 'body_mass_g', 'sex'],\n",
" dtype='object')"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Expoloratory only\n",
"df_penguins_full.columns"
]
},
{
"cell_type": "markdown",
"id": "c34afa01-c0c4-40fa-b3a6-3b7c6abc58e1",
"metadata": {},
"source": [
"**19.** What is the average bill length of a penguin, in millimeters, in this (df_full) data set?"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "6012e669-3c97-4949-be33-3e44daf9534a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"43.99279279279279"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_penguins_full['bill_length_mm'].mean()"
]
},
{
"cell_type": "markdown",
"id": "38076367-ae05-4567-8963-12b5e0d77214",
"metadata": {},
"source": [
"**20.** Which of the following is most strongly correlated with bill length? a) Body mass? b) Flipper length? c) Bill depth? Show how you arrived at the answer."
]
},
{
"cell_type": "markdown",
"id": "2edbbf08-8f1d-40ea-b137-cdcdbfe02787",
"metadata": {},
"source": [
"The answer is b) Flipper length. See below:"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "93840bc0-eda0-48fd-b14d-4405a484c547",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.5894511101769488\n",
"0.6530956386670861\n",
"-0.2286256359130292\n"
]
}
],
"source": [
"print(df_penguins_full['bill_length_mm'].corr(df_penguins_full['body_mass_g']))\n",
"print(df_penguins_full['bill_length_mm'].corr(df_penguins_full['flipper_length_mm']))\n",
"print(df_penguins_full['bill_length_mm'].corr(df_penguins_full['bill_depth_mm']))\n"
]
},
{
"cell_type": "markdown",
"id": "538ed22f-f2c0-46df-bf41-6a27246e356d",
"metadata": {},
"source": [
"**21.** How could you show the median flipper length, grouped by species?"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "a39321c8-1822-4078-91f9-6f9a36a3bab7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"species\n",
"Adelie 190.102740\n",
"Chinstrap 195.823529\n",
"Gentoo 217.235294\n",
"Name: flipper_length_mm, dtype: float64"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_penguins_full.groupby('species').mean()['flipper_length_mm']"
]
},
{
"cell_type": "markdown",
"id": "0d6ff0a0-f72b-47a0-a8c5-50cf2e9c8ef4",
"metadata": {},
"source": [
"**22.** Which species has the longest flippers?"
]
},
{
"cell_type": "markdown",
"id": "20f5ec8f-1580-438d-8ddb-ad2eeabed1ba",
"metadata": {},
"source": [
"Gentoo"
]
},
{
"cell_type": "markdown",
"id": "f50dacfd-a59d-4763-b8bf-004a02d8f05f",
"metadata": {},
"source": [
"**23.** Which two species have the most similar mean weight? Show how you arrived at the answer."
]
},
{
"cell_type": "markdown",
"id": "be5582aa-447d-4645-85cd-072894fcbb82",
"metadata": {},
"source": [
"Adelie and Chinstrap"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "c61f152b-d401-4c28-aea1-014d01100136",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"