{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Basic Pandas Operations\n",
    "\n",
    "## Why do we use Pandas?\n",
    "\n",
    "Pandas is a powerful Python library that supports creation, storage, and manipulation of data frames. Aa data-frame as a 2-dimensional data-structure -- basically a spreadsheet -- with rows and columns. It's probably the most common way that we organize our data, whether experimental, survey, whatever... It's basically the bread and butter of data-analysis.\n",
    "\n",
    "Pandas has features such as handling missing data, cleaning up the data, and it also supports multiple file formats including CSV, Excel, SQL, etc.\n",
    "\n",
    "In this activity, we will learn some of the basic oprations to get you started with Pandas.\n",
    "\n",
    "## Storing data frames in a CSV file\n",
    "\n",
    "First we will create a dictionary containing the data. Then we can convert the dictionary to a pandas data frame and store it as a csv file. In real life you will never be hard-coding data like this. This is just a demonstration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "name_dict = {\n",
    "    'Name':['Adam','Becky','Charlie','Daniel','Emily','Frank','Greta','Helen','Ian','Jack','Klaus','Lucy'],\n",
    "    'Class ID':list(range(1,13)),\n",
    "    'Age':[int(i) for i in (np.round(np.random.uniform(18,30,12),0))],\n",
    "    'Score':[int(i) for i in (np.round(np.random.uniform(60,100,12),0))]\n",
    "}\n",
    "\n",
    "df = pd.DataFrame(name_dict)\n",
    "\n",
    "df.to_csv('ClassList.csv',index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading in data from a CSV file\n",
    "\n",
    "After loading in the data, we can do some quick checks. First, let's check the column names:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.indexes.base.Index"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('ClassList.csv')\n",
    "df.columns\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pandas implements its own data types, but you can convert them to familiar things like lists (Basically, if it looks like it's a list, you can convert it to a list. You'll lose some functionality when you do it, but often it's functionality you don't need.)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.indexes.base.Index'>\n",
      "['Name', 'Class ID', 'Age', 'Score']\n"
     ]
    }
   ],
   "source": [
    "print(type(df.columns))\n",
    "col_names = list(df.columns)\n",
    "print(col_names)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, let's look at the `head` (first few rows), and `tail` (the last few rows). This is always a good idea to make sure that everything is being read in as we expect."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Class ID</th>\n",
       "      <th>Age</th>\n",
       "      <th>Score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Adam</td>\n",
       "      <td>1</td>\n",
       "      <td>29</td>\n",
       "      <td>72</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Becky</td>\n",
       "      <td>2</td>\n",
       "      <td>29</td>\n",
       "      <td>90</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Charlie</td>\n",
       "      <td>3</td>\n",
       "      <td>20</td>\n",
       "      <td>81</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Daniel</td>\n",
       "      <td>4</td>\n",
       "      <td>23</td>\n",
       "      <td>65</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Emily</td>\n",
       "      <td>5</td>\n",
       "      <td>26</td>\n",
       "      <td>75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Frank</td>\n",
       "      <td>6</td>\n",
       "      <td>29</td>\n",
       "      <td>62</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Greta</td>\n",
       "      <td>7</td>\n",
       "      <td>30</td>\n",
       "      <td>94</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Helen</td>\n",
       "      <td>8</td>\n",
       "      <td>28</td>\n",
       "      <td>61</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Ian</td>\n",
       "      <td>9</td>\n",
       "      <td>20</td>\n",
       "      <td>77</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Jack</td>\n",
       "      <td>10</td>\n",
       "      <td>27</td>\n",
       "      <td>93</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      Name  Class ID  Age  Score\n",
       "0     Adam         1   29     72\n",
       "1    Becky         2   29     90\n",
       "2  Charlie         3   20     81\n",
       "3   Daniel         4   23     65\n",
       "4    Emily         5   26     75\n",
       "5    Frank         6   29     62\n",
       "6    Greta         7   30     94\n",
       "7    Helen         8   28     61\n",
       "8      Ian         9   20     77\n",
       "9     Jack        10   27     93"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head(10) #the first 5 rows by default. If you want more, insert the number as an argument to head."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Class ID</th>\n",
       "      <th>Age</th>\n",
       "      <th>Score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Helen</td>\n",
       "      <td>8</td>\n",
       "      <td>28</td>\n",
       "      <td>61</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Ian</td>\n",
       "      <td>9</td>\n",
       "      <td>20</td>\n",
       "      <td>77</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Jack</td>\n",
       "      <td>10</td>\n",
       "      <td>27</td>\n",
       "      <td>93</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Klaus</td>\n",
       "      <td>11</td>\n",
       "      <td>23</td>\n",
       "      <td>75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Lucy</td>\n",
       "      <td>12</td>\n",
       "      <td>20</td>\n",
       "      <td>73</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Name  Class ID  Age  Score\n",
       "7   Helen         8   28     61\n",
       "8     Ian         9   20     77\n",
       "9    Jack        10   27     93\n",
       "10  Klaus        11   23     75\n",
       "11   Lucy        12   20     73"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.tail() # This will give you the last 5 rows of data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Accessing a particular row, column, or cell\n",
    "\n",
    "In some cases, we want to access a particular row of the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Name        Adam\n",
       "Class ID       1\n",
       "Age           28\n",
       "Score         63\n",
       "Name: 0, dtype: object"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[0,:] # Accessing the first row of data, the first value points to the row(s) and the second value points to the column(s)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can select a particular column by its name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     63\n",
       "1     72\n",
       "2     82\n",
       "3     82\n",
       "4     71\n",
       "5     82\n",
       "6     84\n",
       "7     97\n",
       "8     63\n",
       "9     98\n",
       "10    94\n",
       "11    99\n",
       "Name: Score, dtype: int64"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.Score # Accessing the Score column"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or you can select a column by its index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     28\n",
       "1     27\n",
       "2     28\n",
       "3     22\n",
       "4     24\n",
       "5     21\n",
       "6     26\n",
       "7     27\n",
       "8     23\n",
       "9     26\n",
       "10    29\n",
       "11    27\n",
       "Name: Age, dtype: int64"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.iloc[:,2] # This will also access the Score column"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It's also possible to select a particular cell"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "63"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.iloc[0]['Score'] # This will access the first row ([0]) of the Score column"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Iterate through rows and access values in that row\n",
    "\n",
    "In some cases, we need to iterate through all the rows, for example when we are using a pandas data-frame as a trial list and creating experimental trials based on that data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(0, Name        Adam\n",
      "Class ID       1\n",
      "Age           28\n",
      "Score         63\n",
      "Name: 0, dtype: object)\n",
      "(1, Name        Becky\n",
      "Class ID        2\n",
      "Age            27\n",
      "Score          72\n",
      "Name: 1, dtype: object)\n",
      "(2, Name        Charlie\n",
      "Class ID          3\n",
      "Age              28\n",
      "Score            82\n",
      "Name: 2, dtype: object)\n",
      "(3, Name        Daniel\n",
      "Class ID         4\n",
      "Age             22\n",
      "Score           82\n",
      "Name: 3, dtype: object)\n",
      "(4, Name        Emily\n",
      "Class ID        5\n",
      "Age            24\n",
      "Score          71\n",
      "Name: 4, dtype: object)\n",
      "(5, Name        Frank\n",
      "Class ID        6\n",
      "Age            21\n",
      "Score          82\n",
      "Name: 5, dtype: object)\n",
      "(6, Name        Greta\n",
      "Class ID        7\n",
      "Age            26\n",
      "Score          84\n",
      "Name: 6, dtype: object)\n",
      "(7, Name        Helen\n",
      "Class ID        8\n",
      "Age            27\n",
      "Score          97\n",
      "Name: 7, dtype: object)\n",
      "(8, Name        Ian\n",
      "Class ID      9\n",
      "Age          23\n",
      "Score        63\n",
      "Name: 8, dtype: object)\n",
      "(9, Name        Jack\n",
      "Class ID      10\n",
      "Age           26\n",
      "Score         98\n",
      "Name: 9, dtype: object)\n",
      "(10, Name        Klaus\n",
      "Class ID       11\n",
      "Age            29\n",
      "Score          94\n",
      "Name: 10, dtype: object)\n",
      "(11, Name        Lucy\n",
      "Class ID      12\n",
      "Age           27\n",
      "Score         99\n",
      "Name: 11, dtype: object)\n"
     ]
    }
   ],
   "source": [
    "# Printing everything of each row\n",
    "for row in df.iterrows():\n",
    "    print(row)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Adam 72\n",
      "Becky 90\n",
      "Charlie 81\n",
      "Daniel 65\n",
      "Emily 75\n",
      "Frank 62\n",
      "Greta 94\n",
      "Helen 61\n",
      "Ian 77\n",
      "Jack 93\n",
      "Klaus 75\n",
      "Lucy 73\n"
     ]
    }
   ],
   "source": [
    "# Iterate through all rows and select only the name and Score columns\n",
    "for index,row in df.iterrows():\n",
    "    print(row['Name'], row['Score'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Changing data in a data-frame\n",
    "\n",
    "If what you want to do is **change** data in a data-frame, the preferred way is to **not** iterate through the data-frame, but change it procedurally. For example, let's say we want to add a column to our data-frame that converts scores to letter-grades. Rather than iterating through it and executing various conditionals, we do this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "72 C\n",
      "90 A\n",
      "81 B\n",
      "65 D\n",
      "75 C\n",
      "62 D\n",
      "94 A\n",
      "61 D\n",
      "77 C\n",
      "93 A\n",
      "75 C\n",
      "73 C\n"
     ]
    }
   ],
   "source": [
    "def convert_score_to_grade(score):\n",
    "    min_score_to_letter = {90:'A', 80:'B', 70:'C', 60:'D', 0:'F'}\n",
    "    for min_score,letter in min_score_to_letter.items():\n",
    "        if score>=min_score:\n",
    "            return letter\n",
    "\n",
    "df['Letter'] = list(map(convert_score_to_grade, df['Score'])) #create a new column called Letter\n",
    "\n",
    "#now let's check it\n",
    "for index,row in df.iterrows():\n",
    "    print(row['Score'], row['Letter'])"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.8.13 ('psych750')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.13"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "57beecaf6908bae4f97de5e2dc8e8d0311fae5bc989593c172c307d13e31f6e4"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}