Python for Business Analytics
The University of Sydney Business School
Welcome to the Python resource page for Business Analytics students at the University of Sydney Business School. The purpose of this page is to provide key information and supplementary material for students using Python in Business Analytics units.
- Setting up Python
- Getting started with Python and Jupyter Notebook
- Jupyter Qt Console
- Resources for learning Python
- Python essentials
- Working with data in Python (exercises, solutions)
- Data visualisation
- Writing your own functions (solutions)
- Working with time stamped data
Setting up Python
You can easily and quickly install Python on your personal computer. Python is free and open source, and you do not need a license to work with it even if you use it for commercial purposes. This freedom and flexibility is one of the main reasons why it is so popular as a general purpose language, and why Python has become one of the most popular languages for Data Science (together with R).
Our units use the core Python installation plus a collection of libraries to support tasks such as scientific computing, data management, data visualisation, statistical analysis, and machine learning. You can get almost everything that we need in one go by downloading and installing the Anaconda distribution provided by Continuum Analytics. Follow the instructions on their website and install the latest version, which is currently Python 3.6.
Next, you need a way to interact with Python. We use Jupyter Notebook, a browser based interface that is simple to use and has many useful features. You can install it by following the instructions on the link.
Anaconda also works as a convenient package manager. You additionally need to install the Seaborn statistical data visualisation package, which does not automatically come in the Anaconda distribution. All you need to do is open a terminal or command prompt (click on the links if you need Windows or Mac help) and enter:
conda install seaborn
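Once the installation finishes, a quick check along the following lines can confirm that Python is able to locate the main libraries. This is only a sketch: the list of package names is an assumption based on the libraries mentioned on this page, and the helper name check_packages is made up for illustration.

```python
import importlib.util

# Assumed list of libraries used in the units; adjust as needed.
PACKAGES = ["numpy", "pandas", "matplotlib", "seaborn"]

def check_packages(names):
    """Return a dict mapping each package name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

status = check_packages(PACKAGES)
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

Any package reported as missing can be installed from the terminal in the same way as Seaborn.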
Getting started with Python and Jupyter Notebook
To start using Python, enter the following on the terminal. This will launch Jupyter Notebook in your default browser.
jupyter notebook
It may not be convenient to launch Jupyter from the terminal every time. To avoid this, you can create a shortcut. One way to do this is to open up a file explorer and search for "Jupyter" to find the program in the Anaconda directory. You can then place a shortcut anywhere you like. Edit the shortcut properties so that Jupyter starts in your preferred directory. You can also replace the "target" field with "jupyter notebook".
Some students find it easier to launch Jupyter (as well as managing packages) from the Anaconda Navigator.
Once Jupyter opens up, you will see a screen that looks like the one below. In the main body you will see all files in the directory from which you started Jupyter. On the top right, you can click on new and then Python 3 to open a new notebook.
You are now ready to start coding. The basic elements of Jupyter Notebook are the cells, as in the next figure. Each cell is interpreted as code by default. You can type (or copy and paste) as much code as you like in a single cell. Press Shift + Enter when you are ready to run it. As a first step, try using a cell as a calculator to see it working. You can always run a cell again.
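For instance, a first cell used as a calculator might look like the sketch below. The variable names and numbers are invented purely for illustration.

```python
# Basic arithmetic, as you might type it into a single notebook cell.
price = 19.99        # made-up unit price
quantity = 3         # made-up number of units
total = price * quantity

print(round(total, 2))   # total cost, rounded to two decimal places
print(2 ** 10)           # exponentiation uses two asterisks
print(7 // 2, 7 % 2)     # integer division and remainder
```

In a notebook, the value of the last expression in a cell is also displayed automatically, so typing 2 + 2 on its own and pressing Shift + Enter is enough to see the result.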
You should familiarise yourself with the menu on the top to get an overview of the basic functionality of the notebook and useful keyboard shortcuts. For example, in the drop down list where you initially see "code" you have the option of changing a cell to markdown, so that you can write notes (alternatively Esc + M). Markdown cells accept HTML code and mathematics typed in LaTeX, which will be rendered when you run the cell. For practical information you can consult this cheatsheet.
As a practical step, try to run the code below. It loads the pandas package for data management. From previous experience, an error may arise in some international computers due to font encoding issues. This is a minor fixable problem. One solution for Mac computers is in this Stack Overflow thread. There may be simpler ones but you would need to bring your computer to me. Please let me know if the problem persists so that I can help you. It is important that you do this before the first face-to-face session.
import pandas as pd
To get an idea of what we can do with this environment, run the following code snippet, which is an illustration from the Seaborn documentation.
%matplotlib inline
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

def sinplot(flip=1):
    x = np.linspace(0, 14, 100)
    for i in range(1, 7):
        plt.plot(x, np.sin(x + i * .5) * (7 - i) * flip)

import seaborn as sns
sns.set_style('whitegrid')
sinplot()
plt.show()
If all goes well, you will see a graph like the one below. Do not worry about understanding the details for now; you will have mastered all this by the end of the unit. It is however useful to note that the first line makes the figure appear in the notebook itself rather than in a separate window. Once you have run it, figures will keep appearing inline unless you switch to a different backend by running "%matplotlib qt" or restart the notebook. Try experimenting with this.
Jupyter Qt Console
This section is optional and more helpful after you get some experience with Python and Jupyter Notebook.
Even though Jupyter Notebook is an excellent interface for learning Python and a great tool for communication that is widely adopted by professionals, it may not fulfill all your needs as you become a more advanced user.
An alternative workflow is to combine a text editor for writing code with a console to run it and work interactively with Python. For example, you may write a set of Python instructions in a text file known as a script (.py extension in our case), and use the console to run it and see the output. You will notice that this is what happens when you do the exercises on DataCamp.
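As an illustration of this workflow, a script can be as small as the sketch below. The file name summary.py, the function name summarise, and the numbers are all invented for the example.

```python
# summary.py (hypothetical file name): a minimal script
import statistics

def summarise(values):
    """Return the mean and median of a list of numbers."""
    return statistics.mean(values), statistics.median(values)

if __name__ == "__main__":
    sales = [120, 95, 130, 110]  # made-up data
    mean, median = summarise(sales)
    print(f"mean = {mean}, median = {median}")
```

Saving this text in a file with a .py extension and entering "python summary.py" on a terminal runs the script and prints the output. From an IPython-based console, "%run summary.py" does the same and keeps the variables available for interactive work.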
The Jupyter Qt console is a good option that already comes with your installed environment. To open it you can type the following on a terminal:
jupyter qtconsole
I like to tweak this slightly to work with a dark background (unless I plan to inline figures) and default to a larger font. As before, you can create a desktop shortcut for convenience.
You can try to copy and paste and run the code from before on the Qt console. A key difference is that you just have to press Enter to run single line commands (but still Shift + Enter for multiple lines).
At a very basic level even Notepad works as a text editor for coding, but you would want a more sophisticated option with features such as syntax highlighting and auto indentation. I use Sublime Text, but it requires a license (even though it has an unlimited trial period). Atom is a good free choice. A simple alternative if you are just getting started is to use Jupyter Notebook as a text editor. You can do this by creating a text file instead of a Python notebook in the main screen. You should then specify Python as the language on the menu.
There are often too many choices and sources of information when it comes to different aspects of Python coding, as the discussion of text editors illustrates. This is a good thing as it reflects the enthusiasm of developers in creating tools and packages, but it can be a distraction and even a barrier for beginners (compared to R for example, for which RStudio is a clear cut IDE choice). I suggest that you follow the recommended setup and do not worry too much about the details. If you are coding and running programs, then you are making progress towards your goals.
Resources for learning Python
Dataquest. A hands-on and comprehensive online course in Python for data science.
Think Python: How to Think Like a Computer Scientist (Second Edition) by Allen Downey (free online text). For students who are new to coding and are interested in developing a more fundamental understanding of programming.
Quantitative Economics online lectures by Thomas J. Sargent and John Stachurski. A fast way to get started with Python for those who already have experience with other languages such as MATLAB or R (some knowledge of econometrics is helpful).
It is important to note that no activities in Business Analytics units are intended as a replacement for a programming unit. Our focus is on using Python as a practical tool for data analysis, while programming is a process for problem solving. Students who plan to code professionally are encouraged to study programming at the School of Information Technologies.