The ability to process and analyze data has never been more important. Spreadsheets - in particular Microsoft's Excel - are a primary data-analysis tool for finance professionals. However, as the size and complexity of financial data analysis increase, we quickly find ourselves working on problems beyond Excel's capabilities. Still many people persist in using the tool with which they are familiar. This leads to a syndrome known as Excel Hell.
You’re probably in Excel Hell:
- If spreadsheets take minutes to open.
- If formulas span multiple lines.
- If you have a 20-step analysis job that you run each week. Most weeks all goes well. But, occasionally you miss a step and have to start over, wasting valuable time.
Data-Analysis Programming Languages Can Help
Today, several widely used software programs address many of Excel's weaknesses. The key difference between spreadsheets and these new tools is that the latter are data-analysis programming languages. Two popular open-source – free -- programming languages are Python and R. A couple of noteworthy closed-source -- not free -- languages are Matlab and SAS.
With spreadsheets, you load data into a workbook, navigate around a grid of cells, and write formulas in those cells. The experience is visual, tactile, and intuitive.
With a programming language, you load data into memory, and then write computer code that acts on the data. The experience is more abstract, which can be challenging to learn, but this abstraction ultimately leads to greater flexibility.
How do tools like R and Python make for a better data-analysis environment?
- By loading data into memory, these tools handle larger data sets. Excel starts struggling at 10 megabytes of data. Programming languages can handle one to two gigabytes of data with ease.
- If you write complex Excel formulas, you're already a de facto computer programmer, so you might as well use a tool intended for that purpose.
- By cementing analysis steps in code, tasks can be repeated many times.
Programming is Hard
Having extolled the virtues of these programming languages let me offer a dose of realism: programming is hard. The increased capabilities of a programming language come at the cost of a steep learning curve.
Coding is not for everyone.
If analyzing data is a significant part of your job or if you would like increased responsibilities with these tasks, I recommend mastering at least one data-analysis programming language. But it's a long process. Don't expect to conquer these tools over a weekend.
And if you’ve taken the plunge and feel overwhelmed, take solace in knowing that you're not alone.
Which Language Should You Choose?
Since 2018, I have taught graduate students and finance professionals how to use both R and Python. Two frequent questions are: "Which tool is the best? Which one should I focus on?"
I appreciate the motivation for asking the questions. People are busy and want to make efficient use of their time. Therefore, my answer may be unsatisfying: it doesn’t matter. Don't dither, just make a choice, and develop fluency in that language.
All four languages - R, Python, Matlab, and SAS - are similar in their data-analysis capabilities. If you want to migrate a task in Excel to a programming language, all four will work well.
Rather than choosing a tool for its intrinsic properties, a better criterion would be institutional inertia. Use the tool that your collaborators use. If your company has chosen a language, go with that one. All four are more than adequate, but none is perfect.
In the last five years, Matlab and SAS have lost significant market share to Python and R. This trend looks to continue. This is largely because the open-source tools are now full featured and robust, so it’s hard for the other two to justify their cost.
Companies serious about data science typically support both R and Python. The former is better for interactive analysis, while the latter is better for production code.
When I decided to make data-analysis programming one of my core competencies, I chose to learn R first. A few years later, I added Python to my toolbox.
Two non-programming data-analysis tools that have become popular over the last few years are Tableau and PowerBI. Both tools focus on gaining insights from data visualization.
If you want the flexibility of a programming language, I do not recommend either of these tools. However, there may be value for those who don’t want to commit to learning programming.
What We're Doing at Aware
The Aware quant team uses Python predominantly. In order to follow institutional inertia, despite my proficiency in R, I made Python my primary data-analysis tool.
Most of our work occurs in a Jupyter Notebook, which is an interactive programming environment that runs in your web browser. On our desktops, we make use of Notebooks within Bloomberg’s BQuant framework. We also work with Notebooks in our own custom analytics platform in Amazon Web Services.
Good luck in your journey.
Head of Quantitative Research
Aware Asset Management