Algorithms and Data Structures Optimization

Optimization of Algorithms

Early computing required programming in languages very close to the machine: machine code and assembly languages. Programs written at this low level could be very efficient, but they were also very complex to develop.

Today, we can use high-level languages that take care of issues such as memory access and registers, so we do not have to worry about them. The main drawback of this abstraction is that it becomes easy to write highly inefficient algorithms or programs without noticing.

What is an algorithm?

An algorithm is a set of instructions that are followed to achieve a goal or produce a result. The term does not apply only to the computer world. For example, everyday tasks as simple as brushing your teeth, washing your hands or following the instruction manual for assembling a piece of furniture can be seen as algorithms. In programming, an algorithm is a finite sequence of instructions that a computer follows, often packaged as a function.

Let's look at an example of a very simple algorithm that allows us to define a program to calculate the area of a triangle:

Process CalculateTriangleArea(base, height)
    Multiply the base by the height and store the result in product;
    Divide product by 2 and store the result in area;
    Write "The area is", area;
EndProcess

Once the algorithm has been defined, we can implement it in a programming language such as Python:

In [1]:
def calculate_triangle_area(base, height):
    product = base * height
    area = product / 2
    return f"The area is {area}"

calculate_triangle_area(20, 15)
Out[1]:
'The area is 150.0'

Time complexity of algorithms

A simple problem can be solved using many different algorithms. Some solutions simply take less time and space than others. But how do we know which solutions are more efficient?

The time complexity of an algorithm describes how the number of operations it performs grows with the size of its input (assuming that each operation takes the same amount of time). The algorithm that completes the task with the fewest operations is considered the most efficient in terms of time complexity. The languages most commonly used in data analysis, such as Python, R or Julia, provide built-in functions and libraries whose implementations are tuned for efficiency, which is one of the reasons why some developers prefer one language over another.

Some of the most common classes of time complexity are:

  • O(1) - Constant time: The statement needs one unit of time to finish, regardless of the size of the input. It is the best of these measures.
  • O(n) - Linear time: The statement needs n time units to perform the task. If, for example, n = 3, the statement will need 3 units of time to finish.
  • O(n^2) - Quadratic time: The statement needs n * n time units to perform the task. If, for example, n = 3, the statement will need 3 * 3 = 9 time units to finish.
  • O(C^n) - Exponential time: The statement needs C^n time units for some constant C > 1, which quickly becomes a very large amount of time. If, for example, C = 2 and n = 10, the statement will need 2^10 = 1024 time units to finish. It is the worst of these measures.
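To make these classes concrete, here is a minimal sketch that simply counts operations instead of measuring real time. The function names are made up for this illustration, and the exponential case assumes C = 2.

```python
def count_constant(n):
    """O(1): a single operation, whatever the value of n."""
    return 1


def count_linear(n):
    """O(n): one operation per element."""
    operations = 0
    for _ in range(n):
        operations += 1
    return operations


def count_quadratic(n):
    """O(n^2): one operation per pair of elements (two nested loops)."""
    operations = 0
    for _ in range(n):
        for _ in range(n):
            operations += 1
    return operations


def count_exponential(n):
    """O(2^n): every call makes two more calls until n reaches 0."""
    if n == 0:
        return 1  # count one operation at the end of each branch
    return count_exponential(n - 1) + count_exponential(n - 1)


for n in (3, 10):
    print(n, count_constant(n), count_linear(n),
          count_quadratic(n), count_exponential(n))
# 3 1 3 9 8
# 10 1 10 100 1024
```

Going from n = 3 to n = 10 leaves the constant count untouched, roughly triples the linear count, multiplies the quadratic count by about 11 and multiplies the exponential count by 128.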

There are many more classes for describing the efficiency of statements and, therefore, of algorithms. The following graph compares the most common ones:

[Figure: Big-O Complexity Chart]

The chart makes it clear that the ideal scenario is an algorithm composed of O(1) statements. Python's built-in functions and the most widely used libraries and packages are generally already well optimized, so the functions they provide tend to have low complexity. What remains entirely our responsibility, and directly affects the efficiency of our code, is following good programming practices.

Best Programming Practices for Reducing Time Complexity

1. Stay current

As hardware advances, so does software. As processors and graphics cards improve in capability and speed, programming languages evolve to take advantage of them. A first principle for any developer looking for efficient code is to keep the programming language and its libraries up to date. For example, recent Python 3 releases include substantial interpreter speedups over earlier versions (the CPython 3.11 release notes report an average of about 25% faster than 3.10 on the pyperformance benchmark suite), so part of the needed efficiency can sometimes be gained simply by upgrading the language.
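A quick way to see what you are currently running is to ask Python itself. The sketch below uses only the standard library; the helper name installed_version and the package names in the loop are just examples, so replace them with the libraries your project actually depends on.

```python
import sys
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


print("Python:", ".".join(str(part) for part in sys.version_info[:3]))

# Example package names only; they may or may not be installed here.
for name in ("numpy", "pandas"):
    print(name, installed_version(name) or "not installed")
```

Comparing these versions with the latest releases tells you whether an upgrade is available before you start optimizing by hand.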

2. Don't program everything you want to do in detail, rely on libraries

At some point you will want to solve something in Python and get straight to work, for example calculating an average by first adding up the numbers and then dividing by the sample size. Did you know that many libraries can do this in a single line? The code behind these library functions is usually highly optimized to take advantage of the available resources, so using them is generally more efficient than writing your own. The resulting code is also easier to understand, cleaner and usually more scalable.

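✅ Do this:

import numpy as np

np.array([1, 2, 3]).mean()

❌ Don't do this:

def mean(elements):
    total = elements.sum()  # avoid naming a variable "sum": it shadows the built-in
    n = len(elements)
    return total / n

mean(np.array([1, 2, 3]))

✅ Do this:

# names is a pandas DataFrame with a 'Gender' column
names['Gender'] = names['Gender'].replace('female', 'FEMALE')

❌ Don't do this:

# chained indexing may modify a temporary copy instead of names
names['Gender'].loc[names.Gender == 'female'] = 'FEMALE'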
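Rather than taking the claim on faith, you can measure it. The following sketch compares a hand-written mean with a library one using the timeit module. It is an illustration under some assumptions: the standard library's statistics.fmean stands in for a library routine such as NumPy's mean so that the example runs without extra dependencies, and manual_mean, the data size and the repetition count are arbitrary choices.

```python
import timeit
from statistics import fmean


def manual_mean(values):
    """Hand-written mean: add everything up, then divide by the count."""
    total = 0.0
    for value in values:
        total += value
    return total / len(values)


data = [float(i) for i in range(10_000)]

# Both approaches give the same answer for this data...
print(manual_mean(data), fmean(data))

# ...so the remaining question is how long each one takes.
manual_seconds = timeit.timeit(lambda: manual_mean(data), number=200)
library_seconds = timeit.timeit(lambda: fmean(data), number=200)
print(f"manual_mean: {manual_seconds:.4f} s")
print(f"fmean:       {library_seconds:.4f} s")
```

The exact timings depend on your machine and Python version, so treat the numbers as a comparison on your own setup rather than a general benchmark.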

This also applies to projects or applications that you want to make. Maybe they have already been done before and you can start from that project to make your own. Do you want to make a calculator? Do some research, see if someone has already made one in Python and use it as a starting point.

Here are some of the most used and necessary tricks in day-to-day work with data: https://www.turing.com/kb/22-hottest-python-tricksfor-efficient-coding

3. Use efficient data structures

Python provides many mechanisms for performing tasks efficiently in terms of both time and memory, as shown in the following examples:

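✅ Do this:

def good_list(elements):
    # a list comprehension builds the list in a single expression
    return [value for value in range(elements)]

❌ Don't do this:

def bad_list(elements):
    my_list = []
    for value in range(elements):
        my_list.append(value)
    return my_list

✅ Do this:

def good_string_joiner(elements):
    # join builds the final string in one pass
    return "".join(elements)

❌ Don't do this:

def bad_string_joiner(elements):
    final_string = ""
    for value in elements:
        # each += may copy the whole string built so far
        final_string += value
    return final_string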

There are many ways to optimize your code. The examples above use list comprehensions and string building with join; the standard library's collections and itertools modules offer many more efficient tools.
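As a small taste of collections and itertools, here is a sketch with a few commonly used tools. The sample data is invented for the example.

```python
from collections import Counter, deque
from itertools import chain

words = ["spam", "eggs", "spam", "ham", "spam"]

# Counter tallies items in one pass instead of a hand-written dict loop.
counts = Counter(words)
print(counts.most_common(1))  # [('spam', 3)]

# A set gives average O(1) membership tests; a list needs an O(n) scan.
known_words = set(words)
print("ham" in known_words)  # True

# deque has O(1) appends and pops at both ends; list.pop(0) is O(n).
queue = deque([1, 2, 3])
queue.appendleft(0)
print(queue.popleft())  # 0

# chain walks several iterables without building a combined list first.
print(list(chain([1, 2], [3, 4])))  # [1, 2, 3, 4]
```

Choosing the structure that matches the operation you repeat most often (counting, membership tests, queue operations) is often a bigger win than micro-optimizing the loop around it.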

More information on how you can make your code as efficient as possible by taking advantage of Python's native tools and packages here: https://khuyentran1401.github.io/Efficient_Python_tricks_and_tools_for_data_scientists/README.html

4. Remove what you don't need

In any programming language, variables and objects take up memory, so a good way to keep your program lean is to remove variables that you no longer need. In Python, you can see how much memory an object occupies with the sys.getsizeof(variable) function of the sys module. If a variable turns out to be considerably large, consider deleting it (for example with del) so as not to load the memory of the execution environment unnecessarily: the fuller the memory gets, the worse the environment will perform.
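The following sketch shows both steps: measuring an object and then releasing it. One caveat worth knowing is that sys.getsizeof reports only the size of the object itself, so for a list it does not include the elements the list refers to. The variable names and the one-million-element size are arbitrary choices for the example.

```python
import sys

numbers = list(range(1_000_000))

# Size of the list object itself (its internal array of references).
list_size = sys.getsizeof(numbers)

# A range stores only start, stop and step, so it stays tiny.
lazy_size = sys.getsizeof(range(1_000_000))

print(f"list:  {list_size:,} bytes")
print(f"range: {lazy_size:,} bytes")

# Remove the name once the data is no longer needed; the memory can be
# reclaimed when no other references to the list remain.
del numbers
```

When the full list is not needed at once, a lazy object such as range or a generator avoids the memory cost altogether.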

5. Learn from others

Many times, we need inspiration from others. Until you read this content, you may not have known that mechanisms such as list comprehensions or the NumPy function to calculate the average existed. Therefore, one of the best ways to make your code more efficient is to learn from the code of other developers. Experience is the best path to efficiency.

Best practices in programming to improve the code

Beyond the best practices that make code more efficient in terms of time and resources, there are also best practices that make code more understandable and standardized, which makes it easier for developers to share knowledge and follow a common standard. There are many proposals and guides for writing Python code, but the best known is PEP 8 - Style Guide for Python Code, which you can read here: https://peps.python.org/pep-0008/

This document provides coding conventions for Python code that comprises the standard library in the main Python distribution. In addition, this guide is under constant revision and evolves with time and language releases.
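As a small illustration of the kind of conventions PEP 8 describes, the two functions below compute the same triangle area. The first one is deliberately written in a cramped style; the second follows a few PEP 8 recommendations (snake_case function names, spaces around operators and after commas, one statement per line) and adds a docstring. Both function names are made up for this example.

```python
# Harder to read: cryptic names, no spacing, two statements on one line.
def CalcA(b,h): r=b*h/2; return r


# Closer to PEP 8: descriptive snake_case names and consistent spacing.
def calculate_triangle_area(base, height):
    """Return the area of a triangle from its base and height."""
    return base * height / 2


print(CalcA(20, 15))                    # 150.0
print(calculate_triangle_area(20, 15))  # 150.0
```

The behavior is identical; only the second version tells the reader what it does without having to decode it.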