Introduction

Have you ever run into situations like these: a web crawler that crawls frustratingly slowly, or a data processing program that keeps one core busy (about 25% total CPU usage on a four-core machine) while the other cores sit idle? These are signs that the program is not taking advantage of concurrency. Today, I'd like to discuss concurrent programming in Python with you.

Basic Concepts

When discussing concurrent programming, we first need to clarify some basic concepts. Concurrency and parallelism are often confused, so let's separate them.

Concurrency refers to the logical structure of a program: its ability to make progress on multiple tasks during overlapping time periods. Parallelism refers to how the program actually runs: multiple tasks executing at the same instant on multiple processors. To use an analogy, concurrency is like one person juggling several phone calls, which appear to happen simultaneously but actually involve quickly switching between them; parallelism is like several people each making their own phone call, truly at the same time.

In Python, there are three main ways to implement concurrent programming: Threading, Multiprocessing, and AsyncIO. Each method has its suitable scenarios, and choosing the appropriate concurrency method is crucial for improving program performance.

Threading

When discussing Python threading, we must mention the GIL (Global Interpreter Lock). Many people have heard that "Python's multithreading is fake," which is actually a misunderstanding of the GIL.

The GIL is a feature of CPython, the reference Python interpreter, that ensures only one thread executes Python bytecode at a time. This sounds terrible, but the actual situation is more nuanced.

Let's look at a specific example:

import threading
import time

def cpu_bound_task(n):
    while n > 0:
        n -= 1

def io_bound_task():
    time.sleep(1)

def run_tasks_sequentially():
    start = time.time()
    cpu_bound_task(10**7)
    io_bound_task()
    print(f"Sequential execution took: {time.time() - start:.2f} seconds")

def run_tasks_threaded():
    start = time.time()
    t1 = threading.Thread(target=cpu_bound_task, args=(10**7,))
    t2 = threading.Thread(target=io_bound_task)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    print(f"Threaded execution took: {time.time() - start:.2f} seconds")

run_tasks_sequentially()
run_tasks_threaded()

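In this example, we define two tasks: a CPU-intensive counting loop and an IO-style task that simply sleeps. When we run them in two threads, they overlap, because time.sleep releases the GIL just as a blocking IO call does, so the total time is roughly that of the slower task rather than the sum of both. The cost of the GIL shows up when several threads are all CPU-bound: they take turns holding the lock, so two counting loops in two threads finish no faster than running them one after the other.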
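To isolate the effect of the GIL, here is a minimal sketch in which both threads are CPU-bound. The function names (count_down, time_sequential, time_threaded) and the loop size are my own choices for illustration. On a standard CPython build with the GIL enabled, the two timings should come out close to each other; exact numbers depend on the machine and the Python version.

```python
import threading
import time


def count_down(n):
    # Pure-Python loop: the running thread holds the GIL while executing bytecode
    while n > 0:
        n -= 1


def time_sequential(n, workers=2):
    # Run the loop `workers` times, one after the other
    start = time.perf_counter()
    for _ in range(workers):
        count_down(n)
    return time.perf_counter() - start


def time_threaded(n, workers=2):
    # Run the same loops in separate threads and wait for all of them
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 5 * 10**6
    print(f"Two loops, sequential: {time_sequential(n):.2f}s")
    print(f"Two loops, threaded:   {time_threaded(n):.2f}s")
```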

However, this doesn't mean Python's multithreading is useless. For IO-intensive tasks, such as network requests and file operations, multithreading can still significantly improve performance. This is because CPython releases the GIL while a thread is blocked on an IO operation, allowing other threads to run in the meantime.

I remember once when developing a web crawler program, using multithreading improved performance by nearly 10 times. This was because network requests are mainly IO operations, and threads release the GIL while waiting for network responses, allowing other threads to continue working.
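A crawler-like workload can be simulated without any network access. The sketch below is an illustration under assumptions: fake_download is a hypothetical stand-in that sleeps for 0.2 seconds instead of making a real request, and the worker count of 8 is arbitrary. Because the sleeping threads release the GIL, the threaded version should take roughly as long as one download rather than the sum of all of them.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(page_id, delay=0.2):
    # time.sleep stands in for a blocking network call; it releases the GIL
    time.sleep(delay)
    return f"page-{page_id}"


def crawl_sequential(page_ids):
    return [fake_download(p) for p in page_ids]


def crawl_threaded(page_ids, max_workers=8):
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map preserves input order; list() waits for completion and surfaces errors
        return list(executor.map(fake_download, page_ids))


def timed(func, *args):
    # Return the function's result together with the elapsed wall-clock time
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    ids = list(range(8))
    _, seq = timed(crawl_sequential, ids)
    _, thr = timed(crawl_threaded, ids)
    print(f"Sequential: {seq:.2f}s, threaded: {thr:.2f}s")
```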

Multiprocessing

For CPU-intensive tasks, Python's multiprocessing is a better choice. Since each process has its own Python interpreter and memory space, it's not limited by the GIL.

Let's look at an example of using multiprocessing to handle CPU-intensive tasks:

from multiprocessing import Pool
import time

def cpu_intensive_task(n):
    result = 0
    for i in range(n):
        result += i * i
    return result

def run_with_multiprocessing():
    start = time.time()
    numbers = [10**7, 10**7, 10**7, 10**7]

    with Pool() as pool:
        results = pool.map(cpu_intensive_task, numbers)

    print(f"Multiprocessing took: {time.time() - start:.2f} seconds")
    return results

def run_sequentially():
    start = time.time()
    numbers = [10**7, 10**7, 10**7, 10**7]
    results = [cpu_intensive_task(n) for n in numbers]
    print(f"Sequential execution took: {time.time() - start:.2f} seconds")
    return results

if __name__ == '__main__':
    sequential_results = run_sequentially()
    multiprocessing_results = run_with_multiprocessing()

Running this code on a 4-core machine, you should see the multiprocessing version finish up to roughly 4 times faster than sequential execution, minus the cost of starting the worker processes. This is because each process can run in parallel on a different CPU core.

However, multiprocessing comes with costs. Inter-process communication is much more expensive than communication between threads, because data has to be serialized and copied between processes, and each process needs its own memory space. I ran into memory shortages when processing large-scale data analysis this way and had to switch to other solutions.
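One rough way to get a feel for the communication cost: multiprocessing moves arguments and return values between processes by pickling them, so the pickled size of an object approximates how many bytes have to be sent per call. The helper name pickled_size and the sample objects below are my own, chosen only for illustration.

```python
import pickle


def pickled_size(obj):
    # Arguments and results cross process boundaries in pickled form,
    # so this approximates the number of bytes sent for one object
    return len(pickle.dumps(obj))


if __name__ == "__main__":
    small_arg = 10**7                # a single integer, as in the example above
    large_arg = list(range(10**6))   # a million-element list
    print(f"int argument:  {pickled_size(small_arg)} bytes")
    print(f"list argument: {pickled_size(large_arg)} bytes")
```

Passing a single integer to each worker, as the example above does, keeps this overhead negligible; shipping large lists or data frames to workers is where it starts to matter.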

Async IO

The async/await syntax introduced in Python 3.5 made asynchronous programming more elegant. Async IO is a single-threaded concurrency model that is particularly suitable for IO-intensive tasks.

Let's look at an example of using asyncio to handle concurrent network requests:

import asyncio
import aiohttp
import time

async def fetch_url(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    urls = [
        'http://python.org',
        'http://pypi.org',
        'http://docs.python.org',
        'http://python.org/about'
    ]

    async with aiohttp.ClientSession() as session:
        tasks = [fetch_url(session, url) for url in urls]
        start = time.time()
        responses = await asyncio.gather(*tasks)
        print(f"Async execution took: {time.time() - start:.2f} seconds")
        return responses

if __name__ == '__main__':
    asyncio.run(main())

The advantage of Async IO is that it can handle many concurrent connections with minimal system resources. In the above example, we initiate multiple HTTP requests simultaneously, but the entire program uses only one thread. This approach is particularly suitable for applications that need to handle many IO operations, such as web servers.
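The single-thread claim can be checked without a network. In the sketch below, handle_connection and serve_many are hypothetical names, and asyncio.sleep stands in for waiting on a socket. Each coroutine records the ID of the thread it ran on, so we can confirm that all simulated connections share one thread.

```python
import asyncio
import threading
import time


async def handle_connection(conn_id, delay=0.1):
    # asyncio.sleep stands in for waiting on a socket; it yields to the event loop
    await asyncio.sleep(delay)
    return conn_id, threading.get_ident()


async def serve_many(count):
    # gather runs all coroutines concurrently and returns results in input order
    return await asyncio.gather(*(handle_connection(i) for i in range(count)))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(serve_many(1000))
    elapsed = time.perf_counter() - start
    thread_ids = {tid for _, tid in results}
    print(f"{len(results)} simulated connections in {elapsed:.2f}s "
          f"on {len(thread_ids)} thread(s)")
```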

When developing a real-time data processing system, I used Async IO to handle WebSocket connections. The system needed to handle thousands of connections simultaneously. If using multithreading or multiprocessing, the server's resource consumption would be very high. After using Async IO, the system's resource usage greatly decreased while performance significantly improved.

Performance Comparison

Let's do a concrete performance comparison. Here's a comprehensive test example:

import asyncio
import time
import aiohttp
import requests
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def io_bound_sync(urls):
    for url in urls:
        requests.get(url)

def io_bound_threading(urls):
    with ThreadPoolExecutor(max_workers=4) as executor:
        # list() consumes the results so request errors are not silently dropped
        list(executor.map(requests.get, urls))

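async def fetch(session, url):
    # Read the body inside the context manager so the connection is released
    async with session.get(url) as response:
        await response.read()

async def io_bound_async(urls):
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(fetch(session, url) for url in urls))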

def cpu_bound_task(n):
    result = 0
    for i in range(n):
        result += i * i
    return result

def main():
    # IO-intensive task test
    urls = ['http://httpbin.org/delay/1'] * 4

    # Synchronous execution
    start = time.time()
    io_bound_sync(urls)
    print(f"Sync IO: {time.time() - start:.2f}s")

    # Multithreaded execution
    start = time.time()
    io_bound_threading(urls)
    print(f"Threading IO: {time.time() - start:.2f}s")

    # Asynchronous execution
    start = time.time()
    asyncio.run(io_bound_async(urls))
    print(f"Async IO: {time.time() - start:.2f}s")

    # CPU-intensive task test
    numbers = [10**7] * 4

    # Synchronous execution
    start = time.time()
    for n in numbers:
        cpu_bound_task(n)
    print(f"Sync CPU: {time.time() - start:.2f}s")

    # Multiprocessing execution
    start = time.time()
    with ProcessPoolExecutor() as executor:
        # list() consumes the results so worker exceptions are not silently dropped
        list(executor.map(cpu_bound_task, numbers))
    print(f"Multiprocessing CPU: {time.time() - start:.2f}s")

if __name__ == '__main__':
    main()

From the test results, we can see:

For IO-intensive tasks:
- Synchronous execution: about 4 seconds
- Multithreading: about 1 second
- Async IO: about 1 second

For CPU-intensive tasks:
- Synchronous execution: about 8 seconds
- Multiprocessing: about 2 seconds (on a 4-core CPU)

These results support our previous discussion: IO-intensive tasks are well suited to multithreading or Async IO, while CPU-intensive tasks are well suited to multiprocessing. The exact numbers will vary with your hardware and network.

Practical Recommendations

Based on my experience, when choosing a concurrency solution, I recommend following these principles:

- If the task is CPU-intensive (like heavy computation, image processing, etc.), choose multiprocessing
- If the task is IO-intensive (like network requests, file operations, etc.), you can choose:
  - For synchronous code, use multithreading
  - If async libraries are available, choose Async IO
- If the system has both CPU-intensive and IO-intensive tasks, consider mixing multiprocessing with Async IO
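The last recommendation, mixing multiprocessing with Async IO, might look roughly like the sketch below. It is one possible arrangement, not the only one: the event loop handles the IO step and hands the CPU-heavy step to a process pool through loop.run_in_executor. The names cpu_heavy, fetch_input, handle_request, and handle_all are hypothetical, and asyncio.sleep stands in for a real network or database call.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


def cpu_heavy(n):
    # CPU-bound part: runs in a worker so it does not block the event loop
    return sum(i * i for i in range(n))


async def fetch_input(request_id):
    # Hypothetical IO step; asyncio.sleep stands in for a network or database call
    await asyncio.sleep(0.1)
    return 10**5 + request_id


async def handle_request(executor, request_id):
    n = await fetch_input(request_id)
    loop = asyncio.get_running_loop()
    # run_in_executor returns an awaitable that resolves to the worker's result
    return await loop.run_in_executor(executor, cpu_heavy, n)


async def handle_all(executor, request_ids):
    return await asyncio.gather(*(handle_request(executor, r) for r in request_ids))


if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        print(asyncio.run(handle_all(pool, range(4))))
```

Passing the executor in as a parameter keeps the choice open: a ProcessPoolExecutor gives real parallelism for the CPU step, while passing None falls back to the event loop's default thread pool, which is handy for quick experiments.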

Did you know? Python's Async IO ecosystem is developing rapidly. For example, the FastAPI framework is built on the ASGI toolkit Starlette, supports async request handlers natively, and tends to do well in performance comparisons of Python web frameworks. This shows that Async IO is becoming an important direction in Python concurrent programming.

Conclusion

In this article, we've explored Python's three main approaches to concurrent programming: multithreading, multiprocessing, and Async IO. Each has its suitable scenarios; the key is choosing the right one for your specific needs.

By the way, have you used concurrent programming in actual projects? What problems did you encounter? Feel free to share your experiences in the comments.

Finally, here's a thought question: In a web application that needs to handle many user requests simultaneously, where each request requires some CPU-intensive computation while also accessing a database, how would you design the concurrent architecture of this system?