Coding Challenge: A Concurrent Data Aggregator
Problem Description
You're building a blazing-fast data aggregation dashboard for a startup. Your task is to fetch live user statistics from a list of third-party API endpoints. However, there is a big real-world problem: the standard requests library is blocking. If you call requests.get() sequentially on 100 URLs, each call stalls your Python program until that server replies, so the total wait is the sum of all 100 response times. Although Python's Global Interpreter Lock (GIL) prevents true parallelism for CPU-bound tasks, threads are well suited to I/O-bound tasks like network requests, where the program spends most of its time waiting: CPython releases the GIL while a thread blocks on I/O, so other threads can run in the meantime.
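To see why threads help despite the GIL, the sketch below replaces network latency with time.sleep(), which, like a blocking socket read, releases the GIL while it waits. The function fake_request and the delay values are hypothetical stand-ins, not part of the challenge; real HTTP calls add their own overhead.

```python
import asyncio
import time


def fake_request(delay: float) -> float:
    """Hypothetical stand-in for a blocking call such as requests.get()."""
    time.sleep(delay)  # blocks this thread, but releases the GIL while waiting
    return delay


def run_sequential(n: int, delay: float) -> float:
    """Run n blocking calls one after another; return elapsed wall time."""
    start = time.perf_counter()
    for _ in range(n):
        fake_request(delay)
    return time.perf_counter() - start


async def run_threaded(n: int, delay: float) -> float:
    """Run n blocking calls in worker threads; return elapsed wall time."""
    start = time.perf_counter()
    await asyncio.gather(*(asyncio.to_thread(fake_request, delay) for _ in range(n)))
    return time.perf_counter() - start


if __name__ == "__main__":
    sequential = run_sequential(5, 0.2)
    threaded = asyncio.run(run_threaded(5, 0.2))
    print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

The sequential run takes about n × delay, while the threaded run takes roughly a single delay, as long as the default thread pool behind asyncio.to_thread() has enough workers for all the calls.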
Your challenge is to bridge standard blocking code with modern asynchronous architecture. You must use the asyncio.to_thread() function (available since Python 3.9) to run the blocking requests.get() calls in separate threads. In addition, the third-party API is notoriously unreliable: it regularly times out, occasionally returns plain HTML error pages instead of JSON, and actively blocks requests from default Python scripts to stop spam bots. Your solution must send custom headers (metadata envelopes) to get past these blocks and must deserialize the data safely. When a server fails, return None for that URL rather than crashing the entire application.
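The default User-Agent that requests sends identifies the client as a Python script, which is presumably what a bot filter like the one described keys on. The sketch below builds requests without sending them, via requests.Session.prepare_request(), to compare the default headers with custom ones. The helper prepared_headers is hypothetical, and the Bearer form of the Authorization header is an assumption: the challenge only says to pass the token.

```python
import requests

URL = "https://api.example.com/stats1"  # example URL from the starter code


def prepared_headers(extra=None) -> dict:
    """Build (but never send) a GET request; return the headers it would carry."""
    with requests.Session() as session:
        request = requests.Request("GET", URL, headers=extra or {})
        # prepare_request merges the session's default headers with ours;
        # headers passed on the request override the session defaults.
        return dict(session.prepare_request(request).headers)


default = prepared_headers()
custom = prepared_headers({
    "User-Agent": "AnalyticsDashboard/1.0",
    "Authorization": "Bearer secret_token_123",  # assumed auth scheme
})

if __name__ == "__main__":
    print(default["User-Agent"])  # python-requests/<installed version>
    print(custom["User-Agent"])   # AnalyticsDashboard/1.0
```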
Difficulty Level: Advanced
Input & Output Specifications
- Input:
  - api_urls (list of strings): A list of endpoint URLs to fetch data from.
  - api_token (string): A secret token that proves you have permission to access the data.
- Output:
  - Returns a list containing the result for each URL, in the same order as api_urls.
  - Each successful result should be a parsed Python dictionary (safely deserialized JSON).
  - If the request fails due to a network error, server error, timeout, or invalid JSON (such as an HTML error page), the list should contain None for that specific URL.
Starter Code Boilerplate
```python
import asyncio
import requests


def fetch_data_sync(url: str, token: str):
    """
    Blocking function to make the HTTP request safely.
    Implement headers, timeouts, and safe JSON deserialization here.
    """
    # TODO: Implement robust request logic
    pass


async def fetch_all_data(api_urls: list, api_token: str) -> list:
    """
    Asynchronous function to manage the concurrent threads.
    """
    # TODO: Create concurrent tasks using asyncio.to_thread()
    # TODO: Gather and return the results
    pass


# Example execution:
# urls = ["https://api.example.com/stats1", "https://api.example.com/stats2"]
# results = asyncio.run(fetch_all_data(urls, "secret_token_123"))
# print(results)
```
Hints
- Bridging Async and Blocking: Many tutorials stop at basic asyncio. Because requests.get() is a blocking call, calling it directly inside a coroutine freezes the whole event loop. Instead, use asyncio.to_thread(fetch_data_sync, url, token) inside your loop to run the function in a separate worker thread while control returns to the event loop.
- Gathering Tasks: Look into asyncio.gather() to run all your threaded tasks concurrently and collect their results into a single list, in the same order as the input URLs.
- Headers: APIs use headers as metadata envelopes. Pass a custom User-Agent (e.g., "AnalyticsDashboard/1.0") to get past bot blockers, and pass your token to prove authorization.
- Safe Deserialization: Don't blindly call .json(). Call response.raise_for_status() first to make sure the server actually responded with a success code.
- Exception Handling: Wrap your request and deserialization logic in a try...except block. Catch requests.exceptions.RequestException to handle network and timeout errors, and requests.exceptions.JSONDecodeError (or ValueError) to keep your code from crashing when the server sends back an HTML page instead of JSON. Also pass a timeout to requests.get(): requests applies no timeout by default, so without one a hanging server blocks its thread indefinitely.
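Putting the hints together, one possible solution sketch follows. The Bearer Authorization scheme, the extra Accept header, and the 5-second timeout are assumptions: the challenge only says to pass the token and to handle timeouts, not how.

```python
import asyncio
from typing import Optional

import requests

# Assumed values: the challenge does not prescribe a timeout or an auth scheme.
USER_AGENT = "AnalyticsDashboard/1.0"
TIMEOUT_SECONDS = 5


def fetch_data_sync(url: str, token: str) -> Optional[dict]:
    """Blocking request: return the parsed JSON body, or None on any failure."""
    headers = {
        "User-Agent": USER_AGENT,            # replaces the default python-requests UA
        "Authorization": f"Bearer {token}",  # assumption: Bearer-token auth
        "Accept": "application/json",
    }
    try:
        # requests has no default timeout; without one a hung server blocks forever.
        response = requests.get(url, headers=headers, timeout=TIMEOUT_SECONDS)
        response.raise_for_status()  # 4xx/5xx such as 403 or 500 -> HTTPError
        return response.json()       # an HTML body raises a JSONDecodeError
    except (requests.exceptions.RequestException, ValueError):
        # Timeouts, connection errors, HTTP errors and bad JSON all end up here.
        return None


async def fetch_all_data(api_urls: list, api_token: str) -> list:
    """Run each blocking fetch in a worker thread; results keep the input order."""
    tasks = [asyncio.to_thread(fetch_data_sync, url, api_token) for url in api_urls]
    return await asyncio.gather(*tasks)
```

Because asyncio.gather() returns results in the order the awaitables were passed, the output list lines up with api_urls even when later URLs finish first. Catching ValueError alongside RequestException also covers requests releases older than 2.27, where .json() raised the JSON library's own JSONDecodeError rather than requests.exceptions.JSONDecodeError.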
Test Cases
Since we don't have a live unreliable API, you can mentally verify or mock the following scenarios to make sure your logic is bulletproof:
- Test Case 1 (All Success):
  - Input: ["url1", "url2"], valid token.
  - Simulated API: Both respond instantly with {"status": "ok", "users": 150}.
  - Expected Output: [{"status": "ok", "users": 150}, {"status": "ok", "users": 150}]
- Test Case 2 (Timeout & HTML Error):
  - Input: ["url_timeout", "url_html", "url_success"], valid token.
  - Simulated API: URL 1 hangs forever (your timeout should turn this into None). URL 2 returns an HTML 500 server error page. URL 3 returns {"users": 42}.
  - Expected Output: [None, None, {"users": 42}]
- Test Case 3 (Missing Headers/Bot Block):
  - Input: ["url_strict"], sent without the custom User-Agent.
  - Simulated API: The server rejects default Python scripts and returns 403 Forbidden.
  - Expected Output: [None] (the program shouldn't crash; it should catch the error and return None).
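One way to mock these scenarios without a network is to build real requests.Response objects offline and confirm that each failure raises an exception the recommended except clause catches. The helpers make_response and outcome, the mock named hanging_get, and the extra url_html_200 case (an HTML page served with status 200) are hypothetical test scaffolding, not part of the challenge.

```python
import io
from unittest import mock

import requests

# Exceptions the hints recommend catching around the request and .json().
CAUGHT = (requests.exceptions.RequestException, ValueError)


def make_response(status: int, body: bytes) -> requests.Response:
    """Hypothetical helper: build a requests.Response without any network I/O."""
    response = requests.Response()
    response.status_code = status
    response.raw = io.BytesIO(body)  # .content and .json() read the body from .raw
    response.url = "https://api.example.com/stats"
    return response


def outcome(step) -> tuple:
    """Run one simulated step; report ("ok", value) or ("caught", exception name)."""
    try:
        return ("ok", step())
    except CAUGHT as exc:
        return ("caught", type(exc).__name__)


# A mock standing in for requests.get() on a URL that never answers in time.
hanging_get = mock.Mock(side_effect=requests.exceptions.Timeout("simulated hang"))

scenarios = {
    "url_timeout": lambda: hanging_get("url_timeout", timeout=5),
    "url_html": lambda: make_response(500, b"<html>Internal Server Error</html>").raise_for_status(),
    "url_html_200": lambda: make_response(200, b"<html><body>Oops</body></html>").json(),
    "url_strict": lambda: make_response(403, b"Forbidden").raise_for_status(),
    "url_success": lambda: make_response(200, b'{"users": 42}').json(),
}

if __name__ == "__main__":
    for name, step in scenarios.items():
        print(name, outcome(step))
```

Every failing scenario lands in the same except clause, which is why a single try...except around the request and .json() is enough to return None for each of them.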