python regex re module match search findall groups 2024 Challenge
Read the problem description and solve the challenge in the workspace.
Coding Challenge: Massive Server Log Regex Parser
Difficulty Level
Intermediate
Problem Description
Imagine you're a backend engineer tasked with analyzing a massive 10-gigabyte server log file to track down recurring crashes. You need to extract a specific, dynamically generated error code format from the text.
If you use re.search(), you get only the very first error and nothing after it. If you use re.findall(), Python builds a list of every extracted string at once, so millions of matches sit in your computer's RAM simultaneously, which can exhaust memory and freeze or crash the system. Furthermore, if your pattern contains capturing groups, re.findall() returns the captured groups (as tuples when there is more than one group) instead of the full matched strings, a hidden edge case known as the "Tuple Trap."
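The behaviors described above can be checked on a tiny string. The following is a minimal sketch; the sample text and variable names are made up for illustration and are not part of the challenge.

```python
import re

# Hypothetical sample text, not taken from the challenge's test cases.
text = "ERROR-404 then ERROR-500"

# re.search() stops at the first match (it returns None if nothing matches).
first = re.search(r"ERROR-\d{3}", text)
print(first.group())   # ERROR-404

# No capturing groups: findall() returns the full matched strings.
no_groups = re.findall(r"ERROR-\d{3}", text)
print(no_groups)       # ['ERROR-404', 'ERROR-500']

# One capturing group: findall() returns only that group's text.
one_group = re.findall(r"ERROR-(\d{3})", text)
print(one_group)       # ['404', '500']

# Two capturing groups: findall() returns tuples (the "Tuple Trap").
two_groups = re.findall(r"(ERROR)-(\d{3})", text)
print(two_groups)      # [('ERROR', '404'), ('ERROR', '500')]

# A non-capturing group (?:...) keeps the full-string behavior.
non_capturing = re.findall(r"(?:ERROR)-\d{3}", text)
print(non_capturing)   # ['ERROR-404', 'ERROR-500']
```

If grouping is needed only for structure, a non-capturing group is one way to sidestep the tuple behavior.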
Your task is to write a highly memory-efficient function that parses a massive log string. You must use the professional's secret weapon for large data: iterate through the text one match at a time without blowing up your memory, extracting both the error code and its exact index location for further data analysis.
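To see what iterating one match at a time looks like in practice, here is a small sketch using re.finditer() together with itertools.islice. The sample text is a hypothetical stand-in for a large log, and the snippet only illustrates lazy iteration; it is not the solution to the challenge.

```python
import re
from itertools import islice

# Hypothetical stand-in for a large log: one error code repeated many times.
sample_text = "ok ERROR-500 " * 100_000

# finditer() returns a lazy iterator; matches are produced only on request.
matches = re.finditer(r"ERROR-\d{3}", sample_text)

# Take just the first three matches; later matches are not computed
# unless the iterator is advanced further.
first_three = [m.start() for m in islice(matches, 3)]
print(first_three)  # [3, 16, 29]
```

Because the iterator yields matches on demand, no list of 100,000 strings is built here, in contrast to what re.findall() would do on the same text.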
Input & Output Specifications
- Input:
  - log_text (string): A massive block of text representing the server logs.
- Output:
  - Returns a list of dictionaries. Each dictionary represents one match and contains two keys:
    - "error_code": The exact string of the matched error.
    - "location": The starting index position where that error was found in the text.
Starter Code Boilerplate
import re

def extract_server_errors(log_text):
    # The regex pattern to find errors structured like "ERROR-500" or "ERROR-404"
    pattern = r"ERROR-\d{3}"
    results = []
    # TODO: Use a memory-efficient Regex function to find all matches
    # TODO: Iterate through the matches and append a dictionary to the results list
    #       containing the "error_code" and its index "location".
    return results
Hints
- Avoid the Memory Trap: Don't just use re.findall(). It's like an assistant dropping 10,000 files on your desk at once. Use re.finditer() instead: it acts as a polite assistant handing you one match at a time, using almost no extra memory.
- The Match Object: re.finditer() returns an iterator of match objects (re.Match). Each one is like a treasure chest!
- Extracting Data: To get the actual matched text from a match object, use the .group() method.
- Finding the Location: To get the exact index where a match started in the string, use the .start() method on the match object.
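The match-object methods named in the hints can be tried on a short string. This is a minimal sketch with a made-up log line; the capturing group is added here only to show that .group() still returns the full match, and .end() is shown alongside .start() for completeness.

```python
import re

# Hypothetical log line, not one of the challenge's test cases.
line = "disk ok; ERROR-503 on node-7"

found = []
for match in re.finditer(r"ERROR-(\d{3})", line):
    full_text = match.group()   # whole match, even though the pattern has a group
    digits = match.group(1)     # text of the first capturing group
    start = match.start()       # index where the match begins
    end = match.end()           # index just past the end of the match
    found.append((full_text, digits, start, end))

print(found)  # [('ERROR-503', '503', 9, 18)]
```

Unlike re.findall(), a match object always exposes the full matched text through .group(), so capturing groups in the pattern do not change what you get back.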
Test Cases
Use the following test cases to verify that your function works correctly.
Test Case 1: Standard Log Extraction
* Input:
python
logs = "System boot normal. ERROR-404 in user login. Retrying connection. ERROR-500 at database ping. Shutdown initiated."
print(extract_server_errors(logs))
* Expected Output:
python
[
{'error_code': 'ERROR-404', 'location': 20},
{'error_code': 'ERROR-500', 'location': 66}
]
Test Case 2: No Errors Found
* Input:
python
logs = "System boot normal. User logged in successfully; process complete."
print(extract_server_errors(logs))
* Expected Output:
python
[]
Test Case 3: Sequential Errors
* Input:
python
logs = "ERROR-101ERROR-102ERROR-103"
print(extract_server_errors(logs))
* Expected Output:
python
[
{'error_code': 'ERROR-101', 'location': 0},
{'error_code': 'ERROR-102', 'location': 9},
{'error_code': 'ERROR-103', 'location': 18}
]