Python Regular Expressions
Chapter 33 🟡 Intermediate


Apply your skills with a real-world coding challenge. Try to solve it yourself first!

Coding Challenge: Massive Server Log Regex Parser

Difficulty Level

Intermediate

Problem Description

Imagine you're a backend engineer tasked with analyzing a massive 10-gigabyte server log file to track down recurring crashes. You need to extract every occurrence of a specific error-code format, whose numbers vary from entry to entry, but each of the obvious tools has a pitfall.

If you use re.search(), you get only the very first error, because it stops scanning at the first match. If you use re.findall(), the application builds a list of millions of extracted strings in RAM all at once, which can exhaust memory, freeze the system, and crash the program. Furthermore, if your pattern contains capturing groups, re.findall() returns the captured groups instead of the full matched text, and with two or more groups each result becomes a tuple. This hidden edge case is known as the "Tuple Trap."
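A small sketch of the behaviors described above, using the ERROR-\d{3} pattern from the starter code on a short, made-up log line (the variable names are illustrative only). Note the last case: with exactly one capturing group, re.findall() returns plain strings, but only the group's text rather than the full match.

```python
import re

# Hypothetical short log line used only for illustration.
log = "ERROR-404 at login; ERROR-500 at db ping"

# re.search() stops at the first match and returns one match object (or None).
first = re.search(r"ERROR-\d{3}", log)
print(first.group())      # ERROR-404

# re.findall() builds the complete list of matched strings before returning.
all_codes = re.findall(r"ERROR-\d{3}", log)
print(all_codes)          # ['ERROR-404', 'ERROR-500']

# The "Tuple Trap": with two or more capturing groups, findall() returns tuples.
grouped = re.findall(r"(ERROR)-(\d{3})", log)
print(grouped)            # [('ERROR', '404'), ('ERROR', '500')]

# With exactly one group, findall() returns only that group's text.
single_group = re.findall(r"ERROR-(\d{3})", log)
print(single_group)       # ['404', '500']
```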

Your task is to write a memory-efficient function that parses a large log string. Use the professional's secret weapon for large data, an iterator over the matches, to walk through the text without collecting every match up front, and extract each error code together with its exact index location for further analysis.

Input & Output Specifications

  • Input:
    • log_text (string): A large block of text representing the server logs.
  • Output:
    • A list of dictionaries, where each dictionary represents one match and contains two keys:
      • "error_code": The exact matched error string.
      • "location": The starting index (0-based) of the match in log_text.

Starter Code Boilerplate

import re

def extract_server_errors(log_text):
    # The regex pattern to find errors structured like "ERROR-500" or "ERROR-404"
    pattern = r"ERROR-\d{3}"

    results = []

    # TODO: Use a memory-efficient Regex function to find all matches
    # TODO: Iterate through the matches and append a dictionary to the results list
    # containing the "error_code" and its index "location".

    return results
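Before filling in the TODOs, it can help to confirm what the starter pattern does and does not match. This quick check uses re.fullmatch() on a few made-up candidate strings:

```python
import re

# Same pattern as the starter code.
pattern = r"ERROR-\d{3}"

# fullmatch() checks whether an entire candidate string fits the pattern.
checks = {c: bool(re.fullmatch(pattern, c)) for c in ["ERROR-404", "ERROR-40", "error-500"]}
print(checks)  # {'ERROR-404': True, 'ERROR-40': False, 'error-500': False}

# Matching is case-sensitive unless a flag such as re.IGNORECASE is passed.
print(bool(re.fullmatch(pattern, "error-500", re.IGNORECASE)))  # True

# Nothing stops the match after three digits, so a scan finds "ERROR-404" inside a longer code.
print(re.search(pattern, "ERROR-4040").group())  # ERROR-404
```

If you wanted to reject longer codes such as ERROR-4040, a trailing \b would also reject the first two back-to-back codes in Test Case 3 (each is directly followed by another "E"), whereas a negative lookahead such as (?!\d) would not. The challenge's tests don't require either change.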

Hints

  • Avoid the Memory Trap: Don't just use re.findall(). It's like an assistant dropping 10,000 files on your desk at once, because it builds the complete list of matches before returning anything. Use re.finditer() instead: it acts as a polite assistant handing you one match at a time, so it never holds an intermediate list of every match.
  • The Match Object: re.finditer() returns an iterator of match objects (instances of re.Match). Each one is like a treasure chest holding everything about a single match.
  • Extracting Data: To get the actual matched text from a match object, use the .group() method.
  • Finding the Location: To get the index where a match starts in the string, use the .start() method on the match object.
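The hints can be tried out on a toy example before writing the real function. This sketch uses a made-up WARN-\d+ pattern and string (not part of the challenge) to show that re.finditer() hands back one re.Match at a time, and how .group() and .start() read from each match:

```python
import re

# Hypothetical toy text and pattern, deliberately different from the challenge's.
text = "ok WARN-7 ok WARN-42"

# finditer() returns an iterator; matches are produced on demand, not collected up front.
matches = re.finditer(r"WARN-\d+", text)

first = next(matches)                 # pull exactly one re.Match object
print(first.group(), first.start())   # WARN-7 3

# Continue consuming the same iterator for the remaining matches.
for m in matches:
    print(m.group(), m.start())       # WARN-42 13
```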

Test Cases

Use the following test cases to verify that your function works correctly.

Test Case 1: Standard Log Extraction
  • Input:
    logs = "System boot normal. ERROR-404 in user login, retrying connection. ERROR-500 at database ping. Shutdown initiated."
    print(extract_server_errors(logs))
  • Expected Output:
    [{'error_code': 'ERROR-404', 'location': 20}, {'error_code': 'ERROR-500', 'location': 66}]

Test Case 2: No Errors Found
  • Input:
    logs = "System boot normal. User logged in successfully; process complete."
    print(extract_server_errors(logs))
  • Expected Output:
    []

Test Case 3: Sequential Errors
  • Input:
    logs = "ERROR-101ERROR-102ERROR-103"
    print(extract_server_errors(logs))
  • Expected Output:
    [{'error_code': 'ERROR-101', 'location': 0}, {'error_code': 'ERROR-102', 'location': 9}, {'error_code': 'ERROR-103', 'location': 18}]
