python regex re module match search findall groups 2024 Challenge

Read the problem description and solve the challenge in the workspace.

Problem Description

Coding Challenge: Massive Server Log Regex Parser

Difficulty Level

Intermediate

Problem Description

Imagine you're a backend engineer tasked with analyzing a massive 10-gigabyte server log file to track down recurring crashes. You need to extract a specific, dynamically generated error code format from the text, but the obvious tools each come with a catch.

If you use re.search(), you get only the very first error, because it stops at the first match. If you use re.findall(), Python builds a list of every extracted string at once; with millions of matches, that list can exhaust your computer's RAM and freeze or crash the system. Furthermore, if your pattern contains capturing groups, re.findall() returns the group contents instead of the full matched strings, and with two or more groups it returns tuples: a hidden edge case known as the "Tuple Trap."
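The differences described above can be seen on a small string. This is a minimal sketch; the sample text and the variable names are made up for illustration and are not part of the challenge.

```python
import re

# Hypothetical sample text, much smaller than a real log.
log_text = "boot ok. ERROR-404 at login. ERROR-500 at db ping."

# re.search() stops at the first match and returns a single match object.
first = re.search(r"ERROR-\d{3}", log_text)
print(first.group())  # ERROR-404

# re.findall() builds the complete list of matched strings in one go.
all_codes = re.findall(r"ERROR-\d{3}", log_text)
print(all_codes)  # ['ERROR-404', 'ERROR-500']

# With one capturing group, findall returns only that group's text.
one_group = re.findall(r"ERROR-(\d{3})", log_text)
print(one_group)  # ['404', '500']

# With two or more groups, each result becomes a tuple of the groups.
two_groups = re.findall(r"(ERROR)-(\d{3})", log_text)
print(two_groups)  # [('ERROR', '404'), ('ERROR', '500')]

# A non-capturing group (?:...) keeps each result a plain full-match string.
non_capturing = re.findall(r"(?:ERROR)-\d{3}", log_text)
print(non_capturing)  # ['ERROR-404', 'ERROR-500']
```

If a pattern needs grouping only for structure, the non-capturing form is one way to sidestep the tuple behavior.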

Your task is to write a memory-efficient function that parses a massive log string. Use the re module's iterator-based matching function to walk through the text one match at a time, without building an intermediate list of every match, and extract both each error code and its exact index location for further analysis.
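To illustrate what "one match at a time" means, here is a small sketch that assumes the function in question is re.finditer(), the one named in the hints. The sample string and variable names are illustrative only.

```python
import re

# Hypothetical sample text standing in for a large log.
log_text = "ERROR-101 ... ERROR-102 ... ERROR-103"

# finditer() returns an iterator; no list of matches is built up front.
matches = re.finditer(r"ERROR-\d{3}", log_text)

# Each call to next() produces the following match on demand.
first = next(matches)
print(first.group(), first.start())  # ERROR-101 0

second = next(matches)
print(second.group(), second.start())  # ERROR-102 14

# A for loop consumes whatever is left, again one match at a time.
remaining = [m.group() for m in matches]
print(remaining)  # ['ERROR-103']
```

Because only the current match object is alive at any moment, the extra memory used by the matching step stays small no matter how many matches the text contains.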

Input & Output Specifications

  • Input:
      • log_text (string): A massive block of text representing the server logs.
  • Output:
      • Returns a list of dictionaries. Each dictionary represents one match and contains two keys:
          • "error_code": the exact string of the matched error.
          • "location": the starting index at which that error was found in the text.

Starter Code Boilerplate

import re

def extract_server_errors(log_text):
    # The regex pattern to find errors structured like "ERROR-500" or "ERROR-404"
    pattern = r"ERROR-\d{3}"

    results = []

    # TODO: Use a memory-efficient Regex function to find all matches
    # TODO: Iterate through the matches and append a dictionary to the results list
    # containing the "error_code" and its index "location".

    return results

Hints

  • Avoid the Memory Trap: Don't just use re.findall(). It's like an assistant dropping 10,000 files on your desk at once. Use re.finditer() instead: it acts as a polite assistant handing you one match at a time, using very little extra memory.
  • The Match Object: re.finditer() returns an iterator that yields match objects (re.Match instances). Each one carries both the matched text and its position.
  • Extracting Data: To get the actual matched text from a match object, use its .group() method.
  • Finding the Location: To get the exact index where a match starts in the string, use the .start() method on the match object.
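Putting the hints together, one possible way to complete the starter code is sketched below. It is not the only valid solution; the module-level constant ERROR_PATTERN is an illustrative choice, not something the challenge requires.

```python
import re

# Same pattern as the starter code, compiled once (an illustrative choice).
ERROR_PATTERN = re.compile(r"ERROR-\d{3}")

def extract_server_errors(log_text):
    results = []
    # finditer() yields one match object at a time instead of a full list.
    for match in ERROR_PATTERN.finditer(log_text):
        results.append({
            "error_code": match.group(),  # the full matched text
            "location": match.start(),    # index where the match begins
        })
    return results

print(extract_server_errors("ERROR-101ERROR-102ERROR-103"))
# [{'error_code': 'ERROR-101', 'location': 0}, {'error_code': 'ERROR-102', 'location': 9}, {'error_code': 'ERROR-103', 'location': 18}]
```

Note that the returned list still holds one dictionary per match, as the output specification asks. If even that were too large, a generator that yields each dictionary would be a natural variation, though it would no longer match the required return type.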

Test Cases

Use the following test cases to verify that your function works correctly.

Test Case 1: Standard Log Extraction

  • Input:

logs = "System boot normal. ERROR-404 in user login, retrying connection; ERROR-500 at database ping. Shutdown initiated."
print(extract_server_errors(logs))

  • Expected Output:

[
    {'error_code': 'ERROR-404', 'location': 20},
    {'error_code': 'ERROR-500', 'location': 66}
]

Test Case 2: No Errors Found

  • Input:

logs = "System boot normal. User logged in successfully; process complete."
print(extract_server_errors(logs))

  • Expected Output:

[]

Test Case 3: Sequential Errors

  • Input:

logs = "ERROR-101ERROR-102ERROR-103"
print(extract_server_errors(logs))

  • Expected Output:

[
    {'error_code': 'ERROR-101', 'location': 0},
    {'error_code': 'ERROR-102', 'location': 9},
    {'error_code': 'ERROR-103', 'location': 18}
]

Verify Your Solution

Run assertions against your code in the sandbox environment.

Sandbox Instructions

1. Click Copy Starter Boilerplate at the top to copy the function definition.
2. Use the interactive compiler to implement and run your code securely.
3. Click Verify & Submit Solution to validate your code.