Merged
160 commits
e9d92f5
feat: solve Striver SDE Sheet problem 16 - Merge Intervals
TechnoBlogger14o3 Dec 8, 2025
34a89ee
feat: solve Striver SDE Sheet problem 17 - Next Permutation
TechnoBlogger14o3 Dec 8, 2025
691484b
feat: solve Striver SDE Sheet problem 1 - Reverse String
TechnoBlogger14o3 Dec 8, 2025
72e097e
feat: solve Striver SDE Sheet problem 9 - Count and Say
TechnoBlogger14o3 Dec 8, 2025
081438d
feat: solve Striver SDE Sheet problem 10 - Flood Fill
TechnoBlogger14o3 Dec 8, 2025
e2f980e
feat: solve Striver SDE Sheet problem 11 - Reorganize String
TechnoBlogger14o3 Dec 8, 2025
bfb2329
feat: solve Striver SDE Sheet problem 13 - Word Ladder
TechnoBlogger14o3 Dec 8, 2025
363de37
feat: solve Striver SDE Sheet problem 15 - Merge Intervals
TechnoBlogger14o3 Dec 8, 2025
b772f87
feat: solve Striver SDE Sheet problem 19 - Best Time to Buy and Sell …
TechnoBlogger14o3 Dec 8, 2025
51d6eaf
feat: solve Striver SDE Sheet problem 27 - Majority Element II
TechnoBlogger14o3 Dec 8, 2025
a6a8e92
feat: solve Striver SDE Sheet problem 29 - Longest Common Prefix
TechnoBlogger14o3 Dec 8, 2025
9c2b349
feat: solve Striver SDE Sheet problem 39 - Reverse String
TechnoBlogger14o3 Dec 8, 2025
4852ec7
feat: solve Striver SDE Sheet problem 43 - Remove Invalid Parentheses
TechnoBlogger14o3 Dec 8, 2025
c14a578
feat: solve Striver SDE Sheet problem 46 - Count and Say
TechnoBlogger14o3 Dec 8, 2025
5f92781
feat: solve Striver SDE Sheet problem 47 - Flood Fill
TechnoBlogger14o3 Dec 8, 2025
c249a8c
feat: solve Striver SDE Sheet problem 48 - Reorganize String
TechnoBlogger14o3 Dec 8, 2025
ca844ec
feat: solve Striver SDE Sheet problem 50 - Word Ladder
TechnoBlogger14o3 Dec 8, 2025
544d07a
feat: solve Striver SDE Sheet problem 53 - Merge Intervals
TechnoBlogger14o3 Dec 8, 2025
e3ef156
feat: solve Striver SDE Sheet problem 63 - Reverse String
TechnoBlogger14o3 Dec 8, 2025
4c8083f
feat: solve Striver SDE Sheet problem 70 - Count and Say
TechnoBlogger14o3 Dec 8, 2025
d080479
feat: solve Striver SDE Sheet problem 71 - Flood Fill
TechnoBlogger14o3 Dec 8, 2025
5ee330d
feat: solve Striver SDE Sheet problem 72 - Reorganize String
TechnoBlogger14o3 Dec 8, 2025
e18e9ca
feat: solve Striver SDE Sheet problem 74 - Word Ladder
TechnoBlogger14o3 Dec 8, 2025
69c4361
feat: solve Striver SDE Sheet problem 77 - Merge Intervals
TechnoBlogger14o3 Dec 8, 2025
a9f6f18
feat: solve Striver SDE Sheet problem 88 - Majority Element II
TechnoBlogger14o3 Dec 8, 2025
8a60340
feat: solve Striver SDE Sheet problem 90 - Longest Common Prefix
TechnoBlogger14o3 Dec 8, 2025
636e261
feat: solve Striver SDE Sheet problem 99 - Reverse String
TechnoBlogger14o3 Dec 8, 2025
d8a9d9c
feat: solve Striver SDE Sheet problem 106 - Count and Say
TechnoBlogger14o3 Dec 8, 2025
c9e4b32
feat: solve Striver SDE Sheet problem 107 - Flood Fill
TechnoBlogger14o3 Dec 8, 2025
8468da1
feat: solve Striver SDE Sheet problem 108 - Reorganize String
TechnoBlogger14o3 Dec 8, 2025
4f3b410
feat: solve Striver SDE Sheet problem 110 - Reverse String
TechnoBlogger14o3 Dec 8, 2025
beab043
feat: solve Striver SDE Sheet problem 112 - Delete Node in a BST
TechnoBlogger14o3 Dec 8, 2025
496370b
feat: solve Striver SDE Sheet problem 117 - Count and Say
TechnoBlogger14o3 Dec 8, 2025
a4c579b
feat: solve Striver SDE Sheet problem 118 - Flood Fill
TechnoBlogger14o3 Dec 8, 2025
eb74837
feat: solve Striver SDE Sheet problem 119 - Reorganize String
TechnoBlogger14o3 Dec 8, 2025
cc4a1ce
feat: solve Striver SDE Sheet problem 121 - Word Ladder
TechnoBlogger14o3 Dec 8, 2025
967af30
feat: solve Striver SDE Sheet problem 124 - Merge Intervals
TechnoBlogger14o3 Dec 8, 2025
0f766ed
feat: solve Striver SDE Sheet problem 133 - Reverse String
TechnoBlogger14o3 Dec 8, 2025
c8c16f1
feat: solve Striver SDE Sheet problem 140 - Count and Say
TechnoBlogger14o3 Dec 8, 2025
a876434
feat: solve Striver SDE Sheet problem 141 - Flood Fill
TechnoBlogger14o3 Dec 8, 2025
5364dbc
feat: solve Striver SDE Sheet problem 142 - Reorganize String
TechnoBlogger14o3 Dec 8, 2025
86b2b57
feat: solve Striver SDE Sheet problem 144 - Word Ladder
TechnoBlogger14o3 Dec 8, 2025
511ee34
feat: solve Striver SDE Sheet problem 147 - Merge Intervals
TechnoBlogger14o3 Dec 8, 2025
530ea77
feat: solve Striver SDE Sheet problem 158 - Majority Element II
TechnoBlogger14o3 Dec 8, 2025
f14f44b
feat: solve Striver SDE Sheet problem 160 - Longest Common Prefix
TechnoBlogger14o3 Dec 8, 2025
be0b3d1
feat: solve Striver SDE Sheet problem 176 - Longest Palindromic Subst…
TechnoBlogger14o3 Dec 8, 2025
cd76756
feat: solve Striver SDE Sheet problem 194 - Reverse String
TechnoBlogger14o3 Dec 8, 2025
54a0736
feat: solve Striver SDE Sheet problem 201 - Flood Fill
TechnoBlogger14o3 Dec 8, 2025
dadb875
feat: solve Striver SDE Sheet problem 202 - Flood Fill
TechnoBlogger14o3 Dec 8, 2025
beceeeb
feat: solve Striver SDE Sheet problem 203 - Clone Graph
TechnoBlogger14o3 Dec 8, 2025
01dc146
feat: solve Striver SDE Sheet problem 204 - Number of Operations to M…
TechnoBlogger14o3 Dec 8, 2025
a3f0720
feat: solve Striver SDE Sheet problem 205 - Word Ladder
TechnoBlogger14o3 Dec 8, 2025
ddddf3b
feat: solve Striver SDE Sheet problem 208 - Merge Intervals
TechnoBlogger14o3 Dec 8, 2025
930cdd2
feat: solve Striver SDE Sheet problem 219 - Snakes and Ladders
TechnoBlogger14o3 Dec 8, 2025
cb949bf
feat: solve Striver SDE Sheet problem 221 - Longest Common Prefix
TechnoBlogger14o3 Dec 8, 2025
054e780
feat: solve Striver SDE Sheet problem 226 - Cheapest Flights Within K…
TechnoBlogger14o3 Dec 8, 2025
025ba22
feat: solve Striver SDE Sheet problem 240 - Reverse String
TechnoBlogger14o3 Dec 8, 2025
599ddcf
feat: solve Striver SDE Sheet problem 247 - Count and Say
TechnoBlogger14o3 Dec 8, 2025
aab402d
feat: solve Striver SDE Sheet problem 248 - Flood Fill
TechnoBlogger14o3 Dec 8, 2025
a627a18
feat: solve Striver SDE Sheet problem 249 - Reorganize String
TechnoBlogger14o3 Dec 8, 2025
4c83740
feat: solve Striver SDE Sheet problem 251 - Word Ladder
TechnoBlogger14o3 Dec 8, 2025
8ac884e
feat: solve Striver SDE Sheet problem 254 - Merge Intervals
TechnoBlogger14o3 Dec 8, 2025
51d614b
feat: solve Striver SDE Sheet problem 265 - Majority Element II
TechnoBlogger14o3 Dec 8, 2025
fe94c60
feat: solve Striver SDE Sheet problem 267 - Longest Common Prefix
TechnoBlogger14o3 Dec 8, 2025
d0af626
feat: solve Striver SDE Sheet problem 276 - Reverse String
TechnoBlogger14o3 Dec 8, 2025
7b277c7
feat: solve Striver SDE Sheet problem 283 - Count and Say
TechnoBlogger14o3 Dec 8, 2025
c26eff5
feat: solve Striver SDE Sheet problem 284 - Reorganize String
TechnoBlogger14o3 Dec 8, 2025
4ced8c0
feat: solve Striver SDE Sheet problem 285 - Reorganize String
TechnoBlogger14o3 Dec 8, 2025
75274b1
feat: solve Striver SDE Sheet problem 287 - Word Ladder
TechnoBlogger14o3 Dec 8, 2025
1095aac
feat: solve Striver SDE Sheet problem 290 - Merge Intervals
TechnoBlogger14o3 Dec 8, 2025
973a88b
feat: solve Striver SDE Sheet problem 295 - Reverse String
TechnoBlogger14o3 Dec 8, 2025
8ad432c
feat: solve Striver SDE Sheet problem 302 - Count and Say
TechnoBlogger14o3 Dec 8, 2025
fcefe84
feat: solve Striver SDE Sheet problem 303 - Flood Fill
TechnoBlogger14o3 Dec 8, 2025
e31ffaf
feat: solve Striver SDE Sheet problem 304 - Reorganize String
TechnoBlogger14o3 Dec 8, 2025
23818be
feat: solve Striver SDE Sheet problem 306 - Word Ladder
TechnoBlogger14o3 Dec 8, 2025
707f120
feat: solve Striver SDE Sheet problem 309 - Merge Intervals
TechnoBlogger14o3 Dec 8, 2025
6d88b96
feat: solve Striver SDE Sheet problem 310 - Middle of the Linked List
TechnoBlogger14o3 Dec 8, 2025
4034fbb
feat: solve Striver SDE Sheet problem 320 - Majority Element II
TechnoBlogger14o3 Dec 8, 2025
555cf8f
feat: solve Striver SDE Sheet problem 322 - Longest Common Prefix
TechnoBlogger14o3 Dec 8, 2025
17f1345
feat: solve Striver SDE Sheet problem 332 - Reverse String
TechnoBlogger14o3 Dec 8, 2025
149fb11
feat: solve Striver SDE Sheet problem 334 - Search a 2D Matrix
TechnoBlogger14o3 Dec 8, 2025
05db9c2
feat: solve Striver SDE Sheet problem 339 - Count and Say
TechnoBlogger14o3 Dec 8, 2025
f779c98
feat: solve Striver SDE Sheet problem 340 - Flood Fill
TechnoBlogger14o3 Dec 8, 2025
5c523d1
feat: solve Striver SDE Sheet problem 342 - Reorganize String
TechnoBlogger14o3 Dec 8, 2025
6107a84
feat: solve Striver SDE Sheet problem 344 - Reverse String
TechnoBlogger14o3 Dec 8, 2025
4899227
feat: solve Striver SDE Sheet problem 347 - Search in Rotated Sorted …
TechnoBlogger14o3 Dec 8, 2025
605c2e2
feat: solve Striver SDE Sheet problem 351 - Count and Say
TechnoBlogger14o3 Dec 8, 2025
7ec8356
feat: solve Striver SDE Sheet problem 352 - Flood Fill
TechnoBlogger14o3 Dec 8, 2025
d312fa1
feat: solve Striver SDE Sheet problem 353 - Reorganize String
TechnoBlogger14o3 Dec 8, 2025
1909d03
feat: solve Striver SDE Sheet problem 355 - Word Ladder
TechnoBlogger14o3 Dec 8, 2025
50a3343
feat: solve Striver SDE Sheet problem 358 - Merge Intervals
TechnoBlogger14o3 Dec 8, 2025
8f19f6b
feat: solve Striver SDE Sheet problem 369 - Majority Element II
TechnoBlogger14o3 Dec 8, 2025
c8d8c3f
feat: solve Striver SDE Sheet problem 371 - Longest Common Prefix
TechnoBlogger14o3 Dec 8, 2025
e4c3515
feat: solve Striver SDE Sheet problem 381 - Reverse String
TechnoBlogger14o3 Dec 8, 2025
ead1662
feat: solve Striver SDE Sheet problem 389 - Count and Say
TechnoBlogger14o3 Dec 8, 2025
ad94d07
feat: solve Striver SDE Sheet problem 390 - Flood Fill
TechnoBlogger14o3 Dec 8, 2025
622714f
feat: solve Striver SDE Sheet problem 391 - Reorganize String
TechnoBlogger14o3 Dec 8, 2025
f758134
feat: solve Striver SDE Sheet problem 393 - Word Ladder
TechnoBlogger14o3 Dec 8, 2025
7ca15e4
feat: solve Striver SDE Sheet problem 396 - Merge Intervals
TechnoBlogger14o3 Dec 8, 2025
125001a
feat: solve Striver SDE Sheet problem 407 - Majority Element II
TechnoBlogger14o3 Dec 8, 2025
b2d7743
feat: solve Striver SDE Sheet problem 410 - Longest Common Prefix
TechnoBlogger14o3 Dec 8, 2025
15743ff
feat: solve Striver SDE Sheet problem 423 - Reverse String
TechnoBlogger14o3 Dec 8, 2025
dda61dc
feat: solve Striver SDE Sheet problem 430 - Count and Say
TechnoBlogger14o3 Dec 8, 2025
0fde53a
feat: solve Striver SDE Sheet problem 431 - Flood Fill
TechnoBlogger14o3 Dec 8, 2025
6ef4e35
feat: solve Striver SDE Sheet problem 432 - Reorganize String
TechnoBlogger14o3 Dec 8, 2025
7a7ecfb
feat: solve Striver SDE Sheet problem 437 - Merge Intervals
TechnoBlogger14o3 Dec 8, 2025
8236198
feat: solve Striver SDE Sheet problem 448 - Majority Element II
TechnoBlogger14o3 Dec 8, 2025
392f9eb
feat: solve Striver SDE Sheet problem 450 - Longest Common Prefix
TechnoBlogger14o3 Dec 8, 2025
8d4ba3e
feat: solve Striver SDE Sheet problem 424 - Reverse String
TechnoBlogger14o3 Dec 8, 2025
ac1989c
feat: solve Striver SDE Sheet problem 434 - Word Ladder
TechnoBlogger14o3 Dec 8, 2025
83483e6
feat: solve Striver SDE Sheet problem 467 - Reverse String
TechnoBlogger14o3 Dec 8, 2025
dc63992
feat: solve Striver SDE Sheet problem 26 - Longest Consecutive Sequence
TechnoBlogger14o3 Dec 8, 2025
b3fa913
feat: solve Striver SDE Sheet problem 174 - Longest Palindromic Subst…
TechnoBlogger14o3 Dec 8, 2025
e052f24
feat: solve Striver SDE Sheet problem 301 - Remove Duplicates from So…
TechnoBlogger14o3 Dec 8, 2025
e5f2939
feat: solve Striver SDE Sheet problem 305 - Add Two Numbers
TechnoBlogger14o3 Dec 8, 2025
53f19d9
feat: implement solution for problem 26 - Longest Consecutive Sequence
TechnoBlogger14o3 Dec 8, 2025
dfe6685
feat: solve Striver SDE Sheet problem 174 - Longest Palindromic Subst…
TechnoBlogger14o3 Dec 8, 2025
d012bb9
feat: solve Striver SDE Sheet problem 301 - Remove Duplicates from So…
TechnoBlogger14o3 Dec 8, 2025
df0efec
feat: solve Striver SDE Sheet problem 305 - Add Two Numbers
TechnoBlogger14o3 Dec 8, 2025
f778c07
feat: solve Striver SDE Sheet problem 25 - Maximum Product Subarray
TechnoBlogger14o3 Dec 8, 2025
2acc163
feat: solve Striver SDE Sheet problem 31 - Trapping Rain Water
TechnoBlogger14o3 Dec 8, 2025
7cde71f
feat: solve Striver SDE Sheet problem 42 - Word Break
TechnoBlogger14o3 Dec 8, 2025
7902926
feat: solve Striver SDE Sheet problem 44 - Sudoku Solver
TechnoBlogger14o3 Dec 8, 2025
c85de52
feat: solve Striver SDE Sheet problem 57 - Subsets
TechnoBlogger14o3 Dec 8, 2025
e0b6fda
feat: solve Striver SDE Sheet problem 123 - Kth Smallest Element in a…
TechnoBlogger14o3 Dec 8, 2025
c1aa697
feat: solve Striver SDE Sheet problem 134 - Coin Change
TechnoBlogger14o3 Dec 8, 2025
e13e1e9
feat: solve Striver SDE Sheet problem 149 - Longest Increasing Subseq…
TechnoBlogger14o3 Dec 8, 2025
933b497
feat: solve Striver SDE Sheet problem 167 - Balanced Binary Tree
TechnoBlogger14o3 Dec 8, 2025
7ad899c
feat: solve Striver SDE Sheet problem 171 - Word Break
TechnoBlogger14o3 Dec 8, 2025
f490ed2
feat: solve Striver SDE Sheet problem 255 - Maximum Product Subarray
TechnoBlogger14o3 Dec 8, 2025
0c8a734
feat: solve Striver SDE Sheet problem 281 - Kth Smallest Element in a…
TechnoBlogger14o3 Dec 8, 2025
ea9ceb6
feat: solve Striver SDE Sheet problem 361 - Sort an Array
TechnoBlogger14o3 Dec 8, 2025
fc179d9
feat: solve Striver SDE Sheet problem 365 - Kth Smallest Element in a…
TechnoBlogger14o3 Dec 8, 2025
407d856
feat: solve Striver SDE Sheet problem 404 - Permutations
TechnoBlogger14o3 Dec 8, 2025
f04fd46
feat: solve Striver SDE Sheet problem 409 - LRU Cache
TechnoBlogger14o3 Dec 8, 2025
433ed4c
feat: solve Striver SDE Sheet problem 426 - Find the Duplicate Number
TechnoBlogger14o3 Dec 8, 2025
0068d17
feat: solve Striver SDE Sheet problem 440 - Word Break
TechnoBlogger14o3 Dec 8, 2025
ca15c46
feat: solve Striver SDE Sheet problem 454 - Longest Common Subsequence
TechnoBlogger14o3 Dec 8, 2025
3159b13
feat: solve Striver SDE Sheet problem 470 - Word Break
TechnoBlogger14o3 Dec 8, 2025
62b252e
feat: solve Striver SDE Sheet problem 28 - Best Time to Buy and Sell …
TechnoBlogger14o3 Dec 8, 2025
8f54d93
feat: solve Striver SDE Sheet problem 92 - Maximum Subarray
TechnoBlogger14o3 Dec 8, 2025
6f66c22
feat: solve Striver SDE Sheet problem 104 - Pow(x, n)
TechnoBlogger14o3 Dec 8, 2025
ef6b4c8
feat: solve Striver SDE Sheet problem 109 - Pow(x, n)
TechnoBlogger14o3 Dec 8, 2025
3ab7efb
feat: solve Striver SDE Sheet problem 152 - Maximum Subarray
TechnoBlogger14o3 Dec 8, 2025
9d068dd
feat: solve Striver SDE Sheet problem 159 - Maximum Subarray
TechnoBlogger14o3 Dec 8, 2025
4057fc2
feat: solve Striver SDE Sheet problem 181 - Best Time to Buy and Sell…
TechnoBlogger14o3 Dec 8, 2025
d8c5234
feat: solve Striver SDE Sheet problem 191 - Best Time to Buy and Sell…
TechnoBlogger14o3 Dec 8, 2025
4d69177
feat: solve Striver SDE Sheet problem 258 - Maximum Subarray
TechnoBlogger14o3 Dec 8, 2025
f14946e
feat: solve Striver SDE Sheet problem 275 - Maximum Subarray
TechnoBlogger14o3 Dec 8, 2025
54b307b
feat: solve Striver SDE Sheet problem 294 - Sum of Two Integers
TechnoBlogger14o3 Dec 8, 2025
72e0fdb
feat: solve Striver SDE Sheet problem 356 - Maximum Subarray
TechnoBlogger14o3 Dec 8, 2025
9b6b355
feat: solve Striver SDE Sheet problem 372 - Missing Number
TechnoBlogger14o3 Dec 8, 2025
e9cdafe
feat: solve Striver SDE Sheet problem 18 - Reverse Pairs
TechnoBlogger14o3 Dec 8, 2025
f4624ae
feat: solve Striver SDE Sheet problem 21 - Intersection of Two Arrays II
TechnoBlogger14o3 Dec 8, 2025
6b12f7c
feat: solve Striver SDE Sheet problem 32 - Assign Cookies
TechnoBlogger14o3 Dec 8, 2025
b229997
feat: solve Striver SDE Sheet problem 33 - Minimum Size Subarray Sum
TechnoBlogger14o3 Dec 8, 2025
8af191c
feat: solve Striver SDE Sheet problem 34 - Sort Colors
TechnoBlogger14o3 Dec 8, 2025
a75211e
feat: solve Striver SDE Sheet problem 51 - Combination Sum
TechnoBlogger14o3 Dec 8, 2025
e3eed2e
feat: solve Striver SDE Sheet problem 264 - Assign Cookies
TechnoBlogger14o3 Dec 8, 2025
4c65818
feat: solve Striver SDE Sheet problem 343 - Intersection of Two Array…
TechnoBlogger14o3 Dec 8, 2025
5 changes: 4 additions & 1 deletion .gitignore
@@ -54,4 +54,7 @@ __pycache__/

# Backup files
*.bak
*.backup
*.backup

# Progress tracking
.striver_progress.json
Binary file added DSA Sheets.pdf
Binary file not shown.
156 changes: 156 additions & 0 deletions scripts/extract_dsa_sheet_problems.py
@@ -0,0 +1,156 @@
#!/usr/bin/env python3
"""
Extract problems from DSA Sheets.pdf
Hybrid approach: extract all problems, improve slug extraction
"""
import pdfplumber
import re
from pathlib import Path
import json

ROOT = Path(__file__).resolve().parents[1]
PDF_PATH = ROOT / "DSA Sheets.pdf"
OUTPUT_JSON = ROOT / "striver_sde_sheet_problems.json"

def extract_problems_from_pdf():
    """Extract all problems from the PDF"""
    problems = []

    print(f"Reading PDF: {PDF_PATH}")
    with pdfplumber.open(PDF_PATH) as pdf:
        full_text = ""
        for page_num, page in enumerate(pdf.pages, 1):
            text = page.extract_text()
            if text:
                full_text += text + "\n"
            if page_num % 10 == 0:
                print(f"Processed {page_num} pages...")

    print(f"Total pages: {len(pdf.pages)}")

Bug: Variable accessed after context manager closes

The pdf variable is accessed at line 29 after the with pdfplumber.open(PDF_PATH) as pdf: block closes at line 27. At this point, the pdf object is no longer valid because the context manager has already closed it. Accessing pdf.pages outside the with block will fail or produce undefined behavior.
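
One possible way to address this, sketched under assumptions rather than taken from the PR: read the page count while the PDF is still open and return it alongside the text. The helper name `read_all_text` and its `open_pdf` parameter are hypothetical; `open_pdf` exists only so the sketch can be exercised without a real PDF, and it falls back to `pdfplumber.open` as used in the script.

```python
def read_all_text(pdf_path, open_pdf=None):
    """Return (full_text, page_count), touching `pdf` only inside the with block.

    `open_pdf` is a hypothetical hook for testing; by default the
    script's own opener, pdfplumber.open, is used.
    """
    if open_pdf is None:
        import pdfplumber  # imported lazily so the sketch runs without it in tests
        open_pdf = pdfplumber.open

    full_text = ""
    with open_pdf(pdf_path) as pdf:
        # Capture the count before the context manager exits.
        page_count = len(pdf.pages)
        for page in pdf.pages:
            text = page.extract_text()
            if text:
                full_text += text + "\n"

    # Only plain values are used from here on, never `pdf` itself.
    return full_text, page_count
```

With this shape, the `Total pages` message can print `page_count` after the block without referring to the closed `pdf` object.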


    print(f"Extracted text length: {len(full_text)} characters")

    # First pass: Extract all LeetCode URLs with complete slugs
    # Normalize text to handle line breaks in URLs
    normalized_text = full_text.replace('\n', ' ').replace(' ', ' ')

    # Find all complete LeetCode URLs
    leetcode_pattern = r'(?:https?://)?(?:www\.)?leetcode\.com/problems/([a-z0-9-]+)/?'
    all_slug_matches = {}
    for match in re.finditer(leetcode_pattern, normalized_text, re.IGNORECASE):
        slug = match.group(1).lower()
        # Only keep complete slugs (more than 5 chars, doesn't end with hyphen)
        if len(slug) > 5 and not slug.endswith('-'):
            # Get context to find problem number
            start = max(0, match.start() - 100)
            end = min(len(normalized_text), match.end() + 50)
            context = normalized_text[start:end]
            # Try to find problem number nearby
            num_match = re.search(r'\b(\d{1,3})\.?\s', context)
            if num_match:
                problem_num = int(num_match.group(1))
                if slug not in all_slug_matches:
                    all_slug_matches[slug] = problem_num

    print(f"Found {len(all_slug_matches)} complete LeetCode slugs with problem numbers")

    # Second pass: Extract problems line by line (original method)
    lines = full_text.split('\n')
    current_category = None
    problem_num = 0

    for i, line in enumerate(lines):
        line = line.strip()
        if not line:
            continue

        # Detect category headers
        if len(line) > 3 and line.isupper() and len(line.split()) < 10:
            current_category = line
            print(f"Found category: {current_category}")
            continue

        # Look for problem numbers
        problem_match = re.match(r'^(\d+)\.?\s*(.+)', line)
        if problem_match:
            num = problem_match.group(1)
            rest = problem_match.group(2).strip()

            # Build search text from current line and next 2 lines (for split URLs)
            search_text = line
            for j in range(i+1, min(i+3, len(lines))):
                if lines[j].strip():
                    search_text += " " + lines[j].strip()

            # Normalize for URL matching (remove spaces that might break URLs)
            search_normalized = re.sub(r'([a-z0-9-])\s+([a-z0-9-])', r'\1\2', search_text, flags=re.IGNORECASE)

            # Try to extract LeetCode link
            link_match = re.search(leetcode_pattern, search_normalized, re.IGNORECASE)
            title_slug = None
            if link_match:
                potential_slug = link_match.group(1).lower()
                # Only use if it's a complete slug
                if len(potential_slug) > 5 and not potential_slug.endswith('-'):
                    title_slug = potential_slug

            # Also check if this problem number has a slug in our map
            if not title_slug and int(num) in all_slug_matches.values():
                # Find slug for this problem number
                for slug, pnum in all_slug_matches.items():
                    if pnum == int(num):
                        title_slug = slug
                        break

            # Extract problem title
            title = re.sub(r'https?://[^\s]+', '', rest).strip()
            title = re.sub(r'\([^)]*\)', '', title).strip()
            title = re.sub(r'\[.*?\]', '', title).strip()
            if not title and title_slug:
                title = title_slug.replace('-', ' ').title()

            if title or title_slug:
                problem_num += 1
                problem = {
                    "number": problem_num,
                    "category": current_category or "Unknown",
                    "title": title,
                    "title_slug": title_slug,
                    "raw_line": line
                }
                problems.append(problem)
                if problem_num % 50 == 0:
                    print(f"Extracted {problem_num} problems...")

    # Add any slugs we found that weren't in the numbered list
    existing_slugs = {p.get("title_slug") for p in problems if p.get("title_slug")}
    for slug, pnum in all_slug_matches.items():
        if slug not in existing_slugs:
            # Check if we already have this problem number
            existing_for_num = [p for p in problems if p["number"] == pnum]

Bug: Comparison uses incompatible numbering systems

The comparison p["number"] == pnum compares two incompatible numbering systems. The pnum variable holds problem numbers extracted from the PDF text (found near URLs), while p["number"] holds a sequential counter that increments as problems are extracted. These numbering schemes are unrelated, so the check incorrectly skips adding slugs when a sequential counter happens to coincide with a PDF number, or fails to detect duplicates when it should.
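
A minimal sketch of one way to compare like with like, assuming each extracted problem also stored the number printed in the PDF. The `pdf_number` field and the helper name `add_unlisted_slugs` are hypothetical and do not appear in the PR; the script as submitted keeps only the sequential counter in `number`.

```python
def add_unlisted_slugs(problems, all_slug_matches):
    """Append slugs missing from `problems`, deduplicating on PDF numbers only.

    Assumes each problem dict may carry a hypothetical `pdf_number` key
    (the number printed in the PDF), separate from the sequential `number`.
    """
    existing_slugs = {p["title_slug"] for p in problems if p.get("title_slug")}
    existing_pdf_numbers = {
        p["pdf_number"] for p in problems if p.get("pdf_number") is not None
    }
    # Continue the sequential counter from the highest value already used.
    next_number = max((p["number"] for p in problems), default=0)

    for slug, pdf_number in all_slug_matches.items():
        # Both checks use values from the same numbering scheme.
        if slug in existing_slugs or pdf_number in existing_pdf_numbers:
            continue
        next_number += 1
        problems.append({
            "number": next_number,
            "pdf_number": pdf_number,
            "category": "Unknown",
            "title": slug.replace('-', ' ').title(),
            "title_slug": slug,
            "raw_line": f"leetcode.com/problems/{slug}",
        })
        existing_slugs.add(slug)
        existing_pdf_numbers.add(pdf_number)
    return problems
```

Whether a second slug that maps to an already-seen PDF number should be skipped, as here, or kept is a design choice. The number found near a URL is itself a heuristic, so skipping may drop legitimate problems.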


            if not existing_for_num:
                problem_num += 1
                problems.append({
                    "number": problem_num,
                    "category": "Unknown",
                    "title": slug.replace('-', ' ').title(),
                    "title_slug": slug,
                    "raw_line": f"leetcode.com/problems/{slug}"
                })

    # Count problems with slugs
    with_slugs = [p for p in problems if p.get("title_slug") and len(p["title_slug"]) > 5]
    print(f"\nProblems with valid slugs: {len(with_slugs)}")
    print(f"Total problems extracted: {len(problems)}")

    return problems

def save_problems(problems):
    """Save problems to JSON file"""
    with open(OUTPUT_JSON, 'w', encoding='utf-8') as f:
        json.dump(problems, f, indent=2, ensure_ascii=False)
    print(f"Saved {len(problems)} problems to {OUTPUT_JSON}")

if __name__ == "__main__":
    problems = extract_problems_from_pdf()
    save_problems(problems)
    print(f"\n✅ Extraction complete! Found {len(problems)} problems")
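
The slug filter used in both passes can be tried in isolation. The sketch below reuses the script's `leetcode_pattern` and its completeness rule (more than 5 characters, no trailing hyphen). The helper name `complete_slugs` and the sample text are made up for illustration and are not part of the PR.

```python
import re

# Same pattern as `leetcode_pattern` in the script.
LEETCODE_PATTERN = r'(?:https?://)?(?:www\.)?leetcode\.com/problems/([a-z0-9-]+)/?'

def complete_slugs(text):
    """Return slugs passing the script's completeness filter, in first-seen order."""
    seen = []
    for match in re.finditer(LEETCODE_PATTERN, text, re.IGNORECASE):
        slug = match.group(1).lower()
        # Same rule as the script: longer than 5 chars and not cut off at a hyphen.
        if len(slug) > 5 and not slug.endswith('-') and slug not in seen:
            seen.append(slug)
    return seen

# Hypothetical sample text: one complete URL, one short slug, one URL split at a hyphen.
sample = (
    "1. Two Sum https://leetcode.com/problems/two-sum/ "
    "2. 3Sum leetcode.com/problems/3sum "
    "3. Merge Intervals https://leetcode.com/problems/merge- intervals"
)
print(complete_slugs(sample))  # prints ['two-sum']
```

On this sample the length rule also rejects the short slug `3sum`, so the filter trades some recall for protection against truncated URLs.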