- Speed & memory hacks
- Better control over recursion, errors, and I/O
- Advanced iteration & code-structuring tricks
- Standard-library tools for variations of common DS
- 3rd-party “serious mode” helpers (if allowed)
We briefly mentioned __slots__, but for really big trees/graphs or “state” objects, using it correctly can be a huge memory & speed win.
```python
class Node:
    __slots__ = ('val', 'children', 'dist')

    def __init__(self, val):
        self.val = val
        self.children = []
        self.dist = 0
```

- Cuts per-instance memory (no `__dict__`).
- Faster attribute access.
- Huge difference when you have 10^5–10^6 nodes.
Use it for Node, Edge, State classes in heavy graph/DP problems.
If you ever do DP/string processing directly over bytes:
```python
import sys

b = bytearray(sys.stdin.buffer.read())
view = memoryview(b)
sub = view[1000:2000]  # no copy
```

`memoryview` lets you slice a big buffer without copying.
- Very helpful in problems with big binary strings / large grids in a single blob.
Not a function, but a huge practical trick:
- On some judges, PyPy (JIT) is much faster for heavy Python loops, especially integer-heavy DSA.
- On others, CPython wins thanks to its heavily C-accelerated modules (`heapq`, `math`, etc.).
If allowed, check both; sometimes the difference decides whether you TLE or not.
You already know `array('i', ...)`, but here's a useful pattern:
- Use `array('b')` or `array('I')` as compact bit/flag arrays for BFS/DP (visited, color, etc.) when memory is tight.
- They're denser than lists of Python ints/booleans.
Example:
```python
from array import array

visited = array('b', [0]) * n  # n compact 0/1 flags
```

For deep DFS:

```python
import sys

sys.setrecursionlimit(10**7)
```

But a more advanced trick is to rewrite obvious recursion as a manual stack:
```python
stack = [root]
while stack:
    u = stack.pop()
    ...
    for v in adj[u]:
        stack.append(v)
```

- Use this template to avoid recursion depth issues entirely.
- Especially useful for trees with height up to 10^5.
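The bare template works as-is for trees; for general graphs you also need a visited array. A minimal sketch, assuming `adj` is an adjacency list (list of lists) — the function name is illustrative:

```python
def iterative_dfs(adj, root):
    # Iterative DFS over an adjacency list; safe for cyclic graphs too.
    visited = [False] * len(adj)
    order = []            # preorder visit sequence
    stack = [root]
    while stack:
        u = stack.pop()
        if visited[u]:
            continue
        visited[u] = True
        order.append(u)
        for v in adj[u]:
            if not visited[v]:
                stack.append(v)
    return order
```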
Sometimes the best DSA pattern is a plain try/except:
```python
try:
    i = a.index(x)  # O(n), but okay for small n
except ValueError:
    i = -1
```

Or, more interestingly:
```python
try:
    # Use fast path / assumption
    ...
except SomeError:
    # Fallback path
    ...
```

For example: try a greedy assumption and, if it turns out to be impossible, fall back to DP.
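A minimal sketch of that control flow, using coin change purely as an illustration (the problem choice and function names are mine): the greedy pass raises when it cannot reach the amount, and the handler falls back to DP. Note that even when greedy finishes, its answer is only optimal for canonical coin systems, so this demonstrates the try/except structure rather than a production solver.

```python
def min_coins_greedy(coins, amount):
    # Fast path: take the largest coin as often as possible.
    count = 0
    for c in sorted(coins, reverse=True):
        count += amount // c
        amount %= c
    if amount:
        raise ValueError("greedy could not reach the amount")
    return count

def min_coins_dp(coins, amount):
    # Fallback: classic O(amount * len(coins)) DP.
    INF = float("inf")
    dp = [0] + [INF] * amount
    for a in range(1, amount + 1):
        dp[a] = min((dp[a - c] + 1 for c in coins if c <= a), default=INF)
    return dp[amount]

def min_coins(coins, amount):
    try:
        return min_coins_greedy(coins, amount)
    except ValueError:
        return min_coins_dp(coins, amount)
```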
For really big input/output:
```python
import sys

data = sys.stdin.buffer.read().split()
it = iter(data)
n = int(next(it))
out_lines = []
for _ in range(n):
    # parse using next(it)
    ...
sys.stdout.write("\n".join(out_lines))
```

- `sys.stdin.buffer.read()` is one of the fastest ways to ingest huge inputs.
- `sys.stdout.write` avoids the overhead of many `print()` calls.
Instead of building giant lists:
```python
def neighbors(u):
    for v in adj[u]:
        if not blocked[v]:
            yield v

for v in neighbors(u):
    ...
```

Or multi-stage pipelines:
```python
import sys

def tokens():
    for line in sys.stdin:
        for tok in line.split():
            yield tok

def ints():
    for t in tokens():
        yield int(t)

it = ints()
n = next(it)
...
```

This keeps memory usage low and allows streaming-style solutions.
For recursive enumerations:
```python
def dfs(u):
    yield u
    for v in adj[u]:
        yield from dfs(v)
```

Automatic "flattening"; handy for enumerating paths, ordering nodes, etc.
Example: timing decorator, or memoization around a complex solver:
```python
import functools, time

def timed(fn):
    @functools.wraps(fn)
    def wrapper(*a, **k):
        t0 = time.perf_counter()
        res = fn(*a, **k)
        print("time:", time.perf_counter() - t0)
        return res
    return wrapper

@timed
def solve():
    ...
```

This helps benchmark different implementations without touching their logic.
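For the memoization half of that idea, `functools.lru_cache` wraps a recursive solver the same way (Fibonacci is used here purely as a stand-in for "a complex solver"):

```python
import functools

@functools.lru_cache(maxsize=None)
def fib(n):
    # Each distinct n is computed once; later calls are cache hits.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)
```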
Sometimes you’ll see patterns like:
```python
def solve():
    import sys
    input = sys.stdin.readline
    ...
```

or

```python
def solve():
    from math import sqrt
    ...
```

This works because local variables are faster than globals in CPython; having `input`/`sqrt` bound locally can shave some overhead inside tight loops.
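A minimal sketch of the same effect outside a `solve()` wrapper: binding `math.sqrt` to a local name (here via a default argument, a common variant of the trick) removes one global lookup plus one attribute lookup per iteration. The function names are illustrative.

```python
import math

def total_sqrt_slow(n):
    total = 0.0
    for i in range(n):
        total += math.sqrt(i)  # global lookup + attribute lookup each pass
    return total

def total_sqrt_fast(n, sqrt=math.sqrt):
    total = 0.0
    for i in range(n):
        total += sqrt(i)       # plain local lookup
    return total
```

Both compute the same sum; only the lookup cost per iteration differs.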
Here’s some more “lesser-known, but nice” stuff.
Even though normal dicts keep insertion order now (guaranteed since Python 3.7), OrderedDict still earns its keep with move_to_end:
```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, cap):
        self.cap = cap
        self.od = OrderedDict()

    def get(self, key):
        if key not in self.od:
            return -1
        self.od.move_to_end(key)
        return self.od[key]

    def put(self, key, value):
        if key in self.od:
            self.od.move_to_end(key)
        self.od[key] = value
        if len(self.od) > self.cap:
            self.od.popitem(last=False)
```

A classic interview favorite: an LRU cache in about 20 lines using `OrderedDict`.
Great when you want to extend list/dict behavior (for debugging or custom invariants) without re-implementing every method:
```python
from collections import UserList

class Stack(UserList):
    def push(self, x):
        self.append(x)

    def pop(self):  # override if you want more checks / logging
        return super().pop()
```

This is more "clean design" than performance, but it can speed up debugging.
Sometimes you prototype with local test files:
```python
from pathlib import Path

for file in Path("tests").glob("*.in"):
    data = file.read_text().splitlines()
    ...
```

Not a performance trick, but it makes it much easier to run your solution against many offline testcases (which is super important for tricky DPs/graphs).
If you ever get binary-encoded integer sequences:
```python
import struct, sys

data = sys.stdin.buffer.read()
n = len(data) // 4
ints = struct.unpack('!' + 'I' * n, data)  # network-order 4-byte unsigned ints
```

- Rare in contest settings, but very powerful if you do any binary-protocol style tasks or custom compressed input.
These are for when you’re not constrained by contest rules and want serious power.
If you’re allowed external libs, this is practically cheating (in a good way):
```python
from sortedcontainers import SortedList

sl = SortedList()
sl.add(x)          # O(log n)
sl.remove(x)       # O(log n)
sl.bisect_left(x)  # O(log n)
sl[0], sl[-1]      # min, max
```

Perfect for:
- Order statistics
- Sliding window with dynamic min/max
- “Next greater/smaller” kind of tasks when updates happen
Also SortedDict, SortedSet exist.
For DP / loops that are pure Python but numeric-heavy:
```python
from numba import njit

@njit
def solve_dp(n, arr):
    ...
```

- JIT-compiles to machine code.
- Can give C-like speed in tight loops, but you lose some dynamic features.
- Amazing for running “Python algorithms at C speed” for your own practice.
Actually part of the standard library (graphlib), but specialized:
```python
from graphlib import TopologicalSorter

ts = TopologicalSorter()
ts.add('A', 'B', 'C')  # A depends on B and C
...
order = list(ts.static_order())  # topological order
```

Nice for DAG problems and for verifying partial orders without writing your own topological sort each time.
You don’t use this in contests, but for learning / sanity-checking:
```python
import networkx as nx

G = nx.Graph()
G.add_edge(u, v, weight=w)
dist = nx.shortest_path_length(G, source=0, target=5, weight='weight')
```

You can:
- Compare your Dijkstra/BFS outputs to `networkx` on random graphs.
- Prototype algorithms quickly, then port them to hand-written Python for contests.
To test algorithms with random cases more intelligently:
```python
from hypothesis import given, strategies as st

@given(st.lists(st.integers(), max_size=20))
def test_sort(xs):
    assert my_sort(xs) == sorted(xs)
```

- Hypothesis will try to break your algorithm with weird edge cases, then shrink them to minimal failing examples.
- Brilliant for verifying tricky DP/graph solutions.
For advanced DSA prep, I’d suggest this flow:
1. Get raw correctness with simple idioms.
2. Then apply performance & clarity tools:
   - Use `heapq`, `bisect`, `deque`, `defaultdict`, `lru_cache`, etc.
   - Add `__slots__` on high-volume classes.
   - Try `sys.stdin.buffer` / `sys.stdout.write` when I/O is big.
3. For personal learning, occasionally:
   - Re-implement a classic algorithm,
   - Then validate its output with `networkx` / `hypothesis`,
   - Then profile with `timeit` and optimize using these tricks.
Here are some even more advanced, very DSA-flavored Python tricks and tools that we haven’t really dug into yet. I’ll stick to stuff that:
- Actually helps with speed / clarity in algorithms
- You probably won’t see in beginner guides
- Still stays “Pythonic” (no need to become a C dev overnight)
I’ll group them like this:
- Advanced integer / bit tools
- Advanced string / bytes tricks
- Extra standard modules & features with DSA uses
- Extra 3rd-party helpers (for practice / non-contest work)
These are super useful for bitmask DP, subsets problems, combinatorics, etc.
`int.bit_count()` (Python 3.10+) counts how many bits are set to 1 in the binary representation.
```python
x = 0b101101
x.bit_count()  # 4
```

Use cases in DSA:
- Subset DP: number of elements in the subset represented by `mask`.
- Hamming distance: `(a ^ b).bit_count()`.
- Parity checks: `mask.bit_count() % 2`.
`int.bit_length()` gives the number of bits needed to represent the integer (excluding sign).
```python
(15).bit_length()  # 4 (1111)
(16).bit_length()  # 5 (10000)
```

Use cases:
- Fast way to compute `floor(log2(n))` for positive `n`: `k = n.bit_length() - 1`.
- Deciding array sizes in segment trees / sparse tables.
Some classic patterns:
```python
# Check if i-th bit is set
bool(mask & (1 << i))

# Set i-th bit
mask |= (1 << i)

# Clear i-th bit
mask &= ~(1 << i)

# Toggle i-th bit
mask ^= (1 << i)

# Iterate over non-empty submasks of a mask
sub = mask
while sub:
    ...
    sub = (sub - 1) & mask
# (often you also want to process sub = 0 separately)
```

This pattern is huge for "iterate over all subsets of a given set".
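Putting the submask loop to work, here is a sketch that precomputes the total weight of every submask of a given mask, where `weight[i]` is the weight of element `i` (the function name is illustrative):

```python
def submask_weights(mask, weight):
    # Enumerate all submasks of `mask` (including 0) via the (sub - 1) & mask trick.
    sums = {}
    sub = mask
    while True:
        sums[sub] = sum(weight[i] for i in range(mask.bit_length()) if sub >> i & 1)
        if sub == 0:
            break
        sub = (sub - 1) & mask
    return sums
```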
We already mentioned pow(a, b, mod), but for DSA:
- Fast modular exponentiation in one line:

```python
pow(a, b, mod)  # (a^b) % mod, computed in O(log b) multiplications
```

- Modular inverse mod a prime `p` (when `a` and `p` are coprime):

```python
inv_a = pow(a, p - 2, p)  # Fermat's little theorem
```

- Efficient exponentiation in combinatorics, matrix exponentiation, etc.
For weird bitpacking problems:
```python
x = 123456
b = x.to_bytes(4, byteorder='big')
y = int.from_bytes(b, byteorder='big')  # == 123456
```

Useful when:
- You need to serialize states / masks.
- You want to hash bitmasks more compactly (e.g., as bytes for use with crypto/hashing libs).
Mostly useful in string algorithms, parsing and sometimes hashing.
Instead of doing slow loops:
```python
table = str.maketrans({'A': '1', 'B': '2'})
s2 = s.translate(table)
```

Great for:
- Normalizing characters
- Simple cipher/text transforms
- Quick mapping in string problems (e.g., mapping `'a'..'z'` to something else)
- `bytes` – immutable sequence of bytes.
- `bytearray` – mutable sequence of bytes.
- `memoryview` – a "window" into `bytes`/`bytearray`/`array` without copying.
Pattern:
```python
import sys

data = sys.stdin.buffer.read()      # bytes
barr = bytearray(data)              # mutable
view = memoryview(barr)[1000:2000]  # no copy
```

Use cases:
- Very big text / binary problems.
- Custom substring / pattern search where copying would be too slow or memory-heavy.
Not a built-in, but a pattern built on top of int and ord:
```python
MOD = 10**9 + 7
BASE = 911382323

def poly_hash(s):
    h = 0
    for c in s:
        h = (h * BASE + ord(c)) % MOD
    return h
```

Used for:
- String hashing (Rabin–Karp)
- Duplicate substring detection
- Hashing paths / states
Combine with pow(BASE, k, MOD) and prefix hashes for O(1) substring hashes.
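Sketching that combination: precompute prefix hashes and powers of `BASE` once, and any substring hash comes out in O(1) (the helper names are mine):

```python
MOD = 10**9 + 7
BASE = 911382323

def build_hashes(s):
    # pref[i] = polynomial hash of s[:i]; pw[i] = BASE**i % MOD
    n = len(s)
    pref = [0] * (n + 1)
    pw = [1] * (n + 1)
    for i, c in enumerate(s):
        pref[i + 1] = (pref[i] * BASE + ord(c)) % MOD
        pw[i + 1] = pw[i] * BASE % MOD
    return pref, pw

def substring_hash(pref, pw, l, r):
    # Hash of s[l:r] in O(1) after the O(n) preprocessing.
    return (pref[r] - pref[l] * pw[r - l]) % MOD
```

Equal substrings always produce equal hashes; distinct substrings collide only with negligible probability for a prime `MOD` of this size.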
When your code is too slow and you genuinely don’t know where:
```python
import cProfile, pstats

def main():
    solve()

cProfile.run('main()', 'profile.out')
p = pstats.Stats('profile.out')
p.sort_stats('tottime').print_stats(20)
```

This shows which functions are hot; great for optimizing big DSA solutions locally.
Standard library (Python 3.9+):
```python
from graphlib import TopologicalSorter

ts = TopologicalSorter()
ts.add('A', 'B', 'C')  # A depends on B and C
ts.add('B', 'C')       # B depends on C
ts.add('C')
order = list(ts.static_order())
```

Use cases:
- Dependency resolution problems.
- Topological sort where you’d otherwise implement Kahn’s algorithm manually.
We talked about heapq but a powerful pattern is:
- Store negated values to simulate a max-heap:
```python
import heapq

h = []
heapq.heappush(h, -value)
max_val = -heapq.heappop(h)
```

- Or store `(priority_sign * key, extra_info)` tuples.
For problems requiring:
- “Get max k elements repeatedly”
- “Greedy pick largest by some metric”
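For the "get max k elements" case specifically, a common alternative to negation is keeping a size-k min-heap, where `heap[0]` is always the k-th largest seen so far (a sketch; the function name is mine):

```python
import heapq

def k_largest_stream(values, k):
    # Maintain a min-heap of at most k elements while streaming the input.
    heap = []
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, v)
        elif v > heap[0]:
            heapq.heapreplace(heap, v)  # pop smallest, push v in one O(log k) step
    return sorted(heap, reverse=True)
```

This is O(n log k) and never holds more than k values, which matters when n is huge.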
To simulate an ordered map without external libs:
```python
import bisect

keys = []
values = []

def insert(k, v):
    i = bisect.bisect_left(keys, k)
    if i < len(keys) and keys[i] == k:
        values[i] = v
    else:
        keys.insert(i, k)
        values.insert(i, v)

def lower_bound(k):
    i = bisect.bisect_left(keys, k)
    if i < len(keys):
        return keys[i], values[i]
    return None
```

This pattern is a poor man's TreeMap / SortedDict built on plain lists (lookups are O(log n), but each `insert` is O(n) because of the list shifts).
Using collections.abc + typing helps structure your own DS:
```python
from collections.abc import MutableMapping

class MyMap(MutableMapping):
    ...
```

Implement the five abstract methods (`__getitem__`, `__setitem__`, `__delitem__`, `__iter__`, `__len__`) and `MyMap` will behave like a dict (supports `in`, iteration, `.get`, `.update`, etc.). For DSA practice, this is useful when:
- You build your own structures (e.g., hash table, tree map)
- You want them to integrate smoothly with Python code (for reuse).
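A minimal concrete sketch, backed by a plain list of pairs purely to show which methods the ABC requires (a real implementation would use a proper backing structure; the class name is illustrative):

```python
from collections.abc import MutableMapping

class ListMap(MutableMapping):
    # Toy map backed by a list of (key, value) pairs.
    def __init__(self):
        self._items = []

    def __getitem__(self, key):
        for k, v in self._items:
            if k == key:
                return v
        raise KeyError(key)

    def __setitem__(self, key, value):
        for i, (k, _) in enumerate(self._items):
            if k == key:
                self._items[i] = (key, value)
                return
        self._items.append((key, value))

    def __delitem__(self, key):
        for i, (k, _) in enumerate(self._items):
            if k == key:
                del self._items[i]
                return
        raise KeyError(key)

    def __iter__(self):
        return (k for k, _ in self._items)

    def __len__(self):
        return len(self._items)
```

Only those five methods are written by hand; `get`, `update`, `items`, `in`, equality, and the rest come free from the `MutableMapping` mixins.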
Similar to collections.namedtuple but friendlier with typing:
```python
from typing import NamedTuple

class Edge(NamedTuple):
    u: int
    v: int
    w: int

e = Edge(0, 1, 5)
```

Good for:
- Self-documenting edges/nodes with static typing.
- Immutable record-like objects.
Sometimes giving semantic names to primitive types helps avoid mistakes:
```python
from typing import NewType

Node = NewType('Node', int)
Weight = NewType('Weight', int)
# Node and Weight are both ints at runtime,
# but distinct types to type checkers
```

Not a runtime performance win, but it helps you structure complex graph/DP code without mixing up identifiers.
These are more “serious mode” than common; very handy if you’re building projects or doing offline practice.
A 3rd-party module (often called just bitarray) that gives you compact, fast bitsets:
```python
from bitarray import bitarray

b = bitarray(10)
b.setall(False)
b[3] = True
```

Use cases:
- Dense visited arrays / adjacency bitsets
- Fast set operations (AND/OR/XOR) at bit level
For big n, can be much more memory-efficient and faster than Python set or list[bool].
Roaring bitmaps: hybrid between bitsets and sorted lists:
- Very efficient for large sparse sets of integers.
- Fast union/intersection.
Good for advanced problems:
- Big integer sets (e.g. indexes, doc IDs, etc.).
- If you’re playing with inverted indexes / offline queries.
Graph algorithms written as sparse linear algebra:
- `pygraphblas` uses GraphBLAS underneath (optimized C).
- Many graph problems (BFS, shortest paths, transitive closure) can be expressed as matrix ops.
This is very advanced but cool:
- You can recast graph problems as matrix multiplications.
- Performance is often extremely good on large graphs.
If you truly want C-level speed:
- Cython: compile typed Python-ish code to C.
- cffi / ctypes: call C functions directly.
- maturin: compile Rust code as Python module.
Typical pattern:
- Prototype algorithm in Python
- Profile with
cProfile - Move only the bottleneck loops into a Cython or Rust function
- Call that from Python
This lets you keep Python flexibility but remove worst hot spots.
We mentioned it before, but advanced pattern:
```python
from numba import njit

@njit
def dijkstra(n, edges, source):
    ...
```

- Very fast for numeric-heavy loops (DP, BFS, Dijkstra on adjacency lists).
- Restrictions: no arbitrary Python objects and limited language features; you must stay in a "numba-friendly" subset.
At this point you don't need more names so much as patterns. Things you can do:

1. Pick a topic (bitmask DP, geometry, strings, graphs…).
2. For each, design a small "toolkit" from these:
   - Bitmask DP → `int.bit_count`, bit patterns, modular `pow`, `Fraction`/`math` where needed
   - Graphs → `heapq` patterns, `deque`, `graphlib.TopologicalSorter`, maybe `networkx` (for verifying)
   - Strings → `str.translate`, rolling-hash pattern, `re`, `bytes`/`bytearray`
3. Implement 2–3 classical problems using those tools, and compare to a naive version to feel the difference.
4. Occasionally profile (`cProfile`/`timeit`) and try "upgrading" with `__slots__`, faster I/O, caching, and better data structures.
If you tell me which advanced topic you’re diving into next (e.g. “bitmask DP”, “range queries (segment tree / Fenwick)”, “string pattern matching”), I can give you a small, concrete kit of 5–10 tricks specifically tailored to that topic, with sample code snippets.