Detailed answers, visual diagrams & interview tips.
Everything you need to work confidently in Python.
The language itself — how it's designed, how it handles memory, and the small rules that trip up beginners but define how fluent Python developers think.
Python is a high-level, interpreted, dynamically-typed, general-purpose programming language first released by Guido van Rossum in 1991. Its design philosophy, codified in The Zen of Python (PEP 20), emphasizes readability, simplicity, and "one obvious way to do it."
The key features interviewers expect you to know:
Python's trade-offs: slower runtime than compiled languages (C, Rust, Go) because of interpretation and dynamic dispatch, and the GIL limits CPU-bound multithreading. For most web, data, and ML workloads, developer productivity wins over raw speed — and hot paths can be dropped into C extensions (NumPy, PyTorch) when needed.
Python 2 was sunset on January 1, 2020. Every new project uses Python 3, but interviewers still ask this to test historical awareness and understanding of why breaking changes happened.
| Feature | Python 2 | Python 3 |
|---|---|---|
| print | Statement: print "hi" | Function: print("hi") |
| Integer division | 5/2 = 2 | 5/2 = 2.5 (use // for floor) |
| Strings | str = bytes, unicode separate | str = unicode, bytes separate |
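| range() | Returns a list (xrange() is the lazy one) | Returns a lazy range object (a sequence, not an iterator) |
| input() | raw_input() returns str; input() evaluates the text | input() returns str (no evaluation) |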
| Exceptions | except E, e: | except E as e: |
| Type hints | Not supported | PEP 484 annotations supported |
The driving motivation for the split was Unicode. Python 2 mixing bytes and text caused endless encoding bugs — Python 3 forces them to be different types, so you can't concatenate them by accident.
Python's built-in types fall into a handful of families. Knowing which are mutable, ordered, and hashable is what separates confident from approximate answers.
| Category | Types | Mutable? | Hashable? |
|---|---|---|---|
| Numeric | int, float, complex, bool | No | Yes |
| Text | str | No | Yes |
| Binary | bytes, bytearray, memoryview | bytes: No / bytearray: Yes / memoryview: same as its buffer | bytes (and read-only memoryviews of bytes) |
| Sequence | list, tuple, range | list only | tuple (if contents are), range |
| Mapping | dict | Yes | No |
| Set | set, frozenset | set only | frozenset only |
| None | NoneType | N/A (singleton) | Yes |
Hashable means usable as a dict key or set member. Any object is hashable if its hash never changes over its lifetime — which is why mutable containers are not. A tuple of lists, for example, is not hashable because its contents can change.
bool is a subclass of int (True == 1, False == 0). None is a singleton — always compare with is None, never == None.
Mutable objects can change their internal state after creation. Immutable objects cannot — any "change" produces a new object at a new memory address.
Consider this subtle behavior:
x = 5; y = x; x += 1 → x=6, y=5 (int is immutable, += rebinds x to a new object)
a = [1]; b = a; a += [2] → a=[1,2], b=[1,2] (list is mutable, += modifies in place — a and b are the same list)
This aliasing behavior is the root cause of most "why did my other variable change?" bugs.
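A quick check of the two cases above, using id() to see whether += rebinds the name or mutates the object in place (a minimal sketch; the raw id values differ per run, so only comparisons are printed):

```python
# int is immutable: += builds a new object and rebinds x
x = 5
y = x
x_id_before = id(x)
x += 1
print(x, y)                    # 6 5
print(id(x) == x_id_before)    # False: x now names a different object

# list is mutable: += extends the existing list in place
a = [1]
b = a
a_id_before = id(a)
a += [2]
print(a, b)                    # [1, 2] [1, 2]
print(id(a) == a_id_before)    # True: same object, so b sees the change
print(a is b)                  # True
```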
CPython manages memory automatically through three cooperating mechanisms: a private heap, reference counting, and a cyclic garbage collector.
Reference counting is the primary mechanism: every object has a counter, and when it reaches zero the memory is reclaimed immediately. This makes deallocation deterministic: unlike in Java or Go, an object that isn't part of a reference cycle is freed (and its __del__ runs) the moment its last reference disappears, for example when a local variable goes away at function return.
Refcounting's blind spot is cycles: after a.ref = b; b.ref = a, neither refcount can reach zero even once all external names are gone. The gc module's cyclic collector runs periodically to detect such unreachable groups and free them.
Small integers (-5 to 256) and short strings are interned (cached and reused), which is why a = 5; b = 5; a is b is True but a = 1000; b = 1000; a is b may be False.
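A sketch of the cycle problem in CPython, using a weak reference as a probe (a weakref does not raise the refcount). Node is a hypothetical class, and gc.disable() only keeps the automatic collector from running in the middle of the demo:

```python
import gc
import weakref

class Node:
    """Hypothetical object that can point at another Node."""
    def __init__(self):
        self.ref = None

def make_cycle():
    a, b = Node(), Node()
    a.ref, b.ref = b, a          # a -> b -> a: each keeps the other's refcount above zero
    return weakref.ref(a)        # a weak reference does not count toward the refcount

gc.disable()                     # keep the automatic collector out of the demo
probe = make_cycle()
print(probe() is not None)       # True: refcounting alone cannot free the cycle
gc.collect()                     # the cyclic collector finds the unreachable pair
print(probe() is None)           # True: both nodes are gone now
gc.enable()
```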
PEP 8 is Python's official style guide, written in 2001 by Guido van Rossum and Barry Warsaw. It's not a language spec — it's a convention document that makes Python code look like Python across projects and teams.
The highlights worth memorizing:
| Rule | Convention | Example |
|---|---|---|
| Indentation | 4 spaces per level, no tabs | Body of a def/if/for indented 4 spaces |
| Line length | 79 chars (72 for docstrings) | Modern teams use 88 or 100 |
| Variables / functions | snake_case | user_count, get_user() |
| Classes | PascalCase | UserAccount |
| Constants | UPPER_SNAKE | MAX_RETRIES = 3 |
| Private | Leading underscore | _internal_helper() |
| Imports | Stdlib → third-party → local; one per line | Grouped with blank lines |
| Blank lines | 2 around top-level defs, 1 around methods | — |
PEP 8 matters because readability is Python's whole point, and since indentation replaces braces, layout is part of the syntax. Inconsistent style in Python looks broken in a way it wouldn't in Java. Modern teams enforce it automatically in CI (black or ruff format for formatting, ruff or flake8 for linting), so style arguments stay out of code review.
== compares values (calls __eq__). is compares identity — whether two names point to the same object in memory (same id()).
Common traps:
[1,2,3] == [1,2,3] → True (same value)
[1,2,3] is [1,2,3] → False (different list objects)
a = 5; b = 5; a is b → True (small int interning)
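a = 1000; b = 1000; a is b → often False (e.g. on separate REPL lines), but not guaranteed either way; it's an implementation detail, so never rely on it
Use is for singletons, above all None. There's only ever one None, so is None is both faster and safer than == None, which can be broken by a misbehaving __eq__. For True and False, PEP 8 recommends a plain truth test (if flag:) rather than comparing with either == or is.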
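A small demonstration of why == None is fragile while is None is not. AlwaysEqual is a hypothetical class whose __eq__ claims equality with everything:

```python
class AlwaysEqual:
    """Hypothetical class with a misbehaving __eq__."""
    def __eq__(self, other):
        return True

obj = AlwaysEqual()
print(obj == None)    # True: == defers to the broken __eq__
print(obj is None)    # False: identity cannot be overridden

first = [1, 2, 3]
second = [1, 2, 3]
print(first == second)   # True: equal values
print(first is second)   # False: two distinct list objects
```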
Every non-trivial Python program is a choreography of lists, dicts, sets, and tuples. Picking the right one — and knowing its cost — is the single biggest leverage point for clean, fast code.
The four core containers each optimize for a different shape of problem. Choosing correctly is half the craft of Python.
| Type | Ordered? | Mutable? | Duplicates? | Use When |
|---|---|---|---|---|
| list [1,2,3] | Yes | Yes | Yes | Ordered, growable sequence |
| tuple (1,2,3) | Yes | No | Yes | Fixed record, dict key, return value |
| set {1,2,3} | No (arbitrary iteration order) | Yes | No | Membership tests, deduping, set math |
| dict {k:v} | Insertion order (3.7+) | Yes | Keys: No | Key→value lookup |
Lists are your default sequence. They allow mutation and duplicates, and appending is cheap (amortized O(1)). Tuples signal "this is a fixed record; don't reassign fields." They're hashable when their contents are, so they can be dict keys, and packing/unpacking makes them the natural return type for multiple values.
Sets trade ordering for average O(1) membership tests and natural set operations (|, &, -). If you ever write if x in some_list inside a loop over a list of 10k+ elements, you almost certainly want a set. Dicts are hash tables and by far the most used container in Python: the backbone of objects, modules, and JSON.
A dict is an open-addressing hash table with perturbation probing. In Python 3.6+ it uses a compact + ordered layout that stores keys and values in a dense array and the hash table holds indices into that array.
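In that layout, each entry in the dense array stores (hash, key, value), and each slot of the hash table stores an index into that array. Lookup computes the hash, picks a slot from its low bits, and compares the entry it points to. If that entry holds a different key (a collision), Python follows a deterministic probe sequence that mixes in a perturbation value derived from the hash, guaranteeing every slot can eventually be visited.
The table resizes once it's about 2/3 full: it allocates a new power-of-two table (sized from the number of live entries, so usually larger, but it can shrink after many deletions) and reinserts every entry. This is why average dict insert and lookup are O(1) (amortized for inserts), with O(n) worst case on pathological collisions.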
Since Python 3.7, dict insertion order is guaranteed by language spec (was CPython implementation detail in 3.6). This is made possible by the compact layout: iteration walks the dense entries array in order.
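To make the probing and resizing concrete, here is a toy open-addressing map. It is a simplified sketch, not CPython's implementation: ToyDict is a hypothetical name, it has no deletion, it stores (hash, key, value) directly in the slots instead of using the compact index/entries split, and it simply doubles on resize. The probe recurrence (i * 5 + perturb + 1, with perturb shifted right by 5 each step) mirrors the one CPython uses.

```python
class ToyDict:
    """Toy open-addressing map with CPython-style perturbation probing (simplified)."""

    PERTURB_SHIFT = 5

    def __init__(self):
        self._slots = [None] * 8          # size is always a power of two
        self._used = 0

    def _probe(self, h):
        mask = len(self._slots) - 1
        perturb = h & 0xFFFFFFFFFFFFFFFF  # treat the hash as unsigned 64-bit, like C's size_t
        i = perturb & mask                # start from the hash's low bits
        while True:
            yield i
            perturb >>= self.PERTURB_SHIFT
            i = (i * 5 + perturb + 1) & mask  # once perturb reaches 0 this visits every slot

    def _find(self, key):
        """Return (slot index, hash): the key's own slot, or the first empty one."""
        h = hash(key)
        for i in self._probe(h):
            slot = self._slots[i]
            if slot is None:
                return i, h
            if slot[0] == h and (slot[1] is key or slot[1] == key):
                return i, h

    def __setitem__(self, key, value):
        i, h = self._find(key)
        if self._slots[i] is None:
            self._used += 1
        self._slots[i] = (h, key, value)
        if self._used * 3 >= len(self._slots) * 2:   # about 2/3 full: grow
            self._resize()

    def __getitem__(self, key):
        i, _ = self._find(key)
        if self._slots[i] is None:
            raise KeyError(key)
        return self._slots[i][2]

    def __len__(self):
        return self._used

    def _resize(self):
        live = [s for s in self._slots if s is not None]
        self._slots = [None] * (len(self._slots) * 2)  # plain doubling, not CPython's exact policy
        self._used = 0
        for _h, key, value in live:
            self[key] = value                            # rehash into the bigger table

d = ToyDict()
for n in range(100):
    d[n] = n * n
d[-1], d[-2] = "minus one", "minus two"   # in CPython hash(-1) == hash(-2): a real collision
print(len(d), d[7], d[-1], d[-2])          # 102 49 minus one minus two
```

Storing the hash next to the key lets the lookup skip the (possibly expensive) == call whenever the stored hash differs, which is the same trick CPython uses.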
Comprehensions are compact, readable expressions that build lists, dicts, or sets from iterables — a single Pythonic idiom that replaces three lines of for + append.
| Type | Syntax | Example |
|---|---|---|
| List | [expr for x in it if cond] | [x*x for x in range(10) if x%2] |
| Dict | {k:v for x in it} | {u.id: u for u in users} |
| Set | {expr for x in it} | {w.lower() for w in words} |
| Generator | (expr for x in it) | sum(x*x for x in nums) |
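Key benefits: they are usually somewhat faster than the equivalent for + append loop (CPython compiles them to dedicated LIST_APPEND / SET_ADD / MAP_ADD instructions instead of looking up and calling .append on each pass), they're a single expression (no intermediate variable), and their loop variable stays in its own scope, so it doesn't leak into the surrounding namespace.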
Rule of thumb: if the comprehension has more than two filters or two loops, switch to a normal for-loop. Clever nesting makes you feel smart and your teammates cry.
Generator expressions use parentheses and are lazy — no list is built; values are produced on demand. Use them inside sum(), max(), any(), etc. for O(1) memory over huge inputs.
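All four forms side by side, with illustrative data (a minimal sketch):

```python
words = ["Apple", "banana", "apple", "Cherry"]
nums = range(10)

squares_of_odds = [x * x for x in nums if x % 2]   # list
lengths = {w: len(w) for w in words}                # dict
unique_lower = {w.lower() for w in words}           # set
total = sum(x * x for x in nums)                    # generator: no list is built

print(squares_of_odds)                              # [1, 9, 25, 49, 81]
print(unique_lower == {"apple", "banana", "cherry"})  # True
print(total)                                        # 285

# The comprehension's loop variable did not leak into this scope
print("w" in dir())                                 # False
```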
Copying matters because assignment creates an alias, not a copy. b = a makes both names point to the same object. For real copies you need copy.copy() (shallow) or copy.deepcopy() (recursive).
The textbook demo:
a = [[1,2],[3,4]]
b = a.copy() → b[0] is a[0] → True (inner lists shared)
b[0].append(99) changes a too.
c = copy.deepcopy(a) → c[0] is a[0] → False
c[0].append(99) does NOT change a.
Deep copy has costs: it's slower, because it traverses the whole object graph, and it keeps a memo dict of already-copied objects so shared references and cycles are copied only once. For configs and pure-data trees you usually want it; for large caches you usually don't.
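The textbook demo above as a runnable script, plus a self-referencing list to show the memo dict handling a cycle:

```python
import copy

a = [[1, 2], [3, 4]]

b = copy.copy(a)              # shallow: new outer list, same inner lists (like a.copy())
b[0].append(99)
print(a)                      # [[1, 2, 99], [3, 4]]: a changed through the shared inner list
print(b is a, b[0] is a[0])   # False True

c = copy.deepcopy(a)          # deep: inner lists are copied recursively
c[0].append(100)
print(a)                      # [[1, 2, 99], [3, 4]]: unchanged this time
print(c)                      # [[1, 2, 99, 100], [3, 4]]

# The memo dict lets deepcopy copy a cycle without infinite recursion
loop = []
loop.append(loop)
loop_copy = copy.deepcopy(loop)
print(loop_copy[0] is loop_copy)   # True: the copy points at itself, not at the original
```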
The collections module is a toolbox of specialized containers that solve common problems more elegantly than built-ins. These show up constantly in real code — knowing them signals experience.
The one you'll reach for daily is defaultdict. Grouping records by key with a plain dict needs an existence check (an if, or setdefault) on every insert; with defaultdict(list) you just write groups[key].append(record).
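A side-by-side sketch of the grouping pattern; the (department, name) records are made-up sample data:

```python
from collections import defaultdict

records = [("eng", "ada"), ("ops", "grace"), ("eng", "linus")]

# Plain dict: an existence check on every insert
groups = {}
for dept, name in records:
    if dept not in groups:
        groups[dept] = []
    groups[dept].append(name)

# defaultdict(list): a missing key is created with list() on first access
grouped = defaultdict(list)
for dept, name in records:
    grouped[dept].append(name)

print(dict(grouped))      # {'eng': ['ada', 'linus'], 'ops': ['grace']}
print(groups == grouped)  # True: defaultdict is a dict subclass and compares by contents
```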
Slicing lets you extract a subsequence from any sequence type (list, tuple, str, bytes) using seq[start:stop:step]. All three parts are optional; the operation returns a new object of the same type.
Slices clip instead of raising: [1,2,3][:100] returns [1,2,3], not an error. Single-index access with an out-of-range index would raise IndexError — slicing is the forgiving cousin.
Slice assignment on a list can do clever things: a[1:3] = [10,20,30] replaces the slice with the new elements (list grows or shrinks). a[::2] = [0,0,0] replaces every other element (lengths must match for extended slices).
Under the hood, s[a:b:c] creates a slice(a,b,c) object and calls s.__getitem__(slice(a,b,c)). You can build slices explicitly and reuse them — last_three = slice(-3, None); lst[last_three].
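The slicing rules above in one runnable snippet:

```python
nums = [0, 1, 2, 3, 4, 5, 6, 7]

print(nums[2:5])      # [2, 3, 4]
print(nums[::2])      # [0, 2, 4, 6]
print(nums[::-1])     # [7, 6, 5, 4, 3, 2, 1, 0]
print(nums[:100])     # the whole list: slices clip instead of raising

a = [1, 2, 3, 4]
a[1:3] = [10, 20, 30]           # plain slice: the list may grow or shrink
print(a)                        # [1, 10, 20, 30, 4]

b = [1, 2, 3, 4, 5, 6]
b[::2] = [0, 0, 0]              # extended slice: 3 targets need exactly 3 values
print(b)                        # [0, 2, 0, 4, 0, 6]

last_three = slice(-3, None)    # a reusable, named slice object
print(nums[last_three])         # [5, 6, 7]
print("python"[last_three])     # hon
```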
Pick the right container and most hot-path bugs disappear. These are the numbers every Python developer should have memorized.
| Operation | list | deque | dict / set |
|---|---|---|---|
| x in c | O(n) | O(n) | O(1) avg |
| c[i] / c[k] | O(1) | O(1) at ends, O(n) in middle | O(1) avg |
| append / add | O(1) amortized | O(1) | O(1) avg |
| appendleft / popleft | O(n) | O(1) | — |
| insert middle | O(n) | O(n) | — |
| sort | O(n log n) | — | — |
| min / max | O(n) | O(n) | O(n) |
The biggest practical win: replace x in big_list with x in big_set. If you're doing many membership checks, build the set once and reuse it: m lookups into n items drop from O(m·n) to O(m + n), which is quadratic to linear when m and n are similar.
For FIFO queues, use collections.deque, not list.pop(0), which is O(n) per call.
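Both recommendations in a short sketch; the job names and id ranges are illustrative:

```python
from collections import deque

# FIFO queue: append on the right, popleft on the left, both O(1)
queue = deque(["job1", "job2"])
queue.append("job3")
first = queue.popleft()          # list.pop(0) would shift every remaining element
print(first, list(queue))        # job1 ['job2', 'job3']

# Membership: build the set once, then each lookup is O(1) on average
allowed_ids = list(range(100_000))
allowed_set = set(allowed_ids)   # one O(n) pass
requests = [5, 99_999, 123_456]
hits = [r for r in requests if r in allowed_set]   # no linear scan per check
print(hits)                      # [5, 99999]
```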
Functions in Python are first-class objects — they can be passed around, returned, decorated, and closed over. Mastering these is what unlocks decorators, middleware, and clean functional patterns.
*args collects extra positional arguments into a tuple. **kwargs collects extra keyword arguments into a dict. Together they let a function accept anything.
The names args and kwargs are conventions — the * and ** prefixes are what matter. You can name them anything, but don't.
They work in both directions:
Definition: def f(a, *args, **kwargs) — function accepts any extras
Call site (unpacking): f(*[1,2,3], **{'k':'v'}) — spread a list/dict into arguments
The killer use case is wrapper functions: a decorator that accepts any signature just writes def wrapper(*args, **kwargs): return func(*args, **kwargs) and forwards everything transparently.
Ordering in a signature: positional-only → positional-or-keyword → *args → keyword-only → **kwargs.
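Both directions in one sketch: collecting extras at the definition, spreading them at the call site, and a forwarding wrapper. describe and forward are hypothetical names:

```python
def describe(a, *args, **kwargs):
    # args is a tuple of extra positionals, kwargs a dict of extra keywords
    return f"a={a} args={args} kwargs={kwargs}"

print(describe(1, 2, 3, k="v"))              # a=1 args=(2, 3) kwargs={'k': 'v'}

# Call-site unpacking: spread a list and a dict into arguments
print(describe(*[1, 2, 3], **{"k": "v"}))    # same output as above

def forward(*args, **kwargs):
    # Accept any signature and pass everything through unchanged
    return describe(*args, **kwargs)

print(forward(1, 2, k="v"))                  # a=1 args=(2,) kwargs={'k': 'v'}
```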
A lambda is an anonymous, single-expression function. lambda x, y: x + y is the callable equivalent of def add(x, y): return x + y.
Lambdas exist to avoid naming a function you use once — typically as a key or callback:
| Use Case | Example |
|---|---|
| Sort key | sorted(users, key=lambda u: u.age) |
| Filter predicate | filter(lambda x: x > 0, nums) |
| Map transform | map(lambda s: s.strip().lower(), lines) |
| defaultdict factory | defaultdict(lambda: {"count": 0}) |
Limitations: lambdas are a single expression only — no statements, no assignments, no type annotations. If your lambda spans more than one line or does anything complex, a named def is almost always clearer.
PEP 8 even discourages assigning a lambda to a name: square = lambda x: x*x is worse than def square(x): return x*x, because the def gives the function a meaningful __name__ ("square" rather than "<lambda>"), which also makes tracebacks easier to read.
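Lambdas as one-off key and predicate functions, and the __name__ difference mentioned above. The (name, age) tuples are sample data:

```python
users = [("ada", 36), ("linus", 28), ("grace", 45)]

by_age = sorted(users, key=lambda u: u[1])       # one-off key function
print([name for name, _ in by_age])              # ['linus', 'ada', 'grace']

positives = list(filter(lambda x: x > 0, [-2, 3, 0, 5]))
print(positives)                                 # [3, 5]

square_lambda = lambda x: x * x                  # discouraged by PEP 8

def square(x):
    return x * x

print(square_lambda.__name__, square.__name__)   # <lambda> square
```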
A closure is a function that remembers variables from its enclosing scope even after that scope has finished executing. It "closes over" those variables.
Three conditions must hold for a closure to exist:
1. A nested function. 2. That function references a variable from the outer (enclosing) function. 3. The outer function returns the nested function.
To rebind (not just read) an enclosing variable, you need the nonlocal keyword. Without it, count += 1 would be treated as a local assignment and raise UnboundLocalError. Reading is fine without nonlocal.
Closures are the foundation of decorators, callbacks, and factory functions. They let you create specialized functions with pre-baked configuration without classes.
You can inspect a closure via func.__closure__ — a tuple of cell objects holding the captured values.
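A counter factory that meets all three conditions, uses nonlocal to rebind the captured variable, and then peeks at the cell. make_counter is a hypothetical name:

```python
def make_counter(start=0):
    count = start              # enclosing-scope variable captured by increment()

    def increment():
        nonlocal count         # rebind the enclosing variable instead of creating a local
        count += 1
        return count

    return increment           # the nested function outlives make_counter's frame

counter = make_counter(10)
print(counter(), counter())                    # 11 12
print(counter.__code__.co_freevars)            # ('count',)
print(counter.__closure__[0].cell_contents)    # 12: the captured cell's current value
```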
A decorator is a function that takes a function and returns a (usually wrapped) function. The @decorator syntax is just sugar: putting @log on the line above def f(): ... is equivalent to writing f = log(f) right after the def.
A minimal timing decorator:
import functools
import time

def timer(f):
    @functools.wraps(f)  # copy f's __name__, __doc__, etc. onto the wrapper
    def wrapper(*a, **kw):
        t = time.perf_counter()
        r = f(*a, **kw)
        print(time.perf_counter() - t)  # elapsed seconds
        return r
    return wrapper
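Treat functools.wraps as mandatory. Without it, the wrapper's own __name__, __qualname__, and __doc__ replace the original's, and inspect.signature() can no longer see the original signature (wraps records the original as __wrapped__), which breaks logging, docs, and introspection.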
Common built-in decorators: @staticmethod, @classmethod, @property, @functools.cache, @dataclass. In frameworks: @app.route (Flask), @pytest.fixture, @retry.
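A side-by-side check of what functools.wraps preserves. with_wraps, without_wraps, greet, and farewell are hypothetical names for the illustration:

```python
import functools

def with_wraps(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

def without_wraps(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@with_wraps
def greet(name):
    """Return a greeting."""
    return f"hello {name}"

@without_wraps
def farewell(name):
    """Return a farewell."""
    return f"bye {name}"

print(greet.__name__, greet.__doc__)         # greet Return a greeting.
print(farewell.__name__, farewell.__doc__)   # wrapper None
print(greet.__wrapped__("ada"))              # hello ada: the original stays reachable
```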
LEGB is the order Python follows to resolve a name: Local → Enclosing → Global → Built-in. When you reference x, Python searches these scopes in that exact sequence and uses the first match.
Two twists everyone gets bitten by:
1. Assignment creates a local. If you write x = 5 anywhere in a function, x is local for the whole function body, so reading it on a line before that assignment raises UnboundLocalError.
2. Classes are not in LEGB. Methods don't automatically see their class's attributes — you must go through self or cls. A class body is a separate namespace that exists only during class creation.
To write to an outer scope: global x for module-level, nonlocal x for an enclosing function. Reading works without either.
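Each LEGB step, the UnboundLocalError trap, and a global rebind in one sketch (the function names are hypothetical):

```python
x = "global"

def outer():
    x = "enclosing"
    def inner():
        return x               # found in the Enclosing scope
    return inner()

def shadowed():
    try:
        print(x)               # x is local here because of the assignment below
    except UnboundLocalError as e:
        return type(e).__name__
    x = "local"
    return x

def rebind_global():
    global x                   # write to the module-level name
    x = "changed"

print(outer())                 # enclosing
print(shadowed())              # UnboundLocalError
print(len("abc"))              # 3: len resolved in the Built-in scope
rebind_global()
print(x)                       # changed
```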
Python gives you more argument-passing flexibility than most languages. Understanding the categories lets you design APIs that are both ergonomic and safe.
| Kind | At the call site | At definition |
|---|---|---|
| Positional | f(1, 2) (order matters) | def f(x, y) |
| Keyword | f(x=1, y=2) (name matters, order doesn't) | def f(x, y) |
| Default | f(1) → y=10 | def f(x, y=10) |
| Positional-only | f(1) works; f(x=1) raises TypeError | def f(x, /) (PEP 570) |
| Keyword-only | f(x=1) works; f(1) raises TypeError | def f(*, x) or after *args |
Full signature order: def f(pos_only, /, pos_or_kw, *args, kw_only, **kwargs). Parameters before / are positional-only; those between / and * (or *args) can be passed either way; those after * or *args must be passed by keyword.
Keyword-only arguments are the cleanest way to prevent misuse. def connect(host, *, timeout=30, retries=3) forces callers to write connect("x.com", timeout=10), which is self-documenting and safe to extend — you can reorder or add new options later without breaking callers.
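The full signature and the keyword-only connect() pattern exercised together. The connect body is a stand-in that just echoes its options; no real networking happens:

```python
def f(pos_only, /, pos_or_kw, *args, kw_only, **kwargs):
    return pos_only, pos_or_kw, args, kw_only, kwargs

print(f(1, 2, 3, 4, kw_only=5, extra=6))   # (1, 2, (3, 4), 5, {'extra': 6})
print(f(1, pos_or_kw=2, kw_only=5))        # (1, 2, (), 5, {})

def connect(host, *, timeout=30, retries=3):
    # Hypothetical stand-in body: report the options it received
    return f"{host} timeout={timeout} retries={retries}"

print(connect("x.com", timeout=10))        # x.com timeout=10 retries=3
try:
    connect("x.com", 10)                   # options cannot be passed positionally
except TypeError as e:
    print("TypeError:", e)
```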
The infamous mutable-default trap: def append_item(x, lst=[]). That list is created once at function-definition time and shared across every call with no explicit lst. Use lst=None and allocate inside.
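The trap and the fix side by side; append_item_buggy is a hypothetical name for the broken version:

```python
def append_item_buggy(x, lst=[]):   # the [] is evaluated once, when def runs
    lst.append(x)
    return lst

print(append_item_buggy(1))         # [1]
print(append_item_buggy(2))         # [1, 2]: the same list as the previous call
print(append_item_buggy.__defaults__)   # ([1, 2],): the shared default lives here

def append_item(x, lst=None):
    if lst is None:
        lst = []                    # a fresh list on every call
    lst.append(x)
    return lst

print(append_item(1), append_item(2))   # [1] [2]
```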
Functions in Python are first-class objects: they can be assigned to variables, stored in data structures, passed as arguments, and returned from other functions — exactly like any other value.
Any object whose type defines __call__ is callable, which is why functions, lambdas, bound methods, classes (calling one constructs an instance), generator functions, and instances of classes that define __call__ all participate in this system. The duck-typed "callable" concept is broader than just def.
First-class functions enable Python's most loved patterns: decorators (wrap a function), higher-order functions (map, filter, reduce, sorted(key=)), strategy pattern (pass behavior as an argument), and event handlers / callbacks.
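A few of those patterns in one sketch: a dict of functions as a dispatch table (strategy), a callable instance, and a function that returns a new function. OPS, apply, Multiplier, and compose are hypothetical names:

```python
import operator

# Functions stored in a dict: a lightweight strategy / dispatch table
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def apply(op_symbol, a, b):
    return OPS[op_symbol](a, b)     # look up the behavior, then call it

class Multiplier:
    """An instance with __call__ can be called like a function."""
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, x):
        return x * self.factor

def compose(f, g):
    return lambda x: f(g(x))        # returns a brand-new function

triple = Multiplier(3)
inc_then_triple = compose(triple, lambda x: x + 1)

print(apply("*", 6, 7))                      # 42
print(callable(triple), callable(42))        # True False
print(inc_then_triple(4))                    # 15
print(sorted(["bb", "a", "ccc"], key=len))   # ['a', 'bb', 'ccc']
```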
Python's object model is duck-typed, permissive, and powerful. Everything is an object — including classes themselves. Understanding the mechanics reveals why Python can be both simple and meta-programmable.
A class is a blueprint describing state (attributes) and behavior (methods). An object is an instance created from that blueprint. In Python, even the classes themselves are objects — instances of a metaclass (by default, type).
Key pieces to name clearly in an interview:
__init__ — the initializer, run after the object is created to set up its state. (Not strictly the constructor — that's __new__.)
self — the instance, passed implicitly by the interpreter. It's a convention, not a keyword, but never rename it.
Class vs instance attributes: a class attribute such as interest_rate lives on the class and is shared by all instances; an instance attribute such as self.balance lives on each object. Reading an attribute looks on the instance first, then the class, which is why class attributes act as "defaults."
For pure data containers, modern Python prefers @dataclass: it auto-generates __init__, __repr__, and __eq__, cutting boilerplate dramatically.
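A sketch of the class-versus-instance lookup, using a hypothetical BankAccount built around the interest_rate and balance names above, followed by an equivalent-in-spirit @dataclass:

```python
from dataclasses import dataclass

class BankAccount:
    interest_rate = 0.02            # class attribute: shared by every instance

    def __init__(self, owner, balance=0.0):
        self.owner = owner          # instance attributes: one copy per object
        self.balance = balance

a, b = BankAccount("ada"), BankAccount("linus", 100.0)
print(a.interest_rate, b.interest_rate)   # 0.02 0.02: found on the class
b.interest_rate = 0.05                    # creates an instance attribute that shadows it
print(a.interest_rate, b.interest_rate)   # 0.02 0.05
print(BankAccount.interest_rate)          # 0.02: the class attribute is untouched

@dataclass
class Point:
    x: float
    y: float = 0.0                  # __init__, __repr__ and __eq__ are generated

print(Point(1.5))                   # Point(x=1.5, y=0.0)
print(Point(1, 2) == Point(1, 2))   # True
```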
Inheritance lets a class reuse and extend another class's attributes and methods. Python supports multiple inheritance — a class can have any number of parents.
super() is how you delegate to the next class in the MRO (Method Resolution Order). Inside __init__, always call super().__init__(...) so parent setup runs.
Every class ultimately inherits from object — that's where __repr__, __eq__, __hash__ and the baseline dunders come from.
Mixins are a disciplined use of multiple inheritance: small classes that add one capability (e.g. LoggingMixin, SerializableMixin) and are combined into concrete classes. They work because of Python's cooperative super().
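A minimal sketch of two single-purpose mixins combined through cooperative super(). The LoggingMixin and SerializableMixin bodies here are illustrative assumptions, not a standard library API:

```python
import json

class LoggingMixin:
    """Adds one capability: an in-memory event log."""
    def __init__(self, **kwargs):
        self.log = []
        super().__init__(**kwargs)   # hand off to the next class in the MRO

    def record(self, msg):
        self.log.append(msg)

class SerializableMixin:
    """Adds one capability: JSON export of instance attributes (minus the log)."""
    def to_json(self):
        return json.dumps({k: v for k, v in vars(self).items() if k != "log"})

class User(LoggingMixin, SerializableMixin):
    def __init__(self, name, **kwargs):
        super().__init__(**kwargs)   # runs LoggingMixin.__init__, which continues along the MRO
        self.name = name
        self.record(f"created {name}")

u = User("ada")
print(u.to_json())                   # {"name": "ada"}
print(u.log)                         # ['created ada']
print([c.__name__ for c in User.__mro__])
# ['User', 'LoggingMixin', 'SerializableMixin', 'object']
```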
All three live on a class; they differ in what implicit first argument they receive — and therefore what they can touch.
| Type | First arg | Can access | Typical use |
|---|---|---|---|
| Instance method | self | Instance & class state | Normal behavior |
| @classmethod | cls | Class state only | Alternative constructors, factories |
| @staticmethod | (none) | Neither | Utility helpers logically grouped with class |
A canonical factory pattern:
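import json

class User:
    ...  # __init__ and other members omitted
    @classmethod
    def from_json(cls, s):
        data = json.loads(s)
        return cls(**data)  # cls is whichever class the call was made on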
The brilliance: cls is the actual class being called. If a subclass PaidUser inherits this, PaidUser.from_json(...) returns a PaidUser, not a User. That's why alternate constructors use @classmethod, not @staticmethod.
@staticmethod is really just a regular function living inside a class namespace — no special access, no self, no cls. Use it when the function is related to the class conceptually but doesn't need its state (e.g. a validator).
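A complete, runnable version of the User / PaidUser factory, with a static validator alongside. The fields (name, email) and is_valid_email are assumptions made for the example:

```python
import json
from dataclasses import dataclass

@dataclass
class User:
    name: str
    email: str

    @classmethod
    def from_json(cls, s):
        data = json.loads(s)
        return cls(**data)           # cls is User or a subclass, whichever was called

    @staticmethod
    def is_valid_email(email):
        # No self or cls needed: a plain helper grouped with the class
        return "@" in email

class PaidUser(User):
    pass

raw = '{"name": "ada", "email": "ada@example.com"}'
print(type(User.from_json(raw)).__name__)      # User
print(type(PaidUser.from_json(raw)).__name__)  # PaidUser
print(User.is_valid_email("nope"))             # False
```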
Dunder ("double underscore") methods are Python's protocol hooks. They let your objects integrate with language syntax — operators, iteration, context managers, printing, comparisons. You don't call them; Python calls them for you in response to syntax.
| Category | Dunders | Triggered by |
|---|---|---|
| Construction | __new__, __init__, __del__ | Object creation / destruction |
| Representation | __repr__, __str__, __format__ | repr(), print(), f-string |
| Arithmetic | __add__, __sub__, __mul__, __truediv__ | +, -, *, / |
| Comparison | __eq__, __lt__, __hash__ | ==, <, set/dict usage |
| Containers | __len__, __getitem__, __contains__ | len(), [], in |
| Iteration | __iter__, __next__ | for, iter(), next() |
| Callable | __call__ | obj() |
| Context | __enter__, __exit__ | with statement |
Implementing the right dunders makes your objects feel native:
Add __len__ → len(x) works. Add __iter__ → for i in x works. Add __eq__ and __hash__ → usable in sets/dicts.
Two repr rules to internalize: __repr__ is for developers (unambiguous, ideally copy-pasteable), __str__ is for users (friendly). If you only implement one, implement __repr__ — Python falls back to it.
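A small hypothetical Vector class that implements several of the protocols in the table. Note that defining __eq__ without __hash__ sets __hash__ to None, so hashable classes need both, and instances should be treated as immutable once they are stored in a set or used as dict keys:

```python
class Vector:
    """2-D vector that plugs into +, ==, len(), iteration and hashing."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):              # for developers; str() falls back to it
        return f"Vector({self.x!r}, {self.y!r})"

    def __add__(self, other):        # v + w
        if not isinstance(other, Vector):
            return NotImplemented    # lets Python try other.__radd__, else TypeError
        return Vector(self.x + other.x, self.y + other.y)

    def __eq__(self, other):         # v == w
        if not isinstance(other, Vector):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):              # must agree with __eq__
        return hash((self.x, self.y))

    def __len__(self):               # len(v)
        return 2

    def __iter__(self):              # for c in v, tuple(v), unpacking
        yield self.x
        yield self.y

v = Vector(1, 2) + Vector(3, 4)
print(v)                    # Vector(4, 6)
print(len(v), tuple(v))     # 2 (4, 6)
print({Vector(4, 6), v})    # {Vector(4, 6)}: equal and same hash, so deduplicated
```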
MRO is the order in which Python searches base classes when looking up an attribute or method. For multiple inheritance, it's the algorithm that makes "which parent wins?" deterministic.
Python uses the C3 linearization algorithm. Its three rules:
1. A class always comes before its parents. 2. Order of base classes in the class declaration is preserved. 3. The result is monotonic — no class appears before another that was before it in a parent's MRO.
You can inspect MRO two ways: ClassName.__mro__ (tuple) or ClassName.mro() (method).
super() doesn't mean "call the parent" — it means "call the next class in the MRO." In single inheritance these are the same; in multiple inheritance they differ, and that's what makes cooperative mixins work. Every super().method() call follows the MRO chain.
If Python can't compute a consistent MRO (say, two classes conflict), it raises TypeError at class definition. You'll never hit this if your inheritance tree is sane.
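The classic diamond shows super() following the MRO rather than the literal parent, and a contradictory base order shows the class-definition TypeError:

```python
class A:
    def who(self):
        return ["A"]

class B(A):
    def who(self):
        return ["B"] + super().who()   # "next in the MRO", not necessarily A

class C(A):
    def who(self):
        return ["C"] + super().who()

class D(B, C):
    def who(self):
        return ["D"] + super().who()

print([k.__name__ for k in D.mro()])   # ['D', 'B', 'C', 'A', 'object']
print(D().who())                       # ['D', 'B', 'C', 'A']: B's super() reached C

# An inconsistent hierarchy is rejected when the class statement runs
try:
    class X(A, B):                     # A before B contradicts B's own MRO
        pass
except TypeError as e:
    print("TypeError:", e)
```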
Both are ways to declare "objects of this shape should behave like X." They differ in how strictly they enforce it.
ABC: inherit from ABC, decorate methods with @abstractmethod, and Python refuses to instantiate any subclass that doesn't implement them.
from abc import ABC, abstractmethod

class Shape(ABC):
    @abstractmethod
    def area(self): ...
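Protocol: defines a shape; any class with matching methods is considered a subtype by static type checkers, with no inheritance needed. Type checkers treat standard interfaces such as "iterable" and "sized" this way, and at runtime collections.abc gives Iterable and Sized the same structural isinstance behavior.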
from typing import Protocol

class Drawable(Protocol):
    def draw(self) -> None: ...
The standard library defines ABCs in collections.abc (Iterable, Sized, Mapping) — useful for isinstance checks and as mixins that fill in methods once you implement the required ones.
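Both styles in one runnable sketch. Square and Sprite are hypothetical classes; @runtime_checkable is the opt-in that lets isinstance() check a Protocol (it only checks that the methods exist, not their signatures):

```python
from abc import ABC, abstractmethod
from typing import Protocol, runtime_checkable

class Shape(ABC):
    @abstractmethod
    def area(self) -> float: ...

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2

@runtime_checkable
class Drawable(Protocol):
    def draw(self) -> None: ...

class Sprite:                          # no inheritance from Drawable at all
    def draw(self) -> None:
        print("drawing sprite")

try:
    Shape()                            # abstract: cannot be instantiated
except TypeError as e:
    print("TypeError:", e)

print(Square(3).area())                # 9
print(isinstance(Sprite(), Drawable))  # True: matched structurally, by method name
print(isinstance(42, Drawable))        # False
```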
__new__ creates the object. __init__ initializes the object after it exists. Both are called when you invoke MyClass(), but in that order.
| | __new__ | __init__ |
|---|---|---|
| First arg | cls (class) | self (instance) |
| Returns | A new instance | None |
| Role | Allocate | Configure |
| Called by | type.__call__, when you call MyClass() | Automatically after __new__, if it returned an instance of cls |
You rarely override __new__. The standard use cases are: (1) subclassing immutable types like int, str, tuple — where you can't modify the object in __init__ because it already exists; (2) singletons — return the same instance on every call; (3) metaclass magic.
For ordinary classes, just override __init__ and let Python handle __new__.
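Two of the __new__ use cases as minimal sketches; UpperStr and Config are hypothetical names. Note that in the singleton, __init__ still runs on every call:

```python
class UpperStr(str):
    """A str subclass that normalizes its value at creation time."""
    def __new__(cls, value):
        # str is immutable, so the value must be fixed here, not in __init__
        return super().__new__(cls, value.upper())

class Config:
    """Minimal singleton: __new__ hands back one shared instance."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self):
        # __init__ still runs on every Config() call
        self.calls = getattr(self, "calls", 0) + 1

s = UpperStr("hello")
print(s, isinstance(s, str))   # HELLO True
c1, c2 = Config(), Config()
print(c1 is c2, c2.calls)      # True 2
```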
Python's iteration protocol is elegant and everywhere — from for loops to file handling to streaming pipelines. Generators turn it into a lazy, memory-efficient way to produce values on demand.
An iterable is anything you can loop over — it implements __iter__, which returns an iterator. An iterator is a single-pass cursor — it implements both __iter__ (returning itself) and __next__ (returning the next value, or raising StopIteration when done).
The canonical usage: iter(my_list) turns a list (iterable) into a list-iterator (iterator). next(it) advances it. When exhausted, next() raises StopIteration.
Critical distinction for interview traps:
lst = [1,2,3] — iterate twice, get the same elements. Iterables are replayable.
it = iter(lst) — consume it once, the second pass yields nothing. Iterators are burned after use.
This is why passing the same generator object to sum() and then to list() gives surprising results: the second call sees an already-exhausted generator and gets nothing.
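A hand-written iterable/iterator pair makes the replayable-versus-single-pass distinction concrete. Countdown and CountdownIterator are hypothetical names:

```python
class Countdown:
    """Iterable: each iter() call returns a fresh iterator, so it can be looped repeatedly."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)

class CountdownIterator:
    """Iterator: a single-pass cursor with __iter__ (returns self) and __next__."""
    def __init__(self, current):
        self.current = current

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration      # signals exhaustion to for-loops, list(), etc.
        self.current -= 1
        return self.current + 1

cd = Countdown(3)
print(list(cd), list(cd))    # [3, 2, 1] [3, 2, 1]: the iterable replays

it = iter(cd)
print(list(it), list(it))    # [3, 2, 1] []: the iterator is spent

gen = (x * x for x in range(4))
print(sum(gen), list(gen))   # 14 []: sum() consumed the generator
```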
A generator is a function that produces values lazily using yield instead of return. Calling it doesn't run the body; it returns a generator object (an iterator). Each next() call runs the body up to the next yield, hands back that value, and pauses with its local state intact.
Every generator is automatically an iterator — you get __iter__ and __next__ for free. You can loop, list(), sum(), unpack (a, b, c = gen), or pipeline them.
Generator expressions are the compact form: (x*x for x in range(10)). Same lazy semantics, no def needed.
yield from delegates to a sub-generator, piping values through transparently — useful for composing pipelines and writing recursive generators (e.g. tree traversal).
Generators are also coroutines: value = yield lets you send values back in with gen.send(x). This is the foundation asyncio was originally built on, before the dedicated async/await syntax.
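Three small generators covering the ideas above: plain yield, yield from for recursion over a nested list (a hypothetical "tree" shape), and send() into a generator-based coroutine:

```python
def fib(limit):
    a, b = 0, 1
    while a < limit:
        yield a                   # pause here; resume on the next next() call
        a, b = b, a + b

def walk(tree):
    """Recursive generator over a nested list."""
    for node in tree:
        if isinstance(node, list):
            yield from walk(node) # delegate to the sub-generator
        else:
            yield node

def running_average():
    total, count, avg = 0.0, 0, None
    while True:
        value = yield avg         # receives whatever is passed to .send()
        total += value
        count += 1
        avg = total / count

gen = fib(10)
print(next(gen), next(gen), list(gen))   # 0 1 [1, 2, 3, 5, 8]
print(list(walk([1, [2, [3, 4]], 5])))   # [1, 2, 3, 4, 5]

avg = running_average()
next(avg)                                # prime it: run to the first yield
print(avg.send(10), avg.send(20))        # 10.0 15.0
```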
Lists are eager: every element exists in memory at once. Generators are lazy: one value at a time, computed on demand. The difference is trivial for 100 elements, transformative for 10 million.
| | List | Generator |
|---|---|---|
| Memory | O(n) | O(1) |
| Evaluation | Eager (all at once) | Lazy (on demand) |
| Reusable? | Yes — indexable, re-iterable | No — single-pass |
| Indexing | lst[5] in O(1) | Must iterate forward |
| len() | Yes | No (unknown in advance) |
| Best for | Small, reusable, random access | Streaming, infinite, large data |
The canonical benchmark:
sum([x*x for x in range(10**7)]) builds a 10-million-element list first, then sums it: hundreds of megabytes at peak.
sum(x*x for x in range(10**7)) streams values one at a time: constant memory (the generator object itself is a couple hundred bytes), with CPU time usually in the same ballpark.
The paren-vs-bracket switch is a real memory win. But watch out: generators can't be rewound or measured. If you need len(), indexing, or multiple passes, you need a list.
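A rough way to see the memory difference with sys.getsizeof, using 10**6 elements to keep it quick. getsizeof reports only the container itself (the list's pointer array, or the generator object), not the int objects, and exact byte counts vary by Python version, so none are claimed here:

```python
import sys

n = 10**6
as_list = [x * x for x in range(n)]
as_gen = (x * x for x in range(n))

print(sys.getsizeof(as_list))          # megabytes: grows with n
print(sys.getsizeof(as_gen))           # a couple hundred bytes: independent of n

print(sum(as_list) == sum(as_gen))     # True: same result either way
print(sum(as_gen))                     # 0: the generator is already exhausted
```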
itertools is a standard-library module of fast, memory-efficient iterator building blocks, inspired by functional languages. Every function returns an iterator, so chains stay lazy.
Common idioms:
Flatten a list of lists: list(chain.from_iterable(lol))
Top-N from an iterator: list(islice(gen, 10))
Cartesian product for grid search: product(lrs, batch_sizes)
Everything in itertools is C-implemented — it's almost always faster than a Python for-loop equivalent.
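The idioms above as a runnable snippet; lrs and batch_sizes are made-up hyperparameter values:

```python
from itertools import chain, count, islice, product

lol = [[1, 2], [3], [4, 5]]
print(list(chain.from_iterable(lol)))        # [1, 2, 3, 4, 5]

evens = (n for n in count() if n % 2 == 0)   # infinite, but lazy
print(list(islice(evens, 5)))                # [0, 2, 4, 6, 8]

lrs = [0.1, 0.01]
batch_sizes = [32, 64]
print(list(product(lrs, batch_sizes)))
# [(0.1, 32), (0.1, 64), (0.01, 32), (0.01, 64)]
```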
A context manager is an object that defines setup and teardown behavior around a block of code. The with statement guarantees teardown runs — even if the block raises an exception.
The canonical file example:
with open('f.txt') as f:
    data = f.read()
# f is auto-closed here, even if read() raised
Two ways to write your own:
Class-based: implement __enter__(self) (returns the as-value) and __exit__(self, exc_type, exc_val, tb) (returns True to suppress exception, else let it propagate).
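Generator-based (cleaner): @contextlib.contextmanager on a generator with exactly one yield. Code before yield is setup; code after it is teardown, and it belongs in a finally block, because an exception in the with-block is re-raised at the yield and would otherwise skip it.
import time
from contextlib import contextmanager

@contextmanager
def timer():
    t = time.perf_counter()
    try:
        yield
    finally:
        print(time.perf_counter() - t)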
Real uses: file / socket / DB connection management, lock acquisition (with lock:), transactions (commit on success, rollback on exception), temporarily changing state (cwd, logging level, env vars).
Python 3.10+ supports parenthesized multi-line with: with (open(a) as x, open(b) as y): ...
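Both styles working together: a class-based manager that suppresses exceptions via __exit__'s return value, and a generator-based one whose finally still runs when the block raises. Suppress is a hypothetical illustration (the standard library already ships contextlib.suppress), and events is just a list used to record what happened:

```python
import time
from contextlib import contextmanager

class Suppress:
    """Class-based context manager that swallows the listed exception types."""
    def __init__(self, *exc_types):
        self.exc_types = exc_types

    def __enter__(self):
        return self                    # becomes the 'as' target

    def __exit__(self, exc_type, exc_val, tb):
        # A true return value suppresses the exception; False lets it propagate
        return exc_type is not None and issubclass(exc_type, self.exc_types)

@contextmanager
def timer(events):
    start = time.perf_counter()
    events.append("setup")
    try:
        yield                          # the with-block runs here
    finally:
        events.append("teardown")      # runs on success and on exceptions
        events.append(time.perf_counter() - start)

events = []
with Suppress(ZeroDivisionError), timer(events):
    1 / 0                              # raises inside both managers
print(events[:2])                      # ['setup', 'teardown']
```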
A for loop in Python is syntactic sugar for the iterator protocol. Understanding what it expands to demystifies iteration.
Steps Python takes:
1. Call iter(iterable) → this invokes __iter__ to get an iterator. 2. Repeatedly call next(it) → this invokes __next__. 3. When __next__ raises StopIteration, the loop terminates silently (the exception is caught by the loop machinery).
That's it. Any object that provides __iter__ returning something with __next__ can be used in a for loop — no inheritance, no base class required. This is pure duck typing.
Two extras worth knowing:
Python supports for/else: the else block runs if the loop completes without break. Rare but useful for search patterns.
For dict, default iteration gives keys. Use .items() for key-value pairs or .values() for values.
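The three steps written out by hand, followed by for/else and dict iteration. manual_for and find_first_negative are hypothetical helpers:

```python
def manual_for(iterable, body):
    """Roughly what `for item in iterable: body(item)` does."""
    it = iter(iterable)            # step 1: __iter__
    while True:
        try:
            item = next(it)        # step 2: __next__
        except StopIteration:      # step 3: exhaustion ends the loop quietly
            break
        body(item)

seen = []
manual_for("abc", seen.append)
print(seen)                        # ['a', 'b', 'c']

def find_first_negative(nums):
    for i, n in enumerate(nums):
        if n < 0:
            break
    else:                          # runs only if the loop never hit break
        return None
    return i

print(find_first_negative([3, 1, -4, 1]))   # 2
print(find_first_negative([3, 1, 4]))       # None

prices = {"apple": 1.2, "pear": 0.8}
for name, price in prices.items():          # plain iteration would give keys only
    print(name, price)
```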
The GIL, three concurrency models, and the tricks that make Python fast when it needs to be. This section separates the Python developers who build toy scripts from those who ship production systems.
The GIL is a mutex in CPython that allows only one thread to execute Python bytecode at a time, regardless of how many CPU cores you have. It's the single most famous design choice (and complaint) in Python.
Why it exists: the GIL makes CPython's reference counting thread-safe without fine-grained locks. Removing it is hard — every INCREF/DECREF would need atomic operations, which slows down single-threaded code. Early experiments showed 2× regressions for single-threaded programs.
When it releases: on blocking I/O (read, write, socket calls), during sleeps, around C extension calls that choose to release it (NumPy, PyTorch, and image libraries do so intentionally), and otherwise at a periodic switch interval (5 ms by default, see sys.getswitchinterval()) so threads can rotate.
The future: PEP 703 (accepted) adds a free-threaded ("no-GIL") build, shipped as an experimental opt-in in Python 3.13, with the plan to make it the default only in a later phase if it proves out. This is the biggest change to CPython's concurrency story in 30 years.
Python gives you three concurrency models. Picking correctly depends almost entirely on whether your bottleneck is CPU or I/O, and whether tasks are many and fine-grained or few and heavy.
| | Threading | Multiprocessing | asyncio |
|---|---|---|---|
| Parallelism? | No (GIL) | Yes (separate interpreters) | No — single thread |
| Shares memory? | Yes — same process | No — IPC needed | Yes |
| Overhead | Low (OS threads) | High (fork / spawn) | Very low (coroutines) |
| Good for | I/O with blocking libs | CPU-bound workloads | I/O at scale — 1000s of conns |
| Module | threading / concurrent.futures | multiprocessing / concurrent.futures | asyncio |
Threading: lightweight. Ideal when you're using blocking libraries (requests, legacy DB drivers) and want parallel I/O. The GIL releases during I/O so threads actually progress.
Multiprocessing: each process has its own interpreter, memory, and GIL. True parallel CPU use, but sharing data means pickling — expensive for large objects.
asyncio: single thread, many coroutines cooperatively scheduled. Extreme efficiency for many-connection scenarios (web servers, WebSockets, crawlers). Requires async-aware libraries (aiohttp, asyncpg).
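A minimal threading sketch with concurrent.futures. fake_download is a hypothetical stand-in for a blocking library call; time.sleep releases the GIL the way a socket read does, so the five waits overlap. For CPU-bound work, ProcessPoolExecutor offers the same map() interface, but it needs picklable top-level functions and an if __name__ == "__main__": guard on platforms that spawn processes, so it is left out of this runnable snippet.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_download(url):
    """Hypothetical blocking I/O call."""
    time.sleep(0.2)
    return f"fetched {url}"

urls = [f"https://example.com/page/{i}" for i in range(5)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(fake_download, urls))   # results keep input order
elapsed = time.perf_counter() - start

print(results[0])          # fetched https://example.com/page/0
print(f"{elapsed:.2f}s")   # roughly 0.2s rather than 1.0s, since the sleeps overlap
```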
asyncio is single-threaded cooperative concurrency. An event loop manages a queue of tasks (coroutines); each task runs until it hits an await on something slow (I/O), at which point it yields control. The loop picks another ready task and runs it. When the awaited operation completes, the original task is scheduled to resume.
The key primitives:
async def — defines a coroutine function. Calling it returns a coroutine object; you must await it or schedule it as a task for it to actually run.
await x — pauses the current coroutine until x completes, releasing the loop for other work.
asyncio.gather(...) — runs coroutines concurrently and waits for all of them. The common idiom for "fetch these 100 URLs at once."
asyncio.run(main()) — the top-level entry point. Creates a loop, runs your coroutine, closes the loop.
Golden rule: one blocking call in an async function freezes the whole loop. Never time.sleep in async code — use asyncio.sleep. Never use requests — use aiohttp. If you must call blocking code, wrap it in asyncio.to_thread(fn, ...).
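A self-contained sketch of the primitives, with asyncio.sleep standing in for real network I/O. fetch and blocking_io are hypothetical names; asyncio.to_thread requires Python 3.9+:

```python
import asyncio
import time

async def fetch(i):
    """Hypothetical I/O: asyncio.sleep yields to the event loop while 'waiting'."""
    await asyncio.sleep(0.1)
    return i * 10

def blocking_io():
    time.sleep(0.1)                # blocking: must not run on the loop's thread
    return "done"

async def main():
    coro = fetch(1)
    print(type(coro).__name__)     # coroutine: calling fetch() did not run its body
    first = await coro             # now it runs

    start = time.perf_counter()
    results = await asyncio.gather(*(fetch(i) for i in range(100)))
    elapsed = time.perf_counter() - start   # about 0.1s total, not 10s

    offloaded = await asyncio.to_thread(blocking_io)   # runs in a worker thread
    return first, results, elapsed, offloaded

first, results, elapsed, offloaded = asyncio.run(main())
print(first, results[:3], offloaded)   # 10 [0, 10, 20] done
```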
A coroutine is a function that can be paused and resumed. In Python, modern coroutines are defined with async def. Calling one doesn't execute the body — it returns a coroutine object, which you schedule with await or asyncio.run.
A minimal example:
import aiohttp  # third-party package

async def fetch(url):
    async with aiohttp.ClientSession() as s:
        async with s.get(url) as r:
            return await r.text()
Under the hood: coroutines evolved from generators. A generator with value = yield could already pause, receive a value, and resume. async/await is a cleaner syntax with dedicated type and rules (you can't accidentally next() a coroutine, for instance).
Key properties: coroutines are cheap (no OS thread), they cooperatively yield (only at await points), and they preserve state between suspensions (locals, call stack, exception handlers).
The optimizer's mantra: measure first, guess never. Python has excellent tools — use them before rewriting anything.
| Tool | Measures | Use when |
|---|---|---|
| timeit | Microbench small snippets | Comparing two expressions |
| cProfile | Function-level CPU time | Finding hot functions |
| line_profiler | Line-by-line time | Drilling into a hot function |
| py-spy | Sampling profiler, prod-safe | Live process, no code changes |
| tracemalloc | Memory allocations | Memory leaks / growth |
| memory_profiler | Line-by-line memory | Peaks in specific functions |
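The two standard-library entries at the top of the table can be combined in a few lines. This sketch uses hypothetical functions build_concat and build_join. It times both with timeit, then profiles a hypothetical workload with cProfile and prints the functions with the highest cumulative time. Absolute timings vary by machine and Python version, so treat the numbers as relative.

```python
import cProfile
import io
import pstats
import timeit

def build_concat(n):
    s = ""
    for i in range(n):
        s += str(i)
    return s

def build_join(n):
    return "".join(str(i) for i in range(n))

# timeit: run each candidate many times and compare the totals
t_concat = timeit.timeit(lambda: build_concat(2_000), number=200)
t_join = timeit.timeit(lambda: build_join(2_000), number=200)
print(f"+=   : {t_concat:.3f}s")
print(f"join : {t_join:.3f}s")

def pipeline():
    # hypothetical workload with one obviously hot function
    for _ in range(50):
        build_concat(2_000)
    build_join(2_000)

# cProfile: find which functions the time is actually spent in
profiler = cProfile.Profile()
profiler.enable()
pipeline()
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

For a whole script, python -m cProfile -s cumulative script.py gives the same kind of report without code changes.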
Standard optimization ladder, ordered by ROI:
Quick wins that rarely hurt readability:
Avoid repeated attribute lookups in hot loops: bind append = lst.append once, before the loop.
Use generator expressions to avoid materializing intermediate lists.
Build strings with "".join(parts) instead of repeated s += item.
Cache pure functions with functools.lru_cache (or functools.cache).
Vectorize numeric code with NumPy.
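Three of those quick wins fit in a short sketch: a pre-bound method in a loop, a join over a generator expression, and memoization of a pure function. The function names squares, csv_line, and fib are hypothetical.

```python
from functools import lru_cache

def squares(n):
    out = []
    append = out.append  # look up the bound method once, not on every iteration
    for i in range(n):
        append(i * i)
    return out

def csv_line(items):
    # one join over a generator expression instead of repeated s += ... in a loop
    return ",".join(str(x) for x in items)

@lru_cache(maxsize=None)  # memoization is safe here because fib is pure
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(squares(5))           # [0, 1, 4, 9, 16]
print(csv_line([3, 1, 4]))  # 3,1,4
print(fib(80))              # 23416728348467685, computed with one call per n thanks to the cache
print(fib.cache_info())
```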
By default every Python instance carries a __dict__ — a hash table of its attributes. Declaring __slots__ tells Python to use a fixed-size array of named attributes instead, skipping the dict entirely.
class Point:
    __slots__ = ('x', 'y')

p = Point()
p.z = 5  # raises AttributeError: 'z' is not a declared slot
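Benefits: a substantial memory reduction per instance (often 40–60%, though the exact saving depends on the class and the Python version), slightly faster attribute access (array index vs dict lookup), and protection against typo-bugs like obj.naem = "x" silently creating a new attribute.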
Caveats: can't add attributes not in the list; subclasses lose the optimization unless they also define __slots__; doesn't compose well with multiple inheritance (slot clashes); can't have class-level defaults for slotted attributes (the name would live in both the slot and the class dict).
When to use: data classes instantiated millions of times — records, nodes, events, points, particles. The memory savings scale linearly.
Modern bonus: @dataclass(slots=True) (Python 3.10+) handles this automatically.
Python has automatic memory management, but leaks absolutely happen — just not the C-style "forgot to free" kind. Python leaks are usually references that outlive their usefulness.
Diagnosis toolkit:
tracemalloc — track where allocations come from. Take snapshots at two points and diff them to find the growing site.
gc.get_objects() returns every object the garbage collector tracks. Tallying them with collections.Counter(type(o).__name__ for o in gc.get_objects()), and sizing suspects with sys.getsizeof, lets you spot "oh, 2 million Session objects."
weakref — break cycles by using weak references for back-pointers (parent→child strong, child→parent weak).
Error handling shapes how resilient your code is; testing shapes how confidently you can change it. Together they separate prototypes from production.
Python's exception handling has four clauses, each with a specific role. Understanding all four lets you write tight, correct error code without over-catching.
Canonical shape:
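try:
    value = risky()        # only the code that can raise
except ValueError as e:
    handle(e)              # one specific failure
except (KeyError, IndexError):
    handle_others()        # several types, one handler
else:
    use(value)             # runs only if no exception was raised
finally:
    cleanup()              # always runs, success or failure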
Best practices:
Catch specific exceptions, not bare except: (which also catches SystemExit and KeyboardInterrupt — not what you want). Use except Exception as the broadest reasonable catch-all.
Keep try blocks small — only the line(s) that can raise. Put post-success logic in else so it doesn't silently get caught by the except.
Use raise (no args) to re-raise the current exception unchanged. Use raise NewError(...) from original to chain — the from preserves the original traceback.
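EAFP vs LBYL: "Easier to Ask Forgiveness than Permission" (try the operation, catch the error) is idiomatic Python. It is often clearer than "Look Before You Leap" existence checks, faster when failures are rare, and free of races such as a file disappearing between the check and the use.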
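This sketch ties the best practices together. The class ConfigError and the functions parse_port and trace are hypothetical. trace logs which clauses run on success and on failure. parse_port keeps the try block to the single line that can raise, puts post-success logic in else, and chains a domain error to the original with raise ... from.

```python
class ConfigError(Exception):
    """Hypothetical application error used for illustration."""

def parse_port(text):
    try:
        port = int(text)  # keep the try block to the one line that can raise
    except ValueError as e:
        raise ConfigError(f"bad port: {text!r}") from e  # original kept as __cause__
    else:
        # runs only if int() succeeded; errors raised here are not caught by the except above
        if not 0 < port < 65536:
            raise ConfigError(f"port out of range: {port}")
        return port

def trace(divisor):
    log = []
    try:
        log.append("try")
        result = 10 / divisor
    except ZeroDivisionError:
        log.append("except")
        result = None
    else:
        log.append("else")
    finally:
        log.append("finally")
    return result, log

print(trace(2))            # (5.0, ['try', 'else', 'finally'])
print(trace(0))            # (None, ['try', 'except', 'finally'])
print(parse_port("8080"))  # 8080
try:
    parse_port("http")
except ConfigError as err:
    print(type(err.__cause__).__name__)  # ValueError
```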
Make a class that inherits from Exception (or a more specific built-in). That's it.
class PaymentError(Exception):
    """Raised when a payment fails to process."""
You can add structured data for richer errors:
class ValidationError(Exception):
    def __init__(self, field, value, reason):
        super().__init__(f"{field}={value!r}: {reason}")
        self.field, self.value, self.reason = field, value, reason
Design guidelines:
Never inherit directly from BaseException. That class is reserved for system-exiting signals like KeyboardInterrupt and SystemExit, and an error derived from it slips past ordinary except Exception handlers, which is almost never what callers expect.
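A short sketch of both points follows. It uses the ValidationError class from above with a hypothetical validate_age function. The structured fields are available to the handler. A hypothetical BadIdeaError derived from BaseException escapes a typical except Exception handler.

```python
class ValidationError(Exception):
    def __init__(self, field, value, reason):
        super().__init__(f"{field}={value!r}: {reason}")
        self.field, self.value, self.reason = field, value, reason

class BadIdeaError(BaseException):
    """Anti-pattern, for demonstration only: derives from BaseException."""

def validate_age(age):
    # hypothetical validator
    if not isinstance(age, int) or age < 0:
        raise ValidationError("age", age, "must be a non-negative int")
    return age

def handle(exc):
    # a typical "catch everything reasonable" handler
    try:
        raise exc
    except Exception:
        return "handled"

try:
    validate_age(-3)
except ValidationError as e:
    caught = (e.field, e.value, str(e))
print(caught)  # ('age', -3, 'age=-3: must be a non-negative int')

print(handle(ValidationError("age", -1, "negative")))  # handled
try:
    handle(BadIdeaError("oops"))
    escaped = False
except BadIdeaError:
    escaped = True  # except Exception never saw it
print(escaped)      # True
```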
assert condition is a debugging aid that raises AssertionError if the condition is false. Exceptions are the general-purpose mechanism for reporting error conditions at runtime. They look similar but serve different purposes.
| Aspect | assert | raise Exception |
|---|---|---|
| Purpose | Catch programmer bugs | Handle user/environment errors |
| Stripped in prod? | Yes (with python -O) | No |
| Who reads it | The developer | The caller / user |
| Example | assert balance >= 0 | raise ValueError("Negative balance") |
The critical rule: never use assert for security or input validation. If someone runs your code with python -O, every assert disappears. Assertions that must always hold — like "user is authenticated" — silently vanish, leaving a gaping hole.
# WRONG
assert user.is_authenticated, "Must be logged in"

# RIGHT
if not user.is_authenticated:
    raise PermissionError("Must be logged in")
Where assert shines: sanity checks that verify invariants — things you know must be true if your code is correct. "This queue shouldn't be empty here," "the index must be in bounds," "no two IDs overlap." These pin down assumptions in development and add almost zero runtime cost.
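The python -O behaviour is easy to confirm. This sketch runs a one-line snippet (hypothetical) in a fresh copy of the current interpreter, once normally and once with -O. The false assertion stops the normal run, but under -O it is compiled away and execution reaches the final print.

```python
import subprocess
import sys

# a false invariant, followed by a line that only runs if the check has been stripped
SNIPPET = "assert 1 + 1 == 3, 'invariant broken'; print('reached the end')"

def run(*flags):
    # run SNIPPET in a new process using the same interpreter as this script
    return subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        capture_output=True,
        text=True,
    )

normal = run()
optimized = run("-O")
print(normal.returncode, "AssertionError" in normal.stderr)  # 1 True
print(optimized.returncode, optimized.stdout.strip())        # 0 reached the end
```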
Both are Python testing frameworks. unittest is the stdlib module, modeled on JUnit. pytest is the de facto modern standard — richer, shorter, and widely adopted across open source and industry.
Same test, both styles:
unittest:
import unittest

class TestMath(unittest.TestCase):
    def test_add(self):
        self.assertEqual(add(2, 3), 5)
pytest:
def test_add():
    assert add(2, 3) == 5
Why pytest wins:
1. Plain assert — no memorizing assertEqual, assertIn, etc. Failure messages are still detailed via assertion rewriting.
2. Fixtures are composable and injected by name — much cleaner than setUp.
3. Parametrization: @pytest.mark.parametrize("a,b", [(1,2), (3,4)]) runs the same test with many inputs.
4. Plugins: pytest-cov (coverage), pytest-mock, pytest-xdist (parallel), pytest-asyncio.
pytest also runs unittest-style tests out of the box — you can migrate incrementally.
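A sketch of the features listed above, assuming pytest is installed and a hypothetical add function is under test. The numbers fixture is injected by parameter name. parametrize expands one test function into one test per tuple.

```python
import pytest

def add(a, b):
    # hypothetical function under test
    return a + b

@pytest.fixture
def numbers():
    # pytest calls this and passes the result to any test with a "numbers" parameter
    return [1, 2, 3]

def test_sum_of_fixture(numbers):
    assert add(numbers[0], numbers[1]) == 3

@pytest.mark.parametrize("a,b,expected", [(1, 2, 3), (3, 4, 7), (-1, 1, 0)])
def test_add_cases(a, b, expected):
    assert add(a, b) == expected
```

Saved as test_math.py and run with pytest -q, this should collect four tests: the fixture-based test plus three parametrized cases.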
Mocking replaces real objects with fake ones whose behavior you control, so you can test code without its real dependencies. The standard tool is unittest.mock (stdlib).
Core primitives:
Mock — a generic fake. Any attribute or method access returns another Mock.
MagicMock — like Mock but also supports dunder methods (__iter__, __len__, etc.). Default for patch.
patch(...) — replaces an object where it's looked up for the duration of a test.
from unittest.mock import patch

# mymodule and my_func are placeholders for the code under test
with patch('mymodule.requests.get') as mock_get:
    mock_get.return_value.json.return_value = {'ok': True}
    result = my_func()
    mock_get.assert_called_once_with('https://api/x')
Critical subtlety: patch the name where it's used, not where it's defined. If mymodule.py does from requests import get, you must patch mymodule.get, not requests.get. The lookup is done via the local binding.
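The "where it's used" rule can be demonstrated without any third-party package. In this sketch, the script does from os.path import exists, which plays the role of from requests import get, and config_present is a hypothetical function under test. Patching os.path.exists leaves the script's own binding untouched. Patching the name in the using module (here the running script, via patch.object) does take effect. In a real test suite the second form would usually be written as patch("mymodule.exists").

```python
import sys
from os.path import exists  # local binding, like "from requests import get"
from unittest.mock import patch

def config_present(path):
    return exists(path)  # looks up the name "exists" in this module's globals

this_module = sys.modules[__name__]
missing = "/definitely/not/a/real/path.cfg"

# Patching where the function is defined misses the local binding
with patch("os.path.exists", return_value=True):
    wrong_target = config_present(missing)  # the real exists() still runs -> False

# Patching where it is used replaces the name the code actually looks up
with patch.object(this_module, "exists", return_value=True) as mock_exists:
    right_target = config_present(missing)  # the mock runs -> True
    mock_exists.assert_called_once_with(missing)

print(wrong_target, right_target)  # False True
```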
When NOT to mock: if you mock everything, your tests check your mocks, not your code. Integration tests with real (or test-container) databases catch integration bugs that mocks miss. Mock at the system boundary; test internal logic for real.
What separates a "Python-literate" engineer from a "Pythonic" one: idiom awareness, gotcha sense, and comfort with modern language features.
A handful of traps catch every Python developer at least once. Knowing the top few is a strong interview signal.
| Gotcha | What happens | Fix |
|---|---|---|
| Mutable default args | def f(x=[]) — list is created once, shared across calls | Use None and initialize in body |
| Late binding in closures | [lambda: i for i in range(3)] all return 2 | Default arg: lambda i=i: i |
| == vs is on ints | int("1000") is int("1000") can be False: CPython caches only small ints (-5 to 256) | Use == unless checking identity (e.g. x is None) |
| Modifying while iterating | for x in lst: lst.remove(x) skips elements | Iterate over a copy: lst[:] |
| Integer / string caching | a is b True for small ints, False for big | Never rely on interning |
| Shallow copy surprise | [[0]*3]*3 — all rows are the same list | [[0]*3 for _ in range(3)] |
| Truthy trap | if items: hides None vs empty list | Use if items is not None: |
| Circular imports | Modules import each other at import time | Defer import, or restructure |
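Several rows of the table can be reproduced in a few lines. This sketch shows late binding, the small-int caveat, aliased rows, and mutation during iteration, along with each fix. All variable names are illustrative.

```python
# Late binding: every lambda looks up i when it is called, after the loop has ended
late = [lambda: i for i in range(3)]
early = [lambda i=i: i for i in range(3)]  # the default arg freezes the current value
late_results = [f() for f in late]         # [2, 2, 2]
early_results = [f() for f in early]       # [0, 1, 2]

# Identity vs equality: ints outside the small-int cache are separate objects
a, b = int("1000"), int("1000")
print(a == b, a is b)                      # True False (in CPython)

# Aliased rows: [[0] * 3] * 3 repeats a reference to ONE inner list
aliased = [[0] * 3] * 3
aliased[0][0] = 1                          # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
grid = [[0] * 3 for _ in range(3)]         # a fresh list per row
grid[0][0] = 1                             # [[1, 0, 0], [0, 0, 0], [0, 0, 0]]

# Mutating while iterating: removals shift items under the iterator's index
nums = [1, 2, 3, 4]
for x in nums:
    nums.remove(x)                         # leaves [2, 4]
safe = [1, 2, 3, 4]
for x in safe[:]:                          # iterate over a copy instead
    safe.remove(x)                         # leaves []

print(late_results, early_results)
print(aliased, grid)
print(nums, safe)
```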
The single most famous gotcha deserves its own example:
def append(item, lst=[]):  # BUG
    lst.append(item)
    return lst

append(1)  # [1]
append(2)  # [1, 2]: the same list!
The default [] is evaluated once, at function definition time, and becomes a single mutable object shared across every call that doesn't override it. The fix:
def append(item, lst=None):
    if lst is None:
        lst = []  # a fresh list on every call that omits lst
    lst.append(item)
    return lst
Pythonic means writing code in the style and idioms the language was designed for — leveraging built-ins, protocols, and expressive syntax rather than translating idioms from C, Java, or JavaScript. import this in a REPL shows the guiding principles ("The Zen of Python").
The Zen's tenets guide daily decisions: readability counts, there should be one obvious way to do it, explicit is better than implicit, flat is better than nested, errors should never pass silently.
Pythonic hallmarks in practice: EAFP over LBYL, comprehensions over manual loops, unpacking over indexing, with for resources, enumerate / zip over index juggling, generators over building big lists, named tuples / dataclasses over primitive dicts for records.
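A small before-and-after sketch of several of these hallmarks, with hypothetical data. It compares index juggling with enumerate plus zip in a comprehension, feeds sum() from a generator expression, and uses a dataclass instead of a bare dict for records.

```python
from dataclasses import dataclass

names = ["ada", "grace", "linus"]
scores = [93, 88, 71]

# Translated from C/Java: index juggling and manual appends
lines_c = []
for i in range(len(names)):
    lines_c.append(str(i + 1) + ". " + names[i] + ": " + str(scores[i]))

# Pythonic: enumerate + zip + tuple unpacking inside a comprehension
lines_py = [
    f"{rank}. {name}: {score}"
    for rank, (name, score) in enumerate(zip(names, scores), start=1)
]
print(lines_py == lines_c)  # True

# A generator expression feeds sum() without building an intermediate list
passing_total = sum(s for s in scores if s >= 80)
print(passing_total)        # 181

# A dataclass instead of a bare dict for records
@dataclass
class Result:
    name: str
    score: int

results = [Result(n, s) for n, s in zip(names, scores)]
best = max(results, key=lambda r: r.score)
print(best)                 # Result(name='ada', score=93)
```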
Duck typing: "If it walks like a duck and quacks like a duck, it's a duck." Python cares about what an object can do, not what class it is. You don't declare interfaces — you rely on objects having the right methods.
A function that calls obj.quack() accepts any object that has a quack method — a Duck, a MockDuck, a RobotDuck, or some unrelated class that just happens to quack. Python doesn't ask "is this a Duck?"; it asks "can it do what I need?"
This is why Python's built-ins are so composable:
for x in obj works on any object implementing __iter__. len(obj) works on anything with __len__. json.dump(obj, f) works on any file-like object with write. Custom classes slot into Python's built-in protocols just by implementing the right dunders.
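As a sketch of protocol-based duck typing, the two classes below are hypothetical and inherit from nothing in particular. Countdown implements __iter__ and __len__, so for, list(), sorted(), and len() all accept it. Shout only has a write method, which is all json.dump needs from a "file".

```python
import json

class Countdown:
    """Hypothetical iterable: no special base class, just the right dunders."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # a generator method returns a fresh iterator on each call
        n = self.start
        while n > 0:
            yield n
            n -= 1

    def __len__(self):
        return self.start

class Shout:
    """Hypothetical file-like object: json.dump only calls .write(str)."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text.upper())

cd = Countdown(3)
print(list(cd), len(cd), sorted(cd))  # [3, 2, 1] 3 [1, 2, 3]

out = Shout()
json.dump({"ok": True}, out)
print("".join(out.chunks))            # {"OK": TRUE}
```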
Modern twist — Protocol (PEP 544): duck typing meets static analysis. Define a Protocol with expected methods, and type checkers validate at edit-time whether passed objects satisfy it. No inheritance required — still structural.
from typing import Protocol

class Sized(Protocol):
    def __len__(self) -> int: ...
Trade-off: flexibility vs. discoverability. You can plug anything in — and you won't know something's missing until it fails at runtime. Unit tests and type hints mitigate this.
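A sketch of how a Protocol recovers some discoverability follows. The names Quacker, make_noise, Duck, RobotDuck, and Rock are hypothetical. A static checker such as mypy or pyright would flag make_noise(Rock()) before the code runs. The runtime_checkable decorator additionally allows an isinstance check, though that check only verifies that the methods exist, not their signatures.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Quacker(Protocol):
    def quack(self) -> str: ...

class Duck:
    def quack(self) -> str:
        return "Quack!"

class RobotDuck:  # unrelated class: no inheritance from Quacker or Duck
    def quack(self) -> str:
        return "BEEP quack"

class Rock:
    pass

def make_noise(q: Quacker) -> str:
    return q.quack()

print([make_noise(d) for d in (Duck(), RobotDuck())])       # ['Quack!', 'BEEP quack']
print(isinstance(RobotDuck(), Quacker), isinstance(Rock(), Quacker))  # True False
# make_noise(Rock())  # a type checker reports this; at runtime it would raise AttributeError
```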
Python has shipped substantial upgrades every release since 3.9. Mentioning modern features tells interviewers you've kept current.
| Version | Headline Feature | What it adds |
|---|---|---|
| 3.10 | Structural pattern matching | match/case — destructure and dispatch on shape |
| 3.10 | Better error messages | More precise syntax errors, e.g. "expected ':'" and "Perhaps you forgot a comma?" |
| 3.10 | Parenthesized with | Multi-line context managers |
| 3.11 | Exception groups | except* for concurrent errors |
| 3.11 | Faster CPython | 10–60% speedup on many workloads |
| 3.11 | Self type | def clone(self) -> Self |
| 3.12 | F-string improvements | Multiline, quotes-in-quotes, backslashes allowed |
| 3.12 | Type parameter syntax (PEP 695) | class Stack[T], def first[T](x: list[T]) -> T |
| 3.13 | Optional no-GIL build | Experimental free-threaded mode (PEP 703) |
Pattern matching example:
match event:
    case {"type": "click", "x": x, "y": y}:
        handle_click(x, y)
    case Point(x, y) if x > 0:
        ...
    case _:
        default()
This is not a switch-case — it destructures data and binds variables, like pattern matching in Rust / Scala. Lifesaver for nested JSON, AST walks, and state machines.
Type hints have also grown up: list[int] instead of List[int] (3.9+); X | Y instead of Union[X, Y] (3.10+); TypedDict, Protocol, Literal, and Self; and the PEP 695 generic syntax (class Stack[T]: ...) in 3.12. Type checkers such as mypy, pyright, and ty (Astral) catch type errors before the code runs, without giving up Python's flexibility.
Dataclasses + type hints + match together form a modern, almost ML-like Python style — the language is simultaneously dynamic and statically analyzable.
From fundamentals and data structures to concurrency, testing, and modern language features — a complete reference for Python interviews and day-to-day engineering.