Python Strings and Formatting
- Description: `str` is immutable Unicode, slicing, methods, f-strings, `.format()`, raw and triple-quoted strings, escape sequences, and `bytes` vs `str`
- My Notion Note ID: K2A-D1-3
- Created: 2022-03-28
- Updated: 2026-05-11
- License: Reuse is very welcome. Please credit Yu Zhang and link back to the original on yuzhang.io
Table of Contents
- 1. str Is Immutable Unicode
- 2. Indexing and Slicing
- 3. Common Methods
- 4. Concatenation and Joining
- 5. f-Strings
- 6. .format() and %-Formatting
- 7. Raw, Multiline, and Escapes
- 8. bytes and Encoding
1. str Is Immutable Unicode
- Immutable sequence of Unicode code points
- No `char` type; indexing yields a 1-character string
- Operations that seem to modify a string build a new object and rebind the name; the original is unchanged
s = "café"
s[0] # 'c'
type(s[0]) # <class 'str'>
len(s) # 4 (code points, not bytes)
s[0] = "C" # TypeError: 'str' object does not support item assignment
- Contrast: C++ `std::string` is mutable and byte-oriented (typically holding UTF-8; not natively code-point aware)
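A minimal sketch of what immutability means in practice, relying only on built-in behavior. The names `alias`, `shouted`, and `error` are illustrative:

```python
s = "café"
alias = s                  # a second name bound to the same object

shouted = s.upper()        # methods return a new string
s += "!"                   # rebinds s to a new string; alias is untouched

try:
    alias[0] = "C"         # in-place item assignment is not supported
except TypeError as exc:
    error = str(exc)

print(alias, shouted, s)   # café CAFÉ café!
```

Any "change" to a string therefore shows up only through the name that was rebound; other references keep seeing the original value.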
2. Indexing and Slicing
s = "Hello, World!"
s[0] # 'H'
s[-1] # '!'
s[7:12] # 'World' half-open: [start, stop)
s[:5] # 'Hello'
s[7:] # 'World!'
s[-6:-1] # 'World'
s[::2] # 'Hlo ol!' step
s[::-1] # '!dlroW ,olleH' reverse
- Slicing returns a new string
- Out-of-range slices clip silently (no exception)
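The clipping behavior can be checked with a short sketch; the variable names are illustrative. Slices tolerate out-of-range bounds, while plain indexing does not:

```python
s = "Hello, World!"

tail = s[7:100]      # stop past the end is clipped to len(s)
empty = s[100:]      # start past the end gives an empty string
head = s[-100:5]     # start before the beginning is clipped to 0

index_failed = False
try:
    s[100]           # plain indexing is strict
except IndexError:
    index_failed = True

print(repr(tail), repr(empty), repr(head), index_failed)
```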
3. Common Methods
s = " Hello, World! "
s.strip() # "Hello, World!"
s.lower() # " hello, world! "
s.upper() # " HELLO, WORLD! "
s.replace("Hello", "Hi") # " Hi, World! "
"a,b,c".split(",") # ['a', 'b', 'c']
",".join(["a", "b", "c"]) # "a,b,c"
"hello".startswith("he") # True
"hello".endswith("lo") # True
"hello".find("ll") # 2 (-1 if not found)
"hello".index("ll") # 2 (ValueError if not found)
"hello".count("l") # 2
"3.14".isdigit() # False (dot)
"hello".isalpha() # True
- `in` tests for a substring: `"ell" in "hello"` → `True`
4. Concatenation and Joining
"foo" + "bar" # OK, but allocates
"".join(["foo", "bar"]) # preferred for many parts
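- `+=` in a loop can be O(n²): each step may copy the whole string so far (CPython sometimes resizes in place, but that is an implementation detail not to rely on)
- Use `"".join(parts)` or `io.StringIO` for accumulators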
5. f-Strings
Introduced by PEP 498 (Python 3.6+) and the preferred formatter: usually the fastest and most readable option, and replacement fields can hold arbitrary expressions.
name, n = "Yu", 42
f"hello {name}" # "hello Yu"
f"{n} squared = {n * n}" # "42 squared = 1764"
f"pi = {3.14159:.2f}" # "pi = 3.14"
f"{n:>5}" # "   42" right-align width 5
f"{n:05d}" # "00042" zero-pad
f"{0.5:%}" # "50.000000%"
f"{n=}" # "n=42" self-documenting (3.8+)
- Format spec mini-language (simplified): `[[fill]align][sign][#][0][width][,][.precision][type]`
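A few more fields of the mini-language in action, as a rough sketch. The values and variable names are arbitrary examples:

```python
value = 1234567.891

grouped = f"{value:,.2f}"       # ',' thousands separator, 2 decimals
centered = f"{42:*^8}"          # fill '*', '^' center, width 8
signed = f"{42:+d}"             # '+' always shows the sign
hexed = f"{255:#x}"             # '#' adds the 0x prefix

width = 10
dynamic = f"{'hi':<{width}}|"   # a nested field supplies the width

print(grouped, centered, signed, hexed, dynamic)
```

The same spec strings work with `format(value, spec)` and inside `str.format()` fields.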
6. .format() and %-Formatting
"{} {}".format("a", "b") # "a b"
"{1} {0}".format("first", "second") # "second first"
"{name}".format(name="Yu") # "Yu"
"{:.2f}".format(3.14159) # "3.14"
"%s is %d" % ("answer", 42) # "answer is 42", legacy, C-style
- f-strings supersede both for new code
- `%`-formatting survives in `logging` (defers formatting until a record is emitted)
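The deferred formatting in `logging` can be observed with a small sketch. The `Expensive` class, the logger name `strings_demo`, and the in-memory stream are assumptions made for the demonstration:

```python
import io
import logging

stream = io.StringIO()
logger = logging.getLogger("strings_demo")
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.INFO)
logger.propagate = False          # keep the demo output out of the root logger


class Expensive:
    """Counts how often it is converted to a string."""
    calls = 0

    def __str__(self):
        Expensive.calls += 1
        return "expensive"


obj = Expensive()
logger.debug("value: %s", obj)    # below INFO: the message is never formatted
logger.info("value: %s", obj)     # formatted once, when the record is emitted

print(Expensive.calls)            # 1
print(stream.getvalue(), end="")  # value: expensive
```

Writing `logger.debug(f"value: {obj}")` instead would call `__str__` before `logging` ever checks the level.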
7. Raw, Multiline, and Escapes
r"C:\new\file" # raw: backslashes literal
"line1\nline2" # \n is newline
"""
Triple-quoted strings
span multiple lines and
preserve newlines.
"""
b"bytes literal" # bytes, not str
rb"raw bytes" # raw bytes
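| Escape | Meaning |
|---|---|
| `\n` | newline |
| `\t` | tab |
| `\\` | backslash |
| `\'` `\"` | quotes |
| `\xhh` | character with hex value hh (a byte in `bytes` literals) |
| `\uXXXX` | Unicode code point, 4 hex digits |
| `\N{NAME}` | named Unicode character (e.g. `\N{GREEK SMALL LETTER PI}`) |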
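A quick sketch comparing raw and escaped literals; the variable names are illustrative:

```python
raw = r"\n"                              # two characters: backslash and 'n'
cooked = "\n"                            # one character: newline

pi_named = "\N{GREEK SMALL LETTER PI}"   # lookup by Unicode name
pi_short = "\u03c0"                      # same code point via 4 hex digits
letter_a = "\x41"                        # hex 41 is 'A'

print(len(raw), len(cooked), pi_named, pi_short, letter_a)   # 2 1 π π A
```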
8. bytes and Encoding
- `bytes`: immutable sequence of 8-bit values
- `bytearray`: mutable cousin
- `str` and `bytes` do NOT mix implicitly; Python 3 forces explicit encode/decode
"café".encode("utf-8") # b'caf\xc3\xa9'
b'caf\xc3\xa9'.decode("utf-8") # 'café'
"abc" + b"def" # TypeError
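- UTF-8 is the default for `encode()`/`decode()` and the right choice for almost all I/O; text-mode `open()` may fall back to a locale-dependent encoding, so pass `encoding="utf-8"` explicitly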
- Text-mode files (`open(path)`) yield `str`; binary mode (`open(path, "rb")`) yields `bytes`
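A small round-trip sketch of the two file modes. The temporary directory and the file name `note.txt` are placeholders:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "note.txt")

    # Text mode: str in, encoded to bytes on disk.
    with open(path, "w", encoding="utf-8") as f:
        f.write("café")

    # Text mode: bytes on disk decoded back to str.
    with open(path, encoding="utf-8") as f:
        text = f.read()

    # Binary mode: the raw bytes, no decoding.
    with open(path, "rb") as f:
        raw = f.read()

print(type(text).__name__, type(raw).__name__)   # str bytes
```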
Length differs by representation:
len("café") # 4 (code points)
len("café".encode("utf-8")) # 5 (bytes, é takes 2 bytes)
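Because the byte representation depends on the codec, decoding only round-trips when the codec matches the one used to encode. A hedged sketch of what a mismatch looks like, using Latin-1 purely as an example of a different encoding:

```python
data = "café".encode("latin-1")      # b'caf\xe9': é is a single byte here

strict_failed = False
try:
    data.decode("utf-8")             # a lone 0xE9 is not valid UTF-8
except UnicodeDecodeError:
    strict_failed = True

replaced = data.decode("utf-8", errors="replace")   # bad byte becomes U+FFFD
correct = data.decode("latin-1")                    # matching codec round-trips

print(strict_failed, repr(replaced), correct)
```

`errors="replace"` keeps a pipeline running at the cost of losing data, so it is better suited to display than to storage.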