Python Strings and Formatting


  • Description: str is immutable Unicode, slicing, methods, f-strings, .format(), raw and triple-quoted strings, escape sequences, and bytes vs str
  • My Notion Note ID: K2A-D1-3
  • Created: 2022-03-28
  • Updated: 2026-05-11
  • License: Reuse is very welcome. Please credit Yu Zhang and link back to the original on yuzhang.io

1. str Is Immutable Unicode

  • Immutable sequence of Unicode code points
  • No char type; indexing yields a length-1 str
  • Rebinding allocates a new object; original unchanged
s = "café"
s[0]        # 'c'
type(s[0])  # <class 'str'>
len(s)      # 4 (code points, not bytes)
s[0] = "C"  # TypeError: 'str' object does not support item assignment
  • Contrast: C++ std::string is mutable, byte-oriented (typically UTF-8, not natively code-point aware)
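A minimal sketch of what immutability means in practice: "modifying" a string builds a new object, and any other name bound to the original still sees the old value. The variable names here are illustrative only.

```python
s = "café"
alias = s                 # second name bound to the same object

s = "C" + s[1:]           # builds a new str; the original is untouched
print(s)                  # Café
print(alias)              # café
print(s is alias)         # False: s now refers to a different object
```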

2. Indexing and Slicing

s = "Hello, World!"
s[0]        # 'H'
s[-1]       # '!'
s[7:12]     # 'World'      half-open: [start, stop)
s[:5]       # 'Hello'
s[7:]       # 'World!'
s[-6:-1]    # 'World'
s[::2]      # 'Hlo ol!'    step
s[::-1]     # '!dlroW ,olleH'  reverse
  • Slicing returns a new string
  • Out-of-range slices clip silently (no exception)
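The clipping rule applies to slices only. As a short sketch (variable names are illustrative), an out-of-range slice quietly shrinks or comes back empty, while an out-of-range index raises:

```python
s = "Hello"

clipped = s[2:100]        # stop past the end is clipped to len(s)
empty = s[10:20]          # entirely out of range: empty string, no error

try:
    s[10]                 # indexing, unlike slicing, is bounds-checked
    index_error = False
except IndexError:
    index_error = True

print(repr(clipped), repr(empty), index_error)   # 'llo' '' True
```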

3. Common Methods

s = "  Hello, World!  "
s.strip()                  # "Hello, World!"
s.lower()                  # "  hello, world!  "
s.upper()                  # "  HELLO, WORLD!  "
s.replace("Hello", "Hi")   # "  Hi, World!  "

"a,b,c".split(",")         # ['a', 'b', 'c']
",".join(["a", "b", "c"])  # "a,b,c"

"hello".startswith("he")   # True
"hello".endswith("lo")     # True
"hello".find("ll")         # 2   (-1 if not found)
"hello".index("ll")        # 2   (ValueError if not found)
"hello".count("l")         # 2

"3.14".isdigit()           # False (dot)
"hello".isalpha()          # True
  • in tests for a substring: "ell" in "hello" is True
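These methods are often chained. The sketch below, using a made-up input string, normalizes a comma-separated list with split, strip, lower, and join, then contrasts how find and index report a missing substring.

```python
raw = "  Alice , BOB,  carol  "          # illustrative input

# Split on commas, then trim and lowercase each piece.
names = [part.strip().lower() for part in raw.split(",")]
normalized = ",".join(names)

pos = normalized.find("dave")            # missing substring: returns -1
try:
    normalized.index("dave")             # missing substring: raises
    raised = False
except ValueError:
    raised = True

print(names)        # ['alice', 'bob', 'carol']
print(normalized)   # alice,bob,carol
print(pos, raised)  # -1 True
```

Every call returns a new string or list; raw itself is never modified.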

4. Concatenation and Joining

"foo" + "bar"              # OK, but allocates
"".join(["foo", "bar"])    # preferred for many parts
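  • += in a loop can be O(n²): str is immutable, so each step may copy everything built so far (CPython sometimes resizes in place, but that is an implementation detail)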
  • Use "".join(parts) or io.StringIO for accumulators
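A small sketch of the two accumulator patterns, assuming the pieces are already strings. Both produce the same result; which one reads better depends on whether the parts already live in a list or arrive one at a time.

```python
import io

parts = [str(i) for i in range(5)]   # illustrative pieces: '0'..'4'

# Pattern 1: collect the parts, join once at the end.
joined = "".join(parts)

# Pattern 2: write into an in-memory text buffer as parts arrive.
buf = io.StringIO()
for p in parts:
    buf.write(p)
streamed = buf.getvalue()

print(joined, streamed)              # 01234 01234
```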

5. f-Strings

PEP 498 (3.6+), preferred formatter. Usually the fastest and most readable option; replacement fields evaluate arbitrary expressions.

name, n = "Yu", 42
f"hello {name}"            # "hello Yu"
f"{n} squared = {n * n}"   # "42 squared = 1764"
f"pi = {3.14159:.2f}"      # "pi = 3.14"
f"{n:>5}"                  # "   42"   right-align width 5
f"{n:05d}"                 # "00042"   zero-pad
f"{0.5:%}"                 # "50.000000%"
f"{n=}"                    # "n=42"   self-documenting (3.8+)
  • Format spec mini-language: [[fill]align][sign][#][0][width][grouping][.precision][type], where grouping is , or _ and fill is only valid together with align
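A few more fields of the format spec in action, as a sketch with illustrative values. The last line shows that a spec can itself contain a nested replacement field, here used for the width.

```python
name, width = "Yu", 6             # illustrative values
amount = 1234567.891

centered = f"{42:*^8}"            # fill '*', center-align, width 8
signed = f"{42:+d}"               # always show the sign
hexed = f"{255:#x}"               # '#' adds the 0x prefix
grouped = f"{amount:,.2f}"        # thousands separator, 2 decimals
nested = f"{name:>{width}}"       # width taken from a variable

print(centered)   # ***42***
print(signed)     # +42
print(hexed)      # 0xff
print(grouped)    # 1,234,567.89
print(nested)     #     Yu
```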

6. .format() and %-Formatting

"{} {}".format("a", "b")              # "a b"
"{1} {0}".format("first", "second")   # "second first"
"{name}".format(name="Yu")            # "Yu"
"{:.2f}".format(3.14159)              # "3.14"

"%s is %d" % ("answer", 42)           # "answer is 42", legacy, C-style
  • f-strings supersede both for new code
  • %-formatting survives in logging (defers formatting until a record is emitted)
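The logging case can be sketched as follows. The logger name and the in-memory stream are assumptions made for the demo; the point is that the arguments are passed separately and only merged into the message when a record is actually emitted.

```python
import io
import logging

stream = io.StringIO()                         # capture output for the demo
logger = logging.getLogger("strings_demo")     # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False                       # keep the root logger out of it

handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger.addHandler(handler)

logger.debug("value is %d", 42)        # below INFO: dropped, never formatted
logger.info("%s is %d", "answer", 42)  # emitted: formatted at this point

output = stream.getvalue()
print(output, end="")                  # INFO answer is 42
```

Writing logger.info(f"...") instead would build the string eagerly, even for messages that are filtered out.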

7. Raw, Multiline, and Escapes

r"C:\new\file"            # raw: backslashes literal
"line1\nline2"            # \n is newline
"""
Triple-quoted strings
span multiple lines and
preserve newlines.
"""
b"bytes literal"          # bytes, not str
rb"raw bytes"             # raw bytes
Escape      Meaning
\n          newline
\t          tab
\\          backslash
\' \"       quotes
\xhh        character with hex value hh (a single byte in bytes literals)
\uXXXX      4-digit Unicode code point
\N{NAME}    named Unicode character (e.g. \N{GREEK SMALL LETTER PI})
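A quick sketch to check how these literals behave; the variable names are illustrative. Raw strings keep the backslash as an ordinary character, while normal strings turn each escape into a single code point.

```python
cooked = "\n"                       # one character: newline
raw = r"\n"                         # two characters: backslash and 'n'

path = r"C:\new\file"               # raw: no \n or \f surprises
same_path = "C:\\new\\file"         # equivalent with escaped backslashes

pi_named = "\N{GREEK SMALL LETTER PI}"
pi_hex = "\u03c0"                   # same code point, written as \uXXXX

block = """first
second"""                           # the line break is kept as \n

print(len(cooked), len(raw))        # 1 2
print(path == same_path)            # True
print(pi_named == pi_hex)           # True
print(block == "first\nsecond")     # True
```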

8. bytes and Encoding

  • bytes: immutable sequence of 8-bit values (0 to 255)
  • bytearray: mutable cousin
  • str and bytes do NOT mix implicitly; Python 3 forces explicit encode/decode
"café".encode("utf-8")          # b'caf\xc3\xa9'
b'caf\xc3\xa9'.decode("utf-8")  # 'café'

"abc" + b"def"                  # TypeError
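  • UTF-8 is the default for str.encode() and bytes.decode(), and the right choice for almost all I/O
  • Text-mode files (open(path)) yield str; binary mode (open(path, "rb")) yields bytes
  • Text mode has historically defaulted to a locale-dependent encoding, so pass encoding="utf-8" explicitly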
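A minimal round trip through a file, as a sketch: the temporary directory and the file name are assumptions for the demo. The same file is read back once in text mode and once in binary mode, with the encoding passed explicitly rather than left to the platform.

```python
import os
import tempfile

text = "café"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")        # hypothetical file name

    with open(path, "w", encoding="utf-8") as f:
        f.write(text)                           # str in, encoded on write

    with open(path, encoding="utf-8") as f:
        as_text = f.read()                      # text mode: decoded to str

    with open(path, "rb") as f:
        as_bytes = f.read()                     # binary mode: raw bytes

print(repr(as_text))    # 'café'
print(as_bytes)         # b'caf\xc3\xa9'
```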

Length differs by representation:

len("café")                   # 4 (code points)
len("café".encode("utf-8"))   # 5 (bytes, é takes 2 bytes)