Java, C++, and Python common data structures


  • Description: A comparison of common data structures and language constructs in Java, C++, and Python
  • My Notion Note ID: K2A-A1
  • Created: 2020-10-01
  • Updated: 2026-05-06
  • License: Reuse is very welcome. Please credit Yu Zhang and link back to the original on yuzhang.io

Table of Contents


1. Basic Data Structures

1.1 Basic Data Types

Concept Java C++ Python
Integer types byte (8), short (16), int (32), long (64) — all signed, fixed sizes char, short, int, long, long long — sizes are minimums (e.g. int ≥ 16 bits, typically 32). Use <cstdint> (int32_t, int64_t, …) for fixed sizes int — single arbitrary-precision type, no fixed-width variants in the language. Use numpy.int32 / int64 etc. for fixed-width
Unsigned integers None All integer types (except bool) have an unsigned form None (use numpy.uint* if a fixed-width unsigned type is needed)
Floating point float (32), double (64) float, double, long double (sizes implementation-defined; usually IEEE 754) float is always 64-bit IEEE 754. numpy.float32 / float16 for narrower
Character char is 16-bit (UTF-16 code unit) char is 1 byte; wchar_t, char16_t, char32_t for wider encodings No separate char type — a single character is just a length-1 str. bytes for raw 8-bit data
Boolean boolean (true / false), not implicitly convertible to int bool; implicitly convertible to/from int. Casting bool to int: false is 0, true is 1. Casting int to bool: 0 is false, non-zero is true bool is a subclass of int: True == 1, False == 0. Arithmetic on bools works
Null literal null (reference only) nullptr (pointer only); legacy NULL is just 0 None (singleton; idiomatically tested with is None)

1.2 Classes

In lower-level C++ code, strings are often represented as C-style char* arrays terminated by the null character '\0' (the "null terminator"). This is distinct from nullptr, which represents a null pointer (a char* that points to no string at all). Higher-level C++ code typically uses std::string, which manages length and storage internally. Java String and Python str are immutable Unicode sequences and do not expose a null terminator at the language level.

Language Substring
Java str.substring(start, end);
C++ str.substr(start, length);
Python str[start:end] (slice); str[start:end:step] for stride

1.3 Enum

Concept Java C++ Python
Declaration enum Name { A, B } — members are full objects with methods and fields enum class Name { A, B }; (scoped, C++11+); legacy unscoped enum Name { A, B }; from enum import Enum
class Name(Enum): A = 1; B = 2
Inheritance Cannot extends another class (implicitly extends java.lang.Enum); can implements interfaces Cannot derive from another type; underlying integer type can be set with : int, : uint8_t, etc. An Enum cannot be subclassed once it has members. Mix-ins like IntEnum, StrEnum (3.11+), IntFlag integrate enums with built-in types

1.4 Inheritance

Concept Java C++ Python
Class inheritance Single class inheritance: class Derived extends Base Multiple inheritance: class Derived : public Base1, public Base2 { ... } Multiple inheritance: class Derived(Base1, Base2): .... Method resolution by C3 linearization (Derived.__mro__)
Interface class C implements I1, I2; classes can implement many interfaces No interface keyword — use abstract classes (all methods pure virtual) No interface keyword. Use abc.ABC with @abstractmethod for nominal interfaces, or typing.Protocol for structural ("duck") typing (PEP 544)
Virtual dispatch Methods are virtual by default (unless final, static, or private) Methods are non-virtual by default; opt in with virtual. Use override on the derived method All methods are dynamically dispatched (no virtual keyword)
Abstract method abstract void m(); in an abstract class Pure virtual: virtual void m() = 0; @abstractmethod decorator inside a class derived from abc.ABC
Calling base method super.m(); Base::m(); super().m() (Python 3) — follows the MRO
Constructor chaining First statement: super(args); Member-initializer list: Derived(...) : Base(args) { ... } super().__init__(args) inside __init__
Access modifier on inheritance Always public-style (no syntactic choice) public, protected, private inheritance changes how base members are exposed in the derived class N/A — no inheritance access modifiers. Conventions only: _name (protected by convention), __name (name-mangled to _ClassName__name)

1.5 Generic Types

Concept Java C++ Python
Generic class class C<T1, T2> { T1 a; T2 b; } template <typename T1, typename T2> class C { T1 a; T2 b; }; from typing import Generic, TypeVar
T = TypeVar('T'); class C(Generic[T]): ...
or PEP 695: class C[T]: ... (3.12+)
Generic function <T> void m(T t); called as obj.<String>m(x) template <typename T> void f(T t);; called as f<int>(42) def f[T](x: T) -> T: ... (3.12+) or T = TypeVar('T'); def f(x: T) -> T: .... No call-site type-argument syntax
Bounded type parameter <T extends Base> — upper-bounded C++20 concepts: template <std::derived_from<Base> T> or requires clause; pre-C++20 SFINAE / std::enable_if T = TypeVar('T', bound=Base)
Variance / wildcards <? extends Base> (covariant; consumer / read), <? super Base> (contravariant; producer / write) N/A — templates are not variant; each instantiation is a distinct, unrelated type TypeVar('T', covariant=True) / contravariant=True (PEP 484)
Runtime behavior Type erasure — type arguments erased at compile time; one bytecode method serves all instantiations Monomorphization — the compiler emits a separate copy per instantiation Erasure-like — type hints are not enforced at runtime by default. Tools like pydantic / beartype add enforcement opt-in
Call-site type-argument position <> before method name: obj.<String>m(...) <> after function name: m<int>(...) No call-site type arguments — types are inferred or annotated only

2. Containers

2.1 Sequential Containers

Conventions: T is the element type, O an object/reference type (Java), K/V for keys/values.

2.1.1 Dynamic array (random access)

Operation Java ArrayList<O> C++ std::vector<T> Python list
Declare / Initialize List<O> l = new ArrayList<>();
List<O> l = new ArrayList<>(List.of(a, b));
vector<T> v;
vector<T> v{a, b, c};
vector<T> v(n, init);
l = []
l = [a, b, c]
l = list(iterable)
Add l.add(obj); l.add(i, obj); v.push_back(t); v.emplace_back(...); v.insert(it, t); l.append(x); l.insert(i, x); l += [x, y]
Remove l.remove(obj); (first match)
l.remove(i);
v.pop_back(); v.erase(it); l.pop() / l.pop(i); l.remove(x) (first match); del l[i]
Get l.get(i); v[i]; v.at(i) (bounds-checked) l[i]; negative indexing supported: l[-1]
Set l.set(i, obj); v[i] = t; l[i] = x
Size l.size() v.size(); v.empty(); len(l)
Iterate for (O o : l)
Iterator<O> it = l.iterator();
for (const auto& x : v)
for (size_t i = 0; i < v.size(); ++i)
for x in l:
for i, x in enumerate(l):

2.1.2 Fixed-size array

Operation Java T[] C++ T[N] / std::array<T, N> Python (no native fixed-size)
Declare / Initialize T[] arr = new T[n];
T[] arr = new T[]{a, b, c};
T arr[N];
std::array<T, N> arr{a, b, c};
array.array('i', [1, 2, 3]) (typed but resizable)
numpy.zeros(n) for true fixed-size numeric arrays
Get / Set arr[i] / arr[i] = ...; arr[i] / arr[i] = ...; arr[i] / arr[i] = ...
Size arr.length std::size(arr); arr.size() for std::array len(arr)
Iterate for (T t : arr) for (auto& x : arr) for x in arr:

2.1.3 Queue (FIFO)

Operation Java Queue<O> (ArrayDeque<> / LinkedList<>) C++ std::queue<T> Python collections.deque
Declare Queue<O> q = new ArrayDeque<>(); queue<T> q; from collections import deque
q = deque()
Add (enqueue) q.offer(obj); (returns false on failure)
q.add(obj); (throws)
q.push(t); q.emplace(...); q.append(x)
Remove (dequeue) q.poll(); (null on empty)
q.remove(); (throws)
q.pop(); (no return value — query front() first) q.popleft() (raises IndexError on empty)
Peek q.peek(); / q.element(); q.front(); q.back(); q[0]; q[-1] for back
Size q.size() q.size(); q.empty(); len(q)
Iterate for (O o : q) (insertion order, no index) Adaptor — no iterator. Drain with while (!q.empty()) for x in q: (left to right)

For thread-safe FIFO in Python, use queue.Queue (blocking, with put / get).

2.1.4 Stack (LIFO)

Operation Java Deque<O> (or legacy Stack<O>) C++ std::stack<T> Python list (or collections.deque)
Declare Deque<O> s = new ArrayDeque<>(); stack<T> s; s = []
Push s.push(obj); s.push(t); s.emplace(...); s.append(x)
Pop s.pop(); s.pop(); (no return — use top() first) s.pop() (returns the value)
Peek s.peek(); s.top(); s[-1]
Size s.size(); s.isEmpty(); s.size(); s.empty(); len(s)

java.util.Stack is a legacy class extending Vector; new code should prefer Deque / ArrayDeque.

2.1.5 Deque (double-ended queue)

Operation Java Deque<O> (ArrayDeque<>) C++ std::deque<T> Python collections.deque
Declare Deque<O> dq = new ArrayDeque<>(); deque<T> dq;
deque<T> dq(n);
dq = deque()
dq = deque(maxlen=n) (auto-evicts oldest)
Add addFirst(x) / addLast(x)
offerFirst / offerLast (return false instead of throw)
push_front(t); push_back(t);
emplace_front(...); emplace_back(...);
dq.appendleft(x) / dq.append(x)
Remove removeFirst / removeLast
pollFirst / pollLast
pop_front(); pop_back(); dq.popleft() / dq.pop()
Peek / Index peekFirst / peekLast (null on empty)
getFirst / getLast (throw)
front(); back();
Indexed: dq[i]; dq.at(i)
dq[0] / dq[-1]. Indexed access is O(n) for deque
Size dq.size() dq.size(); dq.empty(); len(dq)
Iterate for (O o : dq) (first to last) for (auto& x : dq) for x in dq:

2.1.6 Priority queue (heap)

Operation Java PriorityQueue<O> C++ std::priority_queue<T> Python heapq on a list
Declare PriorityQueue<O> pq = new PriorityQueue<>();
new PriorityQueue<>(cap, comparator);
priority_queue<T> pq; (max-heap)
priority_queue<T, vector<T>, greater<T>> pq; (min-heap)
priority_queue<T, vector<T>, decltype(cmp)> pq(cmp);
import heapq
pq = [] (any list; heapq.heapify(pq) to convert in place)
For thread-safe: from queue import PriorityQueue; pq = PriorityQueue()
Default order Min-heap (natural ordering) Max-heap Min-heap
Push pq.offer(obj); pq.push(t); pq.emplace(...); heapq.heappush(pq, x)
Pop pq.poll(); pq.pop(); (read with top() first) heapq.heappop(pq)
Peek pq.peek(); pq.top(); pq[0]
Iterate ordered Iterator does not guarantee priority order. Drain with repeated poll(), or Arrays.sort(pq.toArray()) Drain with repeated top() then pop() Drain with repeated heappop. heapq.nsmallest(k, pq) / nlargest(k, pq) for top-k

2.2 Associative Containers

2.2.1 Hash map (unordered key, value)

Operation Java HashMap<K, V> C++ std::unordered_map<K, V> Python dict
Declare Map<K, V> m = new HashMap<>();
new HashMap<>(initCap, loadFactor);
unordered_map<K, V> m;
unordered_map<K, V> m{{k1, v1}, {k2, v2}};
m = {}
m = {k1: v1, k2: v2}
m = dict(zip(keys, values))
Add / Set m.put(k, v); m[k] = v;
m.insert({k, v});
m.emplace(k, v);
m[k] = v
m.update({k: v})
Remove m.remove(k); m.erase(k); del m[k]
m.pop(k) / m.pop(k, default)
Get / Contains m.get(k); (null if absent)
m.containsKey(k);
m.containsValue(v);
m.at(k); (throws if absent)
m.find(k); (returns end iterator)
m.count(k); m.contains(k) (C++20)
Note: m[k] inserts a default-constructed value if k is missing
m[k] (raises KeyError if absent)
m.get(k) (returns None)
m.get(k, default)
k in m
Iterate for (Map.Entry<K, V> e : m.entrySet())
for (K k : m.keySet())
for (V v : m.values())
for (auto& [k, v] : m) (C++17 structured binding) for k in m:
for k, v in m.items():
for v in m.values():
Size m.size() m.size(); m.empty(); len(m)

2.2.2 Insertion-ordered map

Aspect Java LinkedHashMap<K, V> C++ Python dict / collections.OrderedDict
Notes Same API as HashMap plus guaranteed insertion order on iteration N/A — no built-in. Pair an unordered map with a separate insertion-order list Built-in dict preserves insertion order since Python 3.7. OrderedDict adds move_to_end() and order-sensitive equality

2.2.3 Sorted (tree) map

Operation Java TreeMap<K, V> (red-black) C++ std::map<K, V> (red-black) Python (no built-in)
Declare TreeMap<K, V> tm = new TreeMap<>();
new TreeMap<>(comparator);
map<K, V> m;
map<K, V, decltype(cmp)> m(cmp);
Third-party: from sortedcontainers import SortedDict
sd = SortedDict()
Roll-your-own: bisect over a parallel sorted list of keys
Ordered queries higherKey(k), ceilingKey(k), lowerKey(k), floorKey(k) m.lower_bound(k); m.upper_bound(k); SortedDict: irange(min, max), bisect_left/right, peekitem(i) for indexed access
bisect.bisect_left(keys, k) for lower-bound index on a parallel list
Iterate Sorted by comparator / natural ordering Sorted by comparator SortedDict iterates in sorted key order

Other operations (add / remove / size) match the hash-map row.

2.2.4 Hash set (unordered set)

Operation Java HashSet<O> C++ std::unordered_set<T> Python set
Declare Set<O> s = new HashSet<>(); unordered_set<T> s;
unordered_set<T> s{a, b, c};
s = set()
s = {a, b, c} (note: {} is an empty dict, not set)
s = set(iterable)
Add s.add(obj); s.insert(t); s.emplace(...); s.add(x); s.update(iterable)
Remove s.remove(obj); s.erase(t); s.discard(x) (silent), s.remove(x) (raises KeyError if absent)
Contains s.contains(obj); s.find(t); s.count(t); s.contains(t) (C++20) x in s
Set algebra addAll, retainAll, removeAll Use <algorithm> with sorted ranges, or third-party a | b (union), a & b (intersection), a - b (difference), a ^ b (symmetric diff)
Size / Iterate s.size(); for (O o : s) (no order) s.size(); for (auto& x : s) (no order) len(s); for x in s: (no order)

2.2.5 Insertion-ordered set

Aspect Java LinkedHashSet<O> C++ Python
Notes Same API as HashSet plus insertion-order iteration N/A — no built-in No built-in equivalent. Common idiom: dict.fromkeys(seq) for de-duplication while preserving order

2.2.6 Sorted (tree) set

Operation Java TreeSet<O> (red-black) C++ std::set<T> (red-black) Python (no built-in)
Declare TreeSet<O> ts = new TreeSet<>();
new TreeSet<>(comparator);
set<T> s;
set<T, decltype(cmp)> s(cmp);
Third-party: from sortedcontainers import SortedSet
Ordered queries higher, ceiling, lower, floor s.lower_bound(t); s.upper_bound(t); SortedSet: irange, bisect_left/right, direct ss[i] indexing
Iterate Sorted Sorted Sorted (SortedSet)

2.3 Special Rules for Each Language

2.3.1 Java PriorityQueue

The iterator does not guarantee traversal in priority order. That is why Arrays.sort(pq.toArray()) may be used when ordered traversal is needed.

2.3.2 C++ unordered_map / map operator[]

m[k] on a missing key inserts a default-constructed value and returns a reference to it. Use m.at(k) (throws std::out_of_range if missing) or m.find(k) (returns end iterator if missing) for read-only access.

2.3.3 Python heapq

Only a min-heap is provided. For a max-heap, push negated values, or wrap each element in a comparison-inverting class. The "container" is just a regular listheapify(l) reorders an existing list in place in O(n).

2.3.4 Python dict insertion order

Insertion-order preservation has been part of the language spec since 3.7. Code targeting 3.6 or older should use collections.OrderedDict.

2.3.5 Python sorted map / sorted set

The standard library has no balanced BST container. The community standard is sortedcontainers (SortedDict, SortedSet, SortedList) — pure Python, but competitive in practice. Internally it uses a list of sorted lists (chunked arrays + bisect), not a balanced tree, which is cache-friendlier than linked structures.


3. Logic and Function Grammar

3.1 Hash

Aspect Java C++ Python
Hook(s) hashCode() and equals(Object) on the class Specialize std::hash<Key> in namespace std; provide operator== __hash__(self) and __eq__(self, other) on the class
Contract If a.equals(b) then a.hashCode() == b.hashCode(). Override both together If a == b then hash<Key>()(a) == hash<Key>()(b). Provide both If a == b then hash(a) == hash(b). Defining __eq__ sets __hash__ to None unless explicitly defined; instances then become unhashable
Convenience IDE auto-generation; Objects.hash(field1, field2, ...) boost::hash_combine for combining field hashes @dataclass(frozen=True) auto-generates __hash__ from the field tuple

3.1.1 Java

class HashClass {
    @Override
    public int hashCode() {
        return Objects.hash(field1, field2);
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof HashClass)) {
            return false;
        }
        HashClass rhs = (HashClass) obj;
        return Objects.equals(field1, rhs.field1)
            && Objects.equals(field2, rhs.field2);
    }
}

3.1.2 C++

namespace std {
    template <>
    struct hash<Key> {
        std::size_t operator()(const Key& k) const {
            // Combine hashes of Key's members (example shown for two fields).
            std::size_t h1 = std::hash<std::string>()(k.name);
            std::size_t h2 = std::hash<int>()(k.id);
            return h1 ^ (h2 << 1); // simple combine; for production prefer boost::hash_combine
        }
    };
}

3.1.3 Python

class Key:
    def __init__(self, name: str, id: int):
        self.name, self.id = name, id

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Key) and (self.name, self.id) == (other.name, other.id)

    def __hash__(self) -> int:
        return hash((self.name, self.id))


# Equivalent with dataclass:
from dataclasses import dataclass

@dataclass(frozen=True)
class Key:
    name: str
    id: int

3.2 Compare

Aspect Java C++ Python
Self-comparable type implements Comparable<Self> with int compareTo(Self) (negative / 0 / positive) Define operator< (and ideally the rest), or operator<=> (C++20 spaceship) Define __lt__ (and friends). @functools.total_ordering derives the rest from __eq__ plus one ordering hook
External comparator Comparator<T> returning negative / 0 / positive. Used by Collections.sort, TreeMap, PriorityQueue Boolean function/lambda: bool cmp(const T& a, const T& b) returning true if and only if a precedes b. Used by std::sort, std::set, std::priority_queue key= callable returning a sort key (preferred); or functools.cmp_to_key(fn) to convert a 3-way comparator
Ordering rule 3-way: negative means a before b (ascending), 0 means equal, positive means a after b. Must be a total order Strict weak ordering: cmp(a, b) returns true if a strictly precedes b; must return false when a == b (else UB in std::sort) Sort is stable; key-extracted values compared via their natural __lt__. reverse=True flips the order

3.2.1 Java

// External comparator:
Comparator<Obj> myComparator = new Comparator<Obj>() {
    @Override
    public int compare(Obj a, Obj b) {
        return Integer.compare(a.x, b.x); // negative: a before b (ascending)
    }
};
list.sort(myComparator);

// Self-comparable:
public class CompareClass implements Comparable<CompareClass> {
    @Override
    public int compareTo(CompareClass other) {
        return Integer.compare(this.x, other.x);
    }
}

3.2.2 C++

bool compare(const T& a, const T& b) {
    return a < b; // return true means a goes before b (ascending)
}
std::sort(vec.begin(), vec.end(), compare);

// Lambda form:
auto cmp = [](const T& a, const T& b) { return a.x < b.x; };
std::sort(vec.begin(), vec.end(), cmp);

// For containers taking the comparator as a template parameter:
std::set<T, decltype(cmp)> s(cmp);
std::priority_queue<T, std::vector<T>, decltype(cmp)> pq(cmp);

The C++ comparator follows strict weak ordering: it returns true when a strictly precedes b, and must return false when a == b (otherwise std::sort has undefined behaviour).

3.2.3 Python

# By natural ordering:
nums.sort()

# By a key function (preferred — fastest, stable, simplest):
items.sort(key=lambda x: x.score)
items.sort(key=lambda x: (x.group, -x.score))  # multi-key, mixed direction
items.sort(key=lambda x: x.score, reverse=True)

# Convert a 3-way comparator (Java-style) to a key:
from functools import cmp_to_key
items.sort(key=cmp_to_key(lambda a, b: a.x - b.x))

# Self-comparable class:
from functools import total_ordering

@total_ordering
class C:
    def __init__(self, x): self.x = x
    def __eq__(self, other): return self.x == other.x
    def __lt__(self, other): return self.x < other.x

4. Internal Mechanism

Aspect Java C++ Python
Compilation target javac produces JVM bytecode (.class); the JVM JIT-compiles hot paths to native code at run time Compiled directly to native machine code by the toolchain (e.g. g++, clang++, MSVC) CPython compiles source to .pyc bytecode and runs it on a stack-based VM. PyPy adds a tracing JIT
Memory management Garbage-collected heap for all objects; primitives are stored by value (locals on the stack frame, fields embedded in their owning object) Stack and heap; manual new / delete, idiomatically wrapped in RAII (std::unique_ptr, std::shared_ptr) Reference counting plus a cyclic GC (CPython). Every value is a heap-allocated object; locals just hold references
Object location Objects are heap-allocated; variables hold references Objects can live on the stack, heap, or as members; variables can be values, references, or pointers All objects on the heap; variables are name bindings (references)
Generics Type erasure — type arguments erased at compile time, leaving raw types and casts in bytecode Templates are monomorphized — separate version emitted per instantiation Type hints are erased at runtime by default; dispatch is by actual object type (duck typing)
Polymorphism Methods virtual by default (vtable dispatch, except final / static / private) Non-virtual by default; virtual opts in to vtable dispatch All attribute lookup is dynamic (MRO walk + __dict__ lookup); no virtual keyword
Pointers No raw pointers or pointer arithmetic; references only Raw pointers, references (non-rebindable alias), and pointer arithmetic on arrays No raw pointers; everything is a reference. id() exposes object identity, is checks reference equality
Concurrency model True parallel threads; memory model defined by JLS True parallel threads (std::thread); memory model in the standard since C++11 GIL in CPython — only one thread runs Python bytecode at a time. True parallelism via multiprocessing, native extensions releasing the GIL, or experimental free-threaded builds (PEP 703, 3.13+)
Standard library JDK shipped with the JVM; consistent across platforms Standard library is part of the toolchain; ABI may differ between vendors Standard library shipped with the interpreter ("batteries included": collections, itertools, threading, asyncio, …)
Exception model Checked + unchecked exceptions; method signatures must declare checked ones Unchecked only; noexcept declares non-throwing functions All exceptions are unchecked (no static declaration). LBYL vs. EAFP idioms

5. References