C++ Strings and Text


  • Description: A note on C-style strings (char*), std::string, std::string_view (C++17), string conversions, and modern formatting (std::format, std::print)
  • My Notion Note ID: K2A-B1-4
  • Created: 2020-04-10
  • Updated: 2026-02-28
  • License: Reuse is very welcome. Please credit Yu Zhang and link back to the original on yuzhang.io


1. C-Style Strings (char*)

A C-style string is a contiguous sequence of chars terminated by a null byte ('\0'). The null terminator is what distinguishes it from a generic char* — it tells every C string function where the string ends.

const char* s = "hello";  // 6 bytes: h e l l o \0
char buf[10] = "hi";      // h i \0 \0 \0 \0 \0 \0 \0 \0  (zero-padded)

#include <cstring>
strlen(s);        // 5  (does NOT include the null terminator)
strcmp(s, "hi");  // negative
strcpy(buf, "world");  // unsafe — no bounds check

A char* is just a pointer to one or more chars. Whether it represents "a string" depends entirely on whether the bytes are null-terminated:

  1. nullptr — no string at all, just a null pointer.
  2. char* pointing to a buffer of bytes with '\0' somewhere inside — a C string. Length is the position of the '\0'.
  3. char* pointing to bytes with no '\0': not a string. Passing it to strlen is undefined behavior.
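
The three cases above can be told apart only if the caller also knows how many bytes are safe to read. A minimal sketch, assuming a hypothetical helper named bounded_length (not a standard function) that is given the buffer capacity:

```cpp
#include <cstddef>
#include <cstring>
#include <optional>

// Hypothetical helper: returns the C-string length if a '\0' occurs within
// the first `cap` bytes, or std::nullopt if there is no usable string.
std::optional<std::size_t> bounded_length(const char* p, std::size_t cap) {
    if (p == nullptr) return std::nullopt;          // case 1: null pointer, no string
    const void* nul = std::memchr(p, '\0', cap);    // reads at most cap bytes
    if (nul == nullptr) return std::nullopt;        // case 3: no terminator in range
    // case 2: the length is the offset of the terminator
    return static_cast<std::size_t>(static_cast<const char*>(nul) - p);
}
```

Unlike strlen, this never reads past the capacity the caller supplies, so an unterminated buffer is reported rather than scanned out of bounds.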

Issues with C-style strings

  1. No size info. Every operation has to scan to the null. strlen is O(n), every time.
  2. No bounds checking. strcpy, strcat, gets are infamous buffer-overflow sources.
  3. No ownership semantics. Who owns the buffer? Tracked manually.
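  4. String literals are read-only. Since C++11, char* p = "hello"; is ill-formed (compilers that still accept it usually warn), and writing p[0] = 'H'; through such a pointer is undefined behavior. Use const char* for literals.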
  5. Heap allocation requires manual malloc/free for dynamic strings.

C-style strings are still essential when interoperating with C APIs. Use std::string::c_str() to get a null-terminated const char* from a std::string.
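
As a rough illustration of the bounds-checking and interop points, the sketch below uses std::snprintf as a bounded replacement for strcpy and passes a std::string to a C function through c_str(). The helper names copy_bounded and c_length are made up for this example.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: copies src into dst, where cap is the capacity of dst
// including the terminator. Returns true if the whole string fit.
bool copy_bounded(char* dst, std::size_t cap, const char* src) {
    if (cap == 0) return false;
    // snprintf writes at most cap - 1 characters plus a '\0', and returns
    // the length the full output would have had.
    int needed = std::snprintf(dst, cap, "%s", src);
    return needed >= 0 && static_cast<std::size_t>(needed) < cap;
}

// Passing a std::string to a C API that expects a null-terminated const char*.
std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

When the source does not fit, the destination still ends up null-terminated (truncated), which is the property strcpy lacks.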


2. std::string

std::string is the C++ way to handle text. It owns its memory, knows its size, and grows dynamically.

#include <iostream>
#include <string>

std::string s = "hello";
s += " world";              // concatenation
s.size();                   // 11
s.length();                 // 11 (alias for size)
s.empty();                  // false

// Substrings, find, replace
s.substr(6, 5);             // "world"
s.find("world");            // 6 (or std::string::npos if missing)
s.replace(6, 5, "C++");     // "hello C++"

// Iteration (it is a range)
for (char c : s) std::cout << c;

// Conversion to/from char*
const char* cstr = s.c_str();   // null-terminated; valid until s is modified or destroyed
std::string s2 = cstr;          // construct from C string

// Comparison (works as expected)
if (s == "hello C++") { /* ... */ }

Small String Optimization (SSO)

Most implementations of std::string store short strings inline within the string object itself (typically up to 15 chars in libstdc++ and MSVC, 22 in libc++ on 64-bit platforms), avoiding a heap allocation. Copying a small string or passing it by value therefore costs only a fixed-size copy of the object.

For very short strings, prefer std::string over char* — SSO makes it nearly as efficient and you get safety, ownership, and length tracking for free.
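
One way to observe SSO is to check whether the character buffer lies inside the string object itself. This is a sketch under the assumption that the implementation uses an inline buffer; the function name stored_inline is hypothetical, and the result for short strings is implementation-specific.

```cpp
#include <cstdint>
#include <string>

// Hypothetical probe: true if the character buffer lives inside the
// std::string object, i.e. the string is stored inline rather than on the heap.
bool stored_inline(const std::string& s) {
    auto obj_begin = reinterpret_cast<std::uintptr_t>(&s);
    auto obj_end   = obj_begin + sizeof(std::string);
    auto data      = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= obj_begin && data < obj_end;
}
```

A string longer than sizeof(std::string) can never be inline, so stored_inline(std::string(100, 'x')) is false everywhere; whether a two-character string reports true depends on the standard library in use.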


3. std::string_view (C++17)

std::string_view is a non-owning view of a string — a (const char*, size_t) pair. It is the modern way to write a "takes any kind of string" parameter without copying.

#include <iostream>
#include <string>
#include <string_view>

void print(std::string_view sv) {
    std::cout << sv;
}

print("literal");                  // const char* -> string_view (no copy)
print(std::string("dynamic"));     // string -> string_view (no copy)

char buf[] = "buffer";
print(buf);                        // char[] -> string_view

// Substring without copying:
std::string_view sv = "abcdef";
print(sv.substr(1, 3));            // "bcd" — no allocation, just a slice
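
Slicing without copying is what makes string_view convenient for tokenizing. A minimal sketch, assuming a hypothetical split helper that returns views into the caller's text:

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: splits `text` on `sep`. Each element is a view into
// `text`, so the result is only valid while the original characters live.
std::vector<std::string_view> split(std::string_view text, char sep) {
    std::vector<std::string_view> parts;
    std::size_t start = 0;
    while (true) {
        std::size_t pos = text.find(sep, start);
        if (pos == std::string_view::npos) {
            parts.push_back(text.substr(start));   // last field (may be empty)
            return parts;
        }
        parts.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }
}
```

The vector itself still allocates, but no field is copied into a new string; each element is just a pointer and a length into the original text.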

Lifetime trap

A string_view does not own the underlying characters. A view that outlives its source dangles, and reading through it is undefined behavior:

std::string_view make_view() {
    std::string s = "temporary";
    return s;        // BUG: s is destroyed on return; the returned view dangles
}

Rules of thumb:

  1. Use string_view for parameters.
  2. Don't store string_view as a class member or return it from a function unless the lifetime is obvious and documented.
  3. Use std::string for owned text storage.
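
These rules might look as follows in practice. The names make_greeting and Label are hypothetical; the point of the sketch is that parameters are views while anything that outlives the call owns its characters.

```cpp
#include <string>
#include <string_view>

// Returning an owning std::string is safe: the caller receives its own characters.
std::string make_greeting(std::string_view name) {
    std::string result = "hello, ";
    result += name;   // std::string::operator+= accepts a string_view since C++17
    return result;
}

// A class that must keep the text stores std::string, not std::string_view.
class Label {
public:
    explicit Label(std::string_view text) : text_(text) {}   // copies the characters
    std::string_view text() const { return text_; }          // valid while this Label lives
private:
    std::string text_;
};
```

Label::text() does return a view, but its lifetime is tied to an object the caller can see, which is the "obvious and documented" case from rule 2.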

4. String Conversions

#include <string>
#include <charconv>      // C++17 from_chars / to_chars

// Number → string
std::to_string(42);          // "42"
std::to_string(3.14);        // "3.140000" (fixed precision)

// String → number (throws std::invalid_argument or std::out_of_range on failure; may allocate)
int n = std::stoi("42");
double d = std::stod("3.14");
std::size_t pos;
int x = std::stoi("42abc", &pos);   // x = 42, pos = 2

// Best for performance: from_chars / to_chars (C++17)
// — locale-independent, no allocation, round-trip correct
const char* str = "42";
int value;
auto [ptr, ec] = std::from_chars(str, str + 2, value);
if (ec == std::errc{}) {
    // value == 42
}

char buf[16];
auto [end, ec2] = std::to_chars(buf, buf + sizeof buf - 1, 42);  // reserve one byte for '\0'
if (ec2 == std::errc{}) *end = '\0';                             // terminate only on success

from_chars and to_chars are the fastest string ↔ number conversions in the standard library. Use them for hot paths and any time you'd otherwise reach for sprintf/atoi.
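
A common wrapper around these calls checks both the error code and that the whole input was consumed. The following is a sketch with hypothetical helper names (parse_int, int_to_string), not a standard API:

```cpp
#include <charconv>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical helper: parses the whole of `text` as a base-10 int.
std::optional<int> parse_int(std::string_view text) {
    int value = 0;
    const char* first = text.data();
    const char* last = first + text.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{}) return std::nullopt;   // not a number, or out of range for int
    if (ptr != last) return std::nullopt;         // trailing characters such as "42abc"
    return value;
}

// Hypothetical helper: formats an int through a stack buffer.
std::string int_to_string(int v) {
    char buf[16];                                  // enough for any 32-bit int with sign
    auto [end, ec] = std::to_chars(buf, buf + sizeof buf, v);
    return ec == std::errc{} ? std::string(buf, end) : std::string{};
}
```

Note that from_chars is stricter than stoi: it does not skip leading whitespace and does not accept a leading '+', so inputs like "+5" are rejected here.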


5. std::format and std::print

std::format (C++20, <format>) is a type-safe, Python-like string formatter. std::print (C++23, <print>) formats and writes in one step, to stdout by default or to a given FILE* or std::ostream.

#include <format>
#include <print>     // C++23

std::string name = "world";                           // example values used below
int n = 3;

std::string s = std::format("Hello, {}!", name);
std::string t = std::format("{:>10}", 42);            // right-align width 10
std::string u = std::format("{:.3f}", 3.14159);       // "3.142"
std::string v = std::format("{0} and {0}", "twice");  // "twice and twice"

std::print("Hello, {}!\n", name);                     // C++23: write to stdout
std::println("count = {}", n);                        // adds a newline

Format specifiers loosely follow Python:

Spec                       Meaning
{}                         Default formatting
{:>10} / {:<10} / {:^10}   Right / left / center align in width 10
{:0>5}                     Zero-pad to width 5
{:.3f}                     3 decimal places
{:#x}                      Hex with 0x prefix
{:b}                       Binary
{0}, {1}                   Positional arguments

For custom types, specialize std::formatter<T>:

struct Point { int x, y; };

template <>
struct std::formatter<Point> : std::formatter<std::string> {
    auto format(const Point& p, std::format_context& ctx) const {
        return std::formatter<std::string>::format(
            std::format("({}, {})", p.x, p.y), ctx);
    }
};

std::print("p = {}\n", Point{1, 2});   // "p = (1, 2)"

Compared to printf, std::format is type-safe (format strings are checked against the argument types at compile time) and extensible (custom formatters). Compared to <iostream>, it is usually faster, less verbose, and supports positional arguments.


6. Wide and Unicode Strings

C++ has several string types for different encodings. Avoid the wide and fixed-width variants in modern code unless you're doing platform-specific work, and use std::string holding UTF-8 wherever possible.

Type                    Element type                                   Typical use
std::string             char (8-bit)                                   UTF-8 (recommended), or platform default
std::wstring            wchar_t (16-bit on Windows, 32-bit on Linux)   Windows API interop
std::u8string (C++20)   char8_t                                        Explicit UTF-8
std::u16string          char16_t                                       UTF-16
std::u32string          char32_t                                       UTF-32 (each element is one Unicode code point)

The C++ standard library has historically been weak on Unicode-aware text handling (case folding, normalization, segmentation, collation). Use ICU or a dedicated library when you need real Unicode operations.
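
One consequence of storing UTF-8 in std::string is that size() counts bytes, not characters. As a small illustration, the sketch below counts code points by skipping continuation bytes; the helper name count_code_points is hypothetical, and the input is assumed to be valid UTF-8 (no validation is performed).

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: counts Unicode code points in text that is assumed
// to be valid UTF-8. Malformed input is not detected.
std::size_t count_code_points(std::string_view utf8) {
    std::size_t n = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) ++n;   // continuation bytes have the form 10xxxxxx
    }
    return n;
}
```

For "h\xC3\xA9llo" (the word "héllo" in UTF-8), size() is 6 while this function returns 5. Code points are still not user-perceived characters, which is where segmentation in a library such as ICU comes in.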