C++ String Streams and Regex


  • Description: A note on std::stringstream, std::ostringstream, std::istringstream for in-memory I/O, and the <regex> library for pattern matching
  • My Notion Note ID: K2A-B1-21
  • Created: 2018-12-30
  • Updated: 2026-02-28
  • License: Reuse is very welcome. Please credit Yu Zhang and link back to the original on yuzhang.io


1. String Streams Overview

  • <sstream> — 3 stream types backed by std::string:
Type                 Direction        Use for
std::ostringstream   Output (write)   Building a string from heterogeneous values
std::istringstream   Input (read)     Parsing a string into typed values
std::stringstream    Both             Bidirectional in-memory buffer
  • Same << / >> interface as cout / cin, plus .str(), which returns a copy of the underlying string.

2. std::ostringstream — Building Strings

#include <sstream>
#include <iomanip>
#include <string>

std::ostringstream oss;
oss << "x=" << 42 << ", y=" << 3.14;
std::string s = oss.str();           // "x=42, y=3.14"

// With manipulators
std::ostringstream oss2;
oss2 << std::hex << std::uppercase << 255 << " "
     << std::fixed << std::setprecision(2) << 3.14159;
// "FF 3.14"

// Reset for reuse
oss.str("");                         // clear contents
oss.clear();                         // clear error flags
oss << "fresh";
  • Modern C++ — std::format (C++20, see K2A-B1-4 § 5) usually cleaner:
auto s = std::format("x={}, y={}", 42, 3.14);   // needs <format>; shorter, type-safe, usually faster

ostringstream still useful when:

  1. Conditional composition (write pieces only if condition holds).
  2. Fine-grained stream-state control (locale, manipulators).
  3. Pre-C++20 codebases.
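
A minimal sketch of the first case, conditional composition. The helper name describe and its parameters are hypothetical, chosen only for illustration:

```cpp
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: appends each piece only when it applies.
std::string describe(const std::string& name,
                     std::optional<int> age,
                     bool is_admin) {
    std::ostringstream oss;
    oss << name;
    if (age) {
        oss << " (age " << *age << ")";   // written only when an age is given
    }
    if (is_admin) {
        oss << " [admin]";                // written only for admins
    }
    return oss.str();
}
```

With these inputs, describe("ann", 30, true) yields "ann (age 30) [admin]" and describe("bob", std::nullopt, false) yields "bob". A single format string cannot skip pieces this way without extra branching around it.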

3. std::istringstream — Parsing Strings

#include <iostream>
#include <sstream>
#include <string>

std::istringstream iss{"42 3.14 hello"};

int    n;
double d;
std::string word;

iss >> n >> d >> word;       // n=42, d=3.14, word="hello"

// Read all words
std::istringstream words{"alpha beta gamma"};
std::string token;
while (words >> token) {
    std::cout << token << "\n";
}

// Line-by-line parsing
std::istringstream multi{"line one\nline two\n"};
std::string line;
while (std::getline(multi, line)) {
    // process line
}

// Detect parse failure
std::istringstream bad{"not a number"};
int v;
if (!(bad >> v)) {
    std::cerr << "parse failed\n";
}
  • For high-perf number parsing → std::from_chars (C++17): faster + locale-independent. See K2A-B1-4 § 4. istringstream more flexible but heavier.
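
A minimal sketch of the std::from_chars route for integers, assuming C++17. The helper name parse_int is hypothetical, and rejecting trailing characters is a design choice made here, not something from_chars does by itself:

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parses all of `text` as a base-10 int.
// Returns std::nullopt on invalid input, overflow, or trailing characters.
std::optional<int> parse_int(std::string_view text) {
    int value = 0;
    const char* first = text.data();
    const char* last  = text.data() + text.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt;   // parse error, out of range, or leftover input
    }
    return value;
}
```

Unlike operator>>, from_chars does not skip leading whitespace, so parse_int(" 42") fails while parse_int("42") succeeds.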

4. std::stringstream — Bidirectional

#include <sstream>

std::stringstream ss;

ss << 42 << " " << 3.14;        // write

int    n;
double d;
ss >> n >> d;                    // read

ss.str();                        // current contents
  • Rarely the right choice. Bidirectional buffering is awkward; read/write positions interact subtly. Pick ostringstream or istringstream.
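
One way the read and write sides interact, as a small sketch: a read that reaches the end of the buffer sets eofbit, and later writes are skipped until the flags are cleared. The function name write_after_eof is hypothetical:

```cpp
#include <sstream>
#include <string>

// Hypothetical demo: returns the buffer contents after a write attempted
// once an earlier read has already hit the end of the buffer.
std::string write_after_eof(bool clear_first) {
    std::stringstream ss;
    ss << "1";

    int n = 0;
    ss >> n;              // consumes "1" and reaches the end: eofbit is set

    if (clear_first) {
        ss.clear();       // reset the state flags so the stream is usable again
    }
    ss << " 2";           // skipped when the stream is not in a good state
    return ss.str();
}
```

Here write_after_eof(false) returns "1" because the second write is dropped, while write_after_eof(true) returns "1 2".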

5. <regex> Basics

  • <regex> (C++11) — pattern matching with ECMAScript-like dialect by default.
#include <regex>
#include <string>
#include <iostream>

std::string s = "user42@example.com";
std::regex pattern{R"((\w+)@(\w+\.\w+))"};   // raw string for backslashes

// Test if any match exists
if (std::regex_search(s, pattern)) {
    std::cout << "found\n";
}

// Extract submatches
std::smatch m;
if (std::regex_search(s, m, pattern)) {
    std::cout << "full: "   << m[0] << "\n";   // user42@example.com
    std::cout << "user: "   << m[1] << "\n";   // user42
    std::cout << "domain: " << m[2] << "\n";   // example.com
}

// Whole-string match (not just contains)
std::regex_match(s, m, pattern);

// Replace
std::string masked = std::regex_replace(s, pattern, "[REDACTED]");
// "[REDACTED]"

// Iterate all matches
auto begin = std::sregex_iterator{s.begin(), s.end(), pattern};
auto end   = std::sregex_iterator{};
for (auto it = begin; it != end; ++it) {
    std::cout << it->str() << "\n";
}

Functions

Function                      Purpose
std::regex_search             Find first match anywhere in the string
std::regex_match              Match the entire string
std::regex_replace            Substitute matches with a replacement
std::sregex_iterator          Iterate all matches
std::sregex_token_iterator    Tokenize (split by pattern or capture)
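
The token iterator is the only entry in this list without an example, so here is a minimal splitting sketch. The helper name split_csv and the separator pattern are assumptions for illustration:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on commas, ignoring whitespace around them.
std::vector<std::string> split_csv(const std::string& text) {
    static const std::regex sep{R"(\s*,\s*)"};
    // Submatch index -1 selects the text between matches, not the matches themselves.
    std::sregex_token_iterator first{text.begin(), text.end(), sep, -1};
    std::sregex_token_iterator last;
    return std::vector<std::string>(first, last);
}
```

For the input "a, b ,c" this produces the three tokens "a", "b", and "c". Passing a capture-group index instead of -1 would collect that group from each match.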

Match types

Type                          Holds
std::smatch                   Match results over a std::string
std::cmatch                   Match results over a C-string
std::wsmatch / std::wcmatch   Wide-string variants
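
A short sketch of std::cmatch, which pairs with const char* input the way std::smatch pairs with std::string. The helper name first_number is hypothetical:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first run of digits in a C-string,
// or an empty string if there is none.
std::string first_number(const char* text) {
    static const std::regex digits{R"(\d+)"};
    std::cmatch m;                       // sub-matches point into `text`
    if (std::regex_search(text, m, digits)) {
        return m[0].str();               // copy out before `text` can go away
    }
    return {};
}
```

Match results store iterators into the searched input rather than copies, so the input has to outlive them. That is also why calling regex_search with an smatch on a temporary std::string is a deleted overload.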

6. Regex Patterns

  • Default ECMAScript dialect — usual constructs:
Pattern           Matches
.                 Any character (except newline by default)
\d \D             Digit / non-digit
\w \W             Word char ([A-Za-z0-9_]) / non-word
\s \S             Whitespace / non-whitespace
[abc]             Any of a, b, c
[^abc]            Anything except a, b, c
[a-z]             Range
* + ?             0+, 1+, 0-or-1 of preceding
{n} {n,} {n,m}    Exactly n, n+, n to m
*? +? ??          Lazy (non-greedy) variants
^ $               Start / end of string (or line in multiline mode)
\b                Word boundary
(...)             Capture group
(?:...)           Non-capturing group
|                 Alternation
\1 \2             Backreference to capture group N
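
A few of these constructs are easier to see side by side. The sketch below uses a hypothetical helper, first_match, to compare greedy and lazy quantifiers and to try a backreference:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first match of `pattern` in `text`, or "" if none.
// Compiles the pattern on each call, which is fine for a demo but not for hot paths.
std::string first_match(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str() : std::string{};
}
```

On the input "<b>bold</b>", the greedy pattern R"(<.+>)" matches the whole string, while the lazy R"(<.+?>)" stops at "<b>". On "this is is a test", the backreference pattern R"(\b(\w+) \1\b)" matches the repeated word pair "is is".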

Use raw string literals

  • Always wrap in R"(...)" → no doubled backslashes:
std::regex bad{"(\\d+)\\.(\\d+)"};    // hard to read
std::regex good{R"((\d+)\.(\d+))"};   // much better

Other dialects

std::regex p1{"a.b", std::regex::extended};       // POSIX extended
std::regex p2{"a.b", std::regex::basic};          // POSIX basic
std::regex p3{"a.b", std::regex::ECMAScript};     // default
std::regex p4{"a.b", std::regex::icase};          // case-insensitive flag, not a dialect; ECMAScript grammar is assumed
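
Option flags such as icase can be combined with a grammar flag using bitwise OR. A minimal sketch, with the hypothetical helper matches_ignore_case spelling out the grammar explicitly:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: case-insensitive whole-string match, with the
// ECMAScript grammar named explicitly alongside the icase option.
bool matches_ignore_case(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern, std::regex::ECMAScript | std::regex::icase);
    return std::regex_match(text, re);
}
```

For example, matches_ignore_case("HELLO.TXT", R"(hello\.txt)") is true, while "hello_txt" fails against the same pattern because the escaped dot only matches a literal dot.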

7. Regex Performance and When to Avoid

  • Standard <regex> is slow. Major implementations compile to NFA-based matchers: correct, but often several times slower than RE2 or PCRE2.

Don't use regex when:

  1. Pattern fixed + simple — raw find / starts_with / ends_with much faster.
  2. Scanning large files — use faster engine (re2, boost::regex, ctre).
  3. Performance critical — hand-written state machine or std::ranges filter often beats regex.
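
For the first case, fixed patterns usually reduce to plain string operations. A minimal sketch with hypothetical helper names; the free function starts_with below is a stand-in that also compiles before C++20, where std::string_view::starts_with is not yet available:

```cpp
#include <string_view>

// Stand-in for the C++20 member function std::string_view::starts_with.
bool starts_with(std::string_view s, std::string_view prefix) {
    return s.substr(0, prefix.size()) == prefix;
}

// Hypothetical helper: fixed-prefix check, no regex needed.
bool has_http_scheme(std::string_view url) {
    return starts_with(url, "http://") || starts_with(url, "https://");
}

// Hypothetical helper: fixed-substring check, same intent as
// regex_search with a literal pattern.
bool mentions_error(std::string_view line) {
    return line.find("ERROR") != std::string_view::npos;
}
```

Neither helper allocates or compiles anything, which is where most of the gain over a std::regex comes from.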

Do use regex when:

  1. Pattern genuinely complex (alternations, groups, anchors).
  2. Pattern configurable at runtime (config / user input).
  3. Quick prototype, throughput not a concern.
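
When the pattern comes from config or user input, it may be malformed, and the std::regex constructor reports that by throwing std::regex_error. A minimal sketch, with compile_pattern as a hypothetical helper name:

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper: compiles a pattern supplied at runtime.
// Returns std::nullopt if the pattern is not valid regex syntax.
std::optional<std::regex> compile_pattern(const std::string& pattern) {
    try {
        return std::regex(pattern);
    } catch (const std::regex_error&) {
        return std::nullopt;   // e.g. unbalanced parentheses or brackets
    }
}
```

A caller would compile once at load time, keep the resulting std::regex, and report a config error if the optional is empty.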

Common gotcha: pattern construction cost

  • Compiling std::regex is expensive. Cache; don't construct in a loop.
// BAD: recompiles the regex on every call
bool is_email_slow(const std::string& s) {
    return std::regex_match(s, std::regex{R"(\w+@\w+\.\w+)"});
}

// GOOD: compile once; the function-local static is initialized on first call
bool is_email(const std::string& s) {
    static const std::regex pattern{R"(\w+@\w+\.\w+)"};
    return std::regex_match(s, pattern);
}