HTML Basics


  • Description: HTML document structure, common elements, attributes, forms, semantic markup, and accessibility basics.
  • My Notion Note ID: K2A-F1-1
  • Created: 2018-03-23
  • Updated: 2026-05-17
  • License: Reuse is very welcome. Please credit Yu Zhang and link back to the original on yuzhang.io

1. Overview

HTML = HyperText Markup Language — the document format of the web. Defines content and structure; styling is CSS, behaviour is JavaScript.

  • Markup language → annotates content with tags, not a programming language.
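  • Current spec: HTML Living Standard (WHATWG), updated continuously. HTML5 became a W3C Recommendation in 2014 and was followed by HTML 5.1 and 5.2. Since 2019 the W3C and WHATWG have agreed that the Living Standard is the authoritative version, so there are no new numbered releases.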
  • Files end in .html (preferred) or .htm.
  • Browser parses HTML → builds the DOM tree → CSS computes styles → JS can mutate the tree.

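Related markup languages: XML, LaTeX, Markdown (often compiled to HTML), and SGML (HTML 2.0 to 4.01 were formally SGML applications). XHTML is HTML written as strict XML. The separate XHTML 2.0 effort was abandoned, and the Living Standard now specifies an XML syntax alongside the usual HTML syntax.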

2. Document Structure

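Minimal modern document. Strictly, only the doctype and a <title> are required (the <html>, <head>, and <body> tags are optional), but this skeleton is the usual starting point: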

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>Page title</title>
  </head>
  <body>
    <h1>Hello</h1>
    <p>First paragraph.</p>
  </body>
</html>

Part | Role
<!DOCTYPE html> | Triggers standards mode. Case-insensitive; <!DOCTYPE html> and <!doctype html> are both common. Without it the browser falls back to quirks mode, which emulates old browser layout bugs.
<html lang="en"> | Root element. lang tells screen readers which pronunciation rules to use and enables language-aware features such as CSS hyphenation and browser translation.
<head> | Metadata: title, charset, viewport, links to CSS, script tags, SEO and Open Graph tags. Not rendered.
<body> | Visible content.

Common <head> entries:

<meta charset="utf-8">                                        <!-- always UTF-8 -->
<meta name="viewport" content="width=device-width, initial-scale=1">  <!-- mobile -->
<meta name="description" content="...">                       <!-- search snippet -->
<link rel="stylesheet" href="/styles.css">                    <!-- external CSS -->
<link rel="icon" href="/favicon.ico">
<script src="/app.js" defer></script>                          <!-- JS, deferred -->

Pitfalls:

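  • Omit <meta charset> → the browser may guess the wrong encoding and garble non-ASCII text. The declaration must also sit within the first 1024 bytes of the document, so keep it at the top of <head>.
  • Omit the viewport meta → mobile browsers lay the page out at a desktop width (typically 980px) and scale it down, leaving tiny text.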
  • Put <script> without defer/async in <head> → blocks parsing. Either move to end of <body> or use defer.
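A toy timing model makes the defer/async difference concrete. It assumes download-finish times are known up front, leaves out parser-blocking scripts, and breaks ties in favour of deferred scripts. The names (ScriptTag, executionOrder) are made up for this sketch and are not a browser API.

```typescript
// Toy model of when <script defer> and <script async> run (not browser internals).
// Assumptions: download-finish times are known, no parser-blocking scripts, ties favour defer.
type LoadMode = "async" | "defer";

interface ScriptTag {
  src: string;
  mode: LoadMode;
  loadedAtMs: number; // when the download finishes
}

function executionOrder(scripts: ScriptTag[], parseDoneAtMs: number): string[] {
  const runs: { src: string; atMs: number; seq: number }[] = [];
  let seq = 0;

  // defer: document order, never before parsing finishes, each after the previous deferred script
  let cursor = parseDoneAtMs;
  for (const s of scripts) {
    if (s.mode === "defer") {
      cursor = Math.max(cursor, s.loadedAtMs);
      runs.push({ src: s.src, atMs: cursor, seq: seq++ });
    }
  }

  // async: runs as soon as its download finishes, whatever its position in the document
  for (const s of scripts) {
    if (s.mode === "async") {
      runs.push({ src: s.src, atMs: s.loadedAtMs, seq: seq++ });
    }
  }

  runs.sort((a, b) => a.atMs - b.atMs || a.seq - b.seq);
  return runs.map((r) => r.src);
}

const order = executionOrder(
  [
    { src: "a.js", mode: "async", loadedAtMs: 300 },
    { src: "b.js", mode: "async", loadedAtMs: 100 },
    { src: "c.js", mode: "defer", loadedAtMs: 50 },
    { src: "d.js", mode: "defer", loadedAtMs: 10 },
  ],
  200, // parsing finishes at 200 ms
);
console.log(order.join(", ")); // b.js, c.js, d.js, a.js
```

In real browsers, deferred scripts also finish before DOMContentLoaded fires. Async scripts may run either before or after it.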

3. Elements, Tags, and Attributes

Tag = the markup token (<p>, </p>). Element = opening tag + content + closing tag. Attribute = key/value on the opening tag.

<a href="/about" class="nav-link" id="about-link">About</a>
 │ └─────┬─────┘ └──────┬───────┘ └──────┬──────┘ └─┬─┘└┬─┘
 │   attribute      attribute        attribute      │   closing tag
 tag name                                        content

Rules:

  • Tags are case-insensitive — lower-case by convention.
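  • Most elements need a closing tag. The parser lets a few, such as </p> and </li>, be omitted in specific contexts, but writing them is clearer. Void elements have no content and no end tag: <area>, <base>, <br>, <col>, <embed>, <hr>, <img>, <input>, <link>, <meta>, <source>, <track>, <wbr>. A trailing slash (<br/>) is allowed on void elements but ignored by the HTML parser (it's XHTML/XML syntax). On other elements it has no effect, so <div/> still opens a <div> that needs a </div>.
  • Nest, don't overlap: <b><i>...</i></b> ✔, <b><i>...</b></i> ✘.
  • Attribute values may be unquoted only if they contain no whitespace and none of " ' = < > `. Quote them anyway: it's safer with URLs, spaces, and special characters.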

Global attributes — usable on any element:

Attribute | Purpose
id | Unique identifier within the document. Used by JS getElementById, CSS #id, and fragment links (#id).
class | Space-separated list of class names. CSS .class, JS classList.
style | Inline CSS. Use sparingly; external stylesheets are easier to maintain.
title | Tooltip text on mouse hover. Don't rely on it for critical info, because touch and keyboard users usually never see it.
lang | Language of this element's content. Overrides <html lang>.
dir | Text direction: ltr, rtl, auto.
hidden | Removes the element from rendering and from the accessibility tree.
data-* | Custom data attributes (data-user-id="42"), readable via el.dataset.userId.
tabindex | Focus order. 0 = natural order. -1 = focusable only via JS (el.focus()) and skipped by Tab. Avoid positive values, which override the natural order.
aria-* | Accessibility attributes, see § 9.
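The data-* → dataset name mapping can be sketched as a pure function. Strip the data- prefix, then turn each hyphen that is followed by a lower-case ASCII letter into that letter in upper case. datasetKey is a hypothetical helper. It assumes the name is already lower-case, which is how the HTML parser stores attribute names.

```typescript
// Sketch of the data-* attribute → el.dataset key mapping. datasetKey is hypothetical;
// it assumes the attribute name is already lower-case, as the HTML parser stores it.
function datasetKey(attributeName: string): string | null {
  const prefix = "data-";
  if (!attributeName.startsWith(prefix)) return null; // not a custom data attribute
  // "-x" (x = lower-case ASCII letter) becomes "X"; other hyphens are kept
  return attributeName
    .slice(prefix.length)
    .replace(/-([a-z])/g, (_match: string, letter: string) => letter.toUpperCase());
}

console.log(datasetKey("data-user-id")); // userId
console.log(datasetKey("data-x-1"));     // x-1
```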

Comments: <!-- text -->. They can't nest and aren't rendered, although they do appear as comment nodes in the DOM.

4. Text Content

4.1 Headings

<h1>Page title</h1>
<h2>Section</h2>
<h3>Subsection</h3>
...
<h6>Lowest level</h6>
  • Six levels, <h1> to <h6>. Use them in order without skipping levels (e.g. no <h2> straight to <h4>).
  • Each page should have one <h1> describing the page's main topic. Search engines and assistive tech rely on this hierarchy.
  • Headings define structure, not size. Use CSS to change appearance.
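A quick way to spot skipped levels. This sketch assumes the heading levels have already been extracted from the page in document order (h1 → 1, h2 → 2, ...). skippedHeadings is a hypothetical helper.

```typescript
// Flags headings that go more than one level deeper than the previous heading.
// Input: heading levels in document order (h1 → 1, ...). skippedHeadings is hypothetical.
function skippedHeadings(levels: number[]): number[] {
  const problems: number[] = [];
  let previous = 0; // nothing seen yet, so the first heading is expected to be an h1
  levels.forEach((level, index) => {
    if (level > previous + 1) problems.push(index); // going back up (h3 → h2) is fine
    previous = level;
  });
  return problems;
}

// h1, h2, h3, h2, h4: the h4 at index 4 skips h3
console.log(skippedHeadings([1, 2, 3, 2, 4]).join(", ")); // 4
```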

4.2 Paragraphs and Line Breaks

  • <p>...</p> — paragraph. Browser collapses whitespace; newlines in source become a single space.
  • <br> — hard line break. Use only within text where the break is part of the content (poetry, addresses). Don't use it for paragraph spacing — use <p> and CSS margins.
  • <hr> — thematic break (horizontal rule). Semantic divider between topics, not just decoration.

4.3 Inline Text Formatting

Element | Meaning
<strong> | Strong importance. Rendered bold by default.
<em> | Stress emphasis. Rendered italic by default.
<b> | Bold without added importance: keywords, product names.
<i> | Italic without emphasis: taxonomic names, foreign terms.
<u> | Unarticulated annotation (e.g. marking a misspelling). Rare, and easily confused with links.
<mark> | Highlighted text (search match, callout).
<small> | Side comments, fine print.
<del> / <ins> | Deleted / inserted text (diff-style).
<sub> / <sup> | Subscript / superscript (H₂O, x²).
<code> | Inline code.
<kbd> | Keyboard input (<kbd>Ctrl</kbd>+<kbd>C</kbd>).
<abbr title="HyperText Markup Language">HTML</abbr> | Abbreviation. title holds the expansion, shown as a tooltip.

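Prefer <strong>/<em> over <b>/<i> when the meaning really is importance or emphasis. The distinction is exposed to assistive tech and other machine readers, although many screen readers don't announce it with default settings.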

4.4 Quotations and Pre-formatted Blocks

<blockquote cite="https://example.com/source">
  <p>Quoted paragraph.</p>
</blockquote>

<q>Inline quote</q> — renders with quotation marks automatically.

<pre>
  Preserves   whitespace
    and line breaks   exactly.
</pre>

<code>inline code</code>
<pre><code>multi-line
code block</code></pre>

4.5 Character Entities

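Characters that would otherwise be parsed as markup must be escaped: < and & in text, and the quote character that delimits an attribute value. Escaping > as well is conventional but rarely required: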

Character | Entity
< | &lt;
> | &gt;
& | &amp;
" | &quot;
' | &apos; (or &#39;)
non-breaking space | &nbsp;
em dash | &mdash;
copyright | &copy;

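Bare & in URL query strings (?a=1&b=2) is the classic case. Current HTML tolerates a bare & that can't be read as a character reference, and browsers cope with it, but &amp; is the unambiguous habit. In text content, for example, ?x=1&copy=2 renders as ?x=1©=2 because &copy is decoded.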
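A minimal escaper for untrusted text placed in element content or in a quoted attribute value. This is a sketch, not a complete XSS defence: it does not make text safe inside unquoted attributes, URLs (javascript:), <script>, or <style>. The name escapeHtml is illustrative.

```typescript
// Minimal HTML escaper for text placed in element content or a quoted attribute value.
// & goes first so the & of entities added afterwards isn't escaped a second time.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;"); // &#39; rather than &apos;, which HTML 4 lacked
}

console.log(escapeHtml(`<a href="?a=1&b=2">Tom's</a>`));
// &lt;a href=&quot;?a=1&amp;b=2&quot;&gt;Tom&#39;s&lt;/a&gt;
```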

5. Links and Media

5.1 Anchors

<a href="/about">Internal link</a>
<a href="https://example.com">External</a>
<a href="#section-3">Same-page link to id="section-3"</a>
<a href="mailto:someone@example.com">Email</a>
<a href="tel:+1234567890">Phone</a>
<a href="/file.pdf" download>Force download</a>
<a href="https://example.com" target="_blank" rel="noopener noreferrer">New tab</a>
  • target="_blank" opens the link in a new tab (or window). By early 2021 all major browsers (Safari 12.1, Firefox 79, Chrome 88) implicitly apply noopener to target="_blank" links, so window.opener is null on the linked page. Adding rel="noopener noreferrer" explicitly still helps: it covers older browsers, and noreferrer also strips the Referer header.
  • Relative URLs (/about, ../foo) resolve against the current page or <base href>.
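Browsers resolve relative URLs with the WHATWG URL algorithm, which browsers and Node.js also expose as the URL constructor. In the sketch below, the base value is a made-up stand-in for the current page URL (or for a <base href>).

```typescript
// Resolve hrefs against a base URL with the WHATWG URL API (a global in browsers and Node.js).
// The base below is a made-up stand-in for the current page URL or a <base href> value.
function resolveHref(href: string, baseUrl: string): string {
  return new URL(href, baseUrl).href;
}

const base = "https://example.com/docs/guide/page.html";

console.log(resolveHref("/about", base));     // https://example.com/about
console.log(resolveHref("../foo", base));     // https://example.com/docs/foo
console.log(resolveHref("setup.html", base)); // https://example.com/docs/guide/setup.html
console.log(resolveHref("#section-3", base)); // https://example.com/docs/guide/page.html#section-3
```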

5.2 Images

<img src="/photo.jpg" alt="Description of the image" width="800" height="600">
  • alt is required for accessibility and SEO. Empty alt="" for purely decorative images (screen readers will skip).
  • Specify width and height — browser reserves layout space, avoids cumulative layout shift (CLS).
  • Modern formats: WebP, AVIF (usually smaller files than JPEG/PNG at similar quality).

Responsive images:

<img src="small.jpg"
     srcset="small.jpg 400w, medium.jpg 800w, large.jpg 1600w"
     sizes="(max-width: 600px) 100vw, 50vw"
     alt="...">

<picture>
  <source media="(prefers-color-scheme: dark)" srcset="logo-dark.png">
  <source media="(min-width: 800px)" srcset="logo-wide.png">
  <img src="logo.png" alt="Logo">
</picture>

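srcset + sizes: the browser computes the slot width from sizes, multiplies it by the device pixel ratio, and picks a suitable candidate (it may also take cache and network conditions into account). <picture>: the browser uses the first <source> whose media (and type, if present) matches. If none matches, it falls back to the <img>, which is required.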
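Below is a simplified model of the srcset/sizes choice. It assumes the browser takes the smallest candidate whose width covers slot width × device pixel ratio, or the largest candidate if none does. Real browsers have more latitude than this. slotWidth hard-codes the sizes value from the responsive-image example above, and the naive parser assumes the URLs contain no commas. All the function names are illustrative.

```typescript
// Simplified model of choosing from srcset (w descriptors) + sizes.
// Assumption: pick the smallest candidate at least slotWidth × DPR wide, else the largest.
interface Candidate {
  url: string;
  width: number; // from the "800w" descriptor
}

// Naive parser: assumes no commas inside the URLs.
function parseSrcset(srcset: string): Candidate[] {
  return srcset.split(",").map((entry) => {
    const [url, descriptor] = entry.trim().split(/\s+/);
    return { url, width: parseInt(descriptor, 10) }; // "800w" → 800
  });
}

// Hard-codes sizes="(max-width: 600px) 100vw, 50vw".
function slotWidth(viewportCssPx: number): number {
  return viewportCssPx <= 600 ? viewportCssPx : viewportCssPx * 0.5;
}

function pickCandidate(candidates: Candidate[], slotCssPx: number, dpr: number): string {
  const needed = slotCssPx * dpr; // device pixels the slot needs
  const sorted = [...candidates].sort((a, b) => a.width - b.width);
  const fit = sorted.find((c) => c.width >= needed);
  return (fit ?? sorted[sorted.length - 1]).url;
}

const candidates = parseSrcset("small.jpg 400w, medium.jpg 800w, large.jpg 1600w");
console.log(pickCandidate(candidates, slotWidth(375), 2));  // 375 × 2 = 750 → medium.jpg
console.log(pickCandidate(candidates, slotWidth(1440), 1)); // 720 × 1 = 720 → medium.jpg
console.log(pickCandidate(candidates, slotWidth(1440), 2)); // 720 × 2 = 1440 → large.jpg
```

This shows why the 1600w file exists. A desktop with a high-DPR screen needs more pixels than the 50vw slot suggests in CSS pixels.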

5.3 Audio and Video

<audio src="/song.mp3" controls></audio>

<video controls poster="/thumb.jpg" width="640" height="360">
  <source src="/clip.webm" type="video/webm">
  <source src="/clip.mp4" type="video/mp4">
  <track kind="subtitles" src="/clip.en.vtt" srclang="en" label="English">
  Your browser doesn't support video.
</video>
  • controls: show the play/pause UI. Without it, JS must drive playback.
  • autoplay generally works only together with muted. Chrome, Safari, and Firefox block autoplay with sound by default.
  • Multiple <source>: the browser picks the first format it supports. <source> children are ignored if the <video> itself has a src attribute, so use one or the other.

5.4 Embedded Content

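  • <iframe src="..." sandbox>: embed another page. A bare sandbox attribute applies every restriction (no scripts, no form submission, no popups, opaque origin). Re-enable features selectively, e.g. sandbox="allow-scripts allow-forms". Don't combine allow-scripts with allow-same-origin for same-origin content, because the framed page could then remove its own sandbox.
  • <canvas>: bitmap drawing surface for 2D/3D graphics, drawn via JS (Canvas 2D API or WebGL).
  • <svg>: inline vector graphics. Stylable with CSS.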

6. Lists and Tables

6.1 Lists

<ul>                   <!-- unordered: bullets -->
  <li>Item</li>
  <li>Item</li>
</ul>

<ol start="3" reversed>  <!-- ordered: numbered, start from 3, descending -->
  <li>Step</li>
</ol>

<dl>                   <!-- description list: term/definition pairs -->
  <dt>Term</dt>
  <dd>Definition</dd>
</dl>

Nest by putting another <ul>/<ol> inside an <li>.
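The following small sketch shows how an <ol> numbers its items. It ignores per-item <li value> overrides. The detail worth noting is that with reversed and no start, counting begins at the number of items. listNumbers is a hypothetical helper.

```typescript
// Numbers an <ol> assigns to its items, ignoring per-item <li value="..."> overrides.
// listNumbers is a hypothetical helper for illustration.
function listNumbers(itemCount: number, opts: { start?: number; reversed?: boolean } = {}): number[] {
  const reversed = opts.reversed ?? false;
  // reversed without start counts down from the number of items
  const start = opts.start ?? (reversed ? itemCount : 1);
  const step = reversed ? -1 : 1;
  return Array.from({ length: itemCount }, (_, i) => start + i * step);
}

console.log(listNumbers(3).join(", "));                               // 1, 2, 3
console.log(listNumbers(3, { reversed: true }).join(", "));           // 3, 2, 1
console.log(listNumbers(3, { start: 3 }).join(", "));                 // 3, 4, 5
console.log(listNumbers(2, { start: 3, reversed: true }).join(", ")); // 3, 2
```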

6.2 Tables

<table>
  <caption>Quarterly revenue (USD)</caption>
  <thead>
    <tr>
      <th scope="col">Quarter</th>
      <th scope="col">Revenue</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th scope="row">Q1</th>
      <td>$10,000</td>
    </tr>
    <tr>
      <th scope="row">Q2</th>
      <td>$15,000</td>
    </tr>
  </tbody>
  <tfoot>
    <tr>
      <th scope="row">Total</th>
      <td>$25,000</td>
    </tr>
  </tfoot>
</table>
  • <caption>: the table's title, announced by screen readers.
  • <th scope="col|row">: declares the header role and links data cells to their header.
  • Use tables for tabular data only, never for page layout (CSS grid/flexbox handle layout).
  • colspan, rowspan: span cells across columns/rows.
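This is a simplified take on how the HTML table processing model places spanning cells. Each cell goes into the first free slot of its row. A rowspan reserves slots in the rows below, and this sketch clamps it at the end of the table. layoutTable and the Cell shape are hypothetical, and the cells are example data.

```typescript
// Simplified sketch of how colspan/rowspan claim slots in a table grid.
// Cell and layoutTable are hypothetical; each slot stores the label of the cell covering it.
interface Cell {
  label: string;
  colspan?: number;
  rowspan?: number;
}

function layoutTable(rows: Cell[][]): string[][] {
  const grid: string[][] = rows.map(() => []);
  rows.forEach((row, r) => {
    let c = 0;
    for (const cell of row) {
      // skip slots already taken by a rowspan from a row above
      while (grid[r][c] !== undefined) c++;
      const colspan = cell.colspan ?? 1;
      // a rowspan can't extend past the last row
      const rowspan = Math.min(cell.rowspan ?? 1, rows.length - r);
      for (let dr = 0; dr < rowspan; dr++) {
        for (let dc = 0; dc < colspan; dc++) grid[r + dr][c + dc] = cell.label;
      }
      c += colspan;
    }
  });
  return grid;
}

// A spans two columns; B spans two rows, so the second row's cells flow around it.
const grid = layoutTable([
  [{ label: "A", colspan: 2 }, { label: "B", rowspan: 2 }],
  [{ label: "C" }, { label: "D" }],
]);
for (const row of grid) console.log(row.join(" | "));
// A | A | B
// C | D | B
```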

7. Forms and Inputs

<form action="/api/contact" method="post" enctype="multipart/form-data">
  <label for="name">Name</label>
  <input type="text" id="name" name="name" required>

  <label for="email">Email</label>
  <input type="email" id="email" name="email" required>

  <label for="msg">Message</label>
  <textarea id="msg" name="msg" rows="4"></textarea>

  <button type="submit">Send</button>
</form>

<form> attributes:

Attribute | Purpose
action | URL to submit to. Empty or omitted → current URL.
method | get (default; params in the URL query string), post (params in the request body), or dialog (closes the enclosing <dialog>).
enctype | application/x-www-form-urlencoded (default), multipart/form-data (required for file uploads), text/plain. Only applies to post.
novalidate | Skip built-in validation.
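The sketch below shows what a method="get" submission puts in the URL with the default enctype. It uses URLSearchParams, which serializes name/value pairs in the same application/x-www-form-urlencoded format. The field values and the action URL are made-up example data.

```typescript
// What a method="get" form submission with the default enctype puts in the URL.
// URLSearchParams serializes with the same application/x-www-form-urlencoded rules.
// Field values and the action URL are made-up example data.
const fields: [string, string][] = [
  ["name", "Ada Lovelace"],
  ["msg", "a&b=c"],
  ["topic", "html"], // two checked checkboxes sharing name="topic"
  ["topic", "css"],  // give two pairs; unchecked boxes send nothing
];

const query = new URLSearchParams(fields).toString();
console.log(query); // name=Ada+Lovelace&msg=a%26b%3Dc&topic=html&topic=css

// For GET, the encoded pairs replace the action URL's query string.
const action = new URL("/api/contact", "https://example.com");
action.search = query;
console.log(action.href);
// https://example.com/api/contact?name=Ada+Lovelace&msg=a%26b%3Dc&topic=html&topic=css
```

With method="post" and the same enctype, the same string goes in the request body instead.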

7.1 Input Types

type | Use
text | Single-line text.
password | Masked text.
email | Validates email format. Mobile shows an email keyboard.
url | Validates URL format.
tel | Mobile shows a phone keypad. No format validation, since phone formats vary.
number | Numeric only. min, max, step attributes.
range | Slider.
date, time, datetime-local, month, week | Date/time pickers.
color | Colour picker.
file | File picker. Add multiple for several files, accept="image/*" to filter.
checkbox | On/off. Checked boxes sharing a name each submit their own name=value pair; unchecked boxes are not submitted at all.
radio | Single choice within a group (same name).
hidden | Sends a value without showing it.
submit | Submit button (<button> is preferred because it can contain markup, not just text).

Other input attributes: required, disabled (not focusable, not submitted), readonly (not editable, still submitted), placeholder, pattern="regex" (must match the whole value), minlength, maxlength, autocomplete (e.g. email, current-password, one-time-code).

7.2 Labels and Grouping

<label for="name">Name</label>
<input id="name" name="name">

<!-- or wrap, no need for `for` -->
<label>Name <input name="name"></label>

<fieldset>
  <legend>Shipping address</legend>
  <label>Street <input name="street"></label>
  <label>City <input name="city"></label>
</fieldset>
  • <label> linked to <input> via for/id — clicking the label focuses the input.
  • Crucial for screen readers; without a label, the input has no accessible name.
  • <fieldset> + <legend> — group related inputs (visually and semantically).

7.3 Select and Textarea

<select name="country">
  <option value="">--</option>
  <optgroup label="Europe">
    <option value="fr">France</option>
    <option value="de" selected>Germany</option>
  </optgroup>
  <optgroup label="Asia">
    <option value="jp">Japan</option>
  </optgroup>
</select>

<textarea name="bio" rows="5" cols="40" maxlength="200"></textarea>

Add multiple to <select> for multi-select.

7.4 Buttons

<button type="submit">Send</button>       <!-- submit (default in a form) -->
<button type="reset">Reset</button>        <!-- clear form -->
<button type="button" onclick="...">Click</button>  <!-- inert; needs JS -->

Always set type explicitly. A <button> defaults to type="submit", so inside a <form> a button meant only to run JS will submit the form unless it has type="button".

7.5 Built-in Validation

<input type="email" required>           <!-- must be filled, must look like email -->
<input type="number" min="0" max="100"> <!-- range constraint -->
<input pattern="\d{5}" title="5 digits"> <!-- regex constraint -->

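If an input is invalid, the browser blocks submission and shows a validation message. Style invalid inputs with the :invalid pseudo-class. Note that :invalid matches before the user has typed anything, whereas :user-invalid waits for interaction. For richer validation, use the Constraint Validation API (input.validity, input.setCustomValidity()).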
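Below is a simplified model of three of the built-in checks. It uses the same flag names as the browser's ValidityState. The detail that trips people up is that pattern must match the whole value, because browsers anchor it as ^(?:pattern)$. Constraints and check are hypothetical names. Real browsers also sanitize number values and run further checks (typeMismatch, tooLong, stepMismatch, ...).

```typescript
// Simplified model of three built-in constraint checks. Flag names match the browser's
// ValidityState; Constraints and check() are hypothetical names for this sketch.
interface Constraints {
  required?: boolean;
  min?: number;
  max?: number;
  pattern?: string;
}

interface Validity {
  valueMissing: boolean;
  rangeUnderflow: boolean;
  rangeOverflow: boolean;
  patternMismatch: boolean;
}

function check(value: string, c: Constraints): Validity {
  const empty = value === "";
  const num = Number(value);
  return {
    valueMissing: Boolean(c.required) && empty,
    // min/max only apply to a non-empty value
    rangeUnderflow: !empty && c.min !== undefined && num < c.min,
    rangeOverflow: !empty && c.max !== undefined && num > c.max,
    // pattern is anchored: the whole value must match; an empty value is left to `required`
    patternMismatch:
      !empty && c.pattern !== undefined && !new RegExp(`^(?:${c.pattern})$`, "u").test(value),
  };
}

console.log(check("1234", { pattern: "\\d{5}" }).patternMismatch);   // true
console.log(check("12345", { pattern: "\\d{5}" }).patternMismatch);  // false
console.log(check("123456", { pattern: "\\d{5}" }).patternMismatch); // true: 6 digits, not exactly 5
console.log(check("", { required: true }).valueMissing);             // true
console.log(check("150", { min: 0, max: 100 }).rangeOverflow);       // true
```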

8. Semantic HTML

Use elements that describe their content, not generic <div>/<span>. Benefits: screen-reader navigation, SEO, more readable markup.

<body>
  <header>
    <a href="/">Site name</a>
    <nav>
      <ul>
        <li><a href="/">Home</a></li>
        <li><a href="/about">About</a></li>
      </ul>
    </nav>
  </header>

  <main>
    <article>
      <header>
        <h1>Article title</h1>
        <p>By Author, <time datetime="2026-05-17">May 17, 2026</time></p>
      </header>

      <section>
        <h2>Section heading</h2>
        <p>...</p>
      </section>

      <aside>
        <p>Sidebar content related to the article.</p>
      </aside>

      <footer>
        <p>Article footer (tags, related links).</p>
      </footer>
    </article>
  </main>

  <footer>
    <p>Site footer.</p>
  </footer>
</body>

Element | Meaning
<header> | Introductory content of its nearest section/article (or of the page).
<nav> | Major navigation block. Not for every group of links.
<main> | Primary content of the document. Only one visible <main> per page.
<article> | Self-contained, independently distributable content (blog post, comment, product card).
<section> | Thematic grouping of content with its own heading.
<aside> | Tangentially related content (sidebars, callouts).
<footer> | Footer of a section/article/page (copyright, author info, links).
<figure> + <figcaption> | Self-contained media with a caption.
<time datetime="..."> | Machine-readable date/time. datetime takes ISO 8601-style values, e.g. 2026-05-17 or 2026-05-17T14:30Z.
<address> | Contact info for the nearest <article> or <body> ancestor (page/article author or organisation).
<details> + <summary> | Native expand/collapse widget, no JS needed.
<dialog> | Modal or non-modal dialog. In JS: dialog.showModal() (modal), dialog.show() (non-modal), dialog.close().
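To produce a datetime value for <time> in TypeScript, use toISOString(). It always returns UTC in the form YYYY-MM-DDTHH:mm:ss.sssZ, which is a valid global date-and-time string, and its first ten characters form a valid date string. The date below is example data.

```typescript
// Building datetime values for <time>. toISOString() always returns UTC as
// YYYY-MM-DDTHH:mm:ss.sssZ; the first 10 characters are a plain date.
const published = new Date(Date.UTC(2026, 4, 17, 14, 30)); // example date; months are 0-based, so 4 = May

const dateOnly = published.toISOString().slice(0, 10); // "2026-05-17"
const dateTime = published.toISOString();              // "2026-05-17T14:30:00.000Z"

console.log(`<time datetime="${dateOnly}">May 17, 2026</time>`);
console.log(`<time datetime="${dateTime}">14:30 UTC, May 17, 2026</time>`);
```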

<div> and <span> are last-resort fallbacks when no semantic element fits. By default, <div> is block-level and <span> is inline.

Pitfalls:

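  • More than one visible <main> per page → invalid (any extra ones must carry the hidden attribute).
  • <section> should normally have a heading. If no heading fits, use <div> instead. A sectioning element without a heading is valid HTML but discouraged.
  • A <header> or <footer> inside an <article> or <section> is fine, and it belongs to that element. However, a <header> or <footer> cannot contain another <header> or <footer>.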

9. Accessibility Basics

The web is read by humans, screen readers, search bots, and scrapers. Accessible markup helps them all.

  • Semantic elements first: <button>, <nav>, <main>, <label> carry meaning. Avoid <div onclick> button impostors.
  • Always provide alt on <img>. Decorative? alt="". Informative? Describe the content.
  • Label every form control, with <label for> or by wrapping. Inputs without labels are unusable on screen readers.
  • Keyboard navigation: every interactive element must be focusable and operable via keyboard. Native elements do this automatically; custom widgets need tabindex plus key handlers.
  • Focus indicators: never set outline: none without providing a replacement. Sighted keyboard users rely on the focus ring.
  • Contrast: under WCAG AA, ratio ≥ 4.5:1 for body text and ≥ 3:1 for large text (24px, or about 18.7px bold). Check with browser DevTools.
  • Heading hierarchy: don't skip levels (<h1> → <h3> is wrong). Assistive tech uses headings to navigate.
  • lang attribute: set it on <html>, and override it on elements whose content is in a different language.
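The WCAG 2.x contrast ratio can be computed directly from two colours, which is useful for checking a palette outside DevTools. This is only a sketch. It assumes opaque 6-digit #rrggbb colours, and the function names are illustrative.

```typescript
// WCAG 2.x contrast ratio for two opaque #rrggbb colours (names here are illustrative).
function channelToLinear(channel8bit: number): number {
  const c = channel8bit / 255;
  // sRGB linearisation with the threshold used in the WCAG 2.x definition
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

function relativeLuminance(hex: string): number {
  const r = channelToLinear(parseInt(hex.slice(1, 3), 16));
  const g = channelToLinear(parseInt(hex.slice(3, 5), 16));
  const b = channelToLinear(parseInt(hex.slice(5, 7), 16));
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// (lighter + 0.05) / (darker + 0.05): 1 for identical colours, 21 for black on white
function contrastRatio(fg: string, bg: string): number {
  const l1 = relativeLuminance(fg);
  const l2 = relativeLuminance(bg);
  return (Math.max(l1, l2) + 0.05) / (Math.min(l1, l2) + 0.05);
}

// AA thresholds: 4.5:1 for body text, 3:1 for large text
function passesAA(ratio: number, largeText = false): boolean {
  return ratio >= (largeText ? 3 : 4.5);
}

console.log(contrastRatio("#000000", "#ffffff").toFixed(2)); // 21.00
console.log(contrastRatio("#777777", "#ffffff").toFixed(2)); // 4.48, just below AA for body text
console.log(passesAA(contrastRatio("#767676", "#ffffff")));  // true (about 4.54:1)
```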

ARIA — when native semantics aren't enough:

<button aria-label="Close">×</button>
<button aria-pressed="false">Mute</button>           <!-- aria-pressed only for toggle buttons -->
<div role="alert">Form submission failed</div>
<img src="info.svg" alt="" aria-hidden="true">

Attribute | Purpose
role | Overrides the element's implicit role (role="button", role="alert"). Use sparingly and prefer native elements.
aria-label | Accessible name when there's no visible text.
aria-labelledby | Uses another element's text (referenced by id) as the name.
aria-describedby | Additional description, referenced by id.
aria-hidden | Hides the element from assistive tech while it stays visible on screen. Never put it on focusable elements.
aria-live | Announces dynamic updates (polite, assertive).
aria-expanded, aria-pressed, aria-checked | Expanded / pressed / checked state.

First rule of ARIA: don't use ARIA if a native element does the job. <button> is better than <div role="button" tabindex="0">, which still needs its own Enter and Space key handling.

10. References