HTML Basics
- Description: HTML document structure, common elements, attributes, forms, semantic markup, and accessibility basics.
- My Notion Note ID: K2A-F1-1
- Created: 2018-03-23
- Updated: 2026-05-17
- License: Reuse is very welcome. Please credit Yu Zhang and link back to the original on yuzhang.io
Table of Contents
- 1. Overview
- 2. Document Structure
- 3. Elements, Tags, and Attributes
- 4. Text Content
- 5. Links and Media
- 6. Lists and Tables
- 7. Forms and Inputs
- 8. Semantic HTML
- 9. Accessibility Basics
- 10. References
1. Overview
HTML = HyperText Markup Language — the document format of the web. Defines content and structure; styling is CSS, behaviour is JavaScript.
- Markup language → annotates content with tags, not a programming language.
- Current spec: HTML Living Standard (WHATWG). HTML5 (W3C Recommendation, 2014) was the last major numbered version; W3C followed with 5.1 and 5.2, and the spec is now continuously updated.
- Files end in .html (preferred) or .htm.
- Browser parses HTML → builds the DOM tree → CSS computes styles → JS can mutate the tree.
Sibling markup languages: XML, SGML, LaTeX, Markdown. Markdown is often compiled to HTML. XHTML is HTML expressed as strict XML — largely abandoned in favour of the living standard.
2. Document Structure
Minimum valid HTML5 document:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Page title</title>
</head>
<body>
<h1>Hello</h1>
<p>First paragraph.</p>
</body>
</html>
| Part | Role |
|---|---|
| <!DOCTYPE html> | Triggers standards mode. Case-insensitive; conventionally written as shown. Without it → quirks mode (buggy layout). |
| <html lang="en"> | Root element. lang improves screen-reader pronunciation and hyphenation, and helps search engines and translation tools identify the language. |
| <head> | Metadata: title, charset, viewport, links to CSS, script tags, SEO/OG tags. Not rendered. |
| <body> | Visible content. |
Common <head> entries:
<meta charset="utf-8"> <!-- always UTF-8 -->
<meta name="viewport" content="width=device-width, initial-scale=1"> <!-- mobile -->
<meta name="description" content="..."> <!-- search snippet -->
<link rel="stylesheet" href="/styles.css"> <!-- external CSS -->
<link rel="icon" href="/favicon.ico">
<script src="/app.js" defer></script> <!-- JS, deferred -->
Pitfalls:
- Omit <meta charset> → browser may guess the wrong encoding and garble non-ASCII text.
- Omit viewport meta → mobile renders at desktop width, then scales down (tiny text).
- Put <script> without defer/async in <head> → blocks parsing. Either move it to the end of <body> or use defer.
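As a rough sketch of the script-loading pitfall above, the page below contrasts a blocking script with defer and async. The file names (/legacy.js, /app.js, /analytics.js) are placeholders chosen for illustration, not files from this note.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Script loading sketch</title>
  <!-- Blocking: parsing pauses until this file is fetched and executed -->
  <script src="/legacy.js"></script>
  <!-- defer: fetched in parallel, executed in document order once parsing finishes -->
  <script src="/app.js" defer></script>
  <!-- async: fetched in parallel, executed as soon as it arrives (order not guaranteed) -->
  <script src="/analytics.js" async></script>
</head>
<body>
  <p>This paragraph is parsed only after the blocking script has run.</p>
</body>
</html>
```

defer is usually the safer default for scripts that touch the DOM, since the whole document has been parsed by the time they run.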
3. Elements, Tags, and Attributes
Tag = the markup token (<p>, </p>). Element = opening tag + content + closing tag. Attribute = key/value on the opening tag.
<a href="/about" class="nav-link" id="about-link">About</a>
Parts, left to right:
- <a ...>: opening tag
- href="/about", class="nav-link", id="about-link": attributes
- About: content
- </a>: closing tag
Rules:
- Tags are case-insensitive — lower-case by convention.
- Most elements need a closing tag. Void elements have no content and no end tag: <area>, <base>, <br>, <col>, <embed>, <hr>, <img>, <input>, <link>, <meta>, <source>, <track>, <wbr>. Trailing slash (<br/>) is allowed but ignored by the HTML parser (it's XHTML/XML syntax).
- Nest, don't overlap: <b><i>...</i></b> ✔, <b><i>...</b></i> ✘.
- Attribute values can be unquoted if alphanumeric, but always quote them: safer with URLs, spaces, special chars.
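A small fragment may make these rules concrete. The paths, class names, and text below are illustrative only.

```html
<!-- Void elements: no content, no end tag -->
<p>Line one<br>Line two</p>
<img src="/logo.png" alt="Logo">

<!-- Correct nesting: the inner element closes before the outer one -->
<p><b><i>Bold and italic</i></b></p>

<!-- Quoted values: required here because they contain spaces -->
<a href="/about" class="nav-link active" title="About this site">About</a>
```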
Global attributes — usable on any element:
| Attribute | Purpose |
|---|---|
| id | Unique identifier within the document. JS getElementById, CSS #id, link target #id. |
| class | Space-separated list of class names. CSS .class, JS classList. |
| style | Inline CSS. Use sparingly; external stylesheets are easier to maintain. |
| title | Tooltip text on hover. Avoid for critical info (no mobile, no keyboard). |
| lang | Language of this element's content. Overrides <html lang>. |
| dir | Text direction: ltr, rtl, auto. |
| hidden | Removes element from rendering and accessibility tree. |
| data-* | Custom data attributes (data-user-id="42"), readable via el.dataset.userId. |
| tabindex | Focus order. 0 = natural order, -1 = focusable via JS only. |
| aria-* | Accessibility attributes; see § 9. |
Comments: <!-- text -->. Cannot nest. Anything inside is ignored by the parser.
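The sketch below combines several of the global attributes listed above on one small list. The ids, class names, and user values are made up for illustration.

```html
<ul id="user-list" class="list compact">
  <!-- data-* values are exposed on el.dataset with camelCased names -->
  <li data-user-id="42" lang="fr" tabindex="0">Amélie</li>
  <!-- hidden: not rendered and not exposed to assistive tech -->
  <li data-user-id="43" hidden>Not rendered, not announced</li>
</ul>

<script>
  // querySelector returns the first match: the item with data-user-id="42"
  const first = document.querySelector("#user-list li");
  console.log(first.dataset.userId); // "42" (dataset values are always strings)
</script>
```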
4. Text Content
4.1 Headings
<h1>Page title</h1>
<h2>Section</h2>
<h3>Subsection</h3>
...
<h6>Lowest level</h6>
- Six levels, <h1> to <h6>. Use in document order; don't skip levels.
- Each page should have one <h1> describing the page's main topic. Search engines and assistive tech rely on this hierarchy.
- Headings define structure, not size. Use CSS to change appearance.
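One possible outline that follows these rules is sketched below. The heading text is invented, and the indentation is only there to make the hierarchy visible; it has no effect on the parsed document.

```html
<h1>HTML Basics</h1>
  <h2>Document Structure</h2>
    <h3>The head element</h3>
    <h3>The body element</h3>
  <h2>Text Content</h2>
    <h3>Headings</h3>

<!-- Avoid: jumping from h1 straight to h3 leaves a gap in the outline -->
```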
4.2 Paragraphs and Line Breaks
- <p>...</p>: paragraph. Browser collapses whitespace; newlines in source become a single space.
- <br>: hard line break. Use only within text where the break is part of the content (poetry, addresses). Don't use it for paragraph spacing; use <p> and CSS margins.
- <hr>: thematic break (horizontal rule). Semantic divider between topics, not just decoration.
4.3 Inline Text Formatting
| Element | Meaning |
|---|---|
| <strong> | Strong importance. Rendered bold by default. |
| <em> | Emphasis. Rendered italic by default. |
| <b> | Bold without added importance: keywords, product names. |
| <i> | Italic without emphasis: taxonomic names, foreign terms. |
| <u> | Underline. Rare; easily confused with links. |
| <mark> | Highlighted text (search match, callout). |
| <small> | Side comments, fine print. |
| <del> / <ins> | Deleted / inserted text (diff-style). |
| <sub> / <sup> | Subscript / superscript (H₂O, x²). |
| <code> | Inline code. |
| <kbd> | Keyboard input (<kbd>Ctrl</kbd>+<kbd>C</kbd>). |
| <abbr title="HyperText Markup Language">HTML</abbr> | Abbreviation with expansion tooltip. |
Prefer <strong>/<em> over <b>/<i> when the meaning is importance/emphasis — screen readers may vocalize the difference.
4.4 Quotations and Pre-formatted Blocks
<blockquote cite="https://example.com/source">
<p>Quoted paragraph.</p>
</blockquote>
<q>Inline quote</q> — renders with quotation marks automatically.
<pre>
Preserves whitespace
and line breaks exactly.
</pre>
<code>inline code</code>
<pre><code>multi-line
code block</code></pre>
4.5 Character Entities
Reserved characters must be escaped:
| Character | Entity |
|---|---|
| < | &lt; |
| > | &gt; |
| & | &amp; |
| " | &quot; |
| ' | &#39; (or &apos;) |
| non-breaking space | &nbsp; |
| em dash | &mdash; |
| copyright | &copy; |
Forgetting to write & as &amp; in URL query strings (?a=1&b=2 inside an href) is the classic bug: browsers usually tolerate it, but &amp; is the safe, unambiguous form.
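A short sketch of escaping in practice, both in text content and inside an attribute value. The URL and the author name are placeholders.

```html
<!-- Displays: Use <p> for paragraphs & <br> for line breaks. -->
<p>Use &lt;p&gt; for paragraphs &amp; &lt;br&gt; for line breaks.</p>

<!-- The & between query parameters is escaped inside the attribute value -->
<a href="/search?a=1&amp;b=2">Search</a>

<!-- &nbsp; keeps "10" and "km" on the same line; &copy; renders the copyright sign -->
<p>10&nbsp;km. &copy; 2026 Example Author</p>
```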
5. Links and Media
5.1 Anchors
<a href="/about">Internal link</a>
<a href="https://example.com">External</a>
<a href="#section-3">Same-page link to id="section-3"</a>
<a href="mailto:someone@example.com">Email</a>
<a href="tel:+1234567890">Phone</a>
<a href="/file.pdf" download>Force download</a>
<a href="https://example.com" target="_blank" rel="noopener noreferrer">New tab</a>
- target="_blank" opens in a new tab. Modern browsers (since 2021) implicitly apply noopener semantics for target="_blank", so window.opener is null on the linked page. Still worth adding rel="noopener noreferrer" explicitly: it covers older browsers, and the noreferrer part strips the Referer header.
- Relative URLs (/about, ../foo) resolve against the current page or <base href>.
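To illustrate how relative URLs resolve, the sketch below sets a hypothetical base URL and notes where each link ends up. The example.com paths are assumptions chosen for the demonstration.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Relative URL sketch</title>
  <!-- Hypothetical base URL: every relative link below resolves against it -->
  <base href="https://example.com/docs/guide/">
</head>
<body>
  <!-- resolves to https://example.com/docs/guide/intro.html -->
  <a href="intro.html">Intro</a>
  <!-- resolves to https://example.com/docs/faq.html -->
  <a href="../faq.html">FAQ</a>
  <!-- resolves to https://example.com/about -->
  <a href="/about">About</a>
  <!-- resolves to https://example.com/docs/guide/#section-3 -->
  <a href="#section-3">Section 3</a>
  <h2 id="section-3">Section 3</h2>
</body>
</html>
```

Note the last link: with <base href> set, even a fragment-only href resolves against the base, so it leaves the current page whenever the base differs from the page's own URL.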
5.2 Images
<img src="/photo.jpg" alt="Description of the image" width="800" height="600">
- alt is required for accessibility and SEO. Empty alt="" for purely decorative images (screen readers will skip them).
- Specify width and height: the browser reserves layout space and avoids cumulative layout shift (CLS).
- Modern formats: WebP, AVIF (smaller than JPEG/PNG).
Responsive images:
<img src="small.jpg"
srcset="small.jpg 400w, medium.jpg 800w, large.jpg 1600w"
sizes="(max-width: 600px) 100vw, 50vw"
alt="...">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="logo-dark.png">
<source media="(min-width: 800px)" srcset="logo-wide.png">
<img src="logo.png" alt="Logo">
</picture>
srcset = browser picks best resolution. <picture> = browser picks first matching <source>, falls back to <img>.
5.3 Audio and Video
<audio src="/song.mp3" controls></audio>
<video controls poster="/thumb.jpg" width="640" height="360"> <!-- no src here: a src attribute would make the browser ignore the <source> children -->
<source src="/clip.webm" type="video/webm">
<source src="/clip.mp4" type="video/mp4">
<track kind="subtitles" src="/clip.en.vtt" srclang="en" label="English">
Your browser doesn't support video.
</video>
- controls: show play/pause UI. Without it, JS must drive playback.
- autoplay requires muted on most browsers (Chrome, Safari); autoplay with sound is blocked.
- Multiple <source>: browser picks the first supported format.
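A minimal sketch of a muted, looping background clip, reusing the file names from the example above. The playsinline attribute is an addition not covered in this note; it asks mobile Safari to play the video in place instead of going fullscreen.

```html
<!-- Muted autoplay is generally permitted; drop muted and most browsers block playback -->
<video autoplay muted loop playsinline poster="/thumb.jpg" width="640" height="360">
  <source src="/clip.webm" type="video/webm">
  <source src="/clip.mp4" type="video/mp4">
</video>
```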
5.4 Embedded Content
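- <iframe src="..." sandbox>: embed another page. A bare sandbox attribute applies every restriction (no scripts, no forms); re-enable capabilities selectively, e.g. sandbox="allow-scripts allow-same-origin". Avoid granting both of those tokens to same-origin content, since the frame could then remove its own sandbox.
- <canvas>: bitmap drawing surface for 2D/3D graphics (drive via JS).
- <svg>: inline vector graphics. Stylable with CSS.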
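The three embedding options can be sketched side by side as follows. The widget URL, ids, and colours are placeholders for illustration.

```html
<!-- Fully sandboxed: scripts, forms, and popups are all disabled -->
<iframe src="https://example.com/widget" sandbox title="Example widget" width="400" height="300"></iframe>

<!-- Scripts re-enabled, but the frame is still treated as a unique origin -->
<iframe src="https://example.com/widget" sandbox="allow-scripts" title="Example widget with scripts"></iframe>

<!-- Inline SVG: a circle that CSS can also style -->
<svg width="100" height="100" viewBox="0 0 100 100" role="img" aria-label="Blue circle">
  <circle cx="50" cy="50" r="40" fill="steelblue"></circle>
</svg>

<!-- Canvas: blank until a script draws on it -->
<canvas id="chart" width="200" height="100">Fallback text for browsers without canvas.</canvas>
<script>
  const ctx = document.getElementById("chart").getContext("2d");
  ctx.fillStyle = "tomato";
  ctx.fillRect(10, 10, 80, 50); // x, y, width, height
</script>
```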
6. Lists and Tables
6.1 Lists
<ul> <!-- unordered: bullets -->
<li>Item</li>
<li>Item</li>
</ul>
<ol start="3" reversed> <!-- ordered: numbered, start from 3, descending -->
<li>Step</li>
</ol>
<dl> <!-- description list: term/definition pairs -->
<dt>Term</dt>
<dd>Definition</dd>
</dl>
Nest by putting another <ul>/<ol> inside an <li>.
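For instance, a nested list might look like the sketch below (the item text is invented). The inner <ul> sits inside the <li> it belongs to, not between two <li> elements.

```html
<ol>
  <li>Install dependencies
    <ul>
      <li>Runtime</li>
      <li>Build tools</li>
    </ul>
  </li>
  <li>Run the build</li>
</ol>
```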
6.2 Tables
<table>
  <caption>Quarterly revenue (USD)</caption>
  <thead>
    <tr>
      <th scope="col">Quarter</th>
      <th scope="col">Revenue</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th scope="row">Q1</th>
      <td>$10,000</td>
    </tr>
    <tr>
      <th scope="row">Q2</th>
      <td>$15,000</td>
    </tr>
  </tbody>
  <tfoot>
    <tr>
      <th scope="row">Total</th>
      <td>$25,000</td>
    </tr>
  </tfoot>
</table>
- <caption>: table title/summary, read by screen readers.
- <th scope="col|row">: declares header role, links data cells to their header.
- Use tables for tabular data only, never for page layout (CSS grid/flexbox handle layout).
- colspan, rowspan: span cells across columns/rows.
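A small sketch of colspan and rowspan, using a made-up schedule. Every row still accounts for three columns once the spans are counted.

```html
<table>
  <caption>Weekly schedule (hypothetical data)</caption>
  <thead>
    <tr>
      <th scope="col">Time</th>
      <th scope="col">Monday</th>
      <th scope="col">Tuesday</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th scope="row">09:00</th>
      <td rowspan="2">Workshop (two slots)</td>
      <td>Stand-up</td>
    </tr>
    <tr>
      <th scope="row">10:00</th>
      <!-- Monday's cell is already covered by the rowspan above -->
      <td>Review</td>
    </tr>
    <tr>
      <th scope="row">11:00</th>
      <td colspan="2">All-hands (both days)</td>
    </tr>
  </tbody>
</table>
```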
7. Forms and Inputs
<form action="/api/contact" method="post" enctype="multipart/form-data">
<label for="name">Name</label>
<input type="text" id="name" name="name" required>
<label for="email">Email</label>
<input type="email" id="email" name="email" required>
<label for="msg">Message</label>
<textarea id="msg" name="msg" rows="4"></textarea>
<button type="submit">Send</button>
</form>
<form> attributes:
| Attribute | Purpose |
|---|---|
| action | URL to submit to. Empty → current URL. |
| method | get (default, params in URL) or post (params in body). |
| enctype | application/x-www-form-urlencoded (default), multipart/form-data (file uploads), text/plain. |
| novalidate | Skip built-in validation. |
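The two forms below sketch how method and enctype change what gets sent. The /search and /upload endpoints are hypothetical.

```html
<!-- GET: submitting "html" navigates to /search?q=html -->
<form action="/search" method="get">
  <label for="q">Search</label>
  <input type="text" id="q" name="q">
  <button type="submit">Go</button>
</form>

<!-- POST with multipart/form-data: needed so the file's bytes are sent in the body -->
<form action="/upload" method="post" enctype="multipart/form-data">
  <label for="avatar">Avatar</label>
  <input type="file" id="avatar" name="avatar" accept="image/*">
  <button type="submit">Upload</button>
</form>
```

GET suits idempotent lookups that should be bookmarkable; POST suits requests that change state or carry large or sensitive payloads.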
7.1 Input Types
| type | Use |
|---|---|
| text | Single-line text. |
| password | Masked text. |
| email | Validates email format. Mobile shows email keyboard. |
| url | Validates URL format. |
| tel | Mobile shows numeric keypad. No format validation; phone formats vary. |
| number | Numeric only. min, max, step attributes. |
| range | Slider. |
| date, time, datetime-local, month, week | Date/time pickers. |
| color | Colour picker. |
| file | File picker. Add multiple for several, accept="image/*" to filter. |
| checkbox | On/off. Group multiple with same name. |
| radio | Single choice within a group (same name). |
| hidden | Send a value without showing it. |
| submit | Submit button (<button> is preferred; more flexible). |
Other input attributes: required, disabled, readonly, placeholder, pattern="[regex]", minlength, maxlength, autocomplete (e.g. email, current-password, one-time-code).
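A hedged sketch of a sign-up form that exercises several of the types and attributes above. The /signup endpoint, field names, and the username pattern are assumptions made for the example.

```html
<form action="/signup" method="post">
  <label for="user">Username</label>
  <input type="text" id="user" name="user" required minlength="3" maxlength="20"
         pattern="[a-z0-9_]+" placeholder="lowercase letters, digits, _" autocomplete="username">

  <label for="pw">Password</label>
  <input type="password" id="pw" name="pw" required minlength="8" autocomplete="new-password">

  <label for="qty">Quantity</label>
  <input type="number" id="qty" name="qty" min="1" max="10" step="1" value="1">

  <fieldset>
    <legend>Plan</legend>
    <!-- Same name = one group; only one radio can be selected -->
    <label><input type="radio" name="plan" value="free" checked> Free</label>
    <label><input type="radio" name="plan" value="pro"> Pro</label>
  </fieldset>

  <label><input type="checkbox" name="newsletter" value="yes"> Send me updates</label>
  <!-- Submitted with the form but never displayed -->
  <input type="hidden" name="source" value="signup-page">
  <button type="submit">Create account</button>
</form>
```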
7.2 Labels and Grouping
<label for="name">Name</label>
<input id="name" name="name">
<!-- or wrap, no need for `for` -->
<label>Name <input name="name"></label>
<fieldset>
<legend>Shipping address</legend>
<label>Street <input name="street"></label>
<label>City <input name="city"></label>
</fieldset>
- <label> linked to <input> via for/id: clicking the label focuses the input.
- Crucial for screen readers; without a label, the input has no accessible name.
- <fieldset> + <legend>: group related inputs (visually and semantically).
7.3 Select and Textarea
<select name="country">
<option value="">--</option>
<optgroup label="Europe">
<option value="fr">France</option>
<option value="de" selected>Germany</option>
</optgroup>
<optgroup label="Asia">
<option value="jp">Japan</option>
</optgroup>
</select>
<textarea name="bio" rows="5" cols="40" maxlength="200"></textarea>
Add multiple to <select> for multi-select.
7.4 Buttons
<button type="submit">Send</button> <!-- submit (default in a form) -->
<button type="reset">Reset</button> <!-- clear form -->
<button type="button" onclick="...">Click</button> <!-- inert; needs JS -->
Always set type explicitly — the default inside <form> is submit, which surprises people when a non-submitting button submits the form.
7.5 Built-in Validation
<input type="email" required> <!-- must be filled, must look like email -->
<input type="number" min="0" max="100"> <!-- range constraint -->
<input pattern="\d{5}" title="5 digits"> <!-- regex constraint -->
Browser blocks submit and shows a tooltip if invalid. Style invalid inputs with the :invalid CSS pseudo-class. For richer validation, use the Constraint Validation API (input.validity, input.setCustomValidity()).
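As one possible use of the Constraint Validation API, the sketch below adds a "passwords must match" rule that the built-in attributes cannot express. The form id, field names, and error message are illustrative.

```html
<form id="signup">
  <label for="pw">Password</label>
  <input type="password" id="pw" name="pw" required minlength="8">
  <label for="pw2">Repeat password</label>
  <input type="password" id="pw2" name="pw2" required>
  <button type="submit">Sign up</button>
</form>

<script>
  const pw = document.getElementById("pw");
  const pw2 = document.getElementById("pw2");

  function checkMatch() {
    // A non-empty message marks the field invalid; "" clears the custom error
    pw2.setCustomValidity(pw.value === pw2.value ? "" : "Passwords do not match");
  }

  // Re-check on every keystroke in either field
  pw.addEventListener("input", checkMatch);
  pw2.addEventListener("input", checkMatch);
</script>
```

While the custom message is set, the second field matches :invalid and the browser blocks submission, just as it does for the built-in constraints.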
8. Semantic HTML
Use elements that describe their content, not generic <div>/<span>. Benefits: screen-reader navigation, SEO, more readable markup.
<body>
  <header>
    <p>Site name</p> <!-- not a heading: the article title below is the page's single <h1> -->
    <nav>
      <ul>
        <li><a href="/">Home</a></li>
        <li><a href="/about">About</a></li>
      </ul>
    </nav>
  </header>
  <main>
    <article>
      <header>
        <h1>Article title</h1>
        <p>By Author, <time datetime="2026-05-17">May 17, 2026</time></p>
      </header>
      <section>
        <h2>Section heading</h2>
        <p>...</p>
      </section>
      <aside>
        <p>Sidebar content related to the article.</p>
      </aside>
      <footer>
        <p>Article footer (tags, related links).</p>
      </footer>
    </article>
  </main>
  <footer>
    <p>Site footer.</p>
  </footer>
</body>
| Element | Meaning |
|---|---|
| <header> | Introductory content of its nearest section/article (or page). |
| <nav> | Major navigation block. Not for every group of links. |
| <main> | Primary content of the document. Only one visible <main> per page. |
| <article> | Self-contained, independently distributable content (blog post, comment, product card). |
| <section> | Thematic grouping of content with its own heading. |
| <aside> | Tangentially related content (sidebars, callouts). |
| <footer> | Footer of section/article/page (copyright, author info, links). |
| <figure> + <figcaption> | Self-contained media with caption. |
| <time datetime="..."> | Machine-readable date/time. Use ISO-8601 in datetime. |
| <address> | Contact info for the nearest <article> or <body> ancestor (page/article author or organisation). |
| <details> + <summary> | Native expand/collapse widget; no JS needed. |
| <dialog> | Modal/non-modal dialog. Use dialog.showModal() in JS. |
<div> and <span> are last-resort fallbacks when no semantic element fits. <div> is block, <span> is inline.
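The elements from the table that the page skeleton above does not show (<figure>, <details>, <dialog>) could be used roughly as follows. The ids, text, and image path are placeholders.

```html
<figure>
  <img src="/photo.jpg" alt="Description of the image" width="800" height="600">
  <figcaption>Caption shown with the image.</figcaption>
</figure>

<details>
  <summary>Show more</summary>
  <p>Hidden until the summary is activated.</p>
</details>

<button type="button" id="open-dialog">Open dialog</button>
<dialog id="confirm">
  <p>Are you sure?</p>
  <!-- method="dialog" closes the dialog instead of submitting to a server -->
  <form method="dialog">
    <button value="cancel">Cancel</button>
    <button value="ok">OK</button>
  </form>
</dialog>

<script>
  const dialog = document.getElementById("confirm");
  document.getElementById("open-dialog").addEventListener("click", () => dialog.showModal());
  // returnValue holds the value of the button that closed the dialog
  dialog.addEventListener("close", () => console.log(dialog.returnValue));
</script>
```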
Pitfalls:
- More than one visible <main> per page → invalid.
- <section> should normally have a heading. If no heading fits, use <div> instead; sectioning elements without headings are valid HTML but discouraged.
- Nested <header>/<footer> are fine: they scope to their nearest sectioning element.
9. Accessibility Basics
The web is read by humans, screen readers, search bots, and scrapers. Accessible markup helps them all.
- Semantic elements first: <button>, <nav>, <main>, <label> carry meaning. Avoid <div onclick> button impostors.
- Always provide alt on <img>. Decorative? alt="". Informative? Describe the content.
- Label every form control: <label for> or wrapping. Inputs without labels are unusable on screen readers.
- Keyboard navigation: every interactive element must be focusable and operable via keyboard. Native elements do this automatically; custom widgets need tabindex + key handlers.
- Focus indicators: never set outline: none without providing a replacement. Sighted keyboard users rely on the focus ring.
- Contrast: ratio ≥ 4.5:1 for body text (WCAG AA). Check with browser DevTools.
- Heading hierarchy: don't skip levels (<h1> → <h3> is wrong). Assistive tech uses headings to navigate.
- lang attribute: set on <html>, override on elements with a different language.
ARIA — when native semantics aren't enough:
<button aria-label="Close">×</button>
<button aria-pressed="false">Mute</button> <!-- aria-pressed only for toggle buttons -->
<div role="alert">Form submission failed</div>
<img src="info.svg" alt="" aria-hidden="true">
| Attribute | Purpose |
|---|---|
| role | Override element's implicit role (role="button", role="alert"). Use sparingly; prefer native elements. |
| aria-label | Accessible name when there's no visible text. |
| aria-labelledby | Reference another element's id as the name. |
| aria-describedby | Additional description. |
| aria-hidden | Hide from assistive tech (without display: none). |
| aria-live | Announce dynamic updates (polite, assertive). |
| aria-expanded, aria-pressed, aria-checked | Toggle/pressed/checked state. |
First rule of ARIA: don't use ARIA if a native element does the job. <button> is better than <div role="button" tabindex="0">.
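Where ARIA is warranted, it usually accompanies a native element without replacing it. The sketch below shows a disclosure button, a described input, and a live region. All ids and messages are invented for the example.

```html
<!-- Disclosure: aria-expanded mirrors the panel's visibility -->
<button type="button" id="toggle" aria-expanded="false" aria-controls="panel">Details</button>
<div id="panel" hidden>Panel content.</div>

<!-- aria-describedby points at helper text that is read after the label -->
<label for="pw">Password</label>
<input type="password" id="pw" aria-describedby="pw-hint">
<p id="pw-hint">At least 8 characters.</p>

<!-- Live region: text changes are announced without moving focus -->
<p id="status-message" aria-live="polite"></p>

<script>
  const toggle = document.getElementById("toggle");
  const panel = document.getElementById("panel");
  const liveRegion = document.getElementById("status-message");

  toggle.addEventListener("click", () => {
    const wasOpen = toggle.getAttribute("aria-expanded") === "true";
    toggle.setAttribute("aria-expanded", String(!wasOpen));
    panel.hidden = wasOpen; // hide if it was open, show if it was closed
    liveRegion.textContent = wasOpen ? "Panel collapsed" : "Panel expanded";
  });
</script>
```

For a plain show/hide panel, <details> + <summary> gives the same behaviour with no script at all, which is in keeping with the first rule of ARIA.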
10. References
- HTML Living Standard — https://html.spec.whatwg.org/
- MDN HTML reference — https://developer.mozilla.org/en-US/docs/Web/HTML
- WCAG 2.2 — https://www.w3.org/TR/WCAG22/
- WAI-ARIA Authoring Practices — https://www.w3.org/WAI/ARIA/apg/
- Wikipedia, Markup language — https://en.wikipedia.org/wiki/Markup_language