SCANF

// user input · &a address-of · 12 edge cases · SIGSEGV · undeclared variables

v1 · scanfv1.c — the working program

The correct usage of scanf to read a single integer from the user. Every part of it explained — what scanf is, why &a is required, what %d does, and why you deliberately left out \n on the prompt.

scanf %d format specifier & address-of operator pointers stdin stdout buffering scanf return value
Working Program → scanfv1.c

scanf("%d", &a) — reading an integer from stdin into a local variable

scanfv1.c — exact source
// We are going to get user input! Interesting.
// scanf : function in the standard input output lib.
// Users can provide NON SPACE SEPARATED INPUT.

#include <stdio.h>

int main(void) {

  int a;

  printf("Enter a number : "); // not using \n — for input visual purposes
  scanf("%d", &a);           // &a : the ADDRESS of a. scanf writes to it.
                              // similar to borrowing in Rust, except raw pointer
  printf("You entered this number groot : %d\n", a);
  return 0;
}
normal run
 ./scanfv1
Enter a number : 234
You entered this number groot : 234

What scanf is — and why it is the opposite of printf

scanf is the input counterpart to printf: Both are variadic functions declared in stdio.h and implemented in libc. printf takes a format string and a list of values — it reads them and writes formatted text to stdout. scanf takes a format string and a list of pointers — it reads formatted text from stdin and writes parsed values into the memory locations the pointers refer to.

Why scanf takes pointers and printf takes values: In C, function arguments are passed by value — a copy is made. If scanf took int a directly, it would receive a copy. Writing to the copy would change nothing in your main. To modify a variable in the caller's scope, you must give the callee the variable's address. scanf receives &a, dereferences it, and writes the parsed integer directly into the memory location where a lives on your stack. When scanf returns, a in main has the new value.

Your Rust comparison ("similar to borrowing a variable in Rust, except here we call it using the actual name: reference") holds up, with one correction: C has no references. & is the address-of operator, and &a is a pointer of type int *. In Rust, &mut a creates a mutable reference, which is a pointer whose borrow rules the compiler checks. In C, &a is a raw address with no safety guarantees. The idea is the same (give the callee a way to write into your variable), but Rust enforces the rules at compile time while C trusts you completely.

What %d tells scanf to do: The format specifier %d instructs scanf to:

(1) skip any leading whitespace, meaning every byte for which isspace() is true: space 0x20, tab 0x09, newline 0x0A, and also \v, \f, \r;
(2) read an optional + or - sign;
(3) read consecutive digit characters;
(4) stop at the first byte that is not a digit;
(5) convert the digits as strtol() would with base 10, producing an int (32 bits on x86_64 Linux);
(6) write that value through the int * argument, but only if at least one digit was read.

The byte that terminated parsing is pushed back and stays in stdin for the next read.

printf without \n, the deliberate choice: printf("Enter a number : ") leaves the cursor right after the colon and space, so the user types on the same line. That is the natural feel of an inline prompt. Adding \n would put the prompt and the input on separate lines.

The subtle risk: when stdout is connected to a terminal it is line-buffered, so without \n the prompt may still be sitting in the buffer when scanf blocks. In practice it appears anyway, because glibc flushes a line-buffered stdout before stdin reads new data from the terminal. The C standard only describes this flushing as intended behavior, not a guarantee. When stdout is redirected to a file or pipe it is fully buffered, and no such flush happens. For absolute safety, add fflush(stdout); after the prompt printf.

What scanf returns, and why you should always check it: scanf returns an int, the number of items successfully matched and assigned. For scanf("%d", &a) that is:

- 1 on success;
- 0 if the input did not match %d (for example, typing letters);
- EOF, a negative value that is -1 on glibc, if the input ended (Ctrl+D at the start of a line on Linux) or a read error occurred before any conversion.

Pressing only Enter is not a 0 case: newlines are whitespace, so scanf keeps waiting. In your code the return value is ignored. Production code always checks: if (scanf("%d", &a) != 1) { /* handle parse failure */ }. Without this check, a failed parse leaves a uninitialized and the program continues silently with garbage data.

Edge Cases · 12 Runs from scanfv1.output

Every edge case you listed in the source comments, run and recorded in scanfv1.output. Each result explained at the level of what scanf's internal parsing actually did.

integer parsing stdin buffer whitespace skipping integer overflow strtol saturation format mismatch float truncation octal literals
Input | Output | Verdict | What scanf did
234 | 234 | ✓ correct | No leading whitespace to skip. Read digits 2, 3, 4. Hit the newline from the Enter key (not a digit) and pushed it back. Converted to 234, stored it through &a, returned 1.
-234 | -234 | ✓ correct | Read the leading - (valid for %d), then digits 2, 3, 4. Converted to -234. Negative integers work fine.
234 256436 24562 | 234 | ✓ expected | %d stops at whitespace. Read 234, hit the space, stopped. The remaining " 256436 24562\n" stayed in the stdin buffer, unread and available for the next scanf call.
-234 -436 -0 | -234 | ✓ expected | Same as above: first token only. The space after -234 terminated parsing, and the rest stayed unread.
(Enter key only) | waits..., then accepts 234 | ⚠ blocks | %d skips leading whitespace, and the newline (0x0A) is whitespace. Pressing Enter gives scanf a byte to discard, not one to parse, so it went back to blocking in read(). Only a non-whitespace byte (or end of input) ends the wait. scanf never accepts empty input as a number.
290375398759087210985710987 | -1 | ⚠ overflow | glibc's scanf parses %d with strtol(). The value exceeded LONG_MAX (9,223,372,036,854,775,807 on x86_64 Linux), so strtol saturated at LONG_MAX (0x7FFFFFFFFFFFFFFF) and set errno = ERANGE. Storing that 64-bit value into the 32-bit int a kept the low 32 bits, 0xFFFFFFFF, which is -1 in two's complement. The C standard calls an out-of-range %d input undefined behavior; -1 is simply what this implementation produced.
-2347896587369782134687693876 | 0 | ⚠ overflow | Huge negative: strtol saturated at LONG_MIN (0x8000000000000000). Its low 32 bits are 0x00000000, so a became 0. The saturation value differs from the positive case, so the truncation result differs too.
lskadfjalk | 0 | ⚠ parse fail | The first character 'l' is neither a digit nor a sign, so %d failed to match. scanf returned 0 (zero successful conversions) and did not write to &a. a was never initialized, so printf showed an indeterminate value. Its stack slot happened to hold zero: fresh stack pages from the kernel are zero-filled, but nothing guarantees earlier code left the slot untouched.
-kjafda | 0 | ⚠ parse fail | A leading - is valid at the start of a negative integer, so scanf consumed it, then saw 'k', which is not a digit. With no digits after the sign, %d is a matching failure and scanf returned 0. Same result as pure string input.
234.234 | 234 | ✓ truncated | %d parsed digits 2, 3, 4, hit the '.' (not a digit), and stopped. It stored 234 and returned 1. The ".234" remains in stdin. %d never parses a float: it just stops at the '.', which looks like truncation toward zero.
0324.2342 | 324 | ✓ decimal | A leading zero is NOT treated as octal. Octal interpretation applies to integer literals in C source code (e.g. int x = 0324;) and to scanf's %i and %o, but %d always reads decimal. Read 0324 as decimal 324, stopped at '.'.
-23049.23452352 | -23049 | ✓ truncated | Negative float. Read '-', then digits until '.'. Stored -23049. The fractional ".23452352" stays in stdin. The effect is truncation toward zero, not toward negative infinity: -23049.7 would still give -23049, not -23050.

Deep Explanation — stdin buffering, what stays after scanf, and overflow internals

The stdin buffer and what "unread input" means: When you type at the terminal and press Enter, the terminal driver (in its default canonical mode) holds the entire line, newline included, in a kernel buffer. When scanf needs input, libc calls read(0, buf, n), and the kernel copies the whole completed line into libc's stdio buffer for stdin at once. If scanf only consumes part of it (stopping at a space or a period), the rest stays in libc's buffer. The next call to any stdin-reading function (scanf, getchar, fgets) reads from that leftover buffer immediately, without blocking for new input.

This is the source of a very common bug in C programs that alternate between reading integers and reading characters. The leftover newline from a previous read gets consumed by the next getchar() call, and the intended input is skipped entirely.

Why the Enter key blocks scanf instead of submitting empty input: The newline character (0x0A) is whitespace, and %d skips all leading whitespace. scanf has no concept of "empty integer input". It keeps skipping whitespace until it finds a non-whitespace character to parse or the input ends. If you need to handle empty input gracefully, use fgets() to read a whole line, then check whether it is empty or parse it with sscanf().

Integer overflow and the strtol internals: The C standard defines %d's input syntax by reference to strtol() with base 10, and glibc's scanf does call strtol() internally to do the actual parsing. strtol accumulates the value as a long (64-bit on x86_64 Linux).

- When the value would exceed LONG_MAX, strtol clamps it to LONG_MAX (0x7FFFFFFFFFFFFFFF) and sets errno = ERANGE.
- When it would go below LONG_MIN, it clamps to LONG_MIN (0x8000000000000000).

glibc then stores this long into your int * argument, keeping only the low 32 bits. LONG_MAX truncated: 0x7FFFFFFFFFFFFFFF → 0xFFFFFFFF = -1. LONG_MIN truncated: 0x8000000000000000 → 0x00000000 = 0. That is exactly what you observed. The standard itself calls an out-of-range %d result undefined behavior, so other C libraries may produce something else.

The octal observation for 0324: In C source code, an integer literal starting with 0 is interpreted as octal, so int x = 0324; sets x to 212 decimal (3×64 + 2×8 + 4). But that interpretation happens at compile time, in the lexer, and only applies to literal values written in source code. scanf reads characters at runtime and, for %d, converts them as decimal regardless of leading zeros. Use %o if you want scanf to parse octal input. Use %i if you want it to detect the base the way a C literal does: a leading 0 means octal, and 0x means hex.

The one thing missing from these runs: You never checked scanf's return value. For the failed parses (string input, leading minus with no digits), scanf returned 0 — but your code had no way to know. if (scanf("%d", &a) != 1) { fprintf(stderr, "bad input\n"); return 1; } is the minimal correct pattern going forward.
Error · scanfv2.c — &a on an undeclared variable

int a; was commented out. &a still used. Compile-time error. You also questioned the compiler's deduplication note — this tab explains exactly what it means.

undeclared identifier symbol table error deduplication cascade errors -Wunused-but-set-variable
Error Documented → errors/scanfv2.c.error

'a' undeclared — and the compiler's deduplication note

scanfv2.c — the only change from v1
int main(void) {

  //int a;   ← commented out. a no longer exists.

  printf("Enter a number : ");
  scanf("%d", &a);   // ← 'a' was never declared
  printf("You entered this number groot : %d\n", a);
  return 0;
}
gcc output — exact
scanfv2.c: In function 'main':
scanfv2.c:11:16: error: 'a' undeclared (first use in this function)
   11 |   scanf("%d", &a);
      |                ^
scanfv2.c:11:16: note: each undeclared identifier is reported only once
                       for each function it appears in

What "undeclared" means at the compiler level

The symbol table: The C compiler maintains a symbol table — a data structure mapping names to their type, storage class, and location. When you write int a;, the compiler adds an entry: name = "a", type = int, storage = stack frame, offset = (calculated by the compiler). Every subsequent reference to a is looked up in this table to determine its type (for type-checking) and location (to generate the correct machine code instruction).

What happens when the declaration is missing: When the compiler encounters &a on line 11, it looks up "a" in the symbol table. There is no entry — you commented out the declaration. The compiler cannot produce the address of something that does not exist in its symbol table. It cannot determine the type of a, so it cannot verify that &a is a valid int * argument for scanf's %d. It has nothing to work with. It errors.

Answering your question about the deduplication note ("each undeclared identifier is reported only once for each function it appears in"): "What? So if we are going to use multiple &a, will those not be reported?" Correct, that is exactly what the note says. scanfv2 already shows it: a is used again in the printf on the next line, yet gcc printed only one error. If you used &a ten times in main, gcc would report the undeclared error only at the first occurrence, on line 11, and stay silent about the other nine.

Why gcc deduplicates: Once gcc knows "a" is undeclared, every later use of "a" in the same function is a consequence of the same root cause, the missing declaration. Reporting it ten times would produce ten near-identical error lines, bury unrelated errors below them, and force you to scroll past the noise. By reporting once and stating the policy, gcc is telling you: "fix the declaration, and all ten references are fixed at once."

The deduplication is per identifier, per function. That is why, in 1.2's fartocelv1, one missing semicolon still produced a waterfall of "undeclared" errors. Each of those errors named a different variable from the broken declaration, so each was reported once.

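Why this is a hard compile error and not a warning: The compiler cannot guess what type a was supposed to be, so it has nothing to generate code from. (gcc keeps parsing after the error so it can report other problems, but it produces no object file.)

Pre-C99 C had two related loopholes:

- "implicit int": a declaration with no type specifier (such as static x;) defaulted to int;
- implicit function declarations: calling an undeclared function assumed it returned int.

Neither covered an undeclared variable used in an expression like &a; compilers have always rejected that. C99 removed both loopholes, and under -std=c11 every name must be declared before use.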

The rule in practice: When you get a cascade of errors, look at the first error in the list, not the last or one in the middle. Fix it and recompile. Often most of the later errors disappear, because they were downstream of the same root cause.
Crash · scanfv3.c — scanf("%d") with no &a → SIGSEGV

&a was removed. scanf("%d") was called with no destination pointer. Compiled without -Wall — no warnings. Ran. Segfaulted immediately on integer input. Survived on string input. You asked: "is that a segmentation error? SEGV?" — this tab answers fully.

SIGSEGV segmentation fault garbage pointer MMU page fault variadic arguments x86_64 calling convention -Wformat missing write vs no-write code path
Critical Error → errors/scanfv3.c.error

Compiled silently — crashed at runtime — fish reported SIGSEGV

scanfv3.c — the only change from v1
int main(void) {

  int a;

  printf("Enter a number : ");
  scanf("%d");    // ← &a removed. no destination pointer provided.
  printf("You entered this number groot : %d\n", a);
  return 0;
}
compiled without flags — then three inputs tested
 gcc -o scanfv3 scanfv3.c
  (no output — compiled silently. no -Wall, no -Wformat check.)

 ./scanfv3
Enter a number : 234234
fish: Job 1, './scanfv3' terminated by signal SIGSEGV (Address boundary error)

 ./scanfv3.e.bin
Enter a number : sf
You entered this number groot : 0   ← string input: no crash!

 ./scanfv3.e.bin
Enter a number : 234.234
fish: Job 1, './scanfv3.e.bin' terminated by signal SIGSEGV (Address boundary error)

 ./scanfv3.e.bin
Enter a number : 82375902837523857023985709847982348234932478324789234789234
fish: Job 1, './scanfv3.e.bin' terminated by signal SIGSEGV (Address boundary error)

What SIGSEGV is — from the x86_64 ABI to the kernel signal

What happens without &a, from the ABI perspective: On x86_64 Linux (the System V AMD64 ABI), integer and pointer arguments, including variadic ones, are passed in registers first (rdi, rsi, rdx, rcx, r8, r9), then on the stack if there are more. When you call scanf("%d"), the compiler puts the address of the format string into rdi. Because scanf is variadic, it also sets al to an upper bound on the number of vector registers used, here 0. That is all. There is no second argument.

scanf parses the format string, finds %d, and reaches for the pointer it expects as the second argument. That means reading whatever rsi held when scanf was called. Your call never set rsi, so it contains a leftover from earlier code, most likely from the preceding printf call and its internals: effectively an arbitrary 64-bit value. scanf treats it as a memory address and attempts to write the parsed integer there. If that leftover happened to be a valid, writable address, the program would not crash; it would silently corrupt memory instead. Either way the behavior is undefined.

Why writing to a random address causes SIGSEGV: Every process on Linux runs with virtual memory. The kernel maintains a page table mapping virtual addresses to physical memory frames. Only addresses that are mapped in the page table are valid to read or write. When scanf issues a store instruction to the garbage address in rsi, the CPU's Memory Management Unit (MMU) walks the page table, finds no valid mapping for that address (or finds a read-only mapping), and raises a page fault exception. The kernel's page fault handler examines the fault: can it be resolved? (No — there is no mapping, or the mapping is read-only, and this was not a legitimate access.) It sends signal 11 — SIGSEGV — to the process. The default handler for SIGSEGV terminates the process immediately. Fish reports what happened.

SIGSEGV name and history: "Segmentation violation." The name comes from an older memory model where a process's address space was divided into hardware segments — code segment, data segment, stack segment. Accessing memory outside the bounds of any valid segment triggered a "segmentation violation." On modern x86_64 Linux, hardware segmentation is mostly not used (segments are set to cover the full address space), and the protection is done entirely by the MMU's page tables. But the signal name and its conventional meaning — "you accessed memory you do not own" — persist unchanged.

Why string input ("sf") did not crash — the crucial difference: When you typed sf, scanf tried to match %d. The first character is s — not a digit, not a minus sign. scanf immediately detected a matching failure and returned 0. It never attempted to write anything. No store instruction was issued to the garbage pointer. No memory access, no page fault, no signal. The program continued to the next printf, printed a (uninitialized, stack happened to be zero), and exited normally.

Why float input (234.234) crashed when string input did not: 234.234 starts with 2 — a valid digit. scanf successfully parsed the integer part 234, then attempted to write it to the garbage pointer. That write triggered the page fault. This is the exact difference: string input hits a matching failure before any write occurs; float input passes the matching phase and only fails at the write. Same explanation for the huge overflow number — parsing succeeded (or saturated), write was attempted, crash.

Why it compiled silently without -Wall: In C, calling a variadic function with fewer arguments than the format string expects is not a syntax error; it is undefined behavior at runtime. The compiler cannot check variadic argument counts without extra knowledge about the function. For the standard functions, GCC has that knowledge built in: it checks calls to scanf, printf, and related functions against their format strings. GCC exposes the same check for your own functions through __attribute__((format(scanf, ...))).

That check is the -Wformat diagnostic, which -Wall enables. You compiled with gcc -o scanfv3 scanfv3.c, with no flags, so the check never ran. With -Wall, gcc would have warned: warning: format '%d' expects a matching 'int *' argument [-Wformat=]. That would have caught the bug before you ran the program. This is the exact argument for always using -Wall -Wextra -Wpedantic.

Your note: "see Memory-1, where I will use objdump -x": That is the right next step. objdump -x ./scanfv3 prints the ELF headers. These include the program headers (the segments the loader maps into memory) and the section headers, so you can see where .text, .data, .rodata, and .bss live.

For the dynamic side, run the binary under gdb and type 234234. gdb will stop at the faulting instruction inside scanf's internals:

- bt shows the call chain;
- x/i $pc shows the store instruction;
- info registers shows the register holding the garbage address it tried to write.

Valgrind's memcheck would similarly report an invalid write to that address, with a stack trace. Combining objdump's static view with a debugger's dynamic view gives you a complete picture of what memory looked like at the moment of the crash.


Conclusions — scanf
Question | Answer
Why does scanf need &a and not a? | scanf writes a value into a memory location, so it needs an address. a is a value; &a is the address of a on your stack. Pass a instead, and scanf treats a's (garbage) int value as an address. Pass nothing, as in scanfv3, and it uses whatever was left in rsi. Either way it writes to an arbitrary address, typically ending in SIGSEGV.
What does %d do with float input like 234.234? | Reads digits until the first non-digit. The . stops parsing. Stores 234 and returns 1. The .234 stays in stdin. The result looks like truncation toward zero, not toward negative infinity.
Why did string input not crash scanfv3, but integer input did? | String input causes a matching failure: scanf fails before writing. No write means no garbage-pointer dereference, so no SIGSEGV. Integer (and float) input parses successfully, then scanf writes to the garbage pointer and crashes.
What does scanf return, and why does it matter? | The count of successfully assigned items: 1 on success, 0 on a matching failure, EOF (-1 on glibc) on end of input. Always check it. Ignoring it means a failed parse leaves the variable unassigned (here, uninitialized) and the program continues silently with garbage.
Why did the broken scanfv3 compile without error? | Missing variadic arguments are not a syntax error; they are only detectable with -Wformat (part of -Wall). Compiled without any flags, no check ran. Always use -Wall -Wextra -Wpedantic -std=c11.
What is SIGSEGV? | Signal 11 on Linux. The kernel sends it when a process accesses a virtual address with no valid page-table mapping, or writes to a read-only mapping. The MMU raises a page fault; the kernel cannot resolve it; it kills the process with SIGSEGV.
The permanent rule | Every scalar variable you pass to scanf takes &. Always. scanf("%d %d", &a, &b): two ints, two addresses. The exceptions are arguments that already are addresses: a variable that is already a pointer (int *p = &a; scanf("%d", p)), and char[] buffers, which decay to a pointer and do not take &. The char[] case comes later when strings are covered.