CHAPTER ONE

// K&R · A Tutorial Introduction · deep systems-level notes

1.1 — Hello World

The anatomy of a C program. What every line actually does at the system level.

preprocessor · #include · main(void) · printf · escape sequences · return codes · exit status · gcc flags · compilation stages
v1 — incomplete
helloworld.c
Uses main() — unspecified args. No return statement. Works due to C99 implicit return, but not correct form.
v2 — partial fix
helloworld.another.c
Adds explicit return 0; but still uses main() instead of main(void).
v3 — final correct
helloworld.modern.c
int main(void) — explicit no-args, modern, pedantic-clean. The correct form.
error documented
errors/helloworld.c.error
No return type at all — main() { — triggers -Wimplicit-int error.

Version 1 → helloworld.c

First attempt — int main() with no return statement

helloworld.c — exact source
// This is the comment in C

#include <stdio.h>
// This is the standard input output library, here exists a function named printf
// from which we will be able to output to the standard output

int main() {
    printf("Hello World!\n");
    printf("Ah, the escape characters...!\n");
}

// What, wait? what about the return code? The above code still executes?
// Why is that executing? Huh. See the helloworld.another.c

// Deep Explanation — what each line does at the system level

#include <stdio.h> — This is not C code. It is a preprocessor directive. Before gcc compiles anything, a separate program called the C preprocessor (cpp) runs first. It sees #include <stdio.h> and literally copies the entire contents of /usr/include/stdio.h into your source file in memory before compilation begins. The angle brackets mean "look in system include paths." Quotes #include "myfile.h" mean "look in current directory first." stdio.h contains the declaration of printf — it tells the compiler "printf exists, takes a const char* and returns int" — but the actual implementation lives in libc, linked separately at the end.

int main() — Empty parentheses in C mean "unspecified number of arguments of unspecified types" — NOT "no arguments." This is a historical artifact from K&R C (1978) where you could write a function without specifying its parameters. The compiler accepts it but cannot type-check calls to it. In C++, empty parentheses DO mean "no arguments" (and C23 finally adopted the C++ meaning for C as well). Pre-C23 C and C++ differ here — a subtle but important distinction.

printf("Hello World!\n") — printf is a variadic function (takes variable number of arguments) declared in stdio.h. The string literal "Hello World!\n" is stored in the read-only data section (.rodata) of the compiled binary. \n is an escape sequence — the backslash tells the compiler "the next character is special." \n becomes the single byte 0x0A (ASCII line feed). The terminal interprets this byte as "move cursor to start of next line." Without \n, consecutive printf calls print on the same line with no separation.

No return statement — The function is declared int main(), promising to return an integer. No return statement exists. In C89, falling off the end of a non-void function and then having the caller use the (nonexistent) return value was undefined behavior — and main's return value is always used, as the process exit status. The C99 standard made a special exception for main only: if execution reaches the closing brace of main without a return statement, the runtime automatically returns 0. This exception applies ONLY to main. Every other function that promises a return value but doesn't provide one invites undefined behavior.


Version 2 → helloworld.another.c

Explicit return 0 added — why does the OS care?

helloworld.another.c — exact source
// This is the comment in C

#include <stdio.h>

int main() {
    printf("Hello World!\n");
    printf("Ah, the escape characters...!\n");
    return 0;
}

// Deep Explanation — return codes, exit status, and the OS

return 0; — When main returns, the C runtime (startup code in crt0.o/crt1.o that is linked into every C program) takes that return value and passes it to the operating system via the exit() system call. The kernel stores this as the exit status of the process. By Unix convention: 0 = success, any non-zero = failure. This is what makes shell scripting work: gcc ... && ./prog — the && operator checks the exit code and only runs the second command if the first returned 0.

How the shell sees it — The shell stores the last exit code in $?. Try: ./helloworld; echo $? — you see 0. Change return 0 to return 42 and you see 42. This is how programs communicate success or failure to the programs that called them. It's a fundamental IPC (inter-process communication) mechanism.

The process lifecycle — When you run a program, the shell calls fork() to create a child process, then execve() to load your program into it. When your program calls return 0 from main, the C runtime calls exit(0), which calls the kernel's _exit() syscall. The kernel marks the process as a zombie (keeping the exit status), notifies the parent (the shell), and the shell calls wait() to collect the exit status and store it in $?.


Final Solution → helloworld.modern.c

int main(void) — explicit, modern, correct

helloworld.modern.c — exact source
// This is the comment in C

#include <stdio.h>

int main(void) { // without this void, C understands that any number of
                 // arguments can be passed. That is dangerous.
    printf("Hello World!\n");
    printf("Ah, the escape characters...!\n");
}

// With void passed into main() -> this implies that function main
// explicitly accepts zero arguments.

// Deep Explanation — void, prototypes, and type safety

int main(void) — The void in the parameter list is an explicit declaration: "this function takes no arguments." This is enforced by the compiler. If you try to call main with arguments, it is a compile-time error. This is type-safe and self-documenting.

Why int main() is dangerous — An unprototyped function declaration main() tells the compiler nothing about parameters. The compiler cannot check calls to it. In old C code, this led to functions being called with the wrong number or types of arguments, with no error — just undefined behavior at runtime. The -Wpedantic flag warns about unprototyped function declarations specifically for this reason.

The two valid signatures for main — The C standard defines exactly two forms: int main(void) for programs that don't use command-line arguments, and int main(int argc, char *argv[]) for programs that do. The second form is how the OS passes command-line arguments: argc is the argument count (including the program name itself), argv is an array of null-terminated strings. When you run ./prog arg1 arg2, argc=3 and argv=["./prog", "arg1", "arg2", NULL]. Everything else is implementation-defined behavior.


Error Documented → errors/helloworld.c.error

No return type — implicit int error from K&R era

broken source + compiler output — exact
// broken — no return type before main
main() {
    printf("Hello World!\n");
    printf("Ah, the escape characters...!\n");
}

helloworld.c:6:1: error: return type defaults to 'int' [-Wimplicit-int]
    6 | main() {
      | ^~~~

// your notes from the error file:
// "Each C has the int main(void).. return 0;"
// "If the program succeeds, there will be a return code with zero."
// "Any non zero return code implies failure."

// Deep Explanation — implicit int and why C99 removed it

Where implicit int came from — In the original K&R C (1978, before any ISO standard), there was an "implicit int" rule: if you omitted a type, the compiler assumed int. This was a convenience from an era when C was written on machines with very limited memory and terseness was prized. So main() { } implicitly meant int main() { }.

Why it became dangerous — On early 32-bit systems, pointers and int were both 32 bits, so implicit int for a function returning a pointer caused no immediate problem. On 64-bit x86_64, pointers are 64 bits but int is 32 bits. If a function was supposed to return a pointer but implicitly returned int, the 64-bit pointer would be silently truncated to 32 bits. The upper 32 bits are lost. Dereferencing this truncated pointer causes a segmentation fault or silent memory corruption. The -Wimplicit-int flag catches this. With -std=c11 it becomes an error, not just a warning.

Your note was exactly right — "Each C has the int main(void).. return 0;" — yes. Every well-formed modern C program starts with this exact signature and ends main with an explicit return.


From instructions.md — gcc flags, deep dive

gcc -Wall -Wextra -Wpedantic -std=c11 -o helloworldv2 helloworld.c

Flag · What it does · Deep detail

gcc · GNU C Compiler driver · Not the compiler itself — a frontend that orchestrates 4 programs in sequence: cpp (preprocessor) → cc1 (compiler) → as (assembler) → ld (linker)

-Wall · Enable common warnings · Misleadingly named — not literally ALL warnings. Enables the most impactful set: unused variables, missing returns, suspicious type comparisons, format string mismatches, uninitialized variables

-Wextra · Enable extra warnings · Catches subtler issues beyond -Wall: unused function parameters, signed/unsigned comparison, empty loop bodies, missing field initializers in structs

-Wpedantic · Strict ISO C conformance · Warns about anything gcc-specific or non-standard. Catches main() vs main(void), // comments in C89 mode, GNU extensions. Makes code portable across all C compilers.

-std=c11 · Compile as C11 (ISO/IEC 9899:2011) · Enforces the 2011 C standard. C99 removed implicit int; C11 added _Generic, _Atomic, thread support. Options: c99 (1999), c11 (2011), c17 (2017), gnu11 (c11 + GNU extensions)

-o helloworldv2 · Name the output binary · Without -o, gcc always writes to a.out — a historical name from early Unix meaning "assembler output." Every compile would silently overwrite the same file.

helloworld.c · Input file, conventionally last · gcc infers language from extension: .c = C, .cpp/.cc = C++, .s = assembly, .o = object file. You can mix object files and source files in one command.

// Deep Explanation — the 4 compilation stages

Stage 1 — Preprocessing (cpp): All #include directives are expanded (file contents inserted inline), #define macros are substituted, #ifdef conditionals are evaluated. Output is a single giant preprocessed source file (conventionally a .i file). See it: gcc -E helloworld.c — you'll see hundreds of lines from stdio.h inserted into your tiny program.

Stage 2 — Compilation (cc1): The actual C compiler translates preprocessed C into assembly language for your CPU (x86_64 on your machine). This is where syntax checking, type checking, warning generation, and optimization happen. Output is a .s file. See it: gcc -S helloworld.c — you'll see the printf call become a call printf instruction, the string constant become a label in .rodata.

Stage 3 — Assembly (as): The GNU assembler converts assembly text into machine code — raw binary CPU instructions. Output is a .o object file. The object file contains compiled code with unresolved references — it knows printf should be called but doesn't yet know its address. See it: gcc -c helloworld.c then objdump -d helloworld.o.

Stage 4 — Linking (ld): The linker takes your .o and the C standard library (libc.so.6 on Linux) and produces the final ELF executable. It resolves all unresolved references (fills in printf's address from libc), adds the startup code (crt1.o) that calls your main, and writes the ELF binary. ELF (Executable and Linkable Format) is what the Linux kernel's execve() syscall loads into memory when you run the program.


Concept — Escape Sequences

What \n actually is — bytes, ASCII, and terminal interpretation

// Deep Explanation — escape sequences at the byte level

Why escape sequences exist — String literals in C are sequences of bytes. Most characters map directly to their ASCII value (a=0x61, A=0x41, space=0x20). But some characters cannot be written literally in source code: a newline would break the string across lines, a tab character is invisible, a null byte would terminate the string early. Escape sequences solve this — two printable characters in source that compile to one special byte.

\n = 0x0A (ASCII Line Feed) — When printf writes 0x0A to stdout, the terminal emulator (Alacritty in your case) interprets it as "move cursor to the start of the next line." This is why your output appears on separate lines. Without \n, all output stays on one line and the next shell prompt appears immediately after your last printed character.

\t = 0x09 (ASCII Horizontal Tab) — Moves cursor to the next tab stop (typically every 8 columns). You'll see this in exercise 1.2: printf("%d\t%d\n", f, c) — the \t creates aligned columns between Fahrenheit and Celsius values.

\0 = 0x00 (Null terminator) — Every C string ends with this byte. C has no string length field — instead it scans memory byte by byte until it finds 0x00. This is why the type is char * (pointer to first char) not a "string object." It's also why buffer overflows are catastrophic in C — if you write past the end of a char array, you overwrite whatever comes after it in memory, potentially including the null terminator of another string or a return address on the stack.

\\ = 0x5C (Backslash) — Since backslash is the escape character, you must escape it with itself to produce a literal backslash in output.

1.2 — Fahrenheit → Celsius

K&R Chapter 1. Five versions, three bugs, two critical systems-level discoveries.

variable declaration · int arithmetic · while loop · printf format strings · integer division truncation · undefined behavior · stack memory · x86_64 ABI · ASLR
error
fartocelv1.c
Semicolon after #include. Missing ; after variable declaration. Cascade of undeclared errors.
undefined behavior
fartocelv2.c
printf("%d\t%d\n") with no arguments. Reads random stack memory. Garbage output.
wrong formula
fartocelv3.c
5*(32-f)/9 instead of 5*(f-32)/9. Inverted subtraction. Negative results.
final · correct
fartocelv4.c
5*(f-32)/9 — correct formula, correct integer approximation.
integer truncation
fartocelv5.c
(5/9)*(f-32) — 5/9=0 in integer math. 0*anything=0. Every celsius is zero.

Error 1 → errors/fartocelv1.c.error

Semicolon after #include + missing ; → cascade of undeclared errors

fartocelv1.c — exact broken source
// I realised that the comments can be done like this
// And also like that given below.
#include <stdio.h>;   // ← BUG 1: semicolon after preprocessor directive
/* This is a stupid comment in C */
int main(void) {
    // no keyword for declaration in C. Just <type> <varname> = <value>
    // type is a must. Initialise the variable before declaration.
    int f, c, l, h, s  // ← BUG 2: missing semicolon. LSP caught it, ignored intentionally.
    l = 0;
    u = 1000;   // u was never declared — it was meant to be in the int line above
    s = 10;
    f = l;
    while (f <= h) {
        c = 5 * (f - 32) / 9;
        printf("%d\t%d\n", f, c);
        f = f + s;
    }
}
compiler output — exact
fartocelv1.c:4:19: warning: extra tokens at end of #include directive
    4 | #include <stdio.h>;
      |                   ^
fartocelv1.c:17:5: error: expected ';' or '__attribute__' before 'l'
   17 |     l = 0;
      |     ^
fartocelv1.c:18:5: error: 'u' undeclared (first use in this function)
   18 |     u = 1000;
      |     ^
fartocelv1.c:18:5: note: each undeclared identifier is reported only once
fartocelv1.c:19:5: error: 's' undeclared (first use in this function)
   19 |     s = 10;
      |     ^

// Deep Explanation — the cascade effect and how parsers work

#include <stdio.h>; — Preprocessor directives begin with # and end at the newline. There is no semicolon. The preprocessor sees the ; as a stray token after the directive. It warns but continues — the semicolon is dropped. This is harmless here but reveals a misunderstanding: preprocessor directives are not C statements. C statements end with semicolons. Preprocessor directives end with newlines.

The cascade — why one missing semicolon causes many errors — The C parser is a state machine. When it reads int f, c, l, h, s, it enters "variable declaration" state. It expects: a comma (more variables), an equals sign (initializer), or a semicolon (end of declaration). It sees the letter l on the next line instead. The parser gives up on the declaration — it was never successfully parsed. From this point on, f, c, l, h, s do not exist as far as the compiler is concerned.

Why errors point to the wrong place — The error message says "error at line 17" (l = 0;), not "error at line 16" (the missing semicolon). This is because the compiler reports errors where it gives up, not where the mistake is. The parser was still hoping for a semicolon when it read l — that's where it gave up. This is the fundamental rule of reading C compiler errors: the first error in the list is usually closest to the real mistake. Fix it and recompile. Subsequent errors often disappear.

Your experiment — "I see my LSP showing an error, I found that 'missing ;', but let my compiler tell it and let it be an error." — This was good experimental discipline. clangd's real-time analysis (powered by clang's parser) caught the missing semicolon immediately and pointed to the exact line. The gcc batch compiler caught it later and produced confusing cascade errors. This demonstrates precisely why LSP is valuable: incremental parsing with precise error localization, before you ever invoke the compiler.


Error 2 → errors/fartocelv2.c.error — CRITICAL DISCOVERY

printf with no arguments — reading uninitialized stack memory via ABI

fartocelv2.c — exact broken source
#include <stdio.h>
int main(void) {
    int u, l, s, c, f;
    u = 1000;
    l = 0;
    s = 10;
    f = l;
    while (f <= u) {
        c = 5 * (32 - f) / 9;
        printf("%d\t%d\n");  // ← no arguments. format expects 2 ints.
        f = f + s;
    }
    return 0;
}
compiler warnings + actual output (first few lines)
fartocelv2.c:15:18: warning: format '%d' expects a matching 'int' argument [-Wformat=]
   15 |         printf("%d\t%d\n");
      |                 ~^
fartocelv2.c:15:22: warning: format '%d' expects a matching 'int' argument [-Wformat=]
fartocelv2.c:4:18: warning: variable 'c' set but not used [-Wunused-but-set-variable]

// actual output — garbage from registers/stack:
-1575818216    0
 359645200     0
 1024          0
 1024          0
 1024          -1
 1024          -1
... continues with 1024 / -1 for remaining iterations ...

// your note from the file:
// "C cheated. it dumped memory? it is the int range for that int type."

// Deep Explanation — x86_64 ABI, registers, and why printf prints garbage

How function arguments are passed on x86_64 — On Linux x86_64, function calls follow the System V AMD64 ABI (Application Binary Interface). The first 6 integer/pointer arguments are passed in CPU registers in this order: rdi, rsi, rdx, rcx, r8, r9. Additional arguments go on the stack. The return value goes in rax.

What printf("format", arg1, arg2) normally does: puts format string pointer in rdi, arg1 in rsi, arg2 in rdx. printf then reads rsi and rdx as integers.

What printf("format") does with NO arguments: puts format string pointer in rdi. rsi and rdx contain whatever was in them before printf was called. These registers are NOT zeroed between function calls — they contain leftover values from the C runtime startup sequence, previous library calls, or the loop's own code. printf reads them unconditionally because it trusts the format string. It has no way to know arguments are missing.

x86_64 register state when printf is called (no args passed):

rdi = 0x402004     ← pointer to "%d\t%d\n" in .rodata  (correct)
rsi = 0xA6F80B48  ← LEFTOVER from previous code — printf reads as 1st %d
rdx = 0x00000000  ← LEFTOVER — printf reads as 2nd %d

printf prints: lower 32 bits of rsi as signed int, lower 32 bits of rdx as signed int

Why the output stabilizes to 1024 / -1 — The first two lines show highly variable values from the C runtime startup. After that, the loop settles into a predictable state — the same variables and loop overhead occupy the same registers each iteration. The value 1024 (0x400) is a common I/O buffer size in libc. The -1 (0xFFFFFFFF as signed int) is likely a libc internal flag or return value. You're seeing a window into the internal state of your C runtime.

Why "should not it output nothing?" — No. C makes zero safety guarantees. There is no argument count validation, no bounds checking, no null checking at runtime. The format string %d\t%d\n is an unconditional instruction to printf: "read two integers and print them." printf does exactly that. It reads from wherever the ABI says arguments should be. This is C's fundamental design philosophy: trust the programmer completely, do exactly what is specified, no more, no less, no safety net.

The compiler warned you — -Wformat= (included in -Wall) analyzes printf format strings at compile time and verifies argument count and types. It correctly identified both missing arguments. The program compiled anyway because warnings are not errors by default. To make all warnings into errors (recommended in production): add -Werror to your compile flags.


POC → errors/checkthis.c + poc_checkthis.sh

Proof of Concept — 200+ runs proving non-determinism via ASLR

checkthis.c — exact source — single printf no argument
// what will this output?
#include <stdio.h>
int main(void) {
    printf("%d\n");  // one %d, zero arguments provided
    return 0;
}
poc_checkthis.sh — exact source — infinite loop bash script
#!/bin/bash
# run in infinite loop, Ctrl+C to stop
while true; do
    ./checkthis.c.ebin
done
poc_checkthis.sh.output — different value every single run
1440973704
393857512
-918572696
-1627624216
-2031786536
-1151917272
1627043704
-448372728
-58942968
612214376
... all different, all within signed 32-bit int range ...
// your note: "within range of int type, probably int 32,
// so -2 power 32 to 2 power 32 - 1"

// Deep Explanation — ASLR and why values change between runs

Address Space Layout Randomization (ASLR) — Every time a program runs on Linux, the kernel deliberately randomizes the base addresses of the stack, heap, and shared libraries. This is a security feature introduced after the discovery of return-oriented programming attacks: if you don't know where the stack is, you can't reliably overwrite a return address to redirect execution to your shellcode. ASLR is enabled by default on all modern Linux kernels.

How this causes different values each run — The value in rsi when printf is called comes from the C runtime startup sequence (crt1.o code that runs before main). Part of that startup involves loading libc.so at a randomized address (ASLR), processing environment variables, and setting up internal data structures. The specific values left in registers after startup depend on these randomized addresses. Since addresses change each run, the leftover register values change, and printf reads and prints different garbage each time.

Why values are bounded within signed 32-bit int range — For %d, printf reads only the low 4 bytes of rsi itself (int on x86_64 is 32 bits), interprets them as a signed 32-bit int, and prints the decimal representation. Signed 32-bit int range: -2,147,483,648 to +2,147,483,647. Your observation "within range of int type, probably int 32" — correct in spirit. The exact range is -(2^31) to (2^31 - 1), not ±2^32.

What this POC scientifically proved — By running the binary 200+ times and capturing every output, you empirically demonstrated: (1) C has no runtime argument count checking, (2) reading uninitialized registers is non-deterministic due to ASLR, (3) the type of the format specifier determines how many bytes are read and how they are interpreted. This is genuine systems programming research methodology — form a hypothesis, design an experiment, collect data, draw conclusions.

Your original insight: "C cheated. it dumped memory? it is the int range for that int type." — C did not cheat. C did exactly what you told it. You told printf to print an int. It printed an int. The int just came from a register you didn't set. This is the definition of undefined behavior in the C standard: the behavior of a program that violates a language rule is completely undefined — the compiler can do anything. In practice, on x86_64 Linux, "anything" means "reads a CPU register."

Error 3 → fartocelv3.c

5*(32-f)/9 — inverted subtraction — negative results

fartocelv3.c — exact source
#include <stdio.h>
int main(void) {
    int u, l, s, c, f;
    u = 1000;
    l = 0;
    s = 10;
    f = l;
    while (f <= u) {
        c = 5 * (32 - f) / 9;  // ← WRONG: (32-f) should be (f-32)
        printf("%d\t%d\n", f, c);
        f = f + s;
    }
    return 0;
}
// Ah, formula error, that is why the output is in negative.

// Deep Explanation — signed integer arithmetic and formula correctness

The Celsius conversion formula — C = (F - 32) × 5/9. When F=0°F: (0-32)×5/9 = -17.7°C. When F=212°F (boiling): (212-32)×5/9 = 100°C. The formula requires subtracting 32 from Fahrenheit first, then scaling.

With the inverted formula 5*(32-f)/9 — When f=0: 5*(32-0)/9 = 17. When f=100: 5*(32-100)/9 = -37. Since 32-f = -(f-32), the output is exactly the negation of the correct answer for every value of f. No compiler error, no runtime error — just wrong math. This is the hardest class of bug: semantically incorrect but syntactically valid code.

Signed integer arithmetic — C's int is a signed 32-bit integer on your system. It handles negative numbers using two's complement representation. When 32-f is negative (because f > 32), C correctly computes the negative value. There is no error or warning — negative results from subtraction are completely normal. The compiler only warns about potential issues like overflow, not about wrong formulas.

Overflow caution for future reference — With u=1000, the intermediate value 5*(f-32) at f=1000 is 5*968=4840, well within int range. But if you extended the loop to very high Fahrenheit values, the intermediate multiplication could overflow. At f=500,000,000: 5*(500,000,000-32) ≈ 2,500,000,000 which exceeds INT_MAX (2,147,483,647). Signed integer overflow in C is undefined behavior. Unsigned overflow wraps around predictably (defined by the standard), but signed overflow is explicitly undefined.


Final Solution → fartocelv4.c

5*(f-32)/9 — correct formula — multiply first, then divide

fartocelv4.c — exact source — the correct program
#include <stdio.h>
int main(void) {
    int u, l, s, c, f;
    // the compiler does not explicitly tell that I missed this semicolon;
    // but result? they simply became undeclared and broke the remaining stuff.
    // Declaration part here.
    u = 1000;
    l = 0;
    s = 10;
    f = l;
    while (f <= u) {
        c = 5 * (f - 32) / 9;  // multiply FIRST, then divide — fixed-point trick
        printf("%d\t%d\n", f, c);
        f = f + s;
    }
    return 0;
}
// this code is perfect. But wait, let me try some bodmas..

// Deep Explanation — why operator order matters and fixed-point arithmetic

C operator precedence — * and / have the same precedence and are left-associative. So 5 * (f-32) / 9 evaluates as (5 * (f-32)) / 9. The parentheses around f-32 force that subtraction first, then multiply by 5, then divide by 9.

Why multiply before divide is critical — If we wrote 5 / 9 * (f-32): 5/9=0 (integer division), then 0*(f-32)=0 always. By multiplying first, we create a larger numerator: 5*(f-32) at f=100 is 340, and 340/9=37 — a useful integer approximation of 37.7°C.

This is fixed-point arithmetic — A technique for doing fractional math without floating-point numbers. Scale up (multiply by 5), do the math, scale down (divide by 9). The "fixed point" is the implicit fractional position. For more precision, scale the result up: 50*(f-32)/9 gives the answer in tenths of a degree (377 means 37.7°C), and 500*(f-32)/9 gives hundredths. This technique is used in embedded systems, DSP, and anywhere floating-point hardware is unavailable or too slow.

The %d\t%d\n format string — %d prints a signed decimal integer. \t inserts a tab character, creating aligned columns. \n ends the line. The two arguments f and c correspond positionally to the two %d specifiers. First %d prints f, second prints c. This positional correspondence is enforced by the programmer, not the compiler — which is why the fartocelv2.c bug was not a compile error but a runtime disaster.


Discovery → fartocelv5.c — Integer Division Truncation

(5/9)*(f-32) — all zeros — proving integer truncation experimentally

fartocelv5.c — exact source — BODMAS experiment
#include <stdio.h>
int main(void) {
    int u, l, s, c, f;
    u = 1000;
    l = 0;
    s = 10;
    f = l;
    while (f <= u) {
        c = (5/9) * (f - 32);  // testing BODMAS — 5/9 evaluates first
        printf("%d\t%d\n", f, c);
        f = f + s;
    }
    return 0;
}
// this code is perfect. But wait, let me try some bodmas..
fartocelv5.c.output — exact output — every celsius is 0
10    0
20    0
30    0
40    0
50    0
... all 99 rows: celsius = 0 ...
1000  0

// your notes from the output file:
// "I did not expect this output, seems like the 5/9 truncates."
// "0.234...something to just 0"
// "mathematically anything multiplied by zero is zero. Oh, yeah."
// "By this we can tell that 5/9 is not correct. There should probably
//  exist some better library to do math if im not wrong"

// Deep Explanation — integer division at the CPU level, and floating point preview

Integer division at the machine level — The x86_64 instruction for signed integer division is idiv. It takes a 64-bit dividend in the rdx:rax register pair and a divisor operand, and produces a quotient in rax and a remainder in rdx. To compute 5/9 this way: put 5 in rax, sign-extend into rdx:rax, divide by 9. Quotient = 0 (goes into rax, used as the result). Remainder = 5 (goes into rdx, discarded). (For a constant expression like 5/9 the compiler actually folds the result to 0 at compile time — no idiv is emitted — but the semantics are identical.) The remainder is always discarded in C integer division. No rounding, no warning, no error. 5/9 = 0. Period.

The rule — when does integer vs floating-point math apply? — In any arithmetic expression in C: if both operands are integer types (int, long, etc.), integer arithmetic is used. If either operand is a floating-point type (float, double), floating-point arithmetic is used. So:

5 / 9       = 0        both int → integer division → truncate
5.0 / 9     = 0.5555   one double → floating-point division
5 / 9.0     = 0.5555   one double → floating-point division
(double)5/9 = 0.5555   cast to double → floating-point division
5 * 68 / 9  = 37       multiply first → large numerator → better int approx

Your intuition: "There should probably exist some better library to do math" — Yes, and it's built into the language. float and double are C primitive types that use IEEE 754 floating-point representation. double is 64 bits, gives ~15 significant decimal digits of precision. To use them: declare float c, f; (or double), use 5.0/9.0 or 5.0/9 in your formula, and use %f or %.2f in printf instead of %d. You'll cover this in the next exercise.

Fixed-point recap — The trick in fartocelv4.c (5*(f-32)/9) avoids floating point by multiplying before dividing. This is called fixed-point arithmetic — manipulating integers in a way that preserves fractional precision by keeping the numbers large. It's used everywhere floating-point hardware is absent or too slow: microcontrollers, embedded systems, digital signal processing, game physics engines, financial software (where floating-point rounding errors on currency are unacceptable).

1.3 — Variable Declarations · Exploratory

Not a K&R exercise. A self-directed investigation into C variable declaration syntax — initiated by one question: can you initialise multiple variables the way Rust does? Seven versions, four compiler errors, one accidental discovery about function shadowing, and the first real Valgrind run. Every corner explained.

declaration syntax · declaration lists · initialisation · uninitialised variables · undefined behavior · stack memory · preprocessor · implicit function declaration · function shadowing · linker symbol resolution · valgrind memcheck · clang diagnostics
error
declarationv1.c
int u,z = 300,400 — tried Python/Rust-style value list. Clang rejected it with two errors.
error
declarationv2.c
int u,z = 300,200;; — added a semicolon following LSP suggestion. Same errors. Confirmed the comma is not a value separator.
wrong assumption
declarationv3.c
int u,z = 300 — expected both to be 300. Output: u=0, z=300. u was uninitialised, not "defaulted".
working
declarationv4.c
Separate int x = val per line. Clean compile, correct output. Identified the verbosity trade-off.
working
declarationv5.c
Everything collapsed to one line. Proved whitespace is irrelevant to the compiler. First deep note on #include.
intentional error
declarationv6.c
#include <stdio.h> commented out. Confirmed printf becomes undeclared. Proved the preprocessor is not optional.
key discovery
declarationv7.c
Defined a stub int printf() {return 0;} instead. Compiled with warnings. Ran. Printed nothing. Valgrind: zero errors. Function shadowing discovered.

Error 1 → errors/declarationv1.c.error

int u,z = 300,400 — the value-list syntax does not exist in C

declarationv1.c — exact source
#include <stdio.h>
int main(void) {
    int v = 100;
    int u,z = 300,400;  // guess this syntax is not valid, the LSP spotted it.
                         // But let it be an error.
    printf("%d\t%d\t%d\n", v, u, z);
}
clang output — exact (you used clang here, not gcc)
clang declarationv1.c -o declarationv1
declarationv1.c:7:19: error: expected identifier or '('
    7 |     int u,z = 300,400;
      |                   ^
declarationv1.c:7:19: error: expected ';' at end of declaration
    7 |     int u,z = 300,400;
      |                   ^
      |                   ;  -- here? What is this telling?
2 errors generated.

// your note: "Ah, wait, it is asking me to insert a semicolon and the error is resolved?"

// Deep Explanation — why this failed and what clang was really telling you

What you were attempting: You wanted the equivalent of Rust's let (u, z) = (300, 400); or Python's u, z = 300, 400. In both those languages, the commas on the left and right sides form a tuple/destructuring assignment — a language feature that maps values positionally. C has no such feature. C's declaration grammar is fundamentally different.

How the C declaration grammar actually works: In C, a declaration has the form type declarator-list ;. The declarator-list is a comma-separated sequence of individual declarators, where each declarator is either a bare name (u) or a name with an initialiser (z = 300). The grammar is: int name1, name2 = val2, name3 = val3;. Each initialiser belongs exclusively to the name it immediately follows. There is no provision anywhere in the C grammar for a second comma-separated list of values on the right-hand side.

What the parser actually did when it hit your line: The C parser is a state machine. It read int and entered declaration mode. It read u — valid declarator, no initialiser. It read , — separator, expect another declarator. It read z — valid declarator. It read = — initialiser follows. It read 300 — valid integer literal, initialiser for z complete. Then it read , — at this point the parser expected either another declarator name (an identifier) or a semicolon to end the declaration. It got 400 — a number, which cannot be a declarator name. That is the exact source of "expected identifier or '('". The '(' in that message refers to function declarator syntax like int foo().

Why the second error says "expected ';' at end of declaration": Once the parser failed at column 19, it attempted error recovery. It looked at what it had so far (int u, z = 300) and suggested that a semicolon there, ending the declaration at that point, would make everything up to that position syntactically valid. With the stray comma gone as well, the leftover 400; would parse as a bare expression statement: valid C, even if useless. So the suggestion was technically plausible but not what you wanted at all.

Your inference from this: "We are allowed to declare the variable as well as initialise it at the same time." — Yes, and this is important. C has always supported declaration with initialisation as a single statement. This is different from Rust's let in that C does not require a keyword — just type name = value;. The syntax you were looking for that actually works is int u = 300, z = 400; — one int, two declarators, each with their own initialiser.


Error 2 → errors/declarationv2.c.error

int u,z = 300,200;; — the semicolon did not help, and it confirmed something important

declarationv2.c — exact source
#include <stdio.h>
int main(void) {
    int v = 100;
    int u,z = 300,200;;  // added semicolon after LSP instruction...
                           // let's check this, what happens?
    printf("%d\t%d\t%d\n", v, u, z);
}
clang output — exact
clang declarationv2.c -o declarationv2
declarationv2.c:7:19: error: expected identifier or '('
    7 |     int u,z = 300,200;;
      |                   ^
declarationv2.c:7:19: error: expected ';' at end of declaration
    7 |     int u,z = 300,200;;
      |                   ^
      |                   ;
2 errors generated.

// your inference: "This means we cannot assign like Python or Rust does.
// But also: u,z = 300 → we are allowed to assign multiple variables the same value."

// Deep Explanation — why the semicolon changed nothing and what your inference got right and wrong

Why the semicolon did not fix it: The error occurs at column 19 — the position of the , after 300. This is inside the declaration, before any semicolon you could add at the end. The parser fails at that comma regardless of what follows it. Adding a second ; at the end of the line only added a second empty statement after the first — it did not move the parse failure point.

Your inference "u,z = 300 → we can assign multiple variables the same value": This inference was actually wrong, and v3 proved it. int u,z = 300 does not assign 300 to both. The = 300 binds only to z. In C there is no multi-target initialiser. The way to assign the same value to multiple variables at declaration is: int u = 300, z = 300; — explicitly writing the value for each. The only context where you can assign the same value to multiple variables in a single statement in C is using assignment chaining: u = z = 300; — but this only works as an assignment expression, not inside a declaration.

What clang's diagnostic system was doing across v1 and v2: Clang's error messages are designed to be precise about location. The caret ^ points to the exact token that caused the failure. The suggested fix (the ; hint with the pipe character) is clang's FixIt system — it proposes an edit that would make the code syntactically valid from that point forward, even if the result is not semantically what you intended. FixIt suggestions are useful for typos and simple mistakes, but they can mislead when the underlying intent is architecturally incompatible with the grammar.


Error 3 → errors/declarationv3.c.error — Wrong Assumption + Real Danger

int u,z = 300 — u printed 0, but it was not "defaulted". It was undefined behavior.

declarationv3.c — exact source
#include <stdio.h>
int main(void) {
    int v = 100;
    int u, z = 300;  // added semicolon after LSP instruction...
    printf("%d\t%d\t%d\n", v, u, z);
}
actual output
100    0    300

// your note: "int u,z = 300 defaults to u = 0, z = 300. Why is that so? whats the matter?"

// Deep Explanation — uninitialised variables, the stack, and undefined behavior

Why only z got 300: int u, z = 300 is parsed as a declaration list with two declarators: u with no initialiser, and z with initialiser 300. In C grammar, an initialiser belongs to the declarator it immediately follows. There is no shared or broadcast initialiser. The compiler allocates stack space for both u and z, writes 300 into z's slot, and leaves u's slot untouched.

u did not "default to 0" — this is the critical correction: The C standard is explicit: reading an uninitialised automatic (stack) variable is undefined behavior. The standard literally says the program can do anything — produce any output, crash, corrupt data, or behave inconsistently across runs. On your machine, at that moment, the bytes at u's stack address happened to be zero — because the operating system zero-fills new pages before giving them to a process (a security measure to prevent one process reading another's memory). But this is an OS implementation detail, not a C language guarantee. Had the same declaration sat in a function called later in the program, after earlier calls had left non-zero bytes on the stack, the value would have been different.

What the stack actually looks like at that moment:

stack frame for main() after "int u, z = 300" executes:

address    name   contents       notes
0x...ffa0  v      0x00000064     ← 100 decimal, set by "int v = 100"
0x...ffa4  u      0x????????     ← UNINITIALISED — whatever was here before
0x...ffa8  z      0x0000012c     ← 300 decimal, set by "= 300"

printf reads u's slot and prints its contents as %d.
You saw 0. A different run could show any 32-bit integer.

Why it happened to be 0 on your run: The Linux kernel zero-fills all memory pages before mapping them to a process (syscall mmap with MAP_ANONYMOUS sets pages to zero). Your program's stack lives on pages that were freshly allocated at process start and had not yet been written to by any function call above main in the call chain. The C runtime startup code (crt1.o) that calls main does use some stack, but on x86_64 it had not written to the particular 4-byte slot that became u at your stack pointer offset. Different programs, different compilers, different systems, different results.

The threats this creates in real code: Uninitialised variables are one of the most common sources of security vulnerabilities in C programs. If a security-sensitive value (a permission flag, a buffer length, a pointer) is read before being written, an attacker who can influence the stack layout — through crafted input that controls stack usage in prior function calls — can control what value gets read. This class of bug has been the root cause of real CVEs in production systems.

How to catch this: Compile with -Wall — in simple cases gcc warns: 'u' is used uninitialized. For cases the compiler misses, compile with clang's -fsanitize=memory and the MemorySanitizer (MSan) runtime will catch the read at runtime and print a detailed report (note that UBSan, -fsanitize=undefined, does not track uninitialised reads). Valgrind's Memcheck tool with --track-origins=yes will also detect it and trace the uninitialised value back to its source. The simplest defence is a standing rule: every local variable is initialised at the point of declaration, always. int u = 0; costs exactly zero runtime overhead on any modern compiler — it becomes a single mov instruction.


Working → declarationv4.c

Separate int x = val per line — correct, clean, verbose

declarationv4.c — exact source
// let me try initialising C variables in rust way, will it work?
// the K and R does not mention this.

#include <stdio.h>

int main(void) {
    int v = 100;
    int u = 300; // After going through the tedious process of variable
    int z = 200; // declaration I realised a fact.
    printf("%d\t%d\t%d\n", v, u, z);
}

// is there a good way to declare variables?
// int v, u, z; v = 100; u = 300; z = 200; → valid? Going to check in v5.
// using int 3 times = 3 extra keywords. Huh.
output
100    300    200

// Deep Explanation — the three valid declaration forms and the trade-offs between them

All three forms that do what you wanted, and when to use each:

// Form 1 — one declaration per line (what v4 does)
int v = 100;
int u = 300;
int z = 200;
→ most readable. each variable's intent is on its own line.
  preferred by Linux kernel style, LLVM style, most modern guides.

// Form 2 — declaration list with individual initialisers (the C idiom)
int v = 100, u = 300, z = 200;
→ concise. all three are guaranteed initialised.
  used in K&R throughout. valid and clean.
  this is what you were trying to write in v1 and v2.

// Form 3 — declare first, assign later
int v, u, z;
v = 100;  u = 300;  z = 200;
→ only use when initial values come from computation.
  dangerous because there is a window between declaration
  and assignment where the variables are uninitialised.

Your observation about typing int three times: It is a real trade-off, and it is the reason C programmers often prefer Form 2 for related variables. The extra keystrokes in Form 1 buy you something important though: each line is a complete, self-contained statement. When you are reading code months later or debugging, being able to see a variable and its initial value on a single line without scanning a comma-separated list is valuable. Form 1 also makes it trivial to add a comment explaining why a specific variable has a specific initial value — something much harder to do cleanly in Form 2.

What the compiler generates for all three forms is identical: Three mov instructions writing the constants 100, 300, and 200 to their respective stack offsets. The compiler does not care which form you use — the object code is byte-for-byte the same. This is a pure readability and maintenance decision.


Working → declarationv5.c

Everything collapsed to one line — valid C, and the first deep note on #include

declarationv5.c — exact source
#include <stdio.h>
// What is the purpose of this include? This is a C preprocessor directive,
// which copies the contents of stdio.h from the default lib location into
// this file in static memory during compilation.
// <> = search system include paths first.
// "" = search current directory first, then system paths.
// What happens without this? Check in v6.

int main(void) { int v = 100; int u = 300; int z = 200; printf("%d\t%d\t%d\n", v, u, z); }

// Ah, the valid and ugly syntax. I mean this is not a syntax anymore.
// Just raw instructions separated inside a function by `;`
// just like previous bash scripts. C is essentially a bash script! (just kidding.)
// But tbh, even bash has a syntax, but here this seems something easy?

// Deep Explanation — why C ignores whitespace, what #include actually does, and the bash comparison

Why v4 and v5 produce identical object code: The C compiler's first pass is the lexer — it reads the raw character stream and converts it into a sequence of tokens: keywords (int, return), identifiers (main, printf), literals (100), operators (=), and punctuation ({, ;, }). Whitespace characters — spaces, tabs, newlines — are used only as token separators when needed to prevent two adjacent tokens from merging (int v must have a space so it is not lexed as the identifier intv). Beyond that, all whitespace is discarded. The parser never sees it. The compiler produces the same AST (abstract syntax tree) from both versions, and therefore the same machine code.

Your note on #include was accurate and precise: The preprocessor runs before the compiler. It opens /usr/include/stdio.h and performs a textual substitution — inserting the entire file content at the location of the #include directive. The result is a single expanded source file that is then handed to the compiler. The difference between <> and "" is exactly as you described: angle brackets tell the preprocessor to search only in the system include directories (/usr/include, /usr/local/include, and paths passed via -I). Quotes tell it to look in the directory containing the current file first, then fall back to the system paths. You would use quotes for your own header files: #include "mylib.h".

Your "C is a bash script" observation: There is genuine structural similarity worth thinking about. Both C and shell use ; as a statement terminator and {} as block delimiters. Both were designed in the same era (early 1970s Unix) and share stylistic DNA. The key difference is execution model: a shell script is interpreted line by line at runtime. A C program is compiled to machine code before it ever runs — the CPU executes the binary directly, with no interpreter in the middle. The one-liner formatting looks like shell, but what gets executed is completely different: machine instructions, not text processing.


Intentional Error → errors/declarationv6.c.error

#include commented out — printf is undeclared. The preprocessor is not optional.

declarationv6.c — exact source
// #include <stdio.h>   ← intentionally commented out
// What happens without this preprocessor include?
// Will printf not be available? Check in v6.
int main(void) { int v = 100; int u = 300; int z = 200; printf("%d\t%d\t%d\n", v, u, z); }
clang output — exact
declarationv6.c:7:57: error: call to undeclared library function 'printf'
      with type 'int (const char *, ...)'; ISO C99 and later do not
      support implicit function declarations [-Wimplicit-function-declaration]
    7 | int main(void) { ... printf("%d\t%d\t%d\n", v, u, z); }
      |                       ^
note: include the header <stdio.h> or explicitly provide a declaration for 'printf'
1 error generated.

// your note: "attempt to call undeclared library function.
// Or we should at least declare the function. Then let me declare that empty function."

// Deep Explanation — what #include copies, what the compiler actually needs, and the history of implicit declarations

What stdio.h actually contains: It is not magic. Open it yourself: cat /usr/include/stdio.h. You will find — among other things — a line roughly like: extern int printf(const char *__restrict __format, ...) __attribute__((__format__(__printf__, 1, 2)));. This is a function declaration — it tells the compiler: printf is a function that accepts a const char pointer and a variable number of additional arguments, and returns an int. The compiler needs this to type-check your call, to verify the number and types of arguments, and to know how to set up the call on the stack/in registers. The actual implementation of printf — the code that does the formatting and writes to stdout — lives in libc.so.6 on your system, linked in by the linker at the end. The header is just the declaration. The library is the implementation.

Why ISO C99 removed implicit function declarations: In original K&R C (1978) and C89, calling an undeclared function was legal. The compiler would implicitly assume the function returned int and accepted whatever arguments you passed. This caused two classes of silent bugs. First: if the function actually returned a pointer (64-bit on x86_64), but the compiler assumed int (32-bit), the 64-bit return value was silently truncated — the upper 32 bits lost. Dereferencing that truncated pointer would crash or corrupt memory. Second: with no prototype, the compiler could not check argument types or count, so passing the wrong number of arguments of the wrong types was silently accepted and produced undefined behavior at runtime. C99 made implicit function declarations a hard error. Clang inherits this. The flag -Wimplicit-function-declaration in the error message names the category of diagnostic.

The compiler's note: "include the header or explicitly provide a declaration": This is precisely the two-path fork you took. The first path — include the header — is what every normal program does. The second path — explicitly provide a declaration — is what you tried in v7, and it opened up something interesting.


Key Discovery → declarationv7.c — Function Shadowing + First Valgrind Run

Defining a stub printf() — compiled with warnings, ran, printed nothing, Valgrind reported zero errors

declarationv7.c — exact source
// #include <stdio.h>   ← intentionally commented out
int printf() { return 0; }
// What happens without the preprocessor include?
// Will printf not be available? Check in v6.
// Answer from v6: errors. So let me declare that empty function instead.

int main(void) { int v = 100; int u = 300; int z = 200; printf("%d\t%d\t%d\n", v, u, z); }
gcc -Wall -Wextra -Wpedantic -std=c11 warnings
declarationv7.c:4:5: warning: conflicting types for built-in function 'printf';
      expected 'int(const char *, ...)' [-Wbuiltin-declaration-mismatch]
    4 | int printf() {return 0;}
      |     ^~~~~~
declarationv7.c:4:5: note: 'printf' is declared in header '<stdio.h>'
declarationv7.c:4:5: warning: number of arguments doesn't match built-in prototype
    4 | int printf() {return 0;}
      |     ^~~~~~
// compiled despite warnings. binary produced. ran. printed nothing. exit 0.
declarationv7.valgrind.output — both runs, exact
// run 1: valgrind --leak-check=full ./declarationv7
==18167== Memcheck, a memory error detector
==18167== Using Valgrind-3.25.1 and LibVEX; rerun with -h for copyright info
==18167== Command: ./declarationv7
==18167==
==18167== HEAP SUMMARY:
==18167==     in use at exit: 0 bytes in 0 blocks
==18167==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==18167==
==18167== All heap blocks were freed -- no leaks are possible
==18167==
==18167== For lists of detected and suppressed errors, rerun with: -s
==18167== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

// run 2: valgrind --show-error-list=all --leak-check=full ./declarationv7
==18222== HEAP SUMMARY:
==18222==     in use at exit: 0 bytes in 0 blocks
==18222==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==18222==
==18222== All heap blocks were freed -- no leaks are possible
==18222== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

// Deep Explanation Part 1 — what the gcc warnings mean, line by line

-Wbuiltin-declaration-mismatch: gcc has built-in knowledge of certain standard library functions — particularly printf, malloc, memcpy, and others. This built-in knowledge is used for optimisation: gcc can replace a simple printf("hello\n") with a direct call to puts("hello") (which is faster), or inline a small memcpy as a few register moves. To do these transformations safely, gcc must verify that the function being called actually matches its expected signature. When gcc sees your int printf() { return 0; }, it compares your declaration against its internal prototype — int printf(const char *format, ...) — and finds a mismatch. Your function takes no arguments and the real one takes a variadic list. Hence the warning.

number of arguments doesn't match built-in prototype: This is a more specific elaboration of the same mismatch. Your int printf() uses the old C syntax of unspecified arguments (which means "I accept any number of arguments of any type" — not "I accept no arguments"). The real printf is declared as variadic with .... gcc flags this because its optimisations for printf depend on the exact prototype being correct.

Why gcc warned but still compiled: These are warnings, not errors. A warning means "this is probably wrong and you should look at it," not "I refuse to continue." The code is syntactically valid C. Declaring a function named printf with your own implementation is legal in C. gcc notes the mismatch and moves on. To make all warnings into errors — which you should do in production — add -Werror to your compile flags. Then this would have stopped the build.

// Deep Explanation Part 2 — what actually happened at runtime (function shadowing)

Your stub was called instead of the real printf: When the compiler processed the call printf("%d\t%d\t%d\n", v, u, z) in main, it looked up the name printf and found your definition in the same translation unit (your .c file). It generated a call to your function. Your function returned 0 and did nothing else. No output was produced. The program exited with status 0.

How the linker handles symbol conflicts: When the linker combines your object file with the C standard library (libc.so.6), it uses a simple rule: the first definition wins. Your object file is processed before the library. Your printf symbol is found first. The linker uses it. The real printf in libc is never referenced. This is called function shadowing or symbol interposition, and it is a deliberate and powerful mechanism in the C toolchain.

Real-world uses of function shadowing: This is not a footgun — it is a tool. Unit testing frameworks use it to mock system calls: define your own malloc that tracks allocations, your own open that returns a fake file descriptor. Memory allocator replacement works the same way — the jemalloc and tcmalloc libraries replace the system malloc by defining their own symbols with the same names. Security researchers use it to audit programs by interposing on library calls. On Linux, LD_PRELOAD lets you do this without recompiling — you load a shared library whose symbols take precedence over libc at runtime.

The threat in your specific case: If you had accidentally defined a stub for malloc or free instead of printf, your program would silently use the stub — never allocating real memory, returning NULL or garbage for every allocation, and crashing in hard-to-diagnose ways. The compiler warning was your only protection. This is why -Wall -Wextra must always be on. The warning about the built-in prototype mismatch was precisely the signal that you had shadowed a known function.

// Deep Explanation Part 3 — reading the Valgrind output in full detail

The PID prefix ==18167==: Every line of Valgrind output is prefixed with the process ID of the program being analysed, enclosed in double equals signs. This lets you separate Valgrind's output from your program's stdout when both are being written to the terminal simultaneously. In your output, both runs show different PIDs (18167 and 18222) because each ./declarationv7 invocation creates a new process.

Memcheck, a memory error detector: Valgrind is not a single tool — it is a framework. Memcheck is the default tool (selected by --tool=memcheck, which is the default when you omit --tool). Memcheck instruments every memory read and write in your program, tracking which bytes have been allocated, freed, initialised, and accessed. It runs your program inside a virtual machine — which is why Valgrind-instrumented programs run 10–50x slower than native execution.

Using Valgrind-3.25.1 and LibVEX: LibVEX is Valgrind's IR (intermediate representation) compiler. It translates your program's machine code into VEX IR — a portable intermediate language — instruments it with memory checks, then re-compiles it to machine code for execution. This is how Valgrind intercepts every memory operation without modifying your source code.

HEAP SUMMARY: in use at exit: 0 bytes in 0 blocks: At the moment your program exited (returned from main and the C runtime called _exit()), Valgrind found zero bytes allocated on the heap and not yet freed. "In use at exit" means currently allocated — memory that was allocated with malloc/calloc/realloc and not yet freed. Zero here means either your program never used the heap, or everything it allocated was freed before exit. In your case: your stub printf does nothing, which means libc's stdio never initialised its internal buffers (which are heap-allocated). No heap usage at all.

total heap usage: 0 allocs, 0 frees, 0 bytes allocated: The lifetime total across the entire run. Zero allocations, zero frees. This confirms that your stub printf prevented any libc stdio activity — the real printf would have triggered at minimum one malloc call to set up the stdio buffer on first use. The fact that this shows zero is itself a diagnostic: it tells you your stub successfully intercepted the call.

All heap blocks were freed -- no leaks are possible: A clean bill of health. Since there were zero allocations, there cannot be any leaks. This line would change to a detailed leak report if any malloc'd memory was not freed before exit.

For lists of detected and suppressed errors, rerun with: -s: Valgrind has a suppression system — certain known benign issues in system libraries (particularly libc and the dynamic linker) are pre-suppressed so they do not pollute your output. The -s flag shows you what was suppressed. In production debugging you would check this to ensure nothing important is being hidden.

ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0): The final verdict. Zero memory errors detected in zero distinct code locations (contexts). The second set of zeros — suppressed: 0 from 0 — means Valgrind did not suppress anything either. A truly clean run. Your second invocation with --show-error-list=all added that flag to also display any suppressed errors — and still got zero, confirming the result.

Why Valgrind reported perfectly clean results for a broken program: This is the important lesson. Valgrind reported zero errors because your stub printf genuinely produced no memory errors — it allocated nothing, freed nothing, and accessed no invalid memory. But the program was still wrong: it printed nothing when it should have printed three numbers. Valgrind measures memory safety, not program correctness. A program that does nothing has no memory errors — but it also does nothing. Valgrind clean does not mean the program is correct. It means the program has no memory-safety violations.


Conclusions — seven experiments, five things established

What this session proved

Question → Answer
Can you initialise multiple variables with a comma-separated value list like int u,z = 300,400? No. C has no such syntax. The comma separates variable names, not values. The correct form is int u = 300, z = 400;
Does int u,z = 300 assign 300 to both u and z? No. Only z gets 300. u is uninitialised. Reading it is undefined behavior — the 0 you saw was coincidence, not a language guarantee.
Does whitespace matter to the C compiler? No. v4 and v5 produce byte-for-byte identical object code. The lexer discards all non-essential whitespace before parsing begins.
What happens if you remove #include <stdio.h>? The compiler errors: printf is undeclared. The header provides the function declaration the compiler needs to type-check the call. The implementation lives in libc.
Can you replace a library function by defining one with the same name? Yes. Your definition shadows the library version. The linker uses your symbol first. Valgrind reported zero errors — because your stub did nothing, including no I/O and no heap activity.
The one thing to fix going forward: In v3, your note said u "defaulted to 0." It did not default — it was undefined. The rule from here on is: every local variable is initialised at the point of declaration, every time. int u = 0; costs nothing at runtime — the compiler emits a single mov instruction — and eliminates an entire class of bugs that have caused real security vulnerabilities in production C code. If you do not have a meaningful initial value yet, use 0 or -1 as a sentinel and document why.
1.4 — #include Internals · Exploratory

Not a K&R exercise. A direct experiment to prove what #include actually does at the file system level — initiated by one question from 1.3: if <> searches the system and "" searches the current directory first, what happens if you copy the real stdio.h locally and modify it? Two versions, one renamed file, one custom function injected into a system header, and a clean proof that file extensions mean nothing to the C preprocessor.

#include mechanics preprocessor <> vs "" file extension convention include guards function definitions in headers declarations vs definitions libc symbol resolution
working · v1
hello_again.c + stdio.h (modified)
Copied real stdio.h locally. Added try_func() definition at the bottom. Included with "stdio.h" (quotes). Both try_func() and printf worked.
key discovery · v2
hello_again.c + stdio.c (renamed)
Renamed stdio.h → stdio.c. Changed include to "stdio.c". Still worked. Proved: the .h extension is a human convention, not a compiler rule.
Background — the question this was answering

From 1.3: you already knew "" searches locally. So what if you own the local copy?

In 1.3 you established that #include <stdio.h> uses the system copy and #include "stdio.h" searches the current directory first. The natural next question: if you physically place a file named stdio.h in your working directory and include it with quotes, will the preprocessor use yours instead of the system's? And since it pastes the file in verbatim, you can put anything in it — including your own function definitions. 1.4 is the proof of concept for that entire chain of reasoning.


Working · v1 — hello_again.c + modified stdio.h

Copy the real stdio.h, inject a function, include it with quotes — it works

hello_again.c — exact source
#include "stdio.h"

int main(void) {
  try_func();
  printf("This is the real hello world...\n");

} // What about renaming this header to stdio.c? what happens?
  // try that in v2
tail of local stdio.h — the appended function
/* ... 978 lines of real stdio.h ... */

#endif

__END_DECLS

void try_func(void) {
  printf("Hello World from anonymous?\n");
}


#endif /* <stdio.h> included. */
terminal — compile and run
 gcc -Wall -Wextra -Wpedantic -std=c11 -o hello_again hello_again.c
 ./hello_again
Hello World from anonymous?
This is the real hello world...

Deep Explanation — what the preprocessor did, step by step

The compile invocation: When you ran gcc ... hello_again.c, the first thing gcc launched was cpp — the C preprocessor — on your source file. The preprocessor scans the file top to bottom looking for lines beginning with #.

The include directive with quotes: It hit #include "stdio.h". The double-quote form tells the preprocessor: search for stdio.h starting in the directory containing the current source file (Basics/1.4/v1/). It found stdio.h there — your local copy. It opened it, and performed a textual substitution: the entire content of that file was inserted inline, at exactly the position of the #include line. The #include directive itself disappeared.

What "entire content" means: All 985 lines of your modified stdio.h — including every macro definition, every type alias, every extern function declaration, and your appended try_func definition — were pasted into the translation unit before the compiler saw a single token of your actual code. By the time the compiler started parsing, from its perspective, try_func was already defined above main.

Why printf still worked: Your local stdio.h is the real system header — you copied it verbatim from /usr/include/stdio.h. It still contains the real extern int printf(const char *__restrict __format, ...) declaration. The compiler saw that declaration (from your local copy), type-checked your printf call correctly, and emitted a call instruction. At link time, the linker found printf's actual implementation in libc.so.6 as it always does — that part is unaffected by which copy of the header you used. The header only provides the declaration. The library provides the implementation.

Why you placed try_func after __END_DECLS: The real stdio.h wraps its entire content between __BEGIN_DECLS and __END_DECLS. These are macros that expand to extern "C" { and } respectively when compiled as C++ (to suppress C++ name mangling on C function names). In a pure C compilation they expand to nothing. Appending after __END_DECLS keeps your function outside that block — a reasonable placement, though in a C-only context it makes no functional difference.

Definition — #include, the preprocessor, and the two include forms

What #include actually is, and what the angle-bracket vs quote distinction does

the two forms
#include <stdio.h>   // angle brackets — system include paths only
                         // searches: /usr/include, /usr/local/include,
                         // and any -I paths passed to gcc
                         // does NOT search the current directory

#include "myfile.h"  // double quotes — current directory first
                         // searches: directory of the source file being compiled
                         // then falls back to the same paths as <>
                         // use this for your own headers

Deep Explanation — the preprocessor as a pure text transformer

The preprocessor is not a compiler: It does not parse C syntax. It does not understand types, expressions, or control flow. It is a stream processor that reads text, responds to directives (lines beginning with #), and outputs transformed text. The three things it does: file inclusion (#include), macro substitution (#define), and conditional compilation (#ifdef / #if / #endif). That is the entire job. The output — a single merged text file — is what the actual C compiler (cc1, inside gcc) receives.

#include as literal file paste: When the preprocessor encounters #include "stdio.h", it performs a single operation: open the file, read every byte, and emit those bytes into the output stream at the current position. There is no parsing, no validation, no type-checking. If you put a Python script in that file, the preprocessor would paste it in without complaint — the compiler would then fail to parse the result. The preprocessor does not care what is in the file. It only cares that the file exists and is readable.

See it yourself: Run gcc -E hello_again.c — the -E flag stops compilation after preprocessing and dumps the expanded output to stdout. You will see your ~985 lines of stdio.h pasted in, followed by your try_func definition, followed by main. That is exactly what the compiler parses. This is one of the most educational things you can do with any non-trivial C file.

The search algorithm in full detail: For #include "file", the preprocessor searches in this order: (1) the directory containing the file that contains the #include directive — not the current working directory of the shell, but the directory of the source file. (2) the directories of any file that included that file (following the chain of inclusions upward). (3) the same system paths that <> uses. For #include <file>, only step (3) applies. This means if you have a header at src/utils/mylib.h that includes "config.h", the preprocessor looks for config.h in src/utils/ first — regardless of where you invoked gcc from.

Definition — Include Guards

The #ifndef / #define / #endif pattern at the top of every real header

top and bottom of stdio.h — the include guard
/* line 24 of stdio.h */
#ifndef _STDIO_H          // "if _STDIO_H is not yet defined..."
#define _STDIO_H  1       // "...define it now, so next time this check fails"

/* ... entire header content ... */

#endif                     // "end of the conditional block"
/* line 985 — last line of stdio.h */

Deep Explanation — why include guards exist and what happens without them

The problem they solve: Large C programs have many source files, each including multiple headers, and those headers may include other headers. It is common for stdio.h to be included — directly or indirectly — five or ten times in a single translation unit once all the inclusions are resolved. Without a guard, the preprocessor would paste the full contents of stdio.h five or ten times into the output. This causes redefinition errors: repeating a plain declaration like extern int printf(...); is actually legal, but repeating definitions is not — the compiler sees the same struct definitions a second time (and, before C11, the same typedefs, such as the one behind FILE) and errors out.

How the guard works: The first time the preprocessor encounters #include <stdio.h>, _STDIO_H is not defined. The #ifndef _STDIO_H condition is true, so the block executes. The first thing inside the block is #define _STDIO_H 1 — it immediately defines the macro. All the real header content follows. The #endif closes the block. The second time the preprocessor sees #include <stdio.h> in the same translation unit, _STDIO_H is now defined. The #ifndef condition is false. The entire block — all 960-odd lines — is skipped. Zero redefinitions.

Why your appended try_func was outside the guard: You placed it after __END_DECLS but there is a subtlety: look at the end of the real file — the final #endif is the one that closes the #ifndef _STDIO_H guard. Anything you append after that final #endif is outside the guard and will be included every single time the header is pasted in. In a multi-file project where two source files both include your modified stdio.h, you would get two definitions of try_func and the linker would error: multiple definition of 'try_func'. For a single-file experiment it works fine, but this is exactly why you do not put function definitions in headers in real code — only declarations.

Modern alternative — #pragma once: Many compilers, including gcc and clang, support #pragma once at the top of a header as a shorthand for the entire ifndef/define/endif pattern. It is not in the ISO C standard (it is a compiler extension), but it works on every compiler you will realistically use on Linux. System headers use the guard pattern because they must stay portable to compilers that do not support the pragma.

Definition — Declarations vs Definitions

The most important distinction in C: what a header should contain, and why

declarations vs definitions — annotated
// DECLARATION — tells the compiler a thing exists and what type it is
// no storage allocated, no code emitted, can appear many times
extern int printf(const char *__format, ...);  // from stdio.h
extern int some_global;                         // variable declared elsewhere

// DEFINITION — the actual thing: allocates storage or emits code
// must appear exactly once across the whole program
void try_func(void) {                           // function definition
  printf("Hello World from anonymous?\n");      // has a body — emits machine code
}
int some_global = 42;                          // variable definition — allocates 4 bytes

Deep Explanation — the one definition rule and why it matters for headers

The One Definition Rule (ODR): In C, a function or variable may be declared as many times as needed — the compiler treats each declaration as a promise that the thing exists somewhere. But it must be defined exactly once across the entire program. The linker's job is to match every reference to a name with exactly one definition. If it finds two definitions of the same name, it errors: multiple definition of 'try_func'. If it finds zero, it errors: undefined reference to 'try_func'.

Why headers contain only declarations: A header is included in potentially many .c files. Each .c file is compiled separately into its own .o object file. If the header contained a function definition, that definition would be compiled into every .o that included the header. The linker would then find multiple copies of the same symbol and fail. By putting only declarations in headers, you allow many files to know about a function without each of them containing the implementation. One .c file contains the definition; all others just use the declaration from the header.

What stdio.h actually contains: Run cat /usr/include/stdio.h and search for printf. You will find a line like: extern int printf (const char *__restrict __format, ...) __attribute__((__format__(__printf__, 1, 2))); That is a declaration, not a definition — it has no body, no braces. The extern keyword makes this explicit: "this thing exists, but not here." The actual printf implementation — hundreds of lines of formatting logic — is in the glibc source, compiled into /usr/lib/libc.so.6. The header is the menu. The library is the kitchen.

The __attribute__((__format__(...))) annotation: This is a gcc extension that tells the compiler "the first argument to this function is a printf-style format string, and the additional arguments start at position 2." This is how gcc implements -Wformat checking — the warning you saw in 1.2 when you called printf("%d\t%d\n") with no arguments. Without this attribute, gcc would have no way to know that printf's first argument controls the types of its remaining arguments. It is the same mechanism that would let you write your own printf-style logging function and get format-string warnings for it too, by applying the same attribute to your function declaration.


Key Discovery · v2 — hello_again.c + stdio.c (renamed)

Rename stdio.h → stdio.c, change the include to "stdio.c" — still compiles, still runs

hello_again.c v2 — only change is the include filename
#include "stdio.c"   // ← was "stdio.h" in v1. renamed file. still works.

int main(void) {
  try_func();
  printf("This is the real hello world...\n");

} // Ok, it works, so it : anything.h is just a representation that it is not
  // a normal c file, but a headers file which does not include any executable
  // stuffs, but only definitions and declarations of Constants and those
  // functions are actually present in another file. libc library.
terminal — compile and run
 gcc -Wall -Wextra -Wpedantic -std=c11 -o hello_again hello_again.c
 ./hello_again
Hello World from anonymous?
This is the real hello world...

// identical output to v1. the .c extension changed nothing.

Deep Explanation — why the extension is irrelevant, and what the convention actually means

The preprocessor does not inspect extensions: The #include directive takes a filename. The preprocessor opens that filename, reads it, and pastes it. It does not look at the extension to decide how to interpret the content. It does not validate that a .h file contains only declarations. It does not refuse to include a .c file. The distinction between .h and .c exists entirely in the conventions of human programmers and the tooling built around those conventions — make, IDEs, linters — not in the compiler or preprocessor itself.

What .h communicates: The extension is a signal to people (and tools) that this file is a header: it should contain only declarations, type definitions, macros, and inline functions — things that are safe to include in multiple translation units without violating the One Definition Rule. A file named .c conventionally contains exactly one translation unit's worth of definitions — code that gets compiled once and linked. When you name a file stdio.c, you are telling every programmer who reads your project: "this file has runnable definitions in it, compile it separately." When you name it stdio.h, you are saying: "include this freely, it is safe to paste everywhere."

Your conclusion in the source comment was exactly correct: "anything.h is just a representation that it is not a normal C file, but a headers file which does not include any executable stuff, but only definitions and declarations of constants and those functions are actually present in another file — libc library." The only precision to add: the word "definitions" in that comment refers to type definitions (typedef) and macro definitions (#define), not function definitions. Function definitions with bodies should not be in headers — only declarations. That distinction is the core lesson of ODR.

When you legitimately would #include a .c file: There is one real-world pattern where this is done intentionally: unity builds (also called single-translation-unit builds). Some projects have a unity.c file that contains nothing but a sequence of #include "module1.c", #include "module2.c", etc. This compiles the entire project as one giant translation unit. The advantage: the compiler can optimize across all the included files simultaneously (no linker boundary). The disadvantage: you lose incremental compilation — any change requires recompiling everything. It is a compile-time vs runtime performance trade-off, and it only works because the files being included are written carefully to avoid ODR violations when merged.


Definition — libc and the linker's role

Where printf actually lives, and how your binary finds it at runtime

tracing printf from header to library to binary
# where is the real printf implementation?
 nm -D /usr/lib/libc.so.6 | grep " printf$"
000000000006f4a0 T printf
# T = text section = compiled executable code, at that address in libc

# what does the linker actually link against?
 ldd ./hello_again
    linux-vdso.so.1 (0x00007ffd...)
    libc.so.6 => /usr/lib/libc.so.6 (0x00007f...)
    /lib64/ld-linux-x86-64.so.2 (0x00007f...)
# your binary depends on libc.so.6 — printf is resolved from there at runtime

# see the unresolved printf reference in your object file before linking
 gcc -c hello_again.c -o hello_again.o
 nm hello_again.o | grep printf
                 U printf
# U = undefined = reference exists but no address yet. linker will fill this in.

Deep Explanation — the full journey from source to running process

The three pieces: There are three distinct things involved in using printf: the declaration (in stdio.h — tells the compiler the function exists and its signature), the call site (in your main — the compiler emits a call printf instruction with an unresolved address), and the definition (in libc.so.6 — the actual machine code that formats and writes the string). These three live in entirely different places and are connected at different stages of the build.

Static vs dynamic linking: By default on Linux, gcc links programs dynamically against libc — the printf address is not fixed at compile time. Instead, the ELF binary records a dependency on libc.so.6 (you see this with ldd). When the kernel loads your program via execve(), the dynamic linker (ld-linux-x86-64.so.2) runs first, maps libc.so.6 into the process's address space, and resolves the printf symbol to its actual address in the loaded library, either eagerly at startup or lazily via the procedure linkage table (PLT) on the first call. This is why every dynamically linked Linux process has libc loaded — the dynamic linker handles it automatically.

Why this matters for your experiment: In v1 and v2, you used your local modified header — which changed what the preprocessor pasted in. But the linker was completely unaffected. It still found printf in libc.so.6 and resolved the call correctly. The header and the library are fully independent. You can use any header you like — as long as the declaration you give the compiler matches the actual function signature in the library, everything will work. If they mismatch (wrong argument types, wrong return type), you get undefined behavior at runtime — which is another reason why using the real, correct header matters in practice.


Conclusions — two versions, five things established

What this session proved

Q: Can you include a locally modified copy of stdio.h and have your additions available?
A: Yes. #include "stdio.h" (quotes) finds your local copy first. The preprocessor pastes it in — including anything you appended — before compilation begins.

Q: Does the .h extension mean anything to the compiler or preprocessor?
A: No. Renaming to stdio.c and changing the include to "stdio.c" produces byte-for-byte identical output. The extension is a human convention, not a language rule.

Q: What does stdio.h actually contain?
A: Function declarations, type definitions, and macros only. No executable code. The actual printf implementation lives in libc.so.6, linked at build time and loaded by the dynamic linker at runtime.

Q: Why does putting a function definition in a header cause problems in multi-file projects?
A: Each .c file that includes the header gets its own copy of the definition. The linker finds multiple definitions of the same symbol and errors: multiple definition of 'try_func'. Headers should contain declarations only.

Q: What are include guards (#ifndef _STDIO_H / #define _STDIO_H / #endif) for?
A: They prevent a header's contents from being pasted more than once per translation unit. Without them, multiple indirect inclusions cause redefinition errors for every type defined in the header.
The rule going forward: Headers contain declarations, type definitions, and macros. Source files contain definitions. The .h / .c naming convention communicates this intent. Always include system headers with <> and your own headers with "". Use gcc -E yourfile.c any time you want to see exactly what the preprocessor produced — it is the most direct way to understand what the compiler actually parsed.