Mastering SuperComputing for AI: Core Principles, Systems, and Large-Scale Deep Learning
Format:
Paperback
In stock
0.47 kg
Yes
New
Amazon
USA
This book grew out of frustration. For years I watched people write elegant models, clean training loops, and perfectly reasonable code, only to be surprised when everything slowed to a crawl, costs exploded, or scaling simply stopped working. The problem was never the math, and rarely the framework. It was the invisible machinery underneath: the part no one talks about until something breaks. Supercomputing for AI is my attempt to pull that machinery into the light.

This is a book about what actually happens when an AI system runs at scale. Not in theory, not in benchmark slides, but in real environments with real constraints: memory that fills up sooner than expected, networks that refuse to keep up, synchronization that quietly eats your speedup, and energy budgets that matter long before you hit your accuracy target. It treats neural networks not as abstract objects but as large numerical workloads that must survive contact with hardware, schedulers, clusters, and time.

You’ll learn how training behaves when it leaves your laptop and enters GPUs, multi-node systems, and supercomputers. You’ll see why some models scale smoothly while others stall, why doubling resources often buys less than you think, and how design decisions made early (batch sizes, parallelism choices, data layout) reverberate through the entire system. Large language models are treated not as magic artifacts but as demanding computational workloads whose costs, limits, and tradeoffs can be reasoned about.

The perspective here is deliberately systems-first. Frameworks like PyTorch and tools like CUDA, MPI, SLURM, containers, and distributed runtimes appear not as black boxes, but as execution engines with specific behaviors. You’ll learn how to read those behaviors, predict them, and work with them instead of fighting them.
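The claim that doubling resources often buys less than you think can be made concrete with Amdahl's law: if any fixed fraction of the work is serial (synchronization, data loading, checkpointing), speedup flattens as workers are added. A minimal sketch, with the 5% serial fraction chosen purely for illustration:

```python
def amdahl_speedup(n_workers: int, serial_fraction: float) -> float:
    """Ideal speedup on n_workers when a fixed fraction of the work is serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_workers)

# With even 5% serial work, going from 8 to 16 GPUs is far from a 2x gain,
# and 64 GPUs deliver nowhere near 64x.
for n in (1, 8, 16, 64):
    print(f"{n:3d} workers -> {amdahl_speedup(n, serial_fraction=0.05):.2f}x")
```

Real systems add further costs (network bandwidth, stragglers, load imbalance), so measured scaling is usually worse than this idealized curve.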
Optimization, in this context, is not about clever tricks; it is about understanding where time, memory, bandwidth, and energy are actually going.

This material has been shaped by years of teaching, debugging, and running experiments on everything from single GPUs to national-scale supercomputers. Many of the examples were born from failures: jobs that crashed overnight, training runs that cost too much for what they delivered, and systems that looked powerful on paper but disappointed in practice. Those experiences are baked into the explanations, because that is where real understanding comes from.

You do not need access to a supercomputer to benefit from this book. Many examples run on a single GPU or in environments like Google Colab, and scale outward when resources allow. The emphasis is never on size for its own sake, but on judgment: knowing when scaling helps, when it hurts, and when it simply shifts the bottleneck somewhere else.

If you are a student trying to understand how modern AI systems actually run, an engineer moving beyond single-node training, a researcher working with large models and reproducibility concerns, or an instructor looking for material that reflects the real state of the field, this book was written with you in mind.

This is not a guide to writing code faster. It is a guide to understanding what your code becomes once it is unleashed on real hardware, under real constraints, at real scale, and how to make informed decisions when compute turns into capability.