Web Scraping with Python: Collecting Data from the Modern Web

Rating: 4.17 (43 reviews)
ISBN: 9781491910269

Ryan Mitchell

Learn web scraping and crawling techniques to access unlimited data from any web source in any format. With this practical guide, you’ll learn how to use Python scripts and web APIs to gather and process data from thousands—or even millions—of web pages at once.

Ideal for programmers, security professionals, and web administrators familiar with Python, this book not only teaches basic web scraping mechanics, but also delves into more advanced topics, such as analyzing raw data or using scrapers for frontend website testing. Code samples are available to help you understand the concepts in practice.

- Learn how to parse complicated HTML pages
- Traverse multiple pages and sites
- Get a general overview of APIs and how they work
- Learn several methods for storing the data you scrape
- Download, read, and extract data from documents
- Use tools and techniques to clean badly formatted data
- Read and write natural languages
- Crawl through forms and logins
- Understand how to scrape JavaScript
- Learn image processing and text recognition

Genres: Programming, Computer Science, Technology, Coding, Reference, Computers, Nonfiction

378 pages, Kindle Edition

First published April 25, 2015

About the author

Ryan Mitchell

99 books, 14 followers

Community Reviews

5 stars: 156 (38%)
4 stars: 176 (43%)
3 stars: 68 (16%)
2 stars: 4 (<1%)
1 star: 4 (<1%)

karzee

37 reviews

November 16, 2016

Since the semester started, I have been reading internet scraping and network security books.
All the books use the example of two arbitrary people, Alice and Bob, exchanging information. And these examples have been getting better, funnier, and weirder.
Somehow (I don't know why, but maybe it's because I love reading books, or I love fiction), my mind has been looking for patterns between Bob and Alice across these books.
My conclusion is that these two are government spies, knee-deep in cover, trying to get important information out without clueing in their marks.
Also, while typing that, I lmao'd like a hundred times because I'm saying such BS.

But this book was brilliant. The information was spot-on and wasn't repetitive. It was very helpful, one of the most helpful books around.

Cliff Chew

121 reviews, 10 followers

May 27, 2016

If you ever want to collect large amounts of data off the Internet through web scraping, please read this book. If you have already done some web scraping, this book provides extremely useful nuggets of information to further enhance your web scraping capabilities. Faced some anti-scraping blocking practices? This book has a great section on how to make your scraper look more "human"!

To balance things out, the author even included a section on the ethics of web scraping, which is something every web scraper should understand!

I rarely give 5 stars, but this book really took it all the way there. Truly a beautiful ~~soup~~ book!
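One common way to make a scraper look more "human", as the review mentions, is to send browser-like request headers instead of the library defaults (Python's urllib identifies itself as "Python-urllib/3.x"). The sketch below uses only the standard library `urllib.request`; it is not the book's code, and the header values and the helper names `build_request` and `fetch_html` are illustrative assumptions.

```python
import urllib.request

# Illustrative browser-like headers (assumed values; any realistic browser
# User-Agent string would do). urllib's default User-Agent is
# "Python-urllib/3.x", which many sites treat as a bot signature.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that carries the browser-like headers (hypothetical helper)."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Fetch a page with the browser-like headers and decode it as text."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        # Use the charset the server declares, falling back to UTF-8
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Headers alone will not get past every anti-bot measure; request pacing and cookie handling usually matter as well.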

Sebastian

200 reviews, 9 followers

March 15, 2020

This is a great text spanning most of the tools, methods and philosophies underpinning web scraping.

Its main problem is a lack of identity: is it teaching web scraping to those with one or two simple tasks who just want to dip their toe in, or to those looking to build production-quality web scrapers for large-scale tasks? As a result, it jumps back and forth in the tools it suggests. The start of the book seems lightweight, and much of it is superseded by recommendations later in the text. This could be made much clearer from the start.

Having said that, Mitchell's textbook is fairly thorough on the topic, and rewards those who persevere through the start with the more nuanced sides of web scraping (multithreading/processing, solving captchas, finding APIs).
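The multithreading the review mentions usually comes down to fetching several pages concurrently, since a scraper spends most of its time waiting on the network. Below is a minimal sketch with the standard library's `concurrent.futures.ThreadPoolExecutor`; `fetch_all` is a hypothetical helper, not the book's code, and the `fetch` callable is injected so any download function (for example one built on `urllib.request`) can be plugged in.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def fetch_all(
    urls: list[str],
    fetch: Callable[[str], str],
    max_workers: int = 4,
) -> dict[str, str]:
    """Run fetch(url) for every URL on a small thread pool.

    Returns {url: result}. A failure is stored as an error string
    instead of aborting the whole batch.
    """
    results: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the URL it is downloading
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # keep going if one page fails
                results[url] = f"error: {exc}"
    return results
```

Keeping `max_workers` small is a deliberate choice here: it limits how hard the scraper hits a single server.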

Mikhail

69 reviews, 12 followers

November 14, 2020

So, I was getting heavily armadildoed by the Text Mining course; luckily, Mitchell's pangolin saved me.

The book is written very nicely and covers all the imaginable subfields of scraping.

Joshua Hruzik

17 reviews, 7 followers

March 14, 2017

The book gives a good general introduction to BeautifulSoup (which is used for web scraping). However, the focus is too heavily skewed towards less important topics. I would have loved more detail on BeautifulSoup functions rather than on things like importing data into CSV files, since most readers would already have some experience with these sorts of tasks.

Nickolai

937 reviews, 8 followers

February 10, 2022

I decided to read this book after watching R. Mitchell's mini-course on scraping on LinkedIn Learning. The video course is simply wonderful, but rather short, so I wanted to deepen my knowledge. Overall, the book was more of a disappointment. The first four chapters were good, but after that a lot went downhill. There are two main problems, but they affect almost all of the subsequent chapters. First, the author dwells on topics only indirectly related to scraping, such as nltk and processing PDF and DOC files. Second, many genuinely interesting and necessary topics are covered only in broad strokes, leaving the reader to study them further on their own. In the end, I found no more than a hundred lines of code in the whole book useful. The rest I'll have to pick up somewhere else.

Giacomo Debidda

29 reviews

October 19, 2020

Good introductory book on web scraping, but needs an update.

This book does a really good job describing the main techniques and strategies for web crawling and web scraping. Unfortunately, most of the technologies and libraries used in this book are quite outdated today, so if you want to follow the exercises you will need to use different libraries (which might not necessarily be a bad thing).

Sean

4 reviews, 1 follower

May 17, 2016

A solid overview of web scraping with Python. Python is currently the most widely used language for web scraping, and this book gives an overview of how to do it. There are minor errors throughout the text, but the author has stated she will fix them in the next edition. If you want a book to read through on scraping rather than exercising your Google search skills, this is the book to get.

Hadiana Sliwa

68 reviews, 8 followers

March 14, 2021

Great book for those who want to learn about scraping data from the internet and the ethics behind it.

The book is not for complete beginners, but a bit of background in Python or programming in general is enough.

Leonardo

Author of 1 book, 80 followers

November 12, 2018

Excellent book, thorough and well explained. I think it can be a good introduction to scraping for anyone with a little knowledge of Python. I was surprised that the topics it covers were almost exactly the ones I had run into on my own while trying to solve the problems that came up when searching for information on the internet. It would have been a great help to start here, although maybe I wouldn't have understood anything if I had.

The book's own website and the GitHub repository with the code are a great help.

It is a bit of a plug for O'Reilly's other books, but they really do seem worth it. I'm left wanting to read more about scraping and big data, and to get into some machine learning (maybe even deep learning someday). It also encourages me to read a book on VBA cover to cover someday. I still have some more NLP on my to-do list, and I want to understand MySQL better. Anyway, it was a good overview.

I think I was missing concrete ideas of where to apply what I learned. But I think the next time I face a real problem I'll be better positioned, with more ideas to start from.

Vikram

15 reviews, 7 followers

May 31, 2019

This book contains wisdom and methods that the author has refined over what might be years of web scraping. The first few chapters, while introducing new things, can often feel like a cookbook, which the author finds is a concise way to write code that minimises the work. While those snippets of code can be a boon for some, for me they took away the creativity of coding. But I will go back to them once I have years of scraping experience, to realise what value they hold.

The second half of the book deals with topics I had never imagined could belong in a web scraping book. They are amazing and open your mind to how many ways there are to obtain the data you desire. I think this book would have been perfect if there were code exercises at the end of each relevant chapter.

Lord Farquaard

8 reviews

September 16, 2022

Genuinely useful book that can still teach basic HTML web scraping and the underlying good practices, and serve as an introduction to more advanced topics. So it's still worth picking up. However, since its release it has become annoyingly outdated.

PhantomJS was discontinued in 2017, so Selenium (covered and used with PhantomJS in this book) no longer supports it, and getting it to work means jumping through a few extra hurdles. Personally, I just gave in and switched to the headless Chrome driver, which seems to have emerged since the release of this book.

The syntax for Selenium has also changed, so the examples involving it won't work without modification. That defeats the purpose of learning it from this book, because by the time you've learnt the correct Selenium syntax you wouldn't need the text anyway.

Kerszi

227 reviews, 1 follower

February 12, 2020

A useful book that describes how to extract data from websites efficiently.
For data extraction, the author focuses mainly on the Python libraries Beautiful Soup and Selenium.
Along the way, we learn about regular expressions and ways of connecting to, saving to, and otherwise working with a MySQL database.
At the end, the legality of data extraction is covered in a very interesting way.
I will treat the book as an aid in my own projects.

Recommended.

Mikhail

345 reviews, 6 followers

July 29, 2021

Format: Book. Language: English.
I read the book as part of expanding my Python skills. Seen through that lens, it probably isn't very relevant as a textbook, but it is quite interesting for broadening one's horizons on website scraping tools and technologies. I don't regret the time spent on it, although I'm not yet sure whether the new knowledge will be useful to me in practice.
Would I reread it? Possibly, if I need to refresh my knowledge or recall details about some of the libraries mentioned in the book.

Ethan Swan

66 reviews

March 17, 2021

For someone with Python skills but a limited understanding of and skill in web scraping, this is a fantastic book. It covers the basics of a huge range of techniques (HTML wrangling, web APIs, headless browsers, testing) and also comes with some thoughtful discussions, such as the ethics of web scraping. Highly recommend.

aaron

37 reviews, 6 followers

January 10, 2025

Good introduction, or rather good reading along the way. It seems like the author would share much more, but maybe in a YouTube video or Substack post where freedom of speech is more applicable. Here they remind us too often what good boys and girls we should be while scraping.
I was reading it alongside Chapagain's book on web scraping.

Emilia Greendevald

57 reviews

September 14, 2025

Scraping data from Google News can be extremely valuable for business intelligence. This article https://groupbwt.com/blog/how-to-scra... helped me understand how to approach it efficiently. Companies can benefit from scraping Google News to monitor industries and competitors. I found the advice here practical and actionable. GroupBWT explained it very well.

Byrne

73 reviews, 44 followers

March 22, 2017

A nice introduction to the basics of scraping. Reading this before your first scraping project will probably save you a lot of time and frustration--it's basically a compendium of the basics plus everything you wouldn't know how to search Stack Overflow for. It covers the basics (just grabbing simple HTML and parsing with BeautifulSoup) and touches on more advanced topics (using a headless browser like PhantomJS to parse modern, AJAX-y pages).

If you're more experienced, I'd recommend flipping through it quickly to see if you spot anything you didn't already know. It filled in a few gaps for me.
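The basic workflow described above (grab simple HTML, then parse it) looks roughly like the sketch below. The book uses BeautifulSoup; to keep the example dependency-free, this sketch swaps in the standard library's `html.parser.HTMLParser` and collects the `href` of every link. `LinkCollector` and `extract_links` are hypothetical names for illustration.

```python
from __future__ import annotations

from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        # attrs arrives as a list of (name, value) pairs; value may be None
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> list[str]:
    """Parse an HTML string and return its link targets in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, `extract_links('<a href="/a">A</a> <a name="x">no href</a>')` returns `['/a']`, skipping the anchor that has no `href`.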

Johnny

617 reviews, 12 followers

March 5, 2023

Probably the best book on web scraping currently available. It covers not only how to handle HTML, but also binary formats like PDF and Word. There are many cautions on how not to shoot yourself in the foot with an automated script, which will help you a lot.

Mohamed Nijadi

7 reviews, 1 follower

September 27, 2024

I read it while doing a project, and it gave me new perspectives and insights that helped me tweak my scrapers as I read further.
I recommend this book, especially if you have a bit of knowledge about the tools you are using but have never done a medium or large project.

Loc Nguyen

2 reviews

December 8, 2017

Good book for learning web scraping quickly.

Akash Nidhi P S

41 reviews, 3 followers

January 30, 2018

A decent introductory book on web scraping; it gives a high-level overview of the web scraping world.

Ferhat Culfaz

273 reviews, 18 followers

February 5, 2018

Good introduction to web scraping giving you all the tools and relevant libraries you need depending on your application.

Ed Terrell

511 reviews, 27 followers

April 26, 2018

Well-written, hands-on analysis of how the web works and how to extract information from it, even when it appears on multiple sites and in multiple forms. Very insightful!

Hasan Basri AKIRMAK

27 reviews, 9 followers

June 8, 2018

Practical guide

Practical guide to scraping tools and libraries for text and image data processing, as well as the dos and don'ts of a project.

Marcus Österberg

Author of 9 books, 15 followers

October 5, 2018

Good book, but a bit annoying that the final n-gram code example doesn't work (I also checked the book's code on GitHub without success).
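For readers hitting the same problem, a minimal n-gram counter can be written with the standard library alone. This is not the book's code: the tokenization rule (lowercase, keep runs of ASCII letters, digits, and apostrophes) is an assumption made for this sketch, and `tokenize` and `count_ngrams` are hypothetical names.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed cleaning rule: lowercase, then keep runs of a-z, 0-9 and apostrophes.
    # Punctuation and non-ASCII letters are dropped.
    return re.findall(r"[a-z0-9']+", text.lower())


def count_ngrams(text: str, n: int = 2) -> Counter:
    """Count every run of n consecutive words in text."""
    words = tokenize(text)
    # zip the list against shifted copies of itself: (w0, w1), (w1, w2), ...
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)
```

`count_ngrams(text, 2).most_common(5)` then lists the five most frequent bigrams. As a simplification, this version ignores sentence boundaries, so a bigram can span the end of one sentence and the start of the next.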

ana silva

6 reviews, 1 follower

March 20, 2019

A really good introduction to web scraping with Python; this book saved me a lot of time writing my first scraping project. (Also, loved the War and Peace references.)