Java Web Scraping Handbook

Learn advanced Web Scraping techniques

by Kevin Sahin


Book Description

Web scraping, or crawling, is the art of fetching data from a third-party website by downloading and parsing its HTML code to extract the data you want. It can be hard: from badly formed HTML to heavy JavaScript use and anti-bot techniques, there are many pitfalls. Many companies use it to monitor competitor prices, aggregate news, or collect email addresses in bulk.
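
The download-then-parse loop described above can be sketched with nothing but the JDK (Java 11 or later). This is a minimal illustration, not code from the book: the class name `ScraperSketch`, the `fetchHtml` and `extractTitle` helpers, the User-Agent string and the default URL are all hypothetical. Pulling the `<title>` out with a regular expression only keeps the sketch dependency-free; a real HTML parser is the more robust choice for messy markup.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical minimal scraper: download a page, then extract one piece of data from it.
public class ScraperSketch {

    // Matches the contents of the first <title> element, case-insensitively and across newlines.
    // A regex is a simplification here; it is not a substitute for a proper HTML parser.
    private static final Pattern TITLE =
            Pattern.compile("<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Step 1: download the raw HTML of a page.
    static String fetchHtml(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                // Assumed browser-like User-Agent; some sites reject requests without one.
                .header("User-Agent", "Mozilla/5.0 (compatible; ScraperSketch/0.1)")
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode() + " for " + url);
        }
        return response.body();
    }

    // Step 2: parse the HTML and extract the data we want (here, the page title).
    // Returns null when the page has no <title> element.
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) throws Exception {
        // The default URL is only a placeholder; pass the page to scrape as the first argument.
        String url = args.length > 0 ? args[0] : "https://example.com/";
        System.out.println(extractTitle(fetchHtml(url)));
    }
}
```

With Java 11 or later you can run it directly, for example `java ScraperSketch.java https://example.com/`. Keeping the download and the extraction in separate methods lets you test the parsing step against saved HTML without touching the network.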

This book teaches you how to extract data from any website, handle AJAX- and JavaScript-heavy websites, break captchas, deploy your scrapers in the cloud, and apply many other advanced techniques.

This open book is licensed under a Creative Commons license (CC BY). You can download the Java Web Scraping Handbook ebook for free in PDF format (4.7 MB).

Table of Contents

Chapter 1: Introduction to Web scraping
Chapter 2: Web fundamentals
Chapter 3: Extracting the data you want
Chapter 4: Handling forms
Chapter 5: Dealing with Javascript
Chapter 6: Captcha solving, PDF parsing, and OCR
Chapter 7: Stay under cover
Chapter 8: Cloud scraping

Book Details

Category: Computer Science
PDF Size: 4.7 MB
