Java Web Scraping Handbook

Learn advanced Web Scraping techniques

by Kevin Sahin

Book Description

Web scraping, or crawling, is the practice of fetching data from a third-party website by downloading and parsing its HTML to extract the data you want. It is often tricky: pages may contain malformed HTML, rely heavily on JavaScript, or use anti-bot techniques. Many companies use it to monitor competitor prices, aggregate news, or collect email addresses in bulk.
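As a rough illustration of the "parse the HTML and extract the data you want" step, here is a minimal sketch in plain Java. It is not taken from the book: the class name `ScrapeSketch`, the sample page, and both helper methods are hypothetical, and the page is an in-memory string rather than a real download. Regular expressions are used only to keep the example free of dependencies; they break easily on real-world markup, where a proper HTML parser is the safer choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScrapeSketch {
    // Hypothetical sample page; a real scraper would download this over HTTP.
    public static final String SAMPLE_HTML =
        "<html><head><title>Price list</title></head><body>"
        + "<a href=\"/item/1\">Widget</a>"
        + "<a href=\"/item/2\">Gadget</a>"
        + "</body></html>";

    // Matches the text between <title> and </title>.
    private static final Pattern TITLE =
        Pattern.compile("<title>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Matches the double-quoted href value of an <a> tag.
    private static final Pattern HREF =
        Pattern.compile("<a\\s+[^>]*href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns the page title, or an empty string if none is found.
    public static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    // Returns every href value found in the page, in document order.
    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        System.out.println(extractTitle(SAMPLE_HTML));  // Price list
        System.out.println(extractLinks(SAMPLE_HTML));  // [/item/1, /item/2]
    }
}
```

In a real scraper the string would come from an HTTP response, and the extraction rules would target whatever elements hold the data of interest.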

This book will teach you how to extract data from any website, deal with AJAX- and JavaScript-heavy websites, break captchas, deploy your scrapers in the cloud, and apply many other advanced techniques.

This open book is licensed under a Creative Commons License (CC BY). You can download the Java Web Scraping Handbook ebook for free in PDF format (4.7 MB).

Table of Contents

Chapter 1: Introduction to Web scraping
Chapter 2: Web fundamentals
Chapter 3: Extracting the data you want
Chapter 4: Handling forms
Chapter 5: Dealing with Javascript
Chapter 6: Captcha solving, PDF parsing, and OCR
Chapter 7: Stay under cover
Chapter 8: Cloud scraping

Book Details

Computer Science
PDF Size: 4.7 MB

Related Books

Python Notes for Professionals
The Python Notes for Professionals book is compiled from Stack Overflow Documentation; the content is written by the beautiful people at Stack Overflow....
The HTML Handbook
HTML, a shorthand for Hyper Text Markup Language, is one of the most fundamental building blocks of the Web. This handbook is aimed at a vast audience. - First, the beginner. I explain HTML from zero in a succinct but comprehensive way, so you can use this book to learn HTML from the basics. - Then, the professional. HTML is often considered l...
R Notes for Professionals
The R Notes for Professionals book is compiled from Stack Overflow Documentation; the content is written by the beautiful people at Stack Overflow....
Introduction to Data Science
The demand for skilled data science practitioners in industry, academia, and government is rapidly growing. This book introduces concepts and skills that can help you tackle real-world data analysis challenges. It covers concepts from probability, statistical inference, linear regression and machine learning. It also helps you develop skills such a...
Learning R
R is a programming language and free software environment for statistical computing and graphics. This is an unofficial, free R ebook created for educational purposes. All the content is extracted from Stack Overflow Documentation, which is written by many hardworking individuals at Stack Overflow....
Modern Web Development on the JAMstack
Learn how to run your web projects - everything from simple sites to complex applications - without a single server. It's possible with the JAMstack, a modern web development architecture for deploying fast, highly-scalable sites and applications that don't require traditional origin infrastructure. This practical report explains how the JAMstack d...