

Web Data

What Is Web Data?

Web data encompasses information drawn from websites and apps: the connections between sites and their visitors, how much time people spend on the sites, and what they do while there.

Where Does Web Data Come From?

Web data comes from both internal and external sources. Internal sources are trackers such as cookies, along with website and app analytics. External sources include web scrapers, canvas fingerprinting, keyword search data for a geographical area, and more.

What Types of Columns/Attributes Should I Expect When Working with This Data?

Most website platforms and apps come with analytics tools already available to you. These measure the number of site visitors, the click-through rate from off-site ads or connected social media pages, the amount of time spent on the site, the demographics of users, and so on.

Other typical attributes of this data include the visitor's IP address or mobile device ID.
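To make these attributes concrete, here is a minimal sketch of how one row of a web-analytics export might be modeled in Python. The `VisitRecord` class, its field names, and the `summarize` helper are all hypothetical and chosen for illustration; real analytics exports use their own column names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class VisitRecord:
    """One row of a hypothetical web-analytics export (field names are illustrative)."""
    visitor_id: str              # cookie-based or analytics-assigned identifier
    ip_address: Optional[str]    # may be missing or anonymized
    device_id: Optional[str]     # mobile device ID, if the visit came from an app
    referrer: Optional[str]      # e.g. an off-site ad or a social media page
    seconds_on_site: float       # time spent on the site during this visit


def summarize(records: List[VisitRecord]) -> Dict[str, float]:
    """Return the unique-visitor count and the mean time on site for a batch of visits."""
    if not records:
        return {"unique_visitors": 0, "mean_seconds_on_site": 0.0}
    unique = len({r.visitor_id for r in records})
    mean_time = sum(r.seconds_on_site for r in records) / len(records)
    return {"unique_visitors": unique, "mean_seconds_on_site": mean_time}
```

Optional fields reflect that IP address and device ID are not always present for every visit.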

What Is This Data Used For? 

This data is typically used by companies to measure the performance of their marketing campaigns and their audience reach. However, it can also be used to conduct market research and assess the health of a website or app.

How Should I Test the Quality of Web Data?

Many web hosting services already provide analytics, but to test the data quality yourself, check that your dataset is complete, accurate, relevant, and updated frequently. If you use web scraping tools, make sure they do not overwhelm the websites you are scraping with requests, as that may get your tool blocked.
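Two of these checks, completeness and update frequency, are easy to automate. The sketch below assumes each row is a dictionary with a timezone-aware `timestamp` value; the field names in `REQUIRED_FIELDS` and the seven-day freshness window are assumptions for illustration, not fixed standards. Accuracy and relevance usually still need human judgment.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Sequence

# Hypothetical field names; substitute the columns of your own dataset.
REQUIRED_FIELDS = ["visitor_id", "ip_address", "timestamp"]


def completeness(rows: List[Dict[str, Any]],
                 fields: Sequence[str] = REQUIRED_FIELDS) -> Dict[str, float]:
    """Return, for each field, the fraction of rows holding a non-empty value."""
    total = len(rows)
    result = {}
    for field in fields:
        filled = sum(1 for row in rows if row.get(field) not in (None, ""))
        result[field] = filled / total if total else 0.0
    return result


def is_fresh(rows: List[Dict[str, Any]], now: datetime,
             max_age: timedelta = timedelta(days=7)) -> bool:
    """Return True if the newest row's timestamp is no older than max_age."""
    stamps = [row["timestamp"] for row in rows if row.get("timestamp")]
    return bool(stamps) and (now - max(stamps)) <= max_age
```

A low completeness score for a field, or a failed freshness check, is a prompt to investigate the source rather than a verdict on its own.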

You can also follow the example of Leadbook, which takes a random sample of its data every quarter and manually checks the information to be sure it meets minimum expectations of accuracy.

Leadbook: Our Data
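A sampling audit of this kind can be sketched in a few lines of Python. This is not Leadbook's actual process, only an illustration of the general idea: the function names, the fixed seed, and the 95% accuracy target are all assumptions. A reviewer would inspect each sampled record by hand and record a true/false verdict.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def draw_audit_sample(records: Sequence[T], sample_size: int, seed: int) -> List[T]:
    """Draw a reproducible random sample (without replacement) for manual review."""
    rng = random.Random(seed)                  # seeded so the audit can be repeated
    k = min(sample_size, len(records))         # never ask for more rows than exist
    return rng.sample(list(records), k)


def meets_accuracy_target(verdicts: Sequence[bool], target: float = 0.95) -> bool:
    """Return True if the share of records judged accurate reaches the target."""
    if not verdicts:
        return False
    return sum(verdicts) / len(verdicts) >= target
```

Recording the seed alongside each audit lets a second reviewer pull the same sample and compare verdicts.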

Interesting Case Studies and Blogs to Look Into

General Information:

Wiley Online Library: Web Data: The Original Big Data – Taming the Big Data Tidal Wave

Nature: How we learnt to stop worrying and love web scraping

Analytics Insight: Driving Business Intelligence Through Web Data Mining

Case Studies: 

Lifesight: Top notch jewellery brand reaches 25M+ and a total visit of 10k+

Leadbook: Our Customers: Engage at scale

StartApp Custom Audiences Drive Interest in a Major University Graduate Program

Tangible Examples of Impact

Non-profit The Markup has developed an online tool called Blacklight to check web user monitoring programs. By entering a URL into Blacklight, you can scan for signs that your data is being collected by the site. 

The American Genius: This tool reveals how websites are tracking your data

With the Do Not Track setting in browsers becoming increasingly useless, web browsers are increasingly flexing their privacy credentials. Apple’s Safari browser has boosted its anti-tracking tech and Firefox has blocked trackers by default since 2018.

“Google Chrome is also planning on getting rid of third-party cookies. However this won’t happen until 2022 and there are still significant questions about how the change will be implemented.”

Wired: These Chrome extensions protect you against creepy web tracking

“A year [before the outbreak of deadly riots related to food shortages], on January 12, 2010, a tech startup posted an article on its blog: ‘Yemen heading for disaster in 2010?’ The author, ‘Ninja Shoes’, wrote: ‘Based on the information we’ve gathered, Yemen will likely experience food shortages and torrential floods in 2010. This combination of natural disasters, propensity for famine and malnutrition, and challenges with Islamic radicals and terrorists, make it a hot spot for conflict in the future.’”

Wired: The news forecast: Can you predict the future by mining millions of web pages for data?

Relevant datasets

Quintelligence Web-Audience Analysis

by Quintelligence

Quintelligence Web-Audience Analysis delivers website audience segmentation data to companies. In addition, Quintelligence runs predictive analytics to enable specific audience targeting.


QuinStreet Customer Acquisition

by QuinStreet

QuinStreet Customer Acquisition tracks customer clicks, inquiries, and registrations to power customer acquisition for QuinStreet's clients.


QuestMobile TRUTH Databases


QuestMobile TRUTH Databases provides four different datasets and analytical tools that track and measure Chinese apps and social media.
