

Web Data

What Is Web Data?

Web data encompasses information drawn from websites and apps. Typically, it tracks the interactions between websites and their visitors, including how much time people spend on a site and what they do there.

Where Does Web Data Come From?

Web data comes from both internal and external sources. Internal sources are trackers such as cookies and website and app analytics. External sources include web scrapers, canvas fingerprinting, keyword search data for a geographical area, and more.

What Types of Columns/Attributes Should I Expect When Working with This Data?

Most websites and apps already have analytics tools available. These measure the number of site visitors, the click-through rate from off-site ads or connected social media pages, and the amount of time spent on the site. They also track information such as user demographics, whenever the visitors' devices record that data.

Other typical attributes of this data include IP addresses and mobile device IDs.
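To illustrate how the metrics above can be derived from raw records, here is a minimal Python sketch that counts unique visitors, computes click-through rate (ad clicks divided by ad impressions), and averages time on site. The `Session` record layout, its field names, and the impression count are hypothetical; real analytics tools use their own schemas and metric definitions.

```python
from dataclasses import dataclass


@dataclass
class Session:
    # Hypothetical fields; real analytics exports use their own schemas.
    device_id: str          # e.g. a mobile device ID or cookie ID
    seconds_on_site: float  # time spent during this visit
    came_from_ad: bool      # True if the visit began with a click on an off-site ad


def click_through_rate(ad_clicks: int, ad_impressions: int) -> float:
    """Share of ad impressions that led to a click (0.0 when there were no impressions)."""
    return ad_clicks / ad_impressions if ad_impressions else 0.0


def summarize(sessions: list[Session], ad_impressions: int) -> dict[str, float]:
    """Compute basic web-analytics metrics from a list of sessions."""
    visitors = len({s.device_id for s in sessions})  # unique visitors by device ID
    # Simplification: each ad-sourced session counts as exactly one ad click.
    ad_clicks = sum(1 for s in sessions if s.came_from_ad)
    avg_time = (
        sum(s.seconds_on_site for s in sessions) / len(sessions) if sessions else 0.0
    )
    return {
        "unique_visitors": visitors,
        "ctr": click_through_rate(ad_clicks, ad_impressions),
        "avg_seconds_on_site": avg_time,
    }


sessions = [
    Session("dev-1", 120.0, True),
    Session("dev-2", 45.0, False),
    Session("dev-1", 30.0, False),
]
print(summarize(sessions, ad_impressions=50))
# {'unique_visitors': 2, 'ctr': 0.02, 'avg_seconds_on_site': 65.0}
```

Counting visitors by device ID is only an approximation: one person using two devices appears as two visitors, which is one reason analytics figures differ between tools.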

What Is This Data Used For?

Companies typically use this data to measure the performance of their marketing campaigns and their audience reach. They also use it to conduct market research and to assess the health of a website or app.

How Should I Test the Quality of This Data?

Many web hosting services already provide analytics. However, to test the data quality yourself, check that your dataset is complete, accurate, relevant, and frequently updated. If you use web scraping tools, also make sure they do not overwhelm the websites you scrape, since excessive request rates can get your tool blocked.
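Completeness and freshness are the easiest of these properties to check automatically; accuracy and relevance usually need domain-specific rules or human review. The sketch below assumes each record is a Python dictionary with a hypothetical set of required fields, including a timezone-aware `last_updated` timestamp. The field names and the 30-day freshness threshold are illustrative assumptions, not standards.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical schema and threshold; adjust them to your own dataset.
REQUIRED_FIELDS = ("url", "visitor_id", "last_updated")
MAX_AGE = timedelta(days=30)


def quality_report(records, now=None):
    """Return the share of complete records and the share of stale records."""
    now = now or datetime.now(timezone.utc)
    if not records:
        return {"completeness": 0.0, "stale_share": 0.0}
    complete = 0
    stale = 0
    for rec in records:
        # A record is complete when every required field is present and non-empty.
        if all(rec.get(f) not in (None, "") for f in REQUIRED_FIELDS):
            complete += 1
        updated = rec.get("last_updated")
        # Records with no update time, or one older than MAX_AGE, count as stale.
        if updated is None or now - updated > MAX_AGE:
            stale += 1
    n = len(records)
    return {"completeness": complete / n, "stale_share": stale / n}


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
records = [
    {"url": "/home", "visitor_id": "v1", "last_updated": now - timedelta(days=2)},
    {"url": "/pricing", "visitor_id": "", "last_updated": now - timedelta(days=90)},
]
print(quality_report(records, now=now))
# {'completeness': 0.5, 'stale_share': 0.5}
```

Passing `now` explicitly keeps the check reproducible; in a scheduled job you would omit it and compare against the current UTC time.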

You can also follow the example of Leadbook, which takes a random sample of its data every quarter and manually checks it to make sure it meets minimum expectations of accuracy.
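The exact procedure Leadbook uses is not described here, so the following is only a generic sketch of the idea: draw a reproducible random sample without replacement, have a person check each sampled record, and compute the share judged correct. The function names, the default sample size of 200, and the seed are hypothetical choices for illustration.

```python
import random


def sample_for_review(records, k=200, seed=None):
    """Draw up to k records uniformly at random, without replacement, for manual checking."""
    rng = random.Random(seed)  # a seeded generator makes the sample reproducible
    return rng.sample(records, min(k, len(records)))


def accuracy_rate(review_results):
    """Fraction of reviewed records that a human marked as correct (True)."""
    return sum(review_results) / len(review_results) if review_results else 0.0


records = [f"record-{i}" for i in range(1000)]
batch = sample_for_review(records, k=5, seed=42)
print(len(batch))  # 5

# After a reviewer checks each sampled record, record True (correct) or False (wrong):
print(accuracy_rate([True, True, False, True, True]))  # 0.8
```

Recording the seed alongside each quarterly review lets someone else redraw the same sample later and audit the manual checks.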

Interesting Case Studies and Blogs to Look Into

Leadbook: Our Data
Wiley Online Library: Web Data: The Original Big Data – Taming the Big Data Tidal Wave

Tangible Examples of Impact

“A year [before the outbreak of deadly riots related to food shortages], on January 12, 2010, a tech startup posted an article on its blog: ‘Yemen heading for disaster in 2010?’ The author, ‘Ninja Shoes’, wrote: ‘Based on the information we’ve gathered, Yemen will likely experience food shortages and torrential floods in 2010. This combination of natural disasters, propensity for famine and malnutrition, and challenges with Islamic radicals and terrorists, make it a hot spot for conflict in the future.’”

Wired: The news forecast: Can you predict the future by mining millions of web pages for data?

Relevant datasets

LingQ Platform

by LingQ

LingQ Platform is based on the Comprehensible Input theory, in which students push themselves to read or listen to content that they mostly understand. LingQ tracks the new words students encounter and reintroduces them through a spaced repetition system until students have truly learned them.

The LingQ Platform suits both individual students and entire classrooms at any level. It comes in the form of an app for iOS and Android and extensions for Chrome, Safari, and Firefox.

Google Dataset Search

by Google

Google Dataset Search provides access to quality, continuously updated data of all kinds for researchers, data analysts, journalists, and the general public. It aims to enable the free and open discovery of all kinds of data and metadata in the world.

The platform also offers a Dataset Developer Page to help people add structured data to their datasets or resolve other problems.

KHIPU Networks Cyber Security Services

by KHIPU Networks

KHIPU Networks Cyber Security consists of twenty-five security products maintained by technology experts around the world. These services fall into two main areas: next-generation networking and advanced cyber security.

Similar Data Providers

  • The Arabesque Group
    Established in 2013, the Arabesque Group is a leading global financial technology company that combines AI with environmental, social and governance (ESG) data to assess the performance and sustainability of corporations worldwide. In addition to its asset management consultation service, the group offers datasets from Arabesque S-Ray GmbH and Arabesque AI Ltd.
  • Black Box Intelligence Consumer Intelligence
    Black Box Intelligence Consumer Intelligence is designed to provide detailed analysis on individual competitor sales and performance data.
  • Home by Vendigi
    Home by Vendigi provides audience data on home buyers, remodelers, and sellers. Its data comes from first-party sources such as top multiple listing services (MLSs) and major brokers like RE/MAX, Coldwell Banker, Century 21, and Sotheby's. Users of Vendigi's Home data range from home and garden retailers to insurance institutions and telecom companies.