What Is Legal Case Data?

Legal data is the collection of data and metadata about legal matters. This includes information on cases, judges, jurisdictions, and industries. The ability to process and analyze massive amounts of data as close to instantaneously as possible has profound implications for the profession.

Where Does Legal Case Data Come From?

This data is public, but it is often difficult to access because most of it is published in printed law books. Some jurisdictions have begun to publish legal cases digitally, and projects such as the Caselaw Access Project have digitized historical case law for public access.

Additional legal data may include interviews with lawyers and judges.

What Types of Columns/Attributes Should I Expect When Working with This Data?

Common attributes of legal data include the case name, docket number, court, decision date, decision, and jurisdiction. However, the core data is the actual text of the cases themselves.

Furthermore, there is obvious overlap with industry-specific data. For example, a firm specializing in malpractice suits needs access to healthcare and EMS/EHS data while a firm representing a mining company needs access to mining, minerals, and environmental data.
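
As a rough illustration of the case attributes described in this section, a single record could be modeled as below. This is only a sketch: the class name `CaseRecord`, the field names, and the sample values are assumptions made for illustration, not a schema used by any particular provider.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CaseRecord:
    """One legal case. Field names are illustrative, not a standard schema."""
    case_name: str
    docket_number: str
    court: str
    decision_date: date
    decision: str
    jurisdiction: str
    text: str  # full text of the case, the core of the data


# A made-up record, used only to show the shape of the data.
example = CaseRecord(
    case_name="Example v. Sample",
    docket_number="00-0000",
    court="Example District Court",
    decision_date=date(2000, 1, 1),
    decision="affirmed",
    jurisdiction="Example State",
    text="Full text of the opinion would go here.",
)
```

Industry-specific data, such as healthcare or mining records, would typically live in separate tables or files and be joined to cases through whatever identifiers a given dataset provides.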

What Is This Data Used For?

Lawyers, paralegals, clerks, and other legal professionals generally use this data as reference material. Features such as translation, image search, and speech-to-text assist them in this work.

However, emerging artificial intelligence programs enable legal professionals to use data in other ways. For example, AI programs can analyze judges’ past decisions and predict their rulings, suggesting approaches and arguments that lawyers may find successful.

These AI programs can also enable law firms to conduct competitor and market analysis by comparing their successes against others in the same field or area.

How Should I Test the Quality of Legal Case Data?

There is little quality control to do for legal data; human beings have reviewed the cases countless times before publication. It is generally only in the digitization or database creation process that mistakes appear.

For public, open-source databases, mistakes are easily remedied due in part to the volume of database users worldwide. You should, however, check other databases to make sure they have been cleansed properly. Consider the data vendor’s reputation, and focus on the completeness, consistency, and relevancy of the data more than on update frequency, since cases take years to reach a conclusion.
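
The completeness and consistency checks mentioned above could be sketched roughly as follows. This assumes each case is a plain dictionary with hypothetical keys such as `case_name`, `docket_number`, and an ISO-formatted `decision_date`; real datasets will use their own field names and formats, so treat the helpers as a starting point rather than a finished validator.

```python
from collections import Counter
from datetime import date

# Hypothetical field names; adjust to the dataset being tested.
REQUIRED_FIELDS = (
    "case_name", "docket_number", "court",
    "decision_date", "jurisdiction", "text",
)


def find_incomplete(records):
    """Return (index, missing_fields) for records lacking required values."""
    problems = []
    for i, rec in enumerate(records):
        missing = [f for f in REQUIRED_FIELDS if not rec.get(f)]
        if missing:
            problems.append((i, missing))
    return problems


def find_inconsistent_dates(records, today=None):
    """Return indexes whose decision_date is not an ISO date or is in the future."""
    today = today or date.today()
    bad = []
    for i, rec in enumerate(records):
        raw = rec.get("decision_date")
        if not raw:
            continue  # missing values are reported by find_incomplete
        try:
            parsed = date.fromisoformat(raw)
        except (TypeError, ValueError):
            bad.append(i)
            continue
        if parsed > today:
            bad.append(i)
    return bad


def find_duplicate_dockets(records):
    """Return (court, docket_number) pairs that appear more than once."""
    counts = Counter((r.get("court"), r.get("docket_number")) for r in records)
    return [key for key, n in counts.items() if n > 1 and all(key)]


# Made-up records: the second has an empty name, a non-ISO date,
# and repeats the first record's court and docket number.
sample = [
    {"case_name": "Example v. Sample", "docket_number": "00-0001",
     "court": "Example Court", "decision_date": "2000-01-01",
     "jurisdiction": "Example State", "text": "..."},
    {"case_name": "", "docket_number": "00-0001",
     "court": "Example Court", "decision_date": "01/02/2000",
     "jurisdiction": "Example State", "text": "..."},
]
```

A repeated court and docket number is not always an error, since one matter can produce several decisions, so flagged pairs are candidates for manual review rather than confirmed duplicates.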

Interesting Case Studies and Blogs to Look Into

Caselaw Access Project
Lexis Legal Advantage

Tangible Examples of Impact

“On the corporate legal team side, we’ve seen groups clean and aggregate data from multiple systems to speed reporting. We’ve seen leaders set up data models and reports that are tailored to specific business units and executive stakeholders for added awareness and transparency. We’ve seen groups use analytics to ‘sort’ or ‘triage’ matters by risk to enhance the sourcing of work. We are seeing some interesting work around reserve setting and using data to more accurately drive those reserves. We’ve seen some pioneers create models to forecast and predict spend and outcome on matters.”

Data Science + Law: An Interview with LexPredict

Relevant datasets

Kadaster Data

by Kadaster

Geospatial and geographic data make up the majority of Kadaster Data. However, attendant categories like land use, legal ownership, property, and even international development data also appear.

AT Internet Data

by AT Internet

With a focus on marketing and product development, AT Internet Data helps businesses mine and manage data. They primarily track customer journey, web, event, and other online data and offer resources and training for business professionals. They also offer specific products for marketers, product developers, and executives.

Premise Sentiment Data

by Premise

Premise Sentiment Data tracks online, survey, and in-store behavior to mark when customers decide to purchase products or walk away.

Similar Data Providers

  • The Arabesque Group
    5 (1)
    Reviews ()
    Data sets (4)
    Established in 2013, the Arabesque Group is a leading global financial technology company that combines AI with environmental, social, and governance (ESG) data to assess the performance and sustainability of corporations worldwide. In addition to their Asset Management consultation service, the group offers Arabesque S-Ray GmbH and Arabesque AI Ltd. datasets.
  • Black Box Intelligence Consumer Intelligence
    5 (1)
    Reviews ()
    Data sets (0)
    Black Box Intelligence Consumer Intelligence is designed to provide detailed analysis on individual competitor sales and performance data.
  • Home by Vendigi
    4.3 (3)
    Reviews (1)
    Data sets (1)
    Home by Vendigi provides audience data on home buyers, remodelers, and sellers. Their data comes from first-party sources such as top multiple listing systems (MLSs) and major brokers like RE/MAX, Coldwell Banker, Century 21, and Sotheby's. Users of Vendigi's Home data range from home and garden retailers to insurance institutions to telecom companies.