Citations vs Citations - How to find patent citations missed by other databases - and avoid double counting

Summary

Patent citations and citation data are available from a number of different sources. But how reliable is this data? Are citation links double counted? What about the effect of family members?

In this blog we look at these questions and compare the Ambercite approach to some other well known databases. We show how Ambercite can give more direct citation links than these other sources - and yet avoid the errors and extra time caused by the double counting of citation links seen in these other databases.

We do this for four case studies, being:

  1. A granted US patent, filed by a US applicant
  2. A granted US patent, filed by a non-US applicant
  3. A patent family without any US family members
  4. Patent families with US family members, but from the perspective of non-US patents

Background

Patent citations are increasingly being recognised as a valuable piece of information for a patent. Many people judge the quality of a patent by its forward citation count, while backward citations are carefully reviewed to see if the patent is truly valid. Reviewing forward citations can help identify monetization options. And reviewing both can quickly help you find patents similar to the relevant patents you have already found.

In addition, the Ambercite patent search tool Cluster Searching goes one step further and uses citation data to find similar patents not already listed as citations, and then ranks all of the citations found.

Given the importance of patent citations, you would think that there would be some sort of agreement on what the citations are for a given patent.

One would think that, but the reality is there can be a surprising range of opinions, depending on the patent database you use.

To demonstrate this, we have prepared four case studies for some representative patents, both US and non-US. We have chosen slightly older patents as they have been around long enough to accumulate a reasonable number of both forward and backward citations, but we would expect the same principles to apply to more recent patents.

Case study #1 - US granted patent, filed by US applicant

The first patent to be considered is US6212066, filed by Apple in 1999 for a Portable computer with removable keyboard.

We looked up this patent in the following range of patent databases, and recorded the number of prior art and forward citations. For these citations, we also recorded if the citation was a US record or an international record.

The databases we looked at were:

  • Ambercite Cluster Searching
  • Google Patent
  • Espacenet
  • Patentlens
  • USPTO

The results are shown in the table below. The number of prior art citations is consistent at 9 from every data source.  But the number of forward citations ranged from 35 up to 63, a significant variation.

Known citation search based on US6212066

| | Ambercite Cluster Searching | Google Patent | Espacenet | Patentlens | USPTO |
|---|---|---|---|---|---|
| # of prior art citations (number of non-US citations) | 9 (0) | 9 (0) | 9 (0) | 9 (0) | 9 (0) |
| # of forward citations (number of non-US citations) | 35 (1) | 63 (5) | 46 (1) | 63 (5) | 35 (0) |

So what is causing this big variation in forward citation count?

There are two factors affecting the number of citations shown in this example:

1) The first factor is the jurisdictions of the citations found. The USPTO database shows backward citations to other jurisdictions, but not forward citations from patents outside of the US. In contrast, the remaining databases do show citation connections to patents in other jurisdictions.

As an example of the effect of this, one of the forward citations we found was CN10220082, filed for a Notebook computer with detachable keyboard and movable host. This was identified as a forward citation by Cluster Searching, Google, Espacenet and Patentlens, but not by the USPTO.

2) The second factor is the concept of a 'patent' record. Google, Patentlens and sometimes Espacenet treat a patent application and its granted version as being separate records - but self-evidently they are the same patent. As an example, Google lists US patent application 20080030934 and US7643278 as being separate records, but in fact the second record is the granted form of the first.

Ambercite Cluster Searching goes further. The data in this database is further de-duplicated by combining all of the members of an EPO "simple family" (a group of patents sharing the same priority documents) into a single record. We do this because we believe that this leads to a higher quality of analysis by avoiding counting the same invention (because a simple family is one invention) more than once. But this requires a decision about which representative patent to show for each family - which tends to be the earliest granted US patent in each simple family.

This can have a surprising impact on the results. For example, in the Espacenet results the forward citations WO2014164470 and US9229486 are counted as separate records. But in fact they are not - they are simply different members of the same patent family.

This can also affect the number of non-US patents shown. For example, the Google patent forward citations include the record EP2631753 - but the Ambercite data does not recognise this as a separate non-US record, as it is in the same patent family as US9025323, which is in the list of Ambercite forward citations. Ambercite is very careful to avoid double counting - thereby providing more accurate records as well as removing the need to review what turn out to be separate members of the same patent family.
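As a rough illustration of this kind of de-duplication, the sketch below collapses a list of citing records to one record per family. It is a minimal sketch under stated assumptions, not Ambercite's actual implementation: the family IDs (F1 to F3), the function names and the way a granted US record is detected from the number format are all hypothetical, and it simply keeps the first granted US record it sees rather than the earliest granted one. Only the groupings themselves come from the examples above.

```python
from collections import defaultdict

# Hypothetical family assignments. The IDs F1..F3 are made up for illustration;
# only the pairings are taken from the examples discussed in this case study.
FAMILY_OF = {
    "US20080030934": "F1",  # published application
    "US7643278": "F1",      # granted form of the same patent
    "WO2014164470": "F2",
    "US9229486": "F2",
    "EP2631753": "F3",
    "US9025323": "F3",
}


def is_us_grant(number: str) -> bool:
    """Assumed heuristic: granted US numbers are short (US + up to 8 digits),
    while published US applications carry a year prefix (US + 11 digits)."""
    return number.startswith("US") and len(number) <= 10


def dedupe_by_family(citations):
    """Collapse citing records to one representative record per family."""
    families = defaultdict(list)
    for number in citations:
        # A record with no known family is treated as a family of its own.
        families[FAMILY_OF.get(number, number)].append(number)

    representatives = []
    for members in families.values():
        # Prefer a granted US record where one exists, else the first record seen.
        us_grants = [m for m in members if is_us_grant(m)]
        representatives.append(us_grants[0] if us_grants else members[0])
    return representatives


# Illustrative mix of records drawn from the different lists discussed above.
raw = [
    "US20080030934", "US7643278",
    "WO2014164470", "US9229486",
    "EP2631753", "US9025323",
]
print(dedupe_by_family(raw))  # six records collapse to three families
```

In this toy example six raw records become three family-level records, which is the same effect that makes the Cluster Searching forward citation count lower than the raw counts in some other databases.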

Of interest though, neither the family member US9025323 nor its application number US20130222993 is recognised as a forward citation in the Google list.

These two factors combine to give Cluster Searching fewer forward citation records than some other databases - but this is because it is careful to show only one forward citation per family, i.e. without the extra work caused by double counting different patents in shared patent families.

So this is one example of how different patent databases can give different citation records from a given patent, in this case from a US patent filed by a US company.

Case study #2 - US granted patent, filed by non-US applicant

But what about a US patent filed by a non-US applicant, for example a Japanese company? The next example is US6520045, which was filed by Toyota in 2001 for the arrangement of pedals in a car.

Of note, this patent has other family members filed in Japan and Germany.

We will apply the same analysis as before:

Known citation search based on US6520045

| | Ambercite Cluster Searching | Google Patent | Espacenet | Patentlens | USPTO |
|---|---|---|---|---|---|
| # of prior art citations (number of non-US citations) | 9 (1) | 10 (0) | 10 (1) | 10 (1) | 10 (1) |
| # of forward citations (number of non-US citations) | 28 (8) | 26 (0) | 11 (4) | 27 (1) | 10 (0) |

Again we see big differences, and in some cases for the same reasons as before. But in this data, we see an additional factor come in.

This third factor is that the Ambercite results include a much higher number of non-US forward citations. An example of such a citation is DE202005004272, filed for an Adjustable position pedal mechanism for motor vehicle... This patent is not found in the list of Google forward citations, for example, because it is a forward citation from the German family member of the Toyota patent, and not the US member.

There are similar reasons for the other additional non-US citations found in the Ambercite data, including forward citations from patents filed in China, Korea and Japan that were not found in some other lists of forward citations.
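The difference between looking up forward citations for a single record and for a whole family can be sketched as follows. This is only an illustration under stated assumptions: "DE_FAMILY_MEMBER", "JP_FAMILY_MEMBER" and "US_CITING_EXAMPLE" are placeholders rather than real patent numbers, and the citation map is a toy stand-in for a real citation database.

```python
# Toy citation map: citing record -> records it cites.
# Placeholder names stand in for patent numbers not given in this blog.
CITES = {
    "DE202005004272": ["DE_FAMILY_MEMBER"],  # cites the German family member
    "US_CITING_EXAMPLE": ["US6520045"],      # cites the US family member
}

# Members of the Toyota family (US number from this blog, others are placeholders).
FAMILY_MEMBERS = ["US6520045", "DE_FAMILY_MEMBER", "JP_FAMILY_MEMBER"]


def forward_citations(targets):
    """Return every citing record that cites at least one of the targets."""
    targets = set(targets)
    return sorted(
        citing for citing, cited in CITES.items() if targets & set(cited)
    )


record_level = forward_citations(["US6520045"])  # US member only
family_level = forward_citations(FAMILY_MEMBERS)  # any family member

print(record_level)
print(family_level)
```

A record-level lookup on the US member misses DE202005004272, while the family-level lookup picks it up because it cites another member of the same family.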

 

Case study #3 - Patent families without US family members

What if a patent family has no US family members? As an example of this, consider EP0741529B1, filed by Adidas in 1995 for an Elastomer midsole shoe structure.

This has recognised family members in Australia and Germany, a WO patent application, but no US family member.  Its citation count from different databases is shown below:

Known citation search based on EP0741529B1

| | Ambercite Cluster Searching | Google Patent | Espacenet | Patentlens |
|---|---|---|---|---|
| # of prior art citations | 13 | None listed | 0* | 0 |
| # of forward citations | 15 | None listed | 7 | 0 |

The * on the backward citation count is there because if you were to search in Espacenet for patents linked to the patent application EP0741529 A1 (as opposed to B1), it would show 8 backward citations, but no forward citations.

So in this case, Ambercite provides citation links unavailable from these other databases.

 

Case study #4 - Patent families with US family members, but from the perspective of non-US patents

The final example concerns a patent family with a combination of US and non-US family members - but this time from the perspective of a non-US family member. We have chosen EP0879230, filed by GlaxoSmithKline for an Optically Active Phenyl Pyrimidine Derivative As Analgesic Agent.

Known citation search based on EP0879230B1

| | Ambercite Cluster Searching | Google Patent | Espacenet | Patentlens |
|---|---|---|---|---|
| # of prior art citations | 2 | None listed | 0 | 0 |
| # of forward citations | 13 | None listed | 0 | 0 |

So again there is a big difference between the Ambercite results and those available from the other data sources.

I should note that, for example, if we instead looked at the granted US family member US6124308, Espacenet would show 3 backward citations (2 in one patent family) and 7 forward citations. This is useful to know - but if, for example, you are one of our many clients who use Cluster Searching outside of the US, it is not as useful as being able to run such a search from any family member and end up with the same result, regardless of the family member searched on.
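One way to picture "the same result from any family member" is a lookup that first resolves the searched number to its family and only then fetches the citation data. The sketch below is a hedged illustration, not a description of how Cluster Searching is built: the family ID "FAM_A", the function names and the kind-code handling are assumptions, and the counts are simply the Ambercite figures reported for EP0879230B1 in this case study.

```python
import re

# Hypothetical family index. "FAM_A" is a made-up ID; the pairing of the EP and
# US numbers is taken from this case study.
FAMILY_OF = {"EP0879230": "FAM_A", "US6124308": "FAM_A"}

# Citation counts stored once per family (Ambercite figures for EP0879230B1).
FAMILY_COUNTS = {"FAM_A": {"backward": 2, "forward": 13}}


def normalise(number: str) -> str:
    """Strip an optional trailing kind code such as 'B1' or ' A1' (assumed format)."""
    match = re.fullmatch(r"([A-Z]{2}\d+)\s*([A-Z]\d?)?", number.strip())
    return match.group(1) if match else number.strip()


def citation_counts(number: str):
    """Resolve the searched record to its family, then return the family's counts."""
    family = FAMILY_OF.get(normalise(number))
    return FAMILY_COUNTS.get(family)


# Searching from either family member gives the same answer.
print(citation_counts("EP0879230B1"))
print(citation_counts("US6124308"))
```

Because the counts are stored against the family rather than against an individual publication, it does not matter which family member, or which kind code, the search starts from.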

 

Discussion - Confirming the benefits of the Ambercite approach

These are some great case studies of the benefits of the Ambercite approach to finding citations, namely that

  • It finds citations missed by other services, particularly non-US citations 
  • It does so without double counting citations, providing more accurate citation counts and avoiding the need to review the same patent family more than once - saving you valuable time
  • It can be searched from any family member.

Also, unlike these other services, the Ambercite citations are ranked in order of likely similarity.

But all of this concerns direct citations, or as we like to say 'known citations', because in theory these citations are known to other databases - although this blog shows that this is not always the case.

Within the same results, Ambercite also supplies a long list of 'unknown' or indirect citations, and these can also be highly relevant. These unknown citations have exactly the same benefits as above compared to patents listed by other databases.

 

All of which can improve the results of your patent searching while reducing the time spent on it.

 

>>>>>>>>>>>>>>

I should note that this blog was catalysed by a comment by a contact along the lines of 'I can get citation data in other sources'. Hopefully this blog has instead shown you that this is not always the case - not by a long way in some of the above case studies.