How to Cluster Search

Cluster Searching is remarkably simple and fast, and will soon have you wondering why you spent so many hours on more conventional search methods.

To learn more, watch our 'How to' videos, or follow the more detailed four-step process below:

These steps are explained below, with reference to a case study.


1) Define your search objective

Cluster Searching can be applied for a variety of purposes, including:

  • Invalidation
  • Patentability
  • Portfolio review
  • Freedom to operate
  • Patent landscaping

Case studies for some of these applications are found here.


2) Choose suitable seed patents

The inputs to a Cluster Search are the seed patents. These can come from a variety of sources:

  • Patents being evaluated or licensed
  • A preliminary or comprehensive search in more conventional patent search engines (yes, we are happy to recommend that you use both types of search engine in your projects)
  • Known or relevant citations of the patents you are interested in, because Cluster Searching will take you a long way beyond these

Seed patents can be entered for a wide range of countries, but for the following countries you need to add the kind code:

  • Finland
  • Denmark
  • Poland
  • Austria
  • China
  • Japan
  • South Korea
  • Belgium
  • Netherlands
  • Norway
  • Bulgaria
  • Slovakia

Why these countries? Unfortunately, their patent numbering systems allow the same patent number to refer to more than one invention, depending on the kind code.
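If you prepare seed lists in bulk, a quick pre-check can flag numbers from these countries that are missing a kind code. The Python sketch below is illustrative only: it assumes each seed is written as a two-letter country code, then digits, then an optional kind code such as `A` or `B1`. The helper `needs_kind_code` is hypothetical and not part of Cluster Searching, and the CN and JP numbers in the example are made up.

```python
import re

# The countries listed above, as WIPO ST.3 two-letter country codes.
KIND_CODE_REQUIRED = {
    "FI", "DK", "PL", "AT",  # Finland, Denmark, Poland, Austria
    "CN", "JP", "KR", "BE",  # China, Japan, South Korea, Belgium
    "NL", "NO", "BG", "SK",  # Netherlands, Norway, Bulgaria, Slovakia
}

# Assumed seed format: country code, digits, optional kind code (e.g. "A", "B1").
SEED_PATTERN = re.compile(r"^([A-Z]{2})(\d+)([A-Z]\d?)?$")


def needs_kind_code(number: str) -> bool:
    """Return True if a seed number from a listed country has no kind code."""
    match = SEED_PATTERN.match(number.strip().upper())
    if match is None:
        raise ValueError(f"Unrecognised patent number format: {number!r}")
    country, _digits, kind = match.groups()
    return country in KIND_CODE_REQUIRED and kind is None


# Illustrative numbers only; the CN and JP examples are made up.
for seed in ["US7334720", "CN101234567", "CN101234567A", "JP2001123456"]:
    status = "add a kind code" if needs_kind_code(seed) else "OK"
    print(f"{seed}: {status}")
```

The check only looks at the format of the number; whether a given kind code actually exists for that number still has to be confirmed, for example in Espacenet.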


Case study: To illustrate how Cluster Searching works, we will run a sample invalidation search on three US patents, previously discussed in a blog post, that SmartFlash successfully litigated against Apple. These patents are:

  • US7334720
  • US8118221
  • US8336772

These patents claim inventions that control access to media content (Apple was alleged to infringe them with its iTunes system). We will start our search with these three patent numbers, all of which share a priority date of 25 October 1999.


3) Run the Cluster Search

The Cluster Searching entry box is shown below. In this example, we will limit the results to the 50 most similar patents, although any limit between 5 and 1000 can be set. We will also reduce the weighting (influence) of the last patent to 0.5, or 50% of the normal weighting of a patent. And because all three patents have a priority date of 25 October 1999, we will limit results to patents filed before November 2009.



Patent numbers can be entered in a variety of formats.




Note that the weighting (the number after the patent number) is optional; if no weighting is given, Cluster Searching assumes a weighting of 1.

You can enter up to 200 seed patents in a single query.
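The query settings in this step can be summarised as a small data structure. The Python sketch below assumes one seed per line, written as a patent number optionally followed by a weighting; the formats the entry box actually accepts may differ. `ClusterQuery`, `parse_seeds` and `build_query` are hypothetical names for illustration, not an Ambercite API. The sketch applies the limits stated above: up to 200 seeds, between 5 and 1000 results, and a default weighting of 1.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

MAX_SEEDS = 200                     # up to 200 seed patents per query
MIN_RESULTS, MAX_RESULTS = 5, 1000  # result limits described for the entry box
DEFAULT_WEIGHT = 1.0                # weighting assumed when none is given


@dataclass
class ClusterQuery:
    """Hypothetical container for a query; not Ambercite's actual interface."""
    seeds: dict[str, float]
    max_results: int
    filed_before: date | None = None


def parse_seeds(text: str) -> dict[str, float]:
    """Parse one seed per line, e.g. 'US8336772 0.5'; no weighting means 1.

    A number entered twice keeps the weighting from its last line.
    """
    seeds: dict[str, float] = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        number = parts[0].upper()
        weight = float(parts[1]) if len(parts) > 1 else DEFAULT_WEIGHT
        seeds[number] = weight
    if len(seeds) > MAX_SEEDS:
        raise ValueError(f"At most {MAX_SEEDS} seed patents are allowed per query")
    return seeds


def build_query(text: str, max_results: int,
                filed_before: date | None = None) -> ClusterQuery:
    """Check the result limit and bundle the parsed seeds into a query."""
    if not MIN_RESULTS <= max_results <= MAX_RESULTS:
        raise ValueError(f"max_results must be between {MIN_RESULTS} and {MAX_RESULTS}")
    return ClusterQuery(parse_seeds(text), max_results, filed_before)


# The case study: three seeds, the last at half weighting, the top 50
# results, and the filing-date limit described above.
case_study = build_query(
    """
    US7334720
    US8118221
    US8336772 0.5
    """,
    max_results=50,
    filed_before=date(2009, 11, 1),
)
print(case_study.seeds)
# {'US7334720': 1.0, 'US8118221': 1.0, 'US8336772': 0.5}
```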


4) Review the results

The results for the query look like the image below. The top table lists the seed patents you used, and the bottom table lists the most similar patents we found, ranked in order of similarity.

These tables provide a lot of valuable information and tools to help you get the most from the results, as discussed below. In the image above, key information has been highlighted with blue circles. Only the top 3 results are shown in this particular image.

The labelled items are:

A - The top table shows the seed patents that we used, and their weightings.

B - The bottom table shows the patents we found to be most similar to the seed patents. Clicking any patent number displays more patent details.

C - The AmberScore values of both the seed and found patents are listed. AmberScore is our citation-based prediction of patent importance; a higher value suggests greater importance. The average AmberScore for granted US patents less than 20 years old is 1.0.

D - The second column in the results shows the predicted similarity score of each found patent to the seed patents. A higher value suggests greater similarity.

E - Note that the results respect the date limit imposed in the query.

F - The first column shows whether a found patent is a direct citation of any of the seed patents ('known', because it is an already known citation) or not a direct citation ('unknown').

G - The first three columns allow you to like a patent (click on it to change its colour), tell you whether the patent is new to the search session, and let you hide patents you do not wish to see again. We explain these in more detail below.



This list is largely self-explanatory, but there are also some tools to help your analysis. These tools are shown below.


These tools are: 

1 - The top table (or the bottom table) can be downloaded to Excel, and from there imported into any other application you choose. If you manipulate these tables using the tools discussed below, only the manipulated results will be downloaded.

2 - Each of these blue links (circled) will open up further patent details in a pop-up box

3 - Each of these red links opens the patent in AmberScope, so that you can gain a visual understanding of its citation network (a subscription to AmberScope is required to explore this network in detail).

4 - Selecting any column heading sorts the table by that column. For example, the patents have been sorted by the Owner field in the image below.

5 - Each of these small 'antenna'-style filters provides keyword filters for text fields, and number or date filters for numeric fields. An example is shown below for the Title field, where the filter has been set to show only patent titles that contain the word "Network" and do not contain the word "Fiber".

6 - You can "Like" individual results if they meet your search objectives.

7 - You can hide individual results, and these will not be shown in subsequent searches (until you refresh your browser).
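Once a table has been downloaded to Excel (tool 1), the same sort and filter steps can be reproduced outside the browser. The Python sketch below assumes the rows have been read into a list of dictionaries (for example with `csv.DictReader` after saving the sheet as CSV). The `title` and `owner` keys, the sample rows, and the case-insensitive matching are all assumptions for illustration, not a description of how the built-in filters work.

```python
def filter_titles(rows, include, exclude):
    """Keep rows whose title contains `include` but not `exclude` (case-insensitive)."""
    inc, exc = include.lower(), exclude.lower()
    return [
        row for row in rows
        if inc in row["title"].lower() and exc not in row["title"].lower()
    ]


def sort_by_column(rows, column):
    """Return the rows sorted by one column, like clicking a column heading."""
    return sorted(rows, key=lambda row: row[column])


# Made-up rows standing in for a downloaded results table.
results = [
    {"rank": 1, "title": "Optical fiber network switch", "owner": "Owner C"},
    {"rank": 2, "title": "Secure content network", "owner": "Owner B"},
    {"rank": 3, "title": "Media access control", "owner": "Owner A"},
    {"rank": 4, "title": "Network billing system", "owner": "Owner A"},
]

# Titles containing "Network" but not "Fiber", then sorted by owner.
network_only = filter_titles(results, include="Network", exclude="Fiber")
for row in sort_by_column(network_only, "owner"):
    print(row["rank"], row["owner"], row["title"])
```

Because `sorted` returns a new list, the original ranking in `results` is left untouched, which makes it easy to try several filters on the same download.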



Smart ways of reviewing the results - because it is what you do with the information that counts...

Let's have another look at the results.

The three top-ranked patents appear to be from the same applicant (note that one of the seed patents also has this applicant), so we might be able to ignore these in some circumstances.

The next two patents (#4 and #5) appear to be related to electronic ticketing; they might contain useful prior art, or they may not.

The results include US5457746, for a 'System and method for access control for portable data storage media', which discloses:

Through this system and method, the secure periodic distribution of several different sets of data information to the end user is achieved with access control selectively performed by at the user's site through communication with the billing/access center.

This looks quite relevant to me, which is why it was picked up by the examiner. But what if you were looking for prior art that had not been picked up by any examiner, and could therefore be used for reexamination purposes?

This is an easy query to run. We can sort the citation column so that the 'unknown' (not previously cited) results are shown first, and then re-sort these results from the top rank down:
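In the browser this is done with the column sort tools. On a downloaded table, the equivalent ordering is a single sort on a two-part key: citation status first, then rank. The Python sketch below assumes each row has `citation` and `rank` fields; those names and the sample rows are hypothetical.

```python
def unknown_first(rows):
    """Show 'unknown' (not previously cited) results first, each group in rank order.

    The key's first element is False for 'unknown' rows, and False sorts
    before True, so those rows come first; ties are broken by rank.
    """
    return sorted(rows, key=lambda row: (row["citation"] != "unknown", row["rank"]))


# Made-up rows standing in for a downloaded results table.
results = [
    {"rank": 1, "patent": "Result 1", "citation": "known"},
    {"rank": 2, "patent": "Result 2", "citation": "unknown"},
    {"rank": 3, "patent": "Result 3", "citation": "known"},
    {"rank": 4, "patent": "Result 4", "citation": "unknown"},
]

for row in unknown_first(results):
    print(row["rank"], row["citation"], row["patent"])
```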

Of these results, note US5050213, for a 'Database usage metering and protection system and method', which discloses:

a database access system and method at a user site which permits authorized users to access and use the database and absolutely prevents unauthorized database use and copying. The present invention also provides a facility for measuring usage of the on-site database for the purpose of billing the user according to the amount he has used the database, and for periodically conveying the measured usage information to the database owner (or his agent)--while preventing the user from tampering with the measured usage information.

This looks highly relevant, and took just four simple steps to find:

1) Define the search objective

2) Choose Seed Patents

3) Run Cluster Search

4) Review results


Saving and re-using results 

Feedback from a number of clients and prospective clients has been that they take the confidentiality of their search processes very seriously. We respect these wishes, and for this reason:

  • Ambercite does not save any client queries or their results
  • The flip side is that users of Cluster Searching are entirely responsible for saving their results, by downloading them to Excel as discussed above

Once the results have been downloaded to Excel, they can be saved, copied, or imported into any other application you choose.


Known limitations 

Cluster Searching is very powerful, but like any process has its limitations. These are the known limitations:

Running queries

  • A current maximum of 200 seed patents in a single query
  • A current maximum of 2000 results


Patent numbers and data coverage

  • Patent input numbers must be compatible with those used in Espacenet. If in doubt, try searching for the same patent number in Espacenet: if the query does not run in Espacenet, it will not work in Cluster Searching (there are a couple of exceptions to this rule, but it is a good fallback position).
  • For the above reason, Cluster Searching will not process US serial application numbers, such as 'US12/357,343'. Use the application publication number ('US2005013174') or grant number ('US7970907') instead.
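A rough pre-check can catch serial application numbers before they are entered. The Python patterns below are simplified heuristics modelled only on the three example formats above (a serial number like 'US12/357,343', a publication number like 'US2005013174' and a grant number like 'US7970907'). They are assumptions, not Espacenet's full numbering rules, and `classify_us_number` is a hypothetical helper; the Espacenet check remains the better test.

```python
import re

# Simplified patterns modelled on the examples in the text; real numbering has more variants.
SERIAL_APPLICATION = re.compile(r"^US\d{2}/\d{3},?\d{3}$")  # e.g. US12/357,343
APPLICATION_PUBLICATION = re.compile(r"^US20\d{8}$")         # e.g. US2005013174
GRANT = re.compile(r"^US\d{5,8}$")                           # e.g. US7970907


def classify_us_number(number: str) -> str:
    """Roughly classify a US number so serial application numbers can be caught early."""
    normalised = number.strip().upper().replace(" ", "")
    if SERIAL_APPLICATION.match(normalised):
        return "serial application (not accepted)"
    if APPLICATION_PUBLICATION.match(normalised):
        return "application publication"
    if GRANT.match(normalised):
        return "grant"
    return "unrecognised"


for example in ["US12/357,343", "US2005013174", "US7970907"]:
    print(example, "->", classify_us_number(example))
```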


Are the results reliable?

We have tested Cluster Searching extensively and strongly believe that, in terms of the overall proportion of top-ranked patents (say the top 100) that are relevant to the seed patents, Cluster Searching will comfortably outperform both conventional and semantic searching in many cases. You should spend less time reviewing irrelevant patents than with keyword and class code searching. However, there are natural limitations where:

  • Relevant patents have not been recognised by any patent examiners or applicants that have filed patents connected to your seed patents
  • The seed patents are not as carefully chosen as they could be

We strongly recommend that you use Cluster Searching in conjunction with other search processes for the most reliable results.


Have you picked up a limitation not listed? Or have an idea for improvement? 

Please let us know - we will greatly appreciate the feedback.