Ambercite Searching: An unfair advantage? - A patent office comparison

June 11 2019: Ambercite has often been compared to semantic search engines. Both apply algorithms to predict similar patents. In semantic searching, the algorithms look for similarities in language to the starting patent, while in Ambercite the algorithms look for similarity using citation analytics.

At Ambercite we believe that citation analytics give great results because experts such as patent examiners are better at recognising similarity than computers, and we have previously provided data to support this. However, we recognise that many readers would also like to see independently prepared data - and luckily for us, Dr Burkhard Schlechter from the Austrian Patent Office has recently provided such data*.

(*See “Semantik und AI - auf dem Weg zu einer effizienteren Suche?”, proceedings of Patinfo 2019, Ilmenau, published in June 2019. This title translates to “Semantics and AI - on the way to a more efficient search?”)

The comparison that Dr Schlechter performed included the following steps:

1) Dr Schlechter identified seven inventions, in different technical areas, and carried out traditional (keyword and class code) patent searching for these seven inventions. Together, these seven searches produced 8,000 patents, which were manually reviewed for relevance to the inventions.

These seven searches were based on the inventions disclosed in the following patents. Click on any of these patent numbers to see an interactive listing of the 25 most similar patents, as suggested by Ambercite.

DE102014216582, Spark plug with multi-ground electrode

EP2961250A1, Assembly for emitting light with led and support element

AT516096A1, Keyboard for a musical instrument

DE3037619A1, Bedridden patient leg-exercising instrument - comprises tilting footrest on support subjected to tension load

EP1152305A1, Analog clock with date display

AT517376A1, Locking Unit

DE102016204957A1, Automated lane change in the dynamic traffic, based on driving dynamics constraints

2) For these seven traditional patent searches, Dr Schlechter identified a total of 38 X or Y citations (relevant to novelty or inventive step) and a total of 60 A citations (defining the state of the art without being particularly relevant).

3) Dr Schlechter then ran patent searches in Ambercite and in the semantic-based AI search engines listed below (six engines in total, counting Ambercite), identifying the 25 most similar patents for each of the seven inventions in each search engine (thereby returning 175 patents in total per engine).

The results are provided below:

 

| Criteria | Basis | Ambercite | Innovation-Q | IPScreener | Teqmine | Octimine | Incopat |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1) # of patents found by AI search engine that were in the right technical area | Out of 175 | 123 | 119 | 101 | 64 | 80 | 98 |
| | % | 70% | 68% | 58% | 37% | 46% | 56% |
| 2) # of patents found by AI search engine, also found in the traditional search | Out of 175 | 65 | 49 | 20 | 14 | 4 | 0 |
| | % | 37% | 28% | 11% | 8% | 2% | 0% |
| 3) # of patents found by AI search engine, also found in the traditional search as X or Y citations | Out of 38 X or Y citations | 19 | 6 | 3 | 1 | 1 | 2 |
| | % | 50% | 16% | 8% | 3% | 3% | 5% |
| 4) # of patents found by AI search engine, also found in the traditional search as A citations | Out of ~60 A citations | 24 | 23 | 8 | 2 | 5 | 14 |
| | % | 40% | 38% | 13% | 3% | 8% | 23% |
| 5) # of new X citations found by AI search engines that were not found by the traditional search | | 3 | 2 | 0 | 1 | 0 | 0 |
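The percentage rows in the table follow directly from the counts and their denominators. As a minimal sketch for readers who want to check the arithmetic, the Python below recomputes them. The counts are transcribed from the table; the names `ENGINES`, `ROWS` and `share_table` are purely illustrative, and the denominator of 60 for A citations is an assumption, since the table only gives "~60".

```python
# Counts transcribed from the results table (criteria 1 to 4).
ENGINES = ["Ambercite", "Innovation-Q", "IPScreener", "Teqmine", "Octimine", "Incopat"]

# Each entry: criterion label -> (denominator, counts in ENGINES order).
ROWS = {
    "right technical area": (175, [123, 119, 101, 64, 80, 98]),
    "also in traditional search": (175, [65, 49, 20, 14, 4, 0]),
    "X or Y citations": (38, [19, 6, 3, 1, 1, 2]),
    "A citations": (60, [24, 23, 8, 2, 5, 14]),  # assumption: "~60" taken as 60
}


def share_table():
    """Return {criterion: {engine: whole-number percentage of the denominator}}."""
    table = {}
    for criterion, (total, counts) in ROWS.items():
        table[criterion] = {
            engine: round(100 * count / total)
            for engine, count in zip(ENGINES, counts)
        }
    return table


if __name__ == "__main__":
    for criterion, shares in share_table().items():
        cells = ", ".join(f"{engine} {pct}%" for engine, pct in shares.items())
        print(f"{criterion}: {cells}")
```

Running the script prints one line per criterion, which can be compared against the "%" rows of the table.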

These results can be understood in the following way, at least in relation to the Ambercite results:

1) 70% of patents found by Ambercite were in the right technical area. This was a higher proportion than the other AI engines.

2) 37% of patents found by Ambercite were also found in the traditional search. Again, this was a higher proportion than the other search engines. Of course, traditional patent searching can be quite inefficient in that a large number of ‘false positives’ are produced. As evidence of this, only 38 out of the 8,000 patents found in the traditional search were rated as X or Y citations.

3) Ambercite found 50% of the 38 patents identified as X or Y citations in the traditional patent search. This proportion was much higher than for the other search engines evaluated, and is a good result considering just how fast Ambercite is (a search can take minutes) compared to traditional searching (which can take many hours).

This result is displayed in graphical form below:

[Figure: Ambercite comparison]

4) Ambercite found 40% of the ~60 patents identified as A citations in the traditional patent search. Remember that A citations are not judged to be strongly relevant, so this lower result may not mean that much. Having said that, this proportion was still higher than the other search engines.

5) Ambercite was able to find 3 new X citations not found in the traditional search. This was also a higher number than the other search engines.

[Figure: X citations]

This, to my mind, is a key result. Remember that these three additional X citations, i.e. documents relevant to novelty, were found within only the first 25 patents listed in each Ambercite search. Had the searches been expanded to display 50, 100 or even more results (up to 2,000), more X citations would likely have been found, although Ambercite’s scoring and ranking system would most likely show the most relevant patents in the top 25 to 50 patents displayed.

But this initial list is only part of the full Ambercite search experience. Ambercite now makes it very easy to run Augmented Searching, shown in the links below. Augmented searching allows you to easily use your initial findings to rapidly find further highly relevant patents. So in other words, any new X citations found in the first 25 results are only the start of the new relevant results you can find in Ambercite.
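The ‘false positives’ point in item 2 above can be made concrete with a rough yield calculation: the share of reviewed patents that turned out to be X or Y citations. The sketch below uses only figures quoted in this article (38 out of 8,000 for the traditional search, 19 out of 175 for Ambercite). The helper `yield_percent` is a hypothetical name, and this yield figure is our own illustration rather than a metric reported in Dr Schlechter's paper. It should be read with care, because the traditional search itself defined which patents counted as X or Y citations.

```python
def yield_percent(relevant, reviewed):
    """Share of reviewed patents that were rated X or Y, as a percentage."""
    return 100 * relevant / reviewed


# Traditional search: 38 X or Y citations among 8,000 patents reviewed.
traditional = yield_percent(38, 8000)

# Ambercite: 19 of those X or Y citations among its 175 listed patents.
ambercite = yield_percent(19, 175)

if __name__ == "__main__":
    print(f"Traditional search yield: {traditional:.2f}%")
    print(f"Ambercite top-25 yield:   {ambercite:.2f}%")
```

On these figures, well under 1% of the patents reviewed in the traditional search were X or Y citations, compared with roughly one in nine of the patents listed by Ambercite.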

Does Ambercite provide an unfair advantage?

In his paper, Dr Schlechter noted the leading performance of Ambercite, but wondered if it was fair to compare the Ambercite approach to the semantic search engines, as the semantic search engines can search from text as well as patent numbers.

“By also doing an intellectual search mostly with citing and cited documents, naturally, the individual criteria at AMBERCITE deliver better results.”

This is a fair point, but it depends on the context of the search. If a searcher is looking for prior art to an invention defined only by a description, a text-based engine can help. Having said that, many users of Ambercite first find some relevant starting patents using conventional patent searching, and then use Ambercite to grow the data set by finding relevant patents missed by conventional searching.

However, in many cases a searcher does have a relevant patent - for example their own patent, a patent they are trying to invalidate, or other relevant patents that they are already aware of.

And in these other cases, it should not matter how Ambercite finds its results, in the same way that it does not matter that Usain Bolt towers over his competitors in the 100m sprints:

[Image: Usain Bolt]

It should only be the out-performance that matters.

What does this mean to patent searchers?

At Ambercite, we are not opposed to traditional searching - it certainly has its place. However, it can take many hours, and may not give a complete listing.

In contrast, Ambercite searching can take just a few minutes, and still supply results missed by traditional searching - while outperforming semantic search engines. Not only that, it can be highly complementary to traditional patent searching - a combination of the two approaches, used sequentially, can be very powerful indeed.

Given the increasing demand on patent searchers and examiners to produce high quality results in short time frames, wouldn’t it make sense to use Ambercite to strengthen your search results?

 
Mike Lloyd