Ambercite Searching: How does it compare to semantic searching? - A patent office comparison

June 11 2019: Ambercite has often been compared to semantic search engines. Both apply algorithms to predict similar patents. In the case of semantic searching, the algorithms look for similarities in language to the starting patent, while in the case of Ambercite the algorithms look for similarity using citation analytics.

At Ambercite we believe that citation analytics give great results because experts such as patent examiners are better at recognising similarity than computers, and we have previously provided data to support this. However, we recognise that many readers would also like to see independently prepared data - and luckily for us, Dr Burkhard Schlechter from the Austrian Patent Office has recently provided such data*.

(*See “Semantik und AI - auf dem Weg zu einer effizienteren Suche?”, proceedings of Patinfo 2019, Ilmenau, published in June 2019. This title translates to “Semantics and AI - on the way to a more efficient search?”)

The comparison that Dr Schlechter performed included the following steps:

1) Dr Schlechter identified seven inventions, in different technical areas, and carried out traditional (keyword and class code) patent searching for each of them. Altogether, these seven patent searches produced 8,000 patents, which were manually reviewed for relevance to the inventions.

These seven searches were based on the inventions disclosed in the following patents. Click on any of these patent numbers to see an interactive listing of the most similar 25 patents, as suggested by Ambercite.

DE102014216582, Spark plug with multi-ground electrode

EP2961250A1, Assembly for emitting light with led and support element

AT516096A1, Keyboard for a musical instrument

DE3037619A1, Bedridden patient leg-exercising instrument - comprises tilting footrest on support subjected to tension load

EP1152305A1, Analog clock with date display

AT517376A1, Locking Unit

DE102016204957A1, Automated lane change in the dynamic traffic, based on driving dynamics constraints

2) For these seven traditional patent searches, Dr Schlechter identified a total of 38 X citations (relevant to novelty or inventive step) and a total of 60 A citations (defining the state of the art without being particularly relevant).

3) Dr Schlechter then ran patent searches in Ambercite and six semantic-based AI search engines, listed below, identifying the 25 most similar patents for each of the seven inventions in each search engine (thereby returning 7 × 25 = 175 patents in total per engine).

The results are provided below:

 

| Criteria | | Ambercite | Innovation-Q | IPScreener | IPRally | Teqmine | Octimine | Incopat | Ambercite rank in results |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1) # of patents found by AI search engine, that were in the right technical area | Out of 175 | 123 (70%) | 119 (68%) | 101 (58%) | 114 (65%) | 64 (37%) | 80 (46%) | 98 (56%) | 1st |
| 2) # of patents found by AI search engine, also found in the traditional search | Out of 175 | 65 (37%) | 49 (28%) | 20 (11%) | | 14 (8%) | 4 (2%) | 0 (0%) | 1st |
| 3) # of patents found by AI search engine, also found in the traditional search as X or Y citations | Out of 38 X or Y citations | 19 (50%) | 6 (16%) | 3 (8%) | 3 (8%) | 1 (3%) | 1 (3%) | 2 (5%) | 1st |
| 4) # of patents found by AI search engine, also found in the traditional search as A citations | Out of ~60 A citations | 24 (40%) | 23 (38%) | 8 (13%) | 11 (18%) | 2 (3%) | 5 (8%) | 14 (23%) | 1st |
| 5) # of new X citations found by AI search engines, that were not found by the traditional search | | 3 | 2 | 0 | 3 | 1 | 0 | 0 | 1st equal |
| Preferred input | | Relevant patent numbers | Full text | Extended abstract | Claim 1 | Full text | Full text | Abstract or full text | |
| Data coverage (as at June 2019) | | All patents found in Espacenet (virtually all important countries) | EP, WO, US, FR, DE, IEEE full text | EP, WO, US full text | EP, WO, US full text | EP, WO, US full text | EP, WO, US, DE full text | EP, WO, US, FR, DE, CN, JP, KR, RU | |

 

Some of these results are shown in graphical form below:

 
[Figures: Graph 1, Graph 2, Graph 3, Graph 5]
 

These results can be understood in the following way, at least in relation to the Ambercite results (remember that all of these results are for the first 25 patents listed by each of these engines):

1) 70% of the patents listed by Ambercite were in the right technical area. This was a higher proportion than for any of the other search engines.

2) 37% of the patents listed by Ambercite were also found in the traditional search. This was a higher proportion than for the other search engines. Of course, traditional patent searching can be quite inefficient in that a large number of ‘false positives’ are produced. As evidence of this, only 38 out of the 8,000 patents found in the traditional searches were rated as X citations.

3) Ambercite listed 50% of the 38 patents identified as X citations in the traditional patent search. This proportion was higher than for the other search engines evaluated. Some may ask: what about the remaining 50% of patent citations, and what else is listed by Ambercite? The other 50% of patent citations will be listed by Ambercite, just further down the list (say, within the top 50 patents listed), and the other patents listed by Ambercite will be patents that Ambercite suggests are at least as similar as the known citations.

4) Ambercite listed 40% of the ~60 patents identified as A citations in the traditional patent search. Remember that A citations are not judged to be strongly relevant, so this lower result may not mean that much. Having said that, this proportion was still higher than the other search engines.

5) Ambercite was able to find 3 new X citations not found in the traditional search. This was as good as any of the other search engines.

This to my mind is a key result. Remember that these 3 additional X citations, i.e. documents relevant to novelty, were found in only the first 25 patents listed in the Ambercite searches. Had the search been expanded to display 50, 100 or even more results (up to 2,000), more X citations would have been found, although Ambercite’s scoring and ranking system would most likely show the most relevant patents in the top 25 to 50 patents displayed.

So overall, as shown in the right-hand column of the table above, Ambercite returned the best or equal-best results in all of the categories considered.
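For readers who want to check the arithmetic, the percentages and the rank column can be recomputed from the counts in the table above. The following is only a rough sketch: the variable and function names are our own, the counts are copied from the table, the blank IPRally cell is left as missing, and the "~60" A citations are assumed to be exactly 60.

```python
# Counts copied from the results table above; each engine returned
# 25 results for each of 7 searches, i.e. 175 results in total.
ENGINES = ["Ambercite", "Innovation-Q", "IPScreener", "IPRally",
           "Teqmine", "Octimine", "Incopat"]

# criterion -> (denominator, count per engine). None marks the blank
# IPRally cell; the "~60" A citations are assumed to be exactly 60.
RESULTS = {
    "right technical area": (175, [123, 119, 101, 114, 64, 80, 98]),
    "also in traditional search": (175, [65, 49, 20, None, 14, 4, 0]),
    "X or Y citations": (38, [19, 6, 3, 3, 1, 1, 2]),
    "A citations": (60, [24, 23, 8, 11, 2, 5, 14]),
}

# New X citations not found by the traditional search (no denominator).
NEW_X = [3, 2, 0, 3, 1, 0, 0]


def percentages(denominator, counts):
    """Whole-number percentage for each count, skipping blank cells."""
    return [None if c is None else round(100 * c / denominator) for c in counts]


def leaders(counts):
    """Engines sharing the highest count for a criterion."""
    best = max(c for c in counts if c is not None)
    return [e for e, c in zip(ENGINES, counts) if c == best]


# Share of the 8,000 traditionally found patents rated as X citations.
traditional_x_share = 100 * 38 / 8000  # 0.475 (percent)

if __name__ == "__main__":
    for name, (denominator, counts) in RESULTS.items():
        print(name, percentages(denominator, counts), leaders(counts))
    print("new X citations", leaders(NEW_X))
    print(f"traditional search X share: {traditional_x_share:.2f}%")
```

Running this reproduces the rounded percentages in the table and shows Ambercite alone in first place for criteria 1 to 4, and tied with IPRally for criterion 5.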

But this initial list is only part of the full Ambercite search experience. Ambercite now makes it very easy to run Augmented Searching, shown in the links below. Augmented searching allows you to easily use your initial findings to rapidly find further highly relevant patents. So in other words, any new X citations found in the first 25 results are only the start of the new relevant results you can find in Ambercite.

And lastly, Ambercite has the broadest data coverage.

Does Ambercite provide an unfair advantage?

In his paper, Dr Schlechter noted the leading performance of Ambercite, but wondered if it was fair to compare the Ambercite approach to the semantic search engines, as the semantic search engines can search from text as well as patent numbers.

“By also doing an intellectual search mostly with citing and cited documents, naturally, the individual criteria at AMBERCITE deliver better results.”

This is a fair point, but this depends on the context of the search. If a searcher is looking for prior art to an invention only defined by a description, one of the above text-based engines can help. Having said that, many users of Ambercite firstly find some relevant starting patents using conventional patent searching, and then use Ambercite to grow the data set by finding relevant patents missed by conventional searching.

However, in many cases a searcher does have a relevant patent - for example their own patent, a patent they are trying to invalidate, or other relevant patents that they are already aware of.

And in these other cases, it should not matter how Ambercite finds its results, in the same way that it does not matter that Usain Bolt towers over his competitors in the 100m sprints: only the outperformance should matter.

What does this mean to patent searchers?

At Ambercite, we are not opposed to traditional searching - it certainly has its place. However, it can take many hours, and may not give a complete listing.

In contrast, Ambercite searching can take just a few minutes, and still supply results missed by traditional searching - while outperforming semantic search engines. Not only that, it can be highly complementary to traditional patent searching - a combination of the two approaches, used sequentially, can be very powerful indeed.

Given the increasing demand on patent searchers and examiners to produce high quality results in short time frames, wouldn’t it make sense to use Ambercite to strengthen your search results?