Real-Time Fact Checking With Pocketsphinx and Google’s FactCheck API

Max Slick
Oct 26, 2020

TL;DR: It only sorta works. PocketSphinx is spotty as a continuous Speech-To-Text transcriber, and Google’s FactCheck API is not robust enough to make this article compelling. But some semblance of a real-time fact-checker shines through on the more odious and well-known socio-political claims, so it was a cool experiment.

Source Code: https://github.com/rmslick/RealTimeFactChecker/blob/master/RTFactCheck.py

Overview

Here are the vestiges of an attempt to compose a real-time automated fact-checking system from an amalgam of APIs. The results were slightly underwhelming but intriguing nonetheless.

APIs/modules to unpack: Google FactCheck and Pocketsphinx continuous Speech-To-Text.

The Pipeline.

Real-Time Speech-To-Text: Pocketsphinx

Pocketsphinx is a CMU product. Free and open source, it boasts one of the only STT models available offline, and it keeps you away from the harlot of the computer sciences: machine learning. You’ll have to install it before you can use it, but on Ubuntu 20.04 and Python 3.8, the dog hunts.

Stream takes all of three lines to initiate. Nice.
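The original embed is not reproduced here, so the following is only a minimal sketch of what such a stream might look like. It assumes the LiveSpeech iterator from the pocketsphinx Python package as it existed around the time of writing. The helper names live_phrases and clean_phrases are hypothetical and are not taken from RTFactCheck.py.

```python
def live_phrases():
    """Yield phrases from the default microphone as Pocketsphinx decodes them."""
    # Imported lazily so the rest of the sketch runs without the package
    # or a microphone. LiveSpeech is assumed to be available as it was in
    # the pocketsphinx Python package of that era.
    from pocketsphinx import LiveSpeech

    for phrase in LiveSpeech():
        yield str(phrase)


def clean_phrases(phrases):
    """Collapse whitespace and drop empty hypotheses (hypothetical helper)."""
    for phrase in phrases:
        text = " ".join(str(phrase).split())
        if text:
            yield text
```

Looping over clean_phrases(live_phrases()) and printing each item would start listening on the default microphone and print one decoded phrase at a time.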

And proved precisely why it’s free in mere moments. Ice.

Spoken claim: “The White House is full of mice.”

Google’s FactCheck API

After a Google search for ‘python fact checking’ and a hasty scroll past every solution containing a sentence sentimentally similar to ‘and now we import 12 modules to format the data from JSON into an Excel spreadsheet which we can then use to read into a dataframe’, I happened on an unassuming API by Google called FactCheck.

You will need to acquire a Google developer API key to use FactCheck.

What I was hoping for was the ability to throw any arbitrary sentence over to Mountain View and get back a ‘This is true’/‘This is false’ response with the full level of robustness that you might assume a company with 2.5 million servers of data could provide. This was sort of the case, but not enough to justify checking the victory box. The documentation for the API is here. From this I was sort of able to create the simple “The earth is flat” -> [Google Wizardry] -> “No it isn’t. Stick to basketball” schema I was hoping for, but with limitations. In fairness, this may actually be by design. There is evidence of a corpus of validated claims and their sources, but I am unsure how to acquire it in full and unsure how it is maintained and grown. Any information on this would be awesome.

Below is a method to pass queries to the API and obtain a validation/refutation along with a source.
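The author's actual method lives in the linked source. As a rough stand-in, here is a sketch that assumes the claims:search endpoint of Google's Fact Check Tools API, with its query, key and languageCode parameters, and a response made of claims that each carry a claimReview list. The function names build_url, first_verdict and fact_check are hypothetical, and only the standard library is used.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: the claims:search method of the Fact Check Tools API.
ENDPOINT = "https://factchecktools.googleapis.com/v1alpha1/claims:search"


def build_url(claim, api_key, language="en"):
    """Build the search URL for a claim (hypothetical helper)."""
    params = {"query": claim, "key": api_key, "languageCode": language}
    return ENDPOINT + "?" + urllib.parse.urlencode(params)


def first_verdict(response):
    """Return (rating, publisher, url) for the first review, or None if no match."""
    for claim in response.get("claims", []):
        for review in claim.get("claimReview", []):
            publisher = review.get("publisher", {}).get("name", "unknown")
            return review.get("textualRating"), publisher, review.get("url")
    return None


def fact_check(claim, api_key):
    """Query the API over the network and return the first verdict, if any."""
    with urllib.request.urlopen(build_url(claim, api_key), timeout=10) as resp:
        return first_verdict(json.load(resp))
```

Calling fact_check("the earth is flat", api_key) with a valid key would return a tuple when the corpus holds a matching review, and None when it does not.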

The code above roughly accomplishes the initial intent to have a claim validated. But from testing it is clear that unless the claim is a propagation of some dogma that has been scrutinized widely in the media, it is unlikely to have representation in the corpus and may return nothing. However, some of our more cherished social peculiarities are well represented, as evidenced in the TestFactCheck method in the source.

At the last minute, I decided to throw in a Text-To-Speech to qualify statements audibly. This can be improved upon by rigging a mechanical air horn to blast upon an uncovered lie, and I sincerely welcome the contribution.
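A sketch of how a verdict might be read aloud follows. It makes two assumptions: that a verdict arrives as a hypothetical (rating, publisher, url) tuple, or None when nothing was found, and that pyttsx3 is the Text-To-Speech engine. The engine is my choice for illustration; the source may well use a different one.

```python
def verdict_sentence(claim, verdict):
    """Turn a (rating, publisher, url) tuple, or None, into a sentence to speak."""
    if verdict is None:
        return f"I could not find a fact check for: {claim}."
    rating, publisher, _url = verdict
    return f"{publisher} rates the claim '{claim}' as: {rating}."


def speak(text):
    """Read text aloud. pyttsx3 is an assumed choice of offline TTS engine."""
    # Imported lazily: only needed when audio output is actually wanted.
    import pyttsx3

    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()
```

Keeping the sentence construction separate from the audio call means the wording can be checked without speakers attached.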

It is highly likely that I am under-utilizing this API in some way; I will mess around with it more in the future.

Here is the pipeline at work after three attempts to get PocketSphinx to recognize the word “corona”:

Add attempts 1 and 2 in the terminal to the STT time capsule archives.

Here it is handling the question that famously places strain on the old adage “There are no stupid questions”:

Awarding no points.

Happy hacking.
