First, we develop a novel feature design that borrows from techniques commonly used for feature extraction in speech recognition and music processing. These techniques are geared towards the human ear, which is limited to approximately 20 kHz and whose sensitivity is logarithmic in frequency; for printers, our experiments show that most interesting features occur above 20 kHz, and a logarithmic scale cannot be assumed. Our feature design reflects these observations by employing a sub-band decomposition that places emphasis on the high frequencies, and by spreading the filter frequencies linearly over the frequency range. We further add suitable smoothing to make the recognition robust against measurement variations and environmental noise.
Second, we deal with the decay time and the induced blurring by resorting to a word-based approach instead of decoding individual letters. A word-based approach requires additional upfront effort, such as an extended training phase as the dictionary grows larger, and it does not permit us to increase recognition rates by using, e.g., spell-checking. Recognition of words based on training the sound of individual letters (or pairs/triples of letters), however, is infeasible because the sound emitted by printers blurs so strongly over adjacent letters.
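To make the feature design concrete, the following is a minimal sketch of linearly spaced sub-band energies above 20 kHz, followed by smoothing over time. It is an illustration under stated assumptions, not our actual implementation: the 96 kHz sampling rate, the number of bands, the rectangular (non-overlapping) bands, the moving-average smoother, and all function names are hypothetical choices made for this example.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive O(n^2) DFT power spectrum; adequate for a short illustrative frame."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(acc) ** 2)
    return spec


def linear_band_edges(f_lo, f_hi, n_bands):
    """Return n_bands + 1 edges spaced linearly (not logarithmically) in Hz."""
    step = (f_hi - f_lo) / n_bands
    return [f_lo + i * step for i in range(n_bands + 1)]


def band_energies(spec, frame_len, sample_rate, edges):
    """Sum the power of all DFT bins falling into each [edge_i, edge_i+1) band."""
    energies = [0.0] * (len(edges) - 1)
    for k, power in enumerate(spec):
        freq = k * sample_rate / frame_len
        for b in range(len(energies)):
            if edges[b] <= freq < edges[b + 1]:
                energies[b] += power
                break
    return energies


def smooth(frames, width=3):
    """Moving average over time, applied to each band separately."""
    out = []
    for i in range(len(frames)):
        lo = max(0, i - width // 2)
        hi = min(len(frames), i + width // 2 + 1)
        window = frames[lo:hi]
        out.append([sum(f[b] for f in window) / len(window)
                    for b in range(len(frames[0]))])
    return out


# Assumed setup: 96 kHz sampling, 32-sample frame, a pure 30 kHz test tone.
RATE, N = 96000, 32
tone = [math.cos(2 * math.pi * 10 * t / N) for t in range(N)]  # DFT bin 10 = 30 kHz
edges = linear_band_edges(20000, 48000, 4)  # bands start above the audible range
features = band_energies(power_spectrum(tone), N, RATE, edges)
print(features.index(max(features)))  # index of the band containing 30 kHz
```

Restricting the band edges to the range above 20 kHz is one simple way to place emphasis on the high frequencies; a weighted filterbank over the full spectrum would be another.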
Third, we employ speech recognition techniques to increase the recognition rate: we use Hidden Markov Models (HMMs) that rely on the statistical frequency of sequences of words in text in order to rule out incorrect word combinations. The presence of strong blurring, however, requires using at least 3-grams over the words of the dictionary to be effective, which causes existing implementations for this task to fail because of memory exhaustion. To tame memory consumption, we implemented a delayed computation of the transition matrix that underlies HMMs and, in each step of the search procedure, adaptively removed the words with only weakly matching features from the search space.
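The two memory-saving ideas can be sketched as a Viterbi-style search over word pairs. This is a hedged illustration rather than our implementation: the function names, the toy trigram table, the fallback probability of 0.01, and the fixed top-k pruning rule are all assumptions made for the example. Trigram probabilities are evaluated only when a hypothesis actually needs them, instead of materializing a full transition matrix, and each step keeps only the best-matching candidate words.

```python
import math
from functools import lru_cache

# Hypothetical toy language model; a real one would be estimated from a corpus.
TOY_TRIGRAMS = {
    ("<s>", "<s>", "the"): 0.9,
    ("<s>", "the", "cat"): 0.8,
    ("the", "cat", "sat"): 0.9,
}


@lru_cache(maxsize=None)
def toy_trigram_logprob(w1, w2, w3):
    """log P(w3 | w1, w2), computed on demand; unseen trigrams get 0.01."""
    return math.log(TOY_TRIGRAMS.get((w1, w2, w3), 0.01))


def decode(emissions, trigram_logprob, beam=2, start="<s>"):
    """Viterbi over word-pair states with per-step candidate pruning.

    emissions: one dict per printed word, mapping candidate word -> log score
               of the feature match (higher is better).
    """
    # Hypotheses are keyed by their last two words, i.e. the trigram context.
    hyps = {(start, start): (0.0, [])}
    for scores in emissions:
        # Pruning: drop all but the `beam` best-matching words for this step.
        candidates = sorted(scores, key=scores.get, reverse=True)[:beam]
        new_hyps = {}
        for (w1, w2), (logp, path) in hyps.items():
            for w3 in candidates:
                total = logp + trigram_logprob(w1, w2, w3) + scores[w3]
                key = (w2, w3)
                if key not in new_hyps or total > new_hyps[key][0]:
                    new_hyps[key] = (total, path + [w3])
        hyps = new_hyps
    return max(hyps.values(), key=lambda h: h[0])[1]


# Feature matching alone slightly prefers "cut" and "set"; the trigram model
# rules out that word combination.
emissions = [
    {"the": -0.1, "tea": -0.2, "ten": -3.0},
    {"cat": -0.5, "cut": -0.4, "car": -4.0},
    {"sat": -0.3, "set": -0.2, "sit": -5.0},
]
print(decode(emissions, toy_trigram_logprob))  # -> ['the', 'cat', 'sat']
```

Because of the pruning, the search is approximate: a word discarded at one step cannot be recovered later, so the number of retained candidates trades memory against recognition rate.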
We built a prototypical implementation that can bootstrap the recognition routine from a database of featured words that have been trained using supervised learning. Afterwards, the prototype automatically recognizes text with recognition rates of up to 72 %.