Hierarchy of Design
Voice Controlled Remote
• Voice Input
• Control Path
• Speech Processing
• IR Interface

Characteristics of Speech
• Amplitude variations
• Frequency variations
• Continuous in the frequency domain
• Most of the energy lies between 100 Hz and 4 kHz
• Requires a sampling rate of at least 8 kHz (twice the 4 kHz upper band edge) for intelligible speech

Our Speech Algorithm
• Isolated word: cannot distinguish the important parts of a stream of uninterrupted speech
• “Small” vocabulary: from zero to tens of words (Up, Down, Power, Surf)
• Training required: tells the device what each command sounds like
• Speaker dependent: retraining is required for each new user

The Voice Input
• Condenser microphone
• Signal is amplified approximately 6000x
• Sampling rate of ~8 kHz
• 8-bit linear conversion

Word Boundary Detection
1. Sample continuously
2. Has the threshold level been reached?
3. Begin analyzing the data
4. Is the threshold level being reached very often?
5. Stop analyzing the data

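The flow above can be sketched in Python as a minimal illustration. The slide does not give the threshold value or define "very often", so `threshold`, `window`, and `min_hits` are hypothetical parameters, and the stopping rule (stop at the first block in which fewer than `min_hits` samples reach the threshold) is an assumption. Samples are assumed to be signed values centered on zero.

```python
def find_word_boundaries(samples, threshold=20, window=80, min_hits=4):
    """Return (start, end) sample indices of the first detected word, or None.

    Assumed rule, not taken from the slides: the word starts at the first
    sample whose magnitude reaches `threshold`, and ends at the start of the
    first `window`-sized block in which fewer than `min_hits` samples reach it.
    """
    # Steps 1-2: scan the incoming samples until the threshold is reached.
    start = None
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            start = i
            break
    if start is None:
        return None  # only silence was seen

    # Steps 3-5: keep analyzing block by block while the threshold is
    # still being reached often enough; stop at the first quiet block.
    end = len(samples)
    for block_start in range(start, len(samples), window):
        block = samples[block_start:block_start + window]
        hits = sum(1 for s in block if abs(s) >= threshold)
        if hits < min_hits:
            end = block_start
            break
    return start, end


# 100 silent samples, 200 loud samples, then 200 silent samples.
signal = [0] * 100 + [50, -50] * 100 + [0] * 200
boundaries = find_word_boundaries(signal)
```

Because the end is only checked once per block, the detected end point is accurate to within one `window`; a smaller window tracks the end of the word more closely at the cost of being more sensitive to short pauses.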
Zero Crossings
• A zero crossing is one transition from positive to negative, or vice versa
• Used as an algorithm to determine the frequency of the signal
• Frequency is inversely proportional to the period

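As a rough illustration of the idea, the sketch below counts sign changes and converts the count to a frequency estimate. A periodic signal crosses zero twice per cycle, so the estimate is the crossing count divided by twice the duration. The function names and the 8000 Hz default sample rate are assumptions for this example, not the original implementation.

```python
import math


def count_zero_crossings(samples):
    """Count transitions between positive and negative sample values."""
    crossings = 0
    prev = None
    for s in samples:
        if s == 0:
            continue  # an exact zero carries no sign; wait for the next sample
        if prev is not None and (s > 0) != (prev > 0):
            crossings += 1
        prev = s
    return crossings


def estimate_frequency(samples, sample_rate=8000):
    """Estimate the dominant frequency in Hz from the zero-crossing count."""
    duration = len(samples) / sample_rate  # seconds of audio analyzed
    return count_zero_crossings(samples) / (2 * duration)


# One second of a 500 Hz tone sampled at 8 kHz (small phase offset so that
# no sample lands exactly on a crossing).
tone = [math.sin(2 * math.pi * 500 * n / 8000 + 0.1) for n in range(8000)]
estimate = estimate_frequency(tone)  # close to 500 Hz
```

For speech, the count is normally taken over short frames rather than the whole word, since the dominant frequency changes as the word is spoken.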
Energy Analysis
• The energy of the signal is the sum of the squared amplitudes (Parseval’s theorem)
• We used the absolute value of the amplitude instead
• Calculated in real time, as each sample is received

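A minimal sketch of the running calculation, keeping both the squared-amplitude energy and the absolute-value measure side by side for comparison. The class and attribute names are hypothetical, and samples are assumed to be signed integers centered on zero.

```python
class EnergyAccumulator:
    """Accumulate energy measures one sample at a time."""

    def __init__(self):
        self.abs_sum = 0  # sum of |x[n]|, the measure the slides say was used
        self.sq_sum = 0   # sum of x[n]^2, the textbook energy

    def add_sample(self, sample):
        # Called once per incoming sample, so no buffering is needed.
        self.abs_sum += abs(sample)
        self.sq_sum += sample * sample


acc = EnergyAccumulator()
for s in [3, -4, 0, 5]:
    acc.add_sample(s)
# acc.abs_sum is 12 and acc.sq_sum is 50
```

The absolute-value sum avoids a multiplication per sample and grows more slowly than the squared sum, which is convenient on small hardware with narrow registers.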
The Recognition Process
1. Compare the characteristics of the sample against each stored command: Command1, Command2, Command3, and Command4
2. Select the command most similar to the recognized word
3. That command is taken to be the command that was spoken

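The comparison step can be sketched as a nearest-template search. The slides do not say how similarity is measured, so the feature vectors (for example zero-crossing counts and energy values) and the sum-of-absolute-differences distance below are assumptions chosen for illustration, as are the function names and the template values.

```python
def distance(a, b):
    """Sum of absolute differences between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def recognize(sample_features, templates):
    """Return the name of the stored command closest to the sample."""
    return min(templates, key=lambda name: distance(sample_features, templates[name]))


# Hypothetical trained templates: one feature vector per command.
templates = {
    "Up": [10, 40],
    "Down": [30, 35],
    "Power": [50, 20],
    "Surf": [70, 60],
}
spoken = recognize([48, 22], templates)  # "Power"
```

This always returns some command, even for an unrelated sound. A practical version would likely reject the match when the smallest distance exceeds a limit.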
The Infra-Red Beam
• Detects and stores codes for common Sony TVs
• Uses a blind copycat method of IR memory; no decoding occurs
• The method is easily adapted to other IR protocols

General A/V IR Coding Schemes
• 38-40 kHz carrier at 940 nm wavelength
• Carrier output is gated by the bit stream
• Most protocols use Pulse Width Modulation for bit encoding, where T ≈ 550 µs
  – Logic ‘1’ is coded as T (unmodulated) followed by 2T (modulated)
  – Logic ‘0’ is coded as T (unmodulated) followed by T (modulated)
• Various bit lengths, start sequences, and end sequences

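The bit encoding described above can be sketched as a list of carrier-off and carrier-on intervals. This assumes T is about 550 microseconds, as the slide appears to state, and leaves out the protocol-specific start and end sequences. The names `T_US` and `encode_bits` are hypothetical.

```python
T_US = 550  # assumed base time unit in microseconds


def encode_bits(bits, t_us=T_US):
    """Turn a bit sequence into (carrier_on, duration_us) segments.

    Each bit is T with the carrier off, followed by a carrier burst of
    2T for a logic 1 or T for a logic 0.
    """
    segments = []
    for bit in bits:
        segments.append((False, t_us))                      # unmodulated gap
        segments.append((True, 2 * t_us if bit else t_us))  # modulated burst
    return segments


frame = encode_bits([1, 0])
# [(False, 550), (True, 1100), (False, 550), (True, 550)]
total_us = sum(duration for _, duration in frame)  # 2750
```

During each `True` segment the transmitter would toggle the LED at the carrier frequency. A copycat approach like the one on the previous slide only needs to record and replay these durations, without interpreting them as bits.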
The Control Path
• Implemented in two Moore state machines
  – Training/Initialization
  – Active/Recognition

The Surf Function
• Start and stop the function by uttering the command SURF
• Enables a three-second preview of each channel
• Risk of developing carpal tunnel syndrome decreases sharply!