3 Overview: Tokens, Categories and the Registry
4.3 Instantiating an Object from a Token
5 Tokens and Categories For Engine Developers
5.1 Making Resources Available Through SAPI
5.2 Associating Files with Tokens
5.3 Inspecting Underlying Keys of a Token
5.4 Creating New Keys in the Registry
This document is intended to help developers of speech-enabled applications discover and use resources (Voices/Recognizers) on a computer that has SAPI installed. A speech-enabled application is one that attempts to either recognize or synthesize speech. Developers of speech recognition (SR) and speech synthesis (Text to Speech or TTS) engines make their resources available to applications.
This spec answers the following questions:
· What are Tokens and Categories in SAPI?
· Where is information about tokens stored in the Registry?
· How does an application find tokens and initialize resources (i.e., Voices or Recognizers) from them?
· What are the SAPI-defined attributes that engines should document in the registry?
· How are files associated with tokens?
Note:
The Speech SDK documentation section on Object Tokens, which provides a complete description of the ISpObjectToken and ISpObjectTokenCategory and their methods, complements this document.
A token is an object representing a resource that is available on a computer, such as a voice, a recognizer, or an audio input device. A token gives an application an easy way to inspect the various attributes of a resource without having to instantiate it. The Vendor of a Recognizer and the Gender of a Voice are examples of attributes of resources. In many cases, applications should use SAPI-provided helper functions for common scenarios. For example, an application can use the SpCreateBestObject helper function to rapidly create the object, given a certain type of resource. The application can also query for tokens meeting certain criteria without using the helper function. To do this, the application calls the EnumTokens method on the ISpObjectTokenCategory interface to get an enumerator, and inspects the tokens in the enumerator further if it chooses to. Finally, the application selects one of the tokens in the enumerator to instantiate a resource. Once the resource (such as an SR engine) is instantiated, if it implements the ISpObjectWithToken interface, it is handed a pointer to the token that was used to create it. This way, the resource contains a handle to more information about itself.
Conceptually, a token contains the following information:
· The language-independent name. This is the name that should be displayed wherever the name of the token is displayed. It is marked as (Default) in the registry. The implementer of the token may also choose to provide a set of language-dependent names in several languages.
· The CLSID used to instantiate the object from the token.
· A set of Attributes, which are the only set of queriable values in a token. This means SAPI provides a mechanism to query for tokens whose attributes match certain values. Details on how to query for tokens that match a set of attributes are in Sections 4.1 and 4.2.
A token may also contain the following:
· If a token has user interfaces (UIs) to display, such as the properties of a Recognizer or a wizard to customize a Voice, then the token will also contain the CLSID for the COM object used to instantiate each type of UI.
· The set of Files from which SAPI returns the paths to all the associated files for the token.
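The conceptual contents listed above can be pictured as a small data structure. The following is only a sketch in plain C++: the real token is a COM object (ISpObjectToken) backed by the registry, and the names TokenInfo, LangIdValueName, and DisplayName are hypothetical, introduced here for illustration. Falling back to the language-independent name when no language-specific name exists is an assumption of this sketch, not documented SAPI behavior.

```cpp
#include <cstdio>
#include <map>
#include <string>

// Hypothetical in-memory picture of a token. SAPI's real token is a COM
// object backed by registry keys, not this struct.
struct TokenInfo {
    std::string defaultName;                           // the (Default) value
    std::map<std::string, std::string> namesByLangId;  // value name is a hex LangID such as "409"
    std::string clsid;                                 // CLSID used to instantiate the object
    std::map<std::string, std::string> attributes;     // queriable values, e.g. Language -> "409;809"
};

// Formats a numeric LangID the way the registry value name is written:
// hexadecimal, with no leading 0x.
std::string LangIdValueName(unsigned langId) {
    char buffer[16];
    std::snprintf(buffer, sizeof(buffer), "%x", langId);
    return buffer;
}

// Returns the name stored for the requested language, or the
// language-independent name when none is stored (assumed fallback).
std::string DisplayName(const TokenInfo& token, unsigned langId) {
    const auto it = token.namesByLangId.find(LangIdValueName(langId));
    return it != token.namesByLangId.end() ? it->second : token.defaultName;
}
```

For example, a token whose (Default) name is Joe and which has no name stored under LangID 411 would report Joe for that language in this sketch.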
SAPI stores information about tokens in the registry. A token is represented in the registry by a key and the key's underlying keys and values. When an application queries SAPI for tokens of all the female voices on the computer, SAPI looks in the HKEY_LOCAL_MACHINE\Software\Microsoft\Speech\Voices area. This area corresponds to a Category; categories are discussed in Section 3.2. SAPI searches for tokens that match the criteria (in this case, a voice with the Gender attribute set to female) and uses one of these matching tokens to initialize the voice. The application may also specify a different fully qualified registry path, so that SAPI searches for a token in a location that is non-standard from SAPI's point of view. In addition to the keys SAPI recommends, the entry for the token may contain any other information that the implementer of the token chooses to store there. In the registry, a token looks like this:
Table 1 Parts of a Token in the Registry
| RegKey | ValueName | Sample Value | Comments |
|---|---|---|---|
| SampleTokenKey | | | Required - This is the RegKey for the Token. |
| | (Default) | Joe | Required - Language Independent Name. |
| | 409 | Joe | Name in Hex LangID 409, which is English. There may be several of these rows, one for each LangID in which the Token has a name. Note, no leading 0x before the LangID. |
| | 809 | Joe | |
| | CLSID | {8021D50E-D93C-4075-8504-FC4E124D64E9} | Required - Sample CLSID for object which instantiates the token. |
| SampleTokenKey/Attributes | | | Attributes for the token are under this key. |
| | Language | 409;809 | There may be several of these rows, one for each attribute that is queriable. See Section 4 for an explanation of each of the attributes. |
| | Vendor | VoiceVendor | |
In the registry, this looks like:
Figure 1 A Token Key in the Registry

The Attributes key contains all the queriable values for the token. Section 4.2 discusses in detail how an application queries a token.
Figure 2 Attributes of a Token

If the token is capable of displaying UI, then each UI has its own key under the token. Figure 3 shows the token for a Recognizer that supports four types of UI: AddWord, EngineProperties, MicTraining and UserTraining, as well as the CLSID underlying each UI type.
Figure 3 A Token that supports UI has a token for each UI type

SAPI provides a comprehensive set of helper functions for the common scenarios using tokens. Section 4.1 provides a number of examples. SAPI also provides a way for engines and applications to implement tokens in their own proprietary manner. See Section 3.4 on token enumerators, for further discussion. Sections 4 and 5 explore common scenarios using these interfaces from application and engine coding perspectives.
An ObjectTokenCategory (hereafter referred to as category) is the highest level of grouping of registry entries in SAPI. A category is a class of tokens (or of resources, since each token represents an actual resource on the computer). Intuitively, a category is a type of SAPI resource. It is represented in the registry by a key containing one or more token keys under it. It is created and manipulated using helper functions such as SpCreateDefaultObjectFromCategoryId or methods on the ISpObjectTokenCategory interface. Please refer to the SAPI documentation for details on either of these. Examples of categories are Recognizers and Voices. Figure 4 shows the default SAPI categories, with the category Voices selected.
Figure 4 The Category Voices
SAPI organizes tokens in the Registry under seven categories.
By default, the tokens for the following six SAPI categories are located under HKEY_LOCAL_MACHINE\Software\Microsoft\Speech (HKLMS). This is where all system-specific SAPI keys and values should be stored, as recommended by Windows Application guidelines. Examples include settings and files for the Voices and the Recognizers (also known as Speech Recognition engines) installed on a computer, as shown in Figure 1.
1. Voices
2. Recognizers
3. AppLexicons
4. AudioInput
5. AudioOutput
6. PhoneConverter
The tokens for the other category, RecoProfiles, are located under HKEY_CURRENT_USER\Software\Microsoft\Speech (HKCUS). HKCUS also contains all other user-specific keys and values in the registry, such as user defaults for Voices and Recognizers, as well as the location of the user lexicon file.
Categories contain the following items:
· A single key called Tokens, and the keys for the tokens that belong to that category under it. For example, the Voices category has a key for the voice called Manuel. All the keys and values for Manuel are located under HKLMS/Voices/Tokens/Manuel.
· Keys for token enumerators. A token enumerator is a special type of token that generates other tokens for the same category. It provides a way for vendors to generate tokens in a non-standard way, such as by reading data from a stored file. Engine vendors that follow the SAPI guidelines for registering resources (Sections 4 and 5) can safely ignore token enumerators and regard them as generators for another set of tokens. Section 3.4 explains token enumerators in more detail.
A CategoryID uniquely identifies a category in the registry. For SAPI-defined categories, CategoryIDs take the form HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\{CategoryName}. For example, HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Recognizers\ is the CategoryID for the Recognizers category. All SAPI CategoryIDs should be referenced using the constants defined in the sapi.idl file:
1. SPCAT_AUDIOOUT
2. SPCAT_AUDIOIN
3. SPCAT_VOICES
4. SPCAT_RECOGNIZERS
5. SPCAT_APPLEXICONS
6. SPCAT_PHONECONVERTERS
7. SPCAT_RECOPROFILES
Similarly, TokenIDs uniquely identify tokens in the registry. For tokens located in SAPI-defined categories, they take one of the following forms:
· CATID\Tokens\TokenKeyName - a static token from the registry. For example, HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Recognizers\Tokens\MSASREnglish
· CATID\TokenEnums\TokenEnumKeyName - a static token from the registry that represents a token enumerator. This token instantiates a token enumerator used to enumerate dynamic tokens. SAPI uses this for its own implementation of audio input and output to list just the channels available on the computer at runtime. Token enumerators can also read tokens from other areas of the registry, or from remote computers. For example, HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\AudioInput\TokenEnums\DSoundAudioIn
· CATID\TokenEnums\TokenEnumKeyName\ - a dynamic token representing the default token that the specified token enumerator generates. For example, SPDSOUND_AUDIO_IN_TOKEN_ID creates the default DirectSound audio input object: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\AudioInput\TokenEnums\DSoundAudioIn\
· CATID\TokenEnums\TokenEnumKeyName\EnumExtra… - a specific dynamic token from the specified token enumerator. For example, HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\AudioInput\TokenEnums\DSoundAudioIn\Direct Sound Crystal WDM Audio generates the Direct Sound Crystal WDM audio object.
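The four TokenID forms listed above can be told apart by simple string inspection. The following is a minimal sketch in plain C++, not SAPI code: the names TokenIdKind and ClassifyTokenId are hypothetical, the sketch assumes backslash-separated paths, and it compares strings case-sensitively even though registry paths are not case-sensitive.

```cpp
#include <string>

enum class TokenIdKind { Static, Enumerator, DefaultDynamic, SpecificDynamic, Unknown };

// Classifies a TokenID relative to its CategoryID (given without a trailing backslash).
TokenIdKind ClassifyTokenId(const std::string& categoryId, const std::string& tokenId) {
    // The TokenID must start with "<CategoryID>\".
    const std::string prefix = categoryId + "\\";
    if (tokenId.compare(0, prefix.size(), prefix) != 0) return TokenIdKind::Unknown;
    const std::string rest = tokenId.substr(prefix.size());

    const std::string tokensKey = "Tokens\\";
    const std::string enumsKey = "TokenEnums\\";

    // CATID\Tokens\TokenKeyName
    if (rest.compare(0, tokensKey.size(), tokensKey) == 0) {
        return rest.size() > tokensKey.size() ? TokenIdKind::Static : TokenIdKind::Unknown;
    }
    if (rest.compare(0, enumsKey.size(), enumsKey) != 0) return TokenIdKind::Unknown;

    // Everything after "TokenEnums\": enumerator key name, optional "\", optional extra part.
    const std::string afterEnums = rest.substr(enumsKey.size());
    if (afterEnums.empty()) return TokenIdKind::Unknown;
    const std::string::size_type slash = afterEnums.find('\\');
    if (slash == std::string::npos) return TokenIdKind::Enumerator;            // no trailing backslash
    if (slash == 0) return TokenIdKind::Unknown;                               // empty enumerator name
    if (slash + 1 == afterEnums.size()) return TokenIdKind::DefaultDynamic;    // ends with backslash
    return TokenIdKind::SpecificDynamic;                                       // extra part follows
}
```

Under these assumptions, a TokenID that ends in the enumerator key name followed by a single backslash is classified as the enumerator's default dynamic token, and any further text after that backslash makes it a specific dynamic token.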
In addition to the category defaults mentioned in Section 3.2, the categories Voices, Recognizers, AudioInput, AudioOutput and RecoProfile also have user defaults and settings. As shown in Figure 5, these are located in the HKCUS area, under their respective category keys. Section 6 explains each category of tokens and also lists the user-specific entries in HKCUS and the system-wide entries in HKLMS.
Figure 5 The User category for Recognizers

Note: This section is relevant only for Engine or Application developers who need to store tokens in a separate part of the registry or even on the file system, and dynamically enumerate them.
SAPI provides a way for third parties to store their registry settings without following any of the SAPI-recommended guidelines. SAPI can find these tokens as long as the parties have implemented token enumerators. Token enumerators are COM objects that enumerate the necessary entries for the tokens under them. All token enumerators are stored under CategoryName/TokenEnums. Each token enumerator listed under a category needs to have the CLSID of the COM object that implements it under the token enumerator.
The token enumerator
· Must implement the methods Next, Skip, Reset, Clone, Item, GetCount on the IEnumSpObjectTokens interface.
· May choose to implement the methods SetObjectToken and GetObjectToken on the ISpObjectWithToken interface. As mentioned at the end of Section 3.1, these give a resource a handle to the token that was used to instantiate it.
These tokens can be located in a separate part of the registry or somewhere else (possibly on the file system). It is the responsibility of the token enumerator to return correctly on the above methods so that an application cannot tell the difference between tokens coming from the token enumerator and tokens coming from the SAPI-specific part of the registry.
SAPI itself uses token enumerators only for the AudioInput and AudioOutput categories. Refer to Sections 6.4 and 6.5 for more details. Note that the token enumerator for the MMSYS audio object creates its tokens from keys that are under it.
The following is an example of what a TokenID for a token located under a token enumerator looks like: CategoryName/TokenEnums/TE1/XXX, where (i) TE1 is a sample token enumerator and (ii) XXX is a reference to one of the tokens generated by TE1. On a call to the helper function SpCreateNewToken given the TokenID above, the IEnumSpObjectTokens returned by the token enumerator TE1 to SAPI includes all of its tokens. SAPI then goes through each token (those returned by token enumerators and those under the Tokens key) to find the one that has a token name matching XXX.
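The lookup described above can be sketched in plain C++. This is an illustration, not SAPI's implementation: Token, TokenEnumerator, VectorEnumerator, and FindTokenByName are hypothetical names, the enumerator interface is a simplified stand-in for the COM interface (only Next, Reset, and GetCount are modeled), and the search order (static tokens first, then each enumerator) is an assumption, since the text does not specify one.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct Token {
    std::string name;
};

// Simplified stand-in for a token enumerator.
class TokenEnumerator {
public:
    virtual ~TokenEnumerator() = default;
    virtual bool Next(Token& out) = 0;          // returns false once all tokens were handed out
    virtual void Reset() = 0;                   // restarts enumeration from the first token
    virtual std::size_t GetCount() const = 0;   // total number of tokens
};

// An enumerator that serves tokens from memory; a real one might read them
// from another registry location or from a file.
class VectorEnumerator : public TokenEnumerator {
public:
    explicit VectorEnumerator(std::vector<Token> tokens) : tokens_(std::move(tokens)) {}
    bool Next(Token& out) override {
        if (position_ >= tokens_.size()) return false;
        out = tokens_[position_++];
        return true;
    }
    void Reset() override { position_ = 0; }
    std::size_t GetCount() const override { return tokens_.size(); }

private:
    std::vector<Token> tokens_;
    std::size_t position_ = 0;
};

// Looks for a token by name among the static tokens and then among the
// tokens produced by each enumerator. The caller cannot tell which source
// the result came from.
bool FindTokenByName(const std::vector<Token>& staticTokens,
                     const std::vector<TokenEnumerator*>& enumerators,
                     const std::string& name, Token& found) {
    for (const Token& token : staticTokens) {
        if (token.name == name) { found = token; return true; }
    }
    for (TokenEnumerator* enumerator : enumerators) {
        enumerator->Reset();
        Token candidate;
        while (enumerator->Next(candidate)) {
            if (candidate.name == name) { found = candidate; return true; }
        }
    }
    return false;
}
```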
Table 2 Parts of the AudioInput token enumerator
| RegKey | ValueName | Sample Value | Comments |
|---|---|---|---|
| HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\Speech\AudioInput\ | | | This is the category. |
| | DefaultDefaultTokenID | HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\AudioInput\TokenEnums\MMAudioIn\ | This is the TokenID for the default token for this category. If the DefaultTokenID is present, it will supersede this default token for the category. Details in Section 4.2. |
| HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\Speech\AudioInput\TokenEnums | | | |
| HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\Speech\AudioInput\TokenEnums\MMSys | | | This is the MMSys token enumerator. |
| | CLSID | {GUID} | This is the CLSID for the COM object that implements the MMSound token enumerator. |
Figure 6 AudioInput token enumerator in the registry
Figure 6 illustrates how the AudioInput token enumerator looks in the registry.

A SAPI 5 application needs to find tokens and instantiate objects that meet certain criteria from the resources available on a computer. Helper functions distributed in the sphelper.h file are the recommended way for applications to interact with tokens and categories whenever possible. Table 3 provides a list of helper functions and the scenarios they address. The helper functions have been broken up into Common Helper Functions and Engine Developer Helper Functions based on likelihood of use. If the specific helper function is not found in either section, refer to the SAPI documentation for the comprehensive listing.
Table 3 Common Helper Functions
| Helper Function | Action | Example Helper Function Call |
|---|---|---|
| SpGetDefaultTokenFromCategoryId | Creates the default token from a CategoryID. The last argument tells SAPI to create the token if it does not currently exist. | `CComPtr<ISpObjectToken> m_cpEngineToken; hr = SpGetDefaultTokenFromCategoryId(SPCAT_RECOGNIZERS, &m_cpEngineToken);` |
| SpFindBestToken | Finds the most appropriate token given a set of required and optional criteria. For details on attribute matching see Section 4.2. | `CComPtr<ISpObjectToken> cpTokenEng; hr = SpFindBestToken(SPCAT_RECOGNIZERS, L"Language=409", L"VendorPreferred", &cpTokenEng);` |
| SpEnumTokens | Returns a token enumerator containing all tokens meeting a set of required and optional attributes. Tokens in the enumerator are sorted in the order specified in Section 4.2. | `CComPtr<IEnumSpObjectTokens> cpIEnum; hr = SpEnumTokens(SPCAT_VOICES, L"Gender=Female;Language=409", L"Vendor=VoiceVendor1;Age=Child", &cpIEnum);` |
| SpCreateDefaultObjectFromCategoryId | Creates the default object in a category, such as AudioInput or Recognizer. | `CComPtr<ISpVoice> cpVoice; hr = SpCreateDefaultObjectFromCategoryId(SPCAT_VOICES, &cpVoice);` |
| SpCreateBestObject | Instantiates a resource that best matches a set of required and optional criteria. For details on attribute matching see Section 4.2. | `CComPtr<ISpVoice> cpVoice; hr = SpCreateBestObject(SPCAT_VOICES, L"Vendor=VoiceVendor1;Age=Child", L"Gender=Female", &cpVoice);` |
| SpCreateObjectFromToken | Creates an object from a token. | `CComPtr<ISpVoice> cpVoice; CComPtr<ISpObjectToken> cpVoiceToken; /* find a token, as in the SpFindBestToken example */ hr = SpFindBestToken(SPCAT_VOICES, L"Language=409", L"VendorPreferred", &cpVoiceToken); /* now create the object */ if (SUCCEEDED(hr)) hr = SpCreateObjectFromToken(cpVoiceToken, &cpVoice);` |
Table 4 Engine Developer Helper Functions
| Helper Function | Action | Example Helper Function Call |
|---|---|---|
| SpCreateNewToken | Creates a new object token in the registry with CategoryID, but without specifying a keyname. This creates a token with a GUID as its registry key. | `CComPtr<ISpObjectToken> cpUserToken; hr = SpCreateNewToken(SPCAT_RECOPROFILES, L"", &cpUserToken);` |
| SpGetTokenFromId | Creates a token from a TokenID of an enumerator or a new token if the token does not already exist. The last argument of FALSE tells SAPI not to create the token if it does not already exist. | `CComPtr<ISpObjectToken> cpAudioInTok; hr = SpGetTokenFromId(SPCAT_AUDIOIN, &cpAudioInTok, FALSE);` |
| SpCreateObjectFromSubToken | Creates an object from a subtoken of a token. In this case, the engine token m_cpEngineToken has the Lts key under it, which in turn has a CLSID value under it. This CLSID is used to instantiate the object. | `CComPtr<ISpObjectToken> m_cpEngineToken; HRESULT hr = SpGetDefaultTokenFromCategoryId(SPCAT_RECOGNIZERS, &m_cpEngineToken); ISpLexicon * m_pLtsLex = NULL; if (SUCCEEDED(hr)) hr = SpCreateObjectFromSubToken(m_cpEngineToken, L"Lts", &m_pLtsLex);` |
| SpGetSubTokenFromToken | Creates a subtoken under a token. This is useful, for example, when an Engine vendor would like to create a subtoken for custom data under its Recognizer token. | `CComPtr<ISpObjectToken> cpSubToken; hr = SpGetSubTokenFromToken(m_cpEngineToken, L"EngineProperties", &cpSubToken, TRUE);` |
The principal tasks related to tokens and categories that an application needs to accomplish are:
· Enumerating tokens
· Inspecting and instantiating tokens
The two primary ways to enumerate tokens are the helper function SpEnumTokens and the method ISpObjectTokenCategory::EnumTokens. Both allow the caller to specify a category and a set of required and optional attributes. The call then returns a token enumerator containing all the tokens matching those criteria. The method is defined as:
|
HRESULT EnumTokens( [in] const WCHAR *pszCatName, [in, string] const WCHAR *pReqAttrs, [in, string] const WCHAR *pOptAttrs, [out] IEnumSpObjectTokens **ppEnum); |
When identifying matching tokens in a category, an application needs to specify a fully qualified category identifier (FQCID). An FQCID is the full registry path to a category, such as HKEY_CURRENT_USER\Software\Microsoft\Speech\Voices. It is recommended that these categories be referenced using the constants defined in the sapi.idl file, rather than the full string, to minimize typos in commonly used registry paths. SAPI maps the constant to the correct hive in the registry and returns matching tokens from the category. For instance, the SAPI-defined AudioOutput constant (from the sapi.idl file) is:
|
//--- Categories for speech resource management const WCHAR SPCAT_AUDIOOUT[] = L"HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Speech\\AudioOutput"; |
Similarly, there are constants for the AudioInput, Voices, Recognizers, AppLexicons, PhoneConverter, and RecoProfiles categories.
An application may also specify a non-standard registry location by simply providing its FQCID, such as HKEY_CURRENT_USER\Software\TTSVendor1\Speech\Voices.
In both SpEnumTokens and ISpObjectTokenCategory::EnumTokens the following clauses are permitted in the ReqAttrs and OptAttrs strings, separated by semicolons.
| Condition | Example | Explanation |
|---|---|---|
| Exists | Telephony;Dictation | The value names Telephony and Dictation exist in the list of attributes for this token. |
| One of | Language=409 | At least one of the values of the value name Language is 409. There may be other values as well, such as 809 or 512. |
| Not Equals | Age!=Child;Age!=Teen | Values of Age that are neither "Child" nor "Teen". |
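The three clause types can be evaluated against a token's attributes roughly as follows. This is a sketch in plain C++ rather than SAPI's own matching code: Attributes, Split, HasValue, MatchesClause, and MatchesAll are hypothetical names, comparisons are case-sensitive, and a Not Equals clause on an attribute the token does not have is treated as satisfied. Both of those behaviors are assumptions of this sketch.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

using Attributes = std::map<std::string, std::string>;

// Splits text on a separator and trims surrounding spaces; empty parts are dropped.
std::vector<std::string> Split(const std::string& text, char separator) {
    std::vector<std::string> parts;
    std::stringstream stream(text);
    std::string part;
    while (std::getline(stream, part, separator)) {
        const std::string::size_type first = part.find_first_not_of(' ');
        if (first == std::string::npos) continue;
        const std::string::size_type last = part.find_last_not_of(' ');
        parts.push_back(part.substr(first, last - first + 1));
    }
    return parts;
}

// True if the attribute exists and one of its semicolon-separated values equals value.
bool HasValue(const Attributes& attrs, const std::string& name, const std::string& value) {
    const auto it = attrs.find(name);
    if (it == attrs.end()) return false;
    for (const std::string& candidate : Split(it->second, ';')) {
        if (candidate == value) return true;
    }
    return false;
}

// Evaluates one clause: "Name!=Value" (Not Equals), "Name=Value" (One of), or "Name" (Exists).
bool MatchesClause(const Attributes& attrs, const std::string& clause) {
    const std::string::size_type notEquals = clause.find("!=");
    if (notEquals != std::string::npos) {
        return !HasValue(attrs, clause.substr(0, notEquals), clause.substr(notEquals + 2));
    }
    const std::string::size_type equals = clause.find('=');
    if (equals != std::string::npos) {
        return HasValue(attrs, clause.substr(0, equals), clause.substr(equals + 1));
    }
    return attrs.count(clause) != 0;
}

// True if every semicolon-separated clause in criteria is satisfied.
bool MatchesAll(const Attributes& attrs, const std::string& criteria) {
    for (const std::string& clause : Split(criteria, ';')) {
        if (!MatchesClause(attrs, clause)) return false;
    }
    return true;
}
```

With these definitions, a token whose Language attribute is 409;809 satisfies Language=409 and Language=809 but not Language=411.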
The tokens are sorted “best matches” first using the following intuitive rules:
1. Only tokens matching the required attributes are returned.
2. Tokens that also match the optional attributes are placed before those that match only the required attributes.
3. If there are no required or optional attributes (i.e., both are set to NULL), the first token is the default token for that category. If there is a valid DefaultTokenID in HKLMS/Category, that is returned as the default TokenID. If not, and there is a DefaultTokenID in HKCUS/Category, that is returned. If neither exists, SAPI searches for a DefaultDefaultTokenID in HKLMS/CategoryName, and that is returned.
4. Matching Rules: If a token matches an optional attribute, it gets a score of 1, otherwise, 0 for that attribute. The optional attributes mentioned earlier in the query string are more significant. These scores are concatenated as shown in Table 7. The tokens are then placed in descending order. This is illustrated in Tables 6 and 7.
5. Tokens having the same score are returned in random order in the enumerator.
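The scoring in rule 4 can be sketched as follows, reading "concatenated" as building a binary number in which the first optional attribute is the most significant bit. This is an illustration in plain C++, not SAPI's code: Voice, Clause, OptionalScore, and RankByOptionalScore are hypothetical names, only Name=Value clauses with a single-valued attribute are handled, and the tokens passed in are assumed to have already passed the required attributes.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Voice {
    std::string name;
    std::map<std::string, std::string> attrs;
};

using Clause = std::pair<std::string, std::string>;  // attribute name, wanted value

// One bit per optional clause, 1 on a match and 0 otherwise. Earlier clauses
// end up in more significant bits, so they outweigh all later ones.
unsigned OptionalScore(const Voice& voice, const std::vector<Clause>& optional) {
    unsigned score = 0;
    for (const Clause& clause : optional) {
        const auto it = voice.attrs.find(clause.first);
        const bool hit = it != voice.attrs.end() && it->second == clause.second;
        score = (score << 1) | (hit ? 1u : 0u);
    }
    return score;
}

// Returns voice names in descending score order. stable_sort keeps the input
// order for equal scores, which is one valid outcome given that ties may be
// returned in any order.
std::vector<std::string> RankByOptionalScore(std::vector<Voice> voices,
                                             const std::vector<Clause>& optional) {
    std::stable_sort(voices.begin(), voices.end(),
                     [&optional](const Voice& a, const Voice& b) {
                         return OptionalScore(a, optional) > OptionalScore(b, optional);
                     });
    std::vector<std::string> names;
    for (const Voice& voice : voices) names.push_back(voice.name);
    return names;
}
```

For the optional string Vendor=VoiceVendor1;Age=Child;Gender=Female, a VoiceVendor1 adult female voice scores binary 101 and a VoiceVendor2 child female voice scores binary 011, so the former is ranked first because the Vendor clause appears earlier.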
A call to EnumTokens could look like:
|
CComPtr<IEnumSpObjectTokens> cpEnum; CComPtr<ISpObjectTokenCategory> cpVoiceCat;
HRESULT hr = cpVoiceCat.CoCreateInstance(CLSID_SpObjectTokenCategory); const WCHAR ReqAttrs[] = L"LanguagesSupported=409"; const WCHAR OptAttrs[] = L"Vendor=VoiceVendor1;Age=Child;Gender=Female";
if (SUCCEEDED(hr)) hr = cpVoiceCat->EnumTokens(SPCAT_VOICES, ReqAttrs, OptAttrs, &cpEnum); // SPCAT_VOICES is defined in sapi.idl |
If the following voices are installed on a computer as shown in Table 6:
Table 6 Voices installed on a computer
|
Voice |
Vendor |
Age |
LanguagesSupported |
Gender |
|
Michelle |
VoiceVendor1 |
Child |
409; 411 |
Female |
|
Mary |
VoiceVendor1 |
Adult |
409 |
Female |
|
Jane |
VoiceVendor2 |
Child |
409 |
Female |
|
Frank |
VoiceVendor2 |
|