Just a quick follow-up for those of you who showed interest.
My proof-of-concept with FastText has been a success. The C# class library has been integrated with a product and staged for a future release.
The product provides a web-based conversational user interface that accepts natural language as user input and performs configurable actions, possibly interacting with other (external or internal) software products and services. For example, “How’s the weather in Tokyo?” sends a request to a weather service and returns the response, while “Show me the current sequence of application xyz” queries another product’s database and displays the results (prompting the user in case of uncertainty). The initial use case is therefore supervised text classification, and FastText has proven fit for the purpose.
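To make the routing idea concrete (classify the input, run the matching action when the classifier is confident, otherwise prompt the user), here is a rough Python sketch. It is not the product's code, which is C#. The function `route`, the 0.6 threshold, and the intent labels are all invented for illustration, and the predictions are passed in as plain (label, probability) pairs instead of coming from a real FastText model.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical action registry: intent label -> handler taking the user text.
Action = Callable[[str], str]


def route(
    text: str,
    predictions: List[Tuple[str, float]],
    actions: Dict[str, Action],
    threshold: float = 0.6,  # assumed cut-off, not a value from the post
) -> str:
    """Run the top intent's action, or ask the user when the result is uncertain."""
    if not predictions:
        return "Sorry, I did not understand that."

    # Highest probability first.
    ranked = sorted(predictions, key=lambda p: p[1], reverse=True)
    label, prob = ranked[0]

    if prob >= threshold and label in actions:
        return actions[label](text)

    # Uncertain: offer the two best known intents back to the user.
    options = [lbl for lbl, _ in ranked[:2] if lbl in actions]
    if not options:
        return "Sorry, I did not understand that."
    return "Did you mean: " + " or ".join(options) + "?"


# Illustrative handlers; a real system would call a service or database here.
demo_actions: Dict[str, Action] = {
    "weather": lambda t: "calling weather service",
    "sequence": lambda t: "querying application database",
}

print(route("How's the weather in Tokyo?",
            [("weather", 0.92), ("sequence", 0.05)], demo_actions))
print(route("Show me xyz",
            [("weather", 0.48), ("sequence", 0.45)], demo_actions))
```

The threshold is the main knob in a sketch like this: set it too low and wrong actions get executed, set it too high and the user is prompted for inputs the classifier already handles well.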
The FastText functionality I’ve exposed to .NET so far includes:
- all training methods (cbow, skipgram, supervised) and their parameters
- file persistence
- text classification
- partial word representation, “nearest neighbour” queries, “analogies”
- accessing dictionary and model data
The first two points make it possible to retrain on new user input, i.e. continuous learning.
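One simple way to feed new user input back into training is to append each confirmed utterance to the supervised training file and then re-run training on the grown file. The Python sketch below covers only the file-handling half and is an assumption about how such a loop might look, not a description of the C# library. The helper names `to_training_line` and `append_examples` are hypothetical; the `__label__` prefix is FastText's default label marker for supervised training data.

```python
from pathlib import Path
from typing import Iterable, Tuple


def to_training_line(label: str, text: str) -> str:
    """Format one example as '__label__<label> <text>' on a single line."""
    # Collapse all whitespace (including newlines) so one example stays on one line.
    clean = " ".join(text.split())
    return f"__label__{label} {clean}"


def append_examples(path: Path, examples: Iterable[Tuple[str, str]]) -> int:
    """Append confirmed (label, text) pairs to the training file; return how many."""
    count = 0
    with path.open("a", encoding="utf-8") as fh:
        for label, text in examples:
            fh.write(to_training_line(label, text) + "\n")
            count += 1
    return count

# After appending, retraining is just another supervised training run over
# the same file, followed by saving the new model.
```

With the official Python bindings, the retraining step would be a call such as `fasttext.train_supervised(input="train.txt")`; in the .NET library it would be whatever the supervised training method is called there.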
Additionally, I’ve implemented database persistence of the dictionaries and trained models so they can be queried directly in SQL code. For example, the classic “king – man + woman” (using cosine similarity):