This talk presents effective feature extraction methods for mining transactional data. Transactional data are bags of categorical symbols drawn from a large vocabulary, optionally embedded in time.
This class includes textual data as a subclass. By extending earlier feature extraction methods from textual settings, Dr. Ted Dunning has defined classes of feature extractors that work on a wide range of problems of significant practical importance. These include profitability prediction in insurance, credit fraud detection, and recommendation based on implicit observation of web behavior, among other areas.
This talk defines a mathematical framework for dealing with transactional data, demonstrates a class of feature extraction approaches, and presents several case studies applying these techniques.
Dunning presents the architecture of, and preliminary results for, a novel kind of video search engine that uses neither metadata nor video content, yet achieves remarkably accurate results. The engine is based on transactional analysis of the viewing habits of millions of users.
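The abstract gives no detail of the engine's internals. As a hypothetical illustration of where "transactional analysis of viewing habits" could start, the Python sketch below treats each user's viewing history as a bag of video ids and, for every pair of videos watched by at least one common user, counts the four cells of a 2x2 contingency table. The input format (a dict from user to video ids, with ids mutually sortable, e.g. strings) and the function name `cooccurrence_tables` are assumptions, and the sketch ignores viewing time even though the talk treats transactional data as optionally time-embedded.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_tables(histories):
    """Hypothetical helper: build 2x2 co-viewing tables from user histories.

    histories: dict mapping user id -> iterable of video ids (assumed format).
    Returns dict (a, b) -> (k11, k12, k21, k22), counted over users, where
      k11 = users who watched both a and b
      k12 = users who watched a but not b
      k21 = users who watched b but not a
      k22 = users who watched neither
    Only pairs watched together by at least one user are returned.
    """
    n_users = len(histories)
    item_users = Counter()  # video -> number of users who watched it
    pair_users = Counter()  # (a, b) with a < b -> number of users who watched both
    for videos in histories.values():
        # Treat each history as a set: repeat views and timing are ignored here.
        distinct = sorted(set(videos))
        item_users.update(distinct)
        pair_users.update(combinations(distinct, 2))

    tables = {}
    for (a, b), k11 in pair_users.items():
        k12 = item_users[a] - k11
        k21 = item_users[b] - k11
        k22 = n_users - k11 - k12 - k21
        tables[(a, b)] = (k11, k12, k21, k22)
    return tables
```

Each table can then be scored with a co-occurrence statistic to decide which videos to relate. Omitting pairs that were never watched together keeps the output sparse, which matters when the vocabulary of videos is large.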
An important feature shared by all of these techniques is their extreme simplicity of implementation. The fundamental feature selector can be implemented in literally half a dozen lines of code.
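The abstract does not name the selector. Assuming it is the log-likelihood ratio (G²) statistic on a 2x2 contingency table, a test associated with Dunning's earlier work on text statistics, a minimal Python sketch looks like this. The function names `x_log_x`, `entropy`, and `llr` are illustrative, not taken from the talk.

```python
from math import log


def x_log_x(x: int) -> float:
    # Convention: 0 * log(0) = 0.
    return 0.0 if x == 0 else x * log(x)


def entropy(*counts: int) -> float:
    # Unnormalized Shannon entropy: N log N - sum(k log k), with N = sum(counts).
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11: int, k12: int, k21: int, k22: int) -> float:
    """G^2 log-likelihood ratio for a 2x2 table.

    k11: A and B together, k12: A without B,
    k21: B without A,      k22: neither.
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


if __name__ == "__main__":
    print(llr(10, 10, 10, 10))  # independent counts: score near 0
    print(llr(100, 0, 0, 100))  # perfectly associated: 400 * ln 2, about 277.26
```

Under independence the statistic is asymptotically chi-squared with one degree of freedom, so scores above roughly 3.84 suggest association at the 5% level. In practice one typically just ranks candidate features by score, and unlike a plain chi-squared test, G² stays well behaved when counts are small, as they usually are with large vocabularies.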
Dr. Ted Dunning
Ted Dunning is currently CTO of DeepDyve. He has previously served as Chief Scientist at Veoh Networks, ID Analytics and Musicmatch, where he researched methods for pattern discovery and analyzed symbolic sequences in language, genetic sequences, web-browsing behavior, musical preferences, purchasing behavior and financial transactions.
His particular interest is very low-cost algorithms for mining very large data streams, especially streams of text-like, time-embedded symbolic information.