Scanmine is a news aggregation and content analysis technology. It uses a unique pattern-matching technique to automatically identify and isolate article content in HTML pages, and a neural network model, similar to those found in AI systems, for content analysis.
Unique Independent Technology
Scanmine is lightweight, scalable and almost maintenance-free. It is built solely on open source and de facto standard development tools such as Java, PHP and Lucene/Solr. The Scanmine technology was developed and is owned by Scanmine A.S. Apart from the hardware running the news service NewsOwner, which showcases the technology, it does not rely on any paid recurring licenses.
Automated News Aggregation
New sources for monitoring and aggregation of content can be added to the system without any manual preparation (“tagging”) or maintenance. Scanmine not only retrieves news content from web pages based on meta information, but also retrieves the actual (open) article by automatically isolating article content from ads and other information on a web page. It also identifies multiple articles on a single web page, a layout typically used for short news desk postings. A scalable number of bots check sources for new content, which is entered into a Lucene/Solr index. Visiting intervals are adjusted automatically based on how often a source produces new content. Sources are grouped into indexes, allowing content to be extracted only from relevant sources or languages.
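As an illustration of how visiting intervals could follow a source's publishing rate, here is a minimal sketch. It is not Scanmine's actual scheduler: the `Source` class, the `next_interval` function, the halving and 1.5x factors, and the interval bounds are all assumptions chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical bounds, in seconds; not taken from Scanmine.
MIN_INTERVAL = 60.0
MAX_INTERVAL = 86_400.0  # one day


@dataclass
class Source:
    url: str
    interval: float = 3600.0  # current number of seconds between visits


def next_interval(source: Source, new_items: int) -> float:
    """Shorten the interval when a visit finds new content, lengthen it otherwise."""
    if new_items > 0:
        # The source is active: visit twice as often, down to the lower bound.
        source.interval = max(MIN_INTERVAL, source.interval / 2)
    else:
        # Nothing new: back off by 50 percent, up to the upper bound.
        source.interval = min(MAX_INTERVAL, source.interval * 1.5)
    return source.interval
```

With these assumed factors, a source that keeps publishing is quickly polled more often, while a quiet source drifts toward one visit per day.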
Content Analysis
The Scanmine technology identifies events, trends, topics, places, people and more, all in real time. The technology currently handles four languages: English, Norwegian, Swedish and Danish. Scanmine does not use a synonym database (although adding one would enhance the quality somewhat). Furthermore, it does not rely on a fixed semantic interpretation of content. Instead, it interprets content in real time, based on searches in rapidly evolving, neural-network-based, self-reinforcing clusters of similar information, for further analysis. This allows an article to be given a meaningful interpretation even when very little of its content is available, and it makes it easy to add new (Western) languages to the technology. Scanmine also has a license-free picture database from which relevant pictures may be added to articles for illustration purposes.
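To make the idea of clusters of similar information concrete, the sketch below groups texts by word overlap. It is deliberately not Scanmine's neural network approach: it substitutes plain bag-of-words cosine similarity with a greedy grouping rule, and the names `term_vector`, `cosine` and `group_similar`, as well as the 0.5 threshold, are assumptions made for the example. Like the description above, it uses no synonym database.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Count lowercase word occurrences in a text."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def group_similar(texts, threshold=0.5):
    """Greedy grouping: a text joins the first group whose first member is similar enough."""
    groups = []  # each entry: (vector of the group's first text, list of text indices)
    for i, text in enumerate(texts):
        vec = term_vector(text)
        for representative, members in groups:
            if cosine(representative, vec) >= threshold:
                members.append(i)
                break
        else:
            # No existing group is close enough, so start a new one.
            groups.append((vec, [i]))
    return [members for _, members in groups]
```

For example, `group_similar(["Oslo flood warning issued", "flood warning issued for Oslo", "Stockholm election results"])` puts the first two headlines in one group and the third in its own.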
Scanmine has developed NewsOwner, which allows users to create their own news services.
It is a REST-based client-server system utilizing the Scanmine technology. NewsOwner can easily be modified to adopt the look and feel of partners wanting to integrate the service into their own products or services. The service currently collects news from about 30 countries/regions. Sources can be added to reflect the content relevant to any user group, including paid content behind paywalls, and the service does not rely on a specific article structure.
Please contact us for more information.