Framework for search and display of heterogenous document collections.
NOTICE: This code repository is deprecated. Please visit https://github.com/cdlib/xtf for the latest updates.
Obsolete Description: The eXtensible Text Framework (XTF) is an architecture that supports searching across collections of heterogeneous textual data (XML, PDF, HTML, text, and more), and the presentation of results and documents in a highly configurable manner. Includes highly customized versions of the proven open-source components Lucene and Saxon.
Web-as-corpus tools in Java.
* Simple Crawler (and also integration with Nutch and Heritrix)
* HTML cleaner to remove boiler plate code
* Language recognition
* Corpus builder
Dr. Micheal Kay: "Saxon 8.7 is the first release to be released simultaneously by Saxonica on the Java and .NET platforms." MDP: Mission accomplished! Saxon for the .NET platform from Saxonica is now available and supported via the http://saxon.sf.net
Group-CCS development Components, templates, tools, accessories, tutorial, modules, translations, documentation, codes, scripts, everything that can improve the work of who uses the powerful tool of development, CCS - CodeCharge Studio.
This code supplies miniature pedagogical Java implementations of information retrieval, spidering, and text-processing software. It was initially developed for an introductory course on Intelligent Information Retrieval and Web Search in UT Austin.
IGLU is a Java class library designed to facilitate sharing of code among Artificial Intelligence/Information Retrieval researchers to illustrate how various problems can be solved in Java. It is developed and maintained by the IGLU Research Group.
TouchGraph provides a set of interfaces for graph visualization using force-based layout and focus+context techniques. For now only older code is available, but we are planning to release new versions as well.
Arachnid is a Java-based web spider framework. It includes a simple HTML parser object that parses an input stream containing HTML content. Simple Web spiders can be created by sub-classing Arachnid and adding a few lines of code called after each page
Frosttie (FROnt-end SchemaTron Text Internet Engine) takes XHTML pages and processes them with various user-definable filters such a W3C's WAI, Section 508 (US) web usability compliance, ad removal, etc. It can be used with zKnowMan.
Modern software for non-emergency medical transportation providers, built to improve scheduling, billing, routing, and dispatching processes.
RouteGenie NEMT software is a modern system built to automate all non-emergency medical transportation processes including routing, scheduling, dispatching, and billing. It helps manage everyday challenges like vehicle breakdowns, traffic problems, cancelations, driver call-offs, will calls, no shows, add-on trips, on-demand trips, and more.