Fast, easy-to-use email client that suits both the inexperienced email user
and the most demanding power user. Successor to Pine, also
developed at the University of Washington.
The Advanced Linux Sound Architecture is used to provide audio and MIDI
functionality to the GNU/Linux OS. Since I work in the Speech Recognition
field, its importance is evident.
Another Neat Tool. Software tool for automating software build processes.
Similar to make but implemented using the Java language. It
requires the Java platform, and is best suited to building Java projects.
The build process is described with an XML file.
Open source UML modeling tool that includes support for all standard UML 1.4 diagrams. It runs on any Java platform and is available in ten languages.
The shell, or command language interpreter, that ships with the
majority of GNU/Linux distributions.
Bash is an sh-compatible shell that incorporates useful features from the Korn shell (ksh) and C shell (csh).
Provides keyboard shortcuts for the Blackbox WM. This useful program
takes me one step closer to being a Keyboard Jedi. I may use it several
times per minute without even noticing.
LaTeX class for creating slides for presentations. It works together with pdflatex, dvips and LyX.
A classic clustering solution. A most mature project on clusters of
Fast and lightweight window manager for the X Window System built with C++.
This WM provides a nice look-and-feel without consuming lots
of memory. It fits my goal of having a light but still powerful
workstation on my low-resource laptop.
3D animation studio. It includes tools for modeling, sculpting, texturing, UV mapping, rigging and constraints, weight painting, particle systems, simulation, rendering, node-based compositing, and non-linear video editing, as well as an integrated game engine for real-time interactive 3D, and game creation and playback with cross-platform compatibility.
Open-source software for volunteer computing and grid computing. A massive
worldwide cluster that takes advantage of idle computer time by using
the processor to cure
diseases, study global warming, discover pulsars and do many other
types of scientific research.
This Debian package installs all the necessary packages to compile.
Basically deals with GCC (the GNU Compiler Collection), GNU Make and
Cross Platform Make. CMake is a family of tools designed to build, test and package software. CMake is used to control the software compilation process using simple platform and compiler independent configuration files. CMake generates native makefiles and workspaces that can be used in the compiler environment of your choice.
Free, lightweight system monitor for X that displays any information on your desktop.
Common UNIX Printing System. Its name stands for its description. The
packages for Debian are: cupsys, cupsys-server, cupsys-client and
CVS. A different open source source code management system. It is used,
for example, by the DokuWiki developers.
As described on the project's website, it is The Universal Operating
System. It's one of the most stable GNU/Linux distributions, widely
used on servers (a familiar example is the much beloved
cygnus.salle.url.edu), and with over 18733 packages it's almost
sure to be suitable for any application or need.
Debian is highly scalable, a feature I do appreciate very much since I have quite an old laptop with limited resources. In order to obtain a fairly good performance with such an old machine I installed the OS with no graphical environment and with the special laptop utils, apart from the base system. These options are available with the tasksel application. The resulting box is small and swift, ready to grow into a powerful workstation.
GTK+ based diagram creation program released under the GPL.
A useful tool for creating high quality diagrams, such as the ones that
can be found in exams ;)
Complete Wiki. Written in PHP and based on text files. Excellent wiki engine.
Permissions, revisions, RSS, search engine... A swift
alternative to MediaWiki, upon which Wikipedia is based.
An x86 emulator with DOS. Ideal for running those old rusty apps (or
games) that needed DOS, sentimental software.
Documentation system for C++, C, Java, Objective-C, Python, IDL (Corba and Microsoft flavors), Fortran, VHDL, PHP, C#, and to some extent D.
An app required for many Internet widgets, such as embedded videos on
Newline conversion between Unix, Macintosh and MS-DOS ASCII files.
ASCII text files can contain different forms of newlines, depending on which operating system is being used. Converting between these formats is often necessary if you use several operating systems. The flip program will convert the newlines to any format.
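As a rough illustration of what such a conversion involves, here is a minimal Python sketch. It is not how flip itself is implemented; the function name `convert_newlines` and the target labels `unix`, `mac` and `dos` are assumptions made for this example.

```python
# Newline byte sequences for the three conventions mentioned above.
# The labels are hypothetical names chosen for this sketch.
ENDINGS = {
    "unix": b"\n",     # LF
    "mac": b"\r",      # CR (classic Macintosh)
    "dos": b"\r\n",    # CRLF
}


def convert_newlines(data: bytes, target: str) -> bytes:
    """Return data with every newline rewritten in the target convention."""
    newline = ENDINGS[target]
    # Normalize to LF first. CRLF must be handled before a lone CR,
    # otherwise each CRLF would turn into two newlines.
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return normalized.replace(b"\n", newline)


mixed = b"one\r\ntwo\rthree\n"
print(convert_newlines(mixed, "dos"))
```

Working on bytes rather than text keeps the sketch independent of the file's character encoding.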
Simple binary editor. It lets users view and edit a binary file in both hex and ASCII with a multi-level undo/redo mechanism.
GNU Image Manipulation Program. A raster graphics editor. Ideal for
tweaking photographs. Sometimes taken for the free software replacement
CVS. Open source version control system designed to handle very large projects with speed and efficiency, but just as well suited for small personal repositories; it is especially popular in the open source community, serving as a development platform for projects like the Linux Kernel, Ruby on Rails, WINE or X.org.
Tool which controls the generation of executables and other non-source files of a program from the program's source files.
Make gets its knowledge of how to build your program from a file called the makefile, which lists each of the non-source files and how to compute it from other files. When you write a program, you should write a makefile for it, so that it is possible to use Make to build and install the program.
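The idea of computing non-source files from other files can be sketched in a few lines of Python. This is only an illustration of the timestamp comparison, not Make's actual algorithm or makefile syntax; the function `plan_build`, the `rules` and `mtimes` dictionaries and the file names are all hypothetical.

```python
def plan_build(target, rules, mtimes):
    """Return the targets that need rebuilding, dependencies first.

    rules:  maps each non-source file to the files it is computed from.
    mtimes: maps each existing file to a modification time (plain numbers
            here, so the sketch does not touch the real filesystem).
    """
    order = []
    rebuilt = set()

    def visit(name):
        # Source files have no rule; they must already exist in mtimes.
        if name not in rules:
            return mtimes[name]
        dep_times = [visit(dep) for dep in rules[name]]
        # A target is stale if it is missing or older than any dependency.
        stale = name not in mtimes or any(t > mtimes[name] for t in dep_times)
        if stale:
            if name not in rebuilt:
                rebuilt.add(name)
                order.append(name)
            # A rebuilt file counts as newer than everything else.
            return float("inf")
        return mtimes[name]

    visit(target)
    return order


rules = {"app": ["main.o", "util.o"], "main.o": ["main.c"], "util.o": ["util.c"]}
mtimes = {"main.c": 5, "util.c": 1, "main.o": 3, "util.o": 2, "app": 4}
print(plan_build("app", rules, mtimes))
```

In this example only `main.c` is newer than its object file, so the plan rebuilds `main.o` and then `app`, leaving `util.o` alone.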
Hacha Open Zource. File splitter.
Displays the present temperature of the HDD passed as a parameter.
See [Debian GNU/Linux]
Free implementation of the Java Virtual Machine.
The IcedTea project provides a harness to build the source code from
OpenJDK using Free Software build tools and provides replacements for the binary plugs with code from the GNU Classpath project.
Debian's own Mozilla Firefox compilation, having passed the sieve of
the free software guidelines. Anyway, this Internet browser is practically
unbeatable. It's very complete and customizable. The only drawback is
the large amount of memory that it requires.
Vector Graphics Editor. With this program I became aware of the value
and usefulness of vector images. Ideal for designing Web 2.0 icons
with Free Software Tools.
Open source bibliography reference manager. The native file format used by JabRef is BibTeX, the standard LaTeX bibliography format. JabRef runs on the Java VM (version 1.5 or newer).
Wiki engine. The smallest I have ever seen. Written in PHP and based on
text files. This homepage is based upon this project.
To my mind, the big advantage of being a small program is that
it can be thoroughly read and
understood, and then hacked and adapted to the needs and preferences of
every developer. I do like it a lot.
Tool for generating API documentation in HTML format from doc comments in source code. It can be downloaded only as part of the Java 2 SDK.
Java New Operating System Design Effort. Simple to use & install Java operating system for personal use.
It runs on modern devices.
Graphical Java debugger front-end, written to use the Java Platform
Debugger Architecture and based on the NetBeans Platform.
Feature-rich and easy to handle CD burning application aimed at the
KDE graphical environment.
Trolltech Qt library, version 3. Required library for the applications
that link against libqt-mt.so.3, like all KDE apps and Opera browser.
Free tool for building your own annotated virtual library of PDF articles, designed for small trusted groups, e.g. science labs.
Librarian is written in PHP and thus produces standard HTML output that can be read by IE5 or NN4 compatible internet browsers.
The kernel headers. These are used for building extra kernel modules. In my case, I used them for my laptop to support the proprietary Nvidia driver. The Debian package holds the same name.
This Debian package discovers
what libraries and programs are using up memory.
A RAM tester. Useful for checking a recently bought memory module.
My favorite P2P client. It accesses lots of different file-sharing
networks. It has a GUI, a TUI and a WUI.
Provides a terminal-based interface for installing and configuring device
driver modules. I used it to set the cpufreq module in order to
control the working frequency of the processor thus obtaining an optimal
See [Debian GNU/Linux]
Scans a network in order to determine what hosts are available, what
services (application name and version) those hosts are offering, what
operating systems (and OS versions) they are running, what type of
packet filters/firewalls are in use, and dozens of other characteristics.
It is a Swiss Army knife for crackers when used malevolently. Useful for
Office suite provided by Sun Microsystems. It has nothing to envy in the
proprietary MS Office that most people still stubbornly use.
Runs a daemon on the host that accepts remote connections via SSH. I find
it useful/necessary for controlling the PC remotely, especially when
a problem occurs and all other peripherals are dead: there's always an
open port (usually TCP/22) available to save the computer
from a crude reboot.
A very nice, fully standards-compliant Internet browser with a low memory
footprint that fits my low-resource laptop. Although it is not free
software, the company that develops it offers a free binary distribution
for personal computers and mobile phones.
PDF Toolkit. Simple tool for doing everyday things with PDF documents such as merging, splitting,
PMD scans Java source code and looks for potential problems like
possible bugs, dead code, suboptimal code, overcomplicated expressions
and duplicate code.
Client for establishing a VPN against La Salle (for example) through
the PPTP protocol. This is a security hole (remember that MS is
behind it). If I need to surf the Internet with the IP of La Salle
for accessing scientific literature, I would rather use wget in a
Dynamic object-oriented programming language that can be used for many kinds of software development. It offers strong support for integration with other languages and tools, comes with extensive standard libraries, and can be learned in a few days. Many Python programmers report substantial productivity gains and feel the language encourages the development of higher quality, more maintainable code.
Quick Image Viewer. A CLI tool to display images. Handy and swift.
Debian admin tool for configuring system services according to system runlevels.
Open source client for Windows Terminal Services, capable of natively speaking Remote Desktop Protocol (RDP) in order to present the user's Windows desktop. Supported servers include Windows 2000 Server, Windows Server 2003, Windows Server 2008, Windows XP, Windows Vista and Windows NT Server 4.0.
Easy incremental backups from the command line.
rdiff-backup is a Python script that helps with local and remote incremental backups.
Sophisticated calendar and alarm program with a Text User Interface. Ideal for combining with alpine (see above).
simple php blog
Flat file blog written in PHP. Easy to install and run.
Sound eXchange. The Swiss Army knife of sound processing programs, as
described on the project's homepage. SoX is a cross-platform
command line utility that can convert various formats of computer audio files into other formats. It can also apply various effects to these sound files and play and record audio files on many major platforms.
CVS. This is one of those tools one gets used to
doing without until one
becomes aware of its existence, tries it, and ends up finding it impossible
to do coding tasks without it. I'm not the only one that supports this
Tiny Java Web Server. The server is quite small, both in Java source code and in the resulting bytecode. Its main purpose is running and debugging servlets. However, it can be used as a regular web server for sites with low to medium load.
Program for extracting, testing and viewing ACE archives. The Debian package holds the same name.
Obvious usefulness. The Debian package holds the same name.
Vi improved. Text editor. Console-based, light, customizable... For me,
one essential tool. It has advanced features for programming tasks such
as syntax highlighting, auto indentation and line numbering. I use it
for almost everything.
Easy virtualization program. It provides a generic hardware emulation that
is used to install and run a guest OS inside a host OS. Ideal for
running a guest Win box with all those apps that are still subject to this
Free virtualization tools for GNU/Linux systems
Visual tool integrating several commandline JDK tools and lightweight profiling capabilities. Designed for both production and development time use, it further enhances the capability of monitoring and performance analysis for the Java SE platform.
Video LAN Client. A media player. Supports the majority of the encodings
used nowadays. Streaming also available.
Locks the current terminal (local or remote), or locks the entire
virtual console system, completely disabling all console access.
A nice way to keep nosy people at bay.
The VNC client most compatible and compliant with the original implementation.
Wine Is Not an Emulator. It is an implementation of the Win16 and Win32
APIs for Unix-like systems on Intel platforms. A means of having
Windows software running on a GNU/Linux box without virtualizing the
Write Optical Disk Media. A command line tool that allows you to create CDs or DVDs on a CD/DVD recorder.
Blog publishing system written in PHP. Runs along with a MySQL database.
Very usable and customizable.
GUI cross-platform library which can be used from languages such as C++, Python and Perl.
A most complete VNC server which runs and is configured through the
A viewer for MS Compiled HTML Help files.
A tiny paint program for X. Ideal for those of us who have not had
time to learn a good application like Gimp but still need to hack
images from time to time.
Graphical server based on the open source implementation of the X
Window System provided by the X.Org project. Yes, GUIs drive you crazy
eventually. Jokes aside, the installation of this package is a must.
We engineers do have a lot of PDF reading.
Yet Another FTP Client. A very nice one. This is a sort of mixture
between a plain
FTP client and an SSH client, with all the advantages that this
adaptive resonance theory for unsupervised learning
This software package includes the ART algorithms for unsupervised learning only. It is a family of four programs based on different ART algorithms (ART 1, ART 2A, ART 2A-C and ART distance). All of them are clustering algorithms and they are command-line programs. Written in C++.
Audio Desktop Reference Implementation and Networking Environment.
It is an implementation of an easy-to-use desktop system, which can be used entirely without vision-oriented output devices. In particular, it supports access to standard internet services such as email, WWW and chat, and to mobile phone services such as SMS and MMS (over the user's own mobile phone via Bluetooth).
Software package providing a series of algorithms for statistical relational learning and probabilistic logic inference, based on the Markov logic representation. Alchemy allows you to easily develop a wide range of AI applications. Coded in C++.
Multi-platform machine learning framework aimed at simplicity and
performance, and library of selected state-of-the-art algorithms.
Aleph is coded in the Java programming language.
Library for Approximate Nearest Neighbor Searching.
ANN is a library written in C++, which supports data structures and algorithms for both exact and approximate nearest neighbor searching in arbitrarily high dimensions.
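To make the difference between exact and approximate searching concrete, here is a small Python sketch. It does not use ANN or its data structures: it scans all points linearly, and it assumes the usual (1 + eps) definition of an approximate nearest neighbor. The function names are hypothetical.

```python
import math


def nearest(points, query):
    """Exact nearest neighbor of query, found by a linear scan."""
    return min(points, key=lambda p: math.dist(p, query))


def is_approx_nearest(points, query, candidate, eps):
    """True if candidate is at most (1 + eps) times as far from query
    as the true nearest neighbor is (assumed definition)."""
    best = math.dist(nearest(points, query), query)
    return math.dist(candidate, query) <= (1.0 + eps) * best


points = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
query = (1.0, 2.0)
print(nearest(points, query))
print(is_approx_nearest(points, query, (0.0, 0.0), eps=1.5))
```

Accepting any point within the (1 + eps) bound is what lets an approximate search stop early instead of examining every candidate.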
The Artificial Neural Network Architecture. It is a backpropagation neural network C++ class designed to fit well with the FLTK library.
Library aimed at delivering scalable machine learning tools under the Apache license.
ANother Tool for Language Recognition.
Language tool that provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions containing actions in a variety of target languages. ANTLR provides excellent support for tree construction, tree walking, translation, error recovery, and error reporting.
DAW. Used to record, edit and mix multi-track audio. Ardour strives
to meet the needs of professional users.
An open source electronics prototyping platform for developing projects
with an ATMEL microcontroller. Ideal for small university projects.
Linear algebra library (matrix and vector maths) aiming towards a good balance between speed and ease of use. It's distributed under a license that is useful in both commercial and open-source contexts.
This library is useful if C++ has been chosen as the language (due to speed and/or integration capabilities).
Sound editor. Free, open source software for recording and editing
sounds. It's quite complete for my taste. Also extendable through
LADSPA plugins in order to obtain a bigger collection of sound effects.
Tool for the automatic analysis of Standard American English prosody. AuToBI is a Java toolkit that hypothesizes pitch accents and phrase boundaries. The toolkit includes an acoustic feature extraction frontend, and a classification backend that is heavily supported by the Weka machine learning toolkit.
Bison is a general-purpose parser generator that converts an annotated context-free grammar into an LALR(1) or GLR parser for that grammar.
Bison is upward compatible with Yacc: all properly-written Yacc grammars ought to work with Bison with no change. Anyone familiar with Yacc should be able to use Bison with little trouble. You need to be fluent in C or C++ programming in order to use Bison.
Simulator for spiking neural networks written in Python.
PoS Tagger. Uses Transformation-Based Learning. Implemented in C.
chestnut machine learning suite
Collection of machine learning algorithms written in Python with some code written in C for efficiency. Most algorithms are called with a simple, functional API with input data encoded as arrays.
Bayesian Classifiers for Java. This project contains two Bayesian classifiers for Java: a Naive implementation and a Fisher implementation. It's merely a port of Toby Segaran's Python code for Bayesian analysis from his book "Programming Collective Intelligence."
The only requirement for this library is javolution.
It's licensed under the Artistic License.
Small, Fast and Free Text-To-Speech Engine.
Computational Intelligence Library written in Java.
It is a collaborative, component-based
framework for developing Computational Intelligence software in
swarm intelligence, evolutionary computing, neural networks,
artificial immune systems, fuzzy logic and robotics. Developed at the
University of Pretoria.
A Java-based circuit simulator. A great way to simulate simple circuits using a plain Java enabled browser.
C++ Library for Audio and Music. CLAM is a full-fledged software framework for research and application development in the Audio and Music Domain. It offers a conceptual model as well as tools for the analysis, synthesis and processing of audio signals. It also provides a Faust integration.
Class Library for Numbers. CLN is a C++ library for efficient computations
with all kinds of numbers in arbitrary precision.
Modular Java software library for the research and development of cognitive systems. It contains many reusable components for machine learning, statistics, and cognitive modeling. It is primarily designed to be easy to plug into applications to provide adaptive behaviors.
corpus building for minority languages
A web crawling software.
It exploits the vast quantities of text freely available on the web as a way of bringing the benefits of statistical NLP to languages with small numbers of speakers and/or limited computational resources.
Matlab Software for
Disciplined Convex Programming.
Matlab-based modeling system for convex optimization. CVX turns Matlab into a modeling language, allowing constraints and objectives to be specified using standard Matlab expression syntax.
databases for machine learning experiments
An experiment database is a database designed to store learning experiments in full detail, aimed at providing a convenient platform for the study of learning algorithms.
Community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to make sophisticated queries against Wikipedia, and to link other data sets on the Web to Wikipedia data.
Open source extensible data mining platform which provides common architecture for data processing algorithms of various types. The algorithms can be combined together to build data processing networks of large complexity. The unique feature of Debellor is data streaming, which enables efficient processing of large volumes of data. Written in Java.
Implements the DeltaLDA model, which is a modification of the Latent Dirichlet Allocation (LDA) model. DeltaLDA can use multiple topic mixing weight priors to jointly model multiple corpora with a shared set of topics. The inference method is Collapsed Gibbs sampling. The program can also be used to do "standard" LDA as a special case, and is implemented as a Python C extension module.
A modern C++ library with a focus on portability and program correctness. It strives to be easy to use right and hard to use wrong. Thus, it comes with extensive documentation and thorough debugging modes.
It covers threading, networking, GUIs, numerical algorithms,
ML algorithms, image processing, data compression, integrity algorithms and
Tool for supervised Machine Learning in OWL and Description Logics.
The goal of DL-Learner is to provide a DL/OWL based machine learning tool to solve supervised learning tasks and support knowledge engineers in constructing knowledge and learning about the data they created.
C++ library for distributed probabilistic inference and learning in large-scale dynamical systems. It provides methods such as the Kalman, unscented Kalman and particle filters and smoothers, as well as useful classes such as common probability distributions and stochastic processes.
Object-oriented C++ library that implements various machine learning models, including energy-based learning and gradient-based learning for machines composed of multiple heterogeneous modules. In particular, the library provides a complete set of tools for building, training, and running convolutional networks.
Efficient Java Matrix Library (EJML) is a linear algebra library for manipulating dense matrices. Its design goals are: 1) to be as computationally and memory efficient as possible for both small and large matrices, and 2) to be accessible to both novices and experts.
Efficient Learning, Large-scale Inference, and Optimization Toolkit.
An open source library for machine learning licensed under the Mozilla Public License. Written in Python.
C++ physics simulation software to simulate static magnetic fields and movement of charged particles in those fields (using the Lorentz force). Coulomb forces are also accounted for when simulating particle paths. So Ephi allows you to model and visualize magnetic fields through current elements and also to visualize electron paths within those fields. Magnetic fields are calculated using numeric integration over the Biot-Savart law.
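Numeric integration over the Biot-Savart law can be illustrated with a short Python sketch. This is not Ephi's code or API: the function `loop_field`, the choice of a circular loop in the xy-plane and the midpoint rule over straight segments are all assumptions made for the example.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability in T*m/A (classical value)


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def loop_field(point, radius, current, segments=1000):
    """Magnetic field at point from a circular current loop.

    The loop lies in the xy-plane, centered at the origin, and is split
    into straight current elements. Each element contributes
    mu0 * I / (4 * pi) * (dl x r) / |r|^3, evaluated at its midpoint.
    """
    total = [0.0, 0.0, 0.0]
    for k in range(segments):
        a0 = 2.0 * math.pi * k / segments
        a1 = 2.0 * math.pi * (k + 1) / segments
        start = (radius * math.cos(a0), radius * math.sin(a0), 0.0)
        end = (radius * math.cos(a1), radius * math.sin(a1), 0.0)
        dl = tuple(e - s for s, e in zip(start, end))
        mid = tuple((s + e) / 2.0 for s, e in zip(start, end))
        r = tuple(p - m for p, m in zip(point, mid))
        dist = math.sqrt(sum(c * c for c in r))
        scale = MU0 * current / (4.0 * math.pi * dist ** 3)
        for i, c in enumerate(cross(dl, r)):
            total[i] += scale * c
    return tuple(total)


bx, by, bz = loop_field((0.0, 0.0, 0.0), radius=1.0, current=1.0)
print(bz, MU0 / 2.0)
```

At the center of the loop the analytic result is mu0 * I / (2 * R), so the two printed values should agree closely; increasing `segments` tightens the agreement.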
Language independent rule-driven Text-to-Speech (TTS) system primarily designed to serve as a research tool. Epos is (or tries to be) independent of the language processed, linguistic description method, and computing environment.
TTS. Compact open source software speech synthesizer for English and other languages, for Linux and Windows.
Extended Java WordNet Library is a Java API for creating, reading and updating dictionaries in WordNet format. extJWNL is an upgraded version of JWNL.
Fast Artificial Neural Network Library. Implements multilayer artificial
neural networks in C with support for both fully connected and sparsely
connected networks. Cross-platform execution in both fixed and floating
point is supported. It includes a framework for easy handling of training
data sets. It is easy to use, versatile, well documented, and fast, with
many bindings to different languages.
Functional AUdio STream.
A compiled language for real-time audio signal processing.
Its programming model combines two approaches: functional programming and block diagram composition. You can think of FAUST as a structured block diagram language with a textual syntax.
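The combination of functional programming and block diagram composition can be mimicked in Python. The sketch below is not FAUST code and works on single samples rather than audio streams; the `Block` class and the `seq` and `par` helpers are hypothetical names that loosely mirror FAUST's sequential (`:`) and parallel (`,`) composition operators.

```python
class Block:
    """A processing block with a fixed number of inputs and outputs."""

    def __init__(self, n_in, n_out, fn):
        self.n_in = n_in
        self.n_out = n_out
        self.fn = fn  # takes n_in values, returns a tuple of n_out values

    def __call__(self, *signals):
        assert len(signals) == self.n_in, "wrong number of inputs"
        return self.fn(*signals)


def seq(a, b):
    """Sequential composition: the outputs of a feed the inputs of b."""
    assert a.n_out == b.n_in, "output and input counts must match"
    return Block(a.n_in, b.n_out, lambda *x: b(*a(*x)))


def par(a, b):
    """Parallel composition: a and b run side by side on separate inputs."""
    return Block(
        a.n_in + b.n_in,
        a.n_out + b.n_out,
        lambda *x: a(*x[:a.n_in]) + b(*x[a.n_in:]),
    )


gain = Block(1, 1, lambda x: (2 * x,))       # doubles its input
add = Block(2, 1, lambda x, y: (x + y,))     # sums two inputs

# Two gains in parallel, followed by an adder.
mixer = seq(par(gain, gain), add)
print(mixer(1, 3))
```

Because every composition returns another `Block`, larger diagrams are built by nesting the same two helpers, which is the functional side of the model.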
TTS. Speech synthesis system. Developed at the Centre for Speech Technology
Research at the University of Edinburgh and written in C++, Festival stands
as one of the most important free software speech synthesis systems
nowadays. It is related to the Festvox project.
HOWTO: Make festival TTS use better voices (MBROLA / CMU / HTS)
Aims to make the building of new synthetic voices more systematic
and better documented, making it possible for anyone to build a new voice
Developed by Carnegie Mellon University's speech group.
Library for fast computation of Gauss transforms in multiple dimensions, using the Improved Fast Gauss Transform and Approximate Nearest Neighbor searching.
flanagan java scientific library
Java scientific and numerical library to support both research and undergraduate programming courses and projects.
Flite (festival-lite) is a small, fast run-time synthesis engine developed at CMU and primarily designed for small embedded machines and/or large servers. Flite is designed as an alternative synthesis engine to Festival for voices built using the FestVox suite of voice building tools.
Data flow oriented development environment. It can be used to build complex applications by combining small, reusable building blocks. In some ways, it is similar to both Simulink and LabView, but is hardly a clone of either.
Quite fast and written in C++, it features a plugin mechanism that allows plugins/toolboxes to be easily added.
Fully Modular Synthesizer. Tool to generate all kinds of sounds.
Language written in C++ dedicated to the
finite element method. It enables solving
Partial Differential Equations (PDE) easily.
VHDL simulator. Used by Qucs for digital simulation.
Open Source Suite of Language Analyzers developed in C++.
Includes a larger Spanish dictionary,
a debugged English dictionary,
more WN-based semantic information access,
a more expressive rule language for dependency parsing,
Machine Learning functionalities moved to the external omlet+fries library for clearer organization,
support for 64-bit processors and
an extended Java API.
Free environment for rapid engineering and scientific prototyping and
data processing. Similar to Matlab.
RTOS. Portable open source mini Real Time Kernel for time-critical
applications. The project provides ports to
many processor architectures.
A speech synthesizer written entirely in the Java programming language.
FreeTTS is a speech synthesis system written entirely in the Java programming language. It is based upon Flite: a small run-time speech synthesis engine developed at Carnegie Mellon University. Flite is derived from the Festival Speech Synthesis System from the University of Edinburgh and the FestVox project from Carnegie Mellon University.
A calculating tool and programming language that tracks units of measure
through all calculations, which makes it well suited to physical calculations. This
tool was named after a Simpsons character: brilliant professor
Set of C++ genetic algorithm objects. The library includes tools for using genetic algorithms to do optimization in any C++ program using any representation and genetic operators.
gaussian process resources
Resources concerned with probabilistic modeling, inference and learning based on Gaussian processes.
Literature, software and more.
GPL'd suite of Electronic Design Automation tools. Another application
I would have liked to have known about when doing electronic designs. It includes
schematic capture, simulation, prototyping and production. A true
alternative to commercial proprietary software like OrCAD.
Complete VHDL simulator using the GCC technology. Its results are
dumped into a text file, which is then visually interpreted with the
GTKWave program. Anyway, Altera already offers a free binary distribution of
its IDE for working with its FPGAs.
General Hidden Markov Model library.
C library implementing efficient data structures and algorithms for basic and extended HMMs.
Coded at the Max Planck Institute for Molecular Genetics.
Graphical Interface for Neural Networks.
A decision-making platform written in Java. It was developed to promote the development and use of neural networks.
Neural network classical models are already available (Multi-layer perceptron, Kohonen self-organizing maps, neural gas, growing neural gas, etc.).
Coded in Java.
Pseudo-random number generator for the purpose of simulations,
Monte-Carlo integration, computer games and the like.
Simple and fast Genetic Programming toolbox written in Java.
gp music composition
Genetic Programming techniques to allow computers to compose music.
Genetic Programming is an Artificial Intelligence technique that evolves "fit" individual programs from an initially random population of programs. In the case of music, fitness can be defined as how pleasing it is to listen to a particular sequence.
GNU Scientific Library (GSL) is a numerical library for C and C++
programmers. The library provides a wide range of mathematical routines
such as random number generators, special functions and least-squares
fitting. There are over 1000 functions in total with an extensive test
A library for constructing graphs of media-handling components.
Multimedia framework written in C. GStreamer serves a host of multimedia applications, such as video editors, streaming media broadcasters, and media players.
A waveform viewer for interpreting the results dumped by ghdl.
HTK Application Programming Interface.
Java HMM toolkit implemented at the Arizona State University, aimed at
building a gesture recognition system.
Genuine random numbers, generated by radioactive decay.
An Internet resource that brings genuine random numbers, generated by a process fundamentally governed by the inherent uncertainty in the quantum mechanical laws of nature, directly to your computer in a variety of forms. HotBits are generated by timing successive pairs of radioactive decays detected by a Geiger-Müller tube interfaced to a computer. Includes Java
code to query the server.
Hidden Markov Model Toolkit. Excellent toolkit for HMM-based speech
recognition applications, among many others. Written in C by
the Cambridge University Engineering Department, it has been
adopted by many universities for research projects.
TTS. HMM-based Speech Synthesis System. The speech synthesis system developed
at the Nagoya Institute of Technology. Makes use of HTK. The produced
voices can be used with Festival.
Advanced drum machine for GNU/Linux. Its main goal is to bring
professional yet simple and intuitive pattern-based drum programming.
Public domain, Java-based image processing program developed at the National Institutes of Health.
ImageJ was designed with an open architecture that provides extensibility via Java plugins and recordable macros.
ASR developed at the Mississippi State University. Written in C++ and
aimed at research activities.
C++ library of mathematical, signal processing, speech processing
and communications classes and functions.
Its main use is in simulation of communication systems and for
performing research in the area of communications.
Developed at Chalmers University of Technology.
Low-latency audio server, written for POSIX conformant operating systems
such as GNU/Linux. It can connect a number of different applications to
an audio device, as well as allowing them to share audio between
themselves. JACK is an essential tool for audio plumbing.
Java HMM library written with code readability in mind. Designed to be easy
to use and general purpose.
JACK Audio Connection Kit (JACK) Audio Mastering interface. JAMin is an open source application designed to perform professional audio mastering of stereo input streams. It uses LADSPA for digital signal processing (DSP).
Java Compiler Compiler. The most popular parser generator for use with Java.
A parser generator is a tool that reads a grammar specification and converts it to a Java program that can recognize matches to the grammar.
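To make "a program that can recognize matches to the grammar" concrete, here is a hand-written recognizer of the kind a top-down parser generator might emit. It is a sketch in Python rather than generated Java, and the toy grammar (`expr := term ('+' term)*`, `term := DIGITS | '(' expr ')'`) and all names are assumptions chosen for illustration.

```python
class Recognizer:
    """Recursive-descent recognizer with one method per grammar rule."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        # Current character, or "" at end of input.
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expect(self, ch):
        if self.peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {self.pos}")
        self.pos += 1

    def expr(self):
        # expr := term ('+' term)*
        self.term()
        while self.peek() == "+":
            self.pos += 1
            self.term()

    def term(self):
        # term := DIGITS | '(' expr ')'
        if self.peek() == "(":
            self.pos += 1
            self.expr()
            self.expect(")")
        elif self.peek().isdigit():
            while self.peek().isdigit():
                self.pos += 1
        else:
            raise SyntaxError(f"unexpected input at position {self.pos}")


def matches(text):
    """Return True if the whole text is derivable from `expr`."""
    recognizer = Recognizer(text)
    try:
        recognizer.expr()
    except SyntaxError:
        return False
    return recognizer.pos == len(text)
```

With a parser generator, only the grammar is written by hand; code of this shape is produced automatically from it.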
Java Machine Learning Library. Library of ML algorithms and related
datasets. Machine learning techniques include: clustering, classification,
feature selection, regression, data pre-processing, ensemble learning,
Java WordNet API library (hence the "Jaw" portion of the name - an acronym for Java API for WordNet). It makes it very easy to search the WordNet data files for terms, either all terms or just those terms matching some search criteria.
Java API for WordNet Searching.
API that provides Java applications with the ability to retrieve data from the WordNet database. It is a simple and fast API that is compatible with both the 2.1 and 3.0 versions of the WordNet database files and can be used with Java 1.4 and later.
Fast linear algebra library for Java. jblas is based on BLAS and LAPACK, the de facto industry standard for matrix computations, and uses state-of-the-art implementations like ATLAS for all its computational routines, making jblas very fast.
Software system for Evolutionary Computation (EC) research, developed in the Java programming language. It provides a high-level software environment to do any kind of Evolutionary Algorithm (EA), with support for genetic algorithms (binary, integer and real encoding), genetic programming (Koza style, strongly typed, and grammar based) and evolutionary programming.
Java framework for building Semantic Web applications. It provides a programmatic environment for RDF, RDFS, OWL and SPARQL, and includes a rule-based inference engine.
Rule engine and scripting environment written entirely in Java. Using Jess, you can build Java software that has the capacity to "reason" using knowledge you supply in the form of declarative rules. Jess is small, light, and one of the fastest rule engines available.
Java-based toolkit and library that supports and demonstrates the use of n-gram graphs within Natural Language Processing applications, ranging from summarization and summary evaluation to text classification and indexing.
A scientific open-source programming environment coded in Java.
A Java Clone of Octave, SciLab, Freemat and Matlab.
Java scenegraph API. Its primary focus is high-performance 3D gaming. jME itself is written entirely in Java and uses an abstraction layer for communicating natively with the platform's hardware.
Java Implementation of Naive Credal Classifier 2.
NCC2 constitutes an extension of the traditional Naive Bayes Classifier (NBC) towards imprecise probabilities; it is designed to return robust classification, even on small and/or incomplete data sets. A peculiar feature of NCC2 is that it returns set-valued (or imprecise) classifications (i.e., more than one class) when faced with doubtful instances.
Java Object Oriented Neural Engine. A framework to create, train
and test artificial neural networks.
PRObabilistic GRAphical Models in Java. Open-source Java library which can be used for learning the following probabilistic models from data: Bayesian networks, Markov random fields, hybrid random fields, probabilistic decision trees, dependency networks, Gaussian mixture models, and Parzen windows.
High-performance, two-pass large vocabulary continuous speech recognition (LVCSR) decoder software for speech-related researchers and developers.
Based on word N-gram and context-dependent HMM, it can perform almost real-time decoding on most current PCs in a 60k-word dictation task.
It is written in C and the main platform is Linux.
Java WordNet Library. API for accessing WordNet-style relational dictionaries. It also provides functionality beyond data access, such as relationship discovery and morphological processing.
Pure Java standalone object-oriented interface to the WordNet database of lexical relationships. It is intended for Java programmers who wish to write portable Java applications that use a local copy of the WordNet files, or who find JWordNet's object-oriented interface preferable to the procedural interface that the C library (and native method interfaces built on top of it) provides.
Knowledge Extraction based on Evolutionary Learning.
Spanish national project providing a software tool to assess evolutionary algorithms for data mining problems including regression, classification, clustering, pattern mining and so on. It contains a large collection of classical knowledge extraction algorithms, preprocessing techniques (instance selection, feature selection, discretization, imputation methods for missing values, etc.), computational intelligence based learning algorithms, including evolutionary rule learning algorithms based on different approaches (Pittsburgh, Michigan, IRL, ...), and hybrid models such as genetic fuzzy systems, evolutionary neural networks, etc.
Kernel-based machine learning methods for classification, regression, clustering, novelty detection, quantile regression and dimensionality reduction. Among other methods kernlab includes Support Vector Machines, Spectral Clustering, Kernel PCA and a QP solver.
Linux Audio Developer's Simple Plugin API. It is a standard that allows
software audio processors and effects to be plugged into a wide range of
audio synthesis and recording packages.
High quality MPEG Audio Layer III (MP3) encoder licensed under the LGPL.
A document preparation system. It is a high-quality typesetting system.
It is oriented to scientific and technical productions, although
I have read that humanities faculties have begun to use it for the splendid
quality that it yields. Prestigious scientific publishers, such
as the IEEE, provide
the layouts required for their publications.
Apart from articles and books, LaTeX can prepare presentations, calendars, drawings... it's really powerful. The Debian packages for having the application ready are: tetex-base, tetex-bin, tetex-doc and tetex-extra.
Library that supports all kinds of Neural Nets, including ARTs and more.
Currently it has a Multi-Layer Perceptron network, Kohonen network, a
Boltzmann machine and a Hopfield network.
The library is written in an object-oriented style using the C++ Standard Template Library.
Library implementing OCAS solver
for training linear SVM classifiers from large-scale data. Coded in C.
A C++ and Java Library for Support Vector Machines.
LIBSVM is an integrated software for support vector classification,
regression and distribution estimation.
It supports multi-class classification.
Smith charting program for GNU/Linux, mainly designed for educational use.
Locally Weighted Projection Regression (LWPR) is a recent algorithm that achieves nonlinear function approximation in high dimensional spaces with redundant and irrelevant input dimensions. At its core, it uses locally linear models, spanned by a small number of univariate regressions in selected directions in input space. A locally weighted variant of Partial Least Squares (PLS) is employed for doing the dimensionality reduction. A C-library with wrappers for C++, Matlab/Octave, and Python.
MAchine Learning for LanguagE Toolkit.
Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
Extensible image processing framework developed in Java. It is open source and distributed under GPL License.
Algorithms to manipulate images are externally implemented as plug-ins. The framework provides an interface to manipulate plug-ins.
TTS. Speech synthesis system for German, English and Tibetan. Produced by
the Institute of Phonetics at Saarland University using the Java
Graphics Library written in C++.
PoS Tagger. Mature Java package for training and using maximum entropy models.
CAS. System for the manipulation of symbolic and numerical expressions,
including differentiation, integration, Taylor series, Laplace transforms,
ordinary differential equations, systems of linear equations, polynomials,
and sets, lists, vectors, matrices, and tensors. A direct competitor of
Modular toolkit for Data Processing. Data processing framework written in Python.
MDP consists of a collection of trainable supervised and unsupervised algorithms or other data processing units (nodes) that can be combined into data processing flows and more complex feed-forward network architectures.
MIT Electromagnetic Equation Propagation. FDTD simulator for modeling
Comprehensive scalable machine learning library.
Developed by the Fundamental Algorithmic and Statistical Tools laboratory (FASTlab), MLPACK and its core functions library FASTlib fill an existing void.
High-performance Python/NumPy based package for machine learning.
Includes classification, feature weighting, feature ranking, resampling
methods, metric functions, feature list analysis and landscaping tools.
Live CD Linux distribution with a rich collection of Natural Language Processing (NLP) applications.
Open-source Java library for learning from multi-label datasets. Multi-label datasets consist of training examples of a target function that has multiple binary target variables. This means that each item of a multi-label dataset can be a member of multiple categories or annotated by many labels (classes).
Multilingual lexical database in which the Italian WordNet is strictly aligned with Princeton WordNet 1.6.
Java PoS Tagger based on the model of maximum entropy.
Network Audio System.
Network transparent, client/server audio transport system. It can be described as the audio equivalent of an X server.
Network Learning toolkit for statistical relational learning. It is written in Java 1.5 and was designed with a plug-and-play architecture to enable the mix-and-match between different components in the relational learning process. It integrates seamlessly with the Weka machine learning toolkit, making it possible to use any of Weka's learning classifiers in the context of relational learning.
A set of C++ library classes for neural network development.
The main goal of the library is to support researchers and
practitioners in developing new neural network methods and applications,
exploiting the potential of object-oriented design and programming.
NEURObjects also provides general purpose applications for classification
problems and can be used for fast prototyping of inductive machine
Java library for language recognition. It uses language profiles (counts of character sequences) to guess what language some arbitrary text is written in.
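A minimal sketch of profile-based language guessing, written in Python rather than Java. The trigram size, the cosine similarity measure and every function name here are assumptions for illustration; the library's actual profile format and scoring are not described above.

```python
import math
from collections import Counter


def profile(text, n=3):
    """Count the character n-grams of a text (its 'language profile')."""
    # Normalize case and whitespace, and pad so word boundaries count too.
    text = " " + " ".join(text.lower().split()) + " "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(p, q):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(count * q[gram] for gram, count in p.items())
    norm = math.sqrt(sum(c * c for c in p.values())) * math.sqrt(
        sum(c * c for c in q.values())
    )
    return dot / norm if norm else 0.0


def guess_language(text, profiles):
    """Return the key of the stored profile most similar to the text."""
    target = profile(text)
    return max(profiles, key=lambda lang: cosine(target, profiles[lang]))


if __name__ == "__main__":
    # Tiny made-up training texts; real profiles need far more data.
    profiles = {
        "en": profile("the quick brown fox jumps over the lazy dog and the cat"),
        "es": profile("el rápido zorro marrón salta sobre el perro perezoso y el gato"),
    }
    print(guess_language("the dog and the fox", profiles))
```

Profiles built from a single sentence are only good enough to show the mechanism; in practice each language profile would be trained on a large corpus.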
nico ann toolkit
General purpose toolkit for constructing artificial neural networks
and training with the back-propagation learning algorithm.
It is written in C and originally
developed for speech recognition applications.
Machine learning library for large-scale classification, regression and ranking. It relies on the framework of energy-based models which unifies several learning algorithms. This framework also unifies batch and stochastic learning which are both seen as energy minimization problems. Nieme is released under the GPL license. It is efficiently implemented in C++.
nist math resources
Mathematical and statistical engineering resources from the National
Institute of Standards and Technology.
Natural Language Toolkit.
Suite of open source Python modules, data and documentation for research and development in natural language processing. NLTK contains code supporting dozens of NLP tasks, along with 40 popular corpora and extensive documentation including a 375-page online book.
A command line program intended for numerical computations. Its high level
language is mostly compatible with Matlab, which is worth considering
when you have to hand in certain lab assignments at school.
Octave is available on the servers of La Salle.
This application has a lot of community support under the Octave-Forge
Introduccion Informal a Matlab y Octave
Common platform to build and share artificial intelligence programs. The long-term goal of OpenCog is acceleration of the development of beneficial AGI, a goal which includes developing tools and protocols for AGI safety.
The OpenCog Framework provides an OS-like infrastructure and stable APIs. Written in C++.
Open source version of the Cyc technology, the world's largest and most complete general knowledge base and commonsense reasoning engine. OpenCyc can be used as the basis of a wide variety of intelligent applications such
as rapid development of an ontology in a vertical area,
email prioritizing, routing, summarization, and annotating,
expert systems and games.
Hidden Topic Markov Model. Application to model the topics of words
in a document as a Markov chain. Written in C++.
open mind speech
Free speech recognition for GNU/Linux. Tools and applications.
Organizational center for open source projects related to natural language processing. Its primary role is to encourage and facilitate the collaboration of researchers and developers on such projects.
Free, open source scriptable screen reader. Using various combinations of speech, braille, and magnification, Orca helps provide access to applications and toolkits that support the AT-SPI (e.g., the GNOME desktop). The development of Orca has been led by the Accessibility Program Office of Sun Microsystems, Inc. with contributions from many community members.
Python library and command line application for learning the structure of a Bayesian network given prior knowledge and observations. Pebl has been developed at the Systems Biology lab at the University of Michigan and is available with a permissive MIT-style license.
Portable, Extensible Toolkit for Scientific Computation.
Suite of data structures and routines for the scalable (parallel)
solution of scientific applications modeled by partial differential
equations. It implements bindings for Python.
Physics Education Technology. Interactive Physics Simulator. Fun, interactive, research-based simulator of physical phenomena from the Physics Education Technology project at the University of Colorado.
Written in Java.
Python Hidden Markov Models. Implemented at the University of Bologna.
Connected speech recognition system.
Phoenix is a speaker-dependent (user-trained) connected word recognition
system. Phoenix is designed as a real-time recognition system in that
recognition takes place in parallel with utterance input and partial
results are available before the end of the utterance is encountered.
Graphical IDE for the development of PIC-based applications.
Developed in C++ under Linux and based on the KDE environment.
PiKdev can drive parallel port programmers or serial port programmers.
The project's page provides the needed schematics for cheap
IDE for applications based on Microchip PIC and dsPIC microcontrollers
similar to the MPLAB environment. It integrates with several compiler
and assembler toolchains (like gputils, sdcc, c18) and with the GPSim
simulator. It supports the most common programmers (serial, parallel,
ICD2, Pickit2, PicStart+), the ICD2 debugger, and several bootloaders
(Tiny, Pickit2, and Picdem).
Arduino-like board based on a PIC microcontroller. The goal of this project is to build an easy-to-use IDE for Linux, Windows and Mac OS X.
The Pinguino IDE is built with Python. An integrated preprocessor translates specific Arduino instructions directly into C. This preprocessor reduces the code length and the execution time. Pinguino hardware is based on an 18F2550. This chip has an integrated native USB module and a UART for serial link.
Program for statistical analysis of sampled data. It is a Free replacement for the proprietary program SPSS, and appears very similar to it with a few exceptions.
PSPP can perform descriptive statistics, T-tests, linear regression and non-parametric tests. Its backend is designed to perform its analyses as fast as possible, regardless of the size of the input data. You can use PSPP with its graphical interface or the more traditional syntax commands.
Sound server written in C. It allows you to do advanced operations on your sound data as it passes between your application and your hardware. Things like transferring the audio to a different machine, changing the sample format or channel count and mixing several sounds into one are easily achieved using a sound server.
According to Paul Davis, main developer of Ardour and JACK, PulseAudio is
One Audio System To Bind Them All, adapting the famous quote from The
Lord of the Rings.
Real-time graphical programming environment for audio, video, and graphical processing.
A very complete and complex application.
Python-Based Reinforcement Learning, Artificial Intelligence and Neural Network Library.
PyBrain is a modular Machine Learning Library for Python. Its goal is to offer flexible, easy-to-use yet still powerful algorithms for Machine Learning Tasks and a variety of predefined environments to test and compare your algorithms.
Python Robotics. The goal of the project is to provide a programming environment for easily exploring advanced topics in artificial intelligence and robotics without having to worry about the low-level details of the underlying hardware.
python numeric and scientific
A rich set of numerical tools for scientific computations with the
python programming language.
A graphical tool for designing finite state machines. Written in C++.
Probabilistic part-of-speech tagger: a program that reads text and, for each token in the text, returns the part of speech (e.g. noun, verb, punctuation, etc.). It works using statistical methods, hence the 'probabilistic'. As a result it does make mistakes (as does every POS tagger), but it is fairly robust and (from informal evaluation) tags texts with good accuracy.
Quite Universal Circuit Simulator. This is one of the tools I would have
liked to have known about when I took my Bachelor's degree in Electronic
Engineering. It is a circuit simulator with GUI that supports various
kinds of simulations including DC, AC, S-parameter, Harmonic Balance
Environment for machine learning and data mining experiments. It allows experiments to be made up of a large number of arbitrarily nestable operators, described in XML files which are created with RapidMiner's graphical user interface. RapidMiner is used for both research and real-world data mining tasks. Written in Java.
Interactive interpreted scientific programming environment. Rlab is a very high level language intended to provide fast prototyping and program development, as well as easy data-visualization, and processing.
It focuses on creating a good experimental environment (or laboratory) in which to do matrix math, which is why it can be called "Matlab-like".
A set of common guidelines for the reinforcement learning community to follow to allow us to share and compare agents and environments with greater ease.
The software implementation of RL-Glue is the reusable glue to connect the basic parts of an experiment.
RL-Glue is functionally a harness to "plug in" agents and environments and experiment without having to continually rewrite the connecting code.
Free software environment for statistical computing and graphics that
is used frequently for Machine Learning applications and research.
A most complete RTOS for multiprocessor systems.
Open source platform for numerical computation developed at INRIA, the
French national institute for research in computer science and automation.
It has a command line console and a dynamic systems simulator.
Its interface resembles that of Matlab, the tool used par excellence at La
Salle. Some universities, though, have
switched to Scilab because of its good performance.
Scilab is in constant development, which is why I prefer downloading the
tarball from the Internet instead of dealing with the non-free Debian repos.
A 70,000-node terminology taxonomy, as a framework into which additional knowledge can be placed. SENSUS is an extension and reorganization of WordNet (built at Princeton University).
At the top level, nodes from the Penman Upper Model have been added, and the major branches of WordNet have been rearranged to fit. In addition, nodes based on work with other ontologies have also been added.
Free multimedia C++ API that provides low- and high-level access to graphics, input, audio, etc.
Toolkit for declarative programming, image processing and computer vision.
ShapeLogic is a Java library for declarative programming and lazy computation,
image processing and computer vision, with a particle analyzer for medical image processing.
Modular C++ library for the design and optimization of adaptive systems. It provides methods for linear and nonlinear optimization, in particular evolutionary and gradient-based algorithms, kernel-based learning algorithms and neural networks, and various other machine learning techniques.
Machine Learning toolbox focused on large scale kernel methods and
especially on Support Vector Machines. Written in C++, it interfaces with
Matlab, R, Octave and Python.
Java 3D robot simulator for scientific and educational purposes. It is mainly dedicated to researchers/programmers who want a simple basis for studying Situated Artificial Intelligence, Machine Learning, and more generally AI algorithms, in the context of Autonomous Robotics and Autonomous Agents.
snack sound toolkit
Designed to be used with a scripting language such as Tcl/Tk or Python. Using Snack you can create powerful multi-platform audio applications with just a few lines of code. Snack has commands for basic sound handling, such as playback, recording, file and socket I/O. Snack also provides primitives for sound visualization, e.g. waveforms and spectrograms. It was developed mainly to handle digital recordings of speech, but is just as useful for general audio. Snack has also successfully been applied to other one-dimensional signals.
Sparse Network of Winnows learning architecture. Multi-class classifier that is specifically tailored for large scale learning tasks and for domains in which the potential number of features taking part in decisions is very large, but may be unknown a priori. It learns a sparse network of linear functions in which the target concepts (class labels) are represented as linear functions over a common feature space.
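The "Winnow" in the name refers to a mistake-driven linear learner with multiplicative weight updates. The following is a textbook-style sketch of a single such unit, not SNoW's implementation: the class name, the lazily allocated weight dictionary, the initial weight of 1.0 and the parameter values are assumptions made for illustration.

```python
class Winnow:
    """One linear threshold unit trained with multiplicative updates."""

    def __init__(self, threshold, alpha=2.0):
        self.threshold = threshold
        self.alpha = alpha
        # Weights are created lazily, so the feature space need not be
        # known in advance; unseen features implicitly weigh 1.0.
        self.weights = {}

    def score(self, active):
        """Sum the weights of the active (present) features."""
        return sum(self.weights.get(f, 1.0) for f in active)

    def predict(self, active):
        return self.score(active) >= self.threshold

    def update(self, active, label):
        """Adjust weights only on a mistake; return True if one occurred."""
        if self.predict(active) == label:
            return False
        # Promote active features on a missed positive, demote otherwise.
        factor = self.alpha if label else 1.0 / self.alpha
        for f in active:
            self.weights[f] = self.weights.get(f, 1.0) * factor
        return True


if __name__ == "__main__":
    from itertools import product

    # Toy target concept over four boolean features: x0 OR x2.
    data = [
        ({i for i, v in enumerate(x) if v}, bool(x[0] or x[2]))
        for x in product([0, 1], repeat=4)
    ]
    unit = Winnow(threshold=4)
    for _ in range(10):
        for active, label in data:
            unit.update(active, label)
    print(unit.weights)
```

Examples are represented as sets of active feature identifiers, so only features that actually occur ever receive a stored weight. A multi-class network would keep one such unit per class label over the shared feature space.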
Small string processing language designed for creating stemming algorithms for use in Information Retrieval.
Screen review package for the Linux operating system.
Speakup allows you to interact with applications and the GNU/Linux operating system with audible feedback from the console using a synthetic speech device.
Device independent layer for speech synthesis, developed with the goal of making the usage of speech synthesis easier for application programmers. It takes care of most of the tasks that speech-enabled applications need to solve.
The architecture is based on a proven client/server model. The basic means of client communication is through a TCP connection using the Speech Synthesis Independent Protocol (SSIP), or through an interface library.
ASR. Speech recognition application and set of tools for speech
recognition developed at Carnegie Mellon University. Originally it
was implemented in C, but its latest release, Sphinx4, has been
programmed in Java.
Free speech signal processing toolkit which provides runtime commands implementing standard feature extraction algorithms for speech related applications and a C library to implement new algorithms and to use SPro files within your own programs.
The Speech Signal Processing Toolkit (SPTK) is a suite of speech signal processing tools for UNIX environments, e.g., LPC analysis, PARCOR analysis, LSP analysis, PARCOR synthesis filter, LSP synthesis filter, vector quantization techniques, and other extended versions of them.
Stochastic Simulation in Java. It provides facilities for generating uniform and nonuniform random variates, computing different measures related to probability distributions, performing goodness-of-fit tests, applying quasi-Monte Carlo methods, collecting (elementary) statistics, and programming discrete-event simulations with both events and processes.
stanford log-linear part-of-speech tagger
A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns parts of speech to each word (and other token), such as noun, verb, adjective, etc., although generally computational applications use more fine-grained POS tags like 'noun-plural'. This software is a Java implementation of the log-linear part-of-speech tagger.
Standard Template Library. A C++ library of container classes, algorithms,
and iterators. It provides many of the basic algorithms and data
structures used in computer science.
Implementation of Support Vector Machines (SVMs) in C
for the problem of pattern recognition, for the problem of regression, and for the problem of learning a ranking function.
The algorithm has scalable memory requirements and can handle problems with many thousands of support vectors efficiently.
A knowledge-based software project to create artificial intelligence.
The first approach is to construct an English dialog system, to then
let it acquire linguistic and common sense skills for representing
its own behavior in the knowledge base. Implemented in Java.
Java implementation, with an easy to use API and full unit-test coverage, of some techniques to perform Text Language Detection, Keywords and keyphrases extraction, Text Classification, Text Clustering, Document Summarization (single or multiple documents) and Plagiarism Detection.
Video compression. Theora is a free and open video compression format from the Xiph.org Foundation.
Theora scales from postage stamp to HD resolution, and is considered particularly competitive at low bitrates. It is in the same class as MPEG-4/DivX, and like the Vorbis audio codec it has lots of room for improvement as encoder technology develops.
Trigrams'n'Tags. Very efficient statistical part-of-speech tagger that is trainable on different languages and virtually any tagset. The component for parameter generation trains on tagged corpora. The system incorporates several methods of smoothing and of handling unknown words.
Written in C.
Matlab-like environment for state-of-the-art machine learning algorithms.
PoS Tagger. Language independent part-of-speech tagger.
TreeTagger is a tool for annotating text with part-of-speech and lemma information.
It has been successfully used to tag German, English, French, Italian, Dutch, Spanish, Bulgarian, Russian, Greek, Portuguese, Chinese and old French texts and is easily adaptable to other languages if a lexicon and a manually tagged training corpus are available.
Full text engine, fully integrated into PostgreSQL RDBMS.
Universal Java Matrix Package. The Universal Java Matrix Package (UJMP) is an open source Java library that provides sparse and dense matrix classes, as well as a large number of calculations for linear algebra like matrix multiplication or matrix inverse. Operations such as mean, correlation, standard deviation, replacement of missing values or the calculation of mutual information are also supported.
Support Vector Machine with Large Scale CCCP Functionality.
The UniverSVM is an SVM implementation written in C/C++. Its functionality comprises large scale transduction via CCCP optimization, sparse solutions via CCCP optimization and data-dependent regularization with a Universum.
Collection of transcribed speech for use with Free and Open Source Speech Recognition Engines (on Linux, Windows and Mac).
Desktop control. Written in Python, Voximp is an application, with which programs can be spawned and key/mouse presses simulated, all from just speaking a few words.
Collection of C++ classes and tools for researchers in machine learning, AI, data mining, pattern recognition, and related fields.
Java Framework for Evolutionary Computation.
Extensible, high-performance, object-oriented framework for implementing platform-independent evolutionary algorithms (EAs) in Java. The framework provides type-safe, non-invasive evolution for arbitrary representations.
Watchmaker project's home
Sound visualization and manipulation tool. WaveSurfer has a simple and logical user interface that provides functionality in an intuitive way and which can be adapted to different tasks. It can be used as a stand-alone tool for a wide range of tasks in speech research and education. Typical applications are speech/sound analysis and sound annotation/transcription.
Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.
Data Sniffer. Network protocol analyzer used in the university's labs.
It's a mature project, useful and complete.
Large lexical database of English.
Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations.
The Web Service Modeling Toolkit (WSMT) is a collection of tools for Semantic Web Services intended for use with the Web Service Modeling Ontology (WSMO), the Web Service Modeling Language (WSML) and the Web Service Execution Environment (WSMX).
Yet Another Compiler Compiler.
Parser generator developed by Stephen C. Johnson at AT&T for the Unix operating system.
It generates a parser (the part of a compiler that tries to make syntactic sense of the source code) based on an analytic grammar written in a notation similar to BNF. Yacc generates the code for the parser in the C programming language.
Yet Another Screen Reader. General-purpose console screen reader for GNU/Linux and other Unix-like operating systems.
Interpreted programming language, designed for postprocessing or steering large scientific simulation codes.
The language features a compact syntax for many common array operations, so it processes large arrays of numbers very efficiently.