Blog
Post 77: Lean Startup hackers were already there back in the early eighties
02-May-2013
The fancy "lean" adjective that accompanies every rocking tech business issue nowadays is an already old story. I found it out the other day while skimming through Steven Levy's groundbreaking book "Hackers: Heroes of the Computer Revolution".

Many hackers of the Homebrew Computer Club (HCC) followed these lean principles as a means to avoid building things that no one really wanted or needed:
- Bob Marsh, following Ed Roberts of MITS, would announce his product first, and then collect the money required to design and manufacture the product.
- Lee Felsenstein would incorporate the user in the design of the product.
- Steve Wozniak would sit in the back of the auditorium of the HCC, where the electrical outlet was, getting suggestions for improvements and incorporating those improvements into the Apple II design.
Considering that all these business approaches were raised during the recessionary period of the early eighties, and that they served the American economy very well, perhaps they should still be regarded as of utmost importance nowadays.
Post 76: NLP-Tools broadens its capabilities with a RESTful API service
18-Apr-2013
In the software tool development business, the API is the new language of the developers, i.e., the customers. In this regard, nlpTools keeps pace with the evolution of the industry and introduces its RESTful API service to facilitate its integration. In that quest for added value and kaizen, it partners with Mashape to handle the commercialisation issues. The original website still maintains the evaluation service, but further performance features now need to be routed through the Mashape nlpTools endpoint.

In the dark jungle of validated learning through product iteration, nlpTools relies on the five keys that make a great API:
- Provide a valuable service: the number of emails asking for the API indicates that there is at least a fair amount of demand, so it is sensible to think that this is an added-value service product.
- Have a plan and a business model: this is a tool for the software development market offering a paid service to scale up to customer needs.
- Make it simple and flexible: a domain-specific service is represented by a single identity parameter, which can be tuned to customer needs just by offering a differentiated service.
- It should be managed and measured: Mashape keeps track of these measurement aspects and provides the necessary information to make informed decisions.
- Provide great developer support: we are all working to deliver a wonderful experience to our customers, and considering that the service is still in its first stages, we pay a lot of attention to personalised developer support.
APIs may nonetheless have some caveats that could threaten the success of a project built around them, but most of them boil down to not having a paid option entailing a high quality of service. However, nlpTools does consider this commercial option and may thus scale up to the needs of the developers by contracting more powerful hosting features. The added value of the service (which is also its core business) lies in its customisation, that is, its ability to adapt to the particularities of the developer's problem, such as fitting the specific salient characteristics that represent their data.
Post 75: Foraging ants as living particle filters
24-Feb-2013
Ant colonies are admirable examples of cooperative societies. Some of their members are prepared to build their complex lairs, some others constitute an army to protect their population, some others explore the outer world and gather food, etc. With respect to the latter function, which to me is the most representative of ant colony behaviour, I coded a simple simulation in JavaScript inspired by the js1k competition (demo and code available here).

The ants in the app have been implemented following a state machine. Initially, they forage for food, drawing a random walk while they operate in this searching state. Once they find a source of food, the ants transition to another state where they return home, leaving a pheromone trail behind for others to follow. Finally, they end up in a loop going back and forth collecting more food. As time goes by, more and more ants flock to the food-fetching loop. Therefore, they get the job done more rapidly and minimise the danger of an outside menace.
In a sense, foraging ants remind me of a particle filter where the particles are living beings moving stochastically to reach some objective. Thus, their behaviour could be cast as a biologically-inspired search algorithm for an optimisation procedure, considering that the objective is a cost function to be minimised.
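As a rough illustration of the state machine described above, here is a minimal Python sketch (it is not the original JavaScript demo). The grid size, the `HOME` and `FOOD` positions, and the rule that an ant standing on a pheromone cell heads straight for the food are all simplifying assumptions made for this example.

```python
import random
from enum import Enum, auto


class State(Enum):
    SEARCHING = auto()   # looking for food (random walk or trail following)
    RETURNING = auto()   # carrying food home, laying pheromone


# Hypothetical world layout, chosen only for illustration.
SIZE = 10
HOME = (0, 0)
FOOD = (6, 4)


def step_towards(pos, target):
    """Move one cell (possibly diagonally) towards the target."""
    x, y = pos
    tx, ty = target
    return (x + (tx > x) - (tx < x), y + (ty > y) - (ty < y))


def random_step(pos, rng):
    """Move one cell in a random direction, clamped to the grid."""
    x, y = pos
    dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
    return (min(max(x + dx, 0), SIZE - 1), min(max(y + dy, 0), SIZE - 1))


class Ant:
    def __init__(self):
        self.pos = HOME
        self.state = State.SEARCHING
        self.knows_food = False
        self.delivered = 0

    def update(self, pheromone, rng):
        if self.state is State.SEARCHING:
            # Assumption: touching a trail is enough to learn where the food is.
            if self.knows_food or pheromone.get(self.pos, 0) > 0:
                self.knows_food = True
                self.pos = step_towards(self.pos, FOOD)
            else:
                self.pos = random_step(self.pos, rng)
            if self.pos == FOOD:
                self.knows_food = True
                self.state = State.RETURNING
        else:
            # Lay pheromone on the current cell, then head home.
            pheromone[self.pos] = pheromone.get(self.pos, 0) + 1
            self.pos = step_towards(self.pos, HOME)
            if self.pos == HOME:
                self.delivered += 1
                self.state = State.SEARCHING


def simulate(n_ants=20, n_steps=2000, seed=0):
    rng = random.Random(seed)
    pheromone = {}
    ants = [Ant() for _ in range(n_ants)]
    for _ in range(n_steps):
        for ant in ants:
            ant.update(pheromone, rng)
    return ants, pheromone
```

Calling `simulate()` and summing `ant.delivered` over the returned ants gives the amount of food collected. Counting the ants with `knows_food` set shows how many have joined the food-fetching loop.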
Post 74: The abridged build-measure-learn loop: innovate and seek excellence
12-Feb-2013
The principal objective of a tech startup (Research and Development also fit the shoes of tech entrepreneurship without loss of generality) is to learn how to build and run a sustainable business where value is created when a new technology invention is matched to customer need. Therefore, validated learning is a fundamental issue in this uncertain quest for success.
Ideas are born of initial leaps of faith. But then they need to be conceptualised, sketched, implemented and submitted for testing through a minimum viable product. By making use of innovation accounting and actionable metrics, the results have to be evaluated and the decision must be made whether to pivot or persevere. It is true that simulation is useful to understand the impact of uncertainty on the distribution of expected outcomes, but the real world is much harder to debug than a piece of code and there is always the need to iterate a business idea with real people (i.e., prospective customers) in order to discover their actual needs.
Similarly, in innovation management, it is said that the innovation that moves along the technology and market curves is incremental (persevere), in contrast to the innovation that is disruptive, which introduces a discontinuity and shifts to new curves (pivot). A pivot is a special kind of structured change designed to test a new fundamental hypothesis about the product, business model and engine of growth. It is the heart of the Lean Startup method (in fact, the runway of a startup is the number of pivots that it can still make), which makes a company resilient in the face of failures (which are not mistakes; that is a different issue). However, there is (at least) one peril/caveat with respect to the Lean Startup method (what's left out of the pivoting topic): if you do have true expertise in a particular field, you are then likely to succeed and end up doing something of value for the customers you discovered, but this is no guarantee that it will be a rewarding experience for you. In that situation, you cannot do great work (unless you have a very wide band or changing taste, your work preferences will prevent you from doing a great job). We know from Steve Jobs that "the only way to do great work is to love what you do", so one might still need to pivot in that situation, too. Joel Spolsky also proclaims this message in his "careers badge": Love your job. Or else, pivot, and have a read of Cal Newport's book So Good They Can't Ignore You, which argues that what you do for a living is much less important than how you do it, focusing on the hard work that is required to become excellent at something valuable instead of pivoting repeatedly until all variables fit your taste.

In a recent podcast, though, Cal emphasises the importance of craftsmanship, which is somewhat contradictory because craftsmanship is rather associated with passion. Anyhow, it's a sensible link and it's always reasonable to bear in mind the reverse side of an argument.
Post 73: A New Year's resolution: get over specialisation and embrace generalisation to face real world industry problems
01-Jan-2013
Regularisation is a recurrent issue in Machine Learning (and so it is in this blog, see this post). Prof. Hinton also borrowed the concept in his neural networked view of the world, and used a shocking term like "unlearning" to refer to it. Interesting as it sounds, to achieve a greater effectiveness, one must not learn the idiosyncrasies of the data, one must remain a little ignorant in order to discover the true behaviour of the data. In this post, I revisit typical weight penalties like Tikhonov (L-2 norm), Lasso (L-1 norm) and Student-t (sum of logs of squared weights), which function as model regularisers:
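The three penalties can be written down in a few lines of Python. This is only a sketch: the function names and the `strength` parameter are hypothetical, and the Student-t penalty is taken here as the sum of log(1 + w²), which is an assumption about the exact form meant by "sum of logs of squared weights" (the literal log(w²) would diverge at zero).

```python
import math


def tikhonov_penalty(weights):
    """L-2 norm penalty: sum of squared weights."""
    return sum(w * w for w in weights)


def lasso_penalty(weights):
    """L-1 norm penalty: sum of absolute weights."""
    return sum(abs(w) for w in weights)


def student_t_penalty(weights):
    """Assumed Student-t form: sum of log(1 + w^2)."""
    return sum(math.log(1.0 + w * w) for w in weights)


def regularised_cost(data_loss, weights, penalty, strength):
    """Data-fit loss plus a weighted penalty on the model weights."""
    return data_loss + strength * penalty(weights)
```

For instance, `regularised_cost(loss, w, lasso_penalty, 0.1)` adds a tenth of the L-1 norm of `w` to a previously computed `loss`. Any optimiser that only needs function values, such as Nelder-Mead, can then minimise this cost directly.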

And their representation in the feature space is shown as follows (the code is available here; this time I used the Nelder-Mead Simplex algorithm to fit the linear discriminant functions):

As expected, the regularised models generalise better because they approach the optimal solution, although the differences are small for the problem at hand. Further regularisation proposals could still be suggested using model ensembles through bagging, dropout, etc., but are they indeed necessary? Does one really need to bother learning them? The obtained results are more or less the same, anyway. What is more, not every situation may come down to optimising a model with a fancy smoothing method. For example, you can refer to a discussion about product improvement in Eric Ries' "The Lean Startup" book (page 126, Optimisation Versus Learning), where optimising under great uncertainty can lead to a totally useless product in addition to a big waste of time and effort (as the true objective function, i.e., the success indicator the product needs to become great, is unknown). And still further, not in the startup scenario but in a more established industry like rail transport, David Briginshaw (Editor-in-Chief of the International Railway Journal, October 2012) wrote:
"Specialisation leads to people becoming blinkered with a very narrow view of their small field of activity, which is bad for their career development, (...), and can hamper their ability to make good judgements."
So, a lack of generalisation (as happens with overfitted models) leads to a useless skewed vision of the world. Abraham Maslow already put it in different words: if you only have a hammer, you tend to see every problem as a nail. This reflection inevitably brings onto the scene the people who are at the crest of specialisation: the PhDs. Is there any place for them outside the fancy world of academia where they usually dwell and solve imaginary problems? Are they ready to face the real tangible problems (which are not only technical) commonly found in the industry? The world is harder to debug than any snippet of fancy code. Daniel Lemire discussed these aspects at length and stated that training more PhDs in some targeted areas might fail to improve research output in these areas. Instead, creating new research jobs would be a preferable choice, as it is usually the case that academic papers do not suit many engineering needs and those fancy (reportedly enhanced) methods are thus never adopted by the industry. His articles are worth a read. Research is indeed necessary to solve real world problems, but it must be led by added-value objectives, lest it be of no use at all. Free happy-go-lucky research should not be a choice nowadays (has anyone heard of the financial abyss in academia?).
Post 72: Sure, you can do that... and still get an IEEE published article
24-Dec-2012
This year has been rather prolific with respect to the number of research publications attained. The most noteworthy is the one in the IEEE Transactions on Audio, Speech and Language Processing (TASLP), which is entitled "Sentence-based Sentiment Analysis for Expressive Text-to-Speech". Its abstract reads as follows:
"Current research to improve state of the art Text-To-Speech (TTS) synthesis studies both the processing of input text and the ability to render natural expressive speech. Focusing on the former as a front-end task in the production of synthetic speech, this article investigates the proper adaptation of a Sentiment Analysis procedure (positive/neutral/negative) that can then be used as an input feature for expressive speech synthesis. To this end, we evaluate different combinations of textual features and classifiers to determine the most appropriate adaptation procedure. The effectiveness of this scheme for Sentiment Analysis is evaluated using the Semeval 2007 dataset and a Twitter corpus, for their affective nature and their granularity at the sentence level, which is appropriate for an expressive TTS scenario. The experiments conducted validate the proposed procedure with respect to the state of the art for Sentiment Analysis."
In addition, three other publications at the SEPLN 2012 Conference (see Publications) have allowed focusing on specific aspects as subsets of a greater whole (i.e., the IEEE TASLP article). This has been hard work, indeed. And I'm proud of it. Nonetheless, I cannot help being objective about it and admit that this line of research falls into the "data porn" category (check out the "publication Markov Chain" that is being mocked there). In any case, the addressed problem is a real one and alternative sources of knowledge have been considered to solve it, so this is an altogether good lesson learnt.
By the way, Merry Xmas!
Post 71: Perceptron learning with the overused Least Squares method
02-Nov-2012
Following Geoffrey Hinton's lectures on Neural Networks for Machine Learning, this post overviews the Perceptron, a single-layer artificial neural network that provides a lot of learning power, especially by tuning the strategy that is used for training the weights (note that Support Vector Machines are Perceptrons in the end). To keep things simple, 1) no regularisation issues will be covered here, and 2) the weight optimisation criterion will be the minimisation of the squared error cost function, which can be happily overused. In another post, the similarity between using the least squares method and the cross-entropy cost through the negative log-likelihood function (as it is reviewed in class) assuming a Gaussian error was already discussed. So using one or the other won't yield much effectiveness improvement for a classic toy dataset sampled from two Gaussian distributions.
Therefore, the ability of the perceptron to excel in classification tasks effectively relies on its activation function. In the lectures, the following functions are reviewed: binary, linear, logit and softmax. All of them provide their own singular learning capability, but the nature of the data for the problem at hand is always a determining factor to consider. The binary activation function is mainly used for describing the Perceptron rule, which updates the weights according to the steepest descent. Although this method is usually presented as an isolated golden rule, not linked with the gradient, the math is clearer than the wording:
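The original figure is not reproduced here. As a stand-in, the standard derivation can be sketched as follows, with notation assumed for this example: inputs $\mathbf{x}_n$, targets $t_n$, outputs $y_n$, weights $\mathbf{w}$ and learning rate $\eta$.

```latex
% Squared error cost over the training set
E(\mathbf{w}) = \frac{1}{2} \sum_n (t_n - y_n)^2

% Linear activation: y_n = \mathbf{w}^\top \mathbf{x}_n
\frac{\partial E}{\partial \mathbf{w}} = -\sum_n (t_n - y_n)\, \mathbf{x}_n

% Logit activation: y_n = \sigma(\mathbf{w}^\top \mathbf{x}_n), \quad \sigma(z) = \frac{1}{1 + e^{-z}}
\frac{\partial E}{\partial \mathbf{w}} = -\sum_n (t_n - y_n)\, y_n (1 - y_n)\, \mathbf{x}_n

% Steepest descent update
\mathbf{w} \leftarrow \mathbf{w} - \eta\, \frac{\partial E}{\partial \mathbf{w}}
```

Applied one sample at a time, the linear case gives $\mathbf{w} \leftarrow \mathbf{w} + \eta\,(t_n - y_n)\,\mathbf{x}_n$. This has the same form as the Perceptron rule, where $y_n$ is the binary output.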

The gradient for the logit is appended in the figure above to see how a different activation function (and thus a different cost function to minimise) provides an equivalent discriminant function (note that the softmax is a generalisation of the logit to multiple categories, so it makes little sense here):

As can be observed in the plot, the form of the activation indeed shapes the decision function under the same cost criterion (not of much use here, though). In certain situations, this can make the difference between a good model and an astounding one. Note that different optimisation functions require different learning rates to reach convergence (you may check the code here). And this process can be further studied with many different activation functions (have a look at the variety of sigmoids that is available) as long as the cost function is well behaved (i.e., it is a convex function). Just for the record, the Perceptron as we know it is attributed to Rosenblatt, but similar discussions can be found with respect to the Adaline model, by Widrow and Hoff. Don't let fancy scientific digressions disguise such a useful machine learning model!
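To make the squared-error training discussed in this post concrete, here is a minimal Python sketch (not the code linked above). It trains a single-layer model by batch steepest descent with either a linear or a logit activation. The toy dataset, the function names and the learning rates are assumptions chosen for illustration only.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_toy_data(n_per_class=50, seed=0):
    """Two Gaussian blobs in 2D; a constant 1.0 is appended as the bias input."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_per_class):
        data.append(([rng.gauss(-2.0, 1.0), rng.gauss(-2.0, 1.0), 1.0], 0.0))
        data.append(([rng.gauss(2.0, 1.0), rng.gauss(2.0, 1.0), 1.0], 1.0))
    return data


def predict(weights, x, activation):
    z = sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z) if activation == "logit" else z


def train(data, activation, rate, epochs=500):
    """Batch steepest descent on the mean squared error."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        gradient = [0.0, 0.0, 0.0]
        for x, target in data:
            y = predict(weights, x, activation)
            # Derivative of the activation with respect to its input.
            slope = y * (1.0 - y) if activation == "logit" else 1.0
            for i, xi in enumerate(x):
                gradient[i] -= (target - y) * slope * xi
        weights = [w - rate * g / len(data) for w, g in zip(weights, gradient)]
    return weights


def accuracy(weights, data, activation):
    """Fraction of samples on the correct side of the 0.5 output threshold."""
    hits = sum((predict(weights, x, activation) > 0.5) == (target > 0.5)
               for x, target in data)
    return hits / len(data)
```

A possible usage is `data = make_toy_data()` followed by `train(data, "linear", 0.05)` and `train(data, "logit", 0.5)`. The two activations are given different learning rates here, in line with the remark above that different optimisation functions need different rates to converge.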
Post 70: Least Squares regression with outliers is tricky
23-Jul-2012
If reams of disorganised data are all you can see around you, a Least Squares regression may be a sensible tool to make some sense out of them (or at least to approximate them within a reasonable interval, making the analysis problem more tractable). Fitting functions to data is a pervasive issue in many aspects of data engineering. But since the devil is in the details, different objective criteria may cause the optimisation results to diverge considerably (especially if outliers are present), misleading the interpretation of the study, so this aspect cannot be treated carelessly.
For the sake of simplicity, linear regression is considered in this post. In the following lines, Ordinary Least Squares (OLS, aka Linear Least Squares), Total Least Squares (TLS) and Iteratively Reweighted Least Squares (IRWLS) are discussed to accurately regress some points following a linear function, but with an outlying nuisance, to evaluate the ability of each method to succeed against such a noisy instance (this is fairly usual in a real-world setting).
OLS is the most common and naive method to regress data. It is based on the minimisation of a squared distance objective function, which is the vertical residual between the measured values and their corresponding current predicted values. In some problems, though, instead of having measurement errors along one particular axis, the measured points have uncertainty in all directions, which is known as the errors-in-variables model. In this case, using TLS with mean subtraction (beware of heteroskedastic settings, which seem quite likely to appear with outliers; otherwise the process is not statistically optimal) could be a better choice because it minimises the sum of orthogonal squared distances to the regression line. Finally, IRWLS with a bisquare weighting function is regarded as a robust regression method to mitigate the influence of outliers, linking with M-estimation in robust statistics. The results are shown as follows:

According to the results shown, OLS and TLS (with mean subtraction) display a similar behaviour despite their differing optimisation criteria, which is slightly affected by the outlier (TLS is more affected than OLS). Instead, IRWLS with a bisquare weighting function maintains the overall spirit of the data distribution and pays little attention to the skewed information provided by the outlier. So, next time reliable regression results are needed, the bare bones of the regression method in use must be taken into consideration.
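The three estimators can be sketched in plain Python for the straight-line case (this is not the Matlab code mentioned in this post). The function names, the tuning constant 4.685, the scale estimate based on the median absolute residual and the number of iterations are assumptions, chosen as common defaults for bisquare weighting.

```python
import math
import statistics


def ols_line(xs, ys, weights=None):
    """Least squares on vertical residuals; unit weights give plain OLS."""
    if weights is None:
        weights = [1.0] * len(xs)
    total = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, xs)) / total
    y_bar = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar)
              for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def tls_line(xs, ys):
    """Total least squares with mean subtraction (orthogonal distances)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Closed form for a 2D line; assumes sxy is not zero.
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
    return slope, y_bar - slope * x_bar


def irwls_line(xs, ys, tuning=4.685, iterations=50):
    """Iteratively reweighted least squares with bisquare weights."""
    slope, intercept = ols_line(xs, ys)
    for _ in range(iterations):
        residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
        # Robust scale estimate from the median absolute residual.
        scale = statistics.median(abs(r) for r in residuals) / 0.6745
        if scale < 1e-12:
            break
        weights = []
        for r in residuals:
            u = r / (tuning * scale)
            weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
        slope, intercept = ols_line(xs, ys, weights)
    return slope, intercept
```

To try it, one can generate points along a known line, displace a single point far from it, and compare the slopes returned by `ols_line`, `tls_line` and `irwls_line` against the true one.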
Note: I used Matlab for this experiment (the code is available here). I still do support the use of open-source tools for educational purposes, as it is a most enriching experience to discover (and master) the traits and flaws of OSS and proprietary numerical computing platforms, but for once, I followed Joel Spolsky's 9th principle to better code: use the best tools money can buy.
Post 69: On using Hacker News to validate a product idea involving NLP and PHP
12-Jul-2012
The first step to creating a valuable product is to discover what exactly is wanted or needed by the target customers. The Lean Startup process states it straight, and the Pragmatic Programmer even provides a means to find it out by asking Hacker News (HN). HN is a vibrant community of tech people, hackers in the broadest sense... and entrepreneurs (these concepts need not be disjoint), which can provide a lot of insight into the value of a product idea.
Now, my product idea: a general-purpose Natural Language Processing (NLP) toolkit coded in PHP. This is certainly a long wanted product (note that the two links date back to 2008), and for a sensible reason: the Internet is bloated with textual content, so let's develop an NLP tool that is focused on processing text on the web. In this sense, the PHP programming language, i.e., by definition, the Hypertext Preprocessor, should be a practical choice with which to do it. Moreover, PHP is the default platform that is available on a web server. So all the elements seem to be in the right place. The problem seems to be addressed logically this way, but it still needs positive feedback from the end users (the developers) to succeed. Note that none of the currently available NLP toolkits reported in the Wikipedia list has been developed in PHP, so there must be a niche here, or is there something wrong going on? Why is it so? Perhaps the product was not interesting a few years ago, maybe it did not catch on because of marketing issues, or using the many bindings and wrappers available was just enough in contrast to putting the effort into doing it all again from scratch... Therefore, the question naturally arises: is it really interesting to the community? If so, to what extent? Is it worth the bother? Will this be a profitable project? Would it be nuts to rely solely on Ian Barber's opinion?
These questions require some scientific experimentation, so I built a prototype (mainly based on text classification, which has 24 GitHub watchers at the time of writing; thanks for your interest, indeed) and submitted it to HN. What I found out was contrary to what I expected: the general interest in this kind of product is essentially nonexistent, just in line with what had already happened with the previous approaches. I failed. OK. At least I now know for myself it's nonsense to invest in this product. I'd better do something else. Fine. Let's keep engineering. The upside is that I practised some PHP (my skills with this language were getting a little rusty) and (more importantly) I learnt that businesses need solutions, not tools to develop solutions (this conclusion is derived directly from the only -ironic- comment that appears on HN, which was motivated by the demo app that I provided where I trained the classifier with a popular research dataset only as a proof of concept). That's awesome! If I had dismissed the so-valuable Lean Startup directive, assuming that the world was just how I saw it, I would have "wasted" (please note the quotation marks) a whole lot of time developing something nobody would pay for (I'm being rather like Edison here, I know). This is an undoubtedly good "lesson learned". Needless to say, though, if I ever get to obtain economic support for its development, I will gladly resume the coding phase!
Post 68: The Passionate Programmer in the late-2000s recession
03-Jun-2012
The present receding economy displays a scenario that is wildly unknown, and this inevitably affects the attitude that we take with respect to our careers, reminding us all of the crucial importance of always heading to where the magic happens. In addition to the renowned advice to not settle, what's utterly of value is to stay hungry in this continuously changing world.
In this regard, the Passionate (and Pragmatic) Programmer provides some insight that is worth noting. In this post, I review some of its guidelines to "create a remarkable career in software development", and the many connections with the present situation arise naturally (the book was published three years ago):
- Pursue the bleeding edge of technology (the Next Big Thing), out of the comfort zone.
- Seek salient features as a professional (this was related to the author's stay in India for recruitment issues, which reminded me of the Aspiring Minds Machine Learning Competition).
- Choose your crowd wisely. The people around you affect your own performance. Make the hang with the greats. Be on the shoulders of giants, there are many ways to paraphrase this.
- Practise at your limits to improve. If you always do what you've always done, you will always get what you've always got.
- Work with a mission. Attain daily accomplishments (consider the pomodoro technique).
- You are what you can explain, so don't make a fool of yourself with (unnecessary) rigid values (avoid monkey traps).
- Be intentional about your choice of career path and how to invest in your professional self. Career choices should be sought after and decided upon with intention. Each choice should be part of a greater whole (connecting the dots...).

And last but not least: you can't creatively help a business until you know how it works. In this regard, the next book in my reading list is The Lean Startup.
