There are two silences. One when no word is spoken. The other when perhaps a torrent of language is being employed.
Substitute “information” for “word” and you get censorship in one case and information overload in the other. Two sides of the same coin?
Posted in Musing | Tagged quote, silence
The WordPress.com stats helper monkeys prepared a 2013 annual report for this blog.
Here’s an excerpt:
The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 20,000 times in 2013. If it were a concert at Sydney Opera House, it would take about 7 sold-out performances for that many people to see it.
Posted in Uncategorized
Because that’s how you get on the bestseller list. You promise the moon and stars, you say everything you heard before was wrong, and you blame everything on one thing. You get a scapegoat; it’s classic. Atkins made a fortune with that formula. We’ve got Rob Lustig saying it’s all fructose; we’ve got T. Colin Campbell saying it’s all animal food; we now have Perlmutter saying it’s all grain. There’s either a scapegoat or a silver bullet in almost every bestselling diet book.
The recurring formula is apparent: Tell readers it’s not their fault. Blame an agency; typically the pharmaceutical industry or U.S. government, but also possibly the medical establishment. Alluding to the conspiracy vaguely will suffice. Offer a simple solution. Cite science and mainstream research when applicable; demonize it when it is not.
Is complexity too scary to be the subject matter of a bestseller?
via This Is Your Brain on Gluten – James Hamblin – The Atlantic.
Posted in Musing | Tagged bestseller, gluten, scapegoat, silver bullet
LinkedIn, for example, has almost no batch data collection at all. The majority of our data is either activity data or database changes, both of which occur continuously. In fact, when you think about any business, the underlying mechanics are almost always a continuous process—events happen in real-time, as Jack Bauer would tell us. When data is collected in batches, it is almost always due to some manual step or lack of digitization or is a historical relic left over from the automation of some non-digital process. Transmitting and reacting to data used to be very slow when the mechanics were mail and humans did the processing. A first pass at automation always retains the form of the original process, so this often lingers for a long time.
Production “batch” processing jobs that run daily are often effectively mimicking a kind of continuous computation with a window size of one day. The underlying data is, of course, always changing. These were actually so common at LinkedIn (and the mechanics of making them work in Hadoop so tricky) that we implemented a whole framework for managing incremental Hadoop workflows.
Seen in this light, it is easy to have a different view of stream processing: it is just processing which includes a notion of time in the underlying data being processed and does not require a static snapshot of the data so it can produce output at a user-controlled frequency instead of waiting for the “end” of the data set to be reached. In this sense, stream processing is a generalization of batch processing, and, given the prevalence of real-time data, a very important generalization.
(emphasis added by me)
via The Log: What every software engineer should know about real-time data’s unifying abstraction | LinkedIn Engineering.
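To make the "window size of one day" idea concrete for myself, here is a minimal sketch in Python. It is not taken from the article or from any LinkedIn code. The event shape (a timestamp in seconds plus a key), the function names, and the assumption that events arrive in timestamp order are all mine. It counts events per tumbling window and emits a result each time a window closes; the "batch" version is simply the same computation with a single window that covers the whole data set.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: (timestamp in seconds, key).
Event = Tuple[float, str]

ONE_DAY = 86_400  # window size in seconds


def windowed_counts(
    events: Iterable[Event], window_seconds: float
) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Yield (window_index, counts) each time a tumbling window closes.

    Assumes events arrive in non-decreasing timestamp order.
    """
    current_window = None
    counts: Dict[str, int] = defaultdict(int)
    for timestamp, key in events:
        window = int(timestamp // window_seconds)
        if current_window is not None and window != current_window:
            # The previous window is complete: emit it without waiting
            # for the "end" of the data.
            yield current_window, dict(counts)
            counts = defaultdict(int)
        current_window = window
        counts[key] += 1
    if current_window is not None:
        # Flush the last (possibly still open) window.
        yield current_window, dict(counts)


def batch_counts(events: Iterable[Event]) -> Dict[str, int]:
    """Batch as a special case: one window spanning the whole input."""
    totals: Dict[str, int] = defaultdict(int)
    for _, counts in windowed_counts(events, float("inf")):
        for key, n in counts.items():
            totals[key] += n
    return dict(totals)


if __name__ == "__main__":
    sample = [(0, "view"), (10, "click"), (86_399, "view"),
              (86_400, "view"), (90_000, "click")]
    for window, counts in windowed_counts(sample, ONE_DAY):
        print(window, counts)
    print(batch_counts(sample))
```

The only knob that changes between the "daily job" and the "whole data set" run is the window size, which is how I read the claim that stream processing generalizes batch processing.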
Posted in Technology | Tagged LinkedIn, log, stream processing
Ever itched to do machine learning and data mining on streams? On huge, big data streams?
We have a solution for you!
SAMOA (Scalable Advanced Massive Online Analysis) is a platform for mining big data streams. It features a pluggable architecture that allows it to run on several distributed stream processing engines such as Storm and S4. SAMOA includes distributed algorithms for the most common machine learning tasks such as classification and clustering. For a simple analogy, you can think of SAMOA as Mahout for streaming.
SAMOA is currently in alpha and is developed at Yahoo Labs in Barcelona. It is released under the Apache License, Version 2.0.
Thanks to everybody who made this release possible!
Read more on Yahoo Engineering.
Posted in Research, Technology | Tagged big data, data mining, machine learning, S4, SAMOA, Storm, stream processing