Archive for July, 2014



R is a leading platform for data analysis, predictive analytics and machine learning.


By Joshua Weiner and Ali Syed

The path to sustainable and meaningful advantage lies in finding new ways to manage data, discover what is in it, find patterns and make predictions, and decide what to do with the results. Fuelled by the data deluge, predictive models and machine learning programs are being used to improve everything around us, from the way we shop to the web experiences we enjoy and the way we receive social and health care. The public sector, telecom, sports, healthcare, retail, and agriculture are just a few of the industries where big data and predictive analytics are changing the way we work and live.

The proliferation of smart devices, cloud computing and mobile applications, both in our personal lives and in the workplace, gives us the ability to know more about markets, customers, processes, behaviours and practices than ever before. The exciting result is that we can learn from past and present events to predict future outcomes.

We envision a future with personalized medicine: drugs that are specifically tailored to treat us based on our unique attributes and our medical and family history. A future where roommates are selected not through Craigslist or a four-question survey from the Undergraduate Housing Department, but through a tailored questionnaire that uses years of historical data to minimize the chance of conflict and increase overall satisfaction. A world that has a really smart car, one that adjusts the suspension and gears for my nephew, a 16-year-old with a learner's permit who seems to enjoy alternating between slamming the brakes and the gas pedal. Or maybe even a movie studio that can more accurately predict how a movie will perform, one that can identify commercials suited to the target audience and avoid $300 million losses on mega-flops like Disney’s “John Carter”. This world is being enabled as we speak through predictive analytics and open source platforms and programming environments like R and Hadoop.

One of the exciting changes in statistical analysis and predictive analytics during the last decade has been the growth of open source platforms and languages for analysing and predicting from data. R is one such programming language. Worldwide, millions of analysts, researchers, professionals and data scientists use R for data analysis, predictive modelling, machine learning and graphical analysis. R was created in 1993 by Ross Ihaka and Robert Gentleman of the University of Auckland, New Zealand. It is a GNU project and is similar to the S language, which was developed at Bell Laboratories by John Chambers and colleagues.

R provides a wide variety of statistical methods, machine learning methods and predictive modelling techniques, and is highly extensible. You can easily download and use more than 4,000 methods in statistics, predictive modelling, and machine learning free of charge. R has the enterprise capabilities needed to drive adoption across the organization and to help business and technology professionals make data-driven decisions.

When it comes to statistical modelling and predictive analytics, there are three clear leaders in the software space: SAS, SPSS, and R. But which of the three makes sense to learn first? Which one has the most staying power? Which will offer the most utility?

The answer is quite simple. In terms of flexibility, price, popularity and graphical capabilities, R has distinct advantages that give it a significant competitive edge.


Being able to perform a variety of functions is key for a statistical analysis program. R has thousands of add-on packages that provide deeper, customized functionality and execute complicated processes with a single command. These shortcuts are incredibly useful, especially if you are not familiar with the mathematics behind every algorithm. You can use methods like gradient boosting machines and random forests in one line of code, which means you do not need to know how to build a model from scratch in order to run it.
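As a rough illustration of the "one line of code" point, here is a minimal sketch in R. It assumes the randomForest package from CRAN is installed, and it uses R's built-in iris data set purely as a stand-in for your own data; the seed and the number of trees are arbitrary choices for the example, not recommendations.

```r
# Assumption: the randomForest package is installed from CRAN, e.g. via
# install.packages("randomForest")
library(randomForest)

set.seed(42)  # make the random sampling reproducible

# The "one line": fit a random forest that predicts Species
# from all other columns of the built-in iris data set
model <- randomForest(Species ~ ., data = iris, ntree = 500)

print(model)                          # out-of-bag error estimate and confusion matrix
head(predict(model, newdata = iris))  # predicted species for the first few rows
```

The formula interface (Species ~ .) is what keeps the call so short: the package handles the sampling, tree building and voting behind the scenes.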


How does R have so many packages? It’s Open Source! That means that the product is constantly evolving as dedicated users add new features. SAS and SPSS release new versions on an annual basis, but chances are, if you are looking to add a function to R, another data scientist has already created an add-on that you can easily install and take advantage of. If SAS and SPSS don’t already incorporate a feature, then you are most likely stuck waiting for next year’s release or a patch.

Unlike its competitors, which will cost you $5,000 to $10,000 per license, R is free to use and offers unlimited access to the latest and greatest packages that the dedicated community creates. Can’t argue with that, right?


With such a dynamic and affordable product, it’s no surprise that up to 70% of analytics professionals use it at least occasionally. Starting at the academic level, students have easy access to the platform and can quickly adapt it for their research needs or even for professional projects later on. As with Linux, there is an incredibly diverse and supportive community that offers tutorials and troubleshooting to help you take full advantage of it. When you interact with other analytics professionals, chances are they will also be familiar with it, making your collaboration a breeze.


Visualization is a critical component for data scientists. If you can’t find a way to communicate your findings clearly, then it’s going to be extremely difficult to move forward with your work. Luckily, R has unparalleled graphics capabilities thanks to packages like ggplot2, rCharts and googleVis.


The value of expertise in R is incredible, and will only continue to grow as our world becomes more digital and data driven. School of Data Science has designed a one-day, hands-on, practical workshop for beginners to learn the core skills and concepts required for visualizing, transforming and analysing data in R. It is a great opportunity for data analysts, business analysts, technology and business consultants and all mortals interested in learning the basics of R for effective data analysis and predictive analytics.

The workshop is designed for people who are just starting with R as well as for data analysts who are switching to R from other statistical software, such as SAS or SPSS. Read and download the workshop brochure.

Please take a moment to register now and take advantage of the special discounts on offer, including discounts for civil servants, charities and not-for-profit organizations. We look forward to having you join us for this unique learning experience. If you have any questions about the workshop or registration, please email us or give us a call on +44 (0) 2032 39 3141.




The real innovation in big data is human innovation.


By Ali Syed

The digital world is continuously churning out vast amounts of data, and the volume is growing ever more rapidly. Some analysts say that we are producing more than 200 exabytes of data each year. We have heard many times that, managed well, this (big) data can be used to unlock new sources of economic value, provide fresh insights into science, hold governments to account, spot business trends, prevent diseases, combat crime and so on.

Over the past decade (the noughties), we have witnessed the benefits of data, from personalized movie recommendations to smarter drug discovery – the list goes on and on. Joe Hellerstein, a computer scientist at the University of California, Berkeley, called it “the industrial revolution of data”. The effects are being felt everywhere, from business to science, from government to society.

“You are thus right to note that one of the impetuses is that social as well as cultural, economic and political consequences are not being attended to as the focus is primarily on analytic and storage issues.” – Evelyn Ruppert, Editor, Big Data and Society

At the same time, this data deluge is having deep social, political and economic consequences. What we are seeing is the ability to build economies around data, and that to me is the big change at a societal and even macroeconomic level. Data has become the new raw material: an economic input almost on a par with capital and labour.

Organizations need data from multiple systems to make decisions, and they need it in a consistent, easy-to-understand format so they can grasp it and react quickly. They are now trying to capture every click because storage is cheap. The customer base is harder to define and is constantly changing. While all this is happening, they are expected to answer questions quickly. Everyone is saying that “reports” no longer satisfy the need.

The global economy has entered an age of volatility and uncertainty: a faster-paced economic environment that shifts gears suddenly and unexpectedly. Product life cycles and time to market are both shorter. We live in an instant-gratification society that expects quick answers and more flexibility than ever. Consequently, the world of business is always in the midst of a shift, required to deal with changing economic and social realities.

The combination of the complexities of a volatile digital world, the data deluge, and the pressing need to stay competitive and relevant has sharpened the focus on using data science within organisations. At organisations in every industry, in every part of the world, business leaders wonder whether they are getting true value from the massive amounts of data they already have within and outside their organisations. New technologies, sensors and devices are collecting more data than ever before, yet many organisations are still looking for better ways to obtain value from their data.

The strategic ability to analyse, predict and generate meaningful and valuable insights from data is becoming the topmost priority of information leaders, a.k.a. CIOs. Organisations need to know what is happening now, what is likely to happen next and what actions should be taken to get the optimal results. Behind rising expectations for deeper insights and performance is a flood of data that has created an entirely new set of assets just waiting to be applied. Businesses want deeper insights into the choices, buying behaviours and patterns of their customers. They want an up-to-date understanding of their operations, processes, functions and controls, and they seek information about the financial health of their entire value chain, as well as the socio-economic and environmental consequences of both near-term and distant events.

“Every day I wake up and ask, ‘how can I flow data better, manage data better, analyse data better?’” – Rollin Ford, CIO of Wal-Mart


Although business leaders have realized there is value in data, getting to that value has remained a big challenge in most businesses. Friends in industry have cited many challenges, and none can be discounted or minimized: executive sponsorship of data science projects, combining disparate data sets, data quality and access, governance, analytic talent and culture all matter and need to be addressed in time. In my discussions with business executives, I have repeatedly heard that aligning data science initiatives to a specific organisational challenge makes it easier to overcome a wide range of obstacles.

Data promises so much to organisations that embrace it as an essential element of their strategy. Above all, it gives them the insights they need to make faster, smarter and more relevant decisions in a connected world where understanding and acting in time means survival. To derive value from data, organizations need an integrated insight ecosystem of people, process, technology and governance to capture and organize a wide variety of data types from different sources, and to be able to easily analyse it within the context of all the data.

We are all convinced that data, as the fabric of the digital age, underpins everything we do. It is part and parcel of our digital existence, and there is no escape from it. What is required is that we focus on converting big data into useful data. We now have the tools and capabilities to ask questions, challenge the status quo and deliver meaningful value using data. In my opinion, organizations and business leaders should focus more on how to minimise the growing divide between those who realise the potential of data and those with the skills to process, analyse and predict from it. It’s not about data, it’s about people. The real innovation in big data is human innovation.

“The truth is, that we need more, not less, data interpretation to deal with the onslaught of information that constitutes big data. The bottleneck in making sense of the world’s most intractable problems is not a lack of data, it is our inability to analyse and interpret it all.” – Christian Madsbjerg


Whether you are looking to amplify value and impact using data or you want to analyse data scientifically for insights and to ask relevant questions, Persontyle Services is the place to start to ensure you have what you need to strategize, design, implement, and fulfil your analytic needs. We will collaboratively work with you from data to insight and on to impact and value.
