Yes, because data science has the potential to revolutionize the way business, government, science, research, and healthcare are carried out. Data science is emerging as a hot new profession and academic discipline. In my opinion, 2015 will see data science becoming a mainstream career choice. Data science has come a long way – but the evolution is only beginning.
The focus of this post is not to convince cynics. In fact, I’m not interested in them at all. I don’t want to waste a second of my time on cynics, wimps, and haters. Their cynicism leads to mediocrity. Umair Haque, one of my favorite thinkers and writers, put it eloquently: “there’s nothing more poisonous to self-belief than people who tell you what you cannot do.”
Today, I want to share a couple of interesting updates that highlight the significance of data literacy and why it is a fundamental skill for all professionals. In a massively connected digital world, it is imperative that the workforce of today and tomorrow is able to understand what data is available and use scientific methods to analyze and interpret it.
Hal Varian, Google’s chief economist, said in his 2009 interview for The McKinsey Quarterly:
“The ability to take data – to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it – that’s going to be a hugely important skill in the next decades, not only at the professional level but even at the educational level for elementary school kids, for high school kids, for college kids. Because now we really do have essentially free and ubiquitous data. So the complementary scarce factor is the ability to understand that data and extract value from it. I think statisticians are part of it, but it’s just a part. You also want to be able to visualize the data, communicate the data, and utilize it effectively. But I do think those skills – of being able to access, understand, and communicate the insights you get from data analysis – are going to be extremely important. Managers need to be able to access and understand the data themselves.” – Hal Varian
Wait a minute. Is this a pitch for Persontyle? I think so…
Persontyle is a social enterprise dedicated to data literacy, born from the idea of creating a platform for the people, by the people, to share the passion, knowledge, theory, and practice of analyzing data scientifically. Most people are focused on making machines smarter; we are focused on making people smarter. Through the educational programs offered by the School of Data Science, you can learn the practical skills required to be a player in this fascinating field, and have a good time doing it. You don’t have to start from the very basics if you already know some of the material: the courses, workshops, and bootcamps cover various levels and are all self-contained. Every learning experience has a practical component, so you get hands-on experience as you go. They are also short, making them easy to fit into your schedule. If you are unable to attend on the scheduled dates, customized solutions let you get the training at a time and place of your convenience, which is particularly useful if you have a whole data science project team that needs to be trained.
Data science is scorching hot right now: in headlines, board rooms, universities, and philanthropy. As the world becomes more aware of the benefits of data science, more and more organizations are looking for ways to harness the value hidden in the big data they have access to. It has become clear that big data is not just a buzzword; it is a whole new world of potential, waiting to be explored.
Develop the most in-demand skill of the 21st century.
We believe data literacy will be the fundamental skill of the 21st century. In 2015, we are launching some exciting new data science and big data engineering learning opportunities, and we are collaborating with our partners to offer data science training programs in the US and the Middle East.
Data science is all about building teams and culture.
It is crazy to think that a doctor must know everything, and it is just as crazy to think a data scientist should be an expert in machine learning, statistics, hacking, programming, application development, production deployment, and so on. Data science is a team sport. Everyone has a role to play, and everyone contributes. Somebody has to bring the data together, somebody has to build the data workflows and do the data engineering work, someone needs to apply the machine learning models, someone needs to be there to challenge the results, and so on. The important thing to note is that everyone involved in the data science project lifecycle should be aware of the limitations of their expertise and knowledge, and be willing to call in help when required.
“People make a mistake by forgetting that Data Science is a team sport. People might point to people like me or (Jeff) Hammerbacher or Hilary (Mason) or Peter Norvig and they say, oh look at these people! It’s false, it’s totally false, there’s not one single data scientist that does it all on their own.” – DJ Patil
As data science evolves into a business necessity, the importance of assembling strong and innovative data teams grows. Bringing together a group of talented people with diverse skills is the best way to meet your data science needs. Engage Persontyle data science and data engineering experts to conduct a free Data Science Talent Strategy Workshop for your organization. Book now!
Beyond data science: Advancing data literacy.
The first piece I want to share is by Leslie Bradshaw. It is an excellent post for understanding the concept of data literacy and why contextualizing, storytelling, and visualizing are equally important tools for better, smarter, faster, and more reliably predictive decision making.
“From public policy to sports to finance to health to economics to businesses to citizens to elected officials to education… so many aspects of our individual lives can and will be made better through including more disciplines in the science of data as it evolves to become a literacy of data.” – Leslie Bradshaw
Read the full post here
The 25 hottest skills that got people hired in 2014.
The second post I’m sharing is LinkedIn’s list of the 25 hottest skills that got people hired in 2014. To determine which skills were most in demand in 2014, LinkedIn data specialists analyzed more than 330 million profiles. Notice that five of the skills mentioned are directly related to data literacy and data science.
Read the full post here
Become data literate in 2015.
The idea that data scientists are as mythical as unicorns is simply false. Only wannabes, protectors of mediocrity, and industrial-age pundits call data scientists unicorns. You should not believe the fantasy that data scientists are mythical, and you should definitely not neglect developing data science capabilities.
I’ve said it before and I’m saying it again: we can all learn and apply data science, and we will make a lot of mistakes along the way. In a complex world, the process of trial and error is essential. We need to promote a culture of experimentation. Understand and embrace the concept of trial and error.
Data Science = Ask questions and challenge the status quo to deliver meaningful value. To me, the essence of data science is to break the rules and challenge the status quo by building new models that make the existing models obsolete. Greatness doesn’t stem from merely counting what can be counted.
“There are two rules I’ve always tried to live by: turn left, if you’re supposed to turn right; go through any door that you’re not supposed to enter. It’s the only way to fight your way through to any kind of authentic feeling in a world beset by fakery.” – Malcolm McLaren
Happy 2015! A new year. A new beginning. Learn something new. Become data literate in 2015. Break the rules. And never give up on your dreams!
All the best!
Guest post by Dr. Mike Ashcroft
If you know all there is to know about Data Science and Machine Learning, you may want to stop reading now.
Oh good, we’re alone.
Despite the hype and confusion surrounding Data Science, the need for people who can interpret data and use it to find patterns and predictions that help organizations make informed business decisions is very real. Data Science is fueling the digital economy; we need to move it to the very center of our business, research, and social change endeavours. Data Science is bringing new levels of speed, relevance, and precision to the way we design and manage businesses and operating models. Machine Learning is without a doubt the core of Data Science and predictive analytics in general. In health care, Machine Learning is changing the way doctors identify people at risk of developing certain diseases; in retail, machine learning is used to analyze purchasing data to anticipate trends; CRM and marketing experts use it to tailor campaigns and offers.
Machine Learning is simple: We have the algorithms, we have the experience and, these days, we have the data. The ‘complicated’ mathematics behind the data revolution is the process of the cumulative application of basic techniques that can be understood in terms of mathematics we learnt in high school and early college. Providing this understanding is the most important facet of Data Science education: Only those who understand the tools they use are able to choose the appropriate technique for the tasks they face. I have never met an organization prepared to trust their data analysis to analysts who cannot explain why they use the techniques they do.
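To make that claim concrete, here is a minimal sketch of gradient descent, the iterative fitting loop underneath many machine learning models. This example is my own illustration, not part of the bootcamp material, and nothing in it goes beyond the high-school calculus idea of following a slope downhill:

```python
# Gradient descent: repeatedly step downhill along the slope.
# Minimizes f(x) = (x - 3)^2, whose derivative is f'(x) = 2 * (x - 3).

def gradient_descent(derivative, start, learning_rate=0.1, steps=100):
    """Follow the negative slope from `start` for a fixed number of steps."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * derivative(x)
    return x

minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(round(minimum, 4))  # converges to 3.0, the minimum of (x - 3)^2
```

The "cumulative application of basic techniques" is exactly this: real models repeat the same step over many parameters at once, but the step itself stays this simple.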
The Fundamentals of Machine Learning bootcamp is designed to give you this understanding. It will provide you with the ability to apply the most powerful techniques in Machine Learning, to select appropriate techniques for particular problems, and to say exactly what these techniques do and why they work in a way that is understandable to data analysis stakeholders.
The Fundamentals of Machine Learning bootcamp will take you through the conceptual and applied foundations of the subject. Topics covered include Machine Learning theory, types of learning, techniques, models, and methods. Labs give you hands-on practice using the R programming language and its packages to apply the main concepts and techniques of Machine Learning. Our goal in this bootcamp is to give you the basic skills you need to understand Machine Learning algorithms and models and interpret their output, which is important for solving a range of data science problems. This is an applied Machine Learning course: we focus on the intuitions and practical know-how needed to get Machine Learning algorithms to work in practice, rather than on the mathematical equations and derivations.
Using actual data, the bootcamp begins by reviewing important basic statistical methods. You will learn to use the popular statistical programming language R to build these simple models from the ground up. You will then see how these simple techniques can be improved, combined, augmented and adjusted to produce powerful statistical tools for different tasks in data analysis. In this way, you will learn to see advanced Machine Learning techniques not as black boxes, but as principled techniques used to unlock patterns from data.
Over the course of five days, over two dozen techniques will be examined, implemented through supervised exercises and tutorials, and compared. You will learn the relative advantages and disadvantages of different types of techniques in different contexts. You will see how some models are entirely data driven, while others can be used to encode defeasible expert knowledge. You will learn methods for validating selected models and techniques and for choosing among alternative methods.
As we proceed, we discuss with examples the sorts of data that suit these different approaches, and you will continue to apply the techniques ‘live’ in R. All topic areas have practical exercises in which you implement the algorithms we are looking at, as well as analyze their outputs and their suitability to particular problems. An essential aim of the course is that you get real ‘hands on’ experience working with the techniques we cover, in the comfortable environment of a classroom where you can discuss and work through the problems you encounter with the instructor (me). The purpose is to arm you with a set of tools that you know how to apply, how to explain, and when to use, as well as their theoretical background.
Fundamentals of Machine Learning bootcamp is for students, researchers and professionals from industry, services, social and public sectors who wish to develop the ability to turn data into meaningful and actionable insights. The greatest care is taken to provide bootcamp participants with high quality instruction that makes the journey of understanding and using advanced data analysis tools as easy and enjoyable as possible. So join us, master the science behind ‘data science’ and equip yourself for a role in the data revolution.
Read and download the bootcamp brochure.
Special Offer – 25% Discount!
Please take a moment to register now and take advantage of the special 25% discount. Visit the event page and use the promo code FMLB100 to get 25% off. We are also encouraging university students (postgraduate and PhD) and researchers to learn machine learning by offering them a special 50% discount. There is also a 40% discount for a limited number of seats, so I encourage you to register as soon as possible.
About the Author
The author is Dr. Mike Ashcroft, Lecturer in Machine Learning and Artificial Intelligence at Uppsala University in Sweden, and founder of the data analytics company Inatas AB. He has worked in the Machine Learning field for over five years, developing cutting-edge software, providing professional and university courses, and performing specialist consulting work. He has extensive experience teaching and working in both Europe and Asia.
By Ali Syed
Throughout the world, organizations and leaders widely acknowledge that digital technologies are transformational. Most of them understand that we are not merely living through a digital revolution. We are living and breathing history that generations to come will look back on as a time that fundamentally changed everything. Forever. Society is changing, technology is changing, business models are changing, the way we interact and connect with each other is changing, and it is no longer about becoming digital or not; it is about sustaining, succeeding, and competing in a digital era.
In this digital era, organizations can no longer operate, sustain themselves, and compete using traditional industrial-age thinking, systems, models, technologies, and operating frameworks. Digital technologies are irrevocably changing the way organizations and institutions engage and interact with people, citizens, consumers, and many other stakeholders. Traditional thinking, operating models, and value delivery channels are being disrupted, driving leaders and executives to reassess their strategies and plans. The combination of the complexities of a volatile digital world, the data deluge, and the pressing need to stay competitive and relevant has sharpened the focus on using digital technologies to transform.
“Digital technologies are assuming an increasingly prominent place in everyday life, both in the more traditional areas and in the field of new information and communication technologies. Digital is the common language for information, whether in the form of text, pictures or video images. Digitisation, ie the conversion of information into a string of 0s and 1s, provides a common denominator for telephone, television, radio, camera, camcorder or computer signals.”- Information society and a digital world. Report by Council of Europe, Committee on Science and Technology
Data, as the fabric of the digital age, underpins everything we do. It is part and parcel of our digital existence. To succeed in a digital world we must master the art and science of managing, leveraging, and applying data. Organizations need to become connected and data driven. To sustain themselves and remain competitive they should know what is happening now, what is likely to happen next, and what actions should be taken to get optimal results. The buzzword associated with this is ‘Big Data’. The trick, however, is to ignore the Big and just focus on the Data.
Data has become the new raw material: an economic input almost on a par with capital and labor. Organizations need data from multiple systems to make decisions. They need information in easy to read and consistent format to enable fast understanding and response. The path to sustainable and meaningful advantage is being able to find new ways of managing data, discovering what’s in it, finding patterns and predictions, and deciding what to do with all that. Fuelled by data deluge, predictive models and machine learning programs are being used to improve everything around us from the way we shop to the web experiences we enjoy, and the way we receive social and health care.
At the core of any digital transformation is the ability to think creatively and differently about digital technologies, based on the right mix of people, architectures, systems, processes, frameworks, and experience with collaboration. This involves creating a mind-set change. One area where this has to happen immediately is how organizations manage, process, and analyze data. Within organizations in every industry, in every part of the world, business and technology leaders are assessing how to get true value from the massive amounts of data they already have inside and outside their organizations. At the same time, new technologies, including sensors and connected devices, are collecting more data than ever before.
One of the exciting changes in predictive analytics and machine learning during the last 3 years has been the growth of predictive APIs, applications and machine learning as a service (MLaaS) for analyzing and predicting from data. This is an emerging domain. Machine learning platforms of various sorts are revolutionizing many areas of business, public and social services, and predictive APIs (PAPIs) have the potential to bring these capabilities to an even wider range of applications.
Unless you’re a nerd or a developer, you’ve probably never paid much attention to the term “API” — an acronym for “Application Programming Interface.” However, if you’re an avid user of social media platforms, you’ve most likely used an application or service built using an API. Twitter, Facebook, LinkedIn, WordPress, WhatsApp, Uber, Amazon, Airbnb, and thousands of other applications rely on APIs. What’s more, without APIs, the Apple App Store and the Android Marketplace would be very small!
APIs are carefully thought-out pieces of code that programmers create so that other applications can interact with their application and platform. APIs matter to all organizations operating in the digital era because with them, organizations can develop platforms, applications, and experiences that help us do our jobs effectively, market products and ideas better, drive revenues, and connect with consumers, customers, and partners. Many companies have realized the opportunities that APIs offer and have launched their own platforms and applications to deliver products and services. The popularity of APIs isn’t limited to social media: APIs are strategic tools to unlock business value. Check out how extensive APIs are by reviewing the API directory at ProgrammableWeb.
Just as conventional APIs make it easy for programmers to create applications, Predictive APIs are making machine learning simple and accessible to everyone. These APIs make it easier to apply machine learning to data, and thus to create Predictive Apps. In essence, they abstract away some of the complexities of creating and deploying machine learning models and make machine learning more accessible to developers. They also allow developers to spend more time on user experience, design, data munging, experimenting, and delivering value from data.
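As a rough sketch of what this abstraction looks like from the developer’s side, consider the shape of a typical predictive API exchange. The model name, field names, and response format below are hypothetical, invented purely for illustration; real providers each define their own:

```python
import json

# Hypothetical predictive API exchange: the client sends raw feature
# values and gets a prediction back, with no model code on its side.

def build_prediction_request(model_id, features):
    """Serialize a prediction request for a hypothetical REST endpoint."""
    return json.dumps({"model": model_id, "input": features})

def parse_prediction_response(body):
    """Extract the predicted label and confidence from a JSON response."""
    data = json.loads(body)
    return data["prediction"], data["confidence"]

request = build_prediction_request("churn-model-v1",
                                   {"tenure_months": 4, "support_calls": 7})
# In a real app this JSON would be POSTed to the provider's endpoint;
# here we simulate the kind of body such a service might return.
label, confidence = parse_prediction_response(
    '{"prediction": "will_churn", "confidence": 0.87}')
print(label, confidence)  # prints: will_churn 0.87
```

All the modeling complexity lives behind the endpoint; the developer’s code is reduced to serializing inputs and interpreting outputs.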
In simple terms, machine learning is a computer’s ability to learn from data, and it is one of the most useful tools we have to develop intelligent systems and applications. Machine learning is used widely today for all kinds of tasks, from churn prediction in large companies, to web search, to medical diagnostics, to robotics. It’s hard to find a field that cannot benefit from machine learning in one way or another. Predictive analytics and machine learning are bringing new levels of speed, relevance, and precision to the way we design and manage operating models. In health care, machine learning is changing the way doctors identify people at risk of developing certain diseases; in retail, machine learning is used to analyze purchasing data to anticipate trends; CRM and marketing experts use it to tailor campaigns and offers.
Machine learning is fun once you know what it is and how to use it. Predictive APIs give all of us a great opportunity to use this capability, with just enough math, to build awesome applications. With Predictive APIs, organizations can analyze data and predict future outcomes, and they have the opportunity to build smart, intelligent apps on top of machine learning algorithms.
“Machine learning, predictive analytics and APIs for that matter are not technologies of the future, but important technologies of the present.” – Janet Wagner, Machine learning and predictive analytics foster growth
Read this post to understand the business possibilities enabled by Predictive APIs. To learn how to use Predictive APIs and make machine learning work for you I highly recommend reading Louis Dorard’s book “Bootstrapping Machine Learning”.
Who do you believe is #1 in the Kaggle rankings?
A) A professor from Stanford with 20 years’ experience in machine learning.
B) A Russian mathematician who solved college level math puzzles at the age of 3 and works for the KGB.
C) A Spaniard from Andalusia (where my hometown is and where everybody naps twice a day) who works for a hospital.
D) Chuck Norris
Of course, the answer was C. The point he was trying to make was that we don’t need machine learning gurus and academics; instead, we need experts and professionals from other fields who know enough about machine learning to use it.
Since then we have been collaborating on many initiatives to promote machine learning and make it more accessible and simple. Earlier this year, Francisco introduced me to Louis Dorard, who is actively helping people exploit the power of machine learning with minimal coding experience using Predictive APIs. Over the last six months or so, the three of us discussed the need for a community and platform to bring together practitioners from industry, academia, and public services to present new developments, identify new needs and trends, and discuss the challenges of building real-world predictive APIs and applications. Last month we announced that PAPIs 2014, the First International Conference on Predictive Applications and APIs, will be held in Barcelona on November 17-18: a technical and practical conference dedicated to Predictive APIs and Predictive Apps.
PAPIs ‘14 is the first International Conference on Predictive APIs and Apps. It will take place on 17-18 November 2014 (right before the O’Reilly Strata Conference) in Barcelona, Spain, where it will connect those who make Predictive APIs with those who use them to make Predictive Apps.
We want PAPIs to become an open forum for technologists, researchers and developers of real-world predictive APIs and applications to get together to learn and discuss new machine learning APIs, techniques, architectures, and tools to build predictive applications.
With the barrier of entry for machine learning effectively removed by predictive APIs, the time is now for all of us (programmers, researchers, business and technology professionals) to take advantage of machine learning to deliver real and meaningful economic and social value. Just remember, data alone is not enough. We need predictive APIs and applications to make it valuable, actionable and meaningful.
See you all at PAPIs in Barcelona!
By Dr. Pawel Kobylinski
Upon joining the Persontyle team, I have decided to introduce myself to the readers of this blog by giving you a rationale behind the connection between social science and data science. Persontyle wants to help in transferring both ideas and strict technical know-how between the two disciplines mentioned. Whether you are a social scientist willing to learn R or a data scientist eager to grasp Design of Experiments and psychometrics, we are prepared to give you a hand.
Last year Harris, Murphy, and Vaisman (2013) surveyed over 250 data scientists from around the globe. The authors wanted to answer a question fundamental to data science: what are the educational and professional backgrounds of the people who, in recent years, ended up as data scientists? The authors report that the data favours a scientific over a tool-based education for data scientists. And which fields dominated among the scientifically educated? Let me cite the “Analyzing the Analyzers” report: “social or physical sciences, but not math, computer science, statistics, or engineering”.
Surprised? I was, a bit. The results made me wonder whether the surveyed sample was representative of the data science population; whether the “Self-Identification” part of the survey questionnaire was reliable and valid (in the strict psychometric sense); and, last but not least, whether the questionnaire was missing something crucial, namely measurement of latent personality variables which possibly mediate or moderate the reported overt effects… Forgive me the dense language of the last few sentences; I have been trained by experts to care about the quality and meaning of data… by academic social scientists.
Quite in line with the reported results – not by statisticians, mathematicians, and programmers, to whom I am grateful for teaching me how to deal with numbers. Math people have the great privilege of exploring a beautiful, pure, abstract universe in which numbers are disconnected from everyday meaning. Computer science experts are, understandably, focused on the fascinating and rapidly developing technological tools for processing digitized data. Social science, on the other hand, has developed a very strict methodological apparatus allowing for measurement and quantification of the usually messy social and psychological reality. Furthermore, social scientists are familiar with basic statistical notions: sampling, measurement error, correlation, causation, prediction, statistical inference, and so on. Many are acquainted with quite complex methods, like repeated-measures ANOVA, factor analysis, or regression, the latter often considered a machine learning method.
If so, there should be no surprise at all – a solid social science preparation is a great starting point for a data science career. Within the data science field we tend to algorithmize and automate analytics. What social science people do in SPSS or Statistica, we do by means of coding. Why coding? Because we process data on an everyday basis, and it turns out to be convenient to have scripted procedures and programs that can be used over and over again, tweaked and combined into larger ensembles. So, if a social scientist dreams of data science, the first step (no leap at all) is obvious: learn statistical programming. Which programming language should you choose first? The most established, widespread, comprehensive, and free one: R. Having mastered R, you will have to put some effort into learning the basic technicalities of databases (all those SQLs and Hadoops are just fancy data containers). And then – voilà, you have become a data scientist. Of course, this is not the end of the highway; after gaining some experience you will find yourself ready for further questions, for example: are there algorithms that can run my analysis faster and on big data sets? Yes there are, and you can google them. Or maybe you will find yourself in a data science team in which computer science experts take care of the algorithmic efficiency of big data processing tools, and you are the one responsible for research strategy and analytics prototyping in R. After all, you come from a social science background; you are prepared to design a research process.
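The claim that regression sits on the border between statistics and machine learning is easy to demonstrate from first principles. The post recommends R for real work; purely as a language-neutral illustration (here in Python, with made-up numbers), this is how small such a scripted, reusable procedure can be:

```python
# Simple least-squares regression, the kind of scripted procedure the
# post describes: write it once, reuse it on every new data set.

def fit_line(xs, ys):
    """Return (slope, intercept) minimizing squared error for y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Fabricated example data: y is exactly 2x + 1, so the fit recovers it.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # prints: 2.0 1.0
```

The same formula a social scientist clicks through in SPSS becomes a ten-line function that can be tweaked, chained, and rerun on every new data set.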
As the use of big data attracts more and more attention, the significance of social science becomes increasingly apparent. Dr. Rebecca Eynon’s presentation at the HEA summit earlier this year highlights the challenges and opportunities of using big data for social science.
We know from our experience that when disciplines crash into each other great stuff can emerge. I honestly think it is hard to overstate the significance of social science to perform meaningful data science and vice versa. It is time to say goodbye to academic silos as we enter into a new age of cross/multi/and interdisciplinary work.
I encourage you to read this excellent post by Dr. Emma Uprichard of the University of Warwick to explore the reasons why, without social science, big data cannot deal with big questions. And check out the work Dr. Patrick Dunleavy, Dr. Simon Bastow, and Jane Tinkler are doing to understand the impact and opportunities that the new methods coming in from the STEM sciences present for the social sciences.
Data science needs to care more about data quality and meaning. The field needs to learn more about topics like research design, sampling design, Design of Experiments, psychometrics, and latent variables and constructs, to name a few. Without social science know-how, all the fancy data science technology may find itself in danger of processing trash: GIGO – garbage in, garbage out. The other way round, social science needs to incorporate the new tools developed by data scientists. Unable to lay its hands on the vast amount of digitized social data flowing all around us, academia loses a whole lot of research opportunities. The connection between data science and social science will inevitably tighten. The latest (in)famous Facebook experiment, whatever its ethical implications, proves it is already happening. Here at Persontyle we are aware of the convergence process. Don’t let yourself fall behind.
The path to sustainable and meaningful advantage is being able to find new ways of managing data, discovering what’s in it, finding patterns and predictions, and deciding what to do with it. Fuelled by the data deluge, predictive models and machine learning programs are being used to improve everything around us, from the way we shop to the web experiences we enjoy and the way we receive social and health care. The public sector, telecom, sports, healthcare, retail, and agriculture are just a few industries where big data and predictive analytics are changing the way we work and live.
The proliferation of smart devices, cloud computing, and mobile applications, both in our personal lives and in the workplace, gives us the ability to know more about markets, customers, processes, behaviours, and practices than ever before. The exciting result is that we can learn from past and present events to predict future outcomes.
We envision a future with personalized medicine: drugs specifically tailored to treat us based on our unique attributes and our medical and family history. A future where roommates are selected not through Craigslist or a four-question survey from the Undergraduate Housing Department, but through a tailored questionnaire that draws on years of historical data to minimize the chance of conflict and increase overall satisfaction. A world with a genuinely smart car, one that adjusts its suspension and gears to my nephew, a 16-year-old with his learner’s permit who seems to enjoy alternating between slamming the brakes and the gas pedal. Or maybe even a movie studio that can more successfully predict the expected outcome of a movie, identify commercials applicable to the target audience, and avoid $300 million losses on mega-flops like Disney’s “John Carter”. This world is being enabled as we speak through predictive analytics and open source platforms and programming environments like R and Hadoop.
One of the exciting changes in statistical analysis and predictive analytics during the last decade has been the growth of open source platforms and languages for analysing and predicting from data. R is one such programming language. Worldwide, millions of analysts, researchers, professionals and data scientists use R for data analysis, predictive modelling, machine learning and graphical analysis. R was created in 1993 by Ross Ihaka and Robert Gentleman of the University of Auckland, New Zealand. It is a GNU project similar to the S language, which was developed at Bell Laboratories by John Chambers and colleagues.
R provides a wide variety of statistical and machine learning methods and predictive modelling techniques, and is highly extensible. You can easily download and use 4,000-plus contributed packages covering statistics, predictive modelling, and machine learning, free of charge. R also has the enterprise capabilities needed to drive adoption across the organization and to let business and technology professionals make data-driven decisions.
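As a concrete illustration of this extensibility, installing and loading a contributed package takes only a couple of lines. The caret package is used here purely as an example; any CRAN package works the same way, and the CRAN mirror URL below is just one common choice:

```r
# One-off download of a contributed package from CRAN
# (requires an internet connection; the repos argument avoids
# the interactive mirror prompt in scripts)
install.packages("caret", repos = "https://cloud.r-project.org")

# Load the package into the current session
library(caret)

# Peek at a few of the functions the package exports
head(ls("package:caret"))
```

From that point on, every function the package exports is available in the session, exactly as if it were part of base R.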
When it comes to statistical modelling and predictive analytics, there are three clear leaders in the software space — SAS, SPSS, and R. But, which of the three makes sense to learn first? Which one has the most lasting power? Which will offer the most utility?
The answer is quite simple. In terms of flexibility, price, popularity and graphical capabilities, R has distinct advantages that give it a significant competitive edge.
Being able to perform a variety of functions is key for a statistical analysis program. R has thousands of add-on packages that provide deeper, customized functionality and execute complicated processes at the click of a button. These shortcuts are incredibly useful, especially if you are not familiar with the mathematics behind every algorithm. You can fit models like gradient boosting machines and random forests in a single line of code, essentially removing the need to know how to build a model from scratch in order to run it.
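A minimal sketch of that one-line convenience, assuming the randomForest package has already been installed from CRAN:

```r
# Fit a random forest classifier on R's built-in iris data
# (assumes install.packages("randomForest") has been run)
library(randomForest)

set.seed(42)
fit <- randomForest(Species ~ ., data = iris)  # the one line that builds the model

# An out-of-bag error estimate and confusion matrix come for free
print(fit)
```

The formula interface (`Species ~ .`) and sensible defaults are what make the single-line call possible; tuning parameters such as `ntree` and `mtry` can be added later if needed.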
How does R have so many packages? It’s open source! That means the product is constantly evolving as dedicated users add new features. SAS and SPSS release new versions on an annual basis, but chances are that if you are looking to add a function to R, another data scientist has already created an add-on that you can easily install and take advantage of. If SAS and SPSS don’t already incorporate a feature, you are most likely stuck waiting for next year’s release or a patch.
Unlike its competitors, which will cost you $5,000 to $10,000 per license, R is free to use and offers unlimited access to the latest and greatest packages that the dedicated community creates. Can’t argue with that, right?
With such a dynamic and affordable product, it’s no surprise that up to 70% of analytics professionals use it at least occasionally. Starting at the academic level, students have easy access to the platform, and can quickly adapt it for their research needs or even professional projects later on. As with Linux, there is an incredibly diverse and supportive community that offers tutorials and troubleshooting to help you take full advantage. When you interact with other analytics professionals, chances are they will also be familiar with it, making collaboration a breeze.
Visualization is a critical component of data science. If you can’t find a way to communicate your findings clearly, it’s going to be extremely difficult to move forward with your work. Luckily, R has unparalleled graphics capabilities thanks to packages like ggplot2, rCharts, and googleVis.
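For instance, a publication-ready scatter plot takes only a few lines with ggplot2 (assuming the package is installed); the file name used here is just an example:

```r
# A minimal ggplot2 example using the built-in iris data
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, colour = Species)) +
  geom_point() +                                 # one point per observation
  labs(title = "Iris: sepal vs. petal length")   # human-readable labelling

print(p)                                             # draw on the active device
ggsave("iris_scatter.png", p, width = 6, height = 4) # or save to a file
```

Because ggplot2 builds plots layer by layer, swapping `geom_point()` for another geom, or adding facets and themes, changes the chart without rewriting it.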
The value of expertise in R is considerable, and will only continue to grow as our world becomes more digital and data driven. The School of Data Science has designed a one-day hands-on, practical workshop for beginners to learn the core skills and concepts required for visualizing, transforming and analysing data in R. It is a great opportunity for data analysts, business analysts, technology and business consultants, and all mortals interested in learning the basics of R for effective data analysis and predictive analytics.
The workshop is designed for people who are just starting with R as well as for data analysts who are switching to R from other statistical software, such as SAS or SPSS. Read and download the workshop brochure.
Please take a moment to register now and take advantage of the special discounts on offer. Special discounts are available for civil servants, charities and not-for-profit organizations. We look forward to having you join us for this unique learning experience. If you have any questions about the workshop or registration, please email email@example.com or give us a call on +44 (0) 2032 39 3141.
By Ali Syed
The digital world is continuously churning out vast amounts of data, and the volume is growing ever more rapidly. Some analysts say that we are producing more than 200 exabytes of data each year. We’ve heard many times that, managed well, this (big) data can be used to unlock new sources of economic value, provide fresh insights into science, hold governments to account, spot business trends, prevent diseases, combat crime and so on.
Over the past decade (the noughties), we witnessed the benefits of data, from personalized movie recommendations to smarter drug discovery – the list goes on and on. Joe Hellerstein, a computer scientist from the University of California, Berkeley, called it “the industrial revolution of data”. The effects are being felt everywhere, from business to science, from government to society.
“You are thus right to note that one of the impetuses is that social as well as cultural, economic and political consequences are not being attended to as the focus is primarily on analytic and storage issues.” Evelyn Ruppert, Editor Big Data and Society
At the same time, this data deluge is having deep social, political and economic consequences. What we are seeing is economies being built around data, and that to me is the big change at a societal and even macroeconomic level. Data has become the new raw material: an economic input almost on a par with capital and labour.
Organizations need data from multiple systems to make decisions, and they need it in an easy-to-understand, consistent format to enable fast understanding and reaction. They now try to capture every click, because storage is cheap. The customer base is harder to define and constantly changing. While all this is happening, the expectation is to be able to answer questions quickly. Everyone is saying that “reports” don’t satisfy the need any more.
The global economy has entered an age of volatility and uncertainty: a faster-paced economic environment that shifts gears suddenly and unexpectedly. Product life cycles and time to market are shorter. Ours is an instant-gratification society, one that expects quick answers and more flexibility than ever. Consequently, the world of business is always in the midst of a shift, required to deal with changing economic and social realities.
The combination of dealing with the complexities of the volatile digital world, the data deluge, and the pressing need to stay competitive and relevant has sharpened the focus on using data science within organisations. At organisations in every industry, in every part of the world, business leaders wonder whether they are getting true value from the massive amounts of data they already have within and outside their organisations. New technologies, sensors and devices are collecting more data than ever before, yet many organisations are still looking for better ways to obtain value from their data.
The strategic ability to analyse, predict and generate meaningful and valuable insights from data is becoming the topmost priority of information leaders, a.k.a. CIOs. Organisations need to know what is happening now, what is likely to happen next and what actions should be taken to get optimal results. Behind rising expectations for deeper insights and performance is a flood of data that has created an entirely new set of assets just waiting to be applied. Businesses want deeper insights into the choices, buying behaviours and patterns of their customers. They desire an up-to-date understanding of their operations, processes, functions and controls, and seek information about the financial health of their entire value chain, as well as the socio-economic and environmental consequences of both near-term and distant events.
“Every day I wake up and ask, ‘how can I flow data better, manage data better, analyse data better?’” – Rollin Ford, CIO of Wal-Mart
Although business leaders have realized there’s value in data, getting to that value has remained a big challenge for most businesses. Friends in industry have cited many challenges, and none can be discounted or minimized: executive sponsorship of data science projects, combining disparate data sets, data quality and access, governance, analytic talent and culture all matter and need to be addressed in time. In my discussions with business executives, I have repeatedly heard that data science initiatives aligned to a specific organisational challenge make it easier to overcome a wide range of obstacles.
Data promises much to organisations that embrace it as an essential element of their strategy. Above all, it gives them the insights they need to make faster, smarter and more relevant decisions – in a connected world where understanding and acting in time means survival. To derive value from data, organizations need an integrated insight ecosystem of people, process, technology and governance to capture and organize a wide variety of data types from different sources, and to be able to analyse them easily within the context of all the data.
We are all convinced that data, as the fabric of the digital age, underpins everything we do. It’s part and parcel of our digital existence; there is no escape from it. What is required is that we focus on converting big data into useful data. We now have the tools and capabilities to ask questions, challenge the status quo and deliver meaningful value using data. In my opinion, organizations and business leaders should focus more on how to minimise the growing divide between those that realise the potential of data and those with the skills to process, analyse and predict from it. It’s not about data, it’s about people. The real innovation in big data is human innovation.
“The truth is, that we need more, not less, data interpretation to deal with the onslaught of information that constitutes big data. The bottleneck in making sense of the world’s most intractable problems is not a lack of data, it is our inability to analyse and interpret it all.” – Christian Madsbjerg
Whether you are looking to amplify value and impact using data, or you want to analyse data scientifically for insights and to ask relevant questions, Persontyle Services is the place to start to ensure you have what you need to strategize, design, implement, and fulfil your analytic needs. We will work collaboratively with you from data to insight, and on to impact and value.
By Ali Syed and Dr. Zacharias Voulgaris
Big data has been declared the next big thing – one of the strategic resources for remaining relevant, value focused and competitive in the digital economy. To truly take advantage of this opportunity, organizations must become connected enterprises, rapidly obtaining and reacting to the intelligent predictions and relevant insights produced by continuous exploration and analysis of data. The increasing importance of data, and everyone’s desire to become a data-driven organization, has boosted the demand for people with the skills to analyze, interpret and predict from data.
As an executive, you fully appreciate the significance of data for competing in the digital age and the value it can bring to your organization, and you are actively exploring opportunities to leverage the value of big data. You are aware that there are certain professionals, called data scientists, who can help make this happen. However, you are also cognizant that data scientists are a rare resource, and this shortage won’t go away soon, because organizations need them more than ever to deal with a complex and massively data-driven digital world.
“Data by itself is meaningless. It’s the skill of the data scientist that makes the difference”
Dr. Josh Sullivan
Some additional research may help you realize that this isn’t a game of musical chairs for employers: data scientists can also come about through talent development and a team-based approach to addressing the data science skills shortage. This option, which is not so well known, is definitely worth exploring, and that is what we shall attempt in this article.
Organizations are struggling to find individuals who possess all of the skills and abilities needed to think and work like data scientists. Developing your own people by helping them learn data science is an excellent strategy, considering that the most effective set-up for tackling big data problems is a team of data scientists. Data science is a team sport, so why not use a team-based approach to address the shortage of data scientists? Contrary to what many people think (or fear, rather), training your employees reduces the chances of them jumping ship. According to recent market research, 92% of employees who learned new skills at an organization decided to stay in their position. And why wouldn’t they? If the person who hired you shows that he or she cares about your skill set and makes sure you become a more valuable asset, wouldn’t you want to stay close to that person and express your gratitude through your work?
You may find it concerning whether your employees will be willing to learn about data science. However, most of them are bound to be quite keen to learn new things, and data science is one of the most appealing fields out there nowadays, so you’ll find little resistance. Of course, not everyone will be interested in learning the ropes of data science, but you don’t need all of them to take up this role anyway. If your approach to data science is team-based, then just a few data scientists are enough to successfully solve a complex business challenge or create a new business opportunity. To this end, this newfound team of data scientists can collaborate with business analysts, project managers, systems architects, developers, web designers, product engineers, subject matter experts, etc.
Training your own people (particularly cross-training them) is a win-win situation that can have clear advantages for data science endeavors. This strategy brings about more agility in your workforce and enables your projects to be more flexible in their execution. By cross-training your employees you allow them to understand each other better as they have a wider frame of reference, enabling them to have a better synergy in their work. All this is very useful for data science projects in particular, as the problems being tackled by data scientists involve a more inter-disciplinary approach, making collaboration more challenging. Cross-training resolves all that, plus you get more flexibility in your project as a bonus, since you no longer need to rely on a few experts who may not always be available.
“As an executive recruiter specializing in quantitative recruiting, I work with clients continuously looking to find the unicorn that can do it all – the algorithm development, the data munging, building visualization and BI tools, scaling, and turning all this into enterprise wide adaptation. They are out there, but it could turn into a long and frustrating search. It takes a team and a solid commitment from the top.” Linda Burtch
Naturally, seasoned data scientists are also a worthwhile investment, albeit a risky one. If you have a large organization in particular, hiring a more experienced data scientist may be a big boost to your dealings with big data. Their more in-depth understanding of the data science field may also yield more useful insights about the value that can be derived from the available data. Of course, their positive effect will be maximized if you have some people dedicated to working on the same projects, to facilitate the development of the data products involved. Regardless, a good data scientist is hard to find, and you’ll have to rely mainly on recruiters for this task. Even the best data scientist around will inevitably need some training to get acquainted with your domain. Finally, he or she may leave at any time (especially if quite experienced), as there is bound to be some other organization willing to offer a more appealing package for data science expertise.
There are several reasons why having a mix of in-house trained and hired data scientists is the best way to go about it. Having both types in your organization will allow for a stronger data science team, combining the experience and know-how of the hired data scientist with the versatility and other merits of the in-house trained data scientists. Moreover, a hired data scientist in your ranks can help your employees learn the practical aspects of data science faster and more effectively.
Training your employees to be like the aforementioned seasoned data scientists is not an easy task, but it is not too challenging either, especially today, because there is a large variety of resources your employees can use to learn the ins and outs of the field. The most efficient of these are, without a doubt, data science courses, practical use-case-based project work, and mentoring. One place that offers data science education and talent development services is Persontyle.
Summing up, the data available is a great asset, but it’s completely useless without the right people, working together as a team, turning it into actionable information and insight. A healthy mix of in-house trained data scientists and externally hired ones is probably the most effective strategy. This allows the formation of a flexible data science team that benefits from both the experience of the seasoned data scientists and the various advantages the in-house trained data scientists bring. One way to make the latter a feasible and effective option is through Persontyle’s “Data Science Talent Strategy” workshop.
As you have probably figured out by now, data science is my thing. How do I know that? Well, I asked myself the following question: what would I do if I had all the money in the world and all my needs were met? Believe it or not, I would practice and research data science (along with some travelling probably)! Contrary to what some people in the field think, data science is for everyone, regardless of what you do for a living. It’s not limited to the few people, like me, who have a passion for it. That’s because everyone can benefit from a solid understanding of data science.
Now, you may want to learn data science because you want to reap some of its fruits, or you may want to become a full-time professional in the field. If it’s the latter you are after, you may want to check out my book, which is finally becoming available in both paperback and electronic format. Unfortunately it took longer than I expected to get it out there, but there was a good reason for the delays: we wanted it to be sufficiently good and as free of errors as possible. In the book I mention the various ways in which you can learn the skills needed to become a (good) data scientist. What I don’t mention is the series of data science and machine learning courses available at The School of Data Science (as these courses were not finalized at the time I wrote the text).
Persontyle is a social enterprise dedicated to data literacy. Through the various educational programs offered by the School of Data Science, you can learn the practical skills required to be a player in this fascinating field and have a good time while you’re at it. But you don’t have to start from the very basics if you are already knowledgeable about some things. The courses on offer cover various levels and are all self-contained. All the courses have a practical aspect, ensuring you get hands-on experience while doing them. The best part is that they are all very short, so it is easy to fit them into your schedule. If you are unable to attend on the days they are held, there are also customized solutions, so you can get training at a time and place of your convenience. This option is particularly good if you work with a team and everyone needs the training.
Note that the School of Data Science also has courses for businesspeople who wish to learn about the business aspects of the field. Perhaps you are interested in hiring a data scientist and want to learn about the field so that you can better manage your expectations and assess your candidates.
Whatever the case, the resources in this post are really useful and can become great assets, bringing a lot of value to your career and the organization you belong to. I say “organization” instead of “company” because data science is not limited to companies. NPOs and charities can benefit from data science as well (this is actually one of the things Persontyle promotes), so it’s not just for the benefit of stockholders. I hope you take the time to look into these resources and find out how they can benefit you, in an educational and enjoyable way.
Guest post by Louis Dorard, author of Bootstrapping Machine Learning
Prediction APIs are a growing trend, and they are changing the way people approach Data Science. Recently, Persontyle partnered with BigML, a company that provides one such API. Services like BigML abstract away the complexities of learning models from data and making predictions against those models. Thanks to Prediction APIs, anyone is now in a position to do Machine Learning.
However, apart from a few blog posts here and there, there was no long-form resource to introduce you to Machine Learning through Prediction APIs. The books on the market will teach you how to implement Machine Learning algorithms, but most people who could benefit from them are not willing to invest the time and effort required to understand how these algorithms work. As Bret Victor wrote: “Until machine learning is as accessible and effortless as typing the word ‘learn,’ it will never become widespread.”
I was really excited when I first learnt about Prediction APIs in 2011. I kept an eye on them and eventually decided to write the first guide to using them. Although they are indeed making Machine Learning quite effortless, people still need to be educated about its possibilities and limitations, how to prepare the data to learn from, and what to do once a machine learning model has been created. As you can imagine, my core audience is not people wanting to become experts in the field but people looking to leverage these technologies for their apps or businesses. They can be hackers, startuppers, CTOs, lead devs, analysts, … They are not going to become Data Scientists, but rather what you could call Data Artisans, and they can now do things that only Data Scientists could in the past.
Instead of writing a traditional book, I went for a self-published ebook, inspired by successful self-published authors such as Nathan Barry, Sacha Greif, and even Guy Kawasaki. The ebook is complemented by extra material such as videos, screencasts, tutorials, IPython notebooks, code, datasets, a Virtual Machine, and free subscriptions to BigML. The objective is to save time for anyone who wants to get started with BigML or even the Google Prediction API.
For those who need more hands-on training or who want to be able to ask me questions in person, The School of Data Science and I will soon run a workshop on Prediction APIs: stay tuned! In the meantime, you can check out the book and start using Machine Learning within a day!
My goal is to help you create better apps by using Machine Learning and Prediction APIs. If you like you can read more about me and you can follow me on Twitter (@louisdorard) to see what I’m up to.
Download a free sample of the book with a detailed table of contents.
By John MacCuish
Data science and machine learning techniques are transforming science, technology, and industry. Vast quantities of data – big and small – are being explored and are adding business value. Cluster analysis, or clustering – namely, finding groups in data – is an extremely important tool within this process (e.g. read this post by Mark van Rijmenam, author of “Think Bigger – Developing a Successful Big Data Strategy for Your Business”).
Humans tend to group things; it’s what we do. We can tell apples from oranges by their color and subtle differences in shape, or we divvy up a set of books into fiction and non-fiction – and subdivide those again into poetry and literature, or math books and stat books, and so on. Sometimes it is hard to group things easily, like distinguishing friends, colleagues, and acquaintances. It doesn’t stop us. We love, or feel the necessity, to organize, to create tables and taxonomies. We classify things, ideas, and behaviors. Cluster analysis is just such an activity: quantitative methods for finding groups in data. Originally referred to as numerical taxonomy, it leans on the quantitative side of organization, and it explores, learns, and projects classes of items from a set of discriminatory features. It generates a hypothesis about a certain order, whether the found classes are distinct, fuzzy, or overlapping. It is useful wherever there is data, almost without exception – pick an industry, a field, or a discipline, and there is a use for it.
Cluster analysis is a machine learning technique for identifying the groups within a dataset. It groups a set of objects in such a way that objects in the same group (called a cluster) are more similar, in some sense or another, to each other than to those in other groups. Clustering is one of the most widely used techniques and has numerous applications: image segmentation; identifying groups or segments within CRM databases in order to offer more targeted services and products; search engines like Google using cluster analysis to identify similar documents within their indexes; and social networks like Facebook, LinkedIn and Twitter using clustering to identify communities, commonalities, and bands within large groups of users. In a nutshell, clustering methods form the backbone of numerous machine learning applications in big data.
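As a small, self-contained illustration, base R’s kmeans() function (no extra packages needed) can recover groups in the classic iris measurements; the choice of three clusters is assumed here, since in practice choosing k is itself part of the analysis:

```r
# k-means clustering with base R on the four iris measurements
set.seed(42)
fit <- kmeans(iris[, 1:4], centers = 3, nstart = 25)

# Cross-tabulate the discovered clusters against the known species
# to see how well the unsupervised grouping matches reality
print(table(cluster = fit$cluster, species = iris$Species))
```

The `nstart = 25` argument restarts the algorithm from 25 random initializations and keeps the best result, a common guard against poor local optima.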
“London is calling you.” This past February, Ali Syed from Persontyle contacted me via email about the possibility of designing and teaching a course on cluster analysis with the R language and environment in London. Given how clustering theory and applications continue to grow, coupled with the expanding utility and flexibility of R and its clustering software for data scientists, researchers and industry professionals, I jumped at the chance. Having spent the past twenty years working with data and machine learning in applied settings across numerous industries and disciplines – originally much of it in S, but in the last X years in R – it is a great pleasure to put my experience and knowledge of cluster analysis into a course setting. R contains scores of packages and hundreds of functions devoted to cluster analysis (clustering algorithms, validation, visualization, analysis of results, etc.), and the ease of creating the course through the RStudio IDE and R Markdown, for reproducible web authoring, made accepting the offer to teach the course that much more exciting.
Cluster Analysis with R is a comprehensive course covering both the applications and the theory of clustering. The R language and environment is uniquely positioned to provide, both effectively and economically, the full range of clustering methods and the visualization graphics needed to present clustering concepts and applications. R’s flexibility will allow students to investigate methods and data more fully. Real-world and simulated data will be used throughout the course.
Data will be analyzed from fields as diverse as marketing and ecology, finance and drug discovery, bioinformatics and psychometrics. Customer segmentation for marketing or studying the impact of churning, species dispersal or ordination, designing diverse compound libraries, studying gene expression, determining cliques and cohorts in a population, finding social communities in social media, all involve the application of cluster analysis.
This course will reward both those with a very limited understanding of clustering and those who have some experience with clustering but want a broader and deeper understanding of cluster methods and analysis. Participants will come away with an understanding of a full set of clustering tools and the theory behind them, so that they will know which subset of tools to use for their applications … and why. They will learn that clustering is not a simple cookbook exercise, and that they will have to bring a good many methods, and some creativity, to bear on each clustering problem. In the R coding labs, participants will work with extensive R code examples and, having used and explored them, will develop the fluency necessary to modify and extend them for their own projects.
After attending the course, participants will be able to use R to perform exploratory data analysis, feature selection, dimension reduction, and further pre-processing necessary for the use of their data for clustering. They will be able to find and use the appropriate (dis)similarity measure(s) (symmetric and asymmetric), choose the appropriate set of clustering algorithms to explore from among the different types (partitional, hierarchical, hybrid, online, graphical, asymmetric, co-clustering, etc.), given the size and type of their data. They will be able to employ the various validation and visualization R functions on their clustering results, and have the tools to explore cluster stability, plasticity, and ambiguity.
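To give a flavour of that workflow in base R, here is a sketch that picks a dissimilarity measure, runs an agglomerative hierarchical clustering, cuts the tree into groups, and inspects the result against known labels:

```r
# Hierarchical clustering with base R (stats package only)
d  <- dist(scale(iris[, 1:4]), method = "euclidean")  # dissimilarity matrix
hc <- hclust(d, method = "ward.D2")                   # agglomerative clustering

groups <- cutree(hc, k = 3)            # cut the dendrogram into 3 clusters
print(table(groups, iris$Species))     # compare clusters with species labels

plot(hc, labels = FALSE, main = "Ward clustering of scaled iris data")
```

Swapping the `method` arguments of dist() and hclust() (e.g. "manhattan" distances, or "average" and "complete" linkage) is all it takes to explore alternative clusterings of the same data.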
I’ve designed this course for people interested in developing practical skills in implementing clustering algorithms using R. Along with basic experience of programming in R and a knowledge of statistics, you’ll need an inquisitive mind and curiosity about analyzing data for insights and predictions, and about how best to group it. I look forward to meeting you in class, where together we will do some clustering!
About the Author
John D. MacCuish is the founder and president of Mesa Analytics & Computing, Inc. He has co-authored several software patents and has worked on many image processing, data mining, and statistical modeling applications, including IRS fraud detection, credit card fraud detection, and automated reasoning systems for drug discovery.