By Dr. Pawel Kobylinski
Having just joined the Persontyle team, I would like to introduce myself to the readers of this blog by explaining the connection between social science and data science. Persontyle wants to help transfer both ideas and rigorous technical know-how between these two disciplines. Whether you are a social scientist willing to learn R or a data scientist eager to grasp Design of Experiments and psychometrics, we are ready to give you a hand.
Last year Harris, Murphy, and Vaisman (2013) surveyed over 250 data scientists from around the globe. The authors wanted to answer a question fundamental to data science: what are the educational and professional backgrounds of the people who have become data scientists in recent years? They report finding evidence in the data in favour of a scientific rather than a tool-based education for data scientists. And which fields dominated among the scientifically educated? Let me quote the “Analyzing the Analyzers” report: “social or physical sciences, but not math, computer science, statistics, or engineering”.
Surprised? I was, a bit. The results made me wonder whether the surveyed sample was representative of the data science population, whether the “Self-Identification” part of the survey questionnaire was reliable and valid (in the strict psychometric sense), and, last but not least, whether the questionnaire missed something crucial, namely the measurement of latent personality variables that might mediate or moderate the reported overt effects. Forgive me the dense language; I have been trained by experts to care about the quality and meaning of data. By academic social scientists.
Quite in line with the reported results, then, and not by statisticians, mathematicians, or programmers, to whom I am nonetheless grateful for teaching me how to deal with numbers. Mathematicians have the great privilege of exploring a beautiful, pure, abstract universe in which numbers are disconnected from everyday meaning. Computer scientists are naturally focused on the fascinating and rapidly developing technological tools for processing digitized data. Social science, on the other hand, has developed a rigorous methodological apparatus for measuring and quantifying a usually messy social and psychological reality. Furthermore, social scientists are familiar with basic statistical notions: sampling, measurement error, correlation, causation, prediction, statistical inference, and so on. Many are acquainted with quite complex methods such as repeated measures ANOVA, factor analysis, or regression, the last of which is often considered a machine learning method.
If so, there should be no surprise at all: a solid social science background is a great starting point for a data science career. In data science we tend to algorithmize and automate analytics. What social scientists do in SPSS or Statistica, we do by coding. Why coding? Because we process data every day, and it is convenient to have scripted procedures and programs that can be reused over and over again, tweaked, and combined into larger ensembles. So if a social scientist dreams of data science, the first step (hardly a leap) is obvious: learn statistical programming. Which language should you choose first? The most established, widespread, and comprehensive one, which also happens to be free: R. Once you have mastered R, you will need to put some effort into learning the basic technicalities of databases (all those SQLs and Hadoops are just fancy data containers). And then, voilà, you have become a data scientist. Of course, this is not the end of the road. After gaining some experience you will be ready for further questions, for example: are there algorithms that can run my analysis faster and on big data sets? Yes, there are, and you can google them. Or you may find yourself in a data science team in which computer science experts take care of the algorithmic efficiency of big data processing tools while you are responsible for research strategy and analytics prototyping in R. After all, you come from a social science background, so you are prepared to design a research process.
As the use of big data attracts more and more attention, the significance of social science becomes increasingly apparent. Dr. Rebecca Eynon’s presentation at the HEA summit earlier this year highlights the challenges and opportunities of using big data for social science.
We know from experience that when disciplines collide, great things can emerge. I honestly think it is hard to overstate the importance of social science for meaningful data science, and vice versa. It is time to say goodbye to academic silos as we enter a new age of cross-, multi-, and interdisciplinary work.
I encourage you to read this excellent post by Dr. Emma Uprichard of the University of Warwick, which explores why big data cannot deal with big questions without social science. And check out the work Dr. Patrick Dunleavy, Dr. Simon Bastow, and Jane Tinkler are doing to understand the impact of, and the opportunities presented by, the new methods coming into the social sciences from the STEM sciences.
Data science needs to care more about data quality and meaning. The field needs to learn more about research design, sampling design, Design of Experiments, psychometrics, and latent variables and constructs, to name a few. Without social science know-how, all the fancy data science technology may find itself in danger of processing trash. GIGO: garbage in, garbage out. Conversely, social science needs to incorporate the new tools developed by data scientists. Unable to get its hands on the vast amount of digitized social data flowing all around us, academia loses a great many research opportunities. The connection between data science and social science will inevitably tighten. The latest (in)famous Facebook experiment, whatever its ethical implications, shows that this is already happening. Here at Persontyle we are aware of this convergence. Don’t let yourself fall behind.