What is the accepted definition of a data scientist today? If you Google the term 'data scientist' you will get a million results, most of them repeating the same curated definitions over and over.
Bottom line - data scientists are the curious kind. Thanks to Dr. DJ and others for defining it for us. I'm not sure the term "Data Scientist" deserves the unicorn title. It really describes someone who isn't intimidated by problems and who asks for more data to solve them. If you have a team of analysts, you can always find a few folks in the group who would be willing to work for no pay (figuratively speaking) to get insights from the data - just to get to the bottom of an issue. They'll invest time and money to learn any tool that helps them solve the problem. In other words, any tool (R, Python, Java, SQL, Scala, etc.) is just a means to an end - to satisfy the curiosity about the data, and what it's saying, that keeps them awake at night. If they don't have the time, they'll hire people like that and frame the problems well so they can be solved.
If this sounds like you or a colleague, then we have found a data scientist ...
Are data scientists just gifted, or can they be trained to be like the above?
As with any art or trade, some are simply gifted out of the gate, while others get there through association or practice. Some find their inspiration by stumbling onto certain DS tools that are incredibly easy to use. A few get it from a good professor at school who sparked their curiosity. Others get it from working with a fellow data scientist who inspired them.
Data scientists can come from all backgrounds - from math PhDs to biology graduates, fanatic coders, or SQL enthusiasts.
What's an analogous example in the real world?
When you try to learn something new like golf, skiing, painting, or piano, it's hard to say what will click until you try it out, right? Some keep at it until it clicks, some give up, and some just stumble onto the right tools + conditions + tutor and, bingo, find that they are good at it immediately. With practice, it can only get better over time ...
Does learning Statistics, R, Python, Scala and knowing how to install Apache Spark turn you into a data scientist overnight?
Unlikely, unless you are an analyst who has been struggling to solve certain problems with traditional tools and thousands of lines of code. If an analyst has been writing thousands and thousands of lines of SQL or procedural code to find out who the influencers are, or whether someone will check out a shopping cart, and has spent years perfecting that with existing tools, then he or she is a candidate to be a rocket data scientist. Learning R, Teradata Aster SQL/MR, Scala, etc. will suddenly make a lot of sense. Machine learning should let that analyst sleep well at night, because there are now smarter ways to do things with data science than traditional rule-based problem solving.
I'm a graduate with a fresh Data Science degree
As a student, if you have learned a new tool without becoming wedded to it, and you are willing to keep learning new things, you are on your way to being a data scientist. It's all about learning new stuff, never giving up, and pushing the envelope. Data scientists generally never get tired of data or problems! They ask for more so they can find insights.
What are a few things that a data scientist does differently?
Being a data scientist requires an entrepreneurial, fail-fast attitude. It's a big cultural shift (ask my friend @John Thuma). It's all about trying different things without fear of failure or deadlines. Most importantly, it's not about fitting into a mold or being politically correct to match existing perceptions of known insights. A good data scientist will present accurately what the data is saying and be candid enough to admit that the results are inconclusive, if they really are.
I've seen junior data scientists hold off on visualizing when they have only a little data, because they tend to wait until all the ducks line up and the data is parsed and ready. Sometimes just visualizing the data you already have can tell you a lot, without waiting for everything to come together. It costs nothing, so why not try it? When you get new data, rerun the models, visualize again, and keep playing!
The biggest impediment to data science is wearing an 'ops' hat from the get-go. That'll kill it. While it's paramount to productionize things in the end, starting there will kill creativity and suck the energy out of a data science project. Playing with the data is extremely important until the science matures. Most "new" data science problems may have no precedent, since this is an evolving practice, especially with machine learning, deep learning, etc.
It's like asking your kid to think about performing in Carnegie Hall on the first day of piano class.
Data scientists need space for art. That makes this a bit different from a traditional approach, where the processes are more refined, evolved, and established across the board. However, once things reach a steady state and become repeatable, the data scientist must figure out the 'ops' scenario using the tools currently available.
Note: Organizations with more mature data science initiatives can certainly create (and have created in the past) an environment that benefits both creativity and operationalization!
Art/Science showcase by Teradata data scientists ...