Study after study shows that one of the best jobs in tech these days is “data scientist.” It’s in high demand, and there aren’t enough qualified people to fill the openings, so the pay is high.
For instance, “data scientist” was recently named the No. 1 job in America for 2016 by job-hunting site Glassdoor, with a median base salary of $116,840.
Data science has to do with the “big data” trend, where companies are now storing massive amounts of information and sifting through it to find business insights, and using all that data to offer their customers new programs and services.
So it’s not surprising that schools, from traditional universities to online coding schools, are offering to teach folks data science skills. Some of them charge a lot of money for their programs. Others are free, but they leave students on their own to put together a “big data” platform where they can study, learn and practice in a real-world environment.
One of the biggest names in big data science, a hot startup called Databricks, has come up with a solution. It will soon give students a free version of its big data cloud service, called the “Community Edition.”
The company is currently taking sign-ups for a wait list from students who want to receive the Community Edition.
Databricks CEO Ali Ghodsi tells Business Insider that the company likens it to “an easy button” for learning data science.
Training is in Databricks’ DNA. The founders are UC Berkeley professors who invented Spark, the technology Databricks is built on, while at the university. In fact, Ion Stoica, Databricks’ cofounder and original CEO (now executive chairman), is still a well-known professor at Berkeley.
A year ago, they released a free online data science course that trained people to use Spark. Demand for the massive open online course (MOOC) went nuts: over 76,000 people have enrolled to date, and some 20,000 have completed it, Ghodsi says.
“We wanted to double down on this. We thought, how can we teach more people, teach 100,000 people how to do data science? That’s what we’ve been working on for six months, this version of Databricks,” Ghodsi tells us.
Databricks cofounder is a cult star
The reason for the interest, and why learning Spark is a doorway to a high-paying data science job, is that Spark is one of the hottest technologies around these days.
Databricks cofounder and CTO Matei Zaharia created Spark while earning his PhD at Berkeley, with help from the other cofounders.
He released it as a free and open source project, meaning anyone can take the software, use it for free and make changes to it. So far, 1,000 people have contributed to it, Ghodsi says.
With the rise of Spark’s popularity, Zaharia has become a cult hero in the world of big data.
Ghodsi likes to tease him about the fame: “There are a lot of requests for taking a picture with Matei. Do you mind if we take a picture with Matei? This has become so common, I suggested we get full-size portraits of Matei and place them around for photos,” he says.
Photos of people posing with Zaharia are common on Twitter; one was shared by Evan Casey (@ev_ancasey) on February 19, 2016.
Why Spark is so hot
There are two parts to big data: storing all the data, and using it (or, in geek-speak, “big data processing”). A popular technology called Hadoop is used to store the data across lots of low-cost computers.
Spark does the processing, and it’s famous for “real-time processing”: crunching through huge amounts of data fast enough to return results almost as soon as new data is added to the system. That kind of real-time processing is helping usher in a new wave of smart, machine-learning systems that power things like self-driving cars and face recognition.
The founders’ company, Databricks, offers a commercial cloud version of Spark.
Spark really burst onto the scene last year when IBM announced plans to invest about $300 million in the open source Spark project over the next few years and to assign 3,500 people to work as Spark developers. IBM wants to use Spark for its own ambitious machine learning and cloud computing services, which involve its own smart technology, called Watson.
In some ways, IBM will ultimately compete with Databricks, as they both try to get customers for their clouds. But in other ways, IBM is helping to develop the underlying tech, and the two companies are starting to explore working together for a few joint customers, too, Ghodsi says.
In any case, there’s plenty of interest in Databricks and Spark to go around. Databricks launched its cloud service in June 2014 and has since grown to “several hundred” paying customers and 100 employees.
“I’ve been in startups before. You usually spend a lot of effort to get your name out there. For us, they are knocking on our door,” Ghodsi says.
That’s likely not hyperbole. Here’s a tweet to Zaharia from the assistant VP of Scottish Development International, an agency that gets companies to invest in Scotland.