On Data And Entrepreneurship

These lecture notes are from my Lyon & Bendheim Alumni Lecture Series talk on April 25, 2016, edited to include answers to questions from the lecture.

I graduated from Tufts in 2000 with a degree in Computer Engineering. In the 16 years since, I’ve worked primarily at startups using and building technologies for data analytics. These include a local company called Vertica that’s now part of HP and a San Francisco-based company called Cloudera, which prompted my move to the Bay Area in 2010. Two years ago I moved back east to Vermont, on the border with Dartmouth, and cofounded a company called Rocana, where I currently serve as CEO. While I didn’t learn to be an entrepreneur at Tufts, my experiences here certainly fanned the flames.

Throughout my career at startups I’ve learned about the importance of data. I’m going to talk a bit about the impact that data can have on entrepreneurship, highlight where Tufts stands out, and share some ideas about how Tufts can uniquely prepare students in a world where it’s become a necessity to understand the science of data.

What I love most about Tufts is that it’s a very hackable school. I mean that in the classic sense of being malleable, not in the malware, intrusion and data theft sense. Tufts has just enough process and rules that an enterprising student can find a way to achieve almost anything. I know this from personal experience. My freshman year, I wanted to take a data structures course with Alva Couch, and the following year I had my sights set on a compilers seminar with David Krumme, who passed away a few years ago. Neither of these was a normal course order. The schedules and prerequisites didn’t align with all the other classes I had to take, and I ended up in intro to chemistry my second semester senior year instead of my freshman year. Yet with enough juggling, my advisor and I found a way to get into those courses and, as a result, both professors had a huge impact on my education and career.
My junior year (that’s back in 1998) I found some research on distributed systems being done at Oxford University. Encouraged by my ability to hack the system in previous years, I convinced my advisor to help me transfer many of my core credits from the program I attended at Oxford. That’s not something that is factored into the standard undergraduate engineering course schedule. Today, distributed systems play a critical role in global infrastructure, and having the opportunity to study distributed computing while at Oxford changed the trajectory of my career.

The extracurricular activities at Tufts are a breeding ground for entrepreneurship. My senior year we decided to stage a joint Film Series and Torn Ticket production of the Rocky Horror Picture Show, with rice and toast and a plan to start it at midnight, as is tradition. The Biology department quickly squashed the idea of throwing food in the lecture hall in Barnum where Film Series normally screens movies. We managed to convince dining services to let us stage it in the Dewick dining hall, as long as we helped clean up and covered overtime. The problem was that the show had to end by 1 am and everything had to be cleaned up by 2. That didn’t give us enough time to start the movie at midnight, especially with a pre-show. We hacked the schedule by showing the movie on daylight saving weekend and gained an extra hour when the clock struck 2 am and it instantly became 1 am again. We also figured out that to get a big enough audience we could charge only $3 for tickets instead of the usual $2 (back when Film Series movies charged a fee), but we needed $5 per person to cover the additional cost. So we got a lot of $1s and sold bags of toast and rice for $2.

The other standout quality I love about Tufts is the connectedness that ties the University together. Tufts is much more than just a collection of departments and colleges.
There are genuine and valuable touch points between the schools and regular collaboration across departments. The T10 plan specifically calls out the importance of connectedness, citing how “seemingly disparate forces can, and must, work together if we are to reach our full potential.” I’m jealous of Tufts students who get to study entrepreneurship because it’s not just a course, it’s the essence of the university.

In my experience, entrepreneurship is primarily about finding ways to fulfill a vision. After graduating from Tufts I worked at a series of startups that relied heavily on data, first as a consumer of, and later as a builder of, technology to use data. Throughout my career as an entrepreneur, I’ve learned how powerful data is as a tool to help entrepreneurs find the patterns and gaps that others may not see, and the importance of learning to harness that knowledge. As I changed careers from a product engineer to a field engineer to sales and marketing to now running a company, the two constants that have been critical to my success were an understanding of technology differentiators and the impact of data.

At Cloudera in particular I started to appreciate the value of data to every single industry, from finance and media to health care and education. The industries that have just started to go deep with data, including manufacturing, farming and shipping, will likely end up being the most impacted. Thanks to pioneers at companies like Google, Facebook, LinkedIn and Y!, collecting and analyzing data has become a science all its own. You can study how to collect and use data in research, product development, education, social sciences and day to day business.

It was my experience working with customers at Cloudera that prompted me to start Rocana together with my cofounders. We noticed that the biggest challenge to any technology adoption was on the operations side. The teams responsible for running all our technology are both underappreciated and under-resourced.
We were investing significant time and dollars enabling businesses to benefit from data, and yet we were missing the core benefit of applying data to make operations more reliable. My passion for helping customers solve problems came to a head when I saw this need. As difficult as it was to leave a fantastic job at a high growth company, I had a vision for a solution that could only be fulfilled by starting Rocana.

At Rocana we help some of the largest companies in the world operate the technology infrastructure that runs our world. Cloud computing, which you’ve probably heard of by now, is really a concentration of computers where you can rent access to small or large chunks of processing and storage by the hour. If you want a thousand computers for just a few hours, you can get them instantly and pay for the time you use without buying any hardware or finding a place to put it. The cloud is an incredibly powerful structure and also much more complicated to operate than a single purpose computer. Rocana’s software helps companies operate complex infrastructure like cloud computing. The company was born out of the realization that the best way to understand the behavior of a complex system is to collect as much information as possible and find intelligent ways to sort through it.

As I talk to people about Rocana, my experience is that most still don’t realize that it’s possible to collect near limitless volumes of data. And when I explain to people how and why to collect all this data, it takes a while before they really understand what they can do with it. A good way to think about the value of data is to consider the observation and pattern matching that we all do as humans. What we see and how we think can be extended and enhanced by machines in ways that have only recently been made possible. You can still go to the library, find a bunch of books on a topic, scan the indexes and identify the relevant bits about any topic.
Google has figured out how to index nearly every topic and make the answers available instantly. In fact, they now can summarize the results automatically.

To give you some more examples of how people are using data to impact our lives, consider that during the crash of the financial markets eight years ago, the companies that lost money were buying and trading complex financial instruments without understanding what was behind them: really bad mortgages. The companies that made money were both those that spent time looking at the underlying data and those that were algorithmically analyzing every single quote on every single trade across the entire market and betting, sometimes for the length of time it takes for a signal to transmit from NY to Chicago, that they could find someone to pay a fraction of a cent more on a handful of shares of something. This is data analysis to the extreme, and the companies that did this well not only made a fortune, they buffered what would have been a completely uncontrolled crash of the markets with added liquidity.

If anyone here uses Facebook, you know that the reason it’s a free service is that they sell spots in your timeline to advertisers, just like the major TV networks sell ad time during free TV shows. Facebook analyzes every single thing you do, sometimes in aggregate and sometimes, algorithmically, in detail, to show you hopefully not just relevant ads, but ads that make them money. The company made nearly $18 billion last year, up almost 50% from the previous year, by showing ads to 1 billion people a day. If you want to see a hockey stick growth chart, look at Facebook’s quarterly revenue for the past five years.

Those are two very capitalistic examples, and yes, data science originated in finance and advertising. In fact the guy who created the data science team at Facebook was an analyst at Bear Stearns. It turns out that data can be used to optimize things other than profits.
Data has long played an intrinsic role in investigative journalism. The phrase “follow the money,” originally associated with the Watergate scandal, is emblematic of gathering data to corroborate a story. Today data is used by journalists both for good and bad. Statistics are regularly abused and more often misunderstood or misrepresented in order to make a case rather than deeply understood and presented as factually objective. There’s a relatively new group at Tufts called Enigma that’s been holding symposiums and helping teach students how to mine for data, analyze it and present it responsibly. The most important part of data journalism, just like any journalism, is citing sources. Data analysis is a science and all the results should be reproducible.

Data is being used in healthcare to see things that humans cannot. Researchers at Case Western have demonstrated that data from magnetic resonance imaging scans can highlight different textures in tissue samples when analyzed algorithmically. The researchers have been able to train an algorithm to discern between various types of cancer. Data is also powering outpatient and follow up care. Express Scripts, which processes the prescriptions for 30% of the US, is using that data to identify factors of non-adherence and take different actions based on predicted behavior. Something as simple as a reminder text or call to take a prescription or automatically sending refills can significantly reduce complications and rates of hospital re-admittance. They can also identify possible instances of drug abuse by looking at prescription patterns.

Farming has been developed more by brute force than by science for millennia as people cross bred plants for certain characteristics without understanding the underlying genetics. That’s all been changing. If you’ve heard about the internet of things and self driving vehicles, these data heavy technologies are already being used in farming.
Think Norman Borlaug scale innovation happening on a regular basis. What’s interesting is that the same companies that used brute force genetic modification and chemicals to improve yields are now turning to data to understand what changes can have positive or negative impacts beyond just basic yield. This includes being able to automatically track, down to the square meter, the seeds and their planting and growing conditions, and compare results across millions of acres.

You’re probably familiar with online courses, and Tufts is involved in the open courseware initiative. The cutting edge of data in online courses is happening at companies like the Apollo Group, which runs the University of Phoenix. The same way people track online activity to show ads, Apollo Group studies how people learn and optimizes the online courseware. Their goal is to recreate on a massive scale the same experience that people had when there was individual apprenticeship and personalized attention to learning. Local companies like Ellevation Education are applying similar data concepts to help people learn English.

These are all entrepreneurial expeditions. They each have a vision and are using data to help them fulfill that vision. What they all have in common is that they realize the value of collecting more data than you think you might need. Collecting more data forces you to think about what you’re trying to achieve and what questions you’re trying to answer. If you’re familiar with behavioral economics, an overwhelming amount of data can help break the “What You See Is All There Is” mentality. More data increases the resolution of the picture you’re looking at, reducing the chances that you find misleading patterns or completely miss the right patterns.

A talk about data and entrepreneurship isn’t complete without covering some startups. Today, every new company is a technology company and a data company.
Some are creating the tools to make it easier to collect and analyze data, and others are putting these tools to work. Rocana creates software that helps people use data to manage large scale IT infrastructure, and we rely on tools from companies like Cloudera. There’s a local company founded by Andy Palmer and Mike Stonebraker called Tamr that uses machine learning to help people match data across catalogs such as purchasing systems and clinical databases.

One of the most interesting emerging industries is data security. Just like network security exploded as an industry when we started connecting corporations to the Internet, data security is becoming an important factor in managing corporate data that is getting stored in the cloud.

Because of how entrepreneurs use data, I can push a button and have a car waiting outside to take me (almost) anywhere. That’s a data company that is changing the fundamentals of not just taxis but car ownership as a whole. There are a dozen companies, big and small, using data to create a self driving car that can drive better than any of us here. Combine those two and you may see the most dramatic drop in car accident related injuries and deaths since the introduction of the airbag. Look at the rate of advances in electric motor and battery charging technologies and you’ll see a change in the trucking industry that will cause ripples throughout the economy and foreign policy that no one can begin to forecast.

I met my wife over twenty years ago when we were both in high school and the idea of meeting someone online was cause for ridicule. As a result, I have no idea whether swiping left or right is the way to indicate that you like what you see. What I do know is that these concepts, combined with the types of algorithms that power companies like Match.com, are changing not only dating but recruiting, entertainment, tourism and even college admissions.
In fact, there’s a company based in Japan called Recruit that was started in the 1960s to help students find their first job using surveys and data analysis. Over the years they’ve expanded both the scope of their services and the sophistication of their analysis. From recruiting services they got into travel, nightlife, matchmaking, wedding planning, car and house rental and purchase, education and career services, coming full circle. All of these are increasingly driven by data.

The use of data in marketing, advertising and product recommendations introduces new challenges around the ethics of using data and raises the question of whether our expectation of privacy has changed. While companies now know more about us than ever before, they’ve always known more than we might expect. Before big companies were the norm, when everyone bought their sundries from the local drug store, the druggist knew everything about his customers’ lives. The banker knew how much we earned and spent. The hairdresser knew all the town gossip. We just expected them not to use this information to sell to us, even if they were using it to help us. Nielsen families reveal lots of information about their daily lives, but that information is used in aggregate, not to target them individually.

The ready availability of data has changed the equation for how data is used to target or help us personally. Older generations see this as creepy while we younger generations have come to accept it and even expect it. If I go to a website and it doesn’t help me find what I’m looking for, or if I’m shown advertising and it’s not relevant, I get offended.

There have been discussions about creating a Data Science major here at Tufts. Many schools now offer undergraduate and graduate degrees in Data Science, including MIT, Case Western, Rochester, and WPI. Looking at the curriculum and extracurricular activities at Tufts, there is already a lot of data related education happening.
I encourage you all to think about the data that’s available to further the entrepreneurship that takes place here every day. Data is not just a computer science or social science skill, and it’s about more than just statistics and probabilities. Data is just like language: you need to know how to use it, and you need to understand the ethics of collecting and applying it. Data analysis has become a fundamental skill, and Tufts has an opportunity to teach students how to use data to their advantage, to fulfill their visions.