Videos provided by Strata Conference via O'Reilly YouTube Channel
Join us at O'Reilly's Strata Conference in Santa Clara to see the future of big data—as well as the analytics, architectures, techniques, and tools you need to put it to work.
Crossing the Chasm has been a key reference point for high-tech marketing since its publication in 1991, but a lot has changed since then, especially with the rise of cloud computing, software as a service, mobile endpoints, big data analytics, and viral marketing. This has led author Geoff Moore to produce a revised edition, released on January 28, with all new examples taken from the last decade and two new appendices to help bridge the gap between what's new and what's not. Join Geoff as he highlights lessons learned in bringing disruptive innovations to market in the 21st century.
Our legacy information architecture cannot cope with the realities of today's business. Because storage and compute are separated, it cannot scale to meet our SLAs; nor can it economically store the volumes and types of data we now confront, provide the agility necessary for innovation, or, most importantly, provide a full 360-degree view of our customers, products, and business. In this talk Dr. Amr Awadallah will present the Enterprise Data Hub (EDH) as the new foundation for the modern information architecture. Built with Apache Hadoop at its core, the EDH is an extremely scalable, flexible, and fault-tolerant data processing system designed to put data at the center of your business.
To harness data one must first have data. Effective collection of massive amounts of data can be a challenge, and crowdsourcing may well be an efficient solution in some situations. The International Barcode of Life and Technical University of Munich (TUM) ProteomicsDB projects are two great examples of collecting and gathering data via crowdsourcing. The former will crowdsource the collection of DNA samples around the world by enabling citizen scientists to provide insect and plant specimens in return for identification and detailed information about their organisms. The aim of the consortium of institutions across 25 nations, including Canada, the United States, Germany, and China, is to create a database containing a DNA-based barcode for every species in the world, with a goal of 500,000 species by the end of 2015. Crowdsourcing the data via a consumer-style application is seen as key to achieving this. The TUM ProteomicsDB project focuses on crowdsourcing data from within a given scientific community, but it nonetheless relies on crowdsourcing to fill its burgeoning data store. The project stores protein and peptide identifications from mass spectrometry-based experiments, and the data assembled provides identification of proteins mapping to over 18,000 human genes, representing 90% coverage of the human proteome. It currently contains more than 11,000 datasets from human cancer cell lines, tissues, and body fluids, enables real-time analysis of this highly dimensional data, and creates instant value by allowing analytical hypotheses to be tested. The crowdsourced data stored and analyzed within ProteomicsDB can be used in basic and biomedical research for discovering therapeutic targets and developing new drugs, as well as enhanced diagnostic methods. SAP is proud to be involved with driving the success of both these projects.
This keynote is sponsored by SAP
Humans are constantly curious and learning should be about making new discoveries. With big data, we have the potential to take formal learning which is taught and combine it with informal learning which is experienced, to create personalized learning paths for every individual. Ramona's personal story set her on this path and she will share what it means for companies and people across the world.
This five-minute keynote will provide a quick overview of some of the more surprising things Hadoop is capable of.
This keynote is sponsored by MapR Technologies
We feel safer in big numbers, and we believe that numbers don't lie. But numbers don't actually speak for themselves -- people speak for them. The interpretive layer matters -- research is not design, but design is nothing without research. And when we're really trying to understand the how, the why, and the why not; when we need to inspire creative solutions and anticipate user needs, we need to talk to real people, even if it means we only talk to a few of them, and even when they might be "lying".
How does the world change when big data reaches a billion people? What happens when anyone, from farmers to criminal investigators, gains the power to quickly derive meaningful insights from vast and varied data sources? Join Quentin Clark, Microsoft Corporate Vice President, who will highlight how simple, familiar tools and cutting-edge cloud technologies are bringing big data to all.
Eli Collins discusses Cloudera's Enterprise Big Data Hub and gives his insights on Big Data and the federal government.
In elite sports, the gap between legend and anonymity is often less than a 1% difference in performance. Thus, finding the core, modifiable variables that determine performance and tweaking them ever so slightly can alchemize silver medals into gold ones.
Quentin Clark highlights a few of the important topics from his keynote and discusses how Microsoft is delivering in the data ecosystem.
Ever see a death-defying stunt and wonder, "How did he practice that without getting killed?" By peering into the layered nature of practice through the lens of what high-level pro skateboarders do, we'll investigate how the nuanced way in which they go about it can be the biggest determinant of success, especially at groundbreaking levels. Tuning the dynamic balance of the analytic methods by which we learn with the internal sense or feel for the whole of what we're learning is an art, specific to the individual as well as the task. The better we tune our practice, the more practice will make perfect.
Vince Dell'Anno discusses big data trends, machine-to-machine learning, and the Internet of Things.
Eric Frenkiel discusses his enterprise-focused startup, MemSQL, and its recent challenges and wins.
JC Raveneau discusses SAP's Big Data BI strategy and solutions, as well as the coming wave of machine-generated data from the Internet of Everywhere.
Paul Kent discusses the new Visual Analytics tool by SAS and the technologies that make it possible.
Since 2008, Epstein has co-authored several of SI and SI.com's most high-profile investigative pieces, including the revelation that Yankees third baseman Alex Rodriguez tested positive for steroids in 2003, and an investigation that revealed a pattern of NCAA violations under former Ohio State football coach Jim Tressel. Audiences from the education, business, and health worlds will be captivated by the knowledge and experience Epstein possesses on the topic of sports. His speeches will alter the public perception of how athletes train and work toward success through learned performance from a genetic and biological perspective. Tying this in with how companies function today, the wisdom Epstein shares will produce irreplaceable lessons for all.
If Big Data is the grand challenge of our time, most analytic effort is like ground control: the hard work behind the scenes that enables ambitious analysis to occur. Speaking from two perspectives, in tech and design, we describe "Predictive Interaction," a new model for high-level human-data interaction that radically improves the productivity and accessibility of the most time-consuming work in the analytics lifecycle.
Big Data technologies enable us to build the digital brain of smart systems. I will illustrate with examples how we build a digital brain by collecting data from a large number of sensors and using the brain to find value in that data. We build a Data Lake using cutting edge technology from Pivotal and use it to store large amounts of sensor and other data. Then we can find patterns in that data by applying the Data Science methodology using sophisticated machine learning and statistical algorithms customized to run on big data within the Data Lake. Armed with these patterns the system can detect anomalies and respond in an appropriate manner. Data Science combined with sensors and actuators can make a system smart!
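The anomaly-detection step described above can be sketched in miniature: learn what "normal" looks like from sensor readings, then flag readings that deviate from it. The z-score rule and the sample temperatures below are illustrative stand-ins, not the customized machine-learning models the talk describes running inside a Data Lake.

```python
import statistics

def find_anomalies(readings, threshold=3.0):
    """Flag readings more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(readings)
    stdev = statistics.stdev(readings)
    return [(i, x) for i, x in enumerate(readings)
            if abs(x - mean) > threshold * stdev]

# Hypothetical temperature sensor: steady around 20 C, with one spike
# the smart system should detect and respond to.
temps = [20.1, 19.9, 20.0, 20.2, 19.8, 35.0, 20.1, 20.0]
print(find_anomalies(temps, threshold=2.0))  # flags the reading at index 5
```

In a production pipeline the same pattern-then-flag logic would run over streaming data at scale, with far richer models, but the shape of the computation is the same.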
At Intel, we envision a future in which every organization in the world can use new sources of data to enhance its operational intelligence, fostering discoveries and innovation in science, industry, and medicine. To accelerate this transformation, Intel and its partners are offering customers a new choice: advanced analytics tools like graph processing and machine learning running on open software platforms like Hadoop and Lustre using new computing architectures that enterprises can deploy with agility and confidence. Join Boyd Davis, VP & GM of Intel's Datacenter Software Division, to get a glimpse into the future of a datacenter-scale open source operating environment.
While the first big data systems made a new class of applications possible, organizations must now compete on the speed and sophistication with which they can draw value from data. Future data processing platforms will need not just to scale cost-effectively, but to allow ever more real-time analysis and to support both simple queries and today's most sophisticated analytics algorithms. Through the Spark project at Apache and Berkeley, we've brought six years of research to enable real-time and complex analytics within the Hadoop stack.
Big Data without analytics is just data, but how do you perform the analytics? For many, this is a sequential process: process your big data and then move it to where you can perform analytics. But what happens when you can perform analytics wherever your big data sits? This notion of In-Hadoop analytics is changing the game for the possibilities of Hadoop.
This keynote is sponsored by IBM
Cesar Rojas discusses some of the 2014 trends in the data ecosystem and what he hopes to see in the next year.
Jack Norris discusses a few items that are important for the development of the Hadoop ecosystem and what he hopes to see in 2014.
Boyd Davis shares the Intel perspective on the big data and Hadoop ecosystem, from openness to the democratization of tools and services.
Kaushik Das discusses the success of Pivotal One, Pivotal's new PaaS, and their integration of agile development with data science.
Anjul Bhambhri discusses her insights on In-Hadoop analytics and shares how IBM's Watson group will affect the data and Hadoop community.
Hadley Wickham is Chief Scientist at RStudio. He is an active member of the R community, has written and contributed to over 30 R packages, and won the John Chambers Award for Statistical Computing for his work developing tools for data reshaping and visualisation. His research focusses on how to make data analysis better, faster and easier, with a particular emphasis on the use of visualisation to better understand data and models.
Joe Hellerstein gives a brief history of Trifacta and discusses the new solutions they are providing to the market.
How do we know how many people have been killed in Syria? If violence is escalating or decreasing? The hard answer is we don't. But through careful application of machine learning and other statistical techniques, we can quantify what we do, and don't, know. In this talk Megan will present how the Human Rights Data Analysis Group uses random forests, multiple systems estimation, and various Python and R packages to estimate conflict casualties.
Strata 2014: Ben Fry, "Keynote with Ben Fry"
When failure becomes invisible, the difference between failure and success may also become invisible.
We each want to dissect and apply the lessons gained from the life stories of diet gurus, celebrity CEOs, and superstar athletes. We'd all like to deconstruct success and reconstruct it in our own lives, but looking to the successful for clues about how to better live our lives is, at best, an incomplete strategy and, at worst, a giant waste of time.
David McRaney will tell the story of how the Department of War Math in World War II helped bring to light the psychology of how we miss what is important when it comes to failure, and how the modern understanding of the psychology of luck provides the best game plan for getting the best out of life.