http://petewarden.com/
https://metamarkets.com/2012/data-scientist-interview-pete-warden-co-founder-of-jetpac/
http://www.datascienceweekly.org/data-scientist-interviews/object-recognition-pete-warden-interview-co-founder-cto-jetpac
http://petewarden.com/2013/12/04/what-does-jetpac-measure/
Data Scientists at Work Paperback – December 8, 2014
by Sebastian Gutierrez
http://www.amazon.com/gp/product/1430265981/
Pete Warden: "... object recognition is a crucial foundational technology for the future, but because it's currently so unreliable it's been hard to build any consumer applications around it. The problem is that object recognition is incredibly hard, and even the best [machine-learning] algorithms make a lot of mistakes. If you're doing a search application, these mistakes mean a lot of bogus images showing up in the search results."
Pete Warden: I would really tell them to find a [Data science] project that they actually care about. Grab a problem that’s interesting and has some data floating around somewhere, and take it all the way from figuring out where to get the data, to processing them, to analyzing them, to visualizing them, to trying to come up with something actionable at the end. That’s really what I feel like distinguishes data scientists from statisticians, or database analysts or these other more specialized roles, is that we’re able to do this full stack of stuff. It feels to me like a very, very hands‑on and practical thing. You should just be able to find something out there. There’s so much data flowing around now. There’s so many interesting problems in the world. You really should be able to find something in an area that you’re interested in, and even just do a visualization to show something.
http://www.catehuston.com/blog/2014/12/31/sigh/
http://petewarden.com/2014/10/25/how-to-run-the-caffe-deep-learning-vision-library-on-nvidias-jetson-mobile-gpu-board/
https://metamarkets.com/2012/data-scientist-interview-pete-warden-co-founder-of-jetpac/
I learned a way of thinking about problems from these really smart people who’d spent decades working on these problems that hadn’t made it into all of these other areas where they’re actually really useful. That’s really the thing that I found talking to people a lot—data science is a very fuzzy term, but what seems to unite a lot of people is that they’ve come from a lot of different directions and ended up in the same place. It almost feels like this crossroads where a lot of different disciplines actually meet and share and exchange tools and techniques and talk and collaborate.
people generally liked actually sharing knowledge and helping out.
http://petewarden.typepad.com/searchbrowser/2010/02/how-to-split-up-the-us.html
http://readwrite.com/2009/03/20/the_inner_circles_of_10_geek_heroes_on_twitter
http://www.datascienceweekly.org/data-scientist-interviews/object-recognition-pete-warden-interview-co-founder-cto-jetpac
Object Recognition space
I specialize in uncovering new information from discarded sources, mining neglected data exhaust, so most of the work I do is the initial extraction of useful features from apparently useless noise. Once I have the data, most of the analysis is fairly primitive database joins, sums, and division. We use machine learning, neural networks, and a lot of other fancy approaches to analyze the images, but Excel formulas are key too. A lot of people underestimate the usefulness of old-school data tools like spreadsheets.
http://www.datasciencetoolkit.org/
http://radar.oreilly.com
Data Scientists at Work Paperback – December 8, 2014
by Sebastian Gutierrez
http://www.amazon.com/gp/product/1430265981/
Data Scientists at Work is a collection of interviews with sixteen of the world's most influential and innovative data scientists from across the spectrum of this hot new profession. "Data scientist is the sexiest job in the 21st century," according to the Harvard Business Review. By 2018, the United States will experience a shortage of 190,000 skilled data scientists, ...
http://petewarden.com/2013/12/04/what-does-jetpac-measure/
Image-based measurements - The most important information we pull out is from the image pixels. These tell us a lot about the places and people who are in the photos, especially since we have hundreds or thousands of pictures for most locations.
One very important difference between what we're doing with Big Data and traditional computer vision applications is that we can tolerate a lot more noise in our recognition tests. We're trying to analyze the properties of one object (a bar for example) based on hundreds of pictures taken there. That means we can afford to have some errors in whether we think an individual photo is a match, as long as the errors are random enough to cancel themselves out over those sort of sample sizes.
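A rough way to see why per-photo errors matter less at these sample sizes is to compare two venues under a noisy detector. This is only a sketch, not Jetpac's method: it assumes every photo is classified independently, borrows the 0.15 recall quoted later in the post for the plate detector, and uses a made-up false positive rate and made-up venue fractions.

```python
import math

RECALL = 0.15               # recall quoted in the post for the plate detector
FALSE_POSITIVE_RATE = 0.01  # hypothetical value, not from the post


def expected_hit_rate(true_fraction):
    """Fraction of a venue's photos the detector is expected to flag."""
    return RECALL * true_fraction + FALSE_POSITIVE_RATE * (1 - true_fraction)


def standard_error(rate, n_photos):
    """Standard error of an observed flag rate over n independent photos."""
    return math.sqrt(rate * (1 - rate) / n_photos)


def separation(fraction_a, fraction_b, n_photos):
    """Gap between two venues' flag rates, in units of its standard error."""
    rate_a = expected_hit_rate(fraction_a)
    rate_b = expected_hit_rate(fraction_b)
    combined = math.hypot(standard_error(rate_a, n_photos),
                          standard_error(rate_b, n_photos))
    return abs(rate_a - rate_b) / combined


# Hypothetical venues: 30% of photos show food at one, 5% at the other.
for n in (10, 100, 1000):
    print(n, round(separation(0.30, 0.05, n), 2))
```

With ten photos per venue the two are indistinguishable from noise. With a thousand, the gap is several standard errors wide even though the detector misses most food photos. If errors were correlated within a venue, the independence assumption would fail and the averaging would help less.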
Testing - Internally, we use a library of several thousand images that we've manually labeled with the attributes we care about as a development set to help us build our algorithms, and then a different set of a thousand or so to validate our results. All of the numbers are based on that training set, and I've included grids of one hundred random images to demonstrate the results visually.
We're interested in how well our algorithms correlate with the underlying property they're trying to measure, so we've been using the Matthews Correlation Coefficient (MCC) to evaluate how well they're performing. I considered using precision and recall, but these ignore all the negative results that are correctly rejected, which is the right approach for evaluating search results you're presenting to users, but isn't as useful as a correlation measurement for a binary classifier.
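To make the point about correctly rejected negatives concrete, here is a small sketch using the standard MCC formula over a binary confusion matrix. The counts are invented for illustration and are not Jetpac's data. The two cases differ only in the number of true negatives.

```python
import math


def precision(tp, fp):
    return tp / (tp + fp)


def recall(tp, fn):
    return tp / (tp + fn)


def mcc(tp, fp, fn, tn):
    """Matthews Correlation Coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally reported as 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts: same tp, fp, fn; only true negatives change.
for tn in (96, 896):
    tp, fp, fn = 15, 4, 85
    print(f"tn={tn}: precision={precision(tp, fp):.2f} "
          f"recall={recall(tp, fn):.2f} mcc={mcc(tp, fp, fn, tn):.2f}")
```

Precision and recall come out identical in both cases because neither formula includes the true negative count, while MCC rises as more negatives are correctly rejected.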
Example: Pictures of Plates = Foodies - We run an algorithm that looks for plates or cups taking up most of the photo. It's fairly picky, with a precision of 0.78, but a recall of just 0.15, and an MCC of 0.32. If a lot of people are taking photos of their meals or coffee, we assume that there's something remarkable about what's being served, and that it's popular with foodies.