Saturday, December 31, 2016

Year in Review

It's been quite an interesting year to say the least.

Lots of new Machine Learning and Deep Learning tools and libraries were released into the wild, reducing the barriers to entry.

I'm really hoping to turn a corner and do more writing next year. I just can't seem to be able to shake off my writer's block.

Here again is Jeff Leek's Non-comprehensive list of awesome things other people did in 2016

Sunday, July 24, 2016

Data Science Bootcamp Reviews

This post is also posted in whole / part here :

When we started this, our primary goal was to bring to light as much information as we could regarding Data Science Bootcamps. We initially published several in-depth interviews with bootcamp founders. We’re still working on a few more and we are also embarking on the next stage.

We contacted and conducted detailed interviews with individuals who have graduated from different Data Science Bootcamps and as you might guess, we heard a lot of very interesting anecdotes.

We noticed a disparity between what we heard from the individuals we interviewed about bootcamp placements and outcomes and the information some bootcamps put out there.

We actually don’t think Data Science Bootcamps should guarantee placements or positive outcomes but a lot of them do imply it by using wordplay, sleight of hand or displaying statistics that are either out of date or are aggregates which may not be very useful.

A better approach, in our opinion, would be for the Data Science Bootcamps to publish 3 and 6 month post-mortems or detailed placement reports (covering at least 6 months) for each cohort they graduate. A prospective bootcamp student would probably find it more useful to know that for a cohort, 30% of the students were not looking for a job, 15% decided this wasn’t the right path for them and of the remaining 55%, most ended up with placements or positive outcomes, versus just hearing that the cohort had an 80% placement rate without any other information. So a 100% placement rate for a cohort might not always be as good as it sounds. You only have to look behind the curtain at the details.
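To make the arithmetic concrete, here is a minimal sketch of how the same cohort can produce very different "placement rates" depending on the denominator. The function name `placement_breakdown` and the cohort numbers are hypothetical, made up purely for illustration; they do not describe any real bootcamp.

```python
def placement_breakdown(cohort_size, not_looking, changed_path, placed):
    """Return placement rates against job seekers and against the whole cohort."""
    # Only students who are actually looking for a job count as seekers.
    seekers = cohort_size - not_looking - changed_path
    if seekers <= 0:
        raise ValueError("no job seekers in this cohort")
    return {
        "seekers": seekers,
        "rate_among_seekers": placed / seekers,
        "rate_of_whole_cohort": placed / cohort_size,
    }


# Hypothetical cohort of 20: 6 not looking (30%), 3 changed path (15%),
# 11 actively seeking (55%), 9 of whom were placed.
report = placement_breakdown(cohort_size=20, not_looking=6, changed_path=3, placed=9)
print(report)
```

In this made-up cohort the rate among job seekers is roughly 82%, while the rate against the whole cohort is 45%. Neither number is wrong, which is why the breakdown matters more than the headline figure.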

We know for a fact that some bootcamps kick people out that they feel will not be able to find a job and sometimes don’t include individuals that fall off the radar or aren’t able to find jobs in their placement stats.

People attend these Bootcamps for very different reasons. For some it’s probably to transition to a Data Science or Data related role; for others, it could be to skill up and then make lateral moves within their organization or to work on their ideas and personal projects.

Over the next few weeks and months, we will be publishing some of these Data Science Bootcamp reviews here.

If you have attended and graduated from a Data Science Bootcamp and you’d like to do a review of your experience, we’d love to hear from you. Please fill out this form and we’ll reach out to you to conduct the short interview.

Having this information out there helps prospective Data Science Bootcamp students understand the dynamics of each of these bootcamps and the value it will deliver for them. This will also give them enough information to decide which program is the best fit for them / their goals.

Friday, April 8, 2016

Some more interesting links-6, TensorFlow, Falcon 9 #falconhaslanded

Google open sourced TensorFlow, one of its machine learning interfaces [PDF] [Slides]
Jeff Dean on TensorFlow
TensorFlow meetup recording

Startup Pitch Decks

Machine Intelligence Landscape 2.0

Google Self Driving Car

OpenAI is really looking more like Xerox PARC and Bell Labs

And the #falconhaslanded. Falcon 9 first stage landing on a Drone Ship. I guess with Elon Musk it was always really a matter of "when" and not "if". Another step closer to re-usability and Mars

Thursday, December 31, 2015

Year in Review

It's been quite a year.

It appears we're moving closer to the Hardware + Software + AI singularity and all the stuff that comes with that... and it's kind of scary.

Here again is Jeff Leek's Non-comprehensive list of awesome things other people did in 2015.

Some of the majors like Google, Facebook, Baidu and Microsoft open sourced some of their internal Deep Learning tools / frameworks. Most of the value coming from these tools will be interesting and useful products built with / on them.

Also, this one literally sent chills down my spine - Landing of Falcon 9 first stage. I guess we're one step closer to Mars and becoming a multi planetary species.

I didn't put much content out there this year. I'm hoping to do more writing next year.

Stay tuned...

Wednesday, August 5, 2015

Some more interesting links-6, Movie Math, Random Forests, TED, Unicorns

Remember that Math problem Matt Damon solved in Good Will Hunting? ... It turns out this problem is actually accessible to us mere mortals. Do check out this awesome video explanation

I'm a fan of TED talks. Here is a nice playlist of interesting Data Related TED talks

A very detailed coverage on feature transformations

Random Forest workhorse : [Paper 1] [Paper 2]

Nice coverage on python

Deep Learning libraries by language

Sam Altman's Startup Class

YC Open Office Hours, Fellowship [1] [2], Research and Blog

Just in case anyone was keeping score, TC's Unicorn Dashboard

Sunday, July 19, 2015

Book Review : Elon Musk - Tesla, SpaceX and the Quest for a Fantastic future

This book gives you a glimpse into the man, the machines and companies he has built, and the trials you'll face as an entrepreneur. While reading you'll experience occasional bursts of laughter. This is quite an interesting read.

This is the second biography I've bought and read (the first was of Robert Oppenheimer, one of the principal architects of the Manhattan Project).

Elon Musk was born in South Africa and moved to Canada when he was 17 to attend college. He eventually transferred to the University of Pennsylvania to continue his studies.

I keep wondering what would have happened if he hadn't been able to make it to the US. The companies he founded or is/was involved with at a high level - PayPal, SolarCity, Tesla, SpaceX - which collectively employ tens of thousands of people, may never have happened. I guess we'll never know.

This guy is transforming three multi-billion dollar industries and their derivatives at the same time. Truly a modern day Renaissance man.

Just imagine.. in our lifetimes (in 20 or 30 years or maybe even less), humans will have boots on the ground on Mars and Elon Musk is leading the vanguard here.

On a recent visit to the Tesla Factory in Fremont, he seems to be owning the whole Iron Man / Tony Stark comparison.

Friday, May 15, 2015

Choosing a Data Science Bootcamp program? - questions to ask, things to look for and look out for

Over the past year, I have had the opportunity to speak with a lot of prospective Data Science bootcamp students sharing my pre and post bootcamp experiences and helping them put in context some of the major factors they need to consider before deciding to attend a Data Science bootcamp. This post is a summary to shed some light on some of those thoughts I've shared privately with prospective Data Science bootcamp students, things they should look for and things to look out for.

The list below may not be all-encompassing, as each prospective Data Science bootcamp student is unique in their own way and in what they hope to get out of the experience.

Do keep in mind this list was put together for those considering full time Data Science bootcamp programs.

Without further ado.. here we go

Background : Data Science is a hybrid role. Having a background with the right mix of Quantitative skills, Programming, Statistics, Math, Business Acumen, Databases, and Machine Learning would probably work in your favor. 6 or 12 weeks is a very short time to learn these things from scratch.

Also, having a good background improves your chances of getting into one of these Data Science bootcamp programs. I hear they're getting quite competitive these days.

Cohort Makeup : At most bootcamps, half of your learning comes from lectures and interactions with the instructors, TAs and guest speakers. The other half comes from working and collaborating with your cohort mates on the course materials and projects. It is important that a cohort have people with a diversity of past educational / professional experiences. Your cohort mates will become your friends, co-workers, collaborators and maybe even co-founders.

Placement Rate : This is a really interesting one. A bootcamp with 100% placement may not always be the best choice. I've heard some bootcamps drop students who they feel may not be able to find a job and don't include them in the numbers. Prospective students have to dig deeper on the placement rates and ask the following questions:
  • Percent of students placed in actual Data Science roles
  • Percent of students placed within one month or three months of finishing the program
  • Percent of students placed through Hiring Day 
  • Percent of students placed through an introduction that the bootcamp made
  • Percent of students actually looking for a job post bootcamp
  • Median Salaries for students placed
  • Salary Range for students placed
Going through a Data Science bootcamp is definitely not a silver bullet. There are a lot of people that go through these programs and still end up with non-optimal outcomes.

Hiring Day : As far as I know, most of the Data Science bootcamps have a hiring day event where students get an opportunity to present their capstone projects to potential employers and "speed date" with those employers. Some Data Science bootcamps have exclusive hiring events with employers in their Hiring Network or guest lectures and presentations from companies that might be looking for talent.

Cost : This could be a major factor in deciding to attend a bootcamp. Data Science Bootcamp program tuition ranges from free to $16,000. There might also be other costs like room and board, incidentals, relocation and lost wages. What this mix looks like will be different for each prospective student.

One way to look at cost is that this is a short-term investment for a chance to break into a new career.

A good amount of the material you need to learn to become a Data Scientist is free and available on the internet. Some students that get admitted to these Data Science bootcamps could have chosen to lock themselves in a room for 6 months and study all this material and then emerge having learned all the material / skills required to be able to land a job.

This is entirely possible but you lose out on all the intangibles you get from attending an in-person Data Science bootcamp - mentoring from instructors/guest lecturers, structured learning, motivation, positive reinforcement, collaborating with cohort mates, networking, getting a different view on approaching and solving problems, etc. What these intangibles are worth / could be worth down the line should be carefully evaluated and added to the cost equation.

Interview Prep / Soft skills / Business Acumen : It is important to know how much time the Data Science bootcamp spends on soft skills, interviewing and white boarding. Most job interviews you go to may require you to work through programming problems, communicate the results of an analysis you may have worked on to a technical / non-technical audience, working through a modeling case study, etc.

These are skills you get better at with practice. Some Data Science bootcamps weave this in as part of the curriculum so the students are more comfortable with this by the end of the program whereas others may reserve time towards the last few weeks of the program to work on these.

Curriculum : It is difficult to learn everything you need to become a Data Scientist in 6 or 12 weeks. You want to look for a program that will give you enough breadth and depth and a good enough foundation to start and build a career in this field.

Location : Majority of the Data Science bootcamp programs are based either in the Bay Area, New York or scattered through Europe (London, Dublin, Berlin) and most graduates end up working in those places. I've seen some setup shop in other tech metros like Boston, Seattle and Denver.

Contact Alumni : There is a lot of information to be gleaned from talking to past students of Data Science bootcamp programs you're considering. You'll get a raw and unfiltered view of their experience.

Projects : You should look for programs that will enable you to work on a variety of projects with small, medium and large data. This way you'll have a broad range of experiences and a portfolio of interesting projects or analyses to talk about once you hit the interview trail.

In-Person vs Online : It is very difficult to replicate the collaborative environment of a full time in-person Data Science bootcamp in an online setting. Assuming there are no other extenuating factors, choosing a full time in-person bootcamp should be the preferred option.

Established vs New Programs : This is actually one of the most frequent questions I get. Prospective students are usually torn between going with a more established program which has gone through several cohorts and has established a track record versus a new and upcoming program which may have gone through one or two cohorts or is just getting started.

Prospective students need to evaluate Data Science bootcamp programs they're considering on their merits and the factors that are actually most important to the student. There are advantages going with either a more established program or a much newer one. The prospective student needs to do some introspecting after which the path they have to take becomes very obvious.

To keep things in context, none of these Data Science bootcamp programs existed 3 years ago.

Alumni Network : Generally, this is a perk. Going to a bootcamp with a strong alum network could sometimes make the difference. You could be exposed to opportunities and / or jobs that you may not otherwise have access to. Having access to an active and collaborative alum network is worth its weight in gold (or whatever precious metal you prefer)

Outcomes : Students going through Data Science bootcamps usually have different goals. For most, it's probably getting a Data Science / Machine Learning focused job. For others it could be gaining a skill set that'll enable them to work on their own ideas, break into the industry and / or move up the ladder in their current job. Whatever those goals are, going with a bootcamp that can work with you or even personalize some of the curriculum to ensure you're getting the best value for your time and money would be ideal.

As far as customizing the curriculum, some of the bootcamps with smaller cohorts will have a much easier time doing this.

These are outcome based programs so you should go with a program that'll give you the best chance of finishing with a positive outcome whatever that may be.

As with programming bootcamps, Data Science bootcamps are now becoming commoditized. Some of these Data Science bootcamps consider themselves Post Doctoral training programs while others want to own a different segment of the market.

If you're considering a Data Science bootcamp program, do go through this list and then pick the program that is the best fit and will deliver the highest delta / value for you.

Hopefully, this blog post will help start the conversation.

Saturday, March 14, 2015

In honor of Pi Day : Estimating the value of pi via Buffon's Needle

Last year we estimated the value of $\pi$ via Monte Carlo simulation. This year, we'll be revisiting the same exercise but using a different approach : Buffon's Needle. This approach is actually one of the oldest geometrical probability problems and it involves dropping needles on a lined sheet of paper and calculating the probability of the needles crossing lines in the page. This technique was first used by 18th century mathematician Georges-Louis Leclerc, Comte de Buffon.

In this scenario, we'll be dropping a bunch of randomly generated needles of length 1 on a grid with vertical lines. The spacing between the vertical lines is also of length 1. It turns out that you can estimate the value of $\pi$ by taking the ratio of the number of needles you dropped (Drops) to the number that crossed any of the vertical lines (Hits) and multiplying by twice the length of a needle. See this ipython notebook for code used.

The following two graphs show our grid with 100 and 1000 randomly generated needles respectively

Let's work through the math:

$ 2 \times \text{needle length} \times \frac{\text{Drops}}{\text{Hits}} \approx \pi $

where length of needle is 1 and the length of the spacing between the vertical grid lines is also 1
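For readers who want to try this without opening the notebook, here is a minimal standard-library sketch of the estimate. It is not the code from the notebook: the function name `estimate_pi_buffon` and its parameters are made up for illustration, and the `spacing` parameter is an added generalization (the formula assumes the needle is no longer than the spacing between lines). The needle direction is drawn by rejection sampling inside the unit disk so that $\pi$ is never used to generate the angle.

```python
import random


def estimate_pi_buffon(drops, needle_length=1.0, spacing=1.0, seed=None):
    """Estimate pi by dropping needles on a grid of vertical lines."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(drops):
        # Horizontal position of the needle center within one strip.
        x_center = rng.uniform(0.0, spacing)
        # Random direction via rejection sampling in the unit disk,
        # so pi is not needed to generate the angle.
        while True:
            dx, dy = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
            r2 = dx * dx + dy * dy
            if 0.0 < r2 <= 1.0:
                break
        # Half of the needle's horizontal extent.
        half_span = 0.5 * needle_length * abs(dx) / r2 ** 0.5
        # The needle crosses a line if it reaches x = 0 or x = spacing.
        if x_center - half_span <= 0.0 or x_center + half_span >= spacing:
            hits += 1
    if hits == 0:
        raise ValueError("no needle crossed a line; use more drops")
    # 2 * needle length * Drops / Hits, scaled by the line spacing.
    return 2.0 * needle_length * drops / (spacing * hits)


if __name__ == "__main__":
    for n in (100, 1000, 100000):
        print(n, estimate_pi_buffon(n, seed=2015))
```

Because the hit probability is $2/\pi$ for a unit needle on unit spacing, the estimate converges slowly; expect it to wobble noticeably for a few hundred needles and settle only with many more.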

The graphs below were generated from a few hundred trials. For each trial, we increased the number of randomly generated needles. We can see the estimated value of $\pi$ is about 3.12, which is a bit off from the true value of about 3.14. I suspect there might be something going on with how the random needle center coordinates are generated since the needle graphs above are showing some symmetry. Regardless, we are still within 1% of the true value of $\pi$.

It's actually pretty cool to see how the value of $\pi$ sneaks out from the woodwork. There's probably a more intuitive way to explain how $\pi$ shows up in places we least expect

For all the code used for this analysis, visit this ipython notebook

Saturday, February 28, 2015

Some more interesting links-5, YC Companies, ipython and more pandas

Comprehensive list of all YC companies and a comprehensive list of all accelerators / cohorts

Good article explaining *args and **kwargs  and generators in python

Pivot Tables in pandas and more pandas

Gallery of some of the best ipython notebooks

Interactive ipython notebooks

Monday, January 19, 2015

Slides from Getting Started with Vowpal Wabbit Talk #Vowpal Wabbit

Slides from Getting Started with Vowpal Wabbit talk

Wednesday, December 31, 2014

Year in Review

It's been a really interesting year.. I moved to the Bay Area. It's one thing to read about Silicon Valley or visit briefly. It's another to actually live out here and experience all it has to offer. This is the center of this data revolution everyone seems to be talking about. Obviously, if you can manage the ridiculously expensive housing and how much more expensive everything else is out here, then you should be fine.

To wrap up the year, here is Jeff Leek's Non-comprehensive list of awesome things other people did in 2014. It has an rlang slant since he's a statistician.

I had more blog posts and traffic this year than in the previous 3 years combined. Hoping this trend continues. Just looking at my traffic, it does appear there is a lot more interest in Data Science Education and immersive experiences like boot camps.

Going forward, I plan to do more tutorial style posts showing side projects or other interesting tech I encounter. 

I do want to spend more time delving into Deep Learning. Starting with the nuts and bolts and then moving to available libraries / implementations and sharing some of what I learn along the way... stay tuned 

Monday, December 22, 2014

Some more interesting links-4, Machine Intelligence, TDA, ipython notebooks

Most Topological Data Analysis tools are either stuck in academic research papers or Company intellectual property. DataRefiner might help to change that

Python for Exploratory Computing : Collection of ipython notebooks showing python basics, statistics and advanced python topics

A collection of ipython notebooks on hacking security data

This is the future of education: Open Loop University, where your education is spread over several years. You'll have periods of work with schooling interlaced in between

Detailed infographic showing major players in the Machine Intelligence space

You should look at this if you're interested in the Quantified Self space

I've been looking for something like this. Instant temporary ipython notebooks hosted in the cloud

An extensive Deep Learning Reading list

Nice reading on Generative vs Discriminative Algorithms (Naive Bayes - Logistic Regression)

Sunday, August 17, 2014

Getting Started with Vowpal Wabbit - Part 1 : Installation

After a very long hiatus, I'm back blogging. I'm really excited about how the year is shaping up.... stay tuned.

I discovered Vowpal Wabbit about a year ago but only recently started using it. Vowpal Wabbit is a very fast out-of-core learning system. It's the brainchild of John Langford, and development has been supported by Microsoft Research and (in the past) Yahoo Research.

This is the first part of a series about getting started with Vowpal Wabbit

To get started on OSX, you need to ensure you already have XCode and Homebrew installed. If you already have these installed, run the command below to update Homebrew

brew update

Vowpal Wabbit has a few dependencies that also need to be installed via brew. The official docs have the boost library as the only external dependency, but I was having a few issues until I installed automake and libtool

brew install automake
brew install boost
brew install libtool

In order to prevent conflicts with Apple's own libtool, a "g" is appended when you install libtool so you have instead: glibtool and glibtoolize. The code below adds a symbolic link.

cd /usr/local/bin/
ln -s glibtoolize libtoolize

Clone the Vowpal Wabbit git repo for the latest code

git clone
cd vowpal_wabbit

Then you should run

make install

If you are having set up issues or issues with dependencies, you may want to spin up a virtual machine. If you're on Ubuntu, you should run

sudo apt-get install vowpal-wabbit
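Once either install route finishes, a quick sanity check is to confirm the `vw` executable is on your PATH. Here is a minimal sketch in Python; the helper name `vw_version` is made up for illustration, and it assumes the binary is installed under its usual name `vw` and accepts the `--version` flag.

```python
import shutil
import subprocess


def vw_version(executable="vw"):
    """Return the version string printed by the vw binary, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    # Ask the binary for its version; do not raise on a non-zero exit code.
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    version = vw_version()
    print(version if version else "vw not found on PATH")
```

If this prints "vw not found on PATH", the install did not put the binary where your shell can see it, which is usually the first thing to rule out.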

Two of the most informative blogs out there with great coverage of Vowpal Wabbit are MLWave and  FastML (it looks like this is behind a paywall)

Sunday, April 27, 2014

Some more interesting links-3, Quantified Self, Bandits

Extreme Quantified Self. This MIT professor analyzed about 90,000 hours of video / 140,000 hours of audio / 200 terabytes of home videos to understand how his child's speech developed. This is probably one of the coolest things I've seen. He started a company (Bluefin Labs) around the technology he used for the analysis and then sold that company to Twitter for a fat wad of cash. This is his TED talk

You definitely want to utilize the resources at your local public library. These days they have amazing resources like access to Safari which gives you access to O'Reilly and Packt titles

Some very good advice if you're on the interview trail - Always Be Coding

A nice visualization / simulation of what's happening in a Multi-armed Bandit problem

Friday, April 25, 2014

Zipfian Academy - All 12 weeks

Here you go.. a week to week summary of my experience at Zipfian Academy

Week 1 : Zipfian Academy - Priming the Pump , some Unix shell, python, recommenders and data wrangling 
Week 2 : Zipfian Academy - Are you Frequentist or Bayesian ? 
Week 3 : Zipfian Academy - Multi-armed bandits and some Machine Learning 
Week 4 : Zipfian Academy - Oh SQL, Oh SQL... MySQL and some NLP too 
Week 5 : Zipfian Academy - Graphs and Community Detection 
Week 6 : Zipfian Academy - The Elephant, the Pig, the Bee and other stories 
Week 7 : Zipfian Academy - Advanced Machine Learning and Deep Learning 
Week 8 : Zipfian Academy - Assessment and Review 
Week 9 : Zipfian Academy - Personal projects
Week 10 : Zipfian Academy - Closing the loop ... rinse..repeat 
Week 11 : Zipfian Academy -The Beginning of the End
Week 12 : Zipfian Academy - And That's All folks....

For a different point of view about the Zipfian experience, do check out fellow alum Melanie's blog, All the Tech Things

Friday, April 18, 2014

Week 12 : Zipfian Academy - And That's All folks....

And so, all great things must come to an end. This is the final week for the bootcamp program. We continued interview prep, white boarding and code reviews. Apparently interviewing feels like having a full time job. Towards the end of the week, we continued with project one on ones, some more white boarding and interview prep, runtime and complexity analysis.

At the end of the week, we had a get together to celebrate the past 12 weeks. A handful of alums from the last cohort attended and it's kind of cool to see what past alums are doing now.. some are at stealth startups and startups while others are working at some very impressive companies.

Highlights of the week:
  • We had a guest lecture from former cosmologist and Data Scientist @datamusing on using topic models to understand restaurant reviews. The topics were learned from the review corpus using LDA and NNMF. He also had a pretty cool d3 + flask visualization to show the results
  • We spent a day at the Big Data Innovation Summit. The morning talks mostly felt like business sales pitches. The afternoon talks were a lot more interesting as there were breakouts for Data Science, Machine Learning, Hadoop, etc  
  • In the Data Science breakouts, there were a lot of LDA related talks including using topic modeling in Health Care and using LDA to extract features for matches in a dating website. 
  • Lots of interview prep and white boarding 

And so this is it. My hope is that someone actually finds my ramblings over the past 12 weeks somewhat helpful in forging their own path into Data Science......signing off.

Sunday, April 13, 2014

Slides from Project Presentation #How Will My Senator Vote?

Here are slides from my project presentation on analyzing How Senators vote in Congress and building a model to predict how they would vote on future bills

Saturday, April 12, 2014

Week 11 : Zipfian Academy -The Beginning of the End

We started the week wrapping things up with our personal projects, putting together decks for our Hiring Day presentations and doing mock runs of our presentations. Towards mid-week, we did more mock runs and put final touches on our presentation decks.

Hiring Day was pretty hectic. It started off with a short mixer with representatives from the various companies that attended. Each of the companies did a quick presentation on who they were and what they were looking for. Once that was done, we proceeded to present each of our projects, taking a few questions from the audience at the end of each presentation. There were a lot of really cool projects.

After project presentations and lunch, we had "speed dating" sessions with each of the companies that attended. It was a couple of minutes introducing yourself to the company, hearing what they were looking for and seeing if there's a good fit. It was quite tiring going through 16 or so interviews in the span of two hours but it was a worthwhile experience.

Most of us spent the last day of the week cleaning up and refactoring our project code.

Project Next Steps : I do plan to continue working on my project down the line, making some more improvements to my pipeline, looking at new and richer data sources, asking more interesting questions and doing some more analysis to improve my prediction accuracy. There's still a lot of ground to cover here. I also plan to use Latent Dirichlet Allocation (LDA) to extract better features from my data as you can pull out really rich and interesting features from your data using topic modeling. My original model used a "bag of words" approach. The eventual goal would be to release this as a web app anyone could use.

Highlights from the week:
  • We started the week with a guest lecture from @itsthomson. He is the founder of He just finished the YC program and had lots of words of wisdom. He walked us through his experience making the transition from academia to Data Science, moving to a Chief Scientist role and now Founder. It's refreshing to hear from someone that has gone through the process. Some quotes from his lecture : "Data is the most offensive (vs defense) resource a company has",.. "In Data Science, you have to know a little of everything",.."Being technical helps, but being convincing is better",.. "Understanding how your analysis ties back to your business / organization is key"
  • We attended a Data Science for Social Good panel event at TaggedHQ. The panelists included CTO - Code for America, CEO - Lumiata, Data Scientist - BrightBytes, Data Scientist - OPower and Lead Data Scientist - These companies are utilizing data science to make a difference. It was a very insightful panel session.
  • Hiring Day was rather interesting. 16 companies attended. The companies came from different verticals including CRM, consulting, social good, social, health, payments, real estate, education and infrastructure. It was interesting hearing some of the problems they were trying to solve in their respective domains

Saturday, April 5, 2014

Data Science Bootcamp Programs - Full Time, Part Time and Online

I've gotten a lot of inquiries on options to move into Data science. This is my attempt to answer that question. If I excluded any programs from this, please feel free to ping me. You'll see that there are quite a few options and you need to find the best fit based on your profile. This list does not include any university programs.

Everyone seems to reference the quote from Google Economist Hal Varian "Being a statistician is the sexiest job of the 21st century" and the McKinsey report about the shortage in Data Science talent.

For a guide on factors to consider when Choosing a Data Science Bootcamp Program, the article should be helpful

We are collecting and publishing detailed Data Science Bootcamp Reviews from students that have attended and graduated from the various Data Science Bootcamps

Visit this link for more in depth coverage of Data Science Bootcamp Programs including Interviews with Data Science Bootcamp Founders

Regarding Data Science Interview Resources, I hear from a lot of people asking about interview resources and the most efficient way to prepare for Data Science Interviews. At a lot of companies and startups, a very important component of the interview process is the Take Home Data Challenge and/or the Onsite Data Challenge. Another important component is the Theory interview; I'll talk more on this later..

This is also a great resource for individuals who feel they have the background and experience to interview for jobs without going through a bootcamp type program.

To become more familiar with and get efficient working on Data Challenges, I recommend taking a look at the Collection of Data Science Take-home Challenges book. It gives very clear and realistic examples of some of the types of problems you could face on a Data Challenge and projects you could potentially work on as a data scientist

Here we go...

Full Time

Zipfian Academy : This is not a 0-60 school. It's more like 40-80. They are currently about to graduate their second cohort.

  • Notes : Of all the Data Science bootcamps, Zipfian has the most ambitious curriculum. Graduates from the first cohort are currently working in Data Scientist roles across the Bay Area. I'm currently part of the second cohort
  • Location : San Francisco, CA
  • Requirement : Familiar with programming, statistics and math. Quantitative background
  • Duration : 12 weeks

Update : Since the initial post went up a few months ago, Zipfian Academy has added two more programs

Data Engineering 12 - week Immersive : This follows the same format as the Data Science Immersive. The first cohort for this program will start January 2015
  • Notes : This follows the same format as the Data Science Immersive
  • Location : San Francisco, CA
  • Requirement :  Quantitative / Software Engineering background
  • Duration : 12 weeks
Data Fellows 6 - week Fellowship :  The first cohort for the fellows program will start Summer 2014
  • Notes : This program is free for accepted fellows
  • Location : San Francisco, CA
  • Requirement :  Significant Data Science Skills, Quantitative background
  • Duration : 6 weeks
Also see a recent google hangout explaining these new programs :  Zipfian Academy Data Fellows Program  - Information Session 

Data Science Europe Bootcamp : This looks like it's modeled after the Insight program. Select a small group of very smart people with advanced degrees and help them get ready for Data Science roles in 6 weeks.

Interview with Data Science Europe Founder

Data Science Europe Student Reviews
  • Notes : It enrolls the first cohort January 2015. Also if you don't receive an offer for a quantitative job within 6 months of completing the course, you'll receive a full refund on tuition paid. They're currently on their second cohort and have a 100% placement rate 
  • Location : Dublin, Ireland 
  • Requirement : Quantitative Degree, Programming knowledge and Statistics background. It looks like they prefer graduate students and Post Docs but are open to applications from undergrads.
  • Duration : 6 weeks 

Insight Data Science : Accepts only PhDs or PostDocs. They have completed 5 cohorts in Palo Alto and are opening up a new class in New York this summer. From their website, it does look like they have almost perfect placement. It is project based self directed learning, so if you need some hand holding or you're not already very familiar with the material this may not be the program for you

    • Notes : No Fees, pays Stipend
    • Location : Palo Alto, CA / New York, NY
    • Requirement : PhD / PostDoc
    • Duration : 7 weeks 

    Insight Data Engineering : They'll enroll the first cohort this summer. Bootcamp will focus on the data engineering track. It is project based self directed learning, so if you need some hand holding or you're not already very familiar with the material this may not be the program for you

    • Notes : No Fees 
    • Location : Palo Alto, CA
    • Requirement : strong background in math, science and software engineering
    • Duration : 7 weeks 

    Data Science Retreat : Follows the same format as Zipfian but is based in Europe

      • Notes : Curriculum is mostly in R, though they support other languages (python). They have tiered pricing for the class, so you can pay for which tier meets your needs
      • Location : Berlin
      • Requirement : Experience with programming, databases, R, Python
      • Duration : 12 weeks 

      Data Science For Social Good : hosted by the University of Chicago. The students work with non-profits, federal agencies and local governments on projects that have a social impact

      • Notes : they focus on civic projects or projects with social impact
      • Location : Chicago, IL
      • Requirement : It looks like they target academics (undergraduate and graduate students)
      • Duration : 12 weeks 

      Metis Data Science Bootcamp  : This looks like it's modeled after the Zipfian program from a duration / structure / curriculum standpoint. It is owned by Kaplan which also recently acquired Dev Bootcamp. Looks like the big .edu players are trying to make a play for the tech bootcamp space

      Interview with Metis Data Science Cofounder
      • Notes : It enrolls the first cohort Fall 2014. For individuals who are not already in the US or are international students, you could obtain an M-1 visa to attend. They're probably the first bootcamp that is able to issue M-1 student visas
      • Location : New York, NY and San Francisco, CA
      • Requirement : Familiarity with Statistics and Programming
      • Duration : 12 weeks 

      Science to Data Science : They accept only PhDs / Post Docs or those close to completing their PhD studies. We are seeing more bootcamps adopt this model.

      • Notes : It enrolls the first cohort August 2014. There is a small registration fee for the course otherwise the program is free for participants
      • Location : London, UK
      • Requirement : PhD / Post Doc
      • Duration : 5 weeks 

      Level Data Analyst Bootcamp : This is one of the first full time Data Analyst bootcamps we've seen and it's run by a University which is also a first. I think folks in academia have realized that the typical university structure can't keep up with the pace of innovation in the space

      • Notes : Curriculum looks standard for the Data Analyst and Marketing Analytics job track. They also run hybrid and full - time programs
      • Location : Boston, MA, Charlotte, NC, Seattle, WA, Silicon Valley, CA
      • Requirement : 
      • Duration : 8 weeks 

      Praxis Data Science : This is another program coming with an interesting approach. Another option for individuals with a strong STEM and programming background who want to make a move into Data Science

      • Notes : It enrolls the first cohort Summer 2015. They also offer a money back guarantee and will refund up to half of the fees paid if you're unable to find a job within 3 months. This speaks to the fact that they have a vested interest in their students' success. The curriculum also seems to focus on building the practical skills needed to both land a role and continue to grow as a Data Scientist.
      • Location : Silicon Valley, CA 
      • Requirement : Looks like they're looking for people with a STEM background (advanced degrees preferred) and programming / quantitative experience 
      • Duration : 6 weeks 

      Insight Health Data Science : This is the first significant deviation we've seen from the norm (focus - wise). This program seems to have the same structure as the other Insight programs but the focus here is solely in Healthcare and Life Sciences. It is project based self directed learning, so if you need some hand holding or you're not already very familiar with the material this may not be the program for you

      • Notes : No Fees
      • Location : Boston, MA
      • Requirement : PhD / PostDoc
      • Duration : 7 weeks 

      Startup.ML Data Science Fellowship : Startup.ML is taking an interesting approach to Data Science education. Their fellows work on real problems with established Data Science teams or on undefined startup problems.

      • Notes : No Fees. They also enrolled their first cohort in March 2015. I would imagine the typical profile here is someone that may be much further along.
      • Location : San Francisco, CA 
      • Requirement : Background in Software Engineering, Quantitative Analysis. Advanced Quantitative degrees 
      • Duration : 4 months

      ASI Data Science Fellowship : This is another program modeled after the Insight program. They pair students with an Industry partner which allows students to work on real business problems / data. They also have a modular program which allows for some customization.

      • Notes : No Fees
      • Location : London, UK
      • Requirement : PhD
      • Duration : 8 weeks

      GA Data Science Immersive : General Assembly was actually one of the first outfits to start part time Data Science classes. It looks like they've decided to also jump into the fray with a full time Data Science Immersive

      • Notes : They've been doing part time Data Science classes for at least two years already
      • Location : San Francisco, CA
      • Requirement :  Seems like they're interested in folks with quantitative backgrounds looking to transition to Data Science
      • Duration : 12 weeks 

      Catenus Science :  Catenus is also taking a very different approach here. Catenus Science is a paid apprenticeship program helping skilled Data Scientists explore opportunities at different startups / domains

      • Notes : Paid Apprenticeship. Rotate through three different startups, applying your skills to month long projects with these startups. The next session starts June 2016
      • Location : San Francisco, CA 
      • Requirement : Background and Experience in Statistics, Machine Learning, Programming, Product Development. They're probably looking for people who are much further along.  
      • Duration : 13 weeks

      The Data Incubator : Accepts only STEM PhDs or PostDocs. The first class is starting summer 2014.

      • Notes : No Fees
      • Location : New York, NY
      • Requirement : PhD / PostDoc
      • Duration : 6 weeks 

      NYC Data Science Academy : This looks like it's also modeled after the Zipfian 12 week immersive. Another option for non-postdocs on the east coast looking to make the transition to Data Science

      • Notes : It enrolls the first cohort February 2015. Just looking at the curriculum, it appears well thought out and seems to cover a lot of breadth. They focus on R and Python and spend significant amounts of the course time covering both ecosystems. 
      • Location : Manhattan, NY 
      • Requirement : Looks like they prefer people with STEM advanced degrees or equivalent experience in a Quantitative discipline or programming 
      • Duration : 12 weeks 

      Silicon Valley Data Academy : This looks like another program modeled after the Insight program. It does look like they skew towards applicants that are much further along the skills spectrum

      • Notes : No Fees and they run both Data Science and Data Engineering programs
      • Location : Redwood City, CA
      • Requirement : Advanced Degrees / PhD / Post Docs , Extensive quantitative / engineering background 
      • Duration : 8 weeks 

      Microsoft Research Data Science Summer School  : targets upper level undergraduate students attending college in the New York area. Program instructors are research scientists from Microsoft Research
      • Notes : Each student receives a stipend and a laptop
      • Location : New York, NY 
      • Requirement :  upper level undergraduate students interested in continuing to graduate school in computer science or a related field or breaking into Data Science
      • Duration : 8 weeks 

      Part Time
      • General Assembly - Data Science : San Francisco / New York. Part time program over 11 weeks (2 evenings a week) 
      • Hackbright - Data Science : San Francisco. Full Stack Data Science class over one weekend
      • District Data Labs : Washington DC.  Data workshops and project based courses on weekends
      • Persontyle : London, UK. Offering R based Data Science short classes
      • Data Science Dojo : Silicon Valley, CA /  Seattle, WA / Austin, TX. Offering data science talks, tutorials and hands on workshops and are looking to build a data science community
      • AmpCamp : This is run by UC Berkeley AMPLab. Over two days, attendees learn how to solve big data problems using tools from the Berkeley Data Analytics Stack. The event is also live streamed and archived on YouTube
      • NextML

      These bootcamps are popping up and thriving because there is currently an imbalance between demand and supply of Data Science talent. The acceptance rates at some of the full time bootcamps are anywhere from 1 in 20 to 1 in 40.

      p.s : I need to stress that with any of the programs listed above, you need to do your due diligence and ask the tough questions to find out if it's a good fit for you. You probably want to be on the look out for programs that are not transparent about their placement.

      Update 1 - 05/14  : Added the new Zipfian programs, Persontyle
      Update 2 - 07/14 :  Added Metis, Data Science Europe,  Science to Data Science
      Update 3 - 08/14 :  Added Data Science Dojo
      Update 4 - 10/14 :  Added AMPLab
      Update 5 - 11/14 :  Added Coursera/UIUC, Udacity Data Analyst Nanodegree, Thinkful, DataInquest
      Update 6 - 12/14 :  Added NYC Data Science Academy
      Update 7 - 01/15 :  Added Next.ML, Bitbootcamp, DataQuest  
      Update 8 - 04/15 :  Added Praxis Data Science, Insight Health Data Science
      Update 9 - 05/15 :  Added Startup.ML Fellowship, ASI Fellowship
      Update 10 - 09/15 : Added Silicon Valley Data Academy
      Update 11 - 01/16 : Added GA Data Science Immersive, Level Data Analyst Bootcamp, Udacity ML Nanodegree, Leada
      Update 12 - 05/16 : Added Catenus Science 

      Week 10 : Zipfian Academy - Closing the loop ... rinse..repeat

      Continued working on my personal project and was glad my data ingestion and aggregation pipeline was built and optimized.

      Analysis : Now that I had most of the data I needed, the next step was trying to close the loop as soon as possible, get some predictions for each Senator and then iterate. One challenge was trying to find signals that indicated uniqueness just from voting patterns and the content of the bills. As part of my analysis, I used techniques like MDS, clustering and NLP to extract salient features from my initial dataset. I did find out from my analysis that over the past 3.5 years, Democrats are more predictable and are more alike than Republicans based on just their voting patterns.
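One simple way to put a number on "more alike" is mean pairwise agreement within a group. Here is a minimal sketch of that idea, not the analysis I actually ran: the vote vectors, senator names and party labels below are all invented for illustration, and the result says nothing about the real Senate data.

```python
from itertools import combinations

# Hypothetical roll-call records (made up for this sketch):
# 1 = yea, 0 = nay, one entry per bill.
votes = {
    "senator_a": ("party_1", [1, 1, 0, 1, 0, 1]),
    "senator_b": ("party_1", [1, 1, 0, 1, 0, 0]),
    "senator_c": ("party_1", [1, 1, 0, 1, 1, 1]),
    "senator_x": ("party_2", [0, 0, 1, 0, 1, 1]),
    "senator_y": ("party_2", [0, 1, 1, 1, 1, 0]),
    "senator_z": ("party_2", [1, 0, 1, 0, 0, 0]),
}


def agreement(a, b):
    """Fraction of bills on which two senators cast the same vote."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def party_cohesion(records, party):
    """Mean pairwise agreement among all members of one party."""
    members = [v for p, v in records.values() if p == party]
    pairs = list(combinations(members, 2))
    return sum(agreement(a, b) for a, b in pairs) / len(pairs)


cohesion = {p: party_cohesion(votes, p) for p in ("party_1", "party_2")}
for party, score in cohesion.items():
    print(f"{party}: mean pairwise agreement = {score:.2f}")
```

The same pairwise agreement values could also be turned into distances (1 - agreement) and fed into MDS or a clustering routine.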

      Modeling : I started off with a naive bag of words model and got an average prediction accuracy in the low 60s. I went back and did some chi-squared feature selection, natural language processing (tf-idf, n-grams, stop-words, stemming, binning, lemmatization, etc.), grid search and cross-validation on a pipeline of models (Logistic Regression, Random Forest, SVM, AdaBoost, Naive Bayes, kNN) and added some social data from Wikipedia and Twitter. This improved my average prediction accuracy to the high 60s. Moving forward, there's still a lot of ground to cover here. I can probably get this to the low 80s on average prediction accuracy across all the Senators in Congress. The biggest takeaway here is to spend time, and lots of it, understanding your dataset, crafting better features and adding external data that would give additional insights or increase the richness of your data. The modeling part can be automated but your models can only be as good as the data you feed them.
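To make the chi-squared feature selection step a bit more concrete, here is a rough sketch using only the standard library. It scores each bag-of-words term with the textbook 2x2 contingency-table statistic (term present or absent versus yea or nay). The bill summaries, votes and stop-word list are invented for illustration, and library implementations may compute the score differently, so treat this as the idea rather than what my pipeline did.

```python
# Hypothetical bill summaries paired with one senator's vote (1 = yea, 0 = nay).
bills = [
    ("increase funding for public schools", 1),
    ("expand funding for rural hospitals", 1),
    ("funding for veterans health care", 1),
    ("reduce taxes on small business", 0),
    ("cut taxes and reduce spending", 0),
    ("reduce regulation on small banks", 0),
]

STOP_WORDS = {"for", "on", "and"}  # tiny illustrative stop-word list


def tokenize(text):
    """Lowercase, split on whitespace, drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def chi2_score(term, docs):
    """Chi-squared statistic of the 2x2 table (term present?) x (vote)."""
    n = len(docs)
    a = sum(1 for toks, y in docs if term in toks and y == 1)      # present, yea
    b = sum(1 for toks, y in docs if term in toks and y == 0)      # present, nay
    c = sum(1 for toks, y in docs if term not in toks and y == 1)  # absent, yea
    d = sum(1 for toks, y in docs if term not in toks and y == 0)  # absent, nay
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


# Binary bag of words: each document becomes the set of terms it contains.
docs = [(set(tokenize(text)), vote) for text, vote in bills]
vocab = sorted(set().union(*(toks for toks, _ in docs)))
scores = {term: chi2_score(term, docs) for term in vocab}

# Keep the highest-scoring terms as features.
top_terms = sorted(vocab, key=lambda t: -scores[t])[:3]
print(top_terms)
```

Terms that show up almost exclusively with one kind of vote score highest, which is why this kind of filter helps before handing the features to a classifier.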

      At this point, we're all seeing the light at the end of the tunnel. I gave a top level overview of my project. I'm working on putting up a Github repo with a more in-depth version.

      Highlights from the week:
      • We had a guest lecture from @WibiData on building real-time recommendation engines at scale with kiji. The kiji platform seems pretty mature and has support for quite a few languages and connectors to several Big Data frameworks. Evaluating recommender engines has always been a problem. One approach is to perform validation on a hold out sample of your data.
      • We also had another pretty interesting guest lecture from @maebert. They've built an automatic journaling tool. They built a data product out of all the ambient data (passively generated data) we generate, triangulating your position using GPS signals and cell phone towers. They use those data points to tell a story about you.  Their pipeline looks something like this (Signals -> Data -> Information -> Knowledge -> Stories). It's actually quite cool how patterns start to emerge when you look at aggregate data. I guess we all know a little something about that from the "revelations" that happened last summer. They utilize techniques like LSA / LDA / SVD to extract concepts and their weights, expectation maximization (Gaussian mean shift) and some NLP. They try to see if the concepts change over time and also try to enrich their datasets using external feeds for weather data, events data, ticketing, etc.
      • We had breakouts on presentations. We worked on our projects for two weeks and trying to bottle all that work into a three minute presentation won't do it justice. So you'd want to answer the following questions to give the audience enough to spark some interest - What?, Why?, How?, So What?, Next Steps? 
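On the hold-out validation idea from the recommendation engine talk above, here is a small sketch of what that can look like. It holds out each user's last interaction, trains a popularity baseline on the rest and checks how often the held-out item lands in the top-k list. The interaction log and the baseline recommender are assumptions made up for this example; it has nothing to do with kiji's actual API.

```python
from collections import Counter, defaultdict

# Hypothetical interaction log, ordered by time within each user.
log = [
    ("u1", "a"), ("u1", "b"), ("u1", "c"),
    ("u2", "a"), ("u2", "c"), ("u2", "d"),
    ("u3", "b"), ("u3", "c"), ("u3", "a"),
    ("u4", "c"), ("u4", "d"), ("u4", "e"),
]


def leave_last_out(events):
    """Hold out each user's final interaction; train on the rest."""
    by_user = defaultdict(list)
    for user, item in events:
        by_user[user].append(item)
    train = {u: items[:-1] for u, items in by_user.items()}
    held_out = {u: items[-1] for u, items in by_user.items()}
    return train, held_out


def recommend_popular(train, user, k):
    """Baseline: the k most popular training items the user has not seen."""
    counts = Counter(item for items in train.values() for item in items)
    seen = set(train[user])
    ranked = [item for item, _ in counts.most_common() if item not in seen]
    return ranked[:k]


def hit_rate_at_k(train, held_out, k):
    """Share of users whose held-out item appears in their top-k list."""
    hits = sum(held_out[u] in recommend_popular(train, u, k) for u in train)
    return hits / len(train)


train, held_out = leave_last_out(log)
rate = hit_rate_at_k(train, held_out, k=2)
print(f"hit rate@{2} = {rate:.2f}")
```

Any real recommender can be dropped in place of `recommend_popular`; the point is that the held-out items are never seen during training.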

      Monday, March 31, 2014

      Some more interesting links...

      An awesome list of April Fools gadgets from various companies. Maybe someone should bring these products to life.. that would be pretty epic

      Another Python vs R  post

      Speeding up python

      Friday, March 28, 2014

      Week 9 : Zipfian Academy - Personal projects

      This is a little late, so I'll try and make this quick. Personal projects began this week and we'll be working on them for another week. My project is focused on modeling Senators' past voting patterns and using that to predict how they'll vote on future legislation and whether bills pass or not.

      Data Acquisition : I initially planned to source my data using APIs from The Sunlight Foundation and Votesmart but quickly realized things might take much longer with the APIs, since I needed several different datasets and also needed a way to aggregate all the data. I decided it would be better to go straight to the source: the US Senate website. Setting up and debugging my data ingestion pipeline took another two days and by the end of the week I had all the data I needed: scraped, cleaned and packaged nicely in a database and several Python pickle objects.

      Data Transformation : Getting the data is one thing and transforming it to get it ready for analysis and modeling is another. Most Data Scientists tend to spend a lot of project time cleaning, aggregating and transforming data.

      Highlights from the week:
      • Got a chance to attend a meetup organized by BaRUG (Bay Area R Users Group). There was a talk from the author of the caret package (this is kind of the R version of scikit-learn) and another from the Human Rights Data Analysis Group - they use R to build statistical models to work on human rights projects across the globe.
      • We had a guest talk from a former physicist who is now a Data Scientist @WalmartLabs. He works with a group that deals with algorithmic business optimization. The talk was actually quite insightful as he touched on some interesting pain points: "reconciling technical and business needs" ... "The simpler the model the better"
      • We also had another guest lecture on visualization. The speaker also worked on this awesome visualization of BART employee salaries 
      • Several of us attended a D3 workshop organized by the VUDlab at UC Berkeley