Scaling Internet Applications

So you built a wildly successful service and now the users are pounding at the gate to get in. You pat yourselves on the back, draft up a business plan to present to the VCs and start dreaming about how you’re going to spend all that money. 6pm rolls around and the users are still flooding in. In fact, more are flooding in than ever before. By 7pm the web servers are starting to get slow. It seems the slower they get the more users want to come on. By 8pm, the service has ground to a halt.

You spend all night tracking down problems, rejiggering your MySQL configuration, getting a friend to host another server for you. By 6am everything seems calm and the service is running smoothly again. A day goes by with no problems. A week. You’re thinking about a launch party. Then someone posts about it on Techcrunch and the whole thing comes grinding to a halt again.

Congratulations. You are a success.

The scenario is common with startups. So common that invite-only betas were created to help limit the number of users to what a service could handle (and as a marketing tactic, but that’s another story…).

Why is it so common? Well, it is impractical for a small startup to fully optimize its software for millions of potential users. Not only is functionality not written in stone yet, but you have no idea how your software is going to be used. Plus there is the time and experience involved, both of which are precious in an early-stage startup. Finally, there are so many “experts” on scaling software giving advice on the Internet that scaling seems easy, certainly nothing to worry about.

If you ask people how to optimize a web server, you will receive literally hundreds of answers. Use Linux, use lighttpd, tweak your TCP settings, up your MTU, buy more machines, install more memory, buy faster CPUs, use an in-kernel webserver, split your static and dynamic sites, use a CDN, use Python, use Perl, use Ruby, host it on Amazon… With so many options to choose from, how could anyone not think scaling was easy?

Let’s talk about the dirty little secret related to software scaling. Every application is different. What works for one system may not work for another. Just because LiveJournal uses memcached to offload the database doesn’t mean you should. Just because Facebook uses massive numbers of MySQL servers doesn’t mean you should. Just because the Twitter architect recommended denormalizing the database doesn’t mean you should.

For the most part, none of these guys claimed to be The One True Way. They simply wrote about what worked for them. However, there is almost always a situation where a particular method of scaling won’t work or will actually make things worse.

There are a few things that every early-stage startup should do to prepare for and manage scaling problems:

Track the usage of your applications

Before you launch, make sure you have at least basic logging and analysis set up. Not only is the data useful for business purposes, but it will also help you prioritize which features to optimize.
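As a rough sketch of what “basic analysis” could mean, the snippet below counts requests per URL path from web server access log lines. It assumes the request appears in the usual quoted form (`"GET /path HTTP/1.1"`); the function name `top_paths` and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Matches the quoted request portion of a typical access log line.
# Assumption: the server logs requests as "METHOD /path HTTP/x.y".
REQUEST = re.compile(r'"(?:GET|POST|PUT|DELETE|HEAD) (\S+) HTTP/[\d.]+"')


def top_paths(log_lines, n=3):
    """Return the n most requested paths as (path, count) pairs."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST.search(line)
        if match:
            # Drop the query string so /search?q=a and /search?q=b count together.
            path = match.group(1).split("?", 1)[0]
            counts[path] += 1
    return counts.most_common(n)


# Hypothetical sample data, not taken from a real service.
sample = [
    '10.0.0.1 - - [01/Jan/2008:18:00:00 -0800] "GET /search?q=a HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2008:18:00:01 -0800] "GET /search?q=b HTTP/1.1" 200 498',
    '10.0.0.3 - - [01/Jan/2008:18:00:02 -0800] "GET /profile HTTP/1.1" 200 1024',
]
print(top_paths(sample))
```

Even something this crude tells you which pages take the most traffic, which is usually where optimization effort pays off first.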

Emergencies aren’t over after the system is back up

The absolute worst thing you could do after a major system failure is go back to work like nothing happened. Failures always come in sets. You do not want to end up in a constant firefighting situation where there is no time to think.

Emergency fixes tend to patch the symptom and not the problem. Time should be devoted to figuring out exactly what went wrong and why.

Stress test your system

Using that wonderful usage data you’ve collected, figure out where the next breaking point will occur. If you’ve had an outage, try to reproduce it. Plot out your growth. How much longer will it be before the system reaches capacity?
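A back-of-the-envelope way to answer that last question is sketched below. It assumes load grows by a roughly constant percentage each day (compound growth), which is a simplification. The function names and the numbers are hypothetical.

```python
import math


def estimate_daily_growth(samples):
    """Average compound daily growth rate from daily peak loads, oldest first."""
    first, last = samples[0], samples[-1]
    days = len(samples) - 1
    return (last / first) ** (1 / days) - 1


def days_until_capacity(current_load, daily_growth_rate, capacity):
    """Days until load reaches capacity, assuming compound daily growth."""
    if current_load >= capacity:
        return 0.0
    if daily_growth_rate <= 0:
        return math.inf  # flat or shrinking load never reaches capacity
    return math.log(capacity / current_load) / math.log(1 + daily_growth_rate)


# Hypothetical peak requests per second over three consecutive days.
peaks = [100, 200, 400]
rate = estimate_daily_growth(peaks)
print(days_until_capacity(peaks[-1], rate, capacity=3200))
```

The capacity figure itself should come from stress testing, not guesswork. The projection is only as good as that number and the growth assumption behind it.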

Call in an expert

You probably don’t need someone who built a stock exchange working full time for you. However, having someone who has faced the same problems you’re likely to face spend even a single day looking at your architecture, usage patterns and system can save you a lot of pain later on.

They say that too many users is a good problem to have. I agree. Just be sure you can cope with your success.


Quantifying interestingness

The Internet has become tedious.

There is so much information published by people these days, both directly and indirectly, that it has become a chore to sift through it all to find the truly interesting bits. The vast majority of the twitters, pownces, Facebook feed stories and LinkedIn Network Updates are mundane at best.

If there was nothing worthwhile in these services, I could just ignore them altogether. Unfortunately, buried deep within each one are unique insights, interesting tidbits of information on friends’ lives and the occasional juicy bit of gossip. The problem is that the signal-to-noise ratio on these services is so ridiculously low that it has become work to actually find the good bits.

Quantifying interestingness is hard - extremely hard. Essentially you are trying to guess if someone will be interested in something before they have been exposed to it. The techniques used by services such as Amazon, Digg and others don’t lend themselves particularly well to large numbers of transitory messages with small audiences. This is especially true for Twitter where messages are often sent by SMS to users immediately after they are written.

However, the services themselves, either by design or by accident, have contributed greatly to the amount of noise. Facebook seems to generate a feed story for every ordinary action a user takes. Twitter’s choice of a 140-character message doesn’t leave room for well thought out ideas, and their system of broadcasting replies to everyone means seeing a lot of messages without any context because you aren’t following all parties.

To Facebook’s credit, they do attempt to filter news feed articles based on what they think users will read. A news feed page contains only a small portion of all friend feed stories. Unfortunately it does very little to stem the flood of updates I see every day that waste my time. The filter controls aren’t fine grained enough to allow me to, for instance, only show messages where I know all the parties involved. This alone would significantly improve my experience.

If the amount of noise on these services gets much higher, I’m going to have to seriously rethink how I use these services. If that means getting my news second hand from friends the old way, well so be it.


Is San Francisco part of Central California?

According to websites I’ve visited recently, San Francisco and most of the Bay Area are in Central California. Welcome to CenCal!

I grew up in both the Bay Area and Los Angeles area and I was brought up to believe that San Francisco is in Northern California. As a kid I didn’t think much about Central California. It was just some place in “the middle.” I knew Southern California started around Santa Barbara and Northern California ended somewhere around Monterey.

For the last couple years I’ve visited a lot of websites with snow reports, surf reports and occasionally marine weather and I’ve noticed something quite peculiar. They refer to this area as Central California.

Surfline seems to think that Central California starts in San Francisco and extends all the way down to Northern Santa Barbara. Northern California officially starts across the Golden Gate in Marin.

The Surfrider Foundation says Northern California is everything North of San Francisco.

Even Wikipedia’s article on Pacifica (5 miles south of San Francisco) says it is part of Central California.

Surely these people must be wrong. I can’t have grown up in Central California. That is crazy. Or is it?

Dividing California by Latitude

The graphic is an image of California divided into thirds based on latitude. It isn’t absolutely perfectly accurate, but since my entire world is crashing down before me, you’ll have to forgive the quick job.

If you look at the graphic, San Francisco does indeed fall in Central California. Worse, it isn’t even the northernmost part. Who knew that Marin, Napa and Sonoma were in Central California?

Of course there is more than one way to divide a map. What if you divide it based on population?

California divided by population

The map to the left is rough. I only used county population figures, which means that the real borders are a little bit different due to uneven population distribution across counties. The data I used is from a couple years ago. At the time California had a population of 37,172,015, so each colored section represents about 12.4 million people.
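For the curious, here is a minimal sketch of how a split like this could be computed. It is not the exact method used for the map. It assumes the counties can be listed in a single north-to-south order, keeps each county whole, and puts it in whichever third contains the midpoint of its population. The county names and numbers are placeholders, not real data.

```python
def split_into_thirds(counties):
    """Split (name, population) pairs, ordered north to south, into three
    groups of roughly equal population without dividing any county."""
    total = sum(pop for _, pop in counties)
    target = total / 3  # about 12.4 million for a state of 37,172,015
    regions = [[], [], []]
    running = 0
    for name, pop in counties:
        # Place the county in the third that contains its population midpoint.
        midpoint = running + pop / 2
        index = min(int(midpoint // target), 2)
        regions[index].append(name)
        running += pop
    return regions


# Placeholder counties with made-up populations, listed north to south.
counties = [("A", 10), ("B", 20), ("C", 30), ("D", 15), ("E", 15)]
north, central, south = split_into_thirds(counties)
print(north, central, south)
```

Because counties stay whole, the three groups only approximate equal thirds, which is the same roughness the map has.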

I like this one better. All of Santa Clara, Santa Cruz and part of Monterey counties are included. It does go a bit further south than what I think of as Northern California. Southern California is a mess though. All of a sudden LA is split in two and Santa Barbara is part of Central California.

I could probably munge things a lot more. Maybe pull some political data or income information. The thing is that the separation of California is really more of a feeling.

California divided by my feeling

This is my imaginary map of how I view California. Yosemite is ours. Santa Cruz is ours. Most of the Sierra Nevada mountain range is ours.

Southern California gets Santa Barbara and San Bernardino. I thought to include Death Valley, but then I don’t think I remember it as really part of Southern California.

I’m not sure what I’d do if I lost the NorCal label. CenCal t-shirts? No. I don’t think I could get used to that.


Are social networks social?

I don’t know when it happened. Recently, I’ve stopped using the Internet and started viewing it.

For years I was extremely active on IRC, newsgroups, mailing lists, blogs and review sites. You can actually find posts by a preteen me on Usenet archives from 1992. I may not have met a lot of the people I interacted with, but I felt like I was part of a community and as a member of the community, I contributed.

Lately however I use the Internet to either look up facts or be entertained. I’ve stopped using it to connect with people. I don’t feel like I’m part of any community anymore.

We are in the age of Social Networking. Aren’t social networks supposed to be social? How is it that I don’t feel any more attached to Facebook than I do to my address book?

I guess my real question is: what do social networks actually do?

The truth is that social networks let me portray a shallow persona of how I want to be seen to people portraying their own shallow persona of how they want to be seen. People there aren’t real. They don’t write about their real feelings or express their real selves because the audience consists not only of their closest friends, but also their co-workers, acquaintances, distant family, ex-bfs/gfs, etc.

Social networks offer little more than a watered-down look at who people really are. This won’t help strengthen your bonds with existing friends and it certainly won’t help you find new friends.

That’s not to say social networks don’t have any good uses. If you want people you meet to be able to find you and contact you, they work great. I just don’t think they work well for anything particularly social.


When music was cool

I think the biggest sin that the major labels have committed was making music lame.

How is that possible? How can music, a fundamental building block of cool, be lame?

Maybe I’m biased. Actually there is no maybe. I am biased. I was at Napster. I tried to deal with them at SNOCAP. I never understood the phrase “soul sucking” until I dealt with the major music labels. They are run by lawyers who will spend 6 months on a contract to extract every last dime out of you even if their own salaries far exceed what was gained. I mean they really are run by lawyers. Of course they are litigious! Of course they’d think suing their customers would solve their problems - strike fear into the hearts of those evil file sharers everywhere!

Thankfully the music industry is evolving. The major labels won’t die. Saying otherwise is denying how they came into existence in the first place. They will simply buy smaller, more successful indie labels until eventually they replace themselves from the inside. It’ll take a while, but short of firing their entire legal staff or hiring an executive staff that knows how to order them around (as opposed to being ordered around by them), it is the quickest way.

The death of DRM is a good sign. I hope SNOCAP had something to do with that. In the end though, they still haven’t found the right price point. They are still focused on selling music. It isn’t all their fault. That is what they own. They haven’t built a secondary market like the movie industry. They let the artists own it.

Maybe tomorrow I won’t think music is lame. There is always hope.


I wish there was a small Macbook

I haven’t bought a new Apple laptop to replace my ancient 12″ iBook. I’ve been patiently waiting for Apple to put out a new 12″ laptop. It doesn’t seem like they’re going to.

I bought my girlfriend a 13″ Macbook. It is not small. It is significantly bigger than my 12″. If you look at the raw numbers, the 13″ Macbook is actually smaller than the iBook, but that’s only if you count the thickness. I don’t care about the thickness!

I like thin phones because I put them in my front pocket and don’t like what it does to my pants. I do not carry my laptop in my pocket. I put it in a laptop case which is padded and makes everything thicker. I put it in that case with the power cord and a dozen other things that increase the thickness of what I’m carrying.

The thickness of my laptop does not matter. What does matter is the width and height. I carry my laptop around. I balance it on my lap while in parks. I put it in front of me on crowded airline flights. Plus, a smaller width and height allows me to carry a significantly smaller looking laptop bag which more people see than my laptop itself.

Another thing I actually like about the Macbook over the Macbook Pro and Macbook Air is the casing. I owned an original Mac Titanium. A year in and I had a warped casing, snapped the clamps that held the LCD display in place, dented it in several places and otherwise made a pretty sad looking laptop.

I like the polycarbonate casing of the Macbook. It is strong and I am hard on laptops. My five year old iBook doesn’t have so much as a scratch on it even though it has been through many countries, a couple sailboats, fallen from several high places and traveled quite a bit without any type of protective casing.

So Apple, please make a smaller laptop. I will thank you. The Japanese will thank you. More importantly, I’ll finally be able to buy a new laptop.


Myspace is ugly - Facebook is soulless

Myspace lets users create hot pink on lime green pages with flashy doohickeys violating every tenet of web design and possibly laws in several states.

Facebook is gray-blue and boxy. Everything has its place and every user looks pretty much the same.

While I can’t stand Myspace’s god-awful webpage design, I equally can’t stand the soullessness of Facebook.

Why don’t these sites go the Wordpress route and allow users to create themes and submit them to a central directory?

Let users rate the themes and present them ordered by their rating. Allow users to apply those themes and modify them, giving them a good base to work from. At the very least you’ll end up with a lot of high quality designs as opposed to everyone trying their own random thing.
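The ordering part is not much code. Here is a rough sketch, assuming each theme simply collects a list of 1 to 5 star ratings. Themes are sorted by average rating, with ties going to the theme that has more ratings. The function name and theme names are invented for illustration, and a real directory would probably want something smarter than a plain average.

```python
from statistics import mean


def rank_themes(ratings):
    """Order theme names by average star rating, best first.

    ratings maps a theme name to its list of 1-5 star ratings.
    Themes with no ratings yet are left out of the ranking.
    """
    rated = {name: stars for name, stars in ratings.items() if stars}
    return sorted(
        rated,
        key=lambda name: (mean(rated[name]), len(rated[name])),
        reverse=True,
    )


# Invented themes and ratings.
themes = {"neon": [2, 3], "slate": [5, 4], "plain": []}
print(rank_themes(themes))
```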


Hello

This will one day be the greatest blog on Earth.

Yep.

Greatest blog on Earth.

Until then it’ll be lame.

The transition between lame and awesome may take a while.


Soda Review

I’m a bit impulsive. Just a bit. When I get hungry I go to the grocery store and end up buying two weeks’ worth of Lucky Charms cereal. I would have bought my car without even test driving it if the sales person hadn’t mentioned it (and I wasn’t even the one who actually test drove it). I’m one of those people who should simply not be allowed to have money.

So one night around 4 am when I was low on caffeine and a bit too close to my credit card, I came up with the brilliant idea of buying various drinks online. First I was just going to get some of the more popular drinks… but that degraded to just getting two of everything. When all was said and done, five cases of soda arrived at the old Napster offices in San Mateo.
