Volatile and Decentralized: January 2013

Sunday, January 27, 2013

My mobile systems research wish list

Working on mobile systems at Google gives me some insight into what the hard open problems are in this space. Sometimes I am asked by academic researchers what I think these problems are and what they should be working on. I've got a growing list of projects I'd really like to see the academic community try to tackle. This is not to say that Google isn't working on some of these things, but academics have fewer constraints and might be able to come up with some radically new ideas.

Disclaimer: Everything in this post is my personal opinion and does not represent the view of my employer, or anyone else. In particular, sending a grant proposal to Google on any of the following topics will by no means guarantee it will be funded!

First, a few words on what I think academics shouldn't be working on. I help review proposals for Google's Faculty Research Awards program, and (in my opinion) we get too many proposals for things that Google can do (or is already doing) -- such as energy measurements on mobile phones, tweaks to Android or the Dalvik VM to improve performance or energy efficiency, or building a new mobile app to support some specific domain science goal (such as a medical or environmental study). These aren't very good research proposal topics, in my opinion -- they aren't far-reaching enough, and aren't going to yield a dramatic change five to ten years down the line.

I also see too many academics doing goofy things that make no sense. A common example these days is dusting off the whole peer-to-peer networking area from the late 1990s and trying to apply it in some way to smartphones. Most of these papers start off with the flawed premise that using P2P would help reduce congestion in the cellular network. A similar flawed argument is made for some of the "cloud offload" proposals that I have seen recently. What this fails to take into account is where cellular bandwidth is going: About half is video streaming, and the other half things like Web browsing and photo sharing. None of the proposed applications for smartphone P2P and cloud offload are going to make a dent in this traffic.

So I think it would help academics to understand what the real -- rather than imagined -- problems are in mobile systems. Some of the things on my own wish list are below.

Understanding the interaction between mobile apps and the cellular network. It's well known that cellular networks weren't designed for things like TCP/IP, Web browsing, and YouTube video streaming. And of course most mobile apps have no understanding of how cellular networks operate. I feel that there is a lot of low-hanging fruit in the space of understanding these interactions and tuning protocols and apps to perform better on cellular networks. Ever noticed how a video playback might stall a few seconds in when streaming over 3G? Or that occasionally surfing to a new web page might take a maddening few extra seconds for no apparent reason? Well, there's a lot of complexity there and the dynamics are not well understood.

3G and 4G networks have very different properties from wired networks, or even WiFi, in terms of latency, the impact of packet loss, energy consumption, and overheads for transitioning between different radio states. Transport-layer loss is actually rare in cellular networks, since there are many layers of redundancy and HARQ that attempt to mask loss in lower layers of the network stack. This of course throws TCP's congestion control algorithms for a loop since they typically rely on packet loss to signal congestion. Likewise, the channel bandwidth can vary dramatically over short time windows. (By the way, any study that tries to understand this using simple benchmarks such as bulk downloads is going to get it wrong -- bulk downloads don't look anything like real-world mobile traffic, even video streaming, which is paced above the TCP level.)
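
To make the congestion-control point concrete, here is a toy sketch of a loss-based additive-increase/multiplicative-decrease sender. It is not a real TCP implementation: the function name, the per-round model, and all of the numbers are assumptions made up for illustration. The only thing it tries to show is what happens to a sender that waits for loss as its congestion signal when the lower layers never let it see any.

```python
def simulate_aimd(rounds, capacity, loss_visible):
    """Toy loss-based AIMD sender; all units are packets per round trip.

    capacity: packets the link can carry per round (hypothetical value).
    loss_visible: if False, lower layers hide every loss from the sender,
    so excess packets pile up in a queue instead of being reported as lost.
    Returns (final congestion window, final queue length).
    """
    cwnd = 1.0   # congestion window
    queue = 0.0  # packets buffered below the transport layer
    for _ in range(rounds):
        # Packets offered this round that the link cannot carry.
        excess = max(0.0, queue + cwnd - capacity)
        if excess > 0 and loss_visible:
            # The sender sees a loss: halve the window (multiplicative decrease).
            cwnd = max(1.0, cwnd / 2)
            queue = 0.0
        else:
            # No loss signal: keep growing (additive increase) and buffer the rest.
            queue = excess
            cwnd += 1.0
    return cwnd, queue


if __name__ == "__main__":
    print("loss visible:", simulate_aimd(50, capacity=10, loss_visible=True))
    print("loss masked: ", simulate_aimd(50, capacity=10, loss_visible=False))
```

In this toy model the sender that sees loss keeps its window near the link capacity, while the sender whose losses are masked grows its window every round and the surplus shows up as a growing queue, that is, as delay rather than as a congestion signal.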

The lifetime of a cellular network connection is also fairly complex. Negotiating a dedicated cellular channel can take several seconds, and there are many variables that affect how the cell network decides which state the device should be in (and yes, it's usually up to the network). These parameters are often chosen to balance battery lifetime on the device; signaling overhead in the cell network; user-perceived delays; and overall network capacity. You can't expect to fix this just by hacking the device firmware.
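
A rough way to see why these parameters matter is a two-state toy model of the radio: the device is either idle or holding a dedicated channel, promotion to the dedicated channel costs a delay, and the network releases the channel after a period of inactivity. Everything here is an assumption for illustration: real networks have more states, and the `promotion_delay` and `tail_timer` values below are hypothetical defaults, not measurements from any carrier.

```python
def radio_cost(request_times, promotion_delay=2.0, tail_timer=10.0):
    """Toy two-state radio model (IDLE <-> DEDICATED).

    request_times: sorted times (seconds) at which an app sends a request.
    promotion_delay: hypothetical time to negotiate a dedicated channel.
    tail_timer: hypothetical inactivity timeout after which the network
    demotes the device back to IDLE.
    Returns (total user-visible delay, total seconds spent in DEDICATED).
    """
    total_delay = 0.0
    radio_on = 0.0
    demote_at = None  # time at which the current dedicated channel is released
    for t in request_times:
        if demote_at is None or t >= demote_at:
            # Radio is idle: pay the promotion delay before data can flow.
            total_delay += promotion_delay
            radio_on += tail_timer
            demote_at = t + promotion_delay + tail_timer
        else:
            # Channel is still held: no extra delay, but the timer restarts.
            new_demote_at = max(demote_at, t + tail_timer)
            radio_on += new_demote_at - demote_at
            demote_at = new_demote_at
    return total_delay, radio_on


if __name__ == "__main__":
    print("spread out:", radio_cost([0, 60, 120]))
    print("batched:   ", radio_cost([0, 1, 2]))
```

Under these made-up numbers, three requests spaced a minute apart each pay the promotion delay and hold the channel for a full timeout, while the same three requests sent back to back share one promotion and one timeout. Since the network picks the real values, the same app can behave quite differently from one carrier or market to the next.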

To make things even more hairy, mobile carriers often use different network tuning parameters in different markets, based on what kind of equipment they have deployed and how much (and what kinds) of traffic they see there. So there is no one-size-fits-all solution; you can't just solve the problem for one network on one carrier and assume you're done.

Understanding the impact of mobile handoffs on application performance. This is an extension to the above, but I haven't seen much academic work in this space. Handoffs are a complex beast in cellular networks and nobody really understands what their impact is on what a user experiences, at least for TCP/IP-based apps. (Handoff mechanisms are often more concerned with not dropping voice calls.) Also, with the increased availability of both WiFi and cellular networks, there's a lot to be done to tune when and how handoffs across network types occur. I hate it when I'm trying to get driving directions when leaving my house, only to find that my phone is trying in vain to hang onto a weak WiFi connection that is about to go away. Lots of interesting problems there.

Why doesn't my phone last all day? This is a hot topic right now but I think the research community's approach tends to be to change the mobile app SDK, which feels like a non-starter to me. Unfortunately, the genie is out of the bottle with respect to mobile app development, in the sense that any proposal that suggests we should just get all of the apps to use a new API for saving energy is probably not going to fly. In the battle between providing more power and flexibility to app developers versus constraining the API to make apps more efficient, the developer wins every time. A lot of the problems with apps causing battery drainage are simply bugs -- but app developers are going to continue to have plenty of rope to hang themselves (or their users) with. There needs to be a more fundamental approach to solving the energy management issue in mobile. This can be solved at many layers -- the OS, the virtual machine, the compiler -- and understanding how apps interact with the network would go a long way towards fixing things.

Where is my data and who has access to it? Let's be frank: Many apps turn smartphones into tracking devices, by collecting lots of data on their users: location, network activity, and so forth. Some mobile researchers even (unethically) collect this data for their own research studies. Once this data is "in the cloud", who knows where it's going and who has access to it. Buggy and malicious apps can easily leak sensitive data, and currently there's no good way to keep tabs on what information is being collected, by whom, for what purpose. There's been some great research on this (including the unfortunately-named TaintDroid) but I think there's lots more to be done here -- although we are sadly in an arms race with developers who are always finding new and better ways to track users.

What should a mobile web platform look like 10 years from now? I think that the research community fails to appreciate the degree of complexity and innovation that goes into building a really good, fast web browser. Unfortunately, the intersection between the research and web dev communities is pretty low, and most computer scientists think that JavaScript is a joke. But make no mistake: The browser is basically an operating system in its own right, and is rapidly getting features that will make it possible to do everything that native apps can do (and more). On the other hand, I find the web development community to be pretty short-sighted, and unlikely to come up with really compelling new architectures for the web itself. Hell, the biggest breakthroughs in the web community right now are a sane layout model for CSS and using sockets from JavaScript. In the mobile space, we are stuck in the stone ages in terms of exploiting the web's potential. So I think there is a lot the research community can offer here.

In ten years, the number of mobile web users will outstrip desktop web users by an order of magnitude. So the web is going to be primarily a mobile platform, which suggests a bunch of new trends: ubiquitous geolocation; users carrying (and interacting with) several devices at a time; voice input replacing typing; using the camera and sensors as first-class input methods; enough compute power in your pocket to do things like real-time speech translation and machine learning to predict what you will do next. I think we take a too-narrow view of what "the web" is, and we still talk about silly things like "pages" and "links" when in reality the web is a full application development platform with some amazing features. We should be thinking now about how it will evolve over the next decade.

Tuesday, January 22, 2013

The ethics of mobile data collection

The mobile computing and networking research communities need to start paying closer attention to the data collection practices of researchers in our field. Now that it's easy to write mobile apps that collect data from real users, I'm going to argue that computer science publication venues should start requiring authors to document whether they have IRB approval for studies involving human subjects, and how the study participants were consented. This documentation requirement is standard in the medical and social science communities, and it makes sense for computer science conferences and journals to do the same. Otherwise I fear we run the risk of accepting papers that have collected data unethically, hence rewarding researchers for not adequately protecting the privacy of the study participants.

I am often asked to review papers in which the authors have deployed a mobile phone app that collects data about the app's users. In some cases, these apps are overtly used for data collection and the users of the app are told how this data will be collected and used. But I have read a number of papers in which data collection has been embedded into apps that have some other purpose -- such as games or photo sharing. The goal, of course, is to get a lot of people to install the app, which is great for getting lots of "real world" data for a research paper. In some cases, I have downloaded the app in question and installed it, only to discover that the app never informs the user that it is collecting sensitive data in the background.

The problem is, such practices are unethical (and possibly illegal) according to federal requirements for protecting the privacy of human subjects in a research study. Even if there is some fine print in the app about the use of data for a research study, it's not clear to me that in all cases the researchers have actually gone through the federally-mandated Institutional Review Board approval process to collect this data.

Unfortunately, not many computer scientists seem to be familiar with the IRB approval requirement for studies involving human subjects. Our field is pretty lax about this, but I think it's time we started taking human subjects approval more seriously.

It is now dead simple to develop mobile apps that collect all kinds of data about their users. On the Android platform, an app can collect data such as the device's GPS location; which other apps are running and how much network traffic they use; what type of wireless network the device is using; the device manufacturer, model, and OS version; which cellular carrier the device uses; the device's battery level; and the current cell tower ID. Similar provisions exist on iOS and other mobile operating systems. With rooted devices, it's possible to collect even more information, such as a complete network packet trace and complete information on which websites and apps have been used.

Put together, this data can yield a rich picture of the usage patterns, mobility, and network performance experienced by a mobile user. It is very tempting for researchers to exploit this capability, and it's easy to get thousands of people to install your app by releasing it on Google Play or the Apple App Store. However, I have very little confidence that most researchers are adhering to legal and ethical guidelines for collecting such data -- I bet the typical scenario is that the data ends up being logged to an unsecured computer under some grad student's desk.

So, what is an IRB? In the US and many other countries, any institution that receives federal funding must ensure that research studies involving human subjects protect the rights and privacy of the participants in such studies. This is accomplished through Institutional Review Board review, which must occur prior to the study taking place. The purpose of the IRB is to ensure that the study meets certain guidelines for protecting the privacy of the study participants. The Stanford IRB Website has some good background about the purpose of IRB approval and what the process is like. The principles underpinning IRB review were set forth in the Declaration of Helsinki, which has been the basis for many countries' laws regarding protection of human subjects.

Failing to get IRB approval for a research study is serious business. In the medical and social science communities, failing to get IRB approval is tantamount to faking data or plagiarism. The Retraction Watch blog has a long list of cases in which published articles have been retracted due to lack of IRB approval. In those fields, this kind of forced retraction can destroy an academic's career.

Documenting IRB approval and informed consent for study participants is becoming standard practice in the medical and social science communities. For example, the submission guidelines to the Annals of Internal Medicine require an explicit statement from authors regarding IRB approval:

"The authors must confirm review of the study by the appropriate institutional review board or affirm that the protocol is consistent with the principles of the Declaration of Helsinki (see World Medical Association). If the authors did not obtain institutional review board approval before the start of the study, they should so state and explain the circumstances. If the study was exempt from review, the authors must state that such exemption complied with the policy of their local institutional review board. They should affirm that study participants gave their informed consent or state than an institutional review board approved conduct of the research without explicit consent from the participants. If patients are identifiable from illustrations, photographs, pedigrees, case reports, or other study data, the authors must submit the release form for each such individual (or copies of the figures with the appropriate release statement) giving permission for publication with the manuscript. Consult the Research section of the American College of Physicians Ethics Manual for further information."

Yet in computer science, we tend not to take this process very seriously. I suspect most computer scientists have never heard of, or dealt with, their institution's IRB. I was surprised to see that CHI, the top conference in the area of human-computer interaction (in which user studies are commonplace), says nothing in its call for papers about requiring IRB approval disclosure for human subjects studies -- perhaps the practice of obtaining IRB approval is already widespread in that community, though I doubt it.

Why do I think we should require authors to document IRB approval? For two reasons. First, to raise awareness of this issue and ensure that authors are aware of their obligations before they submit a paper to such venues. Second, to prevent paper reviewers from having to make a judgment call when a paper is unclear on whether and how a study protects its participants. The whole point of an IRB is to front-load the approval process before the research study even begins, well before a paper gets submitted. The nature of a research project may well change depending on the IRB's requirements for protecting user privacy.

To give an example of how this can be done properly, colleagues of mine at University of Michigan and University of Washington are developing a mobile app for collecting network performance data, called MobiPerf. The PIs have IRB approval for this study and the app clearly informs the users that the data will be collected for a research study when the app first starts; clicking "No thanks" immediately exits the app. Furthermore, there is a fairly detailed privacy statement and EULA on the app's website, explaining exactly what data is collected. It's true that going through these steps required more effort on the part of the researchers, but it's not just a good idea -- it's the law.

This is my personal blog. The views expressed here are mine alone and not those of my employer.

Thursday, January 3, 2013

How to get a faculty job, Part 3: Negotiating the offer

This is the third (actually fourth) part in this series on how to get a faculty job in Computer Science. Part 1 and Part 1b dealt with the application process, and Part 2 was about interviewing. In this post, I'll talk about what happens when you get a job offer and how to negotiate when you have multiple offers.

There is often a long and painful wait from the time you complete the interview until you hear back from the school about whether they will be making you an offer. This is generally because all (or most) of the candidates need to complete interviews before the final hiring decisions are made, and the actual offer needs to be approved by the department or school administration before the candidate can be given the good news. Depending on how early you interview, this wait can be on the order of a month or two. (Generally, candidates interview between February and April, and offers start getting made around April or May.) Sometimes a school won't contact you at all after the interview, and after a while you figure you're not getting an offer after all. Sometimes they contact you fairly quickly to deliver the coup de grâce, which is greatly appreciated since then you can at least stop holding out hope.

As I pointed out in the previous post on interviewing, it is a very good idea to keep in touch with schools you are really interested in and let them know where you are in the process, and especially if you have offers from other schools. Usually this can be done via informal email to your host when you interviewed. The last thing a department wants is for their top candidate to take a job elsewhere before they have a chance to make an offer. So let people know what's happening and try to find out how your top choices are doing in terms of making offers.

There are three kinds of offers: (1) Straight-up offers; (2) "Offers for offers", and (3) Second-choice offers. I'll explain each below.

Straight-up offers

The best possible outcome is that you get a call from your host or the hiring committee chair who says, "I'm happy to let you know that we're going to be making you an offer." At this stage, you probably will not get into any of the details about salary, research funding, and the like -- that comes later.

Most of the time, departments will offer to fly you out for a second visit, sometimes with your spouse or significant other, so you can spend time getting to know the department, university, and town. This is much more relaxed than the interview, and is a great way to get to know your potential future colleagues under less stressful conditions. A second visit can be very important for deciding where to kick off your career as a faculty member: you will learn many things that you might not have had time to get into when you interviewed. In particular, you are going to care much more about things like housing, schools for your kids, quality of life, and other factors that you didn't get a chance to judge during the interview. Definitely do a second visit if you are serious about a school.

Offers for offers

The dilemma faced by many departments is that they have several really good candidates but only one (or maybe two) open positions. If a department blindly makes an offer to its top candidate, but that person is not that serious about taking the job there, then their second- or third-choice candidates (who might be just as good!) might end up taking offers elsewhere while the first candidate sits on the offer in the hopes of using it as a point of negotiation with another school. Also keep in mind that schools generally cannot have multiple outstanding offers for a single position.

So, sometimes a department won't make an outright job offer, but will instead feel you out to find out if you're really serious about taking a job there, a so-called "offer for an offer". The idea is that the department can (and will!) make a formal offer, but only after determining that you really want it.

From a purely selfish perspective, it might seem that your best strategy is to amass as many offers as you can so you have the most leverage when negotiating salary and other aspects of the compensation. But this also puts the department in a real bind if you end up sitting on the offer without any real intention of taking it. I don't think pissing a bunch of people off (even at a place where you don't take a job) is a good strategy for anyone trying to jumpstart an academic career.

Some schools do ridiculous things like exploding offers, which expire after a set time, to avoid the situation where someone sits on an offer for too long. Given that schools are rarely well-synchronized in their recruiting schedules, this can be disastrous: Say you get an offer that explodes after two weeks, but you haven't finished interviewing yet and still haven't heard from most of the schools. The last thing you want is to be forced into accepting a job at a school because the offer was going to time out. By no means should you be forced to make a decision on taking a faculty job before you have had a chance to evaluate all of your options. Personally, I think schools that do this are being idiotic and should think seriously about what kind of people they are going to be successful recruiting through such tactics.

I once heard a case of a hiring committee which couldn't make up its mind, so they called their top five candidates and said, "We have two offers available, the first two people who call us to claim the offer will get one, but it will explode in two weeks." I think this kind of strategy is a complete load of crap, and the hiring committee should be ashamed of itself for not being able to commit to their top one or two candidates and ride it through. But I digress.

Second-choice offers

It is often the case that you aren't the school's top choice, but you are their second (or third) choice for the position. Sometimes a school will tell you this outright: That they would love to make you an offer, assuming that their first-choice candidate declines them. This can sting, of course, and I question the wisdom of telling candidates this much information. Most people don't want to take a job somewhere where they feel as though they were the consolation prize. Sometimes, you find out through the grapevine that someone else already has an offer from that school, but later on you get a call with an offer of your own (and it just so happens that the other candidate recently accepted a job elsewhere). At some point you have to swallow your pride and appreciate that in a few months, nobody will remember (or care) that you weren't the first choice, and you got an awesome job at a good school, and that's all that matters. The point is that an offer's an offer, so don't worry too much if you weren't the department's original top choice.

From sitting on the faculty hiring committee at Harvard, I can vouch for how hard it can be for a school to narrow its choices to one or two people in a field of really good candidates. Often the choice of who to make the first offer to is arbitrary, based on some general vibe that you think the person might be more or less inclined to accept the job. A department might have two or three candidates who are all more or less equal but they have to make a first choice somehow.

What's in an offer?

In most cases, the initial job offer is verbal and you won't get a formal, written job offer until much later, based on extensive discussions with the dean or department chair about what you expect the offer letter to say. There are several components to most faculty job offers that should be (eventually) spelled out in writing:

The salary (of course). Usually salary is paid for 9 months of the academic year, with the expectation that you will pay the other 3 months out of a research grant. So if the offer is $100k for 9 months, that's really a 12-month salary of $133k.
Summer salary support. Since most junior faculty come in with no research grants, usually a department will offer to pay one or two summers' worth of your salary until you get grants of your own.
Teaching relief. At many schools, incoming junior faculty are given a semester of teaching relief which they can take at some point in the first couple of years. This gives you a little more free time to kick start your research and lessens the load of transitioning into the new job. My strong recommendation is to wait until your second or third term before taking teaching relief: Teaching a course (especially a graduate seminar) your first term on the job is a great way of recruiting students to your research group, and you're so screwed anyway the first semester as a new faculty member that teaching relief is hardly beneficial until you get your research group up to speed.
Graduate student support. Many schools will provide funding to support one or two grad students for a couple of years, to help seed your research group. Of course, you still have to identify and recruit the students (a topic for a future blog post). Keep in mind that grad students aren't cheap. In addition to their paltry salary, the student's tuition and fringe benefits need to be paid for. Typically a PhD student will cost around $75K a year all in, so support for a couple of students is a lot of money.
Research support. This can take many forms depending on the school, but generally this is money (in some form) to help you get your research going in lieu of any grants. The best form of this is an outright slush fund which you can use to pay for anything related to your research: computers, equipment, students, summer salary, travel, conference registrations, pizza parties for the team, you name it. At Harvard, my "startup package" was in the six figures, but this is unusual; I think that most schools do something in the $20K range, sometimes less. (If the school is offering to pay for students or summer salary separately, you have to factor this in as well.) In many cases, a department will separately offer you some amount of equipment (such as a fund to buy computers and a laptop) in addition to, or in lieu of, a general slush fund. It depends very much on how the school manages its finances and chooses to account for things. Some schools without deep pockets may only offer you a hand-me-down workstation and a few hundred bucks to offset the cost of a laptop. It varies a lot.
Lab space. I don't know how common it is for a job offer to include an explicit provision for lab space (that is, not including your own office). In many departments, grad student space is a shared resource and there is not usually a need for dedicated labs for specific faculty. However, depending on the nature of your research, you might need specialized lab space -- for example, if you are developing a swarm of quad-copters you probably need some dedicated space for that.
Other perks. It is common for the department to pay for (or offset) your moving expenses, especially if you are moving from far away. An offer also might include things like temporary housing when you first move. Again, this varies a lot.
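
The 9-month versus 12-month salary arithmetic above is easy to get backwards when comparing offers, so here it is as a tiny sketch. The function name is made up for illustration, and it assumes the summer months are paid at the same monthly rate as the academic year.

```python
def twelve_month_equivalent(academic_year_salary, paid_months=9):
    """Scale a salary paid over `paid_months` to a 12-month figure,
    assuming the remaining months are paid at the same monthly rate."""
    monthly_rate = academic_year_salary / paid_months
    return monthly_rate * 12


if __name__ == "__main__":
    # The $100k-for-9-months example from the salary item above.
    print(round(twelve_month_equivalent(100_000)))
```

For the $100k example this comes out to roughly $133k, matching the figure in the salary item.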

How to negotiate

Okay, so let's assume you're lucky enough to have a couple of faculty job offers in hand. What do you need to keep in mind?

First things first. Only negotiate with schools you are really serious about. It is a waste of everyone's time (and patience) if you feign excitement about a school just to get them to bump up your offer and use that as leverage against another school. People will know if you are bullshitting them. And keep in mind that even if you don't take a job somewhere, those people you run the risk of pissing off will continue to be important academic colleagues. One day they might be called upon to write tenure review letters for you. The point is you want to avoid making enemies.

Secondly, you can't compare industry and academic offers. At all. Compensation from industry is going to be much higher (especially over time) than any academic offer, when you factor in salary, bonuses, stock options, and the steeper increase year over year compared to a university job. So you can't expect to use an industry offer as leverage to negotiate higher compensation at a university.

At many universities, the salary is non-negotiable as it is based on a standard scale that (in most cases) can't be changed. You might be able to negotiate a small salary increase if another school is offering much more, but this seems unlikely to me. Keep in mind that the range of starting salaries for junior faculty across different schools (at least among top-ranked research institutions) is pretty tight, so there's not much wiggle room there anyway. You can ask but don't be surprised if you're told that the salary is fixed.

If you can, try to get your startup package to be all or mostly cash. By "cash" I mean funding that can be used to pay for anything: students, equipment, travel, whatever. If your startup is segmented into X dollars for students, Y dollars for equipment, and so forth, that can constrain you down the line, if, for example, you end up wanting to hire more students than you expected or don't need as much travel funding. Fungibility is good.

It's a good idea to have a rough idea of how much you need to get started before you start talking hard numbers. When I did my faculty job search, I had in mind a research agenda involving building out an experimental workstation cluster as well as some other equipment needs, travel to several conferences in my first couple of years, and support for two students. I made up a quick and dirty spreadsheet to estimate how much all of this would cost and used that as the starting point for talking about the size of the startup package. If you have no idea how much you expect to spend -- and what you might spend it on -- you will have a hard time making a convincing case that you need more than what's being offered.
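
A quick and dirty estimate like this can be sketched in a few lines instead of a spreadsheet. This is only an illustration: the function and its parameter names are made up, the equipment and travel figures in the example are hypothetical placeholders, and the only number taken from this post is the rough $75K per student per year.

```python
def startup_budget(students, student_years, cost_per_student_year,
                   equipment, trips, cost_per_trip):
    """Rough startup-package estimate, broken down by category (dollars)."""
    items = {
        "students": students * student_years * cost_per_student_year,
        "equipment": equipment,
        "travel": trips * cost_per_trip,
    }
    items["total"] = sum(items.values())
    return items


if __name__ == "__main__":
    estimate = startup_budget(
        students=2, student_years=2, cost_per_student_year=75_000,
        equipment=50_000,            # hypothetical cluster and workstations
        trips=6, cost_per_trip=2_000,  # hypothetical conference travel
    )
    for name, dollars in estimate.items():
        print(f"{name:>10}: ${dollars:,}")
```

Even with placeholder numbers, a breakdown like this makes it obvious which line dominates (here, student support), which is useful to know before you start talking hard numbers.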

If you have a two-body problem (which is probably deserving of its own blog post), find out what, if anything, the university can do to help your partner land a job in the area. You may be surprised. When I was on the job market, my wife was finishing up medical school and we were going to make a decision about where to go in large part based on whether she would be able to get a good residency position. Although nobody could guarantee my wife a residency slot, the schools that were recruiting me helped set up meetings with a bunch of people to learn more about the programs in each area so we got a good sense of what her options were like. It is also not uncommon for universities to facilitate positions for spouses and partners of faculty they are trying to recruit -- many things are possible.

If you have kids, you should by all means try to negotiate for a spot in the university's day care center. The waiting lists for day care can be years long, but special exceptions can often be made when a school is trying to recruit a new faculty member. This is not always possible but it's worth asking about.

Finally, don't be greedy. This is not about maximizing your compensation and startup package and pissing everyone off in the process. Your goal in negotiating the offer is not to squeeze every penny you can out of them -- instead, it's to reach a point where you feel confident that the compensation and startup package will allow you to be happy and successful in your new job.

So which offer should you take?

Although I'm sure it happens, I would hope that nobody would take a faculty job just because it paid the most or had the largest startup package. If your only goal in life is to maximize your compensation, trust me: You do not want to be a professor. There are many, many other factors that are more important than the size of the offer: The culture and quality of the department, the students, the physical location, the quality of life ... the list goes on and on. In steady state, you're going to be a (relatively) poor academic, and struggling to get research grants just like everyone else. The initial salary and startup package can give you a boost, but it mostly comes out in the wash -- the absolute numbers won't matter much beyond the first year or so. So focus on finding the job that will make you happiest, not just that which pays the most.