What is this post about?
About a year ago I started my PhD in Machine Learning with Philipp Hennig. Over this year, I have changed my mind quite a lot on different issues regarding PhDs.
The purpose of this post is to help other people, e.g. if someone is unsure whether to pursue a PhD, and to expose me to feedback from others. So if you think my beliefs are wrong or could be improved, please reach out and let me know. I also want to make clear that I’m not even close to mastering all of the following suggestions and you should think of them more as “things I try to improve and have currently implemented to different degrees”.
When I started my Master in Tübingen, there were lectures by a new professor, Philipp Hennig. After the very first lecture, I felt that the way he thought and taught was on another level compared to all of my other lectures. I began to enroll in every one of his lectures and seminars and ultimately did my Master’s thesis with him. Afterward, I applied for a PhD position with him and got accepted.
Content-wise, I’m working on probabilistic Machine Learning, more specifically on how to make Bayesian Machine Learning faster. So far, I have written two papers, neither of which has been published yet. The first paper is an extended version of my Master’s thesis and it has been rejected multiple times (confusingly, often with above-average scores). We have now decided to rewrite the paper entirely, add new content, and resubmit once again.
My second paper deals with the core idea of the PhD - finding faster approximations for Bayesian Machine Learning. It is submitted to JMLR but I’m still waiting for the first round of reviews.
Currently, I’m working on a new idea that I’m very hyped about. For the first time in my PhD, a project is actually going more smoothly than I expected.
This post is more critical than I anticipated. Thus, I want to make two things very clear.
- I started my PhD during the covid-pandemic. I have never been to an in-person conference and couldn’t draw a lot of value from the online alternatives. Thus, I don’t really know anyone outside of my own group who works on similar questions. Maybe without the pandemic, some of the following problems would have been easier to solve.
- I think Philipp and the MoML group are great. I’m very happy to have Philipp Hennig as a supervisor even if it sometimes feels as if his brain is running on the latest GPU while mine uses an old CPU. The people in the group are really nice and smart. I couldn’t have wished for more. Also, a lot of the content of this post is just stuff I learned from the group and especially Philipp. If something sounds smart it’s probably from them.
A PhD is harder than I expected
When I started my PhD I expected research to be hard but it turned out to be even harder. An analogy might capture this discrepancy best. The Bachelor’s program is like a highway - everything is already established, there are lots of resources online and one can generally move fast. The Master’s is more like a country road - content gets harder, but you can still find stuff online and the speed is still decent. I expected the PhD to be like riding a bike on a cobbled road but now it feels more like slashing through the jungle with a machete. I’m not sure whether this is due to the nature of my project, the situation with covid, me just not being suited for a PhD, or just a general feature of research, but I found that other people tend to share my perspective.
I think one of the fundamental struggles of doing a PhD is that the things you know you don’t understand increase much faster than the things you actually understand. This makes it always feel like you understand less over time even though your absolute amount of knowledge is increasing much more than you realize because you compare it to the known unknowns.
Before I started my PhD, it was pretty clear to me that I wanted to do a PhD. I didn’t really have any strong reasons for it other than it being “the thing that proper scientists do” and “just the next step in the education ladder” similar to how a Master’s degree seems to just follow a Bachelor’s. However, as described above, the difference between a PhD and a Master’s felt much larger than between a Bachelor’s and a Master’s, and as a consequence I would recommend a PhD to fewer people. If you are currently thinking about getting a PhD, I would recommend asking the following questions.
- Why do you want a PhD? If the answer is undefined or just the social status, I would advise against it. If most interesting jobs in your field require one, that might be a strong reason. But I would advise against doing a PhD “just because” since it is just not worth the costs without a strong reason. In comparison, private industry is often less stressful, pays you better, and gives you more stability for a future career. So in my mind, the burden of proof has shifted from “why would you not want to do a PhD” to “why would you want to do a PhD”.
- Are you comfortable working alone for prolonged amounts of time? Especially early on in your PhD you might work alone for prolonged periods of time. Your supervisor and colleagues might answer your questions from time to time but often it’s just you, the internet, and a lot of unanswered questions. I certainly underestimated this aspect. If you desire quick feedback cycles and close mentorship there are better alternatives outside of academia.
- Are you comfortable with slow progress and getting unsatisfying answers? Once you reach the cutting edge of research, progress slows down dramatically. Things not working is the norm and things working is the exception. For me, this was my biggest struggle and I am still not entirely used to it. Maybe I also just got lucky with my latest project, which looks quite promising.
If you answer any of the above questions with a definite no, then I would recommend not pursuing a PhD. If you are unsure, I would recommend talking to more current PhDs and people who decided against a PhD to get both perspectives (you should do that in any case). I would take the experiences of people who have already finished their PhD with a grain of salt because we tend to forget how bad past experiences truly were and overwrite them with a rosy picture of the past.
I found the following frame of mind suggested by Leon really helpful. Think about a PhD as being paid to learn things very accurately. Sometimes a thing you learn is valuable and other people don’t know that yet. Then you can write a paper telling everyone about the thing you found. If the thing you learned is already known or not valuable, that doesn’t matter too much because you learned something. I like the frame because it removes the pressure of “making every project a paper” even if it’s not valuable. This mindset is only instrumentally valuable once you have already been accepted to a program; in the application process it will likely hurt rather than help your chances of acceptance.
When I started my PhD, my perception of the supervisor-student relationship was basically: “The professor knows a lot and is very smart. I know nearly nothing and am less smart. Thus, I mostly listen to what the professor says and develop their ideas.” In retrospect, however, that seems suboptimal. Firstly, I often didn’t understand exactly what Philipp meant during a meeting and wasted a lot of time second-guessing his true intentions afterward. It sometimes felt like he had already mapped out the entire project in his head and I was just unable to follow the map at the same pace. Secondly, due to the large perceived skill and intelligence difference, I sometimes didn’t want to ask dumb questions or rephrased them in weird ways to make them sound less dumb. Most of the time, the rephrasing just made it much more confusing. As a consequence, I wasted some of my and Philipp’s time and felt bad more often than I should have.
Now, my recommendations would be
- Own your project: Your advisor is just that - an advisor. They are not a dictator but should rather help you with your PhD project. They might have picked that project for you and are more knowledgeable about it in the beginning but it is still important that you develop a sense of ownership and responsibility for it. In the end, you will have to defend it and you will answer questions from colleagues about it. If you don’t feel like it’s yours, other people won’t either.
- Ask dumb questions: Don’t forget that the reason why you work with an advisor is that they know more than you. It would be weird if you didn’t have a lot of questions that might be simple for them to answer. This doesn’t imply that you should ask all questions that ever come to mind but if you are unable to find an answer on your own you should ask for their help. Now that I’m doing a PhD, I also help out with teaching and supervise my own Bachelor’s students. From the teacher/supervisor perspective I can definitely say that the people who ask questions move much faster in their research even if some of the questions could have been answered by themselves.
Philipp always likes to say that at the beginning of the PhD he is the expert and I’m the student and at the end of the PhD we should have swapped roles. I think this is a very helpful general framing for the supervisor-student relationship.
The Twitter discourse around PhDs is often focused on questions such as “How many hours do I need to work to be a good PhD? More than 50?”. I think this is totally misguided because it focuses on time rather than output. There might be a person who hustles 50+ hours every week and still doesn’t further the scientific discussion and there might be another person who chills all day, has one good idea, and significantly improves the state of the art with it. Clearly, putting in more time will yield better results in expectation, as you can test more hypotheses, etc. but even then time is only a proxy for your actual goal of increasing scientific progress. Thus, rather than counting hours, I would recommend asking yourself whether the goal you pursue would advance the scientific frontier in an important direction and whether you make consistent measurable progress towards it. I know that this opens the door for other problems but I still found it to improve my output and make me happier rather than tying my sense of self-worth to the amount of time I work.
Another thing that I found quite useful is harvesting productive hours. Especially when I have to do a cognitively demanding task such as deriving a mathematical equation or implementing a complicated model, getting three undisturbed hours after a good night’s sleep got me further than an entire day with bad rest and disturbances. Thus, once again, measuring time rather than output would give the wrong impression in this case.
I have written an entire blogpost on productivity if you are interested.
Your advisor is usually pretty busy. They have many different students, they teach, they have to do admin stuff, write grant proposals, and so on. Thus, they have a lot of different things on their mind. A PhD student, on the other hand, is able to focus their time mostly on their project. Thus it can easily happen that the PhD student enters the meeting ready to pick up exactly where they left off last time without realizing that the supervisor has been in tens or hundreds of meetings since and has no chance of remembering it in any comparable detail. Therefore, they might suggest a strategy that was already discarded in an earlier meeting, redo a derivation or misremember some important detail. To prevent these kinds of things from happening, I would recommend the following.
- Really prepare meetings: I would argue that Philipp’s answers to my questions, strategic insight, and pointers to resources probably save me at least 20 hours in expectation per 1-hour meeting. Thus, it is completely reasonable to really prepare a meeting, e.g. by making slides with a summary of the last meeting, plots, and research questions. The less often you meet, the better your preparation should be.
- Gently take control: Your advisor usually has a lot of ideas. Some of them might be brilliant, some of them might be valuable for future projects but not now and some of them were already suggested and discarded in the previous meeting. So the less structure you give the meeting, the more explorative it will be. Sometimes this is nice, but more often than not, I just want to know the answer to a very specific question. Therefore, I found it helpful to state clear goals in the beginning and try to get back on track when I feel like our conversation has shifted too far away.
Choosing a project
At the very beginning of your PhD you might focus only on one project. But very soon, there will be more potential projects than time to complete them. There are a couple of heuristics I found useful so far.
- Does it fit the bigger picture? Optimally, a PhD is not a collection of loose ideas but rather has a clear goal from the beginning. This goal might change during the PhD but I think it’s still valuable to have a clear question that ties your projects together. A good heuristic to evaluate whether you have such a goal is whether you can complete the sentence “I want to be the person that does X” such that X is one short phrase, e.g. “I want to be the person that makes Bayesian ML faster”. If a project doesn’t bring me closer to that goal, I should not start it no matter how interesting I find it. There are a lot of advantages to having a core question that ties all of your individual projects together. On an individual level, your projects will get progressively easier since you build upon your previous work, and it is likely easier to get hired in the future. From a societal standpoint, it is advantageous to have people specializing and working in narrow domains since this moves the overall scientific frontier faster than everyone pursuing many different projects.
- Pick only projects that you really care about: Don’t ask whether this idea would be interesting if developed or whether you would like your name on a paper. Rather ask yourself if you would be willing to do all the experiments required to convince yourself, all the work to convince reviewers, and put in the work for all follow-up (e.g. presentations, questions, code fixing, etc.) after publication. This means that you might say no to a lot of ideas or potential collaborations. But if you are unwilling to say no, you either can’t fully commit to any project or publish a lot of half-baked ideas - both of which feel miserable and don’t fulfill scientific standards.
- Kill your darlings: Most projects have some kind of Achilles heel in the sense that there is one finding that would kill the project if true. In my first project, I subconsciously tried to dodge that Achilles heel, probably because “if true it would invalidate all the work I previously put in”. However, after reading this tweet by Shengwu Li, I now try to actively seek out this Achilles heel as fast as possible and investigate it. The sooner I do it, the less emotionally involved I am and the more I can change the direction of the project. Not doing it will only result in a bad review experience and will never absolve me from actually looking into the Achilles heel eventually.
Writing a paper
I don’t think I’m very good at paper writing but I think I have become less terrible at it. Thus, the following suggestions are probably targeted more towards beginners than experts.
- Think about value: When I think about a paper it’s usually along the lines of “Isn’t that the paper that introduced this new method?” or “the paper that made something faster”, i.e. I think about a paper in terms of the value it creates. Thus, I think it makes sense to think about which value your paper provides and then frame everything around it. Value could be a new state of the art, new method, speed up, robustness, simplicity, new insight, etc., but whatever it is, it should be emphasized from the beginning and the experiments should be the evidence that this value exists.
- Whatever you write has to be true, understandable, and concise in that order. This is another one of Philipp’s great insights and it is really important. A paper is harmful if the content is not true. Not only does it reflect badly on you but it also misguides your colleagues who work on the same problem. To a lesser extent, this is also true for overpromising your results or representing them in a dishonest way because you waste someone else’s time by doing so. Once you have ensured that the content is true, you should try to explain it as clearly as possible. The goal should be that another person in your broad area of expertise can reproduce your results. Lastly, cut the paper down to make it more concise and improve the reading experience. However, be careful that truth and understandability trump conciseness - a short paper is useless if not true or understandable.
- Start writing early: I found that “writing the paper once the experiments are done” is a much worse strategy for me than “writing the paper as soon as possible”. The process of writing my thoughts down already clarifies how I want to run the experiments, and writing a section directly after the results are in immediately shows problems that need to be corrected. So whenever I explore a new project I open a new Overleaf document for it. Even if the project never becomes a paper I still have documentation of what exactly didn’t work out as expected and could reproduce the results in the future.
- Iterate yourself: It’s obvious that papers get better if you iterate over them and distill every paragraph down to its essential logic. Many of my colleagues iterate the paper many times in the last week(s) before the conference deadline but this didn’t work too well for me. I found it was easier to iterate with long time spans (e.g. one week) between iterations to get emotionally detached from my previous writing process and be able to think “what I wrote was bad, here is a better version” without feeling like I have to defend my past self.
- Get feedback early on (and give a lot of feedback): Just getting a glimpse of which parts other people struggle with is really helpful, even if it’s just in a chat over lunch. The earlier you get feedback, the less emotionally attached you are to work that you have already done and the more willing you are to change your paper for the better. Furthermore, I found that giving other people feedback on their papers is really helpful for me as well. It shows me how more senior people write, which kind of things they choose to explain vs. reference, and whether I can follow their logic.
- Make beautiful figures: When I first look at a paper I read the abstract and then look at all the figures - and I know many who do the same! Whether or not your figures look clean, overloaded, miss crucial information, etc. not only sets the tone for how I read the paper, it also determines to which extent I understand it. Thus, figures should not be seen as a clarification of the text but rather the text a detailed description of your figures. So whenever you can explain a concept well through a figure, I would do so. Furthermore, investing a lot of time in clean figures is usually worth the effort as you can reuse them on many future occasions such as presentations.
My current understanding is that writing a paper is partly a skill that comes over time but largely also a logistical exercise. One should start early enough to get feedback, leave time gaps between iterations, make sure that the figures look good, and only write things that one has actually confirmed rather than including a speculative experiment the night before the deadline.
Academia is even more broken than I expected
I heard about many problems in academia before starting my PhD. They range from misaligned incentives and most people pursuing projects of questionable value to bad reviews. But in practice, things turned out to be even slightly worse than I expected. The following two problems stood out in particular.
- Misalignment of incentives: In science, you are supposed to improve the frontiers of knowledge. In academia, on the other hand, you are incentivized to produce research with numbers that look good. Your personal sense of achievement is usually higher when you have a positive rather than a negative result. Papers are often rejected for “not beating the state of the art”, even if that shouldn’t be a requirement for new knowledge. If you want to get tenure, your hiring committee might look at your h-index and the number of papers you published, which just further incentivizes you to publish as much as possible (nearly) regardless of its quality. I personally found this not only frustrating but also felt it eroding my ethical standards. In the beginning, I just plainly stated negative results. After a couple of reviewers said that the paper would be good if the results were better, I started to reframe parts of the paper to sound better and cut down the parts about negative results. What initially was “just a way to please my reviewers” became a mindset that stuck and that I have to actively counteract.
- Quality of reviews: In the Machine Learning community papers are often published at conferences rather than in journals. This means that the reviewers are chosen from a much broader spectrum of expertise and the review process is shorter compared to journals. On top of that, reviewers are usually assigned between four and seven papers per conference which makes it nearly impossible to do a rigorous review of all of them. Thus, the average quality of reviews is said to be worse than it once was (I can’t judge this myself). From the reviews that I have gotten, I would say that around half were helpful and improved the paper and the other half were not helpful at all. The reviewers either came from a completely different background or didn’t have the time to read the paper properly. I wouldn’t even say that it is mostly their fault. They probably didn’t choose to review something outside of their expertise - much less seven such papers. The overall experience is frustrating for both the reviewer and the reviewee. The reviewer has the feeling of being burdened with the impossible task of assessing a paper in a different domain with different norms and standards and the reviewee has the feeling of being misunderstood and not getting a proper assessment.
The awareness of these problems is slowly increasing and more and more people demand improvements but fixing these problems is a very very hard task that a lot of smart people have already attempted and failed to solve. I’m aware that it is a very sticky problem and it will likely take years to improve and decades to solve but it feels frustrating nonetheless.
I think it is quite easy to get into a mental tunnel during your PhD. Then, the only things that exist are conference deadlines, accumulating citations, and looking for career opportunities. I personally found it quite easy to fall into this mindset and have quite actively tried to distance myself from it because I realized that I started tying my personal self-worth to the success of my research projects. And since my research failed more often than not, I usually didn’t feel too great during the beginning of my PhD. Thus, I want to give a little bit of high-level advice to prevent this tunnel vision.
- Don’t take yourself too seriously: I find it hard to describe exactly what I have in mind, but I met some people who pretended that their work is just one step away from solving major societal problems while my outside perspective was that they are doing solid incremental work but not more. Just to be clear, I do think people can have a vision and should be ambitious and driven. However, I think it helps to be able to recognize which scope your project has and how promising it is. This helps to a) soften the fall once you realize that your project is less impactful than expected and b) prepares you for the situation in which you don’t have any major research findings because that is what 95% of research careers look like since science is super hard and stochastic.
- The steady turtle wins the race: I don’t like deadline stress and I think it’s counterproductive. After my first deadline sprint, I not only felt exhausted but had also lost all motivation to do research for a couple of weeks. This clearly is not an effective or sustainable strategy and I can only recommend chilling more before conference deadlines. I think it’s much better to start the paper early and try to finish it weeks before the deadline or just submit at a later conference if it still has major content gaps a week before the deadline rather than living on caffeine and little sleep for a week.
- Take active breaks: Often when I can’t find a bug or a mistake in my derivation I will spend hours looking for it while getting increasingly frustrated. I found it very helpful to just stop way earlier and take a break. Usually, I take a walk and get some fresh air but in any case, I will try not to think about this particular problem. Once I feel less emotionally stressed I return to the problem and often find the solution quite quickly.
Get a research partner
Whether this makes sense for you or not depends a lot on the setup of your PhD. In my case, I’m the only one in my group who is working on a specific problem and thus don’t “naturally” work too much with others. Therefore, for a long time my research experience was one hour a week with my advisor and 39 hours just with myself and the internet. I found this very frustrating because I really like fast feedback cycles and interaction with others helps me clarify my thoughts. Thus, I approached another PhD student in my group who was also working mostly on his own at the time and suggested meeting weekly to explain our research to each other. Then we would ask simple questions, give suggestions, etc. and I found it extremely helpful. Often just having a smart person who asks good questions is already enough to make leaps in your project. By now, both of us work on collaborations and thus meet less regularly, but I still think it’s one of the lowest-hanging fruits in research if you mostly work on your own.
One last note
If you have any feedback regarding anything (e.g. layout or opinions), please tell me in a constructive manner via your preferred means of communication.