Wednesday, December 05, 2018

Fair Warning

If you ever want to enjoy a movie, radio show, or anything with bird sounds in the background, don’t ever learn bird calls and bird songs. I’ve said this before, either in discussion or maybe even somewhere on this blog. Actually, in counterpoint, when they get it right it really helps set the scene. If you want to start learning bird songs then try this Bird Academy class. Or if you are already somewhat experienced you could try the Quiz.

This quiz that I speak of is actually the project I have been working on for the last three months. It is a quiz consisting of mostly unrated media, photo or audio, from the Macaulay Library archives, most of it contributed through eBird. I know it’s sort of like my child, but I absolutely adore it. I already took enough quizzes that I was mostly familiar with 80% of the birds in Japan in May…

I would really like to jump into spaced repetition, but if, as one of the rules in the link says, you have to make your own cards, then that would be really hard for us to do with our random species and asset selection. It isn’t totally random: you can pick a place and a time of year. We get a list of birds likely to be seen in that part of the world (you will be disappointed with Barrow, Alaska in December) and then pick 20 random species. Then we get the photo or audio clip with the fewest ratings and serve up the quiz.

This works well for the most part for well-inoculated users. There are terrible tiny photos and misidentified photos, but in general I really can’t get enough of it myself. There are a few changes that I think would make this a better experience for someone trying to use it to learn a new region: weight the chance to see a species by the number of times you have seen it previously, and try never to show the same photo to the same user if it can be helped.

Let’s say that you are trying to learn birds of the Northeast US. One of them is most certainly the American Robin. At the moment in my area there are getting to be around 200 potential species. If you take the quiz 10 times and see 200 photos then you might have seen the good ol’ Robin about five or six times, and other species on an average of two or three.

This is several runs of completely random species selection for quizzes. At 10 quizzes the user will have seen only 125 of the 200 species. Even at 50 quizzes there are still 4 species the user has seen only once, though because of long tails and randomness 40 quizzes is probably a better indication, with 13 species seen fewer than twice, which leaves no real chance to learn them.
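For anyone who wants to poke at the numbers, here is a minimal sketch of that kind of run. It assumes each quiz draws 20 distinct species uniformly from a pool of 200, with no weighting by how likely a bird is in the area; the function names and the seed are mine for illustration, not anything from the app.

```python
import random


def simulate_uniform(n_species=200, per_quiz=20, n_quizzes=10, seed=0):
    """Return how many times each species was shown after n_quizzes."""
    rng = random.Random(seed)
    counts = [0] * n_species
    for _ in range(n_quizzes):
        # Each quiz is 20 distinct species chosen uniformly at random.
        for s in rng.sample(range(n_species), per_quiz):
            counts[s] += 1
    return counts


def expected_seen(n_species=200, per_quiz=20, n_quizzes=10):
    """Expected number of species seen at least once under uniform picks."""
    miss_one_quiz = 1 - per_quiz / n_species
    return n_species * (1 - miss_one_quiz ** n_quizzes)


counts = simulate_uniform()
seen = sum(1 for c in counts if c > 0)
print(seen, "species seen at least once;", round(expected_seen(), 1), "expected")
```

Under these assumptions the expected count after 10 quizzes works out to about 130 species, which is in the same ballpark as the 125 from my runs.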

So totally random is not really a great way to go. On the other hand totally not random is an algorithmic nightmare unless handed over to Machine Learning or better: Machine Teaching. But let’s say we need a bit of middle ground. How about just calculating a weighted random selection that favors the species a user hasn’t seen yet? Basically every time the user sees a species, they are a bit less likely to see it. It isn’t a perfect solution, but it would be that much better.
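A rough sketch of what that weighted pick could look like is below. The weight of 1 / (1 + times seen) is just one assumed way to make a species "a bit less likely" each time it is shown, and `pick_species` and `seen_counts` are hypothetical names, not what the app uses.

```python
import random


def pick_species(species, seen_counts, k=20, rng=random):
    """Pick k distinct species, favoring the ones the user has seen least.

    The weight 1 / (1 + times_seen) is an assumption for illustration.
    """
    pool = list(species)
    weights = [1.0 / (1 + seen_counts.get(s, 0)) for s in pool]
    picked = []
    for _ in range(min(k, len(pool))):
        # random.choices draws with replacement, so remove each pick
        # from the pool to keep the quiz free of duplicates.
        i = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        picked.append(pool.pop(i))
        weights.pop(i)
    return picked


species = [f"species_{i}" for i in range(200)]
seen_counts = {"species_0": 6, "species_1": 2}  # e.g. the Robin, seen six times
quiz = pick_species(species, seen_counts, rng=random.Random(1))
print(len(quiz), "species in this quiz")
```

A species seen six times still has a chance of showing up, just one seventh the chance of a species the user has never seen, so it stays a middle ground rather than a strict "unseen first" rule.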

This second approach shows a much steeper decline in the number of unseen species. The tail is still potentially long, but we have a much better chance of seeing more species over the course of fewer quizzes. We are only missing repetition for 57 species after 10 quizzes rather than 75 for the completely random approach. I know this doesn’t sound like much, but a gap of nearly 20 species could be significant, especially if they are the most common species.

The second issue is a bit more tame. Essentially we are sorting the possible assets by how many ratings they have and then taking the most recent one with the fewest ratings. All we really need to do is drop the "most recent" preference and check for a user’s previous rating on a piece of media. It would take more time, but it could make the experience that much richer. Really this is only a problem for species that have very few photos, so go take more! It may also be that unpopular bird species don’t get enough attention, so there isn’t enough turnover to churn up the number of ratings on a regular basis. Hopefully we will see less of the latter with more participants.
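Sketched out, that check might look something like the following. The `Asset` class, its fields, and the `rated_by_user` set are all made up for illustration; the real app works against a database, not an in-memory list.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    asset_id: str      # hypothetical identifier for a photo or audio clip
    rating_count: int  # how many ratings the asset has so far


def pick_asset(assets, rated_by_user):
    """Return the least-rated asset the user has not rated yet.

    Falls back to the least-rated asset overall when the user has already
    rated everything, which can happen for species with very few photos.
    """
    unrated = [a for a in assets if a.asset_id not in rated_by_user]
    candidates = unrated or list(assets)
    if not candidates:
        return None
    return min(candidates, key=lambda a: a.rating_count)


robins = [Asset("photo_a", 5), Asset("photo_b", 0), Asset("photo_c", 2)]
print(pick_asset(robins, rated_by_user={"photo_b"}).asset_id)  # photo_c
```

The extra cost is the lookup of what the user has already rated, which is the "more time" part of the trade-off.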

Why am I geeking out about this? Well, it goes back to when my wife was studying birds for her two jobs doing bird surveys. She hand-built study decks and then reviewed them every night with Anki. Of course that fits in with the spaced repetition from before, but then I started working at the lab and saw how many photos we had. What a treasure trove for building something that could generate a set of flashcards from 26k+ Robins, etc.! Starting this was like a dream come true, and through some not-so-dreamy work here we are. It needs some tweaking, but it is still superb.


I wrote this back in March before we released the app. The whole idea of the app is to help the Macaulay Library curate data via community ratings. Of course it is pretty close to being a learning app that people can learn birds with, but the Macaulay Library doesn’t want to step on toes. Nine months in we have collected 3 million ratings, which help us show some of the best photos the community has taken. Ratings were critical in the release of eBird species pages, where 6k previously un-curated species had photos and audio chosen if they had any.

I think the biggest wish now is to add a bit of filtering. For instance, people might be well-versed in their finches but want to focus on their shorebirds, even if they are birding locally. Currently, with hundreds of species, it could take quite a few quizzes to learn all of your shorebirds, though we would get quite a few ratings in the process. It is certainly a possibility, but we have plenty of projects to keep us busy, even for a good part of next year.

I've taken a few more since the start.

Tuesday, December 04, 2018

Conflicted Progress

Read these two articles:

How can these two exist at the same time? One must cancel out the other like antimatter/matter annihilation. Or can they both be right?

Interestingly, I think both are possible, though the “Superpowers” article is mainly written from the viewpoint of survivorship bias. That is, the author is saying that because they did it, so can you, and because you are reading it, they succeeded wildly by following their own rules.

The Farnam Street Blog looks at several cases of Winner Takes All, but doesn’t tell you how to overcome that, except maybe buying several thousand copies of your own book. Does it cancel out the rule-following concept in the more positive article? No, but I think it tempers it quite a bit.

When it comes down to it: We can all improve, maybe even the top performers, but really the only person that would have thought we have superpowers is our former self. Let me frame it with the Recency Effect.

Let’s say that you are a hotdog eating contestant and you can eat ten hotdogs in a minute. You are so good that you have sponsorships and tour the country. Then someone comes along and introduces a new technique and is able to eat one hundred hotdogs in a minute. At the behest of your sponsor you learn the technique and are able to eat one hundred plus one hotdogs, reclaiming your crown.

Being at the top and then reclaiming the top certainly looks like superpowers to folks lower in the pecking order, but even the lightweight packing away only eighty per contest would scoff at your old record of ten due to your new achievement. You might appreciate the difference, but you only really have your initial reaction to gauge just how monumental a jump that is.

All’s good in hotdog-land, but what about starting from the bottom, like removing a bad habit? Initially people may notice if it is an obvious habit, but soon it will be old history. Good changes are like a differential for people noticing and hopefully an integral in improving life. That is: people will only really notice as the change is happening, or, if it is abrupt like a haircut, right after, but then that will be the new normal. But maintaining a good habit like brushing your teeth might literally improve your life expectancy, and no one runs away from conversations anymore.

Self-set rules will only get you so far in becoming the best, the rest is how they relate to your goals, but I think they are a huge step in getting you away from the worst. Rule-following has another name: discipline.

Monday, December 03, 2018

And It's December

I just turned in my final assignment. By final I mean final assignment, meaning the last one in my master’s degree. I mean like if my team doesn’t miss more than four out of 16 points I will graduate on the 15th. I mean final like there is a chance for extra credit that is bigger than any other assignment that might mean the difference between me graduating and not. I dislike this class with a passion.

The degree, Master’s of Computer Science at Georgia Tech. The final class, Introduction to Health Informatics. Would I ever suggest taking this course? No, just a straight no. The only reason I can see for someone to take it would be if their job is going to deal with Health Informatics. If they are already dealing with it, forget it, they will be happier not taking this class.

For those of us who were never going to go into healthcare software this is really just an exercise in frustration. The paramount frustration is supplied by the course itself, or rather the original instructor. He essentially describes all the different ways everyone is doing healthcare data, and what a nightmare interoperability is. The tone is set in frustration that even after 60+ years of thinking about electronic health records it’s still a really hard problem. Just like telling doctors they aren’t doing something right.

The second frustration was how the class was run. It seems that it was mostly TA-run because the professors aren’t actually technical, but having taken classes by David Joyner, I found that their expectations did not normally match their directions, and the grade weights for sections that were administrative details far exceeded those for actual content. Missing a name in a presentation was the difference between getting 100% and getting a B in one case.

The third frustration was our TA responding to pertinent questions only after the assignment was due. Nothing better to make an assignment even tougher.

If there was an alternative, I would have taken it. Throughout the 4 years that I have labored through all of these courses there have been a few that have been promised the whole time; one that would have been interesting is computational journalism, or something like that. It couldn’t have been worse than this one, right?

In some ways I really shouldn’t complain, the main one being that there were no tests. A few courses with tests have been my worst courses. Give me coding and projects over tests. I am just not built for them anymore; who is, with Google right there?

The amount of time and money that I will have available in the future is sort of daunting. What will I do with all that time? But I think saving the money is going to be pretty obvious for that. I guess I can rewrite my novel, write another, and spend time with my wife. She’ll like that.
