Wednesday, December 05, 2018

Fair Warning

If you ever want to enjoy a movie, radio show, or anything else with bird sounds in the background, don’t ever learn bird calls and bird songs. I’ve said this before, either in conversation or maybe even somewhere on this blog. To be fair, when they get it right it really helps set the scene. If you want to start learning bird songs, try this Bird Academy class. Or, if you are already somewhat experienced, you could try the Quiz.

This quiz that I speak of is actually the project I have been working on for the last three months. It is a quiz made up mostly of unrated media, photo or audio, from the Macaulay Library archives, most of it contributed through eBird. I know it’s sort of like my child, but I absolutely adore it. I already took enough quizzes to be mostly familiar with 80% of the birds in Japan in May…

I would really like to jump into spaced repetition, but if, as one of the rules in the link says, you have to make your own cards, that would be really hard to do with our random species and asset selection. It isn’t totally random: you can pick a place and a time of year. We get a list of birds likely to be seen in that part of the world (you will be disappointed with Barrow, Alaska, in December) and then pick 20 random species. Then, for each species, we get the photo or audio clip with the fewest ratings and serve up the quiz.
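
A minimal Python sketch of that selection as described, assuming the list of likely species for the chosen place and time of year has already been computed and the assets sit in an in-memory map. `Asset`, `build_quiz`, and every field name here are hypothetical, not the app's real code or the Macaulay Library schema. The tie-break on the most recent upload follows the description of the asset sort later in this post.

```python
import random
from dataclasses import dataclass

@dataclass
class Asset:
    # Hypothetical data model, not the actual Macaulay Library schema.
    asset_id: int
    species: str
    media_type: str   # "photo" or "audio"
    num_ratings: int
    uploaded: int     # hypothetical upload timestamp; larger means more recent

def build_quiz(likely_species, assets_by_species, num_questions=20, rng=random):
    """Pick random species from the regional list, then the least-rated asset for each."""
    # Uniformly random species, with no repeats inside a single quiz.
    chosen = rng.sample(likely_species, min(num_questions, len(likely_species)))
    quiz = []
    for species in chosen:
        candidates = assets_by_species.get(species, [])
        if not candidates:
            continue  # no media for this species, so skip it
        # Fewest ratings first; ties go to the most recent upload.
        quiz.append(min(candidates, key=lambda a: (a.num_ratings, -a.uploaded)))
    return quiz

assets = {
    "American Robin": [
        Asset(1, "American Robin", "photo", 5, 100),
        Asset(2, "American Robin", "audio", 2, 90),
        Asset(3, "American Robin", "photo", 2, 120),
    ],
    "Blue Jay": [Asset(4, "Blue Jay", "photo", 0, 50)],
}
quiz = build_quiz(["American Robin", "Blue Jay"], assets, num_questions=2, rng=random.Random(1))
```

In this toy data the Robin question gets asset 3: assets 2 and 3 are tied at two ratings, and 3 is the newer upload.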

This works well, for the most part, for well-inoculated users. There are some terrible tiny photos and misidentified photos, but in general I really can’t get enough of it myself. There are two changes that I think would make this a better experience for someone trying to use it to learn a new region: weight the chance of seeing a species by how many times you have already seen it, so familiar species come up less often, and try never to show the same photo to the same user if it can be helped.

Let’s say that you are trying to learn the birds of the Northeast US. One of them is most certainly the American Robin. At the moment there are getting to be around 200 potential species in my area. If you take the quiz 10 times and see 200 photos, you might have seen the good ol’ Robin about five or six times, and other species an average of two or three times.

This shows several runs of completely random species selection for quizzes. After 10 quizzes the user will have seen only 125 of the 200 species. Even at 50 quizzes there are still 4 species the user has seen only once, though because of long tails and randomness, 40 quizzes is probably a better indication: 13 species seen fewer than twice, which means essentially no chance to learn them.
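
To reproduce this kind of experiment, here is a small Python simulation, assuming 200 equally likely species and 20 distinct species per quiz as in the example above. The function names are illustrative, not from the app.

```python
import random

def simulate_random(num_species=200, per_quiz=20, num_quizzes=10, seed=None):
    """Count how often each species is shown when every quiz picks species uniformly at random."""
    rng = random.Random(seed)
    species = list(range(num_species))
    seen = {s: 0 for s in species}
    for _ in range(num_quizzes):
        # per_quiz distinct species per quiz, every species equally likely.
        for s in rng.sample(species, per_quiz):
            seen[s] += 1
    return seen

def summarize(seen):
    """Return (species never shown, species shown fewer than twice)."""
    unseen = sum(1 for c in seen.values() if c == 0)
    under_two = sum(1 for c in seen.values() if c < 2)
    return unseen, under_two

def expected_unseen(num_species=200, per_quiz=20, num_quizzes=10):
    """Closed form: each quiz includes a given species with probability per_quiz / num_species."""
    return num_species * (1 - per_quiz / num_species) ** num_quizzes

for n in (10, 40, 50):
    print(n, summarize(simulate_random(num_quizzes=n, seed=n)))
```

Each quiz includes a given species with probability 20/200 = 0.1, so the expected number of unseen species after 10 quizzes is 200 × 0.9^10 ≈ 70. A single run ending at 75 unseen (125 seen) is an ordinary outcome given run-to-run variation.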

So totally random is not a great way to go. On the other hand, totally non-random is an algorithmic nightmare unless handed over to Machine Learning or, better, Machine Teaching. Let’s say we want a bit of middle ground. How about a weighted random selection that favors the species a user hasn’t seen yet? Basically, every time the user sees a species, they become a bit less likely to see it again. It isn’t a perfect solution, but it would be that much better.
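
One way to sketch that weighting in Python. The weight of 1 / (1 + times already seen) is an assumption chosen for illustration; the post does not say what formula the app would use. Species are drawn one at a time without replacement so a quiz never repeats a species. `weighted_sample` and `simulate_weighted` are hypothetical names.

```python
import random

def weighted_sample(species, seen_counts, k=20, rng=random):
    """Draw k distinct species; each draw is weighted by 1 / (1 + times already seen).

    The 1 / (1 + n) weight is an illustrative assumption, not the app's formula.
    """
    pool = list(species)
    weights = [1.0 / (1 + seen_counts.get(s, 0)) for s in pool]
    chosen = []
    for _ in range(min(k, len(pool))):
        # choices() picks one index with probability proportional to its weight.
        i = rng.choices(range(len(pool)), weights=weights)[0]
        chosen.append(pool.pop(i))
        weights.pop(i)  # keep weights aligned with the shrinking pool
    return chosen

def simulate_weighted(num_species=200, per_quiz=20, num_quizzes=10, seed=None):
    """Run several quizzes, updating each species' seen count after every quiz."""
    rng = random.Random(seed)
    species = list(range(num_species))
    seen = {s: 0 for s in species}
    for _ in range(num_quizzes):
        for s in weighted_sample(species, seen, per_quiz, rng):
            seen[s] += 1
    return seen
```

Counting the zeros in `simulate_weighted(num_quizzes=10)` gives the same unseen-species measure used for the completely random approach, so the two strategies can be compared run for run.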

This second approach shows a much steeper decline in the number of unseen species. The tail is still potentially long, but we have a much better chance of seeing more species over the course of fewer quizzes. After 10 quizzes we are missing repetition for only 57 species, rather than 75 with the completely random approach. I know this doesn’t sound like much, but a difference of nearly 20 species could be significant, especially if they are the most common species.

The second issue is a bit tamer. Essentially, we sort the possible assets by how many ratings they have and then take the most recent one with the fewest ratings. All we really need to do is get rid of the “most recent” rule and check whether the user has already rated a piece of media. It would take more time, but it could make the quiz that much richer. Really this is only a problem for species that have very few photos, so go take more! It may also be that unpopular bird species don’t get enough attention, so there isn’t enough turnover to churn up their number of ratings on a regular basis. Hopefully we will see less of the latter with more participants.
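
A rough Python sketch of that change, assuming “get rid of the most recent” means dropping the recency tie-break. Here ties are broken randomly instead. The `(asset_id, num_ratings)` pairs and the set of asset ids the user has already rated are hypothetical shapes, not the real data model. When a species has so few photos that the user has rated all of them, the sketch falls back to repeating one rather than skipping the species.

```python
import random

def pick_asset(assets, user_rated_ids, rng=random):
    """Choose a least-rated asset the user hasn't rated yet; fall back to all assets if none remain.

    `assets` is a list of (asset_id, num_ratings) pairs and `user_rated_ids` is a set of
    asset ids this user has already rated; both shapes are hypothetical.
    """
    if not assets:
        return None
    fresh = [a for a in assets if a[0] not in user_rated_ids]
    # Species with very few photos may be exhausted for this user, so repeat rather than skip.
    candidates = fresh or assets
    fewest = min(n for _, n in candidates)
    tied = [a for a in candidates if a[1] == fewest]
    # Random tie-break instead of always taking the most recent upload.
    return rng.choice(tied)[0]

assets = [(1, 3), (2, 0), (3, 0), (4, 5)]
print(pick_asset(assets, {2, 3}))  # assets 2 and 3 are already rated, so asset 1 wins
```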

Why am I geeking out about this? Well, it goes back to when my wife was studying birds for her two jobs doing bird surveys. She hand-built study decks and reviewed them every night with Anki. Of course that fits in with the spaced repetition from before. Then I started working at the lab and saw how many photos we had: what a treasure trove for building something that could make a set of flashcards from 26k+ Robins, and more! Starting this was like a dream come true, and through some not-so-dreamy work, here we are. It needs some tweaking, but it is still superb.


I wrote this back in March, before we released the app. The whole idea of the app is to help the Macaulay Library curate data via community ratings. Of course, it is pretty close to being an app people can learn birds with, but the Macaulay Library doesn’t want to step on toes. Nine months in, we have collected 3 million ratings, which help us show some of the best photos the community has taken. Ratings were critical to the release of the eBird species pages, where 6k previously uncurated species had photos and audio chosen, if they had any.

I think the biggest wish now is to add a bit of filtering. For instance, someone might be well-versed in their finches but want to focus on their shorebirds, even when birding locally. Currently, with hundreds of species, it could take quite a few quizzes to learn all of your shorebirds, though we would collect quite a few ratings in the process. It is certainly a possibility, but we have plenty of projects to keep us busy for a good part of next year.

I've taken a few more since the start.
