I stumbled upon some unexpected behavior in the “similar users” section of OkCupid. It is possible, starting at a gay man’s profile and following “similar user” links, to end up at a straight man’s profile (by way of a bisexual profile). This could result in some unwanted pairings…
To determine how often this happens, I looked at the unofficial APIs from class. Unfortunately, these didn’t provide any access to the “similar users” list, so I had to pick apart the site’s JavaScript to find out how this data is accessed. Once I had programmatic access, I also discovered that the site starts rate limiting you after ~20 page calls. This made it difficult to do any large-scale data collection in a short amount of time.
In total, I scraped about 120 profiles, starting with gay ones. Only one of them turned out to be straight. About 10% were bisexual, but those didn’t show up until the crawl was several levels deep.
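To make the "levels deep" idea concrete, here is a rough sketch of a breadth-first crawl that records how many hops from the starting profile each orientation first appears. This is not my actual scraper: `get_similar` and `get_orientation` are hypothetical callables standing in for whatever fetches a profile's "similar users" list and its listed orientation, and the toy data at the bottom is invented purely for illustration.

```python
from collections import deque


def crawl_similar(start, get_similar, get_orientation, max_profiles=120):
    """Breadth-first crawl over "similar user" links.

    get_similar and get_orientation are hypothetical callables supplied
    by the caller; nothing here talks to a real site.
    Returns ({profile: (depth, orientation)}, [(source, target), ...]).
    """
    depth = {start: 0}          # hops from the starting profile
    edges = []                  # "similar user" links between crawled profiles
    queue = deque([start])
    while queue:
        user = queue.popleft()
        for other in get_similar(user):
            if other not in depth:
                if len(depth) >= max_profiles:
                    continue    # cap reached: ignore new profiles
                depth[other] = depth[user] + 1
                queue.append(other)
            edges.append((user, other))
    info = {u: (d, get_orientation(u)) for u, d in depth.items()}
    return info, edges


def first_depth_by_orientation(info):
    """Smallest crawl depth at which each orientation was seen."""
    first = {}
    for d, orientation in info.values():
        if orientation not in first or d < first[orientation]:
            first[orientation] = d
    return first


# Invented toy graph, only to exercise the functions above.
TOY_SIMILAR = {
    "gay_a": ["gay_b", "gay_c"],
    "gay_b": ["gay_a", "bi_d"],
    "gay_c": ["gay_b"],
    "bi_d": ["straight_e"],
    "straight_e": [],
}
TOY_ORIENTATION = {
    "gay_a": "gay", "gay_b": "gay", "gay_c": "gay",
    "bi_d": "bisexual", "straight_e": "straight",
}

info, edges = crawl_similar("gay_a", TOY_SIMILAR.__getitem__,
                            TOY_ORIENTATION.__getitem__)
print(first_depth_by_orientation(info))
```

The edge list that comes back is the same thing a network/node visualization would need as input, so the crawl and the eventual graph can share one data structure.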
I believe starting with a bisexual profile would more quickly expose a straight one. Next steps are to use a dummy profile (so mine won’t get banned) and let it run long term with pauses between requests so I don’t hit the rate limit. After that, perhaps mapping the results out as a network/node visualization.
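The pausing part could look something like the sketch below. The numbers are assumptions: I only know the limit kicks in after roughly 20 page calls, so the batch size of 15 and the 60-second pause are guesses that would need tuning, and `fetch` is a hypothetical function standing in for the real page request.

```python
import time


class Throttle:
    """Pause after every batch of calls to stay under a rate limit.

    calls_per_batch and pause_seconds are guessed values, not measured
    ones. The sleep function is injectable so the logic can be checked
    without actually waiting.
    """

    def __init__(self, calls_per_batch=15, pause_seconds=60.0, sleep=time.sleep):
        self.calls_per_batch = calls_per_batch
        self.pause_seconds = pause_seconds
        self.sleep = sleep
        self.calls = 0

    def wait(self):
        # Sleep before the first call of every batch except the very first.
        if self.calls and self.calls % self.calls_per_batch == 0:
            self.sleep(self.pause_seconds)
        self.calls += 1


def throttled(fetch, throttle):
    """Wrap a fetch function so every call goes through the throttle."""
    def wrapper(*args, **kwargs):
        throttle.wait()
        return fetch(*args, **kwargs)
    return wrapper


# Dry run with a fake sleep that just records the requested pauses.
pauses = []
demo = Throttle(calls_per_batch=3, pause_seconds=5, sleep=pauses.append)
fake_fetch = throttled(lambda user: f"page for {user}", demo)
pages = [fake_fetch(f"user{i}") for i in range(7)]
print(len(pages), "pages fetched,", len(pauses), "pauses taken")
```

Keeping the batch size well under the observed limit is a deliberate margin, since I don’t know how long the site’s rate-limit window actually is.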