These days, I’ve been thinking again about the problems related to mining social network data. I’m still astonished at Facebook’s decision to sue Peter Warden because he gathered a massive collection of Facebook data. He was held liable just because he wanted to release the data set to researchers worldwide. Today, I read (via a tweet from Ed Chi) an article in the NYTimes saying that a growing number of companies are doing business with end-user data. At least their plans are clear… more or less. But I still wonder whether FB and Twitter users were aware of this side effect: that FB and Twitter can discover a lot of things by mining their public accounts, and sell the results to interested private companies. After all, the best market study is one covering feedback from several million customers, right?
This leads to some natural questions: who owns the content posted on social networking sites? Can other companies mine that data? What if my profile is public? How about someone collecting, say, one trillion tweets from public streams? Could Twitter sue that person, just like FB’s attorneys did to Peter Warden?
Today at Libresoft, the research group where I work at University Rey Juan Carlos, we announced a new research project sponsored by Wikimedia Deutschland, the German chapter of the Wikimedia Foundation.
The goal of the project is to study, from a quantitative point of view, the impact of the flagged revisions extension on the editorial activity of the German Wikipedia. We will focus on the effect of this new tool on reducing vandalism by anonymous users, while also measuring its possible influence on other key aspects, such as trends in contributions from anonymous editors or the number of newly registered contributors.

I’d like to say thank you to Wikimedia Deutschland for this opportunity. Hopefully, this will be just a starting point for more interesting projects about Wikipedia and massive on-line communities at Libresoft.
We expect to publish the results before Wikimania 2010, so that we can present a brief summary of our findings there to all Wikipedians and conference attendees. So… stay tuned!
For more information, read the official announcement on the Libresoft website.
No, this is not just a play on words. So far, I’ve collected some background examples to illustrate this apparent contradiction: public data on the Internet may not be so public. To be more precise, data privacy rights and the interests of some companies can play a major role in this problem.

As a researcher on open on-line communities (open here meaning publicly accessible virtual groups), this is a key concern for me. Likewise, it should be important for many other colleagues in this field, and for the myriad companies collecting and mining huge data sets on a daily basis. Let me show you some examples.
On March 26-27 I attended the CPOV conference in Amsterdam, organized by the Institute of Network Cultures. In the same session, Stuart Geiger presented an overview of the influence of Wikipedia bots on the editorial work of the community. One example quickly got my attention: HagermanBot. This bot was created to look for unsigned comments on Wikipedia talk pages, then automatically insert the signature of the corresponding author next to them. The bot sparked strong controversy among users, since many of them thought that the guideline of signing comments on talk pages was just that, a recommendation, not a rule to be enforced. Some could be embarrassed by login names displayed next to comments. Wait a minute: isn’t that information public? Yes, of course it is. Just click on the history tab and you get the revision history page, tracking all comments and their corresponding authors.
WikiSym 2010 is the 6th edition of the International Symposium on Wikis and Open Collaboration. It will take place in the beautiful Polish city of Gdansk, on July 7-9. It will be a great way to spend a few days at the beginning of summer getting in touch with the latest, cutting-edge advances and applications in these fields. My colleague Phoebe Ayers is the Symposium Chair this year, and I had the great pleasure of being appointed as Program Chair.
This will be (hopefully) my 4th WikiSym in a row. I haven’t missed any edition since the first one I attended, back in 2007 in Montreal. Many people have asked me why I’m so eager to come back every year. Well, if you have ever attended WikiSym, you may know why. WikiSym is not the “typical Computer Science” conference. It’s another jewel in a small set of conferences on emerging topics, all of them revolving around collaboration using the Internet and ICT.
For those of you who have never attended WikiSym, I’d like to offer 5 very good reasons not to miss WikiSym 2010.
As a researcher focused on quantitative analyses of on-line communities, I need to keep up to date with the field. I have to read papers and articles written by other colleagues and scholars on related topics. I must search for new methods and algorithms to cut down execution times and finish before the next deadline. I have to evaluate new tools that let me create new graphs or compute new analyses. And I have to review many papers presenting results in this area for different conferences. In this context, I’m still surprised to find the same problem, over and over again.
When I started to study Wikipedia, 4 years ago, I was puzzled by the lack of reproducibility in most (but not all) of the papers and analyses I could find at that time. No source code available. Few implementation details. Little discussion of how to set up a similar environment and replicate the analysis. If you were lucky, you could access an evaluation version of some cool new tool, just to discover that it was severely limited. Forget about the code. Try and do it yourself, if you can. That’s why, since the very beginning, one of the main goals of my PhD was to publish an alternative, open source software tool to analyze any language version of Wikipedia.