Archive for the ‘Conferences’ Category

Open data sets in science

May 18th, 2011 No comments

I have a question to challenge all my colleagues working with research data in Computer Science: when was the last time you could replicate a previous study by other authors?

For different reasons, over the past few months I have found myself diving into the rich collection of previous research in several areas: Wikipedia studies, libre software engineering, social media and social network analysis, to name a few. Many of you probably already know my inborn bias towards quantitative research (and also towards multidisciplinary research methods). So it should come as no surprise that most of the publications I was reviewing included empirical experiments on different datasets gathered from a wide variety of sources, target systems and virtual communities. As I was scrolling through the pages, I realized, once again, what a huge proportion of research work cannot be replicated in an easy way. This is still a sad lesson to be learned, considering that, today, most of us researchers work with digital data, and bits can be duplicated or sent to the other side of the world at negligible cost.

In my first post I already commented on the curious study conducted by my colleague Gregorio Robles about the replicability of research works published in MSR. For those of you unfamiliar with the MSR series, this is a working conference (formerly a workshop) devoted to the art of “Mining Software Repositories”. It is also co-located with ICSE, the preeminent conference on software engineering, so it attracts the top-notch specialists in this area. One would expect that a scientific conference focused on such an empirical, hands-on activity would encourage (and even demand) the ability to access all datasets and tools used in previous experiments, in order to i) better learn the insights of different methods and practical solutions to problems in this area and ii) make life easier for other researchers willing to build on top of existing methods, tools and results.

Far from this, the conclusions from the replicability study were quite disappointing. Of the 171 papers published in the 6 previous editions of MSR, the most frequent case (64 papers) is that of a study that uses publicly available data sources but offers access neither to the processed dataset (the results) nor to the tools/scripts used to perform the study. Even more worrisome is a trend discovered in these publications: as time goes by, the number of papers with publicly available processed datasets has gone down! In other words, the situation is getting worse.

Categories: Conferences, Open Movements

Seamless support for open content

November 2nd, 2010 No comments

Over the past 4 years, I’ve been an avid consumer of open content, mainly images and text licensed under CC-BY-SA (my favourite license ever). 90% of the time, I collect it to prepare slides and other learning materials for university courses, training sessions, lectures or conferences (the other 10% is just for fun, since I love photography and I release all my works under CC-BY-SA). I think we still have a long way to go to facilitate the search, creation and reuse of open content. And now I have a great opportunity to share my experience with other people and learn about other points of view.

Starting tomorrow and until Friday, Nov. 5, I’ll be in Barcelona attending Mozilla Drumbeat Festival 2010. I admit I have high expectations for this unconference/festival, or whatever name you give to an event that will bring together ~400 people around OER and the Web. You can check the program here. The Festival has been designed as a forum to foster participation and quick interaction (it reminds me of our great Open Space sessions in WikiSym, but on a larger scale).

Some of the sessions I plan to attend will cover different perspectives on a very important topic: how people create and reuse open content on the Web. Along these lines, we have for instance a session on “How to encourage content reuse”, another one exploring how to build better platforms to find open content (“Pathways to open content”) and, finally, a brainstorming session about “The next big thing in OER”. So I’ve been thinking about these issues, what they have in common and how we can solve the problems that open content creators and users may run into. This is my attempt to summarize my thoughts so far.

From my personal experience, and according to comments from other colleagues, there are 3 main issues impacting open content reuse:

  1. Understanding which license to choose: We have many different licenses to choose from for our content. However, many people still feel ok licensing their work under a Non-Commercial clause. While it’s true that this is a positive step towards openness, I think we also need to remind people why licenses including NC clauses are not compliant with the Open Knowledge Definition.
  2. Searching for open content: Almost a decade after CC was created, it is still a pain in the neck to find open content on the web. Well, I don’t mean it’s difficult to find any open content or good open content (just visit Wikimedia Commons and let me know what you think). I mean it’s very time-consuming to find the open content you need for a certain situation (exercise: find an image depicting a fire flame, with decent quality, not including a candle, licensed under CC-BY-SA. How much time did it take you?).
  3. Using and storing open content: Finally, you found that great image for your slides. OK, you save it in a local folder, you include the image and link to the original author (if needed), and you include a licensing comment. You’re done. Now, say that 3 months later you need some of the images you already downloaded. You go to your local folder and… you remember neither the author nor the license for most of them (if not all). You need to open the file where you used them to look up that info, or you search the web again (and pray that the search results have remained unaltered over the past 3 months). Sometimes you end up including a long string in the file name to record this info, but that’s not a very handy way to keep your stuff tidy, right?

What we find here is the absence of standardized, seamless support for embedding critical information in open content files (especially author info and license type). What if your favourite text processor or presentation software already tracked the author and license info for you and included a footnote automatically? What if you could automatically create a table of licenses and authors in LaTeX? And my favourite ones: file managers. How about opening a local folder with Dolphin (or Nautilus, or Gwenview…), right-clicking and selecting “arrange files by author and license type”? They could also present a small note with that info on mouse rollover.
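As a rough sketch of what “arrange files by author and license type” could look like once that information is available to a tool, here is a small Python example. Everything in it is an assumption made for illustration: the `Attribution` record, its field names, the sample file names and authors are hypothetical, and the records are typed in by hand rather than read from any embedded metadata.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribution:
    """Hypothetical per-file record; the field names are illustrative only."""
    filename: str
    author: str
    license_name: str


def arrange_by_author_and_license(records):
    """Group file names under (author, license) keys, sorted for display."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec.author, rec.license_name)].append(rec.filename)
    return {key: sorted(names) for key, names in sorted(groups.items())}


def attribution_footnote(rec):
    """Build the kind of credit line a slide tool could insert automatically."""
    return f"{rec.filename}: by {rec.author}, licensed under {rec.license_name}"


# Made-up sample data standing in for info a file manager would read from files.
records = [
    Attribution("flame.jpg", "Alice Example", "CC-BY-SA"),
    Attribution("harbour.jpg", "Bob Example", "CC-BY"),
    Attribution("campfire.jpg", "Alice Example", "CC-BY-SA"),
]

arranged = arrange_by_author_and_license(records)
for (author, license_name), files in arranged.items():
    print(f"{author} [{license_name}]: {', '.join(files)}")
print(attribution_footnote(records[0]))
```

The grouping itself is trivial; the hard part, and the point of this post, is getting the author and license into the file in a standard way so that no one has to type these records by hand.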

In summary, the root of all these issues (educating your users, finding open content on the web and leveraging the use of open content in academia and other contexts) is the lack of standardized support for embedding relevant open content info in multimedia files. Pierre Far, who’s leading the session on “Pathways to open content”, suggested a possible solution: XMP. This is one example of a standard for including information about a file’s contents in the file header. It also supports many different types of multimedia files (including making use of the EXIF headers we photographers love in JPEG files). But there may be others. I don’t mind what we finally choose, as long as everyone agrees to use the same standard.

Conclusion: if we aspire to get real support for open content from end-users, we must help them by offering seamless support for the daily tasks required in the new workflow (dealing with licenses and author info). With this apparently simple step, we would address all the problems above in one effective move. Time for other people to jump into the discussion, and for standards masters to start thinking about this.

Looking forward to meeting you in Barcelona!

Wikimania 2010 recap

July 22nd, 2010 1 comment

OK, now it’s time for the Wikimania 2010 summary. I’ve been thinking a lot about the best way to condense my thoughts. I think the best one is this: whenever I attend a conference/meeting and I have real difficulties deciding which session to attend (because all of them are terrific), that is a good sign. Well, every minute I spent in Wikimania 2010, I felt like that. “Mmmm, look at this one… but wait! I wanted to attend that one, as well… Oh no! Strategy Plan at the same time I’m giving one of my talks… What the heck!”.

Wikimania 2010 Gdańsk, Poland.

I admit it was pretty easy for this to happen to me, because: a) this was my first Wikimania; b) I gave too many talks (3!), thus missing other interesting slots; and c) I wasn’t ready for the really active atmosphere of Wikimania. But let’s go on with some further details, since I have some “mixed feelings” about certain points.


WikiSym 2010 summary

July 19th, 2010 8 comments

I finally had some time to write about my experiences in WikiSym and Wikimania 2010. Let’s start with the first one.

WikiSym 2010

WikiSym 2010 has been special in many respects. The Symposium and Program Committees were appointed between Dec. 2009 and Jan. 2010. Thus, we had only 6 months to rush through everything (CfP, venue location, logistics, proceedings, etc.). We decided that it was a good idea to search for synergies with another important conference held every year: Wikimania. Gdańsk is a very attractive city, and the potential interactions between attendees of both events could be great. In the end, we packed a very interesting week, with both events overlapping. However, the challenge was also to test whether both communities would be able to find common points of interest. Besides this, WikiSym 2010 explicitly broadened the scope of the conference to welcome presentations on Open Collaboration in general, beyond the scope of wiki platforms. So, many things to discover!


Categories: Conferences, On-line Communities

5 Reasons to attend WikiSym 2010

March 25th, 2010 No comments

WikiSym 2010 is the 6th edition of the International Symposium on Wikis and Open Collaboration. It will take place in the beautiful Polish city of Gdańsk, on July 7-9. It will be a great way to spend some days at the beginning of summer getting in touch with the latest, cutting-edge advances and applications in these fields. My colleague Phoebe Ayers is the Symposium Chair this year, and I had the great pleasure of being appointed Program Chair.

This will be (hopefully) my 4th WikiSym in a row. I haven’t missed any edition since the first one I attended, back in 2007 in Montreal. Many people have asked me why I’m so eager to come back every year. Well, if you have ever attended WikiSym, you may know why. WikiSym is not the “typical Computer Science” conference. It’s another jewel in a small set of conferences on emerging topics, all of them revolving around collaboration using the Internet and ICT.

For those of you who have never attended WikiSym, I’d like to offer 5 very good reasons not to miss WikiSym 2010.

Categories: Conferences, On-line Communities