Thursday, December 09, 2010

What should a BC government data catalogue do?

In my day job, I’m part of a team of people who are working to bring the BC Government into the global open data movement. It is very fun, exciting work.


In Citizens @ the Centre: BC Government 2.0, our public service has committed to letting people from British Columbia and around the world access our provincial government data to improve research and decision making, and to foster innovation in information services through things like web and mobile apps.

 

A key element of this shift will be to create a data catalogue that allows people to access BC’s data. And so the question of ‘what should a data catalogue do?’ becomes pretty relevant.

 

Let’s pause there and say that what follows is a set of ruminations and thinking, not an official representation of the BC government’s position, and that it could be subject to radical change. If anything, it’s an official representation of me trying to do my job (my official title is Executive Director, Citizen Engagement in BC's Ministry of Citizens' Services) by engaging people who are smarter and more experienced in a discussion about what would be ideal for BC to be doing, and where we might have blind spots as our team goes about our work. Really, this is risk management: I don’t want to miss some great opportunities and I don’t want to do anything dumb either. That said, I hope you’re keen to dig in alongside us. Many thanks in advance.

 

Looking around at what’s been done with data catalogues to date, you see most of them working with a basic concept of data provision. Toronto, Vancouver, Australia and New Zealand fall pretty squarely into this category, with Edmonton’s catalogue being among the most sophisticated. Other sites like http://www.data.gov also encourage data conversations by not only providing data, but also seeking dialogue around the data through blogs and discussion forums. More interesting to me are data catalogues like http://data.gov.uk that put an emphasis on data action by trying to connect ideas about using datasets to a development or analysis project. Equally interesting are sites like http://data.worldbank.org that seem to aim at data understanding by leaning heavily on visualizations of data sets, making them more understandable to researchers, policy types and other people who aren’t necessarily skilled in data manipulation.

 

We can sum these up as a series of intents or purposes for data catalogues: provision, conversation, action, and understanding. In my mind, while these overlap and build on one another, what you choose as your most important intent will have a big impact on the function and design of your catalogue.

 

So here’s a good time to stop and check: is this typology right? Is there another intent that’s missing that could extend what data catalogues can accomplish?

 

One thought that occurs is what I’ll label data relevance. Data relevance would try to personalize how data sets are presented, especially using locations. So you could imagine searching for an issue, seeing the location of an office that deals with that issue (so, search for health, see a hospital or a clinic or a local nursing school), and then being presented with data that is relevant to that issue (performance data or research data or enrolment data). Ideally, these data sets might even link to the specific place itself, so you could see the data produced at that location. This, I think, is what Tim Berners-Lee is talking about when he’s going on about ‘linked data’ or ‘web 3.0’, where data can find other data. We see this on the UK data catalogue, but the explanation of what it’s supposed to do is pretty fuzzy. I might be wildly off base in making this connection, but the feel of where I’m going seems to be following Sir Tim’s line of thinking.
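To make the idea a bit more concrete, here is a rough Python sketch of an issue-to-place-to-data lookup. Everything in it is made up for illustration: the `Place` record, the place names, the coordinates and the data set titles are assumptions on my part, not anything that exists in a BC catalogue today.

```python
from dataclasses import dataclass


@dataclass
class Place:
    """A hypothetical location that deals with an issue and has data linked to it."""
    name: str
    issue: str
    latitude: float   # would be used to put the place on a map
    longitude: float
    dataset_titles: tuple


# Invented sample records, purely for illustration.
PLACES = [
    Place("Example General Hospital", "health", 48.43, -123.36,
          ("Wait times", "Patient volumes")),
    Place("Example Nursing School", "health", 49.26, -123.25,
          ("Enrolment",)),
    Place("Example Secondary School", "education", 49.88, -119.49,
          ("Class sizes",)),
]


def datasets_by_place(issue):
    """Map each place tagged with the issue to the data sets linked to it."""
    wanted = issue.lower()
    return {p.name: list(p.dataset_titles) for p in PLACES if p.issue == wanted}


if __name__ == "__main__":
    for place, titles in datasets_by_place("health").items():
        print(place, "->", ", ".join(titles))
```

A real version would presumably draw on proper geospatial data rather than a hand-typed list, but the shape of the lookup (issue, then place, then data) is the point.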

 

Looking at these opportunities, then, where should BC’s emphasis lie?

 

While understanding that provision is fundamental—if our data is no good or impossible to find, everything else is a non-starter—I don’t think that it should be BC’s emphasis. What I think I’d like is to prioritize our intents this way:

1. Action: the success of BC’s catalogue will be measured primarily by how many projects it sparks that make use of provincial data sets. As such, the site should be designed with focused calls to action that move people from exploring data into using the data in productive ways. It will use the social networking capacity of the internet to help ideas connect with skills and other necessary resources to make things happen.

2. Conversation: BC is in its infancy in providing data to the public in this manner. Having rich feedback loops that allow the province to sense demand for data, how it can be improved, and how it is being used will help BC get better at providing data. Luke Closs’s rough-in work for a new data catalogue for our Apps 4 Climate Action contest is a big inspiration, especially his ideas about how to triage a data set: http://demo.socialtext.net/a4cadata/index.cgi?how_to_triage_a_dataset

3. Understanding: those skilled at manipulating data (software and web developers, economists, statisticians and researchers) are not the only people we want to learn about the data. Telling stories that place the data in context can help all kinds of people understand the issues facing BC that are important to them, and maybe what we can do about them. A great inspiration for this kind of approach would be the Guardian newspaper’s Datablog: http://www.guardian.co.uk/news/datablog

4. Relevance: we build on GeoBC’s geospatial strengths to start connecting data sets to places as early as possible. The resulting map interfaces could be incredibly powerful.

5. Provision: we get the basics right, meaning that: a) anything we call open data is in a machine-readable format; b) there is robust metadata that explains the data's provenance, etc.; c) the data is structured through standards in such a way that it is usable; d) it is findable through a strong search function; and e) data, over time, becomes automated in terms of updates and publishing, using XML feeds and APIs. In my experience these basics aren't necessarily that easy, and really getting these right will be an ongoing process rather than something that gets done right off the bat.
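For the developers reading along, here is a minimal Python sketch of what a few of those provision basics might look like for a single catalogue entry. It is only an illustration under my own assumptions: the `DatasetRecord` fields, the list of formats counted as machine readable, and the sample entries are all hypothetical, not a description of how BC's catalogue is or will be built.

```python
from dataclasses import dataclass, field

# Assumed set of machine-readable formats; a real catalogue would set its own policy.
MACHINE_READABLE = {"csv", "json", "xml", "kml"}


@dataclass
class DatasetRecord:
    """Hypothetical metadata for one catalogue entry."""
    title: str
    description: str
    file_format: str
    publisher: str        # provenance: who produced the data
    last_updated: str     # ISO date of the most recent update
    keywords: list = field(default_factory=list)


def is_machine_readable(record):
    """Check the record's format against the assumed list above."""
    return record.file_format.lower() in MACHINE_READABLE


def search(catalogue, query):
    """Very simple search over title, description and exact keyword matches."""
    q = query.lower()
    return [
        r for r in catalogue
        if q in r.title.lower()
        or q in r.description.lower()
        or any(q == k.lower() for k in r.keywords)
    ]


# Invented sample entries, purely for illustration.
catalogue = [
    DatasetRecord("Hospital wait times", "Median wait times by hospital",
                  "CSV", "Hypothetical Ministry of Health", "2010-11-30",
                  ["health", "hospitals"]),
    DatasetRecord("Annual report", "Scanned annual report",
                  "PDF", "Hypothetical Ministry", "2010-06-01",
                  ["reports"]),
]

if __name__ == "__main__":
    for record in search(catalogue, "health"):
        print(record.title, "| machine readable:", is_machine_readable(record))
```

The scanned PDF entry is there on purpose: it is "published" but fails the machine-readable check, which is exactly the sort of thing the basics are meant to catch.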


What do you think? Is this the right order? Something missing? Am I off my rocker?

 

Depending on the answer, the next step will be to start to imagine some functionality that could support these intents, in this (or whatever) order. I’ll take that up in a subsequent post.

Posted via email from CoCreative