Adventures with joined-up culture data

Jon Pratty is Relationship Manager (Digital and Creative Economies) at Arts Council England. Earlier this month, he drew on his vast experience to discuss linked open data and culture with the Open-data Brighton and Hove Group. This is an edited version of his talk.

I feel the same kind of excitement about Open-data Brighton and Hove as I did when I first got webby back in 1993-94.

In those days, we sat isolated, unconnected, on islands of digital stuff, gradually brimming half-meg hard drives, occasionally making it into public places via strange things like Geocities or The Well.

Training as a journalist in 1995, I realised joining up people and content was a key dynamic of online journalism.

Sitting in The Telegraph newsroom in 1998, I saw NewsNow ticking away on a screen and knew it was one of the best things I’d ever seen.

It was aggregated content from a few newsrooms then sending out feeds. It is still one of my key sites; it is like standing on top of a hill and clocking content as far as the eye can see.

Taking my online skills to 24-hour Museum [now Culture24] in January 2001, I wanted the site to plug into the NewsNow dynamic.

By 2003, our tech wizards at SSL had built us the first dedicated RSS feed of museum and gallery content in the UK, and probably beyond.

Lots of lessons were learned about the importance of titling and keywording remotely-published content, and also when to publish, depending on when your RSS feed refreshes.

An early lesson was to trust your instincts. Most people in public services and higher education said we should use RSS 1.0 or Atom for feeds.

A sideways glance at the mushrooming RSS culture in publishing and journalism showed RSS 2.0 was the way to go. So that’s the way we went. Go with the majority when it comes to joining things up. Don’t put your money into Esperanto when everyone is speaking English.
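Those lessons about titling, keywording and publication dates translate directly into the shape of an RSS 2.0 item. A minimal sketch using Python's standard library follows; the feed title, link, keywords and date are invented for illustration:

```python
import xml.etree.ElementTree as ET
from email.utils import format_datetime
from datetime import datetime, timezone

def build_rss_item(title, link, description, keywords, pub_date):
    """Build one RSS 2.0 <item>, with careful titling and keywording."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    ET.SubElement(item, "description").text = description
    # RSS 2.0 carries keywords as repeated <category> elements
    for kw in keywords:
        ET.SubElement(item, "category").text = kw
    # RSS 2.0 requires RFC 822-style dates in pubDate
    ET.SubElement(item, "pubDate").text = format_datetime(pub_date)
    return item

rss = ET.Element("rss", version="2.0")
channel = ET.SubElement(rss, "channel")
ET.SubElement(channel, "title").text = "Museum news"
channel.append(build_rss_item(
    "New Elizabethan gallery opens",
    "https://example.org/news/42",
    "A short, keyword-rich summary for aggregators.",
    ["museums", "Elizabethan", "Brighton"],
    datetime(2011, 3, 1, 9, 0, tzinfo=timezone.utc),
))
print(ET.tostring(rss, encoding="unicode"))
```

Aggregators like NewsNow key off exactly these fields, which is why sloppy titles or missing keywords sink remotely-published content.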

Other people in the culture sector wanted to join things up, too.

In 2002, the 24-hour Museum’s funders, the Museums, Libraries and Archives Council [MLA], led a project to join up some databases to allow cross-searching.

At the 24-hour Museum, I could see it was a good idea, but the tech used needed to be rich and useful, yet universally accessible. Money was given by MLA and the project was called the 24-hour Museum Metasearch Project.

Learning from that? Well, it’s still ongoing.

Back then, the tech chosen was OAI harvesting, with Z39.50 as a standard. Baffling for me, because I’m not into the tech. The big question in my mind was: why would you do it?
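For anyone else baffled by the tech: an OAI-PMH harvest is, underneath, just a parameterised HTTP request that a harvester polls on a schedule rather than a live connection. A rough sketch of how such a request URL is built; the repository endpoint and set name here are hypothetical:

```python
from urllib.parse import urlencode

def oai_list_records_url(base_url, metadata_prefix="oai_dc",
                         from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    The harvester fetches this URL periodically (weekly, in the
    metasearch project's case) instead of receiving live updates.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date   # only records changed since this date
    if set_spec:
        params["set"] = set_spec     # e.g. one museum's collection
    return base_url + "?" + urlencode(params)

# Hypothetical repository endpoint, for illustration only
url = oai_list_records_url("https://repository.example.org/oai",
                           from_date="2011-01-01", set_spec="maritime")
print(url)
```

The point is the polling model itself: however clean the request, the data is only as fresh as the last harvest.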

The project leaders decided to join dissimilar data sources from different subject museums. I wanted to know how you bring to life content from different sources, not how you patch together the data.

It seems so simple today. Why join databases together with great big pipelines of mechanical connections – when, really, the first discussion should have been audience-focused?

Someone needed to ask: “What do we want to say? And to whom do we want to say it?”

Also problematic was that data interactions were not live. We’re talking about harvesting here. The databases we joined up were speaking to each other only once a week. This was not the real-time web we love today.

Perhaps the biggest problem for me was that museums or galleries had to pay to engineer a connection to the metasearch project.

Just imagine paying £5,000 to join Twitter!

At some levels, these sorts of projects are still being suggested even now. It’s got to be free, or at least very easy, for cultural organisations or individual artists to export data. The next generation of open-source CMSs needs to have options for multiple data outputs: RSS, API, or whatever.

The metasearch project pilot led eventually to CultureGrid, our current integrated culture data collection, via something called the Integrated Architecture Project and the People’s Network Discovery Service. These were waypoints towards a stronger strategy that is now morphing into interesting things all the time.

While CultureGrid still has its roots, unfortunately, in OAI technology, there are some cool things being done by a company called Knowledge Integration, which has built something called a terminology engine on the side of CultureGrid that may one day operate as a kind of taxonomy generator.

So what’s the latest picture in terms of cultural data online?

Basically, it’s coming together. Efforts within the UK and US museum geek community concentrate mainly on discussing [endlessly] ways to engineer linked data perfectly.

This might produce wonderful connections between content one day. And it could be a pathway to the lovely patch of sunlit digital downland called the semantic web.

But here’s a reality check: there’s lots of data out there to work with and try to join up in meaningful ways for audiences.

As I saw with the metasearch project and CultureGrid, it’s easy to join similar databases using one technology.

The hard thing is to join dissimilar content in meaningful ways: older content, legacy stuff, archives, .pdfs, ancient databases, different types of files, weird digital standards.

This is the real gross pathology of the digital landscape.

Two years ago [as a consultant] I proposed to JISC a relatively simple data-mining and indexing effort to bring back to life £70 million-worth of Lottery-funded heritage and museum websites, two-thirds of which now lie sleeping but with useful data and content. It wasn’t greenlighted; it wasn’t a great proposal.

But I still think data-mining, powerful search techniques, vocabularies, taxonomy work, and lightweight indexing will help us join data together more effectively than building clunky connections between sources of data.
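To make "lightweight indexing" concrete, here is a toy inverted index in Python: mine the text you already have, index it, and cross-search it without engineering any pipeline between the sources. The document ids and text below are invented stand-ins for dormant legacy content:

```python
import re
from collections import defaultdict

def build_index(documents):
    """Build a tiny inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return index

def search(index, *terms):
    """Return ids of documents containing every query term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

# Hypothetical pages rescued from dormant Lottery-funded sites
docs = {
    "site-a/page1": "Mary Rose: Tudor shipwreck and maritime archaeology",
    "site-b/intro": "Maritime heritage of Portsmouth Historic Dockyard",
    "site-c/about": "Elizabethan costume in the collection",
}
index = build_index(docs)
print(search(index, "maritime"))
```

A real effort would add vocabularies and taxonomy mapping on top, but the principle is the same: the index joins the content, not the databases.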

So what’s out there, right now, for free, in Brighton and Hove?

Plenty: copyright-free culture news, listings, venue info, features, reviews, blogs and more. Lots of RSS 2.0 content and an API output of culture data can be got from sources like Culture24.

Sounds fantastic! But is it really? Some big issues are now appearing over the horizon. What do people want from data like this? Who wants it anyway? Developers? Publishers? Hyperlocal news sites? Individual web-users? Looming large in any open-data conversation needs to be the issue of trust.

As punters with smartphones at the ready, we expect info about places, times, trains, buses, gig tickets and so on to be accurate. It’s got to be trustworthy.

My learning from Culture24 was that info about culture vitally needs to be correct. Families travel to make museum visits. The info they use has to be accurate. Culture24 have 10 years of data, content and really great relationships with thousands of museums, galleries and heritage sites all over the UK.

If arts organisations want to join the information-publishing space and partner with media organisations, for example, they’ll be expected to offer guarantees of quality and availability in a service-level agreement. Arts companies like Culture24 understand how to motivate museums and galleries to enter and check their own listings, venue and exhibition data.

That last point is key. The experts on an event’s info are the people running it. They know when it’s happening. If these people are running lots of events, they can control the quality of the data they are exporting. Users of the info can trust it to be right. That’s where brand values begin, with data services. Is it right? Yes, we checked it.

Does this cost money? It does, but in culture venues this could come under the overall marketing costs. It’s not a reason to give the job of getting listings right to other people.

Just think: if you are the one really famous museum about Elizabethan culture in Britain – like the Mary Rose Museum – you’re in the best place to be the one true source of info about that subject.

You have what I call data equity. You’re the one place where the real thing is kept. You own the island. If you get the basics of your data right – titles, content, quality, standards – you’re the go-to guys for that data. At that point, since you control IP, quality, consistency, you can make partnerships with other media, or offer it free, or do what you want.

It becomes another part of your cultural offer. It’s a valuable commodity. It’s a data brand – something we need to consider how to market, and how to signify in the future as having value.

This is all quite new really. Many funders and most arts organisations don’t have a data-sharing policy or strategy. It’s evolving as we go along.

If arts organisations want to be out there, we need to be able to offer reliable, trustworthy data that matches what others are putting into the open-data mix.


About Greg Hadfield

Greg Hadfield is editorial director of Brighton & Hove Independent, a free weekly newspaper. He is a former Fleet Street journalist and internet entrepreneur (including Soccernet and Schoolsnet).
This entry was posted in Talks.
