Archive for: June, 2009

What will be the past in the future?

Jun 26 2009 Published by brian under archives, technology

The Guardian has a story speculating on how changes in technology will affect the form of authors’ literary archives.

It used to be that authors would usually leave behind a huge mound of paper for archivists to pore over (unless they destroyed it first). But now, many (probably most) authors write on computers, and in many cases the book never exists on paper until it comes off the printing press. So intermediate drafts, notes, and correspondence survive only if the author actively preserves them.

This is, of course, the same problem faced by archivists and historians across the board when dealing with digital information. Paper (at least, acid-free paper) will survive for centuries with even minimal care, and can be used easily by anyone who can read the language in which it is written. Digital formats, on the other hand, are perishable and require not just physical preservation, but also maintenance of both hardware and software, as well as thorough documentation of file formats (which is often hard to come by as software companies alter their products and deprecate earlier versions).

Personally, I do make some effort to maintain and regularly update my data. I keep multiple backups of everything, and I migrate all of it to new storage media as they come along (currently, external hard drives). So I’m in little danger of losing photos, videos, MP3s, or my schoolwork, at least during my lifetime. And my oldest work–such as college papers from the late 1980s–has been converted a couple of times already to new formats.

On the other hand, much of my recent stuff–such as this blog–resides somewhere online where I have only indirect control over it. Ditto for my calendar and email, which live deep in the bowels of Google. It’s possible to back them up, but it’s difficult and time-consuming enough that I haven’t yet gone to the trouble.

There’s also the question of whether anyone will care to look at my stuff. I have no children, so I’m not sure who I’d even leave it to. Historians are always interested in ephemera and might find mine interesting, if it survives–but I’m a nobody, so who knows. With a couple of billion people all producing data, historians and archivists will have plenty to deal with, and it’s likely that nobody will bother to preserve any but the most important ramblings very far into the future.

Of course, there have always been similar selection pressures on historical information, and what’s interesting to me is that the likelihood that info will be preserved is usually in direct proportion to the wealth and power of the person or people producing it.

As an example: one of my research interests is Gold Rush-era San Francisco. The city’s major newspaper at the time, the Alta California, is available in complete runs at several libraries, both in print and on microfilm. However, there was also an important African-American paper, the Mirror of the Times, which has exactly two surviving issues. Similar disparities exist for other materials. So anyone researching San Francisco’s black community is at a serious disadvantage, as there was so little interest in preserving its history until very recently.

Much the same is true of other historical periods. VIPs of all stripes have no trouble ensuring that their info survives long after their departure, while the poor and unknown are usually not remembered by anyone but their families–and not for long even then. Anybody wanting to assemble this kind of information a century later will have to wade through fragmentary information in a variety of places, and will be lucky to find anything useful at all.

I expect that, in the end, the same kinds of selection pressures will exist for information that is born-digital. The wealthy and powerful have always been prominent in history, and will always be. The poor and unknown, on the other hand, will largely disappear from the record, simply because nobody will have the interest to preserve their data–unless storage becomes cheap and reliable enough, and file formats standardized enough, that all data will be stored and maintained automatically and indefinitely.

SF writer Charles Stross seems to think this will happen, and in the relatively near future; he gave an important talk about the possibility a couple of years ago:

This century we’re going to learn a lesson about what it means to be unable to forget anything. And it’s going to go on, and on. Barring a catastrophic universal collapse of human civilization — which I should note was widely predicted from August 1945 onward, and hasn’t happened yet — we’re going to be laying down memories in diamond that will outlast our bones, and our civilizations, and our languages. Sixty kilograms will handily sum up the total history of the human species, up to the year 2000. From then on … we still don’t need much storage, in bulk or mass terms. There’s no reason not to massively replicate it and ensure that it survives into the deep future.

But even if this rosy prediction turns out to be correct–and even if we’re able to develop search algorithms that can make this mass of data useful–will anyone care to look? Will your life, or mine, be interesting enough to some future historian to make it worth her while to dig through it?

Back to the future

Jun 16 2009 Published by brian under technology, web 2.0

(The following is adapted from a bulletin board post I wrote for a class on Web 2.0 nearly a year ago. I just ran across it, and I still think it’s pretty good.)

Meredith Farkas, in her book Social Software in Libraries, describes social software as having the following values (in her words):

  • Easy content creation and content sharing
  • Online collaboration
  • Conversations: distributed and in real time
  • Communities developed from the bottom up
  • Capitalizing on the wisdom of crowds
  • Transparency
  • Personalization
  • Portability
  • Overcoming barriers of distance and time

The “Web 2.0” buzz implies that all of these values are new. But, in fact, most of them were characteristic of the early Internet, before the advent of the World Wide Web in the early 1990s. The core Internet services of email, Internet Relay Chat, and especially Usenet newsgroups allowed a wide range of interactivity and collaboration between people spread around the world, much like the modern Web. A little history might help to illustrate what I mean.

The Internet began in the late 1960s and 1970s as a government-sponsored academic network. Having been developed in an atmosphere of free and open collaboration (both in the culture of scientific research and in the sharing of software), the early online culture organized itself in such a way as to provide for easy collaboration and sharing of information.

Anyone who wished could contribute to a Usenet group, and their contributions would be quickly judged, and either promoted or shouted down. Email addresses were generally public knowledge (spam did not yet exist), and so anyone who was online could be easily reached; and IRC and other chat services allowed for quick real-time discussions. Therefore, communities were able to organize quickly around particular projects or subjects, and there was a huge amount of collaboration at all levels.

When the World Wide Web came along, on the other hand, creating material for it required considerably more technical expertise. WYSIWYG web page editors did not exist, nor did blogging tools; anyone who wanted to have a presence on the Web had to write the page directly, in HTML, from the ground up. Additionally, in the early days it was necessary to maintain server software, often a dedicated computer, and even one’s own persistent Internet connection.

With these barriers to entry for potential authors and collaborators, the Web, unlike most existing Internet services, evolved as much more of a broadcast-style medium, with a few technically-minded producers creating material for a large audience, and with very little collaboration between them. When the Web was commercialized starting in 1994, much of the expertise and resources needed to create advanced content became concentrated in the corporate sector.

What has changed with the new social tools is that those barriers to entry and collaboration have begun to come down. It is now far easier than in the early days of the Web to create material and to add to and comment on others’ work. This mirrors the Internet as it was in the 1980s–except that the audience is far larger than before.

Therefore, in my opinion, Web 2.0 technologies really represent a restoration of the original values of Internet culture–community, collaboration, interconnectedness, two-way communication. Only now, the tools that enable those values are available to the mass culture rather than to a few technically-minded academics. Of course, the so-called “Web 1.0” period was necessary to build this audience, but now that audience is learning the benefits of the kinds of community that existed in the early days.

Freedom of speech, freedom of coding

Jun 11 2009 Published by brian under technology

By way of the good Cory Doctorow comes this story about Kyle Brady, a computer science major at San Jose State University (where I just finished my MLIS), who got into a tiff with a professor over the right to share code he wrote in a class.

Kyle posted his code–after the date it was due in the class–because he felt that it might be useful to other programmers in the future, and also to employers who might be interested in seeing his work. The professor, on the other hand, felt it was a violation of SJSU’s academic integrity standards, and threatened to fail him and to forbid all such postings by his students in the future. Long story short, Brady countered that nothing he did violated any part of the standards, and the matter escalated until the Judicial Affairs Officer ruled that he was correct and the professor could not penalize him for what he had done. So SJSU professors are now blocked from preventing their students from sharing code, at least after projects are due.

I find the story fascinating because it highlights the potential for conflict between academic integrity standards and the scientific tradition of openly sharing information. The professor does have a point that sharing this code could potentially help future students cheat. However, Brady also points out that this is no different from sharing homework solutions after class, the only difference being that the information is shared online.

And, as far as I’m concerned, Kyle’s actions are in the best scientific tradition. Researchers thrive on openly sharing their research; disclosure and peer review are so fundamental to scientific work that we’re automatically suspicious of anyone who doesn’t make all their data and procedures public.

Also, in computer science in particular, programmers borrow code from each other all the time, because there’s usually a good chance that somebody else has tackled a problem similar to yours. And the entire open-source movement is based on freely sharing code and collaborating on improvements to it.

So, I’m pretty firmly on Kyle’s side here. Good for him that he persevered and won.

What’s up with those crazy kids?

Jun 09 2009 Published by brian under web 2.0

I just reread this post on teenagers’ attitudes about the Internet (it’s by Web anthropologist danah boyd, whose stuff I highly recommend). In it, she answers questions she received via Twitter about what her research on teens is showing these days.

There’s a lot of good stuff to think about in this, touching on many of the current Web 2.0 issues like privacy and copyright. But I’m particularly struck by the following, which have to do with authority and information literacy:

@mauraweb: when they’re searching for info, how do they know what info to trust? esp. w/internet searches

Media literacy amongst teens is extremely varied, but the short answer is that most don’t know what to trust. They know that they are not supposed to trust Wikipedia because it’s editable (and they automatically recall Wikipedia when you ask about trustworthy information.. that’s so actively hammered down their throat, it’s painful). One girl told me that she trusts websites that “look” like they are reputable. When I asked her about this, she told me that she could “just tell” when something was a good source. And besides, it came from Google. Le sigh.

( . . . )

@lazygal: Do they really care about/use school library websites? Twitter? Pageflakes? Libguides? or only if teacher insists?

Nope, they don’t. All but Twitter are categorized as school tools and are only used when absolutely necessary and Google won’t suffice.

The implications for teaching information literacy ought to be pretty clear: either teens don’t know there’s a problem, or they know there might be a problem but don’t know what to do about it, or they don’t care. I hate to say it, but whatever we’re doing, it doesn’t seem to be working.

Of course, it’s an old issue, and it goes far beyond the reliability of a Google search. The problem of media literacy has been around for decades, if not centuries. We’ve never been particularly good at teaching people to evaluate the information they receive.

What worries me is what it says about the future, and not just of libraries. Young people, as a grossly overgeneralized whole, have never been particularly enthusiastic about making the effort to track down reliable information. But now, they’re even less interested; after all, Google appears to give them everything they need (even though it doesn’t). And, of course, Google has a vested interest in keeping up that illusion–not to mention the rest of the corporate world, which much prefers an ignorant populace to an informed one.

We’ve already seen what happens when large numbers of people don’t particularly care about evaluating the information they receive (insert your least-favored media outlet here). But what happens when a whole generation grows up thinking that if it’s not in Google, it doesn’t exist–or, worse, it doesn’t matter?

The OED is watching you

Jun 04 2009 Published by brian under web 2.0

Just learned (by way of Roy Tennant) that the people behind the Oxford English Dictionary are monitoring Twitter. They’re apparently interested in the way that people use the language there:

OUP lexicographers have been monitoring more than 1.5 million random tweets Since January 2009 and have noticed any number of interesting facts about the impact of Twitter on language usage. For example the 500 words most frequently used words on Twitter are significantly different from the top 500 words in general English text. At the very top, there are many of the usual suspects: “the”, “to”, “as”, “and”, “in”… though “I” is right up at number 2, whereas for general text it is only at number 10. No doubt this reflects on the intrinsically solipsistic nature of Twitter.

I find that last bit amusing–“intrinsically solipsistic”. I’m not sure that they entirely understand Twitter; in my experience, it doesn’t isolate at all, but rather connects people on a very deep level. True, much of it is banal, but you also get a real sense of the person behind each tweet that you just don’t get through any other medium.

Anyway, I think it makes perfect sense for the OED to be monitoring social media. After all, throughout its history the OED has used quotes and examples from books, magazines, newspapers, and more recently Usenet and the Web, to illustrate how language is used. Web 2.0 allows much the same kind of direct view of real-world usage–but with the added convenience of indexing, search, and passive monitoring. And with most of the developed world communicating this way, of course they’re going to want to watch what people are doing. When the much-anticipated third edition is finally published in full, it should provide an amazing look at how the language has been altered by these communication tools.

That said, though, it still feels weird to see someone at the OED write “that is how we roll”.

And so it begins

Jun 02 2009 Published by brian under meta

Hi there. I’m Brian, and this is my new blog. Welcome!

So who am I? Well, I live in Santa Rosa, California, and I’ve just completed a master’s in library and information science. I’m interested primarily in the role that information technologies of all kinds play in culture, and particularly how our technology interacts with our perception and cognition.

I’m also a history buff, so I’m fascinated by the history of information technology. And, of course, I’m a book nerd and computer geek (kinda goes with the territory). I entered library school because it offered me a chance to combine all of these interests in a productive way–and now that I’ve graduated, I’m looking forward to starting my career.

In this blog, I intend to talk about all of these things. I’m expecting history, psychology, computers–and, of course, libraries. Plus the occasional bit about science fiction. I don’t know where exactly it will lead, but I’m sure it’ll be entertaining and enlightening. So, onward!
