The internet has existed for a relatively short time, yet it has already had a huge cultural impact. However, content on the internet is often short-lived, so preserving it for the historical record is important, given how large a part it plays in modern life.
In this session we’ll learn about current web archiving technologies (crawlers) and strategies. In particular, we’ll talk about where these crawlers succeed and where they struggle because of the ever-changing technologies on the web. As part of this, we’ll have hands-on demos of open-source tools that you can use to preserve the web.
Finally, we’ll discuss the techniques that we used to archive our interactive object-based media experiences in BBC R&D and how they can be applied to other difficult-to-archive content on the web.
The tools we will look at are all open source and available via a website, so users can run them on their own machines. We’ll also provide help documentation to go along with the workshop. As a result, varying numbers of participants should not pose an issue. To aid discussion, participants will also be split into breakout rooms.
Part of the session will be to talk about websites that participants have viewed or own and to see how easy they are to archive. We hope that collecting a list of websites that are difficult to archive will help focus developer effort on improving web archiving tools such as Webrecorder, and more generally raise awareness of the importance and challenges of web archiving.
Thomas Preece works as an R&D Engineer in the BBC’s Research and Development department, focusing mainly on software development and website security.