PageFreezer is the name of a technology start-up and also of a web service that
archives websites in a convenient, easy-to-use way, according to flexible
schedules defined by the user. Any website, blog, or even Facebook or Twitter
profile can be preserved for future generations in an interactive way, going
much further than common screenshots.
The service is not just a convenient way to preserve today's web resources for
future generations, but also an easy-to-use tool for judicial protection,
regulatory compliance, or marketing needs. Being an enterprise-class solution,
PageFreezer can crawl even the most complex pages and is easy to implement for
both individuals and companies of all stripes. Apart from easy page archiving,
you can also run archived pages live any time you need them, just as if they
had never gone down.
Website Playback
One of the main goals, and the most impressive usage scenario, was to let users
browse archived copies of websites as if they were still live.
CASE STUDY | PageFreezer
Social Media
Crawling social media profiles was a much harder challenge, as different rules
apply to them than to conventional websites. PageFreezer's link extraction was
initially built on regular expressions and content parsers, but most pages on
Twitter, Facebook and other social networks are built dynamically with
JavaScript. As the networks were all different, building the framework and
extending it to additional social networks was exhausting. The whole solution
was unreliable at this stage, and every future change to these social networks
would have had to be implemented in the system, too. In the end, it was decided
to develop a social network adapter based on third-party social network client
libraries in Java. Spring Social was identified as meeting our requirements.
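To illustrate why regular-expression link extraction breaks down on pages built with JavaScript, here is a minimal sketch. It is not PageFreezer's extractor: the class name `RegexLinkExtractor`, the pattern, and the sample pages are assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of naive, regex-based link extraction. */
final class RegexLinkExtractor {
    // Matches the href value of an <a> tag; deliberately simplistic.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    /** Returns every href value found in the raw markup, in document order. */
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // A conventional page ships its links in the markup itself.
        String staticPage = "<html><body><a href=\"/about\">About</a> "
                + "<a class='x' href='/blog'>Blog</a></body></html>";
        // A script-driven page ships an empty shell; links appear only after
        // the script runs in a browser, so the regex has nothing to match.
        String scriptedPage = "<html><body><div id=\"app\"></div>"
                + "<script>render(feed)</script></body></html>";

        System.out.println(extractLinks(staticPage));   // [/about, /blog]
        System.out.println(extractLinks(scriptedPage)); // []
    }
}
```

A client library that talks to the network's API avoids this problem, because it never depends on the rendered markup.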
Data Storage
One of the most difficult tasks in this project was selecting the best storage
option, which had to be very scalable. The project started with approximately
500 sites, but had to be prepared for many more. We toyed with the idea of
using S3 or Google for some time, but those proved too slow to access and too
expensive. So Redwerk had to come up with a more flexible, custom-tailored
idea, and after some benchmarking we built a simple yet scalable custom storage
cloud from scratch, based on a database and an NFS file system.
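One way a database and a shared file system can divide the work is sketched below: the index records where each snapshot lives, and the file system holds the bytes. This is an assumption-laden illustration, not Redwerk's design. The class `SnapshotStore`, its methods, and the two-level directory sharding are hypothetical; an in-memory map stands in for the database and a local directory stands in for the NFS mount.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch: a metadata index plus a file store on a shared mount. */
final class SnapshotStore {
    private final Path root;                                  // stand-in for an NFS mount point
    private final Map<String, Path> index = new HashMap<>();  // stand-in for a database table

    SnapshotStore(Path root) {
        this.root = root;
    }

    /** Writes the content into a sharded directory and records its location. */
    void save(String url, long crawlTime, byte[] content) throws IOException {
        String key = url + "@" + crawlTime;
        // A deterministic name derived from the key (assumed naming scheme).
        String name = UUID.nameUUIDFromBytes(key.getBytes(StandardCharsets.UTF_8)).toString();
        // Two directory levels keep any single directory from growing too large.
        Path file = root.resolve(name.substring(0, 2))
                        .resolve(name.substring(2, 4))
                        .resolve(name);
        Files.createDirectories(file.getParent());
        Files.write(file, content);
        index.put(key, file);
    }

    /** Looks the location up in the index, then reads the bytes from the file store. */
    byte[] load(String url, long crawlTime) throws IOException {
        Path file = index.get(url + "@" + crawlTime);
        if (file == null) {
            throw new IOException("no snapshot for " + url + " at " + crawlTime);
        }
        return Files.readAllBytes(file);
    }

    public static void main(String[] args) throws IOException {
        SnapshotStore store = new SnapshotStore(Files.createTempDirectory("snapshots"));
        byte[] page = "<html>archived</html>".getBytes(StandardCharsets.UTF_8);
        store.save("http://example.com/", 1000L, page);
        System.out.println(Arrays.equals(page, store.load("http://example.com/", 1000L)));
    }
}
```

Keeping only small location records in the index lets the bulky page content scale with the file system rather than with the database.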
Data Integrity
The system makes crawlers stop and wait whenever the database or the file
system is unavailable. When these components come back, no information gathered
by the crawlers is lost, and the use of checksums helps maintain the integrity
of all stored data.
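The two ideas above, pausing until a backend returns and checking stored bytes against a checksum, can be sketched as follows. The class `IntegrityGuard`, the polling approach, and the choice of SHA-256 are assumptions for illustration; the source does not say which checksum or waiting strategy was used.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch: wait for a backend, then verify data by checksum. */
final class IntegrityGuard {

    /** Polls until the backend reports itself available; returns how many times it waited. */
    static int waitUntilAvailable(BooleanSupplier backendUp, long pauseMillis)
            throws InterruptedException {
        int waits = 0;
        while (!backendUp.getAsBoolean()) {
            Thread.sleep(pauseMillis);  // the crawler holds its data in the meantime
            waits++;
        }
        return waits;
    }

    /** Lower-case hexadecimal SHA-256 digest of the given bytes. */
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every Java platform", e);
        }
    }

    /** True if the data still matches the checksum recorded when it was stored. */
    static boolean isIntact(byte[] data, String expectedHex) {
        return sha256Hex(data).equals(expectedHex);
    }

    public static void main(String[] args) throws InterruptedException {
        int[] checks = {0};
        // Simulated backend that is down for the first two checks.
        int waits = waitUntilAvailable(() -> ++checks[0] > 2, 10);
        System.out.println("waited " + waits + " times");

        byte[] page = "<html>archived</html>".getBytes(StandardCharsets.UTF_8);
        String recorded = sha256Hex(page);
        System.out.println(isIntact(page, recorded));
    }
}
```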
Awarded: Red Herring Top 100 Global Finalist
Digital Signatures
Once the system is enabled, all snapshots available to the user are signed
through a time-stamping authority (TSA), and the signature can be verified on
the browsing page at any time.
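The principle behind a signed timestamp can be sketched with the standard Java security API. This is a deliberate simplification: a locally generated RSA key pair stands in for the authority, and the token format is made up. A real TSA exchange normally follows the RFC 3161 protocol, which this sketch does not implement. The class `TimestampSketch` and its `Token` type are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.PublicKey;
import java.security.Signature;

/** Hypothetical sketch: sign a snapshot digest together with a time, then verify it. */
final class TimestampSketch {

    /** Made-up token: the signing time plus a signature over (digest || time). */
    static final class Token {
        final long signedAtMillis;
        final byte[] signature;

        Token(long signedAtMillis, byte[] signature) {
            this.signedAtMillis = signedAtMillis;
            this.signature = signature;
        }
    }

    // The signed payload binds the snapshot's SHA-256 digest to the time.
    private static byte[] payload(byte[] snapshot, long time) throws GeneralSecurityException {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(snapshot);
        return ByteBuffer.allocate(digest.length + Long.BYTES).put(digest).putLong(time).array();
    }

    /** Plays the authority's role: signs the payload with its private key. */
    static Token sign(byte[] snapshot, long time, KeyPair authorityKeys)
            throws GeneralSecurityException {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initSign(authorityKeys.getPrivate());
        s.update(payload(snapshot, time));
        return new Token(time, s.sign());
    }

    /** Anyone holding the authority's public key can check the token. */
    static boolean verify(byte[] snapshot, Token token, PublicKey authorityKey)
            throws GeneralSecurityException {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initVerify(authorityKey);
        s.update(payload(snapshot, token.signedAtMillis));
        return s.verify(token.signature);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        KeyPair authority = gen.generateKeyPair();  // stand-in for the TSA's keys

        byte[] snapshot = "<html>archived</html>".getBytes(StandardCharsets.UTF_8);
        Token token = sign(snapshot, System.currentTimeMillis(), authority);
        System.out.println(verify(snapshot, token, authority.getPublic()));
    }
}
```

Because the signature covers both the digest and the time, changing either the snapshot or the claimed time makes verification fail.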
Security
Results
All in all, PageFreezer is one of the projects we are renowned for. Over the
last couple of years, the Redwerk team successfully prototyped and built the
product, and carried out a couple of redesigns to keep it up to date. We strove
for perfection and kept adding new functionality to satisfy users' needs. Our
team was responsible for full system maintenance, including administrative
tasks such as upgrades and backups of the database and the archived content.
Today PageFreezer is one of the top online content archiving solutions, and we
are proud to say that Redwerk's technology and know-how contributed much to its
success!