This study tests the hypothesis that the software industry uses EULAs primarily to protect itself from liability arising from the potentially low quality of its software. It uses natural language processing and manual, qualitative analysis to identify the degree to which the EULAs of popular software products consist of short clauses that attempt to release the publisher from legal repercussions for the side effects of low software quality. The study attempts to use the data found in EULAs to draw conclusions as to whether the software industry is held to a high enough quality standard.
Adam Coimbra
University of Pittsburgh
3445-3 Ward Street
Pittsburgh, PA 15213
amc177@pitt.edu
ABSTRACT
This study seeks to test the hypothesis that the software industry's use of EULAs is primarily to protect itself from liability arising from the potential low quality of its software. It uses natural language processing and manual, qualitative analysis to identify the degree to which the EULAs of popular software products consist of short clauses that attempt to release the publisher from any and all legal repercussions for the side effects of the low quality of the software. From this analysis, I conclude that the industry uses EULAs primarily as blank checks for poor software quality.
INTRODUCTION
The problem with EULAs is that they tend to act as "blank checks" for software publishers, releasing them from nearly all liability for the side effects of low-quality software as well as from any commitment to a pre-defined level of quality. This study is an attempt to use the data found in EULAs to draw conclusions as to whether the software industry is held to a high enough quality standard. Herein I test the hypothesis that the software industry's use of EULAs is primarily to protect itself from liability arising from the potential low quality of its software.
The entities referenced throughout this discussion are listed in the BACKGROUND section.
Below, I use natural language processing techniques to analyze plain-text EULAs, generating a quantified indicator of the degree to which a given EULA's content is oriented toward limiting the publisher's liability for any consequences that may befall a user due to the low quality of the software (as opposed to the degree to which its content is oriented toward limiting the user's rights with regard to how he or she may use the software). This provides the basis for a qualitative analysis of the values of this indicator extracted from a small sample of EULAs, from which I draw conclusions as to whether major, popular offerings are guilty of using the EULA primarily as a "blank check" to produce low-quality software.
BACKGROUND
The following definitions inform this discussion:
EULA -- End-user license agreements (hereafter EULAs) are legal documents generally comprised of two main elements: first, an identification of what the user can expect from the software, and second, an identification of limitations placed on what the user can do with a copy of the software.
Software quality -- the degree to which a piece of software meets the functional requirements of its users.
Liability blank check -- a short clause in a EULA that attempts to release the publisher from any and all legal repercussions for the side effects of the low quality of their software.
Latent Semantic Analysis -- a technique in natural language processing for computing the similarity between two documents (here, simply strings of text). As applied here, the technique involves building a model comprised of a set of bags of words and then running queries to compute the similarity of an incoming document to each bag in the model.
OCR -- Optical Character Recognition is the mechanical or electronic conversion of scanned or photographed images into machine-encoded, computer-readable text.
Entities referenced in this discussion:
Microsoft Office -- a software product categorized as an office suite, with word processing, spreadsheet, and presentation capabilities, among others.
Adobe Photoshop -- a software product categorized as a graphics editor.
Apple iTunes -- a software product with media playing, media management, and mobile device management capabilities.
Google Chrome -- a software product categorized as a web browser.
McAfee Antivirus -- a software product that attempts to protect the user's computer from viruses.
Softpedia -- a website that hosts a library of software programs.
Python -- a programming language.
Gensim -- a Python library that implements Latent Semantic Analysis.
The following assumptions inform this discussion:
Assume all EULAs are comprised exhaustively of 1) provisions to limit publisher liability and 2) provisions to limit user rights.
Assume many or most limitations on publisher liability are protections against side effects arising from the low quality of the publisher's software.
Assume that the size of the portion of a EULA that limits publisher liability has an inverse relationship to the degree to which that limit is a blank check for liability; that is, the longer the liability limitation section, the more specific the publisher is with regard to quality assurance, and thus the less of a blank check the EULA is.
APPROACH
Study Design and Motivation
The initial inspiration for this study was the discussion in class regarding problems with modern EULAs. The contrast between EULAs and the analogous agreements for other types of consumer products led me to want to determine just how abusive EULAs tend to be for popular software.

In attempting to design this study, my first step was to find EULAs for popular software and manually examine their contents. I found the EULA for Adobe Photoshop by searching "photoshop eula" on Google. Quickly, I saw that there was very little space (if any) dedicated to quality assurances; the EULA mostly contained limits on users' rights and very specific legal notices. There was a single paragraph related to quality, but it did not make any assurances as to the quality of the product. Rather, it absolved Adobe of any liability for damages arising from customers' use of the product. Arguably, this is the opposite of quality assurance: it is an attempt to free the publisher from consequences arising from the low quality of the product. I used the same method to find the EULAs for Microsoft Windows and Mozilla Firefox (I selected these based on my own use of these applications), and found that they all had similarly worded "liability blank checks." The similar wording of these provisions led me to realize that this study could leverage Latent Semantic Analysis to compute the preponderance of these types of liability statements in EULAs.

With this initial, basic research completed, my next step was to assess whether meaningful conclusions about quality could be drawn by comparing and contrasting the resultant data. My reasoning was as follows: the goal is to draw conclusions about the quality standard to which the industry holds itself, and a EULA seems like it should reflect that standard for a particular software publisher. If the EULA is shown to be a blank check or is full of caveats, that is an indicator that the company does not consider it feasible to bind itself legally to a high quality standard (as compared to, for example, makers of consumer appliances, who do not try to get released from consumer protection law).
Data Acquisition
In determining which software to include in the sample, I first considered that the degree to which the sample was meaningful would be positively related to the overall popularity or market share of the software. I used the Windows section of Softpedia to find a list of software categories, and decided to select the most popular piece of software in each of the following categories: office suites, graphics editors, entertainment software, web browsers, and antivirus. I decided to use the most popular commercial offering in each category, leading me to choose the products listed above (Microsoft Office, Adobe Photoshop, Apple iTunes, Google Chrome, and McAfee Antivirus).

Having selected these applications, the next step was to locate a EULA for each. Once again, I accomplished this by searching "[PRODUCT NAME] eula" on Google. The most up-to-date EULA was among the first five results for each search. In two cases, the EULAs were in PDF format (detailed in the Data section). Since I needed machine-readable text, I used the free NewOCR tool to convert these PDFs to plain-text format.
Data Processing
To get the most out of Latent Semantic Analysis when computing semantic document similarity, it is best to first remove noise words and punctuation. I wrote a piece of software in Python to normalize the EULAs in this way. In one case, a EULA included translations of its content in various languages, which I removed in order to maintain the accuracy of the experiment.

A critical step in the set-up of the experiment was to construct the bags of words for the Latent Semantic Analysis model. To do so, I copied the blank check sections from the Photoshop and Firefox EULAs I had previously found into a text editor, and manually removed all words except those directly related to limiting liability for consequences of low software quality. For example, from the following excerpt I retained the terms "warrant," "liability," and "responsibility": "The Application Provider does not warrant or endorse and does not assume and will not have any liability or responsibility to You."

Since my definition of a EULA assumes that EULAs are comprised exhaustively of limitations on users' rights and limitations on publishers' liability, I also constructed a bag of words for detecting provisions that limit users' rights, using the same process as before. To compute the similarity between the EULAs and these bags, I used a Python library called Gensim that implements Latent Semantic Analysis.
Analysis
Knowing that the output of my software would be tuples of the form (EULA file name, similarity to the liability limitation bag, similarity to the users' rights limitation bag), I arrived at the following process for using these intermediate results to manually derive meaningful final output. I manually examined the outliers (the EULAs with the lowest and highest liability limitation similarity) to arrive at qualitative interpretations of the lowest and highest values. Once I had extrapolated the qualitative meaning of each EULA's results, I interpreted these results with regard to whether each EULA was used primarily to protect a software publisher from liability arising from the potential low quality of its software.
DATA
Bags of words
The following is the bag of quality / liability-limitation words against which EULAs were tested for similarity:

remedy remedies limited liable customer claims costs whatsoever indirect incidental damages lost profits savings damages business interruption personal injury failure duty loss damages claims costs aggregate liability certificate authorities death personal injury negligence disclaiming excluding limiting obligations warranties limitations customers jurisdiction consumer protection laws law liable damages claims costs whatsoever consequential indirect incidental lost profits savings damages claims costs claim foregoing limitations exclusions sublicensees jurisdiction aggregate liability death personal injury limiting obligations warranty satisfactory quality warranties claim damages obligation develop feature features sole risk reputation damage environmental interruption commercial components

The following is the bag of user-rights-limitation words against which EULAs were tested for similarity:

rights intellectual property right privilege priveleges license restrictions proprietary content materials information permitted permit service reproduce reproduced unathorized unauthorize prohibit data use derivative disassemble copy decompile reverse engineer derive modify source code derivative works update right permit prohibited use content change compliance comply limited right access use remove obscure alter copyright trademark non-exclusive non-assignable manner
EULAs
Microsoft Office -- this EULA was found as a web page on Microsoft's website. It contained approximately 11878 words across approximately 60 pages.
Adobe Photoshop -- this EULA was found as a PDF file on Adobe's website. It contained translations in various languages, with a total length of approximately 700 pages.
Apple iTunes -- this EULA was found as a web page on Apple's website. It contained approximately 2000 words and 4 pages.
Google Chrome -- this EULA was found as a web page on Google's website. It contained approximately 6500 words and 12 pages.
McAfee Antivirus -- this EULA was found as a PDF on McAfee's website. It contained 5 pages.
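The normalization step described under Data Processing (removing noise words and punctuation) can be sketched roughly as follows. This is a minimal illustration, not the software written for the study: the stop-word list and the `normalize` function name are assumptions, and the sketch does not reproduce any phrase merging (the normalized sample contains joined tokens such as google_chrome, which suggests an additional step not shown here).

```python
import re

# Illustrative stop-word list (an assumption; the list used in the study is not given).
STOP_WORDS = {
    "a", "an", "and", "any", "are", "as", "at", "be", "by", "for", "from",
    "in", "is", "it", "of", "on", "or", "that", "the", "this", "to",
    "under", "with", "you", "your",
}


def normalize(text):
    """Lowercase the text, strip punctuation, and drop stop words."""
    lowered = text.lower()
    # Delete every character that is not a letter, digit, underscore, or whitespace.
    no_punct = re.sub(r"[^\w\s]", "", lowered)
    tokens = no_punct.split()
    return [tok for tok in tokens if tok not in STOP_WORDS]


sample = "Source code for Google Chrome is available free of charge."
print(" ".join(normalize(sample)))
```

Because punctuation is deleted rather than replaced with spaces, a URL collapses into a single token, which is consistent with the token httpcodegooglecomchromiumtermshtml in the normalized sample.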
It is instructive to see a comparison of a sample of a normalized EULA with the non-normalized version.

Normalized sample of Google Chrome EULA:
apply executable code version google_chrome source code google_chrome available free charge open source software license agreements httpcodegooglecomchromiumtermshtml 1 relationship google 11 use googles products software services web sites referred collectively services document excluding services provided google separate written agreement subject terms legal agreement google google means google_inc whose principal place business 1600 amphitheatre parkway

Original equivalent text:
These Terms of Service apply to the executable code version of Google Chrome. Source code for Google Chrome is available free of charge under open source software license agreements at http://code.google.com/chromium/terms.html. 1. Your relationship with Google 1.1 Your use of Google's products, software, services and web sites (referred to collectively as the "Services" in this document and excluding any services provided to you by Google under a separate written agreement) is subject to the terms of a legal agreement between you and Google. "Google" means Google Inc., whose principal place of business is at 1600 Amphitheatre Parkway, Mountain View, CA 94043, United States. This document explains how the agreement is made up, and sets out some of the terms of that agreement.
RESULTS
Output of the Document Similarity Computation
Note that the number on the left indicates the similarity computed (from 0 to 1) between the EULA and the bag of words related to quality, and the number on the right indicates the similarity computed (from 0 to 1) between the EULA and the bag of words related to limiting users' rights.
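To make the shape of these outputs concrete, the sketch below builds tuples of the form (EULA file name, similarity to the liability bag, similarity to the users' rights bag) and then picks the lowest and highest liability scores. It is only an illustration under stated assumptions: it uses plain cosine similarity over raw word counts as a simpler stand-in for the Latent Semantic Analysis performed with Gensim in the study, the function names are hypothetical, and the bags and "EULAs" are toy strings rather than the study's data.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists, using raw word counts."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_eula(name, eula_tokens, liability_bag, rights_bag):
    """Return (name, similarity to liability bag, similarity to rights bag)."""
    return (
        name,
        cosine_similarity(eula_tokens, liability_bag),
        cosine_similarity(eula_tokens, rights_bag),
    )


def liability_outliers(results):
    """Return the tuples with the lowest and highest liability similarity."""
    lowest = min(results, key=lambda r: r[1])
    highest = max(results, key=lambda r: r[1])
    return lowest, highest


# Toy inputs for illustration only; these are not the study's bags or EULAs.
liability_bag = "liable damages warranty claims".split()
rights_bag = "license copy modify reverse engineer".split()
toy_eulas = {
    "toy_a.txt": "publisher liable damages arising use license".split(),
    "toy_b.txt": "you may not copy modify reverse engineer software".split(),
}

results = [
    score_eula(name, tokens, liability_bag, rights_bag)
    for name, tokens in toy_eulas.items()
]
lowest, highest = liability_outliers(results)
print(results)
print("lowest:", lowest[0], "highest:", highest[0])
```

With non-negative word counts, cosine similarity always falls between 0 and 1, matching the range described above; scores produced by an actual Latent Semantic Analysis model would generally differ from these count-based values.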
Qualitative analysis
The outliers with regard to similarity to the bag of words related to quality were Microsoft Office (0.311) and McAfee Antivirus (0.66). Out of Microsoft's relatively large agreement (approximately 60 pages), only two paragraphs discuss quality. The text that discusses quality clearly qualifies as a blank check for Microsoft to sell users low-quality software that may cause damage to the user, while severely limiting Microsoft's liability in that case. While McAfee also severely limits obligated repairs and liability under its warranty, it was more specific, dedicating a large portion of its shorter EULA to specific situations, many of which related to failure of the software to perform as expected.
CONCLUSIONS
In the above results, the EULA with the lowest computed indicator (Microsoft Office) was, indeed, a blank check, and the EULA with the highest computed indicator (McAfee Antivirus) was also a blank check (though with greater specificity). This suggests that each EULA in between was likely also a blank check. In short, my conclusion is that these results support my hypothesis. The fact that five of the most popular software products in key categories all use EULAs that can be shown to attempt to minimize publisher liability for the poor performance of their own software, in maximally broad circumstances, and with blanket and generally non-specific statements, indicates that the industry does indeed use EULAs primarily as blank checks for poor software quality. This is an important conclusion because it indicates that a new standard for quality assurance is needed in the software industry.
REFERENCES
1. Optical Character Recognition. Web. 20 Feb. 2014. <http://en.wikipedia.org/wiki/Optical_character_recognition>.
2. Softpedia. Web. 20 Feb. 2014. <http://softpedia.com>.
3. New OCR. Web. 20 Feb. 2014. <http://newocr.com>.
4. Microsoft Software License Agreement. Web. 20 Feb. 2014. <http://office.microsoft.com/enus/products/microsoft-software-license-agreementFX103576343.aspx>.
5. Adobe Photoshop CS6. Web. 20 Feb. 2014. <http://wwwimages.adobe.com/www.adobe.com/content/dam/Adobe/en/legal/licensesterms/pdf/CS6.pdf>.
6. Apple - Legal - Licensed Application End User License Agreement. Web. 20 Feb. 2014. <https://www.apple.com/legal/internetservices/itunes/appstore/dev/stdeula/>.
7. Google Chrome. Web. 20 Feb. 2014. <https://www.google.com/intl/en_US/chrome/browser/privacy/eula_text.html>.
8. McAfee End User License Agreement. Web. 20 Feb. 2014. <http://www.mcafee.com/us/resources/legal/enduser-license-agreements-en-us.pdf>.