e-Discovery Core Glossary: T - Z


T………………………………………………………………………………………………….....

Taxonomy The science of categorization, or classification, of things based on a predetermined system. In reference to Web sites and portals, a site's taxonomy is the way it organizes its ESI into categories and subcategories, sometimes displayed in a site map. Used in information retrieval to find documents that are related to a query by identifying other documents in the same category.
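As a rough illustration of category-based retrieval, the sketch below looks up the other documents filed under the same category as a given document. The category names, document IDs and the related_documents helper are hypothetical and exist only for this example.

```python
# Hypothetical taxonomy: category path -> document IDs (illustrative data only).
TAXONOMY = {
    "Contracts/Leases": ["doc-001", "doc-004"],
    "Contracts/Licenses": ["doc-002"],
    "Correspondence/Email": ["doc-003", "doc-005"],
}


def related_documents(doc_id, taxonomy):
    """Return the other documents filed under the same category as doc_id."""
    for members in taxonomy.values():
        if doc_id in members:
            return [d for d in members if d != doc_id]
    return []  # doc_id is not filed anywhere in this taxonomy


related = related_documents("doc-001", TAXONOMY)
print(related)
```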

Temporary ("Temp") File Files stored on a computer for temporary use only, often created by Internet browsers. These temp files store information about Web sites that a user has visited, and allow for more rapid display of the Web page when the user revisits the site. Forensic techniques can be used to track the history of a computer's Internet usage through the examination of these files. Temp files are also created by common office applications, such as word processing or spreadsheet applications.

Text Mining Roughly equivalent to text analytics, text mining refers to the process of deriving high-quality information from text. High-quality information is typically derived by identifying patterns and trends through means such as statistical pattern learning. Text mining usually involves structuring the input text (usually parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluating and interpreting the output.
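The three stages described above (structuring the text, deriving patterns, interpreting the output) can be sketched in a few lines. This is a minimal illustration only: the sample sentences, the stopword list and the "appears in at least two documents" rule are assumptions made for the example, not a prescribed method.

```python
import re
from collections import Counter

# Illustrative sample documents and stopword list (assumptions for this sketch).
DOCS = [
    "The merger agreement was signed in March.",
    "Counsel reviewed the merger agreement before signing.",
    "Lunch is at noon.",
]
STOPWORDS = {"the", "was", "in", "is", "at", "before"}


def tokenize(text):
    """Structure the input: lowercase, split into words, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def document_frequency(docs):
    """Derive a simple pattern: in how many documents does each term occur?"""
    counts = Counter()
    for doc in docs:
        counts.update(set(tokenize(doc)))
    return counts


# Interpret the output: terms shared by at least two documents suggest a common topic.
df = document_frequency(DOCS)
recurring = sorted(term for term, n in df.items() if n >= 2)
print(recurring)
```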

Thread A series of communications, usually on a particular topic. Threads can be a series of bulletin board messages (for example, when someone posts a question and others reply with answers or additional queries on the same topic). A thread can also apply to emails or chats, where multiple conversation threads may exist simultaneously.

Thumbnail A miniature representation of a page or item for quick overviews to provide a general idea of the structure, content and appearance of a document. A thumbnail program may be a standalone or part of a desktop publishing or graphics program. Thumbnails provide a convenient way to browse through multiple images before retrieving the one needed. Programs often allow clicking on the thumbnail to retrieve it.

TIFF (Tagged Image File Format) Originating in the mid-1980s, TIFF is one of the most widely used and supported graphic file formats for storing bitmapped images, with many different compression formats and resolutions. File names have a .TIF extension. Images are stored in tagged fields, and programs use the tags to accept or ignore fields, depending on the application.
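To show what "tagged fields" means in practice, the sketch below reads the tag entries from the first image file directory of a TIFF byte stream. It assumes the standard layout (a byte-order mark, the number 42, then a directory of 12-byte entries) and handles only single SHORT and LONG values. The build_sample_tiff helper is hypothetical: it produces just enough bytes to demonstrate the tag structure and is not a complete, viewable image.

```python
import struct


def read_tiff_tags(data):
    """Return {tag_number: value} for the first image file directory (IFD)."""
    byte_order = data[:2]
    if byte_order == b"II":
        endian = "<"  # little-endian
    elif byte_order == b"MM":
        endian = ">"  # big-endian
    else:
        raise ValueError("not a TIFF: bad byte-order mark")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF: bad magic number")
    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    tags = {}
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i
        tag, field_type, count = struct.unpack(endian + "HHI", data[start:start + 8])
        value_field = data[start + 8:start + 12]
        if count == 1 and field_type == 3:    # SHORT: first two bytes of the field
            (value,) = struct.unpack(endian + "H", value_field[:2])
        elif count == 1 and field_type == 4:  # LONG: all four bytes
            (value,) = struct.unpack(endian + "I", value_field)
        else:
            continue  # other field types are ignored in this sketch
        tags[tag] = value
    return tags


def build_sample_tiff():
    """Hypothetical helper: header plus one IFD with three tags, no pixel data."""
    header = struct.pack("<2sHI", b"II", 42, 8)
    entries = [
        struct.pack("<HHII", 256, 4, 1, 640),    # ImageWidth
        struct.pack("<HHII", 257, 4, 1, 480),    # ImageLength
        struct.pack("<HHIHH", 259, 3, 1, 1, 0),  # Compression, 1 = uncompressed
    ]
    ifd = struct.pack("<H", len(entries)) + b"".join(entries) + struct.pack("<I", 0)
    return header + ifd


tags = read_tiff_tags(build_sample_tiff())
print(tags)
```

A reader that does not understand a given tag number can simply skip that entry, which is how programs "accept or ignore fields".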

Trojan A program that does something undocumented that the programmer intended, but that the user would not approve of if aware of it.

U………………………………………………………………………………………………….....

Unallocated Space The area of computer media, such as a hard drive, that does not contain normally accessible data. Unallocated space is usually the result of a file being deleted. When a file is deleted, it is not actually erased, but is simply no longer accessible through normal means. The space that it occupied becomes unallocated space, i.e., space on the drive that can be reused to store new information. Until portions of the unallocated space are used for new data storage, in most instances, the old data remains and can be retrieved using forensic techniques.

Unitization – Physical and Logical The assembly of individually scanned pages into documents. Physical Unitization utilizes actual objects such as staples, paper clips and folders to determine pages that belong together as documents for archival and retrieval purposes. Logical unitization is the process of human review of each individual page in an image collection using logical cues to determine pages that belong together as documents. Such cues can be consecutive page numbering, report titles, similar headers and footers and other logical indicators. This process should also capture document relationships, such as parent and child attachments.
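Logical unitization is described above as a human review process, but one of its cues, consecutive page numbering, can be sketched as a simple first-pass grouping. The function name and the sample page numbers below are hypothetical; the sketch assumes the printed page number on each scanned page has already been captured, and a reviewer would still need to confirm the resulting boundaries.

```python
def unitize_by_page_number(page_numbers):
    """Group scanned page indexes into documents where printed numbers run consecutively."""
    documents = []
    previous = None
    for index, number in enumerate(page_numbers):
        if previous is not None and number == previous + 1:
            documents[-1].append(index)  # numbering continues: same document
        else:
            documents.append([index])    # numbering breaks: start a new document
        previous = number
    return documents


# Printed page numbers read from six scanned pages, in scan order.
groups = unitize_by_page_number([1, 2, 3, 1, 2, 1])
print(groups)
```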

Underinclusive When referring to data sets returned by some method of query, search, filter or cull, describes results that are incomplete or too narrow.
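One common way to put a number on underinclusiveness is recall, the share of relevant documents that a search actually returned. Recall is a standard information retrieval measure rather than something this glossary defines, and the document IDs below are made up for illustration.

```python
def recall(returned, relevant):
    """Fraction of the relevant documents that appear in the returned set."""
    relevant = set(relevant)
    if not relevant:
        return 1.0  # nothing to find, so nothing was missed
    return len(relevant & set(returned)) / len(relevant)


# Hypothetical example: four documents are relevant, the search returned two of them.
score = recall(returned={"d1", "d2"}, relevant={"d1", "d2", "d3", "d4"})
print(score)
```

A low value indicates an underinclusive result set: relevant documents exist that the search did not return.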

Upload To send a file from one computer to another via modem, network, or serial cable. With a modem-based communications link, the process generally involves the requesting computer instructing the remote computer to prepare to receive the file on its disc and wait for the transmission to begin.

URL (Uniform Resource Locator) The addressing system used on the World Wide Web and for other Internet resources. The URL contains information about the method of access, the server to be accessed and the path of any file to be accessed. Although there are many different formats, a URL might look like this: http://thesedonaconference.org/publications_html.
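The three parts named above (method of access, server, path) can be pulled out of the example URL with Python's standard urllib.parse module. This is a small illustration only.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://thesedonaconference.org/publications_html")

print(parts.scheme)  # method of access
print(parts.netloc)  # server to be accessed
print(parts.path)    # path of the file to be accessed
```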

V………………………………………………………………………………………………….....

Vendor-Added Metadata Data created and maintained by the electronic discovery vendor as a result of processing the document. While some vendor-added metadata has direct value to customers, much of it is used for process reporting, chain of custody and data accountability. Contrast with User-Added Metadata.

Vertical De-Duplication A process through which duplicate documents/data are eliminated within a single custodial or production data set.
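A minimal sketch of the idea follows, assuming that "duplicate" means byte-for-byte identical content and that duplicates are detected by comparing SHA-256 hashes. The definition above does not specify a method, so both are assumptions, as are the custodian names and sample contents.

```python
import hashlib


def vertical_dedupe(documents):
    """Keep the first copy of each document within each custodian's data set.

    documents is a list of (custodian, content_bytes) pairs.
    """
    seen = set()
    unique = []
    for custodian, content in documents:
        key = (custodian, hashlib.sha256(content).hexdigest())
        if key not in seen:
            seen.add(key)
            unique.append((custodian, content))
    return unique


docs = [
    ("alice", b"Q3 forecast"),
    ("alice", b"Q3 forecast"),  # duplicate within alice's set: dropped
    ("bob", b"Q3 forecast"),    # same content, different custodian: kept
    ("bob", b"Board minutes"),
]
deduped = vertical_dedupe(docs)
print(deduped)
```

Because the custodian is part of the comparison key, identical documents held by different custodians are both retained, which is what keeps the process within a single data set.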

Virus A self-replicating program that spreads by inserting copies of itself into other executable code or documents. A program into which a virus has inserted itself is said to be infected, and the infected file (or executable code that is not part of a file) is a host. Viruses are a kind of malware (malicious software). Viruses can be intentionally destructive, for example by destroying ESI, but many viruses are merely annoying. Some viruses have a delayed payload, sometimes referred to as a bomb. The primary downside of viruses is uncontrolled self-reproduction, which wastes or overwhelms computer resources.

Vital Record A record that is essential to the organization's operation or to the reestablishment of the organization after a disaster.

W………………………………………………………………………………………………….....

WWW (World Wide Web) All of the computers on the Internet which use HTML-capable software (Netscape, Explorer, etc.) to exchange data. Data exchange on the WWW is characterized by easy-to-use graphical interfaces, hypertext links, images, and sound. Today the WWW has become synonymous with the Internet, although technically it is really just one component.

Z………………………………………………………………………………………………….....

Zip Drive A removable-disc drive, similar to a floppy disc drive, whose discs hold 100, 250 or 750 megabytes depending on the model. When first available, it was often used for backing up hard discs.

ZIP A common file compression format that allows quick and easy storage for transport.
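As a small illustration, the sketch below compresses some sample text into a ZIP archive held in memory and then restores it, using Python's standard zipfile module. The file name and payload are made up for the example.

```python
import io
import zipfile

# Illustrative payload: repetitive text compresses well.
payload = b"privileged and confidential\n" * 200

# Write the payload into an in-memory ZIP archive using DEFLATE compression.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("memo.txt", payload)
archive_bytes = buffer.getvalue()

# Read the archive back and extract the original bytes.
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
    restored = archive.read("memo.txt")

print(len(payload), len(archive_bytes))
```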

Zone OCR An add-on feature of the imaging software that populates document templates by reading certain regions or zones of a document, and then placing the text into a document index.

