DMS 300: Introduction to Digital and Media Studies

Key Terms

Unit 1: Defining Data in the Humanities

Week 1

artificial intelligence:
use of computing technology to perform advanced tasks. The term has been applied to different computing capabilities over time; at one time, even a hand calculator was considered "artificial intelligence." Once a computing capability becomes familiar, it is generally no longer considered AI. Today, the term is generally applied to machine learning capabilities.
machine learning:
an algorithm designed to gather specific data and identify patterns in that data in order to solve a specific task. Rather than following rules that a programmer wrote out step by step, the algorithm improves iteratively by adjusting its own internal parameters as it processes more data. As a result, machine learning algorithms can be particularly opaque and difficult to fix if they do not perform as anticipated.
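As a minimal sketch of what "identifying patterns in data" can mean in practice, the Python example below fits a straight line to a few invented data points by gradient descent. The data, the learning rate, and the function name `fit_line` are illustrative assumptions; real machine learning systems use far larger models and datasets, but they share the basic idea that the program adjusts numeric parameters from examples instead of following rules a programmer wrote out.

```python
# Hypothetical training data: each y is roughly 2*x + 1, with some noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

def fit_line(xs, ys, learning_rate=0.02, steps=5000):
    """Learn slope w and intercept b by repeatedly nudging them to reduce squared error."""
    w, b = 0.0, 0.0  # start with no knowledge of the pattern
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Move each parameter a small step against its gradient
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

w, b = fit_line(xs, ys)
print(f"learned pattern: y = {w:.2f}x + {b:.2f}")
```

Nothing in the code states the rule "y is about 2x + 1"; the values of `w` and `b` emerge from the data, which is also why a model with millions of such parameters can be hard to inspect.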
crowd sourcing:
pooling information, input, and ideas from a broad range of individuals to perform a task or solve a problem.
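Crowdsourced projects need a way to combine many individual contributions into one answer. One simple approach, assumed here rather than taken from any particular project, is majority voting. The Python sketch below applies it to invented volunteer transcriptions and also reports how strongly the volunteers agreed.

```python
from collections import Counter

# Hypothetical volunteer transcriptions of the same three handwritten words.
transcriptions = {
    "word_1": ["London", "London", "Londen", "London"],
    "word_2": ["harbour", "harbor", "harbour"],
    "word_3": ["ship", "shop", "ship", "ship", "slip"],
}

def consensus(readings):
    """Return the most common reading and the share of volunteers who agreed on it."""
    counts = Counter(readings)
    best, votes = counts.most_common(1)[0]
    return best, votes / len(readings)

for item, readings in transcriptions.items():
    best, agreement = consensus(readings)
    print(f"{item}: {best} ({agreement:.0%} agreement)")
```

A low agreement score can flag items that need expert review rather than being accepted automatically.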
gamification:
turning a task into a game so that users are motivated to participate by its competitive structure. Often the experience of playing is itself the reward, but sometimes the game offers a financial incentive to the winner.
visual data:
information based on images, video, or other types of visualization. Specific visual characteristics can be identified and classified according to defined criteria. For visual data to be used in digital form, humans may be needed to identify visual information and translate it into a format that is useful in a database or algorithm.

Week 2

big data:
extremely large data set that can be analyzed by a computer for patterns. Patterns are then analyzed for meaning.
lexical words:
words in a language that carry conceptual (concrete or abstract) meaning. In English, nouns, verbs, adjectives, and adverbs are lexical.
grammatical words:
words in a language that connect and provide context for lexical terms; they are functional words needed to make the language work. In English, prepositions, pronouns, articles, and verb particles are examples of grammatical words.
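Text-analysis software often separates the two kinds of words with a "stop list" of grammatical words. The Python sketch below assumes a tiny, hypothetical stop list that covers only its example sentence; a real analysis would need a much longer list and would still have to handle words whose category depends on context.

```python
# Hypothetical, deliberately tiny list of English grammatical (function) words.
GRAMMATICAL_WORDS = {"the", "a", "an", "of", "in", "on", "to", "she", "he", "it", "and", "up"}

def split_words(sentence):
    """Separate a sentence's words into lexical and grammatical lists."""
    words = [w.strip(".,;:!?").lower() for w in sentence.split()]
    lexical = [w for w in words if w not in GRAMMATICAL_WORDS]
    grammatical = [w for w in words if w in GRAMMATICAL_WORDS]
    return lexical, grammatical

lexical, grammatical = split_words("She quickly picked up the old book in the library.")
print("lexical:", lexical)
print("grammatical:", grammatical)
```

Here "up" is treated as a grammatical word because it functions as a verb particle in "picked up."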
corpus:
a large collection of words or texts belonging to a specific set. When performing corpus analysis, it is important to identify the characteristics of the set.
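A basic first step in corpus analysis is counting how often each word form occurs. The sketch below uses Python's standard `collections.Counter` on a two-sentence corpus made up for illustration; the counts describe only whatever texts were put into the set, which is why the set's characteristics matter.

```python
import re
from collections import Counter

# A tiny, made-up corpus; a real corpus may hold millions of words.
corpus = [
    "It was the best of times, it was the worst of times.",
    "It was a bright cold day in April.",
]

def word_frequencies(texts):
    """Count how often each (lowercased) word form appears across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

freqs = word_frequencies(corpus)
print(freqs.most_common(3))
```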
lemma:
"set of lexical forms having the same stem [. . .] differing only in inflection and spelling" (Francis and Kucera 1). For example, walk, walked, walking, and walks. Conventionally denoted in small caps.
humanities computing:
applications of computing to research and teaching within humanities fields (adapted from CDH Ch. 1)
authorship study:
using computer-based statistical analysis of digitized works to identify an author's stylistic "fingerprint" and to determine whether unattributed texts can be ascribed to a specific author. For example, such studies have examined whether Shakespeare had a hand in Double Falsehood.
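Authorship studies often compare how frequently writers use common function words, since these habits are hard to imitate consciously. The Python sketch below is a deliberately simplified version of that idea: the marker words, the texts, and the plain mean-absolute-difference score are all assumptions for illustration, whereas real studies use much longer texts, many more markers, and statistically normalized measures.

```python
import re

# Hypothetical set of function words used as stylistic markers.
MARKERS = ["the", "and", "of", "to", "upon", "whilst"]

def profile(text):
    """Relative frequency of each marker word per 1,000 words of text."""
    words = re.findall(r"[a-z]+", text.lower())
    total = len(words) or 1
    return [1000 * words.count(m) / total for m in MARKERS]

def distance(p, q):
    """Mean absolute difference between two profiles (smaller means more similar)."""
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)

# Invented sample texts standing in for known and disputed works.
known = {
    "Author A": "Upon the hill and upon the shore the wind blew, whilst the sea rose.",
    "Author B": "It is hard to say what to do, and harder to do it of one's own will.",
}
disputed = "Upon the road the travellers rested, whilst the night fell upon them."

scores = {name: distance(profile(text), profile(disputed)) for name, text in known.items()}
closest = min(scores, key=scores.get)
print(scores, "closest:", closest)
```

With samples this short the result is only a demonstration of the mechanics, not evidence of authorship.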
batch processing:
computer processing mode (more common in the early days of computing) in which an entire job had to run to completion before any results were reported. An error anywhere in the batch meant that the entire batch had to be re-run from the beginning.
serial data access:
storing data in a format (such as magnetic tape) where it can be accessed only in a linear fashion: each time data is accessed, the tape must be read from the beginning up to the point where the desired data is stored.
random data access:
storing data in a format (such as a computer disk) where any unit of data can be accessed in any order, based on a data identification system that can be read by the computer.
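The difference between the two access modes can be simulated in Python. This is only an analogy under stated assumptions: a list scanned from the start stands in for tape, and a dictionary keyed by a hypothetical record ID stands in for disk storage with an addressing system.

```python
# Hypothetical records, stored in order on "tape" and also indexed by ID on "disk".
records = [("r1", "Beowulf"), ("r2", "Paradise Lost"), ("r3", "Middlemarch"), ("r4", "Ulysses")]

def serial_read(tape, wanted_id):
    """Read from the start until the wanted record appears; report how many records were read."""
    for steps, (record_id, value) in enumerate(tape, start=1):
        if record_id == wanted_id:
            return value, steps
    return None, len(tape)

# Random access: an index maps each ID directly to its record (one lookup).
disk = dict(records)

print(serial_read(records, "r4"))  # must pass r1, r2, and r3 first
print(disk["r4"])                  # goes straight to the record
```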
Unicode:
one of the first systems (developed 1988-91) for encoding any typeset character as a number within a single standard. Systems like Unicode allow the precise representation of typeset characters, including diacritical marks and non-Roman characters.
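In Python, `ord` returns a character's Unicode code point and the standard `unicodedata` module returns its official name. The sketch also shows one practical wrinkle: some accented characters can be encoded either as a single code point or as a base letter plus a combining mark, so texts are usually normalized before they are compared.

```python
import unicodedata

# Characters with diacritics and from non-Roman scripts each have a code point.
for ch in ["A", "é", "ß", "Ω", "あ"]:
    print(f"{ch}  U+{ord(ch):04X}  {unicodedata.name(ch)}")

# The same visible character can be encoded two ways:
# one precomposed code point, or a base letter plus a combining accent.
precomposed = "\u00e9"   # é as a single code point
combining = "e\u0301"    # e followed by COMBINING ACUTE ACCENT
print(precomposed == combining)                                # False: different sequences
print(unicodedata.normalize("NFC", combining) == precomposed)  # True after normalization
```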
relational database:
a collection of items organized as a set of tables with described relationships among them. Based on these relationships, data can be accessed or reassembled in many different ways without having to re-describe the data within the tables themselves.
archive:
in the context of a digital collection, an archive is a collection of materials where the user chooses the navigation route (adapted from CDH Ch. 1)
edition:
in the context of a digital collection, an edition is a collection of materials that includes additional scholarly context and interpretation of an editor or editors, and where a navigation route is recommended or enforced structurally.
semantic markup:
markup language that describes the function of marked text rather than just its form. SGML, and applications of it such as TEI, support semantic markup. HTML5 has moved towards semantic tagging, for example, by favoring tags that differentiate function, such as <em> (emphasis) and <cite> (the title of a cited work), over the purely presentational <i> (italic).
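Because semantic markup records function, a program can retrieve text by what it is rather than by how it looks. The sketch below parses an invented fragment that uses TEI-style element names (`<title>`, `<emph>`, without the TEI namespace) with Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Invented fragment: in presentational markup both phrases might simply be italic;
# here each tag records the phrase's function.
fragment = """
<p>She reread <title>Middlemarch</title> and said it was <emph>wonderful</emph>.
Later she recommended <title>Persuasion</title>.</p>
"""

root = ET.fromstring(fragment.strip())
# Because titles are tagged as titles, they can be collected without
# also catching words that are italicized only for emphasis.
titles = [el.text for el in root.iter("title")]
print(titles)
```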

Unit 2: Teaching and Learning Technology

Week 4

learning objective:
statements that define the expected goals of a learning activity. Learning objectives are created by the teacher/trainer, but should be written from the student/learner's perspective (e.g. "by the end of this activity, students should be able to...")
Bloom's Taxonomy of Learning:
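a hierarchy of learning objectives and skills that learners can master, ordered from lower-order skills (such as remembering and understanding) to higher-order skills (such as analyzing, evaluating, and creating).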
evaluation:
process for judging an individual student's/learner's mastery of a skill or objective, with the goal of providing feedback to the learner. Good evaluation also includes comments that help the learner understand gaps and improve.
assessment:
process for measuring students'/learners' mastery of a skill, with the goal of improving the learning activity itself rather than grading the individual learner.

Week 5

Platform:
An interface with a specific set of tools that can be used to host/display content. Platforms have administrative creation and editing capability for registered users.
Project:
Exhibition of primary literary, historical, artistic, or other material. Projects may focus on providing access (such as scanned versions of out-of-print texts) or on facilitating new types of analysis (such as mapping the locations of a specific historical event or creating a searchable version of a text).
Tool:
An interface or app that performs a specific function. Tools usually require specific input (an image file, comma-separated values (CSV) text, etc.) and produce a result.
Aggregator:
A clearinghouse site that gathers, classifies, and makes available a collection of resources. Aggregators combine some of the properties of platforms, projects, and tools.

Unit 3: Introduction to Classification and Databases

Week 6/7

Definitions -- Classification

collection:
a group of objects/items. It may be classified or unclassified.
classification:
assignment of something to a class; generally, the grouping together of objects into classes.
class:
a set of objects that share some property. For example, a literary genre is a class of texts.
member:
one object in a class.
property:
specific trait of members in a class used to classify members. Properties used to sort members should be relevant to the classification scheme.
one-dimensional classification scheme:
classification system based on a single differentiating property. For example, dividing all humans into "male" and "female" categories.
nominal classification scheme:
a type of one-dimensional classification scheme where the members of the class are not ordered in relation to each other.
ordinal classification scheme:
a type of one-dimensional classification scheme where the members of the class are sequenced or ordered in relation to each other (for example, students classified as freshman, sophomore, junior, or senior).
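The practical difference between nominal and ordinal schemes shows up when sorting. In this Python sketch (invented student records), the ordinal class years supply their own order, so students can be sorted by year; a nominal property such as a major has no built-in order, so any sort on it (for example, alphabetical) would be an arbitrary choice.

```python
# Ordinal classes, listed in their defined order.
CLASS_YEARS = ["freshman", "sophomore", "junior", "senior"]

# Invented student records: (name, class year, major).
students = [("Kim", "senior", "Art"), ("Lee", "freshman", "English"), ("Ortiz", "junior", "History")]

# The ordinal scheme supplies the sort key: each class's position in CLASS_YEARS.
by_year = sorted(students, key=lambda s: CLASS_YEARS.index(s[1]))
order = [name for name, year, major in by_year]
print(order)
```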
n-dimensional classification scheme:
a classification system that has multiple axes or properties for classifying members of the scheme. Each property may be ordinal or nominal. For example, a student record lists all semesters a student has attended, all classes taken in each semester, and grades earned for each class.
discrete data:
data recorded in separate, countable units, such as values per year or per person.
continuous data:
data that reflects continuous change and can take any value within a range, such as change over time.

Week 8

metadata database:
a database of metadata: data that describes materials or files in a repository. A library catalog is an example: you can see a description of a specific primary material, but not the object itself. The Dublin Core Metadata Element Set, maintained by the Dublin Core Metadata Initiative, is a commonly used set of metadata properties.
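A metadata record describes an item without containing it. The Python sketch below represents an invented record for a digitized letter as a dictionary and checks which of its fields belong to the fifteen-element Dublin Core Metadata Element Set; the record's values and the local field "shelfmark" are hypothetical.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
}

# An invented metadata record describing (not containing) a digitized letter.
record = {
    "title": "Letter to a publisher",
    "creator": "Unknown",
    "date": "1871",
    "type": "Text",
    "format": "image/jpeg",
    "language": "en",
    "shelfmark": "Box 4, Folder 2",  # a local field outside Dublin Core
}

unknown = sorted(set(record) - DC_ELEMENTS)
print("Non-Dublin Core fields:", unknown)
```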
primary database:
stand-alone document that does not link to or manage other files. Corpora and concordances are types of primary databases.
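A concordance lists each occurrence of a word together with its surrounding context. The Python sketch below builds a simple keyword-in-context (KWIC) concordance; the function name `kwic` and the three-word window are illustrative choices, not a standard.

```python
import re

def kwic(text, keyword, width=3):
    """Keyword-in-context: each hit with up to `width` words on either side."""
    words = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left:>30} [{word}] {right}")
    return lines

sample = ("It is a truth universally acknowledged, that a single man in possession "
          "of a good fortune, must be in want of a wife.")
for line in kwic(sample, "a"):
    print(line)
```

Right-aligning the left context lines up every occurrence of the keyword in one column, the traditional concordance layout.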
flat database:
a classification scheme that is defined by independent properties/axes that can be described for each member of the set. A flat database can be contained in a single table.
relational database:
a classification scheme that is defined by a series of datasets (tables) and the described relationships among them. Tables are typically linked through a primary key, a unique index that may be expressive or inexpressive: a record in one table refers to a record in another by storing that record's primary key value. All related data tables and the relational rules are necessary to describe the database.
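The sketch below uses Python's built-in `sqlite3` module and an in-memory database to show two related tables. The table and column names (`authors`, `novels`, `author_id`) are hypothetical. Author names are stored once, each novel refers to its author by the author's primary key, and a query reassembles the two tables through that relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# Each author appears once; each novel points to its author's primary key.
cur.execute("CREATE TABLE authors (author_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE novels (
    novel_id INTEGER PRIMARY KEY,
    title TEXT,
    author_id INTEGER REFERENCES authors(author_id))""")

cur.executemany("INSERT INTO authors VALUES (?, ?)",
                [(1, "Jane Austen"), (2, "George Eliot")])
cur.executemany("INSERT INTO novels VALUES (?, ?, ?)",
                [(10, "Emma", 1), (11, "Persuasion", 1), (12, "Middlemarch", 2)])

# Reassemble the data through the relationship without duplicating author names.
rows = cur.execute("""SELECT novels.title, authors.name
                      FROM novels JOIN authors ON novels.author_id = authors.author_id
                      ORDER BY novels.title""").fetchall()
for title, name in rows:
    print(title, "-", name)
```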
independent relationship:
database relationship where records in multiple tables can be correlated but exist independently of each other; for example, deleting a record in one table does not affect related records in the other.
dependent relationship:
database relationship where integrity is enforced between records in two tables. For example, if a master record is deleted, all related records in the other table are also deleted.
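SQLite can enforce a dependent relationship with an `ON DELETE CASCADE` rule. In this Python sketch (hypothetical `students` and `enrollments` tables; student names and the course code ENG 210 are invented), deleting a student also deletes that student's enrollment records. SQLite enforces foreign-key rules only after `PRAGMA foreign_keys = ON` is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # turn on SQLite's foreign-key enforcement

conn.execute("CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE enrollments (
    student_id INTEGER REFERENCES students(student_id) ON DELETE CASCADE,
    course TEXT)""")

conn.execute("INSERT INTO students VALUES (1, 'Ada'), (2, 'Grace')")
conn.executemany("INSERT INTO enrollments VALUES (?, ?)",
                 [(1, "DMS 300"), (1, "ENG 210"), (2, "DMS 300")])

# Deleting the master record also deletes its dependent records.
conn.execute("DELETE FROM students WHERE student_id = 1")
remaining = conn.execute("SELECT student_id, course FROM enrollments").fetchall()
print(remaining)
```

With foreign-key enforcement left off (SQLite's usual default), the enrollment rows for student 1 would remain, which behaves like an independent relationship.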
field:
one cell of information in a record. A field has a specific content type.
content type:
the type of information that will be recorded for a specific property in a dataset. Each field of a database will have a content type. The possible content types will vary depending on the database system or programming/query language used to encode the data. However, some typical types are boolean, numeric (e.g. double, int), string/character, formatted (e.g. date, time).
boolean:
a content type that stores binary data (true/false; yes/no; 0/1). How the two values are represented depends on the database system or programming/query language used to encode the data.
numeric:
a content type that stores data as numeric values. Often there are specific subcategories such as short/long integer, double (decimal values). Mathematical operations can be performed with numeric values.
string/character:
a content type that stores data as a string of characters: for example, words/text or a collection of characters of any type. Some databases limit the length of string fields (typically the limit is 255 characters) or differentiate between a limited "string" field and an unlimited "text" field for longer strings of data. Mathematical operations cannot be performed using string fields, but there are many string handling functions that can be used to analyze data of this type.
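The difference between numeric and string content types is easy to see in Python, used here as a stand-in for a database (SQL systems differ in detail; standard SQL, for instance, joins strings with `||` rather than `+`). The page counts and title are illustrative.

```python
# The same digits stored as numbers and as strings behave differently.
pages_numeric = [120, 85]
pages_string = ["120", "85"]

print(pages_numeric[0] + pages_numeric[1])  # arithmetic: 205
print(pages_string[0] + pages_string[1])    # concatenation: 12085

# String fields instead support text-handling functions.
title = "Middlemarch: A Study of Provincial Life"
print(title.upper())
print(len(title))
print(title.split(":")[0])
```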
formatted:
a content type that requires a specific data format for data entry and display. The format type, such as a specific date or time format, is selected when the database is designed, and then typically a format mask will be provided to the user at the time of input. Data that does not match the required format is rejected.
format mask:
a data entry aid that shows users the required format for data entered in a formatted field. For example: ##/##/#### or ##:##xm. A format mask aids in data validation.
data validation:
A process for checking the data entered into a field before accepting the value in the database. Strong data validation rules ensure data consistency, but they can also lead to the loss of outlier data.
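A format mask such as ##/##/#### can be expressed as a regular expression, and validation can then check both the pattern and whether the value makes sense. This Python sketch assumes a US-style month/day/year date and a hypothetical function name, `validate_date`.

```python
import re
from datetime import datetime

# The mask ##/##/#### written as a regular expression: 2 digits / 2 digits / 4 digits.
DATE_MASK = re.compile(r"\d{2}/\d{2}/\d{4}")

def validate_date(entry):
    """Accept an entry only if it matches the mask AND is a real calendar date."""
    if not DATE_MASK.fullmatch(entry):
        return False
    try:
        datetime.strptime(entry, "%m/%d/%Y")  # assumes month/day/year order
    except ValueError:
        return False
    return True

for entry in ["03/14/1879", "3/14/1879", "02/30/2020", "circa 1880"]:
    print(entry, "->", "accepted" if validate_date(entry) else "rejected")
```

The last entry, "circa 1880," is exactly the kind of outlier data that strict validation rejects, even though it may be the most accurate information a cataloger has.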
controlled vocabulary:
using an agreed-upon set of terms for classification within the system to avoid duplication or orphan data. For example: Library of Congress (LoC) Subject Headings.
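A controlled vocabulary is often enforced by mapping variant spellings or synonyms to a single preferred term and flagging anything else for review. The terms in this Python sketch are invented for illustration and are not actual Library of Congress Subject Headings.

```python
# Hypothetical controlled vocabulary: preferred terms plus variants that map to them.
PREFERRED = {"Novels", "Poetry", "Drama"}
VARIANTS = {"novel": "Novels", "poems": "Poetry", "verse": "Poetry",
            "plays": "Drama", "theater": "Drama"}

def normalize_subject(term):
    """Map a free-text subject to its preferred term, or flag it for review."""
    if term in PREFERRED:
        return term
    return VARIANTS.get(term.strip().lower(), f"UNRECOGNIZED: {term}")

for raw in ["Poems", "Novels", "theater", "Graphic novels"]:
    print(raw, "->", normalize_subject(raw))
```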
index or identifier or primary key:
an indexing field whose value is unique for every member of the set. It may be expressive or inexpressive. For example, your Social Security Number is your primary key value as a citizen or authorized resident of the United States, and your "900 number" is your primary key value as a Lourdes University student. Other examples include a Dewey decimal number on a book in a library collection, or the title of a book in a collection of novels.
expressive notation:
the use of index terms that express meaning about the classification system being used. For example, the Dewey decimal system. Sometimes expressive systems are incompletely expressive; that is, some properties are signaled by the index, but others are not.
inexpressive notation:
the use of index terms that uniquely identify members of the class but do not signify any information about the classification system. For example, the book title in a collection of novels.
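The Dewey Decimal Classification is expressive at least at its first digit, which names one of ten main classes. The Python sketch below (hypothetical function `main_class`) reads that class straight from the number; a book title, by contrast, is inexpressive, since nothing in "Middlemarch" indicates that the book is literature.

```python
# The ten main classes of the Dewey Decimal Classification, keyed by first digit.
DEWEY_MAIN_CLASSES = {
    "0": "Computer science, information & general works",
    "1": "Philosophy & psychology",
    "2": "Religion",
    "3": "Social sciences",
    "4": "Language",
    "5": "Science",
    "6": "Technology",
    "7": "Arts & recreation",
    "8": "Literature",
    "9": "History & geography",
}

def main_class(dewey_number):
    """Expressive notation: the number itself reveals part of the classification."""
    return DEWEY_MAIN_CLASSES[dewey_number.strip()[0]]

print(main_class("823.8"))  # a number in the 800s
print(main_class("973"))    # a number in the 900s
```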
scope notes:
description of the scope of a class within a classification system. Sometimes the name of the class is sufficient on its own; when it is not, scope notes clarify what the class covers.