Larry Heck

Larry Paul Heck is the Rhesa Screven Farmer, Jr., Advanced Computing Concepts Chair, Georgia Research Alliance Eminent Scholar, Co-Executive Director of the Machine Learning Center and Professor at the Georgia Institute of Technology. His career spans many of the sub-disciplines of artificial intelligence, including conversational AI, speech recognition and speaker recognition, natural language processing, web search, online advertising and acoustics. He is best known for his role as a co-founder of the Microsoft Cortana Personal Assistant and his early work in deep learning for speech processing. == Education and career == Larry Heck was born in Havre, Montana. After receiving the Bachelor of Science in electrical engineering at Texas Tech University, he was admitted to graduate school at the Georgia Institute of Technology in 1986. Heck received the MSEE in 1989 and the PhD in 1991 under advisor Prof. James H. McClellan. From 1992 to 1998, he was a senior research engineer at SRI International with the Acoustics and Radar Technology Lab (ARTL) and Speech Technology and Research (STAR) Lab, and in 1998 joined Nuance Communications, serving as vice president of R&D. Funded by the US government's NSA and DARPA from 1995-1998, Heck led the SRI team that was the first to successfully create large-scale deep neural network (DNN) deep learning technology in the field of speech processing. The deep learning technology was used to win the 1998 National Institute of Standards and Technology Speaker Recognition evaluation. The approach trained a 5-layer deep neural network, with the first two layers used as a (learned) feature extractor. To stabilize the training of the DNN, a weight normalization method was used (later rediscovered in 2010 by Xavier, et.al). Heck deployed this DNN in 1999 with Nuance Communications at the Home Shopping Network, representing the first major industrial application of deep learning with over 100K Nuance Verifier voiceprints. From 2005 to 2008, he was vice president of search & advertising quality at Yahoo!. In 2008, Heck and Ron Brachman combined search & advertising quality with Yahoo! Research to form Yahoo! Labs. Beginning in 2009, he was the chief scientist of speech products at Microsoft. In this role, he established the vision, mission and long-range plan and hired the initial team to create Microsoft’s digital-personal-assistant Cortana. Heck was named a Microsoft Distinguished Engineer in 2012 and joined Microsoft Research that same year. In 2014, he joined Google as a principal research scientist, where he founded the deep learning-based conversational AI team "Deep Dialogue". The team works on advanced research for the Google Assistant. In 2017, Heck joined Samsung as SVP and co-head of global AI Research. In 2019, he became head of Bixby (virtual assistant) North America and the CEO of Viv Labs, an independent subsidiary of Samsung. In that same year, Heck led one of the first large scale deployments of Transformer-Based LLMs as part of the Bixby Categories launch at the 2019 Samsung Developer Conference. In 2021, Heck returned to the Georgia Institute of Technology as a Professor. == Awards and honors == Larry Heck was named Fellow of the Institute of Electrical and Electronics Engineers (IEEE) in 2016 for leadership in application of machine learning to spoken and text language processing. Heck was inducted as a Fellow of the National Academy of Inventors (NAI) in 2024. Heck received the 2017 Academy of Distinguished Engineering Alumni Award from the Georgia Institute of Technology. In the same year, he also received the Texas Tech University Whitacre College of Engineering Distinguished Engineer Award. Larry Heck has several best papers including the 2020 IEEE Signal Processing Society (SPS) Best Paper Award: “Using Recurrent Neural Networks for Slot Filling in Spoken Language Understanding” published in the IEEE/ACM Transactions on Audio, Speech, and Language Processing in March 2015, and the 2020 ACM Conference on Information and Knowledge Management (CIKM) Test of Time Award for the paper "Learning Deep Structured Semantic Models for Web Search using Clickthrough Data".

Outline of robotics

The following outline is provided as an overview of and topical guide to robotics: Robotics is a branch of mechanical engineering, electrical engineering and computer science that deals with the design, construction, operation, and application of robots, as well as computer systems for their control, sensory feedback, and information processing. These technologies deal with automated machines that can take the place of humans in dangerous environments or manufacturing processes, or resemble humans in appearance, behaviour, and or cognition. Many of today's robots are inspired by nature contributing to the field of bio-inspired robotics. The word "robot" was introduced to the public by Czech writer Karel Čapek in his play R.U.R. (Rossum's Universal Robots), published in 1920. The term "robotics" was coined by Isaac Asimov in his 1941 science fiction short-story "Liar!" == Nature of robotics == Robotics can be described as: An applied science – scientific knowledge transferred into a physical environment. A branch of computer science – A branch of electrical engineering – A branch of mechanical engineering – Research and development – A branch of technology – == Branches of robotics == Adaptive control – control method used by a controller which must adapt to a controlled system with parameters which vary, or are initially uncertain. For example, as an aircraft flies, its mass will slowly decrease as a result of fuel consumption; a control law is needed that adapts itself to such changing conditions. Aerial robotics – development of unmanned aerial vehicles (UAVs), commonly known as drones, aircraft without a human pilot aboard. Their flight is controlled either autonomously by onboard computers or by the remote control of a pilot on the ground or in another vehicle. Android science – interdisciplinary framework for studying human interaction and cognition based on the premise that a very humanlike robot (that is, an android) can elicit human-directed social responses in human beings. Anthrobotics – science of developing and studying robots that are either entirely or in some way human-like. Artificial intelligence – the intelligence of machines and the branch of computer science that aims to create it. Artificial neural networks – a mathematical model inspired by biological neural networks. Autonomous car – an autonomous vehicle capable of fulfilling the human transportation capabilities of a traditional car Autonomous research robotics – Bayesian network – BEAM robotics – a style of robotics that primarily uses simple analogue circuits instead of a microprocessor in order to produce an unusually simple design (in comparison to traditional mobile robots) that trades flexibility for robustness and efficiency in performing the task for which it was designed. Behavior-based robotics – the branch of robotics that incorporates modular or behavior based AI (BBAI). Bio-inspired robotics – making robots that are inspired by biological systems. Biomimicry and bio-inspired design are sometimes confused. Biomimicry is copying the nature while bio-inspired design is learning from nature and making a mechanism that is simpler and more effective than the system observed in nature. Biomimetic – see Bionics. Biomorphic robotics – a sub-discipline of robotics focused upon emulating the mechanics, sensor systems, computing structures and methodologies used by animals. Bionics – also known as biomimetics, biognosis, biomimicry, or bionical creativity engineering is the application of biological methods and systems found in nature to the study and design of engineering systems and modern technology. Biorobotics – a study of how to make robots that emulate or simulate living biological organisms mechanically or even chemically. Cloud robotics – is a field of robotics that attempts to invoke cloud technologies such as cloud computing, cloud storage, and other Internet technologies centered around the benefits of converged infrastructure and shared services for robotics. Cognitive robotics – views animal cognition as a starting point for the development of robotic information processing, as opposed to more traditional Artificial Intelligence techniques. Clustering – Computational neuroscience – study of brain function in terms of the information processing properties of the structures that make up the nervous system. Robot control – a study of controlling robots Robotics conventions – Data mining Techniques – Degrees of freedom – in mechanics, the degree of freedom (DOF) of a mechanical system is the number of independent parameters that define its configuration. It is the number of parameters that determine the state of a physical system and is important to the analysis of systems of bodies in mechanical engineering, aeronautical engineering, robotics, and structural engineering. Developmental robotics – a methodology that uses metaphors from neural development and developmental psychology to develop the mind for autonomous robots Digital control – a branch of control theory that uses digital computers to act as system controllers. Digital image processing – the use of computer algorithms to perform image processing on digital images. Dimensionality reduction – the process of reducing the number of random variables under consideration, and can be divided into feature selection and feature extraction. Distributed robotics – Electronic stability control – is a computerized technology that improves the safety of a vehicle's stability by detecting and reducing loss of traction (skidding). Evolutionary computation – Evolutionary robotics – a methodology that uses evolutionary computation to develop controllers for autonomous robots Extended Kalman filter – Flexible Distribution functions – Feedback control and regulation – Human–computer interaction – a study, planning and design of the interaction between people (users) and computers Human robot interaction – a study of interactions between humans and robots Intelligent vehicle technologies – comprise electronic, electromechanical, and electromagnetic devices - usually silicon micromachined components operating in conjunction with computer controlled devices and radio transceivers to provide precision repeatability functions (such as in robotics artificial intelligence systems) emergency warning validation performance reconstruction. Computer vision – Machine vision – Kinematics – study of motion, as applied to robots. This includes both the design of linkages to perform motion, their power, control and stability; also their planning, such as choosing a sequence of movements to achieve a broader task. Laboratory robotics – the act of using robots in biology or chemistry labs Robot learning – learning to perform tasks such as obstacle avoidance, control and various other motion-related tasks Direct manipulation interface – In computer science, direct manipulation is a human–computer interaction style which involves continuous representation of objects of interest and rapid, reversible, and incremental actions and feedback. The intention is to allow a user to directly manipulate objects presented to them, using actions that correspond at least loosely to the physical world. Manifold learning – Microrobotics – a field of miniature robotics, in particular mobile robots with characteristic dimensions less than 1 mm Motion planning – (a.k.a., the "navigation problem", the "piano mover's problem") is a term used in robotics for the process of detailing a task into discrete motions. Motor control – information processing related activities carried out by the central nervous system that organize the musculoskeletal system to create coordinated movements and skilled actions. Nanorobotics – the emerging technology field creating machines or robots whose components are at or close to the scale of a nanometer (10−9 meters). Passive dynamics – refers to the dynamical behavior of actuators, robots, or organisms when not drawing energy from a supply (e.g., batteries, fuel, ATP). Programming by Demonstration – an End-user development technique for teaching a computer or a robot new behaviors by demonstrating the task to transfer directly instead of programming it through machine commands. Quantum robotics – a subfield of robotics that deals with using quantum computers to run robotics algorithms more quickly than digital computers can. Rapid prototyping – automatic construction of physical objects via additive manufacturing from virtual models in computer aided design (CAD) software, transforming them into thin, virtual, horizontal cross-sections and then producing successive layers until the items are complete. As of June 2011, used for making models, prototype parts, and production-quality parts in relatively small numbers. Reinforcement learning – an area of machine learning in computer science, concerned with how an agent ought to take actions in an environment so as to maximize some notion of cumulative reward. Robot

Randomized benchmarking

Randomized benchmarking is an experimental method for measuring the average error rates of quantum computing hardware platforms. The protocol estimates the average error rates by implementing long sequences of randomly sampled quantum gate operations. Randomized benchmarking is the industry-standard protocol used by quantum hardware developers such as IBM and Google to test the performance of the quantum operations. The original theory of randomized benchmarking, proposed by Joseph Emerson and collaborators, considered the implementation of sequences of Haar-random operations, but this had several practical limitations. The now-standard protocol for randomized benchmarking (RB) relies on uniformly random Clifford operations, as proposed in 2006 by Dankert et al. as an application of the theory of unitary t-designs. In current usage randomized benchmarking sometimes refers to the broader family of generalizations of the 2005 protocol involving different random gate sets that can identify various features of the strength and type of errors affecting the elementary quantum gate operations. Randomized benchmarking protocols are an important means of verifying and validating quantum operations and are also routinely used for the optimization of quantum control procedures. == Overview == Randomized benchmarking offers several key advantages over alternative approaches to error characterization. For example, the number of experimental procedures required for full characterization of errors (called tomography) grows exponentially with the number of quantum bits (called qubits). This makes tomographic methods impractical for even small systems of just 3 or 4 qubits. In contrast, randomized benchmarking protocols are the only known approaches to error characterization that scale efficiently as number of qubits in the system increases. Thus RB can be applied in practice to characterize errors in arbitrarily large quantum processors. Additionally, in experimental quantum computing, procedures for state preparation and measurement (SPAM) are also error-prone, and thus quantum process tomography is unable to distinguish errors associated with gate operations from errors associated with SPAM. In contrast, RB protocols are robust to state-preparation and measurement errors Randomized benchmarking protocols estimate key features of the errors that affect a set of quantum operations by examining how the observed fidelity of the final quantum state decreases as the length of the random sequence increases. If the set of operations satisfies certain mathematical properties, such as comprising a sequence of twirls with unitary two-designs, then the measured decay can be shown to be an invariant exponential with a rate fixed uniquely by features of the error model. == History == Randomized benchmarking was proposed in Scalable noise estimation with random unitary operators, where it was shown that long sequences of quantum gates sampled uniformly at random from the Haar measure on the group SU(d) would lead to an exponential decay at a rate that was uniquely fixed by the error model. Emerson, Alicki and Zyczkowski also showed, under the assumption of gate-independent errors, that the measured decay rate is directly related to an important figure of merit, the average gate fidelity and independent of the choice of initial state and any errors in the initial state, as well as the specific random sequences of quantum gates. This protocol applied for arbitrary dimension d and an arbitrary number n of qubits, where d=2n. The SU(d) RB protocol had two important limitations that were overcome in a modified protocol proposed by Dankert et al., who proposed sampling the gate operations uniformly at random from any unitary two-design, such as the Clifford group. They proved that this would produce the same exponential decay rate as the random SU(d) version of the protocol proposed in Emerson et al.. This follows from the observation that a random sequence of gates is equivalent to an independent sequence of twirls under that group, as conjectured in and later proven in. This Clifford-group approach to Randomized Benchmarking is the now standard method for assessing error rates in quantum computers. A variation of this protocol was proposed by NIST in 2008 for the first experimental implementation of an RB-type for single qubit gates. However, the sampling of random gates in the NIST protocol was later proven not to reproduce any unitary two-design. The NIST RB protocol was later shown to also produce an exponential fidelity decay, albeit with a rate that depends on non-invariant features of the error model In recent years a rigorous theoretical framework has been developed for Clifford-group RB protocols to show that they work reliably under very broad experimental conditions. In 2011 and 2012, Magesan et al. proved that the exponential decay rate is fully robust to arbitrary state preparation and measurement errors (SPAM). They also proved a connection between the average gate fidelity and diamond norm metric of error that is relevant to the fault-tolerant threshold. They also provided evidence that the observed decay was exponential and related to the average gate fidelity even if the error model varied across the gate operations, so-called gate-dependent errors, which is the experimentally realistic situation. In 2018, Wallman and Dugas et al., showed that, despite concerns raised in, even under very strong gate-dependence errors the standard RB protocols produces an exponential decay at a rate that precisely measures the average gate-fidelity of the experimentally relevant errors. The results of Wallman. in particular proved that the RB error rate is so robust to gate-dependent errors models that it provides an extremely sensitive tool for detecting non-Markovian errors. This follows because under a standard RB experiment only non-Markovian errors (including time-dependent Markovian errors) can produce a statistically significant deviation from an exponential decay The standard RB protocol was first implemented for single qubit gate operations in 2012 at Yale on a superconducting qubit. A variation of this standard protocol that is only defined for single qubit operations was implemented by NIST in 2008 on a trapped ion. The first implementation of the standard RB protocol for two-qubit gates was performed in 2012 at NIST for a system of two trapped ions

Radio network

A radio network is a system that distributes radio signals to multiple receivers or enables two-way communication between stations and mobile units. Worldwide, radio networks include broadcast networks, such as BBC Radio in the United Kingdom and NPR in the United States, which transmit one-to-many signals for news, entertainment, and public information; two-way radio networks, used by police, fire services, taxicabs, and delivery fleets for operational communication; and cellular networks, such as Verizon, Vodafone, and China Mobile, which provide mobile telephony and data services using frequency or time division duplexing. While all rely on radio-frequency technology like transmitters, receivers, and antennas, their network architectures, protocols, and regulatory frameworks differ substantially across applications and regions. The two-way type of radio network shares many of the same technologies and components as the broadcast-type radio network but is generally set up with fixed broadcast points (transmitters) with co-located receivers and mobile receivers/transmitters or transceivers. In this way both the fixed and mobile radio units can communicate with each other over broad geographic regions ranging in size from small single cities to entire states/provinces or countries. There are many ways in which multiple fixed transmit/receive sites can be interconnected to achieve the range of coverage required by the jurisdiction or authority implementing the system: conventional wireless links in numerous frequency bands, fibre-optic links, or microwave links. In all of these cases the signals are typically backhauled to a central switch of some type where the radio message is processed and resent (repeated) to all transmitter sites where it is required to be heard. In contemporary two-way radio systems, a concept called trunking is commonly used to achieve better efficiency of radio spectrum use. It provides a very wide range of coverage, with no switching of channels required by the mobile radio user as it roams throughout the system coverage. Trunking of two-way radio is identical to the concept used for cellular phone systems where each fixed and mobile radio is specifically identified to the system controller and its operation is switched by the controller. == Broadcasting networks == The broadcast type of radio network is a network system which distributes radio programming to multiple stations simultaneously, or slightly delayed, for the purpose of extending total coverage beyond the limits of a single broadcast signal. The resulting expanded audience for radio programming or information essentially applies the benefits of mass-production to the broadcasting enterprise. A radio network has two sales departments, one to package and sell programs to radio stations, and one to sell the audience of those programs to advertisers. Most radio networks also produce much of their programming. Originally, radio networks owned some or all of the stations that broadcast the network's radio format programming. Presently however, there are many networks that do not own any stations and only produce and/or distribute programming. Similarly station ownership does not always indicate network affiliation. A company might own stations in several different markets and purchase programming from a variety of networks. Radio networks rose rapidly with the growth of regular broadcasting of radio to home listeners in the 1920s. This growth took various paths in different places. In Britain the BBC was developed with public funding, in the form of a broadcast receiver license, and a broadcasting monopoly in its early decades. In contrast, in the United States various competing commercial broadcasting networks arose funded by advertising revenue. In that instance, the same corporation that owned or operated the network often manufactured and marketed the listener's radio. Major technical challenges to be overcome when distributing programs over long distances are maintaining signal quality and managing the number of switching/relay points in the signal chain. Early on, programs were sent to remote stations (either owned or affiliated) by various methods, including leased telephone lines, pre-recorded gramophone records and audio tape. The world's first all-radio, non-wireline network was claimed to be the Rural Radio Network, a group of six upstate New York FM stations that began operation in June 1948. Terrestrial microwave relay, a technology later introduced to link stations, has been largely supplanted by coaxial cable, fiber, and satellite, which usually offer superior cost-benefit ratios. Many early radio networks evolved into television networks.

Creepiness

Creepiness is the state of being creepy, or causing an unpleasant feeling of fear or unease to someone and/or something. Certain traits or hobbies may make people seem creepy to others; interest in horror or the macabre might come across as 'creepy', and often people who are perverted or exhibit predatory behavior are called 'creeps'. The internet, especially some functions of social media, has been described as increasingly creepy. Adam Kotsko has compared the modern conception of creepiness to the Freudian concept of unheimlich. The term has also been used to describe paranormal or supernatural phenomena. Some people have phobias which are irrational fears, which can make them perceive something as creepy. == History and studies == "Creepiness" is subjective: for example some dolls have been described as creepy, while what makes something "creepy" or "strange" to someone might seem normal to someone else. The adjective "creepy", referring to a feeling of creeping in the flesh, was first used in 1831, but it was Charles Dickens who coined and popularized the term "the creeps" in his 1849 novel David Copperfield. In the 20th century, association was made between involuntary celibacy and creepiness. The concept of creepiness has only recently been formally addressed in social media marketing. The sensation of creepiness has only recently been the subject of psychological research, despite the widespread colloquial use of the word throughout the years. Francis T. McAndrew of Knox College is the first psychologist to do an empirical study on creepiness. == Causes == The state of creepiness has been associated with "feeling scared, nervous, anxious or worried", "awkward or uncomfortable", "vulnerable or violated" in a study conducted by Watt et al. This state arises in the presence of a creepy element, which can be an individual or, as recently observed, new technologies. === Individuals === Creepiness can be caused by the appearance of an individual. Another study investigated the characteristics that make people creepy. Creepy people were thought to be more often male than female by an overwhelming majority of participants (around 95% of both male and female participants). Another study conducted by Watt et al. also found that participants associated the ectomorphic body type (more linear) with creepiness, more than the other two body types (51% vs mesomorphic, 24% and endomorphic, 23%). Other cues of creepiness included low hygiene, especially according to female participants, and a disheveled appearance. Participants also identified the face as an area with potentially creepy features: in particular the eyes and the teeth. Both of those physical features were deemed creepy not only for their unpleasant appearance (ex. squinty eyes or crooked teeth) but also for the movements and expressions they engaged it (ex. darting eye movements and odd smiles). In fact, appearance does not seem to be the only factor making an individual creepy: behaviors provide cues as well. Behaviors such as "being unusually quiet and staring (34%), following or lurking (15%), behaving abnormally (21%), or in a socially awkward, "sketchy" or suspicious way (20%)" are all contributing to a feeling of creepiness, as described by Watt et al.'s study. === Technology === In addition to other individuals, new technologies, such as marketing's targeted ads and AI, have been qualified as creepy. A study by Moore et al. described what aspect of marketing participants considered creepy. The main three reasons are the following: using invasive tactics, causing discomfort and violating of norms. Invasive tactics are practiced by marketers that know so much about the consumer that the ads are "creepily" personalized. Secondly, some ads create discomfort by making the consumer question "the motives of the company advertising the product". Finally, some ads violate social norms by having inappropriate content, for example by unnecessarily sexualizing it. It is marketing's extensive knowledge used in an improper way, together with a certain loss of control over our data, that creates a feeling of creepiness. Another creepy aspect of technology is human-looking AI: this phenomenon is called the uncanny valley. Humans find robots creepy when they start closely resembling humans. It has been hypothesized that the reason why they are viewed as creepy is because they violate our notion of how a robot should look. A study focusing on children's responses to this phenomenon found evidence to support the hypothesis. == Evolutionary explanation == Several studies have hypothesized that creepiness is an evolutionary response to potentially dangerous situations. It could be linked to a mechanism called agent detection which makes individuals expect malignant agents to be responsible for small changes in the environment. McAndrew et al. illustrates the idea with the example of a person hearing some noises while walking in a dark alley. That person would go in high alert, fearing that some dangerous individual was there. If that was not the case the loss would be small. If, on the other hand, a dangerous individual was actually in the alley and the person had not been alerted by this creepy feeling, the loss could have been significant. Creepiness would therefore serve the purpose of alerting us in situations in which the danger is not outright obvious but rather ambiguous. In this case, ambiguity both refers to the possible presence of a threat and to its nature, sexual or physical for example. Creepiness "may reside in between the unknowing and the fear" in the sense that individuals experiencing it are unsure if there truly is something to fear or not. Creepy characteristics are not simply caused by threat potential: in fact, ectomorphic body types are not the most powerful bodies and facial expressions are not a proxy of physical strength either. Therefore, creepiness is not only related to how threatening a characteristic is, in the sense of how dangerous and strong the individual can be. There are more facets to consider. Another characteristic of creepiness is unpredictable behavior. Unpredictability links back to this idea of ambiguity. When an individual is unpredictable it is not possible to tell when their behavior will turn violent: this adds to the ambiguity of a potentially dangerous situation. This theory is endorsed by studies. Not only is unpredictability directly listed as a creepy characteristic, but other behaviors, such as norm-breaking behaviors are indirectly linked with unpredictability. Such behaviors show that the individual does not conform to some social standards others would expect in a given situation. For example, the aforementioned staring at strangers or lack of hygiene—behaviors that make us uneasy or creeped out because they do not fit the norm and therefore are not expected. More generally, participants tended to define creepiness as "different" in the sense of not behaving, or looking, socially acceptable. Such differences point towards a "social mismatch". Humans have a natural system of detection of such mismatch: a physical feeling of coldness. When an individual is creeped out, they report feeling those "cold chills". This phenomenon has been studied by Leander et al, with relation to nonverbal mimicry in social interactions, meaning the unintentional copying of another's behavior. Inappropriate mimicry may leave a person feeling like something is off about the other. Absence of non-verbal mimicry in a friendly interaction, or the presence of it in a professional setting, raises suspicion as it does not follow the relevant social norms. Individuals are left wondering what other unusual behavior the other might engage in.

Concept mining

Concept mining is an activity that results in the extraction of concepts from artifacts. Solutions to the task typically involve aspects of artificial intelligence and statistics, such as data mining and text mining. Because artifacts are typically a loosely structured sequence of words and other symbols (rather than concepts), the problem is nontrivial, but it can provide powerful insights into the meaning, provenance and similarity of documents. == Methods == Traditionally, the conversion of words to concepts has been performed using a thesaurus, and for computational techniques the tendency is to do the same. The thesauri used are either specially created for the task, or a pre-existing language model, usually related to Princeton's WordNet. The mappings of words to concepts are often ambiguous. Typically each word in a given language will relate to several possible concepts. Humans use context to disambiguate the various meanings of a given piece of text, where available machine translation systems cannot easily infer context. For the purposes of concept mining, however, these ambiguities tend to be less important than they are with machine translation, for in large documents the ambiguities tend to even out, much as is the case with text mining. There are many techniques for disambiguation that may be used. Examples are linguistic analysis of the text and the use of word and concept association frequency information that may be inferred from large text corpora. Recently, techniques that base on semantic similarity between the possible concepts and the context have appeared and gained interest in the scientific community. == Applications == === Detecting and indexing similar documents in large corpora === One of the spin-offs of calculating document statistics in the concept domain, rather than the word domain, is that concepts form natural tree structures based on hypernymy and meronymy. These structures can be used to generate simple tree membership statistics, that can be used to locate any document in a Euclidean concept space. If the size of a document is also considered as another dimension of this space then an extremely efficient indexing system can be created. This technique is currently in commercial use locating similar legal documents in a 2.5 million document corpus. === Clustering documents by topic === Standard numeric clustering techniques may be used in "concept space" as described above to locate and index documents by the inferred topic. These are numerically far more efficient than their text mining cousins, and tend to behave more intuitively, in that they map better to the similarity measures a human would generate.

Digital video recorder

A digital video recorder (DVR), also referred to as a personal video recorder (PVR) particularly in Canadian and British English, is an electronic device that records video in a digital format to a disk drive, USB flash drive, SD memory card, SSD or other local or networked mass storage device. The term includes set-top boxes (STB) with direct to disk recording, portable media players and TV gateways with recording capability, and digital camcorders. Personal computers can be connected to video capture devices and used as DVRs; in such cases the application software used to record video is an integral part of the DVR. Many DVRs are classified as consumer electronic devices. Similar small devices with built-in (~5 inch diagonal) displays and SSD support may be used for professional film or video production, as these recorders often do not have the limitations that built-in recorders in cameras have, offering wider codec support, the removal of recording time limitations and higher bitrates. == History == In the 1980s, prototype high-definition (HD) digital video recorders were developed by Fujitsu, Hitachi, Sanyo and Canon Inc. In 1985, Hitachi demonstrated a prototype digital video tape recorder (VTR) that used digital recording video tape as storage media to record digital HD video content. In 1987, the first commercial digital video recorder was the Sony DVR-1000, a digital video cassette recorder (VCR) that recorded digital video content on D-1 (Sony) digital video cassettes. === Hard-disk-based DVR === In early 1995, Tektronix introduced the "Profile" series PDR100 Video Disk Recorder, which recorded and played back video stored on hard disk as motion JPEG. In 1996, Sweden's TV4 used the PDR100 extensively in building a new facility in Stockholm, and NBC used PDR100s at the Olympic games in Atlanta Georgia. The Tektronix Profile disk recorder won an Engineering, Science & Technology Emmy Award for "Outstanding Achievement in Engineering Development" at the 1996 Primetime Emmy Awards. In 1997 the U.S. Patent Office granted Tektronix patent 5,642,497 for two claims key to Profile. In 1998, Tektronix introduced two Profile models which were combined VDRs and file servers: the PDR200 and PDR300. The PDR300 stored its compressed video as MPEG-2 (ISO/IEC 13818-2) A working disk-based DVR prototype was developed in 1998 at Stanford University Computer Science department. The DVR design was a chapter of Edward Y. Chang's PhD dissertation, supervised by Professors Hector Garcia-Molina and Jennifer Widom. Two design papers were published at the 1998 VLDB conference, and the 1999 ICDE conference. The prototype was developed in 1998 at Pat Hanrahan's CS488 class: Experiments in Digital Television, and the prototype was demoed to industrial partners including Sony, Intel, and Apple. Consumer digital video recorders ReplayTV and TiVo were launched at the 1999 Consumer Electronics Show in Las Vegas, Nevada. Microsoft also demonstrated a unit with DVR capability, but this did not become available until the end of 1999 for full DVR features in Dish Network's DISHplayer receivers. TiVo shipped their first units on March 31, 1999. ReplayTV won the "Best of Show" award in the video category with Netscape co-founder Marc Andreessen as an early investor and board member, but TiVo was more successful commercially. Ad Age cited Forrester Research as saying that market penetration by the end of 1999 was "less than 100,000". In 2001, Toshiba introduced a combination DVR that allows video recording on both DVD recordable and hard disk drive. Legal action by media companies forced ReplayTV to remove many features such as automatic commercial skip and the sharing of recordings over the Internet, but newer devices have steadily regained these functions while adding complementary abilities, such as recording onto DVDs and programming and remote control facilities using PDAs, networked PCs, and Web browsers. In contrast to VCRs, hard-disk based digital video recorders make "time shifting" more convenient and also allow for functions such as pausing live TV, instant replay, chasing playback (viewing a recording before it has been completed) and skipping over advertising during playback. Many DVRs use the MPEG format for compressing the digital video. Video recording capabilities have become an essential part of the modern set-top box, as TV viewers have wanted to take control of their viewing experiences. As consumers have been able to converge increasing amounts of video content on their set-tops, delivered by traditional 'broadcast' cable, satellite and terrestrial as well as IP networks, the ability to capture programming and view it whenever they want has become a must-have function for many consumers. === DVR tied to video service === At the 1999 CES, Dish Network demonstrated the hardware that would later have DVR capability with the assistance of Microsoft software, which also included access to the WebTV service. By the end of 1999 the Dishplayer had full DVR capabilities and within a year, over 200,000 units were sold. In the UK, digital video recorders are often referred to as "plus boxes" (such as BSKYB's Sky+ and Virgin Media's V+ which integrates an HD capability, and the subscription free Freesat+ and Freeview+). Freeview+ have been around in the UK since the late 2000s, although the platform's first DVR, the Pace Twin, dates to 2002. British Sky Broadcasting marketed a popular combined receiver and DVR as Sky+, now replaced by the Sky Q box. TiVo launched a UK model in 2000, and is no longer supported, except for third party services, and the continuation of TiVo through Virgin Media in 2010. South African based Africa Satellite TV beamer Multichoice recently launched their DVR which is available on their DStv platform. In addition to ReplayTV and TiVo, there are a number of other suppliers of digital terrestrial (DTT) DVRs, including Technicolor SA, Topfield, Fusion, Commscope, Humax, VBox Communications, AC Ryan Playon and Advanced Digital Broadcast (ADB). Many satellite, cable and IPTV companies are incorporating digital video recording functions into their set-top box, such as with DirecTiVo, DISHPlayer/DishDVR, Scientific Atlanta Explorer 8xxx from Time Warner, Total Home DVR from AT&T U-verse, Motorola DCT6412 from Comcast and others, Moxi Media Center by Digeo (available through Charter, Adelphia, Sunflower, Bend Broadband, and soon Comcast and other cable companies), or Sky+. Astro introduced their DVR system, called Astro MAX, which was the first PVR in Malaysia but was phased out two years after its introduction. In the case of digital television, there is no encoding necessary in the DVR since the signal is already a digitally encoded MPEG stream. The digital video recorder simply stores the digital stream directly to disk. Having the broadcaster involved with, and sometimes subsidizing, the design of the DVR can lead to features such as the ability to use interactive TV on recorded shows, pre-loading of programs, or directly recording encrypted digital streams. It can, however, also force the manufacturer to implement non-skippable advertisements and automatically expiring recordings. In the United States, the FCC has ruled that starting on July 1, 2007, consumers will be able to purchase a set-top box from a third-party company, rather than being forced to purchase or rent the set-top box from their cable company. This ruling only applies to "navigation devices", otherwise known as a cable television set-top box, and not to the security functions that control the user's access to the content of the cable operator. The overall net effect on digital video recorders and related technology is unlikely to be substantial as standalone DVRs are currently readily available on the open market. In Europe Free-To-Air and Pay TV TV gateways with multiple tuners have whole house recording capabilities allowing recording of TV programs to Network Attached Storage or attached USB storage, recorded programs are then shared across the home network to tablet, smartphone, PC, Mac, Smart TV. === Introduction of dual tuners === In 2003 many Satellite and Cable providers introduced dual-tuner digital video recorders. In the UK, BSkyB introduced their first PVR Sky+ with dual tuner support in 2001. These machines have two independent tuners within the same receiver. The main use for this feature is the capability to record a live program while watching another live program simultaneously or to record two programs at the same time, possibly while watching a previously recorded one. Kogan.com introduced a dual-tuner PVR in the Australian market allowing free-to-air television to be recorded on a removable hard drive. Some dual-tuner DVRs also have the ability to output to two separate television sets at the same time. The PVR manufactured by UEC (Durban, South Africa) and used by Multichoice and Scientific Atlanta 8300DVB PVR have the ability to view two