Enhancing Linguistic Search with the Google Books Ngram Viewer



Our book scanning effort, now in its eighth year, has put tens of millions of books online. Beyond the obvious benefits of being able to discover books and search through them, the project lets us take a step back and learn what the entire collection tells us about culture and language.

Launched in 2010 by Jon Orwant and Will Brockman, the Google Books Ngram Viewer lets you search for words and phrases over the centuries, in English, Chinese, Russian, French, German, Italian, Hebrew, and Spanish. It’s become popular for both casual explorations into language usage and serious linguistic research, and this summer we decided to provide some new ways to search with it.

With our interns Jason Mann, Lu Yang, and David Zhang, we’ve added three new features. The first is wildcards: by putting an asterisk as a placeholder in your query, you can retrieve the ten most popular replacements. For instance, what noun most often follows “Queen” in English fiction? The answer is “Elizabeth”:


This graph also reveals that the frequency of mentions of the most popular queens has been decreasing steadily over time. (Language expert Ben Zimmer shows some other interesting examples in his Atlantic article.) Right-clicking collapses all of the series into a sum, allowing you to see the overall change.

Another feature we’ve added is the ability to search for inflections: different grammatical forms of the same word. (Inflections of the verb “eat” include “ate”, “eating”, “eats”, and “eaten”.) Here, we can see that the phrase “changing roles” has recently surged in popularity in English fiction, besting “change roles”, which earlier dethroned “changed roles”:


Curiously, this switching doesn’t happen when we add non-fiction into the mix: “changing roles” is persistently on top, with an odd dip in the late 1980s. As with wildcards, right-clicking collapses and expands the data:


Finally, we’ve implemented the most common feature request from our users: the ability to search for multiple capitalization styles simultaneously. Until now, searching for common capitalizations of “Mother Earth” required using a plus sign to combine ngrams (e.g., “Mother Earth + mother Earth + mother earth”), but now the case-insensitive checkbox makes it easier:


As with our other two features, right-clicking toggles whether the variants are shown.
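If you want to pull these curves programmatically rather than through the web interface, the Viewer's page is backed by an unofficial JSON endpoint. The sketch below is not part of the announcement: the endpoint URL, its parameters, and the shape of its response are assumptions based on how the Viewer's own query URLs look, and they may change without notice.

# Rough sketch only: books.google.com/ngrams/json is unofficial and undocumented,
# so treat the parameter names and the response shape as assumptions.
# Corpus selection is omitted here and left to the endpoint's defaults.
$query  = [Uri]::EscapeDataString('Queen *')   # the wildcard query from this post
$url    = "https://books.google.com/ngrams/json?content=$query&year_start=1800&year_end=2000&smoothing=3"
$series = Invoke-RestMethod -Uri $url          # PowerShell 3.0+ parses the JSON response
$series | ForEach-Object { $_.ngram }          # list the replacements the wildcard expanded to

If the response looks as assumed, each returned element also carries a year-by-year frequency series, so the curves can be re-plotted locally.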

We hope these features help you discover and share interesting trends in language use!

PowerShell v2 Finding a Hashtable Member Directly by Referencing Its Name

While working on a script to find stray folders, I posted on TechNet to get some help with referencing switch values:
Can I reference the switch name from within its block?
The script relies on a set of date ranges defined in a hashtable:
# Store volume names (keys) and date ranges (values, as two-element [DateTime] arrays)
$volumedateranges = @{
  Vol1 = @([DateTime] "2001-01-01", [DateTime] "2001-12-31");
  Vol2 = @([DateTime] "2002-01-01", [DateTime] "2002-12-31");
}
When the script runs, I dynamically reference a hashtable member by searching through the hashtable to get its key (or name, depending on what you call it). Once I have this value, I pass it to the Where clause that sets $startdate and $enddate in the listing below.

# Get logical disks
Get-WmiObject -Class Win32_LogicalDisk | ForEach-Object {
 # Capture the current disk; inside nested script blocks (switch, Where) $_ is rebound
 $disk = $_

 switch ($disk.VolumeName)
 {
  Vol1
  {
   # Set dates by enumerating the hashtable and matching on its key
   $startdate = ($volumedateranges.GetEnumerator() | Where {$_.Key -match $disk.VolumeName}).Value[0]
   $enddate   = ($volumedateranges.GetEnumerator() | Where {$_.Key -match $disk.VolumeName}).Value[1]

   # Get directories from the switched drive whose date falls outside the range
   # (Get-FolderDate is a helper function defined elsewhere in the original script)
   Get-ChildItem -Path "$($disk.DeviceID)*Data*" |
   Where {$_.PSIsContainer -and (((Get-FolderDate $_.FullName) -lt $startdate) -or ((Get-FolderDate $_.FullName) -gt $enddate))} |
   Select-Object FullName
  }
  Vol2
  {
   # Set dates by enumerating the hashtable and matching on its key
   $startdate = ($volumedateranges.GetEnumerator() | Where {$_.Key -match $disk.VolumeName}).Value[0]
   $enddate   = ($volumedateranges.GetEnumerator() | Where {$_.Key -match $disk.VolumeName}).Value[1]

   # Get directories from the switched drive whose date falls outside the range
   Get-ChildItem -Path "$($disk.DeviceID)*Data*" |
   Where {$_.PSIsContainer -and (((Get-FolderDate $_.FullName) -lt $startdate) -or ((Get-FolderDate $_.FullName) -gt $enddate))} |
   Select-Object FullName
  }
 }
}
mjolinor pointed out that I can bypass the enumeration and use the key (name) directly in my hashtable reference with this syntax:
$startdate = $volumedateranges[$_.VolumeName][0]
$enddate = $volumedateranges[$_.VolumeName][1]
Why enumerate when I can just index directly? Much less work, more direct, and cleaner for sure. Thanks again, hammer man.
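Putting that together, here is a rough sketch of how the whole loop could look with the direct lookup; the switch statement is no longer needed, because the hashtable key is the volume name itself (Get-FolderDate is still assumed to be the helper function from the original script):
# Sketch only: assumes every volume of interest has an entry in $volumedateranges
# and that Get-FolderDate (from the original script) returns a [DateTime] for a folder.
Get-WmiObject -Class Win32_LogicalDisk |
 Where-Object { $_.VolumeName -and $volumedateranges.ContainsKey($_.VolumeName) } |
 ForEach-Object {
  $disk      = $_
  $startdate = $volumedateranges[$disk.VolumeName][0]
  $enddate   = $volumedateranges[$disk.VolumeName][1]

  # Report folders whose date falls outside the volume's range
  Get-ChildItem -Path "$($disk.DeviceID)*Data*" |
   Where-Object { $_.PSIsContainer -and
     (((Get-FolderDate $_.FullName) -lt $startdate) -or ((Get-FolderDate $_.FullName) -gt $enddate)) } |
   Select-Object FullName
 }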

PowerShell v3 Get a Numbered List of Cmdlets

Nothing special. Just a quick way to get a list of the cmdlets available to a shell. Great for reference when you want a controlled, ordered list of something:
$a = 1
Get-Command -CommandType Cmdlet |
ForEach-Object { "$($a): $($_.Name)"; $a++ }
will output this:
1:Add-BitsFile
2:Add-Computer
3:Add-Content
4:Add-History
5:Add-JobTrigger
6:Add-Member
7:Add-PSSnapin
8:Add-RoleMember
9:Add-SqlAvailabilityDatabase
10:Add-SqlAvailabilityGroupListenerStaticIp
11:Add-Type
12:Backup-ASDatabase
13:Backup-SqlDatabase
14:Checkpoint-Computer
15:Clear-Content
16:Clear-EventLog
17:Clear-History
18:Clear-Item
19:Clear-ItemProperty
20:Clear-Variable
21:Compare-Object
22:Complete-BitsTransfer
23:Complete-Transaction
24:Connect-PSSession
25:Connect-WSMan
26:ConvertFrom-Csv
27:ConvertFrom-Json
28:ConvertFrom-SecureString
29:ConvertFrom-StringData
30:Convert-Path
31:ConvertTo-Csv
32:ConvertTo-Html
33:ConvertTo-Json
34:ConvertTo-SecureString
35:ConvertTo-Xml
36:Convert-UrnToPath
37:Copy-Item
38:Copy-ItemProperty
39:Debug-Process
40:Decode-SqlName
41:Disable-ComputerRestore
42:Disable-JobTrigger
43:Disable-PSBreakpoint
44:Disable-PSRemoting
45:Disable-PSSessionConfiguration
46:Disable-ScheduledJob
47:Disable-SqlAlwaysOn
48:Disable-WSManCredSSP
49:Disconnect-PSSession
50:Disconnect-WSMan
51:Enable-ComputerRestore
52:Enable-JobTrigger
53:Enable-PSBreakpoint
54:Enable-PSRemoting
55:Enable-PSSessionConfiguration
56:Enable-ScheduledJob
57:Enable-SqlAlwaysOn
58:Enable-WSManCredSSP
59:Encode-SqlName
60:Enter-PSSession
61:Exit-PSSession
62:Export-Alias
63:Export-Clixml
64:Export-Console
65:Export-Counter
66:Export-Csv
67:Export-FormatData
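As a small variation (not from the original post), the same numbering can be done with the -f format operator, which also lets you right-align the index:
# Variation: number and right-align the cmdlet list with the -f format operator
$i = 0
Get-Command -CommandType Cmdlet |
ForEach-Object { $i++; '{0,3}: {1}' -f $i, $_.Name }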

Computer Hard Drive Drivers Download

Company: Drivers page
APS Tech: APS Tech hard drive drivers
Conner: Conner hard drive drivers
Fujitsu: Fujitsu hard drive drivers
Hitachi: Hitachi hard drive drivers
IBM: IBM hard drive drivers
Maxtor: Maxtor hard drive drivers
Quantum: Quantum hard drive drivers
Seagate: Seagate hard drive drivers
Western Digital: Western Digital hard drive drivers

Fujitsu LifeBook U810 Tablet PC

As a Tablet PC, the Fujitsu LifeBook U810 is remarkably compact, and its 5.6-inch WSVGA screen does not compromise viewing quality despite the small size. With a robust chassis design and a quite flexible keyboard, much like the LifeBook T2010 series, the U810 remains a wise choice for outdoor activities and highly mobile use.

Fujitsu LifeBook U810 Tablet PC

Here is a brief specification of the Fujitsu LifeBook U810:
  1. Intel A110 800MHz Processor
  2. Intel 945GU Express Chipset
  3. Windows Vista Business with Microsoft Office OneNote 2007
  4. 5.6" WSVGA Crystal View display with passive touch screen
  5. 1GB DDR2 400MHz SDRAM memory
  6. 40GB hard disk (4200 rpm)
  7. 10/100 Ethernet (with included cable)
  8. Atheros Super AG Wireless LAN (802.11a/b/g)
  9. Integrated Bluetooth wireless
  10. Integrated fingerprint sensor and embedded TPM
  11. 1 x USB port
  12. SD card slot
  13. Type I/II CompactFlash slot
  14. Microphone and headphone jacks
Overall, the Fujitsu LifeBook U810 is a worthwhile option to consider in the Tablet PC class. Internet connectivity is not a problem for the U810, except in areas with weak signals. The built-in Bluetooth makes it easy to pair a mouse, which helps given the U810's very small size. A VGA output is also available, allowing the Fujitsu LifeBook U810 to be connected to an external monitor. Battery endurance is quite promising at approximately 5.5 hours.

What about you? Would you consider the Fujitsu LifeBook U810 Tablet PC?

Google Handwriting Input in 82 languages on your Android mobile device



Entering text on mobile devices is still considered inconvenient by many; touchscreen keyboards, although much improved over the years, require a lot of attention to hit the right buttons. Voice input is an option, but there are situations where it is not feasible, such as in a noisy environment or during a meeting. Handwriting can offer a natural and intuitive input method for text entry that complements typing and speech. However, until recently there were many languages for which enabling this functionality presented significant challenges.

Today we launched Google Handwriting Input, which lets users handwrite text on their Android mobile device as an additional input method for any Android app. Google Handwriting Input supports 82 languages in 20 distinct scripts, and works with both printed and cursive writing input with or without a stylus. Beyond text input, it also provides a fun way to enter hundreds of emojis by drawing them (simply press and hold the ‘enter’ button to switch modes). Google Handwriting Input works with or without an Internet connection.
By building on large-scale language modeling, robust multi-language OCR, and incorporating large-scale neural networks and approximate nearest neighbor search for character classification, Google Handwriting Input supports languages that can be challenging to type on a virtual keyboard. For example, keyboards for ideographic languages (such as Chinese) are often based on a particular dialect of the language, and they can be hard to use for someone who does not know that dialect. Additionally, keyboards for complex-script languages (like many South Asian languages) are less standardized and may be unfamiliar. Even for languages where virtual keyboards are more widely used (like English or Spanish), some users find that handwriting is more intuitive, faster, and generally more comfortable.
Writing Hello in Chinese, German, and Tamil.
Google Handwriting Input is the result of many years of research at Google. Initially, cloud based handwriting recognition supported the Translate Apps on Android and iOS, Mobile Search, and Google Input Tools (in Chrome, ChromeOS, Gmail and Docs, translate.google.com, and the Docs symbol picker). However, other products required recognizers to run directly on an Android device without an Internet connection. So we worked to make recognition models smaller and faster for use in Android handwriting input methods for Simplified and Traditional Chinese, Cantonese, and Hindi, as well as multi-language support in Gesture Search. Google Handwriting Input combines these efforts, allowing recognition both on-device and in the cloud (by tapping on the cloud icon) in any Android app.

You can install Google Handwriting Input from the Play Store here. More information and FAQs can be found here.

A picture is worth a thousand coherent words: building a natural description of images



“Two pizzas sitting on top of a stove top oven”
“A group of people shopping at an outdoor market”
“Best seats in the house”

People can summarize a complex scene in a few words without thinking twice. It’s much more difficult for computers. But we’ve just gotten a bit closer -- we’ve developed a machine-learning system that can automatically produce captions (like the three above) to accurately describe images the first time it sees them. This kind of system could eventually help visually impaired people understand pictures, provide alternate text for images in parts of the world where mobile connections are slow, and make it easier for everyone to search on Google for images.

Recent research has greatly improved object detection, classification, and labeling. But accurately describing a complex scene requires a deeper representation of what’s going on in the scene, capturing how the various objects relate to one another and translating it all into natural-sounding language.
Automatically captioned: “Two pizzas sitting on top of a stove top oven”
Many efforts to construct computer-generated natural descriptions of images propose combining current state-of-the-art techniques in both computer vision and natural language processing to form a complete image description approach. But what if we instead merged recent computer vision and language models into a single jointly trained system, taking an image and directly producing a human readable sequence of words to describe it?

This idea comes from recent advances in machine translation between languages, where a Recurrent Neural Network (RNN) transforms, say, a French sentence into a vector representation, and a second RNN uses that vector representation to generate a target sentence in German.

Now, what if we replaced that first RNN and its input words with a deep Convolutional Neural Network (CNN) trained to classify objects in images? Normally, the CNN’s last layer is used in a final Softmax among known classes of objects, assigning a probability that each object might be in the image. But if we remove that final layer, we can instead feed the CNN’s rich encoding of the image into an RNN designed to produce phrases. We can then train the whole system directly on images and their captions, so it maximizes the likelihood that descriptions it produces best match the training descriptions for each image.
The model combines a vision CNN with a language-generating RNN so it can take in an image and generate a fitting natural-language caption.
Our experiments with this system on several openly published datasets, including Pascal, Flickr8k, Flickr30k and SBU, show how robust the qualitative results are -- the generated sentences are quite reasonable. It also performs well in quantitative evaluations with the Bilingual Evaluation Understudy (BLEU), a metric used in machine translation to evaluate the quality of generated sentences.
A selection of evaluation results, grouped by human rating.
A picture may be worth a thousand words, but sometimes it’s the words that are most useful -- so it’s important we figure out ways to translate from images to words automatically and accurately. As the datasets suited to learning image descriptions grow and mature, so will the performance of end-to-end approaches like this. We look forward to continuing developments in systems that can read images and generate good natural-language descriptions. To get more details about the framework used to generate descriptions from images, as well as the model evaluation, read the full paper here.