Planet Topic Maps

December 03, 2016

Patrick Durusau

Identifying Speech/News Writers

David Smith’s post: Stylometry: Identifying authors of texts using R details the use of R to distinguish tweets by president-elect Donald Trump from his campaign staff. (Hmmm, sharing a Twitter account password, there’s bad security for you.)

The same techniques may distinguish texts delivered “live” from those “inserted” into the Congressional Record.

What other texts are ripe for distinguishing authors?

From the post:

Few people expect politicians to write every word they utter themselves; reliance on speechwriters and spokespersons is a long-established political practice. Still, it's interesting to know which statements are truly the politician's own words, and which are driven primarily by advisors or influencers.

Recently, David Robinson established a way of figuring out which tweets from Donald Trump's Twitter account came from him personally, as opposed to from campaign staff, which he verified by comparing the sentiment of tweets from Android vs iPhone devices. Now, Ali Arsalan Kazmi has used stylometric analysis to investigate the provenance of speeches by the Prime Minister of Pakistan.

A small amount of transparency can go a long way.

Email archives anyone?
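
For a feel of what stylometric comparison involves, here is a minimal Python sketch. It is not the method from Smith's post (which uses R). It assumes a small hand-picked list of function words and a plain Manhattan distance between frequency profiles, and the sample texts and author labels are made up for illustration.

```python
import re
from collections import Counter

# Assumption: a tiny, hand-picked list of function words; real studies use far more.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "is", "it", "for"]

def profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]

def distance(p, q):
    """Manhattan distance between two frequency profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))

def closest_author(unknown, samples):
    """Return the label whose sample text has the nearest profile."""
    target = profile(unknown)
    return min(samples, key=lambda name: distance(target, profile(samples[name])))

# Hypothetical sample texts, for illustration only.
samples = {
    "author_a": "the cat and the dog sat in the house of the king",
    "author_b": "it is a fact that it is for a reason that it is so",
}
unknown = "the horse and the cart stood in the yard of the inn"
print(closest_author(unknown, samples))
```

With texts this short the result means little; the point is only the shape of the comparison: profile each known author, profile the unknown text, pick the nearest.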

by Patrick Durusau at December 03, 2016 01:44 AM

December 02, 2016

Patrick Durusau

War and Peace & R

No, not a post about R versus Python but about R and Tolstoy’s War and Peace.

Using R to Gain Insights into the Emotional Journeys in War and Peace by Wee Hyong Tok.

From the post:

How do you read a novel in record time, and gain insights into the emotional journey of main characters, as they go through various trials and tribulations, as an exciting story unfolds from chapter to chapter?

I remembered my experiences when I start reading a novel, and I get intrigued by the story, and simply cannot wait to get to the last chapter. I also recall many conversations with friends on some of the interesting novels that I have read awhile back, and somehow have only vague recollection of what happened in a specific chapter. In this post, I’ll work through how we can use R to analyze the English translation of War and Peace.

War and Peace is a novel by Leo Tolstoy, and captures the salient points about Russian history from the period 1805 to 1812. The novel consists of the stories of five families, and captures the trials and tribulations of various characters (e.g. Natasha and Andre). The novel consists of about 1400 pages, and is one of the longest novels that have been written.

We hypothesize that if we can build a dashboard (shown below), this will allow us to gain insights into the emotional journey undertaken by the characters in War and Peace.

Impressive work, even though I would not use it as a short-cut to “read a novel in record time.”

Rather I take this as an alternative way of reading War and Peace, one that can capture insights a casual reader may miss.

Moreover, the techniques demonstrated here could be used with other works of literature, or even non-fictional works.

Imagine conducting this analysis over the reportedly more than 7,000 page full CIA Torture Report, for example.

A heatmap does not connect any dots, but points a user towards places where interesting dots may be found.

Certainly a tool for exploring large releases/leaks of text data.
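
A rough idea of the per-chapter scoring that could sit behind such a heatmap, sketched in Python rather than R. The word lists are tiny, hypothetical stand-ins for a real sentiment lexicon, and the chapter-splitting pattern assumes headings such as "CHAPTER I" on a line of their own. Neither is taken from Tok's post.

```python
import re

# Assumption: toy word lists standing in for a real sentiment lexicon.
POSITIVE = {"joy", "love", "happy", "hope", "peace"}
NEGATIVE = {"war", "death", "fear", "grief", "pain"}

def split_chapters(text):
    """Split on lines that look like 'CHAPTER <label>' (an assumed heading format)."""
    parts = re.split(r"(?m)^CHAPTER[ \t]+\w+[ \t]*$", text)
    return [p.strip() for p in parts if p.strip()]

def score(chapter):
    """Positive word count minus negative word count."""
    words = re.findall(r"[a-z]+", chapter.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def emotional_arc(text):
    """One score per chapter, in reading order."""
    return [score(c) for c in split_chapters(text)]

# Made-up two-chapter text, for illustration only.
sample = (
    "CHAPTER I\n"
    "There was joy and love and hope.\n"
    "CHAPTER II\n"
    "Then came war, fear and death.\n"
)
print(emotional_arc(sample))
```

The resulting list of scores is the kind of series a heatmap would color: it does not explain anything, it only shows where to look.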


PS: Large, tiresome, obscure-on-purpose, government reports to practice on with this method?

by Patrick Durusau at December 02, 2016 10:13 PM

December 01, 2016

Patrick Durusau

OSS-Fuzz: Continuous fuzzing for open source software

Announcing OSS-Fuzz: Continuous fuzzing for open source software

From the post:

We are happy to announce OSS-Fuzz, a new Beta program developed over the past years with the Core Infrastructure Initiative community. This program will provide continuous fuzzing for select core open source software.

Open source software is the backbone of the many apps, sites, services, and networked things that make up “the internet.” It is important that the open source foundation be stable, secure, and reliable, as cracks and weaknesses impact all who build on it.

Recent security stories confirm that errors like buffer overflow and use-after-free can have serious, widespread consequences when they occur in critical open source software. These errors are not only serious, but notoriously difficult to find via routine code audits, even for experienced developers. That’s where fuzz testing comes in. By generating random inputs to a given program, fuzzing triggers and helps uncover errors quickly and thoroughly.

In recent years, several efficient general purpose fuzzing engines have been implemented (e.g. AFL and libFuzzer), and we use them to fuzz various components of the Chrome browser. These fuzzers, when combined with Sanitizers, can help find security vulnerabilities (e.g. buffer overflows, use-after-free, bad casts, integer overflows, etc), stability bugs (e.g. null dereferences, memory leaks, out-of-memory, assertion failures, etc) and sometimes even logical bugs.

OSS-Fuzz’s goal is to make common software infrastructure more secure and stable by combining modern fuzzing techniques with scalable distributed execution. OSS-Fuzz combines various fuzzing engines (initially, libFuzzer) with Sanitizers (initially, AddressSanitizer) and provides a massive distributed execution environment powered by ClusterFuzz.
… (emphasis in original)
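
The "generating random inputs" idea in the announcement can be sketched in a few lines of Python. This is a naive random fuzzer, not AFL, libFuzzer, or anything from OSS-Fuzz. The target `parse_record` is a hypothetical toy parser with a deliberate bug, included only so the fuzzer has something to find.

```python
import random

def parse_record(data: bytes) -> bytes:
    """Toy parser (hypothetical): first byte is a declared payload length."""
    if not data:
        raise ValueError("empty input")  # expected, well-behaved rejection
    length = data[0]
    # Deliberate bug: trusts the declared length without checking the buffer.
    _last = data[length]  # IndexError when length >= len(data)
    return data[1:1 + length]

def fuzz(target, runs=1000, seed=0):
    """Feed random byte strings to target and collect unexpected exceptions."""
    rng = random.Random(seed)  # seeded so any crash can be reproduced
    crashes = []
    for _ in range(runs):
        size = rng.randrange(0, 8)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except ValueError:
            pass  # the parser rejecting bad input is not a bug
        except Exception as exc:
            crashes.append((data, exc))
    return crashes

crashes = fuzz(parse_record)
print(len(crashes), "crashing inputs found")
```

Real fuzzing engines add coverage feedback and input mutation, and Sanitizers catch memory errors that a Python exception handler never would. The loop above only shows the basic shape.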

Another similarity between open and closed source software.

Closed source software is continuously being fuzzed.

By volunteers.

Yes? ;-)

One starting place for more information: Effective file format fuzzing by Mateusz “j00ru” Jurczyk (Black Hat Europe 2016, London) and his website:

by Patrick Durusau at December 01, 2016 08:16 PM

If You Don’t Get A Quantum Computer For Christmas

Learn Quantum Mechanics with Haskell by Scott N. Walck.


To learn quantum mechanics, one must become adept in the use of various mathematical structures that make up the theory; one must also become familiar with some basic laboratory experiments that the theory is designed to explain. The laboratory ideas are naturally expressed in one language, and the theoretical ideas in another. We present a method for learning quantum mechanics that begins with a laboratory language for the description and simulation of simple but essential laboratory experiments, so that students can gain some intuition about the phenomena that a theory of quantum mechanics needs to explain. Then, in parallel with the introduction of the mathematical framework on which quantum mechanics is based, we introduce a calculational language for describing important mathematical objects and operations, allowing students to do calculations in quantum mechanics, including calculations that cannot be done by hand. Finally, we ask students to use the calculational language to implement a simplified version of the laboratory language, bringing together the theoretical and laboratory ideas.

You won’t find a quantum computer under your Christmas tree this year.

But Haskell + Walck will teach you the basics of quantum mechanics.
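
In the same spirit of simulating a simple experiment, here is a small Python sketch (not Haskell, and not taken from Walck's paper). A spin-1/2 particle prepared "up" along one axis and then measured along an axis tilted by an angle theta comes out "up" with probability cos^2(theta/2), a standard textbook result. The function names are my own.

```python
import math
import random

def prob_up(theta):
    """Probability of 'up' when measuring at angle theta (radians) from the preparation axis."""
    return math.cos(theta / 2) ** 2

def simulate(theta, trials=10000, seed=1):
    """Fraction of simulated measurements that come out 'up'."""
    rng = random.Random(seed)
    ups = sum(rng.random() < prob_up(theta) for _ in range(trials))
    return ups / trials

for degrees in (0, 90, 180):
    theta = math.radians(degrees)
    print(degrees, round(prob_up(theta), 3), simulate(theta))
```

At 0 degrees every measurement agrees with the preparation, at 180 degrees none do, and at 90 degrees the outcomes split roughly evenly.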

You may also want to read:

Structure and Interpretation of Quantum Mechanics – a Functional Framework (2003) by Jerzy Karczmarczuk.

You will have to search for it but “Gerald Jay Sussman & Jack Wisdom (2013): Functional Differential Geometry. The MIT Press.” is out on the net somewhere.

Very tough sledding but this snippet from the preface may tempt you into buying a copy:

But the single biggest difference between our treatment and others is that we integrate computer programming into our explanations. By programming a computer to interpret our formulas we soon learn whether or not a formula is correct. If a formula is not clear, it will not be interpretable. If it is wrong, we will get a wrong answer. In either case we are led to improve our program and as a result improve our understanding. We have been teaching advanced classical mechanics at MIT for many years using this strategy. We use precise functional notation and we have students program in a functional language. The students enjoy this approach and we have learned a lot ourselves. It is the experience of writing software for expressing the mathematical content and the insights that we gain from doing it that we feel is revolutionary. We want others to have a similar experience.

If that interests you, check out courses by Sussman at MIT OpenCourseWare.


by Patrick Durusau at December 01, 2016 06:38 PM

Recycling Old News – NPR Station WMOT

Avoiding “fake” news, NPR station WMOT is recycling “old news.”


Looking for a recent article on combining multiple sources of DNA I found:

Combining The DNA Of Three People Raises Ethical Questions by Rob Stein, Nov. 10, 2014.


In a darkened lab in the north of England, a research associate is intensely focused on the microscope in front of her. She carefully maneuvers a long glass tube that she uses to manipulate early human embryos.

“It’s like microsurgery,” says Laura Irving of Newcastle University.

Irving is part of a team of scientists trying to replace defective DNA with healthy DNA. They hope this procedure could one day help women who are carrying genetic disorders have healthy children.

Compare that post to:

Combining The DNA Of Three People Raises Ethical Questions by Rob Stein, 22 hours ago.


In a darkened lab in the north of England, a research associate is intensely focused on the microscope in front of her. She carefully maneuvers a long glass tube that she uses to manipulate early human embryos.

“It’s like microsurgery,” says Laura Irving of Newcastle University.

Irving is part of a team of scientists trying to replace defective DNA with healthy DNA. They hope this procedure could one day help women who are carrying genetic disorders have healthy children.

I took a screenshot that includes WMOT and the article title, and saved the page, just in case this example of “news” reporting goes away through the magic of silent correction.

At least to me, two-year-old news isn’t the same as news from 22 hours ago.


PS: The loss of credibility by the media has been entirely self-inflicted. See media coverage of the 2016 presidential race for example. Why would anyone trust a news source that was so badly wrong?

Hard work, good journalism, timely reporting, all of those are the elements needed for the media to regain credibility. Credible journalists don’t attempt to suppress “fake news.” Attempts to suppress “fake news” signal a lack of commitment to credible journalism. Credible journalism doesn’t notice “fake news.”

by Patrick Durusau at December 01, 2016 05:06 PM

Internet Censor(s) Spotted in Mirror

How to solve Facebook’s fake news problem: experts pitch their ideas by Nicky Woolf.

From the post:

The impact of fake news, propaganda and misinformation has been widely scrutinized since the US election. Fake news actually outperformed real news on Facebook during the final weeks of the election campaign, according to an analysis by Buzzfeed, and even outgoing president Barack Obama has expressed his concerns.

But a growing cadre of technologists, academics and media experts are now beginning the quixotic process of trying to think up solutions to the problem, starting with a rambling 100+ page open Google document set up by Upworthy founder Eli Pariser.

Woolf captures the essential wrongness of the now 120 pages of suggestions, quoting Claire Wardle:

“The biggest challenge is who wants to be the arbiter of truth and what truth is,” said Claire Wardle, research director for the Tow Center for Digital Journalism at Columbia University. “The way that people receive information now is increasingly via social networks, so any solution that anybody comes up with, the social networks have to be on board.”

Don’t worry, selecting the arbiter of truth and what truth is won’t be difficult.

The authors of these suggestions see their favorite candidate every day:


So long as they aren’t seeing my image (substitute your name/image) in the mirror, I’m not interested in any censorship proposal.

Personally, even if offered the post of Internet Censor, I would turn it down.

I can’t speak for you but I am unable to be equally impartial to all. Nor do I trust anyone else to be equally impartial.

The “solution” to “fake news,” if you think that is a meaningful term, is more news, not less.

Enable users to easily compare and contrast news sources, if they so choose. Freedom means being free to make mistakes as well as good choices (from some point of view).

by Patrick Durusau at December 01, 2016 02:04 AM

November 30, 2016

Patrick Durusau

Constitution Free Zone [The Only Advantage To Not Living In Hawaii]

Know Your Rights: The Government’s 100-Mile “Border” Zone – Map

From the post:

Many people think that border-related policies impact only people living in border towns like El Paso or San Diego. The reality is that Border Patrol’s interior enforcement operations encroach deep into and across the United States, affecting the majority of Americans.

Roughly two-thirds of the United States’ population, about 200 million people, lives within the 100-mile zone that an outdated federal regulation defines as the border zone—that is, within 100 miles of a U.S. land or coastal border.

Although this zone is not literally “Constitution free”—constitutional protections do still apply—the Border Patrol frequently ignores those protections and runs roughshod over individuals’ civil liberties.

Learn more about the government’s 100-mile border zone.

Read the ACLU factsheet on Customs and Border Protection’s 100-mile zone


The ACLU map demonstrates there are no locations in Hawaii where the border zone does not reach.

Now you can name the one advantage of living outside of Hawaii, just in case it comes up on Jeopardy.


In some ways, this map is misleading.

The U.S. government runs roughshod over everyone within and without its borders.

Ask the people of Aleppo for tales of the American government. A city rumored to have been founded in the 6th millennium BCE may be about to become the largest graveyard in history.

Be sure to mention that on holiday cards to the Obama White House.

by Patrick Durusau at November 30, 2016 10:35 PM

Urgent: Update Your Tor Browser [Today, Yes, Today] + Aside on shallow bugs

Tor Browser 6.0.7 is released

From the webpage:

Tor Browser 6.0.7 is now available from the Tor Browser Project page and also from our distribution directory.

This release features an important security update to Firefox and contains, in addition to that, an update to NoScript (

The security flaw responsible for this urgent release is already actively exploited on Windows systems. Even though there is currently, to the best of our knowledge, no similar exploit for OS X or Linux users available the underlying bug affects those platforms as well. Thus we strongly recommend that all users apply the update to their Tor Browser immediately. A restart is required for it to take effect.

Tor Browser users who had set their security slider to “High” are believed to have been safe from this vulnerability.

We will have alpha and hardened Tor Browser updates out shortly. In the meantime, users of these series can mitigate the security flaw in at least two ways:

1) Set the security slider to “High” as this is preventing the exploit from working.
2) Switch to the stable series until updates for alpha and hardened are available, too.

Here is the full changelog since 6.0.6:

  • All Platforms
    • Update Firefox to 45.5.1esr
    • Update NoScript to

A reminder from the Tor project that:

many eyes make all bugs shallow

is marketing talk for open source, nothing more.

For more on that theme: Linus’s Law aka “Many Eyes Make All Bugs Shallow” by Jeff Jones.

A little over 10 years old now, predating HeartBleed for example, but still an interesting read.

I am and remain an open source advocate but not on the basis of false claims of bug finding. Open source improves your chances of finding spyware. No guarantees but open source improves your chances.

Why any government or enterprise would run closed source software is a mystery to me. Upload all your work to the NSA on a weekly basis. With uploads you create a reminder of your risk, which is missing with non-open source software.

by Patrick Durusau at November 30, 2016 08:58 PM

Hacking Journalists (Of self-protection)

Inside the mind of digital attackers: Part 1 — The connection by Justin Kosslyn.

From the post:

John has a target: name, country, brief context, and maybe the email address or website. John has been given a goal: maybe eavesdropping, taking a website offline, or stealing intellectual property. And John has been given constraints: maybe he cannot risk detection, or he has to act within 24 hours, or he cannot reach out to the state-owned telecommunications company for help.

John is a government-backed digital attacker. He sits in an office building somewhere, at a desk. Maybe this is the job he wanted when he was growing up, or maybe it was a way to pay the bills and stretch his technical muscles. He probably has plans for the weekend.

Let’s say, for the sake of this example, that John’s target is Henry, in the same country as John. John’s goal is to copy all the information on Henry’s computer without being detected. John can get help from other government agencies. There’s no rush.

The first thing to realize is that John, like most people, is a busy guy. He’s not going to do more work than necessary. First, he’ll try to use traditional, straightforward techniques — nothing fancy — and only if those methods fail will he try to be more creative with his attack.

The start of an interesting series from Jigsaw:

A technology incubator at Alphabet that tackles geopolitical problems.

Justin proposes to take us inside the mind of hackers who target journalists.

Understanding the enemy and their likely strategies is a starting place for effective defense/protection.

My only caveat is the description of John as a …government-backed digital attacker….

He could be, and that increases John’s range of tools, but don’t premise any defense on attackers being government-backed.

There are only two types of people in the world:

  1. People who are attacking your system.
  2. People who have not yet attacked your system.

Any sane and useful security policy accounts for both.

I’m looking forward to the next installment in this series.

by Patrick Durusau at November 30, 2016 08:28 PM

1 Million Compromised Google Accounts – 86 Gooligan Infected Apps – In Sort Order

“Gooligan” Android Malware Compromised 1 Million Google Accounts by Bogdan Popa.

From the post:

Security experts at Check Point have discovered a new very aggressive form of Android malware that already compromised no less than 1 million Google accounts and which can infect approximately 74 percent of the Android phones currently on the market.

The firm warns that the malware which they call Gooligan is injected into a total of 86 Android apps that are delivered through third-party marketplaces (you can check the full list of apps in the box at the end of the article). Once installed, these apps root the phone to get full access to the device and then attempt to deploy malicious software which can be used to steal authentication tokens for Google accounts.

This pretty much gives the attackers full control over the targeted Google accounts, and as long as vulnerable phones have Gmail, Google Drive, Google Chrome, YouTube, Google Photos, or any other Google app that can be used with an account, there’s a big chance that the attack is successful.
…(emphasis in original)

You can check to see if your account has been breached: Gooligan Checker.

The article also lists 86 Gooligan infected apps, in no particular order. (Rhetorical questions: Why do people make it difficult for readers? What is their payoff?)

To save you from digging through and possibly missing an infected app, here are the 86 Gooligan infected apps in dictionary order:

  • แข่งรถสุดโหด
  • Assistive Touch
  • ballSmove_004
  • Battery Monitor
  • Beautiful Alarm
  • Best Wallpapers
  • Billiards
  • Blue Point
  • CakeSweety
  • Calculator
  • Chrono Marker
  • Clean Master
  • Clear
  • com.browser.provider
  • com.example.ddeo
  • com.fabullacop.loudcallernameringtone
  • Compass Lite
  • Daily Racing
  • Demm
  • Demo
  • Demoad
  • Detecting instrument
  • Dircet Browser
  • Fast Cleaner
  • Fingerprint unlock
  • Flashlight Free
  • Fruit Slots
  • gla.pev.zvh
  • Google
  • GPS
  • GPS Speed
  • Hip Good
  • HotH5Games
  • Hot Photo
  • Html5 Games
  • Kiss Browser
  • KXService
  • Light Advanced
  • Light Browser
  • memory booste
  • memory booster
  • Memory Booster
  • Minibooster
  • Multifunction Flashlight
  • Music Cloud
  • OneKeyLock
  • Pedometer
  • Perfect Cleaner
  • phone booster
  • PornClub
  • PronClub
  • Puzzle Bubble-Pet Paradise
  • QPlay
  • SettingService
  • Sex Cademy
  • Sex Photo
  • Sexy hot wallpaper
  • Shadow Crush
  • Simple Calculator
  • Slots Mania
  • Small Blue Point
  • SmartFolder
  • Smart Touch
  • Snake
  • So Hot
  • StopWatch
  • Swamm Browser
  • System Booster
  • Talking Tom 3
  • TcashDemo
  • Test
  • Touch Beauty
  • tub.ajy.ics
  • UC Mini
  • Virtual
  • Weather
  • Wifi Accelerate
  • WiFi Enhancer
  • Wifi Master
  • Wifi Speed Pro
  • YouTube Downloader
  • youtubeplayer
  • 小白点
  • 清理大师
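
If you would rather sort such a list yourself than trust mine, here is a minimal Python sketch. The sort key is an assumption on my part: it ignores case and whitespace, which reproduces the ordering of the Latin-script names above. Non-Latin names are ordered by code point, so they may land in different positions than a locale-aware collation would put them.

```python
def dictionary_key(name):
    """Sort key that ignores case and whitespace (an assumed notion of 'dictionary order')."""
    return "".join(name.split()).casefold()

def sort_apps(names):
    # sorted() is stable, so names with identical keys keep their input order.
    return sorted(names, key=dictionary_key)

# A few names from the list above, deliberately shuffled.
sample = [
    "Hot Photo",
    "Wifi Master",
    "HotH5Games",
    "memory booster",
    "Clear",
    "Memory Booster",
    "com.browser.provider",
]
for app in sort_apps(sample):
    print(app)
```

For a full multilingual list, a proper collation library would be the better tool; the key function above is only a quick approximation.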

by Patrick Durusau at November 30, 2016 07:48 PM

Visualizing XML Schemas

I don’t have one of the commercial XML packages at the moment and was casting about for a free visualization technique for a large XML schema when I encountered:


I won’t be trying it on my schema until tomorrow but I thought it looked interesting enough to pass along.

Further details: Visualizing Complex Content Models with Spatial Schemas by Joe Pairman.

This looks almost teachable.


Other “free” visualization tools to suggest?

by Patrick Durusau at November 30, 2016 01:51 AM

November 29, 2016

Patrick Durusau

Gab – Censorship Lite?

I submitted my email today at Gab and got this message:

Done! You’re #1320420 in the waiting list.

Only three rules:

Illegal Pornography

We have a zero tolerance policy against illegal pornography. Such material will be instantly removed and the owning account will be dealt with appropriately per the advice of our legal counsel. We reserve the right to ban accounts that share such material. We may also report the user to local law enforcement per the advice of our legal counsel.

Threats and Terrorism

We have a zero tolerance policy for violence and terrorism. Users are not allowed to make threats of, or promote, violence of any kind or promote terrorist organizations or agendas. Such users will be instantly removed and the owning account will be dealt with appropriately per the advice of our legal counsel. We may also report the user to local and/or federal law enforcement per the advice of our legal counsel.

What defines a ‘terrorist organization or agenda’? Any group that is labelled as a terrorist organization by the United Nations and/or United States of America classifies as a terrorist organization on Gab.

Private Information

Users are not allowed to post other’s confidential information, including but not limited to, credit card numbers, street numbers, SSNs, without their expressed authorization.

If Gab is listening, I can get the rules down to one:

Court Ordered Removal

When Gab receives a court order from a court of competent jurisdiction ordering the removal of identified, posted content, at (service address), the posted, identified content will be removed.

Simple, fair, gets Gab and its staff out of the censorship business and provides a transparent remedy.

At no cost to Gab!

What’s there not to like?

Gab should review my posts: Monetizing Hate Speech and False News and Preserving Ad Revenue With Filtering (Hate As Renewal Resource), while it is in closed beta.

Twitter and Facebook can keep spending uncompensated time and effort trying to be universal and fair censors. Gab has the opportunity to reach up and grab those $100 bills flying overhead for filtered news services.

What is the New York Times if not an opinionated and poorly run filter on all the possible information it could report?

Apply that same lesson to social media!

PS: Seriously, before going public, I would go to the one court-based rule on content. There’s no profit and no wins in censoring any content on your own. Someone will always want more or less. Courts get paid to make those decisions.

Check with your lawyers but if you don’t look at any content, you can’t be charged with constructive notice of it. Unless and until someone points it out, then you have to follow DMCA, court orders, etc.

by Patrick Durusau at November 29, 2016 10:52 PM

Spies in the Skies [Fostered by Obama, Inherited by Trump]

Spies in the Skies by Peter Aldhous and Charles Seife.

Posted in April of 2016, it reads in part:

Each weekday, dozens of U.S. government aircraft take to the skies and slowly circle over American cities. Piloted by agents of the FBI and the Department of Homeland Security (DHS), the planes are fitted with high-resolution video cameras, often working with “augmented reality” software that can superimpose onto the video images everything from street and business names to the owners of individual homes. At least a few planes have carried devices that can track the cell phones of people below. Most of the aircraft are small, flying a mile or so above ground, and many use exhaust mufflers to mute their engines — making them hard to detect by the people they’re spying on.

The government’s airborne surveillance has received little public scrutiny — until now. BuzzFeed News has assembled an unprecedented picture of the operation’s scale and sweep by analyzing aircraft location data collected by the flight-tracking website Flightradar24 from mid-August to the end of December last year, identifying about 200 federal aircraft. Day after day, dozens of these planes circled above cities across the nation.

The FBI and the DHS would not discuss the reasons for individual flights but told BuzzFeed News that their planes are not conducting mass surveillance.

The DHS said that its aircraft were involved with securing the nation’s borders, as well as targeting drug smuggling and human trafficking, and may also be used to support investigations by the FBI and other law enforcement agencies. The FBI said that its planes are only used to target suspects in specific investigations of serious crimes, pointing to a statement issued in June 2015, after reporters and lawmakers started asking questions about FBI surveillance flights.

“It should come as no surprise that the FBI uses planes to follow terrorists, spies, and serious criminals,” said FBI Deputy Director Mark Giuliano, in that statement. “We have an obligation to follow those people who want to hurt our country and its citizens, and we will continue to do so.”

I’m not surprised the FBI follows terrorists, spies, and serious criminals.

What’s problematic is that the FBI follows all of us and then, after the fact, picks out alleged terrorists, spies and serious criminals.

The FBI could just as easily select people on their way to a tryst with a government official’s wife, or to attend an AA meeting, or to attend an unpopular church.

Once collected, the resulting information is subject to any number of uses and abuses.

Aldhous and Seife report the flights drop 70% on the weekend so if you are up to mischief, plan around your weekends.

When writing about the inevitable surveillance excesses under President Trump, give credit to President Obama and his supporters, who built the surveillance state Trump inherited.

by Patrick Durusau at November 29, 2016 06:48 PM

Trump, Twitter and Bullying The Press

Jay Smooth tweeted yesterday:

Keep in mind the purpose of this clown show: the President-Elect of the United States is using twitter to single out & bully a journalist.

Attaching an image that contained tweets 5 through 8 from the following list:

  1. “Nobody should be allowed to burn the American flag – if they do, there must be consequences – perhaps loss of citizenship or year in jail!”
  2. “I thought that @CNN would get better after they failed so badly in their support of Hillary Clinton however, since election, they are worse!”
  3. “The Great State of Michigan was just certified as a Trump WIN giving all of our MAKE AMERICA GREAT AGAIN supporters another victory – 306!”
  4. “@CNN is so embarrassed by their total (100%) support of Hillary Clinton, and yet her loss in a landslide, that they don’t know what to do.”
  5. “@sdcritic: @HighonHillcrest @jeffzeleny @CNN There is NO QUESTION THAT #voterfraud did take place, and in favor of #CorruptHillary !”
  6. “@FiIibuster: @jeffzeleny Pathetic – you have no sufficient evidence that Donald Trump did not suffer from voter fraud, shame! Bad reporter.”
  7. ‘”@JoeBowman12: @jeffzeleny just another generic CNN part time wannabe journalist !” @CNN still doesn’t get it. They will never learn!’
  8. “@HighonHillcrest: @jeffzeleny what PROOF do u have DonaldTrump did not suffer from millions of FRAUD votes? Journalist? Do your job! @CNN”
  9. “Just met with General Petraeus–was very impressed!”
  10. “If Cuba is unwilling to make a better deal for the Cuban people, the Cuban/American people and the U.S. as a whole, I will terminate deal.”

Can Trump bully @jeffzeleny if Jeff and the press aren’t listening?

Suppose Jeff filters @realDonaldTrump, excluding any tweets that mention @jeffzeleny, and subscribes to a similar filter covering all journalists’ Twitter handles.

His feed from @realDonaldTrump now reads:

  1. “Nobody should be allowed to burn the American flag – if they do, there must be consequences – perhaps loss of citizenship or year in jail!”
  2. “I thought that @CNN would get better after they failed so badly in their support of Hillary Clinton however, since election, they are worse!”
  3. “The Great State of Michigan was just certified as a Trump WIN giving all of our MAKE AMERICA GREAT AGAIN supporters another victory – 306!”
  4. “@CNN is so embarrassed by their total (100%) support of Hillary Clinton, and yet her loss in a landslide, that they don’t know what to do.”
  5. “Just met with General Petraeus–was very impressed!”
  6. “If Cuba is unwilling to make a better deal for the Cuban people, the Cuban/American people and the U.S. as a whole, I will terminate deal.”

Trump’s tweets still contain enough material for a stand-up routine by a comic or the front page of a newspaper.

On the other hand, shareable user filters starve Trump (and other bullies) of the ability to be bullies.
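
How little it takes can be shown with a minimal Python sketch of a shareable filter. Nothing here is a Twitter API. The filter is just a list of handles serialized as JSON (the field names are my own invention), applied to a list of tweet texts taken from the feed above.

```python
import json
import re

MENTION = re.compile(r"@(\w+)")

def mentions(tweet):
    """Lower-cased handles mentioned in a tweet."""
    return {m.lower() for m in MENTION.findall(tweet)}

def apply_filter(tweets, blocked_handles):
    """Keep only tweets that mention none of the blocked handles."""
    blocked = {h.lstrip("@").lower() for h in blocked_handles}
    return [t for t in tweets if not (mentions(t) & blocked)]

# A filter is only data, so sharing it is a matter of passing JSON around.
# The "name" and "blocked" fields are hypothetical.
shared = json.dumps({"name": "journalists", "blocked": ["@jeffzeleny"]})
received = json.loads(shared)

tweets = [
    "@CNN is so embarrassed by their total (100%) support of Hillary Clinton, and yet her loss in a landslide, that they don’t know what to do.",
    "@FiIibuster: @jeffzeleny Pathetic – you have no sufficient evidence that Donald Trump did not suffer from voter fraud, shame! Bad reporter.",
    "Just met with General Petraeus–was very impressed!",
]
filtered = apply_filter(tweets, received["blocked"])
for t in filtered:
    print(t)
```

A subscriber only needs the JSON; adding every journalist's handle to the `blocked` list gives the press-wide filter described above.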

Why isn’t Twitter doing something as dead simple as user filters that can be shared?

You would have to ask Twitter that question, I certainly don’t know.

by Patrick Durusau at November 29, 2016 02:23 PM

CIA Cartography [Comparison to other maps?]

CIA Cartography

From the webpage:

Tracing its roots to October 1941, CIA’s Cartography Center has a long, proud history of service to the Intelligence Community (IC) and continues to respond to a variety of finished intelligence map requirements. The mission of the Cartography Center is to provide a full range of maps, geographic analysis, and research in support of the Agency, the White House, senior policymakers, and the IC at large. Its chief objectives are to analyze geospatial information, extract intelligence-related geodata, and present the information visually in creative and effective ways for maximum understanding by intelligence consumers.

Since 1941, the Cartography Center maps have told the stories of post-WWII reconstruction, the Suez crisis, the Cuban Missile crisis, the Falklands War, and many other important events in history.

There you will find:

Cartography Tools 211 photos

Cartography Maps 1940s 22 photos

Cartography Maps 1950s 14 photos

Cartography Maps 1960s 16 photos

Cartography Maps 1970s 19 photos

Cartography Maps 1980s 12 photos

Cartography Maps 1990s 16 photos

Cartography Maps 2000s 16 photos

Cartography Maps 2010s 15 photos

The albums have this motto at the top:

CIA Cartography Center has been making vital contributions to our Nation’s security, providing policymakers with crucial insights that simply cannot be conveyed through words alone.

President-elect Trump is said to be gaining foreign intelligence from sources other than his national security briefings. Trump is ignoring daily intelligence briefings, relying on ‘a number of sources’ instead. That report is based on a Washington Post account, which puts its credibility somewhere between a conversation overheard in a laundromat and a stump speech by a member of Congress.

Assuming Trump is gaining intelligence from other sources, just how good are other sources of intelligence?

This release of maps by the CIA, some 160 maps spread from the 1940’s to the 2010’s, provides one axis for evaluating CIA intelligence versus what was commonly known at the time.

As a starting point, may I suggest: Image matching for historical maps comparison by C. Balletti and F. Guerra, e-Perimetron, Vol. 4, No. 3, 2009 [180-186] | ISSN 1790-3769?


In cartographic heritage we suddenly find maps of the same mapmaker and of the same area, published in different years, or new editions due to integration of cartographic, such us in national cartographic series. These maps have the same projective system and the same cut, but they present very small differences. The manual comparison can be very difficult and with uncertain results, because it’s easy to leave some particulars out. It is necessary to find an automatic procedure to compare these maps and a solution can be given by digital maps comparison.

In the last years our experience in cartographic data processing was opted for find new tools for digital comparison and today solution is given by a new software, ACM (Automatic Correlation Map), which finds areas that are candidate to contain differences between two maps. ACM is based on image matching, a key component in almost any image analysis process.

Interesting paper but it presupposes a closeness of the maps that is likely to be missing when comparing CIA maps to other maps of the same places and time period.
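
The core idea of flagging areas that are candidates to contain differences can be sketched in a few lines. This is not ACM, whose internals the abstract does not describe; it is a minimal Python illustration that assumes the two maps are already scanned, registered to one another and reduced to equal-size grayscale grids. The function name, tile size and threshold are all hypothetical.

```python
def candidate_tiles(map_a, map_b, tile=2, threshold=10.0):
    """Return (row, col) origins of tiles whose mean absolute
    pixel difference exceeds `threshold`.

    map_a, map_b: equal-size 2D lists of grayscale values (assumed
    to be already aligned; this sketch does no registration).
    """
    rows, cols = len(map_a), len(map_a[0])
    if rows != len(map_b) or any(len(r) != cols for r in map_a + map_b):
        raise ValueError("maps must be rectangular and the same size")

    flagged = []
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            # Collect per-pixel differences inside this tile.
            diffs = [
                abs(map_a[r][c] - map_b[r][c])
                for r in range(r0, min(r0 + tile, rows))
                for c in range(c0, min(c0 + tile, cols))
            ]
            if sum(diffs) / len(diffs) > threshold:
                flagged.append((r0, c0))
    return flagged


# Two 4x4 "maps" that differ in a single pixel in the lower right.
first = [[0] * 4 for _ in range(4)]
second = [row[:] for row in first]
second[3][3] = 200
print(candidate_tiles(first, second))
```

The registration assumption is exactly the closeness that CIA maps and other maps of the same place are unlikely to share, which is why a pixel-level comparison alone will not be enough.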

I am in the process of locating other tools for map comparison.

Any favorites you would like to suggest?

by Patrick Durusau at November 29, 2016 03:15 AM

November 28, 2016

Patrick Durusau

False News: Trump and the Emoluments Clause

Numerous false news accounts are circulating about president-elect Trump and the Emoluments Clause.

The story line is that Trump must divest himself of numerous businesses to avoid violating the “Emoluments Clause” of the U.S. Constitution. But when you read the Emoluments Clause:

Clause 8. No Title of Nobility shall be granted by the United States: And no Person holding any Office of Profit or Trust under them, shall, without the Consent of the Congress accept of any present, Emolument, Office, or Title, of any kind whatever, from any King, Prince, or foreign State.

that conclusion is far from clear.

Why would it say: “…without the Consent of the Congress…”?

That question was answered in 1871 and sheds light on the issue of today:

In 1871 the Attorney General of the United States ruled that: “A minister of the United States abroad is not prohibited by the Constitution from rendering a friendly service to a foreign power, even that of negotiating a treaty for it, provided he does not become an officer of that power . . . but the acceptance of a formal commission, as minister plenipotentiary, creates an official relation between the individual thus commissioned and the government which in this way accredits him as its representative,” which is prohibited by this clause of the Constitution. 2013

ftnt: 2013 13 Ops. Atty. Gen. 538 (1871).

All of that is from: Constitution Annotated | Library of Congress, in particular:

If you read the Emoluments Clause to prohibit Trump from representing another government, unless Congress consents, it makes sense as written.

Those falsely claiming that Trump must divest himself of his business interests and/or put them in a blind trust under the Emoluments Clause (Laurence Tribe comes to mind) are thinking of a tradition of presidents using blind trusts.

But tradition doesn’t amend the Constitution.

Any story saying that the Emoluments Clause compels president-elect Trump to divest himself of assets and/or use a blind trust is false.

PS: I have admired Prof. Laurence Tribe’s work for years and am saddened that he is willing to sully his reputation in this way.

by Patrick Durusau at November 28, 2016 03:07 AM

November 27, 2016

Patrick Durusau

Ulysses, Joyce and Stanford CoreNLP

Introduction to memory and time usage

From the webpage:

People not infrequently complain that Stanford CoreNLP is slow or takes a ton of memory. In some configurations this is true. In other configurations, this is not true. This section tries to help you understand what you can or can’t do about speed and memory usage. The advice applies regardless of whether you are running CoreNLP from the command-line, from the Java API, from the web service, or from other languages. We show command-line examples here, but the principles are true of all ways of invoking CoreNLP. You will just need to pass in the appropriate properties in different ways. For these examples we will work with chapter 13 of Ulysses by James Joyce. You can download it if you want to follow along.

You have to appreciate the use of a non-trivial text for advice on speed and memory usage of CoreNLP.

How does your text stack up against Chapter 13 of Ulysses?

I’m supposed to be reading Ulysses long distance with a friend. I’m afraid we have both fallen behind. Perhaps this will encourage me to have another go at it.

What favorite or “should read” text would you use to practice with CoreNLP?


by Patrick Durusau at November 27, 2016 01:37 AM

November 26, 2016

Patrick Durusau

Programming has Ethical Consequences?

Has anyone tracked down the blinding flash that programming has ethical consequences?

Programmers are charged to point out ethical dimensions and issues not noticed by muggles.

This may come as a surprise but programmers in the broader sense have been aware of ethical dimensions to programming for decades.

Perhaps the best known example of a road to Damascus type event is the Trinity atomic bomb test in New Mexico, with Oppenheimer recalling a line from the Bhagavad Gita:

“Now I am become Death, the destroyer of worlds.”

To say nothing of the programmers who labored for years to guarantee worldwide delivery of nuclear warheads in 30 minutes or less.

But it isn’t necessary to invoke a nuclear Armageddon to find ethical issues that have faced programmers prior to the current ethics frenzy.

Any guesses as to how red line maps were created?

Do you think “red line” maps just sprang up on their own? Or was someone collecting, collating and analyzing the data, much as we would do now but more slowly?

Every act of collecting, collating and analyzing data, now with computers, can and probably does have ethical dimensions and issues.

Programmers can and should raise ethical issues, especially when they may be obscured or clouded by programming techniques or practices.

However, programmers announcing ethical issues to their less fortunate colleagues isn’t likely to lead to a fruitful discussion.

by Patrick Durusau at November 26, 2016 01:38 AM

November 24, 2016

Patrick Durusau

China Gets A Facebook Filter, But Not You

Facebook ‘quietly developing censorship tool’ for China by Bill Camarda.

From the post:

That’s one take on the events that might have led to today’s New York Times expose: it seems Facebook has tasked its development teams with “quietly develop[ing] software to suppress posts from appearing in people’s news feeds in specific geographic areas”.

As “current and former Facebook employees” told the Times, Facebook wouldn’t do the suppression themselves, nor need to. Rather:

It would offer the software to enable a third party – in this case, most likely a partner Chinese company – to monitor popular stories and topics that bubble up as users share them across the social network… Facebook’s partner would then have full control to decide whether those posts should show up in users’ feeds.

This is a step beyond the censorship Facebook has already agreed to perform on behalf of governments such as Turkey, Russia and Pakistan. In those cases, Facebook agreed to remove posts that had already “gone live”. If this software were in use, offending posts could be halted before they ever appeared in a local user’s news feed.

You can’t filter your own Facebook timeline or share your filter with other Facebook users, but the Chinese government can filter the timelines of 721,000,000+ internet users?

My proposal for Facebook filters would generate income for Facebook, filter writers and enable the 3,600,000,000+ internet users around the world to filter their own content.

All of Zuckerberg’s ideas:

Stronger detection. The most important thing we can do is improve our ability to classify misinformation. This means better technical systems to detect what people will flag as false before they do it themselves.

Easy reporting. Making it much easier for people to report stories as fake will help us catch more misinformation faster.

Third party verification. There are many respected fact checking organizations and, while we have reached out to some, we plan to learn from many more.

Warnings. We are exploring labeling stories that have been flagged as false by third parties or our community, and showing warnings when people read or share them.

Related articles quality. We are raising the bar for stories that appear in related articles under links in News Feed.

Disrupting fake news economics. A lot of misinformation is driven by financially motivated spam. We’re looking into disrupting the economics with ads policies like the one we announced earlier this week, and better ad farm detection.

Listening. We will continue to work with journalists and others in the news industry to get their input, in particular, to better understand their fact checking systems and learn from them.

Enthrone Zuckerberg as Censor of the Internet.

His blinding lust to be Censor of the Internet* is responsible for Zuckerberg passing up $millions if not $billions in filtering revenue.

Facebook shareholders should question this loss of revenue at every opportunity.

* Zuckerberg’s “lust” to be “Censor of the Internet” is an inference based on the Facebook centered nature of his “ideas” for dealing with “fake news.” Unpaid censorship instead of profiting from user-centered filtering is a sign of poor judgment and/or madness.

by Patrick Durusau at November 24, 2016 11:11 PM

Fake News Is Not the Only Problem

Fake News Is Not the Only Problem by Gilad Lotan.

From the post:

There have been so many conversations on the impact of fake news on the recent US elections. An already polarized public is pushed further apart by stories that affirm beliefs or attack the other side. Yes. Fake news is a serious problem that should be addressed. But by focusing solely on that issue, we are missing the larger, more harmful phenomenon of misleading, biased propaganda.

It’s not only fringe publications. Think for a moment about the recent “Hamilton”-Pence showdown. What actually happened there? How disrespectful was the cast towards Mike Pence? Was he truly being “Booed Like Crazy” as the Huffington Post suggests? The short video embedded in that piece makes it seem like it. But this video on ABC suggests otherwise. “There were some cheers and some boos,” says Pence himself.

In an era of post-truth politics, driven by the 24-hour news cycle, diminishing trust in institutions, rich visual media, and the ubiquity and velocity of social networked spaces, how do we identify information that is tinted — information that is incomplete, that may help affirm our existing beliefs or support someone’s agenda, or that may be manipulative — effectively driving a form of propaganda?

Biased information — misleading in nature, typically used to promote or publicize a particular political cause or point of view — is a much more prevalent problem than fake news. It’s a problem that doesn’t exist only within Facebook but across social networks and other information-rich services (Google, YouTube, etc.).

A compelling piece of work but I disagree that biased information “….is a much more prevalent problem than fake news.”

I don’t disagree with Lotan’s “facts.” I would go further and say all information is “biased,” from one viewpoint or another.

Collecting, selecting and editing information are done to attract readers by biased individuals for delivery to biased audiences. Biased audiences who are driving the production of content which they find agreeable.

Non-news example: How long would a classical music record label survive insisting its purchasers enjoy rap music?

At least if they were attempting to use a classical music mailing list for their records?

To blame “news/opinion” writers for bias is akin to shooting the messenger.

A messenger who is delivering the content readers requested.

Take Lotan’s example of providing more “context” for a story drawn from the Middle East:

A more recent example from the Middle East is that of Ahmed Manasra, a 13-year old Palestinian-Israeli boy who stabbed a 13-year old Israeli Jew in Jerusalem last Fall. A video [warning: graphic content] that was posted to a public Facebook page shows Mansara wounded, bleeding, and being cursed at by an Israeli. It was viewed over 2.5M times with the following caption:

Israeli Zionists curse a dying Palestinian child as Israeli Police watch…. His name was Ahmad Manasra and his last moments were documented in this video.

But neither the caption nor the video itself presents the full context. Just before Manasra was shot, he stabbed a few passersby, as well as a 13-year old Israeli Jew. Later, he was taken to a hospital.

Lotan fails to mention Ahmad Manasra’s actions were in the context of a decades old, systematic campaign by the Israeli government (not the Israeli people) to drive Palestinians from illegally occupied territory. A campaign in which thousands of Palestinians have died, homes and olive groves have been destroyed, etc.

Bias? Context? Your call.

However you classify my suggested “additional” context for the story of Ahmad Manasra, it will be considered needed correction by some and bias by others.

In his conclusion, Lotan touches ever so briefly on the issue uppermost in my mind when discussing “fake” or “biased” content:

There are other models of automated filtering and downgrading for limiting the spread of misleading information (the Facebook News Feed already does plenty of filtering and nudging). But again, who decides what’s in or out, who governs? And who gets to test the potential bias of such an algorithmic system?

In a nutshell: who governs?

Despite the unquestioned existence of “false,” “fake,” “biased,” “misleading,” information, “who governs?” has only one acceptable answer:

No one.

Enabling readers to discover, if they wish, alternative, or in the view of some, more complete or contextual accounts, great! We have the beginnings of technology to do so.

A story could be labeled “false,” “fake,” by NPR and if you subscribe to NPR labeling, that appears in your browser. Perhaps I subscribe to Lady GaGa labeling and it has no opinion on that story and unfortunate subscribers to National Review labeling see a large green $$$ or whatever it is they use to show approval.
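
A minimal sketch of how such subscription labeling could work, in Python. Everything here is an assumption for illustration: the Labeler class, the labels_for function and the example URL are hypothetical, not an existing browser or Facebook API.

```python
from dataclasses import dataclass, field


@dataclass
class Labeler:
    """A labeling source (hypothetical) that readers may subscribe to."""
    name: str
    labels: dict = field(default_factory=dict)  # story URL -> label text

    def label(self, url: str, text: str) -> None:
        self.labels[url] = text


def labels_for(url: str, subscriptions: list) -> dict:
    """Return only the labels from sources this reader chose."""
    return {
        source.name: source.labels[url]
        for source in subscriptions
        if url in source.labels  # a source with no opinion adds nothing
    }


story = "https://example.com/story"  # placeholder URL
npr = Labeler("NPR")
gaga = Labeler("Lady GaGa")
npr.label(story, "fake")

print(labels_for(story, [npr, gaga]))  # reader subscribed to both
print(labels_for(story, [gaga]))       # reader who skipped NPR
```

The reader, not the platform, picks the list of subscriptions, and nothing is removed from anyone's view: a label is only ever added for those who asked for it.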

I fear censors far more than any form or degree of “false,” “fake,” “biased,” “misleading,” information.

You should too.

by Patrick Durusau at November 24, 2016 10:16 PM

Learning R programming by reading books: A book list

Learning R programming by reading books: A book list by Liang-Cheng Zhang.

From the post:

Despite R’s popularity, it is still very daunting to learn R as R has no click-and-point feature like SPSS and learning R usually takes lots of time. No worries! As self-R learner like us, we constantly receive the requests about how to learn R. Besides hiring someone to teach you or paying tuition fees for online courses, our suggestion is that you can also pick up some books that fit your current R programming level. Therefore, in this post, we would like to share some good books that teach you how to learn programming in R based on three levels: elementary, intermediate, and advanced levels. Each level focuses on one task so you will know whether these books fit your needs. While the following books do not necessarily focus on the task we define, you should focus the task when you reading these books so you are not lost in contexts.

Books and reading form the core of my most basic prejudice: Literacy is the doorway to unlimited universes.

A prejudice so strong that I have to work hard at realizing non-literates live in and sense worlds not open to literates. Not less complex, not poorer, just different.

But book lists in particular appeal to that prejudice and since my blog is read by literates, I’m indulging that prejudice now.

I do have a title to add to the list: Practical Data Science with R by Nina Zumel and John Mount.

Judging from the other titles listed, Practical Data Science with R falls in the intermediate range. Should not be your first R book but certainly high on the list for your second R book.

Avoid the rush! Start working on your Amazon wish list today! ;-)

by Patrick Durusau at November 24, 2016 04:10 PM

NPR Posts “Fake News” Criticism of “Fake News”

There may be others but this is the first “fake news” story that I have seen that is critical of “fake news.” At least by NPR.

Students Have ‘Dismaying’ Inability To Tell Fake News From Real, Study Finds by Camila Domonoske

Domonoske does a credible summary of the contents of the executive summary, for which only one paragraph is necessary to opt out of presenting this story on NPR:

When we began our work we had little sense of the depth of the problem. We even found ourselves rejecting ideas for tasks because we thought they would be too easy. Our first round of piloting shocked us into reality. Many assume that because young people are fluent in social media they are equally savvy about what they find there. Our work shows the opposite. We hope to produce a series of high-quality web videos to showcase the depth of the problem revealed by students’ performance on our tasks and demonstrate the link between digital literacy and citizenship. By drawing attention to this connection, a series of videos could help to mobilize educators, policymakers, and others to address this threat to democracy.

Comparing the NPR coverage and the executive summary, the article reflects the steps taken by the study, but never questions its conclusion that an inability to assess online information is indeed a “threat to democracy.”

To support that conclusion, which earned this story a spot on NPR, the researchers would need historical data on how well or poorly, students assessed sources of information at other time periods in American history, along with an assessment of “democracy” at the time, along with the demonstration of a causal relationship between the two.

But as you can see from the NPR article, Domonoske fails to ask the most rudimentary questions about this study, such as:

“Is there a relationship between democracy and the ability to evaluate sources of information?”

Or, “What historical evidence demonstrates a relationship between democracy and the ability to evaluate sources of information?”

Utter silence on the part of Domonoske.

The real headline for a follow-up on this story should be:

NPR Reporter Unable To Distinguish Credible Research From Headline Driven Reports.

I’m going to be listening for that report.

Are you?

by Patrick Durusau at November 24, 2016 02:15 AM

“sexy ads or links” – Facebook can’t catch a break

The Fact Checker’s guide for detecting fake news by Glenn Kessler.

Glenn’s post isn’t an outright attack on Facebook, the standard fare at the New York Times since Donald Trump’s election. How long the Times is going to sulk over its rejection by most Americans isn’t clear.

Glenn descends into the sulking with the Times when he writes:

Look at the ads

A profusion of pop-up ads or other advertising indicates you should handle the story with care. Another sign is a bunch of sexy ads or links, designed to be clicked — “Celebs who did Porn Movies” or “Naughty Walmart Shoppers Who have no Shame at All” — which you generally do not find on legitimate news sites.

The examples are nearly Facebook ad headlines and Glenn knows that.

Rather than saying “Facebook,” Glenn wants you to conclude that “on your own.” (An old manipulation/propaganda technique.)

Glenn’s “read the article closely” was #4, coming in after #1, “determine whether the article is from a legitimate website,” #2, “Check the ‘contact us’ page,” or #3, “examine the byline of the reporter and see whether it makes sense.”

How To Recognize A Fake News Story has “read past the headline” first.

Even “legitimate websites” make mistakes, omit facts, and sometimes are misled by governments and others.

Read content critically, even content about spotting “fake news.”

by Patrick Durusau at November 24, 2016 01:04 AM

November 23, 2016

Patrick Durusau

1,198 Free High Resolution Maps of U.S. National Parks

1,198 Free High Resolution Maps of U.S. National Parks

From the post:

I cannot, and do not wish to, imagine the U.S. without its National Park system. The sale and/or despoliation of this more than 80 million acres of mountain, forest, stream, ocean, geyser, cavern, canyon, and every other natural formation North America contains would diminish the country immeasurably. “National parks,” wrote novelist Wallace Stegner, “are the best idea we ever had. Absolutely American, absolutely democratic, they reflect us at our best rather than our worst.”

Stegner’s quote—which gave Ken Burns’ National Parks documentary its subtitle–can sound overoptimistic when we study the parks’ history. Though not officially designated until the 20th century, the idea stretches back to 1851, when a battalion, intent on finding and destroying an Indian village, also found Yosemite. Named for what the soldiers thought was the tribe they killed and burned, the word actually translates as “they are killers.”

Westward expansion and the annexation of Hawaii have left us many sobering stories like that of Yosemite’s “discovery.” And during their development in the early- to mid-20th century, the parks often required the mass displacement of people, many of whom had lived on the land for decades—or centuries. But despite the bloody history, the creation of these sanctuaries have preserved the country’s embarrassment of natural beauty and irreplaceable biodiversity for a century now. (The National Park Service celebrated its 100th anniversary just this past August.)

The National Park Service and its allies have acted as bulwarks against privateers who would turn places like Yosemite into prohibitively expensive resorts, and perhaps fell the ancient Redwood National forests or blast away the Smokey Mountains. Instead, the parks remain “absolutely democratic,” open to all Americans and international visitors, the pride of conservationists, scientists, hikers, bird watchers, and nature-lovers of all kinds. Given the sprawling, idealistic, and violent history of the National Parks, it may be fair to say that these natural preserves reflect the country at both its worst and its best. And in that sense, they are indeed “absolutely American.”

Links to numerous resources, including National Parks Maps. (Home of 1,198 free high resolution maps of U.S. national parks.)

The national parks of the United States were born in violence and disenfranchisement of the powerless. It is beyond our power to atone for those excesses and injuries done in the past.

It is our task to preserve those parks as monuments to our violence against the powerless and as natural treasures for all humanity.

by Patrick Durusau at November 23, 2016 10:16 PM

Taping Donald, Melania, Mike and others

Just in time for a new administration, Great. Now even your headphones can spy on you by Andy Greenberg.

From the post:

CAUTIOUS COMPUTER USERS put a piece of tape over their webcam. Truly paranoid ones worry about their devices’ microphones—some even crack open their computers and phones to disable or remove those audio components so they can’t be hijacked by hackers. Now one group of Israeli researchers has taken that game of spy-versus-spy paranoia a step further, with malware that converts your headphones into makeshift microphones that can slyly record your conversations.

Researchers at Israel’s Ben Gurion University have created a piece of proof-of-concept code they call “Speake(a)r,” designed to demonstrate how determined hackers could find a way to surreptitiously hijack a computer to record audio even when the device’s microphones have been entirely removed or disabled. The experimental malware instead repurposes the speakers in earbuds or headphones to use them as microphones, converting the vibrations in air into electromagnetic signals to clearly capture audio from across a room.

“People don’t think about this privacy vulnerability,” says Mordechai Guri, the research lead of Ben Gurion’s Cyber Security Research Labs. “Even if you remove your computer’s microphone, if you use headphones you can be recorded.”

But the Ben Gurion researchers took that hack a step further. Their malware uses a little-known feature of RealTek audio codec chips to silently “retask” the computer’s output channel as an input channel, allowing the malware to record audio even when the headphones remain connected into an output-only jack and don’t even have a microphone channel on their plug. The researchers say the RealTek chips are so common that the attack works on practically any desktop computer, whether it runs Windows or MacOS, and most laptops, too. RealTek didn’t immediately respond to WIRED’s request for comment on the Ben Gurion researchers’ work. “This is the real vulnerability,” says Guri. “It’s what makes almost every computer today vulnerable to this type of attack.”

(emphasis in original)

Wired doesn’t give up any more details but that should be enough to get you started.

You must search for RealTek audio codec datasheets. RealTek wants a signed NDA from a development partner before you can access the datasheets.

Among numerous others, I know for a fact that datasheets on ALC655, ALC662, ALC888, ALC1150, and ALC5631Q are freely available online.

You will have to replicate the hack but then:

  1. Choose your targets for taping
  2. Obtain their TV/music preferences from Amazon, etc.
  3. License new content (would not want to upset the RIAA) for web streaming
  4. Offer your target the “latest” TV/music by (name) for free 30 day trial

For the nosy non-hacker, expect to see “hacked” earphones for sale on the Dark Web.

Perhaps even in time for holiday shopping!

Warning: Hacking or buying hacked headphones is a violation of any number of federal, state and local laws, depending on your jurisdiction.

PS: I am curious if the mic in cellphones is subject to a similar hack.

Perhaps this is the dawning of the age of transparency. ;-)

by Patrick Durusau at November 23, 2016 09:23 PM

Comic Book Security

The Amazing Mysteries of the Gutter: Drawing Inferences Between Panels in Comic Book Narratives by Mohit Iyyer, et al.


Visual narrative is often a combination of explicit information and judicious omissions, relying on the viewer to supply missing details. In comics, most movements in time and space are hidden in the “gutters” between panels. To follow the story, readers logically connect panels together by inferring unseen actions through a process called “closure”. While computers can now describe the content of natural images, in this paper we examine whether they can understand the closure-driven narratives conveyed by stylized artwork and dialogue in comic book panels. We collect a dataset, COMICS, that consists of over 1.2 million panels (120 GB) paired with automatic textbox transcriptions. An in-depth analysis of COMICS demonstrates that neither text nor image alone can tell a comic book story, so a computer must understand both modalities to keep up with the plot. We introduce three cloze-style tasks that ask models to predict narrative and character-centric aspects of a panel given n preceding panels as context. Various deep neural architectures underperform human baselines on these tasks, suggesting that COMICS contains fundamental challenges for both vision and language.

From the introduction:


Comics are fragmented scenes forged into full-fledged stories by the imagination of their readers. A comics creator can condense anything from a centuries-long intergalactic war to an ordinary family dinner into a single panel. But it is what the creator hides from their pages that makes comics truly interesting, the unspoken conversations and unseen actions that lurk in the spaces (or gutters) between adjacent panels. For example, the dialogue in Figure 1 suggests that between the second and third panels, Gilda commands her snakes to chase after a frightened Michael in some sort of strange cult initiation. Through a process called closure [40], which involves (1) understanding individual panels and (2) making connective inferences across panels, readers form coherent storylines from seemingly disparate panels such as these. In this paper, we study whether computers can do the same by collecting a dataset of comic books (COMICS) and designing several tasks that require closure to solve.

(emphasis in original)

Comic book security: A method for defeating worldwide data slurping and automated analysis.

The authors find that human results easily exceed automated analysis, raising the question of the use of a mixture of text and images as a means to evade widespread data sweeps.

Security based on a lack of human eyes to review content is chancy but depending upon your security needs, it may be sufficient.

For example, a cartoon in a local newspaper that designates a mission target and time, only needs to be secure from the time of its publication until the mission has finished. That it is discovered days, weeks or even months later, doesn’t impact the operational security of the mission.

The data set of cartoons is available at:

Guaranteed, algorithmic security is great, but hiding in gaps of computational ability may be just as effective.


by Patrick Durusau at November 23, 2016 08:21 PM

How To Recognize A Fake News Story

How To Recognize A Fake News Story by Nick Robins-Early.

A handy “fake news” graphic:


Even if Facebook, Twitter, etc., eventually take up my idea of shareable content filters, you should evaluate all stories (including mine) with the steps in this graphic.

Short form: Don’t be a passive consumer of content. Engage with content. Question its perspective, what was left unsaid, sources that were or were not relied upon, etc.

Your ignorance is your own and no one can fix that other than you.

by Patrick Durusau at November 23, 2016 07:03 PM

The 10 Commandments of Exfiltration

‘Perfect’ Data Exfiltration Demonstrated by Larry Loeb.

From the post:

The 10 Commandments of Exfiltration

Following the experiment, the researchers came up with a technique of exfiltration based on their newly established 10 commandments. According to the SafeBreach presentation, these commandments are:

  1. No security through obscurity should be used.
  2. Only Web browsing and derived traffic is allowed.
  3. Anything that may theoretically be perceived as passing information is forbidden.
  4. Scrutinize every packet during comprehensive network monitoring.
  5. Assume TLS/SSL termination at the enterprise level.
  6. Assume the receiving party has no restrictions.
  7. Assume no nation-state or third-party site monitoring.
  8. Enable time synchronization between the communicating parties.
  9. There’s bonus points for methods that can be implemented manually from the sender side.
  10. Active disruption by the enterprise is always possible.

The technique discussed is criticized as “low bandwidth” but then I think, how much bandwidth does it take to transmit an admin login and password?

Definitely worth a slow read.

Other contenders for similar 10 commandments of exfiltration?

As a trivial example, consider a sender who leaves work every day at the same time through a double door. If they exit to their right, it is a 0 and if they exit to their left, it is a 1. Perhaps only on set days of the week or month.

Very low bandwidth but as I said, for admin login/password, it would be sufficient.
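To put a number on "very low bandwidth," here is a minimal sketch of the double-door channel, assuming one bit per working day and the mapping described above (right exit is 0, left exit is 1). The function names and the sample credential are hypothetical, chosen only for illustration.

```python
def text_to_exits(secret: str) -> list[str]:
    """Sender side: map each bit of the UTF-8 encoded secret to an exit."""
    bits = "".join(f"{byte:08b}" for byte in secret.encode("utf-8"))
    return ["left" if bit == "1" else "right" for bit in bits]


def exits_to_text(exits: list[str]) -> str:
    """Observer side: rebuild the secret from the sequence of observed exits."""
    bits = "".join("1" if side == "left" else "0" for side in exits)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


# Hypothetical login/password pair, one exit (one bit) per working day.
schedule = text_to_exits("root:hunter2")
print(len(schedule), "working days needed")
print(exits_to_text(schedule))
```

Eight exits per character is slow, but a short credential still fits into a few months of ordinary departures.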

How imaginative is your exfiltration security?

by Patrick Durusau at November 23, 2016 01:34 AM

November 22, 2016

Patrick Durusau

Egyptological Museum Search

Egyptological Museum Search

From the post:

The Egyptological museum search is a PHP tool aimed to facilitate locating the descriptions and images of ancient Egyptian objects in online catalogues of major museums. Online catalogues (ranging from selections of highlights to complete digital inventories) are now offered by almost all major museums holding ancient Egyptian items and have become indispensable in research work. Yet the variety of web interfaces and of search rules may overstrain any person performing many searches in different online catalogues.

Egyptological museum search was made to provide a single search point for finding objects by their inventory numbers in major collections of Egyptian antiquities that have online catalogues. It tries to convert user input into search queries recognised by museums’ websites. (Thus, for example, stela Geneva D 50 is searched as “D 0050,” statue Vienna ÄS 5046 is searched as “AE_INV_5046,” and coffin Turin Suppl. 5217 is searched as “S. 05217.”) The following online catalogues are supported:

The search interface uses a short list of aliases for museums.
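The conversion of inventory numbers into museum-specific queries can be sketched in a few lines. This is not the tool's PHP code: the rules below are assumptions inferred only from the three examples quoted above (Geneva, Vienna, Turin), and the museum aliases and function names are hypothetical.

```python
import re


def _split(inventory: str) -> tuple[str, int]:
    """Split an inventory number such as 'D 50' into its prefix and number."""
    match = re.fullmatch(r"\s*(\D+?)\s*(\d+)\s*", inventory)
    if match is None:
        raise ValueError(f"unrecognised inventory number: {inventory!r}")
    return match.group(1), int(match.group(2))


# One formatting rule per museum alias, guessed from the quoted examples.
RULES = {
    "geneva": lambda prefix, number: f"{prefix} {number:04d}",  # D 50 -> D 0050
    "vienna": lambda prefix, number: f"AE_INV_{number}",        # ÄS 5046 -> AE_INV_5046
    "turin": lambda prefix, number: f"S. {number:05d}",         # Suppl. 5217 -> S. 05217
}


def to_query(museum: str, inventory: str) -> str:
    """Convert user input into the query string a museum's catalogue expects."""
    prefix, number = _split(inventory)
    return RULES[museum.lower()](prefix, number)


print(to_query("geneva", "D 50"))
print(to_query("vienna", "ÄS 5046"))
print(to_query("turin", "Suppl. 5217"))
```

Keeping each museum's quirks in a small table of rules is what makes a single search point practical: supporting another catalogue means adding one entry.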

Once you see/use the interface proper, here, I hope you are interested in volunteering to improve it.

by Patrick Durusau at November 22, 2016 10:06 PM

Manipulate XML Text Data Using XQuery String Functions

Manipulate XML Text Data Using XQuery String Functions by Adam Steffanick.

Adam’s notes from the Vanderbilt University XQuery Working Group:

We evaluated and manipulated text data (i.e., strings) within Extensible Markup Language (XML) using string functions in XQuery, an XML query language, and BaseX, an XML database engine and XQuery processor. This tutorial covers the basics of how to use XQuery string functions and manipulate text data with BaseX.

We used a limited dataset of English words as text data to evaluate and manipulate, and I’ve created a GitHub gist of XML input and XQuery code for use with this tutorial.

A quick run-through of basic XQuery string functions that takes you up to writing your own XQuery function.

While we wait for more reports from the Vanderbilt University XQuery Working Group, have you considered using XQuery to impose different views on a single text document?

For example, some Bible translations follow the “traditional” chapter and verse divisions (a very late addition), while others use paragraph level organization and largely ignore the traditional verses.

Creating a view of a single source text as either one or both should not involve permanent changes to a source file in XML. Or at least not the original source file.

If for processing purposes there was a need for a static file rendering one way or the other, that’s doable but should be separate from the original XML file.
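As a rough illustration of two views over one unchanged source, here is a minimal sketch in Python rather than XQuery. The markup (book, p and v elements with an n attribute) and the sample text are hypothetical; real texts use richer encodings.

```python
import xml.etree.ElementTree as ET

# Hypothetical source: paragraphs containing numbered verses.
SOURCE = """
<book name="Sample">
  <p><v n="1:1">In the first verse.</v> <v n="1:2">In the second verse.</v></p>
  <p><v n="1:3">A new paragraph begins.</v></p>
</book>
"""


def verse_view(root):
    """Traditional view: one (reference, text) pair per verse."""
    return [(v.get("n"), v.text.strip()) for v in root.iter("v")]


def paragraph_view(root):
    """Paragraph view: verse boundaries ignored, text joined per paragraph."""
    return [" ".join(v.text.strip() for v in p.iter("v")) for p in root.iter("p")]


root = ET.fromstring(SOURCE.strip())
for ref, text in verse_view(root):
    print(ref, text)
for paragraph in paragraph_view(root):
    print(paragraph)
```

Both views are computed on demand; the source is only read, never rewritten.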

by Patrick Durusau at November 22, 2016 08:31 PM

Geek Jeopardy – Display Random Man Page

While writing up Julia Evans’ Things to learn about Linux, I thought it would be cool to display random man pages.

Which resulted in this one-liner in an executable file (man-random, invoke ./man-random):

man $(ls /usr/share/man/man* | shuf -n1 | cut -d. -f1)

As written, it displays a random page from the directories man1 – man8.

If you replace /man* with /man1/, you will only get results for man1 (the usual default).

All of which made me think of Geek Jeopardy!

Can you name these commands from their first-paragraph descriptions (names omitted)?

  • remove sections from each line of files
  • pattern scanning and processing language
  • stream editor for filtering and transforming text
  • generate random permutations
  • filter reverse line feeds from input
  • dump files in octal and other formats

Looks easy now, but after a few glasses of holiday cheer? With spectators? Ready to try another man page section?



  • cut: remove sections from each line of files
  • awk: pattern scanning and processing language
  • sed: stream editor for filtering and transforming text
  • shuf: generate random permutations
  • col: filter reverse line feeds from input
  • od: dump files in octal and other formats

PS: I changed the wildcard in the fourth suggested solution from “?” to “*” to arrive at my solution. (Ubuntu 14.04)

by Patrick Durusau at November 22, 2016 07:52 PM

Things to learn about Linux

Things to learn about Linux

From the post:

I asked on Twitter today what Linux things they would like to know more about. I thought the replies were really cool so here’s a list (many of them could be discussed on any Unixy OS, some of them are Linux-specific)

I count forty-seven (47) entries on Julia’s list, which should keep you busy through any holiday!


by Patrick Durusau at November 22, 2016 07:28 PM

The five-step fact-check (Africa Check)

The five-step fact-check from AfricaCheck

From the post:

Print our useful flow-chart and stick it up in a place where you can quickly refer to it when a deadline is pressing.


Click here to download the PDF for printing.

A great fact-checking guide for reporters, with useful insight for readers as well.

What’s missing from a story you are reading right now?

AfricaCheck offers to fact check claims about Africa tweeted with: #AfricaCheckIt.

There’s a useful service to the news community!

A quick example, eNCA (South African news site) claimed Zimbabwe’s President Robert Mugabe announced his retirement.

Africa Check responded with Mugabe’s original words plus translation.

I don’t read Mugabe as announcing his retirement but see for yourself.

by Patrick Durusau at November 22, 2016 06:38 PM

Advancing exploitation: a scriptless 0day exploit against Linux desktops

Advancing exploitation: a scriptless 0day exploit against Linux desktops by Chris Evans.

From the post:

A powerful heap corruption vulnerability exists in the gstreamer decoder for the FLIC file format. Presented here is an 0day exploit for this vulnerability.

This decoder is generally present in the default install of modern Linux desktops, including Ubuntu 16.04 and Fedora 24. Gstreamer classifies its decoders as “good”, “bad” or “ugly”. Despite being quite buggy, and not being a format at all necessary on a modern desktop, the FLIC decoder is classified as “good”, almost guaranteeing its presence in default Linux installs.

Thanks to solid ASLR / DEP protections on the (some) modern 64-bit Linux installs, and some other challenges, this vulnerability is a real beast to exploit.

Most modern exploits defeat protections such as ASLR and DEP by using some form of scripting to manipulate the environment and make dynamic decisions and calculations to move the exploit forward. In a browser, that script is JavaScript (or ActionScript etc.) When attacking a kernel from userspace, the “script” is the userspace program. When attacking a TCP stack remotely, the “script” is the program running on the attacker’s computer. In my previous full gstreamer exploit against the NSF decoder, the script was an embedded 6502 machine code program.

But in order to attack the FLIC decoder, there simply isn’t any scripting opportunity. The attacker gets, once, to submit a bunch of scriptless bytes into the decoder, and try and gain code execution without further interaction…

… and good luck with that! Welcome to the world of scriptless exploitation in an ASLR environment. Let’s give it our best shot.

Above my head, at the moment, but I post it as a test for hackers who want to test their understanding/development of exploits.

BTW, some wag, I didn’t bother to see which one, complained Chris’ post is “irresponsible disclosure.”

Sure, the CIA, FBI, NSA and their counterparts in other governments, plus their cybersecurity contractors should have sole access to such exploits. Ditto for the projects concerned. (NOT!)

“Responsible disclosure” is just another name for unilateral disarmament, on behalf of all of us.

Open and public discussion is much better.

Besides, a hack of Ubuntu 16.04 won’t be relevant at most government installations for years.

Plenty of time for a patched release. ;-)

by Patrick Durusau at November 22, 2016 06:16 PM

Practical Palaeography: Recreating the Exeter Book in a Modern Day ‘Scriptorium’

Practical Palaeography: Recreating the Exeter Book in a Modern Day ‘Scriptorium’

From the post:

Dr Johanna Green is a lecturer in Book History and Digital Humanities at the University of Glasgow. Her PhD (English Language, University of Glasgow 2012) focused on a palaeographical study of the textual division and subordination of the Exeter Book manuscript. Here, she tells us about the first of two sessions she led for the Society of Northumbrian Scribes, a group of calligraphers based in North East England, bringing palaeographic research and modern-day calligraphy together for the public.
(emphasis in original)

Not phrased in subject identity language, but concerns familiar to the topic map community are not far away:

My own research centres on the scribal hand of the manuscript, specifically the ways in which the poems are divided and subdivided from one another and the decorative designs used for these litterae notabiliores throughout. For much of my research, I have spent considerable time (perhaps more than I am willing to admit) wondering where one ought to draw the line with palaeography. When do the details become so tiny to no longer be of any significance? When are they just important enough to mean something significant for our understanding of how the manuscript was created and arranged? How far am I willing to argue that these tiny features have significant impact? Is, for example, this littera notabilior Đ on f. 115v (Judgement Day I, left) different enough in a significant way to this H on f.97v, (The Partridge, bottom right), and in turn are both of these litterae notabiliores performing a different function than the H on f.98r (Soul and Body II, far right)?[5]
(emphasis in original, footnote omitted)

When Dr. Green says:

…When do the details become so tiny to no longer be of any significance?…

I would say: When do the subjects (details) become so tiny we want to pass over them in silence? That is, they could be, but are not, represented in a topic map.

Green ends her speculation, to a degree, by enlisting scribes to re-create the manuscript of interest under her observation.

I’ll leave her conclusions for her post but consider a secondary finding:

The experience also made me realise something else: I had learned much by watching them write and talking to them during the process, but I had also learned much by trying to produce the hand myself. Rather than return to Glasgow and teach my undergraduates the finer details of the script purely through verbal or written description, perhaps providing space for my students to engage in the materials of manuscript production, to try out copying a script/exemplar for themselves would help increase their understanding of the process of writing and, in turn, deepen their knowledge of the constituent parts of a letter and their significance in palaeographic endeavour. This last is something I plan to include in future palaeography teaching.

Dr. Green’s concern over palaeographic detail illustrates two important points about topic maps:

  1. Potential subjects for a topic map are always unbounded.
  2. Different people “see” different subjects.

Which also accounts for my yawn when Microsoft drops the Microsoft Concept Graph of more than 5.4 million concepts.

…[M]ore than 5.4 million concepts[?]

Hell, Copleston’s A History of Philosophy easily has more concepts.

But the Microsoft Concept Graph is more useful than a topic map of Copleston in your daily, shallow, social sea.

What subjects do you see and how would capturing them and their identities make a difference in your life (professional or otherwise)?

by Patrick Durusau at November 22, 2016 04:31 PM

PubMed comments & their continuing conversations

PubMed comments & their continuing conversations

From the post:

We have many options for communication. We can choose platforms that fit our style, approach, and time constraints. From pop culture to current events, information and opinions are shared and discussed across multiple channels. And scientific publications are no exception.

PubMed Commons was established to enable commenting in PubMed, the largest biomedical literature database. In the past year, commenters posted to more than 1,400 publications. Of those publications, 80% have a single comment today, and 12% have comments from multiple members. The conversation carries forward in other venues.

Sometimes comments pull in discussion from other locations or spark exchanges elsewhere. Here are a few examples where social media prompted PubMed Commons posts or continued the commentary on publications.

An encouraging review of examples of sane discussion through the use of comments.

Compare that with the abandonment of comments by some media outlets. NPR, for example: NPR Website To Get Rid Of Comments by Elizabeth Jensen.

My takeaway from Jensen’s account was that NPR likes its own free speech but is not so interested in the free speech of others.

See also: Have Comment Sections on News Media Websites Failed?, for op-ed pieces at the New York Times from a variety of perspectives.

Perhaps comments on news sites are examples of casting pearls before swine? (Matthew 7:6)

by Patrick Durusau at November 22, 2016 01:40 AM

November 21, 2016

Patrick Durusau

Resources to Find the Data You Need, 2016 Edition

Resources to Find the Data You Need, 2016 Edition by Nathan Yau.

From the post:

Before you get started on any data-related project, you need data. I know. It sounds crazy, but it’s the truth. It can be frustrating to sleuth for the data you need, so here are some tips on finding it (the openly available variety) and some topic-specific resources to begin your travels.

This is an update to the guide I wrote in 2009, which as it turns out, is now mostly outdated. So, 2016. Here we go.

If you know Nathan Yau’s work, FlowingData, then you know this is “the” starting list for data.


by Patrick Durusau at November 21, 2016 10:25 PM

OPM Farce Continues – 2016 Inspector General Report

U.S. Office of Personnel Management – Office of the Inspector General – Office of Audits

The Office of Personnel Management hack was back in the old days when China was being blamed for every hack. There’s no credible evidence of that but the Chinese were blamed in any event.

The OPM hack illustrated the danger inherent in appointing campaign staff to run mission critical federal agencies. For just a sampling of the impressive depth of Archuleta’s incompetence, read Flash Audit on OPM Infrastructure Update Plan.

The executive summary of the current report offers little room for hope:

This audit report again communicates a material weakness related to OPM’s Security Assessment and Authorization (Authorization) program. In April 2015, the then Chief Information Officer issued a memorandum that granted an extension of the previous Authorizations for all systems whose Authorization had already expired, and for those scheduled to expire through September 2016. Although the moratorium on Authorizations has since been lifted, the effects of the April 2015 memorandum continue to have a significant negative impact on OPM. At the end of fiscal year (FY) 2016, the agency still had at least 18 major systems without a valid Authorization in place.

However, OPM did initiate an “Authorization Sprint” during FY 2016 in an effort to get all of the agency’s systems compliant with the Authorization requirements. We acknowledge that OPM is once again taking system Authorization seriously. We intend to perform a comprehensive audit of OPM’s Authorization process in early FY 2017.

This audit report also re-issues a significant deficiency related to OPM’s information security management structure. Although OPM has developed a security management structure that we believe can be effective, there has been an extremely high turnover rate of critical positions. The negative impact of these staffing issues is apparent in the results of our current FISMA audit work. There has been a significant regression in OPM’s compliance with FISMA requirements, as the agency failed to meet requirements that it had successfully met in prior years. We acknowledge that OPM has placed significant effort toward filling these positions, but simply having the staff does not guarantee that the team can effectively manage information security and keep OPM compliant with FISMA requirements. We will continue to closely monitor activity in this area throughout FY 2017.

It’s illegal, but hacking the OPM remains easier than hacking the NSA.

Hacking the NSA requires a job at Booz Allen and a USB drive.

by Patrick Durusau at November 21, 2016 09:59 PM

Preserving Ad Revenue With Filtering (Hate As Renewable Resource)

Facebook and Twitter haven’t implemented robust and shareable filters for their respective content streams for fear of disturbing their ad revenue streams.* The power to filter is feared as the power to exclude ads.

Other possible explanations include: Drone employment, old/new friends hired to discuss censoring content; Hubris, wanting to decide what is “best” for others to see and read; NIH (not invented here), which explains silence concerning my proposals for shareable content filters; others?

* Lest I be accused of spreading “fake news,” my explanation for the lack of robust and shareable filters on content on Facebook and Twitter is based solely on my analysis of their behavior and not any inside leaks, etc.

I have a solution for fearing filters as interfering with ad revenue.

All Facebook posts and Twitter tweets will be delivered with an additional Boolean field, ad, which defaults to true (empty field), meaning the content can be filtered (following Clojure). When the field is false, that content cannot be filtered.

With filters registered and shared via Facebook and Twitter, testing those filters for proper operation (and not applying them if they filter ad content) is a purely algorithmic process.

Users pay to post ad content, a step where the false flag can be entered, so that no ad freeloaders remain exempt from filters.
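To show how mechanical that test is, here is a minimal sketch under the proposal's own rule: a missing or true ad field means the content can be filtered, false means it cannot. The post layout, function names and sample filters are all hypothetical. A filter here is any function that returns True for posts it wants hidden.

```python
def is_filterable(post: dict) -> bool:
    """A missing or true `ad` field means the post may be filtered."""
    return post.get("ad", True) is not False


def filter_is_acceptable(user_filter, sample_posts) -> bool:
    """Reject any filter that would hide a post flagged ad=False."""
    return not any(
        user_filter(post) for post in sample_posts if not is_filterable(post)
    )


def apply_filter(user_filter, posts) -> list:
    """Hide what the filter matches, but never a post flagged ad=False."""
    return [p for p in posts if not is_filterable(p) or not user_filter(p)]


posts = [
    {"id": 1, "text": "cat pictures"},
    {"id": 2, "text": "buy widgets", "ad": False},  # paid ad, cannot be filtered
    {"id": 3, "text": "fake news!"},
]


def hide_fake(post):
    return "fake" in post["text"]


def hide_buy(post):
    return "buy" in post["text"]


print(filter_is_acceptable(hide_fake, posts))
print(filter_is_acceptable(hide_buy, posts))
print([p["id"] for p in apply_filter(hide_fake, posts)])
```

Checking a shared filter against a sample of flagged posts before registering it needs no human judgment about the filter's purpose.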

What’s my interest? I’m interested in the creation of commercial filters for aggregation, exclusion and creating a value-add product based on information streams. Moreover, ending futile and bigoted attempts at censorship seems like a worthwhile goal to me.

The revenue potential for filters is nearly unlimited.

The number of people who hate rivals the number who want to filter the content seen by others. An unrestrained Facebook/Twitter will attract more hate and “fake news,” which in turn will drive a great need for filters.

Not a virtuous cycle but certainly a profitable one. Think of hate and the desire to censor as renewable resources powering that cycle.

PS: I’m not an advocate for hate and censorship but they are both quite common. Marketing is based on consumers as you find them, not as you wish they were.

by Patrick Durusau at November 21, 2016 09:02 PM

MuckRock Needs Volunteers (9 states in particular)

MuckRock needs your help to keep filing in all 50 states by Beryl Lipton.

From the post:

Election time excitement got you feeling a little more patriotic than usual? Looking for a way to help but not sure you have the time? Well, MuckRock is looking for a few good people to do a big service requiring little effort: serve as our resident proxies.

A few states have put up barriers at their borders, limiting required disclosure and response to requests to only residents. One more thing added to the regular rigamarole of requesting public records, it’s huge block to comparative studies and useful, outside accountability.

This is where you come in.


We’re looking for volunteers in the ten states that can whip out their residency requirements whenever they get the chance:

  • Alabama
  • Arkansas
  • Georgia
  • Missouri
  • Montana
  • New Hampshire
  • New Jersey
  • Tennessee
  • Virginia

As a MuckRock proxy requester, you’ll serve as the in-state request representative, allowing requests to be submitted in your name and enabling others to continue to demand accountability. In exchange, you’ll get your own Professional MuckRock account – 20 requests a month and all that comes with them – and the gratitude of the transparency community.

Interested in helping the cause? Let us know at, or via the from below.

Despite my view that government disclosures are previously undisclosed government lies, I have volunteered for this project.

Depending on where you reside, you should too and/or contribute to support MuckRock.

by Patrick Durusau at November 21, 2016 07:02 PM

November 20, 2016

Patrick Durusau

How to get started with Data Science using R

How to get started with Data Science using R by Karthik Bharadwaj.

From the post:

R being the lingua franca of data science and is one of the popular language choices to learn data science. Once the choice is made, often beginners find themselves lost in finding out the learning path and end up with a signboard as below.

In this blog post I would like to lay out a clear structural approach to learning R for data science. This will help you to quickly get started in your data science journey with R.

You won’t find anything you don’t already know but this is a great short post to pass onto others.

Point out that R skills will help them expose and/or conceal government corruption.

by Patrick Durusau at November 20, 2016 10:40 PM

Refining The Dakota Access Pipeline Target List

I mentioned in Exploding the Dakota Access Pipeline Target List that while a listing of the banks financing the Dakota Access Pipeline is great, banks and other legal entities are owned, operated and act through people. People, who unlike abstract legal entities, are subject to the persuasion of other people.

Unfortunately, almost all discussions of #DAPL focus on the on-site brutality towards Native Americans and/or the corporations involved in the project.

The protesters deserve our support, but resisting local pawns (read: police) may change the route of the pipeline; it won’t stop the pipeline.

To stop the Dakota Access Pipeline, there are only two options:

  1. Influence investors to abandon the project
  2. Make the project prohibitively expensive

In terms of #1, you have to strike through the corporate veil to reach the people who own and direct the affairs of the corporation.

“Piercing the corporate veil” is legal terminology but I mean it as in knowing the named and located individuals who are making decisions for a corporation and the named and located individuals who are its owners.

A legal fiction, such as a corporation, cannot feel public pressure, distress, social ostracism, etc., all things that people are subject to suffering.

Even so, persuasion can only be brought to bear on named and located individuals.

News reports giving only corporate names and not individual owners/agents create a boil of obfuscation.

A boil of obfuscation that needs lancing. Shall we?

To get us off on a common starting point, here are some resources I will be reviewing/using:

Corporate Research Project

The Corporate Research Project assists community, environmental and labor organizations in researching companies and industries. Our focus is on identifying information that can be used to advance corporate accountability campaigns. [Sponsors Dirt Diggers Digest]

Dirt Diggers Digest

chronicling corporate misbehavior (and how to research it) [blog]


LittleSis* is a free database of who-knows-who at the heights of business and government.

* opposite of Big Brother


The largest open database of companies in the world [115,419,017 companies]

Revealing the World of Private Companies by Sheila Coronel

Coronel’s blog post has numerous resources and links.

She also points out that the United States is a top secrecy destination:

A top secrecy jurisdiction is the United States, which doesn’t collect the names of shareholders of private companies and is unsurprisingly one of the most favored nations for hiding illicit wealth. (See, for example, this Reuters report on shell companies in Wyoming.) As Senator Carl Levin says, “It takes more information to obtain a driver’s license or open a U.S. bank account than it does to form a U.S. corporation.” Levin has introduced a bill that would end the formation of companies for unidentified persons, but that is unlikely to pass Congress.

If we picked one of the non-U.S. sponsors of the #DAPL, we might get lucky and hit a transparent or semi-transparent jurisdiction.

Let’s start with a semi-tough case, a U.S. corporation but a publicly traded one, Wells Fargo.

Where would you go next?

by Patrick Durusau at November 20, 2016 08:56 PM

How to get superior text processing in Python with Pynini

How to get superior text processing in Python with Pynini by Kyle Gorman and Richard Sproat.

From the post:

It’s hard to beat regular expressions for basic string processing. But for many problems, including some deceptively simple ones, we can get better performance with finite-state transducers (or FSTs). FSTs are simply state machines which, as the name suggests, have a finite number of states. But before we talk about all the things you can do with FSTs, from fast text annotation—with none of the catastrophic worst-case behavior of regular expressions—to simple natural language generation, or even speech recognition, let’s explore what a state machine is, what they have to do with regular expressions.

Reporters, researchers and others will face a 2017 where the rate of information has increased, along with noise from media spasms over the latest taunt from president-elect Trump.

Robust text mining/filtering will be daily necessities, if they aren’t already.

Tagging text is the first example. Think about auto-generating graphs from emails with “to:,” “from:,” “date:,” and key terms in the email. Tagging the key terms is essential to that process.

Once tagged, you can slice and dice the text as more information is uncovered.
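A minimal sketch of the email-to-graph idea, using only Python's standard library. A plain regular expression stands in for the tagging step here; it is not Pynini and not the authors' method. The key terms, edge labels and sample message are hypothetical, and the message is assumed to be single-part plain text.

```python
import re
from email import message_from_string
from email.utils import getaddresses

KEY_TERMS = ("pipeline", "budget")  # hypothetical terms of interest


def email_to_edges(raw, key_terms=KEY_TERMS):
    """Turn one raw email into (source, relation, target, date) graph edges."""
    msg = message_from_string(raw)
    senders = [addr for _, addr in getaddresses(msg.get_all("From", []))]
    recipients = [addr for _, addr in getaddresses(msg.get_all("To", []))]
    date = msg.get("Date", "")
    body = msg.get_payload()  # a str for single-part plain text messages

    # Tag key terms in the body, case-insensitively, on word boundaries.
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, key_terms)) + r")\b", re.IGNORECASE
    )
    terms = sorted({m.group(1).lower() for m in pattern.finditer(body)})

    edges = [(s, "sent_to", r, date) for s in senders for r in recipients]
    edges += [(s, "mentions", t, date) for s in senders for t in terms]
    return edges


RAW = """\
From: alice@example.org
To: bob@example.org, carol@example.org
Date: Sat, 19 Nov 2016 21:29:00 -0500
Subject: Status

The Pipeline budget is attached.
"""

for edge in email_to_edges(RAW):
    print(edge)
```

Each tuple is an edge that can be loaded into whatever graph tool you prefer. As more is learned, new key terms only require re-running the tagging step over the same archive.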


by Patrick Durusau at November 20, 2016 02:35 AM

Tracking Business Records Across Asia

Tracking Business Records Across Asia by GIJN staff.

From the post:

The paper trail has changed — money now moves digitally and business registries are databases — and this lets journalists do more than ever before in tracking people and companies across borders.

Backgrounding an individual or a company? Following an organized crime ring? The key to uncovering corruption is to “follow the money” — to discover who owns what, who gets which contract, and how business are linked to each other.

Resources on tracking corporate records in China, the Philippines and India!

While you are sharpening your tracking skills, don’t forget to support GIJN.

by Patrick Durusau at November 20, 2016 02:10 AM

November 19, 2016

Patrick Durusau

Python Data Science Handbook

Python Data Science Handbook (Github)

From the webpage:

Jupyter notebook content for my OReilly book, the Python Data Science Handbook.


See also the free companion project, A Whirlwind Tour of Python: a fast-paced introduction to the Python language aimed at researchers and scientists.

This repository will contain the full listing of IPython notebooks used to create the book, including all text and code. I am currently editing these, and will post them as I make my way through. See the content here:


by Patrick Durusau at November 19, 2016 10:27 PM

CIA Raises Technical Incompetence Flag

The CIA’s response to Michael Morisy’s request for:

“a copy of emails sent to or from the CIA’s FOIA office regarding FOIA Portal’s Technical Issues.”

gives these requirements for requesting emails:

We require requesters seeking any form of “electronic communications” such as emails, to provide the specific “to” and “from” recipients, time frame and subject.

(The full response.)

Recalling that the FBI requested special software to separate emails of Huma Abedin and Anthony Weiner on the same laptop, is the CIA really that technically incompetent in terms of searching?

Is the CIA incapable of searching emails by subject alone?
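For scale, searching a pile of messages by subject alone takes a handful of lines with Python's standard library. This is only a sketch: the sample archive and the function name are hypothetical, and nothing here reflects how the CIA actually stores its mail.

```python
from email import message_from_string


def search_by_subject(messages, phrase):
    """Return messages whose Subject header contains phrase, ignoring case."""
    needle = phrase.casefold()
    return [
        msg for msg in messages
        if needle in str(msg.get("Subject", "")).casefold()
    ]


# Hypothetical two-message archive.
ARCHIVE = [
    message_from_string(
        "From: a@example.org\nTo: foia@example.org\n"
        "Subject: FOIA Portal technical issues\n\nIt is down again."
    ),
    message_from_string(
        "From: b@example.org\nTo: c@example.org\n"
        "Subject: Lunch\n\nNoon?"
    ),
]

hits = search_by_subject(ARCHIVE, "foia portal")
print([msg["From"] for msg in hits])
```

No "to" or "from" values are needed to run it.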

With a dissatisfied-with-intelligence-community president-elect Donald Trump about to take office, I would not be flying the Technical Incompetence Here flag.

The CIA may respond it is not incompetent but rather was acting in bad faith.

In debate we used to call that the “horns of a dilemma,” yes?

I’m voting for bad faith.

How about you?

by Patrick Durusau at November 19, 2016 09:29 PM

If You Don’t Get A New Car For The Holidays

Just because you aren’t expecting:


Doesn’t mean a new car isn’t in your future:


From the Sparrows Lock Pick website:

Sparrows Gridlock

There is a reason as to why a coat hanger is the tool of choice for most Automobile lockouts. Picking a standard 10 wafer Automotive lock is a Huge challenge. Most often it is achieved by being stubborn with a pinch of lucky a dash of skill.

The Gridlock set lets you develop that skill by working through three automotive locks of ever increasing difficulty. Building from a 3 to a 6 to a full 10 wafer Automotive lock will allow you to develop the skill for picking wafers. Wafer picking is an entirely different skill set when compared to pin tumbler picking.

A standard pin tumbler key is cut just along the top to lift the pins up into place letting you open the lock. A wafer lock key is cut on the top and bottom, this then moves the wafers Up and Down positioning them for the lock to open.

Learning to manipulate and rock those wafers into position is a skill ….. a skill that one day may get you a well deserved high five or a court appointed lawyer.

The Gridlock comes with 3 progressive wafer locks and an automotive tension wrench specific to appling tension to wafer locks. The locks are solid aluminum and perfect in scale to a classic car lock.

Think of lock picking as an expansion of your skills at digitally hacking access to automobiles.

With the Auto Rocker Picks, sans shipping, the package lists for $41.50. You may want some additional accessories from the LockPickShop.

Security discussions determine when your security will fail, not if.

Security discussions that don’t include physical security determine it will be sooner rather than later.

by Patrick Durusau at November 19, 2016 08:49 PM

Eight steps reporters should take … [every day]

Eight steps reporters should take before Trump assumes office by Dana Priest.

Reporters should paste these eight steps to their bathroom mirrors for review every day, not just for the Trump presidency:

Rebuild sources: Call every source you’ve ever had who is either still in government or still connected to those who are. Touch base, renew old connections, and remind folks that you’re all ears.

Join forces: Triangulate tips and sources across the newsroom, like we did after 9/11, when reporting became more difficult.

Make outside partnerships: Reporting organizations outside your own newspaper, especially those abroad and with international reach, can help uncover the moves being considered and implemented in foreign countries.

Discover the first family: Now part of the White House team, Donald Trump’s children and son-in-law are an important target for deep-dive reporting into their own financial holdings and their professional and personal records.

Renew the hunt: Find those tax filings!

Out disinformation: Find a way to take on the many false news sites that now hold a destructive sway over some Americans.

Create a war chest: Donate and persuade your news organization to donate large sums to legal defense organizations preparing to jump in with legal challenges the moment Trump moves against access, or worse. The two groups that come to mind are the Reporters’ Committee for Freedom of the Press and the American Civil Liberties Union. Encourage your senior editors to get ready for the inevitable, quickly.

Be grateful: Celebrate your freedom to do hard-hitting, illuminating work by doing much more of it.

Don’t wait for reporters to carry all the load.

Many of these steps, “Renew the hunt” comes to mind, can be performed by non-reporters and then leaked.

A lack of transparency of government signals a lack of effort on the part of the press and public.

FOIA is great but it’s also being spoon fed what the government chooses to release.

I’m thinking of transparency that is less self-serving than FOIA releases.

by Patrick Durusau at November 19, 2016 07:38 PM

The Postal Museum (UK)

The Postal Museum

Set to open in mid-2017, the Postal Museum covers five hundred years of “Royal Mail.”

Its online catalogue has more than 120,000 records describing its collection.

Which includes this gem:


Registering for the catalogue will enable you to access downloadable content, save searches, create wish-lists, etc. Registration is free and worth the effort.

The site is in beta and my confirmation email displayed as blank in Thunderbird but viewing source gave the confirmation URL.

A terminology issue. Where the tabs for an item say “Ordering and Viewing,” they mean requesting an item to be retrieved for you to view on a specified day.

I was confused because I thought “ordering” meant obtaining a copy, print or digital, of the item in question.

The turnpike road map above is available in a somewhat larger size but not nearly large enough for actual use.

Very high resolution images of maps and similar materials would be a welcome addition to the resources already available.


PS: I didn’t look but the Postal Museum has resources on stamps as well. ;-)

by Patrick Durusau at November 19, 2016 07:00 PM

November 18, 2016

Patrick Durusau

Successful Hate Speech/Fake News Filters – 20 Facts About Facebook

After penning Monetizing Hate Speech and False News yesterday, I remembered non-self-starters will be asking:

Where are examples of successful monetized filters for hate speech and false news?

Of The Top 20 Valuable Facebook Statistics – Updated November 2016, I need only two to make the case for monetized filters.

1. Worldwide, there are over 1.79 billion monthly active Facebook users (Facebook MAUs) which is a 16 percent increase year over year. (Source: Facebook as of 11/02/16)

15. Every 60 seconds on Facebook: 510 comments are posted, 293,000 statuses are updated, and 136,000 photos are uploaded. (Source: The Social Skinny)

(emphasis in the original)

By comparison, according to Newsonomics: 10 numbers on The New York Times’ 1 million digital-subscriber milestone [2015], the New York Times has 1 million digital subscribers.

If you think about it, the New York Times is a hate speech/fake news filter, although it has a much smaller audience than Facebook.

Moreover, the New York Times is spending money to generate content whereas on Facebook, content is there for the taking or filtering.

If the New York Times can make money as a filter for hate speech/fake news carrying its overhead, imagine the potential for profit from simply filtering content generated and posted by others. Across a market of 1.79 billion viewers. Where “hate,” and “fake” varies from audience to audience.

Content filters at Facebook and the ability to “follow” those filters on timelines are all that is missing. (And Facebook monetizing the use of those filters.)

Petition Mark Zuckerberg and Facebook for content filters today!

by Patrick Durusau at November 18, 2016 04:04 PM

Operating Systems Design and Implementation (12th USENIX Symposium)

Operating Systems Design and Implementation (12th USENIX Symposium) – Savannah, GA, USA, November 2-4, 2016.

Message from the OSDI ’16 Program Co-Chairs:

We are delighted to welcome to you to the 12th USENIX Symposium on Operating Systems Design and Implementation, held in Savannah, GA, USA! This year’s program includes a record high 47 papers that represent the strength of our community and cover a wide range of topics, including security, cloud computing, transaction support, storage, networking, formal verification of systems, graph processing, system support for machine learning, programming languages, troubleshooting, and operating systems design and implementation.

Weighing in at seven hundred and ninety-seven (797) pages, this tome will prove more than sufficient to avoid annual family arguments during the holiday season.

Not to mention this is an opportunity to hone your skills to a fine edge.

by Patrick Durusau at November 18, 2016 02:59 AM

November 17, 2016

Patrick Durusau

Monetizing Hate Speech and False News

Eli Pariser has started If you were Facebook, how would you reduce the influence of fake news? on GoogleDocs.

Out of the now seventeen pages of suggestions, I haven’t noticed any that promise a revenue stream to Facebook.

I view ideas to filter “false news” and/or “hate speech” that don’t generate revenue for Facebook as non-starters. I suspect Facebook does as well.

Here is a broad sketch of how Facebook can monetize “false news” and “hate speech,” all while shaping Facebook timelines to diverse expectations.

Monetizing “false news” and “hate speech”

Facebook creates user defined filters for their timelines. Filters can block other Facebook accounts (and any material from them), content by origin, word and I would suggest, regex.

User defined filters apply only to that account and can be shared with twenty other Facebook users.

To share a filter with more than twenty other Facebook users, Facebook charges an annual fee, scaled on the number of shares.

Unlike the many posts on “false news” and “hate speech,” being a filter isn’t free beyond twenty other users.
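As a minimal sketch of the filter idea above, and not anything Facebook actually offers: the code below assumes a post has an author, an origin and some text, and that a filter blocks by account, origin, word or regex. The names `Post`, `TimelineFilter` and `annual_fee`, and the flat per-share rate, are hypothetical; only the free limit of twenty shares comes from the proposal.

```python
import re
from dataclasses import dataclass, field

FREE_SHARE_LIMIT = 20  # from the proposal: sharing with up to twenty users is free


@dataclass
class Post:
    author: str  # account that posted the item
    origin: str  # where the content comes from, e.g. a linked domain (assumption)
    text: str


@dataclass
class TimelineFilter:
    """Hypothetical user-defined filter: block by account, origin, word or regex."""
    blocked_accounts: set = field(default_factory=set)
    blocked_origins: set = field(default_factory=set)
    blocked_words: set = field(default_factory=set)
    blocked_patterns: list = field(default_factory=list)  # regex strings

    def blocks(self, post):
        if post.author in self.blocked_accounts:
            return True
        if post.origin in self.blocked_origins:
            return True
        # Whole-word, case-insensitive match against the blocked word list.
        words = set(re.findall(r"\w+", post.text.lower()))
        if words & {w.lower() for w in self.blocked_words}:
            return True
        return any(re.search(p, post.text, re.IGNORECASE)
                   for p in self.blocked_patterns)

    def apply(self, timeline):
        """Return only the posts this filter lets through."""
        return [p for p in timeline if not self.blocks(p)]


def annual_fee(shares, rate_per_share=1.0):
    """Hypothetical pricing: free up to the limit, then scaled on total shares."""
    if shares <= FREE_SHARE_LIMIT:
        return 0.0
    return shares * rate_per_share
```

Whether the fee should scale on all shares or only on those beyond the first twenty is left open by the proposal; the sketch charges on all of them simply to keep the arithmetic obvious.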

Selling Subscriptions to Facebook Filters

Organizations can sell subscriptions to their filters. Facebook, which controls the authorization of the filters, contracts for a percentage of the subscription fee.

Pro tip: I would not invoke Facebook filters from the Washington Post and New York Times at the same time. It is likely they exclude each other as news sources.

Advantages of Monetizing Hate Speech and False News

First and foremost for Facebook, it gets out of the satisfying every point of view game. Completely. Users are free to define as narrow or as broad a point of view as they desire.

If you see something you don’t like, disagree with, etc., don’t complain to Facebook, complain to your Facebook filter provider.

That alone will expose the hidden agenda behind most, perhaps not all, of the “false news” filtering advocates. They aren’t concerned with what they are seeing on Facebook but they are very concerned with deciding what you see on Facebook.

For wannabe filters of what other people see, beyond twenty other Facebook users, that privilege is not free. Unlike the many proposals with as many definitions of “false news” as appear in Eli’s document.

It is difficult to imagine a privilege people would be more ready to pay for than the right to attempt to filter what other people see. Churches, social organizations, local governments, corporations, you name them and they will be lining up to create filter lists.

The financial beneficiary of the “drive to filter for others” is of course Facebook but one could argue the filter owners profit by spreading their worldview and the unfortunates that follow them, well, they get what they get.

Commercialization of Facebook filters, that is, selling subscriptions to Facebook filters, creates a new genre of economic activity and yet another revenue stream for Facebook. (That’s two up to this point if you are keeping score.)

It isn’t hard to imagine the Economist, Forbes, professional clipping services, etc., creating a natural extension of their filtering activities onto Facebook.

Conclusion: Commercialization or Unfunded Work Assignments

Preventing/blocking “hate speech” and “false news” for free has been, is and always will be a failure.

Changing Facebook infrastructure isn’t free, and creating revenue streams off of preventing/blocking “hate speech” and “false news” creates incentives for Facebook to make the necessary changes and for people to build filters off of which they can profit.

Not to mention that filtering enables everyone, including the alt-right, alt-left and the sane people in between, to create the Facebook of their dreams, and not be subject to the Facebook desired by others.

Finally, it gets Facebook and Mark Zuckerberg out of the fantasy island approach where they are assigned unpaid work by others. New York Times, Mark Zuckerberg Is in Denial. (It’s another “hit” piece by Zeynep Tufekci.)

If you know Mark Zuckerberg, please pass this along to him.

by Patrick Durusau at November 17, 2016 10:48 PM

Pentagon Says: Facts Don’t Matter (Pre-Trump)

Intel chairman: Pentagon plagiarized Wikipedia in report to Congress by Kristina Wong.

From the post:

The Pentagon submitted information plagiarized from Wikipedia to members of Congress, the chairman of the House Intelligence Committee said at a hearing Thursday.

Chairman Devin Nunes (R-Calif.) said on March 21, Deputy Defense Secretary Bob Work submitted a document to the chairmen of the House Intelligence, Armed Services, and Defense appropriations committees with information directly copied from Wikipedia, an online open-source encyclopedia.

The information was submitted in a document used to justify a determination that Croughton was the best location for a joint intelligence center with the United Kingdom, Nunes said. The determination was required by the 2016 National Defense Authorization Act.

If that weren’t bad enough, here’s the kicker:

Work said he still fulfilled the law by making a determination and that the plagiarized information had “no bearing” on that determination.

Do you read that to mean:

  1. Work made the determination
  2. The “made” determination was packed with facts to justify it

In that order?

Remarkably candid admission that Pentagon decisions are made and then those decisions are packed with facts to justify them.

Not particularly surprising to me.


by Patrick Durusau at November 17, 2016 09:02 PM

The new Tesseract package: High Quality OCR in R

The new Tesseract package: High Quality OCR in R by Jeroen Ooms.

From the post:

Optical character recognition (OCR) is the process of extracting written or typed text from images such as photos and scanned documents into machine-encoded text. The new rOpenSci package tesseract brings one of the best open-source OCR engines to R. This enables researchers or journalists, for example, to search and analyze vast numbers of documents that are only available in printed form.

People looking to extract text and metadata from pdf files in R should try our pdftools package.

Reading too quickly at first I thought I had missed a new version of Tesseract (tesseract-ocr Github), an OCR program that I use on a semi-regular basis.

Reading a little slower, ;-), I discovered Ooms is describing a new package for R, which uses Tesseract for OCR.

This is great news but be aware that Tesseract (whether called by an R package or standalone) can generate a large amount of output in a fairly short period of time.

One of the stumbling blocks of OCR is the labor intensive process of cleaning up the inevitable mistakes.

Depending on how critical accuracy is for searching, for example, you may choose to verify and clean only quotes for use in other publications.

Best to make those decisions up front and not be faced with a mountain of output that isn’t useful unless and until it has been corrected.

by Patrick Durusau at November 17, 2016 06:38 PM

Mute Account vs. Mute Word/Hashtag – Ineffectual Muting @Twitter


I mentioned yesterday the distinction between muting an account versus the new muting by word or #hashtag at Twitter.

Take a moment to check my sources at Twitter support to make sure I have the rules correctly stated. I’ll wait.

(I’m not a journalist but readers should be able to satisfy themselves that claims I make are at least plausible.)

No feedback from Twitter on the don’t appear in your timeline vs. do appear in your timeline distinction.

Why would I want to only block notifications of what I think of as hate speech and still have those tweets in my timeline?

Then it occurred to me:

If you can block tweets from appearing in your timeline by word or hashtag, you can block advertising tweets from appearing in your timeline.

You cannot effectively mute hate speech @Twitter because you could also mute advertising.

What about it Twitter?

Must feminists, people of color, minorities of all types be subjected to hate speech in order to preserve your revenue streams?

Not that I object to Twitter having revenue streams from advertising but it needs to be more sophisticated than the Nigerian spammer model now in use. Charge a higher price for targeted advertising that users are unlikely to block.

For example, I would be highly unlikely to block ads for cs theory/semantic integration tomes. On the other hand, I would follow a mute list that blocked histories of famous cricket matches. (Apologies to any cricket players in the audience.)

In my post: Twitter Almost Enables Personal Muting + Roving Citizen-Censors I offer a solution that requires only minor changes based on data Twitter already collects plus regexes for muting. It puts what you see entirely in the hands of users.

That enables Twitter to get out of the censorship business altogether, something it doesn’t do well anyway, and puts users in charge of what they see. A win-win from my perspective.
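To make the regex muting idea concrete, here is a rough sketch of user-side muting applied to the timeline itself rather than only to notifications. It is not Twitter’s API. The function names `compile_mutes`, `is_muted` and `visible` are hypothetical, and tweets are reduced to plain strings for illustration.

```python
import re


def compile_mutes(words=(), hashtags=(), regexes=()):
    """Build case-insensitive patterns from words, hashtags and raw regexes."""
    patterns = []
    for w in words:
        # Whole-word match, so muting "cricket" leaves "crickets" alone.
        patterns.append(r"\b" + re.escape(w) + r"\b")
    for h in hashtags:
        # Match the exact hashtag, not a longer tag that merely starts with it.
        patterns.append(r"(?<!\w)#" + re.escape(h.lstrip("#")) + r"\b")
    patterns.extend(regexes)
    return [re.compile(p, re.IGNORECASE) for p in patterns]


def is_muted(text, mutes):
    return any(m.search(text) for m in mutes)


def visible(tweets, mutes):
    """Return the tweets that would still appear in the user's timeline."""
    return [t for t in tweets if not is_muted(t, mutes)]
```

A mute list built as `compile_mutes(words=["cricket"], hashtags=["#ad"])` is just data, so it could be published and followed by other users, which is the “mute list” idea mentioned above.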

by Patrick Durusau at November 17, 2016 03:55 PM

Alt-right suspensions lay bare Twitter’s consistency [hypocrisy] problem

Alt-right suspensions lay bare Twitter’s consistency problem by Nausicaa Renner.

From the post:

TWITTER SUSPENDED A NUMBER OF ACCOUNTS associated with the alt-right, USA Today reported this morning. This move was bound to be divisive: While Twitter has banned and suspended users in the past (prominently, Milo Yiannopoulos for incitement), USA Today points out the company has never suspended so many at once—at least seven in this case. Richard Spencer, one of the suspended users and prominent alt-righter, also had a verified account on Twitter. He claims, “I, and a number of other people who have just got banned, weren’t even trolling.”

If this is true, it would be a powerful political statement, indeed. As David Frum notes in The Atlantic, “These suspensions seem motivated entirely by viewpoint, not by behavior.” Frum goes on to argue that a kingpin strategy on Twitter’s part will only strengthen the alt-right’s audience. But we may never know Twitter’s reasoning for suspending the accounts. Twitter declined to comment on its moves, citing privacy and security reasons.

(emphasis in original)

Contrary to the claims of the Southern Poverty Law Center (SPLC) to Twitter, these users may not have been suspended for violating Twitter’s terms of service, but for their viewpoints.

Like the CIA, FBI and NSA, Twitter uses secrecy to avoid accountability and transparency for its suspension process.

The secrecy – avoidance of accountability/transparency pattern is one you should commit to memory. It is quite common.

Twitter needs to develop better muting options for users and abandon account suspension (save on court order) altogether.

by Patrick Durusau at November 17, 2016 03:10 PM

November 16, 2016

Patrick Durusau

XML Prague 2017, February 9-11, 2017 – Registration Opens!

XML Prague 2017, February 9-11, 2017

I mentioned XML Prague 2017 last month and now, after the election of Donald Trump as president of the United States, registration for the conference opens!


Maybe. ;-)

Even if you are returning to the U.S. after the conference, XML Prague will be a welcome respite from the tempest of news coverage of what isn’t known about the impending Trump administration.

At 120 Euros for three days, this is a great investment both professionally and emotionally.


by Patrick Durusau at November 16, 2016 08:22 PM

The Amnesic Incognito Live System (Tails) 2.7

The Amnesic Incognito Live System (Tails) 2.7

The Amnesic Incognito Live System (Tails) is a Debian-based, live distribution with the goal of providing Internet anonymity for its users. The distribution accomplishes this by directing Internet traffic through the Tor network and by providing built-in tools for protecting files and scrubbing away meta data. The project’s latest release mostly focuses on fixing bugs and improving security: “Tails 2.7 is out. This release fixes many security issues and users should upgrade as soon as possible. New features: ship LetsEncrypt intermediate SSL certificate so that our tools are able to authenticate our website when its certificate is updated. Upgrades and changes: Tor, Tor Browser 6.0.6, Linux kernel 4.7, Icedove 45.4.0. Fixed problems: Synaptic installs packages with the correct architecture; set default spelling to en_US in Icedove. Known issues: users setting their Tor Browser security slider to High will have to click on a link to see the result of the search they done with the search box.” Additional information on Tails 2.7 can be found in the project’s release notes. A list of issues fixed in the 2.7 release can be found in the list of former security issues. Download: tails-i386-2.7.iso (1,113MB, signature, pkglist). Also available from OSDisc.

An essential part of your overall cybersecurity stance.

All releases are date/time sensitive.

BEFORE installing this release, even later today, check for a later release: Tails.

Checking for the latest release only takes seconds and is a habit that will help you avoid patched security holes.

by Patrick Durusau at November 16, 2016 08:10 PM

PoisonTap – Wishlist 2016

PoisonTap Steals Cookies, Drops Backdoors on Password-Protected Computers by Chris Brook.

From the post:

Even locked, password-protected computers are no rival for Samy Kamkar and his seemingly endless parade of gadgets.

His latest, PoisonTap, is a $5 Raspberry Pi Zero device running Node.js that’s retrofitted to emulate an Ethernet device over USB. Assuming a victim has left their web browser open, once plugged in to a machine, the device can quietly fetch HTTP cookies and sessions from millions of websites, even if the computer is locked.

If that alone doesn’t sound like Mr. Robot season three fodder, the device can also expose the machine’s internal router and install persistent backdoors, guaranteeing an attacker access long after they’ve removed the device from a USB slot.

“[The device] produces a cascading effect by exploiting the existing trust in various mechanisms of a machine and network, including USB, DHCP, DNS, and HTTP, to produce a snowball effect of information exfiltration, network access and installation of semi-permanent backdoors,” Kamkar said Wednesday in a writeup of PoisonTap.

Opportunity may only knock once.

Be prepared by carrying one or more PoisonTaps along with a bootable USB stick.

by Patrick Durusau at November 16, 2016 07:50 PM

“…Fake News Is Not the Problem”

According to Snopes, Fake News Is Not the Problem by Brooke Binkowski.

From the post:

Take it from the internet’s chief myth busters: The problem is the failing media.

This is the state of truth on the internet in 2016, now that it is as easy for a Macedonian teenager to create a website as it is for The New York Times, and now that the information most likely to find a large audience is that which is most alarming, not most correct. In the wake of the election, the spread of this kind of phony news on Facebook and other social media platforms has come under fire for stoking fears and influencing the election’s outcome. Both Facebook and Google have taken moves to bar fake news sites from their advertising platforms, aiming to cut off the sites’ sources of revenue.

But as managing editor of the fact-checking site Snopes, Brooke Binkowski believes Facebook’s perpetuation of phony news is not to blame for our epidemic of misinformation. “It’s not social media that’s the problem,” she says emphatically. “People are looking for somebody to pick on. The alt-rights have been empowered and that’s not going to go away anytime soon. But they also have always been around.”

The misinformation crisis, according to Binkowski, stems from something more pernicious. In the past, the sources of accurate information were recognizable enough that phony news was relatively easy for a discerning reader to identify and discredit. The problem, Binkowski believes, is that the public has lost faith in the media broadly — therefore no media outlet is considered credible any longer. The reasons are familiar: as the business of news has grown tougher, many outlets have been stripped of the resources they need for journalists to do their jobs correctly. “When you’re on your fifth story of the day and there’s no editor because the editor’s been fired and there’s no fact checker so you have to Google it yourself and you don’t have access to any academic journals or anything like that, you will screw stories up,” she says.

Sadly Binkowski’s debunking of the false/fake news meme doesn’t turn up on

That might make it more convincing to mainstream media who have seized upon false/fake news to excuse their lack of credibility with readers.

Please share the Binkowski post with your friends, especially journalists.

by Patrick Durusau at November 16, 2016 06:52 PM