Planet Topic Maps

September 26, 2016

Patrick Durusau

Value-Add Of Wikileaks Hillary Clinton Email Archive?

I was checking Wikileaks today for any new document drops on Hillary Clinton, but only found:

WikiLeaks offers award for #LabourLeaks

Trade in Services Agreement

Assange Medical and Psychological Records

The lesson from the last item is to always seek asylum in a large embassy, preferably one with a pool. You can search Embassies by which country an embassy represents and where it is located. I did not see an easy way to search by size and accommodations.

Oh, not finding any new data on Hillary Clinton, I checked the Hillary Clinton Email Archive at Wikileaks:


Compare that to the State Department FOIA server for Clinton_Email:


Do you see a value-add to Wikileaks re-posting the State Department’s posting of Hillary’s emails?

If yes, please report in comments below the value-add you see. (Thanks in advance.)

If not, what do you think would be a helpful value-add to the Hillary Clinton emails? (Suggestions deeply appreciated.)

by Patrick Durusau at September 26, 2016 05:20 PM

20 Year Lesson On Operational Security

Reports on Ardit Ferizi share a common lead:

A computer hacker who allegedly helped the terrorist organization ISIS by handing over data for 1,351 US government and military personnel has been sentenced to 20 years in a U.S. prison. (Hacker Who Helped ISIS to Build ‘Hit List’ Of US Military Personnel Jailed for 20 Years)

An ISIS supporter who hit the headlines after breaking into computer systems in order to steal and leak the details of military personnel has been awarded a sentence of 20 years in prison for his crimes. (Hacker who leaked US military ‘kill list’ for ISIS sent behind bars)

A 20-year-old computer science student from Kosovo described by the Justice Department as “the first terrorist hacker convicted in the United States” was sentenced Friday to two decades in prison for providing the Islamic State with a “kill list” containing the personal information of roughly 1,300 U.S. military members and government employees. (Islamic State hacker sentenced for assisting terrorist group with ‘kill list’)

Missing from those leads (and most stories) is that bad operational security led to Ardit Ferizi’s arrest and conviction.

Charlie Osborne reports in Hacker who leaked US military ‘kill list’ for ISIS sent behind bars:

Ferizi gave this information to the terrorist organization in order for ISIS to “hit them hard” and did not bother to conceal his activity — neither disguising his IP address nor using a fake name on social media — which made it easier for law enforcement to track his activities.

Charlie also reports the obligatory blustering of the Assistant Attorney General:

“This case represents the first time we have seen the very real and dangerous national security cyber threat that results from the combination of terrorism and hacking. This was a wake-up call not only to those of us in law enforcement, but also to those in private industry. This successful prosecution also sends a message to those around the world that, if you provide material support to designated foreign terrorist organizations and assist them with their deadly attack planning, you will have nowhere to hide.

We will reach half-way around the world if necessary to hold accountable those who engage in this type of activity.”

A “wake-up call” about computer science students with histories of drug abuse and mental health issues, who don’t practice even minimal operational security, yet who are “…very real and dangerous national security cyber threat[s]…”

You bet.

A better lead for this story would be:

Failure to conceal his IP and identity online nets Kosovo student a 20-year prison sentence in overreaching US prosecution, presided over by callous judge.

Concealment of IP and identity should be practiced until it is second nature.

No identification = No prosecution.

by Patrick Durusau at September 26, 2016 12:33 PM

Colin Powell Email Files, posted on September 14, 2016, is a set of emails to and from Colin Luther Powell.

From the homepage for those leaked emails:

Colin Luther Powell is an American statesman and a retired four-star general in the United States Army. He was the 65th United States Secretary of State, serving under U.S. President George W. Bush from 2001 to 2005, the first African American to serve in that position. During his military career, Powell also served as National Security Advisor (1987–1989), as Commander of the U.S. Army Forces Command (1989) and as Chairman of the Joint Chiefs of Staff (1989–1993), holding the latter position during the Persian Gulf War. Born in Harlem as the son of Jamaican immigrants, Powell was the first, and so far the only, African American to serve on the Joint Chiefs of Staff, and the first of two consecutive black office-holders to serve as U.S. Secretary of State.

The leaked emails start in June of 2014 and end in August of 2016.

Access to the emails is by browsing and/or full text searching.

Try your luck at finding Powell’s comments on Hillary Clinton or former Vice President Cheney while searching one chunk of emails at a time.

I appreciate and admire DCLeaks for taking the lead in posting this and similar materials. And I hope they continue to do so in the future.

However, the access offered reduces a good leak to a random trickle.

This series will use the Colin Powell emails to demonstrate better leaking practices.

Coming Monday, September 26, 2016 – Bulk Access to the Colin Powell Emails.

by Patrick Durusau at September 26, 2016 01:43 AM

September 24, 2016

Patrick Durusau

What are we allowed to say? [Criticism]

What are we allowed to say? by David Bromwich.

From the post:

Free speech is an aberration – it is best to begin by admitting that. In most societies throughout history and in all societies some of the time, censorship has been the means by which a ruling group or a visible majority cleanses the channels of communication to ensure that certain conventional practices will go on operating undisturbed. It is not only traditional cultures that see the point of taboos on speech and expressive action. Even in societies where faith in progress is part of a common creed, censorship is often taken to be a necessary means to effect improvements that will convey a better life to all. Violent threats like the fatwa on Salman Rushdie and violent acts like the assassinations at Charlie Hebdo remind us that a militant religion is a dangerous carrier of the demand for the purification of words and images. Meanwhile, since the fall of Soviet communism, liberal bureaucrats in the North Atlantic democracies have kept busy constructing speech codes and guidelines on civility to soften the impact of unpleasant ideas. Is there a connection between the two?

Probably an inbred trait of human nature renders the attraction of censorship perennial. Most people (the highly literate are among the worst) believe that what is good for them will be good for others. Besides, a regime of censorship must claim to derive its authority from settled knowledge and not opinion. Once enforcement and exclusion have done their work, this assumption becomes almost irresistible; and it is relied on to produce a fortunate and economical result: self-censorship. We stay out of trouble by gagging ourselves. Among the few motives that may strengthen the power of resistance is the consciousness of having been deeply wrong oneself, either regarding some abstract question or in personal or public life. Another motive of resistance occasionally pitches in: a radical, quasi-physical horror of seeing people coerce other people without having to supply reasons. For better or worse, this second motive is likely to be mixed with misanthropy.

As far back as one can trace the vicissitudes of public speech and its suppression, the case for censorship seems to have begun in the need for strictures against blasphemy. The introductory chapter of Blasphemy, by the great American legal scholar Leonard Levy, covers ‘the Jewish trial of Jesus’; it is followed in close succession, in Levy’s account, by the Christian invention of the concept of heresy and the persecution of the Socinian and Arminian heretics and later of the Ranters, Antinomians and early Quakers. After an uncertain interval of state prosecutions and compromises in the 19th century, Levy’s history closes at the threshold of a second Enlightenment in the mid-20th: the endorsement by the North Atlantic democracies of a regime of almost unrestricted freedom of speech and expression.
… (emphasis in original)

Bromwich’s essay runs some twenty pages in print, so refresh your coffee before starting!

It is a “must” read but not without problems.

The focus on Charlie Hebdo and The Satanic Verses, gives readers a “safe context” in which to consider the issue of “free speech.”

The widespread censorship of “jihadist” speech, which for the most part passes unnoticed and without even a cursory nod toward “free speech,” is a more current and confrontational example.

Does Bromwich use safe examples to “stay out of trouble by gagging [himself]?”

Hundreds of thousands have been silenced by Western tech companies. Yet in an essay on freedom of speech, they don’t merit a single mention.

The failure to mention the largest current example of anti-freedom of speech in a freedom of speech essay should disturb every attentive reader.

Disturb them to ask: What of freedom of speech today? Not as a dry and desiccated abstraction but freedom of speech in the streets.

Where is the freedom of speech to incite others to action? Freedom of speech to oppose corrupt governments? Freedom of speech to advocate harsh measures against criminal oppressors?

The invocation of Milton and Mill provides a groundwork for confronting government-urged, if not government-required, censorship, but the opportunity is wasted on the vagaries of academic politics.

Freedom of speech is important on college campuses but people are dying where freedom of speech is being denied. To showcase the former over the latter is a form of censorship itself.

If the question is censorship, as Milton and Mill would agree, the answer is no. (full stop)

PS: For those who raise the bugaboo of child pornography, there are laws against the sexual abuse of children, laws that raise no freedom of speech issues.

Possession of child pornography is attacked because it gives the appearance of meaningful action, while allowing the cash flow from its production and distribution to continue unimpeded.

by Patrick Durusau at September 24, 2016 09:37 PM

Police use-of-force data is finally coming to light (Evidence Based Citizen Safety)

Police use-of-force data is finally coming to light by Megan Rose Dickey.

From the post:


Since 2011, less than 3% of the country’s 18,000 state and local police agencies have reported information about police-involved shootings of citizens. That’s because there’s no mandatory federal requirement to do so. There is, however, a mandate in California (Assembly Bill 71) for all police departments to report their use of force incidents that happened after Jan. 1, 2016 by Jan. 1, 2017.

Winds of Data Change:

With URSUS, California police departments can use the open-source platform to collect and report use-of-force data, in the cases of serious injuries, to the CA DOJ. Back in February, the CA DOJ unveiled a revamped version of the OpenJustice platform featuring data around arrest rates, deaths in custody, arrest-related deaths and law enforcement officers assaulted on the job.

Unlike the first version of OpenJustice, the current platform makes it possible to break down data by specific law enforcement agencies. As URSUS collects data about police use-of-force, OpenJustice will publish that information in its database starting early next year.

Here’s an overview of how the system works:

Evidence Based Citizen Safety

In the analysis, Campaign Zero found that only 21 of the 91 police departments reviewed explicitly prohibit officers from using chokeholds. Even more, the average police department reviewed has only adopted three of the eight policies identified that could prevent unnecessary civilian deaths. Not one of the police departments reviewed has implemented all eight.


According to Campaign Zero’s analysis, if the police departments reviewed were to implement all eight of the use-of-force restrictions, there would be a 54% reduction in killings for the average police department.

With the CA DOJ’s new police use-of-force data system, plus initiatives driven by non-profit organizations and the media, we’re definitely moving in the right direction when it comes to transparency around policing. But if we want real change, the rest of the country’s law enforcement agencies are going to need to get on board. If the PRIDE Act passes, police departments nationwide will not only have to make their use-of-force policies publicly available, but also have to report police use-of-force incidents that result in deaths of civilians. But while the government is stepping up its game around policing data, there is still a need for community-driven initiatives that track police killings of civilians.

Greater transparency around policing leads to fewer civilian deaths (those folks the police are sworn to serve) and can lead to greater trust/cooperation between the police and the communities they serve. Which means better police work and less danger/stress for police officers.

That’s a win-win situation.

But it starts with data transparency for police activities.

How transparent is your local police department?

Waiting for it to be required by law delays better service to the community and better policing.

Is that a goal of your local police department? You might better ask.

by Patrick Durusau at September 24, 2016 07:28 PM

XQuery Working Group (Vanderbilt)

XQuery Working Group – Learn XQuery in the Company of Digital Humanists and Digital Scientists

From the webpage:

We meet from 3:00 to 4:30 p.m. on most Fridays in 800FA of the Central Library. Newcomers are always welcome! Check the schedule below for details about topics. Also see our Github repository for code samples. Contact Cliff Anderson with any questions or see the FAQs below.

Good thing we are all mindful of the distinction between the W3C XML Query Working Group and the XQuery Working Group (Vanderbilt).

Otherwise, you might need a topic map to sort out casual references. ;-)

Even if you can’t attend meetings in person, support this project by Cliff Anderson.

by Patrick Durusau at September 24, 2016 06:54 PM

Stress-Free #SkippingTheDebate Parties

Unlike the hackers-only show in Snow Crash, the first presidential debate of 2016 is projected to make a record number of viewers dumber.

Well, to be fair, Stephen Battaglio, reports:

Millions of viewers are also expected to watch online as many websites and social media platforms, such as Facebook and Twitter, will offer free video streaming of the event.

“This one seems to have aroused the greatest attention and more debate-before-the-debate than any of them,” said Newton Minow, vice chairman of the Commission on Presidential Debates, whose involvement goes back to the first historic televised showdown between John F. Kennedy and Richard Nixon in 1960.

The reason viewing levels may skyrocket? Larry Sabato, director of the Center for Politics at the University of Virginia, cites the unpredictability of Trump, whose appearances in the Republican primary debates set audience records on four different cable networks over the past year.

“It’s the same reason why this election is different than all other elections,” Sabato said. “People will tune in to see the car crash. Trump’s gotten a big audience from the beginning because you knew you’d either see a fender bender or a fatality. This is the big stage and the first one-on-one debate he’s done.”

Rather than watch a moderator and the candidates indulge in the fiction that any substantive discussion of national or international issues can occur in ninety minutes, hold a #SkippingTheDebate party!

Here’s how:

  1. Invite your friends over for a #SkippingTheDebate Party
  2. Have ballgame-style snacks and drinks
  3. Have a minimum of 5 back issues of Mad Magazine for each guest
  4. Distribute the Mad magazines; after each 10-minute reading interval, guests may share their favorite comment or observation, discuss, and repeat

Unlike debate watching parties, your guests will be amused, have more quips in their quivers, have enjoyed each others company, and most importantly, they will not be dumber for the experience.

I do have data to demonstrate that Mad Magazine is the right choice for your #SkippingTheDebate party:



Mad didn’t quite capture the dried apricot complexion of Trump and Hillary looks, well, younger, but even Mad can be kind.

by Patrick Durusau at September 24, 2016 03:11 PM

Avoid FBI Demands – Make Your Product Easily Crackable

Joshua Kopstein reports that Apple has discovered a way to dodge future requests for assistance from the FBI.

Make backups of the iOS 10 easily crackable.

From iOS 10 Has a ‘Severe’ Security Flaw, Says iPhone-Cracking Company:

Apple has introduced a “severe” flaw in its newly-released iOS 10 operating system that leaves backup data vulnerable to password-cracking tools, according to researchers at a smartphone forensics company that specializes in unlocking iPhones.

In a blog post published Friday by Elcomsoft, a Russian company that makes software to help law enforcement agencies access data from mobile devices, researcher Oleg Afonin showed that changes in the way local backup files are protected in iOS 10 have left backups dramatically more susceptible to password-cracking attempts than those produced by previous versions of Apple’s operating system.

Specifically, the company found that iOS 10 backups saved locally to a computer via iTunes allow password-cracking tools to try different password combinations at a rate of 6,000,000 attempts per second, more than 40 times faster than with backups created by iOS 9. Elcomsoft says this is due to Apple implementing a weaker password verification method than the one protecting backup data in previous versions. That means that cops and tech-savvy criminals could much more quickly and easily gain access to data from locally-stored iOS 10 backups than those produced by older versions.

After the NSA sat on a Cisco vulnerability for a decade or so, you have to wonder about the motives of Elcomsoft for quick disclosure.

Perhaps they wanted to take away an easy win from their potential competitors?

In any event, be aware that your iOS 10 has a vulnerability the size of a Mack truck.
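The practical effect of the weaker verification method is easy to estimate with back-of-the-envelope arithmetic. The sketch below assumes the reported rates (6,000,000 guesses/second for iOS 10, roughly 40x slower for iOS 9) and a hypothetical 8-character alphanumeric password:

```python
# Rough worst-case brute-force times for local iOS backups,
# using the guess rates reported by Elcomsoft.
def seconds_to_exhaust(alphabet_size: int, length: int, rate: float) -> float:
    """Worst-case time to try every password of a given length."""
    return alphabet_size ** length / rate

ios10_rate = 6_000_000.0
ios9_rate = ios10_rate / 40  # iOS 9 was roughly 40x slower

# An 8-character password drawn from 62 alphanumerics:
days10 = seconds_to_exhaust(62, 8, ios10_rate) / 86400
days9 = seconds_to_exhaust(62, 8, ios9_rate) / 86400
print(f"iOS 10: {days10:.0f} days, iOS 9: {days9:.0f} days")
```

On those assumptions the entire keyspace falls in under fifteen months on one machine, versus several decades for iOS 9 backups.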

For any Russian readers, that’s roughly equivalent to:


While looking for this image, I saw a number of impressive Russian trucks!

by Patrick Durusau at September 24, 2016 02:23 AM

14 free digital tools that any newsroom can use

14 free digital tools that any newsroom can use by Sara Olstad.

From the post:

ICFJ’s Knight Fellows are global media innovators who foster news innovation and experimentation to deepen coverage, expand news delivery and better engage citizens. As part of their work, they’ve created tools that they are eager to share with journalists worldwide.

Their projects range from Push, a mobile app for news organizations that don’t have the time, money or resources to build their own, to Salama, a tool that assesses a reporter’s risk and recommends ways to stay safe. These tools and others developed by Knight Fellows can help news organizations everywhere find stories in complex datasets, better distribute their content and keep their journalists safe from online and physical security threats.

As part of the 2016 Online News Association conference, try out these 14 digital tools that any newsroom can use. If you adopt any of these tools or lead any new projects inspired by them, tweet about it to @ICFJKnight.

I was misled by the presentation of the “14 free digital tools.”

The box where African Network of Centers for Investigative Reporting (ANCIR) and Aleph appear has a scroll marker on the right hand side.

I’m not sure why I missed it or why the embedding of a scrolling box is considered good page design.

But the tools themselves merit your attention.


by Patrick Durusau at September 24, 2016 01:54 AM

September 23, 2016

Patrick Durusau

Tor is released, with important fixes

Tor is released, with important fixes

Source available today, packages over the next week.

Privacy is an active, not passive stance.

Steps to take:

  1. Upgrade your Tor software.
  2. Help someone upgrade their Tor software.
  3. Introduce one new person to Tor.

If you take those steps with every upgrade, Tor will spread more quickly.

I have this vision of James Clapper (Director of National Intelligence), waking up in a cold sweat as darkness spreads across a visualization of the Internet in real time.

Just a vision but an entertaining one.

by Patrick Durusau at September 23, 2016 09:49 PM

5 lessons on the craft of journalism from Longform podcast

5 lessons on the craft of journalism from Longform podcast by Joe Freeman.

From the post:

AT FIRST I WAS RELUCTANT to dive into the Longform podcast, a series of interviews with nonfiction writers and journalists that recently produced its 200th episode. The reasons for my wariness were petty. What sane freelancer wants to listen to highly successful writers and editors droning on about their awards and awesome careers? Not this guy! But about a year ago, I succumbed, and quickly became a thankful convert. The more I listened, the more I realized that the show, started in 2012 on the website and produced in collaboration with The Atavist, was a veritable goldmine of information. It’s almost as if the top baseball players in the country sat down every week and casually explained how to hit home runs.

Whether they meant to or not, the podcast’s creators and interviewers—Aaron Lammer, Max Linsky, and Evan Ratliff—have produced a free master class on narrative reporting, with practitioners sharing tips and advice about the craft and, crucially, the business. As a journalist, I’ve learned a lot listening to the podcast, but a few consistent themes emerge that I have distilled into five takeaways from specific interviews.

(emphasis in original)

I’m impressed with Joe’s five takeaways but as I sit here repackaging leaked data, there is one common characteristic I would emphasize:

They all involve writing!

That is the actual production of content.

Not plans for content.

Not models for content.

Not abstractions for content.


Not to worry, I intend to keep my tools/theory edge but in addition to adding Longform podcast to my listening list, I’m going to try to produce more data content as well.

I started off with that intention using XQuery at the start of this year, a theme that is likely to re-appear in the near future.


by Patrick Durusau at September 23, 2016 09:37 PM

Are You A Closet Book Burner? Google Crowdsources Censorship!

YouTube is cleaning up and it wants your help! by Lisa Vaas.

From the post:

Google is well aware that the hair-raising comments of YouTube users have turned the service into a fright fest.

It’s tried to drain the swamp. In February 2015, for example, it created a kid-safe app that would keep things like, oh, say, racist/anti-Semitic/homophobic comments or zombies from scaring the bejeezus out of young YouTubers.

Now, Google’s trying something new: it’s soliciting “YouTube Heroes” to don their mental hazmat suits and dive in to do some cleanup.

You work hard to make YouTube better for everyone… and like all heroes, you deserve a place to call home.

Google has renamed the firemen of Fahrenheit 451 to YouTube Heroes.

Positive names cannot change the fact that censors by any name, are in fact just that, censors.

Google has taken censorship to a new level in soliciting the participation of the close-minded, the intolerant, the bigoted, the fearful, etc., from across the reach of the Internet, to censor YouTube.

Google does own YouTube and if it wants to turn it into a pasty gray pot of safe gruel, it certainly can do so.

As censors flood into YouTube, free thinkers, explorers, users who prefer new ideas over pablum, need to flood out of YouTube.

Ad revenue needs to fall as this ill-advised “come be a YouTube censor” campaign succeeds.

Only falling ad revenue will stop this foray into the folly of censorship by Google.

First steps:

  1. Don’t post videos to Google.
  2. Avoid watching videos on Google as much as possible.
  3. Urge others not to post to or use YouTube.
  4. Post videos to other venues.
  5. Speak out against YouTube censorship.
  6. Urge YouTube authors to post/repost elsewhere.

“Safe place” means a place safe from content control at the whim and caprice of governments, corporations and even other individuals.

What’s so hard to “get” about that?

by Patrick Durusau at September 23, 2016 05:52 PM

Hacker-Proof Code Confirmed [Can Liability Be Far Behind?]

Hacker-Proof Code Confirmed by Kevin Hartnett.

From the post:

In the summer of 2015 a team of hackers attempted to take control of an unmanned military helicopter known as Little Bird. The helicopter, which is similar to the piloted version long-favored for U.S. special operations missions, was stationed at a Boeing facility in Arizona. The hackers had a head start: At the time they began the operation, they already had access to one part of the drone’s computer system. From there, all they needed to do was hack into Little Bird’s onboard flight-control computer, and the drone was theirs.

When the project started, a “Red Team” of hackers could have taken over the helicopter almost as easily as it could break into your home Wi-Fi. But in the intervening months, engineers from the Defense Advanced Research Projects Agency (DARPA) had implemented a new kind of security mechanism — a software system that couldn’t be commandeered. Key parts of Little Bird’s computer system were unhackable with existing technology, its code as trustworthy as a mathematical proof. Even though the Red Team was given six weeks with the drone and more access to its computing network than genuine bad actors could ever expect to attain, they failed to crack Little Bird’s defenses.

“They were not able to break out and disrupt the operation in any way,” said Kathleen Fisher, a professor of computer science at Tufts University and the founding program manager of the High-Assurance Cyber Military Systems (HACMS) project. “That result made all of DARPA stand up and say, oh my goodness, we can actually use this technology in systems we care about.”

Reducing the verification requirement to a manageable size appears to be the key to DARPA’s success.

That is, rather than verifying the entire program, only critical parts, such as those excluding hackers, need to be verified.

If this spreads, failure to formally verify critical parts of software would be a natural place to begin imposing liability for poorly written code.
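The underlying move, stating a property about a small critical routine and then discharging it over every possible input, can be sketched in a few lines. This is a toy illustration of the flavor only; HACMS relies on machine-checked proofs, not brute-force checking:

```python
# Exhaustively check a small "critical" routine over its entire
# input space: the weakest cousin of formal verification.
def saturating_add_u8(a: int, b: int) -> int:
    """Add two unsigned bytes, clamping at 255 instead of wrapping."""
    return min(a + b, 255)

def verify() -> bool:
    """Check the safety property for every possible pair of inputs."""
    for a in range(256):
        for b in range(256):
            r = saturating_add_u8(a, b)
            # Property: result in range, never below either operand,
            # and exact unless saturation occurred.
            if not (0 <= r <= 255 and r >= max(a, b)
                    and (r == a + b or r == 255)):
                return False
    return True

print(verify())
```

For 8-bit inputs the check is complete; for real flight-control code the input space is unbounded, which is exactly why proof, rather than enumeration, is required.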

PS: Would formal proof of data integration be a value-add?

by Patrick Durusau at September 23, 2016 01:56 AM

Cisco Hunting Report – ISAKMP – 859,233 Vulnerable IPs

The Vulnerable ISAKMP Scanning Project, courtesy of ShadowServer reports:

This scan is looking for devices that contain a vulnerability in their IKEv1 packet processing code that could allow an unauthenticated, remote attacker to retrieve memory contents, which could lead to the disclosure of confidential information. More information on this issue can be found on Cisco’s site at:

The goal of this project is to identify the vulnerable systems and report them back to the network owners for remediation.

Statistics on current run

859,233 distinct IPs have responded as vulnerable to our ISAKMP probe.

(emphasis in the original)

If visuals help:



I trust your map reading skills are sufficient to conclude that ISAKMP vulnerabilities aren’t common in Iceland and northern Finland. There are more fertile areas for exploration.


You can see other land masses or all vulnerable devices.

Is anyone selling ISAKMP scan data?

That would be valuable intel.

Imagine converting it into domain names so c-suite types could cross-check reassurances from their IT departments.
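A hypothetical sketch of that conversion, using reverse DNS from the Python standard library (the function name and approach are my assumption, not anything the scanning project publishes):

```python
import socket

# Map scanned IPs to PTR hostnames so results can be matched
# against an organization's own domain names. Many addresses
# have no PTR record, so failures are recorded rather than raised.
def to_domains(ips):
    """Return {ip: hostname or None} for a list of IP strings."""
    out = {}
    for ip in ips:
        try:
            out[ip] = socket.gethostbyaddr(ip)[0]
        except (socket.herror, socket.gaierror, OSError):
            out[ip] = None
    return out
```

Feed it the vulnerable-IP list for your netblocks and grep the hostnames for your corporate domain.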

by Patrick Durusau at September 23, 2016 01:15 AM

September 22, 2016

Patrick Durusau

Apache Lucene 6.2.1 and Apache Solr 6.2.1 Available [Presidential Data Leaks]

Lucene can be downloaded from

Solr can be downloaded from

If you aren’t on Lucene/Solr 6.2.1 yet, here’s your chance to grab the latest bug fixes!

Data leaks will accelerate as the US presidential election draws to a close.

What’s your favorite tool for analysis and delivery of data dumps?


by Patrick Durusau at September 22, 2016 03:55 PM

Google Allo – Goodbye!

Google Allo: Don’t use it, says Edward Snowden by Liam Tung.

From the post:

Google’s Allo messaging app and its Assistant bot have finally arrived, but Allo has been slammed for reneging on a promise that it would, by default, make it more difficult to spy on.

Because of the missing privacy feature, NSA-contractor-turned-whistleblower Edward Snowden’s first take of Allo after yesterday’s US launch is that it’s just a honeypot for surveillance.

The main complaints are that security is off by default and that chat logs are stored until deleted by users.

Google made a conscious choice on both of those features.

Now is your opportunity to make a conscious choice about Allo. Goodbye!

Don’t be misled into thinking end-to-end encryption ends the danger from preserved chat logs.

Intelligence agencies have long argued knowing who calls who is more important than the content of phone calls. Same is true for chats.
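A minimal sketch of why retained logs matter, using hypothetical metadata records (the names and record format are invented for illustration):

```python
from collections import Counter

# Hypothetical chat metadata: (sender, recipient, timestamp).
# Even with message content encrypted, retained logs like these
# reveal the social graph: who talks to whom, and how often.
logs = [
    ("alice", "bob", "2016-09-21T10:00"),
    ("alice", "bob", "2016-09-21T10:05"),
    ("alice", "carol", "2016-09-21T11:00"),
]

# Count conversations per unordered pair of participants.
pair_counts = Counter(frozenset((s, r)) for s, r, _ in logs)
print(pair_counts.most_common(1))
```

No decryption required: the strongest relationship in the log falls out of a three-line frequency count.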

Google has chosen a side other than consumers, that’s enough to avoid it whenever possible.

by Patrick Durusau at September 22, 2016 03:39 PM

September 21, 2016

Patrick Durusau

What Makes A Liar Lie? (Clapper Lying About The Russians)

US intel head suggests Russia behind DNC hacks, says Moscow tried to affect elections in past

From the post:

The US director of national intelligence has suggested Russia is behind the recent hack that saw Democratic National Committee (DNC) records dumped online. The leak undermined the Democrats’ reputation ahead of November’s presidential election.

“It’s probably not real, real clear whether there’s influence in terms of an outcome [of the upcoming elections] – or what I worry about more, frankly – is just the sowing the seeds of doubt, where doubt is cast on the whole [election] process,” James Clapper said on Tuesday evening at an event hosted by the Washington Post, as cited by the Wall Street Journal.

Furthermore, the intelligence chief said Russia and its predecessor the USSR had been adhering to similar practices targeting the US since the 1960s.

“There’s a tradition in Russia of interfering with elections, their own and others.

“[…] It shouldn’t come as a big shock to people. I think it’s more dramatic maybe because now they have the cyber tools,” Clapper is cited as saying.

The comments come in contrast to Clapper’s earlier statements regarding Russia’s alleged connection to the hacking operation, which is believed to have been conducted over more than a year. In July, shortly after the documents had been leaked, he urged an end to the “reactionary mode” of blaming the leak on Russia.
… (emphasis in original)

Do you wonder why Clapper shifted from avoiding a “reactionary mode” of blaming Russia to not only blaming Russia, but claiming a history of Russian interference with United States elections?

I don’t have an email or recorded phone conversation smoking gun, but here’s one possible explanation:

From FiveThirtyEight as of today:


My prediction: The closer the odds become from FiveThirtyEight, the more frantic and far-fetched the lies from James Clapper will become.

Another DNC leak or two (real ones, not the discarded hard drive kind), and Clapper will be warning of Russian influence in county government and school board elections.

PS: If you don’t think Clapper is intentionally lying, when will you break the story his accounts have lost all connection to a reality shared by others?

by Patrick Durusau at September 21, 2016 08:45 PM

Reducing Your “Competition”

Good security practices are a must, whether you live in the Cisco universe or the more mundane realm of drug pushing.

Case in point: Photos On Dark Web Reveal Geo-locations Of 229 Drug Dealers — Here’s How by Swati Khandelwal.

From the post:

It’s a Fact! No matter how smart the criminals are, they always leave some trace behind.

Two Harvard students have unmasked around 229 drug and weapon dealers with the help of pictures taken by criminals and used in advertisements placed on dark web markets.

Do you know each image contains a range of additional hidden data stored within it that can be a treasure to the investigators fighting criminals?

Whatever services you are offering on the Dark Web, here’s an opportunity to reduce the amount of competition you are facing.

Perhaps even a reward from CrimeStoppers, although you need to price shop against your local organization for the better deal.

Failure to scrub Exchangeable Image File Format (EXIF) data lies at the heart of this technique.

See Swati’s post for more details on this “hack.”
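The EXIF GPS tags at issue store latitude and longitude as degree/minute/second rational pairs, with a separate hemisphere tag. A minimal sketch of the conversion to the decimal degrees an investigator would drop onto a map (function name and sample coordinates are mine, for illustration only):

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert EXIF-style degree/minute/second rationals to decimal degrees.

    EXIF GPSLatitude/GPSLongitude store each component as a rational
    (numerator, denominator) pair; GPSLatitudeRef/GPSLongitudeRef give
    the hemisphere ("N"/"S"/"E"/"W").
    """
    def rat(pair):
        num, den = pair
        return num / den

    value = rat(degrees) + rat(minutes) / 60 + rat(seconds) / 3600
    # South and West are negative in decimal-degree notation.
    return -value if ref in ("S", "W") else value

# Example: 42° 21' 30.51" N  →  42.358475
lat = dms_to_decimal((42, 1), (21, 1), (3051, 100), "N")
print(round(lat, 6))
```

Scrubbing is equally simple: stripping all metadata before posting, e.g. `exiftool -all= photo.jpg`, is all the 229 dealers needed to do.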

Do your civic duty to reduce crime (your competitors) and be rewarded in the process.

Who says cybersecurity can’t be a profit center? ;-)

by Patrick Durusau at September 21, 2016 03:54 PM

Tails [Whatever The Presidential Race Outcome]

Tails – The Amnesic Incognito Live System

From the about page:

Tails is a live system that aims to preserve your privacy and anonymity. It helps you to use the Internet anonymously and circumvent censorship almost anywhere you go and on any computer but leaving no trace unless you ask it to explicitly.

Whatever your prediction for the US 2016 presidential election, Hairy Thunderer or Cosmic Muffin, you are going to need Tails.

For free speech and/or privacy in 2017, get Tails.

It really is that simple.

by Patrick Durusau at September 21, 2016 12:58 AM

September 20, 2016

Patrick Durusau

Betraying Snowden:… [Cynical, but not odd]

Betraying Snowden: There’s a special place in journalism hell for The Washington Post editorial board by Daniel Denvir.

From the post:

There is a special place in journalism hell reserved for The Washington Post editorial board now that it has called on President Barack Obama to not pardon National Security Agency whistleblower Edward Snowden.

As Glenn Greenwald wrote, it’s an odd move for a news publication, “which owes its sources duties of protection, and which — by virtue of accepting the source’s materials and then publishing them — implicitly declares the source’s information to be in the public interest.” Notably, the Post decided to “inexcusably omit . . . that it was not Edward Snowden, but the top editors of the Washington Post who decided to make these programs public,” as Greenwald added.

The Post’s peculiar justification is as follows: While the board grudgingly conceded that reporters, thanks to Snowden, revealed that the NSA’s collection of domestic telephone metadata — which “was a stretch, if not an outright violation, of federal surveillance law” — it condemns him for revealing “a separate overseas NSA Internet-monitoring program, PRISM, that was both clearly legal and not clearly threatening to privacy.”

Washington Post opposition to a pardon for Edward Snowden isn’t odd at all.

Which story generates more PR for the Washington Post:

  1. The Washington Post, having won a Pulitzer prize due to Edward Snowden, joins a crowd calling for his pardon?
  2. The Washington Post, having won a Pulitzer prize due to Edward Snowden, opposes his being pardoned?

It’s not hard to guess which one generates more ad-views and therefore the potential for click-throughs.

I have no problems with the disclosure of PRISM, save for Snowden having to break his word as a contractor to keep his client’s secrets, well, secret.

No one agreeing to be employed by the NSA could be unaware that it engages in illegal and immoral activity on a daily basis.

Although Snowden has done no worse than his former NSA employers, it illustrates why I have no trust in government agencies.

If they are willing to lie for what they consider to be “good” reasons to you, then they are most certainly willing to lie to me.

Once it is established that an agency, take the NSA for example, has lied on multiple occasions, on what basis would you trust them to be telling the truth today?

Their assurance, “we’re not lying this time?” That seems rather tenuous.

Same rule should apply to contractors who lie to or betray their clients.

by Patrick Durusau at September 20, 2016 11:30 PM

NSA: Being Found Beats Searching, Every Time

Equation Group Firewall Operations Catalogue by Mustafa Al-Bassam.

From the post:

This week someone auctioning hacking tools obtained from the NSA-based hacking group “Equation Group” released a dump of around 250 megabytes of “free” files for proof alongside the auction.

The dump contains a set of exploits, implants and tools for hacking firewalls (“Firewall Operations”). This post aims to be a comprehensive list of all the tools contained or referenced in the dump.

Mustafa’s post is a great illustration of why “being found beats searching, every time.”

Think of the cycles you would have to spend to duplicate this list. Multiply that by the number of people interested in the list. Assuming their time is not valueless, do you start to see the value-add of Mustafa’s post?

Mustafa found each of these items in the data dump and then preserved his finding for the use of others.

It’s not a very big step beyond this preservation to the creation of a container for each of these items, enabling the preservation of other material found on them or related to them.

Search is a starting place and not a destination.

Unless you enjoy repeating the same finding process over and over again.

Your call.

by Patrick Durusau at September 20, 2016 09:41 PM

September 19, 2016

Patrick Durusau

Stopping Terrorism: Thieves 2, Security Forces 0

Murray Weiss, Nicholas Rizzi, Trevor Kapp and Aidan Gardiner document in Thieves Helped Crack the Chelsea Bombing Case, Sources Say how common street thieves thwarted terrorist attacks in New York City and New Jersey.

Albeit inadvertently, thieves prevented a second explosion in Chelsea and multiple explosions in New Jersey.

See Thieves Helped Crack the Chelsea Bombing Case, Sources Say for the full story.

A great illustration that the surveillance state can track people down after they have committed a crime, but is not much good at stopping them before they commit one.

So why are we spending $billions on a surveillance state that is outperformed by street thieves?

Reward any thief discovering a terrorist bomb and turning it in with:


Good for life, non-violent crimes only.

Given the track record of security forces in the United States, a far better investment.

by Patrick Durusau at September 19, 2016 09:44 PM

Hackers May Fake Documents, Congress Publishes False Ones

I pointed out in Lions, Tigers, and Lies! Oh My! that Bruce Schneier‘s concerns over the potential for hackers faking documents to be leaked pales beside the mis-information distributed by government.

Executive Summary of Review of the Unauthorized Disclosures of Former National Security Agency Contractor Edward Snowden (their title, not mine), is a case in point.

Barton Gellman in The House Intelligence Committee’s Terrible, Horrible, Very Bad Snowden Report leaves no doubt the House Permanent Select Committee on Intelligence (HPSCI) report is a sack of lies.

Not mistakes, not exaggerations, not simply misleading, but actual, factual lies.

For example:

Since I’m on record claiming the report is dishonest, let’s skip straight to the fourth section. That’s the one that describes Snowden as “a serial exaggerator and fabricator,” with “a pattern of intentional lying.” Here is the evidence adduced for that finding, in its entirety.

“He claimed to have obtained a high school degree equivalent when in fact he never did.”

I do not know how the committee could get this one wrong in good faith. According to the official Maryland State Department of Education test report, which I have reviewed, Snowden sat for the high school equivalency test on May 4, 2004. He needed a score of 2250 to pass. He scored 3550. His Diploma No. 269403 was dated June 2, 2004, the same month he would have graduated had he returned to Arundel High School after losing his sophomore year to mononucleosis. In the interim, he took courses at Anne Arundel Community College.

See Gellman’s post for more examples.

All twenty-two members of the HPSCI signed the report. To save you time in the future, here’s a listing of the members of Congress who agreed to report these lies:



I sorted each group into alphabetical order. The original listings were in an order that no doubt makes sense to fellow rodents but not to the casual reader.

That’s twenty-two members of Congress who are willing to distribute known falsehoods.

Does anyone have an equivalent list of hackers?

by Patrick Durusau at September 19, 2016 05:47 PM

Corrects Clinton-Impeachment Search Results

After posting Search Alert: “…previous total of 261 to the new total of 0.” [Solved] yesterday, pointing out that a change from http:// to https:// altered a search result for Clinton w/in 5 words impeachment, I got an email this morning:


I appreciate the update and correction for saved searches, but my point about remote data changing without notice to you remains valid.

I’m still waiting for word on bulk downloads from both Wikileaks and DC Leaks.

Why leak information vital to public discussion and then limit access to search?

by Patrick Durusau at September 19, 2016 01:14 PM

Exotic Functional Data Structures: Hitchhiker Trees


Functional data structures are awesome–they’re the foundation of many functional programming languages, allowing us to express complex logic immutably and efficiently. There is one unfortunate limitation: these data structures must fit on the heap, limiting their lifetime to that of the process. Several years ago, Datomic appeared as the first functional database that addresses these limitations. However, there hasn’t been much activity in the realm of scalable (gigabytes to terabytes) functional data structures.

In this talk, we’ll first review some of the fundamental principles of functional data structures, particularly trees. Next, we’ll review what a B tree is and why it’s better than other trees for storage. Then, we’ll learn about a cool variant of a B tree called a fractal tree, how it can be made functional, and why it has phenomenal performance. Finally, we’ll unify these concepts to understand the Hitchhiker tree, an open-source functionally persistent fractal tree. We’ll also briefly look at an example API for using Hitchhiker trees that allows your application’s state to be stored off-heap, in the spirit of the 2014 paper “Fast Database Restarts at Facebook”.

David Greenberg (profile)

Hitchhiker Trees (GitHub)

Fast Database Restarts at Facebook by Aakash Goel, Bhuwan Chopra, Ciprian Gerea, Dhrúv Mátáni, Josh Metzler, Fahim Ul Haq, Janet Wiener.

You could have searched for all the information I have included, but isn’t it more convenient to have it “already found?”
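The persistence trick underneath all of these trees is path copying with structural sharing. The sketch below is not the Hitchhiker tree itself (see the GitHub link for that); it is just the underlying idea, in Python for brevity:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(node: Optional[Node], key: int) -> Node:
    # Path copying: rebuild only the root-to-leaf path touched by the
    # insert; every untouched subtree is shared with the old version.
    if node is None:
        return Node(key)
    if key < node.key:
        return Node(node.key, insert(node.left, key), node.right)
    if key > node.key:
        return Node(node.key, node.left, insert(node.right, key))
    return node  # key already present


t1 = insert(insert(insert(None, 5), 3), 8)
t2 = insert(t1, 9)          # a new version; t1 is still intact
print(t1.left is t2.left)   # True – the left subtree is shared, not copied
```

t2 is a new version of the tree, yet t1 is untouched, and the two versions share every subtree off the insertion path. That sharing is why a functional update costs O(log n) new nodes rather than a full copy, and it is the property the Hitchhiker tree extends from the heap to disk.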

by Patrick Durusau at September 19, 2016 01:00 AM

Introducing arxiv-sanity

Only a small part of arXiv appears at: but it is enough to show the feasibility of this approach.

What captures my interest is the potential to substitute/extend the program to use other similarity measures.

Bearing in mind that searching is only the first step towards the acquisition and preservation of knowledge.
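If you want to experiment with substituting similarity measures, the shape of the problem is small: turn each abstract into a vector, compare vectors. A toy stand-in (plain bag-of-words cosine; arxiv-sanity itself uses a more refined tf-idf scheme) might look like:

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts – the simplest possible document vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Dot product over shared terms, normalized by vector lengths.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


q = bag_of_words("persistent functional data structures for databases")
d1 = bag_of_words("functional data structures and persistent trees")
d2 = bag_of_words("convolutional networks for image recognition")
print(cosine(q, d1) > cosine(q, d2))  # True
```

Swapping in a different measure means replacing `cosine` (or the vectorizer) while leaving the ranking loop alone, which is what makes this kind of tool worth extending.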

PS: I first saw this in a tweet by Data Science Renee.

by Patrick Durusau at September 19, 2016 12:44 AM

September 18, 2016

Patrick Durusau

Search Alert: “…previous total of 261 to the new total of 0.” [Solved]

Odd message from the search alert this AM:


Here’s the search I created back in June, 2016:


My (probably inaccurate) recall is that I was searching for a quote from the impeachment of Bill Clinton and was too lazy to specify a term of Congress, hence:

all congresses – searching for Clinton within five words, impeachment

Fairly trivial search that produced 261 “hits.”

I set the search alert more to explore the search options than any expectation of different future results.

Imagine my surprise to find that all congresses – searching for Clinton within five words, impeachment performed today, results in 0 “hits.”

Suspecting some internal changes to the search interface, I re-entered the search today and got 0 “hits.”

Other saved searches with radically different search results as of today?

This is not, repeat not, the result of some elaborate conspiracy to assist Secretary Clinton in her bid for the presidency.

I do think something fundamental has gone wrong with searching at and it needs to be fixed.

This is an illustration of why Wikileaks, DC Leaks and other data sites should provide easy to access downloads in bulk of their materials.

Providing search interfaces to document collections is a public service, but document collections or access to them can change in ways not transparent to search users. Such as demonstrated by the CIA removing documents previously delivered to the Senate.

Petition Wikileaks, DC Leaks and other data sites for easy bulk downloads.

That will ensure the “evidence” will not shift under your feet and the availability of more sophisticated means of analysis than brute-force search.

Update: The site’s change from http:// to https:// trashed my saved query, which used http:// to re-perform the same search.

Using https:// returns the same 261 search results.

What’s your experience with other saved searches at

by Patrick Durusau at September 18, 2016 09:32 PM

Scalable Vector Graphics (SVG) 2

Scalable Vector Graphics (SVG) 2: W3C Candidate Recommendation 15 September 2016


This specification defines the features and syntax for Scalable Vector Graphics (SVG) Version 2. SVG is a language based on XML for describing two-dimensional vector and mixed vector/raster graphics. SVG content is stylable, scalable to different display resolutions, and can be viewed stand-alone, mixed with HTML content, or embedded using XML namespaces within other XML languages. SVG also supports dynamic changes; script can be used to create interactive documents, and animations can be performed using declarative animation features or by using script.


Comments on this Candidate Recommendation are welcome. Comments can be sent to, the public email list for issues related to vector graphics on the Web. This list is archived and senders must agree to have their message publicly archived from their first posting. To subscribe send an email to with the word subscribe in the subject line.

W3C publishes a Candidate Recommendation to indicate that the document is believed to be stable and to encourage implementation by the developer community. This Candidate Recommendation is expected to advance to Proposed Recommendation no earlier than 15 July 2017, but we encourage early review, and requests for normative changes after 15 November 2016 may be deferred to SVG 3.

15 November 2016 will be here sooner than you realize. Read and comment early and often.


by Patrick Durusau at September 18, 2016 01:32 AM

Introducing OpenType Variable Fonts

Introducing OpenType Variable Fonts by John Hudson.

From the post:

Version 1.8 of the OpenType font format specification introduces an extensive new technology, affecting almost every area of the format. An OpenType variable font is one in which the equivalent of multiple individual fonts can be compactly packaged within a single font file. This is done by defining variations within the font, which constitute a single- or multi-axis design space within which many font instances can be interpolated. A variable font is a single font file that behaves like multiple fonts.

There are numerous benefits to this technology. A variable font is a single binary with greatly-reduced comparable file size and, hence, smaller disc footprint and webfont bandwidth. This means more efficient packaging of embedded fonts, and faster delivery and loading of webfonts. The potential for dynamic selection of custom instances within the variations design space — or design-variations space, to use its technical name — opens exciting prospects for fine tuning the typographic palette, and for new kinds of responsive typography that can adapt to best present dynamic content to a reader’s device, screen orientation, or even reading distance.

The technology behind variable fonts is officially called OpenType Font Variations. It has been jointly developed by Microsoft, Google, Apple, and Adobe, in an unprecedented collaborative effort also involving technical experts from font foundries and font tool developers. In addition to specifying the font format additions and revisions, the working group has also committed to the goal of interoperable implementation, defining expected behaviours and test suites for software displaying variable fonts. This should be welcome news to font developers and users, who have often struggled with incompatible implementations of earlier aspects of OpenType that were left to the interpretation of individual software companies.

OpenType Font Variations builds on the model established in Apple’s TrueType GX variations in the mid-1990s, but has fully integrated that model into all aspects of the OpenType format, including OpenType Layout, and is available to both TrueType and Compact Font Format (CFF) flavours of OpenType. This has meant not only the addition of numerous tables to the format, but also revision of many existing tables; these changes are summarised in an appendix to this article, which is intended as an introduction and technological summary, primarily for font makers and font tool developers. The full technical specification for OpenType Font Variations is incorporated into the OpenType specification version 1.8.

John Hudson developed the remarkable SBL BibLit, SBL Greek and SBL Hebrew fonts for biblical studies.

An illustration from John’s post:


Figure 1. Normalised design space of a 3-axis variable font.
[Typeface: Kepler, an Adobe Original designed by Robert Slimbach.]

Looking forward to the SBL transitioning its biblical studies font set to this new font technology.

by Patrick Durusau at September 18, 2016 01:13 AM

Lions, Tigers, and Lies! Oh My!

How Long Until Hackers Start Faking Leaked Documents? by Bruce Schneier.

Bruce writes:

No one is talking about this, but everyone needs to be alert to the possibility. Sooner or later, the hackers who steal an organization’s data are going to make changes in them before they release them. If these forgeries aren’t questioned, the situations of those being hacked could be made worse, or erroneous conclusions could be drawn from the documents. When someone says that a document they have been accused of writing is forged, their arguments at least should be heard.


Governments, the United States Government in particular, leak false information and documents as a matter of normal business practice. Not to mention corporations and special interest groups that pay for false research (think Harvard, sugar studies) to be published.

In case you missed it, read Inside the fight to reveal the CIA’s torture secrets. In depth analysis of how the CIA not only lied, but destroyed evidence, spied on the U.S. Senate and otherwise misbehaved during an investigation into its torture practices.

That’s just one example. One could fill a multi-volume series with the lies, false documents and fabrications of the current and immediately previous U.S. President.

The argument that torturers were “doing their duty to protect the country,” and so merit a pass on accountability, is one I recommend to any future political assassins. See how that plays out in a court of law. Hint: Crimes are crimes, whatever your delusional understanding of “the greater good.”

The easier rule is:

Consider all documents/statements as false unless and until:

  1. You are satisfied of the truth of the document/statement, or
  2. It is to your advantage to treat the document/statement as true.

That covers situations like “fact free” accusations of cyber hacking against the Russians, North Koreans and/or Chinese by the U.S. government.

No “evidence” has been offered for any of those allegations, only vaguely worded rumors circulated among “experts” who are also government contractors. You can imagine the credibility I assign to such sources.

Probably happenstance but such contractors could be telling the truth. Unfortunately, in the absence of any real evidence, only the self-interested in such “truths” or the extremely credulous, crack-pipe users for example, would credit such statements.

by Patrick Durusau at September 18, 2016 12:49 AM

September 17, 2016

Patrick Durusau

How Mapmakers Make Mountains Rise Off the Page

How Mapmakers Make Mountains Rise Off the Page by Greg Miller.

From the post:

The world’s most beautiful places are rarely flat. From the soaring peaks of the Himalaya to the vast chasm of the Grand Canyon, many of the most stunning sites on Earth extend in all three dimensions. This poses a problem for mapmakers, who typically only have two dimensions to work with.

Fortunately, cartographers have some clever techniques for creating the illusion of depth, many of them developed by trial and error in the days before computers. The best examples of this work use a combination of art and science to evoke a sense of standing on a mountain peak or looking out an airplane window.

One of the oldest surviving maps, scratched onto an earthenware plate in Mesopotamia more than 4,000 years ago, depicts mountains as a series of little domes. It’s an effective symbol, still used today in schoolchildren’s drawings and a smartphone emoji, but it’s hardly an accurate representation of terrain. Over the subsequent centuries, mapmakers made mostly subtle improvements, varying the size and shape of their mountains, for example, to indicate that some were bigger than others.

But cartography became much more sophisticated during the Renaissance. Topographic surveys were done for the first time with compasses, measuring chains, and other instruments, resulting in accurate measurements of height. And mapmakers developed new methods for depicting terrain. One method, called hachuring, used lines to indicate the direction and steepness of a slope. You can see a later example of this in the 1807 map below of the Mexican volcano Pico de Orizaba. Cartographers today refer (somewhat dismissively) to mountains depicted this way as “woolly caterpillars.”

Stunning illusions of depth on maps, creating depth illusions in 2 dimensions (think computer monitors), history of map making techniques, are all reasons to read this post.

What seals it for me is that the quest for the “best” depth illusion continues. It’s not a “solved” problem. (No spoiler, see the post.)

Physical topography to one side, how are you going to bring “depth” to your topic map?

Some resources in a topic map may have great depth and others, unfortunately, may be like Wikipedia articles marked as:

This article has multiple issues.

How do you define and then enable navigation of your topic maps?

by Patrick Durusau at September 17, 2016 03:34 PM

September 16, 2016

Patrick Durusau

Guccifer 2.0 – 13Sept2016 Leak – A Reader’s Guide (Part 2) [Discarded Hard Drive?]

Guccifer 2.0‘s latest release of DNC documents is generally described as:

In total, the latest dump contains more than 600 megabytes of documents. It is the first Guccifer 2.0 release to not come from the hacker’s WordPress account. Instead, it was given out via a link to the small group of security experts attending the London conference. Guccifer 2.0 drops more DNC docs by Cory Bennett.

The “600 megabytes of documents” is an attention grabber, but how much of that 600 megabytes is useful and/or interesting?

The answer turns out to be, not a lot.

Here’s an overview of the directories and files:


Financial investment data.


Financial investment data.


Redistricting documents.


One file with fields of VANDatabaseCode StateID VanID cons_id?


A large amount of documentation for “IQ8,” apparently address cleaning software. Possibly useful if you want to know address cleaning rules from eight years ago.


Sounds promising but is summary data based on media markets.


Early voting analysis.


Typical election voting analysis, from 2002 to 2008.


Duplicates to FEC filings. Checking the .csv file, data from 2008. BTW, you can find this date (2008) and later data of the same type at:


More duplicates to FEC filings. 11-26-08 NFC Members Raised.xlsx (no credit cards) – Dated but 453 names with contacts, amounts raised, etc.


Holiday card addresses, these are typical:



Two jpegs were included in the dump.


Lists of donors.

DNC union_05-09.txt
November VF EOC – MEYER.txt
dem9A6_NGP.txt – password protected


Grepping looks like May, 2009 data for the FEC.


More donor lists.



IT hosting proposals.

/Reports for Kaine

Various technology memos


IT security reports


Contacts not necessarily in FEC records

Contact List-Complete List.xlsx – Contact list with emails and phone numbers (no credit cards)
WH Staff 2010.xlsx – Names but no contact details

The data is eight (8) years old. Do you have the same phone number you did eight (8) years ago?

Guccifer 2.0 makes no claim on their blog for ownership of this leak.

A “hack” that results in eight-year-old data, most of which is more accessible at

No, this looks more like a discarded hard drive that was harvested and falsely labeled as a “hack” of the DNC.

Unless Guccifer 2.0 says otherwise on their blog, you have better things to do with your time.

PS: You don’t need old hard drives to discover pay-to-play purchases of public appointments. Check back tomorrow for: How-To Discover Pay-to-Play Appointment Pricing.

by Patrick Durusau at September 16, 2016 08:56 PM

How-To Discover Pay-to-Play Appointment Pricing

You have seen one or more variations on:

This Is How Much It ‘Costs’ To Get An Ambassadorship: Guccifer 2.0 Leaks DNC ‘Pay-To-Play’ Donor List

DNC Leak Exposes Pay to Play Politics, How the Clinton’s REALLY Feel About Obama

CORRUPTION! Obama caught up in Pay for Play Scandal, sold every job within his power to sell.

You may be wondering why CNN, the New York Times and the Washington Post aren’t all over this story?

While selling public offices surprises some authors, whose names I omitted out of courtesy to their families, selling offices is a regularized activity in the United States.

So regularized that immediately following each presidential election, the Government Printing Office publishes the United States Government Policy and Supporting Positions 2012 (Plum Book) that lists the 9,000-odd positions that are subject to presidential appointment.

From the description of the 2012 edition:

Every four years, just after the Presidential election, “United States Government Policy and Supporting Positions” is published. It is commonly known as the “Plum Book” and is alternately published between the House and Senate.

The Plum Book is a listing of over 9,000 civil service leadership and support positions (filled and vacant) in the Legislative and Executive branches of the Federal Government that may be subject to noncompetitive appointments, or in other words by direct appointment.

These “plum” positions include agency heads and their immediate subordinates, policy executives and advisors, and aides who report to these officials. Many positions have duties which support Administration policies and programs. The people holding these positions usually have a close and confidential relationship with the agency head or other key officials.

Even though the 2012 “plum” book is currently on sale for $19.00 (usual price is $38.00), given that a new one will appear later this year, consider using the free online version at: Plum Book 2012.


The online interface is nothing to brag on. You have to select filters and then “find” to obtain further information on positions. Very poor UI.

However, if under title you select “Chief of Mission, Monaco” and then select “find,” the resulting screen looks something like this:


To your far right there is a small arrow that if selected, takes you to the details:


If you were teaching a high school civics class, the question would be:

How much did Charles Rivkin have to donate to obtain the position of Chief of Mission, Monaco?

FYI, the CIA World FactBook gives this brief description for Monaco:

Monaco, bordering France on the Mediterranean coast, is a popular resort, attracting tourists to its casino and pleasant climate. The principality also is a banking center and has successfully sought to diversify into services and small, high-value-added, nonpolluting industries.

Unlike the unhappy writers that started this post, you would point the class to: Transaction Query By Individual Contributor at the Federal Election Commission site.

Entering the name Rivkin, Charles and select “Get Listing.”

Rivkin’s contributions are broken into categories and helpfully summed to assist you in finding the total.

Contributions to All Other Political Committees Except Joint Fundraising Committees – $72399.00

Joint Fundraising Contributions – $22300.00

Recipient of Joint Fundraiser Contributions – $36052.00

Caution: There is an anomalous Rivkin in that last category, contributing $40 to Donald Trump. For present discussions, I would subtract that from the grand total of:

$130,711 to be the Chief of Mission, Monaco.

Realize that this was not a lump sum payment but a steady stream of contributions starting in the year 2000.

Using the Transaction Query By Individual Contributor resource, you can correct stories that claim:

Jane Hartley paid DNC $605,000 and then was nominated by Obama to serve concurrently as the U.S. Ambassador to the French Republic and the Principality of Monaco.


(from: This Is How Much It ‘Costs’ To Get An Ambassadorship: Guccifer 2.0 Leaks DNC ‘Pay-To-Play’ Donor List)

If you run the FEC search you will find:

Contributions to Super PACs, Hybrid PACs and Historical Soft Money Party Accounts – $5000.00

Contributions to All Other Political Committees Except Joint Fundraising Committees – $516609.71

Joint Fundraising Contributions – $116000.00

Grand total: $637,609.71.

So, $637,609.71, not $605,000.00 but also as a series of contributions starting in 1997, not one lump sum.
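Both totals are easy to re-check; the figures below are copied from the FEC query results quoted above:

```python
# Category totals as reported by the FEC Transaction Query (quoted above).
rivkin = [72_399.00, 22_300.00, 36_052.00]
anomalous_trump_entry = 40.00  # the mismatched "Rivkin" noted in the caution

hartley = [5_000.00, 516_609.71, 116_000.00]

print(f"Rivkin:  ${sum(rivkin) - anomalous_trump_entry:,.2f}")  # $130,711.00
print(f"Hartley: ${sum(hartley):,.2f}")                         # $637,609.71
```

Nothing fancy, but it shows how little work separates a headline number from a verifiable one.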

You don’t have to search discarded hard drives to get pay-to-play appointment pricing. It’s all a matter of public record.

PS: I’m not sure how accurate or complete Nominations & Appointments (White House) may be, but it’s an easier starting place for current appointees than the online Plum Book.

PPS: Estimated pricing for “Plum” book positions could be made more transparent. Not a freebie. Let me know if you are interested.

by Patrick Durusau at September 16, 2016 08:55 PM

Android Hacking – $200K First Prize – Other Offers?

Announcing the Project Zero Prize by Natalie Silvanovich.

Before reading the “official” post, consider this Dilbert cartoon.

Same logic applies here:

How to compare alternatives? ($200K sets a minimum bid.)

Potential for repeat business?

For a pwn of any Android phone, $200K sounds a bit “lite.”

Watch the Android issue tracker. A third-party bidder won’t insist on you using only your reported bugs in an exploit chain.

Before anyone gets indignant, the NSA, CIA, the “Russians,” Chinese, Mossad, etc., will all be watching as well. Think of it as having “governmental” ethics.

From the post:

Despite the existence of vulnerability rewards programs at Google and other companies, many unique, high-quality security bugs have been discovered as a result of hacking contests. Hoping to continue the stream of great bugs, we’ve decided to start our own contest: The Project Zero Prize.

The goal of this contest is to find a vulnerability or bug chain that achieves remote code execution on multiple Android devices knowing only the devices’ phone number and email address. Successful submissions will be eligible for the following prizes.

First Prize

$200,000 USD, awarded to the first winning entry.

Second Prize

$100,000 USD, awarded to the second winning entry.

Third Prize

At least $50,000 USD awarded by Android Security Rewards, awarded to additional winning entries.

In addition, participants who submit a winning entry will be invited to write a short technical report on their entry, which will be posted on the Project Zero Blog.

Contest Structure

This contest will be structured a bit differently than other contests. Instead of saving up bugs until there’s an entire bug chain, and then submitting it to the Project Zero Prize, participants are asked to report the bugs in the Android issue tracker. They can then be used as a part of submission by the participant any time during the six month contest period. Only the first person to file a bug can use it as a part of their submission, so file early and file often! Of course, any bugs that don’t end up being used in a submission will be considered for Android Security Rewards and any other rewards program at Google they might be eligible for after the contest has ended.

In addition, unlike other contests, the public sharing of vulnerabilities and exploits submitted is paramount. Participants will submit a full description of how their exploit works with their submission, which will eventually be published on the Project Zero blog. Every vulnerability and exploit technique used in each winning submission will be made public.

Full contest rules

Frequently asked questions

Contest period:

The Contest begins at 12:00:00 A.M. Pacific Time (PT) Zone in the United States on September 13, 2016 and ends at 11:59:59 P.M. PT on March 14, 2017 (“Contest Period”).

Good hunting!

PS: If possible, post the paid price for your exploit to help set the market price for future such exploits.

by Patrick Durusau at September 16, 2016 03:46 PM

If It’s Good Enough For Colin Powell…

Some security advice for Colin Powell to better protect his Gmail account by Graham Cluley.

Graham posted webmail security advice for Colin Powell after 26 months worth of his private emails were leaked by DC Leaks.

Nothing surprising for my readers but pass it on to the c-suite types.

You can search and view Powell’s emails at DC Leaks / Colin Luther Powell.

Graham omits any link to DC Leaks and says:

Of course, the emails aren’t just embarrassing and damaging for the privacy of Colin Powell – they are also potentially humiliating for the people he was corresponding with, who have had their own private conversations exposed to the world.

Oh, the horror! Invasions of privacy!

You mean like the millions of ordinary people who aren’t secure in their phone calls, emails, web browsing, banking, credit histories, etc., all the time?

The extremely privileged getting nicked every now and again doesn’t trouble me.

“Oversight” hasn’t protected our freedoms, perhaps constant and detailed exposure of the privileged will. Worth a shot!

by Patrick Durusau at September 16, 2016 01:20 PM

September 15, 2016

Patrick Durusau

Guccifer 2.0 – 13Sept2016 Leak – A Reader’s Guide (Part 1)

Guccifer 2.0 dropped a new bundle of DNC documents on September 13, 2016! Like most dumps, there was no accompanying guide to make use of that dump easier. ;-) Not a criticism, just an observation.

As a starting point to make your use of that dump a little easier, I am posting an ls -lR listing of all the files in that dump, post extraction with 7z and unrar. Guccifer2.0-13Sept2016-filelist.txt.

I’m working on a list of the files most likely to be of interest. Look for that tomorrow.

I can advise that no credit card numbers were included in this dump.


grep --color -H -rn --include="*.txt" '\([345]\{1\}[0-9]\{3\}\|6011\)\{1\}[ -]\?[0-9]\{4\}[ -]\?[0-9]\{2\}[-]\?[0-9]\{2\}[ -]\?[0-9]\{1,4\}'

I checked all the .txt files for credit card numbers. (I manually checked the xsl/xslx files.)

There were “hits” but those were in Excel exports of vote calculations. Funny how credit card numbers don’t ever begin with “0.” as a prefix.

Since valid credit card numbers vary in length, I don’t know of an easy way to avoid that issue. So inspection of the files it was.

by Patrick Durusau at September 15, 2016 02:00 AM

September 14, 2016

Patrick Durusau

Investigatory Powers Bill As Amended In Committee

For those of you watching the UK’s plunge into darkness, the Investigatory Powers Bill, as amended in committee, has been posted online.

Apologies for the lite amount of posting today but a very large data dump was released earlier today that distracted me from posting. ;-)

by Patrick Durusau at September 14, 2016 12:31 AM


FPCasts – Your source for Functional Programming Related Podcasts

Ten (10) sources of podcasts, with a link to the latest podcast from each source.

Without notice to the reader, the main link to each podcast series is a link to an RSS file.

Not a problem but took me by surprise on my first visit.

As useful as this will be, indexed podcasts where you could jump to a subject of interest would be even better.


by Patrick Durusau at September 14, 2016 12:04 AM

September 13, 2016

Patrick Durusau

R Weekly

R Weekly

A new weekly publication of R resources that began on 21 May 2016 with Issue 0.

Mostly titles of post and news articles, which is useful, but not as useful as short summaries, including the author’s name.

by Patrick Durusau at September 13, 2016 01:47 AM

Persuasive Cartography

Vintage Infodesign [161]: More examples of persuasive cartography, diagrams and charts from before 1960 by Tiago Veloso.

From the post:

A recurrent topic here on Vintage InfoDesign is “persuasive cartography” – the use of maps to influence and in many cases, deceive. We showcased examples of these maps here and here, with a special mention to the PJ Mode Collection at Cornell University Library. The collection was donated to Cornell back in 2014, and until now more than 300 examples are available online in high resolution.

A must for all of those interested in the subject, and we picked a few examples to open this post, courtesy of Allison Meier, who published a rente article about the PJ Mode Collection over at Hyperallergic.


Re-reading The Power of Maps (1992) by Denis Wood, in preparation to read Rethinking The Power of Maps (2010), also by Denis Wood, has made me acutely aware of aspersions such as:

“persuasive cartography” – the use of maps to influence and in many cases, deceive.

I say “aspersion” because Wood makes the case that all maps, with no exceptions, are the results of omissions, characterizations, enhancements, emphasis on some features and not others, for stated and/or unstated purposes.

Indeed, all of The Power of Maps (1992) is devoted to teasing out, with copious examples, where a user of a map may fail to recognize the “truth” of any map, is a social construct in a context shaped by factors known and unknown.

I characterize maps I disagree with as being deceptive, disingenuous, inaccurate, etc., but doesn’t take away from Wood’s central point that all maps are acts of persuasion.

The critical question being: Do you support the persuasion a map is attempting to make?

When I teach topic maps again I will make The Power of Maps (1992) required reading.

It is an important lesson to realize that any map, even a topic map, need only map so much of the territory or domain, as is sufficient for the task at hand.

A topic maps for nuclear physics won’t have much in common with one for war criminals of the George W. Bush and Barack Obama administrations.

Moreover, even topic maps of the same subject domain, may or may not merge in a meaningful way.

The idea of useful merger of arbitrary topic maps, like the idea of “objective maps,” is a false one that serves no useful purpose.

Say rather that topic maps can make enough information explicit about subjects to determine if merging will be meaningful to one or more users of a topic map. That alone is quite a feat.

by Patrick Durusau at September 13, 2016 01:12 AM

September 12, 2016

Patrick Durusau

Invite Government Into The Cellphone Fish Bowl

Long-Secret Stingray Manuals Detail How Police Can Spy On Phones by Sam Biddle.

Sam summarizes the high points from around 200 pages of current but never seen before Harris instruction manuals. Good show!

From the post:

Harris declined to comment. In a 2014 letter to the Federal Communications Commission, the company argued that if the owner’s manuals were released under the Freedom of Information Act, this would “harm Harris’s competitive interests” and “criminals and terrorist[s] would have access to information that would allow them to build countermeasures.”

Creating countermeasures?

Better, treat these documents as a basis for reverse-engineering Harris Stingrays into DIY kits.

False promises from known liars on use of “Stingray”s or “IMSI catchers are not going to combat government abuse of this technology.

Inviting governments to join the general public in the cellphone fish bowl might.

Can you imagine the reaction of your local sheriff, district attorney, judge, etc. when they are being silently tracked?

Not just in their routine duties but to mistresses, drug dens, prostitutes, porn parlors and the like?

We won’t have to wait long for the arrival of verifiable, secure cellphones.

by Patrick Durusau at September 12, 2016 09:25 PM

Inside the fight to reveal the CIA’s torture secrets [Support The Guardian]

Inside the fight to reveal the CIA’s torture secrets by Spencer Ackerman.

Part one: Crossing the bridge

Part two: A constitutional crisis

Part three: The aftermath

Ackerman captures the drama of a failed attempt by the United States Senate to exercise oversight on the Central Intelligence Agency (CIA) in this series.

I say “failed attempt” because even if the full 6,200+ page report is ever released, the lead Senate investigator, Daniel Jones, obscured the identities of all the responsible CIA personnel and sources of information in the report.

Even if the full report is serialized in your local newspaper, the CIA contractors and staff guilty of multiple felonies, will be not one step closer to being brought to justice.

To that extent, the “full” report is itself a disservice to the American people, who elect their congressional leaders and expect them to oversee agencies such as the CIA.

From Ackerman’s account you will learn that the CIA can dictate to its overseers, the location and conditions under which it can view documents, decide which documents it is allowed to see and in cases of conflict, the CIA can spy on the Select Senate Committee on Intelligence.

Does that sound like effective oversight to you?

BTW, you will also learn that members of the “most transparent administration in history” aided and abetted the CIA in preventing an effective investigation into the CIA and its torture program. I use “aided and abetted” deliberately and in their legal sense.

I mention in my header that you should support The Guardian.

This story by Spencer Ackerman is one reason.

Another reason is that given the plethora of names and transfers recited in Ackerman’s story, we need The Guardian to cover future breaks in this story.

Despite the tales of superhuman security, nobody is that good.

I leave you with the thought that if more than one person knows a secret, then it it can be discovered.

Check Ackerman’s story for a starting list of those who know secrets about the CIA torture program.

Good hunting!

by Patrick Durusau at September 12, 2016 08:19 PM

United States Treaties [Library of Congress] – Incomplete – Missing Native American Treaties

United States Treaties Added to the Law Library Website by Jennifer González.

From the webpage:

We have added the United States Treaty Series, compiled by Charles I. Bevans, to our online digital collection. This collection includes treaties that the United States signed with other countries from 1776 to 1949. The collection consists of 13 volumes: four volumes of multilateral treaties, eight volumes of bilateral treaties and one volume of an index.

Multilateral Treaties

Bilateral Treaties

Charles I. Bevans did not include the treaties with native Americans listed at Treaties Between the United States and Native Americans, part of the Avalon project at Yale Law School, Lillian Goldman Law Library.

The Avalon project lists thirty treaties from 1778 – 1868, along with links to their full texts.

For your reading convenience, the list follows:

  • Chickasaw Peace Treaty Feeler
  • 1784
  • Treaty With the Wyandot, etc.

  • Treaty With the Chocktaw

  • Treaty With the Shawnee
  • 1789
  • Treaty With the Six Nations
  • 1790
  • Treaty With the Cherokee
  • 1794
  • Treaty With the Six Nations

  • Treaty of Greenville
  • 1805
  • Treaty With the Chickasaw
  • 1818
  • Treaty With the Chickasaw : 1818
  • 1826
  • Treaty With The Potawatami, 1828.
  • 1830
  • Treaty With the Potawatami, 1832.
  • 1852
  • Treaty with the Comanche, Kiowa, and Apache; July 27, 1853
  • 1865
  • Treaty with the Apache, Cheyenne, and Arapaho; October 17, 1865.
  • 1867
  • Fort Laramie Treaty : 1868
  • You should draw your own conclusions about why these treaties were omitted from the Bevans edition. Their omission isn’t mentioned or explained in its preface.

    by Patrick Durusau at September 12, 2016 01:58 AM

    projectSlam [Public self-protection. Think Trojans.]

    projectSlam by Michael Banks.

    From the webpage:

    Project Slam is an initiative to utilize open source programs, operating systems and tools to aid in defending against nefarious adversaries. The overall focus is to research adversary’s behavior and utilize the data that can be captured to generate wordlists, blacklists, and expose methodologies of various threat actors that can be provided back to the public in a meaningful and useful way…

    Partial data for 2016 includes:

    A medium interaction honeypot was deployed with a focus on usernames and passwords. While attackers were attacking the honeypot, projectSlam was sucking up the attempts to generate a wordlist of what NOT to make your passwords.

    Imagine that! Instead of hoarding information from a vulnerable public, or revealing only the top 10/20 worst passwords, Michael is posting the passwords hackers are looking for online!

    Looking forward to more results from projectSlam and cybersecurity projects that enable the public to protect themselves!

    Contrast a national network of Trojan dispensers versus Trojan representatives catching couples in need of a condom.

    Which one is more effective?

    Promote cyberself-protection today!

    by Patrick Durusau at September 12, 2016 12:39 AM

    Watch your Python script with strace


    Modern operating systems sandbox each process inside of a virtual memory map from which direct I/O operations are generally impossible. Instead, a process has to ask the operating system every time it wants to modify a file or communicate bytes over the network. By using operating system specific tools to watch the system calls a Python script is making — using “strace” under Linux or “truss” under Mac OS X — you can study how a program is behaving and address several different kinds of bugs.

    Brandon Rhodes does a delightful presentation on using strace with Python.

    Slides for Tracing Python with strace or truss.

    I deeply enjoyed this presentation, which I discovered while looking at a Python regex issue.

    Anticipate running strace on the Python script this week and will report back on any results or failure to obtain results! (Unlike in academic publishing, experiments and investigations do fail.)

    by Patrick Durusau at September 12, 2016 12:21 AM

    September 11, 2016

    Patrick Durusau

    Weapons of Math Destruction:… [Constructive Knowledge of Discriminatory Impact?]

    Weapons of Math Destruction: invisible, ubiquitous algorithms are ruining millions of lives by Cory Doctorow.

    From the post:

    I’ve been writing about the work of Cathy “Mathbabe” O’Neil for years: she’s a radical data-scientist with a Harvard PhD in mathematics, who coined the term “Weapons of Math Destruction” to describe the ways that sloppy statistical modeling is punishing millions of people every day, and in more and more cases, destroying lives. Today, O’Neil brings her argument to print, with a fantastic, plainspoken, call to arms called (what else?) Weapons of Math Destruction.


    I’ve followed Cathy’s posts long enough to recommend Weapons of Math Destruction sight unseen. (Publication date September 6, 2016.)

    Warning: If you read Weapons of Math Destruction, unlike executives who choose models based on their “gut,” or “instinct,” you may be charged with constructive knowledge of how you model discriminates against group X or Y.

    If, like a typical Excel user, you can honestly say “I type in the numbers here and the output comes out there,” it’s going to be hard to prove any intent to discriminate.

    You are no more responsible for a result than a pump handle is responsible for cholera.

    Doctorow’s conclusion:

    O’Neil’s book is a vital crash-course in the specialized kind of statistical knowledge we all need to interrogate the systems around us and demand better.

    depends upon your definition of “better.”

    “Better” depends on your goals or those of a client.


    PS: It is important to understand models/statistics/data so you can shape results to be your definition of “better.” But acknowledging all results are shaped. The critical question is “What shape do you want?”

    by Patrick Durusau at September 11, 2016 01:03 AM

    September 10, 2016

    Patrick Durusau

    Self-Destruct Smart Phone Feature

    The Samsung Galaxy 7 Note offer a self-destruct feature may defeat even quantum computers. It melts itself.

    Like most new features, it’s erratic and difficult to invoke reliably. The 35 known cases don’t establish a pattern of how to make the Galaxy 7 Note explode on-demand, an essential characteristic for a self-destruct feature.

    Having discovered this feature accidentally, one expects Samsung to offer the self-destruct feature on a standard Galaxy 8. Pricing has yet to be determined.


    From the post:


    PS: The self-destruct UI should be two-buttons. Say on/off plus phone. Something easy to remember and perform as you are being seized.

    by Patrick Durusau at September 10, 2016 08:15 PM

    September 09, 2016

    Patrick Durusau

    Let’s Offend Mark Zuckerberg! Napalm-Girl – Please Repost Image

    Facebook deletes Norwegian PM’s post as ‘napalm girl’ row escalates by Alice Ross and Julia Carrie Wong.


    From the post:

    Facebook has deleted a post by the Norwegian prime minister in an escalating row over the website’s decision to remove content featuring the Pulitzer-prize winning “napalm girl” photograph from the Vietnam war.

    Erna Solberg, the Conservative prime minister, called on Facebook to “review its editing policy” after it deleted her post voicing support for a Norwegian newspaper that had fallen foul of the social media giant’s guidelines.

    Solberg was one of a string of Norwegian politicians who shared the iconic image after Facebook deleted a post from Tom Egeland, a writer who had included the Nick Ut picture as one of seven photographs he said had “changed the history of warfare”.

    I remember when I first saw that image during the Vietnam War. As if the suffering of the young girl wasn’t enough, the photo captures the seeming indifference of the soldiers in the background.

    This photo certainly changes approach of the U.S. military to press coverage of wars. From TV cameras recording live footage of battles and the wounded in Vietnam, present day coverage is highly sanitized and “safe” for any viewing audience.

    There are the obligatory shots of the aftermath of “terrorist” bombings but where is the live reporting on allied bombing of hospitals, weddings, schools and the like? Where are the shrieking wounded and death rattles?

    Too much of that and American voters might get the idea that war has real consequences, for real people. Well, war always does but it the profit consequences that concern military leadership and their future employers. Can’t have military spending without a war and a supposed enemy.

    Zuckerberg should not shield us and especially not children from the nasty side of war.

    Sanitized and “safe” reporting of wars is a recipe for the continuation of the same.

    Read more about the photo and the photographer who took it: Nick Ut’s Napalm Girl Helped End the Vietnam War. Today in L.A., He’s Still Shooting

    You can’t really tell from the photo but the girl’s skin (Kim Phuc) was melting off in strips. That’s the reality of war that needs to be brought home to everyone who supports war to achieve abstract policy goals and objectives.

    by Patrick Durusau at September 09, 2016 04:01 PM

    No Properties/No Structure – But, Subject Identity

    Jack Park has prodded me into following some category theory and data integration papers. More on that to follow but as part of that, I have been watching Bartosz Milewski’s lectures on category theory, reading his blog, etc.

    In Category Theory 1.2, Mileski goes to great lengths to emphasize:

    Objects are primitives with no properties/structure – a point

    Morphism are primitives with no properties/structure, but do have a start and end point

    Late in that lecture, Milewski says categories are the “ultimate in data hiding” (read abstraction).

    Despite their lack of properties and structure, both objects and morphisms have subject identity.


    I think that is more than clever use of language and here’s why:

    If I want to talk about objects in category theory as a group subject, what can I say about them? (assuming a scope of category theory)

    1. Objects have no properties
    2. Objects have no structure
    3. Objects mark the start and end of morphisms (distinguishes them from morphisms)
    4. Every object has an identity morphism
    5. Every pair of objects may have 0, 1, or many morphisms between them
    6. Morphisms may go in both directions, between a pair of morphisms
    7. An object can have multiple morphisms that start and end at it

    Incomplete and yet a lot of things to say about something that has no properties and no structure. ;-)

    Bearing in mind, that’s just objects in general.

    I can also talk about a specific object at a particular time point in the lecture and screen location, which itself is a subject.

    Or an object in a paper or monograph.

    We can declare primitives, like objects and morphisms, but we should always bear in mind they are declared to be primitives.

    For other purposes, we can declare them to be otherwise.

    by Patrick Durusau at September 09, 2016 01:08 AM

    September 07, 2016

    Patrick Durusau

    New Plea: Charges Don’t Reflect Who I Am Today

    Traditionally, pleas have been guilty, not guilty, not guilty by reason of insanity and nolo contendere (no contest).

    Beth Cobert, acting director at the OPM, has added a fifth plea:

    Charges Don’t Reflect Who I Am Today

    Greg Masters captures the new plea in Congressional report faults OPM over breach preparedness and response:

    While welcoming the committee’s acknowledgement of the OPM’s progress, Beth Cobert, acting director at the OPM, disagreed with the committee’s findings in a blog post published on the OPM site on Wednesday, responding that the report does “not fully reflect where this agency stands today.”
    … (emphasis added)

    Any claims about “…where this agency stands today…” are a distraction from the question of responsibility for a system wide failure of security.

    If you know any criminal defense lawyers, suggest they quote Beth Cobert as setting a precedent for responding to allegations of prior misconduct with:

    Charges Don’t Reflect Who I Am Today

    Please forward links to news reports of successful use of that plea to my attention.

    by Patrick Durusau at September 07, 2016 08:20 PM

    Audio/Video Conferencing – Apache OpenMeetings

    Apache OpenMeetings

    Ignorance of Apache OpenMeetings is the only explanation I can offer for non-Apache Openmeetings webinars with one presenter, listeners and a chat channel.

    Proprietary solutions limit your audience’s choice of platforms, while offering no, repeat no advantages over Apache OpenMeetings.

    It may be that your IT department is too busy creating SQLi weaknesses to install and configure Apache OpenMeetings, but even so that’s a fairly poor excuse for not using it.

    If you just have to spend money to “trust” software, there are commercial services that offer hosting and other services for Apache OpenMeetings.

    Apologies, sort of, for the Wednesday rant, but I tire of limited but “popular logo” commercial services used in place of robust open source solutions.

    by Patrick Durusau at September 07, 2016 07:47 PM

    Data Provenance: A Short Bibliography

    The video Provenance for Database Transformations by Val Tannen ends with a short bibliography.

    Links and abstracts for the items in Val’s bibliography:

    Provenance Semirings by Todd J. Green, Grigoris Karvounarakis, Val Tannen. (2007)

    We show that relational algebra calculations for incomplete databases, probabilistic databases, bag semantics and whyprovenance are particular cases of the same general algorithms involving semirings. This further suggests a comprehensive provenance representation that uses semirings of polynomials. We extend these considerations to datalog and semirings of formal power series. We give algorithms for datalog provenance calculation as well as datalog evaluation for incomplete and probabilistic databases. Finally, we show that for some semirings containment of conjunctive queries is the same as for standard set semantics.

    Update Exchange with Mappings and Provenance by Todd J. Green, Grigoris Karvounarakis, Zachary G. Ives, Val Tannen. (2007)

    We consider systems for data sharing among heterogeneous peers related by a network of schema mappings. Each peer has a locally controlled and edited database instance, but wants to ask queries over related data from other peers as well. To achieve this, every peer’s updates propagate along the mappings to the other peers. However, this update exchange is filtered by trust conditions — expressing what data and sources a peer judges to be authoritative — which may cause a peer to reject another’s updates. In order to support such filtering, updates carry provenance information. These systems target scientific data sharing applications, and their general principles and architecture have been described in [20].

    In this paper we present methods for realizing such systems. Specifically, we extend techniques from data integration, data exchange, and incremental view maintenance to propagate updates along mappings; we integrate a novel model for tracking data provenance, such that curators may filter updates based on trust conditions over this provenance; we discuss strategies for implementing our techniques in conjunction with an RDBMS; and we experimentally demonstrate the viability of our techniques in the ORCHESTRA prototype system.

    Annotated XML: Queries and Provenance by J. Nathan Foster, Todd J. Green, Val Tannen. (2008)

    We present a formal framework for capturing the provenance of data appearing in XQuery views of XML. Building on previous work on relations and their (positive) query languages, we decorate unordered XML with annotations from commutative semirings and show that these annotations suffice for a large positive fragment of XQuery applied to this data. In addition to tracking provenance metadata, the framework can be used to represent and process XML with repetitions, incomplete XML, and probabilistic XML, and provides a basis for enforcing access control policies in security applications.

    Each of these applications builds on our semantics for XQuery, which we present in several steps: we generalize the semantics of the Nested Relational Calculus (NRC) to handle semiring-annotated complex values, we extend it with a recursive type and structural recursion operator for trees, and we define a semantics for XQuery on annotated XML by translation into this calculus.

    Containment of Conjunctive Queries on Annotated Relations by Todd J. Green. (2009)

    We study containment and equivalence of (unions of) conjunctive queries on relations annotated with elements of a commutative semiring. Such relations and the semantics of positive relational queries on them were introduced in a recent paper as a generalization of set semantics, bag semantics, incomplete databases, and databases annotated with various kinds of provenance information. We obtain positive decidability results and complexity characterizations for databases with lineage, why-provenance, and provenance polynomial annotations, for both conjunctive queries and unions of conjunctive queries. At least one of these results is surprising given that provenance polynomial annotations seem “more expressive” than bag semantics and under the latter, containment of unions of conjunctive queries is known to be undecidable. The decision procedures rely on interesting variations on the notion of containment mappings. We also show that for any positive semiring (a very large class) and conjunctive queries without self-joins, equivalence is the same as isomorphism.

    Collaborative Data Sharing with Mappings and Provenance by Todd J. Green, dissertation. (2009)

    A key challenge in science today involves integrating data from databases managed by different collaborating scientists. In this dissertation, we develop the foundations and applications of collaborative data sharing systems (CDSSs), which address this challenge. A CDSS allows collaborators to define loose confederations of heterogeneous databases, relating them through schema mappings that establish how data should flow from one site to the next. In addition to simply propagating data along the mappings, it is critical to record data provenance (annotations describing where and how data originated) and to support policies allowing scientists to specify whose data they trust, and when. Since a large data sharing confederation is certain to evolve over time, the CDSS must also efficiently handle incremental changes to data, schemas, and mappings.

    We focus in this dissertation on the formal foundations of CDSSs, as well as practical issues of its implementation in a prototype CDSS called Orchestra. We propose a novel model of data provenance appropriate for CDSSs, based on a framework of semiring-annotated relations. This framework elegantly generalizes a number of other important database semantics involving annotated relations, including ranked results, prior provenance models, and probabilistic databases. We describe the design and implementation of the Orchestra prototype, which supports update propagation across schema mappings while maintaining data provenance and filtering data according to trust policies. We investigate fundamental questions of query containment and equivalence in the context of provenance information. We use the results of these investigations to develop novel approaches to efficiently propagating changes to data and mappings in a CDSS. Our approaches highlight unexpected connections between the two problems and with the problem of optimizing queries using materialized views. Finally, we show that semiring annotations also make sense for XML and nested relational data, paving the way towards a future extension of CDSS to these richer data models.

    Provenance in Collaborative Data Sharing by Grigoris Karvounarakis, dissertation. (2009)

    This dissertation focuses on recording, maintaining and exploiting provenance information in Collaborative Data Sharing Systems (CDSS). These are systems that support data sharing across loosely-coupled, heterogeneous collections of relational databases related by declarative schema mappings. A fundamental challenge in a CDSS is to support the capability of update exchange — which publishes a participant’s updates and then translates others’ updates to the participant’s local schema and imports them — while tolerating disagreement between them and recording the provenance of exchanged data, i.e., information about the sources and mappings involved in their propagation. This provenance information can be useful during update exchange, e.g., to evaluate provenance-based trust policies. It can also be exploited after update exchange, to answer a variety of user queries, about the quality, uncertainty or authority of the data, for applications such as trust assessment, ranking for keyword search over databases, or query answering in probabilistic databases.

    To address these challenges, in this dissertation we develop a novel model of provenance graphs that is informative enough to satisfy the needs of CDSS users and captures the semantics of query answering on various forms of annotated relations. We extend techniques from data integration, data exchange, incremental view maintenance and view update to define the formal semantics of unidirectional and bidirectional update exchange. We develop algorithms to perform update exchange incrementally while maintaining provenance information. We present strategies for implementing our techniques over an RDBMS and experimentally demonstrate their viability in the ORCHESTRA prototype system. We define ProQL, a query language for provenance graphs that can be used by CDSS users to combine data querying with provenance testing as well as to compute annotations for their data, based on their provenance, that are useful for a variety of applications. Finally, we develop a prototype implementation of ProQL over an RDBMS and indexing techniques to speed up provenance querying, and experimentally evaluate the performance of provenance querying and the benefits of our indexing techniques.

    Provenance for Aggregate Queries by Yael Amsterdamer, Daniel Deutch, Val Tannen. (2011)

    We study in this paper provenance information for queries with aggregation. Provenance information was studied in the context of various query languages that do not allow for aggregation, and recent work has suggested to capture provenance by annotating the different database tuples with elements of a commutative semiring and propagating the annotations through query evaluation. We show that aggregate queries pose novel challenges rendering this approach inapplicable. Consequently, we propose a new approach, where we annotate with provenance information not just tuples but also the individual values within tuples, using provenance to describe the values computation. We realize this approach in a concrete construction, first for “simple” queries where the aggregation operator is the last one applied, and then for arbitrary (positive) relational algebra queries with aggregation; the latter queries are shown to be more challenging in this context. Finally, we use aggregation to encode queries with difference, and study the semantics obtained for such queries on provenance annotated databases.
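
    The paper's shift from tuple-level to value-level annotation can be caricatured for SUM as a formal sum of annotation⊗value pairs (my schematic reading, not the authors' exact construction):

```python
# Sketch: keep the aggregate as a formal sum of annotation-tagged values,
# which can later be specialized under a concrete interpretation.
rows = [("t1", 5), ("t2", 3), ("t3", 7)]  # (source-tuple id, value)

symbolic = " + ".join(f"{ann}⊗{v}" for ann, v in rows)
print(symbolic)  # t1⊗5 + t2⊗3 + t3⊗7

# Under the Boolean interpretation "which tuples are present?", the formal
# sum specializes to an ordinary sum over the present tuples:
present = {"t1": True, "t2": False, "t3": True}
total = sum(v for ann, v in rows if present[ann])
print(total)  # 12
```

    The point of the symbolic form is exactly this deferral: the same annotated aggregate can be re-evaluated when the set of trusted or present source tuples changes.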

    Circuits for Datalog Provenance by Daniel Deutch, Tova Milo, Sudeepa Roy, Val Tannen. (2014)

    The annotation of the results of database queries with provenance information has many applications. This paper studies provenance for datalog queries. We start by considering provenance representation by (positive) Boolean expressions, as pioneered in the theories of incomplete and probabilistic databases. We show that even for linear datalog programs the representation of provenance using Boolean expressions incurs a super-polynomial size blowup in data complexity. We address this with an approach that is novel in provenance studies, showing that we can construct in PTIME poly-size (data complexity) provenance representations as Boolean circuits. Then we present optimization techniques that embed the construction of circuits into seminaive datalog evaluation, and further reduce the size of the circuits. We also illustrate the usefulness of our approach in multiple application domains such as query evaluation in probabilistic databases, and in deletion propagation. Next, we study the possibility of extending the circuit approach to the more general framework of semiring annotations introduced in earlier work. We show that for a large and useful class of provenance semirings, we can construct in PTIME poly-size circuits that capture the provenance.
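
    The blowup the authors address is easy to see on a toy derivation in which every step reuses the same subderivation twice; a circuit (DAG) counts the shared node once, while the equivalent Boolean expression copies it (an illustrative sketch, not the paper's construction):

```python
class Gate:
    """A node in a Boolean provenance circuit (a DAG of gates)."""
    def __init__(self, op, kids=()):
        self.op, self.kids = op, kids

def expr_size(g):
    # Size when the DAG is unfolded into an expression tree.
    return 1 + sum(expr_size(k) for k in g.kids)

def circuit_size(g, seen=None):
    # Size when shared subcircuits are counted once.
    seen = set() if seen is None else seen
    if id(g) in seen:
        return 0
    seen.add(id(g))
    return 1 + sum(circuit_size(k, seen) for k in g.kids)

g = Gate("x")
for _ in range(15):
    g = Gate("and", (g, g))  # each level reuses the same subcircuit twice

print(circuit_size(g), expr_size(g))  # 16 65535
```

    Fifteen levels of sharing give a 16-gate circuit whose unfolded expression has 2^16 - 1 nodes, which is the super-polynomial gap the circuit representation avoids.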

    An incomplete list, but a substantial starting point for exploring data provenance and its relationship to, and use in, topic map merging.

    To get a feel for “data provenance” just prior to the earliest reference here (2007), consider A Survey of Data Provenance Techniques by Yogesh L. Simmhan, Beth Plale, and Dennis Gannon, published in 2005.

    Data management is growing in complexity as large-scale applications take advantage of the loosely coupled resources brought together by grid middleware and by abundant storage capacity. Metadata describing the data products used in and generated by these applications is essential to disambiguate the data and enable reuse. Data provenance, one kind of metadata, pertains to the derivation history of a data product starting from its original sources.

    The provenance of data products generated by complex transformations such as workflows is of considerable value to scientists. From it, one can ascertain the quality of the data based on its ancestral data and derivations, track back sources of errors, allow automated re-enactment of derivations to update a data, and provide attribution of data sources. Provenance is also essential to the business domain where it can be used to drill down to the source of data in a data warehouse, track the creation of intellectual property, and provide an audit trail for regulatory purposes.

    In this paper we create a taxonomy of data provenance techniques, and apply the classification to current research efforts in the field. The main aspect of our taxonomy categorizes provenance systems based on why they record provenance, what they describe, how they represent and store provenance, and ways to disseminate it. Our synthesis can help those building scientific and business metadata-management systems to understand existing provenance system designs. The survey culminates with an identification of open research problems in the field.

    Another rich source of reading material!

    by Patrick Durusau at September 07, 2016 12:45 AM

    September 06, 2016

    Patrick Durusau

    Why OrientDB?

    Why OrientDB?

    From the webpage:

    Understanding the strengths, limitations and trade-offs among the leading DBMS options can be DIS-ORIENTING. Developers have grown tired of making compromises in speed and flexibility or supporting several DBMS products to satisfy their use case requirements.

    Thus, OrientDB was born: the first Multi-Model Open Source NoSQL DBMS that combines the power of graphs and the flexibility of documents into one scalable, high-performance operational database.

    In addition to great software, OrientDB also has a clever marketing department:


    That’s an image from an OrientDB tweet that sends you to the Why OrientDB? page.

    What’s your great image to gain attention?

    PS: I remember one from an IT zine in the 1990s where employees were racing around the office on fire. Does that ring a bell with anyone? Seems like it was one of the large format, Computer Shopper size zines.

    by Patrick Durusau at September 06, 2016 09:45 PM

    Why No Wild Wild West? Parity Between Large/Small Governments? Citizens?

    Jordyn Phelps reports in Obama Tells Putin Hackers Shouldn’t Create Cyber ‘Wild Wild West’:

    “What we cannot do is have a situation where this becomes the wild, wild West, where countries that have significant cyber capacity start engaging in unhealthy competition or conflict through these means,” the president said. He added that nations have enough to worry about in the realm of cyber attacks from non-state actors without nation-states engaging in hacking against one another.

    Interesting that weapons that don’t require a major industrial base, like poison gas, biological agents, and computer hacking, are such a pressing concern.

    Weapons that small governments, small groups of people or even single individuals can produce and use effectively, well, those need to be severely policed if not prohibited outright.

    If anything, there is too much hacking of private email accounts, celebrity nude pics, and ransomware, with too little hacking of government emails, databases and document troves.

    For example, there was a coup in Egypt (the most recent one 2013) but did you see vast quantities of diplomatic correspondence being leaked?

    I am always disappointed when governments change and a bright spotlight isn’t shone on their predecessors, especially predecessors who had dealings with the United States and its minions. There’s no telling what might be unearthed.

    Hacking may be the great leveler between governments, and between governments and their peoples.

    What’s not to like about that?

    PS: Unless, like Obama, you are loathe to share any of the wealth and power in the world.

    by Patrick Durusau at September 06, 2016 08:01 PM

    September 05, 2016

    Patrick Durusau

    Merge 5 Proxies, Take Away 1 Proxy = ? [Data Provenance]

    Provenance for Database Transformations by Val Tannen. (video)


    Database transformations (queries, views, mappings) take apart, filter,and recombine source data in order to populate warehouses, materialize views,and provide inputs to analysis tools. As they do so, applications often need to track the relationship between parts and pieces of the sources and parts and pieces of the transformations’ output. This relationship is what we call database provenance.

    This talk presents an approach to database provenance that relies on two observations. First, provenance is a kind of annotation, and we can develop a general approach to annotation propagation that also covers other applications, for example to uncertainty and access control. In fact, provenance turns out to be the most general kind of such annotation,in a precise and practically useful sense. Second, the propagation of annotation through a broad class of transformations relies on just two operations: one when annotations are jointly used and one when they are used alternatively.This leads to annotations forming a specific algebraic structure, a commutative semiring.

    The semiring approach works for annotating tuples, field values and attributes in standard relations, in nested relations (complex values), and for annotating nodes in (unordered) XML. It works for transformations expressed in the positive fragment of relational algebra, nested relational calculus, unordered XQuery, as well as for Datalog, GLAV schema mappings, and tgd constraints. Finally, when properly extended to semimodules it works for queries with aggregates. Specific semirings correspond to earlier approaches to provenance, while others correspond to forms of uncertainty, trust, cost, and access control.
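
    The two operations Tannen describes (multiply when annotations are used jointly, as in a join; add when they are used alternatively, as in a union or projection) can be sketched in a few lines. This is a toy model for intuition, not Orchestra’s actual implementation; the relation-as-dict representation and the string polynomials are my assumptions:

```python
from itertools import product

class Semiring:
    def __init__(self, add, mul):
        self.add, self.mul = add, mul

# Provenance polynomials over source-tuple ids (as strings), and the Boolean
# semiring ("is the tuple derivable at all?").
prov = Semiring(add=lambda a, b: f"{a} + {b}", mul=lambda a, b: f"{a}*{b}")
boolean = Semiring(add=lambda a, b: a or b, mul=lambda a, b: a and b)

def join(r, s, sr):
    """R(a,b) joined with S(b,c): joint use multiplies annotations."""
    out = {}
    for ((a, b1), x), ((b2, c), y) in product(r.items(), s.items()):
        if b1 == b2:
            out[(a, b1, c)] = sr.mul(x, y)
    return out

def project_first(r, sr):
    """Projection onto the first column: alternative derivations add."""
    out = {}
    for tup, ann in r.items():
        key = (tup[0],)
        out[key] = sr.add(out[key], ann) if key in out else ann
    return out

# Two source relations, each tuple annotated with its id.
R = {("a", "b"): "r1", ("a", "c"): "r2"}
S = {("b", "d"): "s1", ("c", "d"): "s2"}

q = project_first(join(R, S, prov), prov)
print(q)  # {('a',): 'r1*s1 + r2*s2'}

# The same query code under the Boolean semiring:
qb = project_first(join({k: True for k in R}, {k: True for k in S}, boolean),
                   boolean)
print(qb)  # {('a',): True}
```

    Swapping the semiring changes the semantics without touching the query code, which is the sense in which the annotation framework generalizes ranked results, probabilistic databases, and trust filtering.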

    What does happen when you subtract from a merge? (Referenced here as an “aggregation.”)

    Although it’s possible to paw through logs to puzzle out a result, Val suggests there are more robust methods at our disposal.

    I watched this over the weekend and be forewarned, heavy sledding ahead!

    This is an active area of research and I have only begun to scratch the surface for references.

    I may discover differently, but the “aggregation” I have seen thus far relies on opaque strings.

    Not that all uses of opaque strings are inappropriate, but imagine the power of treating a token as an opaque string for one use case and exploding that same token into key/value pairs for another.


    by Patrick Durusau at September 05, 2016 11:45 PM

    Keystroke Recognition Using WiFi Signals [Identifying Users With WiFi?]

    Keystroke Recognition Using WiFi Signals by Kamran Ali, Alex X. Liu, Wei Wang, and Muhammad Shahzad.


    Keystroke privacy is critical for ensuring the security of computer systems and the privacy of human users as what being typed could be passwords or privacy sensitive information. In this paper, we show for the first time that WiFi signals can also be exploited to recognize keystrokes. The intuition is that while typing a certain key, the hands and fingers of a user move in a unique formation and direction and thus generate a unique pattern in the time-series of Channel State Information (CSI) values, which we call CSI-waveform for that key. In this paper, we propose a WiFi signal based keystroke recognition system called WiKey. WiKey consists of two Commercial Off-The-Shelf (COTS) WiFi devices, a sender (such as a router) and a receiver (such as a laptop). The sender continuously emits signals and the receiver continuously receives signals. When a human subject types on a keyboard, WiKey recognizes the typed keys based on how the CSI values at the WiFi signal receiver end change. We implemented the WiKey system using a TP-Link TL-WR1043ND WiFi router and a Lenovo X200 laptop. WiKey achieves more than 97.5% detection rate for detecting the keystroke and 96.4% recognition accuracy for classifying single keys. In real-world experiments, WiKey can recognize keystrokes in a continuously typed sentence with an accuracy of 93.5%.

    In discussing the limitations of their technique the authors mention:

    User Specific Training. In our current implementation of WiKey, we train the classifiers using one user and test the classifier using the test samples from the same user. However, we hypothesize that if we train our classifier using a large number of users, the trained classifier will be able to capture commonalities between users and will then be able to recognize the keystrokes of any unknown user. At the same time, we also acknowledge that it is extremely challenging to build such a universal classifier that works for almost every user because WiFi signals are susceptible to various factors such as finger length/width, typing styles, and environmental noise.
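
    The recognition step itself is conceptually simple: match an observed waveform against per-key templates under a warping-tolerant distance. Here is a toy nearest-neighbor sketch with synthetic waveforms (the real system extracts features from measured CSI streams; the shapes below are invented stand-ins):

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic-time-warping distance between two 1-D waveforms."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Synthetic stand-ins for per-key CSI waveforms (hypothetical shapes).
t = np.linspace(0, 1, 50)
templates = {"a": np.sin(2 * np.pi * t),
             "b": np.sin(4 * np.pi * t),
             "c": t ** 2}

def classify(sample):
    # Nearest template under DTW is the predicted key.
    return min(templates, key=lambda k: dtw_distance(sample, templates[k]))

rng = np.random.default_rng(0)
noisy = templates["b"] + 0.1 * rng.standard_normal(len(t))
print(classify(noisy))  # b
```

    The same matching machinery is what would have to generalize across users for the universal classifier the authors hypothesize.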

    The more interesting case would be surveillance that identifies users by their keystroke patterns, in settings where persistent digital capture of the keystrokes themselves isn’t possible.

    Subject (as in human) identification by WiFi signals?

    by Patrick Durusau at September 05, 2016 09:41 PM

    Data Science Series [Starts 9 September 2016 but not for *nix users]

    The BD2K Guide to the Fundamentals of Data Science Series

    From the webpage:

    Every Friday beginning September 9, 2016
    9am – 10am Pacific Time

    Working jointly with the BD2K Centers-Coordination Center (BD2KCCC) and the NIH Office of Data Science, the BD2K Training Coordinating Center (TCC) is spearheading this virtual lecture series on the data science underlying modern biomedical research. Beginning in September 2016, the seminar series will consist of regularly scheduled weekly webinar presentations covering the basics of data management, representation, computation, statistical inference, data modeling, and other topics relevant to “big data” biomedicine. The seminar series will provide essential training suitable for individuals at all levels of the biomedical community. All video presentations from the seminar series will be streamed for live viewing, recorded, and posted online for future viewing and reference. These videos will also be indexed as part of TCC’s Educational Resource Discovery Index (ERuDIte), shared/mirrored with the BD2KCCC, and with other BD2K resources.

    View all archived videos on our YouTube channel:

    Please join our weekly meetings from your computer, tablet or smartphone.

    You can also dial in using your phone.

    United States +1 (872) 240-3311

    Access Code: 786-506-213

    First GoToMeeting? Try a test session:

    Of course, running Ubuntu, when I follow the “First GoToMeeting? Try a test session,” I get this result:

    OS not supported

    Long-Term Fix: Upgrade your computer.

    You or your IT Admin will need to upgrade your computer’s operating system in order to install our desktop software at a later date.

    Since this is most likely a lecture format, they could just stream the video and use WebConf as a Q&A channel.

    Of course, that would mean losing the various technical difficulties, licensing fees, etc., all of which are distractions from the primary goal of the project.

    But who wants that?

    PS: Most *nix users won’t be interested except to refer others, but still, over-engineered solutions to simple issues should not be encouraged.

    by Patrick Durusau at September 05, 2016 01:20 AM

    Plugins for Newsgathering and Verification

    7 vital browser plugins for newsgathering and verification by Alastair Reid.

    From the post:

    When breaking news can travel the world in seconds, it is important for journalists to have the tools at their disposal to get to work fast. When searching the web, what quicker way is there to have those tools available than directly in the browser window?

    Most browsers have a catalogue of programs and software to make your browsing experience more powerful, like a smartphone app store. At First Draft we find Google’s Chrome browser is the most effective but there are obviously other options available.

    Text says “five” but this has been updated to include “seven” plugins.

    One of the updates is Frame by Frame for YouTube, which, as the name says, enables frame-by-frame viewing and is touted for verification.

    I can think of a number of uses for frame-by-frame viewing. You?

    See Alastair’s post for the rest and follow @firstdraftnews to stay current on digital tools for journalists.

    by Patrick Durusau at September 05, 2016 12:55 AM

    Running a Tor Exit Node for fun and e-mails

    Running a Tor Exit Node for fun and e-mails by Antonios A. Chariton.

    From the post:

    To understand the logistics behind running a Tor Exit Node, I will tell you how I got to run my Tor Exit Node for over 8 months. Hopefully, during the process, some of your questions will be answered, and you’ll also learn some new things. Please note that this is my personal experience and I cannot guarantee it will be the same for you. Also, I must state that I have run other exit nodes in the past, as well as multiple non-exit relays and bridges.

    A great first person account on running a Tor Exit Node.

    Some stats after 8 months:

    • It has been running for almost 8 months
    • It costs 4,90 EUR / month. In comparison, the same server in AWS would cost $1,122, or 992€ as of today
    • The total cost to date is 40€. In comparison, the same server in AWS would cost about 8,000€.
    • It is pushing up to 50 Mb/s, every second
    • It relayed over 70 TB of Tor traffic
    • It generated 2,729 Abuse E-Mails
    • It is only blocking port 25, and this to prevent spam
    • It helped hundreds or thousands of people to reach an uncensored Internet
    • It helped even more people browse the Internet anonymously and with privacy

    If you’re not quite up to running an exit node, consider running a Tor relay node: Add Tor Nodes For 2 White Chocolate Mochas (Venti) Per Month.

    Considering the bandwidth used by governments for immoral purposes, the observation:

    Finally, just like with everything else, we have malicious users. Not necessarily highly skilled criminals, but people in general who (ab)use the anonymity that Tor provides to commit things they otherwise wouldn’t.

    doesn’t trouble me.

    As a general rule, highly skilled or not, criminals don’t carry out air strikes against hospitals and such.

    by Patrick Durusau at September 05, 2016 12:34 AM

    September 03, 2016

    Patrick Durusau

    Predicting American Politics

    Presidential Election Predictions 2016 (an ASA competition) by Jo Hardin.

    From the post:

    In this election year, the American Statistical Association (ASA) has put together a competition for students to predict the exact percentages for the winner of the 2016 presidential election. They are offering cash prizes for the entry that gets closest to the national vote percentage and that best predicts the winners for each state and the District of Columbia. For more details see:

    To get you started, I’ve written an analysis of data scraped from The analysis uses weighted means and a formula for the standard error (SE) of a weighted mean. For your analysis, you might consider a similar analysis on the state data (what assumptions would you make for a new weight function?). Or you might try some kind of model – either a generalized linear model or a Bayesian analysis with an informed prior. The world is your oyster!
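
    A minimal version of that starting point, a weighted mean of poll results with a standard error, looks like this (the poll numbers and recency weights are hypothetical, and the weighting scheme is my assumption for illustration, not the ASA’s formula):

```python
import math

# Hypothetical national polls: (candidate share in %, sample size, recency weight).
polls = [(44.0, 900, 1.0), (46.5, 1200, 1.5), (45.2, 600, 0.8)]

weights = [n * w for _, n, w in polls]  # effective weight per poll
shares = [p for p, _, _ in polls]
wsum = sum(weights)

wmean = sum(w * p for w, p in zip(weights, shares)) / wsum

# One common SE formula for a weighted mean of independent estimates.
se = math.sqrt(sum((w / wsum) ** 2 * (p - wmean) ** 2
                   for w, p in zip(weights, shares)))

print(f"weighted mean = {wmean:.2f}%, SE = {se:.2f}")
# weighted mean = 45.60%, SE = 0.69
```

    A state-level entry would repeat this per state with whatever weight function you can defend, which is exactly the assumption Jo Hardin invites you to reconsider.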

    Interesting contest but it is limited to high school and college students. Separate prizes, one for high school and one for college, $200.00 each. Oh, plus ASA memberships and a 2016 Election Prediction t-shirt.

    For adults in the audience, strike up a prediction pool by state and/or for the nation.

    by Patrick Durusau at September 03, 2016 09:04 PM