Planet Topic Maps

August 24, 2016

Patrick Durusau

Debugging

Julia Evans tweeted:

[image: Julia Evans’ tweet]

It’s been two days without another suggestion.

Considering Brendan D. Gregg’s homepage, do you have another suggestion?

Too rich a resource not to write down.

Besides, for some subjects and their relationships, you need specialized tooling to see them.

Not to mention that if you can spot patterns in subjects, detecting an unknown 0-day may be easier.

Of course, you can leave USB sticks at popular eateries near Fort Meade, MD 20755-6248, but some people prefer to work for their 0-day exploits.

;-)

by Patrick Durusau at August 24, 2016 12:19 AM

Eloquent JavaScript

Eloquent JavaScript by Marijn Haverbeke.

From the webpage:

This is a book about JavaScript, programming, and the wonders of the digital. You can read it online here, or get your own paperback copy of the book.


Embarrassing that authors post free content for the betterment of others, but wealthy governments play access games.

This book is also available in Български (Bulgarian), Português (Portuguese), and Русский (Russian).

Enjoy!

by Patrick Durusau at August 24, 2016 12:03 AM

August 23, 2016

Patrick Durusau

“Why Should I Trust You?”…

“Why Should I Trust You?”: Explaining the Predictions of Any Classifier by Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin.

Abstract:

Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model. Such understanding also provides insights into the model, which can be used to transform an untrustworthy model or prediction into a trustworthy one.

In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We also propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). We show the utility of explanations via novel experiments, both simulated and with human subjects, on various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and identifying why a classifier should not be trusted.

LIME software at Github.
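
For a hands-on flavor, here is a minimal sketch of explaining a single text classification with that package. The API is as the project's README described it at the time and may have changed; the training corpus and class names below are placeholders of my own, not anything from the paper.

    # Sketch only: train_texts, train_labels and test_texts stand in for your own corpus.
    from sklearn.pipeline import make_pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.ensemble import RandomForestClassifier
    from lime.lime_text import LimeTextExplainer

    pipeline = make_pipeline(TfidfVectorizer(), RandomForestClassifier(n_estimators=100))
    pipeline.fit(train_texts, train_labels)

    explainer = LimeTextExplainer(class_names=["atheism", "christianity"])
    explanation = explainer.explain_instance(
        test_texts[0],              # the one prediction to explain
        pipeline.predict_proba,     # the black-box probability function
        num_features=6)             # how many words to show the user
    print(explanation.as_list())    # [(word, weight), ...] from the local linear model

The point is the shape of the call: the classifier stays a black box and LIME only needs its probability function.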

For a quick overview consider: Introduction to Local Interpretable Model-Agnostic Explanations (LIME) (blog post).

Or what originally sent me in this direction: Trusting Machine Learning Models with LIME at Data Skeptic, a podcast described as:

Machine learning models are often criticized for being black boxes. If a human cannot determine why the model arrives at the decision it made, there’s good cause for skepticism. Classic inspection approaches to model interpretability are only useful for simple models, which are likely to only cover simple problems.

The LIME project seeks to help us trust machine learning models. At a high level, it takes advantage of local fidelity. For a given example, a separate model trained on neighbors of the example are likely to reveal the relevant features in the local input space to reveal details about why the model arrives at it’s conclusion.

Data Science Renee finds deeply interesting material such as this on a regular basis; you should follow her account on Twitter.

I do have one caveat on a quick read of these materials. The authors say in the paper, under 4. Submodular Pick For Explaining Models:


Even though explanations of multiple instances can be insightful, these instances need to be selected judiciously, since users may not have the time to examine a large number of explanations. We represent the time/patience that humans have by a budget B that denotes the number of explanations they are willing to look at in order to understand a model. Given a set of instances X, we define the pick step as the task of selecting B instances for the user to inspect.

The pick step is not dependent on the existence of explanations – one of the main purpose of tools like Modeltracker [1] and others [11] is to assist users in selecting instances themselves, and examining the raw data and predictions. However, since looking at raw data is not enough to understand predictions and get insights, the pick step should take into account the explanations that accompany each prediction. Moreover, this method should pick a diverse, representative set of explanations to show the user – i.e. non-redundant explanations that represent how the model behaves globally.

The “judicious” selection of instances, in models of any degree of sophistication, based upon large data sets seems problematic.

The focus on the “non-redundant coverage intuition” is interesting but based on the assumption that changes in factors don’t lead to “redundant explanations.” In the cases presented that’s true, but I lack confidence that will be true in every case.
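
To make the pick step concrete, here is a toy greedy-coverage sketch of the idea described above. It is my own illustration of weighted coverage, not the authors' code, and it takes the feature importances as given rather than computing them as the paper does.

    import numpy as np

    def greedy_pick(W, importance, budget):
        """W[i, j] > 0 when feature j appears in the explanation of instance i.
        Greedily pick `budget` instances whose explanations cover the most
        globally important features."""
        chosen, covered = [], set()
        for _ in range(budget):
            best_i, best_gain = None, -1.0
            for i in range(W.shape[0]):
                if i in chosen:
                    continue
                new_features = set(np.flatnonzero(W[i])) - covered
                gain = sum(importance[j] for j in new_features)
                if gain > best_gain:
                    best_i, best_gain = i, gain
            chosen.append(best_i)
            covered |= set(np.flatnonzero(W[best_i]))
        return chosen

    # Four instances, five features, importances supplied by hand.
    W = np.array([[1, 1, 0, 0, 0],
                  [1, 0, 0, 0, 0],
                  [0, 0, 1, 1, 0],
                  [0, 0, 0, 0, 1]])
    print(greedy_pick(W, importance=[3, 2, 2, 1, 1], budget=2))  # -> [0, 2]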

Still, a very important area of research and an effort that is worth tracking.

by Patrick Durusau at August 23, 2016 11:35 PM

[Free] Cyber Security Courses for Officials and Veterans [And Contractors, But Not Citizens]

Cryptome posted Cyber Security Courses for Officials and Veterans

When you visit the Federal Virtual Training Environment (FedVTE) homepage, the FAQ for Spring 2016 (PDF) advises:

Who can take FedVTE training?
FedVTE courses are offered at no cost to government personnel, including contractors, and to U.S. veterans.

Can the general public register on this site and take courses?
No, these courses are not available to the general public.

Cybersecurity is in the news on a daily basis, citizens being victimized right and left, yet the National Initiative for Cybersecurity Careers and Studies denies those same citizens the ability to develop the skills necessary to protect themselves.

While at the same time offering free training to government personnel and contractors, who operated the Office of Personnel Management like a sieve (21.5 million victims). Not to mention the NSA, which seems to have a recurrent case of USB-disease.

For reasons known only to the U.S. government, it lacks either the ability or the interest to protect its citizens from repeated cyber-attacks.

The least it can do is open up the Federal Virtual Training Environment (FedVTE) to all citizens.

Or as Randy Newman almost said:

…if you won’t take care of us
Won’t you please, please let us do [it ourselves]?”

From “God’s Song (That’s Why I Love Mankind)”

Enough freebies for contractors at the federal teat. How about a benefit or two for ordinary citizens?

by Patrick Durusau at August 23, 2016 08:56 PM

Spatial Module in OrientDB 2.2

Spatial Module in OrientDB 2.2

From the post:

In versions prior to 2.2, OrientDB had minimal support for storing and retrieving GeoSpatial data. The support was limited to a pair of coordinates (latitude, longitude) stored as double in an OrientDB class, with the possibility to create a spatial index against those 2 coordinates in order to speed up a geo spatial query. So the support was limited to Point.

In OrientDB v.2.2 we created a brand new Spatial Module with support for different types of Geometry objects stored as embedded objects in a user defined class

  • Point (OPoint)
  • Line (OLine)
  • Polygon (OPolygon)
  • MultiPoint (OMultiPoint)
  • MultiLine (OMultiline)
  • MultiPolygon (OMultiPlygon)
  • Geometry Collections

Along with those data types, the module extends OrientDB SQL with a subset of SQL-MM functions in order to support spatial data. The module only supports EPSG:4326 as Spatial Reference System. This blog post is an introduction to the OrientDB spatial Module, with some examples of its new capabilities. You can find the installation guide here.

Let’s start by loading some data into OrientDB. The dataset is about points of interest in Italy taken from here. Since the format is ShapeFile we used QGis to export the dataset in CSV format (geometry format in WKT) and import the CSV into OrientDB with the ETL in the class Points and the type geometry field is OPoint.

The enhanced spatial functions for OrientDB 2.2 reminded me of this passage in “Silences and Secrecy: The Hidden Agenda of Cartography in Early Modern Europe:”

Some of the most clear-cut cases of an increasing state concern with control and restriction of map knowledge are associated with military or strategic considerations. In Europe in the sixteenth and seventeenth centuries hardly a year passed without some war being fought. Maps were an object of military intelligence; statesmen and princes collected maps to plan, or, later, to commemorate battles; military textbooks advocated the use of maps. Strategic reasons for keeping map knowledge a secret included the need for confidentiality about the offensive and defensive operations of state armies, the wish to disguise the thrust of external colonization, and the need to stifle opposition within domestic populations when developing administrative and judicial systems as well as the more obvious need to conceal detailed knowledge about fortifications. (reprinted in: The New Nature of Maps: Essays in the History of Cartography, by J.B. Harley, edited by Paul Laxton, Johns Hopkins, 2001, page 89)

I say “reminded me,” though it is better to say it increased my puzzling over the widespread access to geographic data that once upon a time had military value.

Is it the case that “ordinary maps,” maps of streets, restaurants, hotels, etc., aren’t normally imbued (merged?) with enough other information to make them “dangerous?”

If that’s true, the lack of commonly available “dangerous maps” is a disadvantage to emergency and security planners.

You can’t plan for the unknown.

Or to paraphrase Dilbert: “Ignorance is not a reliable planning guide.”

How would you cure the ignorance of “ordinary” maps?

PS: While hunting for the quote, I ran across The Power of Maps by Denis Wood with John Fels, which has been updated as Rethinking the Power of Maps by Denis Wood with John Fels and John Krygier. I am now re-reading the first edition and waiting for the updated version to arrive.

Neither book is a guide to making “dangerous” maps but may awaken in you a sense of the power of maps and map making.

by Patrick Durusau at August 23, 2016 07:51 PM

A Whirlwind Tour of Python (Excellent!)

A Whirlwind Tour of Python by Jake VanderPlas.

From the webpage:

To tap into the power of Python’s open data science stack—including NumPy, Pandas, Matplotlib, Scikit-learn, and other tools—you first need to understand the syntax, semantics, and patterns of the Python language. This report provides a brief yet comprehensive introduction to Python for engineers, researchers, and data scientists who are already familiar with another programming language.

Author Jake VanderPlas, an interdisciplinary research director at the University of Washington, explains Python’s essential syntax and semantics, built-in data types and structures, function definitions, control flow statements, and more, using Python 3 syntax.

You’ll explore:

  • Python syntax basics and running Python code
  • Basic semantics of Python variables, objects, and operators
  • Built-in simple types and data structures
  • Control flow statements for executing code blocks conditionally
  • Methods for creating and using reusable functions
  • Iterators, list comprehensions, and generators
  • String manipulation and regular expressions
  • Python’s standard library and third-party modules
  • Python’s core data science tools
  • Recommended resources to help you learn more

Jake VanderPlas is a long-time user and developer of the Python scientific stack. He currently works as an interdisciplinary research director at the University of Washington, conducts his own astronomy research, and spends time advising and consulting with local scientists from a wide range of fields.

A Whirlwind Tour of Python can be recommended without reservation.

In addition to the book, the Jupyter notebooks behind the book have been posted.

Enjoy!

by Patrick Durusau at August 23, 2016 05:35 PM

Add Tor Nodes For 2 White Chocolate Mochas (Venti) Per Month

I don’t have enough local, reliable bandwidth to run a Tor relay node so I cast about for a remote solution.

David Huerta details, in How You Can Help Make Tor Faster for $10 a Month, how you can add a Tor relay node for the cost of 2 White Chocolate Mochas (Venti) per month.

Chris Morran puts the annual coffee spending of American workers at close to $1,100 per year.

How much privacy does your $1,100 coffee habit buy? None.

Would you spend $1,000/year to sponsor a Tor relay node? Serious question.

Do you have a serious answer?

by Patrick Durusau at August 23, 2016 02:39 PM

First Amendment Secondary? [Full Text – Response to Stay]

Backpage.com defies sex trafficking subpoena despite Senate contempt vote by David Kravets.

From the post:

The First Amendment has been good, really good to the online classified ads portal Backpage.com. In 2015, the US Constitution helped Backpage dodge a lawsuit from victims of sex trafficking. What’s more, a federal judge invoked the First Amendment and crucified an Illinois sheriff—who labeled Backpage a “sex trafficking industry profiteer”—because the sheriff coerced Visa and Mastercard to refrain from processing payments to the site. The judge said Cook County Sheriff Thomas Dart’s anti-Backpage lobbying amounted to “an informal extralegal prior restraint of speech” because Dart’s actions were threatening the site’s financial survival.

But the legal troubles didn’t end there for Backpage, which The New York Times had labeled “the leading site for trafficking of women and girls in the United States.”

Kravets does a great job of linking to the primary documents in this case and while quoting from the government’s response to the request for a stay, does not include a link for the government’s response.

For your research and reading convenience, RESPONSE IN OPPOSITION [1631269] filed by Senate Permanent Subcommittee on Investigations to motion to stay case. A total of 128 pages.

In that consolidated document, Schedule A of the subpoena runs from page 40 to page 50, although the government contends in its opposition that it tried to be more reasonable than it appears.

Even more disturbing than the Senate’s fishing expedition into the records of Backpage is the justification for disregarding the First Amendment:

The Subcommittee is investigating the serious problem of human trafficking on the Internet—much of which takes place on Backpage’s website—and has subpoenaed Mr. Ferrer for documents relating to Backpage’s screening for illegal trafficking. It is important for the Subcommittee’s investigation of Internet sex trafficking to understand what methods the leading online marketplace for sex advertisements employs to screen out illegal sex trafficking on its website. Mr. Ferrer has no First Amendment right to ignore a subpoena for documents about Backpage’s business practices related to that topic. He has refused to identify his First Amendment interests except in sweeping generalities and failed even to attempt to show that any such interests outweigh important governmental interests served by the Subcommittee’s investigation. Indeed, Mr. Ferrer cannot make any balancing argument because he refused to search for responsive documents or produce a privilege log describing them, claiming that the First Amendment gave him blanket immunity from having to carry out these basic duties of all subpoena respondents.

As serious a problem as human trafficking surely is, there are no exceptions to the First Amendment because a crime is a serious one. Just as there are no exceptions to the Fourth or Fifth Amendments because a crime is a serious one.

If you are interested in the “evidence” cited against Backpage, S. Hrg. 114–179 Human Trafficking Investigation (November 2015), runs some 260 pages, details the commission of illegal human trafficking by others, not Backpage.

Illegal sex trafficking undoubtedly occurs in the personal ads of the New York Times (NYT) but the Senate hasn’t favored the NYT with such a subpoena.

Kravets reports Backpage is due to respond to the government by 4:00 p.m. Wednesday of this week. I will post a copy of that response as soon as it is available.

by Patrick Durusau at August 23, 2016 01:49 AM

August 22, 2016

Patrick Durusau

Regexper [JavaScript Regexes – Railroad Diagrams]

Regexper

From the documentation page:

The images generated by Regexper are commonly referred to as “Railroad Diagrams”. These diagrams are a straightforward way to illustrate what can sometimes become very complicated processing in a regular expression, with nested looping and optional elements. The easiest way to read these diagrams is to start at the left and follow the lines to the right. If you encounter a branch, then there is the option of following one of multiple paths (and those paths can loop back to earlier parts of the diagram). In order for a string to successfully match the regular expression in a diagram, you must be able to fulfill each part of the diagram as you move from left to right and proceed through the entire diagram to the end.

As an example, this expression will match “Lions and tigers and bears. Oh my!” or the more grammatically correct “Lions, tigers, and bears. Oh my!” (with or without an Oxford comma). The diagram first matches the string “Lions”; you cannot proceed without that in your input. Then there is a choice between a comma or the string ” and”. No matter what choice you make, the input string must then contain ” tigers” followed by an optional comma (your path can either go through the comma or around it). Finally the string must end with ” and bears. Oh my!”.

[image: railroad diagram for the example expression]

JavaScript-style regular expression input and railroad diagram output.

Can you think of a better visualization for teaching regexes? (Or analysis when they get hairy.)

by Patrick Durusau at August 22, 2016 09:43 PM

What is a Stingray?

Pitched at an adult Sunday School level, which makes this perfect for informing the wider public about government surveillance issues.

Share this video far and wide!

For viewers who want more detail, direct them to: How IMSI Catchers Work by Jason Hernandez.

Every group has a persecution story so tie present day government surveillance to “…what if (historical) X had surveillance…” to drive your point home.

by Patrick Durusau at August 22, 2016 09:16 PM

U.K. Parliament – U.S. Congress : Legislative Process Glossaries

I encountered the glossary for legislative activity for the U.S. Congress and remembered a post where I mentioned a similar resource for the U.K.

Rather than having to dig for both of them in the future:

U.K. Parliament – Glossary

U.S. Congress – Glossary

To be truly useful, applications displaying information from either source should automatically tag these terms for quick reference by readers.
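
A minimal sketch of what that tagging might look like, using a couple of made-up glossary entries; a real application would load the full glossaries from the sites above.

    import re

    # Hypothetical entries; a real application would pull the full glossary.
    glossary = {
        "cloture": "A procedure for ending debate in the Senate.",
        "quorum": "The minimum number of members required to conduct business.",
    }

    def tag_terms(text, glossary):
        """Wrap each glossary term so a display layer can attach its definition."""
        pattern = re.compile(
            r"\b(" + "|".join(map(re.escape, glossary)) + r")\b", re.IGNORECASE)
        return pattern.sub(
            lambda m: '<abbr title="' + glossary[m.group(0).lower()] + '">'
                      + m.group(0) + "</abbr>",
            text)

    print(tag_terms("Without a quorum, a cloture vote cannot proceed.", glossary))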

Enjoy!

by Patrick Durusau at August 22, 2016 08:57 PM

Marketing Vulnerabilities (The Shadow Brokers)

Auction File: Only Worth What Someone Is Willing To Pay (August 22)

Another update on the Shadow Brokers saga and the auction that followed. For hackers who aren’t also MBAs, it offers some insight into auction markets for vulnerabilities.

From the post:

There are so many facets to the recent Shadow Brokers’ leak it can be a bit overwhelming. But the Shadow Brokers’ mess does highlight front and center the importance of the perceived value of exploits and vulnerabilities. It is impossible to ignore the value of the exploits when this whole situation is potentially about an auction of high-end vulnerabilities.

In each RBS blog update covering the leak, we have provided a quick update on the auction status, and the reality is that the auction itself isn’t going very well. The leaked data auction recently showed an increase to 1.74847373 BTC (about US$1017.47), jumping from 41 to 56 bids:

You may find all the marketing data gathered here useful, but as far as this auction goes, I suspect this captures the reality of the situation:


If this auction really contains valuable 0-day exploits, then one would expect that this would be worth bidding on for sure. But the parameters of the auction are far from standard, and may be one of the many reasons that the auction isn’t proceeding quickly. Rather than a traditional auction where a losing bid means your bid is returned and you lose no money, any bid on this data is not refunded if you do not win. It is also important to note that many believe that this really isn’t about an auction at all, rather to make a statement.

There may be valuable 0-day exploits but it isn’t possible to value them sight unseen.

Reassurances from someone who allegedly stole from the NSA don’t fill me with a sense of confidence.

If there are 0-days the NSA concealed, and the Shadow Brokers reveal them, opening up the banking industry like a gumball machine, do you know the name of the agent for service of process at the NSA?

;-)

by Patrick Durusau at August 22, 2016 08:41 PM

September 1, 2016 – Increase Tor’s Bandwidth

Reports of government surveillance and loss of privacy are so common it’s hard to sustain moral outrage over them.

Tor offers involvement to treat impotent moral outrage!

You can donate $$, bandwidth, or volunteer to help the Tor project!

Lose that moral outrage ED! Make a difference at the Tor project!

September 1, 2016 is important because of a call for a 24-hour boycott of Tor on that day.

The use of innocent Tor users as hostages speaks volumes about any boycott of Tor and its supporters.

by Patrick Durusau at August 22, 2016 05:26 PM

August 21, 2016

Patrick Durusau

EasyCrypt Reference Manual

EasyCrypt Reference Manual (PDF)

For your reading convenience, I have emended the hyperlinks in the introduction to point to online versions of the citations and not to the paper’s bibliography.

From the introduction:

EasyCrypt [BDG+14, BGHZ11] is a framework for interactively finding, constructing, and machine-checking security proofs of cryptographic constructions and protocols using the code-based sequence of games approach [BR04, BR06, Sho04]. In EasyCrypt, cryptographic games and algorithms are modeled as modules, which consist of procedures written in a simple user-extensible imperative language featuring while loops and random sampling operations. Adversaries are modeled by abstract modules—modules whose code is not known and can be quantified over. Modules may be parameterized by abstract modules.

EasyCrypt has four logics: a probabilistic, relational Hoare logic (pRHL), relating pairs of procedures; a probabilistic Hoare logic (pHL) allowing one to carry out proofs about the probability of a procedure’s execution resulting in a postcondition holding; an ordinary (possibilistic) Hoare logic (HL); and an ambient higher-order logic for proving general mathematical facts and connecting judgments in the other logics. Once lemmas are expressed, proofs are carried out using tactics, logical rules embodying general reasoning principles, and which transform the current lemma (or goal) into zero or more subgoals—sufficient conditions for the original lemma to hold. Simple ambient logic goals may be automatically proved using SMT solvers. Proofs may be structured as sequences of lemmas, and EasyCrypt’s theories may be used to group together related types, predicates, operators, modules, axioms and lemmas. Theory parameters that may be left abstract when proving its lemmas—types, operators and predicates—may be instantiated via a cloning process, allowing the development of generic proofs that can later be instantiated with concrete parameters.

Be aware the documentation carries this warning (1.6 About this Documentation):

This document is intended as a reference manual for the EasyCrypt tool, and not as a tutorial on how to build a cryptographic proof, or how to conduct interactive proofs. We provide some detailed examples in Chapter 7, but they may still seem obscure even with a good understanding of cryptographic theory. We recommend experimenting.

My first time seeing documentation advising “experimenting” to understand it. ;-)

You?

Before you jump to Chapter 7, be aware that Chapter 4 Structuring Specifications and Proofs, Chapter 5 EasyCrypt Library, Chapter 6 Advanced Features and Usage, and Chapter 7 Examples have yet to be written.

You have time to work through the first three chapters and to experiment with EasyCrypt before being called upon to evaluate Chapter 7.

Enjoy!

by Patrick Durusau at August 21, 2016 09:56 PM

The Ethics of Data Analytics

The Ethics of Data Analytics by Kaiser Fung.

Twenty-one slides on ethics by Kaiser Fung, author of: Junk Charts (data visualization blog), and Big Data, Plainly Spoken (comments on media use of statistics).

Fung challenges you to reach your own ethical decisions and acknowledges there are a number of guides to such decision making.

Unfortunately, Fung does not include professional responsibility requirements, such as the now out-dated Canon 7 of the ABA Model Code Of Professional Responsibility:

A Lawyer Should Represent a Client Zealously Within the Bounds of the Law

That canon has a much storied history, which is capably summarized in Whatever Happened To ‘Zealous Advocacy’? by Paul C. Sanders.

In what became known as Queen Caroline’s Case, the House of Lords sought to dissolve the marriage of King George IV to Queen Caroline on the grounds of her adultery, effectively removing her as queen of England.

Queen Caroline was represented by Lord Brougham, who had evidence of a secret prior marriage by King George IV to a Catholic, Mrs Fitzherbert, which was illegal.

Brougham’s speech is worth your reading in full but the portion most often cited for zealous defense reads as follows:


I once before took leave to remind your lordships — which was unnecessary, but there are many whom it may be needful to remind — that an advocate, by the sacred duty of his connection with his client, knows, in the discharge of that office, but one person in the world, that client and none other. To save that client by all expedient means — to protect that client at all hazards and costs to all others, and among others to himself — is the highest and most unquestioned of his duties; and he must not regard the alarm, the suffering, the torment, the destruction, which he may bring upon any other; nay, separating even the duties of a patriot from those of an advocate, he must go on reckless of the consequences, if his fate it should unhappily be, to involve his country in confusion for his client.

The name Mrs. Fitzherbert never passes Lord Brougham’s lips, but the House of Lords has been warned that may not remain the case should it choose to proceed. The House of Lords did grant the divorce but didn’t enforce it. A saving fact, one supposes. Queen Caroline died less than a month after the coronation of George IV.

For data analysis, cybersecurity, or any of the other topics I touch on in this blog, I take the last line of Lord Brougham’s speech:

To save that client by all expedient means — to protect that client at all hazards and costs to all others, and among others to himself — is the highest and most unquestioned of his duties; and he must not regard the alarm, the suffering, the torment, the destruction, which he may bring upon any other; nay, separating even the duties of a patriot from those of an advocate, he must go on reckless of the consequences, if his fate it should unhappily be, to involve his country in confusion for his client.

as the height of professionalism.

Post-engagement of course.

If ethics are your concern, have that discussion with your prospective client before you are hired.

Otherwise, clients have goals and the task of a professional is how to achieve them. Nothing more.

by Patrick Durusau at August 21, 2016 09:00 PM

US Army committed $6.5 trillion in accounting fraud in one year (w/correction)

US Army committed $6.5 trillion in accounting fraud in one year by Cory Doctorow.

From the post:

In June, the Defense Department’s Inspector General released a report on the US Army’s accounting, revealing that the Army had invented $6.5 trillion in “improper adjustments” ($2.8T in one quarter!) to make its books appear balanced though it could not account for where the funds had gone.

If you are interested in transparent and trackable information systems, that’s a headline that captures your attention!

Except that when you run it back to the original story, U.S. Army fudged its accounts by trillions of dollars, auditor finds by Scot J. Paltrow, which reads in part:

The United States Army’s finances are so jumbled it had to make trillions of dollars of improper accounting adjustments to create an illusion that its books are balanced.

The Defense Department’s Inspector General, in a June report, said the Army made $2.8 trillion in wrongful adjustments to accounting entries in one quarter alone in 2015, and $6.5 trillion for the year. Yet the Army lacked receipts and invoices to support those numbers or simply made them up.

You won’t find a reference to the “June report,” as cited by Paltrow. No link, no title, no nothing.

In fact, there is no such June report.

If you look carefully enough at the Inspector General site for the DoD you will find:

07-26-2016
Financial Management
Army General Fund Adjustments Not Adequately Documented or Supported (Project No. D2015-D000FL-0243.000)
DODIG-2016-113

The webpage for that July report reads in part:

Finding

The Office of the Assistant Secretary of the Army (Financial Management & Comptroller) (OASA[FM&C]) and the Defense Finance and Accounting Service Indianapolis (DFAS Indianapolis) did not adequately support $2.8 trillion in third quarter journal voucher (JV) adjustments and $6.5 trillion in yearend JV adjustments made to AGF data during FY 2015 financial statement compilation. The unsupported JV adjustments occurred because OASA(FM&C) and DFAS Indianapolis did not prioritize correcting the system deficiencies that caused errors resulting in JV adjustments, and did not provide sufficient guidance for supporting system‑generated adjustments.

In addition, DFAS Indianapolis did not document or support why the Defense Departmental Reporting System‑Budgetary (DDRS-B), a budgetary reporting system, removed at least 16,513 of 1.3 million records during third quarter FY 2015. This occurred because DFAS Indianapolis did not have detailed documentation describing the DDRS-B import process or have accurate or complete system reports.

As a result, the data used to prepare the FY 2015 AGF third quarter and yearend financial statements were unreliable and lacked an adequate audit trail. Furthermore, DoD and Army managers could not rely on the data in their accounting systems when making management and resource decisions. Until the Army and DFAS Indianapolis correct these control deficiencies, there is considerable risk that AGF financial statements will be materially misstated and the Army will not achieve audit readiness by the congressionally mandated deadline of September 30, 2017.

Everybody makes mistakes. I’m sure I make several every day without even trying.

However, if you link to original sources, readers stand some chance of discovering and correcting those errors.

If you cite a resource, link to the resource.

PS: Before you use the word “fraud” with regard to military accounting systems, realize financial accounting is not a primary or even secondary concern of a military force. There are possible solutions to military accounting issues but congressional tantrums, a/k/a mandates, aren’t among them.

by Patrick Durusau at August 21, 2016 01:55 AM

August 20, 2016

Patrick Durusau

NASA just made all its research available online for free (Really?)

NASA just made all its research available online for free by Tim Walker.

Caution: The green colored links in the original post are pop-up ads and not links to content.

From the post:

Care to learn more about 400-foot tsunamis on Mars? Now you can, after Nasa announced it is making all its publicly funded research available online for free. The space agency has set up a new public web portal called Pubspace, where the public can find Nasa-funded research articles on everything from the chances of life on one of Saturn’s moons to the effects of space station living on the hair follicles of astronauts.

In 2013, the White House Office of Science and Technology Policy directed Nasa and other agencies to increase access to their research, which in the past was often available (if it was available online at all) only via a paywall. Now, it is Nasa policy that any research articles funded by the agency have to be posted on Pubspace within a year of publication.

There are some exceptions, such as research that relates to national security. Nonetheless, there are currently a little over 850 articles available on the website with many more to come.

NASA was created in 1958, yet all of its research “available online for free” amounts to approximately 850 documents?

Even starting in 2013, 850 documents seems a bit light.

Truth of the matter is that NASA has created yet another information silo of NASA data.

Here are just a few of the other NASA silos that come to mind right off hand:

Johnson Space Center Document Index System

NASA Aeronautics and Space Database

NASA Documents Online

NASA GALAXIE

NASA Technical Report Server

I don’t know if any of those include data repositories from NASA missions or not. Plus any other information silos NASA has constructed over the years.

I applaud NASA making sponsored research public, but building yet another silo to do so seems wrong-headed.

Conversion and replacement of any of these silos is obviously out of the question.

Undertaking to map all of them together, for some undefined ROI, seems equally unlikely.

Suggestions on how to approach such a large, extant silo problem?

by Patrick Durusau at August 20, 2016 08:47 PM

@rstudio Easter egg: Alt-Shift-K (shows all keyboard shortcuts)

Carl Schmertmann asks:

[image: the tweet in question]

Forty-two people have retweeted Carl’s tweet without answering Carl’s question.

If you have an answer, please reply to Carl. Otherwise, remember:

Alt-Shift-K

shows all keyboard shortcuts in RStudio.

Enjoy!

by Patrick Durusau at August 20, 2016 08:19 PM

Everybody Discusses The Weather In R (+ Trigger Warning)

Well, maybe not everybody but if you are interested in weather statistics, there’s a trio of posts at R-Bloggers made for you.

Trigger Warning: If you are a climate change denier, you won’t like the results presented by the posts cited below. Facts dead ahead.

Tracking Precipitation by Day-of-Year

From the post:

Plotting cumulative day-of-year precipitation can helpful in assessing how the current year’s rainfall compares with long term averages. This plot shows the cumulative rainfall by day-of-year for Philadelphia International Airports rain gauge.

Checking Historical Precipitation Data Quality

From the post:

I am interested in evaluating potential changes in precipitation patterns caused by climate change. I have been working with daily precipitation data for the Philadelphia International Airport, site id KPHL, for the period 1950 to present time using R.

I originally used the Pennsylvania State Climatologist web site to download a CSV file of daily precipitation data from 1950 to the present. After some fits and starts analyzing this data set, I discovered that data for January was missing for the period 1950 – 1969. This data gap seriously limited the usable time record.

John Yagecic, (Adventures In Data) told me about the weatherData package which provides easy to use functions to retrieve Weather Underground data. I have found several precipitation data quality issues that may be of interest to other investigators.

Access and Analyze 170 Monthly Climate Time Series Using Simple R Scripts

From the post:

Open Mind, a climate trend data analysis blog, has a great Climate Data Service that provides updated consolidated csv file with 170 monthly climate time series. This is a great resource for those interested in studying climate change. Quick, reliable access to 170 up-to-date climate time series will save interested analysts hundreds – thousands of data wrangling hours of work.

This post presents a simple R script to show how a user can select one of the 170 data series and generate a time series plot like this:

All of these posts originated at RClimate, a new blog that focuses on R and climate data.

Drop by to say hello to D Kelly O’Day, PE (professional engineer) Retired.

Relevant searches at R-Bloggers (as of today):

Climate – 218 results

Flood – 61 results

Rainfall – 55 results

Weather – 291 results

Caution: These results contain duplicates.

Enjoy!

by Patrick Durusau at August 20, 2016 08:01 PM

Frinkiac (Simpsons)

Frinkiac

From the webpage:

Frinkiac has nearly 3 million Simpsons screencaps so get to searching for crying out glayvin!

With a link to Morbotron as well.

Once you recover, consider reading: Introducing Frinkiac, The Simpsons Search Engine Built by Rackers by Abe Selig.

Where you aren’t trying to boil the ocean with search, the results can be pretty damned amazing.

by Patrick Durusau at August 20, 2016 07:36 PM

235,000 Voices Cried Out And Were Suddenly Silenced

Yahoo! News carried this report of censorship: Twitter axes 235,000 more accounts in terror crackdown.

From the post:

Twitter on Thursday announced that it has cut off 235,000 more accounts for violating its policies regarding promotion of terrorism at the global one-to-many messaging service.

The latest account suspensions raised to 360,000 the total number of accounts sidelined since the middle of 2015 and was helping “drive meaningful results” in curbing the activity, according to the San Francisco-based company.

Twitter has been under pressure to balance protecting free speech at the service with not providing a stage for terrorist groups to spread violent messages and enlist people to their causes.

The latest account suspensions came since February, when Twitter announced that it had neutralized 125,000 accounts for violating rules against violent threats and promotion of terrorism.

“Since that announcement, the world has witnessed a further wave of deadly, abhorrent terror attacks across the globe,” Twitter said in a blog post.

When you read Twitter’s blog post, An update on our efforts to combat violent extremism, out of 235,000 accounts, how many are directly tied to a terrorist attack?

Would you guess:


235,000?

150,000?

100,000?

50,000?

25,000?

10,000?

5,000?

Twitter reports 0 accounts as being tied to terrorist attacks.

Odd considering that Twitter says:

Since that announcement, the world has witnessed a further wave of deadly, abhorrent terror attacks across the globe

“…wave of deadly, abhorrent terror attacks…” What wave?

From March of 2016 until July 31, the List of terrorist incidents, 2016 lists some 864 attacks.

A far cry from the almost 1/4 million silenced accounts.

Of course, “terrorism” depends on your definition, the Global Terrorism Database lists over 6,000 terrorist attacks for the time period March 2015 until July 31, 2015.

Even using 2015’s 6,000 attack figure, that’s a long way from 235,000 Twitter accounts.

If you think “…wave of deadly, abhorrent terror attacks…” is just marketing talk on the part of Twitter, the evidence is on your side.

Anyone who thinks they may be in danger of being silenced by Twitter should obtain regular archives of their tweets. Don’t allow your history to be stolen by Twitter.

I do have a question for anyone working on this issue:

Are there efforts to create non-Twitter servers that make fair use of the Twitter API, so that archives of tweets, and even new accounts, could continue silenced accounts? Say a Dark Web not-Twitter server?

I ask because Twitter continues to demonstrate that “free speech” is subject to its whim and caprice.

A robust and compatible alternative to Twitter, especially if archives can be loaded, would enable free speech for many diverse groups.

by Patrick Durusau at August 20, 2016 06:19 PM

Top Ten #ddj:… [18 August 2016]

Top Ten #ddj: The Week’s Most Popular Data Journalism Links

A weekly feature of the Global Investigative Journalism Network and particularly good this week:

  • Animated Data Visualisation: Trends in Household Debts Reveals a Constant Increase in Student Loans
  • Discover Why and How The New York Times is Changing the Way They Present Interactive Content
  • Analysis of Trump’s Tweets: Trump Writes Angrier Tweets on Android While His Staff Tweets More Positively On His Behalf Using an iPhone
  • Analysis of Trump’s Tweets: Sharp Decline in Trump’s Own Tweets from 77 to 24 percent Suggests Tighter Campaign Control
  • Open Data-Driven Articles Using Olympic Data: Edit the Source Code and Create Your Own Visualisations
  • Interactive Map of Recreational Areas in Ravensburg, Germany
  • Onodo: Network Visualisation and Analysis Tool for Non-Tech Users
  • Opinion: Not Every Venn Diagram Has Something Worth Reporting
  • Data on Teenage Pregnancies and HIV rates in Kenya
  • Mapbox: How to Customise and Embed Maps on Websites

See the original post for links and very annoying “share” options. (Annoying to me, others may find them indispensable.)

Mark your calendars to check for new top ten lists and/or follow @gijn.

by Patrick Durusau at August 20, 2016 01:34 AM

Typography for User Interfaces

Typography for User Interfaces by Viljami Salminen.

From the post:

Back in 2004, when I had just started my career, sIFR was the hottest thing out there. It was developed by Shaun Inman and it embedded custom fonts in a small Flash movie, which could be utilized with a little bit of JavaScript and CSS. At the time, it was basically the only way to use custom fonts in browsers like Firefox or Safari. The fact that this technique relied on Flash soon made it obsolete, with the release of the iPhone (without flash) in 2007.

Our interfaces are written, text being the interface, and typography being our main discipline.

In 2008, browsers started eventually supporting the new CSS3 @font-face rule. It had already been a part of the CSS spec in 1998, but later got pulled out of it. I remember the excitement when I managed to convince one of our clients to utilize the new @font-face and rely on progressive enhancement to deliver an enhanced experience for browsers which already supported this feature.

Since my early days in the industry, I’ve grown to love type and all the little nuances that go into setting it. In this article, I want to share some of the fundamentals that I’ve learned, and hopefully help you get better at setting type for user interfaces.

A nice stroll through the history of typography for user interfaces.

With ten (10) tips on choosing a typeface for a UI.

Enjoy and produce better UIs!

by Patrick Durusau at August 20, 2016 12:45 AM

August 19, 2016

Patrick Durusau

29 common beginner Python errors on one page [Something Similar For XQuery?]

29 common beginner Python errors on one page

From the webpage:

A few times a year, I have the job of teaching a bunch of people who have never written code before how to program from scratch. The nature of programming being what it is, the same error crop up every time in a very predictable pattern. I usually encourage my students to go through a step-by-step troubleshooting process when trying to fix misbehaving code, in which we go through these common errors one by one and see if they could be causing the problem. Today, I decided to finally write this troubleshooting process down and turn it into a flowchart in non-threatening colours.

Behold, the “my code isn’t working” step-by-step troubleshooting guide! Follow the arrows to find the likely cause of your problem – if the first thing you reach doesn’t work, then back up and try again.

Click the image for full-size, and click here for a printable PDF. Colour scheme from Luna Rosa.

Useful for Python beginners and should be inspirational for other languages.
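
To give a flavor of what the flowchart is catching, here are two of the classics; these are my own illustrations, not taken from the chart itself.

    # 1. NameError from a typo: the variable was defined under a different spelling.
    message = "hello"
    # print(mesage)        # NameError: name 'mesage' is not defined

    # 2. '=' is assignment, '==' is comparison; mixing them up in a condition is a SyntaxError.
    count = 0
    if count == 0:         # a beginner often writes 'if count = 0:'
        print("empty")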

Thoughts on something similar for XQuery Errors? Suggestions for collecting the “most common” XQuery errors?

by Patrick Durusau at August 19, 2016 08:55 PM

What’s the Difference Between Artificial Intelligence, Machine Learning, and Deep Learning?

What’s the Difference Between Artificial Intelligence, Machine Learning, and Deep Learning? by Michael Copeland.

From the post:

Artificial intelligence is the future. Artificial intelligence is science fiction. Artificial intelligence is already part of our everyday lives. All those statements are true, it just depends on what flavor of AI you are referring to.

For example, when Google DeepMind’s AlphaGo program defeated South Korean Master Lee Se-dol in the board game Go earlier this year, the terms AI, machine learning, and deep learning were used in the media to describe how DeepMind won. And all three are part of the reason why AlphaGo trounced Lee Se-Dol. But they are not the same things.

The easiest way to think of their relationship is to visualize them as concentric circles with AI — the idea that came first — the largest, then machine learning — which blossomed later, and finally deep learning — which is driving today’s AI explosion — fitting inside both.

If you are confused by the mix of artificial intelligence, machine learning, and deep learning floating around, Copeland will set you straight.

It’s a fun read and one you can recommend to non-technical friends.

by Patrick Durusau at August 19, 2016 08:37 PM

Report of the Bulk Powers Review

Report of the Bulk Powers Review (PDF) by David Anderson Q.C. Independent Reviewer of Terrorism Legislation. (Web version)

From its webpage:

This report includes the findings of the independent review of the operational case for bulk powers, which will inform scrutiny of the Investigatory Powers Bill.

If you find yourself dissatisfied with the sound-bite and excerpt commentaries on this report, you may find the two hundred and three (203) page full version more to your liking. At least in terms of completeness.

I have glanced at the conclusions but will refrain from commenting until reading the report in full. It is possible that Anderson will persuade me to change my initial impressions, although I concede that is highly unlikely.

by Patrick Durusau at August 19, 2016 07:55 PM

TLDR pages [Explanation and Example Practice]

TLDR pages

From the webpage:

The TLDR pages are a community effort to simplify the beloved man pages with practical examples.

Try the live demo below, have a look at the pdf version, or follow the installing instructions.

Be sure to read the Contributing guidelines.

I checked and ngrep isn’t there. :-(

Well, ngrep only has thirty (30) options and switches before you reach <match expression> and <bpf filter>, so how much demand could there be for examples?

;-)

Great opportunity to practice your skills at explanation and creating examples.

by Patrick Durusau at August 19, 2016 07:16 PM

Contributing to StackOverflow: How Not to be Intimidated

Contributing to StackOverflow: How Not to be Intimidated by Ksenia Coulter.

From the post:

StackOverflow is an essential resource for programmers. Whether you run into a bizarre and scary error message or you’re blanking on something you should know, StackOverflow comes to the rescue. Its popularity with coders spurred many jokes and memes. (Programming to be Officially Renamed “Googling Stackoverflow,” a satirical headline reads).

(image omitted)

While all of us are users of StackOverflow, contributing to this knowledge base can be very intimidating, especially to beginners or to non-traditional coders who may already feel like they don’t belong. The fact that an invisible barrier exists is a bummer because being an active contributor not only can help with your job search and raise your profile, but also make you a better programmer. Explaining technical concepts in an accessible way is difficult. It is also well-established that teaching something solidifies your knowledge of the subject. Answering StackOverflow questions is great practice.

All of the benefits of being an active member of StackOverflow were apparent to me for a while, but I registered an account only this week. Let me walk you t[h]rough thoughts that hindered me. (Chances are, you’ve had them too!)

I plead guilty to using StackOverflow but not contributing back to it.

Another “intimidation” to avoid is thinking you must have the complete and killer answer to any question.

That can and does happen, but don’t wait for a question where you can supply such an answer.

Jump in! (Advice to myself as well as any readers.)

by Patrick Durusau at August 19, 2016 05:43 PM

Re-Use, Re-Use! Using Weka within Lisp

Suggesting code re-use, as described by Paul Homer in The Myth of Code Reuse, provokes this reaction from most programmers (substitute re-use for refund):

;-)

Atabey Kaygun demonstrates he isn’t one of those programmers in Using Weka within Lisp:

From the post:

As much as I like implementing machine learning algorithms from scratch within various languages I like using, in doing serious research one should not take the risk of writing error-prone code. Most likely somebody already spent many thousand hours writing, debugging and optimizing code you can use with some effort. Re-use people, re-use!

In any case, today I am going to describe how one can use weka libraries within ABCL implementation of common lisp. Specifically, I am going to use the k-means implementation of weka.

As usual, a well-written and useful guide to using Weka and Lisp.
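
For readers who would rather stay in Python, the same re-use principle looks like this with scikit-learn’s k-means. This is a parallel sketch of my own, not Kaygun’s Lisp/Weka code.

    import numpy as np
    from sklearn.cluster import KMeans

    # Toy data: three well-separated blobs.
    X = np.vstack([
        np.random.randn(50, 2) + [0, 0],
        np.random.randn(50, 2) + [5, 5],
        np.random.randn(50, 2) + [0, 5],
    ])

    km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
    print(km.cluster_centers_)   # learned centroids
    print(km.labels_[:10])       # cluster assignments for the first ten points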

The issues of code re-use aren’t confined to programmers.

Any stats you can suggest on re-use of database or XML schemas?

by Patrick Durusau at August 19, 2016 05:24 PM

Readable Regexes In Python?

Doug Mahugh retweeted Raymond Hettinger tweeting:

#python tip: Complicated regexes can be organized into readable, commented chunks.
https://docs.python.org/3/library/re.html#re.X

Twitter hasn’t gotten around to censoring Python related tweets for accuracy so I did check the reference:

re.X
re.VERBOSE

This flag allows you to write regular expressions that look nicer and are more readable by allowing you to visually separate logical sections of the pattern and add comments. Whitespace within the pattern is ignored, except when in a character class or when preceded by an unescaped backslash. When a line contains a # that is not in a character class and is not preceded by an unescaped backslash, all characters from the leftmost such # through the end of the line are ignored.

This means that the two following regular expression objects that match a decimal number are functionally equal:
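
Here is what that comparison looks like in practice, a minimal sketch along the lines of the documentation’s own example:

    import re

    # Verbose form: whitespace and comments are ignored under re.X / re.VERBOSE.
    a = re.compile(r"""\d +  # the integral part
                       \.    # the decimal point
                       \d *  # some fractional digits""", re.X)

    # Compact, functionally identical form.
    b = re.compile(r"\d+\.\d*")

    print(a.match("3.14") is not None, b.match("3.14") is not None)  # True True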

Which is the better question?

Why would anyone want to produce a readable regex in Python?

or,

Why would anyone NOT produce a readable regex given the opportunity?

Enjoy!

PS: It occurs to me that with a search expression you could address such strings as subjects in a topic map. A more robust form of documentation than # syntax.

by Patrick Durusau at August 19, 2016 03:45 PM

Is UC-San Diego Running A Military Commission?

Will UC-San Diego keep hiding witnesses that could prove accused students innocent? by Greg Piper.

From the post:

The University of California-San Diego routinely hides the identity of witnesses that could help students accused of wrongdoing exonerate themselves, departing from its own rules on who is “relevant” to an investigation.

This policy, which has been applied against accused students for at least the past five years, was not publicly known until 11 months ago. A state appeals court fleshed out its existence in a due-process lawsuit against the school by a student who was found responsible for cheating and expelled.

That court struck down UCSD’s ruling against Jonathan Dorfman, saying it had no legal reason to withhold the identity of “Student X” – whose test answers Dorfman allegedly copied – from him.

Arguing before the court, the UC System’s own lawyer admitted that the school had never bothered to ask Student X where he was sitting in class that day in 2011 – potentially preempting its case against Dorfman.

UC-San Diego has copied the government’s use of “secret” evidence in U.S. military commissions.

Here UC-San Diego decided who or what was “relevant” to its inquiry, saying:

When a female judge suggests that UCSD decided “this was enough and we’re not going to give the information to the defense to try to poke holes in it,” Goldstein responds with apparent earnestness: “That is the procedure here.”

If U.S. prosecutors were so honest, they would echo:

we’re not going to give the information to the defense to try to poke holes in it,

That works only if you have a presumption of guilt. So far as I know, lip service is still paid to the presumption of innocence.

If prosecutors want a presumption of guilt, they should argue for it openly, and not conceal that as well.

by Patrick Durusau at August 19, 2016 03:22 PM

Using Search Terms and Facets on Congress.gov (Video) (Evaluation Help?)

Using Search Terms and Facets on Congress.gov (Video)

I would love to tell you about the contents of this video!

However, not having Flash is the only effective way to defeat Flash vulnerabilities.

Adobe advises 1.3 billion people are vulnerable to Flash security issues but I am not one of them.

If you care to review this resource and submit comments, I would appreciate it.

by Patrick Durusau at August 19, 2016 12:39 AM

August 18, 2016

Patrick Durusau

National Food Days

All the National Food Days by Nathan Yau.

Nathan has created an interactive calendar of all the U.S. national food days.

Here is a non-working replica to entice you to see his interactive version:

[image: static replica of the national food days calendar]

What’s with July having a national food day every day?

Lobby for your favorite food and month!

by Patrick Durusau at August 18, 2016 07:13 PM

Rich Hickey and Brian Beckman – Inside Clojure (video)

From the description:

Clojure is a dynamic programming language created by Rich Hickey that targets both the Java Virtual Machine and the CLR. It is designed to be a general-purpose language, combining the approachability and interactive development of a scripting language with an efficient and robust infrastructure for multithreaded programming. Clojure is a compiled language – it compiles directly to JVM bytecode, yet remains completely dynamic. Every feature supported by Clojure is supported at runtime. Clojure provides easy access to the Java frameworks, with optional type hints and type inference, to ensure that calls to Java can avoid reflection.

Clojure is a dialect of Lisp, and shares with Lisp the code-as-data philosophy and a powerful macro system. Clojure is predominantly a functional programming language, and features a rich set of immutable, persistent data structures. When mutable state is needed, Clojure offers a software transactional memory system and reactive Agent system that ensure clean, correct, multithreaded designs.

Astrophysicist and Software Architect Brian Beckman interviews Rich Hickey to dig into the details of this very interesting language. If you don’t know much about Clojure and the general problems it aims to solve, well, watch and listen carefully to this great conversation with plenty of whiteboarding and outstanding questions. Expert to Expert simply rocks! Thank you for spending time with us, Rich! Clojure is great!

From 2013 but what a nice find for a Thursday afternoon!

Do you know the origin of “conj” in Clojure? ;-)

Enjoy!

by Patrick Durusau at August 18, 2016 06:43 PM

Why “We” Get Hacked

Whether these are “authentic” tweets or not, I cannot say. However, I thought the rather pinched definition of “we” needed to be pointed out.

[image: the tweets in question]

Say rather:

#NSA left catastrophic flaws in all networks for 3+ years to aid offense, rather than fixing them

If any of us are insecure, then all of us are insecure.

When it comes to cybersecurity, check your nationalism at the door, or we will all be insecure.

by Patrick Durusau at August 18, 2016 03:52 PM

Grokking Deep Learning

Grokking Deep Learning by Andrew W. Trask.

From the description:

Artificial Intelligence is the most exciting technology of the century, and Deep Learning is, quite literally, the “brain” behind the world’s smartest Artificial Intelligence systems out there. Loosely based on neuron behavior inside of human brains, these systems are rapidly catching up with the intelligence of their human creators, defeating the world champion Go player, achieving superhuman performance on video games, driving cars, translating languages, and sometimes even helping law enforcement fight crime. Deep Learning is a revolution that is changing every industry across the globe.

Grokking Deep Learning is the perfect place to begin your deep learning journey. Rather than just learn the “black box” API of some library or framework, you will actually understand how to build these algorithms completely from scratch. You will understand how Deep Learning is able to learn at levels greater than humans. You will be able to understand the “brain” behind state-of-the-art Artificial Intelligence. Furthermore, unlike other courses that assume advanced knowledge of Calculus and leverage complex mathematical notation, if you’re a Python hacker who passed high-school algebra, you’re ready to go. And at the end, you’ll even build an A.I. that will learn to defeat you in a classic Atari game.

In the Manning Early Access Program (MEAP) with three (3) chapters presently available.
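
To give you a taste of the “from scratch, high-school algebra” level Trask promises, here is a minimal single-neuron sketch of my own in Python (an illustration, not an excerpt from the book):

    # A single "neuron" learning one weight by gradient descent -- a toy
    # illustration of the "from scratch" style, not an excerpt from the book.
    input_value = 2.0
    target = 0.8
    weight = 0.5            # initial guess
    learning_rate = 0.1

    for step in range(20):
        prediction = input_value * weight                     # forward pass
        error = (prediction - target) ** 2                    # squared error
        gradient = 2 * (prediction - target) * input_value    # d(error)/d(weight)
        weight -= learning_rate * gradient                    # nudge downhill
        print(f"step {step:2d}  prediction {prediction:.4f}  error {error:.6f}")

A dozen lines, no calculus notation, which is roughly the pitch of the book.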

A much more plausible undertaking than DARPA’s quest for “Explainable AI” or “XAI.” (DARPA WANTS ARTIFICIAL INTELLIGENCE TO EXPLAIN ITSELF) DARPA reasons that:


Potential applications for defense are endless—autonomous aerial and undersea war-fighting or surveillance, among others—but humans won’t make full use of AI until they trust it won’t fail, according to the Defense Advanced Research Projects Agency. A new DARPA effort aims to nurture communication between machines and humans by investing in AI that can explain itself as it works.

If non-failure is the criterion for trust, U.S. troops should refuse to leave their barracks in view of the repeated failures of military strategy since the end of WWII.

DARPA should choose a less stringent criterion for trusting an AI. However, failing less often than the Joint Chiefs of Staff may be too low a bar to set.

by Patrick Durusau at August 18, 2016 01:58 AM

R Markdown

R Markdown

From the webpage:

R Markdown provides an authoring framework for data science. You can use a single R Markdown file to both

  • save and execute code
  • generate high quality reports that can be shared with an audience

R Markdown documents are fully reproducible and support dozens of static and dynamic output formats. This 1-minute video provides a quick tour of what’s possible with R Markdown:

I started to omit this posting, reasoning that with LaTeX and XML, what other languages for composing documents are really necessary?

;-)

I don’t suppose it will hurt to have a third language option for your authoring needs.

Enjoy!

by Patrick Durusau at August 18, 2016 01:41 AM

Text [R, Scraping, Text]

Text by Amelia McNamara.

Covers “scraping, text, and timelines.”

Using R, it focuses on scraping and works through some of “…Scott, Karthik, and Garrett’s useR tutorial.”

In case you don’t know the useR tutorial:

Also known as (AKA) Extracting data from the web APIs and beyond:

No matter what your domain of interest or expertise, the internet is a treasure trove of useful data that comes in many shapes, forms, and sizes, from beautifully documented fast APIs to data that need to be scraped from deep inside of 1990s html pages. In this 3 hour tutorial you will learn how to programmatically read in various types of web data from experts in the field (Founders of the rOpenSci project and the training lead of RStudio). By the end of the tutorial you will have a basic idea of how to wrap an R package around a standard API, extract common non-standard data formats, and scrape data into tidy data frames from web pages.

Covers other resources and materials.
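
The tutorial itself is in R. For a rough feel of the same scrape-into-a-tidy-table workflow in Python (the URL and the page’s table structure below are placeholders, not part of the tutorial):

    # Rough Python analogue of the scrape-to-tidy-data workflow the tutorial
    # teaches in R. The URL and the page's table structure are placeholders.
    import requests
    import pandas as pd
    from bs4 import BeautifulSoup

    url = "https://example.com/some-table-page"    # hypothetical page
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")

    rows = []
    for tr in soup.select("table tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        if cells:
            rows.append(cells)

    # First row as the header, the rest as data -- a tidy data frame.
    df = pd.DataFrame(rows[1:], columns=rows[0])
    print(df.head())

From there it is ordinary data frame work, whichever language you prefer.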

Enjoy!

by Patrick Durusau at August 18, 2016 01:31 AM

Pandas

Pandas by Reuven M. Lerner.

From the post:

Serious practitioners of data science use the full scientific method, starting with a question and a hypothesis, followed by an exploration of the data to determine whether the hypothesis holds up. But in many cases, such as when you aren’t quite sure what your data contains, it helps to perform some exploratory data analysis—just looking around, trying to see if you can find something.

And, that’s what I’m going to cover here, using tools provided by the amazing Python ecosystem for data science, sometimes known as the SciPy stack. It’s hard to overstate the number of people I’ve met in the past year or two who are learning Python specifically for data science needs. Back when I was analyzing data for my PhD dissertation, just two years ago, I was told that Python wasn’t yet mature enough to do the sorts of things I needed, and that I should use the R language instead. I do have to wonder whether the tables have turned by now; the number of contributors and contributions to the SciPy stack is phenomenal, making it a more compelling platform for data analysis.

In my article “Analyzing Data“, I described how to filter through logfiles, turning them into CSV files containing the information that was of interest. Here, I explain how to import that data into Pandas, which provides an additional layer of flexibility and will let you explore the data in all sorts of ways—including graphically. Although I won’t necessarily reach any amazing conclusions, you’ll at least see how you can import data into Pandas, slice and dice it in various ways, and then produce some basic plots.

Of course, scientific articles are written as though questions drop out of the sky and data is interrogated for the answer.

Aside from being rhetoric to badger others with, does anyone really think that is how science operates in fact?

Whether you have delusions about how science works in fact or not, you will find that Pandas will assist you in exploring data.
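
For a concrete taste of that exploration, here is a minimal sketch in the spirit of Lerner’s article (the CSV name and column names are invented for illustration, not taken from his logfiles):

    # Minimal exploratory sketch in the spirit of Lerner's article. The CSV
    # name and the column names are invented for illustration.
    import pandas as pd

    df = pd.read_csv("access_log.csv", parse_dates=["timestamp"])

    print(df.shape)                       # how much data is there?
    print(df.dtypes)                      # what types did Pandas infer?
    print(df["status"].value_counts())    # slice: distribution of response codes

    # Dice: requests per day, then a quick plot.
    per_day = df.set_index("timestamp").resample("D").size()
    per_day.plot(title="Requests per day")

Nothing amazing, but shape, dtypes and value_counts tell you quickly whether the data is worth a deeper look.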

by Patrick Durusau at August 18, 2016 01:19 AM

August 17, 2016

Patrick Durusau

Nomination For #1 Impediment To IT Reform

I saw this on Twitter and nominate it as the #1 impediment to IT reform. In government or private industry (in case you think there is a difference).

mistakes-460

Your nominations?

by Patrick Durusau at August 17, 2016 09:39 PM

Double Standards At NPR

NPR Host Demands That Assange Do Something Its Own Reporters Are Told Never to Do by Naomi LaChance.

From the post:

In a ten-minute interview aired Wednesday morning, NPR’s David Greene asked Wikileaks founder Julian Assange five times to reveal the sources of the leaked information he has published on the internet.

A major tenet of American journalism is that reporters protect their sources. Wikileaks is certainly not a traditional news organization, but Greene’s persistent attempts to get Assange to violate confidentiality was alarming, especially considering that there has been no challenge to the authenticity of the material in question.

NPR (National Public Radio) shows its true colors, not as a free and independent press but as a lackey of the Democratic Party in this interview with Assange.

David Greene (Morning Edition) was fixated on repeating the unconfirmed reports that the Russians (which Russians, no one ever says) were behind the leak of DNC emails.

You can read the transcript of Assange/Greene interview for yourself.

Greene never asks one substantive question about the 20,000 emails. Not one. The first leak of its kind and all Greene does is whine about rumors of Russian involvement.

Well, that’s not entirely fair, Greene does have this exchange with Assange:


GREENE: Well, let me – apart from the different investigations, could you see people in the U.S. government thinking that you might be a threat to national security?

ASSANGE: Well, I mean, there’s great people in the U.S. government – many of them are our sources – and there’s terrible people in the U.S. government. Unfortunately, the U.S. government is a – you know, a reflection, to some degree, of the rest of society. So it’s filled with its share of paranoid and sociopathic power climbers…

GREENE: But is it paranoid to look at these uncensored documents?

ASSANGE: …People who make errors of judgment, etc.

GREENE: Is it paranoid to look at these uncensored documents, these emails, that are released by you? And if they believe that that could change a U.S. presidential election, could be a threat to national security, why isn’t it logical…

ASSANGE: I just – I mean…

GREENE: …For them to see you as a possible threat?

Hmmm, telling the truth about DNC emails can be a threat to national security?

What a bizarre concept in a democracy! Disclosure of evidence of manipulation of the democratic process is a “…threat to national security?”

NPR can and should do better than David Greene shilling for the Democratic Party.

by Patrick Durusau at August 17, 2016 09:00 PM

The Shadow Brokers: Lifting the Shadows of the NSA’s Equation Group?

The Shadow Brokers: Lifting the Shadows of the NSA’s Equation Group?.

A detailed summary of what is or isn’t known about The Shadow Brokers and the alleged hack of the Equation Group (NSA owned and operated).

The story is being updated at this location so check back for breaking details.

Enjoy!

by Patrick Durusau at August 17, 2016 03:52 PM

WikiLeaks AKP dump contains 80 types of malware (!OutLook)

WikiLeaks AKP dump contains 80 types of malware by Nicky Cappella.

From the post:

The latest WikiLeaks AKP email contains more than 80 types of malware, an independent researcher has confirmed. The malware includes ransomware and remote-access trojans.

WikiLeaks released emails from the Turkish political party AKP in two parts: one in July, and one on August 5. Anti-virus and malware expert Vesselin Bontchev reviewed the content of those emails and published his findings on his GitHub page. Bontchev listed more than 200 individual emails that contain a link to a confirmed malicious attachment.

His report shows a link to infected emails on the WikiLeaks site, the URL for the malware attachment within the email, and a link to a VirusTotal page, showing the way that different anti-virus scanners are reporting the malware. The URL to the malicious attachment has been made unclickable by substituting ‘hxxxxx’ for ‘https’, as the URL listed is a direct link to the malware and a click would result in an immediate download.

A word to the wise, I suppose.

You weren’t going to look at a stolen email archive using OutLook, were you?

by Patrick Durusau at August 17, 2016 12:34 AM

A Conflict-Free Replicated JSON Datatype

A Conflict-Free Replicated JSON Datatype by Martin Kleppmann, Alastair R. Beresford.

Abstract:

Many applications model their data in a general-purpose storage format such as JSON. This data structure is modified by the application as a result of user input. Such modifications are well understood if performed sequentially on a single copy of the data, but if the data is replicated and modified concurrently on multiple devices, it is unclear what the semantics should be. In this paper we present an algorithm and formal semantics for a JSON data structure that automatically resolves concurrent modifications such that no updates are lost, and such that all replicas converge towards the same state. It supports arbitrarily nested list and map types, which can be modified by insertion, deletion and assignment. The algorithm performs all merging client-side and does not depend on ordering guarantees from the network, making it suitable for deployment on mobile devices with poor network connectivity, in peer-to-peer networks, and in messaging systems with end-to-end encryption.

Not a fast read and I need to think about its claim that JSON supports more complexity than XML. ;-)
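
The paper’s JSON algorithm is far more involved, but the convergence property the abstract promises can be illustrated with a much simpler CRDT. A minimal grow-only counter in Python (my illustration, not the paper’s datatype):

    # Not the paper's algorithm -- a minimal CRDT (grow-only counter) showing
    # the convergence property: concurrent updates merge to the same state.
    class GCounter:
        def __init__(self, replica_id):
            self.replica_id = replica_id
            self.counts = {}                  # per-replica counts

        def increment(self, n=1):
            self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

        def merge(self, other):
            for rid, count in other.counts.items():
                self.counts[rid] = max(self.counts.get(rid, 0), count)

        def value(self):
            return sum(self.counts.values())

    a, b = GCounter("a"), GCounter("b")
    a.increment(3)                            # concurrent, disconnected updates
    b.increment(5)
    a.merge(b)                                # merge in either order...
    b.merge(a)
    assert a.value() == b.value() == 8        # ...and both replicas converge

The paper does the same trick for arbitrarily nested JSON maps and lists, which is where the hard work lives.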

Enjoy!

by Patrick Durusau at August 17, 2016 12:25 AM

strace’ing a Clojure process under lein

strace’ing a Clojure process under lein by Tim McCormack.

From the post:

Today I wanted to strace a JVM process to see if it was making network calls, and I discovered a minor roadblock: It was a Clojure program being run using the Leiningen build tool. lein run spawns a JVM subprocess and then exits, and I only wanted to trace that subprocess.

The solution is simple, but worth a post: Tell lein to run a different “java” command that actually wraps a call to java with strace. Here’s how I did it:

One for the “…you never do know…” file, and because it’s better to know than to assume.

by Patrick Durusau at August 17, 2016 12:12 AM

August 16, 2016

Patrick Durusau

BaseX 8.5.3 Released!

BaseX 8.5.3 Released! (2016/08/15)

BaseX 8.5.3 was released today!

The changelog reads:

VERSION 8.5.3 (August 15, 2016) —————————————-

Minor bug fixes, improved thread-safety.

Still, not a bad idea to upgrade today!

Enjoy!

PS: You do remember that Congress is throwing XML in ever increasing amounts at the internet?

Perhaps in hopes of burying information in angle-bang syntax.

XQuery can help disappoint them.
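
XQuery (in BaseX) is the natural tool for that. If you only want a quick look before installing anything, even Python’s standard library will pull structure out of an XML dump. A minimal sketch with invented element names, not an actual congress.gov schema:

    # Pulling structure out of XML with Python's standard library. The element
    # names below are invented, not an actual congress.gov schema.
    import xml.etree.ElementTree as ET

    xml_doc = """
    <bills>
      <bill number="H.R.1234"><title>An Example Act</title></bill>
      <bill number="S.567"><title>Another Example Act</title></bill>
    </bills>
    """

    root = ET.fromstring(xml_doc)
    for bill in root.findall("bill"):
        print(bill.get("number"), "-", bill.findtext("title"))

BaseX and XQuery earn their keep once the dumps stop fitting comfortably in memory.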

by Patrick Durusau at August 16, 2016 12:09 AM

August 15, 2016

Patrick Durusau

Hackers Say They Hacked NSA-Linked Group… (Fact or Fantasy?)

Hackers Say They Hacked NSA-Linked Group, Want 1 Million Bitcoins to Share More by Lorenzo Franceschi-Biccierai.

From the post:

A mysterious hacker or hackers going by the name “The Shadow Brokers” claims to have hacked a group linked to the NSA and dumped a bunch of its hacking tools. In a bizarre twist, the hackers are also asking for 1 million bitcoin (around $568 million) in an auction to release more files.

“Attention government sponsors of cyber warfare and those who profit from it!!!!” the hackers wrote in a manifesto posted on Pastebin, on GitHub, and on a dedicated Tumblr. “How much you pay for enemies cyber weapons? […] We find cyber weapons made by creators of stuxnet, duqu, flame.”

The hackers referred to their victims as the Equation Group, a codename for a government hacking group widely believed to be the NSA.

What is the first thing that strikes you as dodgy about this claimed hack?

If you had hacking weapons from the NSA, wouldn’t you first approach other national governments?

The NSA would still hear about it, but the buyers would be doing their best to keep the sale and the hack secret as well.

Here? The alleged hackers have painted a target on their own backs and “chump” on the back of anyone who parts with bitcoins for a release of the alleged weapons.

The best to hope for is that the alleged hackers aren’t prosecuted for fraud as a result of any online auction.

They shouldn’t be. Buying allegedly stolen property and being cheated isn’t a crime, it’s a valuable lesson.

by Patrick Durusau at August 15, 2016 07:11 PM

Simit: A Language for Physical Simulation

Simit: A Language for Physical Simulation by Fredrik Kjolstad, et al.

Abstract:

With existing programming tools, writing high-performance simulation code is labor intensive and requires sacrificing readability and portability. The alternative is to prototype simulations in a high-level language like Matlab, thereby sacrificing performance. The Matlab programming model naturally describes the behavior of an entire physical system using the language of linear algebra. However, simulations also manipulate individual geometric elements, which are best represented using linked data structures like meshes. Translating between the linked data structures and linear algebra comes at significant cost, both to the programmer and to the machine. High-performance implementations avoid the cost by rephrasing the computation in terms of linked or index data structures, leaving the code complicated and monolithic, often increasing its size by an order of magnitude.

In this article, we present Simit, a new language for physical simulations that lets the programmer view the system both as a linked data structure in the form of a hypergraph and as a set of global vectors, matrices, and tensors depending on what is convenient at any given time. Simit provides a novel assembly construct that makes it conceptually easy and computationally efficient to move between the two abstractions. Using the information provided by the assembly construct, the compiler generates efficient in-place computation on the graph. We demonstrate that Simit is easy to use: a Simit program is typically shorter than a Matlab program; that it is high performance: a Simit program running sequentially on a CPU performs comparably to hand-optimized simulations; and that it is portable: Simit programs can be compiled for GPUs with no change to the program, delivering 4 to 20× speedups over our optimized CPU code.

Very deep sledding ahead but consider the contributions:


Simit is the first system that allows the development of physics code that is simultaneously:

Concise. The Simit language has Matlab-like syntax that lets algorithms be implemented in a compact, readable form that closely mirrors their mathematical expression. In addition, Simit matrices assembled from hypergraphs are indexed by hypergraph elements like vertices and edges rather than by raw integers, significantly simplifying indexing code and eliminating bugs.

Expressive. The Simit language consists of linear algebra operations augmented with control flow that let developers implement a wide range of algorithms ranging from finite elements for deformable bodies to cloth simulations and more. Moreover, the powerful hypergraph abstraction allows easy specification of complex geometric data structures.

Fast. The Simit compiler produces high-performance executable code comparable to that of hand-optimized end-to-end libraries and tools, as validated against the state-of-the-art SOFA [Faure et al. 2007] and Vega [Sin et al. 2013] real-time simulation frameworks. Simulations can now be written as easily as a traditional prototype and yet run as fast as a high-performance implementation without manual optimization.

Performance Portable. A Simit program can be compiled to both CPUs and GPUs with no additional programmer effort, while generating efficient code for each architecture. Where Simit delivers performance comparable to hand-optimized CPU code on the same processor, the same simple Simit program delivers roughly an order of magnitude higher performance on a modern GPU in our benchmarks, with no changes to the program.

Interoperable. Simit hypergraphs and program execution are exposed as C++ APIs, so developers can seamlessly integrate with existing C++ programs, algorithms, and libraries.
(emphasis in original)

Additional resources:

http://simit-lang.org/

Getting Started

Simit mailing list

Source code (MIT license)

Enjoy!

by Patrick Durusau at August 15, 2016 02:28 AM

Threat Intelligence Starter Resources

Threat Intelligence Starter Resources by Amanda McKeon.

From the post:

Creating a threat intelligence capability can be a challenging undertaking, and not all companies are ready for it. Businesses that run successful threat intelligence teams generally:

  • Collect externally available data on threats and correlate it with internal events.
  • Be aware of threats driving proactive security controls.
  • Establish proactive internal hunting for unidentified threats.
  • Invest in employee and customer threat education.
  • Expand security industry peer relationships.
  • Apply methods for collecting and analyzing external threat data.

For more information, read our white paper on building an advanced threat intelligence team.

Now, if your company is just starting out with threat intelligence and doesn’t have the time or resources to dedicate an entire department to the task, there are some easy ways to begin integrating threat intelligence into your daily routine without breaking the bank.

The following resources can help build awareness of the threat landscape and prepare your company for defense.

Great starting points for collection of general threat intelligence.

Unfortunately, eliminating repetition of the same information/reports from different sources, separating surmises from facts, etc., remains the responsibility of the reader.

by Patrick Durusau at August 15, 2016 02:01 AM

August 14, 2016

Patrick Durusau

noms (decentralized database)

noms

From the webpage:

Noms is a decentralized database based on ideas from Git.

This repository contains two reference implementations of the database—one in Go, and one in JavaScript. It also includes a number of tools and sample applications.

Noms is different from other databases. It is:

  • Content-addressed. If you have some data you want to put into Noms, you don’t have to worry about whether it already exists. Duplicate data is automatically ignored. There is no update, only insert.
  • Append-only. When you commit data to Noms, you aren’t overwriting anything. Instead you’re adding to a historical record. By default, data is never removed from Noms. You can see the entire history of the database, diff any two commits, or rewind to any previous point in time.
  • Strongly-typed. Noms doesn’t have schemas that you design up front. Instead, each version of a Noms database has a type, which is generated automatically as you add data. You can write code against the type of a Noms database, confident that you’ve handled all the cases you need to.
  • Decentralized. If I give you a copy of my database, you and I can modify our copies disconnected from each other, and come back together and merge our changes efficiently and correctly days, weeks, or years later.

Noms is supported on Mac OS X and Linux. Windows usually works, but isn’t officially supported.

I’m taking a chance and adding a category for noms at this point.

I need to install Go 1.6+ before going further.

Not close to prime time but content-addressing and append-only are enough to prompt further investigation.
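
If content-addressing is new to you, the general idea fits in a few lines of Python. An illustration of the concept only, not noms’ actual API:

    # The general idea behind content-addressing (an illustration only, not
    # noms' actual API): a value's key is the hash of the value itself, so
    # duplicates collapse automatically and "update" is really "insert".
    import hashlib

    store = {}

    def put(value: bytes) -> str:
        key = hashlib.sha256(value).hexdigest()
        store[key] = value                 # re-inserting identical bytes is a no-op
        return key

    k1 = put(b"hello noms")
    k2 = put(b"hello noms")                # duplicate data is automatically ignored
    assert k1 == k2 and len(store) == 1

Append-only falls out of the same move: nothing is overwritten, you only ever add new hashes.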

by Patrick Durusau at August 14, 2016 09:39 PM

Alert! Non-Lobbyists Have Personal Contact Info For Members Of Congress!

Hacker posts contact information for almost 200 congressional Democrats

Summary: Guccifer 2.0 posted a spreadsheet with the personal contact details of almost 200 Democratic members of Congress.

Sorry, I don’t see why non-lobbyists having personal contact information of members of Congress is a bad thing?

The very thought of non-lobbyists contacting members of Congress provoked frantic activity at WordPress, which promptly disabled the Guccifer 2.0 page because of:

receipt of a valid complaint regarding the publication of private information (WordPress blocks latest Guccifer 2.0 docs)

The WordPress model of democracy looks something like this:

wordpress-democracy

I’m not vouching for the donation amounts and/or the amount of access you get for those amounts. It varies from congressional district to district.

Check with your local representative for current prices and access.

If and when you meet with your representative, be sure to ask for their new cellphone number.

by Patrick Durusau at August 14, 2016 05:54 PM

Elementary Category Theory and Some Insightful Examples

Elementary Category Theory and Some Insightful Examples (video)

From the description:

Eddie Grutman
New York Haskell Meetup (http://www.meetup.com/NY-Haskell/events/232382379/)
July 27, 2016

It turns out that much of Haskell can be understood through a branch of mathematics called Category Theory. Concepts such as Functor, Adjoints, Monads and others all have a basis in the Category Theory. In this talk, basic categorical concepts, starting with categories and building through functors, natural transformations, and universality, will be introduced. To illustrate these, some mathematical concepts such as homology and homotopy, monoids and groups will be discussed as well (proofs omitted).

Kudos to the NYC Haskell User’s Group for posting videos of its presentations.

For those of us unable to attend such meetings, these videos are a great way to remain current.

by Patrick Durusau at August 14, 2016 02:33 AM

Twitter Too Busy With Censorship To Care About Abuse

Complaints about Twitter ignoring cases of abuse are quite common (“A Honeypot For Assholes” [How To Monetize Assholes/Abuse]). I may have stumbled on why Twitter “ignores” abuse cases.

Twitter staff aren’t “ignoring” abuse cases, they are too damned busy being ad hoc government censors to handle abuse cases.

Consider: How Israel is trying to enforce gag orders beyond its borders by Michael Schaeffer Omer-Man.

From the post:

Israeli authorities are taking steps to block their own citizens from reading materials published online in other countries, including the United States.

The Israeli State Attorney’s Office Cyber Division has sent numerous take-down requests to Twitter and other media platforms in recent months, demanding that they remove certain content, or block Israeli users from viewing it.

In an email viewed by +972, dated August 2, 2016, Twitter’s legal department notified American blogger Richard Silverstein that the Israeli State Attorney claimed a tweet of his violates Israeli law. The tweet in question had been published 76 days earlier, on May 18. Silverstein has in the past broken stories that Israeli journalists have been unable to report due to gag orders, including the Anat Kamm case.

Without demanding that he take any specific action, Twitter asked Silverstein to let its lawyers know, “if you decide to voluntarily remove the content.” The American blogger, who says he has not stepped foot in any Israeli jurisdiction for two decades, refused, noting that he is not bound by Israeli law. Twitter is based in California.

Two days later, Twitter sent Silverstein a follow-up email, informing him that it was now blocking Israeli users from viewing the tweet in question. Or in Twitter-talk, “In accordance with applicable law and our policies, Twitter is now withholding the following Tweet(s) in Israel.”

It’s no wonder Twitter lacks the time and resources to think of robust solutions that enable free speech and, at the same time, protect users who aren’t interested in listening to the free speech of certain others.

Both rights are equally important but Twitter has its hands full responding in an ad hoc fashion to unreasonable demands.

Adopt a policy of delivering any content, anywhere, from any author and empower users to choose what they see.

The seething ball of lawyers, which adds no value for Twitter or its users, will suddenly melt away.

No issues to debate.

Governments block content on their own or they don’t.

Users block content on their own or they don’t.

BTW, 972mag.com needs your financial support to keep up this type of reporting. If you are having a good month, keep them in mind.

by Patrick Durusau at August 14, 2016 02:29 AM

August 13, 2016

Patrick Durusau

Twitter Censor Strikes Again (and again, and again)

Twitter censors accounts for reasons known only to itself, but in this case, truth-telling is one obvious trigger for Twitter censorship:

twitter-censors-again-460

Twitter censors accounts every day that don’t make the news, and those are just as serious violations of free speech as this instance.

Twitter could trivially empower users to have free speech and the equally important right not to listen but, for reasons known only to Twitter, has chosen not to do so.

Free speech and the right to not listen are equally important.

What’s so difficult to understand about that?

by Patrick Durusau at August 13, 2016 08:54 PM

August 12, 2016

Patrick Durusau

Atlanta Hack Opens 1.2 Billion Vehicles

The reports of a wireless hack that can open 100 million Volkswagens are impressive:

vw-hack-460

Car Thieves Can Unlock 100 Million Volkswagens With A Simple Wireless Hack by Swati Khandelwal.

But I wanted to point out an Atlanta hack that opens the estimated 1.2 billion vehicles in the world.

640px-Brick

To be complete, here is film footage of this hack in action:

Both the wireless opening hack and the Atlanta hack require additional effort to drive the opened car away.

On securing your car, see: Simple hack unlocks 100 million Volkswagen vehicles – Simple Absolute Defense.

PS: Contrast the estimated $40 cost of an Arduino-based RF Transceiver (from Swati’s post) plus technical expertise with the $0.00 cost of the Atlanta hack and lack of technical expertise. Which do you think will be more widespread?

by Patrick Durusau at August 12, 2016 03:46 PM

Government Toadies Target “Propaganda”

In Tech Giants Target Terrorist Propaganda, Sam Schechner gives a “heads up” on plans by tech companies to counter “propaganda.”

From the post:

Nearly half a million teenagers and young adults who had posted content with terms like “sharia” or “mujahideen” began last fall seeing a series of animated videos pop up on their Facebook news feeds.

In one, cartoon figures with guns appear underneath an Islamic State flag. “Do not be confused by what extremists say, that you must reject the new world. You don’t need to pick,” the narrator says. “Remember, peace up. Extremist thinking out.”

The videos are part of three experiments—funded by Google parent Alphabet Inc., with help from Facebook Inc. and Twitter Inc.—that explore how to use the machinery of online advertising to counterbalance the growing wave of extremist propaganda on the internet, both from Islamist radicals and far-right groups.

The goal: See what kinds of messages and targeting could reach potential extremists before they become radicalized—and then quickly roll the model out to content producers across the internet.

The study, detailed in a report set to be published Monday by London-based think tank Institute for Strategic Dialogue, is a step toward understanding what techniques work, said Yasmin Green, who heads the counter-radicalization efforts at Jigsaw, the Alphabet unit formerly known as Google Ideas.

Sam never gives you the link to the report from the “London-based think tank Institute for Strategic Dialogue,” which you can find at: The Impact of Counter-Narratives.

Which might lead you to discover another August 2016 publication: “Shooting in the right direction”: Anti-ISIS Foreign Fighters in Syria and Iraq, a study on recruitment and facilitating the use of anti-ISIS foreign fighters in Syria and Iraq.

The Institute for Strategic Dialogue (ISD) would be better named the “Institute for Strategic Propaganda.”

It isn’t “propaganda” as such that the ISD seeks to counter, only particular propaganda.

A simple count of the lives of Arabs blighted or ended by the Western Powers since 9/11 (just to pick a well-known starting point) will leave you wondering who the terrorists in this “conflict” are.

If that weren’t enough disappointment, Google, Facebook and others are enabling this foolish effort by not demanding payment for their work. The lack of budget-busting expenses encourages governments to act irresponsibly.

by Patrick Durusau at August 12, 2016 03:16 PM

August 11, 2016

Patrick Durusau

Eduard Imhof – Swiss Cartographer (Video)

Eduard Imhof – Swiss Cartographer

A TV documentary on the Swiss cartographer Eduard Imhof.

In Swiss German but this English sub-title caught my eye:

But what can be extracted again from the map is also important.

A concern that should be voiced with attractive but complex visualizations.

The production of topographical maps at differing scales is a recurring theme in the video.

How to visualize knowledge at different scales is an open question. Not to mention an important one as more data becomes available for visualization.

Imhof tells a number of amusing anecdotes, including answering the question: Which two cantons in Switzerland have the highest density of pigs?

Enjoy!

For background:

Virtual Library Eduard Imhof

Eduard Imhof (1895-1986) was professor of cartography at the Swiss Federal Institute of Technology Zurich from 1925 – 1965. His fame far beyond the Institute of Technology was based on his school maps and atlases. In 1995 it was 100 years since his birthday. On this occasion several exhibitions celebrated his life and work, among others in Zurich, Bern, Bad Ragaz, Küsnacht/ZH, Barcelona, Karlsruhe and Berlin. The last such exhibition took place in summer 1997 in the Graphische Sammlung of the ETH. There it was possible to show a large number of maps and pictures in the original. At the conclusion of the exhibition Imhof’s family bequested his original works to the ETH-Bibliothek Zurich. Mrs. Viola Imhof, the widow of Eduard Imhof, being very much attached to his work, had a major part in making it accessible to the public.

Imhof wie ein Kartographische Rockstar (Imhof as a Cartographic Rock Star)

Eduard Imhof was born in Schiers on 25 Jan 1895 to the geographer Dr. Eduard Imhof and his wife Sophie.1 At the age of 19 he enrolled in ETH Zürich,2 and after several interruptions for military service, was awarded a geodesist/surveyor diploma in 1919.

He returned to ETH as an assistant to his mentor Prof. Fridolin Becker, himself a cartographic god widely viewed as the inventor of the Swiss style shaded relief map.3 In 1925, the year after Becker’s death, Imhof became an assistent professor and founded the Kartographische Institut (Institute of Cartography). Although the Institute was initially little more than a hand-painted sign above his small office, it was nevertheless the first of its kind in the world.

In 1925 he produced his first major work – the Schulkarte der Schweiz 1:500 000 (the School map of Switzerland). Over the years he would update the national school map several times as well as produce school maps for nearly half of the cantons in the Federation. He even did the school map for the Austrian Bundesländer of Vorarlberg. (footnotes omitted)

by Patrick Durusau at August 11, 2016 08:49 PM

“A Honeypot For Assholes” [How To Monetize Assholes/Abuse]

“A Honeypot For Assholes”: Inside Twitter’s 10-Year Failure To Stop Harassment by Charlie Warzel.

From the post:

For nearly its entire existence, Twitter has not just tolerated abuse and hate speech, it’s virtually been optimized to accommodate it. With public backlash at an all-time high and growth stagnating, what is the platform that declared itself “the free speech wing of the free speech party” to do? BuzzFeed News talks to the people who’ve been trying to figure this out for a decade.

Warzel’s 6,000-word (5,966 by my count) ramble uses “abuse” without ever defining the term. Nor do any of the people quoted in his post. But, like Justice Stewart, they “know it when they see it.”

One of the dangers of Warzel’s post is that every reader will insert their own definition of “abuse.” It is hard to find people who disagree that “abuse as they define it” should be blocked by Twitter.

All of Warzel’s examples are “abuse” (IMHO) but even so, I don’t support Twitter blocking that content from being posted. I emphasize posted because being posted on Twitter doesn’t obligate any user to read the content.

I don’t support Twitter censorship of any account, for any reason. Four Horsemen Of Internet Censorship + One.

If Twitter doesn’t block content, then how to deal with “abuse”?

Why not monetize the blocking of assholes and abuse?

Imagine a Twitter client/app that:

  1. Maintains a list of people blocked not only by a user but allowed a user to subscribe to block lists of any other user.
  2. Employed stop lists, regexes, neural networks to filter tweets from people who have not been blocked.
  3. Neural networks trained on collections of “dick pics” and other offensive content to filter visual content as well.
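
A minimal sketch of items 1 and 2, with invented account names, patterns and tweets (not a real Twitter client):

    # Client-side filtering sketch for items 1 and 2 above. Account names,
    # patterns, and tweets are invented; this is not a real Twitter client.
    import re

    blocked_users = {"abusive_account", "another_troll"}        # 1. block lists
    stop_patterns = [re.compile(p, re.IGNORECASE)                # 2. stop lists / regexes
                     for p in (r"\bunwanted\b", r"\boffensive\b")]

    def visible(tweet):
        if tweet["author"] in blocked_users:
            return False
        return not any(p.search(tweet["text"]) for p in stop_patterns)

    timeline = [
        {"author": "friendly_user", "text": "Nice map of national food days!"},
        {"author": "abusive_account", "text": "(something vile)"},
    ]

    for tweet in filter(visible, timeline):
        print(f"@{tweet['author']}: {tweet['text']}")

Item 3 is heavier lifting, but the point stands: the filtering happens client-side, under the user’s control, not Twitter’s.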

Every user can have a customized definition of “abuse” for their own feed. Without impinging on the definitions of “abuse” of other users.

Twitter clients that could support such filtering options are already in place. TweetDeck Versus Hootsuite – The Essential Guide discusses two popular clients. There are hundreds of others, both web and smartphone based.

Circling the question: “Why isn’t Twitter using my personal definition of “abuse” to protect me for free?” generates a lot of discussion, but no viable solutions.

Monetizing filtering of assholes and abuse, resources available in vast quantities, protects both free speech and freedom from unwanted speech.

The only useful question on Twitter abuse is what price point to set for avoiding X amount of abuse.

Yes?

by Patrick Durusau at August 11, 2016 07:33 PM

Simple hack unlocks 100 million Volkswagen vehicles – Simple Absolute Defense

Simple hack unlocks 100 million Volkswagen vehicles by Patrick Howell O’Neill.

From the post:

Some 100 million Volkswagens are vulnerable to hackers who discovered key vulnerabilities that allow them to unlock the doors of the most popular cars on earth, according to a new research paper first reported by Wired.

University of Birmingham computer scientist Flavio Garcia was already widely known for working with colleagues to find major security flaws in Volkswagens last year that enabled hackers to quickly takeover a keyless car.

The new attack could result in the theft of anything kept in a car.

When you put the two attacks together, you have a recipe for getting into and driving off with a stolen car in less than 60 seconds—Nic Cage-caliber grand theft auto.

Actually, you don’t need to be as good as Nic Cage at all. A thief can pull this off with cheap equipment like a TI Chronos smart watch.

In the interest of “responsible” disclosure, you will have to reconstruct some of the research for yourself.

There is a simple and absolute defense to this hack:

640px-Denver_boot-460

You can order one of these starting at $239.00.

Compared to the aggravation of having your Volkswagen stolen?

Thieves will pick an easier target.

(Be innovative in your security thinking.)

by Patrick Durusau at August 11, 2016 02:30 AM