Let’s start with a moment of silence for O’Reilly Author Toby Segaran, who passed away on August 11, 2021. Toby was one of the people who got the Data Science movement started. His book, Programming Collective Intelligence, taught many how to start using their data. Throughout his career, he mentored many, and was particularly influential in mentoring young women interested in science and technology. Toby is greatly missed by everyone in the Data Science community.
AI and Data
- Margaret Mitchell joins Hugging Face to create tools to help build fair algorithms.
- Embedded Machine Learning for Hard Hat Detection is an interesting real-world application of AI on the edge. Wearing hard hats is essential to work site safety; this project developed a model, easily deployed without network connectivity, for detecting whether workers are wearing hard hats. It also goes into rebalancing datasets (in this case, public datasets with too few hard hats); the technique is applicable to other instances of bias.
- Liquid Neural Networks are neural networks that can adapt in real time to incoming data. They are particularly useful for time series data–which, as the author points out, is almost all data.
- US Government agencies plan to increase their use of facial recognition, in many cases for law enforcement, despite well-known accuracy problems for minorities and women. Local bans on face recognition cannot prohibit federal use.
- Data and Politics is an ongoing research project that studies how political organizations are collecting and using data.
- FaunaDB is a distributed document database designed for serverless architectures. It comes with REST API support, GraphQL, built-in attribute based access control, and a lot of other great features.
- Facial expression recognition is being added to a future version of Android as part of their accessibility package. Developers can create applications where expressions (smiles, etc.) can be used as commands.
- OpenAI’s Codex (the technology behind Copilot) takes the next step: translating English into runnable code, rather than making suggestions. Codex is now in private beta.
- Who is responsible for publicly available datasets, and how do you ensure that they’re used appropriately? Margaret Mitchell suggests organizations for data stewardship. These would curate, maintain, and enforce legal standards for the use of public data.
- An AI system can predict race accurately based purely on medical images, with no other information about the subject. This creates huge concerns about how bias could enter AI-driven diagnostics; but it also raises the possibility that we might discover better treatments for minorities who are underserved (or badly served) by the medical industry.
- DeepMind has made progress in building a generalizable AI: AI agents that can solve problems that they have never seen before, and transfer learning from one problem to another. They have developed XLand, an environment that creates games and problems, to enable this research.
- GPT-J is one of a number of open source alternatives to GitHub Copilot. It is smaller and faster, and appears to be at least as good.
- “Master faces” are images generated by adversarial neural networks that are capable of passing facial recognition tests without corresponding to any specific face.
- Researchers have created a 3D map of a small part of a mouse’s brain. This is the most detailed map of how neurons connect that has ever been made. It contains 75,000 neurons and 523 million synapses; the map and the dataset have been released to the public.
- Robotic chameleons (or chameleon robotics): Researchers have developed a robotic “skin” that can change color in real time to match its surroundings.
- Elon Musk announces that Tesla will release a humanoid robot next year; it will be capable of performing tasks like going to the store. Is this real, or just a distraction from investigations into the safety of Tesla’s autonomous driving software?
- According to the UN, lethal autonomous robots (robots capable of detecting and attacking a target without human intervention) have been deployed and used by the Libyan government.
- A new generation of warehouse robots is capable of simple manipulation (picking up and boxing objects); robots capable of more fine-grained manipulation are coming.
- The end of passwords draws even closer. GitHub is now requiring two-factor authentication, preferably using WebAuthn or a YubiKey. Amazon will be giving free USB authentication keys to some customers (root account owners spending over $100/month).
- There are many vulnerabilities in charging systems for electric vehicles. This is sad, but not surprising: the automotive industry hasn’t learned from the problems of IoT security.
- Advances in cryptography may make it more efficient to do computation without decrypting encrypted data.
- Amazon is offering store credit to people who give them their palm prints, for use in biometric checkout at their brick-and-mortar stores.
- Amazon, Google, Microsoft, and others join the US Joint Cyber Defense Collaborative to fight the spread of ransomware.
- Apple will be scanning iPhones for images of child abuse. Child abuse aside, this decision raises questions about cryptographic backdoors for government agencies and Apple’s long-standing marketing of privacy. If they can monitor for one thing, they can monitor for others, and can presumably be legally forced to do so.
- Automating incident response: self-healing auto-remediation could be the next step in automating all the things, building more reliable systems, and eliminating the 3AM pager.
- Hearables are very small computers, worn in the ear, whose only interfaces are a microphone, a speaker, and a network. They may have applications in education, music, real-time translation (like Babelfish), and of course, next-generation hearing aids.
- Timekeeping is an old and well-recognized problem in distributed computing. Facebook’s Time cards are an open-source (code and hardware) solution for accurate timekeeping. The cards are PCIe bus cards (PC standard) and incorporate a satellite receiver and an atomic clock.
- A new cellular board for IoT from Ray Ozzie’s company Blues Wireless is a very interesting product. It is easy to program (JSON in and out), interfaces easily to Raspberry Pi and other systems, and $49 includes 10 years of cellular connectivity.
- Researchers are using Google Trends data to identify COVID symptoms as a proxy for hospital data, since hospital data isn’t publicly available. The key is distinguishing between flu-like flu symptoms and flu-like COVID symptoms.
- A topic-based approach to targeted advertising may be Google’s new alternative to tracking cookies, replacing the idea of assigning users to cohorts with similar behavior.
- Facebook shares a little information about what’s most widely viewed on their network. It only covers the top 20 URLs and, given Facebook’s attempts to shut down researchers studying their behavior, qualifies as transparency theater rather than substance.
- As an experiment, Twitter is allowing certain users to mark misleading content. They have not specified (and presumably won’t specify) how to become one of these users. The information they gain won’t be used directly for blocking misinformation, but to study how it propagates.
- Banning as a service: It’s now possible to hire a company to get someone banned from Instagram and other social media. Not surprisingly, these organizations may be connected to organizations that specialize in restoring banned accounts.
- Facebook may be researching ways to use some combination of AI and homomorphic encryption to place targeted ads on encrypted messages without decrypting them.
- Inspired by the security community and bug bounties, Twitter offers a bounty to people who discover algorithmic bias.
- Facebook’s virtual reality workrooms could transform remote meetings by putting all the participants in a single VR conference room–assuming that all the participants are willing to wear goggles.
- A survey shows that 70% of employees would prefer to work at home, even if it costs them in benefits, including vacation time and salaries. Eliminating the commute adds up.
- Sky computing–the next step towards true utility computing–is essentially what we now call “multi cloud,” but with an inter-cloud layer that provides interoperability between cloud providers.
- Thoughts on the future of the data stack as data starts to take advantage of cloud: how do organizations get beyond “lift and shift” and other early approaches to use clouds effectively?
- Matrix is another protocol for decentralized messaging (similar in concept to Scuttlebutt) that appears to be getting some enterprise traction.
- Using federated learning to build decentralized intelligent wireless communications systems that predict traffic patterns to help traffic management may be part of 6G.
- How do you scale intelligence at the edge of the network? APIs, industrially hardened Linux systems, and Kubernetes adapted to small systems (e.g., K3s).
- The EU is considering a law that would require cryptocurrency transactions to be traceable. An EU-wide authority to prevent money laundering would have authority over cryptocurrencies.
- Autocorrect errors in Excel are a problem in genomics: autocorrect modifies gene names, which are frequently “corrected” to dates.
- Google may have created the first time crystals in a quantum computer. Time crystals are a theoretical construct whose structure constantly changes but repeats over time, without requiring additional energy.
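The Excel item above is easy to picture with a small check. The following is a minimal Python sketch that flags gene symbols a spreadsheet might reinterpret as dates. The pattern is only a heuristic assumed for illustration, not a reproduction of Excel's actual parsing rules, and the function names `looks_like_date` and `flag_risky_symbols` are hypothetical. SEPT2, MARCH1, and DEC1 are commonly cited examples of affected symbols.

```python
import re

# Heuristic only: an assumption about which symbols look like "month + day"
# to a spreadsheet. This is NOT Excel's real conversion logic.
MONTH_LIKE = re.compile(
    r"^(JAN|FEB|MAR|MARCH|APR|MAY|JUN|JUL|AUG|SEP|SEPT|OCT|NOV|DEC)-?(\d{1,2})$",
    re.IGNORECASE,
)


def looks_like_date(symbol: str) -> bool:
    """Return True if the symbol is a month name or abbreviation plus a day 1-31."""
    match = MONTH_LIKE.match(symbol.strip())
    if match is None:
        return False
    day = int(match.group(2))
    return 1 <= day <= 31


def flag_risky_symbols(symbols):
    """Return the symbols that should be checked before opening in a spreadsheet."""
    return [s for s in symbols if looks_like_date(s)]


if __name__ == "__main__":
    sample = ["SEPT2", "MARCH1", "DEC1", "TP53", "BRCA1"]
    print(flag_risky_symbols(sample))  # ['SEPT2', 'MARCH1', 'DEC1']
```

A check like this could run on a gene list before it is exported, so that risky symbols can be stored as explicit text rather than left for autocorrect to reinterpret.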