Tuesday, 14 April 2015

14 Apr '15: BgDt (Big Data) session at the PCB

Xavier Rafael, a technician at Barcelona Digital, explains for the umpteenth time the classic VVV concept of Big Data, and describes his work with 24-hour Holter monitors to detect anomalies automatically, with X-ray images to learn normal patterns, and with telecare, where simple sensors are used to detect changes in the habits of elderly people.
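To make the Holter idea concrete, here is a minimal sketch of one way anomalies could be flagged automatically. It is not the method presented in the session: the function name, the window size, the 20% tolerance and the sample values are all assumptions chosen for illustration. Each RR interval (time between heartbeats) is compared with the median of the few intervals before it.

```python
from statistics import median

def flag_irregular_beats(rr_ms, window=5, tolerance=0.20):
    """Return indices of RR intervals that differ from the median of the
    preceding `window` intervals by more than `tolerance` (a fraction).

    Hypothetical helper: window and tolerance are illustrative defaults,
    not clinically validated thresholds.
    """
    flagged = []
    for i in range(window, len(rr_ms)):
        baseline = median(rr_ms[i - window:i])
        if abs(rr_ms[i] - baseline) > tolerance * baseline:
            flagged.append(i)
    return flagged

# Made-up RR intervals in milliseconds: index 6 is an early beat and
# index 7 is the longer pause that follows it.
rr = [800, 810, 790, 805, 795, 800, 520, 1080, 800, 805]
print(flag_irregular_beats(rr))
```

A median baseline is used rather than a mean so that a single odd beat does not distort the reference for the beats that follow it.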

Carme Peirò, a journalist, explains what data journalism is and shows examples. Look at the Big Pharma Big Files section of Dollars for Docs at www.projectspropublica.org
It shows the $$$$ paid by pharmaceutical companies to US entities, and for what purpose.

Arcadi Navarro, a geneticist, explains what omics data are and mentions some open archives:
www.ncbi.nlm.nih.gov/gap
www.ebi.ac.uk/ega

Ramon Maspons apologises on behalf of JM Argimón, who is on TV3 talking about VISC+, and says that, because the initial public-private model was not allowed to go ahead, the project has been cut back and has gone from being about competitiveness to being about security (?). Among the many limitations, he says it will not be permitted to cross the data they provide with other sources, and when I ask whether that means we cannot do the study of COPD data from the CMBD-AH against Meteocat environmental data for exacerbations, he says no........
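For reference, the kind of crossing in question is technically simple. The sketch below is only an illustration with made-up values: the record layout, the field names (`min_temp_c`, `pm10`) and the function `link_by_day` are assumptions, and they do not reflect the real CMBD-AH or Meteocat formats.

```python
from collections import Counter
from datetime import date

# Hypothetical input: one date per COPD exacerbation admission.
admissions = [date(2015, 1, 5), date(2015, 1, 5), date(2015, 1, 6), date(2015, 1, 8)]

# Hypothetical daily environmental readings keyed by date (made-up values).
environment = {
    date(2015, 1, 5): {"min_temp_c": 1.5, "pm10": 48},
    date(2015, 1, 6): {"min_temp_c": 3.0, "pm10": 35},
    date(2015, 1, 7): {"min_temp_c": 6.0, "pm10": 20},
}

def link_by_day(admission_dates, daily_environment):
    """Build one row per day that has environmental data, adding the number
    of admissions on that day (0 if none). Days with admissions but no
    environmental reading are left out."""
    counts = Counter(admission_dates)
    rows = []
    for day in sorted(daily_environment):
        row = {"date": day, "admissions": counts.get(day, 0)}
        row.update(daily_environment[day])
        rows.append(row)
    return rows

for row in link_by_day(admissions, environment):
    print(row)
```

Only aggregated daily counts are needed for this kind of study, so the join key is the date, not any patient identifier.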

Look up www reactome
Striking visualisations by Hans Rosling at TED
...........


Monday, 13 April 2015

data science free-lance ?????

I've been working as a data-scientist for years without ever hearing it labelled as such. Working as a programmer with a love of R&D, I've been able to imagineer data-science strategies to solve most of the challenges involved in decisions, strategy and identified problems. I've engaged in freelance arrangements with marketing companies and agencies for years, because I developed my own tools to do data-mining quickly years before it was common practice in small businesses. I've worked primarily as a developer of software and websites. Having been around businesses that generate lots of data in a digital market for over 16 years, case study examples I can initially think of would include:
  • generating marketing databases
  • monitoring competitor product price changes
  • providing tracking information on the search engine rankings on target keywords for companies and their competitors (along with using actual search engine user histories to inform on low-competitor keywords and popularity etc) - the usual seo thing everyone is doing now.
  • tracking organised crime gangs buying high-value goods with stolen identities and credit cards - the card operators force a charge-back on the retailer, who has to take the loss of the stock. After complaining to government channels about how retailers bear the cost of the weaknesses of a credit card product without seeing police help, I researched and developed better ways for my retail clients to automatically add a new layer of risk-assessment to their ordering systems to flag suspicious-looking orders (you'd be surprised what the best metric I've found for this is)
  • building customer profiles from their behaviour patterns and past purchases to customise promotional e-mails - each customer getting a unique email. In the case of a fashion retailer, customers got the promotion timed just after their payday, when they are usually spending disposable income, pushing colours, brands and styles they like, in the price range they like, with seasonal/weather sensitivity in the product selection and only stock available in their size.
  • profiling the survivability of the current and potential clients of a business-to-business provider of an IT software solution. This included pulling in public accounting records to get profit performance, and crossing this with each target's local population (the size of their own local market), distance to competitors and the likely age group of their customers. This goes towards identifying the VIP or low-hanging clients to target strategically amid aggressive competition focused on the same small market of clients, in an industry set to shrink to a point not sustainable for all competing suppliers - helping to identify which are the right customers to end up holding.
  • product design - even just graphic design has been a big R&D/analytical area for data-science skills - conversion rate improvement.
  • a lot of research goes into offering an evidence-driven marketing strategy - good research guarantees good results, and I do more than necessary - but you are only as good as your last project, right?
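As an illustration of the extra risk-assessment layer mentioned in the list above, here is a minimal rule-based sketch. The author does not say which signals or metric were actually used, so every field, weight and threshold below is a hypothetical stand-in.

```python
from dataclasses import dataclass

@dataclass
class Order:
    # All fields are assumed for illustration; a real ordering system
    # would expose its own attributes.
    total: float
    billing_postcode: str
    shipping_postcode: str
    account_age_days: int
    express_delivery: bool

def risk_score(order, high_value=500.0):
    """Add up simple warning signs; the weights are arbitrary examples."""
    score = 0
    if order.total >= high_value:
        score += 2  # high-value goods are the usual target
    if order.billing_postcode != order.shipping_postcode:
        score += 2  # goods sent somewhere other than the cardholder's address
    if order.account_age_days < 1:
        score += 1  # account created on the day of the order
    if order.express_delivery:
        score += 1  # rushed delivery leaves less time to spot a problem
    return score

def needs_review(order, threshold=4):
    """Flag the order for a manual check before dispatch."""
    return risk_score(order) >= threshold

suspicious = Order(900.0, "AB1 2CD", "ZZ9 9ZZ", 0, True)
routine = Order(40.0, "AB1 2CD", "AB1 2CD", 400, False)
print(needs_review(suspicious), needs_review(routine))
```

Flagging for manual review, rather than rejecting outright, keeps false positives from turning away genuine customers.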
I do not have great stage-skills or sales abilities, so 90% of the time these data-science ideas and concepts were not understood by the non-technical clients and were entertained purely on the trust I'd established with customers from previous work. I might have originally only helped tweak a pay-per-click campaign. It's amazing how much interesting work I've found from a relationship that started with taking on some short, low-paid, very dull work. I've always bumped by accident into clients' challenges and problems where I can use data-science approaches. Usually it comes from over-hearing staff chatter or asking why a client doesn't sound as happy as they usually are - I sleep on it and then start firing off solutions and ideas to help. Rarely have I been contacted by anyone looking for data-science approaches, skills or solutions to their problems. Now I'm actively re-starting my own freelance business to fund my studies and focus my future on medical research tools - my biggest challenge is figuring out how non-technical people who have never heard of these strategies would actually describe them if they were seeking such assistance. None of the freelancer marketplaces have entertained my request to buy search data from them on the freelancer searches they see, or on which searches do not find any matches - these are the gold nuggets we both need!

Sunday, 12 April 2015

Master classes (Big Data Science blog)

4 Ways Big Data Is Transforming Healthcare

It’s hard to think of a more worthwhile use for big data than saving lives – and around the world the healthcare industry is finding more ways to do that every day.
From predicting epidemics to curing cancer and making staying in hospital a more pleasant experience, big data is proving invaluable to improving outcomes.
This is very good news indeed – as the cost of caring has skyrocketed in recent years and is expected to continue to do so as the population ages – to the point where we could be headed for serious trouble.
I’ve spoken before about the hospital unit which found it could detect infections in newborns 24 hours before symptoms showed, by monitoring a live stream of heartbeats and breathing patterns.
And I’ve also mentioned Google’s (disputed but interesting) claims that it could detect outbreaks of flu more accurately than standard prediction methods by monitoring search activity.
But these are just the tip of the iceberg in an industry which generates mountains of data across every area of its operations.
In fact last year a survey by IDC Health Insights found that 50% of the hospitals and healthcare insurers put increasing their analytics capabilities as their top priority for investment over the next year.
And the body of medical literature from which further research evolves continues to grow every day – with an estimated one million records per year added to Medline, the online repository of scientific studies related to medicine.
Efficiency is the great driver here – with the cost of healthcare in the US currently standing at around 18% of GDP and forecast to rise, payment models are changing. While traditionally providers have been paid according to number of patients they treat, a move towards payment based on results and quality of treatment is taking place. These more complex metrics require more data and a different analytical skill set, rather than simply counting the number of patients coming through the door.
McKinsey & Company compiled a report for the Center for US Health System Reform which identified four main sources of big data in the healthcare industry.
They are:
Activity (claims) and cost data.
These are the basic figures showing the amount of care which has been supplied by providers in the system, and the cost of paying for that care. Analysis of this tells us about the spread of diseases, and the priority that should be given to dealing with specific health threats. The most cost-effective treatments for specific ailments can be identified and the number of duplicate or unnecessary treatments can be significantly reduced. In the United States, Methodist Health System has used a tool which analyses Medicare claims data to highlight groups and individuals who may need expensive care in the future, allowing for less costly preventative action at an early stage.
Clinical data
These include patient medical records and images gathered during examinations or procedures, as well as doctors’ notes. For example, the Carilion Clinic, in Virginia, says it used natural language processing algorithms to analyse 350,000 patient records, identifying 8,500 people at risk of heart problems. Similarly, the American Medical Association reported that analysis of patient records found only 26% of children who had recorded three high blood pressure readings at separate visits to their doctors had been diagnosed as suffering hypertension – highlighting a significant number of failures to spot the condition.
Pharmaceutical R&D data
Over the last few years a large number of partnerships have sprung up between pharmaceutical companies – as if they have suddenly become aware of the huge benefits of pooling their knowledge. In the US major firms such as Pfizer and Novartis pool their data from trials into the clinicaltrials.gov website. And in the UK GlaxoSmithKline recently unveiled its partnership with the SAS Institute which aims to increase collaboration based on data from clinical trials. Suitable candidates can be found for trials more effectively by looking into lifestyle information. And comparison of data from multiple trials can throw up surprising results which can lead to new breakthroughs. For example the antidepressant desipramine is being trialled for its potential to destroy cancer cells in patients with small cell lung cancer.
Patient behaviour and sentiment data
This is data from over-the-counter drug sales combined with the latest “wearables” which monitor your activity and heart rates, patient experience and customer satisfaction surveys as well as the vast amount of unstructured information about our lifestyles broadcast every day over social media. At the moment wearable devices are mainly used for personal fitness, but this is set to change – spending on bringing this information from smart watches, wrist bands, running shoes and other wearables is expected to reach $52 million by 2019, according to a study by ABI Research. Services such as ginger.io already allow care providers to monitor their patients through sensor-based applications on their smartphones. And Proteus manufacture an “ingestible” scanner the size of a grain of sand, which can be used to track when and how patients are taking their medication. This gives providers information about “compliance rates” – how often patients follow their doctor’s orders – and can even alert a family member to remind them.
Of course, with medical matters patient privacy is always a high priority, and big data brings big challenges in this respect. How insurance companies will act on the vast increase in information about our lives that they are able to glean is a concern – will we see individuals turned down for cover because their running shoes have snitched that they are lazy?
It is plain to see that there are huge benefits to be had from analyzing the data about our health that is out there. The mantra of “prevention is better than cure” has led to a focus on predicting problems in the early stages when they are easier to treat, and outbreaks can be more easily contained.
For example, Global Viral monitors data sources including a network of “listening posts” across Africa and Asia, as well as social media chatter, to detect the spread of disease from wildlife to humans – considered to be the source of 75% of diseases which are harmful to human health.
In the future we are likely to recover more quickly from illness and injury, and we will live longer. New drugs will come into existence and our hospitals and surgeries will operate more efficiently – all thanks to big data.

Monday, 6 April 2015

Open data (ALCAIDE blog)

The main characteristics of open data are:
  • Availability and access: the data must be available as a whole, preferably by downloading over the Internet, and in a convenient and modifiable form.
  • Reuse and redistribution: the data must be provided under terms that permit reuse and redistribution, including intermixing with other datasets.
  • Universal participation: everyone must be able to use, reuse and redistribute the data; there should be no discrimination against fields of endeavour or against persons or groups.
What kinds of open data are there?
There are many kinds of open data, but they can be classified by their potential uses and applications, for example:
Cultural: Data about cultural works and artefacts, for example titles and authors, and in general data held by galleries, libraries, archives and museums.
Science: Data produced as part of scientific research, from astronomy to zoology.
Finance: Data such as government accounts (expenditure and revenue) and information on financial markets (stocks, shares, bonds, etc.).
Statistics: Data produced by statistical offices, such as the census and key socioeconomic indicators.
Weather: The many types of information used to understand and predict the weather and climate.
Environment: Information related to the natural environment, the presence and level of pollutants, and the quality of rivers and seas.
Transport: Data such as timetables, routes and on-time statistics.

Friday, 3 April 2015